Chinese Language Processing at Brandeis University



  • Chinese word segmentation, POS tagging, syntactic parsing
  • Chinese semantic role labeling
  • Chinese temporal processing
  • Chinese discourse analysis
  • Machine translation

These are some of the areas we have been working on.


What We Do

Brandeis University's Chinese Language Processing program is anchored by linguistic corpora annotated with morphological, syntactic, semantic, and discourse structures. The Chinese Treebank, started at the University of Pennsylvania, is a segmented, part-of-speech tagged, and fully bracketed corpus. The latest release is CTB7.0, which will soon be available through the Linguistic Data Consortium (LDC). It has 51 thousand sentences, 1.2 million words, and 1.9 million Chinese characters. The sources of this corpus include newswire, magazine articles, broadcast news, broadcast conversations, and weblogs. The segmentation, POS-tagging, and syntactic bracketing standards are fully documented.
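
To make the three annotation layers concrete (segmentation, POS tags, bracketing), here is a minimal Python sketch. It assumes Penn Treebank-style parenthesized trees; the example sentence is invented for illustration and is not taken from the corpus, and the function names are hypothetical. It also shows why the word count and the character count of a segmented corpus differ.

```python
def parse_bracketed(text):
    """Parse one parenthesized tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "a constituent must start with '('"
        pos += 1
        label = tokens[pos]          # phrase label or POS tag
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())      # nested constituent
            else:
                children.append(tokens[pos]) # a segmented word
                pos += 1
        pos += 1                     # consume ')'
        return (label, children)

    return node()


def tagged_words(tree):
    """Return (word, POS tag) pairs from the leaves, left to right."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if not isinstance(child, str):
            pairs.extend(tagged_words(child))
    return pairs


# Invented example sentence (assumption, not corpus data).
EXAMPLE = "(IP (NP (PN 我)) (VP (VV 喜欢) (NP (NN 语言学))))"

tree = parse_bracketed(EXAMPLE)
pairs = tagged_words(tree)
words = [word for word, _ in pairs]
num_words = len(words)                     # segmentation units
num_chars = sum(len(word) for word in words)  # individual characters

print(pairs)
print(num_words, "words,", num_chars, "characters")
```

Because one word can span several characters, a corpus-level character count is always at least as large as its word count.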

Ongoing Projects

  • Chinese Treebank (DARPA TIDES, GALE, BOLT)
  • Chinese Proposition Bank (NSF, DARPA GALE, BOLT)
  • Temporal inference (NSF)

Copyright 2012