Brandeis CLPG -- Home

Chinese Language Processing at Brandeis University

Chinese word segmentation, POS tagging, syntactic parsing
Chinese semantic role labeling
Chinese temporal processing
Chinese discourse analysis
Machine translation

These are some of the areas we have been working on.

What We Do

Brandeis University's Chinese Language Processing program is anchored by linguistic corpora annotated with morphological, syntactic, semantic, and discourse structures. The Chinese Treebank, started at the University of Pennsylvania, is a segmented, part-of-speech tagged, and fully bracketed corpus. The latest release is CTB7.0, which will soon be available through the LDC. It contains 51 thousand sentences, 1.2 million words, and 1.9 million Chinese characters. The sources of this corpus include newswire, magazine articles, broadcast news, broadcast conversations, and weblogs. The segmentation, POS-tagging, and syntactic bracketing standards are fully documented.
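To illustrate what "segmented, part-of-speech tagged, and fully bracketed" means in practice, here is a minimal Python sketch that reads one Penn-style bracketed tree and recovers its words and POS tags. The sentence and its bracketing are invented for illustration and are not taken from the Chinese Treebank; the helper name tagged_words is likewise hypothetical. The sketch also shows why word counts and character counts differ for Chinese text.

```python
import re

# Hypothetical example tree: invented for illustration, not an actual
# Chinese Treebank entry. Labels such as IP, NP, VP, NR, VV, NN follow
# the usual Penn-style bracketing conventions.
TREE = "(IP (NP (NR 中国)) (VP (VV 发展) (NP (NN 经济))))"


def tagged_words(tree):
    """Return (word, POS tag) pairs from the leaves of a bracketed tree."""
    # A leaf looks like "(TAG word)": a tag and a token, neither of which
    # contains whitespace or brackets.
    leaves = re.findall(r"\(([^\s()]+)\s+([^\s()]+)\)", tree)
    return [(word, tag) for tag, word in leaves]


pairs = tagged_words(TREE)
words = [word for word, _ in pairs]

print(pairs)                         # segmented words with their POS tags
print(len(words))                    # number of words
print(sum(len(w) for w in words))    # number of Chinese characters
```

Each leaf of the tree is one segmented word, so the same sentence yields a smaller word count than character count whenever a word spans more than one character.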

Ongoing Projects

Chinese Treebank (DARPA TIDES, GALE, BOLT)
Chinese Proposition Bank (NSF, DARPA GALE, BOLT)
STAGES (NSF)
Temporal inference (NSF)
ARBITER (IARPA)