The Chinese Treebank Project

The Chinese Treebank Project started at the IRCS of the University of Pennsylvania and later moved to the CLEAR Lab at the University of Colorado at Boulder. Two old websites for the project still exist, one at Penn and another at CU, but they are no longer actively maintained and the information there is outdated.

The development of the Chinese Treebank has been supported by DOD, NSF, and the DARPA TIDES, GALE, and BOLT programs. The latest release is CTB 9.0, which covers newswire, magazine articles, broadcast news, broadcast conversations, newsgroups, weblogs, and discussion forums. The corpus is still being expanded, and more genres will be included in future releases.

A semantic layer of annotation has been added to the Chinese Treebank through the Chinese Proposition Bank Project. The latest release of the Chinese Proposition Bank is CPB 3.0, which is also released via the Linguistic Data Consortium.


2014: Joint POS Tagging and Transition-based Constituent Parsing in Chinese with Non-local Features.
Zhiguo Wang and Nianwen Xue
Proceedings of ACL 2014, Baltimore, MD.
2013: A Lattice-based Framework for Joint Chinese Word Segmentation, POS-tagging and Parsing.
Zhiguo Wang, Chengqing Zong and Nianwen Xue
Proceedings of ACL 2013, Sofia, Bulgaria.
2012: Extending and Scaling up the Chinese Treebank Annotation.
Xiuhong Zhang and Nianwen Xue
Proceedings of the 2nd CIPS-SIGHAN Joint Conference on Chinese Language Processing (CLP-2012), Tianjin, China.
2007: Tapping the implicit information for the PS to DS conversion of the Chinese Treebank.
Nianwen Xue
Proceedings of the Sixth International Workshop on Treebanks and Linguistic Theories, Bergen, Norway.

This paper describes the tool that converts the phrase structure representation of the Chinese Treebank into dependency structure. The tool was used to generate the Chinese section of the CoNLL 2009 Shared Task data. The code for the tool can be downloaded here.

2005: The Penn Chinese TreeBank: Phrase Structure Annotation of a Large Corpus.
Nianwen Xue, Fei Xia, Fu-Dong Chiou, and Martha Palmer
Natural Language Engineering, 11(2):207-238.
2002: Building a Large-Scale Annotated Chinese Corpus
Nianwen Xue, Fu-Dong Chiou, and Martha Palmer
Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, Taiwan, 2002.
2001: Facilitating Treebank Annotation with a Statistical Parser
Fu-Dong Chiou, David Chiang, and Martha Palmer
Proceedings of the Human Language Technology Conference (HLT 2001), San Diego, California, 2001.
2000: Developing Guidelines and Ensuring Consistency for Chinese Text Annotation
Fei Xia, Martha Palmer, Nianwen Xue, Mary Ellen Okurowski, John Kovarik, Fu-Dong Chiou, Shizhe Huang, Tony Kroch, and Mitch Marcus
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000), Athens, Greece, 2000.

Workshops and meetings

  • 1st CLP Workshop (6-7/98), Philadelphia, USA
  • meeting during ACL-98, Montreal, Canada (8/98)
  • meeting during ICCIP-98, Beijing, China (11/98)
  • meeting during ACL-99, Maryland, USA (6/99)
  • 2nd CLP Workshop (10/00), Hong Kong, China

