Chinese Treebank Project
Descriptions of the project:
The Chinese Treebank Project started at the IRCS at the University of Pennsylvania. It later moved to the CLEAR Lab at the University of Colorado at Boulder. Two older websites for the project, one at PENN and another at CU, still exist but are no longer actively maintained, and the information there is very outdated.
The development of the Chinese Treebank has been supported by DOD, NSF, and the DARPA TIDES, GALE, and BOLT programs. The latest release of the Chinese Treebank is CTB 9.0, which covers newswire, magazine articles, broadcast news, broadcast conversations, newsgroups and weblogs, and discussion forums. The corpus is currently being expanded, and more genres will be included in future releases.
A semantic layer of annotation has been added to the Chinese Treebank through the Chinese Proposition Bank Project. The latest release of the Chinese Proposition Bank is CPB 3.0, which is also released via the Linguistic Data Consortium.
Annotation guidelines for the Chinese Treebank
- 2014: Joint POS Tagging and Transition-based Constituent Parsing in Chinese with Non-local Features.
- Zhiguo Wang and Nianwen Xue
- Proceedings of ACL 2014, Baltimore, MD.
- 2013: A Lattice-based Framework for Joint Chinese Word Segmentation, POS-tagging and Parsing.
- Zhiguo Wang, Chengqing Zong and Nianwen Xue
- Proceedings of ACL 2013, Sofia, Bulgaria.
- 2012: Extending and Scaling up the Chinese Treebank Annotation.
- Xiuhong Zhang and Nianwen Xue
- Proceedings of the 2nd CIPS-SIGHAN Joint Conference on Chinese Language Processing (CLP-2012), Tianjin, China.
- 2007: Tapping the implicit information for the PS to DS conversion of the Chinese Treebank.
- Nianwen Xue
- Proceedings of the Sixth International Workshop on Treebanks and Linguistic Theories, Bergen, Norway.
This paper describes the tool that converts the phrase structure representation of the Chinese Treebank to dependency structure, which was used to generate the Chinese section of the CoNLL 2009 Shared Task data. The code for the tool is available for download.
- 2005: The Penn Chinese TreeBank: Phrase Structure Annotation of a Large Corpus.
- Nianwen Xue, Fei Xia, Fu-Dong Chiou, and Martha Palmer
- Natural Language Engineering, 11(2):207-238.
- 2002: Building a Large-Scale Annotated Chinese Corpus.
- Nianwen Xue, Fu-Dong Chiou, and Martha Palmer
- Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, Taiwan.
- 2001: Facilitating Treebank Annotation with a Statistical Parser.
- Fu-Dong Chiou, David Chiang, and Martha Palmer
- Proceedings of the Human Language Technology Conference (HLT 2001), San Diego, California.
- 2000: Developing Guidelines and Ensuring Consistency for Chinese Text Annotation.
- Fei Xia, Martha Palmer, Nianwen Xue, Mary Ellen Okurowski, John Kovarik, Fu-Dong Chiou, Shizhe Huang, Tony Kroch, and Mitch Marcus
- Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000), Athens, Greece.
Workshops and meetings
1st CLP Workshop (6-7/98), Philadelphia, USA
meeting during ACL-98, Montreal, Canada (8/98)
meeting during ICCIP-98, Beijing, China (11/98)
meeting during ACL-99, Maryland, USA (6/99)
2nd CLP Workshop (10/00), Hong Kong
Links to other sites
Penn English Treebank Project
Penn Korean Treebank Project
Last modified on December 28, 2012.