CoNLL 2016 Shared Task

Blind test sets

We have released the blind test sets for English and Chinese. The datasets are in the same format as the training and development sets described below.

Training and development datasets

Participants must fill out the registration form and the license agreement form to obtain the full dataset for the task, which requires permission from the Linguistic Data Consortium (LDC). Once you have formally registered as a participant in the shared task and emailed the agreement to LDC (ldc@ldc.upenn.edu), you will receive instructions for downloading the dataset.

GitHub Repo

The shared task has a GitHub repo. It hosts many of the files required for developing a system, such as:

Tutorial on how to work with the data
System output evaluator
System output validator
Other utility and analytic tools
Sample end-to-end discourse parser
Sample data

If you find a bug or have other analytic or utility code to share with the group, you are more than welcome to make a pull request. We will also help edit and document the code if you do not have time for that.

Other resources

The resources that are allowed for the closed track in English are listed below:

Brown Clusters [ README | 100 classes | 320 classes | 1000 classes | 3200 classes ]
MPQA Subjectivity Lexicon [ Detail | Download ]
Skip-gram Neural Word Embeddings [ Detail | Download ]
VerbNet v3.2 [ Detail | Download ]
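
As a rough illustration of how the Brown cluster files might be used as features, the sketch below maps each word to its cluster bit string. The file layout is an assumption (tab-separated lines of bit string, word, and count, as produced by common Brown clustering tools); please check the README for the actual format. The function name and the sample lines are hypothetical.

```python
import io


def load_brown_clusters(lines):
    """Map each word to its cluster bit string.

    Assumed layout (not confirmed by this page): bit_string<TAB>word<TAB>count.
    """
    word_to_cluster = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        bit_string, word = fields[0], fields[1]
        word_to_cluster[word] = bit_string
    return word_to_cluster


# Hypothetical sample lines standing in for a real cluster file.
sample = io.StringIO("0010\tthe\t1000\n0011\ta\t800\n1101\tbecause\t50\n")
clusters = load_brown_clusters(sample)
print(clusters["because"])
```

With a real file, the same function could be called on an open file handle instead of the in-memory sample.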

The resources that are allowed for the closed track in Chinese are still limited. We provide the following:

Brown Clusters [ README | 1000 classes | 3200 classes ]
Skip-gram Neural Word Embeddings [ Detail | README | Binary | Text ]
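
For the text version of the embeddings, a minimal loading sketch is shown below. It assumes the common word2vec text layout (a header line with vocabulary size and dimension, then one word per line followed by its space-separated vector values); the README is the authority on the actual format. The function name and the sample data are hypothetical.

```python
import io


def load_text_embeddings(lines):
    """Read embeddings from an iterable of text lines.

    Assumed layout (not confirmed by this page): the first line is
    "<vocab_size> <dim>", and each later line is "<word> <v1> ... <v_dim>".
    """
    lines = iter(lines)
    header = next(lines).split()
    dim = int(header[1])
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


# Hypothetical sample standing in for a real embeddings file.
sample = io.StringIO("2 3\nbut 0.1 0.2 0.3\nso -0.5 0.0 0.25\n")
dim, vectors = load_text_embeddings(sample)
print(dim, len(vectors))
```

For the full files, passing an open file handle avoids reading everything into memory at once before parsing.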