TimeBank 1.1
Introduction
This is a preliminary release of the TimeBank corpus, a set of 186 news
report documents annotated with version 1.1 of the TimeML standard for
temporal annotation. This release should also include a copy of the TimeML 1.1 schema.
Provenance
Some of the documents in TimeBank are from the DUC1 summarization
evaluation that NIST ran in 2001 (files whose names start with "AP",
"LA", or "SJMN", and also some of the "WSJ" files). The rest of the articles
are from ACE corpora. The files whose names start with "ABC", "CNN",
"PRI", "VOA", "ea", and "ed" are broadcast news. Those that start with
"APW" and "NYT" are newswire. All of these are included in LDC catalog
item LDC2003T11. The other ACE corpus is WSJ, whose documents are included
in LDC catalog item LDC99T42.
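The naming convention above can be turned into a small lookup. The sketch below is a hypothetical helper (`source_of` and the source labels are not part of the release); it assumes file names begin exactly with the case-sensitive prefixes listed above. Note that "APW" (ACE newswire) begins with "AP" (DUC), so longer prefixes must be tried first, and a "WSJ" prefix alone cannot tell DUC files from ACE files.

```python
# Hypothetical labels for the sources described in the Provenance section.
DUC = "DUC 2001"
ACE_NEWSWIRE = "ACE newswire (LDC2003T11)"
ACE_BROADCAST = "ACE broadcast news (LDC2003T11)"
WSJ = "DUC 2001 or ACE WSJ (LDC99T42); not decidable from the name"

PREFIX_SOURCES = {
    "AP": DUC, "LA": DUC, "SJMN": DUC,
    "APW": ACE_NEWSWIRE, "NYT": ACE_NEWSWIRE,
    "ABC": ACE_BROADCAST, "CNN": ACE_BROADCAST, "PRI": ACE_BROADCAST,
    "VOA": ACE_BROADCAST, "ea": ACE_BROADCAST, "ed": ACE_BROADCAST,
    "WSJ": WSJ,
}

# Try longer prefixes first so that "APW..." is not misread as "AP...".
_ORDERED = sorted(PREFIX_SOURCES.items(), key=lambda kv: -len(kv[0]))


def source_of(filename):
    """Return the source label for a TimeBank file name, or None if unknown."""
    for prefix, source in _ORDERED:
        if filename.startswith(prefix):  # case-sensitive, as in the README
            return source
    return None


print(source_of("APW19980213.1310"))  # illustrative file name
```

The file names used here and in any tests are illustrative, not a listing of the actual release.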
Excluded Documents
A number of TimeBank documents have been excluded from this release
because they were annotated by 'naive' annotators and the results were
not judged to be immediately usable.
Areas Under Revision in the Current Release
These documents were annotated during the creation of the TimeML standard
and the Tango TimeML Graphical Organizer tool.
They constitute both a test domain for development and a proof of concept.
As such, they should be considered preliminary. Users should be aware
that efforts are under way to revise these documents.
In particular, the following aspects of the annotation are being reviewed:
incomplete temporal linking
Temporal relations have been manually annotated between selected event pairs or event/time pairs only. A temporal closure algorithm will create links between many more pairs of events/times.
event classes
Event classification is currently being improved based on
multiple annotations by different annotators.
annotation of tense/aspect
Tense/aspect information is incomplete for many events.
incomplete subordinated linking
Some conditional structures and purposive clauses may not have
been annotated.
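To illustrate what temporal closure adds to the manually annotated links, here is a minimal sketch, not the closure algorithm used for TimeBank. It assumes links are given as hypothetical `(source, relType, target)` triples and handles only BEFORE and AFTER, using transitivity (a BEFORE b and b BEFORE c imply a BEFORE c). A complete closure would also have to compose the other TimeML relation types.

```python
def to_before_pairs(tlinks):
    """Normalize (source, relType, target) triples to (x, y) meaning 'x BEFORE y'.

    Only BEFORE and AFTER are handled; all other relTypes are skipped.
    """
    pairs = set()
    for source, rel, target in tlinks:
        if rel == "BEFORE":
            pairs.add((source, target))
        elif rel == "AFTER":
            pairs.add((target, source))  # "x AFTER y" is "y BEFORE x"
    return pairs


def before_closure(pairs):
    """Add every pair implied by transitivity until nothing new appears."""
    closed = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closed):
            for c, d in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed


# Three annotated links (hypothetical event/time IDs).
links = [("e1", "BEFORE", "e2"), ("t1", "AFTER", "e2"), ("e2", "BEFORE", "e3")]
closure = before_closure(to_before_pairs(links))
print(sorted(closure))
```

From three annotated links the closure yields five ordered pairs, adding e1 BEFORE t1 and e1 BEFORE e3. The repeated scan is quadratic per pass, which is fine for a sketch but not tuned for large documents.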
Non-TimeML Markup
The documents contain other kinds of XML markup, including document format
and structure information, named entity recognition, sentence boundary
information, and others. We have made no attempt to review this
information.
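Because the files mix TimeML with unreviewed markup, a consumer may want to look only at the TimeML elements. The sketch below assumes a document parses as well-formed XML and that the TimeML 1.1 element names are EVENT, TIMEX3, SIGNAL, MAKEINSTANCE, TLINK, SLINK, and ALINK. The non-TimeML tags in the sample (`s`, `ENAMEX`) are illustrative only, and the helper name `count_timeml_tags` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed TimeML element names; every other tag is ignored.
TIMEML_TAGS = {"EVENT", "TIMEX3", "SIGNAL", "MAKEINSTANCE", "TLINK", "SLINK", "ALINK"}


def count_timeml_tags(xml_text):
    """Count TimeML elements in a document, skipping all other markup."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter() if el.tag in TIMEML_TAGS)


sample = (
    '<TimeML><s><ENAMEX TYPE="ORGANIZATION">NIST</ENAMEX> '
    '<EVENT eid="e1" class="REPORTING">said</EVENT> on '
    '<TIMEX3 tid="t1" type="DATE">Friday</TIMEX3></s>'
    '<TLINK lid="l1" relType="BEFORE"/></TimeML>'
)
counts = count_timeml_tags(sample)
print(counts)
```

Real TimeBank files may need extra handling (for example, a DOCTYPE or entities) before `ET.fromstring` accepts them.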
If you have any questions about TimeBank, please contact the group through Jessica Littman.