Programming Assignment 1: 

Temporal Ordering in Natural Language Texts 

Fall 2004, cs112
James Pustejovsky


Handed Out: Tuesday, November 16, 2004
Due Date: Tuesday, December 7, 2004

This assignment should be E-MAILED to: cs112@cs.brandeis.edu per the directions below.

Specification

Introduction:

In the last assignment, you were asked to manually annotate a text using TimeML.  Part of that task was to order the temporal entities using TLINKs with types similar to those of the Allen Temporal Calculus.  In this assignment, you will try to automate that process.  The result should be something like the ordering you gave for the birthday story.

We will provide you with two simple texts. The time expressions, events, and signals will already be picked out, as well as the ALINKs and SLINKs. Your job is to order the events using strategies we have discussed in class. In TimeML terms, this amounts to introducing the appropriate TLINKs.

Linguistic cues:

Think about what clues the text offers to figure out how the events are ordered.  Things you should consider are:
    Mary arrived while Eric was taking a shower. He had been sleeping for the whole afternoon.

where:
    arrived              tense="PAST"    aspect="NONE"
    was taking           tense="PAST"    aspect="PROGRESSIVE"
    had been sleeping    tense="PAST"    aspect="PERFECTIVE_PROGRESSIVE"

    "I will think about the plan", Mary said.
    Mary wanted to play soccer and convinced Eric to go to the school field.
    He kicked the ball but it didn't move.
    Mary proposed to play soccer together.
    Eric regretted having accepted for the rest of his life.

Similarly, I_STATES and STATES tend to denote events with a longer temporal extension than occurrences:

    Eric ate slowly. He wasn't feeling well that day.

Can we use this information in our cases at hand?
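
As one possible starting point, tense and aspect values like the ones listed above can feed a small rule table. The sketch below is only an illustration: the function name guess_relation, the dictionary layout for events, and the two rules themselves are assumptions made for this example, not a prescribed solution, and real texts will need many more cues.

```python
def guess_relation(first, second, signal=None):
    """Guess a relType reading 'first <relType> second', or None if no rule fires.

    Each event is a dict with 'tense' and 'aspect' keys (hypothetical layout).
    """
    same_tense = first["tense"] == second["tense"]
    first_perf = first["aspect"].startswith("PERFECTIVE")
    second_perf = second["aspect"].startswith("PERFECTIVE")

    # Rule 1 (assumed): a simple event linked by "while" to a progressive
    # event is taken to fall inside the progressive event.
    if (signal == "while" and first["aspect"] == "NONE"
            and second["aspect"] == "PROGRESSIVE"):
        return "IS_INCLUDED"

    # Rule 2 (assumed): with matching tense, a perfective event is taken to
    # precede a non-perfective one.
    if same_tense and first_perf and not second_perf:
        return "BEFORE"
    if same_tense and second_perf and not first_perf:
        return "AFTER"

    return None


arrived = {"tense": "PAST", "aspect": "NONE"}
was_taking = {"tense": "PAST", "aspect": "PROGRESSIVE"}
had_been_sleeping = {"tense": "PAST", "aspect": "PERFECTIVE_PROGRESSIVE"}

print(guess_relation(arrived, was_taking, signal="while"))  # IS_INCLUDED
print(guess_relation(had_been_sleeping, arrived))           # BEFORE
```

Returning None when no rule applies keeps the guesser honest: a later pass can try other cues (signals, event class, SLINKs) instead of being forced to pick a relation.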

Set of temporal relations:

The temporal relations you have to work with are the ones established as values of the attribute relType in TimeML TLINKS:
BEFORE 
AFTER
INCLUDES
IS_INCLUDED
DURING
SIMULTANEOUS
IAFTER
IBEFORE
IDENTITY
BEGINS
ENDS
BEGUN_BY
ENDED_BY
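
It may help to keep these values in one place so that every generated TLINK can be checked against them. The following is a minimal sketch; the names REL_TYPES, INVERSE, and invert are hypothetical, and the inverse pairings are an assumption read off the relation names rather than something stated in this handout. DURING has no obvious counterpart in the list above, so it is left without an inverse.

```python
# The thirteen relType values listed above.
REL_TYPES = {
    "BEFORE", "AFTER", "INCLUDES", "IS_INCLUDED", "DURING",
    "SIMULTANEOUS", "IAFTER", "IBEFORE", "IDENTITY",
    "BEGINS", "ENDS", "BEGUN_BY", "ENDED_BY",
}

# Assumed inverse pairs: if "A rel B" holds, then "B INVERSE[rel] A" holds.
INVERSE = {
    "BEFORE": "AFTER", "AFTER": "BEFORE",
    "INCLUDES": "IS_INCLUDED", "IS_INCLUDED": "INCLUDES",
    "IBEFORE": "IAFTER", "IAFTER": "IBEFORE",
    "BEGINS": "BEGUN_BY", "BEGUN_BY": "BEGINS",
    "ENDS": "ENDED_BY", "ENDED_BY": "ENDS",
    "SIMULTANEOUS": "SIMULTANEOUS", "IDENTITY": "IDENTITY",
}


def invert(rel_type):
    """Return the assumed inverse of rel_type, or None if none is listed."""
    if rel_type not in REL_TYPES:
        raise ValueError("not a TLINK relType: %r" % (rel_type,))
    return INVERSE.get(rel_type)


print(invert("BEFORE"))  # AFTER
print(invert("DURING"))  # None
```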

Texts to annotate:

These are the two texts upon which you should base your application: text1 and text2.

Technicalities

Programming Language Choice:

Preferably, your code should be developed in Python, but programs written in Perl, Java, or C will also be accepted. Bear in mind, however, that we can't provide as much support for those languages as for Python.

Processing the texts:

Being annotated in TimeML, the texts you will be working on are well-formed XML. An effective way to process them is therefore by means of standard XML parsers. Note, however, that this will facilitate retrieving and processing the annotated entities (events, signals, temporal expressions, SLINKs, and ALINKs), but you may still need to work with other, untagged elements in the text. For the task at hand, tree-based parsers may work better than event-based ones, since they provide a static structure that you can traverse at any point to obtain the desired information. As some of you may already know, the W3C standard for a tree-based API for working with XML documents is DOM (Document Object Model).

If you are planning to work with Python, you can use minidom, a minimal DOM implementation, or pulldom, which streams through the document and builds DOM subtrees only where you ask for them. The import command is:

from xml.dom.minidom import parse

or alternatively:

from xml.dom.pulldom import parse

Documentation on these modules can be found at:

http://www.python.org/doc/current/lib/markup.html

http://pyxml.sourceforge.net/topics/docs.html

Those new to XML parsers may want to start with this tutorial-style introduction:

http://pyxml.sourceforge.net/topics/howto/xml-howto.html
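
To make the tree-based approach concrete, here is a minimal sketch using minidom. The SAMPLE string is a made-up fragment, and its layout (EVENT elements with an eid, plus MAKEINSTANCE elements carrying eiid, eventID, tense, and aspect) is an assumption about how the provided texts look; check the actual files before relying on it. The helper names text_of and read_instances are hypothetical, and the code is written for a current Python 3 interpreter.

```python
from xml.dom.minidom import parseString

# Made-up fragment; the real texts may be laid out differently.
SAMPLE = """<TimeML>
Mary <EVENT eid="e1" class="OCCURRENCE">arrived</EVENT>
<SIGNAL sid="s1">while</SIGNAL> Eric was
<EVENT eid="e2" class="OCCURRENCE">taking</EVENT> a shower.
<MAKEINSTANCE eiid="ei1" eventID="e1" tense="PAST" aspect="NONE"/>
<MAKEINSTANCE eiid="ei2" eventID="e2" tense="PAST" aspect="PROGRESSIVE"/>
</TimeML>"""


def text_of(node):
    """Concatenate the text nodes directly under an element."""
    return "".join(child.data for child in node.childNodes
                   if child.nodeType == child.TEXT_NODE)


def read_instances(xml_text):
    """Return one dict per MAKEINSTANCE, joined with its EVENT's text."""
    doc = parseString(xml_text)
    words = {}
    for event in doc.getElementsByTagName("EVENT"):
        words[event.getAttribute("eid")] = text_of(event)
    instances = []
    for inst in doc.getElementsByTagName("MAKEINSTANCE"):
        instances.append({
            "eiid": inst.getAttribute("eiid"),
            "word": words.get(inst.getAttribute("eventID")),
            "tense": inst.getAttribute("tense"),
            "aspect": inst.getAttribute("aspect"),
        })
    return instances


for instance in read_instances(SAMPLE):
    print(instance)
```

Because the whole document stays in memory as a tree, the same doc object can later be walked again to look at signals or untagged text around each event.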


Make sure that your environment variable PYTHONPATH is set to /usr/lib/python2.2 when working from the machines in the Berry patch. You can test it by typing:

echo $PYTHONPATH

If $PYTHONPATH is not set appropriately, you can add the needed path in one of the following ways, depending on whether you work from a csh or a bash terminal.

For csh:

echo 'setenv PYTHONPATH "/usr/lib/python2.2:\${PYTHONPATH}"' >> ~/.cshrc
source ~/.cshrc

For bash:

echo 'export PYTHONPATH="/usr/lib/python2.2:${PYTHONPATH}"' >> ~/.bashrc
source ~/.bashrc

If you are planning to work with another language, make sure that you provide us with all the information necessary to run your code (including any known problems due to path assignment).

Input and output:

Whatever language you use, your program MUST take a TimeML-annotated text as input (extension .xml), which will contain no TLINKs, and return it with the appropriate TLINKs added (extension .tml.xml). To run it from Python, the command should be identical or equivalent to:

python temporalOrder.py input_file.xml

For programs in other languages, let us know how to call them.
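
The input and output requirements can be sketched as a small program skeleton. This is only an outline under stated assumptions: the helper name output_name is hypothetical, and deriving the output name by swapping the .xml suffix for .tml.xml is one reading of the requirement above. The ordering logic itself is left as a placeholder comment.

```python
import sys
from xml.dom.minidom import parse


def output_name(input_path):
    """Map input_file.xml to input_file.tml.xml (assumed naming scheme)."""
    if input_path.endswith(".xml"):
        return input_path[:-len(".xml")] + ".tml.xml"
    return input_path + ".tml.xml"


def main(argv):
    """Read the annotated text named in argv[1] and write the result."""
    if len(argv) != 2:
        sys.stderr.write("usage: python temporalOrder.py input_file.xml\n")
        return 1
    doc = parse(argv[1])
    # Placeholder: decide on the temporal relations and add TLINK
    # elements to doc here.
    with open(output_name(argv[1]), "w", encoding="utf-8") as out:
        out.write(doc.toxml())
    return 0
```

In temporalOrder.py, the last lines would then call sys.exit(main(sys.argv)) under the usual if __name__ == "__main__": guard, so that the command shown above runs the program.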


Make sure the output file is TimeML compliant; that is, it follows the TimeML spec for TLINKS:

lid ::= ID
{lid ::= LinkID
LinkID ::= l<integer>}
eventInstanceID ::= IDREF
{eventInstanceID ::= EventInstanceID}
timeID ::= IDREF
{timeID ::= TimeID}
signalID ::= IDREF
{signalID ::= SignalID}
relatedToEventInstance ::= IDREF
{relatedToEventInstance ::= EventInstanceID}
relatedToTime ::= IDREF
{relatedToTime ::= TimeID}
relType ::= 'BEFORE' | 'AFTER' | 'INCLUDES' | 'IS_INCLUDED' | 'DURING' |
            'SIMULTANEOUS' | 'IAFTER' | 'IBEFORE' | 'IDENTITY' |
            'BEGINS' | 'ENDS' | 'BEGUN_BY' | 'ENDED_BY'

Also, check that your output file is well-formed XML. Opening it in Explorer or Mozilla should be enough to verify this.
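
As an illustration of the attributes in the spec above, the sketch below builds one event-to-event TLINK with minidom and then re-parses the serialized document as a quick well-formedness check. The function name add_tlink, the TimeML root element, and the choice to append the TLINK at the end of the root are assumptions made for this example; the instance IDs ei1 and ei2 are made up.

```python
from xml.dom.minidom import parseString


def add_tlink(doc, number, event_instance, related_instance, rel_type):
    """Append an event-to-event TLINK to the root element and return it."""
    tlink = doc.createElement("TLINK")
    tlink.setAttribute("lid", "l%d" % number)  # LinkID ::= l<integer>
    tlink.setAttribute("eventInstanceID", event_instance)
    tlink.setAttribute("relatedToEventInstance", related_instance)
    tlink.setAttribute("relType", rel_type)
    doc.documentElement.appendChild(tlink)
    return tlink


doc = parseString(
    "<TimeML>Mary arrived while Eric was taking a shower.</TimeML>")
add_tlink(doc, 1, "ei1", "ei2", "IS_INCLUDED")

serialized = doc.toxml()
# Re-parsing raises xml.parsers.expat.ExpatError if the text is not
# well-formed XML, so reaching the print means the check passed.
parseString(serialized)
print(serialized)
```

Note that well-formedness is a weaker property than TimeML compliance: the re-parse catches broken markup, but not a misspelled relType or a dangling instance ID.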

Before submitting:

Document your code as much as necessary (function of each module, strategies applied to accomplish the task, etc.), especially if programming in a language other than Python. This way, you can be evaluated for partial credit even if your code doesn't run. We are more concerned with the proper performance of the application, but we are also grading your programming skills.

We will test your code on the Berry patch machines (check here for public workstations there). Whatever programming language, version, or environment you use to develop your assignment, make sure your application DOES run on any of the machines there before submitting it. If a submitted program does not run on the Berry patch machines, NO later, improved version of it will be accepted.

Grading

The success of your program will be judged based in part on how well it does on the two texts we provide you with, but we will also test it with another text of our choosing.