Programming Assignment 4: First Steps to Question Answering
CS 114b, Spring 2008
Due: Sunday, 4 May 23:59:59
In this exercise you will write a simple question-answering system
that answers questions about named entities in a corpus. The corpus
is the TreeBank, and you will use PropBank predicate-argument labels
to distinguish the role each named entity plays in a relation.
You need to implement three separate tasks:
- CREATE-INDEX: Create an index of predicates and argument heads (to be
used as entities) from the TreeBank.
- ANALYZE-QUESTION: Write a question analyzer built on the NLTK
chunker. It parses each question posed to the system and converts it
into PropBank-friendly features and lexical stems that can be looked
up in the index you created in CREATE-INDEX.
- MATCH-QUERY-TO-DOC: Write a query-document matcher that matches the
features produced by ANALYZE-QUESTION against the corpus. The program
returns the offsets that satisfy the constraints the question analyzer
extracted from the query.
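
One possible shape for the index and the matcher is sketched below.
This is an illustration, not a required design. It assumes each
PropBank instance has already been reduced to a predicate lemma, a
mapping from argument labels to head words, and an offset written as
(file id, sentence number, predicate token number). The function
names, the dictionary layout, and the toy records are all invented
for this sketch; the records are not TreeBank data.

```python
from collections import defaultdict

# Invented placeholder records, NOT taken from the TreeBank.
TOY_INSTANCES = [
    {"offset": ("toy_a", 0, 2), "pred": "buy",
     "args": {"ARG0": "company", "ARG1": "shares"}},
    {"offset": ("toy_a", 4, 1), "pred": "buy",
     "args": {"ARG0": "investor", "ARG1": "bonds"}},
    {"offset": ("toy_b", 1, 3), "pred": "sell",
     "args": {"ARG0": "company", "ARG1": "unit"}},
]


def build_index(instances):
    """Map (predicate, role, head) to a set of offsets, and offset to instance."""
    by_key = defaultdict(set)
    by_offset = {}
    for inst in instances:
        by_offset[inst["offset"]] = inst
        for role, head in inst["args"].items():
            by_key[(inst["pred"], role, head)].add(inst["offset"])
    return by_key, by_offset


def match_query(by_key, by_offset, query):
    """Return sorted (offset, answer head) pairs that satisfy every constraint."""
    candidate_sets = [
        by_key.get((query["pred"], role, head), set())
        for role, head in query["known"].items()
    ]
    if not candidate_sets:
        return []
    # An instance must satisfy all known (role, head) constraints at once.
    offsets = set.intersection(*candidate_sets)
    return sorted(
        (off, by_offset[off]["args"][query["wanted"]])
        for off in offsets
        if query["wanted"] in by_offset[off]["args"]
    )


by_key, by_offset = build_index(TOY_INSTANCES)
query = {"pred": "buy", "known": {"ARG0": "company"}, "wanted": "ARG1"}
answers = match_query(by_key, by_offset, query)
```

Keying the index on the full (predicate, role, head) triple makes each
constraint a single dictionary lookup, and intersecting the resulting
offset sets enforces all constraints together.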
Your system must be able to parse the following questions and match
them against the corpus and index:
- What hurt USX results?
- What are exporters scrounging for?
- What did Nixon harp on?
- Who are fundamentalist investors wooing?
- What did Bhutto defeat?
- What wouldn't Reliance do?
- What couldn't people identify precisely?
- What have some U.S. firms reaped?
- What may imports of sweaters be doing to a domestic industry?
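
To make the target of question analysis concrete, here is a minimal
sketch of turning a question into predicate and argument constraints.
It is not the NLTK chunker the assignment asks for: it is a
hand-written stand-in that groups runs of noun-phrase tags, and it
takes tokens that were POS-tagged by hand. The function names and the
output dictionary are assumptions of this sketch. Treating the
subject as ARG0 and the object as ARG1 is also a simplification,
since PropBank role numbering is defined per verb, and stemming is
left out entirely.

```python
NOUN_TAGS = {"DT", "JJ", "NN", "NNS", "NNP", "NNPS"}


def np_chunks(tagged):
    """Group maximal runs of noun-phrase tags into (start, end) index pairs."""
    chunks, start = [], None
    for i, (_, tag) in enumerate(tagged):
        if tag in NOUN_TAGS:
            if start is None:
                start = i
        elif start is not None:
            chunks.append((start, i))
            start = None
    if start is not None:
        chunks.append((start, len(tagged)))
    return chunks


def analyze_question(tagged):
    """Turn a hand-tagged question into predicate and argument constraints."""
    # Take the last verb-tagged token as the main predicate (skips auxiliaries).
    verb_idx = max(i for i, (_, tag) in enumerate(tagged) if tag.startswith("VB"))
    pred = tagged[verb_idx][0].lower()
    chunks = np_chunks(tagged)
    before = [c for c in chunks if c[1] <= verb_idx]
    after = [c for c in chunks if c[0] > verb_idx]

    def head(chunk):
        # Approximate the head of a noun chunk by its last token.
        return tagged[chunk[1] - 1][0].lower()

    if before:
        # A noun phrase precedes the verb: it is the subject, and the
        # wh-word asks for the object.
        return {"pred": pred, "known": {"ARG0": head(before[0])}, "wanted": "ARG1"}
    # No noun phrase before the verb: the wh-word itself is the subject.
    known = {"ARG1": head(after[0])} if after else {}
    return {"pred": pred, "known": known, "wanted": "ARG0"}


q_object = analyze_question(
    [("What", "WP"), ("did", "VBD"), ("Nixon", "NNP"),
     ("harp", "VB"), ("on", "IN"), ("?", ".")]
)
q_subject = analyze_question(
    [("What", "WP"), ("hurt", "VBD"), ("USX", "NNP"),
     ("results", "NNS"), ("?", ".")]
)
```

With these hand-assigned tags, the first call yields the predicate
"harp" with ARG0 "nixon" and asks for ARG1, while the second yields
"hurt" with ARG1 "results" and asks for ARG0. Questions with
negation, modals, or trailing prepositional phrases need more than
this heuristic provides.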