Programming Assignment 4: First Steps to Question Answering

CS 114b, Spring 2008
Due: Sunday, 4 May 23:59:59

In this assignment you will write a simple question answering system that answers questions about named entities in a corpus. The corpus is the TreeBank, and you will use PropBank predicate-argument labels to distinguish the role each named entity plays in a relation.

The tasks you need to implement are:

  1. CREATE-INDEX: Create an index of predicates and argument heads (to be used as entities) from the TreeBank.
  2. ANALYZE-QUESTION: Write a question analyzer based on NLTK's chunking tools. It parses the questions posed to the system and converts them to PropBank-friendly features and lexical stems that can be looked up in the index you created in CREATE-INDEX.
  3. MATCH-QUERY-TO-DOC: Write a query-document matcher that matches the features produced by ANALYZE-QUESTION against the indexed corpus. The program returns the corpus offsets that satisfy the constraints established by the question analysis.
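
To make the data flow between CREATE-INDEX and MATCH-QUERY-TO-DOC concrete, here is a minimal sketch in plain Python. It is only one possible design, not the required one. The records in TOY_INSTANCES are invented stand-ins for PropBank instances, and the names build_index and match_query, the (document id, sentence number) form of an offset, and the choice to key the index by predicate are all assumptions made for illustration.

```python
from collections import defaultdict

# Invented records standing in for PropBank instances (not real corpus data).
# Each record: (document id, sentence number, predicate stem, {role: argument head}).
TOY_INSTANCES = [
    ("toy_01", 3, "hurt", {"ARG0": "strike", "ARG1": "results"}),
    ("toy_02", 7, "defeat", {"ARG0": "bhutto", "ARG1": "motion"}),
    ("toy_03", 1, "hurt", {"ARG0": "prices", "ARG1": "sales"}),
]


def build_index(instances):
    """Map each predicate stem to the offsets and role fillers where it occurs."""
    index = defaultdict(list)
    for doc_id, sent_num, predicate, roles in instances:
        index[predicate].append(((doc_id, sent_num), roles))
    return index


def match_query(index, predicate, known, wanted):
    """Return (offset, head) pairs whose roles satisfy the query constraints.

    known  -- role fillers the question supplies, e.g. {"ARG1": "results"}
    wanted -- the role the question asks about, e.g. "ARG0"
    """
    hits = []
    for offset, roles in index.get(predicate, []):
        constraints_hold = all(roles.get(r) == h for r, h in known.items())
        if constraints_hold and wanted in roles:
            hits.append((offset, roles[wanted]))
    return hits


index = build_index(TOY_INSTANCES)
answers = match_query(index, "hurt", {"ARG1": "results"}, "ARG0")
```

With the toy records above, answers holds the single offset ("toy_01", 3) together with the head filling the requested role. A real index would be built from the TreeBank and PropBank annotations rather than from a hand-written list.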

Here are the questions your system must be able to parse and match against the corpus and index:

  1. What hurt USX results?
  2. What are exporters scrounging for?
  3. What did Nixon harp on?
  4. Who are fundamentalist investors wooing?
  5. What did Bhutto defeat?
  6. What wouldn't Reliance do?
  7. What couldn't people identify precisely?
  8. What have some U.S. firms reaped?
  9. What may imports of sweaters be doing to a domestic industry?
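
As an illustration of the kind of features ANALYZE-QUESTION might produce for these questions, the sketch below works on questions that have already been tokenized and POS-tagged by hand. It is a simplified stand-in written in plain Python, not the NLTK chunker the assignment asks for. The function names, the AUXILIARIES list, the head-finding rule, and the mapping of subject to ARG0 and non-subject to ARG1 are all assumptions; real PropBank role labels depend on the predicate's frame, and stemming of the predicate is left as a separate step.

```python
# Assumed list of auxiliary and modal forms; extend as needed.
AUXILIARIES = {"do", "does", "did", "is", "are", "was", "were", "be",
               "have", "has", "had", "may", "would", "could"}


def np_head(chunk):
    """Crude head finder: the last noun before any preposition, lower-cased."""
    head = None
    for word, tag in chunk:
        if tag == "IN":
            break
        if tag.startswith("NN"):
            head = word.lower()
    return head


def analyze_question(tagged):
    """Turn a hand-tagged wh-question into predicate and role constraints."""
    # Drop the leading wh-word and the final question mark.
    body = [(w, t) for w, t in tagged[1:] if t != "."]
    if body[0][0].lower() in AUXILIARIES:
        # Pattern: wh-word, auxiliary, subject chunk, verb group.
        i = 1
        while i < len(body) and not body[i][1].startswith("VB"):
            i += 1
        if i == len(body):
            raise ValueError("no main verb found")
        subject = body[1:i]
        # Skip auxiliaries inside the verb group ("be doing" -> "doing").
        while (i < len(body) - 1 and body[i][0].lower() in AUXILIARIES
               and body[i + 1][1].startswith("VB")):
            i += 1
        return {"predicate": body[i][0].lower(),
                "known": {"ARG0": np_head(subject)},
                "wanted": "ARG1"}
    # Pattern: wh-word is the subject, followed by verb and object chunk.
    return {"predicate": body[0][0].lower(),
            "known": {"ARG1": np_head(body[1:])},
            "wanted": "ARG0"}


q1 = [("What", "WP"), ("hurt", "VBD"), ("USX", "NNP"), ("results", "NNS"), ("?", ".")]
q3 = [("What", "WP"), ("did", "VBD"), ("Nixon", "NNP"), ("harp", "VB"), ("on", "IN"), ("?", ".")]
features_q1 = analyze_question(q1)
features_q3 = analyze_question(q3)
```

For question 1 the wh-word is the subject, so the sketch asks for ARG0 given the object head "results". For question 3 the subject "Nixon" is known and the sketch asks for ARG1. Contracted forms such as "wouldn't" and "couldn't" would need extra tokenization handling that is not shown here.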