Assignment 5:
Shallow Parsing: Category Groups and Named Entities


Due: March 31, 1997

Now that you have experience with part-of-speech tagging, you can move on to the more interesting and significantly harder task of finding phrases in text.

Part 1: Shallow Noun Phrase Parsing

Your first task is to identify the Noun Phrases in a text. For example, consider the tagged sentence below:

The/DT green/JJ dog/NN ate/VBD a/DT large/JJ cookie/NN on/IN
the/DT table/NN ./.

From this, you should identify the sequences that are legitimate Noun Phrases (NPs), as specified by a set of phrase structure rules (given partially below). The grouped output for the sentence above would be:

<NX>The/DT green/JJ dog/NN</NX> ate/VBD <NX>a/DT large/JJ
cookie/NN</NX> on/IN <NX>the/DT table/NN</NX> ./.

Your program will take tagged text as input. This text is the output of running your tagger on the text provided for the previous assignment, found in the input file. Use the same lexicon as in Assignment 4.
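
As a starting point, the word/TAG input format can be read into (word, tag) pairs before any grouping is attempted. The sketch below is only one possible way to do this; the function name read_tagged is an illustrative choice, not part of the assignment.

```python
def read_tagged(text):
    """Split whitespace-separated word/TAG tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST slash, so the token "./." yields
        # the word "." with the tag ".", and a word containing a slash
        # keeps its internal slashes.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


if __name__ == "__main__":
    print(read_tagged("on/IN the/DT table/NN ./."))
```

Splitting on the last slash rather than the first matters for punctuation tokens and for words that themselves contain a slash.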

Noun Phrase Grammar Rules

The following patterns will catch about 80% of the NPs in the text, but many phrases are not covered by this set. Furthermore, these rules do not allow recursion, so they cannot capture the real phrasing of the NPs as needed for full interpretation. Refer to the tagset for the meaning of each tag.
NX --> DT NN			NX --> CD
NX --> NN			NX --> NNP CD
NX --> PRP			NX --> NN NN
NX --> NNP			NX --> JJ NN NNS
NX --> NNS			NX --> DT NNP NNP
NX --> DT JJ NN			NX --> DT NNP
NX --> DT NNS			NX --> CD NNS
NX --> ???			NX --> WP$ NNS
NX --> JJ NNS			NX --> PRP$ NN NN 
NX --> NNP NNP			NX --> NNP NNPS
NX --> PRP$ NN			NX --> NNP NNP NNP
NX --> NN NNS			NX --> NNP NN
NX --> JJ NN 			NX --> NN NNS NN 
NX --> WDT			NX --> EX
NX --> PRP$ NNS			NX --> DT NNP NNP NNP 
NX --> WP			NX --> DT NNP NNP NN NN 
NX --> DT NN NN 		NX --> DT NNP NN 
NX --> DT CD NNS 		NX --> DT VBG NN 
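
One simple way to apply flat rules like these is a greedy, longest-match scan over the tag sequence. The sketch below assumes that strategy (the assignment does not prescribe one) and uses only a subset of the rules from the table; the names NX_RULES and chunk are illustrative.

```python
# A subset of the NX rules from the table, stored as tuples of tags.
NX_RULES = {
    ("DT", "NN"), ("NN",), ("PRP",), ("NNP",), ("NNS",),
    ("DT", "JJ", "NN"), ("DT", "NNS"), ("JJ", "NNS"), ("NNP", "NNP"),
    ("PRP$", "NN"), ("JJ", "NN"), ("DT", "NN", "NN"), ("CD",),
    ("NN", "NN"), ("DT", "NNP"), ("CD", "NNS"),
}


def chunk(pairs, rules, label):
    """Bracket the longest rule match at each position, scanning left to right."""
    max_len = max(len(rule) for rule in rules)
    out = []
    i = 0
    while i < len(pairs):
        # Try the longest possible span first, then shorter ones.
        for n in range(min(max_len, len(pairs) - i), 0, -1):
            tags = tuple(tag for _, tag in pairs[i:i + n])
            if tags in rules:
                body = " ".join(f"{w}/{t}" for w, t in pairs[i:i + n])
                out.append(f"<{label}>{body}</{label}>")
                i += n
                break
        else:
            # No rule matched here: emit the token unchanged.
            word, tag = pairs[i]
            out.append(f"{word}/{tag}")
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    text = "a/DT large/JJ cookie/NN on/IN the/DT table/NN ./."
    pairs = [tuple(tok.rsplit("/", 1)) for tok in text.split()]
    print(chunk(pairs, NX_RULES, "NX"))
```

Preferring the longest match keeps "a large cookie" together as one group instead of stopping at a shorter rule such as NX --> NN. Other strategies, such as compiling the rules into a single regular expression over tags, would also work.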

Part 2: Shallow Verb Cluster Parsing

Your next task is to identify the verb clusters in a text. For example, consider the tagged sentence below:

The/DT green/JJ dog/NN was/VBD eating/VBG a/DT large/JJ cookie/NN
while/IN John/NNP slept/VBD ./.

You should identify the legitimate Verb Clusters (VCs), as specified by a set of phrase structure rules (given partially below). An example of the grouped output on the above sentence would be as follows, using the NP groups as well:

<NX>The/DT green/JJ dog/NN</NX> <VX>was/VBD eating/VBG</VX> 
<NX>a/DT large/JJ cookie/NN</NX> 
while/IN <NX>John/NNP</NX> <VX>slept/VBD</VX> ./.

Verb Cluster Grammar Rules

The following patterns will catch about 90% of the VCs in the text. Again, refer to the tagset to identify the tags.
VX --> VBD			VX --> VBZ RB VBP
VX --> VBZ			VX --> VBZ RB VB
VX --> VBP			VX --> VBP VBN JJ
VX --> MD VB			VX --> VBP RB VBN JJ
VX --> VBP VBN			VX --> VBD RB VBN
VX --> VBZ VBN			VX --> VBD RB VB
VX --> VBN			VX --> VBD VBD
VX --> VB
VX --> MD VB VBN		VX --> MD RB VB
VX --> VBD VBN			VX --> MD VB VBG
VX --> VBP RB VB		
VX --> VBP VBN VBG		
VX --> VBP VBG			
VX --> VBZ VBG			
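
Noun phrase and verb cluster rules can be applied in a single left-to-right pass by keeping one rule set per label. The sketch below assumes a greedy longest-match strategy and a small illustrative grammar. Note that the rule VBD VBG is not in the table above; it is added here as an assumption because the example output groups was/VBD eating/VBG, and the table is described as partial. The names GRAMMAR and chunk_all are hypothetical.

```python
# Illustrative grammar: a few rules per label, stored as tuples of tags.
GRAMMAR = {
    "NX": {("DT", "JJ", "NN"), ("DT", "NN"), ("NNP",), ("NN",)},
    # ("VBD", "VBG") is an assumed extra rule, not listed in the table.
    "VX": {("VBD",), ("VBD", "VBG"), ("MD", "VB"), ("VBZ", "VBG")},
}


def chunk_all(pairs, grammar):
    """Bracket the longest match from any label's rule set at each position."""
    max_len = max(len(r) for rules in grammar.values() for r in rules)
    out = []
    i = 0
    while i < len(pairs):
        match = None
        for n in range(min(max_len, len(pairs) - i), 0, -1):
            tags = tuple(tag for _, tag in pairs[i:i + n])
            label = next((lab for lab, rules in grammar.items() if tags in rules), None)
            if label is not None:
                match = (label, n)
                break
        if match:
            label, n = match
            body = " ".join(f"{w}/{t}" for w, t in pairs[i:i + n])
            out.append(f"<{label}>{body}</{label}>")
            i += n
        else:
            # No rule from any label matched: emit the token unchanged.
            out.append("/".join(pairs[i]))
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    text = ("The/DT green/JJ dog/NN was/VBD eating/VBG a/DT large/JJ cookie/NN "
            "while/IN John/NNP slept/VBD ./.")
    pairs = [tuple(tok.rsplit("/", 1)) for tok in text.split()]
    print(chunk_all(pairs, GRAMMAR))
```

With this grammar, the example sentence from this part comes out with the same NX and VX brackets as the grouped output shown above, printed on a single line.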

Part 3: Named Entity Parsing

Given a Yahoo News article, extract all names and label them with one of the following semantic tags: PERSON, INSTITUTION, PLACE, TIME.

For instance:

President Clinton has chosen U.S. Army Gen. Wesley Clark to become commander of all allied NATO forces and American troops in Europe, a senior Pentagon official said Monday.
[President Clinton]/PERSON
[U.S. Army Gen. Wesley Clark]/PERSON
[NATO]/INSTITUTION
[Europe]/PLACE
[Pentagon]/INSTITUTION
[Monday]/TIME

The news article you should work with is available in an untagged version and a tagged version. Use the tagged version unless you want to show off your tokenizer and tagger.
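
One possible baseline for this part is to collect maximal runs of proper-noun tags (NNP, NNPS) from the tagged text and label each run with small cue-word lists. The sketch below is a rough illustration under several assumptions: the word lists are tiny, hypothetical stand-ins for real gazetteers, the tags on the example sentence were assigned by hand for this illustration rather than taken from the course's tagged file, and the names classify and extract_entities are illustrative.

```python
# Tiny hypothetical cue lists; a real solution would need far larger ones.
TITLES = {"President", "Gen.", "Mr.", "Mrs.", "Dr."}
INSTITUTIONS = {"NATO", "Pentagon"}
PLACES = {"Europe"}
TIMES = {"Monday", "Tuesday", "Wednesday", "Thursday",
         "Friday", "Saturday", "Sunday"}


def classify(words):
    """Return a semantic tag for a run of proper nouns, or None if unknown."""
    # Titles are checked first so that "U.S. Army Gen. Wesley Clark"
    # is labeled as a person rather than by any other word it contains.
    if any(w in TITLES for w in words):
        return "PERSON"
    if any(w in INSTITUTIONS for w in words):
        return "INSTITUTION"
    if any(w in PLACES for w in words):
        return "PLACE"
    if any(w in TIMES for w in words):
        return "TIME"
    return None


def extract_entities(tagged_text):
    """Group consecutive NNP/NNPS tokens and label each group."""
    pairs = [tuple(tok.rsplit("/", 1)) for tok in tagged_text.split()]
    entities, run = [], []
    # The sentinel pair at the end flushes a run that closes the text.
    for word, tag in pairs + [("", "END")]:
        if tag in ("NNP", "NNPS"):
            run.append(word)
            continue
        if run:
            label = classify(run)
            if label:
                entities.append((" ".join(run), label))
            run = []
    return entities


if __name__ == "__main__":
    # Hand-tagged for illustration only.
    sentence = ("President/NNP Clinton/NNP has/VBZ chosen/VBN U.S./NNP Army/NNP "
                "Gen./NNP Wesley/NNP Clark/NNP to/TO become/VB commander/NN of/IN "
                "all/DT allied/JJ NATO/NNP forces/NNS and/CC American/JJ troops/NNS "
                "in/IN Europe/NNP ,/, a/DT senior/JJ Pentagon/NNP official/NN "
                "said/VBD Monday/NNP ./.")
    for name, label in extract_entities(sentence):
        print(f"[{name}]/{label}")
```

Runs that match no cue list are dropped rather than guessed, which favors precision over recall. Names without a title (for example, a bare surname later in the article) would be missed by this baseline and need additional cues.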