Now that you have some experience with part-of-speech tagging, you can move on to the more interesting and significantly harder task of finding phrases in text.
Your first task is to identify the Noun Phrases in a text. For example, consider the tagged sentence below:
The/DT green/JJ dog/NN ate/VB a/DT large/JJ cookie/NN on/IN the/DT table/NN ./.
From this you should be able to identify those sequences that are legitimate Noun Phrases (NPs), as specified by a set of phrase structure rules (given partially below). An example of the grouped output on the above sentence would be as follows:
<NX>The/DT green/JJ dog/NN</NX> ate/VB <NX>a/DT large/JJ cookie/NN</NX> on/IN <NX>the/DT table/NN</NX> ./.
Your program will take tagged text as input. This text is the output of running your tagger on the text provided for the previous assignment, found in the input file. As in Assignment 4, use the same lexicon as before.
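Reading the tagged input can be sketched as follows. This is a minimal example that assumes tokens are separated by whitespace and written as word/TAG, as in the sample sentence; the function name parse_tagged is made up for illustration.

```python
def parse_tagged(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST slash, so a token such as './.'
        # or a word that itself contains '/' is still handled.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


sentence = ("The/DT green/JJ dog/NN ate/VB a/DT large/JJ cookie/NN "
            "on/IN the/DT table/NN ./.")
tokens = parse_tagged(sentence)
print(tokens[:3])
```

Keeping words and tags as pairs makes it easy to match rules against the tag sequence while still printing the original words in the grouped output.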
NX --> DT NN
NX --> CD
NX --> NN
NX --> NNP CD
NX --> PRP
NX --> NN NN
NX --> NNP
NX --> JJ NN NNS
NX --> NNS
NX --> DT NNP NNP
NX --> DT JJ NN
NX --> DT NNP
NX --> DT NNS
NX --> CD NNS
NX --> ???
NX --> WP$ NNS
NX --> JJ NNS
NX --> PRP$ NN NN
NX --> NNP NNP
NX --> NNP NNPS
NX --> PRP$ NN
NX --> NNP NNP NNP
NX --> NN NNS
NX --> NNP NN
NX --> JJ NN
NX --> NN NNS NN
NX --> WDT
NX --> EX
NX --> PRP$ NNS
NX --> DT NNP NNP NNP
NX --> WP
NX --> DT NNP NNP NN NN
NX --> DT NN NN
NX --> DT NNP NN
NX --> DT CD NNS
NX --> DT VBG NN
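One possible way to apply the NX rules above is a greedy left-to-right scan that tries the longest rule first at each position. The sketch below is only an illustration under that assumption: the assignment does not prescribe a matching strategy, only a handful of the rules are included, and the names NX_RULES and chunk are invented for this example.

```python
# A small subset of the NX rules, each written as a tuple of tags.
NX_RULES = {
    ("DT", "NN"), ("DT", "JJ", "NN"), ("NN",), ("NNP",),
    ("JJ", "NN"), ("NNS",),
}


def chunk(tokens, rules, label):
    """Greedy chunking: at each position try the longest rule first."""
    max_len = max(len(rule) for rule in rules)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            tags = tuple(tag for _, tag in tokens[i:i + n])
            if tags in rules:
                words = " ".join(f"{w}/{t}" for w, t in tokens[i:i + n])
                out.append(f"<{label}>{words}</{label}>")
                i += n
                break
        else:
            # No rule starts here: copy the token through unchanged.
            word, tag = tokens[i]
            out.append(f"{word}/{tag}")
            i += 1
    return " ".join(out)


tokens = [("The", "DT"), ("green", "JJ"), ("dog", "NN"), ("ate", "VB"),
          ("a", "DT"), ("large", "JJ"), ("cookie", "NN"), ("on", "IN"),
          ("the", "DT"), ("table", "NN"), (".", ".")]
result = chunk(tokens, NX_RULES, "NX")
print(result)
```

Trying the longest rule first matters because many rules are prefixes of others: without it, "the/DT table/NN" could be split by the shorter NX --> NN rule instead of being grouped by NX --> DT NN.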
Your next task is to identify the verb clusters in a text. For example, consider the tagged sentence below:
The/DT green/JJ dog/NN was/VBD eating/VBG a/DT large/JJ cookie/NN while/IN John/NNP slept/VBD ./.
You should identify the legitimate Verb Clusters (VCs), as specified by a set of phrase structure rules (given partially below). An example of the grouped output on the above sentence would be as follows, using the NP groups as well:
<NX>The/DT green/JJ dog/NN</NX> <VX>was/VBD eating/VBG</VX> <NX>a/DT large/JJ cookie/NN</NX> while/IN <NX>John/NNP</NX> <VX>slept/VBD</VX> ./.
VX --> VBD
VX --> VBZ RB VBP
VX --> VBZ
VX --> VBZ RB VB
VX --> VBP
VX --> VBP VBN JJ
VX --> MD VB
VX --> VBP RB VBN JJ
VX --> VBP VBN
VX --> VBD RB VBN
VX --> VBZ VBN
VX --> VBD RB VB
VX --> VBN
VX --> VBD VBD
VX --> VB
VX --> VBD RB VBN
VX --> MD VB VBN
VX --> MD RB VP
VX --> VBD VBN
VX --> MD VB VBG
VX --> VBP RB VB
VX --> VBP VBN VBG
VX --> VBP VBG
VX --> VBZ VBG
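Since the NX and VX rules share the same "LABEL --> TAGS" notation, both sets can be loaded from text and applied in a single pass. The following is a rough sketch, not a required design: load_rules and chunk are hypothetical names, only a few rules are included, and the rule VX --> VBD VBG is not in the partial list above; it is added here as an assumption so that "was eating" groups as in the example output.

```python
# A few rules in the handout's notation. "VX --> VBD VBG" is an assumed
# addition (the given list is partial and does not contain it).
RULE_TEXT = """
NX --> DT JJ NN
NX --> DT NN
NX --> NNP
VX --> VBD
VX --> MD VB
VX --> VBD VBG
"""


def load_rules(text):
    """Map each tag sequence (as a tuple) to its phrase label."""
    rules = {}
    for line in text.strip().splitlines():
        label, _, rhs = line.partition("-->")
        rules[tuple(rhs.split())] = label.strip()
    return rules


def chunk(tokens, rules):
    """Greedy longest-match grouping over all phrase types at once."""
    longest = max(len(seq) for seq in rules)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            span = tokens[i:i + n]
            label = rules.get(tuple(tag for _, tag in span))
            if label:
                body = " ".join(f"{w}/{t}" for w, t in span)
                out.append(f"<{label}>{body}</{label}>")
                break
        else:
            n = 1  # no rule matched: emit the single token unchanged
            out.append("/".join(tokens[i]))
        i += n
    return " ".join(out)


text = ("The/DT green/JJ dog/NN was/VBD eating/VBG a/DT large/JJ cookie/NN "
        "while/IN John/NNP slept/VBD ./.")
tokens = [tuple(tok.rpartition("/")[::2]) for tok in text.split()]
result = chunk(tokens, load_rules(RULE_TEXT))
print(result)
```

Storing the rules as a dictionary from tag sequence to label means adding a new phrase type only requires new rule lines, not new code.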
Given a Yahoo News article, extract all names and label them with one of the following semantic tags: PERSON, INSTITUTION, PLACE, TIME.
For instance:
President Clinton has chosen U.S. Army Gen. Wesley Clark to become commander of all allied NATO forces and American troops in Europe, a senior Pentagon official said Monday.
The names in this sentence receive, in order of appearance, the tags: PERSON, PERSON, INSTITUTION, PLACE, INSTITUTION, TIME.
The news article you should work with is available in an untagged version and a tagged version. Use the tagged version unless you want to show off your tokenizer and tagger.
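One simple starting point for the tagged version is to treat every maximal run of proper-noun tokens as a candidate name and then label it from small lookup tables. The sketch below makes several assumptions that are not part of the assignment: the tags on the example sentence were assigned by hand, TITLES and GAZETTEER are tiny made-up tables, and the helper names are invented. A real solution would need far larger resources and better rules.

```python
from itertools import groupby

PROPER = {"NNP", "NNPS"}
# Hypothetical hand-made lookup tables, for illustration only.
TITLES = {"President", "Gen."}
GAZETTEER = {"NATO": "INSTITUTION", "Pentagon": "INSTITUTION",
             "Europe": "PLACE", "Monday": "TIME"}


def candidate_names(tokens):
    """Join each maximal run of proper-noun tokens into one candidate name."""
    names = []
    for is_proper, run in groupby(tokens, key=lambda wt: wt[1] in PROPER):
        if is_proper:
            names.append(" ".join(word for word, _ in run))
    return names


def label(name):
    """Label a candidate: a title word means PERSON, else consult the table."""
    words = name.split()
    if any(w in TITLES for w in words):
        return "PERSON"
    for w in words:
        if w in GAZETTEER:
            return GAZETTEER[w]
    return "UNKNOWN"


# The example sentence with hand-assigned tags (an assumption).
text = ("President/NNP Clinton/NNP has/VBZ chosen/VBN U.S./NNP Army/NNP "
        "Gen./NNP Wesley/NNP Clark/NNP to/TO become/VB commander/NN of/IN "
        "all/DT allied/JJ NATO/NNP forces/NNS and/CC American/JJ troops/NNS "
        "in/IN Europe/NNP ,/, a/DT senior/JJ Pentagon/NNP official/NN "
        "said/VBD Monday/NNP ./.")
tokens = [tuple(tok.rpartition("/")[::2]) for tok in text.split()]
labelled = [(name, label(name)) for name in candidate_names(tokens)]
for name, tag in labelled:
    print(f"{tag}: {name}")
```

Grouping consecutive proper nouns keeps multi-word names such as "President Clinton" together, but it also merges adjacent names, so "U.S. Army Gen. Wesley Clark" comes out as a single candidate here.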