What is Shallow Discourse Parsing?

A typical text consists of sentences that are glued together in a systematic way to form a coherent discourse. Shallow discourse parsing is the task of parsing a piece of text into a set of discourse relations, each holding between two adjacent or non-adjacent discourse units. We call this task shallow discourse parsing because the relations in a text are not linked to one another to form a connected structure such as a tree or graph.

Discourse Analysis in the Penn Discourse Treebank

There are alternative conceptions of discourse structure, and discourse-annotated corpora come in different flavors. For this year's CoNLL Shared Task, we chose the Penn Discourse Treebank (PDTB) as the shared task data set because it is the largest corpus of its kind. The PDTB annotates a text with a set of discourse relations. A discourse relation is composed of:

  • a discourse connective, which can be a coordinating conjunction (e.g., "and", "but"), a subordinating conjunction (e.g., "if", "because"), or a discourse adverbial (e.g., "however", "also"). In an implicit discourse relation, the discourse connective is omitted.
  • two Arguments of the discourse connective, Arg1 and Arg2, which are typically text spans the size of clauses or sentences.
  • the sense of the discourse connective, which characterizes the nature of the relationship between the two arguments of the connective (e.g., contrast, instantiation, temporal precedence).
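
The three components above can be sketched as a small record type. This is only an illustration in Python, not the PDTB's own file format: the class name DiscourseRelation, its field names, the "explicit" flag, and the sample sentence are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DiscourseRelation:
    """One discourse relation (hypothetical layout, not the PDTB file format)."""
    arg1: str        # text span of the first argument
    arg2: str        # text span of the second argument
    connective: str  # the connective in the text, or the omitted one if implicit
    sense: str       # e.g. "Comparison.Concession"
    explicit: bool   # False when the connective is omitted from the text


# Made-up sentence for illustration; it is not taken from the PDTB.
relation = DiscourseRelation(
    arg1="It rained all morning",
    arg2="the match went ahead",
    connective="but",
    sense="Comparison.Concession",
    explicit=True,
)

# A shallow parse is a flat collection of such relations; nothing links
# one relation to another in a tree or graph.
parse: List[DiscourseRelation] = [relation]
print(len(parse), parse[0].connective, parse[0].sense)
```

Keeping the relations in a flat list mirrors the "shallow" part of the task: each relation stands on its own.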

Examples of discourse relations

Here is a paragraph taken from the document wsj_1000 in the PDTB. A shallow discourse parser will output a set of discourse relations, which are visualized below. Arg1 is shown in red, and Arg2 is shown in blue. The discourse connective is underlined.

Explicit Discourse Relations

According to Lawrence Eckenfelder, a securities industry analyst at Prudential-Bache Securities Inc., "Kemper is the first firm to make a major statement with program trading." He added that "having just one firm do this isn't going to mean a hill of beans. But if this prompts others to consider the same thing, then it may become much more important."

The discourse connective is 'but', and the sense is Comparison.Concession.

Implicit Discourse Relations

According to Lawrence Eckenfelder, a securities industry analyst at Prudential-Bache Securities Inc., "Kemper is the first firm to make a major statement with program trading." He added that "having just one firm do this isn't going to mean a hill of beans. But if this prompts others to consider the same thing, then it may become much more important."

The omitted discourse connective is 'however', and the sense is Comparison.Contrast.
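
The sense labels in the two examples, Comparison.Concession and Comparison.Contrast, are dotted paths: the broad class comes first and a finer-grained type follows it. A minimal sketch of splitting such a label, assuming Python; the function name split_sense is hypothetical and not part of any shared task tooling.

```python
from typing import List


def split_sense(label: str) -> List[str]:
    """Split a dotted sense label into its levels, coarsest first."""
    return [part for part in label.split(".") if part]


for label in ("Comparison.Concession", "Comparison.Contrast"):
    levels = split_sense(label)
    # levels[0] is the broad class; the rest are finer-grained levels.
    print(levels[0], "->", levels[1:])
```

Both example relations fall under the same class, Comparison, and differ only at the second level.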