Course Project

The course project will be done in groups of approximately 3 people. Components of the project and due dates are listed below. The work is to be done as a group except for the final peer evaluation, which is done individually. We will be forming groups in the first 2-3 weeks of class.

Groups should keep all their work in the class Google Docs folder. Please do not put files from other courses there. We will be accessing this folder from time to time for grading.

https://drive.google.com/a/brandeis.edu/#folders/0B6z1otdg2OZuNmhUZEk3ZFlsT1k

When submitting assignments through Latte

  • Zip all files together with your name or group name in the file names
  • Include your name or group name in the zip file name

Late submission rules

  • You can only get an extension by asking for permission before the due date
  • No permission will be granted after the due date except for cases of dire emergency
  • While I accept all reasonable excuses, don't overuse the privilege
  • Waiting until the 11th hour to start an assignment is not a reasonable excuse

Project components and Due Dates:

The following lists the major deadlines for the semester, with due dates and delivery method. Please check it regularly, as the dates may change as the semester progresses.

Deliverable | Date | Description | Delivery
Annotation goal | Feb 6 | Write-up of the annotation goal, including what you hope to achieve with the annotation and how far-reaching the task will be | Latte, Group
Group Contract | Feb 6 | Document that you and your teammates will use to assign responsibility for specific tasks within the larger project | Latte, Group
Task Description and Corpus Selection | Feb 13 | Longer version of the annotation goal with preliminary answers to the Matter Checklist | Group folder; short in-class presentation
Initial Annotation Spec | Feb 13 | Initial spec should focus on the Model (terms, relations, interpretation) | Group folder
Full Annotation Spec | Feb 24 | Full annotation specification. Focus on the corpus and what you are annotating; connect to the goal | Group folder; short in-class presentation
Annotation Guidelines | Mar 3 | Draft of your annotation guidelines. Focus on examples, particularly edge cases. Annotators will provide feedback | Group folder; short in-class presentation
Final Annotation Guidelines | Mar 6 | Begin swap annotation | Group folder
Adjudication and precision and recall | Apr 14 | Annotation completed | Include results in write-up
Train Machine Learning Algorithm | Apr 24 | Description of algorithms and experiments to evaluate results | Include results in write-up
Write-up | Apr 28 | A report of the entire annotation process | Latte, Group
Presentation | During finals week | Presentation of research findings | Latte, Group
Peer Evaluation | Same day as presentation | Evaluation form about the participation of you and the other members of your group | Latte, Individual

A. Annotation goal

There are many different topics in linguistics that you and your team can choose to examine over the course of the semester. Below is a list of possible topics—you can choose any one of these or select your own, as long as your task is approved by us.

  • sentiment/attitude/stance
  • syntactic dependencies
  • semantic role or semantic function labeling
  • anaphora
  • multimodal annotation
  • textual cohesion and coherence
  • metonymy
  • coercion (GLML)
  • narrative structure
  • temporal information (TimeML)
  • spatial information (ISO-Space)
  • presuppositions, implicatures, inferences

When writing up the annotation goal, your team should be able to express what you hope to achieve with the annotation, and how far-reaching the task will be. For example, if you want to do genre annotation of newspaper articles, how specific will your categories be? Will you do broad categories, such as “world news”, “local news”, “sports”, or will you be more specific: “sports: baseball”, “world news: Europe”?

B. Group Contract

The group contract is a document that you and your teammates will use to assign responsibility for specific tasks within the larger project. These can be divided however you want, as long as everyone in the group feels that the tasks are shared equally. There are a few options for how to divide the work — some ways you might want to consider are:

  • Divide by task: each group member gets a number of the tasks outlined in the grade section; equality is determined by grade percentage points
  • Divide by skill: the group discusses what skills they bring to the group, and the work within the tasks is divided up accordingly.
  • Everyone takes part: if you prefer that everyone have a say in all aspects of the grade, then we recommend outlining roles within the group. For example, one person might be in charge of starting each task, one person will be the timekeeper, and the other will be the editor.

Please note that everyone in the group will be doing annotation, so that should not factor into the task assignment. Once you are all satisfied with the contract, everyone in the group must sign it and it must be turned in to John Vogel. The contract will be useful later in the peer evaluation section. Changes to the contract can be made throughout the semester, but must be submitted (signed) to John.

C. Task Description and Corpus Selection

The task description is a longer version of the annotation goal, where you will begin to define exactly how your annotation task will be done. This is closely related to the corpus selection, which will define the type of documents you will annotate (news articles, movie reviews, selected sentences displaying a phenomenon), the source of your documents (Internet, news archives, another corpus), and the size of the corpus (this will vary by task). The task description should be at least 1 page, typed.

D. Annotation Schema

The schema document is what describes the tags and attributes that will be used for the annotation task. This description is in the form of a DTD-like document that can be input into the annotation environment so the annotators will be able to label the texts appropriately. Specifics for how these documents are created will be provided later in the semester.

E. Task Specification

The specification document contains the detailed instructions for the annotation task that will be provided to annotators. This document must be clear, contain relevant examples, and outline any exceptions or special cases that the annotators might encounter. There will be two versions of this: the initial version that is provided to the annotators, and the revised version that will be produced after the annotation is completed.

F. Annotation

Your group will not be performing its own annotation task. Rather, each group will be given the schema and specification from a different group. Grading for annotation will be based partly on whether you as an annotator met the deadlines, and partly on how well you followed the instructions given by the other group.

G. Adjudication and precision and recall

Adjudication is the process of creating a gold standard corpus from the documents provided by your annotators. No annotation effort is without mistakes, and part of how an annotation task is evaluated is how closely the annotators agree with each other and with the gold standard once it is completed. Adjudication is done by hand. Your group must write a program that calculates precision and recall for each annotator when compared against the gold standard. A write-up of common annotator mistakes must be turned in, along with the code and the gold standard.
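As a rough illustration of the precision and recall program, here is a minimal sketch. It assumes that each annotation can be reduced to a (document id, start offset, end offset, tag) tuple and that only exact matches count as correct. That representation, the tag names, and the function name are hypothetical, and your own task may call for partial-match or attribute-level scoring instead.

```python
from typing import Iterable, Tuple

# Hypothetical representation: (doc_id, start_offset, end_offset, tag).
Annotation = Tuple[str, int, int, str]


def precision_recall(annotator: Iterable[Annotation],
                     gold: Iterable[Annotation]) -> Tuple[float, float]:
    """Score one annotator against the gold standard using exact matches.

    precision = correct annotations / all annotations the annotator made
    recall    = correct annotations / all annotations in the gold standard
    """
    annotator_set = set(annotator)
    gold_set = set(gold)
    true_positives = len(annotator_set & gold_set)
    precision = true_positives / len(annotator_set) if annotator_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


# Made-up data for illustration only.
gold = [
    ("doc1", 0, 4, "EVENT"),
    ("doc1", 10, 15, "TIME"),
    ("doc2", 3, 9, "EVENT"),
]
annotator_a = [
    ("doc1", 0, 4, "EVENT"),    # matches gold
    ("doc1", 10, 15, "EVENT"),  # right span, wrong tag
    ("doc2", 3, 9, "EVENT"),    # matches gold
    ("doc2", 20, 25, "TIME"),   # not in gold
]

p, r = precision_recall(annotator_a, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Running the same function once per annotator gives the per-annotator scores. The annotations that end up outside the set intersection are a natural starting point for the write-up of common annotator mistakes.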

H. Train Machine Learning Algorithm

Once a gold standard has been created, it can be used for training and testing various machine learning algorithms. The algorithms that your group implements will depend largely on your annotation topic, but for most groups this will involve using NLTK or a similar toolkit, not writing your own machine learning algorithm.
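Before plugging in a toolkit classifier, it can help to hold out part of the gold standard for testing and to compute a simple baseline to compare against. The sketch below is one possible setup in plain Python, not a required method. The example data, the 80/20 split, and all function names are assumptions made for illustration.

```python
import random
from collections import Counter


def split_gold(examples, test_fraction=0.2, seed=0):
    """Shuffle the gold-standard examples and split them into (train, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def majority_label(train):
    """Return the most frequent label in the training set."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def accuracy(predict, test):
    """Fraction of test examples whose predicted label equals the gold label."""
    correct = sum(1 for features, label in test if predict(features) == label)
    return correct / len(test)


# Hypothetical gold-standard examples as (feature dict, label) pairs.
gold_examples = [
    ({"word": "great"}, "pos"), ({"word": "awful"}, "neg"),
    ({"word": "fine"}, "pos"), ({"word": "boring"}, "neg"),
    ({"word": "superb"}, "pos"), ({"word": "dull"}, "neg"),
    ({"word": "lovely"}, "pos"), ({"word": "weak"}, "neg"),
    ({"word": "solid"}, "pos"), ({"word": "fun"}, "pos"),
]

train, test = split_gold(gold_examples)
baseline = majority_label(train)
baseline_accuracy = accuracy(lambda features: baseline, test)
print(f"majority baseline '{baseline}': accuracy={baseline_accuracy:.2f}")
```

The (feature dict, label) pairs follow the input format that NLTK classifiers such as `nltk.NaiveBayesClassifier.train` accept, so the same `train` and `test` lists could be reused with a trained classifier, and its accuracy on `test` compared with the baseline.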

I. Write-up

A report of the entire annotation process will be a major part of your grade. The format will be similar to conference articles in the Computational Linguistics field, e.g., acl.sty. You will have the opportunity to turn in a draft for evaluation and review prior to the final deadline.

J. Presentation

Each group will present their research findings to the rest of the class. This will occur during the period allotted for final exams. Each group member must present the work that they did—if each member collaborated on all parts of the project, then each member still must give part of the presentation.

K. Peer Evaluation

Once the presentations are completed, each of you will be asked to fill out an evaluation form about the participation of the other members of your group. These will be collected immediately by the TA and taken into account when calculating final grades.