
BioLink 2004:

Linking Biological Literature, Ontologies and Databases: Tools for Users

Boston, May 6, 2004


Workshop held in conjunction with HLT-NAACL 2004



Important Dates

  • Submissions due
  • Monday, Feb. 9, 2004: Acceptances
  • Monday, Feb. 23, 2004: Camera ready copy due
  • Thursday, May 6, 2004: Workshop

This workshop will bring together researchers from the fields of bioinformatics, natural language processing, ontologies, data mining, and information retrieval. Our focus will be on tools that can provide improved access and cross-indexing for the biomedical literature, databases and ontologies. We strongly encourage presentation of approaches that support end users and user-defined tasks.

Biological databases have become increasingly important resources for biomedical research. These databases contain a mix of data types: sequence data (DNA and protein sequences); structured data such as molecular weights or GC content; annotations in terms of controlled vocabularies and, increasingly, ontologies such as the Gene Ontology; and free-text data in comment fields. Many biological databases are manually curated, that is, constructed by PhD biologists who read the literature and encode the information it contains in the appropriate fields of the database.

Biomedical researchers then access these databases, the literature, and other biological resources, for example to make cross-organism comparisons (e.g., to determine the function of a new protein by homology with known proteins from other organisms). Such a search may combine structured data from experiments, such as microarray data, with features derived from the literature (e.g., classes of proteins mentioned in the same article as another protein) and with classifications from ontologies such as GO. There are numerous points in this scenario where tools from information extraction, ontologies, data mining, and information retrieval could help.

Areas of interest

The workshop will explore two challenges: 1) providing more transparent access to the literature for both humans and computers; and 2) linking the literature and the databases through standardized nomenclatures and ontologies. We encourage papers that assess tools in terms of their utility for their intended uses and users, whether those users are database curators or researchers trying to find all references to a research topic. We also encourage papers on tools that aggregate information across multiple databases. We welcome papers describing work in the following areas, as applied to biology and bioinformatics:

  • Information extraction
  • Information retrieval
  • Text mining and clustering
  • Term recognition
  • Ontology construction and ontology mapping
  • Tools for biological database curation and annotation
  • Visualization tools for viewing clustered or extracted information or meta-data
  • Construction of pathways from literature and databases
  • Analysis of gene array data using computational linguistic techniques
  • Evaluation methods
  • User-centric evaluation of tools

Intended Audience

This workshop is intended for researchers in text mining and natural language processing applied to biology, as well as for practitioners who are applying these techniques to their specific problem areas. The workshop will include submitted talks plus invited talks from the user community, to highlight biologists' needs and problems. It will also include short reports from recent challenge evaluations, e.g., the TREC Genomics Track and the BioCreAtIvE evaluation.

This workshop builds on previous work from the bioinformatics, information extraction and text mining communities. On the bioinformatics side, there has been a series of sessions at the Pacific Symposium on Biocomputing devoted to text mining and ontologies, and there have been three meetings of the Special Interest Group for Text Mining in Biology at the annual Intelligent Systems for Molecular Biology (ISMB) conference. On the language processing side, there have been two previous ACL workshops on biomedical text mining, at ACL 2002 and ACL 2003.

Expected number of participants: 45

Submission format

Full papers of at most 8 pages (including references and figures). Authors should follow the main conference ACL style format.

Please email submissions to Lynette Hirschman:
lynette@mitre.org.

Organizing Committee

  • Lynette Hirschman, MITRE (Chair)
  • James Pustejovsky, Brandeis University (Co-Chair)
  • Ian Donaldson, The Blueprint Initiative - North America
  • Carol Friedman, Columbia University
  • William Hayes, AstraZeneca
  • Marc Light, University of Iowa

Program Committee

  • Sophia Ananiadou
  • John Cleary
  • Mark Craven
  • Patricia Dyck
  • David Eichmann
  • Udo Hahn
  • Larry Hunter
  • Alexa McCray
  • Joyce Mitchell
  • Alex Morgan
  • See-Kiong Ng
  • Padmini Srinivasan
  • Robert Stevens
  • Lorrie Tanabe
  • Jun'ichi Tsujii
  • Bonnie Webber
  • Pierre Zweigenbaum