This workshop will bring together researchers from the fields of bioinformatics, natural language processing, ontologies, data mining, and information retrieval. Our focus will be on tools that can provide improved access and cross-indexing for the biomedical literature, databases, and ontologies. We strongly encourage presentation of approaches that support end users and user-defined tasks.
Biological databases have become increasingly important resources in this field. These databases contain a mix of data types: sequence data (DNA and protein sequences); structured data such as molecular weights or GC content; annotations in terms of controlled vocabularies and, increasingly, ontologies such as the Gene Ontology; and free-text data in comment fields. Many biological databases are manually curated, that is, constructed by PhD biologists who read the literature and encode the information it contains in the appropriate fields of the database that they are building.
Biomedical researchers then access these databases, the
literature, and other biological resources, for example, to
look for cross-organism comparisons (e.g., to determine the
function of a new protein by homology with known proteins
from other organisms). This search may combine structured
data from experiments, such as microarray data, with
features developed from the literature (e.g., classes of
proteins mentioned in the same article as another protein),
in conjunction with classifications from ontologies such as
GO. There are numerous points in this scenario where tools
from information extraction, ontologies, data mining, and
information retrieval could help.
The workshop will explore two challenges: 1) providing more transparent access to the literature for both humans and computers; and 2) linking the literature and the databases via the use of standardized nomenclatures and ontologies. We encourage papers that assess tools in terms of their utility to the intended set of uses and users, whether database curators or researchers trying to find all references to a research topic, or tools that aggregate information across multiple databases. We encourage papers describing work in the following areas, as applied to biology and bioinformatics:
Intended Audience
This workshop is intended for researchers in text mining and natural language processing applied to biology, as well as for practitioners who are applying these techniques to their specific problem areas. The workshop will include submitted talks plus invited talks from the user community, to highlight biologists' needs and problems. It will also include short reports from recent challenge evaluations, e.g., the TREC Genomics Track, and the BioCreAtIvE evaluation.
This workshop follows on previous work from the
bioinformatics, information extraction, and text mining
communities. On the bioinformatics side, there has been a
series of sessions at the Pacific Symposium on Biocomputing
devoted to text mining and ontologies. There have also been
three meetings of the Special Interest Group for Text Mining
in Biology at the annual Intelligent Systems for Molecular
Biology meeting. On the language processing side, there have
been two previous ACL workshops on biomedical text mining,
at ACL 2002 and ACL 2003.
Expected number of participants: 45
Full papers may be at most 8 pages long (including references and figures). Authors should follow the main conference ACL style format.
Please email submissions to
Lynette Hirschman:
Program Committee