WormBase in the era of comparative genomics: Model Organism Databases and the challenges of multiple species
Todd W. Harris1, Lincoln D. Stein1, and the WormBase Consortium2,3,4.
2005. Genome Informatics Meeting, Cambridge, England.
1 Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
2 Department of Genetics, Washington University at St. Louis, St. Louis MO
3 Wellcome Trust Sanger Genome Institute, Hinxton UK
4 Howard Hughes Medical Institute, Caltech, Pasadena, CA
Model Organism Databases (MODs) such as WormBase, FlyBase, and SGD aim to develop highly curated and annotated resources of their respective organisms. Typically, MODs seek to provide a web-accessible interface to their databases, with the central aims focusing on data storage, annotation, analysis, and distribution.

MODs extend beyond the genomic sequence and related annotations to include a vast array of diverse experimental data ranging from studies of single genes to large-scale genomic and proteomic analyses. This includes data such as genetic and biochemical interactions, strains, alleles, reagents, expression patterns and profiles, homologies, and literature citations. Thus, MODs integrate vast amounts of information in a manner that is both organism-centric and broadly biologically relevant. Now, as the initial wave of genome sequencing recedes and comparative sequencing begins in earnest, MODs are redefining their role in the research community.

The explosion in sequencing of related genomes and comparative analyses poses serious questions for those who build and maintain model organism databases. Where on the spectrum between data analysis and data distribution do MODs reside? What is the most sensible and intuitive way to present data from different species within the sophisticated and highly developed user interfaces of MODs? How can we represent pairwise and multiple genome alignments without sacrificing information content or making the display overly complex? How can we make the resource more accessible to increasingly sophisticated users who need access not just to single page-based views, but also to large swaths of the database?

As a mature MOD, WormBase, the central data repository for C. elegans, is currently addressing many of these issues. Now entering its third phase of existence, WormBase is well-positioned to face these challenges. The first phase focused on refining the interface and tools available for browsing the C. elegans genome. The second phase saw the inclusion of the complete genomic sequence of C. briggsae and comparative analyses between C. briggsae and C. elegans. In the third phase, WormBase will be incorporating sequence and comparative data from three additional species slated for completion in 2004, as well as a requisite reworking of the software that drives WormBase. This presentation will focus on the problems and lessons learned at WormBase in incorporating multiple genomes into the resource.


Todd W. Harris, PhD (harris@cshl.org)