posted October 5, 2010

New Software Speeds Genome Analysis

New software developed with the involvement of faculty and student researchers at Hope College is dramatically speeding the time-intensive, painstaking task of transforming the immense amount of genomic data being collected worldwide into working models of how organisms function at a fundamental level.

The work was recently highlighted in an article published in the August issue of the prestigious monthly scientific journal "Nature Biotechnology."  In 2009, the journal ranked first in biotechnology and applied microbiology for its "impact factor" - a measure of the frequency with which the publication's articles are cited elsewhere - as determined by Thomson Reuters Journal Citation Reports.

According to Hope faculty members Dr. Aaron Best and Dr. Matthew DeJongh, co-leaders of the Hope research team, the web-based "Model SEED" resource, available at http://www.theseed.org/models, can complete in 48 hours most of the calculations that used to take a year to perform manually - and the research team is working on speeding up the rest.  Although the Model SEED is already being used by scientists internationally, they noted that the team is also looking at a further step: how to use the models to examine a variety of questions, from the way that some bacteria cause diseases to how other bacteria produce energy and clean up pollution.

The resource reflects collaboration between researchers at Argonne National Laboratory - where Model SEED is based - the University of Chicago and Hope.  The "Nature Biotechnology" article's six co-authors, representing the three institutions on the research team, include three from Hope: DeJongh, who is an associate professor of computer science; Best, an associate professor of biology; and recent graduate Paul Frybarger, who worked on the project as a student.  Lead author is Christopher S. Henry of Argonne National Laboratory and the University of Chicago; the other authors, Ben Linsay and Rick L. Stevens, are both with the University of Chicago as well.

DeJongh noted that the impetus for the Model SEED grew from the reality that the tools for working with genomic information haven't kept up with the explosion in data collection made possible by advances in biotechnology.

"The capability of generating data has been outpacing the ability to analyze it, so we've needed software to analyze this genome-level data," he said.  "That's where the tools that we've created come in."

"This kind of work used to take a person about a year to do on a manual basis," DeJongh said.  "We can effectively do 80 to 90 percent of that work in 48 hours now, and we're working on tools to assist scientists doing that remaining 10 to 20 percent by hand."

Together, the speed and analytical capabilities make the software an important new tool, Best explained.

"A particularly valuable outcome of this work is that the software provides a powerful framework for comparing how different organisms perform certain functions or respond to conditions.  This is a framework that has been lacking," he said.  "A resource like this allows researchers to, for instance, look at two very closely related strains of bacteria - one that causes disease, another that doesn't - and better understand the differences in how they function."

The Model SEED is accessible to researchers around the globe through the RAST (Rapid Annotation using Subsystems Technology) genome analysis service available through Argonne National Laboratory.  The SEED project is a nationwide, open-source effort to develop and share genomic data.

Best, DeJongh and Hope students have been working on the project for about five years.  In addition to receiving National Science Foundation (NSF) grant support for their work, Best and DeJongh both received Towsley Research Scholar awards from Hope and funding from a multi-purpose Howard Hughes Medical Institute grant to Hope.

In developing and testing the Model SEED, the research team generated 130 models covering a variety of bacteria - life forms that involve thousands of distinct biochemical processes.  Subsequent comparison of the models with experimental data found the system to be 87 percent accurate.

"It is remarkable that this level of agreement between model and experimental data is achieved with little manual input into the model-building process and for a very diverse group of bacteria," Best said.  "As researchers begin to focus in on models for individual bacteria, not only will this accuracy increase, but the models have the potential to reveal and predict important behaviors that will help us understand how these bacteria interact with their environment."

The overall effort is currently supported by two major grants from the NSF.  The first, awarded to Best and DeJongh in 2008, has supported the development of the models and modeling software, and will continue through this spring as the team seeks to automate some of the remaining analysis.  The other, a two-year award that Best and DeJongh received with colleague Dr. Nathan Tintle of the mathematics faculty, is supporting the next stage of the work: moving from creating a model that accurately reflects the systems being represented to using the model to see how those systems respond to different situations.

"The new grant builds on this resource to use these models to analyze other kinds of data," DeJongh said.  "What can we get out of it when we start feeding it environmental data and make simulations and statistical predictions?"