Biology Department

Bioinformatics Online Laboratory

THE GENOME IS A BOOK

There are twenty-three chapters called chromosomes.
Each chapter contains several thousand stories called genes.
Each story is made up of paragraphs called exons,
which are interrupted by advertisements called introns.
Each paragraph is made up of words called codons.
Each word is written in letters called bases.

There are 1 billion words in the human "book."

From Genome, by Matt Ridley, 2000




Objectives
  • To visualize the connection between DNA, RNA, and protein (i.e., transcription and translation).
  • To understand how RNA processing is involved in the production of a final protein product.
  • To learn how to access and use online databases and tools such as Entrez Nucleotide, BLASTN, ClustalW, and Structure.
  • To visualize how small changes in DNA can cause serious disease.
  • To understand how sequence homology relates to evolution.
  • To understand the power of bioinformatics.
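
To make the first two objectives concrete, here is a minimal Python sketch of the path from DNA to RNA to protein. It assumes a made-up toy gene (not a real sequence) whose exon boundaries are already known. The function names transcribe, splice, and translate are illustrative and not part of any database or library. The codon table is the standard genetic code. In a real cell, RNA polymerase reads the template strand and the spliceosome recognizes splice signals in the RNA itself, rather than working from coordinates supplied in advance.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1) written with RNA bases,
# codons ordered U, C, A, G with the third base varying fastest.
RNA_BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(RNA_BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_strand: str) -> str:
    """mRNA has the same sequence as the DNA coding strand, with U in place of T.
    (RNA polymerase actually reads the complementary template strand.)"""
    return coding_strand.upper().replace("T", "U")


def splice(pre_mrna: str, exons: list) -> str:
    """Keep only the exon segments, given as 0-based (start, end) pairs, end exclusive."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def translate(mrna: str) -> str:
    """Read codons from the first base until a stop codon (*) is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy gene: exon 1 + intron (starts GT, ends AG) + exon 2.
toy_gene = "ATGGCC" + "GTAAGTTTTCAG" + "TGGAAATAA"
toy_exons = [(0, 6), (18, 27)]

pre_mrna = transcribe(toy_gene)
mature_mrna = splice(pre_mrna, toy_exons)
print(mature_mrna)             # AUGGCCUGGAAAUAA
print(translate(mature_mrna))  # MAWK
print(translate(pre_mrna))     # MAVSFQWK (intron left in)
```

Translating the unspliced pre-mRNA gives a different, longer protein (MAVSFQWK instead of MAWK), which shows why removing introns matters for the final product.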

Biology and the Information Age

Technology is constantly changing the different fields of science. Recently, high-speed computers have dramatically reduced the time needed to sequence entire genomes. Scientists around the world now have access to the complete genomes of C. elegans (a nematode), D. melanogaster (the fruit fly), A. thaliana (thale cress, a small plant in the mustard family), and even human beings! Studying these complete genomes makes it possible to develop new methods of gene therapy, produce new drugs, and directly compare genetic variation between species. With a complete sequence in hand, scientists can study which genes are turned on during development and when, how proteins interact with each other, and what roles genes and proteins play in disease, to name just a few benefits. While this wealth of new knowledge greatly advances science, one has to wonder how anyone can keep up with such a fast-paced field. This laboratory will help you understand how computers make it easier for scientists to use the information now available to them.


So what is bioinformatics?

Well, because the field is still in its infancy, there are varying definitions. Some say that it is the science of developing computer databases and algorithms to speed up and enhance biological research (whatis.com). Others define it as the science and technology of learning, managing, and processing biological information (definition taken from a Missouri University lecture). However it is defined, the field is becoming increasingly valuable as the amount of biological information available to scientists keeps growing.


Goals of this laboratory

One of the most common benefits of combining computers with science is the online database. These sites collect large amounts of related data, such as nucleotide or amino acid sequences, and let scientists search all of the available information on a topic at once rather than spending weeks or months searching through data stored in different places. For example, if a scientist in Germany believes he has sequenced a gene but is unsure of its identity, he can access a nucleotide database and compare his sequence against the millions of sequences already identified by other scientists around the world. Similar sequences of known function may provide clues about the identity and function of his newly sequenced gene.
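
The idea of comparing a new sequence against known ones can be sketched in a few lines of Python. This is a toy stand-in, not BLAST: it slides the shorter sequence along the longer one without gaps and reports the best fraction of identical positions, whereas BLAST itself uses short word seeds, gapped extension, substitution scores, and E-values. The function names and the three database entries are hypothetical, invented for illustration.

```python
def best_ungapped_identity(a: str, b: str) -> float:
    """Slide the shorter sequence along the longer one (no gaps) and return the
    highest fraction of positions where the bases are identical."""
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    if not short:
        return 0.0
    best = 0
    for offset in range(len(long_) - len(short) + 1):
        window = long_[offset:offset + len(short)]
        matches = sum(x == y for x, y in zip(short, window))
        best = max(best, matches)
    return best / len(short)


def rank_hits(query: str, database: dict) -> list:
    """Score the query against every known sequence, best match first."""
    scores = [(name, best_ungapped_identity(query, seq)) for name, seq in database.items()]
    return sorted(scores, key=lambda hit: hit[1], reverse=True)


# Hypothetical "database" of already-identified sequences (invented for this example).
known_sequences = {
    "geneA": "ATGCCGTTAGGCATTACGGAT",
    "geneB": "TTTTGGGGCCCCAAAATTTTGG",
    "geneC": "GGATGCCGTTAGCCATTACG",
}
unknown = "ATGCCGTTAGGC"

for name, identity in rank_hits(unknown, known_sequences):
    print(f"{name}: {identity:.0%} identical")
```

Here geneA contains the unknown sequence exactly and geneC differs at a single position, so those two would be the first places to look for clues about the unknown gene's identity and function.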

Besides demonstrating the power of bioinformatics, this laboratory will help you visualize the connection between a DNA sequence and the final protein product encoded by that sequence, as well as all of the steps in between! Remember that each of the databases you will use is related to the others, just as the steps from DNA to RNA to protein are related. For example, the Entrez Nucleotide sequence database is only one small part of a much larger collection of databases produced by the National Center for Biotechnology Information (NCBI). As illustrated below, each database in the larger collection is linked to each of the other databases. (You will learn more about the specifics behind each database in the illustration later in the lab.)


Figure 1: Diagram of how NCBI links together smaller databases into one larger collection.


These linked databases make it possible to gain information from many different areas while performing only one search. A scientist can use a given DNA sequence and find out about the amino acid sequence, protein structure, or even diseases caused by a mutation in that sequence.
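
Behind the scenes, NCBI also exposes these linked databases through its E-utilities web service: ESearch (esearch.fcgi) finds the IDs of records matching a query in one database, and ELink (elink.fcgi) follows links from records in one database (dbfrom) to another (db). The Python sketch below only builds the request URLs; the helper names are hypothetical, the search term is just an example, and no network request is sent unless you open a URL yourself. Check NCBI's E-utilities documentation for usage limits before sending many requests.

```python
from urllib.parse import urlencode

# Base address of NCBI's Entrez Programming Utilities (E-utilities).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str) -> str:
    """URL asking ESearch for the IDs of records in `db` that match `term`."""
    return EUTILS_BASE + "esearch.fcgi?" + urlencode({"db": db, "term": term})


def elink_url(dbfrom: str, db: str, uid: str) -> str:
    """URL asking ELink which records in `db` are linked to record `uid` in `dbfrom`."""
    return EUTILS_BASE + "elink.fcgi?" + urlencode({"dbfrom": dbfrom, "db": db, "id": uid})


if __name__ == "__main__":
    # Step 1: search the Entrez Nucleotide database (named "nuccore" in E-utilities).
    print(esearch_url("nuccore", "beta-globin human"))
    # Step 2: follow links from one nucleotide record to the protein database.
    # "UID_FROM_STEP_1" is a placeholder, not a real record ID.
    print(elink_url("nuccore", "protein", "UID_FROM_STEP_1"))
    # To actually send a request (requires internet access), for example:
    # import urllib.request
    # print(urllib.request.urlopen(esearch_url("nuccore", "beta-globin human")).read().decode())
```

The same ELink pattern can follow other kinds of links (for example, from protein records to structure records), which mirrors how one search on the website can reach many databases.
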

Preparing for the Lab (and the Pre-lab quiz!)

This laboratory will use different online databases to study the beta-globin gene family and the diseases that result from mutations in the beta-globin gene.
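
As a preview of how a small change in DNA can alter a protein, the Python sketch below makes a single-base substitution in a short made-up coding fragment and compares the translations. The fragment is NOT the real beta-globin sequence, which you will retrieve during the lab, and the helper names are hypothetical. The substitution mirrors the well-known sickle-cell mutation, in which an A-to-T change turns a GAG codon (glutamic acid, E) into GTG (valine, V) at position 6 of the beta-globin chain.

```python
from itertools import product

# Standard genetic code on the DNA coding strand (NCBI table 1), bases ordered T, C, A, G.
DNA_BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(DNA_BASES, repeat=3), AMINO_ACIDS)}


def translate(coding_dna: str) -> str:
    """Translate codons from the first base until a stop codon (*) is reached."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        aa = CODON_TABLE[coding_dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def point_mutation(seq: str, index: int, new_base: str) -> str:
    """Return a copy of seq with the base at 0-based `index` replaced by new_base."""
    return seq[:index] + new_base + seq[index + 1:]


def amino_acid_changes(normal: str, mutant: str) -> list:
    """List (1-based position, normal residue, mutant residue) wherever they differ."""
    return [(pos, a, b) for pos, (a, b) in enumerate(zip(normal, mutant), start=1) if a != b]


# Hypothetical fragment: ATG CCT GAG AAG TAA  ->  Met Pro Glu Lys (stop)
toy_fragment = "ATGCCTGAGAAGTAA"
mutant = point_mutation(toy_fragment, 7, "T")  # middle base of GAG: GAG -> GTG

print(translate(toy_fragment))  # MPEK
print(translate(mutant))        # MPVK
print(amino_acid_changes(translate(toy_fragment), translate(mutant)))  # [(3, 'E', 'V')]
```

Only one of fifteen bases changed, yet one amino acid in the protein is different. In real beta-globin, the corresponding Glu-to-Val change causes hemoglobin S to polymerize when deoxygenated, distorting red blood cells into the sickle shape.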