blast

What is BLAST?
BLAST (Basic Local Alignment Search Tool) is an alignment algorithm for nucleic acid and protein sequences.  It makes pairwise comparisons between a submitted sequence (the "query") and sequences found in the NCBI databases (the "subjects") and reports the regions that match best.

Tips for using BLAST

All BLAST searches require a Query Sequence, which you enter either by pasting a sequence in FASTA format or by typing an accession number.  You can also select which database to search; for example, ‘RefSeq’ is curated to be a non-redundant dataset (it includes one copy of each gene or protein and excludes multiple copies of each record).  You can also limit your search to specific species.
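
The same three choices (query, database, species limit) can also be written down as form data for a scripted search.  The sketch below only builds the encoded parameters and does not contact NCBI.  The endpoint, the parameter names (CMD, PROGRAM, DATABASE, QUERY, ENTREZ_QUERY) and the database name refseq_protein are assumptions based on NCBI's BLAST URL API and should be checked against the current NCBI documentation; the function name and the accession are placeholders.

```python
from urllib.parse import urlencode

# Assumed endpoint and parameter names for NCBI's BLAST URL API;
# verify them against the current NCBI documentation before use.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_search_params(query, program="blastp",
                        database="refseq_protein", organism=None):
    """Return URL-encoded form data describing one BLAST search."""
    params = {
        "CMD": "Put",           # assumed: request a new search
        "PROGRAM": program,     # which form of BLAST to run
        "DATABASE": database,   # which part of the database to search
        "QUERY": query,         # FASTA text or an accession number
    }
    if organism:
        # Limit the search to one species with an Entrez-style filter.
        params["ENTREZ_QUERY"] = f"{organism}[Organism]"
    return urlencode(params)


# Placeholder accession; replace it with your own query.
print(build_search_params("NP_000509", organism="Mus musculus"))
```

The printed string is what a script would send to BLAST_URL in a POST request; sending it and retrieving the results are left out of this sketch.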

Frequently asked questions:

What is FASTA format and why is it important?
FASTA (pronounced “FAST-A”) format starts with a header line that begins with the > sign, followed by identifying information such as the accession number and species.  The sequence itself starts on the next line.  Because the sequence contains no line numbers or punctuation, it can be pasted directly into the “Enter Query Sequence” box for a BLAST search.
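
To make the layout concrete, here is a minimal Python sketch that splits FASTA text into header and sequence pairs.  The function name parse_fasta and the example record are made up for illustration; this is not the parser that BLAST itself uses.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are joined into one string.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up record: a header line, then the sequence on the following lines.
example = """>example_1 hypothetical protein [Made-up species]
MKTAYIAKQR
QISFVKSHFS
"""

for header, sequence in parse_fasta(example):
    print(header, len(sequence))
```

Text before the first > line is ignored, which is one reason a pasted sequence without a header may need one added by hand if you process it this way.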

What form of BLAST do I use?
There are several versions of BLAST to choose from depending on what you know and what you want to find.  A common example is a BLASTp search with an amino acid sequence from one organism to look for orthologs in other species.  As another example, if you had a nucleotide sequence but wanted to compare the protein product of that sequence to protein sequences in the database, you could use BLASTx, which translates the nucleotide query into protein before searching.

Program   Subject/Database   Query
BLASTp    protein            protein
BLASTn    nucleotide         nucleotide
BLASTx    protein            nt.-->protein
tBLASTn   nt.-->protein      protein
tBLASTx   nt.-->protein      nt.-->protein
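
The table above can be read as a lookup from the type of query and the type of database to a program.  The sketch below encodes it that way; the function name choose_program and the type labels are made up for illustration, and "translated nucleotide" stands for the table's nt.-->protein.

```python
# Keys are (query type, database type); values are the program in the table.
PROGRAMS = {
    ("protein", "protein"): "BLASTp",
    ("nucleotide", "nucleotide"): "BLASTn",
    ("translated nucleotide", "protein"): "BLASTx",
    ("protein", "translated nucleotide"): "tBLASTn",
    ("translated nucleotide", "translated nucleotide"): "tBLASTx",
}


def choose_program(query_type, database_type):
    """Return the BLAST program for a query/database combination."""
    try:
        return PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"no BLAST program listed for a {query_type} query "
            f"against a {database_type} database"
        ) from None


# A nucleotide query whose translation is compared to protein sequences.
print(choose_program("translated nucleotide", "protein"))
```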

How do I interpret the results of my BLAST search?
There are several sections on the Results page that you see following a BLAST search:

1. The first major section is a graphical display of the strongest matches (hits) to the submitted sequence, color-coded according to the alignment score.  An alignment score (S) indicates how strong the match was (higher is better).  A statistical measure of the significance of the match is given as (E); the E value is the number of matches with a score at least this high that you would expect to find in a database of this size by chance alone (lower is better).

2. The second section is a detailed list of hits ordered by their alignment scores.  Each line gives the identification information for the matching sequence, followed by the alignment score and the E value.

3. Further down the page, the output gives the actual alignments for the various hits in the list.  In each alignment, portions of the Query sequence (what you submitted) and the Subject sequence (what the algorithm found) are lined up (“aligned”).  The middle line compares the two sequences: in a protein alignment, a letter marks an identical amino acid, a + sign indicates similarity between two different amino acids, and an empty space indicates a mismatch.  Sometimes one sequence must be “cut” and a gap introduced (denoted by -) in order to make this sequence align in the optimal way with the other sequence.  Just above the alignment you can find the percentage of identical nucleotides or amino acids between the two sequences, which can be a very useful measure of how similar two sequences are.
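
Two of the numbers described above can be reproduced by hand, as in the following sketch.  The first function uses the standard approximation E = m x n x 2^(-S'), where S' is the score in bits, m is the query length and n is the total database length; it assumes you have the bit score and ignores the length corrections that BLAST applies, so it will not match the reported E value exactly.  The second function counts identical positions in two aligned rows and builds a simplified middle line; it does not produce + signs, because those require a substitution matrix.  The function names and the example rows are made up for illustration.

```python
def expected_hits(bit_score, query_length, database_length):
    """Approximate E value from a bit score: E = m * n * 2 ** (-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


def summarize_alignment(query_row, subject_row):
    """Return (middle_line, identities, alignment_length, percent_identity)."""
    if not query_row or len(query_row) != len(subject_row):
        raise ValueError("aligned rows must be non-empty and of equal length")
    middle = []
    identities = 0
    for q, s in zip(query_row, subject_row):
        if q == s and q != "-":
            middle.append(q)    # identical residue: repeat the letter
            identities += 1
        else:
            middle.append(" ")  # mismatch or gap: leave an empty space
    length = len(query_row)     # alignment length, gaps included
    return "".join(middle), identities, length, 100.0 * identities / length


# A higher bit score gives a lower E value for the same search.
for bits in (30, 50):
    print(bits, expected_hits(bits, query_length=300, database_length=10**9))

# Made-up aligned rows; "-" marks a gap introduced in the query.
query_row = "MKT-AYIAKQR"
subject_row = "MKTLAYLAKNR"
middle, identities, length, percent = summarize_alignment(query_row, subject_row)
print(query_row)
print(middle)
print(subject_row)
print(f"Identities = {identities}/{length} ({percent:.0f}%)")
```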

How do I copy an alignment into a report? 
On a PC, try the steps outlined below (on a Mac, try the Grab program):

1. When you have the image you want on your screen, press the Alt and PrintScrn keys at the same time to copy the active window.

2. Open the Paint program.

3. Use the “paste” command to place your screen shot into Paint.

4. Select the region you want, then copy it and paste it into your report.

What other tools allow you to compare protein or nucleic acid sequences?

  • ClustalW is a global alignment algorithm that can align multiple sequences at a time.
  • PROSITE at ExPASy allows you to search for conserved motifs or domains based on primary amino acid sequence.
  • The Softberry website houses many utilities of interest, including gene finding algorithms.
  • Sequence Manipulation Suite is another useful website when analyzing DNA or protein sequences.

Type the name of any tool above into Google or another search engine to find it online.

Additional help is provided on the BLAST website.