Entrez

What is Entrez?

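Entrez is NCBI's cross-database search system: a single query is run against the NCBI databases at once, and each database reports its own results.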

Tips for using Entrez:

Type your search term(s) into the box at the top of the page and click the Go button. The NCBI databases are listed below the search box. After a search, each database shows a number next to its name: the number of records in that database that match your search term(s).

To find the nucleotide or amino acid sequence of a particular gene or protein, the Protein, Nucleotide, and Gene databases are usually the most useful. After your general search, check the hit counts for these databases and click a database name to see its list of results or more information about that database. You may also wish to consult the Gene FAQs or the Nucleotide and Protein FAQs.
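The same per-database hit counts can also be retrieved programmatically through NCBI's E-utilities, which expose Entrez over HTTP. The sketch below assumes the ESearch endpoint with JSON output and queries only the Gene, Protein, and Nucleotide databases (the E-utilities name for Nucleotide is `nuccore`). The helper names are hypothetical, and NCBI limits unauthenticated clients to a few requests per second, so heavy use should add an API key and the recommended `tool`/`email` parameters.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str) -> str:
    """Build an ESearch URL; retmax=0 asks for the hit count only, not the ID list."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": 0}
    return ESEARCH + "?" + urlencode(params)


def hit_count(esearch_json: dict) -> int:
    """ESearch JSON reports the total number of matches as a string under esearchresult.count."""
    return int(esearch_json["esearchresult"]["count"])


def counts_per_database(term: str, databases=("gene", "protein", "nuccore")) -> dict:
    """Return {database: hit count}, similar to the numbers shown next to each database name."""
    counts = {}
    for db in databases:
        with urlopen(esearch_url(db, term)) as resp:
            counts[db] = hit_count(json.load(resp))
    return counts
```

Calling `counts_per_database("your search term")` performs one network request per database; the counts you get back will depend on the current contents of each database.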

Frequently Asked Questions:

How can I find coding nucleotide sequences using the NCBI databases?
The Nucleotide database contains genome, gene, and transcript sequences from a variety of sources. To obtain a coding sequence, it is often easier to start from the Gene database: open an entry and scroll down to its RefSeq section, which lists accession numbers for the corresponding nucleotide (and protein) sequences. Clicking a nucleotide accession number opens its sequence record.
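Once you have a nucleotide accession number from a Gene record's RefSeq section, the annotated coding sequences can also be downloaded directly. This sketch assumes the E-utilities EFetch endpoint with `db=nuccore` and `rettype=fasta_cds_na`, which returns each CDS feature of the record as a nucleotide FASTA entry; the function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def cds_fasta_url(accession: str) -> str:
    """Build an EFetch URL requesting the CDS features of a nucleotide record as FASTA."""
    params = {
        "db": "nuccore",            # E-utilities name for the Nucleotide database
        "id": accession,            # e.g. an NM_ accession copied from a Gene record
        "rettype": "fasta_cds_na",  # nucleotide FASTA, one entry per annotated CDS
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def fetch_cds_fasta(accession: str) -> str:
    """Download the CDS FASTA text for one accession (requires network access)."""
    with urlopen(cds_fasta_url(accession)) as resp:
        return resp.read().decode("utf-8")
```

Changing `rettype` to `fasta_cds_aa` requests the translated protein sequences of the same CDS features instead.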

What are accession numbers? 
Accession numbers are unique identifiers assigned to each sequence record when it is entered into the database. Their format depends on the type of entry. In the RefSeq database, accessions of the form "NP_" followed by digits refer to protein sequences, and those of the form "NM_" followed by digits refer to nucleotide (mRNA transcript) sequences; the digit string is six characters long in older records and longer in newer ones, and a version suffix such as ".1" may follow. RefSeq is a collection of curated, non-redundant genomic DNA, transcript (RNA), and protein sequences produced by NCBI.
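The RefSeq accession format described above can be checked mechanically. This is a minimal sketch that recognizes only the two prefixes mentioned here (NM_ and NP_) and accepts six or more digits plus an optional version suffix; the function name and the returned fields are hypothetical, and RefSeq defines other prefixes that this sketch deliberately does not cover.

```python
import re

# Only the two RefSeq prefixes discussed in this FAQ; other RefSeq prefixes are ignored.
REFSEQ_PREFIXES = {"NM": "nucleotide (mRNA transcript)", "NP": "protein"}

# Two capital letters, an underscore, six or more digits, then an optional ".version".
ACCESSION_RE = re.compile(r"^([A-Z]{2})_(\d{6,})(?:\.(\d+))?$")


def describe_accession(accession: str):
    """Return the parts of an NM_/NP_ accession, or None if it does not match."""
    match = ACCESSION_RE.match(accession.strip())
    if match is None or match.group(1) not in REFSEQ_PREFIXES:
        return None
    prefix, number, version = match.groups()
    return {
        "prefix": prefix,
        "molecule": REFSEQ_PREFIXES[prefix],
        "number": number,
        "version": int(version) if version else None,
    }
```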

How do I interpret the sequence records tied to RefSeq entries that I find through searching the Protein or Gene databases?
These records are displayed in the GenBank flat-file format (called GenPept for protein records), which is a rich source of information. Some items of note:

    • the ACCESSION number and DEFINITION are at the top of the record
    • the SOURCE tells us the organism from which the sequence data has been taken
    • the REFERENCE section, with literature related to sequencing and characterization
    • the FEATURES list, which may include conserved domains, regulatory or binding sites, etc. Features are ordered from the 5’ end of a nucleotide sequence or the amino terminus of a protein sequence. The ‘CDS’ (coding sequence) feature may be especially helpful in identifying the regions of an entry that code for a protein product.
    • the sequence itself appears at the bottom of the entry, in the ORIGIN section.
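To show how the sections listed above are laid out, here is a minimal reader for a single GenBank-style flat file saved as text. It is a sketch, not a full parser: it assumes the standard column layout (top-level keywords in column 1 with values from column 13, feature keys from column 6 and locations from column 22), it keeps only ACCESSION, DEFINITION, SOURCE, CDS locations, and the ORIGIN sequence, and it does not join CDS locations that wrap onto a second line. The CDS helper handles only plain `a..b` and `join(...)` locations, not `complement(...)` or partial markers such as `<1`. All function names are hypothetical; a maintained library such as Biopython is the better choice for real work.

```python
def parse_genbank(text: str) -> dict:
    """Extract a few fields from one GenBank flat-file record (illustrative only)."""
    record = {"top": {}, "cds": [], "sequence": ""}
    section = None
    seq_chunks = []
    for line in text.splitlines():
        if line.startswith("//"):  # end of record
            break
        if line[:1].isalpha():  # a top-level keyword starts in column 1
            section = line[:12].strip()
            if section in ("ACCESSION", "DEFINITION", "SOURCE"):
                record["top"][section] = line[12:].strip()
            continue
        if section == "ORIGIN":
            # Sequence lines: a position number followed by blocks of residues.
            seq_chunks.append("".join(line.split()[1:]))
        elif section == "FEATURES":
            # Feature keys start in column 6; their locations start in column 22.
            if line[5:6] != " " and line[5:21].strip() == "CDS":
                record["cds"].append(line[21:].strip())
        elif section == "DEFINITION" and line.startswith(" " * 12):
            # DEFINITION text may continue on indented lines.
            record["top"]["DEFINITION"] += " " + line.strip()
    record["sequence"] = "".join(seq_chunks)
    return record


def cds_sequence(sequence: str, location: str) -> str:
    """Slice a CDS out of the sequence; locations are 1-based and inclusive."""
    if location.startswith("join(") and location.endswith(")"):
        location = location[5:-1]
    pieces = []
    for span in location.split(","):
        start, end = (int(x) for x in span.split(".."))
        pieces.append(sequence[start - 1:end])
    return "".join(pieces)
```

Because features use 1-based, inclusive coordinates counted from the 5’ end, the location `3..14` corresponds to the Python slice `sequence[2:14]`.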

Additional help is provided on the Entrez website.