Bioinformatics and Functional Genomics



Chapter 4: Basic Local Alignment Search Tool (BLAST)


Web resources from Chapter 4
Website URL
Main BLAST page http://www.ncbi.nlm.nih.gov/BLAST/
FASTA at NCBI http://www.ncbi.nlm.nih.gov/BLAST/fasta.html
Information on Databases at NCBI http://www.ncbi.nlm.nih.gov/blast/html/blastcgihelp.html
Conserved Domain Database http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
Karlin-Altschul statistics http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html#head2
Significant scores http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/rules.html
DNA Database of Japan http://www.ddbj.nig.ac.jp/

 

Tables

Table 4-1. Protein sequence databases that can be searched with standard BLAST (modified from http://www.ncbi.nlm.nih.gov/blast/html/blastcgihelp.html#protein_databases).
Database Description
Nr Non-redundant GenBank coding sequences & PDB & SwissProt & PIR & PRF
Month Sequence data released in the previous 30 days
Swissprot Most recent release from SwissProt
Drosophila Drosophila proteins from the Drosophila Genome Project (http://www.fruitfly.org)
S. cerevisiae Saccharomyces cerevisiae (yeast) proteins
Ecoli Escherichia coli proteins
Pdb Protein Data Bank at Brookhaven (http://www.rcsb.org/pdb/)
alu Translations of select Alu repeats from REPBASE, suitable for masking Alu repeats from query sequences (available by anonymous FTP from ncbi.nlm.nih.gov under the /pub/jmc/alu directory).
 
Table 4-2. Nucleotide sequence databases that can be searched with standard BLAST (modified from http://www.ncbi.nlm.nih.gov/blast/html/blastcgihelp.html#nucleotide_databases).
Database Description
nr All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences). No longer "non-redundant."
Month All new or revised GenBank+EMBL+DDBJ+PDB sequences released in the last 30 days.
Drosophila genome Drosophila genome provided by Celera and the Berkeley Drosophila Genome Project (BDGP) (http://www.fruitfly.org/).
Dbest Database of GenBank+EMBL+DDBJ sequences from EST Divisions
Dbsts Database of GenBank+EMBL+DDBJ sequences from STS Divisions
Htgs Unfinished High Throughput Genomic Sequences
Gss Genome Survey Sequence, includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences.
S. cerevisiae Yeast (Saccharomyces cerevisiae) genomic nucleotide sequences
E. coli Escherichia coli genomic nucleotide sequences
Pdb Sequences derived from the 3-dimensional structures in the Brookhaven Protein Data Bank (http://www.rcsb.org/pdb/)
Vector Vector subset of GenBank(R), NCBI, in ftp://ncbi.nlm.nih.gov/blast/db/
Mito Database of mitochondrial sequences
alu Select Alu repeats from REPBASE, suitable for masking Alu repeats from query sequences (available by anonymous FTP from ncbi.nlm.nih.gov under the /pub/jmc/alu directory).
epd Eukaryotic Promoter Database (http://www.genome.ad.jp/dbget-bin/www_bfind?epd)

 

Table 4-3. The effect of changing the threshold value on a blastp search. The three searches were run with the NCBI blastcl3 program (NetBLAST), using retinol-binding protein 4 (NP_006735) as the query. When the threshold parameter (f) was lowered from the default of 11 to 5, there were over 2 billion hits and 589 million extensions, yet the search produced only one more gapped HSP (146) than the default (145). Raising the threshold to 17 reduced the number of gapped HSPs to 93.
                                  f=11 (default)            f=5        f=17
Number of sequences in database        1,046,476      1,046,476   1,046,476
Number of hits to database           129,839,417  2,200,945,350  12,002,487
Number of extensions                   5,198,652    589,935,555      61,838
Number of successful extensions            8,377         13,145       1,117
Number of gapped HSPs                        145            146          93
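
The trade-off in Table 4-3 is easier to see when each count is expressed relative to the default threshold. The following is a minimal sketch that only does arithmetic on the counts copied from the table. The names `runs` and `relative_to_default` are illustrative and are not part of any BLAST tool.

```python
# Counts copied from Table 4-3, keyed by the threshold value f.
runs = {
    11: {"hits": 129_839_417, "extensions": 5_198_652,
         "successful": 8_377, "gapped_hsps": 145},
    5: {"hits": 2_200_945_350, "extensions": 589_935_555,
        "successful": 13_145, "gapped_hsps": 146},
    17: {"hits": 12_002_487, "extensions": 61_838,
         "successful": 1_117, "gapped_hsps": 93},
}


def relative_to_default(metric: str, f: int, default: int = 11) -> float:
    """Return the count for `metric` at threshold f divided by the default count."""
    return runs[f][metric] / runs[default][metric]


if __name__ == "__main__":
    metrics = ("hits", "extensions", "successful", "gapped_hsps")
    for f in (5, 17):
        print(f"f={f} relative to f=11:")
        for metric in metrics:
            print(f"  {metric:<12} {relative_to_default(metric, f):10.3f}x")
```

Lowering the threshold to 5 costs roughly 17 times as many hits and more than 100 times as many extensions for a single additional gapped HSP, while raising it to 17 cuts the work sharply but loses about a third of the gapped HSPs.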
 
Table 4-4. The relationship of E values to P values in BLAST, using Equation 4-8. For small E values (0.05 or less), the E value and the P value are nearly identical.
E value P value
10 0.99995460
5 0.99326205
2 0.86466472
1 0.63212056
0.1 0.09516258
0.05 0.04877058
0.001 0.0009995
0.0001 0.0001000
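
The values in Table 4-4 can be reproduced with a few lines of Python. This sketch assumes that Equation 4-8 (not reproduced on this page) has the standard form P = 1 - e^(-E). The function name `e_to_p` is illustrative.

```python
import math


def e_to_p(e_value: float) -> float:
    """Convert a BLAST E value to a P value, assuming P = 1 - exp(-E)."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate when E is very small.
    return -math.expm1(-e_value)


if __name__ == "__main__":
    print("E value   P value")
    for e in (10, 5, 2, 1, 0.1, 0.05, 0.001, 0.0001):
        print(f"{e:<9g} {e_to_p(e):.8f}")
```

Because 1 - e^(-E) is approximately E when E is small, the two columns converge at the bottom of the table.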
 
 
 
