NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
Bioinformatics consists of a computational approach to biomedical information management and analysis. It is being used increasingly as a component of research within both academic and industrial settings and is becoming integrated into both undergraduate and postgraduate curricula. The new generation of biology graduates is emerging with experience in using bioinformatics resources and, in some cases, programming skills.
The National Center for Biotechnology Information (NCBI) is one of the world's premier Web sites for biomedical and bioinformatics research. Based within the National Library of Medicine at the National Institutes of Health, USA, the NCBI hosts many databases used by biomedical and research professionals. The services include PubMed, the bibliographic database; GenBank, the nucleotide sequence database; and the BLAST algorithm for sequence comparison, among many others.
Although each NCBI resource has online help documentation associated with it, there is no cohesive approach to describing the databases and search engines, nor any significant information on how the databases work or how they can be leveraged for bioinformatics research on a larger scale. The NCBI Handbook is designed to address this information gap.
All of our users know how to execute a straightforward PubMed or BLAST search. However, feedback from help desk personnel and booth staff at scientific meetings suggests that people often want to know how to use our resources in a more sophisticated manner and are frequently unaware of less well-known databases that might be helpful to them. The intended audience for The NCBI Handbook is, therefore, the growing number of scientists and students who would like a more in-depth guide to NCBI resources: power users and aspiring power users.
The NCBI Handbook focuses on the relatively stable information about each resource; it is not a point-and-click user guide (that type of information can be found in the online help documents, which the Handbook refers to frequently but does not repeat). Each chapter is devoted to one service; after a brief overview of using the resource, there is an account of how the resource works, including topics such as how data are included in a database, database design, query processing, and how the different resources relate to each other. For example, the BLAST chapter briefly describes what BLAST is used for, the variants of the BLAST algorithm, and BLAST statistics, before discussing output formats, query processing, and tips for setting up a BLAST database. A certain amount of biological knowledge is assumed.
The online content will be updated when necessary, although major changes are not expected to occur more than once every few years. (For example, PubMed query processing does not change dramatically year after year.) We hope that The NCBI Handbook will provide a valuable reference for anyone who wants to use our resources more effectively.
Part 1. The Databases
Chapter 1. GenBank: The Nucleotide Sequence Database
Ilene Mizrachi. Created: October 9, 2002; Last Update: August 22, 2007.
- International Collaboration
- Confidentiality of Data
- Direct Submissions
- Bulk Submissions: High-Throughput Genomic Sequence (HTGS)
- Whole Genome Shotgun Sequences (WGS)
- Bulk Submissions: EST, STS, and GSS
- Bulk Submissions: HTC and FLIC
- Submission Tools
- Sequence Data Flow and Processing: From Laboratory to GenBank
- Microbial Genomes
- Third Party Annotation (TPA) Sequence Database
- Appendix: GenBank, RefSeq, TPA and UniProt: What’s in a Name?
Chapter 2. PubMed: The Bibliographic Database
Kathi Canese, Jennifer Jentsch, and Carol Myers. Created: October 9, 2002; Last Update: August 13, 2003.
Chapter 3. Macromolecular Structure Databases
Eric Sayers and Steve Bryant. Created: October 9, 2002; Last Update: August 13, 2003.
- Content of the Molecular Modeling Database (MMDB)
- Content of the Conserved Domain Database (CDD)
- Finding and Viewing Structures
- Finding and Viewing Structure Neighbors
- Finding and Viewing Conserved Domains
- Finding and Viewing Proteins with Similar Domain Architectures
- Links Between Structure and Other Resources
- Saving Output from Database Searches
- Frequently Asked Questions
Chapter 4. The Taxonomy Project
Scott Federhen. Created: October 9, 2002; Last Update: August 13, 2003.
- Adding to the Taxonomy Database
- Using the Taxonomy Browser
- The Taxonomy Database: TAXON
- Nomenclature Issues
- Taxonomy in Entrez: A Quick Tour
- The Common Tree Viewer
- Indexing Taxonomy in Entrez
- The Taxonomy Statistics Page
- Other Relevant References
- NCBI Taxonomists
- Contact Us
- Appendix 1. TAXON nametypes
- Appendix 2. Functional classes of TAXON scientific names
- Appendix 3. Other TAXON data types
Chapter 5. The Single Nucleotide Polymorphism Database (dbSNP) of Nucleotide Sequence Variation
Adrienne Kitts and Stephen Sherry. Created: October 9, 2002; Last Update: February 2, 2011.
- Searching dbSNP
- Submitted Content
- Computed Content (The dbSNP Build Cycle)
- dbSNP Resource Integration
- How to Create a Local Copy of dbSNP
- Appendix 1. dbSNP report formats.
- Appendix 2. Rules and methodology for mapping
- Appendix 3. Alignment profiling function
- Appendix 4. 3D structure neighbor analysis.
Chapter 6. The Gene Expression Omnibus (GEO): A Gene Expression and Hybridization Repository
Ron Edgar and Alex Lash. Created: October 9, 2002; Last Update: August 13, 2003.
Chapter 7. Online Mendelian Inheritance in Man (OMIM): A Directory of Human Genes and Genetic Disorders
Donna Maglott, Joanna S. Amberger, and Ada Hamosh. Created: October 9, 2002.
Chapter 8. The NCBI BookShelf: Searchable Biomedical Books
Bart Trawick, Jeff Beck, and Jo McEntyre. Created: October 9, 2002; Last Update: August 13, 2003.
Chapter 9. PubMed Central (PMC): An Archive for Literature from Life Sciences Journals
Jeff Beck and Ed Sequeira. Created: October 9, 2002; Last Update: August 13, 2003.
Chapter 10. The SKY/CGH Database for Spectral Karyotyping and Comparative Genomic Hybridization Data
Turid Knutsen, Vasuki Gobu, Rodger Knaus, Thomas Ried, and Karl Sirotkin. Created: October 9, 2002; Last Update: August 13, 2003.
Chapter 11. The Major Histocompatibility Complex Database, dbMHC
Adrienne Kitts, Michael Feolo, and Wolfgang Helmberg. Created: May 27, 2003; Last Update: August 13, 2003.
Part 2. Data Flow and Processing
Chapter 12. Sequin: A Sequence Submission and Editing Tool
Jonathan Kans. Created: October 9, 2002; Last Update: August 13, 2003.
Chapter 13. The Processing of Biological Sequence Data at NCBI
Karl Sirotkin, Tatiana Tatusova, Eugene Yaschenko, and Mark Cavanaugh. Created: October 9, 2002; Last Update: March 14, 2006.
Chapter 14. Genome Assembly and Annotation Process
Paul Kitts. Created: October 9, 2002; Last Update: August 13, 2003.
- Overview of the Genome Assembly and Annotation Process
- The Input Data
- Preparation of the Input Sequences
- Alignment of Sequences to the Input Genomic Sequences
- Genome Assembly
- Annotation of Genes
- Annotation of Other Features
- Product Data Sets
- Production of Maps That Display Genome Features
- Public Release of Assembly and Models
- Integration with Other Resources
Part 3. Querying and Linking the Data
Chapter 15. The Entrez Search and Retrieval System
Jim Ostell. Created: October 9, 2002; Last Update: August 13, 2003.
Chapter 16. The BLAST Sequence Analysis Tool
Tom Madden. Created: October 9, 2002; Last Update: August 13, 2003.
- How BLAST Works: The Basics
- BLAST Scores and Statistics
- BLAST Output: 1. The Traditional Report
- BLAST Output: 2. The Hit Table
- BLAST Output: 3. Structured Output
- BLAST Code
- Appendix 1. FASTA identifiers
- Appendix 2. Readdb API
- Appendix 3. Excerpt from a demonstration program doblast.c
- Appendix 4. A function to print a view of a SeqAlign: MySeqAlignPrint
Chapter 17. LinkOut: Linking to External Resources from Entrez Databases
Kathy Kwan. Created: October 9, 2002; Last Update: August 13, 2003.
Chapter 18. The Reference Sequence (RefSeq) Database
Kim Pruitt, Garth Brown, Tatiana Tatusova, and Donna Maglott. Created: October 9, 2002; Last Update: April 6, 2012.
Chapter 19. Gene: A Directory of Genes
Donna Maglott, Kim Pruitt, and Tatiana Tatusova. Created: March 3, 2005; Last Update: December 12, 2011.
Chapter 20. Using the Map Viewer to Explore Genomes
Susan M. Dombrowski and Donna Maglott. Created: October 9, 2002; Last Update: August 13, 2003.
Chapter 21. UniGene: A Unified View of the Transcriptome
Joan U. Pontius, Lukas Wagner, and Gregory D. Schuler. Created: October 9, 2002; Last Update: August 13, 2003.
Chapter 22. The Clusters of Orthologous Groups (COGs) Database: Phylogenetic Classification of Proteins from Complete Genomes
Eugene V. Koonin. Created: October 9, 2002; Last Update: August 13, 2003.
Part 4. User Support
Chapter 23. User Services: Helping You Find Your Way
David Wheeler and Barbara Rapp. Created: October 9, 2002; Last Update: August 13, 2003.
Chapter 24. Exercises: Using Map Viewer
David Wheeler, Kim Pruitt, Donna Maglott, Susan Dombrowski, and Andrei Gabrelian. Created: November 4, 2002; Last Update: August 13, 2003.
- 1. How Do I Obtain the Genomic Sequence around My Gene of Interest?
- 2. If I Have Physical and/or Genetic Mapping Data, How Do I Use the Map Viewer to Find a Candidate Disease Gene in That Region?
- 3. How Can I Find and Display a Gene with the Map Viewer?
- 4. How Can I Analyze a Gene Using the Map Viewer?
- 5. How Can I Create My Own Transcript Models with the Map Viewer?
- 6. Using the Mouse Map Viewer
- 7. How Can I Find Members of a Gene Family Using the Map Viewer?
- 8. How Can I Find Genes Encoding a Protein Domain Using the Map Viewer?