Try as I might, I couldn’t make this post pithy. Sorry.
Based on some in-depth questions I’ve heard at the Reference Desk this month, this is a long post on the structure and organization of medical information developed by U.S. agencies which collect, organize, share and otherwise distribute biological information for the purposes of basic science, clinical or translational research.
Graduate medical, dental or PhD students already search MEDLINE and other literature sources from the National Center for Biotechnology Information (NCBI), but the purpose of this post is to illustrate ways to search these resources more effectively, or at least more time-efficiently. If the first part of the post is too basic for you, please shoot down to the second section.
The National Library of Medicine (NLM) and the National Institutes of Health (NIH) are the U.S. agencies responsible for managing and administering the NCBI, whose stated mission is to:
“… develop new information technologies to aid in the understanding of fundamental molecular and genetic processes that control health and disease. More specifically, the NCBI has been charged with creating automated systems for storing and analyzing knowledge about molecular biology, biochemistry, and genetics; facilitating the use of such databases and software by the research and medical community; coordinating efforts to gather biotechnology information both nationally and internationally; and performing research into advanced methods of computer-based information processing for analyzing the structure and function of biologically important molecules.”
- Click here and here to see the collections of information resources accessible via the NCBI website.
- Please take a look at this nice visualization of digitally interconnected resources available from NCBI servers.
Entrez – also called the “life sciences search engine” – was designed by NCBI staff as a means to enable users to search across multiple databases or indexes to retrieve integrated search results from sequence, mapping, taxonomy and structural data for both human and non-human subjects.
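For readers who like to script their searches, the "one term, many databases" idea behind Entrez can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it builds web addresses for NCBI's E-utilities `esearch` service without sending any request, the function name `esearch_urls` is made up for this post, and the search term "TP53" (the gene symbol for p53) and the three database names are my own picks.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities "esearch" service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_urls(term, databases):
    """Return {database: URL} for running one term against several Entrez databases.

    Hypothetical helper for illustration; it only builds the URLs.
    """
    return {
        db: ESEARCH + "?" + urlencode({"db": db, "term": term})
        for db in databases
    }


# Assumed example: the same term sent to literature, protein and gene databases.
urls = esearch_urls("TP53", ["pubmed", "protein", "gene"])
for db, url in urls.items():
    print(db, url)
```

Pasting any one of the printed addresses into a browser should return an XML list of matching record IDs from that database alone, which is one way to avoid the flood of results described next.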
Below is a screenshot showing results from a recent Entrez search of NCBI servers for data on Protein 53 (p53), a human transcription factor:
Wow – That search retrieved almost too much information! What if your search requirements don’t include the need for data about genomics or DNA sequencing?
Then consider the list of open-access Literature Databases available from NCBI. A few of the best are highlighted here:
Pithy Librarian says: “Try searching MEDLINE first.”
Library User responds: “Do you mean search PubMed?”
MEDLINE, another NCBI database, is the major component of PubMed*, but PubMed contains more than just that database. Happily, librarians from NLM have written a good MEDLINE FAQ page that explains those details.
MEDLINE is a medical literature index containing 19,000,000 records indexing articles from 5,200 international biomedical journals (in 28 languages), and covering the time period of 1948 through 2009. Each year, approximately 600,000 new records are added to the database. In other words, it’s a big database to search–but not as big as Scopus, a real Godzilla of a database, weighing in at 38,000,000 records. The printed precursor to MEDLINE was Index Medicus, which is no longer being produced.
A key concept to remember when searching MEDLINE is that the database is indexed using what librarians call a “controlled vocabulary” – officially called the Medical Subject Headings List (or MeSH), a standardized thesaurus of 300,000+ terms used to electronically index each new article.
Think of MeSH terms as “tags”… similar to tagging your photos in Flickr.
How do these tags get into MEDLINE? Actual (i.e., human) medical librarians working at the National Library of Medicine read and digitally assign appropriate MeSH terms to describe the contents and scope of individual journal articles. These information scientists are trained indexers and generally have other advanced degrees in biology, molecular genetics and so on which enable them to “parse” the mechanics of what the published article is about.
The majority of MEDLINE citations are tagged with 8 to 12 MeSH terms**. Because those hand-crafted tags are attached electronically to each journal article, a search for a specific MeSH term retrieves every record tagged with that term into our citation list. It is a scientific way to search. It is definitely not Googling.
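To make the tag idea concrete, here is a small Python sketch of how a MeSH-only search statement might be assembled. It assumes PubMed's `[MeSH Terms]` field tag, which limits a phrase to the indexer-assigned headings; the helper names `mesh` and `all_of` are invented for this post, and "Humans" is simply a second heading I chose for the example.

```python
def mesh(heading):
    """Wrap a heading so PubMed matches it only against assigned MeSH tags."""
    return f'"{heading}"[MeSH Terms]'


def all_of(*clauses):
    """Join clauses with AND so a record must carry every tag."""
    return " AND ".join(f"({clause})" for clause in clauses)


# Assumed example: records tagged with both headings.
query = all_of(mesh("Pancreatic Neoplasms"), mesh("Humans"))
print(query)
# ("Pancreatic Neoplasms"[MeSH Terms]) AND ("Humans"[MeSH Terms])
```

The printed string can be pasted straight into the PubMed search box. The point of the field tag is that the words are matched against what the indexers assigned, not against wherever they happen to appear in the record.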
Another way to construct a precise search statement is to pair a MeSH term with one or more clinical subheadings (also called qualifiers), which narrow the search to a particular aspect of the topic. NLM defines these qualifiers as:
“… 83 topical qualifiers used for indexing and cataloging in conjunction with [MeSH] descriptors. Qualifiers afford a convenient means of grouping together those citations which are concerned with a particular aspect of a subject. Not every qualifier is suitable for use with every subject heading…. Subheadings are linked to the full record in the MeSH Browser.”
Following is a screenshot of the MeSH page showing the list of qualifiers which can be combined with the MeSH term “Pancreatic Neoplasms”:
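The same pairing can also be typed as a search string, with the heading and the qualifier separated by a slash. The Python sketch below shows the shape of such a string. Treat it as an illustration only: the helper name `mesh_with_subheadings` is made up, and "drug therapy" and "surgery" are two qualifiers I picked as examples.

```python
def mesh_with_subheadings(heading, subheadings):
    """Build one heading/subheading clause per qualifier and OR them together."""
    clauses = [f'"{heading}/{sub}"[MeSH Terms]' for sub in subheadings]
    return " OR ".join(clauses)


# Assumed example: articles on treating pancreatic tumors by drugs or by surgery.
query = mesh_with_subheadings("Pancreatic Neoplasms", ["drug therapy", "surgery"])
print(query)
# "Pancreatic Neoplasms/drug therapy"[MeSH Terms] OR "Pancreatic Neoplasms/surgery"[MeSH Terms]
```

Using OR between the clauses widens the net to either aspect; swapping it for AND would ask for articles indexed under both aspects at once.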
Sailors in old Hollywood movies sometimes were heard to yell, “Land Ho!”
That is what I thought after scanning the recently released 2010 New MeSH Headings List and seeing a few new terms that medical librarians really like – such as this one:
The list of new MeSH Descriptors is always interesting to browse. The National Library of Medicine states that there are currently 25,186 descriptors in the 2009 MeSH List. Read the Introduction to 2010 MeSH List here.
Many excellent handouts and links to tutorials about using information resources from NCBI can be viewed at this link; thanks to the librarians at the National Network of Libraries of Medicine, Greater Midwest Region, for creating this page.
Finally, please note that there are many more resources on the NCBI server than those explained above. Gene libraries and DNA, RNA and protein analysis or sequencing are very much outside the scope of my expertise.
One place to start exploring gene, protein and sequence analysis is the DNA & RNA Resources link (screenshot below):
* PubMed has undergone design changes this month, although the “old” and the “new” PubMed versions will co-exist for the present.
** MEDLINE is a database composed of 19,000,000 individual records. Indexing a new citation requires careful attention to detail; tagging (indexing) for MEDLINE is never done by bots to create links based on the number of hits of a given term. Each record is considered and evaluated by hand, which accounts for the indexing backlog (i.e., the gap between the moment a new journal citation is delivered from the publisher to NLM and put into the database, and the moment that citation shows up in MEDLINE with a complete set of tags). The whole indexing process generally is completed at NLM within 45-60 days.