Author: J. Caleb Goodwin, M.S., B.S. (2015)
Primary advisor: Elmer Bernstam, MD
Committee members: Todd R. Johnson, PhD; Trevor Cohen, PhD; Thomas C. Rindflesch, PhD
PhD thesis, The University of Texas School of Health Information Sciences at Houston.
Thanks to science and technology, access to factual knowledge of all kinds is rising exponentially while dropping in unit cost... [we are] drowning in information, while starving for wisdom. (E. O. Wilson, 1992)
Clinicians and researchers can no longer keep up to date with the literature manually, even in specialized domains. The difficulty of extracting knowledge from a rapidly growing literature has been described as precluding the existence of experts in medical sub-disciplines, in the appropriately titled article “On the Impossibility of Being an Expert” (Fraser, 2010). The authors argued that expertise could theoretically be attained only just as it was time to retire. One way to cope with the increasing information overload is to use information retrieval (IR) systems, which help users identify relevant information within large document collections. IR systems become increasingly important as the volume of scientific literature grows.
The National Library of Medicine’s (NLM) PubMed is the most widely used IR tool for accessing the MEDLINE database of biomedical literature (Falagas, Pitsouni, Malietzis, & Pappas, 2008). PubMed provides access to over 19 million articles and processes over 1.5 billion queries a year (Islamaj Dogan, Murray, Neveol, & Lu, 2009). By default, PubMed ranks results in reverse chronological order. This ranking is sufficient only if the user is seeking the most recent articles. Other information needs, such as finding important articles, are not well served by it. In addition, result sets returned by PubMed can be very large. For example, a query for “breast cancer” returns over 200,000 citations, far too many for manual review. Ranking by importance or relevance could help users find articles that match their information need. Because users on average look at only the first ten results, ranking by relevance to the query is a priority (Islamaj Dogan, et al., 2009).