Data Accuracy in Medical Record Abstraction

Author: Merideth Nahm, MS (2010)

Primary Advisor: Jiajie Zhang, PhD

Committee Members: Todd R. Johnson, PhD; Constance M. Johnson, PhD, RN; Jack W. Smith, MD, PhD; Amy Franklin, PhD

PhD Thesis, The University of Texas School of Health Information Sciences at Houston.


Medical record abstraction is the process in which a human manually searches through a medical record
to identify data required for a secondary use. Abstraction involves some direct matching of information
found in the record to the data required, but also includes operations on the data such as categorizing,
coding, transforming, interpreting, summarizing, and calculating. The abstraction process results in a
summary of information about a patient for a specific secondary data use. Medical record abstraction
remains a primary mode of data collection in clinical research, quality improvement, performance
measurement, disease surveillance, and other secondary data uses. 
While hundreds of articles mention factors that may impact the accuracy of abstracted data, this
literature has not previously been synthesized, and most of the work has been done without a
theoretical framework.
Information generation, collection, and representation are central to informatics. Generation, collection,
and representation impact data and information quality; in turn, data and information quality impact use. 
Medical record abstraction is about the interaction of humans with the processes, tools, representations,
and environment in which the abstraction occurs. In medical record abstraction, a human being is an
agent in the collection and transformation of data. That human-data-representation interaction is an
informatics problem that, until now, has not been addressed from an informatics perspective.
The work presented here was motivated by the lack of consensus and lack of evidence supporting
methods used in the collection and management of clinical research data and their impact on the quality
of the data.  This work began with a quantitative literature review and pooled analysis of data error rates
reported in the clinical trial and registry literature. The first paper in this compilation associated
medical record abstraction with the highest error rates of the data collection and processing methods
common in clinical research. Thus, data quality in medical record abstraction became the focus of further
investigation.
The second paper in the compilation, a formal concept analysis of data quality, was necessary for further
investigation of data quality in medical record abstraction. The concept analysis clarified the
multidimensionality of data quality, and focused my work on the dimension of data accuracy, i.e.,
correctness of the data values. 
My study of data accuracy in medical record abstraction was initiated with a review and formal synthesis
of the medical record abstraction literature. Working with the literature helped identify appropriate
theoretical frameworks and led the way to a classification system for factors impacting the accuracy of
medical record abstraction. The factors impacting data accuracy in medical record abstraction reported in
the literature were assessed for content validity through a two-cohort, four-round Delphi process. The third
paper in this compilation presents the results of this work.
The fourth and final paper in this compilation investigates one factor, cognitive load, consistently
indicated in the literature as impacting data accuracy in medical record abstraction.  Representational
Analysis methodology was applied to assess the possibility that abstractor cognitive load during
abstraction reaches published limits of human cognition. This work demonstrated that cognitive load
during abstraction, arising from characteristics of the data elements alone, not only reaches but, in 9% of
the data elements, exceeds human cognitive limits.
This work lays the groundwork for additional research and furthers both the science of informatics and
the clinical and translational research to which it is applied.