Recognition and Disambiguation of Clinical Abbreviations
Clinical texts make frequent use of abbreviations, which poses additional challenges for clinical Natural Language Processing (NLP). Recognizing and identifying abbreviations is therefore an important and difficult task in clinical NLP. Correct identification requires up-to-date databases of abbreviations and their possible meanings (sense inventories), together with efficient disambiguation systems that can accurately determine the right meaning of an abbreviation in a given context.
This funded project (Real-time Disambiguation of Abbreviations in Clinical Notes, NLM R01LM010681) aims to develop a framework that can 1) recognize abbreviations in clinical text; 2) build sense inventories of clinical abbreviations; 3) disambiguate abbreviations based on context; and 4) encode abbreviations in real time, removing ambiguity at the time of entry. Here, we have collected useful clinical abbreviation resources, publications, and systems/frameworks on one resource page to facilitate clinical abbreviation research.
- Abbreviation databases
  - LRABR from UMLS
  - ADAM from Wei Zhou et al.
  - 12,130 pathology abbreviations from Berman JJ.
- Sense distribution database
  - 448 abbreviations with sense frequency distribution from admission notes
  - Abbreviation sense inventory from Vanderbilt Discharge Summary notes
  - Abbreviation sense inventory from Vanderbilt Clinic Visit notes
CARD (Clinical Abbreviation Recognition and Disambiguation) is an open-source framework composed of three major components: 1) the clinical abbreviation recognition module; 2) the sense inventory module; and 3) the abbreviation sense disambiguation module. A wrapper module was developed to integrate the CARD system with existing clinical NLP systems.
Please follow the links below to download each component:
- Clinical abbreviation recognition module
- Sense inventory module
- Abbreviation sense disambiguation module
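To make the division of labor among the three components concrete, the following is a minimal sketch and not the CARD implementation. It substitutes simple lexicon lookup for the recognition module and cue-word overlap for the disambiguation module, whereas the publications listed on this page describe machine learning approaches. The inventory entries, cue words, and the function names `recognize`, `disambiguate`, and `expand` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy sense inventory: abbreviation -> {sense: context cue words}.
# These entries are illustrative assumptions, not taken from any CARD resource.
SENSE_INVENTORY = {
    "pt": {
        "patient": {"admitted", "reports", "denies", "year", "old"},
        "physical therapy": {"consult", "rehab", "evaluation", "mobility"},
    },
    "ra": {
        "rheumatoid arthritis": {"joint", "methotrexate", "arthritis", "pain"},
        "room air": {"saturation", "oxygen", "breathing", "sat"},
    },
}

TOKEN_RE = re.compile(r"[A-Za-z]+")


def recognize(text):
    """Recognition stand-in: flag tokens whose lowercase form is in the inventory."""
    tokens = TOKEN_RE.findall(text)
    hits = [(i, t) for i, t in enumerate(tokens) if t.lower() in SENSE_INVENTORY]
    return tokens, hits


def disambiguate(tokens, index, window=5):
    """Disambiguation stand-in: pick the sense with the most cue words nearby."""
    abbr = tokens[index].lower()
    lo, hi = max(0, index - window), index + window + 1
    context = {t.lower() for j, t in enumerate(tokens[lo:hi], start=lo) if j != index}
    scores = {
        sense: len(cues & context)
        for sense, cues in SENSE_INVENTORY[abbr].items()
    }
    # On ties, max() keeps the first sense listed, so list order acts as a default.
    return max(scores, key=scores.get)


def expand(text):
    """Run recognition, then disambiguate each recognized abbreviation."""
    tokens, hits = recognize(text)
    return [(tok, disambiguate(tokens, i)) for i, tok in hits]


if __name__ == "__main__":
    print(expand("Pt admitted with joint pain, history of RA on methotrexate."))
    print(expand("Oxygen saturation 95% on RA."))
```

In this sketch the same abbreviation, RA, resolves to different senses depending on the surrounding words, which is the behavior the disambiguation module is meant to provide. A real system would replace the cue-word sets with a sense inventory derived from clinical notes and the overlap score with a trained classifier.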
Abbreviation resource websites
2011, AMIA: Detecting Abbreviations Using Machine Learning Methods
2012, AMIA: A Comparative Study of Current Clinical NLP Systems on Handling
2013, AMIA: Building a framework for handling clinical abbreviations – a long journey of understanding shortened words
2013, AMIA: Word Sense Disambiguation of Clinical Abbreviations with Hyperdimensional Computing
- Wu Y, Rosenbloom ST, Denny JC, Miller RA, Mani S, Giuse DA, Xu H. Detecting abbreviations in discharge summaries using machine learning methods. AMIA Annu Symp Proc. 2011, 1541-9. [PMCID: PMC3243185]
- Xu H, AbdelRahman S, Jiang M, Fan JW, Huang Y. An Initial Study of Full Parsing of Clinical Text using the Stanford Parser. International Workshop on Biomedical and Health Informatics, IEEE Conference of Bioinformatics and Biomedicine (BIBM), 2011. [NO PMCID]
- Xu H, AbdelRahman S, Lu Y, Denny JC, Doan S. Applying semantic-based probabilistic context free grammar to medical language processing – a preliminary study on parsing medication sentences. J Biomed Inform 2011, 44(6): 1068-75. [PMCID: PMC3226929]
- Chen Y, Mani S, Xu H. Applying active learning to assertion classification of concepts in clinical text. J Biomed Inform 2012, 45(2): 265-272. [PMCID: PMC3306548]
- Jiang M, Chen Y, Liu M, Rosenbloom ST, Mani S, Denny JC, Xu H. A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries. J Am Med Inform Assoc. 2011, 18(5):601-6. [PMCID: PMC3168315]
- Xu H, Wu Y, Elhadad N, Stetson P, Friedman C. A New Clustering Algorithm for Detecting Rare Senses of Abbreviations in Clinical Notes. J Biomed Inform. 2012, 45(6):1075-83. [PMCID pending]
- Xu H, Stetson PD, Friedman C. Combining Corpus-derived Sense Profiles with Estimated Frequency Information to Disambiguate Clinical Abbreviations. AMIA Annu Symp Proc. 2012. 1004-13. [PMCID: PMC3540457]
- Jiang M, Denny JC, Tang B, Cao H, Xu H. Extracting semantic lexicons from discharge summaries using machine learning and the C-Value method. AMIA Annu Symp Proc. 2012. 409-16. [PMCID: PMC3540581]
- Wu Y, Levy MA, Micheel CM, Yeh P, Tang B, Cantrell MJ, Cooreman SM, Xu H. Identifying the status of genetic lesions in cancer clinical trial documents using machine learning. BMC Genomics, 2012, 13 Suppl 8:S21. [PMCID: PMC3535695]
- Wu Y, Rosenbloom ST, Denny JC, Miller RA, Giuse DA, Xu H. A comparative study of current clinical natural language processing systems on handling abbreviations in discharge summaries. AMIA Annu Symp Proc. 2012. 997-1003. [PMCID: PMC3540461]
- Doan S, Collier N, Xu H, Pham HD, and Tu MP. Recognition of medication information from discharge summaries using ensembles of classifiers. BMC Medical Informatics and Decision Making. 2012, 12(1):36. [PMID: 22564405]
- Moon S, Berster BT, Xu H, Cohen T. Word sense disambiguation of clinical abbreviations with hyperdimensional computing. AMIA Annu Symp Proc. 2013. [PMCID pending]
- Fan JW, Yang EW, Jiang M, Prasad R, Loomis RM, Zisook DS, Denny JC, Xu H, Huang Y. Syntactic parsing of clinical text: guideline and corpus development with handling ill-formed sentences. J Am Med Inform Assoc. 2013, 20(6):1168-77. [PMCID: PMC3822122]
- Chen Y, Cao H, Mei Q, Zheng K, Xu H. Applying Active Learning to Supervised Word Sense Disambiguation in MEDLINE. J Am Med Inform Assoc. 2013, 20(5):1001-6. [PMCID: PMC3756255]
- Chen Y, Carroll RJ, Shah A, Eyler AE, Denny JC, Xu H. Applying active learning to high-throughput phenotyping algorithms for electronic health records data. J Am Med Inform Assoc. 2013, 20(e2):e253-9. [PMCID: PMC3861916]
- Tang B, Wu Y, Jiang M, Chen Y, Denny JC, Xu H. A hybrid system for temporal information extraction from clinical text. J Am Med Inform Assoc. 2013, 20(5):828-35. [PMCID: PMC3756274]
- Wei W, Cronin RM, Xu H, Lasko TA, Bastarache L, Denny JC. Development and evaluation of an ensemble resource linking medications to their indications. J Am Med Inform Assoc. 2013, 20(5):954-61. [PMCID: PMC3756263]
- Tang B, Cao H, Wu Y, Jiang M, Xu H. Recognizing clinical entities in hospital discharge summaries using Structural Support Vector Machines with word representation features. BMC Medical Informatics and Decision Making 2013, 13(Suppl 1):S1. [PMCID: PMC3618243]
- Wu Y, Tang B, Jiang M, Moon S, Denny JC, Xu H. Clinical Acronym/Abbreviation Normalization using a Hybrid Approach. 2013. Proceedings of CLEF 2013 Evaluation Labs and Workshop.
- Tang B, Wu Y, Jiang M, and Xu H. A Machine Learning based System for Disorder Concept Extraction. 2013. Proceedings of CLEF 2013 Evaluation Labs and Workshop.
- Wu Y, Denny JC, Rosenbloom ST, Miller RA, Giuse D, Song M, Xu H. A Prototype Application for Real-time Recognition and Disambiguation of Clinical Abbreviations. ACM Seventh International Workshop on Data and Text Mining in Biomedical Informatics (DTMBIO), 2013, San Francisco, USA.
- Tang B, Wang X, Wu Y, Jiang M, Wang J, Xu H. Recognizing Chemical Entities in Biomedical Literature using Conditional Random Fields and Structured Support Vector Machines. BioCreative Challenge Evaluation Workshop 2013 vol. 2, 70.
- Wu Y, Denny JC, Rosenbloom ST, Miller RA, Giuse D, Song M, Xu H. A preliminary study of clinical abbreviation disambiguation in real time. BMC Medical Informatics and Decision Making, 2014, In Press