NLP Service Docker Image

Our team has built an NLP service coupled to a terminology service. Together they extract specific concepts from sentences and provide synonyms for the extracted concepts. The system has two major components:

1) NLP service: consists of two solutions for identifying biomedical concepts in sentences: (a) general Medical Subject Headings concepts are extracted using the existing MetaMap Lite system [1], and (b) specific concepts such as diseases, chemicals, genes, and biological processes are identified using in-house rule-based and machine learning-based NLP pipelines developed using CLAMP. Identified entities are mapped to Unified Medical Language System (UMLS) Concept Unique Identifiers.

2) Terminology service: The terminology server is built on Sci-Graph and Neo4j and incorporates major ontologies such as Medical Subject Headings, SNOMED CT, Gene Ontology, Foundational Model of Anatomy, National Center for Biotechnology Information Taxonomy, and HUGO Gene Nomenclature. These terminologies are integrated in the context of the UMLS Metathesaurus to obtain a unified terminology of related terms. The web service supports real-time identification of concepts and relationships (e.g., synonym and parent-child), and is used together with the NLP service.


Running the Docker image:

1. Load the Docker image tar file into your environment:
     docker load --input <docker tar image full path>
2. Verify that your image is loaded and note its unique ID:
     docker image ls
3. Run the image using its ID:
     docker run -dit -p 8080:8080 -p 9000:9000 <image ID> 
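
The three steps above can be combined into one script. The following is a minimal sketch, not part of the distributed image: the function names `parse_loaded_ref` and `run_nlp_service` are hypothetical, and it assumes that `docker load` reports each loaded image on a line beginning with `Loaded image: ` (tagged image) or `Loaded image ID: ` (untagged image). If your Docker version prints something different, fall back to reading the ID from `docker image ls` as in step 2.

```shell
#!/bin/sh
# Sketch: load the image tar file and start a container from it.
set -eu

# Read `docker load` output on stdin and print the image reference.
# Assumes lines of the form "Loaded image: <name:tag>" or
# "Loaded image ID: <sha256:...>"; if several are reported, the last wins.
parse_loaded_ref() {
  sed -n -e 's/^Loaded image ID: //p' -e 's/^Loaded image: //p' | tail -n 1
}

# Hypothetical helper: $1 is the full path to the image tar file.
run_nlp_service() {
  image_ref="$(docker load --input "$1" | parse_loaded_ref)"
  if [ -z "$image_ref" ]; then
    echo "Could not determine the loaded image; check 'docker image ls'." >&2
    return 1
  fi
  # Same options as step 3: detached, with ports 8080 and 9000 published.
  docker run -dit -p 8080:8080 -p 9000:9000 "$image_ref"
}

# Start the service only when a tar path is given as the first argument.
if [ "$#" -ge 1 ]; then
  run_nlp_service "$1"
fi
```

Saved under a name of your choice and invoked with the tar path as its only argument, the script prints the new container ID on success. Running `docker ps` afterwards should list the container with ports 8080 and 9000 published.
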
NLP Service Docker Image Download


1. Demner-Fushman D, Rogers WJ, Aronson AR. MetaMap Lite: an evaluation of a new Java implementation of MetaMap. J Am Med Inform Assoc. 2017;24(4):841–44.

2. Soysal E, Wang J, Jiang M, Wu Y, Pakhomov S, Liu H, Xu H. CLAMP – a toolkit for efficiently building customized clinical natural language processing pipelines. J Am Med Inform Assoc. 2017;25(3):331–36.

3. Chen X, Gururaj AE, Ozyurt B, Liu R, Soysal E, Cohen T, Tiryaki F, Li Y, Zong N, Jiang M, Rogith D, Salimi M, Kim HE, Rocca-Serra P, Gonzalez-Beltran A, Farcas C, Johnson T, Margolis R, Alter G, Sansone SA, Fore IM, Ohno-Machado L, Grethe JS, Xu H. DataMed - an open source discovery index for finding biomedical datasets. J Am Med Inform Assoc. 2018 Jan 13. doi: 10.1093/jamia/ocx121. [Epub ahead of print] PubMed PMID: 29346583.