Project

Medical Image Tagger

Semi-automatic biomedical image annotation tool using NLP and medical ontologies.

Gitee Recommended Open Source · 科蓝杯 Silver Award

Introduction

Medical Image Tagger (MITagger) is a collaborative web-based annotation platform developed with Harvard Medical School to accelerate semi-automatic tagging of biomedical figures. By combining NLP-extracted figure legends with the BioPortal Annotator REST API, the system recommends structured tags from major medical ontologies, reducing the manual effort of medical annotators by 20%.

The platform later evolved into a broader medical big data initiative, ShuYi Technology (数翼科技), which won the Silver Award at the “科蓝杯” 9th HFUT Student Entrepreneurship Competition in 2014.

I was responsible for the backend architecture and database schema design.

Architecture

The backend is a Django application backed by MySQL. Tag recommendations are generated asynchronously via a multi_thread_recommend worker pool so the web server remains responsive during long-running API calls. Elasticsearch powers full-text tag search, and NLTK handles figure-legend tokenization before text is submitted to the BioPortal Annotator.
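
The asynchronous hand-off described above can be sketched with the standard library's thread pool. This is a minimal illustration, not the project's actual `multi_thread_recommend` code: the names `fetch_recommendations`, `submit_recommendation_job`, and the in-memory `store` are hypothetical, and the remote call is replaced by a local stub.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def fetch_recommendations(legend: str) -> list[str]:
    """Hypothetical stand-in for the slow remote annotator call.

    A real worker would issue an HTTP request here; this stub simply
    returns the lowercase words longer than six characters.
    """
    words = (w.strip(".,").lower() for w in legend.split())
    return [w for w in words if len(w) > 6]


# One shared pool: request handlers submit work and return immediately,
# so the web server is not blocked while the remote call is in flight.
executor = ThreadPoolExecutor(max_workers=4)


def submit_recommendation_job(image_id: int, legend: str, store: dict) -> Future:
    """Queue a recommendation job and return its Future without waiting."""

    def job() -> int:
        # Assumption: results are written to a shared store keyed by image id.
        store[image_id] = fetch_recommendations(legend)
        return image_id

    return executor.submit(job)
```

A view would call `submit_recommendation_job(...)` and respond right away; the page can then poll the store (in the real system, presumably the database) until the results appear.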

Medical images are ingested as TIFF files and converted to JPEG for browser display using Wand (backed by ImageMagick) and libtiff.
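
A conversion step of this kind might look like the sketch below, which uses Wand's `Image` class. The function names, the output directory layout, and the quality setting of 90 are assumptions for illustration; the sketch also assumes single-page TIFF input.

```python
from pathlib import Path


def jpeg_path_for(tiff_path: str, out_dir: str) -> Path:
    """Derive the JPEG output path for a TIFF source file."""
    return Path(out_dir) / (Path(tiff_path).stem + ".jpg")


def convert_tiff_to_jpeg(tiff_path: str, out_dir: str, quality: int = 90) -> Path:
    """Convert one TIFF to a browser-friendly JPEG and return the new path."""
    # Imported lazily so the path helper above works even where
    # ImageMagick and Wand are not installed.
    from wand.image import Image

    target = jpeg_path_for(tiff_path, out_dir)
    with Image(filename=tiff_path) as img:
        img.format = "jpeg"                # switch the output encoder
        img.compression_quality = quality  # assumed quality setting
        img.save(filename=str(target))
    return target
```

Keeping the original TIFF alongside the derived JPEG lets the display copy be regenerated later at a different quality.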

Ontology-Based Tag Taxonomy

Each image is linked to a scientific article (by DOI) and carries a figure legend. The BioPortal Annotator scans that legend text and maps matched spans to concepts in six medical axes:

| Axis | Ontology |
| --- | --- |
| Anatomy | SNOMEDCT |
| Disease & Symptoms | SNOMEDCT |
| Genetics, Proteins & Processes | Gene Ontology (GO) |
| Imaging | NCIT |
| Medical Intervention | SNOMEDCT |
| Pharmaceutical Agent | RXNORM |
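
The axis-to-ontology mapping above can be expressed as a small lookup table that also drives the annotator request. The sketch below only builds the request URL and makes no network call. The endpoint and the `text`, `ontologies`, and `apikey` parameter names follow the public BioPortal REST API as I understand it and should be checked against its documentation; `build_annotator_url` is a hypothetical helper. It does not show how SNOMEDCT matches are divided among the three axes that share that ontology.

```python
from urllib.parse import urlencode

# Axis -> BioPortal ontology acronym, taken from the table above.
AXIS_ONTOLOGY = {
    "Anatomy": "SNOMEDCT",
    "Disease & Symptoms": "SNOMEDCT",
    "Genetics, Proteins & Processes": "GO",
    "Imaging": "NCIT",
    "Medical Intervention": "SNOMEDCT",
    "Pharmaceutical Agent": "RXNORM",
}

# Assumed endpoint of the BioPortal Annotator service.
ANNOTATOR_URL = "https://data.bioontology.org/annotator"


def build_annotator_url(legend: str, api_key: str) -> str:
    """Build a GET URL asking the annotator to scan one figure legend."""
    # Several axes share an ontology, so de-duplicate before joining.
    ontologies = sorted(set(AXIS_ONTOLOGY.values()))
    query = urlencode(
        {"text": legend, "ontologies": ",".join(ontologies), "apikey": api_key}
    )
    return f"{ANNOTATOR_URL}?{query}"
```

Restricting the request to these four ontologies keeps the response limited to concepts that can be placed on one of the six axes.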

Matched concepts are persisted as Recommendations. Annotators review the candidates, accepting or rejecting each tag; accepted tags are stored in ImageTag and exported to CSV/XML.
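
The accept/reject step and the CSV export can be illustrated with plain Python objects. This is a sketch under stated assumptions: the `Recommendation` fields and the CSV column layout are hypothetical, and the real system stores these records as Django models rather than dataclasses.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recommendation:
    image_id: int
    axis: str
    concept: str
    accepted: Optional[bool] = None  # None means not yet reviewed


def export_accepted_csv(recs: list[Recommendation]) -> str:
    """Return CSV text containing only the tags an annotator accepted."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["image_id", "axis", "concept"])
    for rec in recs:
        if rec.accepted:  # skips both rejected and unreviewed candidates
            writer.writerow([rec.image_id, rec.axis, rec.concept])
    return buf.getvalue()
```

Keeping rejected candidates as records, rather than deleting them, preserves a trace of what the recommender proposed.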

Annotation Workflow

  1. An administrator uploads articles (by DOI) and their associated images.
  2. The system assigns images to annotators and assigns a reviewer for each annotator via ImageAssignments.
  3. When an annotator opens an image, its figure legend is sent to the BioPortal Annotator in a background thread. Returned concepts are cached in Recommendations so repeat visits skip the remote call.
  4. The annotator selects or deselects each suggested tag and may add free-form tags.
  5. A reviewer audits completed annotations before they are finalized.
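
The caching behaviour in step 3 amounts to a look-up-before-fetch pattern. The sketch below keeps results in a dictionary purely for illustration; `RecommendationCache` and its `annotate` callback are hypothetical, and in the described system the cache is the Recommendations table.

```python
from typing import Callable


class RecommendationCache:
    """Fetch annotator results once per image and reuse them afterwards."""

    def __init__(self, annotate: Callable[[str], list[str]]):
        self._annotate = annotate  # the slow remote call, injected
        self._by_image: dict[int, list[str]] = {}
        self.remote_calls = 0      # counter kept for demonstration only

    def get(self, image_id: int, legend: str) -> list[str]:
        # Only the first visit to an image triggers the remote call.
        if image_id not in self._by_image:
            self.remote_calls += 1
            self._by_image[image_id] = self._annotate(legend)
        return self._by_image[image_id]
```

Keying the cache on the image rather than on the legend text assumes that a legend does not change after upload; if legends were editable, the cached entry would need to be invalidated on edit.
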

Technical Evolution

The first version of MITagger was built in Java. After collecting nearly one million labeled samples from the live web service, the team added an offline batch-processing mode for bulk annotation. The second version was rewritten in Python/Django to accelerate iteration and to make it easier to integrate the NLP and ontology recommendation pipeline.

Screenshots