Medical Image Tagger
Semi-automatic biomedical image annotation tool using NLP and medical ontologies.
Gitee Recommended Open Source 科蓝杯 Silver Award
Introduction
Medical Image Tagger (MITagger) is a collaborative web-based annotation platform developed with Harvard Medical School to accelerate semi-automatic tagging of biomedical figures. By combining NLP-extracted figure legends with the BioPortal Annotator REST API, the system recommends structured tags from major medical ontologies, reducing the manual effort of medical annotators by 20%.
The platform later evolved into a broader medical big data initiative, ShuYi Technology (数翼科技), which won the Silver Award at the “科蓝杯” 9th HFUT Student Entrepreneurship Competition in 2014.
I was responsible for the backend architecture and database schema design.
Architecture
The backend is a Django application backed by MySQL. Tag recommendations are generated asynchronously via a multi_thread_recommend worker pool so the web server remains responsive during long-running API calls. Elasticsearch powers full-text tag search, and NLTK handles figure-legend tokenization before text is submitted to the BioPortal Annotator.
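The asynchronous hand-off described above can be sketched with a standard-library thread pool. This is a minimal illustration, not the project's `multi_thread_recommend` code: the pool size, `submit_recommendation`, and the stand-in `fake_annotate` function are all hypothetical names chosen for the example.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List

# Hypothetical pool size; the real worker count is not stated in this document.
_POOL = ThreadPoolExecutor(max_workers=4)


def submit_recommendation(legend: str,
                          annotate: Callable[[str], List[str]]) -> Future:
    """Queue the slow annotator call and return immediately with a Future."""
    return _POOL.submit(annotate, legend)


def fake_annotate(legend: str) -> List[str]:
    # Stand-in for the remote API call: it only lowercases and splits the legend.
    return legend.lower().split()


# The request handler can return right away and collect the result later.
future = submit_recommendation("Chest CT", fake_annotate)
print(future.result(timeout=5))
```

In a web view, the handler would typically return as soon as the job is submitted and let the page poll for the finished recommendations.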
Medical images are ingested as TIFF files and converted to JPEG for browser display using Wand (backed by ImageMagick) and libtiff.
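A conversion step of this kind might look like the sketch below. It assumes Wand and ImageMagick are installed and that each TIFF holds a single page. The function names and the default quality of 85 are illustrative choices, not values taken from the project.

```python
from pathlib import Path


def jpeg_path_for(tiff_path: str) -> Path:
    """Return the sibling .jpg path for a .tif/.tiff source file."""
    return Path(tiff_path).with_suffix(".jpg")


def convert_tiff_to_jpeg(tiff_path: str, quality: int = 85) -> Path:
    """Convert one single-page TIFF to JPEG with Wand and return the output path."""
    # Imported lazily so the path helper works without ImageMagick installed.
    from wand.image import Image

    out_path = jpeg_path_for(tiff_path)
    with Image(filename=tiff_path) as img:
        img.format = "jpeg"
        img.compression_quality = quality  # hypothetical default, tune as needed
        img.save(filename=str(out_path))
    return out_path
```

Multi-page TIFFs would need extra handling, for example selecting one frame before saving.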
Ontology-Based Tag Taxonomy
Each image is linked to a scientific article (by DOI) and carries a figure legend. The BioPortal Annotator scans that legend text and maps matched spans to concepts in six medical axes:
| Axis | Ontology |
|---|---|
| Anatomy | SNOMEDCT |
| Disease & Symptoms | SNOMEDCT |
| Genetics, Proteins & Processes | Gene Ontology (GO) |
| Imaging | NCIT |
| Medical Intervention | SNOMEDCT |
| Pharmaceutical Agent | RXNORM |
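The table can be expressed as a lookup that also drives the Annotator request. The sketch below only builds the request URL and does not contact the service. The endpoint and the `text`, `ontologies`, and `apikey` parameter names follow BioPortal's public REST documentation as best I recall and should be checked against the current docs. `AXIS_ONTOLOGY` and `build_annotator_url` are hypothetical names.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Axis-to-ontology mapping copied from the table above.
AXIS_ONTOLOGY = {
    "Anatomy": "SNOMEDCT",
    "Disease & Symptoms": "SNOMEDCT",
    "Genetics, Proteins & Processes": "GO",
    "Imaging": "NCIT",
    "Medical Intervention": "SNOMEDCT",
    "Pharmaceutical Agent": "RXNORM",
}

# Assumed endpoint; verify against the BioPortal REST API documentation.
ANNOTATOR_URL = "https://data.bioontology.org/annotator"


def build_annotator_url(legend: str, api_key: str) -> str:
    """Build a GET URL that restricts annotation to the ontologies in use."""
    ontologies = sorted(set(AXIS_ONTOLOGY.values()))
    query = urlencode({
        "text": legend,
        "ontologies": ",".join(ontologies),
        "apikey": api_key,
    })
    return f"{ANNOTATOR_URL}?{query}"


url = build_annotator_url("CT scan of the left lung", api_key="YOUR_KEY")
print(parse_qs(urlsplit(url).query)["ontologies"])
```

Three axes share SNOMEDCT, so the ontology alone cannot tell them apart. How the system assigns a matched SNOMEDCT concept to its axis is not described here.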
Matched concepts are persisted as Recommendations. Annotators review the candidates, accepting or rejecting each tag; accepted tags are stored in ImageTag and exported to CSV/XML.
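The review-then-export step can be illustrated without Django. In this sketch the `Recommendation` dataclass is a simplified, hypothetical stand-in for the real models, and the CSV column names are assumptions.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Recommendation:
    image_id: int
    axis: str
    concept: str
    accepted: Optional[bool] = None  # None means not reviewed yet


def accepted_tags(recs: Iterable[Recommendation]) -> List[Recommendation]:
    """Keep only the candidates an annotator explicitly accepted."""
    return [r for r in recs if r.accepted is True]


def export_csv(recs: Iterable[Recommendation]) -> str:
    """Serialize accepted tags as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["image_id", "axis", "concept"])
    for rec in accepted_tags(recs):
        writer.writerow([rec.image_id, rec.axis, rec.concept])
    return buffer.getvalue()


candidates = [
    Recommendation(1, "Anatomy", "Lung", accepted=True),
    Recommendation(1, "Imaging", "Radiography", accepted=False),
    Recommendation(1, "Disease & Symptoms", "Pneumonia"),  # still pending
]
print(export_csv(candidates))
```

Rejected and unreviewed candidates are left out of the export, matching the rule that only accepted tags are stored and exported.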
Annotation Workflow
- An administrator uploads articles (by DOI) and their associated images.
- The system assigns images to annotators and assigns a reviewer for each annotator via ImageAssignments.
- On opening an image, the figure legend is sent to the BioPortal Annotator in a background thread. Returned concepts are cached in Recommendations so repeat visits skip the remote call.
- The annotator selects or deselects each suggested tag and may add free-form tags.
- A reviewer audits completed annotations before they are finalized.
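The caching behavior in the workflow above, where repeat visits skip the remote call, can be sketched as a get-or-fetch lookup. Here an in-memory dict stands in for the Recommendations table. `RecommendationCache` and its `remote_calls` counter are hypothetical and exist only to make the behavior visible.

```python
from typing import Callable, Dict, List


class RecommendationCache:
    """Fetch recommendations once per image and reuse them afterwards."""

    def __init__(self, fetch: Callable[[str], List[str]]) -> None:
        self._fetch = fetch
        self._store: Dict[int, List[str]] = {}
        self.remote_calls = 0  # counts how often the remote fetch actually ran

    def get(self, image_id: int, legend: str) -> List[str]:
        if image_id not in self._store:
            self.remote_calls += 1
            self._store[image_id] = self._fetch(legend)
        return self._store[image_id]


cache = RecommendationCache(fetch=lambda legend: legend.lower().split())
first = cache.get(7, "Brain MRI")
second = cache.get(7, "Brain MRI")  # served from the cache, no second fetch
print(first, cache.remote_calls)
```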
Technical Evolution
The first version of MITagger was built in Java. After collecting nearly one million labeled samples from the live web service, the team added an offline batch-processing mode for bulk annotation. The second version was rewritten in Python/Django to accelerate iteration and to make it easier to integrate the NLP and ontology recommendation pipeline.
