Automated indexing using NLM's Medical Text Indexer (MTI) compared to human indexing in Medline: a pilot study




Automated indexing, human indexers, information retrieval, Medical Text Indexer (MTI), Medline, PubMed


Objective: In 2002, the National Library of Medicine (NLM) introduced semi-automated indexing of Medline using the Medical Text Indexer (MTI). In 2021, NLM announced that it would fully automate its Medline indexing with an improved MTI by mid-2022. This pilot study examines a sample of Medline records from 2000 and compares the outputs of an early, public version of MTI with the indexing created by human indexers.

Methods: We examined twenty Medline records from 2000, before MTI was introduced as a MeSH term recommender. We identified twenty higher- and lower-impact biomedical journals based on Journal Impact Factor (JIF) and examined the indexing of their papers by feeding the corresponding PubMed records into the Interactive MTI tool.

Results: In the sample, we found key differences between automated and human-indexed Medline records: MTI assigned more terms and used them more accurately for citations in the higher-JIF group, and MTI tended to rank the Male check tag more highly than the Female check tag and to omit Aged check tags. MTI sometimes chose more specific terms than human indexers did, but it applied specificity principles inconsistently.

Conclusion: NLM’s transition to fully automated indexing of the biomedical literature could introduce or perpetuate inconsistencies and biases in Medline. Librarians and searchers should assess changes to index terms and their impact on PubMed’s mapping features for a range of topics. Future research should evaluate how automated indexing affects finding clinical information effectively and performing systematic searches.
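The kind of record-level comparison described in the Methods can be sketched as a set comparison between the terms a human indexer assigned and the terms MTI suggested. The following is a minimal illustration only, not the authors' procedure: the function name `compare_indexing`, the choice of precision and recall as measures, and the example term lists are all assumptions introduced here, and the terms are hypothetical rather than drawn from the study data.

```python
def compare_indexing(human_terms, mti_terms):
    """Compare two lists of index terms for a single record.

    Human indexing is treated as the reference set (an assumption made
    for this sketch). Terms are matched case-insensitively.
    """
    human = {t.strip().lower() for t in human_terms}
    mti = {t.strip().lower() for t in mti_terms}
    shared = human & mti

    # Precision: share of MTI terms that the human indexer also assigned.
    precision = len(shared) / len(mti) if mti else 0.0
    # Recall: share of human-assigned terms that MTI also suggested.
    recall = len(shared) / len(human) if human else 0.0

    return {
        "shared": sorted(shared),
        "human_only": sorted(human - mti),
        "mti_only": sorted(mti - human),
        "precision": precision,
        "recall": recall,
    }


# Hypothetical term lists for illustration only; not taken from the study.
human = ["Humans", "Female", "Aged", "Hypertension"]
mti = ["Humans", "Male", "Hypertension", "Blood Pressure"]

result = compare_indexing(human, mti)
print(result["human_only"])  # terms only the human indexer assigned
print(result["mti_only"])    # terms only MTI suggested
```

Listing the `human_only` and `mti_only` terms per record makes it easy to spot patterns such as a check tag that one indexing method assigns and the other omits.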

Author Biographies

Eileen Chen, University of British Columbia

Student, School of Information

Julia Bullard, University of British Columbia

Assistant Professor, School of Information

Dean Giustini, University of British Columbia

Librarian, UBC Biomedical Branch







Original Investigation