Sometimes the apple does fall far from the tree: a case study on automatic indexing precision errors in PubMed

DOI: https://doi.org/10.5195/jmla.2025.2110

Keywords: Abstracting and Indexing, MEDLINE, automatic indexing, PubMed, Medical Subject Headings

Abstract

Objective: This case study identifies the presence and prevalence of precision indexing errors in a subset of automatically indexed MEDLINE records in PubMed (specifically, all MEDLINE records automatically indexed with the MeSH term Malus, the genus name for apple trees). In short, how well does automatic indexing compare [figurative] apples to [literal] apples? 

Methods: A total of 1,705 MEDLINE records automatically indexed with the MeSH term Malus underwent title/abstract and full-text screening to determine whether they were correctly indexed (i.e., the record was about Malus, meaning it discussed the literal fruit or tree) or incorrectly indexed (i.e., the record was not about Malus, meaning it did not discuss the literal fruit or tree). The context and type of indexing error were documented for each incorrectly indexed record.

Results: Of the 1,705 records, 135 (7.9%) were incorrectly indexed with the MeSH term Malus. The most common indexing error arose from the word "apple" being used in similes, metaphors, and idioms (80, or 59.2%), followed by "apple" appearing in a name or term (50, or 37%). Additional indexing errors were attributed to the use of "apple" in acronyms and, in one case, to a reference to Sir Isaac Newton.
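The Results percentages follow directly from the reported counts. The short Python sketch below recomputes them from the abstract's figures. It assumes each incorrectly indexed record was assigned exactly one error type, so the number left over for acronyms and the Newton reference is inferred here rather than reported in the abstract.

```python
# Counts taken from the abstract's Methods and Results
total_records = 1705   # records automatically indexed with Malus
incorrect = 135        # records judged not about Malus
idiom_errors = 80      # "apple" in similes, metaphors, and idioms
name_errors = 50       # "apple" in a name or term


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


error_rate = pct(incorrect, total_records)   # share of all screened records
idiom_share = pct(idiom_errors, incorrect)   # share of incorrect records
name_share = pct(name_errors, incorrect)     # share of incorrect records

# Assumption: one error type per record, so the remainder covers the
# acronym cases plus the single Newton reference.
other_errors = incorrect - idiom_errors - name_errors

print(error_rate, idiom_share, name_share, other_errors)
```

Rounded to one decimal place, 80/135 gives 59.3%, slightly above the reported 59.2%, which appears to be truncated rather than rounded; the 7.9% and 37% figures match.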

Conclusion: This study's findings indicate that automatic indexing can commit errors when a record's title or abstract contains words with non-literal or alternative meanings. Librarians should be aware that automatic indexing errors exist and should advise authors on how best to mitigate their effects within their own manuscripts.

Published: 2025-10-23

Section: Original Investigation