Automated tools for systematic review screening methods: an application of machine learning for sexual orientation and gender identity measurement in health research

Authors

  • Ashleigh J. Rich Duke University
  • Emma L. McGorray Northwestern University
  • Carrie Baldwin-SoRelle University of North Carolina Chapel Hill
  • Michelle Cawley University of North Carolina Chapel Hill
  • Karen Grigg
  • Lauren B. Beach Northwestern University
  • Gregory Phillips II Northwestern University
  • Tonia Poteat Duke University

DOI:

https://doi.org/10.5195/jmla.2025.1860

Keywords:

Sexual and Gender Minorities, Health, Methods, Systematic Review, Automation

Abstract

Objective: Sexual and gender minority (SGM) populations experience health disparities compared to heterosexual and cisgender populations. The development of accurate, comprehensive sexual orientation and gender identity (SOGI) measures is fundamental to quantifying and addressing SGM disparities, which first requires identifying SOGI-related research. As part of a larger project reviewing and synthesizing how SOGI has been assessed within the health literature, we provide an example of the application of automated tools for systematic reviews to the area of SOGI measurement.

Methods: In collaboration with research librarians, a three-phase approach was used to prioritize screening for a set of 11,441 SOGI measurement studies published since 2012. In Phase 1, search results were stratified into two groups (title with vs. without measurement-related terms); titles with measurement-related terms were manually screened. In Phase 2, supervised clustering using DoCTER software was used to sort the remaining studies based on relevance. In Phase 3, supervised machine learning using DoCTER was used to further identify which studies deemed low relevance in Phase 2 should be prioritized for manual screening.
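The Phase 1 stratification described above can be illustrated with a minimal sketch. It assumes each search result is a dictionary with a `title` field and uses a hypothetical list of measurement-related word stems; the abstract does not report the actual terms, and this is not the authors' implementation or part of DoCTER.

```python
import re

# Hypothetical word stems standing in for "measurement-related terms";
# the actual term list used in the study is not given in the abstract.
MEASUREMENT_STEMS = ["measur", "assess", "questionnaire", "validat"]

# Match any stem at the start of a word, ignoring case.
_STEM_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(stem) for stem in MEASUREMENT_STEMS) + r")",
    re.IGNORECASE,
)


def stratify_by_title(records):
    """Split records into (titles with terms, titles without terms)."""
    with_terms, without_terms = [], []
    for record in records:
        if _STEM_PATTERN.search(record["title"]):
            with_terms.append(record)      # would go to manual screening
        else:
            without_terms.append(record)   # would go on to automated sorting
    return with_terms, without_terms


if __name__ == "__main__":
    # Invented example titles, for illustration only.
    sample = [
        {"title": "Measuring gender identity in population surveys"},
        {"title": "Tobacco use among sexual minority adults"},
    ]
    flagged, unflagged = stratify_by_title(sample)
    print(len(flagged), len(unflagged))
```

In this sketch, the first group corresponds to the titles screened manually in Phase 1 and the second to the studies passed on to Phases 2 and 3.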

Results: Phase 1 identified 1,607 studies. Across Phases 2 and 3, the research team used DoCTER to exclude 5,056 of the remaining 9,834 studies. Among the results screened manually, the percentage of relevant studies was low, ranging from 0.1 to 7.8 percent.
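The counts reported above can be checked with simple arithmetic. The sketch below uses only the figures stated in the abstract; the variable names are illustrative, and the derived count of studies not excluded is a calculation from those figures rather than a number reported by the authors.

```python
# Figures taken directly from the abstract.
total_studies = 11_441       # search results published since 2012
phase1_studies = 1_607       # studies identified in Phase 1
excluded_by_docter = 5_056   # studies excluded across Phases 2 and 3

# Studies left for Phases 2 and 3 after Phase 1.
remaining_after_phase1 = total_studies - phase1_studies

# Derived values (not reported in the abstract).
not_excluded = remaining_after_phase1 - excluded_by_docter
share_excluded = excluded_by_docter / remaining_after_phase1

print(f"Remaining after Phase 1: {remaining_after_phase1}")
print(f"Not excluded in Phases 2 and 3: {not_excluded}")
print(f"Share excluded with DoCTER: {share_excluded:.1%}")
```

The first printed value matches the 9,834 remaining studies stated in the Results, which confirms that the reported counts are internally consistent.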

Conclusions: Automated tools used in collaboration with research librarians have the potential to save hundreds of hours of human labor in large-scale systematic reviews of SGM health research.

Published

2025-01-14

Section

Original Investigation