Why do biomedical researchers learn to program? An exploratory investigation





Keywords: Reproducibility, Programming Workshops, Biomedical Research


Objective: As computer programming becomes increasingly important in the biomedical sciences and more libraries offer programming classes, it is crucial for librarians to understand how researchers use programming in their work. The goal of this study was to understand why biomedical researchers who enrolled in a library-sponsored workshop wanted to learn to program in R and Python.

Methods: Semi-structured, in-depth interviews were conducted with fourteen researchers registered for beginning R and Python programming workshops at the University of California, San Francisco Library. A thematic analysis approach was used to identify the main reasons that researchers wanted to learn to program.

Results: Four major themes emerged from the interviews. Researchers wanted to learn R and Python programming in order to perform their data analysis independently, to be informed collaborators, to engage with new forms of big data research, and to have more flexibility in the tools that they used for their research.

Conclusions: Librarians designing programming workshops should remember that most researchers hope to apply their new skills to a specific research task, such as data cleaning, data analysis, or statistics, and that language preferences can vary by research community as well as by individual. Understanding the programming goals of researchers will make it easier for librarians to partner effectively and offer services that are critically needed in the biomedical community.








Original Investigation