We will develop a prototype tool to (1) link research software mentions detected in research papers to software repositories through supervised machine learning; (2) integrate the results with UMETRICS data; and (3) produce initial methodological and substantive analyses. The objective is to design, develop, and implement a data linkage framework that detects and disambiguates named entities in text data and links them to entities in other databases containing research, software, and funding information. The work we propose here will, if successful, set the stage for more complete integration of software metadata into large-scale data production at bibliographic data services. Integrating linked publication-repository metadata with the UMETRICS dataset created and maintained at IRIS will situate research software production and use within funded teams and collaboration networks. This project thus represents an important first step in the development of rich, accessible data to support a wide range of research about the development, use, and effects of a key intermediate research product, software.
Funding:
Sloan, Alfred P., Foundation
Funding Period:
12/01/2022 to 11/30/2024