Introduction
Making Software FAIR: A machine-assisted workflow for research software
The SoFAIR implementation will establish research software citations and mentions as first-class bibliographic records. This will enable correct attribution, which is key for researchers, and provide a new way for research software to be identified, validated, registered and archived for the long term.
The SoFAIR workflow (Figure 1) shows how stakeholders, tools and infrastructures work together; the step numbers below correspond to the annotations in the figure, and a minimal end-to-end sketch follows after it.

1. An author deposits a piece of research software in a code repository.
2. The author then deposits a manuscript that contains either explicit or implicit mentions of that software.
3. CORE harvests the research paper from the repository and extracts software mentions from the full text using extended state-of-the-art ML tools (GROBID / Softcite (Lopez et al., 2021)).
4. Via the CORE Repository Dashboard, a request to validate the extracted mentions is made available to the repository.
5. With the authorisation of the repository manager, the request is routed to the author (e.g. by means of an email notification), who validates it.
6. Once validated, the repository issues an asset registration request to Software Heritage.
7. Software Heritage permanently archives the new software asset.
8. Software Heritage issues a permanent identifier for the new asset and sends it back to the repository.
Figure 1: The SoFAIR workflow
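To make the data flow concrete for implementers, the sketch below walks through steps 3–8 in Python. It is a minimal illustration, not the production pipeline: the extraction and validation helpers are hypothetical stand-ins for CORE's server-side tooling and the Repository Dashboard, and the Software Heritage "save code now" endpoint shape shown is an assumption to be checked against the archive's API documentation.

```python
import requests

# Software Heritage archive API base; the exact endpoint shape used below is an
# assumption drawn from the public "save code now" interface and may differ.
SWH_API = "https://archive.softwareheritage.org/api/1"


def extract_software_mentions(fulltext: str) -> list[dict]:
    """Stand-in for step 3: GROBID / Softcite extraction of software mentions.

    In the real workflow this runs server-side at CORE; a mocked mention is
    returned here so the flow can be exercised end to end.
    """
    return [{
        "software_name": "example-tool",  # hypothetical mention
        "repository_url": "https://github.com/example/example-tool",
    }]


def request_author_validation(mention: dict) -> bool:
    """Stand-in for steps 4-5: validation routed via the CORE Repository
    Dashboard to the author (e.g. an email notification). Assumed approved."""
    return True


def register_with_software_heritage(origin_url: str) -> str:
    """Steps 6-8: ask Software Heritage to archive the code origin.

    The permanent identifier (SWHID) becomes available once the save request
    has been processed; a production client would poll the request status
    rather than returning immediately.
    """
    resp = requests.post(f"{SWH_API}/origin/save/git/url/{origin_url}/", timeout=30)
    resp.raise_for_status()
    return resp.json().get("save_request_status", "pending")


if __name__ == "__main__":
    fulltext = "...full text of the harvested paper..."  # step 3 input
    for mention in extract_software_mentions(fulltext):
        if request_author_validation(mention):  # steps 4-5
            status = register_with_software_heritage(mention["repository_url"])
            print(mention["software_name"], "->", status)  # steps 6-8
```

Note that the repository side only needs to handle the validation routing and the registration request; harvesting and mention extraction happen on the CORE side.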
Following this, repositories will be able to expose information linking software assets with the research outputs that mention them, via their OAI-PMH feeds, to aggregators in an interoperable fashion. The efficacy of the solution will be validated within two disciplinary use cases: 1) a life sciences use case conducted in cooperation with Europe PMC and 2) a digital humanities use case conducted in cooperation with DARIAH. An additional multidisciplinary use case will be conducted in cooperation with the HAL repository.
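As an illustration of what an aggregator-side consumer of such a feed could look like, the sketch below harvests one page of ListRecords from an OAI-PMH endpoint and collects software links per record. The exposure format is an assumption: here software links are modelled as dc:relation values carrying SWHIDs ("swh:"-prefixed identifiers), whereas the actual SoFAIR interoperability profile may prescribe a different element or metadata format.

```python
import requests
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def harvest_software_links(base_url: str) -> dict[str, list[str]]:
    """Fetch one ListRecords page and collect SWHID-like relations per record.

    Assumption: software links appear as dc:relation values starting with the
    'swh:' prefix; adjust to the agreed interoperability profile as needed.
    """
    resp = requests.get(
        base_url,
        params={"verb": "ListRecords", "metadataPrefix": "oai_dc"},
        timeout=30,
    )
    resp.raise_for_status()
    root = ET.fromstring(resp.content)

    links: dict[str, list[str]] = {}
    for record in root.findall(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        swhids = [
            rel.text
            for rel in record.findall(".//dc:relation", NS)
            if rel.text and rel.text.startswith("swh:")
        ]
        if swhids:
            links[identifier] = swhids
    return links


if __name__ == "__main__":
    # Hypothetical repository endpoint; replace with a real OAI-PMH base URL.
    print(harvest_software_links("https://repository.example.org/oai"))
```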
The audience and actors
This documentation is intended for developers implementing the SoFAIR workflow.
The actors that are part of the workflow scenario are:
- The author / research team
- The Open Access repository curators / archivists / moderators
- The annotators (when annotation isn’t automatic)