Foreword
The SoFAIR project, whose name derives from Making Software FAIR, addresses a critical issue in research: interconnecting publications with the software they use, develop or share. Doing so improves the findability and accessibility of research software, two of the FAIR principles that were originally designed for data. SoFAIR is an international collaboration that leverages the existing capabilities of open scholarly infrastructures operated by the project partners. It is coordinated by The Open University, in conjunction with Inria, France; Brno University of Technology, Czech Republic; the Polish Academy of Sciences (PAN), Poland; and the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI), United Kingdom.
The SoFAIR project will extend the capabilities of critical and widely used open scholarly infrastructures (CORE, Software Heritage, HAL) and tools (GROBID) operated by the consortium partners, delivering and deploying an effective solution for the management of the research software lifecycle. The project is divided into four main thematic areas:
- Enhancing current tools for the identification of research software mentions within full-text scholarly documents.
- Validating these software mentions with the authors of the papers themselves, by working with the global network of Open Access repositories indexed by CORE.
- Registering and archiving these assets in Software Heritage, including the minting of unique persistent identifiers (SWHIDs).
- Linking software assets with SWHIDs directly to the manuscripts that produced them via enhanced article metadata.
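To make the notion of a SWHID concrete, the following is a minimal Python sketch, not part of the SoFAIR or Software Heritage tooling. It assumes the published SWHID scheme, in which the identifier of a single file's content is the prefix swh:1:cnt: followed by the SHA-1 that Git would compute for the same bytes as a blob. The function names content_swhid and is_core_swhid are hypothetical and chosen for illustration only. In practice identifiers are obtained from the Software Heritage archive rather than computed by hand.

```python
import hashlib
import re

# Core SWHID shape: scheme, version 1, object type, 40 hex digits.
# Object types: content, directory, revision, release, snapshot.
SWHID_PATTERN = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}")


def content_swhid(data: bytes) -> str:
    """Return the SWHID for a file's raw bytes (hypothetical helper).

    The digest is the Git blob hash: SHA-1 over the header
    b"blob <length>\\0" followed by the content itself.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


def is_core_swhid(value: str) -> bool:
    """Check that a string has the shape of a core SWHID (no qualifiers)."""
    return SWHID_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    swhid = content_swhid(b"print('hello')\n")
    print(swhid, is_core_swhid(swhid))
```

Because the identifier is derived from the content itself, anyone holding the same bytes can recompute it and check that a link in article metadata points at exactly the software that was archived.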
Scalable workflow for the software assets lifecycle for open repositories
The origins of the Open Access movement are often traced to the Budapest Open Access Initiative (BOAI), launched in 2002. The BOAI manifesto defined two routes to OA: the gold route and the green route. The green route is defined by the deposit of research preprints and postprints into repositories, and it laid the foundations for a network of open repositories. This movement was phenomenally successful, resulting in the global network of thousands of repositories in operation today across the vast majority of countries.
More recently, open science has extended these principles to research data, which is essential both as evidence underlying research results and as the basis for reproducibility. In response, the last few years have seen significant growth in research data repositories, both subject-based and institutional. To a significant extent, the archival problem for research manuscripts has already been addressed, and important steps have been taken towards a similar solution for research data. Software, however, has often been left aside in open science debates and is less well covered by scholarly infrastructures.
While the key drivers for Open Access are primarily about ethics and fairness (research results created using public funds should not be privatised), the key drivers for openness of research data and software are mostly about adding a level of transparency that should increase trust in research.
To increase equity, fairness and trust in research, all research outputs (manuscripts, data and software) should be made openly available and preserved for future generations through open scholarly infrastructures that operate in accordance with a set of community-established guiding principles, the Principles of Open Scholarly Infrastructure (POSI) (Neylon et al. 2020).