We're excited to announce ORCA (Open-source software Research and Community Activity), the Emerging Technology Observatory's new tracker for open-source software (OSS) used in science and technology research. Drawing on data sources including GitHub Archive and ETO's Merged Academic Corpus, ORCA tracks OSS usage, development activity, and community engagement across a wide range of software projects and research subjects.
Why ORCA? One answer is found (as answers often are) in an xkcd comic:
Like so much else, modern science and technology research increasingly depends on a wide range of OSS projects, from general-purpose tools originally created by big corporations to specialized libraries and utilities maintained by nonprofits, academic groups, or even individuals. Many of these projects aren't well known, and they may not have the resources they need to keep supporting the research enterprise.
To keep things humming along, we need to know which OSS projects are used where, and how they're doing. How actively are they being maintained? Do they depend on a small number of contributors, or a broader, more sustainable community? Are they getting used more or less over time? Are bugs and other issues piling up faster than they can be cleared?
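To make a couple of those questions concrete, here is a rough Python sketch of how one might measure contributor concentration and issue backlog growth for a single repo. The function names and sample numbers are made up for illustration; this is not how ORCA itself defines or computes its metrics.

```python
from collections import Counter


def top_contributor_share(commit_authors, top_n=1):
    """Fraction of commits made by the top_n most active authors (hypothetical helper)."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(top_n))
    return top / total


def issue_backlog_change(opened, closed):
    """Net change in open issues over a period; positive means issues are piling up."""
    return opened - closed


# Made-up sample data: one author name per commit.
authors = ["ana", "ana", "ana", "ben", "chen"]
share = top_contributor_share(authors)
backlog = issue_backlog_change(opened=40, closed=25)

print(f"Top contributor share: {share:.0%}")
print(f"Net change in open issues: {backlog:+d}")
```

A high share suggests a project leans heavily on one person, and a persistently positive backlog change suggests issues are arriving faster than they are cleared.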
ORCA is a new, easy-to-use interface for tackling these sorts of questions for software used in research. Pick any research field to explore data on OSS projects that support it. We comb ETO's Merged Academic Corpus and other research data sources for citations to open-source projects, then "roll up" the projects field by field (for more on this process, see ORCA's documentation).
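As a loose illustration of what a field-by-field roll-up can look like, the Python sketch below counts how often each repo is mentioned within each research field. The input format, the `roll_up` function, and the placeholder field and repo names are all assumptions made for this example; ORCA's documentation describes the actual process.

```python
from collections import Counter, defaultdict

# Placeholder data: one (research field, mentioned repo) pair per citation.
mentions = [
    ("field-a", "org-1/tool-x"),
    ("field-a", "org-2/tool-y"),
    ("field-b", "org-3/tool-z"),
    ("field-a", "org-1/tool-x"),
]


def roll_up(mentions):
    """Group mentions by field and count mentions per repo (hypothetical helper)."""
    by_field = defaultdict(Counter)
    for field, repo in mentions:
        by_field[field][repo] += 1
    return by_field


rolled = roll_up(mentions)
for field, counts in rolled.items():
    # most_common() lists repos from most to least mentioned within the field.
    print(field, counts.most_common())
```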
You can use the ORCA interface to:
- Compare OSS projects in a particular research area according to different metrics of project activity, interest, and health.
- Track activity, usage, and community engagement trends over time for specific repos or for all repos in a particular field.
- Sort and filter projects by field, programming language, license, and various activity metrics.
To get a sense of how it works, try exploring questions like:
- Which OSS projects are mentioned the most in AI research?
- How much do different astronomy-related OSS projects depend on their most active contributors?
- How do the OSS projects most relevant to computer vision research compare across different health and activity metrics?
- How many issues were opened vs. closed in 2022 in the repo for the Alibi machine learning explainability library?
ORCA is live today at https://orca.eto.tech. As always, feel free to contact us with any questions, or drop by for live support during our standing office hours. Good luck exploring! 🌊🌊🌊