ETO's Cross-Border Tech Research Metrics dataset includes metrics for cross-border research in emerging technology domains, such as AI, robotics, and cybersecurity.
The dataset focuses on countries, not organizations or individuals. No data is included on individual people or organizations within those countries.
The data do not give a complete picture of tech-related activity. There are many ways to assess these activities. This dataset includes only one type of metric: research publications.
The data have a lag, making counts incomplete for recent years.
Our process for assigning publications to countries has some errors and gaps. We associate publications with countries using metadata from the sources that feed our Merged Academic Corpus, and this metadata is sometimes wrong or incomplete. We use various methods to fix these problems, but some errors remain. As a result, some publications are not linked to any country, and others may be linked to the wrong countries.
The metrics are based primarily on English-language sources that miss many Chinese-language publications. These metrics are ultimately derived from ETO's Merged Academic Corpus, which omits many Chinese-language publications. Because of this, metrics related to Chinese articles should be interpreted with caution.
The metrics omit small per-year counts. If a country pair has fewer than 25 joint publications in a year, we omit that year of data for that country pair.
What are the terms of use?
This dataset is subject to ETO's general terms of use. If you use it, please cite us.
The Cross-Border Tech Research Metrics dataset consists of topic-specific CSV tables, each organized as follows:
| Name | Type | Description |
| --- | --- | --- |
| country1 | text | A country's name. |
| country2 | text | Another country's name. |
| field | text | A research field. |
| year | number | The year of publication. |
| num_articles | number | The number of articles related to the specified field that were jointly published by researchers associated with country1 and country2 in the specified year. |
| complete | boolean | Indicates whether the row is from a year where we consider our data materially complete. If complete is false, we consider the data from that year materially incomplete and you should use it with caution. |
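As a rough illustration of working with this layout, the sketch below parses rows in the documented column order and keeps only those marked complete. The sample rows, the field label, and the "True"/"False" spelling of the boolean column are assumptions made for the example, not values taken from the dataset.

```python
import csv
import io

# Hypothetical sample using the documented columns. The numbers and the field
# label are invented, and the True/False spelling of booleans is an assumption.
SAMPLE_CSV = """country1,country2,field,year,num_articles,complete
China,United States,Artificial intelligence,2020,1200,True
China,United States,Artificial intelligence,2023,400,False
"""


def parse_bool(text):
    """Interpret a few common spellings of a true value."""
    return text.strip().lower() in {"true", "t", "1", "yes"}


def load_rows(handle):
    """Read rows from a file-like object and convert the typed columns."""
    rows = []
    for raw in csv.DictReader(handle):
        rows.append({
            "country1": raw["country1"],
            "country2": raw["country2"],
            "field": raw["field"],
            "year": int(raw["year"]),
            "num_articles": int(raw["num_articles"]),
            "complete": parse_bool(raw["complete"]),
        })
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))

# Keep only years that are flagged as materially complete.
complete_rows = [row for row in rows if row["complete"]]
```

To read a downloaded table instead, pass an open file handle to `load_rows` in place of the in-memory sample.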
Topics covered
The dataset currently covers the following fields of research. We plan to add more over time.
How we link publications to particular fields (note that the Cross-Border Tech Research Metrics dataset includes emerging tech fields unrelated to AI; these fields are linked using the same approach)
Nuances of counting and deduplicating publications
This dataset covers research publications whose authors are affiliated with institutions in multiple countries. Each such publication "counts" as one joint paper for each pair of countries represented among its authoring institutions. So, for example:
A publication with authors from New York University only would be omitted from the metrics in this dataset.
A publication with authors from New York University and Oxford University would be counted as one publication for the United States-United Kingdom country pair.
A publication with authors from New York University, Oxford University, and Peking University would be counted as one publication for the United States-United Kingdom country pair, one publication for the United States-China country pair, and one publication for the China-United Kingdom country pair.
A publication with authors from New York University, Harvard University, and Oxford University would be counted as one publication for the United States-United Kingdom country pair. (There's no "double counting" when multiple universities from the same country are involved.)
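The counting rule in the examples above can be sketched in a few lines. This is a minimal illustration, not ETO's pipeline: it assumes each publication has already been reduced to a list of affiliated country names, and the alphabetical ordering of each pair is a convention chosen here for the example.

```python
from collections import Counter
from itertools import combinations


def count_country_pairs(publications):
    """Count joint publications per unordered country pair.

    Each publication is a list of country names, one per authoring
    institution. Duplicate countries are collapsed first, so several
    institutions from one country do not cause double counting.
    """
    counts = Counter()
    for countries in publications:
        distinct = sorted(set(countries))
        # A single-country publication yields no pairs and is skipped.
        for pair in combinations(distinct, 2):
            counts[pair] += 1
    return counts


# The four examples from the text, expressed as country lists.
publications = [
    ["United States"],                                     # NYU only
    ["United States", "United Kingdom"],                   # NYU, Oxford
    ["United States", "United Kingdom", "China"],          # NYU, Oxford, Peking
    ["United States", "United States", "United Kingdom"],  # NYU, Harvard, Oxford
]

pair_counts = count_country_pairs(publications)
```

Across these four publications, the United States-United Kingdom pair is counted three times, and the United States-China and China-United Kingdom pairs once each.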
Omitted data
If a country pair has fewer than 25 joint publications in a year, we omit that year of data for that country pair. We consider numbers this small potentially unreliable due to unavoidable "background noise" in our underlying data sources (e.g., errors in linking authors to institutions or institutions to countries).
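The omission rule amounts to a simple threshold filter. The sketch below assumes counts are keyed by country pair and year, and the sample figures are invented. The text does not say whether the threshold is applied per field, so the sketch leaves fields out.

```python
# Threshold stated in the text: years with fewer than 25 joint publications
# for a country pair are omitted.
MIN_JOINT_PUBLICATIONS = 25


def drop_small_counts(yearly_counts, minimum=MIN_JOINT_PUBLICATIONS):
    """Keep only entries whose count is at least `minimum`."""
    return {key: n for key, n in yearly_counts.items() if n >= minimum}


# Hypothetical counts keyed by (country pair, year); values are invented.
yearly_counts = {
    (("China", "United States"), 2020): 1200,
    (("Chile", "Norway"), 2020): 24,  # below the threshold, so omitted
    (("Chile", "Norway"), 2021): 25,  # exactly 25 is not "fewer than 25"
}

kept = drop_small_counts(yearly_counts)
```

One consequence is that a missing year for a country pair does not mean zero joint publications. It may only mean the count fell below the threshold.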
Credits
Engineering: Jennifer Melot
Documentation: Zach Arnold
Emerging technology topic classifications are based on work supported in part by the Alfred P. Sloan Foundation under Grant No. G-2023-22358.