The Country Activity Tracker (CAT) is a dashboard of AI activity at the national level. It includes metrics on research, patents, and private-market investment, providing insight into AI competition and cooperation around the world. Users can compare AI activity statistics for any country or group of countries worldwide, explore data on specific AI subfields and applications, and track AI-related collaborations and exchanges between nations.
You can use CAT to:
CAT uses ETO's Merged Academic Corpus for research data, Digital Science (Dimensions) and 1790 Analytics data for patents, and Crunchbase for company and investment data.
No, other than the names of certain authors and patent assignees (all taken from public documents).
If you use data from CAT in your own work, please cite the "Emerging Technology Observatory's Country Activity Tracker: Artificial Intelligence" and include the link to the tool.
These instructions focus on the desktop version of the tool. Some features may be missing or act differently on mobile devices.
CAT includes three basic views, one for each "dataset" or metric group - research, patents, and private-sector activity (companies and investments). Each view includes a different set of customizable tables and visuals.
Start with the selection bar at the top of the tool:
Use the "Dataset" dropdown selector to choose a group of metrics to display - the tool will update automatically to match your selection. Then, use the other selectors to specify:
You can restore the defaults at any time with the "Clear" button.
Look for dropdown menus, sorting buttons, and similar elements in the CAT tables to customize your analysis. The data in each table will update in real time as you make your selections.
Hover over the "?" icons to learn more about different sections and data points.
As you work with CAT, your browser's address bar will update to reflect the applied filters and selections. Copy the URL to return to the same view later.
Measuring a country's AI activity using CAT's three metric groups: research, patents, and private-sector activity (companies and investments). Each group includes detailed metrics and data on trends over time.
Comparing countries across different metrics of AI activity. Users can build customized lists of countries, regions, and political groupings to compare across many of CAT's metrics.
Tracking trends in transnational AI activity, such as cross-border investment and co-publication.
Identifying leading AI institutions and companies within a country or group of countries using CAT's "top ten" features.
Since CAT's launch in late 2022:
We’ll add new public examples here as we learn about them.
CAT uses different datasets for its research metrics, patent metrics, and investment metrics.
Research data in CAT comes from ETO's Merged Academic Corpus (MAC), which contains detailed information on over 270 million scholarly articles from around the world. Every article in the MAC is tagged as AI-related or not using an automated, classifier-based process; CAT uses the AI-related articles only. For more details, see the MAC documentation.
CAT attributes articles to countries based on the author organizations listed in each article, as recorded in MAC metadata. (Here, and generally in ETO resources, we use "country" informally, as a shorthand term for sovereign countries, independent states, and certain other geographic entities.) In CAT, an article "counts for" a given country if it lists at least one author affiliated with an organization in that country. The MAC relies on the article to determine the author's organization; for instance, an article listing "Jane Smith, University of Texas" as its author would be attributed to the United States even if Professor Smith later moved to the University of Tokyo. By the same token, authors are associated with the country of their listed institution even if they're not "from" that country: once she moved to the University of Tokyo, Professor Smith's articles would count for Japan, even if she was born and raised in Chicago.
If an article lists authors from organizations in more than one country, the article will "count toward" multiple countries in CAT. However, if a single article has multiple authors from the same country, it will only be counted once for that country.
So, for example:
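The counting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`id`, `author_countries`) and the sample articles are hypothetical and are not taken from the MAC.

```python
from collections import Counter

# Hypothetical records: each article lists the country of every author's organization.
articles = [
    {"id": "a1", "author_countries": ["United States", "United States", "Japan"]},
    {"id": "a2", "author_countries": ["Japan"]},
    {"id": "a3", "author_countries": ["United States", "Canada"]},
]

def count_articles_by_country(articles):
    """Count each article once for every distinct country among its authors."""
    counts = Counter()
    for article in articles:
        # A set drops repeats, so several authors from one country count once.
        for country in set(article["author_countries"]):
            counts[country] += 1
    return counts

counts = count_articles_by_country(articles)
print(dict(counts))
```

Because a multi-country article counts toward each of its countries, the per-country totals can add up to more than the number of articles (here, five counts from three articles).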
The top ten authors for each country are the ten authors with the most citations to articles they released while affiliated with institutions in that country. (We exclude authors with fewer than five articles published since 2010.) Authors may appear in the top ten for multiple countries if their output in each country qualifies them for each country's list. For example, if Professor Smith published highly cited articles in Texas and in Tokyo, she might make the top ten list for the United States (based on her Texas articles) and for Japan (based on her Tokyo articles).
Note that the "Affiliation" column for each author in the "Top Ten Authors" table is populated using the institution where the author received the most citations. For example, if Professor Smith worked at Georgia Tech before moving to Texas, but the articles she published in Georgia have fewer total citations than the articles she published in Texas, she would be listed in the table as affiliated with the University of Texas, not Georgia Tech.
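As a rough sketch of the ranking and affiliation rules above, the following Python function ranks authors by citations earned in one country and labels each with the institution where they earned the most citations. The record fields, the sample data, and the choice to apply the five-article threshold across all of an author's articles (rather than per country) are assumptions made for illustration, not details confirmed by the CAT documentation.

```python
from collections import defaultdict

def top_authors(records, country, n=10, min_articles=5, since=2010):
    """Return (author, citations, affiliation) tuples for one country."""
    # Assumption: the article threshold is checked across all countries.
    recent = defaultdict(int)
    for r in records:
        if r["year"] >= since:
            recent[r["author"]] += 1

    total = defaultdict(int)                         # author -> citations in `country`
    by_inst = defaultdict(lambda: defaultdict(int))  # author -> institution -> citations
    for r in records:
        if r["country"] != country or recent[r["author"]] < min_articles:
            continue
        total[r["author"]] += r["citations"]
        by_inst[r["author"]][r["institution"]] += r["citations"]

    ranked = sorted(total, key=lambda a: total[a], reverse=True)[:n]
    # Affiliation shown is the institution with the most citations.
    return [(a, total[a], max(by_inst[a], key=by_inst[a].get)) for a in ranked]

def rec(author, institution, country, citations, year=2015):
    """Build one hypothetical article record."""
    return {"author": author, "institution": institution,
            "country": country, "citations": citations, "year": year}

records = (
    [rec("Jane Smith", "Georgia Tech", "United States", 10)] * 2
    + [rec("Jane Smith", "University of Texas", "United States", 50)] * 3
    + [rec("Jane Smith", "University of Tokyo", "Japan", 40)] * 2
    + [rec("A. Lee", "University of Tokyo", "Japan", 500)] * 2  # too few articles
)

us_top = top_authors(records, "United States")
japan_top = top_authors(records, "Japan")
print(us_top)
print(japan_top)
```

In this toy data, Professor Smith appears on both lists, and her United States entry shows the University of Texas because her Texas articles have more citations than her Georgia Tech articles.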
We classify articles into AI subfields using subject assignment scores in the Merged Academic Corpus, which are generated algorithmically. (The MAC's subject scoring models only work on English-language articles; we impute scores for each non-English article based on the average scores of the articles it cites or is cited by.)
CAT includes the following subfields:
We use each article's scores for selected common AI-related subjects to assign it to up to three of these subfields. Depending on their scores, some articles may not be assigned to subfields (for example, articles on niche subjects or whose topical focus is uncertain). Articles that lack subject assignment scores altogether are also left out of the subfield categorizations. Generally, these are non-English articles with insufficient citation data to impute scores, as described above.
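The imputation and assignment steps can be illustrated with a minimal Python sketch. Everything specific here is assumed for the sake of the example: the subject names, the 0.5 score threshold, and the one-to-one treatment of subjects as subfields are hypothetical, and the actual rules CAT uses to map scores to subfields may differ.

```python
def impute_scores(neighbor_scores):
    """Average subject scores over the cited and citing articles that have scores."""
    if not neighbor_scores:
        return None  # insufficient citation data: the article stays unscored
    subjects = set().union(*neighbor_scores)
    return {
        s: sum(n.get(s, 0.0) for n in neighbor_scores) / len(neighbor_scores)
        for s in subjects
    }

def assign_subfields(scores, threshold=0.5, max_subfields=3):
    """Pick up to three top-scoring subfields; the threshold is a hypothetical value."""
    if scores is None:
        return []  # unscored articles are left out of subfield categorizations
    qualifying = [s for s, v in scores.items() if v >= threshold]
    return sorted(qualifying, key=lambda s: scores[s], reverse=True)[:max_subfields]

# Hypothetical scores for two articles linked by citation to a non-English article.
neighbors = [
    {"computer vision": 1.0, "robotics": 0.25},
    {"computer vision": 0.5, "natural language processing": 0.5},
]

imputed = impute_scores(neighbors)
print(assign_subfields(imputed))
print(assign_subfields(impute_scores([])))
```

An article with no scored neighbors gets no imputed scores and therefore no subfields, which mirrors the exclusion described above.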
CAT uses patent data from 1790 Analytics and Dimensions, and applies methods developed jointly by CSET and 1790 to screen and structure the data. In addition to the information that follows, you can learn more about this dataset and download an index of its contents in the CSET/1790 GitHub repo.
The CAT dataset includes over 400,000 AI-related patent families, which are groups of patent documents related to the same invention. These documents may include patent applications, which are requests pending at a country's patent office for the grant of a patent, and granted patents, which are approved requests awarding a property right for that invention. (We exclude other types of patent documents, such as amendments or other administrative documents.) In CAT, each patent family is counted as a single "patent." If the family includes at least one granted patent, the family is counted as a "granted patent." If the family only includes patent applications, it's treated as a "patent application."
Inventors often file patents for the same invention in multiple jurisdictions, since each jurisdiction's patent office can only enforce patent protections in its own jurisdiction. For example, a company with U.S.-patented products might seek patents in France if it plans to start manufacturing or selling the same products there. CAT's patent dataset includes data from 52 different patent offices around the world, including national offices (such as the U.S. Patent and Trademark Office) and international offices (such as the European Patent Office). When an inventor seeks a patent for an invention in more than one of these jurisdictions, all of the documents from every jurisdiction are counted as part of the same patent.
To make this more concrete, suppose:
At this point, CAT would count two patents for Jane: one EPO granted patent (for the robot) and one Chinese patent application (for the software). Note that there are at least four patent documents involved: an EPO application, EPO patent grant, and Chinese application for the robot, and an EPO application for the software. But the first three documents all relate to the same invention, so CAT counts them together.
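The family-level counting rule can be sketched in Python as follows. The document records, field names, and office codes are hypothetical and purely illustrative; only the rule itself (one patent per family, "granted" if any document in the family is a grant, other document types ignored) comes from the description above.

```python
from collections import defaultdict

# Hypothetical patent documents, grouped into families by invention.
docs = [
    {"family": "robot", "office": "EP", "kind": "application"},
    {"family": "robot", "office": "EP", "kind": "grant"},
    {"family": "robot", "office": "CN", "kind": "application"},
    {"family": "robot", "office": "EP", "kind": "amendment"},  # excluded type
    {"family": "software", "office": "EP", "kind": "application"},
]

def classify_families(docs):
    """Count each family once, as granted if any of its documents is a grant."""
    kinds = defaultdict(set)
    for d in docs:
        # Only applications and grants are considered.
        if d["kind"] in ("application", "grant"):
            kinds[d["family"]].add(d["kind"])
    return {
        fam: ("granted patent" if "grant" in seen else "patent application")
        for fam, seen in kinds.items()
    }

families = classify_families(docs)
print(families)
```

Five documents collapse into two patents here: one granted patent and one patent application.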
CAT's patent metrics describe where patents are being filed, not which country has the most patents. In other words, CAT can't tell you how many AI patents are owned by Americans, but it can tell you how many patents were filed in the U.S. patent office. There may be overlap between these two categories, but it's not a perfect match: for example, about half of patent applications filed in the U.S. are from overseas.
We are working to build inventor nationality metrics into future versions of CAT. In the meantime, you can use the existing, filing location-based metrics to understand where AI innovators are most interested in protecting their inventions - and in turn, where they may be conducting R&D, manufacturing, marketing, expanding operations, or competing with foreign companies.
CAT includes only AI-related patents. CSET and 1790 Analytics developed a method to identify these patents from broader 1790 and Dimensions patent data holdings using a combination of keywords and patent classifications, which are categories applied to individual patents by some patent offices. We also used keywords and classifications to link each patent to different AI techniques (e.g., machine learning, logic models), applications (e.g., speech processing, computer vision), and industries (e.g., life sciences, transportation). A patent can have more than one of any of these labels: for example, a patent for a robot that recognizes and responds to spoken commands might be assigned to the robotics and speech processing applications. For more information on this method, you can read CSET's paper on AI patents or visit the GitHub repo for the CSET/1790 project.
CAT adapts other data from the patent dataset to generate metrics:
Combining different patent data sources into the CAT database creates duplicate patent records. We use patent IDs and patent family IDs, which are unique identifiers assigned by national patent offices, to detect and resolve these duplicates. When different sources give different information for the same patent, we use the data from the CSET/1790 project.
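One simple way to picture this kind of deduplication is the Python sketch below, which keeps one record per patent ID and prefers a designated source when records conflict. The source labels, field names, and the whole-record preference are assumptions for illustration; the actual CAT pipeline may reconcile conflicts differently, for example field by field.

```python
def merge_patent_records(records, preferred_source="cset_1790"):
    """Keep one record per patent ID, preferring the designated source on conflict."""
    merged = {}
    for r in records:
        current = merged.get(r["patent_id"])
        prefer_new = (
            current is None
            or (r["source"] == preferred_source
                and current["source"] != preferred_source)
        )
        if prefer_new:
            merged[r["patent_id"]] = r
    return list(merged.values())

# Hypothetical records: the same patent appears in two sources with different titles.
records = [
    {"patent_id": "P-1", "source": "dimensions", "title": "Robot arm"},
    {"patent_id": "P-1", "source": "cset_1790", "title": "Robotic arm controller"},
    {"patent_id": "P-2", "source": "dimensions", "title": "Speech model"},
]

deduplicated = merge_patent_records(records)
print(deduplicated)
```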
The investment and company metrics in CAT rely on data from Crunchbase, a commercial dataset. Specifically, CAT uses Crunchbase data related to equity investment into privately held, AI-related companies.
Crunchbase data has gaps, especially for companies with a lower public profile, but we believe it is a relatively comprehensive and accurate source for the sort of data CAT uses. For more details, please refer to the methodology section and appendices in this CSET report; the process we used to evaluate and extract data from Crunchbase is generally similar to the process described there.
There is no single objective definition of an "AI company." We take a deliberately broad approach, using three different criteria to identify AI-related companies in Crunchbase. Any privately held company that meets at least one of the criteria is counted as an AI-related company in CAT, and investments involving that company will be included in CAT's investment metrics.
These criteria are designed to capture a wide range of companies with AI-related activities across the globe, even for smaller countries or territories. (Many of the companies included in CAT fulfilled multiple criteria.) Because of this broad approach, the criteria may capture some companies and investments in Crunchbase that others might not consider AI-related. At the same time, they may leave out some companies and investments others would describe as AI-related.
CAT assigns each AI-related company, and all the investments into that company, one or more application fields based on the company's industry tags and groups in Crunchbase. This table maps the tags and groups to CAT application fields.
CAT adapts other Crunchbase data to generate investment and company metrics:
CAT's cross-border investment tables should be interpreted with care. Most private-market AI investment transactions, such as venture capital deals, combine contributions from multiple investors, and the exact amount of each investor's contribution is rarely disclosed. This makes it impossible to add up the total investment from investors in a specified country. Instead, the numbers in CAT's cross-border investment tables reflect the total value or count of investment transactions with at least one participating investor from that country.
For example, in this investment comparison, $5,936 million ($5.9 billion) is the value of transactions with a target company in the United States and at least one participating investor from Canada - not the amount Canadians invested into U.S. AI companies.
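To see why these figures cannot be read as per-country investment totals, consider the following Python sketch. The transactions, field names, and dollar values are invented for illustration; only the counting rule (a transaction counts for a country if at least one participating investor is from that country) is taken from the description above.

```python
# Hypothetical transactions; values are in millions of U.S. dollars.
transactions = [
    {"target_country": "United States", "value_musd": 100,
     "investor_countries": ["Canada", "United States"]},
    {"target_country": "United States", "value_musd": 50,
     "investor_countries": ["Canada", "Canada", "Japan"]},
    {"target_country": "United States", "value_musd": 30,
     "investor_countries": ["Japan"]},
]

def cross_border(transactions, target, investor_country):
    """Return (count, total value) of deals into `target` with at least one
    participating investor from `investor_country`."""
    matching = [
        t for t in transactions
        if t["target_country"] == target
        and investor_country in t["investor_countries"]
    ]
    return len(matching), sum(t["value_musd"] for t in matching)

canada = cross_border(transactions, "United States", "Canada")
japan = cross_border(transactions, "United States", "Japan")
print(canada, japan)
```

The second transaction is counted in full for both Canada and Japan, so the two country totals (150 and 80) sum to more than the 180 actually invested across all three deals. Each figure describes deals that investors from a country took part in, not the money those investors contributed.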
The CAT user interface is updated as new features are developed. The underlying data is currently updated a minimum of once a quarter, although we plan to automate more frequent updates within the next year.
Use our general issue reporting form, or click on the "Submit feedback" icons embedded in the tool to report issues related to specific data points.
|8/18/22||Initial release on CSET's website|
|10/19/22||Updated version launched on ETO's website|