The Emerging Technology Observatory is launching enhancements to several of our tools and datasets. These enhancements stem from improvements to our Merged Academic Corpus (MAC), which contains detailed information on over 280 million scholarly articles. The result is a more up-to-date, accessible, and accurate look into the research landscape through ETO’s Research Almanac, Country Activity Tracker, Private-sector AI-Related Activity Tracker, and more.
MAC Improvements
First, we've improved the MAC’s author affiliation metadata to more accurately show the organizations and countries represented in the scholarly literature. We created an internal entity resolution pipeline whose results outperform the MAC’s prior affiliation metadata (primarily provided by the MAC’s underlying sources). This improvement added more than 3.5 million new country affiliations and 3 million new organization affiliations, correcting errors in author affiliation data from our underlying data sources and increasing the accuracy and reliability of our data. One change that may be surprising: some publication counts have gone down with this update because of corrected country and organization affiliations.
Second, we've expanded our research fields classification. We updated our model for determining a publication’s research field(s) and modified our research fields hierarchy. As a result, some publications that were previously classified under one field will now be classified under another. This change will allow for exploration of more research fields of interest across ETO tools.
Third, we've updated the data sources for our Merged Academic Corpus, moving toward more open-source datasets. In particular, we removed data from Clarivate’s Web of Science from the MAC, resulting in the loss of roughly 7% of previously included publications (as of May 2025). We confirmed that this reduced coverage was relatively uniform across regions and fields and did not notably affect aggregated counts and figures. This update will create new opportunities for data download and tool enhancements across the ETO platform.
Updated Tools and Datasets
Updated MAC data is now incorporated in the following ETO tools:
The following ETO datasets also include the updated data:
- Emerging Technology Overlay for OpenAlex
- Cross-Border Tech Research Metrics
- Country AI Activity Metrics
- Private-Sector AI Indicators

Note that as part of this update, each of these tools and datasets now includes the latest available data, as reflected in the "last updated" date for each resource. Updates to many of these resources were paused from around late 2024 to early 2025 while we were updating the MAC's underlying sources, but they have now resumed.
More to come
We are excited to incorporate the MAC changes into many of our tools and datasets, but one of our tools has yet to be updated – the Map of Science. While making improvements to the MAC, we've also been making updates to the research clustering method used in our Map of Science. The results are in, and we look forward to launching our updated research clusters and Map of Science soon. Stay tuned!
Shout-out to CSET’s data team for designing and implementing new solutions to improve the MAC’s sources, affiliations, and research fields. As always, we love feedback and are glad to help – visit our support hub to contact us, book live support with an ETO staff member, or access the latest documentation for our tools and data. 🤖