ETO Logo
Documentation: Advanced Semiconductor Supply Chain Dataset

Overview

What is this dataset?

The Advanced Semiconductor Supply Chain Dataset includes manually compiled, high-level information about the tools, materials, processes, countries, and firms involved in the production of advanced logic chips. The current version of the dataset reflects how CSET researchers understood this supply chain in mid-2025, drawing on industry data and publicly available analyses.

How do I get it?

The dataset CSV files are available on GitHub.


What can I use it for?

You can use this dataset to:

  • Learn about how advanced logic chips are produced and the tools, materials, and processes that are involved.
  • Assess countries' and companies' roles in the supply chain using the dataset's extensive provider information.
  • Identify "chokepoints," market concentration, dependency relationships, and other structural features of the supply chain.


Which ETO products use it?

What are its sources?

Most of the data was adapted by CSET researchers from industry data provided by TechInsights. We augmented the TechInsights data with other publicly available analyses and information from prior CSET research.

What are its main limitations?

  • The data are relatively high-level. This dataset was designed to orient non-specialists to the supply chain for advanced chips. It may be less relevant to professional supply chain managers, regulatory compliance analysts, or other specialists requiring very granular data.
  • Some data are out of date. As of July 2025, most of the supply chain segments represented in the dataset use 2024 market sizes and market shares. Some supply chain segments are out of date already (e.g., using 2019 data), and more will go stale over time. The dataset will be refreshed periodically, but the next update isn't scheduled yet.
  • The country market share and company nationality data are based on the headquarters location of each company's ultimate parent, making the dataset less useful for research questions that depend on where production physically takes place.

Does it contain sensitive information, such as personally identifiable information?

No.

What are the terms of use?

This dataset is subject to ETO's general terms of use. If you use it, please cite us.

How do I cite it?

Please cite the "Emerging Technology Observatory Advanced Semiconductor Supply Chain Dataset (2025 release)," including the link. If you use the explorer tool to access the data, you can cite that tool instead.

Structure and content

The dataset consists of five CSV tables: inputs, providers, provision, sequence, and stages.

inputs

This table includes basic information about inputs to advanced chip production. Inputs include processes, tools, and materials. Material inputs are consumed in the production process (e.g., photoresist, wafers); tools are durable (e.g., photolithography equipment).

Column name | Type | Description
input_id | text (ID) | A unique alphanumeric identifier for the input.
input_name | text | The name of the input.
type | text | Whether the input is a process, tool, design, or material input.
stage_name | text | The name of the production stage to which the input belongs. For inputs of type process only.
stage_id | text (ID) | For inputs of type process only, indicates the ID of the production stage in which the process takes place. Connects to the stages table.
description | text | A short narrative summary of the input and its significance. Written by CSET researchers. Many summaries are adapted from the CSET report The Semiconductor Supply Chain: Assessing National Competitiveness.
year | year | The year for which market size and/or market share data is provided for this input.
market_share_chart_global_market_size_info | text | Total global revenue from sales of the input for the year specified in year.
market_share_chart_caption | text | A caption for the market share charts displayed for this input.
market_share_source | text | The source of the market size and share data provided for this input.

providers

This table lists countries and firms that provide inputs to advanced chips. A provider may be listed multiple times if it has more than one alias.

Column name | Type | Description
provider_name | text | The name of the provider. Countries are identified with their three-letter ISO codes (ISO 3166). Here (and generally in ETO resources) we use "country" informally, as a shorthand term for sovereign countries, independent states, and certain other geographic entities.
alias | text | Another name for the provider.
provider_id | text (ID) | A unique alphanumeric identifier for the provider.
provider_type | text | Whether the provider is a country or an organization.
country | text (ISO 3166) | For providers of type organization, indicates the country in which the organization is headquartered.
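
Because a provider with more than one alias appears on more than one row, it can help to collapse the table to one record per provider_id before counting or joining providers. The following is a minimal sketch using only the Python standard library. The sample rows are made up, and the sketch assumes that the CSV header matches the column names listed above and that a missing alias is stored as an empty string.

```python
import csv
import io

# Made-up rows in the shape of the providers table (not real data).
SAMPLE_PROVIDERS = """provider_name,alias,provider_id,provider_type,country
Example Lithography Co,ELC,P1,organization,NLD
Example Lithography Co,Example Litho,P1,organization,NLD
NLD,,P2,country,
"""


def collapse_aliases(rows):
    """Return one record per provider_id, with aliases gathered into a list."""
    providers = {}
    for row in rows:
        record = providers.setdefault(
            row["provider_id"],
            {
                "provider_name": row["provider_name"],
                "provider_type": row["provider_type"],
                "country": row["country"],
                "aliases": [],
            },
        )
        # Assumption: an empty alias field means "no alias on this row".
        if row["alias"]:
            record["aliases"].append(row["alias"])
    return providers


providers = collapse_aliases(csv.DictReader(io.StringIO(SAMPLE_PROVIDERS)))
for provider_id, record in providers.items():
    print(provider_id, record["provider_name"], record["aliases"])
```

To run this against the real file, replace the io.StringIO wrapper with an opened providers CSV file.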

provision

This table describes the specific inputs provided by each country and firm, presented as provider-input pairs.

Column name | Type | Description
provider_name | text | The name of a provider.
provider_id | text (ID) | The unique identifier of the provider. Connects to the providers table.
provided_name | text | The name of an input provided by the provider.
provided_id | text (ID) | The unique identifier of an input provided by the provider. Connects to the inputs table.
share_provided | percentage | The provider's market share for the specified input in a given year. This figure is generally available for countries, rather than firms, and refers in that case to the collective market share of all firms headquartered in that country. (In some cases, a provider country will not have a share_provided value for a particular input, reflecting limitations in the underlying dataset.)
year | text | The year in which the provider specified in provider_name had the market share percentage specified in share_provided.
source | text | The source of the data provided for each provider-input pair.
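
As an illustration of how provider-input pairs might be queried, the sketch below pulls the country-level shares for one input and keeps track of countries that are listed without a share_provided value. All rows and IDs are made up. The sketch assumes that share_provided is stored as a plain number (for example 60 rather than "60%") and that missing values are empty strings; check the real files before relying on either assumption.

```python
import csv
import io

# Made-up rows in the shape of the provision table (not real data).
SAMPLE_PROVISION = """provider_name,provider_id,provided_name,provided_id,share_provided,year,source
AAA,P1,Example tool,N1,60,2024,made-up
BBB,P2,Example tool,N1,,2024,made-up
CCC,P3,Example tool,N1,25,2024,made-up
Example Co,P4,Example tool,N1,,2024,made-up
AAA,P1,Other tool,N2,90,2024,made-up
"""

# Hypothetical lookup built from the providers table: provider_id -> provider_type.
PROVIDER_TYPES = {"P1": "country", "P2": "country", "P3": "country", "P4": "organization"}


def country_shares(provision_rows, provider_types, input_id):
    """Return (shares, missing) for the countries that provide one input."""
    shares = {}
    missing = []
    for row in provision_rows:
        if row["provided_id"] != input_id:
            continue
        if provider_types.get(row["provider_id"]) != "country":
            continue
        if row["share_provided"] == "":
            # The country is listed as a provider, but no share is recorded.
            missing.append(row["provider_name"])
        else:
            shares[row["provider_name"]] = float(row["share_provided"])
    return shares, missing


rows = list(csv.DictReader(io.StringIO(SAMPLE_PROVISION)))
shares, missing = country_shares(rows, PROVIDER_TYPES, "N1")
print(shares, missing)
```

Keeping the countries without a share in a separate list avoids silently treating a missing value as a zero share.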

sequence

This table describes the relationships between different inputs. Two types of relationship are described: inputs that "go into" other inputs (e.g., a material that is used in a process, or a process that occurs directly before another process), and inputs that are specific subtypes of other defined inputs (e.g., EUV photolithography machines are designated as a type of photolithography equipment).

Column name | Type | Description
input_name | text | The name of an input.
input_id | text (ID) | The unique identifier of the input. Connects to the inputs table.
goes_into_name | text | The name of another input into which the initial input is incorporated or otherwise connected.
goes_into_id | text (ID) | The unique identifier of the input identified in goes_into_name. If this field is populated, then is_type_of_id will not be populated. Connects to the inputs table.
is_type_of_name | text | If the initial input is a sub-type of another kind of input, the name of the second input will be listed here.
is_type_of_id | text (ID) | The unique identifier of the input identified in is_type_of_name. If this field is populated, then goes_into_id will not be populated. Connects to the inputs table.
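
Since each row carries either a "goes into" relationship or an "is a type of" relationship, one way to work with this table is to split it into two edge lists. The sketch below does that and raises an error if a row populates both ID fields, which the column descriptions say should not happen. The input names echo the examples in this section, but the IDs are made up, and empty fields are assumed to be empty strings.

```python
import csv
import io

# Made-up rows in the shape of the sequence table (IDs are hypothetical).
SAMPLE_SEQUENCE = """input_name,input_id,goes_into_name,goes_into_id,is_type_of_name,is_type_of_id
Photoresist,N1,Photolithography,N2,,
EUV photolithography machines,N3,,,Photolithography equipment,N4
Photolithography equipment,N4,Photolithography,N2,,
"""


def split_relationships(rows):
    """Split sequence rows into (goes_into, is_type_of) lists of ID pairs."""
    goes_into = []
    is_type_of = []
    for row in rows:
        has_goes_into = bool(row["goes_into_id"])
        has_type_of = bool(row["is_type_of_id"])
        if has_goes_into and has_type_of:
            raise ValueError(f"Row for {row['input_id']} populates both relationships")
        if has_goes_into:
            goes_into.append((row["input_id"], row["goes_into_id"]))
        elif has_type_of:
            is_type_of.append((row["input_id"], row["is_type_of_id"]))
    return goes_into, is_type_of


goes_into, is_type_of = split_relationships(csv.DictReader(io.StringIO(SAMPLE_SEQUENCE)))
print("goes into:", goes_into)
print("is type of:", is_type_of)
```

The two lists can then be loaded into whatever graph structure suits the analysis, with the subtype edges kept apart from the production-flow edges.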

stages

This table describes different stages of the production process for advanced chips.

Column name | Type | Description
stage_name | text | The canonical name of the stage.
stage_id | text (ID) | A unique alphanumeric identifier for the stage.
description | text | A short narrative summary of the stage and its significance. All summaries are adapted from the CSET report The Semiconductor Supply Chain: Assessing National Competitiveness.
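
The stage_id column in the inputs table connects process inputs to this table. As a rough sketch of that join, the code below groups process names under their stage. The rows, stage names, and IDs are made up, only a subset of the inputs columns is shown, and the sketch assumes the type column holds the literal value "process" for process inputs.

```python
import csv
import io

# Made-up rows (not real data); only a subset of the inputs columns is shown.
SAMPLE_INPUTS = """input_id,input_name,type,stage_name,stage_id
N1,Example process one,process,Example stage A,S1
N2,Example process two,process,Example stage B,S2
N3,Example process three,process,Example stage B,S2
N4,Example tool,tool,,
"""

SAMPLE_STAGES = """stage_name,stage_id,description
Example stage A,S1,made-up description
Example stage B,S2,made-up description
"""


def processes_by_stage(input_rows, stage_rows):
    """Map each stage name to the names of the process inputs in that stage."""
    stage_names = {row["stage_id"]: row["stage_name"] for row in stage_rows}
    grouped = {name: [] for name in stage_names.values()}
    for row in input_rows:
        # Assumption: process inputs are marked with type == "process".
        if row["type"] != "process":
            continue
        grouped[stage_names[row["stage_id"]]].append(row["input_name"])
    return grouped


grouped = processes_by_stage(
    csv.DictReader(io.StringIO(SAMPLE_INPUTS)),
    list(csv.DictReader(io.StringIO(SAMPLE_STAGES))),
)
for stage, processes in grouped.items():
    print(stage, processes)
```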

Sources and methodology

Unless otherwise specified, data in the Advanced Semiconductor Supply Chain Dataset is derived by CSET analysts from the TechInsights Chip Market Research Services (CMRS) Semiconductor Equipment Database (May 2025 release). The CMRS equipment database includes company-level revenue data for various semiconductor industry inputs organized hierarchically (e.g., EUV lithography tools are organized under lithography tools). CSET analysts mapped the revenue data for different inputs in the CMRS market segmentation to the most closely related inputs in the Advanced Semiconductor Supply Chain Dataset, in some cases mapping multiple CMRS inputs to a single one in our dataset. Based on that mapping, we populated the provision table as follows:

  • Each company's market share is calculated by dividing the company's revenue for a given input by the total revenue for that input across all providers.
  • Each country's market share is the total market share of all companies whose ultimate parent is headquartered in that country.
  • Rows for "various countries" or "various companies" for a given input include the total market share assigned to "Other" companies in the CMRS database, plus the market share of any companies that had no more than 1% market share for every input in our dataset.
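
The three rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code CSET used: the revenue figures, company names, and country names are made up, the "Other" key stands in for the CMRS "Other" category, and the 1% test is read as "the company's share is at most 1% for every input".

```python
from collections import defaultdict

# Made-up revenue (arbitrary units) per input and company; not real data.
REVENUE = {
    "tool_x": {"Company A": 600, "Company B": 295, "Company C": 5, "Other": 100},
    "tool_y": {"Company B": 500, "Company C": 400, "Company D": 8, "Other": 92},
}

# Hypothetical headquarters of each company's ultimate parent.
HEADQUARTERS = {
    "Company A": "Country 1",
    "Company B": "Country 2",
    "Company C": "Country 3",
    "Company D": "Country 3",
}


def company_shares(revenue):
    """Company share (%) = company revenue / total revenue for that input."""
    shares = {}
    for input_name, by_company in revenue.items():
        total = sum(by_company.values())
        shares[input_name] = {c: 100 * r / total for c, r in by_company.items()}
    return shares


def country_shares(revenue, headquarters, threshold=1.0):
    """Sum company shares by country, pooling "Other" and tiny firms as "Various"."""
    shares = company_shares(revenue)
    # Largest share each company reaches across all inputs.
    peak = defaultdict(float)
    for by_company in shares.values():
        for company, share in by_company.items():
            peak[company] = max(peak[company], share)
    result = {}
    for input_name, by_company in shares.items():
        totals = defaultdict(float)
        for company, share in by_company.items():
            if company == "Other" or peak[company] <= threshold:
                totals["Various"] += share
            else:
                totals[headquarters[company]] += share
        result[input_name] = dict(totals)
    return result


SHARES = country_shares(REVENUE, HEADQUARTERS)
for input_name, by_country in SHARES.items():
    print(input_name, by_country)
```

In this made-up example, Company C holds only 0.5% of tool_x but 40% of tool_y, so its tool_x share still counts toward its country. Company D never exceeds 1% for any input, so its share is pooled into "Various" and it receives no country affiliation.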

Unless otherwise specified, market size figures in the inputs table are the sum of revenues across all companies in the CMRS database in the year specified.

Companies that had no more than 1% market share for every input in our dataset were not assigned a country affiliation. As a result, the market share of a country with many tiny suppliers would be underrepresented in our dataset. However, we have no specific reason to suspect that any such cases exist.

Known limitations

  • Much of the data comes from vendors, industry groups, or other commercially oriented organizations that don't fully disclose their methods. We believe these sources are credible but aren't able to vet their methodology in detail.
  • The data are relatively high-level. This dataset was designed for non-specialists, and may be less relevant if very granular data are needed.
  • Some data are out of date. As of July 2025, most of the supply chain segments represented in the dataset use 2024 market sizes and market shares. Some supply chain segments are out of date already (e.g., using 2019 data), and more will go stale over time. The dataset will be refreshed periodically, but the next update isn't scheduled yet.
  • The country market share and company nationality data rely on company headquarters locations, making the tool less useful for research questions that depend on where production is physically happening. In this dataset, country-level market shares are based on the individual market shares of firms whose ultimate parent is headquartered in that country, not the production physically occurring within that country. Similarly, individual providers are assigned to countries based on their ultimate parent's headquarters location, not location of operations. For example, a French-headquartered company would be assigned to France even if all of its manufacturing takes place in Asia.

Using the data

How can I access the data?

The dataset CSV files are available on GitHub.

What can I use it for?

Learn about how and where advanced logic chips are produced and the tools, materials, and processes that are involved. You can read directly from the raw data or use our Explorer tool to browse visually.


Assess countries' and companies' roles in the supply chain using the dataset's extensive provider information. If you have a specific input in mind, you can open it in the Explorer tool to quickly view associated countries (usually with per-country market share) and firms. More complex queries can be performed on the raw data using your favorite analysis tool.

Identify "chokepoints," market concentration, dependency relationships, and other structural features of the supply chain. You can use the Explorer's market concentration filter as an entry point here. More complex structural characteristics can be browsed visually with the Explorer tool or defined systematically with other tools.
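
As one example of defining market concentration systematically, the sketch below computes a Herfindahl-Hirschman Index (HHI) from country-level shares for a single input. HHI is a common concentration measure and is used here as an assumption; this documentation does not state which measure the Explorer's market concentration filter uses. The shares are made up, and the handling of the "Various" bucket is a judgment call: it pools many small providers, so counting it as a single provider overstates concentration slightly.

```python
# Made-up country shares (%) for one input; not real data.
SAMPLE_SHARES = {"Country 1": 80.0, "Country 2": 15.0, "Various": 5.0}


def herfindahl_index(shares, exclude=("Various",)):
    """Sum of squared percentage shares, skipping pooled buckets by default."""
    return sum(share ** 2 for name, share in shares.items() if name not in exclude)


def top_provider(shares, exclude=("Various",)):
    """Return the (name, share) pair with the largest share."""
    named = {name: share for name, share in shares.items() if name not in exclude}
    return max(named.items(), key=lambda item: item[1])


HHI = herfindahl_index(SAMPLE_SHARES)
HHI_WITH_VARIOUS = herfindahl_index(SAMPLE_SHARES, exclude=())
TOP = top_provider(SAMPLE_SHARES)
print(HHI, HHI_WITH_VARIOUS, TOP)
```

With percentage shares, the index ranges from near 0 (many small providers) to 10,000 (a single provider), so inputs can be ranked by it to surface candidate chokepoints.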

What shouldn't I use it for?

  • Tracking trends over time. This dataset provides a "snapshot" view of the semiconductor supply chain. It doesn't include any historical data.
  • Supply chain management uses, such as product sourcing, logistics management, or vetting vendors. The dataset generally isn't specific or current enough for these applications.
  • Measuring market scale, profitability, or other economic metrics related to the semiconductor sector. This dataset doesn't include financial or economic data other than country market share for specific inputs.
  • Researching individual firms in detail. This dataset includes relatively high-level information about individual firms. It doesn't include granular company-level information such as manufacturing capacity or location of specific facilities.
  • Assessing the physical location of chip production. Some fields in the dataset may include some information on the physical location of a firm's production operations, but in general the dataset is not designed for research questions that turn on where chip production is actually happening.
  • Uses that require highly up-to-date data. This dataset contains 2024 market sizes and market shares for most inputs. We believe the data give a good overall picture of the global semiconductor supply chain, but some inputs use older data from 2019 or 2022.

Maintenance

How are the data updated?

Because a substantial part of this dataset is compiled manually by analysts, updating it takes significant time and effort. We plan to release comprehensively updated versions periodically, no more than once a year. Older versions will remain accessible on this page and on GitHub. The next update is not yet scheduled.

Between these major updates, there may be minor revisions to individual data points based on user feedback. These revisions will be logged on the GitHub pages for the relevant tables.

How can I report an issue?

Use our general issue reporting form. If you access the dataset through the Supply Chain Explorer, you can instead submit issue reports for specific fields or data points using the "Report an Issue" links embedded in the tool.

Credits

Much of the data in the Advanced Semiconductor Supply Chain Dataset is derived by CSET analysts from the TechInsights Chip Market Research Services (CMRS) Semiconductor Equipment Database (May 2025 release). The dataset also incorporates data published by World Semiconductor Trade Statistics (WSTS) and the Semiconductor Industry Association (SIA), among other sources.

Prior releases of the dataset drew on data from Saif M. Khan, Alexander Mann, and Dahlia Peterson, The Semiconductor Supply Chain: Assessing National Competitiveness (Center for Security and Emerging Technology, January 2021).

Additional support came from:

  • Data collection and analysis: Jacob Feldgoise, Hanna Dohmen, Zach Arnold, Sriya Guduru, Ari Filler
  • Engineering: Jennifer Melot, Neha Singh, Brian Love
  • Review: John VerWey, Hanna Dohmen
  • Documentation: Zach Arnold, Jacob Feldgoise
  • Special thanks: Saif Khan

Major change log

7/14/25 | July 2025 update: updated most data from 2019 to 2024, revised input taxonomy, and updated country affiliations.
10/13/22 | Initial release