Documentation: Advanced Semiconductor Supply Chain Dataset

Overview

What is this dataset?

The Advanced Semiconductor Supply Chain Dataset includes manually compiled, high-level information about the tools, materials, processes, countries, and firms involved in the production of advanced logic chips. The current version of the dataset reflects how researchers understood this supply chain in early 2021. It uses a wide variety of sources, such as corporate websites and disclosures, specialized market research, and industry group publications.

How do I get it?

The dataset CSV files are available on GitHub.

What can I use it for?

You can use this dataset to:

  • Learn about how advanced logic chips are produced and the tools, materials, and processes that are involved.
  • Assess countries' and companies' roles in the supply chain using the dataset's extensive provider information.
  • Identify "chokepoints," market concentration, dependency relationships, and other structural features of the supply chain.

Which ETO products use it?

The Supply Chain Explorer.

What are its sources?

Most of the data is taken from the CSET report The Semiconductor Supply Chain: Assessing National Competitiveness, published in 2021. In 2022, ETO researchers augmented the company profiles in the data with information manually collected from producer websites and the open internet.

What are its main limitations?

  • The data are relatively high-level. This dataset was designed to orient non-specialists to the supply chain for advanced chips. It may be less relevant to professional supply chain managers, regulatory compliance analysts, or other specialists requiring very granular data.
  • Some data may be out of date. Most of the dataset was compiled in 2020 using sources from the preceding several years. We believe it still gives a good overall picture of the global semiconductor supply chain. That said, some of the specific data points are almost certainly out of date already, and more will go stale over time. The dataset will be refreshed periodically, but the next update isn't scheduled yet.
  • The country market share and company nationality data rely on company headquarters locations, making the dataset less useful for research questions that depend on where production is physically happening.

Does it contain sensitive information, such as personally identifiable information?

No.

What are the terms of use?

This dataset is subject to ETO's general terms of use. If you use it, please cite us.

How do I cite it?

Please cite the "Emerging Technology Observatory Advanced Semiconductor Supply Chain Dataset (2022 release)," including the link. If you use the explorer tool to access the data, you can cite that tool instead.

Structure and content

The dataset consists of five CSV tables: inputs, providers, provision, sequence, and stages.
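
As a rough sketch of how one of these tables might be read with Python's standard csv module: the sample text below is a hypothetical stand-in for the inputs table (the row values are invented for illustration and are not real data), and the helper name read_table is an assumption of this example.

```python
import csv
import io

# Hypothetical stand-in for the contents of the inputs table.
# The header follows the documented columns; the rows are illustrative only.
SAMPLE_INPUTS_CSV = """input_name,input_id,type,stage_name,stage_id,description
Example process,N1,process,Example stage,S1,Illustrative row only
Example tool,N2,tool,,,Illustrative row only
"""


def read_table(handle):
    """Read one dataset table into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


inputs = read_table(io.StringIO(SAMPLE_INPUTS_CSV))

# Index rows by their unique identifier for quick lookups.
inputs_by_id = {row["input_id"]: row for row in inputs}
print(inputs_by_id["N2"]["type"])
```

With a downloaded file, the same helper could be called on an opened file handle instead of the in-memory sample, for example open("inputs.csv", newline="", encoding="utf-8"), assuming the file is saved under that name.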

inputs

This table includes basic information about inputs to advanced chip production. Inputs include processes, tools, and materials. Material inputs are consumed in the production process (e.g. photoresist, wafers); tools are durable (e.g. photolithography equipment).

Column name | Type | Description
input_name | text | The name of the input.
input_id | text (ID) | A unique alphanumeric identifier for the input.
type | text | Whether the input is a process, tool, or material input.
stage_name | text | The name of the production stage to which the input belongs. For inputs of type process only.
stage_id | text (ID) | For inputs of type process only, indicates the ID of the production stage in which the process takes place. Connects to the stages table.
description | text | A short narrative summary of the input and its significance. All summaries are adapted from the CSET report The Semiconductor Supply Chain: Assessing National Competitiveness.

providers

This table lists nations and firms that provide inputs to advanced chips.

Column name | Type | Description
provider_name | text | The name of the provider. Countries are identified with their three-digit ISO codes (ISO 3166).
provider_id | text (ID) | A unique alphanumeric identifier for the provider.
provider_type | text | Whether the provider is a country or an organization.
country | text (ISO 3166) | For providers of type organization, indicates the country in which the organization is headquartered.

provision

This table describes the specific inputs provided by each country and firm, presented as provider-input pairs.

Column name | Type | Description
provider_name | text | The name of a provider.
provider_id | text (ID) | The unique identifier of the provider. Connects to the providers table.
provided_name | text | The name of an input provided by the provider.
provided_id | text (ID) | The unique identifier of an input provided by the provider. Connects to the inputs table.
share_provided | percentage | The provider's market share for the specified input. This figure is generally available for countries, rather than firms, and refers in that case to the collective market share of all firms headquartered in that country. (In some cases, a provider country will not have a share_provided value for a particular input, reflecting limitations in the underlying dataset.)
negligible_market_share | text | Whether the provider accounts for a negligible share of the global market in the specified input. This field is populated only if share_provided is empty (but is not populated in every such instance). A company is designated a negligible provider of a given input if (a) its headquarters country has 2% or less of the global market share for that input or (b) the CSET report The Semiconductor Supply Chain: Assessing National Competitiveness indicates that the provider accounts for a negligible share of the global market in that input.
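
The interplay between share_provided and negligible_market_share can be sketched as follows. This is an illustration under stated assumptions, not the dataset's own logic: it assumes share_provided is stored as a number in percentage points (with or without a trailing "%"), and that any non-empty value in negligible_market_share marks the provider as negligible. The sample rows and the function names are hypothetical.

```python
def parse_share(raw):
    """Return share_provided as a float in percentage points, or None if empty.

    Assumption: values look like "45", "45.5", or "45%".
    """
    text = (raw or "").strip().rstrip("%").strip()
    return float(text) if text else None


def classify(row):
    """Label a provision row as 'share known', 'negligible', or 'unknown'."""
    if parse_share(row.get("share_provided")) is not None:
        return "share known"
    # Assumption: the field is only filled in when the provider is negligible.
    if (row.get("negligible_market_share") or "").strip():
        return "negligible"
    return "unknown"


# Hypothetical provision rows (illustrative values only).
rows = [
    {"provider_id": "P1", "provided_id": "N1", "share_provided": "45%", "negligible_market_share": ""},
    {"provider_id": "P2", "provided_id": "N1", "share_provided": "", "negligible_market_share": "yes"},
    {"provider_id": "P3", "provided_id": "N1", "share_provided": "", "negligible_market_share": ""},
]

labels = [classify(row) for row in rows]
print(labels)
```

The third label matters in practice: an empty share_provided does not by itself mean a negligible share, since the negligible flag is not populated in every such instance.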

sequence

This table describes the relationships between different inputs. Two types of relationship are described: inputs that "go into" other inputs (e.g., a material that is used in a process, or a process that occurs directly before another process), and inputs that are specific subtypes of other defined inputs (e.g., EUV photolithography machines are designated as a type of photolithography equipment).

Column name | Type | Description
input_name | text | The name of an input.
input_id | text (ID) | The unique identifier of the input. Connects to the inputs table.
goes_into_name | text | The name of another input into which the initial input is incorporated or otherwise connected.
goes_into_id | text (ID) | The unique identifier of the input identified in goes_into_name. If this field is populated, then is_type_of_id will not be populated. Connects to the inputs table.
is_type_of_name | text | If the initial input is a sub-type of another kind of input, the name of the second input will be listed here.
is_type_of_id | text (ID) | The unique identifier of the input identified in is_type_of_name. If this field is populated, then goes_into_id will not be populated. Connects to the inputs table.
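
Because each sequence row carries either a goes_into_id or an is_type_of_id, the table can be read as two sets of directed edges. The sketch below shows one possible way to build those edges and walk them; the rows are hypothetical and the function names are assumptions of this example.

```python
from collections import defaultdict

# Hypothetical sequence rows (illustrative IDs only).
rows = [
    {"input_id": "N1", "goes_into_id": "N2", "is_type_of_id": ""},
    {"input_id": "N2", "goes_into_id": "N3", "is_type_of_id": ""},
    {"input_id": "N4", "goes_into_id": "", "is_type_of_id": "N1"},
]


def build_edges(rows):
    """Split sequence rows into 'goes into' edges and subtype edges."""
    goes_into = defaultdict(set)  # input -> inputs it goes into
    subtypes = defaultdict(set)   # parent input -> its subtypes
    for row in rows:
        source = row["input_id"]
        if row.get("goes_into_id"):
            goes_into[source].add(row["goes_into_id"])
        elif row.get("is_type_of_id"):
            subtypes[row["is_type_of_id"]].add(source)
    return goes_into, subtypes


def upstream_of(target, goes_into):
    """Return every input that feeds into target, directly or indirectly."""
    feeds = defaultdict(set)  # reverse index: input -> inputs that go into it
    for source, destinations in goes_into.items():
        for destination in destinations:
            feeds[destination].add(source)
    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        for source in feeds[node]:
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen


goes_into, subtypes = build_edges(rows)
print(sorted(upstream_of("N3", goes_into)))
print(sorted(subtypes["N1"]))
```

The seen set keeps the traversal from looping if the data ever contained a cycle.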

stages

This table describes different stages of the production process for advanced chips.

Column name | Type | Description
stage_name | text | The canonical name of the stage.
stage_id | text (ID) | A unique alphanumeric identifier for the stage.
description | text | A short narrative summary of the stage and its significance. All summaries are adapted from the CSET report The Semiconductor Supply Chain: Assessing National Competitiveness.
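
The stage_id column is what links process inputs to this table. A minimal sketch of that join, using hypothetical rows and assuming the type column holds the literal value "process":

```python
# Hypothetical rows (illustrative names and IDs only).
stages = [
    {"stage_id": "S1", "stage_name": "Example stage A"},
    {"stage_id": "S2", "stage_name": "Example stage B"},
]
inputs = [
    {"input_id": "N1", "input_name": "Example process 1", "type": "process", "stage_id": "S1"},
    {"input_id": "N2", "input_name": "Example process 2", "type": "process", "stage_id": "S1"},
    {"input_id": "N3", "input_name": "Example tool", "type": "tool", "stage_id": ""},
]


def processes_by_stage(inputs, stages):
    """Group process inputs under the name of the stage they belong to."""
    stage_names = {stage["stage_id"]: stage["stage_name"] for stage in stages}
    grouped = {name: [] for name in stage_names.values()}
    for row in inputs:
        # Only process inputs carry a stage_id, per the inputs table description.
        if row["type"] == "process" and row["stage_id"] in stage_names:
            grouped[stage_names[row["stage_id"]]].append(row["input_name"])
    return grouped


grouped = processes_by_stage(inputs, stages)
print(grouped)
```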

Sources and methodology

Data sources

Almost all data are taken from the CSET report The Semiconductor Supply Chain: Assessing National Competitiveness, which was published in 2021. Information in that report was gathered manually from a wide variety of sources and was generally current as of 2019. Refer to the relevant sections and footnotes in the paper for specific sourcing. Certain company names and countries were updated by ETO researchers in 2022.

Collection, processing, and enrichment

All data from the 2021 CSET report were manually extracted from the report and entered into this dataset. Updated company information was manually collected by ETO researchers from credible open sources, including company websites, trade publications, and major news outlets. For about 75% of companies, at least one other researcher independently reviewed the collected information for sourcing and accuracy.

Known limitations

  • Some of the data in the 2021 CSET report comes from vendors, industry groups, or other commercially oriented organizations that don't fully disclose their methods. We believe these sources are credible but weren't able to vet their methodology in detail.
  • The data are relatively high-level. This dataset was designed for non-specialists, and may be less relevant if very granular data are needed.
  • Some data may be out of date. Most of the dataset was compiled in 2020 using sources from the preceding several years. We believe it still gives a good overall picture of the global semiconductor supply chain. That said, some of the specific data points are almost certainly out of date already, and more will go stale over time. The dataset will be refreshed periodically, but the next update isn't scheduled yet.
  • The country market share and company nationality data rely on company headquarters locations, making the dataset less useful for research questions that depend on where production is physically happening. In this dataset, country-level market shares are based on the individual market shares of firms headquartered in that country, not the production physically occurring within that country. Similarly, individual providers are assigned to countries based on headquarters location, not location of operations. (For example, a French-headquartered company would be assigned to France even if all of its manufacturing takes place in Asia.)
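
To make the headquarters convention in the last item concrete, here is a small worked sketch. The firm names, shares, and the hq_country key are all hypothetical, chosen only to show how firm-level shares would roll up to the headquarters country regardless of where manufacturing takes place.

```python
from collections import defaultdict

# Hypothetical firm-level shares for a single input, in percentage points.
firms = [
    {"firm": "Firm A", "hq_country": "FRA", "share": 30.0},
    {"firm": "Firm B", "hq_country": "FRA", "share": 10.0},
    {"firm": "Firm C", "hq_country": "JPN", "share": 60.0},
]


def shares_by_hq_country(firms):
    """Sum firm shares by headquarters country, ignoring production location."""
    totals = defaultdict(float)
    for firm in firms:
        totals[firm["hq_country"]] += firm["share"]
    return dict(totals)


country_totals = shares_by_hq_country(firms)
print(country_totals)
```

Under this convention, Firm A counts toward France's share even if every one of its factories were located elsewhere.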

Using the data

How can I access the data?

The dataset CSV files are available on GitHub.

What can I use it for?

Learn about how and where advanced logic chips are produced and the tools, materials, and processes that are involved. You can read directly from the raw data or use our Explorer tool to browse visually.

Assess countries' and companies' role in the supply chain using the dataset's extensive provider information. If you have a specific input in mind, you can open it in the Explorer tool to quickly view associated countries (usually with per-country market share) and firms. More complex queries can be performed on the raw data using your favorite analysis tool.
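
As one example of a query on the raw data, the sketch below lists the countries that provide a given input, ranked by market share, by joining provision to providers on provider_id. All rows are hypothetical, the function name is an assumption of this example, and share_provided is assumed to be a number in percentage points, optionally followed by "%".

```python
# Hypothetical rows (illustrative values only).
providers = [
    {"provider_id": "P1", "provider_name": "AAA", "provider_type": "country"},
    {"provider_id": "P2", "provider_name": "BBB", "provider_type": "country"},
    {"provider_id": "P3", "provider_name": "Example Corp", "provider_type": "organization"},
]
provision = [
    {"provider_id": "P2", "provided_id": "N1", "share_provided": "25"},
    {"provider_id": "P1", "provided_id": "N1", "share_provided": "60"},
    {"provider_id": "P3", "provided_id": "N1", "share_provided": ""},
    {"provider_id": "P1", "provided_id": "N2", "share_provided": "10"},
]


def country_shares_for_input(input_id, provision, providers):
    """Return (country, share) pairs for one input, largest share first."""
    provider_info = {p["provider_id"]: p for p in providers}
    result = []
    for row in provision:
        provider = provider_info.get(row["provider_id"])
        if provider is None or row["provided_id"] != input_id:
            continue
        raw_share = row["share_provided"].strip().rstrip("%")
        # Skip firms and rows with no recorded share.
        if provider["provider_type"] != "country" or not raw_share:
            continue
        result.append((provider["provider_name"], float(raw_share)))
    return sorted(result, key=lambda pair: pair[1], reverse=True)


ranking = country_shares_for_input("N1", provision, providers)
print(ranking)
```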

Identify "chokepoints," market concentration, dependency relationships, and other structural features of the supply chain. You can use the Explorer's market concentration filter as an entry point here. More complex structural characteristics can be browsed visually with the Explorer tool or defined systematically with other tools.
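
One way to define market concentration systematically is the Herfindahl-Hirschman Index (HHI), the sum of squared market shares in percentage points. The sketch below applies it to hypothetical country shares for a single input. It is an illustration of a common measure, not a description of how the Explorer's market concentration filter works.

```python
def hhi(shares):
    """Herfindahl-Hirschman Index: sum of squared shares (percentage points).

    Ranges from near 0 (many small providers) to 10,000 (a single provider).
    """
    return sum(share * share for share in shares)


def top_provider(shares_by_country):
    """Return the (country, share) pair with the largest share."""
    return max(shares_by_country.items(), key=lambda pair: pair[1])


# Hypothetical country shares for one input (illustrative values only).
shares_by_country = {"AAA": 50.0, "BBB": 30.0, "CCC": 20.0}

concentration = hhi(shares_by_country.values())
leader = top_provider(shares_by_country)
print(concentration, leader)
```

Because some provider countries lack a share_provided value, the recorded shares for an input may not sum to 100, in which case an index computed this way understates the true concentration.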

What shouldn't I use it for?

  • Tracking trends over time. This dataset provides a "snapshot" view of the semiconductor supply chain. It doesn't include any historical data.
  • Supply chain management uses, such as product sourcing, logistics management, or vetting vendors. The dataset generally isn't specific or current enough for these applications.
  • Measuring market scale, profitability, or other economic metrics related to the semiconductor sector. This dataset doesn't include financial or economic data other than country market share for specific inputs.
  • Researching individual firms in detail. This dataset includes relatively high-level information about individual firms. It doesn't include granular company-level information such as manufacturing capacity, location of specific facilities, or corporate ownership.
  • Assessing the physical location of chip production. Some fields in the dataset may include some information on the physical location of a firm's production operations, but in general the dataset is not designed for research questions that turn on where chip production is actually happening.
  • Uses that require highly up-to-date data. The data in this dataset were generally current as of 2019 (or 2022, for certain details related to companies). We believe the data still give a good overall picture of the global semiconductor supply chain, but specific data points may be out of date.

Maintenance

How are the data updated?

Because this dataset is collected manually by analysts, updating it takes significant work and time. We plan to release new, comprehensively updated versions periodically, at most once a year. Older versions will remain accessible on this page and on GitHub. The next update is not yet scheduled.

Between these major updates, there may be minor revisions to individual data points based on user feedback. These revisions will be logged on the GitHub pages for the relevant tables.

How can I report an issue?

Use our general issue reporting form. Or, if you access the dataset through the Supply Chain Explorer, you can submit issue reports for specific fields or data points using the "Report an Issue" links embedded in the tool.

Credits

This dataset is based on Saif M. Khan, Alexander Mann, and Dahlia Peterson, The Semiconductor Supply Chain: Assessing National Competitiveness (Center for Security and Emerging Technology, January 2021).

Additional support came from:

  • Data collection and analysis: Zach Arnold, Sriya Guduru, Ari Filler
  • Engineering: Jennifer Melot, Neha Singh
  • Review: John VerWey
  • Documentation: Zach Arnold
  • Special thanks: Saif Khan

Major change log

10/13/22 | Initial release