Documentation: PATHWISE

Overview:

What is this dataset?

The PATHWISE dataset includes workforce and education metrics for AI and Cyber talent in the United States, covering all U.S. states and core-based statistical areas (CBSAs). You can use ETO’s PATHWISE tool to explore these metrics for each region.

Which ETO products use it?

This dataset powers ETO's PATHWISE tool.

What are its sources?

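PATHWISE is built by consolidating data from the following sources:

  • Lightcast job postings and worker profiles (workforce metrics)
  • NCES Integrated Postsecondary Education Data System (IPEDS) (education metrics)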

What are its main limitations?

  • Education levels do not include associate's degrees or non-degree award programs. The current version of PATHWISE covers only bachelor’s, master’s, and doctoral degrees.
  • Education metrics lag behind the workforce metrics. PATHWISE uses education data from 2023 and workforce data from January to August 2025. We aim to narrow this lag as we add new data.
  • PATHWISE inherits additional limitations from its source datasets, including:
    • Workforce demand only indicates the number of job postings. We do not have any information on whether a given posting was filled or not.
    • Workforce data largely relies on online activity. All workforce variables are based primarily on online job postings and profiles, so jobs and workers without a digital presence may be missed.

What are the terms of use?

Because this dataset incorporates licensed data from commercial providers, it is not publicly available. However, you can view most of its data using PATHWISE.

How do I cite it?

Because the dataset is not publicly available, you should cite PATHWISE or this documentation page instead. To cite PATHWISE, please use "CSET Emerging Technology Observatory PATHWISE", including the link.

Structure and content:

The basic unit of the PATHWISE dataset is the geographic region, which is either a state (including the District of Columbia) or a Core-Based Statistical Area (CBSA). CBSAs, as defined by the U.S. Office of Management and Budget, consist of counties (or county-equivalents) that contain a core area with a substantial population nucleus along with adjacent communities having significant economic and social integration with the core area.

For each region, we calculate the workforce and education metrics listed in Table 1 using the methodology described below.

Table 1: Metrics used in this dataset
Variable | Description
Region | The name of the state or Core-Based Statistical Area
Emerging Technology Talent | The emerging technology talent type: either AI or Cyber
Demand (All) | The total number of job postings for the talent type
Demand (Government) | The number of federal government job postings for the talent type
Demand (Non-Government) | The number of private-sector job postings for the talent type
Share of Total Demand | Job postings for the talent type as a share of all job postings in the region
Supply | The number of worker profiles for the talent type
Share of Total Supply | Worker profiles for the talent type as a share of all worker profiles in the region
Demand: SOC-5 Name | The top five Standard Occupational Classification (SOC) code titles in job postings for the talent type and region, with their frequencies
Supply: SOC-5 Name | The top five SOC code titles in worker profiles for the talent type and region, with their frequencies
Educational Institute | The names of the top five educational institutions with the most graduates for the talent type in the region
Education Level | The degree level: Bachelor’s, Master’s, or Doctoral, plus the total across all three levels
Graduate Counts | The number of graduates in fields related to the talent type, based on Classification of Instructional Programs (CIP) codes, for an education level at an educational institution

Note: Variable names are slightly modified in this documentation for clarity.
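The two share metrics in Table 1 are simple ratios within a region. A minimal sketch, assuming hypothetical per-region counts (the `RegionCounts` fields and the example numbers are illustrative, not PATHWISE data) and assuming a share of 0 when a region has no postings or profiles:

```python
from dataclasses import dataclass


@dataclass
class RegionCounts:
    """Hypothetical per-region counts for one talent type (AI or Cyber)."""
    region: str
    talent_postings: int  # postings matching the talent type's SOC codes
    all_postings: int     # all postings in the region
    talent_profiles: int  # worker profiles matching the talent type
    all_profiles: int     # all worker profiles in the region


def share(part: int, total: int) -> float:
    """Return part / total, or 0.0 when the region has no records (assumption)."""
    return part / total if total else 0.0


def share_metrics(c: RegionCounts) -> dict:
    return {
        "region": c.region,
        "share_of_total_demand": share(c.talent_postings, c.all_postings),
        "share_of_total_supply": share(c.talent_profiles, c.all_profiles),
    }


example = RegionCounts("Example CBSA", talent_postings=250, all_postings=10_000,
                       talent_profiles=1_200, all_profiles=40_000)
print(share_metrics(example))
```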

Methodology:

We applied the following steps, in the order given, to build the final PATHWISE dataset.

Identifying Emerging Technology Demand and Supply:

We used CSET’s published definitions of the AI and Cyber workforce to identify their demand and supply. These definitions are based on Standard Occupational Classification (SOC) codes, a federal statistical standard used to classify workers into occupational categories. We use 5-digit SOC codes because they are the most detailed occupational classification. We identified the AI and Cyber workforce as follows:

CSET defines the AI workforce as the set of occupations whose workers are qualified to work in AI or on an AI development team, or who have the requisite knowledge, skills, and abilities (KSAs) to work on an AI product or application with minimal training. This definition identifies 54 SOC-5 codes as AI-related occupations. For PATHWISE, we narrowed these to the 32 SOC-5 codes that best represent occupations involved in the technical development of an AI system or product. We then used these SOC-5 codes to filter Lightcast’s job postings and profiles datasets, yielding AI job postings and AI profiles, respectively.


For the full list of occupations, refer to Appendix A in Gehlhaus and Mutis, The U.S. AI Workforce (January 2021).
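The filtering step amounts to a set-membership test on each record's SOC code, and the same function applies to postings and profiles. A minimal sketch: the record layout (`id`, `soc`, `region`) and the two codes in `AI_SOC_CODES` are hypothetical placeholders, not the 32 codes PATHWISE actually uses or Lightcast's field names.

```python
# Placeholder codes for illustration only; PATHWISE uses 32 SOC-5 codes
# drawn from Appendix A of Gehlhaus and Mutis (2021).
AI_SOC_CODES = {"15-1252", "15-2051"}

# Hypothetical job postings; real records come from Lightcast.
postings = [
    {"id": 1, "soc": "15-1252", "region": "Example CBSA"},
    {"id": 2, "soc": "43-4051", "region": "Example CBSA"},
    {"id": 3, "soc": "15-2051", "region": "Other CBSA"},
]


def filter_by_soc(records, soc_codes):
    """Keep only records whose SOC code is in the talent definition."""
    return [r for r in records if r["soc"] in soc_codes]


ai_postings = filter_by_soc(postings, AI_SOC_CODES)
print([p["id"] for p in ai_postings])  # [1, 3]
```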

CSET defines the Cyber workforce through a crosswalk that maps cybersecurity-related Occupational Information Network (O*NET) codes to the NICE Workforce Framework for Cybersecurity, which establishes a common lexicon of cybersecurity work roles and KSAs. For PATHWISE, we further mapped the O*NET codes to SOC-5 codes and used them to filter Lightcast’s job postings and profiles datasets, yielding Cyber job postings and Cyber profiles, respectively.


For the full list of occupations, refer to the crosswalk’s dataset documentation.
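One common way to go from O*NET codes to SOC codes relies on the O*NET-SOC format, in which each O*NET code is its parent SOC code plus a two-digit suffix (for example, `15-1212.00`). A sketch, assuming this truncation matches the SOC-5 level PATHWISE uses; the input codes are illustrative, not the crosswalk's contents. Several O*NET codes can collapse to one SOC code, so the result is deduplicated.

```python
def onet_to_soc(onet_code: str) -> str:
    """Drop the two-digit O*NET suffix: '15-1212.00' -> '15-1212'."""
    soc, _sep, _suffix = onet_code.partition(".")
    return soc


# Illustrative O*NET-SOC codes, not the crosswalk's actual contents.
cyber_onet_codes = ["15-1212.00", "15-1244.00", "15-1299.05", "15-1299.08"]

# A set removes duplicates when several O*NET codes share one SOC parent.
cyber_soc_codes = sorted({onet_to_soc(code) for code in cyber_onet_codes})
print(cyber_soc_codes)  # ['15-1212', '15-1244', '15-1299']
```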

Identifying AI and Cyber Educational Programs:

We analyzed the post-secondary education histories in the identified emerging technology talent profiles. In particular, we focused on Classification of Instructional Programs (CIP) codes, a taxonomy maintained by the National Center for Education Statistics (NCES) to identify instructional program specialties within educational institutions. We used 6-digit CIP codes because they are the most detailed level for instructional programs.

We tested the following three approaches to identify the most relevant CIP codes for AI and Cyber talent:

  • Counting all degree programs within our emerging technology talent employee profiles
  • Fractional counting for all degree programs mentioned in each emerging technology talent employee profile
  • Only counting the highest educational degree in each emerging technology talent employee profile
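The three counting approaches differ only in how much weight each degree in a profile receives. A sketch over hypothetical profiles, where each profile is a list of `(cip_code, degree_level)` pairs; the data layout and the `DEGREE_RANK` ordering used to pick the highest degree are assumptions:

```python
from collections import Counter

# Assumed ordering used to pick each profile's highest degree.
DEGREE_RANK = {"bachelor": 1, "master": 2, "doctoral": 3}

# Hypothetical profiles: each is a list of (cip_code, degree_level) pairs.
profiles = [
    [("11.0701", "bachelor"), ("11.0701", "master")],
    [("14.1001", "bachelor"), ("52.0201", "master")],
    [("11.0103", "bachelor")],
]


def count_all(profiles):
    """Approach 1: every degree program counts once."""
    counts = Counter()
    for degrees in profiles:
        for cip, _level in degrees:
            counts[cip] += 1
    return counts


def count_fractional(profiles):
    """Approach 2: each profile's weight of 1 is split evenly across its degrees."""
    counts = Counter()
    for degrees in profiles:
        if not degrees:
            continue
        weight = 1 / len(degrees)
        for cip, _level in degrees:
            counts[cip] += weight
    return counts


def count_highest(profiles):
    """Approach 3: only the highest degree in each profile counts."""
    counts = Counter()
    for degrees in profiles:
        if degrees:
            cip, _level = max(degrees, key=lambda d: DEGREE_RANK[d[1]])
            counts[cip] += 1
    return counts


for method in (count_all, count_fractional, count_highest):
    print(method.__name__, method(profiles).most_common(3))
```

In the fractional variant each profile contributes a total weight of one, so profiles that list many degrees do not dominate the counts.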

All three methods produced the same set of top CIP codes for AI and Cyber profiles. Because frequencies drop sharply after these ranks, we include the top 8 CIP codes for AI talent and the top 5 for Cyber talent. Tables 2 and 3 list the final AI- and Cyber-relevant CIP codes.
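The documentation does not say how a "significant drop" was measured. One simple assumption is to cut the ranked list just before the largest relative drop between neighboring counts; the function name and the counts below are made up for illustration.

```python
def cutoff_at_largest_drop(ranked_counts):
    """Given counts sorted in descending order, return how many to keep,
    cutting just before the largest relative drop between neighbors."""
    if len(ranked_counts) < 2:
        return len(ranked_counts)
    best_k, best_ratio = len(ranked_counts), 1.0
    for i in range(len(ranked_counts) - 1):
        higher, lower = ranked_counts[i], ranked_counts[i + 1]
        ratio = lower / higher if higher else 1.0
        if ratio < best_ratio:  # smaller ratio means a steeper drop
            best_k, best_ratio = i + 1, ratio
    return best_k


counts = [900, 400, 380, 350, 300, 60, 50, 40]
print(cutoff_at_largest_drop(counts))  # 5
```

With this rule a flat list keeps every entry, since no ratio falls below 1.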

Table 2: AI-relevant CIP codes and names
CIP 6 Code | CIP 6 Name
11.0701 | Computer Science
14.1001 | Electrical and Electronics Engineering
52.0201 | Business Administration and Management, General
14.1901 | Mechanical Engineering
11.0103 | Information Technology
14.0101 | Engineering, General
14.0901 | Computer Engineering, General
52.1201 | Management Information Systems, General

Table 3: Cyber-relevant CIP codes and names
CIP 6 Code | CIP 6 Name
11.0701 | Computer Science
52.0201 | Business Administration and Management, General
11.0103 | Information Technology
14.1001 | Electrical and Electronics Engineering
52.1201 | Management Information Systems, General

Identifying federal government demand:

We identify federal government job postings within the AI and Cyber job postings subsets by tagging postings whose source is usajobs.gov.
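Tagging federal demand is a string match on the posting source. A sketch, assuming each posting carries a list of source values that may be bare domains or full URLs (the `sources` field is hypothetical), and treating every untagged posting as non-government demand:

```python
from urllib.parse import urlparse

FEDERAL_SOURCE = "usajobs.gov"


def source_domain(source: str) -> str:
    """Normalize a source value (bare domain or full URL) to a lowercase host name."""
    host = urlparse(source).hostname if "://" in source else source
    return (host or "").strip().lower().rstrip(".")


def is_federal(source: str) -> bool:
    """True for usajobs.gov or any of its subdomains (e.g. www.usajobs.gov)."""
    domain = source_domain(source)
    return domain == FEDERAL_SOURCE or domain.endswith("." + FEDERAL_SOURCE)


# Hypothetical AI postings; the `sources` field name is an assumption.
postings = [
    {"id": 1, "sources": ["usajobs.gov"]},
    {"id": 2, "sources": ["example-jobs.com"]},
    {"id": 3, "sources": ["https://www.usajobs.gov/job/123"]},
]

federal = [p for p in postings if any(is_federal(s) for s in p["sources"])]
demand_government = len(federal)
demand_non_government = len(postings) - demand_government
print(demand_government, demand_non_government)  # 2 1
```

Matching on the parsed host rather than a substring avoids tagging look-alike domains such as `notusajobs.gov`.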

Identifying top 5 SOC Job Titles:

For each talent type and region, we aggregate the AI and Cyber job postings by SOC-5 code and title and identify the five most frequent SOC codes. We repeat the same process for the AI and Cyber profiles.
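A sketch of the top-5 aggregation, grouping by talent type and region and counting `(SOC code, title)` pairs with `collections.Counter.most_common`, which also returns each pair's frequency. The record fields and counts are hypothetical:

```python
from collections import Counter, defaultdict


def top_soc_by_region(records, n=5):
    """Return the n most frequent (SOC code, title) pairs per (talent, region)."""
    groups = defaultdict(Counter)
    for r in records:
        groups[(r["talent"], r["region"])][(r["soc"], r["soc_name"])] += 1
    return {key: counter.most_common(n) for key, counter in groups.items()}


def posting(soc, name):
    # Hypothetical record layout for an AI posting in one region.
    return {"talent": "AI", "region": "Example CBSA", "soc": soc, "soc_name": name}


records = (
    [posting("15-1252", "Software Developers")] * 3
    + [posting("15-2051", "Data Scientists")] * 2
    + [posting("15-1221", "Computer and Information Research Scientists")]
)

for (talent, region), top in top_soc_by_region(records).items():
    print(talent, region, top)
```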

Identifying number of graduates:

We count all graduates whose major corresponds to one of our AI or Cyber CIP codes and group them by degree level. We then map these counts to the physical location of each educational institution, excluding programs and universities that operate fully online.
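A sketch of the graduate counts, using the AI CIP codes from Table 2. The record layout (`unitid`, `cip`, `level`, `graduates`), the `fully_online` flag, and the mapping of each institution to a single region are hypothetical simplifications; in practice an institution belongs to a state and possibly also a CBSA.

```python
from collections import defaultdict

# AI-relevant CIP codes from Table 2.
AI_CIPS = {"11.0701", "14.1001", "52.0201", "14.1901",
           "11.0103", "14.0101", "14.0901", "52.1201"}
LEVELS = {"Bachelor's", "Master's", "Doctoral"}


def graduate_counts(completions, institutions, cip_set):
    """Sum graduates in cip_set by (region, institution, level), plus a 'Total' row.

    Institutions flagged as fully online are skipped, as are degree levels
    outside bachelor's, master's, and doctoral.
    """
    totals = defaultdict(int)
    for row in completions:
        inst = institutions[row["unitid"]]
        if inst["fully_online"] or row["cip"] not in cip_set or row["level"] not in LEVELS:
            continue
        key = (inst["region"], inst["name"])
        totals[key + (row["level"],)] += row["graduates"]
        totals[key + ("Total",)] += row["graduates"]
    return dict(totals)


# Hypothetical institutions and completions records.
institutions = {
    1: {"name": "Univ A", "region": "Example CBSA", "fully_online": False},
    2: {"name": "Online U", "region": "Example CBSA", "fully_online": True},
}
completions = [
    {"unitid": 1, "cip": "11.0701", "level": "Bachelor's", "graduates": 120},
    {"unitid": 1, "cip": "11.0701", "level": "Master's", "graduates": 40},
    {"unitid": 1, "cip": "14.1001", "level": "Bachelor's", "graduates": 30},
    {"unitid": 1, "cip": "26.0101", "level": "Bachelor's", "graduates": 80},  # not an AI CIP
    {"unitid": 2, "cip": "11.0701", "level": "Bachelor's", "graduates": 500},  # fully online
]

totals = graduate_counts(completions, institutions, AI_CIPS)
for key, count in sorted(totals.items()):
    print(key, count)
```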

Final Consolidation:

We merge the job postings and profiles data with the AI and Cyber graduate counts for each geographic region.
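The documentation does not state the join type. A sketch assuming an outer join on `(region, talent type)`, so that a region with job postings but no in-person institutions still appears, with missing education fields set to `None`; the function and field names are hypothetical:

```python
def merge_region_metrics(workforce, education, edu_fields=("graduates_total",)):
    """Outer-join workforce and education metrics keyed by (region, talent type)."""
    merged = {}
    for key in sorted(set(workforce) | set(education)):
        row = dict(workforce.get(key, {}))
        edu = education.get(key, {})
        for field in edu_fields:
            row[field] = edu.get(field)  # None when the region has no graduate data
        merged[key] = row
    return merged


# Hypothetical inputs keyed by (region, talent type).
workforce = {
    ("Example CBSA", "AI"): {"demand_all": 250, "supply": 1200},
    ("Other CBSA", "AI"): {"demand_all": 40, "supply": 300},
}
education = {("Example CBSA", "AI"): {"graduates_total": 190}}

merged = merge_region_metrics(workforce, education)
for key, row in merged.items():
    print(key, row)
```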

Maintenance:

How are the data updated?

Workforce metrics from Lightcast are updated monthly and education metrics from NCES IPEDS are updated annually.

Credits:

  • Data collection and analysis: Jacob Feldgoise, Sonali Subbu Rathinam
  • Engineering: Jacob Feldgoise, Sonali Subbu Rathinam
  • Review: Jacob Feldgoise, Katherine Quinn
  • Documentation: Sonali Subbu Rathinam

The PATHWISE tool is based on work supported by CSET’s partnership with the NobleReach Foundation.

Major change log

2025-10-30: Initial release
