Documentation: AGORA
AGORA is in beta. This documentation is incomplete. We are in the process of compiling full documentation for AGORA and its underlying dataset. This interim documentation is provided for convenience and is not final. Feel free to contact us with any questions not answered here.

Overview

What is this tool?

AGORA (AI GOvernance and Regulatory Archive) is a living collection of AI-relevant laws, regulations, standards, and other governance documents ("instruments") from the United States and around the world. AGORA includes summaries, instrument text, thematic tags, and filters to help you quickly discover and analyze key developments in AI governance. Its easy-to-use interface includes plain-English summaries, detailed metadata, and full text for hundreds of AI-focused laws and policies, with new data being added continuously.

What can I use it for?

  • Find recent laws, policies, standards, and similar documents - termed "instruments" in AGORA - from a wide range of jurisdictions and organizations, including governments and private companies.
  • Quickly get up to speed on specific documents of interest using AGORA's plain-English summaries - or dive into the full text of any document.
  • Sort and filter instruments by criteria including keyword, date, jurisdiction, and significance.
  • Use AGORA's original thematic taxonomy to find instruments related to specific AI governance themes, from evaluation and information disclosure to public investment.
  • Identify instruments that address specific applications of AI, such as medicine, finance, or military uses.
  • Analyze AGORA data in depth using our bulk data export.

What are its most important limitations?

  • AGORA is in beta. Many features are incomplete, data is being continuously added, and all aspects of the tool are subject to change.
  • AGORA’s focus on instruments that directly address AI means that many AI-relevant laws, regulations, and norms are excluded by design. In particular, laws of general applicability - for example, securities regulation, civil rights law, or even common-law tort doctrines - have implications for modern AI. However, this body of often older, AI-relevant general law is a potentially unbounded set, and the applicability of any particular instrument may be debated. Accordingly, we exclude these instruments from AGORA’s scope. Though this makes AGORA a less comprehensive guide to the overall AI governance landscape, it reduces uncertainty over the bounds of the resource. Also, AI-focused implementations of more general laws - such as agency regulations or guidance explaining how broad existing authorities will be applied to AI - are in scope for AGORA, reducing the impact of the exclusion. Nonetheless, this limitation means AGORA cannot substitute for careful region-, context-, and sector-specific analysis of regulations relevant to particular groups, organizations, or AI use cases, though it can enhance such analysis.
  • Many other instruments that do meet AGORA's scope are not yet included. AGORA’s nominal scope is far broader than the set of documents collected to date. In particular, the current dataset skews heavily toward U.S. law and policy. We plan to experiment with automation (discussed below) and seek more resources to increase processing volume and multi-language coverage. In the meantime, we have focused annotators’ efforts on especially high-profile or consequential instruments (e.g., adopted rather than proposed policies) and on sets of instruments of particular interest to current AGORA stakeholders.
  • Annotators exercise judgment in applying AGORA’s scope and taxonomy, raising the risk of inconsistent screening and tagging. To mitigate these risks to reliability and rigor, we provide detailed conceptual definitions, along with examples and decision heuristics. Further, we implement a dual annotation process (initial annotation followed by validation and reconciliation) for each instrument, with disagreements elevated to AGORA leadership and resolved according to defined procedures. Finally, we maintain a searchable central repository of prior questions and answers and make it accessible to all AGORA annotators online.
  • Longer or more thematically diverse instruments may have many associated thematic tags, and it may not be immediately clear which parts of the instruments justify which tags. AGORA’s taxonomy currently applies to instruments as a whole. At present, annotators are tagging some of these instruments at the section or subsection level, but the resulting granular data are not yet integrated into the AGORA interface and public dataset. Over time, we plan to increase the proportion of instruments coded at the section or subsection level and integrate the outputs into the AGORA interface and dataset.

What are its sources?

AGORA runs on an original ETO dataset. The contents of the dataset are compiled from official sources, then screened, processed, and annotated according to original methods developed by ETO and Purdue GRAIL. Read more >>

Does it contain sensitive information, such as personally identifiable information?

No.

What are the terms of use?

The AGORA tool and data are subject to ETO's general terms of use. If you use the tool, please cite us.

How do I cite it?

AGORA is in beta. Data and features are subject to change. In general, we don't recommend citing AGORA during the beta period. Feel free to contact us with any questions.

If you do use data from AGORA in your work, please cite "Emerging Technology Observatory AGORA" and include a link to the tool.

Sources and methodology

[Figure: The AGORA data production workflow - collection, screening, compilation, annotation, validation, and incorporation into the final dataset.]

Scope

AGORA includes laws, regulations, standards, and similar instruments that directly and substantively address the development, deployment, or use of artificial intelligence technology. The intent of this scoping definition is to encompass the large majority of instruments created by lawmakers, regulators, and standard-setters in direct response to advances in modern machine learning and related technologies.

Applying subjective elements of this definition, such as “directly and substantively,” inevitably involves judgment. When screening documents for inclusion in AGORA, we try to constrain this judgment by defining heuristics; for example, screeners are instructed that an instrument does not “substantively” address artificial intelligence if it only mentions AI in contextual or non-operative language, e.g., “findings of Congress” provisions in bills, passing mentions, or Federal Register explanatory text accompanying a new regulation. Screeners are also instructed that instruments addressing AI-related concepts such as machine learning, machine autonomy, or algorithmic decision making should generally be considered in scope; that is, the presence or absence of the specific term “artificial intelligence” is not determinative. Accordingly, instruments addressing technologies such as autonomous vehicles and synthetic media are potentially within AGORA’s scope (and indeed, the current dataset includes instruments related to these topics).
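
These heuristics are qualitative and applied by human screeners, but their core logic can be loosely illustrated in code. The sketch below is a hypothetical simplification - the term list and the separation of operative from contextual text are our own illustration, not part of AGORA's actual tooling:

    # Hypothetical illustration of the screening logic described above.
    # The term list and "operative text" split are simplified assumptions.
    AI_TERMS = {
        "artificial intelligence", "machine learning",
        "machine autonomy", "algorithmic decision making",
    }

    def mentions_ai(text: str) -> bool:
        lowered = text.lower()
        return any(term in lowered for term in AI_TERMS)

    def in_scope(operative_text: str) -> bool:
        # An instrument qualifies only if AI-related language appears in its
        # operative text; contextual mentions (findings, preambles,
        # explanatory material) are disregarded entirely.
        return mentions_ai(operative_text)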

👉
Read the full set of current AGORA scoping heuristics and accompanying guidance here.

Critically, the requirement that instruments “directly” address artificial intelligence generally excludes laws predating the rise of modern machine learning, even if they are broad enough in scope to bear on AI. We draw this line to ensure that AGORA’s scope is manageable in practice and to reinforce the dataset’s emphasis on policies created in response to 21st-century developments in AI, rather than the entire set of policy instruments that may affect individual sectors and governance writ large. Note, however, that more recent instruments that tailor these broad laws to the specific context of AI would qualify for inclusion in AGORA. For example, while the Civil Rights Act of 1964 would not be included in AGORA, a related federal regulation or guidance document applying the Act to racially discriminatory AI is within AGORA’s scope.

Collection, screening, and compilation

Candidate documents for inclusion in AGORA are currently collected manually or using semi-automated means (e.g., saved queries against larger datasets; one such query is sketched after the list below) from a wide range of official and unofficial sources, reflecting the decentralized, largely ad-hoc status of current AI governance tracking. These sources include:

  • Official, general-purpose regulatory compilations, such as the Congress.gov service for United States federal legislation and the Federal Register for United States federal regulation, and comparable subnational sources.
  • Unofficial compilations of law and policy relevant to AI, digital issues, or related topics. Examples include the International Association of Privacy Professionals (IAPP) Global AI Law and Policy Tracker, the OECD AI Policy Observatory, and the OCEANIS collection of standards.
  • Informal lists compiled by researchers, typically focused on particular topics or scopes of interest, such as criminal law, frontier model governance, ethical frameworks, industry standards, or Chinese AI regulation.
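
For example, a saved query against the Federal Register's public API might look like the sketch below (the search term and result handling are illustrative; this is not AGORA's actual collection code):

    # Illustrative saved query against the Federal Register API.
    # Not AGORA's actual collection code.
    import requests

    resp = requests.get(
        "https://www.federalregister.gov/api/v1/documents.json",
        params={"conditions[term]": "artificial intelligence", "per_page": 20},
        timeout=30,
    )
    resp.raise_for_status()
    for doc in resp.json()["results"]:
        print(doc["publication_date"], doc["title"], doc["html_url"])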

Human screeners review these sources (periodically, in the case of sources that update) and assess each instrument against the AGORA scoping definition.

For instruments judged to be in scope, screeners locate the authoritative text of the instrument (for example, on the official website of the United States Congress or a state legislature) and use it to populate basic metadata such as title and date of introduction. Annotators also identify “packages”: larger, thematically diverse instruments containing AI-related portions amidst other, AI-unrelated material. A typical example is the annual National Defense Authorization Act (NDAA) in the United States, a massive, largely AI-unrelated law with diverse AI-related provisions sprinkled throughout in recent years. NDAAs and other such packages are divided into conceptually discrete AGORA instruments, corresponding to sections, subsections, or other subdivisions in the packages, according to standing guidance (reproduced in the appendices).

Annotation and validation

Using the basic metadata and authoritative text compiled during the screening process, AGORA annotators generate summaries and thematic codes for each in-scope instrument using the AGORA taxonomy and further instructions provided in the AGORA codebook. A custom-built Airtable interface structures the annotator workflow and facilitates quick and accurate annotation.

AGORA’s summaries are meant mainly to help users skim and sift, rather than to serve as an analytic resource in themselves; the codebook provides brief instructions for short- and long-form summaries, but significant discretion is left to annotators. We recently began using large language models to provide "first drafts" of summaries for annotators to review. In the AGORA interface, an alert appears next to any machine-generated summary that is still awaiting human review.
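
AGORA has not published the details of this drafting pipeline; the sketch below shows the general pattern using the OpenAI Python SDK, with the model choice and prompt wording as our own assumptions:

    # Hypothetical sketch of LLM-drafted summaries for annotator review.
    # The model name and prompt are assumptions, not AGORA's actual pipeline.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def draft_summary(instrument_text: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": "Summarize this legal instrument in plain English "
                            "in two to three sentences for a general audience."},
                {"role": "user", "content": instrument_text},
            ],
        )
        # Drafts remain flagged in the interface until a human reviews them.
        return response.choices[0].message.content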

After initial annotation, a second annotator (designated the “validator”) reviews each instrument in full and discusses any disagreements with the initial annotator. Note that these are not fixed roles; each AGORA annotator serves as initial annotator on some instruments and as validator on others.

Once all issues identified in validation have been resolved, the instrument’s record, consisting of validated metadata, short and long summaries, and thematic codes, is marked complete. Complete records are periodically added to the public AGORA dataset and web interface using an automated script.
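
The script itself is internal, but as a loose illustration, exporting validated records from an Airtable base might look like the sketch below (the base, table, and field names are hypothetical):

    # Hypothetical sketch of the publication step using the pyairtable
    # library. Base, table, and field names are illustrative only.
    import json
    from pyairtable import Api

    api = Api("YOUR_AIRTABLE_TOKEN")
    table = api.table("appXXXXXXXXXXXXXX", "Instruments")

    # Pull only records whose status field marks them as complete.
    complete = table.all(formula="{Status} = 'Complete'")
    records = [r["fields"] for r in complete]

    with open("agora_public_dataset.json", "w") as f:
        json.dump(records, f, indent=2)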

The AGORA thematic taxonomy

AGORA includes a conceptual taxonomy that is inspired by scholarly and policy literature, but intended to be useful to a wide range of potential users and reasonably intuitive to both those users and AGORA annotators. This necessarily involves balancing comprehensiveness with parsimony and interpretability. The taxonomy was drafted by an interdisciplinary team with training in law, data engineering, public policy, political science, AI governance, and quantitative and qualitative social science methods, with input from potential AGORA users in government, academia, and the private sector, and has been refined iteratively based on annotator and user feedback.

The taxonomy consists of discrete concepts (“codes”) organized into five overarching domains:

  • Risk factors governed: The characteristics addressed by the instrument that affect AI systems’ propensity to cause harm, closely related to ethical concerns. Broadly assessing these characteristics is a focus of many current AI policy efforts, while others target specific risk characteristics such as bias or insecurity. AGORA’s risk codes are adapted from the risk categories outlined in the widely used AI Risk Management Framework issued by the U.S. National Institute of Standards and Technology.
  • Harms addressed: The potential harmful consequences of the development or use of AI that the instrument means to prevent. Per AGORA’s definitions, “harms” are the consequences of “risks” (prior bullet). For example, an AI system that is insecure or biased (risk characteristics) might end up causing physical harm or financial loss (harms). AGORA’s harm codes are adapted from categories of harm developed by CSET researchers for use with the AI Incident Database.
  • Governance strategies: The means provided in the instrument to address, assess, or otherwise act with respect to the development, deployment, or use of AI. These are essentially the proposed solutions to the policy problems or goals articulated. They comprise the largest group of codes in AGORA, reflecting the wide range of regulatory tools and tactics being used to address AI and its challenges, from disclosure and evaluation requirements to convening, institutional creation, and pilot and testbed creation. AGORA’s governance strategy codes were developed iteratively by the AGORA leadership team over the first several months of annotation.
  • Incentives for compliance: The types of incentives provided for people, organizations, etc., to comply with the requirements of the instrument. This small group of codes covers positive and negative incentives commonly seen in AI-related statutes and regulations, such as subsidies and fines, respectively.
  • Application domains addressed: Any economic or social sectors, such as healthcare or defense, specifically addressed in the instrument as contexts for AI development, deployment, or use. AGORA’s application codes are adapted from the TINA industry taxonomy used in prior analyses of AI-related investment and are comparable in granularity to 2-digit NAICS codes, with one exception: given significant attention to the use of AI by governments in particular, the code for government applications of AI is divided into subcategories, allowing for more precise analysis.

Currently, these categories encompass 77 codes in total. To facilitate consistent annotation of abstract concepts, most codes are given detailed definitions. For more complex codes, annotators are also given examples of qualifying instrument text, along with related keywords, exceptions, and other considerations influencing interpretation.

During the annotation process, annotators read each instrument in full, then decide whether each of the 77 codes in the AGORA taxonomy applies at any point in the instrument, based on the definition and (where available) examples and keywords provided in the codebook. In deciding, annotators are instructed to consider only the operative text of each instrument; to focus on what the instrument explicitly states or clearly and directly implies; and to ignore material unrelated to artificial intelligence.
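
The result of this process is, in effect, a set of yes-or-no judgments for each instrument against each code. A completed record might collect the codes that apply, grouped by domain, as in the hypothetical sketch below (the field and code names are illustrative, not AGORA's published schema):

    # Hypothetical representation of one coded instrument. Field and code
    # names are illustrative, not AGORA's published schema.
    coded_instrument = {
        "title": "Example State AI Hiring Law",
        "jurisdiction": "United States - State",
        "risk_factors": ["Bias"],
        "harms": ["Discrimination", "Financial loss"],
        "governance_strategies": ["Disclosure requirements",
                                  "Evaluation requirements"],
        "incentives": ["Fines"],
        "application_domains": ["Labor and employment"],
    }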

👉
Read the AGORA codebook, including detailed definitions for all concepts in the AGORA taxonomy.

Data export

A bulk export of all data available through the AGORA interface is under development. We expect the schema for instrument metadata will look something like this, with instrument full text provided in a separate file. If you're interested in early access to this resource, please contact us.
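
As a hypothetical example of what downstream analysis could look like once the export is available (the file and column names below are assumptions, not the final schema):

    # Hypothetical analysis of the bulk export. The file name and column
    # names are assumptions; the final schema may differ.
    import pandas as pd

    instruments = pd.read_csv("agora_instruments.csv")
    instruments["date"] = pd.to_datetime(instruments["date"])

    # Count recent instruments by jurisdiction.
    recent = instruments[instruments["date"] >= "2023-01-01"]
    print(recent["jurisdiction"].value_counts())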

Maintenance

How is it updated?

During the beta period, AGORA is updated frequently, but on an ad-hoc basis.

How can I report an issue?

Use our general issue reporting form.

Roadmap

During the beta period, our priorities include:

Interface

  • Building a "collection view" that organizes instruments into thematic collections (e.g., facial recognition, NDAA provisions, state laws) for easy browsing.
  • Enriching full-text display with term highlighting and thematic tagging and summarization at the section level.
  • Making thematic tags easier to browse and select as filters in the main view and the single-instrument view.
  • Making it easier to filter records by country and by specific authorities (e.g., specific states or regulatory agencies).

Dataset

  • Incorporating new instruments from various jurisdictions and sources, aiming for full ongoing coverage of United States federal laws (proposed and enacted), federal regulations, and enacted state laws as well as extensive coverage of instruments from other jurisdictions.
  • Adding a new set of thematic tags corresponding to specific types of AI (e.g., language models, synthetic media).
  • Developing a bulk data export.
  • Experimenting with LLM-based thematic tagging at the instrument and section level.
  • Integrating the AGORA data production pipelines with available APIs from official sources and other data providers.
👉
Help shape AGORA! We welcome input on our roadmap and would love to hear what you want to do with AGORA. Please contact us to set up a chat.

Credits

AGORA is a project of the Emerging Technology Observatory with support from Purdue GRAIL.

Interface

  • Concept and design: Zach Arnold, Brian Love, Niharika Singh
  • Engineering: Jennifer Melot, Brian Love, Niharika Singh
  • Documentation: Zach Arnold, Daniel Schiff

Dataset

  • Schema, codebook and taxonomy: Zach Arnold, Daniel Schiff, Kaylyn Jackson Schiff
  • Editors: Lindsay Jenkins, Ashley Lin, Konstantin Pilz
  • Annotators: Zoe Borden, Eileen Chen, Ogadinma Enwereazu, Ari Filler, Diya Kalavala, Mallika Keralavarma, Kieran Lee, Sophia Lu, Anjali Nookala, Maya Snyder, Jayan Srinivasan, Alina Thai, Julio Wang

Major change log

6/28/24: Beta release