A chair is for sitting. A clock is for telling time. To look at these objects is to understand their primary use. Until recently, AI was, in most cases, similar: a technology whose design and use were closely linked. A facial recognition system recognized faces; a spellchecker checked spelling. Today, though, with the advent of powerful “transformer models,” a single AI application can (at least in appearance) be put to countless ends: to write poetry, evaluate a resume, identify bird species, and diagnose diseases. As possible use cases become broader, so do the potential risks, which now range from the malicious, such as generating propaganda or sexual images of children, to the inadvertent, such as providing misleading election or health information.
With these advances, companies and governments are rapidly integrating AI into new systems and domains (Knight, 2023). In response, policymakers are scrambling to regulate AI in order to mitigate its risks and maximize its potential benefits. This has manifested in a flurry of political activity, which in the US alone includes dozens of proposed federal bills, a small number of state laws and hundreds more state bills, the longest executive order ever issued, and a tide of regulatory guidance.
However, when designing new regulations, policymakers face an empirical dilemma: they must regulate AI with little to no access to real-world data on how people and businesses are using these systems. Unlike social media and the internet, where user behavior is often public and leaves observable data traces, general-purpose AI systems are largely accessed through private one-on-one interactions, such as chatbot conversations. AI companies collect user interaction data but are reluctant to share it, even with vetted researchers, due to privacy concerns (Bommasani et al., 2024; Sanderson & Tucker, 2024). Instead, companies allow researchers and other external parties to probe their systems for vulnerabilities through practices such as red-teaming (Friedler et al., 2023). While these methods can help prevent AI systems from being used for the worst possible purposes, they do not offer empirical insight into the harms users experience in the real world.
The lack of available empirical information about how people use general-purpose AI systems makes it extremely challenging to develop evidence-informed policy. Three potential methods can help address this use-case information gap, each with its own benefits and challenges:
Data donations: Users can voluntarily share data about their own interactions with AI systems directly with researchers (Sanderson & Tucker, 2024). Researchers can also enable users to donate data directly, typically through browser extensions, without needing permission or support from companies (Shapiro et al., 2021). Data donations raise few privacy concerns but may introduce sampling bias, since those with the interest and technical skills to donate their data may not represent AI users writ large (van Driel et al., 2022).
Transparency reports: AI companies can analyze data about how people use their systems and share their findings with the public (Bommasani et al., 2024; Vogus & Llansó, 2021). Companies can solicit feedback from experts in high-risk domains, such as healthcare, about what information would be most useful. Transparency reports raise little privacy risk, but their methodologies and level of detail can be opaque, potentially serving company interests (Parsons, 2017).
Direct access to log data: Companies could provide researchers with access to log data directly, or indirectly by running queries on researchers' behalf, either voluntarily or under a potential legal mandate (Lemoine & Vermeulen, 2023). Direct access poses significant privacy risks that technical interventions might partially mitigate, though perhaps not enough to justify the practice on their own (one such intervention is sketched below). Companies may also resist granting direct data access because it could jeopardize their reputation or expose trade secrets.
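To make the technical interventions mentioned above concrete, the following sketch (in Python) shows one common idea in miniature: rather than handing over raw chat logs, a company could answer a researcher's query with aggregate counts perturbed by Laplace noise calibrated to a differential-privacy budget. This is a minimal illustration under assumed parameters, not a description of any company's actual practice; the topic labels, counts, and epsilon value are invented for the example.

    import random

    def laplace_noise(scale: float) -> float:
        # The difference of two i.i.d. exponential draws is Laplace(0, scale).
        lam = 1.0 / scale
        return random.expovariate(lam) - random.expovariate(lam)

    def noisy_topic_counts(counts: dict[str, int], epsilon: float = 1.0) -> dict[str, float]:
        # Add Laplace noise calibrated for epsilon-differential privacy,
        # assuming sensitivity 1 (each conversation contributes to at most
        # one topic count), so the noise scale is 1 / epsilon.
        scale = 1.0 / epsilon
        return {topic: count + laplace_noise(scale) for topic, count in counts.items()}

    # Hypothetical aggregate a company might release instead of raw logs.
    raw_counts = {"health": 1204, "elections": 311, "job_applications": 958}
    print(noisy_topic_counts(raw_counts, epsilon=0.5))

Smaller values of epsilon add more noise and give stronger privacy guarantees at the cost of accuracy; such aggregates reduce, but do not eliminate, the privacy concerns discussed above.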
This paper proceeds in three parts: first, it describes why closing this information gap is important and the challenges associated with doing so; second, it details the three approaches outlined above; finally, it offers recommendations for implementing these approaches in ways that benefit researchers and, ultimately, the public, while safeguarding users' privacy.
Definitions and Scope
This paper focuses on researcher access to chat logs from popular consumer-facing general-purpose applications built by foundation model developers, such as OpenAI's ChatGPT, Google's Gemini, and Anthropic's Claude. This focus reflects practical considerations rather than a judgment of relative importance: these applications are likely to have significant societal effects, and their developers are the most likely to have the resources and infrastructure necessary to make usage data available to researchers. Trade secrecy concerns are largely beyond the scope of this study.
For a definition of “general-purpose,” this paper borrows from the EU AI Act, which describes a general-purpose AI (GPAI) model as an AI model “trained with a large amount of data using self-supervision at scale” that displays “significant generality,” is “capable of competently performing a wide range of distinct tasks regardless of the way the model is placed on the market,” and can be “integrated into a variety of downstream” applications. Although concepts like “generality” and “capability” remain debated, this paper focuses on chatbot applications designed to cover the broadest range of domains, rather than narrower uses such as customer-service chatbots.
Clarifying "AI systems," focusing primarily chatlogs text media content messages responses limited revealing context usage example asking write email unpaid payment phishing scam navigating awkward conversation associate money discussed later exposing personal identifiable challenging conceal researchers.
Use-case metadata, which encompasses details about a conversation such as timestamps, session identifiers, model versions, error logs, policy violations and refusals, and user actions such as regenerating a response or flagging content, carries a high risk of re-identification and is outside the scope of this study (an illustrative record is sketched below).
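As a rough illustration of the distinction drawn in this section, the Python sketch below separates the message content treated here as in scope from the metadata fields treated as out of scope. The record structure, field names, and values are hypothetical and are not drawn from any provider's actual logging schema.

    from dataclasses import dataclass
    from datetime import datetime, timezone

    @dataclass
    class ChatTurn:
        # The message content this paper treats as in scope for researcher access.
        user_message: str
        assistant_response: str

    @dataclass
    class ChatLogRecord:
        # A single logged conversation. `turns` holds the content discussed above;
        # the remaining fields are the kind of metadata (timestamps, session
        # identifiers, model versions, refusals, user actions) treated here as
        # out of scope because of re-identification risk.
        turns: list[ChatTurn]
        timestamp: datetime
        session_id: str
        model_version: str
        refusal_triggered: bool = False
        flagged_by_user: bool = False
        response_regenerated: bool = False

    # Illustrative record with placeholder values.
    record = ChatLogRecord(
        turns=[ChatTurn("Help me draft an email about an unpaid invoice.",
                        "Here is a polite draft you could adapt...")],
        timestamp=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
        session_id="sess_000",
        model_version="model-v0",
    )
    print(record.session_id, len(record.turns))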