Volume 28 - Symposium Issue | Yale Journal of Law & Technology

Anonymity, Consent, And Other Noble Lies: An Empirical Study of The Data Economy

Authors:

Joel Reardon, Serge Egelman, Kenneth A. Bamberger & Laurel E. McGrane

Volume:

28

Issue:

Spring

Starting Page Number:

432

Year:

2026

Preview:

While legal scholars have cited decades of computer science research that demonstrates why anonymity is hard (and that datasets should not be labelled as “anonymous” cavalierly), industry and legal practitioners have not heeded those warnings: many organizations trafficking in consumer data continue to assert to customers, courts, and regulators that their data is anonymous or “deidentified.” We acquired datasets from multiple data brokers to demonstrate empirically why this is false. Using publicly available email addresses found in data breaches posted on the Internet, we trivially reidentified 88% of the hashed email addresses that we obtained; using modern password-cracking techniques, we were able to reidentify 97% of the 6 million email addresses that we collected. Reidentifying hashed email addresses need not rely on illicit data or specialized hardware: by constructing rainbow tables with synthetic data representative of typical email addresses, we reidentified most of the hashed email addresses. In all cases, the hashed email addresses were linked to other device-based identifiers (e.g., mobile device advertising IDs, IPs, etc.), demonstrating why device-based identifiers have long been considered personally identifiable information. Relatedly, organizations trafficking in this data make another assertion: that this data was collected from consumers with their consent. To evaluate this claim, we performed a survey (n=369), in which we emailed a subset of the reidentified individuals in our datasets to recruit them to participate. This survey asked participants about their recollections of having provided consent (99% had no recollection) and their feelings about the sale of their information (94% were opposed, while 77% said they planned to submit deletion requests).
Overall, our study shows that hashed email addresses and device identifiers do not come close to meeting commonly understood definitions of “anonymous” or “deidentified” data, and that any notion of “consent” must also involve a similarly tortured definition. We argue that this industry and its defenders are not simply misinformed or indifferent to the veracity of their statements, but that this is an example of Plato’s “noble lie”: their entire social order relies on these demonstrably untrue statements being believed by courts, regulators, policymakers, and the public.
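The preview does not describe the authors' actual pipeline, so what follows is only a minimal Python sketch of the general idea behind reidentifying hashed email addresses. It assumes the broker hashed trimmed, lowercased addresses with unsalted SHA-256; that hash choice, the normalization step, the name and domain lists, and every function name here are hypothetical. It also uses a plain precomputed lookup table rather than a true rainbow table (which saves storage by chaining hash and reduction functions), but the principle is the same: unsalted hashing is deterministic, so anyone who can guess an address can compute its hash and look it up.

```python
import hashlib
from itertools import product


def normalize(email: str) -> str:
    """Trim whitespace and lowercase (an assumed pre-hashing convention)."""
    return email.strip().lower()


def hash_email(email: str) -> str:
    """Unsalted hex SHA-256 of the normalized address (hash choice is assumed)."""
    return hashlib.sha256(normalize(email).encode("utf-8")).hexdigest()


def synthetic_candidates(first_names, last_names, domains):
    """Yield plausible addresses built from hypothetical name and domain lists."""
    for first, last, domain in product(first_names, last_names, domains):
        for local in (f"{first}.{last}", f"{first}{last}", f"{first[0]}{last}"):
            yield f"{local}@{domain}"


def build_lookup_table(candidates):
    """Precompute a hash -> plaintext map for every candidate address."""
    return {hash_email(c): normalize(c) for c in candidates}


def reidentify(hashed_emails, table):
    """Return {hash: email} for each hashed address that appears in the table."""
    return {h: table[h] for h in hashed_emails if h in table}


if __name__ == "__main__":
    # Hypothetical broker records: hashed email -> linked advertising ID.
    broker_records = {
        hash_email("Jane.Doe@example.com"): "ad-id-1",
        hash_email("zq9x@example.org"): "ad-id-2",
    }
    table = build_lookup_table(
        synthetic_candidates(["jane", "john"], ["doe", "smith"], ["example.com"])
    )
    matches = reidentify(broker_records.keys(), table)
    for h, email in matches.items():
        print(email, "->", broker_records[h])
    print(f"reidentified {len(matches) / len(broker_records):.0%}")
```

Run as a script, this prints `jane.doe@example.com -> ad-id-1` and then `reidentified 50%`. The synthetic candidate list happens to contain Jane's address, so her hash, and the advertising ID linked to it, is recovered, while the address that no candidate pattern generates stays hidden.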

reardonegelmanbambergermcgrane_28yalejltech432.pdf

The Hypocrisy of Data Governance

Authors:

Zubair Shafiq, Olivia Figueira, Athina Markopoulou, Woodrow Hartzog & Michael Lavine

Volume:

28

Issue:

Spring

Starting Page Number:

393

Year:

2026

Preview:

“Data governance” is an empty term, like a Rorschach inkblot just waiting to be filled with meaning. Tech companies take advantage of this ambiguity to craft narratives about their data-governance capabilities to fit their audience and purpose. On the one hand, tech companies brag about their data-governance capabilities when it fits their business model (for example, to advertisers) and public image (for example, to their customers). On the other hand, tech companies claim that meaningful data governance is challenging or impossible when accountability is demanded. In this Article, we argue that tech companies systematically misrepresent or selectively ignore their data-governance capabilities. To demonstrate our point, we present two case studies showing how tech companies adopt inconsistent and self-serving positions when it comes to the treatment of consumers’ personal information. First, we show examples where tech companies actively identify children to deliver personalized advertising and content recommendations but disclaim the knowledge or ability to identify children when legal obligations attach. Second, we show how tech companies commonly claim they do not know whether the information collected by their tracking tools is protected health information (PHI) under HIPAA, even though standard techniques enable such classification. We conclude this Article by arguing for a more sustained critique and skepticism of the concept and implementation of data governance. Lawmakers could better scrutinize what constitutes reasonable efforts under existing data protection rules, they could better tailor new rules to the data-governance capabilities of tech companies, and finally, they could better scrutinize the use of the term “data governance” as an efficacy claim within the law of consumer protection.

shafiqfigueiramarkopoulouhartzoglavine_28yalejltech393.pdf

Economic Rationales for Regulating Behavioral Ads

Authors:

Pegah Moradi, Cristobal Cheyre, Alessandro Acquisti

Volume:

28

Issue:

Spring

Starting Page Number:

336

Year:

2026

Preview:

Advocates for regulating behaviorally targeted advertisements tend to focus on ethical and legal justifications for regulation. Meanwhile, the advertising technology industry has staunchly opposed regulation by drawing on economic arguments, contending that such regulation would be harmful to advertisers, consumers, publishers, and data intermediaries alike—ultimately undermining innovation and accessibility of free products across the Internet. In this Article, we analyze the theoretical and empirical economic literature on the costs and benefits of privacy regulation in the context of behavioral advertising in order to evaluate the strength of economic arguments for and against regulation. Our analysis suggests that recent enforcement actions against ad-technology firms and movements across the world for online privacy regulations may be justifiable not merely on ethical or moral grounds, but on economic grounds. We show that current economic arguments used by the ad industry to oppose privacy regulation are poorly substantiated and therefore do not outweigh valid legal and ethical justifications for privacy regulation. Furthermore, there are valid theoretical and empirical economic justifications for regulating behavioral ads. Rather than resulting in a loss of welfare for consumers, regulation may produce a reduction of harms and a more balanced allocation of the costs and benefits of data accumulation. Still, future economic work must move from analyzing narrow micro-level effects to research designs that are both rigorous and encompassing, allowing for a fuller understanding of impacts across stakeholders to more effectively inform privacy regulation.

moradicheyreacquis_28yalejltech336.pdf

Privacy Paradox in Digital Service Taxation

Authors:

Zhaoyi Li

Volume:

28

Issue:

Spring

Starting Page Number:

181

Year:

2026

Preview:

As the digital economy expands, tax jurisdictions face increasingly large challenges, as taxable activities like online shopping and advertising frequently extend beyond national borders. This shift has led to the emergence of the European Union’s Digital Services Tax (“DST”). While current discussions on this topic focus on the optimal methods and equitable distribution of taxing rights among countries, they overlook user privacy issues inherent in taxes like the DST. In light of the ongoing debate over whether the U.S. should tax digital transactions, this Article examines the legal framework of the DST and explores its implications from a data privacy perspective. By analyzing the implications of taxing the collection, use, and security of consumer data in the digital economy, this Article illustrates the broader effects of digital taxes on privacy rights and compliance. While the DST offers fiscal benefits, it simultaneously raises significant privacy concerns that must be addressed to safeguard consumer interests in an increasingly data-driven marketplace. To resolve this tension, this Article advances a privacy-centric model for the DST, integrating privacy protection measures directly into the DST’s structure and objectives. This comprehensive approach underscores the need for a harmonized framework that balances the economic goals of taxation with the protection of individual privacy, fostering a fairer and more equitable digital ecosystem for all stakeholders.

li_28yalejltech181.pdf

Disciplining Mechanisms: Governing Data Markets with Competition and Regulation

Authors:

Peter Ormerod

Volume:

28

Issue:

Spring

Starting Page Number:

308

Year:

2026

Preview:

The past decade has witnessed conceptual renewals in both competition law and information privacy law. These regulatory movements—Neo-Brandeis antitrust and structural data governance—share the objective of recalibrating the balance of power between individuals and the massive data-processing firms that now dominate modern life. Despite their common ends, policy interventions drawn from these schools of thought can work at cross purposes: competitive pressure can induce data exploitation, and privacy rules tend to benefit the largest firms. This Essay exposes the friction in their relationship and offers guidance on how to mediate their tension. Competition policy alone will prove ineffective at indirectly disciplining most data activities, so policymakers should largely favor the structural data-governance approach to address the information economy’s pathologies. But pro-competition policies will nevertheless be essential to reining in firms that are too big to meaningfully regulate and may also prove helpful in solving certain discrete data-processing problems. Policymakers today have two distinct mechanisms for disciplining firms’ data-driven activities. This Essay describes them, exposes their contours, and offers those policymakers guidance on how best to deploy them.

ormerod_28yalejltech308.pdf

Governing Toxic Data

Authors:

Diane Lourdes Dick, Joseph W. Yockey

Volume:

28

Issue:

Spring

Starting Page Number:

279

Year:

2026

Preview:

Companies increasingly boast to the public markets about their massive digital transformations and the value of their extraordinary customer insights. In this way, data is emerging as a crown jewel asset with unique corporate-governance implications under state and federal laws. For those firms touting data and other digital resources as among their most valuable assets, compliance with evolving cybersecurity and privacy laws, regulations, customer expectations, digital norms, and best practices will be the key to unlocking this value. By the same token, when compliance and policy gaps become pronounced, data and other digital assets can become toxic; not only will they fail to serve as drivers of corporate value, but they may generate significant liabilities. This category of “toxic” data can cause firms to incur massive litigation costs and regulatory fines and penalties, as well as major reputational damage that can destroy brand equity and erode market share. In light of recent signals by the U.S. Securities and Exchange Commission that it intends to focus on these risks, companies and their advisors must now anticipate that well-funded teams of regulators will aggressively monitor corporate disclosures and investigate compliance in an effort to carry out their mission to protect investors and maintain fair, orderly, and efficient markets. In response to this evolutionary enforcement moment, this Article provides the first comprehensive review of the corporate governance of data and other digital assets under state business-entities laws and the federal securities laws, paying special attention to evolving fiduciary responsibilities to monitor, oversee, and report on the risks associated with what we call toxic data.

dick_yockey_28yalejltech279.pdf

Information About Data

Authors:

Mihailis E. Diamantis, Chen Sun, Rishab Nithyanand

Volume:

28

Issue:

Spring

Starting Page Number:

238

Year:

2026

Preview:

Deterrence-based approaches to privacy enforcement rely on an overlooked and often false premise—that firms know what their own data practices are. There is good reason for skepticism because operational information tends to become siloed within firm subunits. Information about data management is no different. Firms may neglect to memorialize relevant information in reports for internal distribution. And even if such reports are generated, they may not be presented in a manner that is intelligible across firm constituencies. This paper looks outside of privacy law for a solution. Recent scholarship on securities disclosures has highlighted the variety of goals that disclosures serve. While the traditional purpose of financial disclosures is to inform outside investors, the process of preparing disclosures has beneficial internal effects too. It forces firms to study their own financial health and ensures that relevant corporate units are apprised of the results. Mandatory disclosures about corporate data practices could have similarly beneficial effects. While some states already require firms to publish generic information about data practices to consumers, these disclosures lack basic attributes that make financial disclosures effective—they lack detail, no human signs them, and they are not filed with any state authority. Securities-style disclosures hold more promise. By carefully tailoring the content, format, and required signatories of data practice disclosures, authorities could force firms to generate, translate, and internally propagate important information about data. Firms that actually know what they are doing with data are more susceptible to efforts aimed at deterring data misuse.

diamantis_sun_nithyanand_28yalejltech238.pdf

The Physicist and The Sheep Farmer

Authors:

Ari Ezra Waldman

Volume:

28

Issue:

Spring

Starting Page Number:

213

Year:

2026

Preview:

This Essay explores two historical events to better understand what, if anything, the history of technoscientific advising in policymaking contexts can teach scholars about technical expertise in policymaking today: the exposure of the Daigo Fukuryū Maru (Lucky Dragon #5) to nuclear fallout from a U.S. thermonuclear bomb test in the Pacific Ocean, and the contamination of the Cumbrian Fells in the United Kingdom as a result of the Chernobyl nuclear disaster. The Essay then teases out three lessons. First, expertise in political contexts is never unmediated, meaning that technical expertise should be understood as filtered through social, political-economic, and other kinds of biases. Second, informational technologies are multifaceted sociotechnical systems, such that privileging one form of expertise in decision-making is a recipe for skewed policymaking. Third, sociotechnical systems operating in the physical world are subject to acute and irresolvable indeterminacies that make the kind of reduction to numbers preferred by technical expertise inappropriate. Sociolegal scholars working in law and technology should consider these lessons in context.

waldman_28yalejltech213.pdf

Public Utility for What? Governing AI Datastructures

Authors:

Julie E. Cohen

Volume:

28

Issue:

Spring

Starting Page Number:

135

Preview:

Both in the U.S. and in Europe, initiatives for AI governance have focused principally on identifying and mitigating the risks created by AI models and their downstream uses rather than on those created by the datasets on which the models are trained. However, some of the most intractable dysfunctions of generative AI systems involve datasets. In particular, the very large datasets amassed by dominant providers of generative AI and related services are rapidly taking on infrastructural characteristics and importance. Effective AI governance therefore requires an infrastructural turn in thinking about data. First, the Article explains the significance of the infrastructure lens and sketches some of the distinctive implications of data infrastructures, in particular, for governance of networked digital processes and the social and economic activities that they facilitate. Next, it explores two interrelated problems manifesting within generative AI systems—simulation and sociopathy—that illustrate the extent to which the project of AI governance is, unavoidably, a data governance project. In brief, generative AI models trained on mass content from the open internet are also trained on data infrastructures that have been developed for behaviorist, extractive purposes and that encourage the production and spread of particular kinds of content and particular styles of communication. Last, the Article considers whether the concept of public utility, now the subject of growing interest among legal scholars who study regulated industries, might supply a possible foundation for tackling the data governance problems associated with generative AI systems. The public utility model, however, addresses only some of the considerations that the infrastructure lens highlights. It is highly attuned to questions about access to infrastructures and their outputs but relatively insensitive to questions about infrastructure configuration and input sourcing.
The problems of simulation and sociopathy belong in the latter category.

cohen_28yalejltech135.pdf

Online Age Gating: An Interdisciplinary Evaluation

Authors:

Noah Apthorpe, Brett Frischmann, Yan Shvartzshnaider

Volume:

28

Issue:

Spring

Starting Page Number:

66

Year:

2026

Preview:

The recent surge in regulation seeking to establish age-based governance online is part of a decades-long attempt to establish online zoning. It is driven by active development of technologies to estimate or verify user age based on various characteristics of users, their credentials, or their activities. However, these developments have heightened prevailing concerns that online age-gating technology will inevitably be abused and misused to cause a variety of privacy harms and rights infringements. This paper examines this ongoing debate by bridging technical and legal scholarship to explore the current state of online age-based governance. We discuss the current legal and policy landscape and the current status of online age-gating technologies, and we provide recommendations to guide legal and technological scholarship and practice. Our interdisciplinary assessment is particularly important and timely, given the recent flurry of state and federal laws that aim to implement age gating online and ongoing litigation challenging such laws.

apthorpe_frischmann_shvartzshnaider_28yalejltech66.pdf
