28 Yale J.L. & Tech. 135
Both in the U.S. and in Europe, initiatives for AI governance have focused principally on identifying and mitigating the risks created by AI models and their downstream uses rather than on those created by the datasets on which the models are trained. However, some of the most intractable dysfunctions of generative AI systems involve datasets. In particular, the very large datasets amassed by dominant providers of generative AI and related services are rapidly taking on infrastructural characteristics and importance. Effective AI governance therefore requires an infrastructural turn in thinking about data. First, this Article explains the significance of the infrastructure lens and sketches some of the distinctive implications of data infrastructures in particular for the governance of networked digital processes and the social and economic activities that they facilitate. Next, it explores two interrelated problems manifesting within generative AI systems, simulation and sociopathy, that illustrate the extent to which the project of AI governance is, unavoidably, a data governance project. In brief, generative AI models trained on mass content from the open internet are also trained on data infrastructures that have been developed for behaviorist, extractive purposes and that encourage the production and spread of particular kinds of content and particular styles of communication. Last, the Article considers whether the concept of public utility, now the subject of growing interest among legal scholars who study regulated industries, might supply a foundation for tackling the data governance problems associated with generative AI systems. The public utility model, however, addresses only some of the considerations that the infrastructure lens highlights. It is highly attuned to questions about access to infrastructures and their outputs but relatively insensitive to questions about infrastructure configuration and input sourcing. The problems of simulation and sociopathy belong in the latter category.