LAION-5B is an open-source foundation dataset of 5.8 billion image-text pairs, far too many for any person to make sense of, used to train AI models such as Stable Diffusion. In collaboration with the Knowing Machines Project at New York University, Der SPIEGEL and Paper Trail Media investigated this dataset, which is meant to give machines a comprehensive representation of the world, a vocabulary of things and concepts. In this talk, Christo Buschek traces the construction of the dataset to better understand its contents, implications, and entanglements. He shows the curatorial mechanisms chosen to build the dataset and how those mechanisms propagate the biases of other machine learning models and datasets, as well as the structural biases of the AI field itself. Investments in AI systems run into the trillions, and those systems are deployed at breakneck speed. The ability to investigate the datasets behind them is an essential tool for journalists seeking to holistically interrogate AI systems: the data, the models, and the emerging effects of AI.