Working Papers

Network Competition and Exclusive Contracts: Evidence from News Agencies [paper]

[abstract] This paper studies exclusive vertical contracts in network industries and asks whether exclusive arrangements intended to be anti-competitive in one market segment can be pro-competitive in another. The setting is news agencies in the early 20th-century United States, which historically operated with exclusive territory contracts intended to create local newspaper monopolies. I examine whether exclusive territories granted by the Associated Press (AP) to member newspapers inadvertently created demand for and facilitated the growth of the AP's primary rival, United Press (UP). I introduce a model that captures the demand for news agencies, newspaper entry, and news agency network formation. I estimate the model using a unique dataset that includes news agencies' subscriptions, costs, and physical maps of their networks over time. I find that economies of scale and network effects form considerable natural barriers to entry for news agencies. Counterfactual simulations show that UP likely would have exited if AP exclusive territory contracts were illegal. In contrast, contracts that require AP newspapers to subscribe exclusively to the AP would have weakened UP as well as incumbent AP newspapers that can no longer bundle content from both news agencies.


American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers (w/ Melissa Dell, Jacob Carlson, Tom Bryan, Emily Silcock, Abhishek Arora, Zejiang Shen, Luca D’Amico-Wong, Pablo Querubín, and Leander Heldring) [paper] [dataset] [GitHub]

Neural Information Processing Systems — Datasets and Benchmarks, 2023

[abstract] Existing full-text datasets of U.S. public domain newspapers do not recognize the often complex layouts of newspaper scans, and as a result the digitized content scrambles texts from articles, headlines, captions, advertisements, and other layout regions. OCR quality can also be low. This study develops a novel deep learning pipeline for extracting full article texts from newspaper images and applies it to the nearly 20 million scans in the Library of Congress's public domain Chronicling America collection. The pipeline includes layout detection, legibility classification, custom OCR, and association of article texts spanning multiple bounding boxes. To achieve high scalability, it is built with efficient architectures designed for mobile phones. The resulting American Stories dataset provides high-quality data that could be used for pre-training a large language model to achieve better understanding of historical English and historical world knowledge. The dataset could also be added to the external database of a retrieval-augmented language model to make historical information, ranging from interpretations of political events to minutiae about the lives of people's ancestors, more widely accessible. Furthermore, structured article texts facilitate using transformer-based methods for popular social science applications like topic classification, detection of reproduced content, and news story clustering. Finally, American Stories provides a massive silver-quality dataset for innovating multimodal layout analysis models and other multimodal applications.

Work in Progress

Urban Migration, Public Health Amenities, and Local Newspapers in 1870-1940 U.S. (w/ Anaïs Galdin) draft coming soon, slides available by email

[abstract] This study investigates how newspapers affected the migration decisions of rural households in the United States between 1870 and 1940, with an emphasis on the role of information. Contributing to the urban and health economics literature, we offer new insights about how information, specifically newspaper portrayals of public health advancements like water filtration and sewage systems, shapes rural-urban migration patterns. We show that access to information about urban health conditions through newspapers played a crucial role in encouraging rural households to move to cities around the turn of the 20th century. By exploiting a novel linkage between full-count U.S. Census data and a unique historical newspaper dataset at the county level, which leverages text-mining and Natural Language Processing (NLP) algorithms, we provide new evidence that rural households responded to newspaper narratives on public health investments and typhoid occurrences. Rural migrants with access to information about public health investments migrated in higher proportions to sanitation-adopting cities than rural migrants with no access to such information, and avoided cities affected by pandemics more than their non-informed counterparts. This finding is consistent with the literature suggesting that such investments significantly reduced mortality from waterborne diseases (typhoid and diphtheria) and improved quality of life, making cities more attractive places to live. As policymakers consider strategies to revitalize urban areas in the post-pandemic era, our study highlights the potential importance of information dissemination in promoting urban growth and development.

Publisher Multi-homing in Digital Advertising Markets: Evidence from Websites and Mobile Applications (w/ So Hye Yoon and Jie Zhou)