Culture and Computation Lab

Welcome! The Culture and Computation Lab is a research group at Cornell University working at the intersection of the arts, social sciences, and engineering. Our work builds on research in cultural analytics, natural language processing, digital humanities, legal studies, and related fields.

Read more about our work here, and see our members here.

2025-10-10

NarraBench: A Comprehensive Framework for Narrative Benchmarking

Sil Hamilton, Matthew Wilkens, Andrew Piper

2025-08-02

Show or Tell? Modeling the evolution of request-making in Human-LLM conversations

Shengqi Zhu, Jeffrey M. Rzeszotarski, David Mimno

arXiv.org, 2025

2025-07-26

Are You There God? Lightweight Narrative Annotation of Christian Fiction with LMs

Rebecca M. M. Hicke, Brian Haggard, Mia Ferrante, Rayhan Khanna, David Mimno

arXiv.org, 2025

2025-05-20

Too Long, Didn't Model: Decomposing LLM Long-Context Understanding With Novels

Sil Hamilton, Rebecca M. M. Hicke, Matthew Wilkens, David Mimno

arXiv.org, 2025

2025-05-20

Cheaper, Better, Faster, Stronger: Robust Text-to-SQL without Chain-of-Thought or Fine-Tuning

Yusuf Denizay Donder, Derek Hommel, Andrea Wen-Yi Wang, David Mimno, Unso Eun Seo Jo

arXiv.org, 2025

2025-04-25

Data Paradigms in the Era of LLMs: On the Opportunities and Challenges of Qualitative Data in the WILD

Shengqi Zhu, Jeffrey M. Rzeszotarski, David Mimno

CHI Extended Abstracts, 2025

2025-04-08

The Zero Body Problem: Probing LLM Use of Sensory Language

Rebecca M. M. Hicke, Sil Hamilton, David Mimno

arXiv.org, 2025

2025-04-02

Tasks and Roles in Legal AI: Data Curation, Annotation, and Verification

Allison Koenecke, Edward H. Stiglitz, David Mimno, Matthew Wilkens

arXiv.org, 2025

2025-03-31

Endometriosis Communities on Reddit: Quantitative Analysis

Federica Bologna, Rosamond Thalken, Kristen Pepin, Matthew Wilkens

Journal of Medical Internet Research, 2025

2025-03-31

Do Chinese models speak Chinese languages?

Andrea Wen-Yi Wang, Unso Eun Seo Jo, David Mimno

arXiv.org, 2025

2025-02-26

A City of Millions: Mapping Literary Social Networks At Scale

Sil Hamilton, Rebecca M. M. Hicke, David Mimno, Matthew Wilkens

Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities, 2025

2025-02-05

Looking for the Inner Music: Probing LLMs' Understanding of Literary Style

Rebecca M. M. Hicke, David M. Mimno

Computational Humanities Research, 2025

2025-01-01

Lost in Space: Optimizing Tokens for Grammar-Constrained Decoding

Sil Hamilton, David Mimno

arXiv.org, 2025

2024-10-16

Context is Key(NMF): Modelling Topical Information Dynamics in Chinese Diaspora Media

R. Kristensen-McLachlan, Rebecca M. M. Hicke, Márton Kardos, M. Thunø

Workshop on Computational Humanities Research, 2024

2024-10-13

Quilt: Custom UIs for Linking Unstructured Documents to Structured Datasets

Pragya Kallanagoudar, Chithra Anand, Rolando Garcia, Rebecca M. M. Hicke, Aditya G. Parameswaran, Eunice Jun, Sarah E. Chasins

ACM Symposium on User Interface Software and Technology, 2024

2024-10-11

SCIENCE IS EXPLORATION: Computational Frontiers for Conceptual Metaphor Theory

Rebecca M. M. Hicke, R. Kristensen-McLachlan

Workshop on Computational Humanities Research, 2024

2024-09-29

Judicial self fashioning: Rhetorical performance in Supreme Court opinions

Rosamond Thalken, David M. Mimno, Matthew Wilkens

Discourse Studies, 2024

2024-09-17

Says Who? Effective Zero-Shot Annotation of Focalization

Rebecca M. M. Hicke, Yuri Bizzoni, P. Moreira, R. Kristensen-McLachlan

arXiv.org, 2024

2024-08-01

Endometriosis Online Communities: How Machine Learning Can Help Physicians Understand What Patients Are Discussing Online

Kristen Pepin, Federica Bologna, Rosamond Thalken, Matthew Wilkens

Journal of Minimally Invasive Gynecology, 2024

2024-07-17

Automate or Assist? The Role of Computational Models in Identifying Gendered Discourse in US Capital Trial Transcripts

Andrea Wen-Yi Wang, Kathryn Adamson, Nathalie Greenfield, Rachel Goldberg, Sandra Babcock, David Mimno, Allison Koenecke

AAAI/ACM Conference on AI, Ethics, and Society, 2024

2024-07-12

How Chinese are Chinese Language Models? The Puzzling Lack of Language Policy in China's LLMs

Andrea Wen-Yi Wang, Unso Eun Seo Jo, Lu Jia Lin, David Mimno

arXiv.org, 2024

2024-07-02

What We Talk About When We Talk About LMs: Implicit Paradigm Shifts and the Ship of Language Models

Shengqi Zhu, Jeffrey M. Rzeszotarski

North American Chapter of the Association for Computational Linguistics, 2024

2024-04-19

Stronger Random Baselines for In-Context Learning

Gregory Yauney, David M. Mimno

arXiv.org, 2024

2024-02-29

Endometriosis Online Communities: A Quantitative Analysis

Federica Bologna, Rosamond Thalken, Kristen Pepin, Matthew Wilkens

medRxiv, 2024

2024-01-14

The Afterlives of Shakespeare and Company in Online Social Readership

Maria Antoniak, David M. Mimno, Rosamond Thalken, Melanie Walsh, Matthew Wilkens, Gregory Yauney

Journal of Cultural Analytics, 2024

2024-01-01

Contextualized Topic Coherence Metrics

Hamed Rahimi, David M. Mimno, Jacob Louis Hoover, Hubert Naacke, Camélia Constantin, Bernd Amann

Findings, 2024

2024-01-01

“Get Their Hands Dirty, Not Mine”: On Researcher-Annotator Collaboration and the Agency of Annotators

Shengqi Zhu, Jeffrey M. Rzeszotarski

Findings of the Association for Computational Linguistics: ACL 2024

2023-11-29

Hyperpolyglot LLMs: Cross-Lingual Interpretability in Token Embeddings

Andrea Wen-Yi Wang, David Mimno

Conference on Empirical Methods in Natural Language Processing, 2023

2023-11-15

Data Similarity is Not Enough to Explain Language Model Performance

Gregory Yauney, Emily Reif, David M. Mimno

Conference on Empirical Methods in Natural Language Processing, 2023

2023-10-27

Modeling Legal Reasoning: LM Annotation at the Edge of Human Agreement

Rosamond Thalken, Edward H. Stiglitz, David M. Mimno, Matthew Wilkens

Conference on Empirical Methods in Natural Language Processing, 2023

2023-10-27

T5 meets Tybalt: Author Attribution in Early Modern English Drama Using Large Language Models

Rebecca M. M. Hicke, David M. Mimno

Workshop on Computational Humanities Research, 2023

2023-07-07

Deep distant reading: The rise of realism in Scandinavian literature as a case study

Jens Bjerring-Hansen, Matthew Wilkens

Orbis Litterarum, 2023

2023-07-06

Establishing Connectivity and Trust in High Schools During COVID-19

Lisa De Leon, Matthew Wilkens

The Scholarship Without Borders Journal, 2023

2023-05-28

More than Classification: A Unified Framework for Event Temporal Relation Extraction

Quzhe Huang, Yutong Hu, Shengqi Zhu, Yansong Feng, Chang Liu, Dongyan Zhao

Annual Meeting of the Association for Computational Linguistics, 2023

2023-05-27

Grounding Characters and Places in Narrative Text

Sandeep Soni, Amanpreet Sihra, Elizabeth F. Evans, Matthew Wilkens, David Bamman

Annual Meeting of the Association for Computational Linguistics, 2023

2023-05-22

A Pretrainer’s Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity

S. Longpre, Gregory Yauney, Emily Reif, Katherine Lee, Adam Roberts, Barret Zoph, Denny Zhou, Jason Wei, Kevin Robinson, David M. Mimno, Daphne Ippolito

North American Chapter of the Association for Computational Linguistics, 2023

2023-01-23

Sensemaking About Contraceptive Methods Across Online Platforms

LeAnn McDowall, Maria Antoniak, David M. Mimno

International Conference on Web and Social Media, 2023

2023-01-01

MultiHATHI: A Complete Collection of Multilingual Prose Fiction in the HathiTrust Digital Library

S. Hamilton, Andrew Piper

Journal of Open Humanities Data, 2023

2023-01-01

Mrs. Dalloway Said She Would Segment the Chapters Herself

Peiqi Sui, Lin Wang, S. Hamilton, Thorsten Ries, Kelvin Wong, Stephen Wong

WNU, 2023

2023-01-01

Large Language Models and NER: better results with less work

Rosamond Thalken, Matthew Wilkens, David M. Mimno

Digital Humanities Conference, 2023

2023-01-01

The Chatbot and the Canon: Poetry Memorization in LLMs

Lyra D'Souza, David Mimno

Workshop on Computational Humanities Research, 2023

2022-10-13

The COVID That Wasn’t: Counterfactual Journalism Using GPT

S. Hamilton, Andrew Piper

LATECHCLFL, 2022

2022-10-07

Breaking BERT: Evaluating and Optimizing Sparsified Attention

Siddhartha Brahma, Polina Zablotskaia, David M. Mimno

arXiv.org, 2022

2022-10-05

Honest Students from Untrusted Teachers: Learning an Interpretable Question-Answering Pipeline from a Pretrained Language Model

Jacob Eisenstein, D. Andor, Bernd Bohnet, Michael Collins, David M. Mimno

arXiv.org, 2022

2022-10-01

Word Clouds in the Wild

Rebecca M. M. Hicke, Maanya Goenka, E. Alexander

2022 IEEE 7th Workshop on Visualization for the Digital Humanities (VIS4DH), 2022

2022-04-17

Does Recommend-Revise Produce Reliable Annotations? An Analysis on Missing Instances in DocRED

Quzhe Huang, Shibo Hao, Yuan Ye, Shengqi Zhu, Yansong Feng, Dongyan Zhao

Annual Meeting of the Association for Computational Linguistics, 2022

2021-11-12

On-the-fly Rectification for Robust Large-Vocabulary Topic Inference

Moontae Lee, Sungjun Cho, Kun Dong, David M. Mimno, D. Bindel

International Conference on Machine Learning, 2021

2021-10-05

Open bibliographic data and the Italian National Scientific Qualification: Measuring coverage of academic fields

Federica Bologna, A. Iorio, S. Peroni, Francesco Poggi

Quantitative Science Studies, 2021

2021-09-22

‘Tecnologica cosa’: Modeling Storyteller Personalities in Boccaccio’s ‘Decameron’

A. Feder Cooper, Maria Antoniak, Christopher De Sa, Marilyn Migiel, David M. Mimno

LATECHCLFL, 2021

2021-09-15

Comparing Text Representations: A Theory-Driven Approach

Gregory Yauney, David M. Mimno

Conference on Empirical Methods in Natural Language Processing, 2021

2021-07-09

Reply to Linton: Perspectival interference up close

Jorge Morales, Axel Bax, C. Firestone

Proceedings of the National Academy of Sciences of the United States of America, 2021

2021-06-30

Too isolated, too insular: American Literature and the World

Matthew Wilkens

Journal of Cultural Analytics, 2021

2021-06-10

Academics evaluating academics: a methodology to inform the review process on top of open citations

Federica Bologna, A. Iorio, S. Peroni, Francesco Poggi

arXiv.org, 2021

2021-06-03

Exploring Distantly-Labeled Rationales in Neural Network Models

Quzhe Huang, Shengqi Zhu, Yansong Feng, Dongyan Zhao

Annual Meeting of the Association for Computational Linguistics, 2021

2021-06-03

Three Sentences Are All You Need: Local Path Enhanced Document Relation Extraction

Quzhe Huang, Shengqi Zhu, Yansong Feng, Yuan Ye, Yuxuan Lai, Dongyan Zhao

Annual Meeting of the Association for Computational Linguistics, 2021

2021-06-01

Separating the wheat from the chaff: A topic and keyword-based procedure for identifying research-relevant text

Alicia Eads, Alexandra Schofield, Fauna Mahootian, David M. Mimno, Rens Wilderom

Poetics, 2021

2020-10-30

Finding Domain-Specific Grounding in Noisy Visual-Textual Documents

Gregory Yauney, Jack Hessel, David M. Mimno

Conference on Empirical Methods in Natural Language Processing, 2020

2020-10-23

Topic Modeling with Contextualized Word Representation Clusters

Laure Thompson, David M. Mimno

arXiv.org, 2020

2020-06-12

Sustained representation of perspectival shape

Jorge Morales, Axel Bax, C. Firestone

Proceedings of the National Academy of Sciences of the United States of America, 2020

2020-02-05

Imagined Examples of Painful Experiences Provided by Chronic Low Back Pain Patients and Attributed a Pain Numerical Rating Score

R. Griffin, Maria Antoniak, P. D. Mac, Vladimir N. Kramskiy, S. Waldman, David M. Mimno

Frontiers in Neuroscience, 2020

2020-01-11

Making, Preserving, and Curating Born-Digital Literature

Anastasia Salter, Marjorie C. Luesebrink, Dene Grigar, Leonardo Flores, Julian Ankney, Nicholas Binford, Kathryn Manis, Ricardo A. Ramirez, Troy Rowden, R. Snyder, Rosamond Thalken, N. Idris

2020-01-01

Correlation

Matthew L. Jockers, Rosamond Thalken

Text Analysis with R, 2020

2020-01-01

Token Distribution Analysis

Matthew L. Jockers, Rosamond Thalken

Text Analysis with R, 2020

2020-01-01

Parsing TEI XML

Matthew L. Jockers, Rosamond Thalken

2020-01-01

Measures of Lexical Variety

Matthew L. Jockers, Rosamond Thalken

Text Analysis with R, 2020

2020-01-01

Parsing and Analyzing Hamlet

Matthew L. Jockers, Rosamond Thalken

2020-01-01

Introduction to dplyr

Matthew L. Jockers, Rosamond Thalken

2020-01-01

Do It KWIC(er) (and Better)

Matthew L. Jockers, Rosamond Thalken

2020-01-01

Hapax Richness

Matthew L. Jockers, Rosamond Thalken

Text Analysis with R, 2020

2020-01-01

Text Analysis with R

Matthew L. Jockers, Rosamond Thalken

Quantitative Methods in the Humanities and Social Sciences, 2020

2020-01-01

Part of Speech Tagging and Named Entity Recognition

Matthew L. Jockers, Rosamond Thalken

2020-01-01

Classification

Matthew L. Jockers, Rosamond Thalken

Text Analysis with R, 2020

2020-01-01

R Basics

Matthew L. Jockers, Rosamond Thalken

Text Analysis with R, 2020

2020-01-01

Network Analysis Finds Shifts in the History of Modern Architecture

Gregory Yauney, David M. Mimno

Digital Humanities Conference, 2020

2020-01-01

Token Distribution and Regular Expressions

Matthew L. Jockers, Rosamond Thalken

2020-01-01

Do It KWIC

Matthew L. Jockers, Rosamond Thalken

Text Analysis with R, 2020

2020-01-01

Sentiment Analysis

Matthew L. Jockers, Rosamond Thalken

Text Analysis with R, 2020

2020-01-01

Prior-aware Composition Inference for Spectral Topic Models

Moontae Lee, D. Bindel, David M. Mimno

International Conference on Artificial Intelligence and Statistics, 2020

2020-01-01

Topic Modeling

Matthew L. Jockers, Rosamond Thalken

Text Analysis with R, 2020

2020-01-01

First Foray into Text Analysis with R

Matthew L. Jockers, Rosamond Thalken

Text Analysis with R, 2020

2020-01-01

Replication and Computational Literary Studies

Christof Schöch, K. Dalen-Oskam, Maria Antoniak, Fotis Jannidis, David M. Mimno

Digital Humanities Conference, 2020

2020-01-01

Accessing and Comparing Word Frequency Data

Matthew L. Jockers, Rosamond Thalken

Text Analysis with R, 2020

2020-01-01

Constructing and Analyzing Short Science Fiction at Scale

Laure Thompson, David M. Mimno

Digital Humanities Conference, 2020

2019-11-07

Narrative Paths and Negotiation of Power in Birth Stories

Maria Antoniak, David M. Mimno, K. Levy

Proc. ACM Hum. Comput. Interact., 2019

2019-11-01

Practical Correlated Topic Modeling and Analysis via the Rectified Anchor Word Algorithm

Moontae Lee, Sungjun Cho, D. Bindel, David M. Mimno

Conference on Empirical Methods in Natural Language Processing, 2019

2019-11-01

The Cultural Economies of Digital Books

Matthew Wilkens

American Literary History, 2019

2019-07-02

How We Do Things With Words: Analyzing Text as Social and Cultural Data

D. Nguyen, Maria Liakata, S. Dedeo, Jacob Eisenstein, David Mimno, Rebekah Tromble, J. Winters

Frontiers in Artificial Intelligence, 2019

2019-07-01

Boosted negative sampling by quadratically constrained entropy maximization

Taygun Kekeç, David M. Mimno, D. Tax

Pattern Recognition Letters, 2019

2019-01-01

Computational Prediction of Elapsed Narrative Time

Gregory Yauney, T. Underwood, David M. Mimno

2019-01-01

Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents

Jack Hessel, Lillian Lee, David M. Mimno

Conference on Empirical Methods in Natural Language Processing, 2019