| New Economics Papers on Big Data |
| By: | Wayne Gao; Sukjin Han; Annie Liang |
| Abstract: | Large language models (LLMs) are increasingly used to predict human behavior. We propose a measure for evaluating how much knowledge a pretrained LLM brings to such a prediction: its equivalent sample size, defined as the amount of task-specific data needed to match the predictive accuracy of the LLM. We estimate this measure by comparing the prediction error of a fixed LLM in a given domain to that of flexible machine learning models trained on increasing samples of domain-specific data. We further provide a statistical inference procedure by developing a new asymptotic theory for cross-validated prediction error. Finally, we apply this method to the Panel Study of Income Dynamics. We find that LLMs encode considerable predictive information for some economic variables but much less for others, suggesting that their value as substitutes for domain-specific data differs markedly across settings. |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2601.12343 |
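A minimal sketch of the "equivalent sample size" idea described in the abstract above: find the smallest training-set size at which a flexible model's cross-validated error matches a fixed benchmark error standing in for the LLM's error on the same task. The model choice, sample-size grid, and toy data below are illustrative assumptions, not the paper's estimator or inference procedure.

```python
# Illustrative sketch (not the paper's estimator): the equivalent sample size is
# the smallest n at which a model trained on n domain observations reaches the
# benchmark (LLM) prediction error under 5-fold cross-validation.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score


def equivalent_sample_size(X, y, llm_mse, grid=None, seed=0):
    """Smallest n whose CV mean-squared error <= llm_mse (None if never reached)."""
    rng = np.random.default_rng(seed)
    grid = grid or [50, 100, 200, 400, 800, 1600]
    for n in grid:
        if n > len(y):
            break
        idx = rng.choice(len(y), size=n, replace=False)
        model = GradientBoostingRegressor(random_state=seed)
        mse = -cross_val_score(model, X[idx], y[idx], cv=5,
                               scoring="neg_mean_squared_error").mean()
        if mse <= llm_mse:
            return n
    return None


# Toy usage with synthetic data and a hypothetical LLM error of 1.2.
X = np.random.default_rng(1).normal(size=(2000, 10))
y = X[:, 0] + 0.5 * X[:, 1] + np.random.default_rng(2).normal(size=2000)
print(equivalent_sample_size(X, y, llm_mse=1.2))
```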
| By: | Leland D. Crane; Xiaoyu Ge; Flora Haberkorn; Rithika Iyengar; Seung Jung Lee; Viviana Luccioli; Ryan Panley; Nitish R. Sinha |
| Abstract: | Large Language Models (LLMs) are highly accurate in classification tasks; however, substantial computational and financial costs hinder their large-scale deployment in dynamic environments. Knowledge Distillation (KD), where an LLM "teacher" trains a smaller and more efficient "student" model, offers a promising solution to this problem. However, the distillation process itself often remains costly for large datasets, since it requires the teacher to label a vast number of samples while incurring significant token consumption. To alleviate this challenge, we explore active learning (AL) as a way to create efficient student models at a fraction of the cost while preserving the LLM's performance. In particular, we introduce M-RARU (Multi-class Randomized Accept/Reject Uncertainty Sampling), a novel AL algorithm that significantly reduces training costs. M-RARU combines uncertainty with a randomized accept/reject mechanism to select only the most informative data points for the LLM teacher. This focused approach significantly reduces the required API calls and data processing time. We evaluate M-RARU against random sampling across five diverse student models (SVM, LDA, RF, GBDT, and DistilBERT) on multiple benchmark datasets. Experiments demonstrate that the proposed method achieves up to an 80% reduction in sample requirements compared to random sampling, substantially improving classification accuracy while reducing financial costs and overall training time. |
| Keywords: | Machine learning; Sampling; Computational techniques |
| JEL: | C38 C45 C55 |
| Date: | 2025–12–15 |
| URL: | https://d.repec.org/n?u=RePEc:fip:fedgfe:102367 |
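A hedged sketch of the general pattern named in the abstract above, uncertainty sampling combined with a randomized accept/reject step that decides which pool points are sent to the LLM teacher for labeling. The abstract does not specify the M-RARU acceptance rule, so the margin-based acceptance probability and the toy student model below are assumptions for illustration only.

```python
# Generic uncertainty + randomized accept/reject selection (assumed rule, not
# the published M-RARU algorithm): uncertain points get a higher chance of
# being accepted and forwarded to the teacher, which caps API calls.
import numpy as np
from sklearn.linear_model import LogisticRegression


def select_for_teacher(student, X_pool, budget, rng):
    """Pick up to `budget` pool points to send to the LLM teacher for labeling."""
    sorted_p = np.sort(student.predict_proba(X_pool), axis=1)
    uncertainty = 1.0 - (sorted_p[:, -1] - sorted_p[:, -2])  # small margin -> uncertain
    accept_prob = uncertainty / (uncertainty.max() + 1e-12)  # assumed acceptance rule
    chosen = []
    for i in rng.permutation(len(X_pool)):   # stream through the pool in random order
        if rng.random() < accept_prob[i]:    # randomized accept/reject
            chosen.append(i)
            if len(chosen) == budget:
                break
    return np.array(chosen)


# Toy usage: warm-start a student on a few labeled points, then query the pool.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # stand-in for LLM teacher labels
student = LogisticRegression().fit(X[:20], y[:20])
queried = select_for_teacher(student, X[20:], budget=30, rng=rng)
```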
| By: | Shengwu Du; Flora Haberkorn; Isabel Kitschelt; Seung Jung Lee; Anderson Monken; Dylan Saez; Kelsey Shipman; Sandeep Thakur |
| Abstract: | We apply various natural language processing tools to assess whether the Beige Book is helpful in understanding economic activity. The Beige Book is a compilation of anecdotal reports on current economic conditions from each Federal Reserve Bank, released to the public prior to FOMC meetings. We find that even controlling for lagged GDP growth and other metrics, Beige Book sentiment provides meaningful explanatory power in nowcasting GDP growth and forecasting recessions, even more so than the yield spread or other news sentiment measures. The results on economic activity hold even in regional panel analysis. The Beige Book offers many more insights on the economy, which can be gleaned from even simple keyword tabulations. Topic modeling can also inform us about the different factors driving the narrative across particular periods of interest. |
| Keywords: | Now-casting; Business fluctuations and cycles; Recessions; Sentiment |
| JEL: | E32 E37 |
| Date: | 2026–01–15 |
| URL: | https://d.repec.org/n?u=RePEc:fip:fedgfe:102374 |
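A minimal illustration of the nowcasting setup described in the abstract above: score Beige Book text for sentiment and regress GDP growth on its own lag plus that score. The word lists, toy data, and OLS specification below are placeholders, not the paper's sentiment measure or controls.

```python
# Dictionary-based sentiment score plus a simple nowcasting regression
# (illustrative only; the paper's sentiment measure and controls differ).
import pandas as pd
import statsmodels.api as sm

POSITIVE = {"strong", "improved", "expanded", "robust", "increased"}
NEGATIVE = {"weak", "declined", "contracted", "slowed", "decreased"}


def sentiment(text: str) -> float:
    """Net share of positive minus negative dictionary hits per word."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / max(len(words), 1)


# Hypothetical quarterly data: GDP growth, lagged growth, Beige Book text.
df = pd.DataFrame({
    "gdp_growth": [2.1, 1.8, -0.5, 0.9],
    "gdp_growth_lag": [2.4, 2.1, 1.8, -0.5],
    "beige_book": [
        "activity expanded at a robust pace and hiring increased",
        "growth was strong but some districts slowed modestly",
        "activity contracted and demand declined in most districts",
        "conditions improved slightly as orders increased",
    ],
})
df["sentiment"] = df["beige_book"].apply(sentiment)
X = sm.add_constant(df[["gdp_growth_lag", "sentiment"]])
print(sm.OLS(df["gdp_growth"], X).fit().params)
```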
| By: | Paker, Meredith; Stephenson, Judy; Wallis, Patrick |
| Abstract: | Understanding long-run economic growth requires reliable historical data, yet the vast majority of long-run economic time series are drawn from incomplete records with significant temporal and geographic gaps. Conventional solutions to these gaps rely on linear regressions that risk bias or overfitting when data are scarce. We introduce “past predictive modeling,” a framework that leverages machine learning and out-of-sample predictive modeling techniques to reconstruct representative historical time series from scarce data. Validating our approach using nominal wage data from England, 1300–1900, we show that this new method leads to more accurate and generalizable estimates, with bootstrapped standard errors 72% lower than benchmark linear regressions. Beyond improving accuracy, these improved wage estimates for England yield new insights into the impact of the Black Death on inequality, the economic geography of pre-industrial growth, and productivity over the long run. |
| Keywords: | machine learning; predictive modeling; wages; black death; industrial revolution |
| JEL: | J31 C53 N33 N13 N63 |
| Date: | 2025–06–13 |
| URL: | https://d.repec.org/n?u=RePEc:ehl:wpaper:128852 |
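A hedged sketch of the "past predictive modeling" idea from the abstract above: fit a flexible learner on the years and places where wages are observed, predict the missing cells out of sample, and bootstrap the training records to obtain standard errors for the reconstructed series. The features, model choice, and synthetic data below are placeholders, not the paper's specification.

```python
# Reconstruct gaps in a historical series with a flexible learner and
# bootstrapped prediction standard errors (illustrative sketch only).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical observed records: (year, region code) -> nominal wage.
years = rng.integers(1300, 1900, size=300)
regions = rng.integers(0, 5, size=300)
wages = 0.01 * (years - 1300) + 0.2 * regions + rng.normal(scale=0.3, size=300)
X_obs = np.column_stack([years, regions])

# Grid of missing year-region cells to reconstruct.
X_gap = np.array([[y, r] for y in range(1300, 1900, 50) for r in range(5)])

# Bootstrap the training records to get prediction standard errors.
preds = []
for _ in range(200):
    idx = rng.integers(0, len(wages), size=len(wages))
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_obs[idx], wages[idx])
    preds.append(model.predict(X_gap))
preds = np.array(preds)
point, se = preds.mean(axis=0), preds.std(axis=0)
```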
| By: | Pietro Bini; Lin William Cong; Xing Huang; Lawrence J. Jin |
| Abstract: | Do generative AI models, particularly large language models (LLMs), exhibit systematic behavioral biases in economic and financial decisions? If so, how can these biases be mitigated? Drawing on the cognitive psychology and experimental economics literatures, we conduct the most comprehensive set of experiments to date—originally designed to document human biases—on prominent LLM families across model versions and scales. We document systematic patterns in LLM behavior. In preference-based tasks, responses become more human-like as models become more advanced or larger, while in belief-based tasks, advanced large-scale models frequently generate rational responses. Prompting LLMs to make rational decisions reduces biases. |
| JEL: | D03 G02 G11 G4 G40 G41 |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:nbr:nberwo:34745 |
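A hedged sketch of the kind of experiment described in the abstract above: present an LLM with a classic lottery-choice task (a certainty-effect framing from the experimental economics literature), with and without an explicit instruction to answer rationally, and compare the answer distributions. The function `query_llm` is a hypothetical stand-in for whatever model API is used; the specific task text and tally logic are assumptions, not the paper's protocol.

```python
# Compare baseline vs. rationality-prompted answers to a lottery-choice task.
BASE_TASK = (
    "Choose one option and answer with A or B only.\n"
    "A: receive $3,000 for sure.\n"
    "B: receive $4,000 with probability 0.8, otherwise nothing."
)
RATIONAL_PREFIX = "Answer as a rational expected-value maximizer.\n"


def query_llm(prompt: str) -> str:
    """Hypothetical placeholder; replace with a real model call."""
    raise NotImplementedError


def run_condition(prompt: str, n_trials: int = 50) -> dict:
    """Tally A/B answers over repeated queries to estimate a choice share."""
    counts = {"A": 0, "B": 0}
    for _ in range(n_trials):
        answer = query_llm(prompt).strip().upper()[:1]
        if answer in counts:
            counts[answer] += 1
    return counts


# Usage once query_llm is wired to an actual model:
# baseline = run_condition(BASE_TASK)
# prompted = run_condition(RATIONAL_PREFIX + BASE_TASK)
```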
| By: | J. Ignacio Conde-Ruiz; Miguel Díaz Salazar; Juan José Ganuza |
| Abstract: | This paper combines artificial intelligence with economic modeling to design evaluation committees that are both efficient and fair in the presence of gender differences in economic research orientation. We develop a dynamic framework in which research evaluation depends on the thematic similarity between evaluators and researchers. The model shows that while topic-balanced committees maximize welfare, this research-neutral gender allocation is dynamically unstable, leading to the persistent dominance of the group initially overrepresented in evaluation committees. Guided by these predictions, we employ unsupervised machine learning to extract research profiles for male and female researchers from articles published in leading economics journals between 2000 and 2025. We characterize optimal balanced committees within this multidimensional latent topic space and introduce the Gender-Topic Alignment Index (GTAI) to measure the alignment between committee expertise and female-prevalent research areas. Our simulations demonstrate that AI-based committee designs closely approximate the welfare-maximizing benchmark. In contrast, traditional headcount-based quotas often fail to achieve balance and may even disadvantage the groups they intend to support. We conclude that AI-based tools can significantly improve institutional design for editorial boards, tenure committees, and grant panels. |
| Keywords: | machine learning, artificial intelligence, topic modeling, evaluation committees, committee quotas, research orientation |
| JEL: | D72 D82 J16 J78 |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:upf:upfgen:1937 |
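A hedged sketch of the pipeline described in the abstract above: fit a topic model to article abstracts, average topic weights by author gender, and score a committee by how closely its topic profile aligns with female-prevalent topics. The cosine-similarity definition of the GTAI below is an assumption for illustration, not the paper's formula, and the toy abstracts are invented.

```python
# Topic modeling plus an assumed alignment index between committee expertise
# and female-prevalent research topics (illustrative sketch only).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [
    "monetary policy inflation expectations interest rates",
    "gender wage gap labor market discrimination",
    "household consumption credit constraints savings",
    "childcare female labor supply family policy",
]
author_is_female = np.array([0, 1, 0, 1])

# Fit a small LDA model and get each abstract's topic distribution.
counts = CountVectorizer().fit_transform(abstracts)
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(counts)

# Female-prevalent profile: mean topic weights over female-authored papers.
female_profile = doc_topics[author_is_female == 1].mean(axis=0)


def gtai(committee_topics: np.ndarray) -> float:
    """Assumed index: cosine similarity between committee and female profiles."""
    c, f = committee_topics, female_profile
    return float(c @ f / (np.linalg.norm(c) * np.linalg.norm(f)))


# Example: a committee whose members' profiles are the first two papers.
committee_profile = doc_topics[:2].mean(axis=0)
print(round(gtai(committee_profile), 3))
```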