nep-cmp New Economics Papers
on Computational Economics
Issue of 2026–03–23
twenty papers chosen by
Stan Miles, Thompson Rivers University


  1. Applying generative adversarial networks to generate synthetic train trip data for train delay prediction By Hauck, Florian; Güth, Albrecht; Kliewer, Natalia; Rößler-von Saß, David
  2. How does AI Distribute the pie? Large Language Models and the Ultimatum Game. By Douglas K.G. Araujo; Harald Uhlig
  3. An Introduction to Double/Debiased Machine Learning By Ahrens, Achim; Chernozhukov, Victor; Hansen, Christian; Kozbur, Damian; Schaffer, Mark; Wiemann, Thomas
  4. HLER: Human-in-the-Loop Economic Research via Multi-Agent Pipelines for Empirical Discovery By Chen Zhu; Xiaolu Wang
  5. Calibrated Credit Intelligence: Shift-Robust and Fair Risk Scoring with Bayesian Uncertainty and Gradient Boosting By Srikumar Nayak
  6. Estimating Visual Attribute Effects in Advertising from Observational Data: A Deepfake-Informed Double Machine Learning Approach By Yizhi Liu; Balaji Padmanabhan; Siva Viswanathan
  7. AI for Survey Design: Generating and Evaluating Survey Questions with Large Language Models By Fuchs, Anna; Haensch, Anna-Carolina; Weber, Wiebke
  8. Out of the Black Box: Uncertainty Quantification for LLMs via Conditional Probabilities By Hui Chen; Antoine Didisheim; Luciano A. Somoza
  9. LLM-Agent Interactions on Markets with Information Asymmetries By Alexander Erlei; Lukas Meub
  10. Estimating Demand Shocks from Foot Traffic: A Big-Data Approach By Marina Azzimonti-Renzo; David Wiczer; Yang Xuan
  11. Exploratory Randomization for Discrete-Time Risk-Sensitive Benchmarked Investment Management with Reinforcement Learning By Sebastien Lleo; Wolfgang Runggaldier
  12. THE ALGORITHMIC ALCHEMY: SYNTHESIZING GLOBAL LEGAL FRAMEWORKS FOR ARTIFICIAL INTELLIGENCE IN FINANCIAL SERVICES By Cicilia Anggadewi Harun; Safari Kasiyanto; Camila Amalia; Shinta Fitrianti; Esha Gianne Poetry; Nilasari; Rina Megasari; Naura Pradipta Khairunnis
  13. IDENTIFICATION OF ILLEGAL TRANSACTION PATTERNS IN PAYMENT SYSTEM DATA USING AI/ML: A CASE STUDY ON ONLINE GAMBLING By Renardi Ardiya Bimantoro; Rudy Hardiyanto; Irfan Sampe; Agung Bayu Purwoko; Imam Dwi Kuncoro; Irvan Fadjar R.; Devima Christi M.; Anugerah Mohamad Setiawan; Moh. Mashudi Arif; Mahanani Margani; Dwi Kartika Siregar; Ganang Suryo Anggoro; Melati Pramudyastuti; Farah Hilda Fuad Lubis; Rudy Marhastari; Nurkholisoh Ibnu Aman; Sintia Aurida
  14. What and How Should Urban Planners Learn in the AI Era? Exploring Urban AI Pedagogy from a Pilot Course in Urban Planning Education By Liang, Xiaofan
  15. Difference-in-differences for mediation analysis using double machine learning By Martin Huber; Sarina Joy Oberhänsli
  16. Scaling Open-Ended Survey Coding: An LLM Pipeline Where Definitions Do the Heavy Lifting By Soria, Chris
  17. Guidance for the Use of AI in the Meta-Analysis of Economics Research By Nikolai Cook, František Bartoš, Pedro R. D. Bom, Sebastian Gechert, Klára Kantová, Jerome Geyer-Klingeberg, Tomáš Havránek, Zuzana Irsova, Martina Luskova, Matěj Opatrný, Franz Prante, Heiko J. Rachinger, T. D. Stanley
  18. Predicting University Dropouts: Evidence on the Value of Student Expectations and Motivation By Epper, Thomas; Ibsen, Kristoffer; Koch, Alexander; Nafziger, Julia
  19. Introducing BISTRO: a foundational model for unconditional and conditional forecasting of macroeconomic time series By Batuhan Koyuncu; Byeungchun Kwon; Marco Jacopo Lombardi; Fernando Perez-Cruz; Hyun Song Shin
  20. A Monte Carlo Simulation Framework for University Enrollment Strategy Under Marketing Uncertainty By Hait, Subir

  1. By: Hauck, Florian; Güth, Albrecht; Kliewer, Natalia; Rößler-von Saß, David
    Abstract: This paper examines the possibilities of creating synthetic train trip data with Generative Adversarial Networks (GANs). A real data set from Deutsche Bahn is enhanced with synthetic data created by using a Conditional Wasserstein Generative Adversarial Network (CWGAN). The synthetic data is analyzed and compared with the original data using statistical methods as well as machine learning models. The results show that the synthetic data is very similar to the original data in terms of data structure and dependencies, but at the same time contains enough noise to not just copy already existing instances. To analyze and measure the quality of the synthetic data, different supervised machine learning models are trained to predict the change of delay of trains at a specific station based on the arrival delays of other trains at that station. These models are then each trained once using the real data and once using the real data enhanced by synthetic data. All models are evaluated using a test set containing only real data that was not used to train the models. The results show that the R2 value of delay predictions increases significantly when using the enhanced data set. In particular, neural network-based models can benefit from the larger amount of input data. The proposed approach of generating synthetic train trip data with a CWGAN can also be applied to various other railway data analysis projects that require a large amount of input data. In addition, the presented approach is particularly interesting because, unlike most GAN approaches discussed in current literature, the data basis contains numerical data and not image data.
    Keywords: Generative Adversarial Networks, Train Delay Prediction, Railway Analysis
    Date: 2026
    URL: https://d.repec.org/n?u=RePEc:zbw:fubsbe:338080
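The paper's evaluation design (train the same model on real data alone versus real data augmented with synthetic samples, then score both on a held-out real-only test set) can be sketched as follows. This is an illustrative toy on simulated stand-in data, not the Deutsche Bahn records or the CWGAN output used in the paper; the "synthetic" sample is simply drawn from the same generator with extra noise:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)

def make_delays(n, noise):
    """Simulated stand-in: delay change as a linear function of other trains' delays."""
    X = rng.normal(size=(n, 3))
    y = X @ np.array([1.5, -0.5, 0.8]) + rng.normal(scale=noise, size=n)
    return X, y

X_real, y_real = make_delays(100, noise=1.0)   # small "real" training sample
X_syn, y_syn = make_delays(1_000, noise=1.2)   # larger, noisier "synthetic" sample
X_test, y_test = make_delays(500, noise=1.0)   # held-out test set of real data only

r2_real = r2_score(y_test, LinearRegression().fit(X_real, y_real).predict(X_test))
r2_aug = r2_score(y_test, LinearRegression().fit(
    np.vstack([X_real, X_syn]), np.concatenate([y_real, y_syn])).predict(X_test))
print(f"R2 trained on real only: {r2_real:.3f}; real + synthetic: {r2_aug:.3f}")
```

The key design point mirrored here is that the test set contains only real observations, so any R² gain from augmentation reflects better generalization rather than fitting the synthetic generator.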
  2. By: Douglas K.G. Araujo; Harald Uhlig
    Abstract: As Large Language Models (LLMs) are increasingly tasked with autonomous decision making, understanding their behavior in strategic settings is crucial. We investigate the choices of various LLMs in the Ultimatum Game, a setting where human behavior notably deviates from theoretical rationality. We conduct experiments varying the stake size and the nature of the opponent (Human vs. AI) across both Proposer and Responder roles. Three key results emerge. First, LLM behavior is heterogeneous but predictable when conditioning on stake size and player types. Second, while some models approximate the rational benchmark and others mimic human social preferences, a distinct “altruistic” mode emerges where LLMs propose hyper-fair distributions (greater than 50%). Third, LLM Proposers forgo a large share of total payoff, and an even larger share when the Responder is human. These findings highlight the need for careful testing before deploying AI agents in economic settings.
    JEL: C70 C90 D91
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:nbr:nberwo:34919
  3. By: Ahrens, Achim (CERGE-EI); Chernozhukov, Victor (Massachusetts Institute of Technology); Hansen, Christian (University of Chicago); Kozbur, Damian (University of Zurich); Schaffer, Mark (Heriot-Watt University, Edinburgh); Wiemann, Thomas (University of Chicago)
    Abstract: This paper provides an introduction to Double/Debiased Machine Learning (DML). DML is a general approach to performing inference about a target parameter in the presence of nuisance functions: objects that are needed to identify the target parameter but are not of primary interest. Nuisance functions arise naturally in many settings, such as when controlling for confounding variables or leveraging instruments. The paper describes two biases that arise from nuisance function estimation and explains how DML alleviates these biases. Consequently, DML allows the use of flexible methods, including machine learning tools, for estimating nuisance functions, reducing the dependence on auxiliary functional form assumptions and enabling the use of complex non-tabular data, such as text or images. We illustrate the application of DML through simulations and empirical examples. We conclude with a discussion of recommended practices. A companion website includes additional examples and references to other resources.
    Keywords: causal inference, econometrics, high-dimensional models, machine learning, nonparametric estimation
    JEL: C14 C21 C23 C26
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:iza:izadps:dp18438
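The partialling-out estimator at the heart of DML fits in a few lines. The sketch below is an illustrative toy on simulated data from a partially linear model Y = θ·D + g(X) + ε, with random forests standing in for any flexible nuisance learner; it is not the authors' code:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, theta = 2000, 0.5
X = rng.normal(size=(n, 5))
D = X[:, 0] + rng.normal(size=n)                  # treatment depends on a confounder
Y = theta * D + X[:, 0] ** 2 + rng.normal(size=n)

res_y, res_d = np.zeros(n), np.zeros(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    # Cross-fitting: nuisances E[Y|X] and E[D|X] are fit on one part of the
    # data, residuals are formed on the held-out part.
    m_y = RandomForestRegressor(random_state=0).fit(X[train], Y[train])
    m_d = RandomForestRegressor(random_state=0).fit(X[train], D[train])
    res_y[test] = Y[test] - m_y.predict(X[test])
    res_d[test] = D[test] - m_d.predict(X[test])

theta_hat = (res_d @ res_y) / (res_d @ res_d)     # regress Y-residuals on D-residuals
print(f"true theta = {theta}, DML estimate = {theta_hat:.2f}")
```

Cross-fitting and residualization are exactly the two devices the abstract describes for removing regularization and overfitting biases from nuisance estimation.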
  4. By: Chen Zhu; Xiaolu Wang
    Abstract: Large language models (LLMs) have enabled agent-based systems that aim to automate scientific research workflows. Most existing approaches focus on fully autonomous discovery, where AI systems generate research ideas, conduct analyses, and produce manuscripts with minimal human involvement. However, empirical research in economics and the social sciences poses additional constraints: research questions must be grounded in available datasets, identification strategies require careful design, and human judgment remains essential for evaluating economic significance. We introduce HLER (Human-in-the-Loop Economic Research), a multi-agent architecture that supports empirical research automation while preserving critical human oversight. The system orchestrates specialized agents for data auditing, data profiling, hypothesis generation, econometric analysis, manuscript drafting, and automated review. A key design principle is dataset-aware hypothesis generation, where candidate research questions are constrained by dataset structure, variable availability, and distributional diagnostics, reducing infeasible or hallucinated hypotheses. HLER further implements a two-loop architecture: a question quality loop that screens and selects feasible hypotheses, and a research revision loop where automated review triggers re-analysis and manuscript revision. Human decision gates are embedded at key stages, allowing researchers to guide the automated pipeline. Experiments on three empirical datasets show that dataset-aware hypothesis generation produces feasible research questions in 87% of cases (versus 41% under unconstrained generation), while complete empirical manuscripts can be produced at an average API cost of $0.8-$1.5 per run. These results suggest that Human-AI collaborative pipelines may provide a practical path toward scalable empirical research.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.07444
  5. By: Srikumar Nayak
    Abstract: Credit risk scoring must support high-stakes lending decisions where data distributions change over time, probability estimates must be reliable, and group-level fairness is required. While modern machine learning models improve default prediction accuracy, they often produce poorly calibrated scores under distribution shift and may create unfair outcomes when trained without explicit constraints. This paper proposes Calibrated Credit Intelligence (CCI), a deployment-oriented framework that combines (i) a Bayesian neural risk scorer to capture epistemic uncertainty and reduce overconfident errors, (ii) a fairness-constrained gradient boosting model to control group disparities while preserving strong tabular performance, and (iii) a shift-aware fusion strategy followed by post-hoc probability calibration to stabilize decision thresholds in later time periods. We evaluate CCI on the Home Credit Credit Risk Model Stability benchmark using a time-consistent split to reflect real-world drift. Compared with strong baselines (LightGBM, XGBoost, CatBoost, TabNet, and a standalone Bayesian neural model), CCI achieves the best overall trade-off between discrimination, calibration, stability, and fairness. In particular, CCI reaches an AUC-ROC of 0.912 and an AUC-PR of 0.438, improves operational performance with Recall@1%FPR = 0.509, and reduces calibration error (Brier score 0.087, ECE 0.015). Under temporal shift, CCI shows a smaller AUC-PR drop from early to late periods (0.017), and it lowers group disparities (demographic parity gap 0.046, equal opportunity gap 0.037) compared to unconstrained boosting. These results indicate that CCI produces risk scores that are accurate, reliable, and more equitable under realistic deployment conditions.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.06733
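Expected Calibration Error (ECE), one of the metrics the paper reports, can be computed with a simple binning scheme: group predictions by confidence and average the gap between mean predicted probability and observed default rate per bin. A minimal sketch on toy, perfectly calibrated scores (not the paper's implementation):

```python
import numpy as np

def expected_calibration_error(p_hat, y, n_bins=10):
    """Weighted average gap between predicted probability and observed rate per bin."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        # Last bin is closed on the right so that p_hat == 1.0 is counted.
        mask = (p_hat >= lo) & ((p_hat < hi) if hi < 1.0 else (p_hat <= hi))
        if mask.any():
            ece += mask.mean() * abs(p_hat[mask].mean() - y[mask].mean())
    return ece

# Toy scores where predicted probability equals the observed event rate exactly.
p = np.array([0.1] * 10 + [0.9] * 10)
y = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0] + [1] * 9 + [0])
print(round(expected_calibration_error(p, y), 3))  # 0.0
```

A well-calibrated scorer drives this quantity toward zero; the paper's reported ECE of 0.015 is on this scale.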
  6. By: Yizhi Liu; Balaji Padmanabhan; Siva Viswanathan
    Abstract: Digital advertising increasingly relies on visual content, yet marketers lack rigorous methods for understanding how specific visual attributes causally affect consumer engagement. This paper addresses a fundamental methodological challenge: estimating causal effects when the treatment, such as a model's skin tone, is an attribute embedded within the image itself. Standard approaches like Double Machine Learning (DML) fail in this setting because vision encoders entangle treatment information with confounding variables, producing severely biased estimates. We develop DICE-DML (Deepfake-Informed Control Encoder for Double Machine Learning), a framework that leverages generative AI to disentangle treatment from confounders. The approach combines three mechanisms: (1) deepfake-generated image pairs that isolate treatment variation; (2) DICE-Diff adversarial learning on paired difference vectors, where background signals cancel to reveal pure treatment fingerprints; and (3) orthogonal projection that geometrically removes treatment-axis components. In simulations with known ground truth, DICE-DML reduces root mean squared error by 73-97% compared to standard DML, with the strongest improvement (97.5%) at the null effect point, demonstrating robust Type I error control. Applying DICE-DML to 232,089 Instagram influencer posts, we estimate the causal effect of skin tone on engagement. Standard DML produces diagnostically invalid results (negative outcome R^2), while DICE-DML achieves valid confounding control (R^2 = 0.63) and estimates a marginally significant negative effect of darker skin tone (-522 likes; p = 0.062), substantially smaller than the biased standard estimate. Our framework provides a principled approach for causal inference with visual data when treatments and confounders coexist within images.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.02359
  7. By: Fuchs, Anna; Haensch, Anna-Carolina; Weber, Wiebke
    Abstract: Designing survey questions is easy; however, designing good survey questions is a complex task. Large language models (LLMs) have the potential to support this task by automating parts of the item-generation process, but their suitability for survey research has not yet been systematically evaluated. Published research in this area remains sparse, and little is known about the quality and characteristics of survey items generated by LLMs or the factors influencing their performance. This work provides the first in-depth analysis of LLM-based survey item generation and systematically evaluates how different design choices affect item quality. Five LLMs, namely GPT-4o, GPT-4o-mini, GPT-oss-20B, LLaMA 3.1 8B, and LLaMA 3.1 70B, were used to generate survey items on four substantive domains: work, living conditions, national politics, and recent politics. We additionally evaluate three prompting strategies: zero-shot, role, and chain-of-thought prompting. To assess the quality of the generated survey items, we use the Survey Quality Predictor (SQP), a tool for estimating the quality of attitudinal survey items based on codings of their formal and linguistic characteristics. To code these characteristics, we used an LLM-assisted procedure. The findings show striking differences in survey item characteristics across the different models and prompting techniques. Both the choice of model and the prompting technique employed influence the quality of LLM-generated survey items. Closed-source GPT models generally produce more consistent items than open-source LLaMA models. Overall, chain-of-thought prompting achieved the best results. GPT-4o, GPT-4o-mini, and LLaMA 3.1 70B achieved similar item quality, while the LLaMA model showed greater variability.
    Date: 2026–03–12
    URL: https://d.repec.org/n?u=RePEc:osf:socarx:fzn7t_v1
  8. By: Hui Chen; Antoine Didisheim; Luciano A. Somoza
    Abstract: Autoregressive LLMs generate text by sampling from estimated probability distributions over the next token, conditional on prior context. We use these probabilities to construct an entropy-based measure of prediction uncertainty, termed inner confidence. In news classification, LLM predictions with higher inner confidence are systematically more accurate. To evaluate the measure's economic relevance, we form long-short portfolios based on LLM predictions. The portfolio based on high-confidence predictions achieves a Sharpe ratio about 20% higher than the unconditional benchmark, while the one based on low-confidence predictions yields no excess returns. In contrast, self-declared confidence exhibits significant decoding biases and provides no comparable performance gains.
    JEL: C45 C55 G11 G14
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:nbr:nberwo:34965
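The core of an entropy-based confidence measure is simple: the Shannon entropy of the model's next-token distribution over candidate answers, with lower entropy indicating higher confidence. A toy illustration with hypothetical distributions (not the authors' code or data):

```python
import math

def entropy(probs):
    """Shannon entropy of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Two hypothetical distributions over candidate answer tokens ("up", "down", "flat").
confident = [0.90, 0.05, 0.05]
uncertain = [0.40, 0.35, 0.25]

# Lower entropy corresponds to higher "inner confidence" in the prediction.
print(entropy(confident) < entropy(uncertain))  # True
```

In practice the token-level probabilities would come from the model's log-probability output for the classification tokens, normalized over the candidate set.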
  9. By: Alexander Erlei; Lukas Meub
    Abstract: As AI agents increasingly act on behalf of human stakeholders in economic settings, understanding their behavior in complex market environments becomes critical. This article examines how Large Language Models coordinate on markets that are characterized by information asymmetries and in which providers of services have incentives to exploit that asymmetry for their own economic gain. To that end, we conduct simulations with GPT-5.1 agents in credence goods markets, manipulating the institutional framework (free market, verifiability, liability), LLM agents' social preferences (default, self-interested, inequity-averse, efficiency-loving), and reputation mechanisms across one-shot and repeated 16-round interactions. In one-shot settings, LLM agents largely fail to establish cooperation, with markets breaking down except under liability rules or when experts have efficiency-loving preferences. Repeated interactions solve consumer participation through competitive price reduction, but expert fraud remains entrenched absent explicit other-regarding preferences. LLM consumers focus narrowly on price levels rather than understanding strategic incentives embedded in markups, making them vulnerable to exploitation. Compared to human experiments, LLM markets exhibit substantially higher consumer participation but much greater market concentration, lower prices, and more polarized fraud patterns. The effect of institutions like verifiability and reputation is also much more ambiguous. Surplus shifts dramatically toward consumers under social-preference objectives. These findings suggest that institutional design for AI agent markets requires fundamentally different approaches than those effective for human actors, with social preference alignment emerging as the primary determinant of market efficiency.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.08853
  10. By: Marina Azzimonti-Renzo; David Wiczer; Yang Xuan
    Abstract: This study leverages high-frequency foot-traffic data from SafeGraph to estimate demand shocks in customer-facing establishments across New York City’s retail, service, and health sectors. Recognizing that variations in foot traffic can arise from both unpredictable demand shocks and firm-driven strategies to attract customers, we present a theoretical framework that isolates establishment-level demand fluctuations from firm-level strategic choices. Implementing this empirically, we employ an unsupervised machine learning approach to classify establishments into distinct categories that are largely orthogonal to location and sector. We find important heterogeneity in the persistence of shocks, important heterogeneity in their trends, and that estimation on a pooled sample importantly understates the variance experienced by some establishments.
    Keywords: Consumer-facing; Brands; Service; Retail Trade; Health; Demand Dynamics; Demand Shocks; Foot Traffic; Big Data; Machine Learning
    Date: 2026–03–20
    URL: https://d.repec.org/n?u=RePEc:fip:fedrwp:102907
  11. By: Sebastien Lleo; Wolfgang Runggaldier
    Abstract: This paper bridges reinforcement learning (RL) and risk-sensitive stochastic control by introducing a tractable exploration mechanism for policy search in risk-sensitive portfolio management, with known and unknown model parameters, that yields an endogenous relative-entropy regularization. We construct a discrete-time risk-sensitive benchmarked investment model. This model combines a factor-based asset universe with periodic portfolio rebalancing. Exploration is incorporated through user-specified Gaussian perturbations to baseline (exploitative) controls. The risk-sensitive stochastic control problem is solved analytically using the Free Energy-Entropy Duality. The Duality recasts the control problem as a linear-quadratic-Gaussian game and introduces a natural penalty for exploration. This approach yields simple sufficiency conditions for optimality. It also induces intuitive bounds on exploration based on risk sensitivity, asset covariance, and rebalancing frequency. Additionally, the optimal investment strategy can be interpreted through the lens of fractional Kelly strategies. By connecting risk-sensitive control theory and RL, this work provides a principled parametric family for policy-gradient implementations, guiding the design of RL methods.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.00738
  12. By: Cicilia Anggadewi Harun (Bank Indonesia); Safari Kasiyanto (Bank Indonesia); Camila Amalia (Bank Indonesia); Shinta Fitrianti (Bank Indonesia); Esha Gianne Poetry (Bank Indonesia); Nilasari (Bank Indonesia); Rina Megasari (Bank Indonesia); Naura Pradipta Khairunnis (Bank Indonesia)
    Abstract: This study examines and recommends regulatory and liability frameworks for the use of artificial intelligence in the financial sector. Algorithmic bias, the black-box aspect of AI, data privacy concerns, and unequal treatment are the primary focus of this study. It employs normative, comparative, and empirical juridical analyses by assessing AI-related laws and cases, comparing AI governance models across jurisdictions, and undertaking focus group discussions with academics, industry stakeholders, and regulators. For the comparative analyses, the study evaluates the regulatory models and AI-related cases in the European Union, the United States, Singapore, Australia, China, and Qatar. The results show that Indonesia should use a hybrid model that begins with an adaptive sandbox phase, moves toward a risk-based framework to balance innovation and responsibility, and subsequently transitions to a co-regulatory model as AI utilization escalates. Additionally, considering that AI is not a legal subject, the proposed Clear Box Liability framework puts a strong emphasis on human accountability through proportional liability principles. Furthermore, the FairSight Liability Model strengthens consumer protection, transparency, and effective dispute resolution in AI-driven financial services by integrating fairness and foresight.
    Keywords: Artificial Intelligence, AI Regulatory Framework, AI Bias, Consumer Protection, AI in Financial Services, AI Liability Framework
    JEL: A11 B11 C11 D11 F11
    Date: 2025
    URL: https://d.repec.org/n?u=RePEc:idn:wpaper:wp202025
  13. By: Renardi Ardiya Bimantoro (Bank Indonesia); Rudy Hardiyanto (Bank Indonesia); Irfan Sampe (Bank Indonesia); Agung Bayu Purwoko (Bank Indonesia); Imam Dwi Kuncoro (Bank Indonesia); Irvan Fadjar R. (Bank Indonesia); Devima Christi M. (Bank Indonesia); Anugerah Mohamad Setiawan (Bank Indonesia); Moh. Mashudi Arif (Bank Indonesia); Mahanani Margani (Bank Indonesia); Dwi Kartika Siregar (Bank Indonesia); Ganang Suryo Anggoro (Bank Indonesia); Melati Pramudyastuti (Bank Indonesia); Farah Hilda Fuad Lubis (Bank Indonesia); Rudy Marhastari (Bank Indonesia); Nurkholisoh Ibnu Aman (Bank Indonesia); Sintia Aurida (Bank Indonesia)
    Abstract: The transformation of Indonesia’s payment system, driven by BSPI initiatives such as SNAP, QRIS, and BI-FAST, has made digital payments faster, more affordable, and more accessible. However, these advancements can also be misused for illegal activities, specifically online gambling. With transactions projected to grow rapidly from Rp327 trillion in 2023 to Rp900 trillion in 2024, this issue has become a major national financial concern. Beyond eroding public trust, this poses serious social and legal risks. Standard monitoring simply cannot keep up with these shifting threats. To address this, this study proposes an AI-driven Fraud Detection System (FDS). By using a hybrid machine learning approach, combining clustering, classification, and GraphML, we can map out criminal networks and how accounts interconnect. The results indicate that the system identified over 90% of syndicate accounts linked to gamblers. It also cut the time required to flag 1,000 fraudulent accounts from a week of manual work down to just 30 minutes, while catching three times the volume of fraud. These insights offer a strong basis for creating adaptive, risk-based policies that reinforce the integrity and resilience of Indonesia's payment ecosystem.
    Keywords: AI/Machine Learning, Online Gambling, Payment System, Bank Indonesia, Financial Supervision, Fraud Detection
    JEL: C55 G18 K42
    Date: 2025
    URL: https://d.repec.org/n?u=RePEc:idn:wpaper:wp142025
  14. By: Liang, Xiaofan (University of Michigan)
    Abstract: Artificial Intelligence (AI) promises to transform urban planning research, practice, and education, yet few curricula address “Urban AI”. This paper presents the pedagogical design of a pilot Urban AI course and argues for three meta learning goals: applying AI effectively and appropriately in urban challenges, addressing its social, environmental, and governance impacts, and developing normative judgements and professional identities around AI. Pilot teaching produced a knowledge graph connecting essential skills to these goals and a critical framework for AI use and reflection, grounded in analysis of 235 student reflection journals, alongside course evaluations, syllabus materials, and student projects (https://www.xiaofanliang.com/teaching/).
    Date: 2026–03–11
    URL: https://d.repec.org/n?u=RePEc:osf:socarx:g9cps_v1
  15. By: Martin Huber; Sarina Joy Oberhänsli
    Abstract: We propose a difference-in-differences (DiD) framework with mediation for possibly multivalued discrete or continuous treatments and mediators, aimed at identifying the direct effect of the treatment on the outcome (net of effects operating through the mediator), the indirect effect via the mediator, and the joint effects of treatment and mediator, consistent with the framework of dynamic treatment effects. Identification relies on a conditional parallel trends assumption imposed on the mean potential outcome across treatment and mediator states, or (depending on the causal parameter) additionally on the mean potential outcomes and potential mediator distributions across treatment states. We propose ATET estimators for repeated cross sections and panel data within the double/debiased machine learning framework, which allows for data-driven control of covariates, and we establish their asymptotic normality under standard regularity conditions. We investigate the finite-sample performance of the proposed methods in a simulation study and illustrate our approach in an empirical application to the US National Longitudinal Survey of Youth, estimating the direct effect of health care coverage on general health as well as the indirect effect operating through routine checkups.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.23877
  16. By: Soria, Chris
    Abstract: As large language model (LLM)–based text classification becomes routine in the social sciences, researchers confront dozens of competing models, inconsistent advice on prompting, and little standardized tooling with evidence‑based defaults. CatLLM, an open‑source Python and R package, addresses this gap with a three‑stage pipeline—exploration, extraction, classification—for coding open‑ended survey responses. The package offers a provider‑agnostic interface that supports multi‑model ensembles, batch processing, and fully local deployment via open‑weight models, and can be operated through a conversational interface by researchers with no programming experience. CatLLM’s defaults are calibrated by a systematic empirical study evaluating 21 LLMs across three capability tiers, six providers, and four survey questions, benchmarked against sociologist‑coded ground truth. This validation reveals a consistent problem: all models over‑classify, with precision lagging 40–50 percentage points behind sensitivity, implying that default LLM configurations may substantially overstate category prevalence. CatLLM encodes empirically grounded mitigations as defaults: verbose category definitions with explicit inclusion and exclusion criteria, unanimous multi‑model ensembling, and an automatic “Other” escape‑valve category, while disabling advanced prompting strategies that show no reliable benefit. Ensembles of inexpensive open‑weight models outperform the best individual cloud model, enabling fully local deployment without transmitting survey data to external servers. These findings replicate on two independent public datasets spanning political and emotional text, and an applied example linking tool‑coded “move reasons” to respondent demographics uncovers distinct life‑course patterns in residential mobility.
    Date: 2026–03–20
    URL: https://d.repec.org/n?u=RePEc:osf:socarx:gjvcf_v1
  17. By: Nikolai Cook, František Bartoš, Pedro R. D. Bom, Sebastian Gechert, Klára Kantová, Jerome Geyer-Klingeberg, Tomáš Havránek, Zuzana Irsova, Martina Luskova, Matěj Opatrný, Franz Prante, Heiko J. Rachinger, T. D. Stanley
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:wlu:lcerpa:jc0161
  18. By: Epper, Thomas (CNRS, IESEG School of Management, Univ. Lille, UMR 9221 – LEM – Lille Economie Management, F-59000 Lille, France); Ibsen, Kristoffer (Aarhus University); Koch, Alexander (Aarhus University); Nafziger, Julia (Aarhus University)
    Abstract: University dropout is costly, making it a policy priority to identify factors that predict dropout. Using a survey experiment with incoming first-year students linked to long-run administrative outcomes, we assess which information improves dropout prediction beyond standard university records. A small number of targeted, study-specific survey items - especially motivation and expectations about degree completion - substantially improve predictive performance. By contrast, widely used measures of general preferences and traits (such as grit and self-control) add little incremental value - a result that we qualitatively replicate in a large population. Our findings suggest inexpensive, scalable ways to improve dropout predictions.
    Keywords: dropout, non-cognitive skills, motivation, economic preferences, beliefs, education, machine learning
    JEL: I23 D91
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:iza:izadps:dp18439
  19. By: Batuhan Koyuncu; Byeungchun Kwon; Marco Jacopo Lombardi; Fernando Perez-Cruz; Hyun Song Shin
    Abstract: This article introduces the BIS Time-series Regression Oracle (BISTRO), a general-purpose time series model for macroeconomic forecasting. Its edge over traditional econometric approaches lies in its ability to deal with generic unconditional and conditional forecasting tasks, without requiring the model to be adjusted to the macroeconomic task being tackled. Building on the transformer architecture underlying LLMs, BISTRO is fine-tuned on the large repository of macroeconomic data maintained at the BIS. We show that BISTRO provides reliable unconditional forecasts for key macroeconomic aggregates and illustrate how using it for conditional forecasting can help unveil patterns of nonlinearity in the data.
    Keywords: forecasting, scenarios, large language models
    JEL: C32 C45 C55 C87
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:bis:biswps:1337
  20. By: Hait, Subir
    Abstract: Universities operate in increasingly uncertain financial and enrollment environments, yet evidence-based recruitment investment planning remains difficult because campaign-level data are often proprietary or unavailable. This study develops a decision-analytic framework for university enrollment strategy under uncertainty, integrating Monte Carlo simulation, econometric analysis, nonlinear optimization, and policy stress testing. The Enrollment Strategy Simulation (ESS) framework evaluates how alternative recruitment budget allocation ratios affect financial performance—including return on investment (ROI), net present value (NPV), enrollment yield, and downside risk—across a four-year discounted tuition revenue horizon. Using 10,000 replications per scenario, the analysis compares 5%, 10%, and 15% allocation ratios under stochastic variation in cost per lead (CPL), conversion rate (CR), and institutional costs. Expected ROI rises from −0.476 at 5% to 0.325 at 15%, with mean NPV turning positive at the 15% threshold. Regression results confirm that conversion rate is strongly positively and cost per lead strongly negatively associated with ROI. A risk-adjusted optimization procedure identifies approximately 19.3% as the optimal mean-variance allocation. Policy stress tests show that advertising cost inflation produces the largest deterioration in expected outcomes. The ESS framework provides a transparent and reproducible decision-support tool for enrollment investment planning when empirical institutional data are unavailable. All simulations were implemented in R, with replication code archived via GitHub and Zenodo.
    Keywords: Monte Carlo simulation, enrollment strategy, decision analytics, higher education finance, university budgeting, risk-adjusted optimization
    Date: 2026–03–10
    URL: https://d.repec.org/n?u=RePEc:osf:socarx:4kjb9_v1
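The simulation logic (replications of ROI under stochastic cost per lead and conversion rate, compared across allocation ratios) can be sketched in a few lines. The paper's code is in R; this is a toy Python version, and every parameter value below (tuition, budget, fixed cost, CPL and CR distributions) is an illustrative assumption, not the paper's calibration:

```python
import numpy as np

rng = np.random.default_rng(42)
REVENUE_PER_STUDENT = 40_000   # assumed discounted multi-year tuition per enrollee
BASE_BUDGET = 10_000_000       # assumed institutional budget
FIXED_COST = 400_000           # assumed fixed recruitment overhead

def simulate_roi(alloc_ratio, n_reps=10_000):
    """Per-replication ROI of a recruitment budget under stochastic CPL and CR."""
    spend = alloc_ratio * BASE_BUDGET
    cpl = rng.lognormal(mean=np.log(1_000), sigma=0.25, size=n_reps)  # cost per lead
    cr = np.clip(rng.normal(0.045, 0.012, size=n_reps), 0.0, None)    # conversion rate
    revenue = spend / cpl * cr * REVENUE_PER_STUDENT
    return (revenue - spend - FIXED_COST) / spend

for ratio in (0.05, 0.10, 0.15):
    roi = simulate_roi(ratio)
    print(f"allocation {ratio:.0%}: mean ROI {roi.mean():+.2f}, "
          f"P(ROI<0) {np.mean(roi < 0):.2f}")
```

Because the fixed overhead is spread over a larger spend, mean ROI rises and downside risk falls with the allocation ratio in this toy, which is the qualitative pattern the abstract reports across the 5%, 10%, and 15% scenarios.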

This nep-cmp issue is ©2026 by Stan Miles. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the Griffith Business School of Griffith University in Australia.