nep-big New Economics Papers
on Big Data
Issue of 2026–03–23
eight papers chosen by
Tom Coupé, University of Canterbury


  1. Transfer Reinforcement Learning for Pricing, Driver Repositioning and Customer Admission in Ride-Hailing Networks By De Munck, Thomas; Tancrez, Jean-Sébastien; Chevalier, Philippe
  2. Lobbying for Regulations: When Big Business Says Yes By Luca Macedoni; Ariel Weinberger
  3. Deep Learning Projects Jurisdiction of New and Proposed Clean Water Act Regulation By Simon Greenhill; Brant J. Walker; Joseph S. Shapiro
  4. Estimating Demand Shocks from Foot Traffic: A Big-Data Approach By Marina Azzimonti-Renzo; David Wiczer; Yang Xuan
  5. Echoes of Policy: Leveraging AI/ML to Support Central Bank Communication Strategies By Rudy Marhastari; Cicilia Anggadewi Harun; Retno Muhardini; Agatha Silalahi; Annes Nisrina Khoirunnisa; Rheznandya Arkaputra Azis; Sintia Aurida; Rahardian Luthfan Ihtifazhuddin; Citra Ayu Rossi Wulandari; Alvin Andhika Zulen; Amin Endah Sulistiawati
  6. Estimating Visual Attribute Effects in Advertising from Observational Data: A Deepfake-Informed Double Machine Learning Approach By Yizhi Liu; Balaji Padmanabhan; Siva Viswanathan
  7. Stock Market Prediction Using Node Transformer Architecture Integrated with BERT Sentiment Analysis By Mohammad Al Ridhawi; Mahtab Haj Ali; Hussein Al Osman
  8. AI for Survey Design: Generating and Evaluating Survey Questions with Large Language Models By Fuchs, Anna; Haensch, Anna-Carolina; Weber, Wiebke

  1. By: De Munck, Thomas (Université catholique de Louvain, LIDAM/CORE, Belgium); Tancrez, Jean-Sébastien (Université catholique de Louvain, LIDAM/CORE, Belgium); Chevalier, Philippe (Université catholique de Louvain, LIDAM/CORE, Belgium)
    Abstract: We consider the problem of a ride-hailing platform (e.g., Uber, Lyft) that connects supply with demand over a network of locations. To this aim, the platform makes pricing, driver repositioning, and customer admission decisions. Customers are impatient and have distinct willingness to pay. Drivers can be repositioned by the platform, or can choose to relocate to other locations by themselves. We formulate this problem as a discrete-time Markov decision process and propose a transfer learning approach to find an efficient policy. Our approach first derives a rolling-horizon strategy by repeatedly solving a deterministic optimization problem. Then, two neural networks are pretrained to replicate the strategy and learn the associated value function. Finally, the policy is further improved through deep reinforcement learning (DRL). Using data from New York City, we apply our approach to networks of up to 20 locations. The results show that our approach outperforms alternative DRL algorithms and rolling-horizon strategies while reducing computation time and stabilizing learning. We also explore the interplay between pricing, driver repositioning, and customer admission, providing insights into their respective roles.
    Keywords: Transportation ; Ride-hailing platforms ; Pricing and repositioning decisions ; Transfer learning ; Deep reinforcement learning
    Date: 2025–02–01
    URL: https://d.repec.org/n?u=RePEc:cor:louvco:2025004
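The pretraining step described in the abstract, cloning a rolling-horizon strategy into a neural policy before deep reinforcement learning fine-tuning, can be illustrated with a minimal behavior-cloning sketch. Everything below is synthetic and hypothetical: the three-location network, the `teacher_action` rule standing in for the rolling-horizon optimizer, and the linear softmax policy are illustrative stand-ins, not the authors' implementation.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def scores(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

# Hypothetical rolling-horizon "teacher": reposition toward the location
# with the largest demand surplus (demand minus supply).
def teacher_action(state):
    surplus = [d - s for d, s in zip(state[:3], state[3:])]
    return surplus.index(max(surplus))

random.seed(0)
# Synthetic states: demand at 3 locations, then supply at 3 locations.
data = [[random.random() for _ in range(6)] for _ in range(500)]
labels = [teacher_action(s) for s in data]

# Behavior cloning: fit a linear softmax policy to the teacher's decisions
# by stochastic gradient descent on the cross-entropy loss.
W = [[0.0] * 6 for _ in range(3)]
lr = 0.5
for _ in range(100):
    for x, y in zip(data, labels):
        p = softmax(scores(W, x))
        for k in range(3):
            grad = p[k] - (1.0 if k == y else 0.0)
            for j in range(6):
                W[k][j] -= lr * grad * x[j]

def cloned_action(x):
    s = scores(W, x)
    return s.index(max(s))

# Fraction of states on which the cloned policy agrees with the teacher.
agreement = sum(cloned_action(x) == y for x, y in zip(data, labels)) / len(data)
```

In the paper's setup the cloned networks then serve as the starting point for DRL, which is what stabilizes learning relative to training from scratch.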
  2. By: Luca Macedoni; Ariel Weinberger
    Abstract: Do firms uniformly oppose regulations that increase production costs, or might industry leaders strategically support stricter standards as a competitive tool? We identify a specific mechanism through which large firms strategically support regulations to enhance their competitive position. Extending the Melitz-Chaney model of firm heterogeneity to incorporate government regulations and lobbying following Grossman-Helpman, we derive conditions under which regulations disproportionately burden smaller competitors while benefiting larger survivors through reduced competition. The model predicts that firm size is positively correlated with support for stringent regulations, but that larger sunk investments push firms to oppose such policies. To test these predictions, we develop a text-as-data approach using large language models to classify firm regulatory preferences from lobbying disclosures, a measurement challenge that has limited prior systematic analysis. Applying guided machine learning to over 20,000 U.S. lobbying reports, we confirm that larger firms are significantly more likely to support stricter regulations, especially in concentrated industries. Capital-intensive firms with high leverage and less redeployable assets tend to oppose regulations, suggesting that operational flexibility is crucial for extracting strategic benefits from regulatory changes.
    Keywords: strategic lobbying, product standard regulations, firm heterogeneity, machine learning
    JEL: F12 D22 D72 L11 L51
    Date: 2026
    URL: https://d.repec.org/n?u=RePEc:ces:ceswps:_12536
  3. By: Simon Greenhill; Brant J. Walker; Joseph S. Shapiro
    Abstract: Projecting the effects of proposed policy reforms is challenging because no outcome data exist for regulations that governments have not yet implemented. We propose an ex ante deep learning framework that can project effects of proposed reforms by mapping outcomes observed under past regulations onto the legal criteria of proposed future policies (i.e., by “relabeling”). We apply this framework to study changes in jurisdiction of the US Clean Water Act (CWA). We compare our ex ante deep learning projection of jurisdiction under the Supreme Court’s Sackett decision against widely used projections from domain experts. Ex ante machine learning generates exceptional performance improvements over the leading domain expert model that the US Environmental Protection Agency currently uses, with 65 times more accurate identification of jurisdictional sites. We also develop an ex post deep learning model trained with data after policy implementation. Ex post deep learning performs best. Sackett deregulates one-third of all previously regulated US waters, particularly floodplains and pristine fish habitats, totaling 700,000 deregulated stream miles and 17 million deregulated wetland acres. Deep learning can effectively project consequences of far-reaching regulatory reforms before they are implemented, when projections are both most uncertain and most useful.
    JEL: C45 D61 H11 H23 K32 Q25 Q53 Q58 R11
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:nbr:nberwo:34947
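The "relabeling" idea, keeping historical site data but recomputing each site's jurisdiction label under the criteria of the proposed rule before training, can be sketched in a few lines. Everything below is invented for illustration: the two site features, the stylized `old_rule` and `proposed_rule` criteria, and the 1-nearest-neighbour stand-in for the paper's deep learning model bear no relation to actual Clean Water Act criteria.

```python
import random

random.seed(7)
# Hypothetical site features: (distance to navigable water in km, wetland share).
sites = [(random.uniform(0, 10), random.random()) for _ in range(300)]

def old_rule(d, w):       # stylized pre-reform jurisdiction criterion
    return d < 5.0 or w > 0.5

def proposed_rule(d, w):  # stylized stricter proposed criterion
    return d < 2.0 and w > 0.3

# "Relabeling": reuse the historical feature data, but train against labels
# computed from the proposed rule's legal criteria rather than the old ones.
new_labels = [proposed_rule(d, w) for d, w in sites]

train, test = sites[:200], sites[200:]
train_y = new_labels[:200]

def predict(p):
    # 1-nearest-neighbour classifier as a toy stand-in for the deep model.
    d2 = [(p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 for q in train]
    return train_y[d2.index(min(d2))]

projected = [predict(p) for p in test]
accuracy = sum(pr == proposed_rule(*p) for pr, p in zip(projected, test)) / len(test)
```

The point of the exercise is that no outcome data under the proposed rule are needed: the model learns the mapping from observable site features to the rewritten labels.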
  4. By: Marina Azzimonti-Renzo; David Wiczer; Yang Xuan
    Abstract: This study leverages high-frequency foot-traffic data from SafeGraph to estimate demand shocks in customer-facing establishments across New York City’s retail, service, and health sectors. Recognizing that variations in foot traffic can arise from both unpredictable demand shocks and firm-driven strategies to attract customers, we present a theoretical framework that isolates establishment-level demand fluctuations from firm-level strategic choices. Implementing this empirically, we employ an unsupervised machine learning approach to classify establishments into distinct categories that are largely orthogonal to location and sector. We find substantial heterogeneity in both the persistence of shocks and their trends, and show that estimation on a pooled sample markedly understates the variance experienced by some establishments.
    Keywords: Consumer-facing; Brands; Service; Retail Trade; Health; Demand Dynamics; Demand Shocks; Foot Traffic; Big Data; Machine Learning
    Date: 2026–03–20
    URL: https://d.repec.org/n?u=RePEc:fip:fedrwp:102907
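The unsupervised classification step can be illustrated with a minimal k-means sketch in pure Python. The two synthetic "establishment" feature clusters and the choice of k = 2 are invented for the example; the paper does not specify its feature set or clustering algorithm here.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[d.index(min(d))].append(p)
        # Recompute each center as its cluster mean; keep old center if empty.
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = [sum(v) / len(cl) for v in zip(*cl)]
    labels = []
    for p in points:
        d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        labels.append(d.index(min(d)))
    return centers, labels

# Synthetic establishment features (e.g., mean weekly visits, visit volatility):
# two well-separated groups standing in for different shock-persistence types.
random.seed(1)
pts = [[random.gauss(0, 0.3), random.gauss(0, 0.3)] for _ in range(100)]
pts += [[random.gauss(5, 0.3), random.gauss(5, 0.3)] for _ in range(100)]
centers, labels = kmeans(pts, k=2)
```

Estimating shock processes separately within each recovered category, rather than on the pooled sample, is what lets the paper detect the heterogeneity it reports.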
  5. By: Rudy Marhastari (Bank Indonesia); Cicilia Anggadewi Harun (Bank Indonesia); Retno Muhardini (Bank Indonesia); Agatha Silalahi (Bank Indonesia); Annes Nisrina Khoirunnisa (Bank Indonesia); Rheznandya Arkaputra Azis (Bank Indonesia); Sintia Aurida (Bank Indonesia); Rahardian Luthfan Ihtifazhuddin (Bank Indonesia); Citra Ayu Rossi Wulandari (Bank Indonesia); Alvin Andhika Zulen (Bank Indonesia); Amin Endah Sulistiawati (Bank Indonesia)
    Abstract: This study evaluates the effectiveness of Bank Indonesia’s communication strategy by integrating computational linguistics, media sentiment analytics, and macroeconomic diagnostics within a unified empirical framework. Using advanced Natural Language Processing (NLP) techniques, BI’s press releases from 2019–2024 are transformed into quantitative indicators capturing clarity, sentiment, comprehensiveness, consistency, and economic appropriateness. In parallel, news articles on inflation and exchange rate developments are analyzed to assess how policy messages are transmitted or amplified through media channels. These linguistic features are further enriched using Named Entity Recognition to identify stakeholder-specific resonance and potential pathways of narrative distortion within the public communication ecosystem. To assess macroeconomic implications, a VARX model links communication characteristics to intermediary channels, market expectations, and macroeconomic outcomes under both normal and anomalous conditions. Complementing this analysis, an Early Warning System (EWS) employing a 12-month rolling window and IsolationForest anomaly detection identifies periods of inflation and exchange-rate stress, providing a diagnostic foundation for anticipating heightened communication demands. The findings show that central bank communication functions not only as an information conduit but also as an active policy instrument that shapes expectations and influences market behavior. Building on these insights, the study proposes a three-pillar framework (Features, Timing, and Channels) to strengthen clarity, responsiveness, and coherence in central bank communication. This research advances the literature by integrating AI/ML-based diagnostics with policy communication analysis, offering an empirically grounded approach to enhancing communication effectiveness, transparency, and expectation management.
    Keywords: Central Bank Communication, Communication Feature, Sentiment Analysis, Communication Impact Analysis, Early Warning System
    Date: 2025
    URL: https://d.repec.org/n?u=RePEc:idn:wpaper:wp192025
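The anomaly-detection component rests on the isolation-forest principle: anomalous observations are separated by fewer random splits, so shorter average path lengths map to scores near 1 while typical points score near 0.5. The minimal from-scratch sketch below illustrates that principle; the two-feature synthetic data and all parameter choices are illustrative, not Bank Indonesia's specification.

```python
import math
import random

def c(n):
    # Expected path length of an unsuccessful BST search; used to normalize.
    if n <= 1:
        return 0.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    j = rng.randrange(len(points[0]))
    vals = [p[j] for p in points]
    lo, hi = min(vals), max(vals)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[j] < split]
    right = [p for p in points if p[j] >= split]
    return ("node", j, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(tree, p, depth=0):
    if tree[0] == "leaf":
        return depth + c(tree[1])
    _, j, split, left, right = tree
    return path_length(left if p[j] < split else right, p, depth + 1)

def anomaly_score(forest, n, p):
    avg = sum(path_length(t, p) for t in forest) / len(forest)
    return 2.0 ** (-avg / c(n))  # near 1 = anomalous, near 0.5 = typical

rng = random.Random(42)
# Synthetic monthly "stress" indicators: mostly calm, one extreme month.
data = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(120)]
data.append([8.0, 8.0])  # hypothetical crisis observation
forest = [build_tree(data, 0, max_depth=8, rng=rng) for _ in range(100)]
scores = [anomaly_score(forest, len(data), p) for p in data]
```

In the paper's EWS, such scores computed over a 12-month rolling window flag the periods of inflation and exchange-rate stress that call for heightened communication.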
  6. By: Yizhi Liu; Balaji Padmanabhan; Siva Viswanathan
    Abstract: Digital advertising increasingly relies on visual content, yet marketers lack rigorous methods for understanding how specific visual attributes causally affect consumer engagement. This paper addresses a fundamental methodological challenge: estimating causal effects when the treatment, such as a model's skin tone, is an attribute embedded within the image itself. Standard approaches like Double Machine Learning (DML) fail in this setting because vision encoders entangle treatment information with confounding variables, producing severely biased estimates. We develop DICE-DML (Deepfake-Informed Control Encoder for Double Machine Learning), a framework that leverages generative AI to disentangle treatment from confounders. The approach combines three mechanisms: (1) deepfake-generated image pairs that isolate treatment variation; (2) DICE-Diff adversarial learning on paired difference vectors, where background signals cancel to reveal pure treatment fingerprints; and (3) orthogonal projection that geometrically removes treatment-axis components. In simulations with known ground truth, DICE-DML reduces root mean squared error by 73-97% compared to standard DML, with the strongest improvement (97.5%) at the null effect point, demonstrating robust Type I error control. Applying DICE-DML to 232,089 Instagram influencer posts, we estimate the causal effect of skin tone on engagement. Standard DML produces diagnostically invalid results (negative outcome R^2), while DICE-DML achieves valid confounding control (R^2 = 0.63) and estimates a marginally significant negative effect of darker skin tone (-522 likes; p = 0.062), substantially smaller than the biased standard estimate. Our framework provides a principled approach for causal inference with visual data when treatments and confounders coexist within images.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.02359
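DICE-DML builds on the standard double machine learning recipe, whose core partialling-out step can be shown in a toy setting: residualize both outcome and treatment on the confounder, then regress residual on residual. The sketch below uses plain linear regressions and no cross-fitting (real DML uses flexible ML nuisance models with sample splitting), and all data are synthetic with a known treatment effect of 2.0; it illustrates the generic recipe, not the paper's image-based extension.

```python
import random

def ols_slope(x, y):
    # One-regressor OLS slope with intercept.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((v - mx) * (w - my) for v, w in zip(x, y))
    return sxy / sxx

def fitted(x, y):
    b = ols_slope(x, y)
    a = sum(y) / len(y) - b * sum(x) / len(x)
    return [a + b * v for v in x]

random.seed(3)
theta = 2.0  # true treatment effect, known by construction
# Confounder X drives both treatment D and outcome Y.
X = [random.gauss(0, 1) for _ in range(2000)]
D = [x + random.gauss(0, 1) for x in X]
Y = [theta * d + 3.0 * x + random.gauss(0, 1) for d, x in zip(D, X)]

# Naive regression of Y on D is confounded: its slope is biased away from theta.
naive = ols_slope(D, Y)

# Partialling-out: residualize Y and D on X, then regress residual on residual.
rY = [y - f for y, f in zip(Y, fitted(X, Y))]
rD = [d - f for d, f in zip(D, fitted(X, D))]
dml = ols_slope(rD, rY)
```

The failure mode the paper targets is precisely that, with images, the "X" fed to the nuisance models is an encoder representation that still contains the treatment, which is what their deepfake-based disentangling repairs.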
  7. By: Mohammad Al Ridhawi; Mahtab Haj Ali; Hussein Al Osman
    Abstract: Stock market prediction presents considerable challenges for investors, financial institutions, and policymakers operating in complex market environments characterized by noise, non-stationarity, and behavioral dynamics. Traditional forecasting methods often fail to capture the intricate patterns and cross-sectional dependencies inherent in financial markets. This paper presents an integrated framework combining a node transformer architecture with BERT-based sentiment analysis for stock price forecasting. The proposed model represents the stock market as a graph structure where individual stocks form nodes and edges capture relationships including sectoral affiliations, correlated price movements, and supply chain connections. A fine-tuned BERT model extracts sentiment from social media posts and combines it with quantitative market features through attention-based fusion. The node transformer processes historical market data while capturing both temporal evolution and cross-sectional dependencies among stocks. Experiments on 20 S&P 500 stocks spanning January 1982 to March 2025 demonstrate that the integrated model achieves a mean absolute percentage error (MAPE) of 0.80% for one-day-ahead predictions, compared to 1.20% for ARIMA and 1.00% for LSTM. Sentiment analysis reduces prediction error by 10% overall and 25% during earnings announcements, while graph-based modeling contributes an additional 15% improvement by capturing inter-stock dependencies. Directional accuracy reaches 65% for one-day forecasts. Statistical validation through paired t-tests confirms these improvements.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.05917
  8. By: Fuchs, Anna; Haensch, Anna-Carolina; Weber, Wiebke
    Abstract: Designing survey questions is easy; designing good survey questions, however, is a complex task. Large language models (LLMs) have the potential to support this task by automating parts of the item-generation process, but their suitability for survey research has not yet been systematically evaluated. Published research in this area remains sparse, and little is known about the quality and characteristics of survey items generated by LLMs or the factors influencing their performance. This work provides the first in-depth analysis of LLM-based survey item generation and systematically evaluates how different design choices affect item quality. Five LLMs, namely GPT-4o, GPT-4o-mini, GPT-oss-20B, LLaMA 3.1 8B, and LLaMA 3.1 70B, were used to generate survey items on four substantive domains: work, living conditions, national politics, and recent politics. We additionally evaluate three prompting strategies: zero-shot, role, and chain-of-thought prompting. To assess the quality of the generated survey items, we use the Survey Quality Predictor (SQP), a tool for estimating the quality of attitudinal survey items based on codings of their formal and linguistic characteristics. To code these characteristics, we used an LLM-assisted procedure. The findings show striking differences in survey item characteristics across the different models and prompting techniques. Both the choice of model and the prompting technique employed influence the quality of LLM-generated survey items. Closed-source GPT models generally produce more consistent items than open-source LLaMA models. Overall, chain-of-thought prompting achieved the best results. GPT-4o, GPT-4o-mini, and LLaMA 3.1 70B achieved similar item quality, while the LLaMA model showed greater variability.
    Date: 2026–03–12
    URL: https://d.repec.org/n?u=RePEc:osf:socarx:fzn7t_v1

This nep-big issue is ©2026 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the Griffith Business School of Griffith University in Australia.