nep-big 2026-03-09 papers

on Big Data

Issue of 2026–03–09
eleven papers chosen by
Tom Coupé, University of Canterbury

Overreaction as an indicator for momentum in algorithmic trading: A Case of AAPL stocks By Szymon Lis; Robert Ślepaczuk; Paweł Sakowski
Machine learning mutual fund flows By Fausch, Jürg; Frigg, Moreno; Ruenzi, Stefan; Weigert, Florian
Using Transformers and Reinforcement Learning as Narrative Filters in Macroeconomics By Vegard H. Larsen; Leif Anders Thorsrud
Can satellites predict oil demand? By Bricongne, Jean-Charles; Meunier, Baptiste; Macalos, Joao; Milis, Julia; Pical, Thomas
Automated historical census digitization using image augmentation and transformer-based methods By Leonardo Costa Ribeiro; Jonatan Andersson; William Skoglund; Jakob Molinder; Martin Önnerfors
Measuring Online Media Ideology with Large Language Models and "Multi-Cue Classification" By da Silva, Lucas Paulo
When Algorithms Rate Performance: Do Large Language Models Replicate Human Evaluation Biases? By Rilke, Rainer; Sliwka, Dirk
How Effectively Can Current LLMs Analyze Macrofinancial Issues? By Paola Ganum; Tohid Atashbar
Sub-City Real Estate Price Index Forecasting at Weekly Horizons Using Satellite Radar and News Sentiment By Baris Arat; Hasan Fehmi Ates; Emre Sefer
Could Large Language Models work as Post-hoc Explainability Tools in Credit Risk Models? By Wenxi Geng; Dingyuan Liu; Liya Li; Yiqing Wang
The Worth of a “Wo”: Gender Bias in Financial Advice from LLMs By Foltyn, Richard; Olsson, Jonna

Overreaction as an indicator for momentum in algorithmic trading: A Case of AAPL stocks

By:	Szymon Lis; Robert Ślepaczuk; Paweł Sakowski
Abstract:	This paper investigates whether short-term market overreactions can be systematically predicted and monetized as momentum signals using high-frequency emotional information and modern machine learning methods. Focusing on Apple Inc. (AAPL), we construct a comprehensive intraday dataset that combines volatility-normalized returns with transformer-based emotion features extracted from Twitter messages. Overreactions are defined as extreme return realizations relative to contemporaneous volatility and transaction costs and are modeled as a three-class prediction problem. We evaluate the performance of several nonlinear classifiers, including XGBoost, Random Forests, Deep Neural Networks, and Bidirectional LSTMs, across multiple intraday frequencies (1-, 5-, 10-, and 15-minute data). Model outputs are translated into trading strategies and assessed using risk-adjusted performance measures and formal statistical tests. The results show that machine learning models significantly outperform benchmark overreaction rules at ultra-short horizons, while classical behavioral momentum effects dominate at intermediate frequencies, particularly around 10 minutes. Explainability analysis based on SHAP reveals that volatility and negative emotions, especially fear and sadness, play a central role in driving predicted overreactions. Overall, the findings demonstrate that emotion-driven overreactions contain a predictable structure that can be exploited by machine learning models, offering new insights into the behavioral origins of intraday momentum and the interaction between sentiment, volatility, and algorithmic trading.
Date:	2026–02
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2602.18912
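
The three-class overreaction definition in the abstract above can be illustrated with a minimal sketch. The paper's exact rule is not given in this digest, so everything here is an assumption made for illustration: the function name `label_overreactions`, the use of a trailing window as a stand-in for contemporaneous volatility, the multiplier `k`, and the cost level `cost` are all hypothetical.

```python
from statistics import pstdev


def label_overreactions(returns, window=5, k=2.0, cost=0.0005):
    """Label each return +1 (upward overreaction), -1 (downward), or 0 (none).

    Assumed rule, not the paper's: a return is an overreaction when its
    magnitude exceeds k times the trailing volatility plus a transaction cost.
    """
    labels = []
    for i, r in enumerate(returns):
        history = returns[max(0, i - window):i]
        if len(history) < 2:
            labels.append(0)  # too little history to estimate volatility
            continue
        threshold = k * pstdev(history) + cost
        if r > threshold:
            labels.append(1)
        elif r < -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


# Toy intraday returns: quiet bars followed by two large moves.
returns = [0.001, -0.001, 0.001, -0.001, 0.010, -0.012, 0.0005]
labels = label_overreactions(returns)
print(labels)
```

Labels of this kind would be the target of the classifiers the abstract lists; the feature construction and the trading rules are outside this sketch.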

Machine learning mutual fund flows

By:	Fausch, Jürg; Frigg, Moreno; Ruenzi, Stefan; Weigert, Florian
Abstract:	We present improved out-of-sample predictability of future fund flows using state-of-the-art machine learning methods. Nonlinear machine learning models significantly outperform linear models in terms of out-of-sample R-squared. Using interpretable ML methods, we identify past flows and the Morningstar rating as the most important predictors for net flows, while other past performance variables are of minor importance. We find that the importance of Morningstar ratings and expenses has increased over time. In addition, the interaction effect of past flows with the Morningstar rating has a substantial impact on future flows. Furthermore, our results demonstrate that machine learning-based fund flow predictions can be used to ex-ante differentiate between high and low-performing mutual funds. Finally, funds whose flow predictions can be improved the most using ML reveal the worst performance, consistent with the idea that liquidity management is particularly challenging for these funds.
Keywords:	Machine learning, fund flow prediction, big data, interpretable machine learning
JEL:	C45 C52 C53 C55 G10 G11 G12 G17 G23
Date:	2026
URL:	https://d.repec.org/n?u=RePEc:zbw:cfrwps:337467

Using Transformers and Reinforcement Learning as Narrative Filters in Macroeconomics

By:	Vegard H. Larsen; Leif Anders Thorsrud
Abstract:	Building on recent advances in Natural Language Processing and modeling of sequences, we study how a multimodal Transformer-based deep learning architecture can be used for measurement and structural narrative attribution in macroeconomics. The framework we propose combines (news) text and (macroeconomic) time series information using cross-attention mechanisms, easily incorporates differences in data frequencies and reporting delays, and can be used together with Reinforcement Learning to produce structurally coherent summaries of high-frequency news flows. Applied and tested on both simulated and real-world data out-of-sample, the results we obtain are encouraging.
Keywords:	multimodal transformer, structural decomposition, text analytics, macroeconomic nowcasting
JEL:	C45 C55 E32 E37
Date:	2026
URL:	https://d.repec.org/n?u=RePEc:ces:ceswps:_12454

Can satellites predict oil demand?

By:	Bricongne, Jean-Charles; Meunier, Baptiste; Macalos, Joao; Milis, Julia; Pical, Thomas
Abstract:	We investigate whether satellite observations of nitrogen dioxide (NO₂) – a short-lived pollutant primarily emitted by fossil fuel combustion – can improve the forecasting of oil demand. After retrieving, cleaning, and aggregating daily satellite data, we integrate NO₂ into a range of forecasting models. Across a panel of advanced and emerging economies, we find that including NO₂ significantly enhances nowcasting accuracy relative to benchmark models based on autoregressive terms and traditional predictors such as industrial activity, prices, weather, and vehicle registrations. Accuracy gains are particularly strong during crisis episodes but remain present in more stable times. Non-linear models, especially neural networks, yield the largest improvements, highlighting the non-linear link between energy demand and pollution. By offering a timely, globally consistent, and freely available proxy, satellite-based NO₂ data provide a valuable new tool for real-time monitoring of oil demand.
Keywords:	big data, energy consumption, machine learning, nowcasting, satellite data
JEL:	C51 C81 E23 E37
Date:	2026–02
URL:	https://d.repec.org/n?u=RePEc:ecb:ecbwps:20263198

Automated historical census digitization using image augmentation and transformer-based methods

By:	Leonardo Costa Ribeiro (Federal University of Minas Gerais); Jonatan Andersson (Uppsala University); William Skoglund (Lund University); Jakob Molinder (Uppsala University); Martin Önnerfors (Uppsala University)
Abstract:	A large literature in economic history uses digitized census data to study individual-level outcomes in history. Although many census records have been digitized manually, the process is extremely labor-intensive, and substantial material remains unprocessed in archives. Recent advances in machine learning offer the potential to automate a large part of this work. We demonstrate an end-to-end digitization pipeline based on the transformer-based Donut model, trained on hand-annotated data and enhanced with image augmentation, to extract information from the 1955 Stockholm tax and census records. The resulting output attains high accuracy across multiple evaluation metrics.
Keywords:	Digitization, Census, OCR, Transformers
JEL:	N01
Date:	2026–02
URL:	https://d.repec.org/n?u=RePEc:hes:wpaper:0298

Measuring Online Media Ideology with Large Language Models and "Multi-Cue Classification"

By:	da Silva, Lucas Paulo (Trinity College Dublin)
Abstract:	Measuring media ideology is essential for researching media bias, media effects, and various important topics in political science, communication, and other social sciences. However, given journalistic norms of objectivity and the complexity of ideology, measuring media ideology accurately is uniquely challenging. Large language models (LLMs) have become valuable tools in this endeavor. Based on media communication theories, I argue that media ideology is expressed via different cues -- the topic, argument, framing, criticism, and sources of the media content -- and that LLMs often miss these. Standard methods of LLM classification also offer little control, flexibility, and data granularity to researchers. Drawing on insights about computational and quantitative measurement methodologies, I introduce the "Multi-Cue Classification" (MQ-Class) approach. With MQ-Class, an LLM classifies the different ideological cues separately and researchers then apply pre-specified weights and thresholds to combine them into one label per text. I compare standard LLM and MQ-Class methods using two example tasks -- classifying the economic and cultural ideologies of a novel sample of online media articles. Across multiple tests, MQ-Class is more accurate and puts researchers "back in the driver's seat." I conclude by discussing how MQ-Class could be implemented for other classification tasks and data.
Date:	2026–02–20
URL:	https://d.repec.org/n?u=RePEc:osf:socarx:zmtqp_v1
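
The combination step described in the abstract above (separate per-cue classifications merged with pre-specified weights and thresholds) can be sketched roughly as follows. The digest does not give the paper's coding scheme, so the numeric cue scores (-1, 0, +1), the weights, the threshold, the label names, and the function `combine_cues` are all hypothetical choices for illustration.

```python
def combine_cues(cue_scores, weights, threshold=0.25):
    """Combine per-cue scores into one label using fixed weights.

    cue_scores: dict mapping cue name to -1, 0, or +1 (assumed coding,
                e.g. as returned by an LLM prompted once per cue).
    weights:    dict mapping cue name to a non-negative weight chosen
                by the researcher before looking at the results.
    """
    total_weight = sum(weights[cue] for cue in cue_scores)
    if total_weight == 0:
        return "center"
    score = sum(weights[cue] * s for cue, s in cue_scores.items()) / total_weight
    if score >= threshold:
        return "right"
    if score <= -threshold:
        return "left"
    return "center"


# The five cues named in the abstract, with made-up scores and weights.
weights = {"topic": 1, "argument": 2, "framing": 2, "criticism": 1, "sources": 1}
article = {"topic": 0, "argument": 1, "framing": 1, "criticism": 0, "sources": -1}
label = combine_cues(article, weights)
print(label)
```

Because the weights and threshold live outside the model call, they can be changed and the labels recomputed without re-querying the LLM, which is one way to read the abstract's point about researcher control.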

When Algorithms Rate Performance: Do Large Language Models Replicate Human Evaluation Biases?

By:	Rilke, Rainer (WHU - Otto Beisheim School of Management); Sliwka, Dirk (University of Cologne)
Abstract:	A large body of research across management, psychology, accounting, and economics shows that subjective performance evaluations are systematically biased: ratings cluster near the midpoint of scales and are often excessively lenient. As organizations increasingly adopt large language models (LLMs) for evaluative tasks, little is known about how these systems perform when assessing human performance. We document that, in the absence of clear objective standards and when individuals are rated independently, LLMs reproduce the familiar patterns of human raters. However, LLMs generate greater dispersion and accuracy when evaluating multiple individuals simultaneously. With noisy but objective performance signals, LLMs provide substantially more accurate evaluations than human raters, as they (i) are less subject to biases arising from concern for the evaluated employee and (ii) make fewer mistakes in information processing, closely approximating rational Bayesian benchmarks.
Keywords:	performance evaluation, large language models, signal objectivity, algorithmic judgment, Gen-AI
JEL:	J24 J28 M12 M53
Date:	2026–02
URL:	https://d.repec.org/n?u=RePEc:iza:izadps:dp18371

How Effectively Can Current LLMs Analyze Macrofinancial Issues?

By:	Paola Ganum; Tohid Atashbar
Abstract:	This paper empirically evaluates the ability of current Large Language Models (LLMs) to analyze macrofinancial coverage in IMF Article IV staff reports, using human economists' assessments as a benchmark. We test several GPT models on reports from 2016-2024, assessing their performance on both qualitative ratings and binary questions. Our findings indicate that the latest models can meaningfully assist economists, achieving an average accuracy of 71-75% on ratings and an average exact match rate of 76-81% on binary questions in 2024 across advanced GPT models. However, we find that LLMs tend to assign higher, less-dispersed ratings than human experts and struggle with open-ended questions that require deep contextual judgment. The paper provides quantitative evidence on current LLM accuracy in this domain, explores the drivers of its performance, and discusses key limitations such as optimistic bias.
Keywords:	AI; Large Language Model; Textual Analysis; Macrofinancial Surveillance; IMF Staff Reports; Human-AI Comparison
Date:	2026–02–27
URL:	https://d.repec.org/n?u=RePEc:imf:imfwpa:2026/035

Sub-City Real Estate Price Index Forecasting at Weekly Horizons Using Satellite Radar and News Sentiment

By:	Baris Arat; Hasan Fehmi Ates; Emre Sefer
Abstract:	Reliable real estate price indicators are typically published at city level and low frequency, limiting their use for neighborhood-scale monitoring and long-horizon planning. We study whether sub-city price indices can be forecasted at weekly frequency by combining physical development signals from satellite radar with market narratives from news text. Using over 350,000 transactions from the Dubai Land Department (2015-2025), we construct weekly price indices for 19 sub-city regions and evaluate forecasts from 2 to 34 weeks ahead. Our framework fuses regional transaction history with Sentinel-1 SAR backscatter, news sentiment combining lexical tone and semantic embeddings, and macroeconomic context. Results are strongly horizon dependent: at horizons up to 10 weeks, price history alone matches multimodal configurations, but beyond 14 weeks sentiment and SAR become critical. At long horizons (26-34 weeks), the full multimodal model reduces mean absolute error from 4.48 to 2.93 (35% reduction), with gains statistically significant across regions. Nonparametric learners consistently outperform deep architectures in this data regime. These findings establish benchmarks for weekly sub-city index forecasting and demonstrate that remote sensing and news sentiment materially improve predictability at strategically relevant horizons.
Date:	2026–02
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2602.18572
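
As a quick check of the headline figure in the abstract above, the relative error reduction implied by the two reported mean absolute errors can be recomputed directly. Only the two numbers 4.48 and 2.93 come from the abstract; the variable names are illustrative.

```python
# Mean absolute errors at the 26-34 week horizon, as reported in the abstract.
mae_before = 4.48
mae_after = 2.93

# Relative reduction: the share of the original error that was removed.
reduction = (mae_before - mae_after) / mae_before
print(f"{reduction:.1%}")
```

The result rounds to the 35% stated in the abstract.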

Could Large Language Models work as Post-hoc Explainability Tools in Credit Risk Models?

By:	Wenxi Geng; Dingyuan Liu; Liya Li; Yiqing Wang
Abstract:	Post-hoc explainability is central to credit risk model governance, yet widely used tools such as coefficient-based attributions and SHapley Additive exPlanations (SHAP) often produce numerical outputs that are difficult to communicate to non-technical stakeholders. This paper investigates whether large language models (LLMs) can serve as post-hoc explainability tools for credit risk predictions through in-context learning, focusing on two roles: translators and autonomous explainers. Using a personal lending dataset from LendingClub, we evaluate three commercial LLMs, including GPT-4-turbo, Claude Sonnet 4, and Gemini-2.0-Flash. Results provide strong evidence for the translator role. In contrast, autonomous explanations show low alignment with model-based attributions. Few-shot prompting improves feature overlap for logistic regression but does not consistently benefit XGBoost, suggesting that LLMs have limited capacity to recover non-linear, interaction-driven reasoning from prompt cues alone. Our findings position LLMs as effective narrative interfaces grounded in auditable model attributions, rather than as substitutes for post-hoc explainers in credit risk model governance. Practitioners should leverage LLMs to bridge the communication gap between complex model outputs and regulatory or business stakeholders, while preserving the rigor and traceability required by credit risk governance frameworks.
Date:	2026–02
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2602.18895

The Worth of a “Wo”: Gender Bias in Financial Advice from LLMs

By:	Foltyn, Richard (Dept. of Economics, Norwegian School of Economics and Business Administration); Olsson, Jonna (Dept. of Economics, Norwegian School of Economics and Business Administration)
Abstract:	Do large language models (LLMs) provide gender-neutral financial advice? We answer this question by prompting 33 widely used LLMs from five vendors, varying only a single word in otherwise identical prompts: “man” versus “woman.” We find that women are advised to allocate 1.8 percentage points less to equity funds than men; this gap persists across vendors, model generations, and model complexity. Providing richer investor information attenuates but does not entirely eliminate the gender gap. Since even modest allocation differences imply persistent return differentials, algorithmic financial advice can shape wealth accumulation across demographic groups.
Keywords:	Algorithmic bias; Gender bias; Large Language Models; Portfolio allocation
JEL:	C01 G11 J16
Date:	2026–02–27
URL:	https://d.repec.org/n?u=RePEc:hhs:nhheco:2026_004

This nep-big issue is ©2026 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the Griffith Business School of Griffith University in Australia.