|
on Big Data |
| By: | Eurydice Fotopoulou; Iyke Maduako; M. Belen Sbrancia; Prachi Srivastava |
| Abstract: | The absence of reliable data on fundamental economic indicators (e.g. real GDP), combined with structural shifts in the economy, can severely constrain the ability to conduct accurate macroeconomic analysis and forecasting. This paper explores alternatives to address data limitations by integrating machine learning and satellite data to estimate real GDP. Specifically, it finds that incorporating satellite-based nightlight data into a random forest model significantly improves the accuracy of quarterly GDP growth estimates compared with models relying solely on traditional indicators. This empirical application contributes to the emerging nowcasting field to enhance economic forecasting in economies with significant data gaps. |
| Keywords: | Macroeconomic forecast; Machine learning; Nowcasting; GDP; Satellite data; Random Forest |
| Date: | 2026–01–30 |
| URL: | https://d.repec.org/n?u=RePEc:imf:imfwpa:2026/020 |
| By: | Yuanhong Wu; Wei Ye; Jingyan Xu; D. Frank Hsu |
| Abstract: | In this work, we propose to apply a new model fusion and learning paradigm, known as Combinatorial Fusion Analysis (CFA), to the field of Bitcoin price prediction. Price prediction of financial products has long been a major topic in finance, as successful prediction of the price can yield significant profit. Every machine learning model has its own strengths and weaknesses, which hinders progress toward robustness. CFA has been used to enhance models by leveraging the rank-score characteristic (RSC) function and cognitive diversity in the combination of a moderate set of diverse and relatively well-performing models. Our method utilizes both score and rank combinations as well as other weighted combination techniques. Key metrics such as RMSE and MAPE are used to evaluate the performance of our methodology. Our proposal achieves a notable MAPE of 0.19%. The proposed method greatly improves upon individual model performance and outperforms other Bitcoin price prediction models. |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2602.00037 |
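The two error metrics named in the abstract above have standard definitions, sketched minimally below. The function names and the toy prices are illustrative assumptions and are not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Toy data (hypothetical, for illustration only).
toy_actual = [100.0, 200.0, 400.0]
toy_predicted = [110.0, 190.0, 400.0]

toy_rmse = rmse(toy_actual, toy_predicted)
toy_mape = mape(toy_actual, toy_predicted)
print(f"RMSE = {toy_rmse:.3f}, MAPE = {toy_mape:.2f}%")
```

MAPE is scale-free, which makes it convenient for comparing models across price levels, whereas RMSE is expressed in the units of the price itself.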
| By: | Walid Siala (SnT, University of Luxembourg, Luxembourg); Ahmed Khanfir (RIADI, ENSI, University of Manouba, Tunisia; SnT, University of Luxembourg, Luxembourg); Mike Papadakis (SnT, University of Luxembourg, Luxembourg) |
| Abstract: | This paper addresses stock price movement prediction by leveraging LLM-based news sentiment analysis. Earlier works have largely focused on proposing and assessing sentiment analysis models and stock movement prediction methods, but separately. Although promising results have been achieved, a clear and in-depth understanding of the benefit of news sentiment to this task, as well as a comprehensive assessment of different architecture types in this context, is still lacking. Herein, we conduct an evaluation study that compares three LLMs, namely DeBERTa, RoBERTa and FinBERT, for sentiment-driven stock prediction. Our results suggest that DeBERTa outperforms the other two models with an accuracy of 75% and that an ensemble model that combines the three models can increase the accuracy to about 80%. We also find that news sentiment features can (slightly) benefit some stock market prediction models, i.e., LSTM-, PatchTST- and tPatchGNN-based classifiers and PatchTST- and TimesNet-based regression models. |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2602.00086 |
| By: | Simon Deakin; Linda Shuku |
| Abstract: | The use of natural language processing (NLP) and machine learning (ML) to analyse the structure of legal texts is a fast-growing field. While much attention has been devoted to the use of these techniques to predict case outcomes, they have the potential to contribute more broadly to research into the nature of legal reasoning and its relationship to social and economic change. In this paper, we use recently developed NLP and ML methods to test the claim that judicial language is systematically shaped by economic shocks deriving from the business cycle and by long-run trends in the economy associated with technological change and industrial transition. Focusing on cases decided under the Anglo-Welsh poor law between the 1690s and 1830s, we show that the terminology used to describe the right to poor relief shifted over time according to economic conditions. We explore the implications of our results for the poor law, the theory of legal evolution, and socio-legal research methods. |
| Keywords: | Law and computation, poor law, legal evolution, natural language processing |
| JEL: | J41 K31 N33 |
| Date: | 2026–02 |
| URL: | https://d.repec.org/n?u=RePEc:cbr:cbrwps:wp546 |
| By: | M. Jahangir Alam; Shane Boyle; Huiyu Li; Tatevik Sekhposyan |
| Abstract: | Recent research suggests that generic large language models (LLMs) can match the accuracy of traditional methods when forecasting macroeconomic variables in pseudo out-of-sample settings generated via prompts. This paper assesses the out-of-sample forecasting accuracy of LLMs by eliciting real-time forecasts of U.S. inflation from ChatGPT. We find that out-of-sample predictions are largely inaccurate and stale, even though forecasts generated in pseudo out-of-sample environments are comparable to existing benchmarks. Our results underscore the importance of out-of-sample benchmarking for LLM predictions. |
| Keywords: | large language models; generative AI; inflation forecasting |
| JEL: | C45 E31 E37 |
| Date: | 2026–02–05 |
| URL: | https://d.repec.org/n?u=RePEc:fip:fedfwp:102407 |
| By: | Esther Bailey; Daniel Fehder; Eric Floyd; Yael Hochberg; Daniel J. Lee |
| Abstract: | We use a randomized experiment with 553 science- and technology-based startups in 12 co-working spaces across the US to evaluate the effects of intensive, short-term entrepreneurial training programs on survival and performance for innovation-driven startups. Treated startups are more likely to shut down their businesses and do so sooner than control startups. Conditional on survival, however, treated startups are more likely to raise external funding for their ventures, raise funding faster, and raise more funding than the control group; they also exhibit higher employment and revenue. Treated founders are less likely to found a new startup after shutdown. Our findings are consistent with practitioner arguments that early entrepreneurship training interventions can help entrepreneurs with less viable ventures “rationally quit” (“fail fast”). We use machine learning techniques (causal random forest) to provide exploratory insights on the most impacted subgroups. |
| JEL: | C93 D22 M13 M53 O32 |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:nbr:nberwo:34755 |
| By: | Oskar Våle; Shiliang Zhang; Sabita Maharjan; Gro Klæboe |
| Abstract: | The balancing market in the energy sector plays a critical role in physically and financially balancing supply and demand. Modeling dynamics in the balancing market can provide valuable insights and prognoses for power grid stability and secure energy supply. While complex machine learning models can achieve high accuracy, their black-box nature severely limits model interpretability. In this paper, we explore the trade-off between model accuracy and interpretability for the energy balancing market. In particular, we take the example of forecasting the manual frequency restoration reserve (mFRR) activation price in the balancing market using real market data from different energy price zones. We explore the interpretability of mFRR forecasting using two models: the extreme gradient boosting (XGBoost) machine and the explainable boosting machine (EBM). We also integrate the two models, and we benchmark all the models against a naive baseline model. Our results show that EBM provides forecasting accuracy comparable to XGBoost while yielding a considerable level of interpretability. Our analysis also underscores the challenge of accurately predicting the mFRR price in instances when the activation price deviates significantly from the spot price. Importantly, EBM's interpretability features reveal insights into non-linear mFRR price drivers and regional market dynamics. Our study demonstrates that EBM is a viable and valuable interpretable alternative to complex black-box AI models for forecasting in the balancing market. |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2602.00049 |
| By: | Anauati, María Victoria; Romero, María Noelia; Baraldi, Lucia; Sosa Escudero, Walter; Tommasi, Mariano |
| Abstract: | Recidivism is a persistent challenge for criminal justice systems worldwide, yet evidence from Latin America remains scarce. This study addresses that gap through three contributions. First, it reviews the individual, institutional, and contextual determinants of recidivism, with special attention to Latin America. Second, it examines the potential use of AI-based prediction tools, discussing the institutional, data-related, and ethical challenges such implementation entails. Third, using two decades of administrative data from Argentina's prison system, it applies six machine learning models to predict reoffending. The analysis identifies economic offenses and age at incarceration as the strongest predictors, while geographic indicators also play a role, reflecting the spatial clustering of repeat offenders across prisons. The findings suggest that routinely collected prison-level information, often underutilized, can enable reasonably accurate risk prediction and guide effective rehabilitation and prison management strategies. |
| JEL: | K40 C50 I30 |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:idb:brikps:14489 |
| By: | Kilic, Talip; Letta, Marco; Montalbano, Pierluigi; Petruccelli, Federica |
| Abstract: | This paper proposes a new resilience index, CLARE (Causal machine Learning Approach to Resilience Estimation), which is rooted in an impact evaluation framework and causal machine learning algorithms applied to longitudinal household survey data. The indicator is model-agnostic, data-driven, scalable, and normatively anchored to wellbeing thresholds, and can be either shock-specific or a general-purpose resilience metric. The paper provides an empirical demonstration of constructing the CLARE resilience index, leveraging more than 28,000 household observations from 19 nationally representative, longitudinal, multi-topic surveys that were implemented by the national statistical offices in Malawi, Nigeria, Tanzania, and Uganda over 2009–20 in partnership with the World Bank Living Standards Measurement Study. Although the paper centers on measuring resilience to drought, the proposed index is applicable to any type of shock. The analysis shows that CLARE outperforms existing resilience metrics and alternative approaches to predict food insecurity out-of-sample, both in the future (dynamic forecasting) and in held-out countries (cross-sectional prediction). The index can be decomposed to causally identify the relative importance of resilience capacities that can insulate populations from shocks. Thus, it can be operationalized in designing, targeting, and monitoring policies and investments that aim to strengthen resilience. CLARE’s deployment, paired with continued investments in national longitudinal survey platforms, can boost the effectiveness of early-warning systems and resilience-building interventions, while allowing the transfer of resilience policy advice from data-rich contexts to data-poor environments that may not immediately provide the requisite longitudinal survey data for index estimation. |
| Date: | 2026–01–12 |
| URL: | https://d.repec.org/n?u=RePEc:wbk:wbrwps:11292 |
| By: | Kyle Higham (Motu Economic and Public Policy Research); Hannah Kotula (Motu Economic and Public Policy Research); Emma Scharfmann (University of California, Berkeley); Steve Gong (Google); Gaétan de Rassenfosse (Ecole polytechnique fédérale de Lausanne) |
| Abstract: | We present a curated dataset of about 850,000 citations extracted from Office Actions issued by examiners at the United States Patent and Trademark Office. These references, historically underused due to accessibility challenges, provide a granular view into the patent examination process and complement traditional front-page citation data. We classify each citation into one of 14 categories and focus on the 265,000 references to scientific literature, which we parse, clean, and disambiguate using machine learning and external bibliographic services. To enhance reusability, disambiguated records are linked to OpenAlex, a comprehensive research metadata platform. The dataset enables new research on examiner behavior, science–technology linkages, and the construction of citation-based metrics. All data and code are openly available to facilitate reuse across disciplines. |
| Keywords: | citation; patent; office actions; open data; non-patent literature; NPL |
| JEL: | O34 K29 D83 C81 |
| Date: | 2026–02 |
| URL: | https://d.repec.org/n?u=RePEc:iip:wpaper:31 |
| By: | Mayoral, L.; Mueller, H.; Philipp, M.; Rauh, C.; Vassallo, R. |
| Abstract: | This article proposes a semantic-similarity approach to detecting and predicting rare events in newspaper text and applies it to institutional disruptions. Using a global news corpus covering more than 170 countries, we measure the similarity of headlines to event-specific prototypes in embedding space and aggregate these signals to identify disruptions to political institutions. We combine these text-based measures with supervised nowcasting and targeted human verification to expand existing datasets on military coups, irregular term-limit extensions, and weakening of the judiciary. The resulting event data are then used to forecast the likelihood of disruptions up to 12 months ahead, providing a high-frequency and scalable tool for monitoring institutional risk. As an illustration of its empirical value, we document that coups are followed by large and persistent declines in economic growth. More broadly, the framework can be adapted to detect and track a wide range of economic and political events and policy actions from news text in real time and in historical archives. |
| Keywords: | Political Institutions, Autocratization, Military Coups, Term Limit Evasion, Judiciary Weakening, Semantic Similarity, Embeddings, Nowcasting, Forecasting |
| JEL: | C53 C55 D72 P16 |
| Date: | 2026–01–14 |
| URL: | https://d.repec.org/n?u=RePEc:cam:camdae:2609 |
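The idea of scoring headlines by their similarity to an event prototype in embedding space, as described in the abstract above, can be illustrated with cosine similarity. This is only a rough sketch under stated assumptions: the abstract does not name the embedding model, the similarity measure, or the aggregation rule, so the three-dimensional toy vectors, the use of cosine similarity, and the max-aggregation below are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical prototype embedding for one event type (toy 3-d vector).
prototype = [1.0, 0.0, 0.0]

# Hypothetical headline embeddings; a real pipeline would obtain these
# from a text-embedding model, which is not specified in the abstract.
headline_embeddings = {
    "headline_a": [0.9, 0.1, 0.0],  # close to the prototype
    "headline_b": [0.0, 1.0, 0.0],  # orthogonal to the prototype
}

scores = {
    name: cosine_similarity(vec, prototype)
    for name, vec in headline_embeddings.items()
}

# Assumed aggregation: take the strongest headline-level signal.
aggregate_signal = max(scores.values())
print(scores, aggregate_signal)
```

Other aggregation choices (a mean, or a count of headlines above a threshold) would fit the same pattern; the choice here is made only to keep the example short.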
| By: | Brandon Yee; Krishna Sharma |
| Abstract: | Behavioral parameters such as loss aversion, herding, and extrapolation are central to asset pricing models but remain difficult to measure reliably. We develop a framework that treats large language models (LLMs) as calibrated measurement instruments for behavioral parameters. Using four models and 24,000 agent–scenario pairs, we document systematic rationality bias in baseline LLM behavior, including attenuated loss aversion, weak herding, and near-zero disposition effects relative to human benchmarks. Profile-based calibration induces large, stable, and theoretically coherent shifts in several parameters, with calibrated loss aversion, herding, extrapolation, and anchoring reaching or exceeding benchmark magnitudes. To assess external validity, we embed calibrated parameters in an agent-based asset pricing model, where calibrated extrapolation generates short-horizon momentum and long-horizon reversal patterns consistent with empirical evidence. Our results establish measurement ranges, calibration functions, and explicit boundaries for eight canonical behavioral biases. |
| Date: | 2026–02 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2602.01022 |
| By: | Felipe A. Csaszar; Aticus Peterson; Daniel Wilde |
| Abstract: | Can artificial intelligence outperform humans at strategic foresight, the capacity to form accurate judgments about uncertain, high-stakes outcomes before they unfold? We address this question through a fully prospective prediction tournament using live Kickstarter crowdfunding projects. Thirty U.S.-based technology ventures, launched after the training cutoffs of all models studied, were evaluated while fundraising remained in progress and outcomes were unknown. A diverse suite of frontier and open-weight large language models (LLMs) completed 870 pairwise comparisons, producing complete rankings of predicted fundraising success. We benchmarked these forecasts against 346 experienced managers recruited via Prolific and three MBA-trained investors working under monitored conditions. The results are striking: human evaluators achieved rank correlations with actual outcomes between 0.04 and 0.45, while several frontier LLMs exceeded 0.60, with the best (Gemini 2.5 Pro) reaching 0.74, correctly ordering nearly four of every five venture pairs. These differences persist across multiple performance metrics and robustness checks. Neither wisdom-of-the-crowd ensembles nor human-AI hybrid teams outperformed the best standalone model. |
| Date: | 2026–02 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2602.01684 |
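The pair counts in the abstract above can be checked with simple combinatorics: 30 ventures give 30 × 29 = 870 ordered pairs, or 435 unordered pairs. The sketch below verifies this and shows one way to compute the share of correctly ordered pairs. The abstract does not state how the comparisons were structured or which rank-correlation statistic was used, so the ordered-pair reading, the `pairwise_accuracy` helper, and the toy data are all assumptions for illustration.

```python
from itertools import combinations, permutations

N_VENTURES = 30  # number of ventures stated in the abstract

# Ordered pairs (A vs. B and B vs. A counted separately) and unordered pairs.
ordered_pairs = sum(1 for _ in permutations(range(N_VENTURES), 2))
unordered_pairs = sum(1 for _ in combinations(range(N_VENTURES), 2))


def pairwise_accuracy(predicted_scores, actual_outcomes):
    """Share of item pairs whose predicted order matches the actual order.

    Pairs tied in either the predictions or the outcomes are skipped.
    """
    correct = 0
    comparable = 0
    for i, j in combinations(range(len(actual_outcomes)), 2):
        d_pred = predicted_scores[i] - predicted_scores[j]
        d_act = actual_outcomes[i] - actual_outcomes[j]
        if d_pred == 0 or d_act == 0:
            continue
        comparable += 1
        if d_pred * d_act > 0:
            correct += 1
    return correct / comparable if comparable else float("nan")


# Toy example with four hypothetical ventures (not data from the paper).
toy_predicted = [0.9, 0.4, 0.7, 0.1]
toy_actual = [120, 30, 20, 5]
toy_accuracy = pairwise_accuracy(toy_predicted, toy_actual)

print(ordered_pairs, unordered_pairs, toy_accuracy)
```

In the toy data, five of the six pairs are ordered correctly; only the second and third ventures are swapped.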
| By: | Carbuccia, Laudine (Sciences Po; Ecole Normale Supérieure) |
| Abstract: | Formal early childcare has a strong equalizing potential, yet access remains socioeconomically stratified. This study examines how these socioeconomic inequalities emerge and widen across three stages of the formal early childcare access process: intention to use early childcare during pregnancy, application, and actual access during the child’s first year. Using longitudinal data on approximately 2,000 families in France, collected during pregnancy and followed one year after birth, we document a progressive widening of gaps along the access pathway. Compared with high–socioeconomic status (SES) households, low–SES households are about 18% less likely to intend to use early childcare, 25% less likely to apply, and 46% less likely to obtain access. To identify the determinants of these gaps, we combine machine learning for variable selection with decomposition analyses that quantify the contribution of observable factors at each stage across a wide range of 39 predictors. At the intention stage, most of the SES gap is accounted for by differences in observable characteristics related to resources, constraints, and available alternatives, with norms contributing little. At subsequent stages, inequalities increasingly reflect institutional barriers. The largest disparities emerge at the access stage, where spot allocation-related factors favoring higher-income, working, and earlier-applying households, and knowledge of the childcare system, account for most of the gap. Overall, the results show that socioeconomic stratification in early childcare access is closely linked to the timing and design of access processes, even in systems intended to be universal. |
| Date: | 2026–01–28 |
| URL: | https://d.repec.org/n?u=RePEc:osf:socarx:kgr67_v1 |
| By: | Keywan Christian Rasekhschaffe |
| Abstract: | We study whether generative AI can automate feature discovery in U.S. equities. Using large language models with retrieval-augmented generation and structured/programmatic prompting, we synthesize economically motivated features from analyst, options, and price-volume data. These features are then used as inputs to a tabular machine-learning model to forecast short-horizon returns. Across multiple datasets, AI-generated features are consistently competitive with baselines, with Sharpe improvements ranging from 14% to 91% depending on dataset and configuration. Retrieval quality is pivotal: better knowledge bases materially improve outcomes. The AI-generated signals are weakly correlated with traditional features, supporting combination. Overall, generative AI can meaningfully augment feature discovery when retrieval quality is controlled, producing interpretable signals while reducing manual engineering effort. |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2602.00196 |
| By: | Zhi Yang; Lingfeng Zeng; Fangqi Lou; Qi Qi; Wei Zhang; Zhenyu Wu; Zhenxiong Yu; Jun Han; Zhiheng Jin; Lejie Zhang; Xiaoming Huang; Xiaolong Liang; Zheng Wei; Junbo Zou; Dongpo Cheng; Zhaowei Liu; Xin Guo; Rongjunchen Zhang; Liwen Zhang |
| Abstract: | Multimodal large language models are playing an increasingly significant role in empowering the financial domain; however, the challenges they face, such as multimodal and high-density information and cross-modal multi-hop reasoning, go beyond the evaluation scope of existing multimodal benchmarks. To address this gap, we propose UniFinEval, the first unified multimodal benchmark designed for high-information-density financial environments, covering text, images, and videos. UniFinEval systematically constructs five core financial scenarios grounded in real-world financial systems: Financial Statement Auditing, Company Fundamental Reasoning, Industry Trend Insights, Financial Risk Sensing, and Asset Allocation Analysis. We manually construct a high-quality dataset consisting of 3,767 question-answer pairs in both Chinese and English and systematically evaluate 10 mainstream MLLMs under Zero-Shot and CoT settings. Results show that Gemini-3-pro-preview achieves the best overall performance, yet still exhibits a substantial gap compared to financial experts. Further error analysis reveals systematic deficiencies in current models. UniFinEval aims to provide a systematic assessment of MLLMs' capabilities in fine-grained, high-information-density financial environments, thereby enhancing the robustness of MLLM applications in real-world financial scenarios. Data and code are available at https://github.com/aifinlab/UniFinEval. |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2601.22162 |