nep-big New Economics Papers
on Big Data
Issue of 2024‒09‒23
29 papers chosen by
Tom Coupé, University of Canterbury


  1. Revealed Preferences: ChatGPT’s Opinion on Economic Issues and the Economics Profession By Tom Coupé
  2. Should Central Banks Care About Text Mining? A Literature Review By Jean-Charles Bricongne; Baptiste Meunier; Raquel Caldeira
  3. Economic Surveillance using Corporate Text By Tarek A. Hassan; Stephan Hollander; Aakash Kalyani; Laurence van Lent; Markus Schwedeler; Ahmed Tahoun
  4. Continuous difference-in-differences with double/debiased machine learning By Lucas Zhang
  5. Biases in inequality of opportunity estimates: measures and solutions By Domenico Moramarco; Paolo Brunori; Pedro Salas-Rojo
  6. Gradient Reduction Convolutional Neural Network Policy for Financial Deep Reinforcement Learning By Sina Montazeri; Haseebullah Jumakhan; Sonia Abrasiabian; Amir Mirzaeinia
  7. High-Frequency Trading Liquidity Analysis | Application of Machine Learning Classification By Sid Bhatia; Sidharth Peri; Sam Friedman; Michelle Malen
  8. Cross-border Commodity Pricing Strategy Optimization via Mixed Neural Network for Time Series Analysis By Lijuan Wang; Yijia Hu; Yan Zhou
  9. How Small is Big Enough? Open Labeled Datasets and the Development of Deep Learning By Daniel Souza; Aldo Geuna; Jeff Rodríguez
  10. Deep-MacroFin: Informed Equilibrium Neural Network for Continuous Time Economic Models By Yuntao Wu; Jiayuan Guo; Goutham Gopalakrishna; Zisis Poulos
  11. An Integrated Approach to Importance Sampling and Machine Learning for Efficient Monte Carlo Estimation of Distortion Risk Measures in Black Box Models By Sören Bettels; Stefan Weber
  12. Deep Learning for the Estimation of Heterogeneous Parameters in Discrete Choice Models By Stephan Hetzenecker; Maximilian Osterhaus
  13. Stochastic Calculus for Option Pricing with Convex Duality, Logistic Model, and Numerical Examination By Zheng Cao
  14. Biases in inequality of opportunity estimates: measures and solutions. By Domenico Moramarco; Paolo Brunori; Pedro Salas-Rojo
  15. Anytime-Valid Inference for Double/Debiased Machine Learning of Causal Parameters By Abhinandan Dalal; Patrick Blöbaum; Shiva Kasiviswanathan; Aaditya Ramdas
  16. Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications By Qianqian Xie; Dong Li; Mengxi Xiao; Zihao Jiang; Ruoyu Xiang; Xiao Zhang; Zhengyu Chen; Yueru He; Weiguang Han; Yuzhe Yang; Shunian Chen; Yifei Zhang; Lihang Shen; Daniel Kim; Zhiwei Liu; Zheheng Luo; Yangyang Yu; Yupeng Cao; Zhiyang Deng; Zhiyuan Yao; Haohang Li; Duanyu Feng; Yongfu Dai; VijayaSai Somasundaram; Peng Lu; Yilun Zhao; Yitao Long; Guojun Xiong; Kaleb Smith; Honghai Yu; Yanzhao Lai; Min Peng; Jianyun Nie; Jordan W. Suchow; Xiao-Yang Liu; Benyou Wang; Alejandro Lopez-Lira; Jimin Huang; Sophia Ananiadou
  17. Argentina | Pronóstico de inflación de corto plazo con modelos Random Forest By Federico Daniel Forte
  18. Less is more: AI Decision-Making using Dynamic Deep Neural Networks for Short-Term Stock Index Prediction By CJ Finnegan; James F. McCann; Salissou Moutari
  19. Paired Completion: Flexible Quantification of Issue-framing at Scale with LLMs By Simon D Angus; Lachlan O'Neill
  20. Enhancing Startup Success Predictions in Venture Capital: A GraphRAG Augmented Multivariate Time Series Method By Zitian Gao; Yihao Xiao
  21. Knowledge in the 21st Century: Making Sense of Big Data By Julia M. Puaschunder
  22. A Longitudinal Tree-Based Framework for Lapse Management in Life Insurance By Mathias Valla
  23. Conditional nonparametric variable screening by neural factor regression By Jianqing Fan; Weining Wang; Yue Zhao
  24. Measuring the demand for AI skills in the United Kingdom By Julia Schmidt; Graham Pilgrim; Annabelle Mourougane
  25. Reinsurance with neural networks By Aleksandar Arandjelović; Julia Eisenberg
  26. Solving stochastic climate-economy models: A deep least-squares Monte Carlo approach By Aleksandar Arandjelović; Pavel V. Shevchenko; Tomoko Matsui; Daisuke Murakami; Tor A. Myrvoll
  27. Big Data Inequality By Julia M. Puaschunder
  28. Exploring the informativeness and drivers of tone during committee meetings: the case of the Federal Reserve By Hamza Bennani; Davide Romelli
  29. Analysing the VAT Cut Pass-Through in Spain Using Web Scraped Supermarket Data and Machine Learning By Nicolás Forteza; Elvira Prades; Marc Roca

  1. By: Tom Coupé (University of Canterbury)
    Abstract: In this paper, I analyse ChatGPT’s opinion on economic issues by repeatedly prompting ChatGPT with questions from different surveys that have been used to assess the opinion of the economics profession. I find that ChatGPT 3.5 is a one-handed economist with strong opinions, while ChatGPT4o is much more of an ‘average’ economist. I further find little evidence that the widespread use of ChatGPT4o could reduce the gap between what the general public thinks about economic issues and the economics profession’s views on those issues. I also find that ChatGPT4o is about equally likely to prefer professors’ financial advice and the financial advice found in popular books, and that ChatGPT4o is more likely than the economics profession itself to agree with less mainstream or non-mainstream views about the profession.
    Keywords: ChatGPT, Economic Opinion, Economists' Consensus, Public Policy, Artificial Intelligence
    JEL: C83 A11 D80 D83
    Date: 2024–08–01
    URL: https://d.repec.org/n?u=RePEc:cbt:econwp:24/13
  2. By: Jean-Charles Bricongne; Baptiste Meunier; Raquel Caldeira
    Abstract: As text mining has expanded in economics, central banks appear to have ridden this wave as well: we review use cases of text mining across central banks and supervisory institutions. Text mining is a polyvalent tool for gauging the economic outlook in which central banks operate, notably as an innovative way to measure inflation expectations. It is also a pivotal tool for assessing risks to financial stability. Beyond financial markets, text mining can help supervise individual financial institutions. As central banks increasingly consider issues such as the climate challenge, text mining also makes it possible to assess the perception of climate-related risks and banks’ preparedness. Besides, the analysis of central banks’ communication provides a feedback tool on how best to convey decisions. Albeit powerful, text mining complements – rather than replaces – the usual indicators and procedures at central banks. Going forward, generative AI opens new frontiers for the use of textual data.
    Keywords: Text Mining, Sentiment Analysis, Central Banking, Generative AI, Language Models
    JEL: C38 C55 C82 E58 L82
    Date: 2024
    URL: https://d.repec.org/n?u=RePEc:bfr:banfra:950
  3. By: Tarek A. Hassan; Stephan Hollander; Aakash Kalyani; Laurence van Lent; Markus Schwedeler; Ahmed Tahoun
    Abstract: This article applies simple methods from computational linguistics to analyze unstructured corporate texts for economic surveillance. We apply text-as-data approaches to earnings conference call transcripts, patent texts, and job postings to uncover unique insights into how markets and firms respond to economic shocks, such as a nuclear disaster or a geopolitical event; these insights often elude traditional data sources. This method enhances our ability to extract actionable intelligence from textual data, thereby aiding policy-making and strategic corporate decisions. By integrating computational linguistics into the analysis of economic shocks, our study opens new possibilities for real-time economic surveillance and offers a more nuanced understanding of firm-level reactions in volatile economic environments.
    Keywords: text as data
    JEL: C55
    Date: 2024–09
    URL: https://d.repec.org/n?u=RePEc:fip:fedlwp:98767
  4. By: Lucas Zhang
    Abstract: This paper extends difference-in-differences to settings involving continuous treatments. Specifically, the average treatment effect on the treated (ATT) at any level of continuous treatment intensity is identified using a conditional parallel trends assumption. In this framework, estimating the ATTs requires first estimating infinite-dimensional nuisance parameters, especially the conditional density of the continuous treatment, which can introduce significant biases. To address this challenge, estimators for the causal parameters are proposed under the double/debiased machine learning framework. We show that these estimators are asymptotically normal and provide consistent variance estimators. To illustrate the effectiveness of our methods, we re-examine the study by Acemoglu and Finkelstein (2008), which assessed the effects of the 1983 Medicare Prospective Payment System (PPS) reform. By reinterpreting their research design using a difference-in-differences approach with continuous treatment, we nonparametrically estimate the treatment effects of the 1983 PPS reform, thereby providing a more detailed understanding of its impact.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.10509
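    The double/debiased machinery above can be illustrated, under simplifying assumptions, for the familiar binary-treatment case. The sketch below is illustrative only, not the authors' estimator: the learners, variable names and the Sant'Anna-Zhao-style doubly robust score are assumptions, and the paper's extension replaces the propensity score with a conditional density of the continuous treatment.
      import numpy as np
      from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
      from sklearn.model_selection import KFold

      def dr_did_att(y_pre, y_post, d, X, n_folds=5, seed=0):
          """Cross-fitted doubly robust DiD ATT for a binary treatment d (0/1),
          shown as a primer for the DML machinery the paper extends to
          continuous treatment intensities."""
          dy = np.asarray(y_post) - np.asarray(y_pre)
          d, X = np.asarray(d), np.asarray(X)
          n = len(dy)
          e_hat = np.zeros(n)    # propensity P(D=1 | X)
          m0_hat = np.zeros(n)   # E[dY | D=0, X]
          for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
              e_hat[test] = GradientBoostingClassifier().fit(X[train], d[train]).predict_proba(X[test])[:, 1]
              ctrl = train[d[train] == 0]                  # control units in the training fold
              m0_hat[test] = GradientBoostingRegressor().fit(X[ctrl], dy[ctrl]).predict(X[test])
          e_hat = np.clip(e_hat, 1e-3, 1 - 1e-3)
          p1 = d.mean()
          psi = (d / p1 - (1 - d) * e_hat / ((1 - e_hat) * p1)) * (dy - m0_hat)
          return psi.mean(), psi.std(ddof=1) / np.sqrt(n)  # ATT and a rough standard error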
  5. By: Domenico Moramarco (University of Bari - Department of Economics and Finance); Paolo Brunori (University of Firenze and LSE - International Inequalities Institute); Pedro Salas-Rojo (LSE - International Inequalities Institute)
    Abstract: In this paper we discuss some limitations of using survey data to measure inequality of opportunity. First, we highlight a link between the two fundamental principles of the theory of equal opportunities -- compensation and reward -- and the concepts of power and confidence levels in hypothesis testing. This connection can be used to address, for example, whether a sample has sufficient observations to appropriately measure inequality of opportunity. Second, we propose a set of tools to normatively assess inequality of opportunity estimates in any partition of the population into types. We apply our proposal to Conditional Inference Trees, a machine learning technique that has received growing attention in the literature. Finally, guided by such tools, we suggest that standard tree-based partitions can be manipulated to reduce the risk of violating the compensation and reward principles. Our methodological contribution is complemented with an application using a quasi-administrative sample of Italian PhD graduates. We find a substantial level of labor income inequality among two cohorts of PhD graduates (2012 and 2014), with a significant portion explained by circumstances beyond their control.
    Keywords: Equality of opportunity, Machine learning, PhD graduates, Compensation, Reward
    JEL: C38 D31 D63
    Date: 2024–09
    URL: https://d.repec.org/n?u=RePEc:inq:inqwps:ecineq2024-675
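    As a rough illustration of tree-based type partitions, the sketch below computes an ex-ante inequality of opportunity estimate with a CART regression tree standing in for the Conditional Inference Trees used in the paper; the learner, the minimum leaf size and the use of the Gini index are illustrative assumptions.
      import numpy as np
      from sklearn.tree import DecisionTreeRegressor

      def gini(x):
          # Gini coefficient of a non-negative array
          x = np.sort(np.asarray(x, dtype=float))
          n = len(x)
          return 2 * np.sum(np.arange(1, n + 1) * x) / (n * x.sum()) - (n + 1) / n

      def iop_tree(income, circumstances, min_leaf=100):
          """Ex-ante inequality of opportunity: a regression tree on circumstances
          partitions the sample into 'types'; inequality in the type means
          (the smoothed distribution) is attributed to opportunity."""
          tree = DecisionTreeRegressor(min_samples_leaf=min_leaf).fit(circumstances, income)
          smoothed = tree.predict(circumstances)                 # each person gets their type mean
          return gini(smoothed), gini(smoothed) / gini(income)   # absolute and relative IOp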
  6. By: Sina Montazeri; Haseebullah Jumakhan; Sonia Abrasiabian; Amir Mirzaeinia
    Abstract: Building on our prior explorations of convolutional neural networks (CNNs) for financial data processing, this paper introduces two significant enhancements to refine our CNN model's predictive performance and robustness for financial tabular data. Firstly, we integrate a normalization layer at the input stage to ensure consistent feature scaling, addressing the issue of disparate feature magnitudes that can skew the learning process. This modification is hypothesized to aid in stabilizing the training dynamics and improving the model's generalization across diverse financial datasets. Secondly, we employ a Gradient Reduction Architecture, where earlier layers are wider and subsequent layers are progressively narrower. This enhancement is designed to enable the model to capture more complex and subtle patterns within the data, a crucial factor in accurately predicting financial outcomes. These advancements directly respond to the limitations identified in previous studies, where simpler models struggled with the complexity and variability inherent in financial applications. Initial tests confirm that these changes improve accuracy and model stability, suggesting that deeper and more nuanced network architectures can significantly benefit financial predictive tasks. This paper details the implementation of these enhancements and evaluates their impact on the model's performance in a controlled experimental setting.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.11859
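    A minimal sketch of the architecture described above, in which layer widths, kernel sizes and the batch-normalization choice are assumptions rather than the authors' configuration: input normalization followed by progressively narrower convolutions over the feature vector.
      import torch
      import torch.nn as nn

      class FunnelCNN(nn.Module):
          """Illustrative only: normalise the inputs, then apply 1-D convolutions
          whose channel counts shrink layer by layer, in the spirit of the
          'gradient reduction' idea sketched in the abstract."""
          def __init__(self, n_features: int, n_outputs: int = 1):
              super().__init__()
              self.norm = nn.BatchNorm1d(1)        # normalise the single input channel
              self.body = nn.Sequential(
                  nn.Conv1d(1, 64, kernel_size=3, padding=1), nn.ReLU(),
                  nn.Conv1d(64, 32, kernel_size=3, padding=1), nn.ReLU(),
                  nn.Conv1d(32, 16, kernel_size=3, padding=1), nn.ReLU(),
              )
              self.head = nn.Linear(16 * n_features, n_outputs)

          def forward(self, x):                    # x: (batch, n_features)
              z = self.norm(x.unsqueeze(1))        # -> (batch, 1, n_features)
              z = self.body(z)
              return self.head(z.flatten(1))

      # usage sketch
      model = FunnelCNN(n_features=20)
      out = model(torch.randn(8, 20))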
  7. By: Sid Bhatia; Sidharth Peri; Sam Friedman; Michelle Malen
    Abstract: This research presents a comprehensive framework for analyzing liquidity in financial markets, particularly in the context of high-frequency trading. By leveraging advanced machine learning classification techniques, including Logistic Regression, Support Vector Machine, and Random Forest, the study aims to predict minute-level price movements using an extensive set of liquidity metrics derived from the Trade and Quote (TAQ) data. The findings reveal that employing a broad spectrum of liquidity measures yields higher predictive accuracy compared to models utilizing a reduced subset of features. Key liquidity metrics, such as Liquidity Ratio, Flow Ratio, and Turnover, consistently emerged as significant predictors across all models, with the Random Forest algorithm demonstrating superior accuracy. This study not only underscores the critical role of liquidity in market stability and transaction costs but also highlights the complexities involved in short-interval market predictions. The research suggests that a comprehensive set of liquidity measures is essential for accurate prediction, and proposes future work to validate these findings across different stock datasets to assess their generalizability.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.10016
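    The model comparison described above follows a standard supervised-classification pattern. A hedged sketch with simulated data follows; the feature construction, hyperparameters and time-series cross-validation choice are assumptions, not the authors' setup.
      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import TimeSeriesSplit, cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import SVC

      # X: minute-level liquidity features (e.g. liquidity ratio, flow ratio, turnover)
      # y: 1 if the next minute's price move is positive, else 0
      # Random placeholders stand in for the TAQ-derived data used in the paper.
      rng = np.random.default_rng(0)
      X = rng.normal(size=(5_000, 8))
      y = (rng.random(5_000) > 0.5).astype(int)

      models = {
          "logit": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1_000)),
          "svm": make_pipeline(StandardScaler(), SVC()),
          "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
      }
      cv = TimeSeriesSplit(n_splits=5)             # respect the time ordering of minutes
      for name, model in models.items():
          acc = cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
          print(f"{name}: {acc:.3f}")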
  8. By: Lijuan Wang; Yijia Hu; Yan Zhou
    Abstract: In the context of global trade, cross-border commodity pricing largely determines the competitiveness and market share of businesses. However, existing methodologies often prove inadequate, as they lack the agility and precision required to effectively respond to dynamic international markets. Time series data is of great significance in commodity pricing and can reveal market dynamics and trends. Therefore, we propose a new method based on the hybrid neural network model CNN-BiGRU-SSA. The goal is to achieve accurate prediction and optimization of cross-border commodity pricing strategies through in-depth analysis and optimization of time series data. Our model undergoes experimental validation across multiple datasets. The results show that our method achieves significant performance advantages on datasets such as UNCTAD, IMF, WITS and China Customs. For example, on the UNCTAD dataset, our model reduces MAE to 4.357 and RMSE to 5.406 and raises R2 to 0.961, significantly outperforming other models. On the IMF and WITS datasets, our method also achieves similarly excellent performance. These experimental results verify the effectiveness and reliability of our model in the field of cross-border commodity pricing. Overall, this study provides an important reference for enterprises to formulate more reasonable and effective cross-border commodity pricing strategies, thereby enhancing market competitiveness and profitability. At the same time, our method also lays a foundation for the application of deep learning in the fields of international trade and economic strategy optimization, which has important theoretical and practical significance.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.12115
  9. By: Daniel Souza; Aldo Geuna; Jeff Rodríguez
    Abstract: We investigate the emergence of Deep Learning as a technoscientific field, emphasizing the role of open labeled datasets. Through qualitative and quantitative analyses, we evaluate the role of datasets like CIFAR-10 in advancing computer vision and object recognition, which are central to the Deep Learning revolution. Our findings highlight CIFAR-10's crucial role and enduring influence on the field, as well as its importance in teaching ML techniques. Results also indicate that dataset characteristics such as size, number of instances, and number of categories, were key factors. Econometric analysis confirms that CIFAR-10, a small-but-sufficiently-large open dataset, played a significant and lasting role in technological advancements and had a major function in the development of the early scientific literature as shown by citation metrics.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.10359
  10. By: Yuntao Wu; Jiayuan Guo; Goutham Gopalakrishna; Zisis Poulos
    Abstract: In this paper, we present Deep-MacroFin, a comprehensive framework designed to solve partial differential equations, with a particular focus on models in continuous time economics. This framework leverages deep learning methodologies, including conventional Multi-Layer Perceptrons and the newly developed Kolmogorov-Arnold Networks. It is optimized using economic information encapsulated by Hamilton-Jacobi-Bellman equations and coupled algebraic equations. The application of neural networks holds the promise of accurately resolving high-dimensional problems with fewer computational demands and limitations compared to standard numerical methods. This versatile framework can be readily adapted for elementary differential equations, and systems of differential equations, even in cases where the solutions may exhibit discontinuities. Importantly, it offers a more straightforward and user-friendly implementation than existing libraries.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.10368
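    The core idea of such informed neural networks can be illustrated on a toy problem: train a network to satisfy a differential equation by penalising its residual at sampled points. The sketch below is a generic residual-minimisation example, not the Deep-MacroFin library; it solves u'(t) = -u(t) with u(0) = 1.
      import torch
      import torch.nn as nn

      net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                          nn.Linear(32, 32), nn.Tanh(),
                          nn.Linear(32, 1))
      opt = torch.optim.Adam(net.parameters(), lr=1e-3)

      for step in range(2000):
          t = torch.rand(256, 1, requires_grad=True)           # collocation points on [0, 1]
          u = net(t)
          du = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
          residual = du + u                                    # enforce u'(t) + u(t) = 0
          boundary = (net(torch.zeros(1, 1)) - 1.0) ** 2       # enforce u(0) = 1
          loss = (residual ** 2).mean() + boundary.mean()
          opt.zero_grad(); loss.backward(); opt.step()

      # after training, net(t) should approximate exp(-t)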
  11. By: Sören Bettels; Stefan Weber
    Abstract: Distortion risk measures play a critical role in quantifying risks associated with uncertain outcomes. Accurately estimating these risk measures in the context of computationally expensive simulation models that lack analytical tractability is fundamental to effective risk management and decision making. In this paper, we propose an efficient importance sampling method for distortion risk measures in such models that reduces the computational cost through machine learning. We demonstrate the applicability and efficiency of the Monte Carlo method in numerical experiments on various distortion risk measures and models.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.02401
  12. By: Stephan Hetzenecker; Maximilian Osterhaus
    Abstract: This paper studies the finite sample performance of the flexible estimation approach of Farrell, Liang, and Misra (2021a), who propose to use deep learning for the estimation of heterogeneous parameters in economic models, in the context of discrete choice models. The approach combines the structure imposed by economic models with the flexibility of deep learning, which assures the interpretability of results on the one hand, and allows estimating flexible functional forms of observed heterogeneity on the other hand. For inference after the estimation with deep learning, Farrell et al. (2021a) derive an influence function that can be applied to many quantities of interest. We conduct a series of Monte Carlo experiments that investigate the impact of regularization on the proposed estimation and inference procedure in the context of discrete choice models. The results show that the deep learning approach generally leads to precise estimates of the true average parameters and that regular robust standard errors lead to invalid inference results, showing the need for the influence function approach for inference. Without regularization, the influence function approach can lead to substantial bias and large estimated standard errors caused by extreme outliers. Regularization reduces this property and stabilizes the estimation procedure, but at the expense of inducing an additional bias. The bias in combination with decreasing variance associated with increasing regularization leads to the construction of invalid inferential statements in our experiments. Repeated sample splitting, unlike regularization, stabilizes the estimation approach without introducing an additional bias, thereby allowing for the construction of valid inferential statements.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.09560
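    The structure-plus-flexibility idea can be sketched as a binary logit whose coefficients are neural-network functions of covariates, estimated by maximum likelihood. This is a minimal illustration in the spirit of Farrell, Liang and Misra (2021a); the network size, simulated data-generating process and optimizer settings are assumptions.
      import torch
      import torch.nn as nn

      class HeteroLogit(nn.Module):
          def __init__(self, dim_x, dim_w):
              super().__init__()
              # theta(x): covariate-specific coefficients on the choice attributes w
              self.theta_net = nn.Sequential(
                  nn.Linear(dim_x, 32), nn.ReLU(), nn.Linear(32, dim_w + 1)  # +1 intercept
              )

          def forward(self, x, w):
              theta = self.theta_net(x)                        # (n, dim_w + 1)
              return theta[:, 0] + (theta[:, 1:] * w).sum(1)   # heterogeneous logit index

      def nll(model, x, w, y):
          return nn.functional.binary_cross_entropy_with_logits(model(x, w), y)

      # usage sketch with simulated data
      n, dim_x, dim_w = 2000, 3, 2
      x, w = torch.randn(n, dim_x), torch.randn(n, dim_w)
      true_beta = 1.0 + 0.5 * x[:, 0]                          # heterogeneous attribute coefficient
      y = (torch.sigmoid(true_beta * w[:, 0] - 0.5 * w[:, 1]) > torch.rand(n)).float()
      model = HeteroLogit(dim_x, dim_w)
      opt = torch.optim.Adam(model.parameters(), lr=1e-2)
      for _ in range(500):
          opt.zero_grad(); loss = nll(model, x, w, y); loss.backward(); opt.step()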
  13. By: Zheng Cao
    Abstract: This thesis explores the historical progression and theoretical constructs of financial mathematics, with an in-depth exploration of Stochastic Calculus as showcased in the Binomial Asset Pricing Model and the Continuous-Time Models. A comprehensive survey of stochastic calculus principles applied to option pricing is offered, highlighting insights from Peter Carr and Lorenzo Torricelli's "Convex Duality in Continuous Option Pricing Models". This manuscript adopts techniques such as Monte-Carlo Simulation and machine learning algorithms to examine the propositions of Carr and Torricelli, drawing comparisons between the Logistic and Bachelier models. Additionally, it suggests directions for potential future research on option pricing methods.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.05672
  14. By: Domenico Moramarco (University of Bari); Paolo Brunori (University of Firenze and London School of Economics); Pedro Salas-Rojo (London School of Economics)
    Abstract: In this paper we discuss some limitations of using survey data to measure inequality of opportunity. First, we highlight a link between the two fundamental principles of the theory of equal opportunities - compensation and reward - and the concepts of power and confidence levels in hypothesis testing. This connection can be used to address, for example, whether a sample has sufficient observations to appropriately measure inequality of opportunity. Second, we propose a set of tools to normatively assess inequality of opportunity estimates in any partition of the population into types. We apply our proposal to Conditional Inference Trees, a machine learning technique that has received growing attention in the literature. Finally, guided by such tools, we suggest that standard tree-based partitions can be manipulated to reduce the risk of violating the compensation and reward principles. Our methodological contribution is complemented with an application using a quasi-administrative sample of Italian PhD graduates. We find a substantial level of labor income inequality among two cohorts of PhD graduates (2012 and 2014), with a significant portion explained by circumstances beyond their control.
    Keywords: Equality of opportunity, Machine learning, PhD graduates, Compensation, Reward.
    JEL: C38 D31 D63
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:bai:series:series_wp_02-2024
  15. By: Abhinandan Dalal; Patrick Blöbaum; Shiva Kasiviswanathan; Aaditya Ramdas
    Abstract: Double (debiased) machine learning (DML) has seen widespread use in recent years for learning causal/structural parameters, in part due to its flexibility and adaptability to high-dimensional nuisance functions as well as its ability to avoid bias from regularization or overfitting. However, the classic double-debiased framework is only valid asymptotically for a predetermined sample size, thus lacking the flexibility of collecting more data if sharper inference is needed, or stopping data collection early if useful inferences can be made earlier than expected. This can be of particular concern in large-scale experimental studies with huge financial costs or human lives at stake, as well as in observational studies where the lengths of confidence intervals do not shrink to zero even with increasing sample size due to partial identifiability of a structural parameter. In this paper, we present time-uniform counterparts to the asymptotic DML results, enabling valid inference and confidence intervals for structural parameters to be constructed at any arbitrary (possibly data-dependent) stopping time. We provide conditions which are only slightly stronger than the standard DML conditions, but offer the stronger guarantee for anytime-valid inference. This facilitates the transformation of any existing DML method to provide anytime-valid guarantees with minimal modifications, making it highly adaptable and easy to use. We illustrate our procedure using two instances: a) local average treatment effect in online experiments with non-compliance, and b) partial identification of average treatment effect in observational studies with potential unmeasured confounding.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.09598
  16. By: Qianqian Xie; Dong Li; Mengxi Xiao; Zihao Jiang; Ruoyu Xiang; Xiao Zhang; Zhengyu Chen; Yueru He; Weiguang Han; Yuzhe Yang; Shunian Chen; Yifei Zhang; Lihang Shen; Daniel Kim; Zhiwei Liu; Zheheng Luo; Yangyang Yu; Yupeng Cao; Zhiyang Deng; Zhiyuan Yao; Haohang Li; Duanyu Feng; Yongfu Dai; VijayaSai Somasundaram; Peng Lu; Yilun Zhao; Yitao Long; Guojun Xiong; Kaleb Smith; Honghai Yu; Yanzhao Lai; Min Peng; Jianyun Nie; Jordan W. Suchow; Xiao-Yang Liu; Benyou Wang; Alejandro Lopez-Lira; Jimin Huang; Sophia Ananiadou
    Abstract: Large language models (LLMs) have advanced financial applications, yet they often lack sufficient financial knowledge and struggle with tasks involving multi-modal inputs like tables and time series data. To address these limitations, we introduce Open-FinLLMs, a series of Financial LLMs. We begin with FinLLaMA, pre-trained on a 52 billion token financial corpus, incorporating text, tables, and time-series data to embed comprehensive financial knowledge. FinLLaMA is then instruction fine-tuned with 573K financial instructions, resulting in FinLLaMA-instruct, which enhances task performance. Finally, we present FinLLaVA, a multimodal LLM trained with 1.43M image-text instructions to handle complex financial data types. Extensive evaluations demonstrate FinLLaMA's superior performance over LLaMA3-8B, LLaMA3.1-8B, and BloombergGPT in both zero-shot and few-shot settings across 19 and 4 datasets, respectively. FinLLaMA-instruct outperforms GPT-4 and other Financial LLMs on 15 datasets. FinLLaVA excels in understanding tables and charts across 4 multimodal tasks. Additionally, FinLLaMA achieves impressive Sharpe Ratios in trading simulations, highlighting its robust financial application capabilities. We will continually maintain and improve our models and benchmarks to support ongoing innovation in academia and industry.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.11878
  17. By: Federico Daniel Forte
    Abstract: This paper examines the performance of Random Forest models in forecasting short-term monthly inflation in Argentina, based on a database of monthly indicators since 1962.
    Keywords: Interest rates, Monetary policy, Inflation, Argentina, Analysis with Big Data, Macroeconomic Analysis, Working Paper
    JEL: C14 E31 E37
    Date: 2024–09
    URL: https://d.repec.org/n?u=RePEc:bbv:wpaper:2410
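    A hedged sketch of the forecasting exercise, in which the column names, lag depth and hyperparameters are assumptions rather than the paper's specification: fit a Random Forest on lagged monthly indicators and produce pseudo out-of-sample forecasts with an expanding window.
      import pandas as pd
      from sklearn.ensemble import RandomForestRegressor

      def make_lags(df, cols, n_lags=12):
          # build a lag matrix for every indicator
          out = pd.DataFrame(index=df.index)
          for c in cols:
              for l in range(1, n_lags + 1):
                  out[f"{c}_lag{l}"] = df[c].shift(l)
          return out

      def expanding_rf_forecast(df, target="inflation", start=120):
          X = make_lags(df, df.columns).dropna()
          y = df[target].loc[X.index]
          preds = {}
          for t in range(start, len(X)):
              model = RandomForestRegressor(n_estimators=500, random_state=0, n_jobs=-1)
              model.fit(X.iloc[:t], y.iloc[:t])            # train on history only
              preds[X.index[t]] = model.predict(X.iloc[[t]])[0]
          return pd.Series(preds)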
  18. By: CJ Finnegan; James F. McCann; Salissou Moutari
    Abstract: In this paper we introduce a multi-agent deep-learning method which trades in the Futures markets based on the US S&P 500 index. The method (referred to as Model A) is an innovation founded on existing well-established machine-learning models which sample market prices and associated derivatives in order to decide whether the investment should be long/short or closed (zero exposure), on a day-to-day decision. We compare the predictions with some conventional machine-learning methods namely, Long Short-Term Memory, Random Forest and Gradient-Boosted-Trees. Results are benchmarked against a passive model in which the Futures contracts are held (long) continuously with the same exposure (level of investment). Historical tests are based on daily daytime trading carried out over a period of 6 calendar years (2018-23). We find that Model A outperforms the passive investment in key performance metrics, placing it within the top quartile performance of US Large Cap active fund managers. Model A also outperforms the three machine-learning classification comparators over this period. We observe that Model A is extremely efficient (doing less and getting more) with an exposure to the market of only 41.95% compared to the 100% market exposure of the passive investment, and thus provides increased profitability with reduced risk.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.11740
  19. By: Simon D Angus; Lachlan O'Neill
    Abstract: Detecting and quantifying issue framing in textual discourse - the perspective one takes on a given topic (e.g. climate science vs. denialism, misogyny vs. gender equality) - is highly valuable to a range of end-users from social and political scientists to program evaluators and policy analysts. However, conceptual framing is notoriously challenging for automated natural language processing (NLP) methods since the words and phrases used by either 'side' of an issue are often held in common, with only subtle stylistic flourishes separating their use. Here we develop and rigorously evaluate new detection methods for issue framing and narrative analysis within large text datasets. By introducing a novel application of next-token log probabilities derived from generative large language models (LLMs), we show that issue framing can be reliably and efficiently detected in large corpora with only a few examples of either perspective on a given issue, a method we call 'paired completion'. Through 192 independent experiments over three novel, synthetic datasets, we evaluate paired completion against prompt-based LLM methods and labelled methods using traditional NLP and recent LLM contextual embeddings. We additionally conduct a cost-based analysis to mark out the feasible set of performant methods at production-level scales, and a model bias analysis. Together, our work demonstrates a feasible path to scalable, accurate and low-bias issue-framing analysis in large corpora.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.09742
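    A minimal sketch of the paired-completion idea, in which the scoring function, the primer texts and the choice of an open model are assumptions rather than the authors' exact protocol: score a document by the difference in summed next-token log probabilities when it follows each of two framing primers.
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      MODEL = "gpt2"   # any open causal LM; an illustrative assumption
      tok = AutoTokenizer.from_pretrained(MODEL)
      model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

      def completion_logprob(prefix: str, completion: str) -> float:
          """Sum of next-token log probabilities of `completion` given `prefix`
          (tokenizer boundary effects are ignored for brevity)."""
          prefix_ids = tok(prefix, return_tensors="pt").input_ids
          full_ids = tok(prefix + completion, return_tensors="pt").input_ids
          with torch.no_grad():
              logits = model(full_ids).logits.log_softmax(dim=-1)
          start = prefix_ids.shape[1]
          token_ids = full_ids[0, start:]
          # logits at position t-1 predict the token at position t
          positions = torch.arange(start - 1, full_ids.shape[1] - 1)
          return logits[0, positions, token_ids].sum().item()

      def framing_score(text: str, primer_a: str, primer_b: str) -> float:
          # positive -> the text sits more naturally after primer_a than primer_b
          return completion_logprob(primer_a, " " + text) - completion_logprob(primer_b, " " + text)

      print(framing_score("The warming trend is driven by human emissions.",
                          "Climate science perspective:", "Climate denial perspective:"))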
  20. By: Zitian Gao; Yihao Xiao
    Abstract: In the Venture Capital (VC) industry, predicting the success of startups is challenging due to limited financial data and the need for subjective revenue forecasts. Previous methods based on time series analysis or deep learning often fall short as they fail to incorporate crucial inter-company relationships such as competition and collaboration. To address these issues, we propose a novel approach using a GraphRAG-augmented time series model. With GraphRAG, time series predictive methods are enhanced by integrating these vital relationships into the analysis framework, allowing for a more dynamic understanding of the startup ecosystem in venture capital. Our experimental results demonstrate that our model significantly outperforms previous models in startup success predictions. To the best of our knowledge, our work is the first applied work using GraphRAG.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.09420
  21. By: Julia M. Puaschunder (Columbia University, USA)
    Abstract: Economics is built on the idea of scarcity. The rational agent is meant to decide efficiently given scarce resources. Friedrich von Hayek challenged mainstream economics’ focus on scarcity by proposing that economics is also about making sense of an abundance of knowledge that is dispersedly shared throughout society. Born out of the internet and digitalization, we live in the age of big data science. Enormous data storage and analytical power have made it possible to extract unprecedented information from our use of technology. Today’s most profitable corporations in the world all derive value from big data insights. But extracting sense from the massive amount of data generated online on a constant basis has also become an enormous environmental burden, one that is rarely discussed or thematized. While marketing frames cloud storage as something light and intangible, the reality is that data hoarding has become an environmentally burdensome practice that may not be sustainable given the pace at which digitalization is advancing, e.g., with 5G and Artificial Intelligence (AI) encroaching on every aspect of human life. Big data is stored in facilities that resemble warehouses, with enormous electricity consumption for the cooling of real-time data processors. In light of the rising trend of data storage and, at the same time, demands for environmental conscientiousness, we may revisit Hayek’s idea of the knowledge paradigm and connect it to scarcity. Economics is called upon to provide models that explain how to make sense of data efficiently. In particular, Friedrich von Hayek’s knowledge paradigm could offer insights into what kind of information should be stored to build knowledge, and what kind of information can justifiably be neglected, conserved sparingly, or simply forgotten over time, in line with overall efficiency and sustainability demands and with the goal of passing the earth on to future generations meaningfully and viably.
    Keywords: behavioral economics, carbon footprint, data storage, digitalization, discounting, environmental conscientiousness, environmentalism, internet, knowledge, law & economics, scarcity, Sustainable Development Goals, sustainability
    Date: 2024–07
    URL: https://d.repec.org/n?u=RePEc:smo:raiswp:0386
  22. By: Mathias Valla (LSAF - Laboratoire de Sciences Actuarielles et Financières [Lyon] - ISFA - Institut de Science Financière et d'Assurances, FEB - Faculty of Economics and Business - KU Leuven - Catholic University of Leuven = Katholieke Universiteit Leuven)
    Abstract: Developing an informed lapse management strategy (LMS) is critical for life insurers to improve profitability and gain insight into the risk of their global portfolio. Prior research in actuarial science has shown that targeting policyholders by maximising their individual customer lifetime value is more advantageous than targeting all those likely to lapse. However, most existing lapse analyses do not leverage the variability of features and targets over time. We propose a longitudinal LMS framework, utilising tree-based models for longitudinal data, such as left-truncated and right-censored (LTRC) trees and forests, as well as mixed-effect tree-based models. Our methodology provides time-informed insights, leading to increased precision in targeting. Our findings indicate that the use of longitudinally structured data significantly enhances the precision of models in predicting lapse behaviour, estimating customer lifetime value, and evaluating individual retention gains. The implementation of mixed-effect random forests enables the production of time-varying predictions that are highly relevant for decision-making. This paper contributes to the field of lapse analysis for life insurers by demonstrating the importance of exploiting the complete past trajectory of policyholders, which is often available in insurers' information systems but has yet to be fully utilised.
    Keywords: Lapse management strategy, Longitudinal Analysis, Machine learning, Life insurance, Customer lifetime value
    Date: 2024–08–05
    URL: https://d.repec.org/n?u=RePEc:hal:journl:hal-04178278
  23. By: Jianqing Fan (Princeton University); Weining Wang (University of Groningen); Yue Zhao (University of York)
    Abstract: High-dimensional covariates often admit a linear factor structure. To effectively screen correlated covariates in high dimensions, we propose a conditional variable screening test based on non-parametric regression using neural networks, due to their representation power. We ask whether individual covariates have additional contributions given the latent factors, or more generally a set of variables. Our test statistics are based on the estimated partial derivative of the regression function of the candidate variable for screening and an observable proxy for the latent factors. Hence, our test reveals how much predictors contribute additionally to the non-parametric regression after accounting for the latent factors. Our derivative estimator is the convolution of a deep neural network regression estimator and a smoothing kernel. We demonstrate that when the neural network size diverges with the sample size, unlike estimating the regression function itself, it is necessary to smooth the partial derivative of the neural network estimator to recover the desired convergence rate for the derivative. Moreover, our screening test achieves asymptotic normality under the null after finely centering our test statistics, which makes the biases negligible, as well as consistency for local alternatives under mild conditions. We demonstrate the performance of our test in a simulation study and two real-world applications.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.10825
  24. By: Julia Schmidt; Graham Pilgrim; Annabelle Mourougane
    Abstract: This paper estimates the artificial intelligence-hiring intensity of occupations and industries (i.e. the share of job postings related to AI skills) in the United Kingdom during 2012-22. The analysis deploys a natural language processing (NLP) algorithm on online job postings, collected by Lightcast, which provides timely and detailed insights into labour demand for different professions. The key contribution of the study lies in the design of the classification rule identifying jobs as AI-related which, contrary to the existing literature, goes beyond the simple use of keywords. Moreover, the methodology allows for comparisons between data-hiring-intensive jobs, defined as the share of jobs related to data production tasks, and AI-hiring-intensive jobs. Estimates point to a rise in the economy-wide AI-hiring intensity in the United Kingdom over the past decade, though it remains fairly small (reaching 0.6% on average over the 2017-22 period). Over time, the demand for AI-related jobs has spread outside the traditional Information, Communication and Telecommunications industries, with the Finance and Insurance industry increasingly demanding AI skills. At the regional level, the highest demand for AI-related jobs is found in London and research hubs. At the occupation level, marked changes in the demand for AI skills are also visible. Professions such as data scientist, computer scientist, hardware engineer and robotics engineer are estimated to be the most AI-hiring-intensive occupations in the United Kingdom. The data and methodology used allow for the exploration of cross-country estimates in the future.
    Keywords: AI-hiring intensity, artificial intelligence, job advertisements, natural language processing, United Kingdom
    JEL: C80 C88 E01 J21
    Date: 2024–09–05
    URL: https://d.repec.org/n?u=RePEc:oec:comaaa:25-en
  25. By: Aleksandar Arandjelović; Julia Eisenberg
    Abstract: We consider an insurance company which faces financial risk in the form of insurance claims and market-dependent surplus fluctuations. The company aims to simultaneously control its terminal wealth (e.g. at the end of an accounting period) and the ruin probability in a finite time interval by purchasing reinsurance. The target functional is given by the expected utility of terminal wealth perturbed by a modified Gerber-Shiu penalty function. We solve the problem of finding the optimal reinsurance strategy and the corresponding maximal target functional via neural networks. The procedure is illustrated by a numerical example, where the surplus process is given by a Cramér-Lundberg model perturbed by a mean-reverting Ornstein-Uhlenbeck process.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.06168
  26. By: Aleksandar Arandjelović; Pavel V. Shevchenko; Tomoko Matsui; Daisuke Murakami; Tor A. Myrvoll
    Abstract: Stochastic versions of recursive integrated climate-economy assessment models are essential for studying and quantifying policy decisions under uncertainty. However, as the number of stochastic shocks increases, solving these models as dynamic programming problems using deterministic grid methods becomes computationally infeasible, and simulation-based methods are needed. The least-squares Monte Carlo (LSMC) method has become popular for solving optimal stochastic control problems in quantitative finance. In this paper, we extend the application of the LSMC method to stochastic climate-economy models. We exemplify this approach using a stochastic version of the DICE model with all five main uncertainties discussed in the literature. To address the complexity and high dimensionality of these models, we incorporate deep neural network approximations in place of standard regression techniques within the LSMC framework. Our results demonstrate that the deep LSMC method can be used to efficiently derive optimal policies for climate-economy models in the presence of uncertainty.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.09642
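    The regression-based backward induction at the heart of LSMC can be shown in miniature: the sketch below values a Bermudan put with a small neural network in place of the usual polynomial continuation-value regression. The asset dynamics, network size and exercise grid are illustrative assumptions, not the paper's climate-economy application.
      import numpy as np
      from sklearn.neural_network import MLPRegressor

      rng = np.random.default_rng(0)
      S0, K, r, sigma, T, n_steps, n_paths = 100.0, 100.0, 0.05, 0.2, 1.0, 25, 10_000
      dt = T / n_steps
      # simulate geometric Brownian motion paths
      z = rng.standard_normal((n_paths, n_steps))
      S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1))
      S = np.hstack([np.full((n_paths, 1), S0), S])

      cash = np.maximum(K - S[:, -1], 0.0)                  # payoff at maturity
      for t in range(n_steps - 1, 0, -1):
          cash *= np.exp(-r * dt)                           # discount one step back
          itm = S[:, t] < K                                 # regress on in-the-money paths only
          if itm.sum() > 50:
              reg = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0)
              cont = reg.fit(S[itm, t:t + 1], cash[itm]).predict(S[itm, t:t + 1])
              payoff = np.maximum(K - S[itm, t], 0.0)
              exercise = payoff > cont                      # exercise when immediate payoff beats continuation
              cash[np.where(itm)[0][exercise]] = payoff[exercise]
      price = np.exp(-r * dt) * cash.mean()
      print(f"Bermudan put value ~ {price:.3f}")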
  27. By: Julia M. Puaschunder (Columbia University, USA)
    Abstract: The age of digitalization has led to a rising trend of big data insights. Our constant use of digital tools to master our world, and the fact that virtually all of our communication runs through modern technologies, has increased the amount of data transferred, stored and analyzed. As never before, we are now able to derive inferences from big data. The most profitable corporations in the world are currently big data analyzing entities. In an effort to redistribute some of the gains of big data inferences to those who create the information and share their information output on a constant basis, several solutions have been proposed. Granting property rights to information retrieved online is one of the most promising solutions to cope with the fact that corporate capital is gained from the information all of us share online on a constant basis. When considering the establishment of private property rights over one’s own data, the advantage lies in the controllability of information sharing and the monetization of information shared online. At the same time, inequality may be embedded in the idea of ‘selling’ one’s private data to big data analyzing entities. First, private property rights in data created online could lead to a divide between those who create more interesting and meaningful information by actively using the internet and those who merely consume it passively. People also differ in the extent of their useful connections and meaningful conversations. Divides between US and European internet users, already skewed towards US users being more active and Europeans more passive, will widen. Education and income gaps may also widen if those with more skilled mindsets, or those who can afford more sophisticated technology, are able to produce qualitatively and quantitatively richer data. Second, being able to sell big data would incentivize a productive, active and meaningful use of the internet, which would set positive incentives to develop human capital in general. At the same time, however, there is the problem of abuse in markets. The reason why certain goods are not traded in markets is the fear of abuse and exploitation of minors or the specially gifted. Just as organ sales are restricted for fear that people might start harvesting and exploiting dependents with limited mental capacity, similar restrictions may apply to the sale of internet data. Parents or custodians should not be incentivized to capitalize on their dependents’ data. Third, data brokerage may become a lucrative business if one can sell data online. However, data brokerage platforms may favor certain digitalization hubs in the world which have the legal capacity and technological sophistication to capitalize on data efficiently and effectively. This logistical peculiarity risks the unjust enrichment of some advanced nations over less digitalized areas of the world, which may drive the existing economic power divide in the international arena even further in the future. A potential alternative remedy is to tax big data gains and redistribute some of them equally to those whose data serve as the building blocks of big data insights.
    Keywords: big data, data storage, digitalization, Digital Markets Act, inequality, internet, knowledge, law, economics, privacy, private property rights, redistribution, sustainability, taxation, wealth transfer
    Date: 2024–07
    URL: https://d.repec.org/n?u=RePEc:smo:raiswp:0415
  28. By: Hamza Bennani (LEMNA - Laboratoire d'économie et de management de Nantes Atlantique - Nantes Univ - IAE Nantes - Nantes Université - Institut d'Administration des Entreprises - Nantes - Nantes Université - pôle Sociétés - Nantes Univ - Nantes Université); Davide Romelli (Trinity College Dublin)
    Abstract: This paper examines the informativeness and drivers of the tone used by FOMC members to gain insights into the decision-making process of the FOMC. We use a bag-of-words approach to measure the tone of transcripts at the speaker-meeting-round level from 1992-2009 and find persistent differences in tone among FOMC members. We also document how Presidents of regional Federal Reserve Banks use a more volatile and positive tone than the members of the Federal Reserve Board of Governors. Next, we investigate whether the tone used during FOMC deliberations is associated with future monetary policy decisions and study the drivers of differences in tone among FOMC members. Our results suggest that tone is useful for predicting future policy decisions and that differences in tone are mainly associated with the differences in the individual inflation projections of FOMC members.
    Keywords: Central banks, Federal Reserve, FOMC, monetary policy committees, text analysis
    Date: 2024–08–12
    URL: https://d.repec.org/n?u=RePEc:hal:journl:hal-04670309
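    A minimal bag-of-words tone score of the kind used above is simple to compute: (positive - negative) / (positive + negative) word counts per speaker and meeting. The tiny word lists below are placeholders; applied work typically relies on a dictionary such as Loughran-McDonald.
      import re

      POSITIVE = {"strong", "improve", "improved", "gains", "robust", "favorable"}
      NEGATIVE = {"weak", "decline", "declined", "losses", "sluggish", "unfavorable"}

      def tone(text: str) -> float:
          # net positivity of a transcript segment, in [-1, 1]
          words = re.findall(r"[a-z']+", text.lower())
          pos = sum(w in POSITIVE for w in words)
          neg = sum(w in NEGATIVE for w in words)
          return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

      print(tone("Growth remained strong and labor markets continued to improve, "
                 "although housing stayed sluggish."))   # -> 0.33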
  29. By: Nicolás Forteza; Elvira Prades; Marc Roca
    Abstract: On 28 December 2022, the Spanish government announced a temporary value added tax (VAT) rate reduction for selected products. VAT rates were cut on January 1, 2023 and are expected to go back to their previous level by mid-2024. Using a web-scraped data set, we leverage machine-learning techniques to classify each product according to the categories of the statistical office's official classification (COICOP5). We then study the price effects of the temporary VAT rate reduction, covering the daily prices of roughly 10,000 food products sold online by a Spanish supermarket. To identify causal effects, we compare the evolution of prices for treated items (that is, those subject to the tax policy) against a control group (food items outside the policy's scope). Our findings indicate that, at the supermarket level, the pass-through was almost complete. We observe differences in the speed of pass-through across different product types.
    Keywords: Price Rigidity, Inflation, Consumer Prices, Heterogeneity, Microdata, VAT Pass-Through
    JEL: E31 H22 H25
    Date: 2024
    URL: https://d.repec.org/n?u=RePEc:bfr:banfra:951
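    The identification strategy can be sketched as a simple difference-in-differences on the web-scraped panel; the variable names, the specification without product fixed effects and the clustering choice are assumptions, not the authors' exact model.
      import pandas as pd
      import statsmodels.formula.api as smf

      # df columns: product_id, date, log_price, treated (0/1)
      def vat_passthrough_did(df: pd.DataFrame, reform_date="2023-01-01"):
          df = df.copy()
          df["post"] = (pd.to_datetime(df["date"]) >= pd.Timestamp(reform_date)).astype(int)
          model = smf.ols("log_price ~ treated * post", data=df).fit(
              cov_type="cluster", cov_kwds={"groups": df["product_id"]}
          )
          # treated:post ~ log price change attributable to the VAT cut; full
          # pass-through implies a value close to the log change in (1 + VAT rate)
          return model.params["treated:post"], model.bse["treated:post"]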

This nep-big issue is ©2024 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.