nep-big New Economics Papers
on Big Data
Issue of 2024–11–25
fifteen papers chosen by
Tom Coupé, University of Canterbury


  1. A Hands-On Machine Learning Primer for Social Scientists: Math, Algorithms and Code By Nikos Askitas; Nikolaos Askitas
  2. Transforming Urban Planning through Machine Learning: A Study on Planning Application Classification using Natural Language Processing By Lin, Yang; Thackway, William; Soundararaj, Balamurugan; Eagleson, Serryn; Han, Hoon; Pettit, Christopher
  3. Optimization of Actuarial Neural Networks with Response Surface Methodology By Belguutei Ariuntugs; Kehelwala Dewage Gayan Madurang
  4. Capturing the Complexity of Human Strategic Decision-Making with Machine Learning By Jian-Qiao Zhu; Joshua C. Peterson; Benjamin Enke; Thomas L. Griffiths
  5. Stock Price Prediction and Traditional Models: An Approach to Achieve Short-, Medium- and Long-Term Goals By Opeyemi Sheu Alamu; Md Kamrul Siam
  6. Machine Learning et Veille économique : Analyse des données RePEc à l’aide des techniques du NLP By Mohamed Bassi
  7. LTPNet Integration of Deep Learning and Environmental Decision Support Systems for Renewable Energy Demand Forecasting By Te Li; Mengze Zhang; Yan Zhou
  8. Spooky Boundaries at a Distance: Inductive Bias, Dynamic Models, and Behavioral Macro By Mahdi Ebrahimi Kahou; Jesús Fernández-Villaverde; Sebastián Gómez-Cardona; Jesse Perla; Jan Rosa
  9. Solving The Dynamic Volatility Fitting Problem: A Deep Reinforcement Learning Approach By Emmanuel Gnabeyeu; Omar Karkar; Imad Idboufous
  10. Computing Systemic Risk Measures with Graph Neural Networks By Lukas Gonon; Thilo Meyer-Brandis; Niklas Weber
  11. Neuroevolution Neural Architecture Search for Evolving RNNs in Stock Return Prediction and Portfolio Trading By Zimeng Lyu; Amulya Saxena; Rohaan Nadeem; Hao Zhang; Travis Desell
  12. Hierarchical Reinforced Trader (HRT): A Bi-Level Approach for Optimizing Stock Selection and Execution By Zijie Zhao; Roy E. Welsch
  13. Temporal Relational Reasoning of Large Language Models for Detecting Stock Portfolio Crashes By Kelvin J. L. Koa; Yunshan Ma; Ritchie Ng; Huanhuan Zheng; Tat-Seng Chua
  14. Aggregation Trees By Riccardo Di Francesco
  15. Measuring Climate Policy Uncertainty with LLMs: New Insights into Corporate Bond Credit Spreads By Yikai Zhao; Jun Nagayasu; Xinyi Geng

  1. By: Nikos Askitas; Nikolaos Askitas
    Abstract: This paper addresses the steep learning curve in Machine Learning faced by non-computer scientists, particularly social scientists, stemming from the absence of a primer on its fundamental principles. I adopt a pedagogical strategy inspired by the adage “once you understand OLS, you can work your way up to any other estimator,” and apply it to Machine Learning. Focusing on a single-hidden-layer artificial neural network, the paper discusses its mathematical underpinnings, including the pivotal Universal Approximation Theorem, an essential “existence theorem”. The exposition extends to the algorithmic exploration of solutions, specifically through “feed forward” and “back-propagation”, and rounds up with the practical implementation in Python. The objective of this primer is to equip readers with a solid elementary comprehension of first principles and fire some trailblazers to the forefront of AI and causal machine learning.
    Keywords: machine learning, deep learning, supervised learning, artificial neural network, perceptron, Python, keras, tensorflow, universal approximation theorem
    JEL: C01 C87 C00 C60
    Date: 2024
    URL: https://d.repec.org/n?u=RePEc:ces:ceswps:_11353
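    As a rough illustration of the “feed forward” and “back-propagation” steps named in this abstract, the sketch below trains a single-hidden-layer network with plain NumPy. It is not the paper's code (the keywords point to keras and tensorflow); the function name, the tanh activation, the squared-error loss, the learning rate and the toy sine data are all assumptions made for the example.

```python
import numpy as np

def train_single_hidden_layer(X, y, hidden=8, lr=0.05, epochs=2000, seed=0):
    """Fit y ~ W2 * tanh(W1 * x + b1) + b2 by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, n_features = X.shape
    W1 = rng.normal(0.0, 0.5, (n_features, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Feed forward: input -> hidden activations -> prediction.
        h = np.tanh(X @ W1 + b1)
        y_hat = h @ W2 + b2
        err = y_hat - y
        losses.append(float(np.mean(err ** 2)))
        # Back-propagation: chain rule from the loss back to each parameter.
        d_yhat = 2.0 * err / n
        dW2 = h.T @ d_yhat
        db2 = d_yhat.sum(axis=0)
        d_z = (d_yhat @ W2.T) * (1.0 - h ** 2)  # derivative of tanh
        dW1 = X.T @ d_z
        db1 = d_z.sum(axis=0)
        # Gradient descent update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses

# Toy data (assumed for illustration): approximate sin(x) on [-2, 2].
X = np.linspace(-2.0, 2.0, 64).reshape(-1, 1)
y = np.sin(X)
params, losses = train_single_hidden_layer(X, y)
```

    Comparing `losses[0]` with `losses[-1]` shows whether training reduced the mean squared error on the toy data.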
  2. By: Lin, Yang; Thackway, William; Soundararaj, Balamurugan; Eagleson, Serryn; Han, Hoon; Pettit, Christopher
    Abstract: Planning for sustainable urban growth is a pressing challenge facing many cities. Investigating proposed changes to the built environment can provide planners and policymakers information to understand future urban development trends and related infrastructure requirements. It is in this context we have developed a novel urban analytics approach that utilises planning applications (PAs) data and Natural Language Processing (NLP) techniques to forecast the housing supply pipeline in Australia. Firstly, we implement a data processing pipeline which scrapes, geocodes, and filters PA data from council websites and planning portals to provide the first nationally available daily dataset of PAs that are currently under consideration. Secondly, we classify the collected PAs into four distinct urban development categories, selected based on infrastructure planning provisioning requirements. Of the five model architectures tested, we found that the fine-tuned DeBERTa-v3 model achieves the best performance with an accuracy and F1-score of 0.944. This demonstrates the suitability of fine-tuned Pre-trained Language Models (PLMs) for planning text classification tasks. Finally, the model is applied to classify and map urban development trends in Australia’s two largest cities, Sydney and Melbourne, from 2021-2022 and 2023-2024. The mapping affirms a face-validation test of the classification model and demonstrates the utility of PA insights for planners. Holistically, the paper demonstrates the potential for NLP to enrich urban analytics through the integration of previously inaccessible planning text data into planning analysis and decisions.
    Date: 2024–10–25
    URL: https://d.repec.org/n?u=RePEc:osf:osfxxx:fs76e
  3. By: Belguutei Ariuntugs; Kehelwala Dewage Gayan Madurang
    Abstract: In the data-driven world of actuarial science, machine learning (ML) plays a crucial role in predictive modeling, enhancing risk assessment and pricing strategies. Neural networks, specifically combined actuarial neural networks (CANN), are vital for tasks such as mortality forecasting and pricing. However, optimizing hyperparameters (e.g., learning rates, layers) is essential for resource efficiency. This study utilizes a factorial design and response surface methodology (RSM) to optimize CANN performance. RSM effectively explores the hyperparameter space and captures potential curvature, outperforming traditional grid search. Our results show accurate performance predictions, identifying critical hyperparameters. By dropping statistically insignificant hyperparameters, we reduced runs from 288 to 188, with negligible loss in accuracy, achieving near-optimal out-of-sample Poisson deviance loss.
    Date: 2024–10
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2410.12824
  4. By: Jian-Qiao Zhu; Joshua C. Peterson; Benjamin Enke; Thomas L. Griffiths
    Abstract: Understanding how people behave in strategic settings, where they make decisions based on their expectations about the behavior of others, is a longstanding problem in the behavioral sciences. We conduct the largest study to date of strategic decision-making in the context of initial play in two-player matrix games, analyzing over 90,000 human decisions across more than 2,400 procedurally generated games that span a much wider space than previous datasets. We show that a deep neural network trained on these data predicts people’s choices better than leading theories of strategic behavior, indicating that there is systematic variation that is not explained by those theories. We then modify the network to produce a new, interpretable behavioural model, revealing what the original network learned about people: their ability to optimally respond and their capacity to reason about others are dependent on the complexity of individual games. This context-dependence is critical in explaining deviations from the rational Nash equilibrium, response times, and uncertainty in strategic decisions. More broadly, our results demonstrate how machine learning can be applied beyond prediction to further help generate novel explanations of complex human behavior.
    Keywords: behavioural game theory, large scale experiment, machine learning, behavioral economics, complexity
    Date: 2024
    URL: https://d.repec.org/n?u=RePEc:ces:ceswps:_11296
  5. By: Opeyemi Sheu Alamu; Md Kamrul Siam
    Abstract: A comparative analysis of deep learning models and traditional statistical methods for stock price prediction uses data from the Nigerian stock exchange. Historical data, including daily prices and trading volumes, are employed to implement models such as Long Short Term Memory (LSTM) networks, Gated Recurrent Units (GRUs), Autoregressive Integrated Moving Average (ARIMA), and Autoregressive Moving Average (ARMA). These models are assessed over three time horizons: short-term (1 year), medium-term (2.5 years), and long-term (5 years), with performance measured by Mean Squared Error (MSE) and Mean Absolute Error (MAE). The stability of the time series is tested using the Augmented Dickey-Fuller (ADF) test. Results reveal that deep learning models, particularly LSTM, outperform traditional methods by capturing complex, nonlinear patterns in the data, resulting in more accurate predictions. However, these models require greater computational resources and offer less interpretability than traditional approaches. The findings highlight the potential of deep learning for improving financial forecasting and investment strategies. Future research could incorporate external factors such as social media sentiment and economic indicators, refine model architectures, and explore real-time applications to enhance prediction accuracy and scalability.
    Date: 2024–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2410.07220
  6. By: Mohamed Bassi
    Abstract: In an increasingly digitalized world, collecting and processing digital data from the web and connected devices has become an activity of prime importance in research centers and other think tanks. Using the Python language, we developed an economic monitoring tool that analyzes the publications of economics researchers affiliated with African institutions. The tool relies on Machine Learning algorithms, in particular Natural Language Processing techniques. The datasets used come from the Research Papers in Economics platform and were obtained through web scraping.
    Date: 2023–02
    URL: https://d.repec.org/n?u=RePEc:ocp:pbecon:pb_11_23
  7. By: Te Li; Mengze Zhang; Yan Zhou
    Abstract: Against the backdrop of increasingly severe global environmental changes, accurately predicting and meeting renewable energy demands has become a key challenge for sustainable business development. Traditional energy demand forecasting methods often struggle with complex data processing and low prediction accuracy. To address these issues, this paper introduces a novel approach that combines deep learning techniques with environmental decision support systems. The model integrates advanced deep learning techniques, including LSTM and Transformer, and PSO algorithm for parameter optimization, significantly enhancing predictive performance and practical applicability. Results show that our model achieves substantial improvements across various metrics, including a 30% reduction in MAE, a 20% decrease in MAPE, a 25% drop in RMSE, and a 35% decline in MSE. These results validate the model's effectiveness and reliability in renewable energy demand forecasting. This research provides valuable insights for applying deep learning in environmental decision support systems.
    Date: 2024–10
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2410.15286
  8. By: Mahdi Ebrahimi Kahou; Jesús Fernández-Villaverde; Sebastián Gómez-Cardona; Jesse Perla; Jan Rosa
    Abstract: In the long run, we are all dead. Nonetheless, when studying the short-run dynamics of economic models, it is crucial to consider boundary conditions that govern long-run, forward-looking behavior, such as transversality conditions. We demonstrate that machine learning (ML) can automatically satisfy these conditions due to its inherent inductive bias toward finding flat solutions to functional equations. This characteristic enables ML algorithms to solve for transition dynamics, ensuring that long-run boundary conditions are approximately met. ML can even select the correct equilibria in cases of steady-state multiplicity. Additionally, the inductive bias provides a foundation for modeling forward-looking behavioural agents with self-consistent expectations.
    Keywords: machine learning, inductive bias, rational expectations, transitional dynamics, transversality, behavioural macroeconomics
    JEL: C10 E10
    Date: 2024
    URL: https://d.repec.org/n?u=RePEc:ces:ceswps:_11292
  9. By: Emmanuel Gnabeyeu; Omar Karkar; Imad Idboufous
    Abstract: Volatility fitting is one of the core problems in the equity derivatives business. Through a set of deterministic rules, the degrees of freedom in the implied volatility surface encoding (parametrization, density, diffusion) are defined. Whilst very effective, this approach, widespread in the industry, is not natively tailored to learn from shifts in market regimes and discover unsuspected optimal behaviors. In this paper, we change the classical paradigm and apply the latest advances in Deep Reinforcement Learning (DRL) to solve the fitting problem. In particular, we show that variants of Deep Deterministic Policy Gradient (DDPG) and Soft Actor Critic (SAC) can perform at least as well as standard fitting algorithms. Furthermore, we explain why the reinforcement learning framework is appropriate to handle complex objective functions and is natively adapted for online learning.
    Date: 2024–10
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2410.11789
  10. By: Lukas Gonon; Thilo Meyer-Brandis; Niklas Weber
    Abstract: This paper investigates systemic risk measures for stochastic financial networks of explicitly modelled bilateral liabilities. We extend the notion of systemic risk measures from Biagini, Fouque, Frittelli and Meyer-Brandis (2019) to graph structured data. In particular, we focus on an aggregation function that is derived from a market clearing algorithm proposed by Eisenberg and Noe (2001). In this setting, we show the existence of an optimal random allocation that distributes the overall minimal bailout capital and secures the network. We study numerical methods for the approximation of systemic risk and optimal random allocations. We propose to use permutation equivariant architectures of neural networks like graph neural networks (GNNs) and a class that we name (extended) permutation equivariant neural networks ((X)PENNs). We compare their performance to several benchmark allocations. The main feature of GNNs and (X)PENNs is that they are permutation equivariant with respect to the underlying graph data. In numerical experiments we find evidence that these permutation equivariant methods are superior to other approaches.
    Date: 2024–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2410.07222
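    For readers unfamiliar with the Eisenberg and Noe (2001) market clearing idea mentioned in this abstract, the sketch below computes a clearing payment vector by fixed-point iteration on a small liability matrix. It illustrates the general clearing mechanism only; the abstract does not specify how the paper's aggregation function is built from it. The function name, the three-bank network and the external asset values are assumptions made for the example.

```python
import numpy as np

def clearing_vector(liabilities, external_assets, tol=1e-10, max_iter=1000):
    """Fixed-point iteration for an Eisenberg-Noe style clearing vector.

    liabilities[i, j] is the nominal amount bank i owes bank j.
    external_assets[i] is the value of bank i's assets outside the network.
    """
    L = np.asarray(liabilities, dtype=float)
    e = np.asarray(external_assets, dtype=float)
    p_bar = L.sum(axis=1)  # total nominal obligations of each bank
    # Relative liabilities: share of bank i's obligations owed to bank j.
    Pi = np.divide(L, p_bar[:, None], out=np.zeros_like(L),
                   where=p_bar[:, None] > 0)
    p = p_bar.copy()  # start from full payment and iterate downwards
    for _ in range(max_iter):
        inflows = Pi.T @ p  # payments received from other banks
        p_new = np.minimum(p_bar, np.maximum(0.0, e + inflows))
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    return p

# Hypothetical chain: bank 0 owes 10 to bank 1, bank 1 owes 10 to bank 2.
L = [[0, 10, 0],
     [0, 0, 10],
     [0, 0, 0]]
e = [5, 2, 0]
payments = clearing_vector(L, e)
```

    In this toy network bank 0 can pay only 5 of the 10 it owes, so bank 1 receives 5, holds 2 of its own, and pays 7 of the 10 it owes; the shortfall propagates along the chain.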
  11. By: Zimeng Lyu; Amulya Saxena; Rohaan Nadeem; Hao Zhang; Travis Desell
    Abstract: Stock return forecasting is a major component of numerous finance applications. Predicted stock returns can be incorporated into portfolio trading algorithms to make informed buy or sell decisions which can optimize returns. In such portfolio trading applications, the predictive performance of a time series forecasting model is crucial. In this work, we propose the use of the Evolutionary eXploration of Augmenting Memory Models (EXAMM) algorithm to progressively evolve recurrent neural networks (RNNs) for stock return predictions. RNNs are evolved independently for each stock and portfolio trading decisions are made based on the predicted stock returns. The portfolio used for testing consists of the 30 companies in the Dow-Jones Index (DJI) with each stock having the same weight. Results show that using these evolved RNNs and a simple daily long-short strategy can generate higher returns than both the DJI index and the S&P 500 Index for both 2022 (bear market) and 2023 (bull market).
    Date: 2024–10
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2410.17212
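    The abstract does not spell out its “simple daily long-short strategy”, so the sketch below shows only one plausible reading: each day, go long every stock with a positive predicted return and short every stock with a negative one, with equal weight per stock. The function name, the sign rule and the sample numbers are assumptions, not the paper's method or data.

```python
import numpy as np

def long_short_daily_returns(predicted, realized):
    """Equal-weight daily long-short returns from predicted stock returns.

    predicted, realized: arrays of shape (n_days, n_stocks).
    Assumed rule: +1 position if the prediction is positive, -1 if negative,
    0 if exactly zero; every stock carries the same weight 1 / n_stocks.
    """
    predicted = np.asarray(predicted, dtype=float)
    realized = np.asarray(realized, dtype=float)
    positions = np.sign(predicted)
    n_stocks = predicted.shape[1]
    return (positions * realized).sum(axis=1) / n_stocks

# Made-up numbers for two days and three stocks.
predicted = [[0.01, -0.02, 0.00],
             [-0.01, 0.03, 0.02]]
realized = [[0.02, -0.01, 0.05],
            [0.01, 0.02, -0.01]]
daily = long_short_daily_returns(predicted, realized)
```

    On the first made-up day the long and the short both pay off and the third stock is left flat; on the second day the gains and losses cancel.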
  12. By: Zijie Zhao; Roy E. Welsch
    Abstract: Leveraging Deep Reinforcement Learning (DRL) in automated stock trading has shown promising results, yet its application faces significant challenges, including the curse of dimensionality, inertia in trading actions, and insufficient portfolio diversification. Addressing these challenges, we introduce the Hierarchical Reinforced Trader (HRT), a novel trading strategy employing a bi-level Hierarchical Reinforcement Learning framework. The HRT integrates a Proximal Policy Optimization (PPO)-based High-Level Controller (HLC) for strategic stock selection with a Deep Deterministic Policy Gradient (DDPG)-based Low-Level Controller (LLC) tasked with optimizing trade executions to enhance portfolio value. In our empirical analysis, comparing the HRT agent with standalone DRL models and the S&P 500 benchmark during both bullish and bearish market conditions, we achieve a positive and higher Sharpe ratio. This advancement not only underscores the efficacy of incorporating hierarchical structures into DRL strategies but also mitigates the aforementioned challenges, paving the way for designing more profitable and robust trading algorithms in complex markets.
    Date: 2024–10
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2410.14927
  13. By: Kelvin J. L. Koa; Yunshan Ma; Ritchie Ng; Huanhuan Zheng; Tat-Seng Chua
    Abstract: Stock portfolios are often exposed to rare consequential events (e.g., 2007 global financial crisis, 2020 COVID-19 stock market crash), as they do not have enough historical information to learn from. Large Language Models (LLMs) now present a possible tool to tackle this problem, as they can generalize across their large corpus of training data and perform zero-shot reasoning on new events, allowing them to detect possible portfolio crash events without requiring specific training data. However, detecting portfolio crashes is a complex problem that requires more than basic reasoning abilities. Investors need to dynamically process the impact of each new piece of information found in the news articles, analyze the relational network of impacts across news events and portfolio stocks, as well as understand the temporal context between impacts across time-steps, in order to obtain the overall aggregated effect on the target portfolio. In this work, we propose an algorithmic framework named Temporal Relational Reasoning (TRR). It seeks to emulate the spectrum of human cognitive capabilities used for complex problem-solving, which include brainstorming, memory, attention and reasoning. Through extensive experiments, we show that TRR is able to outperform state-of-the-art solutions on detecting stock portfolio crashes, and demonstrate how each of the proposed components contributes to its performance through an ablation study. Additionally, we further explore the possible applications of TRR by extending it to other related complex problems, such as the detection of possible global crisis events in Macroeconomics.
    Date: 2024–10
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2410.17266
  14. By: Riccardo Di Francesco
    Abstract: Uncovering the heterogeneous effects of particular policies or "treatments" is a key concern for researchers and policymakers. A common approach is to report average treatment effects across subgroups based on observable covariates. However, the choice of subgroups is crucial as it poses the risk of p-hacking and requires balancing interpretability with granularity. This paper proposes a nonparametric approach to construct heterogeneous subgroups. The approach enables a flexible exploration of the trade-off between interpretability and the discovery of more granular heterogeneity by constructing a sequence of nested groupings, each with an optimality property. By integrating our approach with "honesty" and debiased machine learning, we provide valid inference about the average treatment effect of each group. We validate the proposed methodology through an empirical Monte-Carlo study and apply it to revisit the impact of maternal smoking on birth weight, revealing systematic heterogeneity driven by parental and birth-related characteristics.
    Date: 2024–10
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2410.11408
  15. By: Yikai Zhao; Jun Nagayasu; Xinyi Geng
    Abstract: This study examines the impact of climate policy uncertainty (CPU) on credit spreads using data from corporate bonds listed on the Chinese exchange market between 2008 and 2022. We innovatively apply large language models (LLMs) to construct a firm-level CPU index based on disclosure texts and validate its effectiveness. We find that a CPU rise widens a firm’s credit spreads by exacerbating financial distress. Although disclosing environmental, social, and governance (ESG) information moderates CPU’s effect on credit spreads, controversies in ESG ratings amplify it. Finally, heterogeneity analyses reveal that CPU’s effect on widening bond spreads is more pronounced for traditional bonds, short- to medium-term bonds, nonstate-owned enterprises, and issuing firms with dispersed supply chains.
    Date: 2024–11
    URL: https://d.repec.org/n?u=RePEc:toh:dssraa:143

This nep-big issue is ©2024 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.