nep-big New Economics Papers
on Big Data
Issue of 2024‒02‒05
thirteen papers chosen by
Tom Coupé, University of Canterbury


  1. Machine Learning Based Panel Data Models By Bingduo Yang; Wei Long; Zongwu Cai
  2. Nowcasting Madagascar's real GDP using machine learning algorithms By Ramaharo, Franck M.; Rasolofomanana, Gerzhino H.
  3. (Almost) 200 Years of News-Based Economic Sentiment By Jules H. van Binsbergen; Svetlana Bryzgalova; Mayukh Mukhopadhyay; Varun Sharma
  4. Deep Learning Solutions to Master Equations for Continuous Time Heterogeneous Agent Macroeconomic Models By Zhouzhou Gu; Mathieu Laurière; Sebastian Merkel; Jonathan Payne
  5. Elderly People Treated in Integrated Home Care in Italian Regions: A Metric Approach By Resta, Onofrio; Resta, Emanuela; Costantiello, Alberto; Leogrande, Angelo
  6. Investigating the Determinants of Beds for High-Care Specialties in the Italian Regions in the Environmental, Social and Governance Model By Resta, Emanuela; Resta, Onofrio; Costantiello, Alberto; Leogrande, Angelo
  7. The Hospital Emigration to Another Region in the Light of the Environmental, Social and Governance Model in Italy During the Period 2004-2021 By Resta, Emanuela; Resta, Onofrio; Costantiello, Alberto; Leogrande, Angelo
  8. Integration and Financial Stability: A Post-Global Crisis Assessment By Giraldo, Iader; Gomez-Gonzalez, Jose E; Uribe, Jorge M
  9. Forecasting CPI inflation under economic policy and geo-political uncertainties By Shovon Sengupta; Tanujit Chakraborty; Sunny Kumar Singh
  10. Can Large Language Models Beat Wall Street? Unveiling the Potential of AI in Stock Selection By Georgios Fatouros; Konstantinos Metaxas; John Soldatos; Dimosthenis Kyriazis
  11. The role of big data in changing the scope of the modern Indian banking sector By Mishra, Mukesh Kumar
  12. Integrating machine behavior into human subject experiments: A user-friendly toolkit and illustrations By Christoph Engel; Max R. P. Grossmann; Axel Ockenfels
  13. Intraday Trading Algorithm for Predicting Cryptocurrency Price Movements Using Twitter Big Data Analysis By Vahidin Jeleskovic; Stephen Mackay

  1. By: Bingduo Yang (School of Finance, Guangdong University of Finance and Economics, Guangzhou 510320, China); Wei Long (Department of Economics, Tulane University, New Orleans, LA 70118, USA); Zongwu Cai (Department of Economics, The University of Kansas, Lawrence, KS 66045, USA)
    Abstract: We examine nonparametric panel data regression models with fixed effects and cross-sectional dependence using a diverse collection of machine learning techniques. We add cross-sectional averages and time averages as regressors to account for unobserved common factors and fixed effects, respectively. We then use the debiased machine learning method of Chernozhukov et al. (2018) to estimate the parametric coefficients, followed by the nonparametric component. We comprehensively investigate the finite-sample performance of three commonly used machine learning techniques: LASSO, random forests, and neural networks. Simulation results demonstrate the effectiveness of the proposed method across different combinations of the number of cross-sectional units, the time dimension, and the number of regressors, irrespective of the presence of fixed effects and cross-sectional dependence. In the empirical part, we use the proposed machine learning-based panel data model to estimate the total factor productivity (TFP) of public companies in mainland China and find that the proposed machine learning methods perform comparably to other competitive methods.
    Keywords: Machine learning; panel data model; cross-sectional dependence; debiased machine learning.
    JEL: C12 C22
    Date: 2024–01
    URL: http://d.repec.org/n?u=RePEc:kan:wpaper:202402&r=big
  2. By: Ramaharo, Franck M.; Rasolofomanana, Gerzhino H.
    Abstract: We investigate the predictive power of different machine learning algorithms for nowcasting Madagascar's gross domestic product (GDP). We trained popular regression models, including regularized linear regression (Ridge, Lasso, Elastic-net), a dimensionality reduction model (principal component regression), the k-nearest neighbors algorithm (k-NN regression), support vector regression (linear SVR), and tree-based ensemble models (Random forest and XGBoost regressions), on 10 Malagasy quarterly macroeconomic leading indicators over the period 2007Q1-2022Q4, and used simple econometric models as benchmarks. We measured the nowcast accuracy of each model by its root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). Our findings reveal that the Ensemble Model, formed by aggregating individual predictions, consistently outperforms traditional econometric models. We conclude that machine learning models can deliver more accurate and timely nowcasts of Malagasy economic performance and provide policymakers with additional guidance for data-driven decision making.
    Keywords: nowcasting; gross domestic product; machine learning; Madagascar
    JEL: C02 C53 C63 E17
    Date: 2023–12–23
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:119574&r=big
  3. By: Jules H. van Binsbergen; Svetlana Bryzgalova; Mayukh Mukhopadhyay; Varun Sharma
    Abstract: Using text from 200 million pages of 13,000 US local newspapers and machine learning methods, we construct a 170-year-long measure of economic sentiment at the country and state levels, which expands existing measures in both the time series (by more than a century) and the cross-section. Our measure predicts GDP (both nationally and locally), consumption, and employment growth, even after controlling for commonly used predictors, and it also predicts monetary policy decisions. Our measure is distinct from the information in expert forecasts and leads their consensus value. Interestingly, news coverage has become increasingly negative across all states over the past half-century.
    JEL: E2 E3 E4 E40 E43 E44 G01 G1 G10 G14 G17 G18 G40
    Date: 2024–01
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:32026&r=big
  4. By: Zhouzhou Gu (Princeton University); Mathieu Laurière (NYU Shanghai, NYU-ECNU Institute of Mathematical Sciences); Sebastian Merkel (University of Exeter); Jonathan Payne (Princeton University)
    Abstract: We propose a new global solution algorithm for continuous-time heterogeneous-agent economies with aggregate shocks. First, we approximate the state space so that equilibrium in the economy can be characterized by a single high-dimensional, but finite-dimensional, partial differential equation. Second, we approximate the value function using neural networks and solve the differential equation using deep learning tools. We refer to the solution as an Economic Model Informed Neural Network (EMINN). The main advantage of this technique is that it allows us to find global solutions to high-dimensional, non-linear problems. We demonstrate our algorithm by solving two canonical models in the macroeconomics literature: the Aiyagari (1994) model and the Krusell and Smith (1998) model.
    Keywords: Heterogeneous agents, computational methods, deep learning, inequality, mean field games, continuous time methods, aggregate shocks, global solution
    JEL: C70
    Date: 2023–08
    URL: http://d.repec.org/n?u=RePEc:pri:econom:2023-19&r=big
  5. By: Resta, Onofrio; Resta, Emanuela; Costantiello, Alberto; Leogrande, Angelo
    Abstract: In this article, we analyse the ESG determinants of “Elderly People Treated in Integrated Home Care” (EPIHC) in the Italian regions between 2004 and 2022, using data from the ISTAT-BES database. We apply several econometric techniques: Panel Data with Random Effects, Panel Data with Fixed Effects, Pooled Ordinary Least Squares (OLS), and Weighted Least Squares (WLS). The results show that EPIHC is positively associated with “Nurses and midwives” and “Soil sealing by artificial cover”, and negatively associated with “Museum heritage density and relevance” and “Trust in law enforcement agencies and firefighters”. Furthermore, we apply a k-Means algorithm optimized with the Silhouette Coefficient and find two clusters. Finally, we compare eight machine-learning algorithms and find that Linear Regression is the best predictive algorithm.
    Keywords: Analysis of Health Care Markets, Health Behaviors, Health Insurance, Public and Private, Health and Inequality, Health and Economic Development, Government Policy, Regulation, Public Health.
    JEL: I11 I12 I13 I14 I15 I18
    Date: 2023–12–30
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:119621&r=big
  6. By: Resta, Emanuela; Resta, Onofrio; Costantiello, Alberto; Leogrande, Angelo
    Abstract: This article investigates the determinants of Beds for High-Care Specialties (BHCS) in the Italian regions within the Environmental, Social and Governance (ESG) framework. Data from ISTAT-BES for 20 regions over the period 2004-2021 are used. Several econometric techniques are applied: Pooled Ordinary Least Squares, Panel Data with Fixed Effects, Panel Data with Random Effects, and Dynamic Panel at 1 stage. Furthermore, a cluster analysis performed with a k-Means algorithm optimized with the Silhouette Coefficient indicates the presence of three clusters. Finally, eight machine-learning algorithms are analysed to predict the future value of BHCS. The results show that the Artificial Neural Network (ANN) algorithm is the best algorithm. BHCS is expected to grow by 4.88% on average across the analysed regions.
    Keywords: Analysis of Health Care Markets, Health Behaviors, Health Insurance, Public and Private, Health and Inequality, Health and Economic Development, Government Policy, Regulation, Public Health.
    JEL: I11 I12 I13 I14 I15 I18
    Date: 2023–12–30
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:119622&r=big
  7. By: Resta, Emanuela; Resta, Onofrio; Costantiello, Alberto; Leogrande, Angelo
    Abstract: This article analyses the impact of Environmental, Social and Governance (ESG) determinants on Hospital Emigration to Another Region (HEAR) in the Italian regions over the period 2004-2021. The data are analysed using Panel Data with Random Effects, Panel Data with Fixed Effects, Pooled Ordinary Least Squares (OLS), Weighted Least Squares (WLS), and Dynamic Panel at 1 Stage. The results show that, within the ESG model, HEAR is negatively associated with the E component, positively associated with S, and negatively associated with G. The data were clustered with a k-Means algorithm optimized with the Silhouette coefficient, and the optimal clustering with k=2 is compared with the sub-optimal clustering with k=3. The results suggest a negative relationship between the resident population and hospital emigration at the regional level. Finally, predictions are made with machine learning algorithms ranked by statistical performance. The results show that the Artificial Neural Network (ANN) algorithm is the best predictor. The ANN predictions are critically analysed in light of health economic policy directions.
    Keywords: Analysis of Health Care Markets, Health Behaviors, Health Insurance, Public and Private, Health and Inequality, Health and Economic Development, Government Policy, Regulation, Public Health.
    JEL: I11 I12 I13 I14 I15 I18
    Date: 2023–12–30
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:119624&r=big
  8. By: Giraldo, Iader (FLAR); Gomez-Gonzalez, Jose E (Department of Finance, Information Systems, and Economics, City University of New York – Lehman College, Bronx); Uribe, Jorge M (Faculty of Economics and Business, Universitat Oberta de Catalunya)
    Abstract: In this study, we revisit the debate on the effects of financial openness on financial stability. In contrast to previous studies, we measure the direct influence of openness on stability using a varied set of proxies that capture the diverse dimensions of both concepts within a unified estimation framework. Using state-of-the-art machine learning techniques, we isolate the focal effects while controlling for a comprehensive set of macroeconomic, political, and institutional variables. Using data on 45 countries over the period 2010 to 2020, we find that, in most cases, increased financial openness is beneficial for financial stability. Greater integration tends to reduce the ratio of nonperforming loans to total loans, while improving capital adequacy ratios and the ratio of provisions to nonperforming loans. Additionally, greater openness increases bank liquidity. Importantly, these improvements in financial stability occur without any adverse effect on bank profitability. This suggests that policies aimed at fostering greater integration with global financial markets and promoting bank competition can improve financial stability without compromising bank profitability.
    Keywords: Openness; integration; Financial stability; Double/Debiased Machine Learning
    JEL: F21 F32 G21 G28
    Date: 2024–01–17
    URL: http://d.repec.org/n?u=RePEc:col:000566:020926&r=big
  9. By: Shovon Sengupta; Tanujit Chakraborty; Sunny Kumar Singh
    Abstract: Forecasting a key macroeconomic variable, consumer price index (CPI) inflation, for BRIC countries using economic policy uncertainty and geopolitical risk is a difficult task for central bank policymakers. This study proposes a novel filtered ensemble wavelet neural network (FEWNet) that can produce reliable long-term forecasts of CPI inflation. The approach applies a maximal overlap discrete wavelet transform to the CPI inflation series to obtain high-frequency and low-frequency signals. All the wavelet-transformed series and the filtered exogenous variables are fed into downstream autoregressive neural networks to produce the final ensemble forecast. Theoretically, we show that FEWNet reduces the empirical risk compared with single, fully connected neural networks. We also demonstrate that the rolling-window real-time forecasts obtained from the proposed algorithm are significantly more accurate than those of benchmark forecasting methods. Additionally, we use conformal prediction intervals to quantify the uncertainty of the forecasts generated by the proposed approach. The strong performance of FEWNet can be attributed to its capacity to capture non-linearities and long-range dependencies in the data through its adaptable architecture.
    Date: 2023–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2401.00249&r=big
  10. By: Georgios Fatouros; Konstantinos Metaxas; John Soldatos; Dimosthenis Kyriazis
    Abstract: This paper introduces MarketSenseAI, a novel AI-driven framework that leverages the advanced reasoning capabilities of GPT-4 for scalable stock selection. MarketSenseAI incorporates Chain of Thought and In-Context Learning methodologies to analyze a wide array of data sources, including market price dynamics, financial news, company fundamentals, and macroeconomic reports, emulating the decision-making process of prominent financial investment teams. The development, implementation, and empirical validation of MarketSenseAI are detailed, with a focus on its ability to provide actionable investment signals (buy, hold, sell) backed by cogent explanations. A notable aspect of this study is the use of GPT-4 not only as a predictive tool but also as an evaluator, revealing the significant impact of the AI-generated explanations on the reliability and acceptance of the suggested investment signals. In an extensive empirical evaluation with S&P 100 stocks, MarketSenseAI outperformed the benchmark index by 13%, achieving returns of up to 40%, while maintaining a risk profile comparable to the market. These results demonstrate the efficacy of Large Language Models in complex financial decision-making and mark a significant advancement in the integration of AI into financial analysis and investment strategies. This research contributes to the financial AI field, presenting an innovative approach and underscoring the transformative potential of AI in revolutionizing traditional financial analysis and investment methodologies.
    Date: 2024–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2401.03737&r=big
  11. By: Mishra, Mukesh Kumar
    Abstract: Big data has become a crucial asset for the modern Indian banking sector, driving innovation, improving decision-making, and ultimately enhancing the overall banking experience for customers. As technology continues to advance, the role of big data in banking is likely to evolve, bringing about further improvements in efficiency, security, and customer-centric services. The integration of big data analytics in banking operations has brought about several changes, enhancing efficiency, customer experience, risk management, and decision-making processes. This paper explores the transformative role of big data as a service (BDaaS) and its applications in the Indian banking sector. The study highlights how BDaaS serves as a robust and innovative instrument, contributing significantly to the identification and prevention of security issues and fraudulent behavior within the industry. The experimental results suggest that deploying big data technology is crucial for various aspects, particularly in handling financial risks and managing operational workflows within the banking sector.
    Keywords: Big Data, Banking System
    JEL: G1 G21 O33 G28
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:zbw:esprep:280834&r=big
  12. By: Christoph Engel (Max Planck Institute for Research on Collective Goods); Max R. P. Grossmann (University of Cologne); Axel Ockenfels (University of Cologne, Max Planck Institute for Research on Collective Goods)
    Abstract: Large Language Models (LLMs) have the potential to profoundly transform and enrich experimental economic research. We propose a new software framework, “alter_ego”, which makes it easy to design experiments between LLMs and to integrate LLMs into oTree-based experiments with human subjects. Our toolkit is freely available at github.com/mrpg/ego. To illustrate, we run differently framed prisoner’s dilemmas both between machines and between humans and machines. Framing effects in machine-only treatments are strong and similar to those expected from previous human-only experiments, yet less pronounced and qualitatively different when machines interact with human participants.
    Keywords: Software for experiments, large language models, human-machine interaction, framing
    JEL: C91 C92 D91 O33 L86
    Date: 2023–12
    URL: http://d.repec.org/n?u=RePEc:mpg:wpaper:2024_01&r=big
  13. By: Vahidin Jeleskovic; Stephen Mackay
    Abstract: Cryptocurrencies have emerged as a novel financial asset garnering significant attention in recent years. A defining characteristic of these digital currencies is their pronounced short-term market volatility, primarily influenced by widespread sentiment polarization, particularly on social media platforms such as Twitter. Recent research has underscored the correlation between sentiment expressed in various networks and the price dynamics of cryptocurrencies. This study examines the impact of informative tweets published through foundation channels on trader behavior within 15 minutes of publication, with a focus on potential outcomes related to sentiment polarization. The primary objective is to identify factors that can predict positive price movements and potentially be leveraged through a trading algorithm. To this end, we conduct a conditional examination of return and excess return rates within the 15 minutes following tweet publication. The empirical findings reveal statistically significant increases in return rates, particularly within the first three minutes after tweet publication. Notably, no adverse effects resulting from the messages were observed. Surprisingly, sentiment was found to have no discernible impact on cryptocurrency price movements. Our analysis further indicates that investors are primarily influenced by the quality of tweet content, as reflected in the choice of words and tweet volume. While the basic trading algorithm presented in this study does yield some benefits within the 15-minute timeframe, these benefits are not statistically significant. Nevertheless, it serves as a foundational framework for potential enhancements and further investigations.
    Date: 2023–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2401.00603&r=big
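Paper 1 adds cross-sectional averages (a proxy for unobserved common factors) and time averages (a proxy for fixed effects) as extra regressors. The following is a minimal sketch of that augmentation step only, assuming a balanced panel stored as a dict keyed by (unit, period) and averaging the regressors alone; the function name `augment_panel`, the data layout, and the choice of which variables to average are hypothetical illustrations, not the authors' code.

```python
from collections import defaultdict

def augment_panel(panel):
    """Append period and unit averages to each observation's regressors.

    panel maps (unit, period) -> list of regressor values x_it.
    Returns (unit, period) -> x_it + xbar_t + xbar_i, where xbar_t is the
    cross-sectional average at period t (proxy for common factors) and xbar_i
    is the time average for unit i (proxy for the fixed effect).
    Assumes a balanced panel with the same number of regressors everywhere.
    """
    by_period = defaultdict(list)
    by_unit = defaultdict(list)
    for (unit, period), x in panel.items():
        by_period[period].append(x)
        by_unit[unit].append(x)

    def column_means(rows):
        # Element-wise mean of equal-length regressor vectors.
        return [sum(col) / len(rows) for col in zip(*rows)]

    xbar_t = {t: column_means(rows) for t, rows in by_period.items()}
    xbar_i = {i: column_means(rows) for i, rows in by_unit.items()}
    return {(i, t): x + xbar_t[t] + xbar_i[i] for (i, t), x in panel.items()}


# Toy balanced panel: two units (A, B), two periods, one regressor.
panel = {("A", 1): [1.0], ("A", 2): [3.0], ("B", 1): [5.0], ("B", 2): [7.0]}
aug = augment_panel(panel)
print(aug[("A", 1)])  # [1.0, 3.0, 2.0]
```

The augmented vectors would then be passed to whichever learner (LASSO, random forest, neural network) estimates the model.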
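Paper 2 scores each nowcasting model by RMSE, MAE and MAPE and forms an Ensemble Model by aggregating individual predictions. A minimal sketch of those metrics follows, assuming an equal-weight average as the aggregation rule (the abstract does not state how predictions are combined); the function names and toy numbers are invented for illustration and are not data from the paper.

```python
import math

def rmse(actual, pred):
    # Root mean square error.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mae(actual, pred):
    # Mean absolute error.
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def mape(actual, pred):
    # Mean absolute percentage error, in percent; undefined if any actual value is 0.
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)

def ensemble_mean(*forecasts):
    # Equal-weight average of several models' nowcasts, period by period.
    return [sum(vals) / len(vals) for vals in zip(*forecasts)]


actual = [2.0, 4.0, 5.0]
model_a = [2.5, 3.5, 5.5]   # overshoots, undershoots, overshoots
model_b = [1.5, 4.5, 4.5]   # errors of opposite sign to model_a
combo = ensemble_mean(model_a, model_b)
print(rmse(actual, model_a), rmse(actual, combo))  # 0.5 0.0
```

The toy models are deliberately built so their errors cancel; in practice averaging helps only to the extent that individual models' errors are not perfectly correlated.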
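Papers 5, 6 and 7 each choose the number of k-means clusters using the silhouette coefficient. A minimal pure-Python sketch of that selection rule follows, assuming Euclidean distance, Lloyd's algorithm with a deterministic farthest-point initialisation, and candidate values k = 2 to 5; these settings, the function names, and the toy points are assumptions, since the abstracts do not report them.

```python
import math

def kmeans(points, k, iters=100):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid for each point.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; needs at least two non-empty clusters.

    Singleton clusters score 0 by convention.
    """
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def best_k(points, k_values=range(2, 6)):
    """Pick the number of clusters with the highest mean silhouette."""
    return max(k_values, key=lambda k: silhouette(points, kmeans(points, k)))


# Two well-separated toy groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(best_k(points))  # 2
```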
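Papers 1 and 8 both rely on double/debiased machine learning (Chernozhukov et al., 2018). For a partially linear model y = theta*d + g(x) + e, its core idea can be sketched as cross-fitted partialling out: predict the outcome and the treatment from the controls on held-out folds, then regress one set of residuals on the other. The sketch below is a simplification under stated assumptions: a single control, simple linear regression standing in for the machine learners the papers use, two folds, and simulated data with a known theta of 0.5. It is not either paper's estimator, and the function names are hypothetical.

```python
import random

def ols_fit(x, y):
    # Simple linear regression y = a + b*x; returns (a, b).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def dml_plm(y, d, x, n_folds=2):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(x) + e.

    Nuisance functions E[y|x] and E[d|x] are fitted on the other folds by
    simple linear regression, an illustrative stand-in for LASSO, forests, etc.
    """
    n = len(y)
    folds = [list(range(k, n, n_folds)) for k in range(n_folds)]
    ry, rd = [0.0] * n, [0.0] * n
    for k, test in enumerate(folds):
        train = [i for j, f in enumerate(folds) if j != k for i in f]
        ay, by = ols_fit([x[i] for i in train], [y[i] for i in train])
        ad, bd = ols_fit([x[i] for i in train], [d[i] for i in train])
        for i in test:
            # Out-of-fold residuals of outcome and treatment.
            ry[i] = y[i] - (ay + by * x[i])
            rd[i] = d[i] - (ad + bd * x[i])
    # Final stage: regress outcome residuals on treatment residuals (no intercept).
    return sum(u * v for u, v in zip(ry, rd)) / sum(v * v for v in rd)


# Simulated data with a known treatment effect of 0.5.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(2000)]
d = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [0.5 * di + 1.5 * xi + rng.gauss(0, 1) for di, xi in zip(d, x)]
theta_hat = dml_plm(y, d, x)
print(round(theta_hat, 2))
```

Fitting the nuisance models on folds other than the one being residualised is what keeps overfitting in flexible learners from biasing the final-stage estimate.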
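Paper 9 splits the CPI inflation series into high- and low-frequency signals with a maximal overlap discrete wavelet transform (MODWT). A minimal sketch of a single-level MODWT with the Haar filter follows, assuming circular boundary handling and the filter convention W_t = (X_t - X_{t-1})/2, V_t = (X_t + X_{t-1})/2; the filter, the number of levels, the sign convention, and the toy series are all assumptions, since the abstract specifies none of them.

```python
def haar_modwt_level1(x):
    """Level-1 Haar MODWT with circular boundary handling.

    Returns (w, v): wavelet (high-frequency) and scaling (low-frequency)
    coefficients, each the same length as x. Unlike the ordinary DWT, the
    MODWT does not downsample, so every time point keeps a coefficient.
    """
    n = len(x)
    # For t = 0, x[t - 1] is x[-1], i.e. the series wraps around (circular filtering).
    w = [(x[t] - x[t - 1]) / 2 for t in range(n)]
    v = [(x[t] + x[t - 1]) / 2 for t in range(n)]
    return w, v


series = [1.0, 3.0, 2.0, 6.0]  # toy series, not CPI data
w, v = haar_modwt_level1(series)
print(w)  # [-2.5, 1.0, -0.5, 2.0]
print(v)  # [3.5, 2.0, 2.5, 4.0]
```

In this single-level Haar case, w[t] + v[t] reproduces series[t], and the sum of squares of w and v together equals that of the series (energy preservation). Deeper decompositions and longer filters require the full pyramid algorithm.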
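Paper 9 also reports conformal prediction intervals around its forecasts. The abstract does not say which conformal variant is used, so the following shows the basic split conformal recipe as an illustration: take absolute residuals on a held-out calibration set, pick their finite-sample corrected quantile, and widen a new point forecast by that amount. Its coverage guarantee assumes exchangeable errors, which time-series forecast errors need not satisfy; the function name and toy numbers are hypothetical.

```python
import math

def split_conformal_interval(cal_actual, cal_pred, new_pred, alpha=0.1):
    """Split conformal interval from absolute residuals on a calibration set.

    If calibration and test errors are exchangeable, the interval covers the
    true value with probability at least 1 - alpha (marginally).
    """
    scores = sorted(abs(a - p) for a, p in zip(cal_actual, cal_pred))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected quantile rank
    q = math.inf if rank > n else scores[rank - 1]  # too few points: unbounded interval
    return new_pred - q, new_pred + q


cal_actual = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
cal_pred = [0.0] * 7  # toy forecasts, so the residuals are 1, 2, ..., 7
print(split_conformal_interval(cal_actual, cal_pred, new_pred=10.0, alpha=0.25))  # (4.0, 16.0)
```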
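Paper 13 studies returns and excess returns in the 15 minutes after a tweet is published. A minimal sketch of that event-window calculation follows, assuming a list of one-minute closing prices indexed by minute and defining excess return as the coin's cumulative return minus a benchmark's over the same window; the data layout, the benchmark definition, the function names, and the toy prices are assumptions, not the authors' specification.

```python
def post_event_returns(prices, event_idx, horizon=15):
    """Cumulative simple returns from the event bar to each of the next `horizon` bars."""
    if event_idx + horizon >= len(prices):
        raise ValueError("event window extends past the end of the price series")
    p0 = prices[event_idx]
    return [prices[event_idx + h] / p0 - 1.0 for h in range(1, horizon + 1)]

def excess_returns(asset, benchmark):
    """Asset return minus benchmark return at each horizon."""
    return [a - b for a, b in zip(asset, benchmark)]


coin = [100.0, 100.0, 101.0, 102.0, 100.0]  # toy one-minute closes; tweet at minute 1
market = [50.0, 50.0, 50.0, 50.5, 50.0]     # toy benchmark closes
r = post_event_returns(coin, event_idx=1, horizon=3)
m = post_event_returns(market, event_idx=1, horizon=3)
xr = excess_returns(r, m)
print([round(val, 4) for val in xr])  # [0.01, 0.01, 0.0]
```

Conditioning these windows on tweet characteristics (for example, sentiment or word choice) and averaging across events would give the kind of conditional comparison the abstract describes.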

This nep-big issue is ©2024 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.