nep-big 2025-06-09 papers

on Big Data

Issue of 2025–06–09
23 papers chosen by
Tom Coupé, University of Canterbury

Analyzing Income Inequalities across Italian regions: Instrumental Variable Panel Data, K-Means Clustering and Machine Learning Algorithms By Antonicelli, Margareth; Drago, Carlo; Costantiello, Alberto; Leogrande, Angelo
Do International Reserve Holdings Still Predict Economic Crises? Insights from Recent Machine Learning Techniques By Nikolaos Giannakis; Periklis Gogas; Theophilos Papadimitriou; Jamel Saadaoui; Emmanouil Sofianos
Deep Learning in Renewable Energy Forecasting: A Cross-Dataset Evaluation of Temporal and Spatial Models By Lutfu Sua; Haibo Wang; Jun Huang
Panel Machine Learning with Mixed-Frequency Data: Monitoring State-Level Fiscal Variables By Philippe Goulet Coulombe; Massimiliano Marcellino; Dalibor Stevanovic
Unemployment Dynamics Forecasting with Machine Learning Regression Models By Kyungsu Kim
Forecasting Disaggregated Producer Prices: A Fusion of Machine Learning and Econometric Techniques By Sona Benecka
Forecasting inflation with the hedged random forest By Elliot Beck; Michael Wolf
Panel Machine Learning with Mixed-Frequency Data: Monitoring State-Level Fiscal Variables By Philippe Goulet Coulombe; Massimiliano Marcellino; Dalibor Stevanovic
Assumption errors and forecast accuracy: A partial linear instrumental variable and double machine learning approach By Heinisch, Katja; Scaramella, Fabio; Schult, Christoph
Colombian economic activity nowcasting: addressing nonlinearities and high dimensionality through machine-learning By Rincón Briceño, Juan José
Forecasting economic downturns in South Africa using leading indicators and machine learning By Fourie, Jurgens; Steenkamp, Daan
Stock Market Telepathy: Graph Neural Networks Predicting the Secret Conversations between MINT and G7 Countries By Nurbanu Bursa
When majority rules, minority loses: bias amplification of gradient descent By Bachoc, François; Bolte, Jérôme; Boustany, Ryan; Loubes, Jean-Michel
Forecasting CPI inflation under economic policy and geopolitical uncertainties By Shovon Sengupta; Tanujit Chakraborty; Sunny Kumar Singh
The Impact of Russia-Ukraine conflict on Global Commodity Brent Crude Prices By Pal, Hemendra
Can Artificial Intelligence Trade the Stock Market? By Jędrzej Maskiewicz; Paweł Sakowski
Predicting the Price of Gold in the Financial Markets Using Hybrid Models By Mohammadhossein Rashidi; Mohammad Modarres
Suspicion and Communication By Lisa Bruttel; Friedericke Fromme; Vasilisa Werner
Machine-Learning-Enhanced Measuring of Multidimensional Energy Poverty: Insights from a Pilot Survey in Portugal and Denmark By Dejkam, Rahil; Madlener, Reinhard
Do SDGs Support Human Security? A Machine Learning Analysis with Policy Recommendations By Phoebe Koundouri; Kostas Dellis; Monika Mavragani; Angelos Plataniotis; Georgios Feretzakis
Self-selection into Health Professions By Alessandro Fedele; Mirco Tonin; Daniel Wiesen
Can AI Master Econometrics? Evidence from Econometrics AI Agent on Expert-Level Tasks By Qiang Chen; Tianyang Han; Jin Li; Ye Luo; Yuxiao Wu; Xiaowei Zhang; Tuo Zhou
Worker specialization and the consequences of occupational decline By Ek, Simon

Analyzing Income Inequalities across Italian regions: Instrumental Variable Panel Data, K-Means Clustering and Machine Learning Algorithms

By:	Antonicelli, Margareth; Drago, Carlo; Costantiello, Alberto; Leogrande, Angelo
Abstract:	This study examines income inequality across Italian regions by integrating instrumental variable panel data models, k-means clustering, and machine learning algorithms. Using econometric techniques, we address endogeneity and identify causal relationships influencing regional disparities. K-means clustering, optimized with the elbow method, classifies Italian regions based on income inequality patterns, while machine learning models, including random forest, support vector machines, and decision tree regression, predict inequality trends and key determinants. Informal employment, temporary employment, and overeducation also play a major role in influencing inequality. Clustering results confirm a persistent North-South economic divide, with Campania, Calabria, and Sicily the most disadvantaged regions. Among the machine learning models, Random Forest Regression predicts income disparities most accurately. The findings emphasize the need for education-focused and digitally based policies and for labor market reforms to enhance economic convergence. The study demonstrates the use of a combination of econometric and machine learning methods in the analysis of regional disparities and proposes a solid policy-making framework for curbing economic disparities in Italy.
Keywords:	Income Inequality, Regional Disparities, Machine Learning, Labor Market, Digital Divide.
JEL:	C23 C38 C45 O15 R11 R58
Date:	2025–05–05
URL:	https://d.repec.org/n?u=RePEc:pra:mprapa:124910
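
As a rough illustration of the elbow method mentioned in the abstract above, the sketch below runs a small one-dimensional k-means for several values of k and reports the within-cluster sum of squares (inertia). The `scores` values, the `kmeans_1d` helper, and the deterministic initialisation are assumptions made for this example; they are not the paper's data or code.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on 1-D data; returns (centroids, inertia)."""
    pts = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [pts[(2 * j + 1) * len(pts) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in pts:
            nearest = min(range(k), key=lambda j: (x - centroids[j]) ** 2)
            clusters[nearest].append(x)
        # Empty clusters keep their previous centroid.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    inertia = sum(min((x - c) ** 2 for c in centroids) for x in pts)
    return centroids, inertia


# Hypothetical inequality scores for nine regions (not data from the paper).
scores = [0.26, 0.27, 0.28, 0.31, 0.32, 0.33, 0.37, 0.38, 0.39]

# Inertia for k = 1..4; the "elbow" is where further clusters stop helping much.
inertias = [kmeans_1d(scores, k)[1] for k in range(1, 5)]
for k, inertia in zip(range(1, 5), inertias):
    print(k, round(inertia, 5))
```

With these made-up scores the inertia falls sharply up to k = 3 and only slightly afterwards, which is the pattern the elbow method looks for.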

Do International Reserve Holdings Still Predict Economic Crises? Insights from Recent Machine Learning Techniques

By:	Nikolaos Giannakis (Democritus University of Thrace); Periklis Gogas (Democritus University of Thrace); Theophilos Papadimitriou (Democritus University of Thrace); Jamel Saadaoui (University Paris 8); Emmanouil Sofianos (University of Strasbourg)
Abstract:	This study aims to predict currency, banking, and debt crises using a dataset of 184 crisis events and 2896 non-crisis cases from 79 countries (1970-2017). We tested eight machine learning methods: Logistic Regression, KNN, SVM, Random Forest, Balanced Random Forest, Balanced Bagging Classifier, Easy Ensemble Classifier, and Gradient Boosted Trees. The Balanced Random Forest had the best performance with a 72.91% balanced accuracy, predicting 149 out of 184 crises accurately. To address machine learning’s black-box issue, we used Variable Importance Measure (VIM) and Partial Dependence Plots (PDP). International reserve holdings, inflation rate, and current account balance were key predictors. Depleting international reserves at varying inflation levels signals impending crises, supporting the buffer effects of international reserves.
Keywords:	Currency crises, banking crises, debt crises, international reserve holdings, inflation, machine learning, forecasting
JEL:	F G
Date:	2025
URL:	https://d.repec.org/n?u=RePEc:inf:wpaper:2025.6
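
The balanced accuracy quoted in the abstract above can be unpacked with a little arithmetic. The sketch below uses the standard definition (the mean of sensitivity and specificity) and assumes that the 149-of-184 figure and the 72.91% figure come from the same evaluation; the implied specificity is therefore a back-of-the-envelope number, not one reported by the authors.

```python
def balanced_accuracy(sensitivity, specificity):
    """Standard definition: the average of the true-positive and true-negative rates."""
    return (sensitivity + specificity) / 2


# Figures taken from the abstract.
crises_total = 184
crises_caught = 149
reported_balanced_accuracy = 0.7291

# Share of crisis events that were flagged correctly.
sensitivity = crises_caught / crises_total

# Specificity implied by the definition above (an inference, not a reported value).
implied_specificity = 2 * reported_balanced_accuracy - sensitivity

print(round(sensitivity, 4), round(implied_specificity, 4))
```

Under these assumptions, roughly 81% of crises are caught while about 65% of non-crisis cases are classified correctly, which shows why balanced accuracy is more informative than raw accuracy on such an imbalanced sample.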

Deep Learning in Renewable Energy Forecasting: A Cross-Dataset Evaluation of Temporal and Spatial Models

By:	Lutfu Sua; Haibo Wang; Jun Huang
Abstract:	The unpredictability of renewable energy sources, coupled with the complexity of the methods used in this area, calls for robust methods such as deep learning (DL) models. Given the nonlinear relationships among variables in renewable energy datasets, DL models are preferred over traditional machine learning (ML) models because they can effectively capture and model complex interactions between variables. This research aims to identify the factors responsible for the accuracy of DL techniques, such as sampling, stationarity, linearity, and hyperparameter optimization for different algorithms. The proposed DL framework compares various methods and alternative training/test ratios. Seven methods, namely Long Short-Term Memory (LSTM), Stacked LSTM, Convolutional Neural Network (CNN), CNN-LSTM, Deep Neural Network (DNN), Multilayer Perceptron (MLP), and Encoder-Decoder (ED), were evaluated on two different datasets. The first dataset contains weather and power generation data. It encompasses two distinct datasets, hourly energy demand data and hourly weather data in Spain, while the second dataset includes power output generated by photovoltaic panels at 12 locations. This study deploys regularization approaches, including early stopping, neuron dropping, and L2 regularization, to reduce the overfitting problem associated with DL models. The LSTM and MLP models show superior performance. Their validation data exhibit exceptionally low root mean square error values.
Date:	2025–05
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2505.03109

Panel Machine Learning with Mixed-Frequency Data: Monitoring State-Level Fiscal Variables

By:	Philippe Goulet Coulombe (University of Quebec in Montreal); Massimiliano Marcellino (Bocconi University); Dalibor Stevanovic (University of Quebec in Montreal)
Abstract:	We study the nowcasting of U.S. state-level fiscal variables using machine learning (ML) models and mixed-frequency predictors within a panel framework. Neural networks with continuous and categorical embeddings consistently outperform both linear and nonlinear alternatives, especially when combined with pooled panel structures. These architectures flexibly capture differences across states while benefiting from shared patterns in the panel structure. Forecast gains are especially large for volatile variables like expenditures and deficits. Pooling enhances forecast stability, and ML models are better suited to handle cross-sectional nonlinearities. Results show that predictive improvements are broad-based and that even a few high-frequency state indicators contribute substantially to forecast accuracy. Our findings highlight the complementarity between flexible modeling and cross-sectional pooling, making panel neural networks a powerful tool for timely and accurate fiscal monitoring in heterogeneous settings.
Keywords:	Machine learning, Nowcasting, Panel, Mixed-frequency, Fiscal indicators
JEL:	C53 C55 E37 H72
Date:	2025–05
URL:	https://d.repec.org/n?u=RePEc:bbh:wpaper:25-04

Unemployment Dynamics Forecasting with Machine Learning Regression Models

By:	Kyungsu Kim
Abstract:	In this paper, I explored how a range of regression and machine learning techniques can be applied to monthly U.S. unemployment data to produce timely forecasts. I compared seven models: Linear Regression, SGDRegressor, Random Forest, XGBoost, CatBoost, Support Vector Regression, and an LSTM network, training each on a historical span of data and then evaluating on a later hold-out period. Input features include macro indicators (GDP growth, CPI), labor market measures (job openings, initial claims), financial variables (interest rates, equity indices), and consumer sentiment. I tuned model hyperparameters via cross-validation and assessed performance with standard error metrics and the ability to predict the correct unemployment direction. Across the board, tree-based ensembles (and CatBoost in particular) deliver noticeably better forecasts than simple linear approaches, while the LSTM captures underlying temporal patterns more effectively than other nonlinear methods. SVR and SGDRegressor yield modest gains over standard regression but don't match the consistency of the ensemble and deep-learning models. Interpretability tools, namely feature importance rankings and SHAP values, point to job openings and consumer sentiment as the most influential predictors across all methods. By directly comparing linear, ensemble, and deep-learning approaches on the same dataset, this study shows how modern machine-learning techniques can enhance real-time unemployment forecasting, offering economists and policymakers richer insights into labor market trends. In the comparative evaluation of the models, I employed a dataset comprising thirty distinct features over the period from January 2020 through December 2024.
Date:	2025–05
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2505.01933
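
The abstract above evaluates models partly on whether they predict the correct direction of unemployment. It does not spell out the metric, so the sketch below shows one common definition as an assumption: the share of periods in which the forecast change, measured from the last observed value, has the same sign as the realised change. The `directional_accuracy` helper and the sample numbers are hypothetical.

```python
def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def directional_accuracy(actual, predicted):
    """Share of periods where the forecast and realised changes share a sign."""
    hits = 0
    for t in range(1, len(actual)):
        realised_change = actual[t] - actual[t - 1]
        forecast_change = predicted[t] - actual[t - 1]
        if sign(realised_change) == sign(forecast_change):
            hits += 1
    return hits / (len(actual) - 1)


# Hypothetical monthly unemployment rates and forecasts (not from the paper).
actual = [4.0, 4.1, 3.9, 3.9, 4.2]
predicted = [4.0, 4.2, 4.0, 3.8, 4.1]

print(directional_accuracy(actual, predicted))
```

Treating an unchanged rate as its own direction, as done here, is a design choice; other definitions count only strict rises and falls.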

Forecasting Disaggregated Producer Prices: A Fusion of Machine Learning and Econometric Techniques

By:	Sona Benecka
Abstract:	This paper proposes a novel framework for forecasting disaggregated producer prices using both machine learning techniques and traditional econometric models. Due to the complexity and diversity of pricing dynamics within the euro area, no single model consistently outperforms others across all sectors. This highlights the necessity for a tailored approach that leverages the strengths of various forecasting methods to effectively capture the unique characteristics of each sector. Our forecasting exercise has highlighted diverse pricing strategies linked to commodity prices, autoregressive behavior, or a mixture of both, with pipeline pressures being especially pertinent to final goods. Employing a mixture of a wide range of models has proven to be a successful strategy in managing the varied pricing behavior at the sectoral level. Notably, tree-based methods, like Random Forests or XGBoost, have shown significant efficacy in forecasting short-term PPI inflation across a number of sectors, especially when accounting for pipeline pressures. Moreover, newly proposed Hybrid ARMAX models proved to be a suitable alternative for sectors tightly linked to commodity prices.
Keywords:	Disaggregated producer prices, forecasting, inflation, machine learning
JEL:	C22 C52 C53 E17 E31 E37
Date:	2025–03
URL:	https://d.repec.org/n?u=RePEc:cnb:wpaper:2025/2

Forecasting inflation with the hedged random forest

By:	Elliot Beck; Michael Wolf
Abstract:	Accurately forecasting inflation is critical for economic policy, financial markets, and broader societal stability. In recent years, machine learning methods have shown great potential for improving the accuracy of inflation forecasts; specifically, the random forest stands out as a particularly effective approach that consistently outperforms traditional benchmark models in empirical studies. Building on this foundation, this paper adapts the hedged random forest (HRF) framework of Beck et al. (2024) for the task of forecasting inflation. Unlike the standard random forest, the HRF employs non-equal (and even negative) weights of the individual trees, which are designed to improve forecasting accuracy. We develop estimators of the HRF's two inputs, the mean and the covariance matrix of the errors corresponding to the individual trees, that are customized for the task at hand. An extensive empirical analysis demonstrates that the proposed approach consistently outperforms the standard random forest.
Keywords:	Exponentially weighted moving average, Linear shrinkage, Machine learning
JEL:	C21 C53 C31 E47
Date:	2025
URL:	https://d.repec.org/n?u=RePEc:snb:snbwpa:2025-07
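
To see why unequal and even negative tree weights can help, as the abstract above describes, consider a simplified two-tree case. The sketch assumes the weights minimise the expected squared error of the weighted forecast, (w'mu)^2 + w'Sigma w, subject only to the weights summing to one; any further constraints or estimators used by the authors are omitted, and the `mu` and `sigma` values are hypothetical.

```python
def hedged_weights_two_trees(mu, sigma):
    """Weights minimising (w'mu)^2 + w'Sigma w subject to w1 + w2 = 1."""
    # Second-moment matrix of the tree errors: M = Sigma + mu mu'.
    a = sigma[0][0] + mu[0] * mu[0]
    b = sigma[0][1] + mu[0] * mu[1]
    c = sigma[1][1] + mu[1] * mu[1]
    # Closed form for two assets/trees: w is proportional to M^{-1} 1.
    w1 = (c - b) / (a + c - 2 * b)
    return [w1, 1 - w1]


def combined_mse(w, mu, sigma):
    """Squared bias plus variance of the weighted forecast error."""
    bias = w[0] * mu[0] + w[1] * mu[1]
    variance = (w[0] ** 2 * sigma[0][0]
                + 2 * w[0] * w[1] * sigma[0][1]
                + w[1] ** 2 * sigma[1][1])
    return bias ** 2 + variance


# Hypothetical inputs: unbiased trees whose errors are strongly correlated (0.9).
mu = [0.0, 0.0]
sigma = [[1.0, 1.8], [1.8, 4.0]]

weights = hedged_weights_two_trees(mu, sigma)
print(weights, combined_mse(weights, mu, sigma), combined_mse([0.5, 0.5], mu, sigma))
```

In this toy setting the noisier tree receives a negative weight, and the weighted combination has a lower expected squared error than the equal-weight average used by a standard random forest.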

Panel Machine Learning with Mixed-Frequency Data: Monitoring State-Level Fiscal Variables

By:	Philippe Goulet Coulombe; Massimiliano Marcellino; Dalibor Stevanovic
Abstract:	We study the nowcasting of U.S. state-level fiscal variables using machine learning (ML) models and mixed-frequency predictors within a panel framework. Neural networks with continuous and categorical embeddings consistently outperform both linear and nonlinear alternatives, especially when combined with pooled panel structures. These architectures flexibly capture differences across states while benefiting from shared patterns in the panel structure. Forecast gains are especially large for volatile variables like expenditures and deficits. Pooling enhances forecast stability, and ML models are better suited to handle cross-sectional nonlinearities. Results show that predictive improvements are broad-based and that even a few high-frequency state indicators contribute substantially to forecast accuracy. Our findings highlight the complementarity between flexible modeling and cross-sectional pooling, making panel neural networks a powerful tool for timely and accurate fiscal monitoring in heterogeneous settings.
Keywords:	Machine learning, Nowcasting, Panel, Mixed-frequency, Fiscal indicators, Short-term forecasting
JEL:	C53 C55 E37 H72
Date:	2025–05–27
URL:	https://d.repec.org/n?u=RePEc:cir:cirwor:2025s-15

Assumption errors and forecast accuracy: A partial linear instrumental variable and double machine learning approach

By:	Heinisch, Katja; Scaramella, Fabio; Schult, Christoph
Abstract:	Accurate macroeconomic forecasts are essential for effective policy decisions, yet their precision depends on the accuracy of the underlying assumptions. This paper examines the extent to which assumption errors affect forecast accuracy, introducing the average squared assumption error (ASAE) as a valid instrument to address endogeneity. Using double/debiased machine learning (DML) techniques and partial linear instrumental variable (PLIV) models, we analyze GDP growth forecasts for Germany, conditioning on key exogenous variables such as oil price, exchange rate, and world trade. We find that traditional ordinary least squares (OLS) techniques systematically underestimate the influence of assumption errors, particularly with respect to world trade, while DML effectively mitigates endogeneity, reduces multicollinearity, and captures nonlinearities in the data. However, the effect of oil price assumption errors on GDP forecast errors remains ambiguous. These results underscore the importance of advanced econometric tools to improve the evaluation of macroeconomic forecasts.
Keywords:	accuracy, external assumptions, forecasts, forecast errors, machine learning
JEL:	C14 C53 E02 E37
Date:	2025
URL:	https://d.repec.org/n?u=RePEc:zbw:iwhdps:318189

Colombian economic activity nowcasting: addressing nonlinearities and high dimensionality through machine-learning

By:	Rincón Briceño, Juan José (Universidad de los Andes)
Abstract:	Economic decisions are made under high uncertainty about current and recent economic activity because the available information is limited and imperfect. This raises the question: how can the accuracy of Colombian economic activity nowcasting be improved relative to traditional forecasting methods? This paper demonstrates that (a) a risk-averse customized loss function that accounts for the agent's disutility and penalizes directional discrepancies provides a useful alternative for assessing model performance, ensuring more accurate nowcasts and maximizing both precision and economic relevance; and (b) during periods of abrupt shocks and high volatility, such as the COVID-19 period (2020–2021) and the subsequent post-COVID-19 years (2022–2023), machine learning models outperform traditional nowcasting models.
Keywords:	Colombian economic activity; nowcast; forecast; Random forests; LSTM.
JEL:	C45 C52 C53 E32 E37
Date:	2025–06–06
URL:	https://d.repec.org/n?u=RePEc:col:000089:021388

Forecasting economic downturns in South Africa using leading indicators and machine learning

By:	Fourie, Jurgens; Steenkamp, Daan
Abstract:	We identify South African business cycles using the Bry-Boschan algorithm and show that the identified turning points are very similar to those from other approaches. We demonstrate that South Africa has a very volatile business cycle that makes it particularly difficult to predict turning points in the economic cycle. South Africa’s business cycle is characterised by relatively long downswings and short upswing phases with low amplitude. We find that the South African Reserve Bank (SARB)’s Leading Indicator does not substantively improve predictions of the business cycle relative to GDP itself. We assess the performance of a range of potential leading indicators in identifying economic downturns and consider whether alternative indicators and estimation approaches can produce better predictions than those of the SARB. We demonstrate that using a larger information set produces substantially better business cycle predictions, especially when using machine learning techniques. Our findings have implications for the creation of composite leading indicators, with our results suggesting that many of the macroeconomic variables considered by analysts as leading indicators do not provide good signals of GDP growth or developments in the South African business cycle.
Keywords:	business cycle, forecast, leading indicator, economic downturns
JEL:	E32 E37
Date:	2025–05–07
URL:	https://d.repec.org/n?u=RePEc:pra:mprapa:124709

Stock Market Telepathy: Graph Neural Networks Predicting the Secret Conversations between MINT and G7 Countries

By:	Nurbanu Bursa
Abstract:	Emerging economies, particularly the MINT countries (Mexico, Indonesia, Nigeria, and Türkiye), are gaining influence in global stock markets, although they remain susceptible to the economic conditions of developed countries like the G7 (Canada, France, Germany, Italy, Japan, the United Kingdom, and the United States). This interconnectedness and sensitivity of financial markets make understanding these relationships crucial for investors and policymakers to predict stock price movements accurately. To this end, we examined the main stock market indices of G7 and MINT countries from 2012 to 2024, using a recent graph neural network (GNN) algorithm called multivariate time series forecasting with graph neural network (MTGNN). This method allows for considering complex spatio-temporal connections in multivariate time series. In the implementations, MTGNN revealed that the US and Canada are the most influential G7 countries regarding stock indices in the forecasting process, and Indonesia and Türkiye are the most influential MINT countries. Additionally, our results showed that MTGNN outperformed traditional methods in forecasting the prices of stock market indices for MINT and G7 countries. Consequently, the study offers valuable insights into economic blocs' markets and presents a compelling empirical approach to analyzing global stock market dynamics using MTGNN.
Date:	2025–06
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2506.01945

When majority rules, minority loses: bias amplification of gradient descent

By:	Bachoc, François; Bolte, Jérôme; Boustany, Ryan; Loubes, Jean-Michel
Abstract:	Despite growing empirical evidence of bias amplification in machine learning, its theoretical foundations remain poorly understood. We develop a formal framework for majority-minority learning tasks, showing how standard training can favor majority groups and produce stereotypical predictors that neglect minority-specific features. Assuming population and variance imbalance, our analysis reveals three key findings: (i) the close proximity between “full-data” and stereotypical predictors, (ii) the dominance of a region where training the entire model tends to merely learn the majority traits, and (iii) a lower bound on the additional training required. Our results are illustrated through experiments in deep learning for tabular and image classification tasks.
Date:	2025–05–21
URL:	https://d.repec.org/n?u=RePEc:tse:wpaper:130552

Forecasting CPI inflation under economic policy and geopolitical uncertainties

By:	Shovon Sengupta (SUAD_SAFIR - SUAD - Sorbonne University Abu Dhabi, BITS Pilani - Birla Institute of Technology and Science, Fidelity Investments); Tanujit Chakraborty (SUAD_SAFIR - SUAD - Sorbonne University Abu Dhabi); Sunny Kumar Singh (BITS Pilani - Birla Institute of Technology and Science)
Abstract:	Forecasting consumer price index (CPI) inflation is of paramount importance for both academics and policymakers at central banks. This study introduces the filtered ensemble wavelet neural network (FEWNet) to forecast CPI inflation, tested in BRIC countries. FEWNet decomposes inflation data into high- and low-frequency components using wavelet transforms and incorporates additional economic factors, such as economic policy uncertainty and geopolitical risk, to enhance forecast accuracy. These wavelet-transformed series and filtered exogenous variables are input into downstream autoregressive neural networks, producing the final ensemble forecast. Theoretically, we demonstrate that FEWNet reduces empirical risk compared to fully connected autoregressive neural networks. Empirically, FEWNet outperforms other forecasting methods and effectively estimates prediction uncertainty due to its ability to capture non-linearities and long-range dependencies through its adaptable architecture. Consequently, FEWNet emerges as a valuable tool for central banks to manage inflation and enhance monetary policy decisions.
Keywords:	Inflation forecasting, Wavelets, Neural networks, Empirical risk minimization, Conformal prediction intervals
Date:	2024–09
URL:	https://d.repec.org/n?u=RePEc:hal:journl:hal-05056934
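
The abstract above describes splitting an inflation series into low- and high-frequency components with a wavelet transform. The sketch below shows a single level of the simplest such transform, an unnormalised Haar step, purely as an illustration: the abstract does not say which wavelet or how many levels FEWNet uses, and the `inflation` values are hypothetical.

```python
def haar_step(series):
    """One level of an unnormalised Haar transform on an even-length series."""
    # Pairwise averages carry the slow-moving (low-frequency) part.
    low = [(series[i] + series[i + 1]) / 2 for i in range(0, len(series), 2)]
    # Pairwise half-differences carry the fast-moving (high-frequency) part.
    high = [(series[i] - series[i + 1]) / 2 for i in range(0, len(series), 2)]
    return low, high


def haar_inverse(low, high):
    """Rebuild the original series from the two components."""
    out = []
    for average, detail in zip(low, high):
        out.extend([average + detail, average - detail])
    return out


# Hypothetical inflation readings in percent (not data from the paper).
inflation = [2.0, 3.0, 4.0, 6.0, 5.0, 5.0, 3.0, 1.0]

low, high = haar_step(inflation)
print(low, high)
print(haar_inverse(low, high) == inflation)
```

Because the two components reconstruct the series exactly, each can be modelled separately and the forecasts recombined, which is the general idea behind wavelet-based ensembles.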

The Impact of Russia-Ukraine conflict on Global Commodity Brent Crude Prices

By:	Pal, Hemendra
Abstract:	This study investigates the impact of the Russia-Ukraine conflict on Brent Crude commodity pricing using World Bank time series data. The conflict’s influence on global oil and gas markets, characterized by intricate supply and demand dynamics, is analyzed through advanced time series techniques and machine learning modeling. Univariate models such as Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) are employed to discern temporal patterns in Brent Crude prices. Additionally, Seasonal Autoregressive Integrated Moving Average (SARIMA) and Exponential Smoothing State Space (ETS) models are utilized to capture complex seasonality and trends in the data. Moving beyond traditional methods, multivariate models are leveraged to comprehensively grasp the multifaceted impact of the conflict. Principal Component Analysis (PCA) and Factor Analysis are applied to uncover latent variables influencing Brent Crude pricing in the context of global trade disruptions, inflation, and diplomatic negotiations. These extracted components are then integrated with ensemble machine learning algorithms, including Random Forest, Extra Tree Classifier, Gradient Boosting, K-Nearest Neighbors, and Decision Trees. The fusion of multivariate time series analysis and machine learning empowers a holistic understanding of the conflict’s intricate repercussions on commodity prices. The analysis reveals that not only direct factors related to geopolitical tensions but also indirect economic data are crucial in determining Brent Crude prices. Factors such as declining industrial demand for precious metals like silver, disruptions in vehicle production due to supply chain breakdowns, reduced demand for automotive auto-catalysts, weak copper demand from China, and unexpected changes in steel consumption have contributed to the observed fluctuations in Brent Crude prices.
Through a comprehensive exploration of time series data and advanced machine learning modeling, this research contributes to a clearer understanding of the complex connections between the crisis in Russia and Ukraine and the price of commodities globally. The findings offer valuable insights for policy-makers, industry stakeholders, and investors seeking to navigate the complex landscape of commodity markets during periods of geopolitical instability.
Keywords:	Brent Crude Prices, Univariate Models, Multivariate Models, Ensemble Machine Learning, PCA, SARIMA, ETS
JEL:	C15 C32 C38 C45 C51 C53 C55 O57
Date:	2023–08–15
URL:	https://d.repec.org/n?u=RePEc:pra:mprapa:124770

Can Artificial Intelligence Trade the Stock Market?

By:	Jędrzej Maskiewicz (Quantitative Finance Research Group, Department of Quantitative Finance, Faculty of Economic Sciences, University of Warsaw); Paweł Sakowski (Quantitative Finance Research Group, Department of Quantitative Finance, Faculty of Economic Sciences, University of Warsaw)
Abstract:	The paper explores the use of Deep Reinforcement Learning (DRL) in stock market trading, focusing on two algorithms, Double Deep Q-Network (DDQN) and Proximal Policy Optimization (PPO), and compares them with a Buy and Hold benchmark. It evaluates these algorithms on three currency pairs, the S&P 500 index, and Bitcoin, using daily data for 2019-2023. The results demonstrate DRL's effectiveness in trading and its ability to manage risk by strategically avoiding trades in unfavorable conditions, providing a substantial edge in risk-adjusted returns over classical approaches based on supervised learning.
Keywords:	Reinforcement Learning, Deep Learning, stock market, algorithmic trading, Double Deep Q-Network, Proximal Policy Optimization
JEL:	C4 C14 C45 C53 C58 G13
Date:	2025
URL:	https://d.repec.org/n?u=RePEc:war:wpaper:2025-14

Predicting the Price of Gold in the Financial Markets Using Hybrid Models

By:	Mohammadhossein Rashidi; Mohammad Modarres
Abstract:	Predicting prices with the smallest possible error has long been one of the most challenging and critical concerns for capital market participants and researchers, so models that deliver highly accurate results are of considerable research interest. In this project, we use time series prediction models such as ARIMA to estimate the price, together with variables and indicators from technical analysis that reflect trader behavior and bring psychological factors into the model. By feeding all of these variables into stepwise regression, we identify those that most influence the prediction of the target variable. Finally, we enter the selected variables as inputs to an artificial neural network. We call this whole prediction process the "ARIMA_Stepwise Regression_Neural Network" model and use it to predict the price of gold in international financial markets. This approach is expected to extend to predicting stocks, commodities, currency pairs, financial market indicators, and other items traded in local and international financial markets. We also compare the results of this method with those of time series methods. Based on the results, the hybrid model achieves the highest accuracy compared with the time series method, regression, and stepwise regression.
Date:	2025–05
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2505.01402

Suspicion and Communication

By:	Lisa Bruttel (Universität Potsdam, Berlin School of Economics, CEPA); Friedericke Fromme (Universität Potsdam); Vasilisa Werner (Universität Potsdam, Berlin School of Economics)
Abstract:	In this paper, we study how communication influences suspicion. The experiment uses a sender-receiver setup with a low probability of misaligned incentives for senders and receivers. We focus on the impact of open communication on the receivers’ suspicion as measured by the size of the deviation from the senders’ recommendation before and after the communication. Overall, communication substantially reduces suspicion, but some receivers become more suspicious during the communication. We disentangle these effects using machine learning methods to analyze the chat logs.
Keywords:	cooperation, communication, suspicion, lying, laboratory experiment
JEL:	C92 D82 D83
Date:	2025–04
URL:	https://d.repec.org/n?u=RePEc:pot:cepadp:86
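
The suspicion measure described in the abstract above can be illustrated with a small sketch. It assumes suspicion is the absolute gap between a receiver's choice and the sender's recommendation. The function names and all numbers are hypothetical, and the paper's exact measure may differ.

```python
def suspicion(choice, recommendation):
    """Assumed measure: absolute deviation of the receiver's choice
    from the sender's recommendation."""
    return abs(choice - recommendation)


def change_in_suspicion(recommendation, choice_before, choice_after):
    """Negative values mean the receiver moved closer to the
    recommendation after communicating (less suspicion)."""
    before = suspicion(choice_before, recommendation)
    after = suspicion(choice_after, recommendation)
    return after - before


# Hypothetical receivers: (recommendation, choice before chat, choice after chat)
receivers = {
    "A": (50, 30, 45),  # moves toward the recommendation
    "B": (50, 48, 40),  # moves away from the recommendation
}

changes = {
    name: change_in_suspicion(rec, before, after)
    for name, (rec, before, after) in receivers.items()
}

for name, delta in changes.items():
    direction = "less" if delta < 0 else "more"
    print(f"Receiver {name}: change {delta:+d} ({direction} suspicious)")
```

Averaging such changes across receivers would give the overall effect, while the sign of each individual change separates receivers who become less suspicious from those who become more so.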

Machine-Learning-Enhanced Measuring of Multidimensional Energy Poverty: Insights from a Pilot Survey in Portugal and Denmark

By:	Dejkam, Rahil (E.ON Energy Research Center, Future Energy Consumer Needs and Behavior (FCN)); Madlener, Reinhard (E.ON Energy Research Center, Future Energy Consumer Needs and Behavior (FCN))
Abstract:	Energy poverty, a multidimensional socio-economic challenge, significantly affects the welfare of many people across Europe. This paper aims to alleviate energy poverty by exploring sustainable energy practices and policy interventions, using pilot household survey data collected within an EU project in Portugal and Denmark. A novel multidimensional energy poverty index (MEPI) is developed to assess energy poverty through different dimensions, such as heating and cooling comfort, financial strain, access to energy-efficient appliances, and overall health and well-being. In a next step, machine learning techniques, including recursive feature elimination and random forest analysis, are employed for feature selection. These methods help to reduce the number of irrelevant and mutually correlated predictors. Subsequently, a logistic regression model is used to predict energy-poor households based on selected socio-economic and policy-related factors. The logistic regression results indicate that sustainable energy-saving behaviors and supportive government policies can effectively mitigate energy poverty. Furthermore, the Shapley additive explanations (SHAP) method is used to analyze the impact of the selected features. Finally, the main findings are further evaluated via scenario simulation analysis.
Keywords:	Multidimensional Energy Poverty Index (MEPI); Thermal Discomfort; Sustainable Energy-saving Practices; Logistic Regression; Recursive Feature Elimination-Cross Validation (RFE-CV)
JEL:	C60 C83
Date:	2024–10–01
URL:	https://d.repec.org/n?u=RePEc:ris:fcnwpa:2024_001

Do SDGs Support Human Security? A Machine Learning Analysis with Policy Recommendations

By:	Phoebe Koundouri; Kostas Dellis; Monika Mavragani; Angelos Plataniotis; Georgios Feretzakis
Abstract:	Human Security (HS) emphasizes safeguarding individuals from pervasive threats, ranging from poverty and health crises to environmental degradation and governance failures, by placing people's rights, needs, and dignity at the center of security and development discourse. The Sustainable Development Goals (SDGs), serving as a global compass for equitable and sustainable progress, inherently support the well-being and resilience of people worldwide. Yet the explicit linkages between these two frameworks are not always clear. This chapter introduces a machine learning (ML) approach to systematically map how HS-related policy documents and reports align with the SDGs. Using advanced language-model embeddings and similarity scoring, our methodology identifies the extent to which each policy text addresses defined HS Aspects and their Material Issues. This allows us to move beyond simple keyword spotting toward capturing nuanced thematic alignment. The resulting scores highlight overlooked connections or synergies, enabling policymakers to see where further integration can enhance outcomes. Our mapping exercise revealed that Economic and Food Security achieved the highest similarity scores, indicating robust policy alignment. Conversely, Technological Security received lower scores, highlighting a gap in addressing digital and innovation challenges within current frameworks and the necessity for integrated policy solutions. By identifying thematic synergies and gaps, we provide policymakers with concrete insights to delineate policies that simultaneously enhance SDG outcomes and strengthen HS dimensions. Our results underscore the deep interconnection between HS and the SDGs, advancing our understanding of their mutual supportiveness. This study not only fills a critical gap in research by offering a pragmatic tool for assessing document alignment with the SDGs but also proposes an inclusive framework for policymakers and scholars. This framework encourages the integration of human-centered approaches with sustainable development goals. In doing so, it highlights the essential role of cutting-edge methodologies in navigating the complexities of global security and sustainability.
Keywords:	Human Security (HS), Sustainable Development Goals (SDGs), Machine Learning, Economic Policy, Textual Analysis
Date:	2025–05–29
URL:	https://d.repec.org/n?u=RePEc:aue:wpaper:2538
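
As a rough illustration of the similarity scoring described in the abstract above, the sketch below ranks SDG descriptions by cosine similarity to a policy text. The three-dimensional vectors are made-up stand-ins for language-model embeddings, and the names `cosine_similarity`, `rank_by_similarity`, `sdg_embeddings`, and `policy_embedding` are hypothetical. The paper's actual embedding model and scoring rule are not specified here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumption): real embeddings would come from a language
# model and have hundreds of dimensions.
sdg_embeddings = {
    "SDG 1: No Poverty": [0.9, 0.1, 0.0],
    "SDG 2: Zero Hunger": [0.7, 0.6, 0.1],
    "SDG 9: Industry, Innovation and Infrastructure": [0.1, 0.2, 0.9],
}
policy_embedding = [0.8, 0.5, 0.1]  # hypothetical food-security policy text

ranking = rank_by_similarity(policy_embedding, sdg_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because cosine similarity compares direction rather than exact wording, two texts can score highly without sharing keywords, which is the sense in which embedding-based scoring goes beyond keyword spotting.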

Self-selection into Health Professions

By:	Alessandro Fedele (Free University of Bozen-Bolzano, Italy); Mirco Tonin (Free University of Bozen-Bolzano, Italy); Daniel Wiesen (University of Cologne, Germany)
Abstract:	The health sector requires skilled, altruistic, and motivated individuals to perform complex tasks for which ex-post incentives may prove ineffective. Understanding the determinants of self-selection into health professions is therefore critical. We investigate this issue relying on data from surveys and incentivized dictator games. We compare applicants to medical and healthcare schools in Italy and Austria with non-applicants from the same regions and age cohorts. Drawing on a wide range of individual characteristics, we employ machine learning techniques for variable selection. Our findings show that higher cognitive ability, greater altruism, and the personality trait of conscientiousness are positively associated with the likelihood of applying to medical or nursing school, while neuroticism is negatively associated. Additionally, individuals with a strong identification with societal goals and those with parents working as doctors are more likely to pursue medical education. These results provide evidence of capable, altruistic, and motivated individuals self-selecting into the health sector, a necessary condition for building a high-quality healthcare workforce.
Keywords:	Self-selection, Health professions, Altruism, Cognitive ability, Personality traits, Machine learning (Lasso, high-dimensional metrics).
JEL:	I1 J24 J4
Date:	2025–05
URL:	https://d.repec.org/n?u=RePEc:bzn:wpaper:bemps115

Can AI Master Econometrics? Evidence from Econometrics AI Agent on Expert-Level Tasks

By:	Qiang Chen; Tianyang Han; Jin Li; Ye Luo; Yuxiao Wu; Xiaowei Zhang; Tuo Zhou
Abstract:	Can AI effectively perform complex econometric analysis traditionally requiring human expertise? This paper evaluates an agentic AI's capability to master econometrics, focusing on empirical analysis performance. We develop an "Econometrics AI Agent" built on the open-source MetaGPT framework. This agent exhibits outstanding performance in: (1) planning econometric tasks strategically, (2) generating and executing code, (3) employing error-based reflection for improved robustness, and (4) allowing iterative refinement through multi-round conversations. We construct two datasets from academic coursework materials and published research papers to evaluate performance against real-world challenges. Comparative testing shows our domain-specialized agent significantly outperforms both benchmark large language models (LLMs) and general-purpose AI agents. This work establishes a testbed for exploring AI's impact on social science research and enables cost-effective integration of domain expertise, making advanced econometric methods accessible to users with minimal coding expertise. Furthermore, our agent enhances research reproducibility and offers promising pedagogical applications for econometrics teaching.
Date:	2025–06
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2506.00856

Worker specialization and the consequences of occupational decline

By:	Ek, Simon (IFAU - Institute for Evaluation of Labour Market and Education Policy)
Abstract:	Are workers with poor outside opportunities less responsive and more susceptible to negative demand shifts in routine occupations? To answer this, I create and estimate an occupation specialization index (OSI) using Swedish register data and machine learning tools. It measures the expected utility difference between a worker’s occupation and his best outside option. This determines the loss he is willing to tolerate to avoid switching. Low-OSI generalists disproportionately left routine work. Their future wage growth was comparable to similar workers initially in non-routine occupations. By contrast, routine specialists largely stayed put and experienced lower wage growth than generalists and non-routine specialists.
Keywords:	Multidimensional skills; Occupational structure changes
JEL:	J23 J24 J31 J62
Date:	2025–05–26
URL:	https://d.repec.org/n?u=RePEc:hhs:ifauwp:2025_007
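
The definition in the abstract above, the expected utility gap between a worker's current occupation and the best alternative, can be sketched as follows. The utility numbers and the names `occupation_specialization_index` and `expected_utility` are hypothetical. The paper estimates expected utilities from Swedish register data with machine learning tools, which this toy example does not attempt.

```python
def occupation_specialization_index(expected_utility, current):
    """Return (OSI, best outside option) for one worker.

    expected_utility maps occupation name -> expected utility for this
    worker; current is the worker's present occupation. The OSI is the
    utility in the current occupation minus the utility in the best
    alternative occupation.
    """
    if current not in expected_utility:
        raise KeyError(f"no utility given for current occupation {current!r}")
    outside = {occ: u for occ, u in expected_utility.items() if occ != current}
    if not outside:
        raise ValueError("need at least one outside occupation")
    best = max(outside, key=outside.get)
    return expected_utility[current] - outside[best], best


# Hypothetical workers in the same routine occupation.
specialist = {"machine operator": 10.0, "warehouse worker": 6.5, "retail clerk": 6.0}
generalist = {"machine operator": 10.0, "warehouse worker": 9.6, "retail clerk": 9.1}

osi_s, alt_s = occupation_specialization_index(specialist, "machine operator")
osi_g, alt_g = occupation_specialization_index(generalist, "machine operator")

print(f"specialist: OSI = {osi_s:.2f}, best outside option = {alt_s}")
print(f"generalist: OSI = {osi_g:.2f}, best outside option = {alt_g}")
```

In this toy setup the specialist has a much larger gap than the generalist, so the specialist would tolerate a larger loss in the current occupation before switching, which matches the interpretation given in the abstract.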

This nep-big issue is ©2025 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the Griffith Business School of Griffith University in Australia.