By: | Prashant Pilla; Raji Mekonen |
Abstract: | Because financial data are volatile, complex, and shaped by external factors, forecasting the stock market is challenging. Traditional models such as ARIMA and GARCH perform well on linear data but struggle with non-linear dependencies. Machine learning and deep learning models, particularly Long Short-Term Memory (LSTM) networks, address these challenges by capturing intricate patterns and long-term dependencies. This report compares ARIMA and LSTM models in predicting the S&P 500 index, a major financial benchmark. Using historical price data and technical indicators, we evaluated these models with Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). The ARIMA model showed reasonable performance with an MAE of 462.1, an RMSE of 614, and 89.8 percent accuracy, effectively capturing short-term trends but limited by its linear assumptions. The LSTM model, leveraging its sequential processing capabilities, outperformed ARIMA with an MAE of 369.32, an RMSE of 412.84, and 92.46 percent accuracy, capturing both short- and long-term dependencies. Notably, the LSTM model without additional features performed best, achieving an MAE of 175.9, an RMSE of 207.34, and 96.41 percent accuracy, showcasing its ability to handle market data efficiently. Accurately predicting stock movements is crucial for investment strategies, risk assessment, and market stability. Our findings confirm the advantage of deep learning models over traditional ones in handling volatile financial data. The results highlight the effectiveness of LSTM and suggest avenues for further improvement. This study offers a comparative analysis of ARIMA and LSTM for financial forecasting, outlining the strengths and limitations of each.
Date: | 2025–01 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2501.17366 |
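A minimal sketch of the ARIMA side of the comparison above, assuming a CSV of daily S&P 500 closes (the file and column names are hypothetical, and the order (5, 1, 0) is illustrative rather than the paper's specification). The LSTM side would apply the same MAE/RMSE evaluation to predictions from sliding windows of the series.

```python
# Fit an ARIMA baseline on a training split of daily closes and score a holdout.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_absolute_error, mean_squared_error

prices = pd.read_csv("sp500.csv", parse_dates=["Date"], index_col="Date")["Close"]
split = int(len(prices) * 0.8)
train, test = prices.iloc[:split], prices.iloc[split:]

model = ARIMA(train, order=(5, 1, 0)).fit()     # illustrative order, not the paper's
forecast = model.forecast(steps=len(test))

mae = mean_absolute_error(test, forecast)
rmse = np.sqrt(mean_squared_error(test, forecast))
print(f"ARIMA  MAE={mae:.2f}  RMSE={rmse:.2f}")
```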
By: | Ying Chen; Paul Griffin; Paolo Recchia; Zhou Lei; Hongrui Chang |
Abstract: | Recovery rate prediction plays a pivotal role in bond investment strategies, enhancing risk assessment, optimizing portfolio allocation, improving pricing accuracy, and supporting effective credit risk management. However, forecasting recovery rates faces challenges such as high-dimensional features, small sample sizes, and overfitting. We propose a hybrid Quantum Machine Learning model incorporating Parameterized Quantum Circuits (PQC) within a neural network framework. PQCs inherently preserve unitarity, avoiding computationally costly orthogonality constraints, while amplitude encoding enables exponential data compression, reducing qubit requirements logarithmically. Applied to a global dataset of 1,725 observations (1996-2023), our method achieved superior accuracy (RMSE 0.228) compared to classical neural networks (0.246) and quantum models with angle encoding (0.242), with efficient computation times. This work highlights the potential of hybrid quantum-classical architectures in advancing recovery rate forecasting.
Date: | 2025–01 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2501.15828 |
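The compression claim in the abstract, that amplitude encoding needs only logarithmically many qubits, can be illustrated classically: a feature vector of dimension d, padded to the next power of two and normalized to unit L2 norm, fits in the amplitudes of ceil(log2(d)) qubits. A sketch (not the authors' code):

```python
import numpy as np

def amplitude_encode(x: np.ndarray) -> tuple[np.ndarray, int]:
    """Pad x to the next power of two, normalize, and report qubits required."""
    n_qubits = int(np.ceil(np.log2(len(x))))
    padded = np.zeros(2 ** n_qubits)
    padded[: len(x)] = x
    state = padded / np.linalg.norm(padded)  # unit vector = valid quantum state
    return state, n_qubits

state, n_qubits = amplitude_encode(np.random.rand(10))  # 10 features -> 4 qubits
print(n_qubits, np.isclose(state @ state, 1.0))
```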
By: | Jinghai He; Cheng Hua; Chunyang Zhou; Zeyu Zheng |
Abstract: | We develop a portfolio allocation framework that leverages deep learning techniques to address challenges arising from high-dimensional, non-stationary, and low-signal-to-noise market information. Our approach includes a dynamic embedding method that reduces the non-stationary, high-dimensional state space into a lower-dimensional representation. We design a reinforcement learning (RL) framework that integrates generative autoencoders and online meta-learning to dynamically embed market information, enabling the RL agent to focus on the most impactful parts of the state space for portfolio allocation decisions. Empirical analysis based on the top 500 U.S. stocks demonstrates that our framework outperforms common portfolio benchmarks and the predict-then-optimize (PTO) approach using machine learning, particularly during periods of market stress. Traditional factor models do not fully explain this superior performance. The framework's ability to time volatility reduces its market exposure during turbulent times. Ablation studies confirm the robustness of this performance across various reinforcement learning algorithms. Additionally, the embedding and meta-learning techniques effectively manage the complexities of high-dimensional, noisy, and non-stationary financial data, enhancing both portfolio performance and risk management. |
Date: | 2025–01 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2501.17992 |
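A minimal sketch of the embedding idea described above: an autoencoder that compresses a high-dimensional market snapshot into a low-dimensional state for the RL agent. Dimensions are placeholders, and the paper's generative autoencoder and online meta-learning loop are substantially richer than this stand-in.

```python
import torch
import torch.nn as nn

class MarketAutoencoder(nn.Module):
    """Compress a high-dimensional market state into a low-dim embedding."""
    def __init__(self, n_features: int = 500, embed_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(), nn.Linear(128, embed_dim))
        self.decoder = nn.Sequential(
            nn.Linear(embed_dim, 128), nn.ReLU(), nn.Linear(128, n_features))

    def forward(self, x):
        z = self.encoder(x)                  # embedding fed to the RL policy
        return self.decoder(z), z

model = MarketAutoencoder()
x = torch.randn(32, 500)                     # batch of 32 market snapshots
recon, z = model(x)
loss = nn.functional.mse_loss(recon, x)      # reconstruction objective
```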
By: | John Michael Riveros-Gavilanes
Abstract: | This paper presents two synthetic estimations of the Gini coefficient at the municipality level for Colombia over the years 2000-2020. The methodology relies on several machine learning models to select the best model for imputing the data. This yields two Random Forest models: the first is characterized by Dominant Fixed Effects, while the second contains a set of Dominant Varying Factors. The synthetic Gini coefficients from both models are then inspected, and public links to access them are provided. The Dominant Fixed Effects model is rather "stiff" in contrast to the Varying Factors model. Researchers are therefore advised to use the synthetic Gini coefficient with Varying Factors, as it exhibits greater variability over time than the Dominant Fixed Effects model.
Keywords: | Gini; Machine learning; Random forest; estimation; synthetic; economics |
JEL: | C80 H7 O10 P19 |
Date: | 2025–02–01 |
URL: | https://d.repec.org/n?u=RePEc:pra:mprapa:123561 |
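A sketch of random-forest imputation in the spirit of the abstract above: train on municipality-years with an observed Gini, then predict ("synthesize") it where it is missing. The file and covariate names are hypothetical, and the paper's two variants differ in which covariates dominate.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("municipalities.csv")                   # hypothetical panel
features = ["year", "population", "gdp_pc", "rurality"]  # assumed covariates
known = df[df["gini"].notna()]
missing = df[df["gini"].isna()]

rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(known[features], known["gini"])
df.loc[df["gini"].isna(), "gini"] = rf.predict(missing[features])  # synthetic values
```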
By: | Jose Mauricio Gomez Julian |
Abstract: | This research studies the relation between money and prices and its practical implications, analyzing quarterly data from the United States (1959-2022), Canada (1961-2022), the United Kingdom (1986-2022), and Brazil (1996-2022). The historical, logical, and econometric consistency of the logical core of the two main theories of money is analyzed using objective Bayesian and frequentist machine learning models, Bayesian regularized artificial neural networks, and ensemble learning. It is concluded that money is not neutral at any time horizon and that, although money is ultimately subordinated to prices, there is a reciprocal influence over time between money and prices, which together constitute a complex system. Non-neutrality is transmitted through aggregate demand and is based on the exchange value of money as a monetary unit.
Date: | 2025–01 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2501.14623 |
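A toy ensemble in the spirit of the methods named above (not the authors' specification): predict quarterly price growth from lagged money growth by stacking a random forest with a Bayesian-regularized linear learner. Column and file names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import StackingRegressor, RandomForestRegressor
from sklearn.linear_model import BayesianRidge

df = pd.read_csv("us_quarterly.csv")             # hypothetical: columns m2, cpi
df["dm"] = np.log(df["m2"]).diff()               # money growth
df["dp"] = np.log(df["cpi"]).diff()              # price growth
for lag in (1, 2, 3, 4):
    df[f"dm_l{lag}"] = df["dm"].shift(lag)       # lagged money growth
df = df.dropna()

X, y = df[[f"dm_l{lag}" for lag in (1, 2, 3, 4)]], df["dp"]
ens = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(random_state=0)),
                ("br", BayesianRidge())],        # Bayesian-regularized linear piece
    final_estimator=BayesianRidge())
ens.fit(X, y)
```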
By: | Jacob Pratt (University of Tennessee Chattanooga, USA); Serkan Varol (University of Tennessee Chattanooga, USA); Serkan Catma (University of Tennessee Chattanooga, USA) |
Abstract: | The COVID-19 pandemic has necessitated a multidisciplinary approach to assessing public health interventions. Data science has been widely utilized to promote interdisciplinary collaboration, especially during the post-COVID era. This study uses a comprehensive dataset, including mask usage and epidemiological metrics from U.S. counties, to explore the correlation between public compliance with mask-wearing guidelines and COVID-19 mortality rates. After employing machine learning approaches such as linear regression, decision tree regression, and random forest regression, our analysis identified the random forest model as the most accurate predictor of mortality rates, yielding the lowest error metrics. The models' performances were rigorously evaluated through error metric comparisons, highlighting the random forest model's robustness in handling complex interactions between variables. These findings provide actionable insights for public health strategists and policy makers, suggesting that enhanced mask compliance could significantly mitigate mortality rates during the ongoing pandemic and future health crises.
Keywords: | machine learning applications, predictive modeling for public health, COVID-19 analysis, pandemic, model comparison |
Date: | 2024–08 |
URL: | https://d.repec.org/n?u=RePEc:smo:raiswp:0455 |
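A sketch of the three-model comparison the abstract describes, with error-metric evaluation on a holdout; the feature and file names are assumptions, not the study's actual variables.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

df = pd.read_csv("county_data.csv")                       # hypothetical merged dataset
X = df[["always_mask", "cases_per_100k", "median_age"]]   # assumed features
y = df["deaths_per_100k"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [("linear", LinearRegression()),
                    ("tree", DecisionTreeRegressor(random_state=0)),
                    ("forest", RandomForestRegressor(random_state=0))]:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    rmse = np.sqrt(mean_squared_error(y_te, pred))
    print(f"{name:7s} MAE={mean_absolute_error(y_te, pred):.3f} RMSE={rmse:.3f}")
```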
By: | Visentin, Andrea; Volante, Louis |
Abstract: | This study investigates differences in employment outcomes between students graduating from private and public universities in Spain. The methodology involves propensity score matching, utilising novel machine learning approaches: machine learning algorithms can be used to calculate propensity scores and potentially offer advantages over conventional methods. Contrary to previous research carried out in Spain, this analysis found a wage premium in the short and medium term for students who attended a private university, although these differences were relatively small. The discussion outlines the implications for intergenerational inequality, policy development, and future research that utilises machine learning algorithms.
JEL: | I24 I25 J62 O15 |
Date: | 2023–11–13 |
URL: | https://d.repec.org/n?u=RePEc:unm:unumer:2023039 |
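A minimal sketch of ML-based propensity score matching as described above: estimate P(private university | covariates) with gradient boosting, then match each treated graduate to the control with the nearest score. Variable names are hypothetical, and the paper's estimator may differ.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neighbors import NearestNeighbors

df = pd.read_csv("graduates.csv")                # hypothetical survey data
X = df[["parental_income", "entry_grade", "field_code"]]  # assumed covariates
clf = GradientBoostingClassifier().fit(X, df["private"])
df["pscore"] = clf.predict_proba(X)[:, 1]        # ML-estimated propensity score

treated = df[df["private"] == 1]
control = df[df["private"] == 0]
nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
_, idx = nn.kneighbors(treated[["pscore"]])
matched = control.iloc[idx.ravel()]
att = (treated["wage"].values - matched["wage"].values).mean()  # matched wage gap
```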
By: | Hasan Fallahgoul |
Abstract: | This paper develops a scale-insensitive framework for neural network significance testing, substantially generalizing existing approaches through three key innovations. First, we replace metric entropy calculations with Rademacher complexity bounds, enabling the analysis of neural networks without requiring bounded weights or specific architectural constraints. Second, we weaken the regularity conditions on the target function to require only Sobolev space membership $H^s([-1, 1]^d)$ with $s > d/2$, significantly relaxing previous smoothness assumptions while maintaining optimal approximation rates. Third, we introduce a modified sieve space construction based on moment bounds rather than weight constraints, providing a more natural theoretical framework for modern deep learning practices. Our approach achieves these generalizations while preserving optimal convergence rates and establishing valid asymptotic distributions for test statistics. The technical foundation combines localization theory, sharp concentration inequalities, and scale-insensitive complexity measures to handle unbounded weights and general Lipschitz activation functions. This framework better aligns theoretical guarantees with contemporary deep learning practice while maintaining mathematical rigor. |
Date: | 2025–01 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2501.15753 |
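For context on the complexity measure the abstract substitutes for metric entropy, the textbook definition of empirical Rademacher complexity and the uniform deviation bound it yields (standard statements, not reproduced from the paper):

```latex
% Empirical Rademacher complexity of a class F on a sample S = (x_1, ..., x_n),
% with i.i.d. uniform signs sigma_i in {-1, +1}:
\hat{\mathcal{R}}_S(\mathcal{F}) \;=\;
  \mathbb{E}_{\sigma}\!\left[\,\sup_{f \in \mathcal{F}}
  \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i)\right]

% For a loss bounded in [0, 1], with probability at least 1 - delta over the sample:
\sup_{f \in \mathcal{F}} \Bigl(L(f) - \hat{L}_n(f)\Bigr)
  \;\le\; 2\,\hat{\mathcal{R}}_S(\mathcal{F})
  \;+\; 3\sqrt{\frac{\ln(2/\delta)}{2n}}
```

Because such bounds depend on the data through the sample rather than on weight magnitudes, they accommodate the unbounded weights and general Lipschitz activations the paper targets.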
By: | Hadi Elzayn; Simon Freyaldenhoven; Minchul Shin |
Abstract: | We develop a clustering-based algorithm to detect loan applicants who submit multiple applications (“cross-applicants”) in a loan-level dataset without personal identifiers. A key innovation of our approach is a novel evaluation method that does not require labeled training data, allowing us to optimize the tuning parameters of our machine learning algorithm. By applying this methodology to Home Mortgage Disclosure Act (HMDA) data, we create a unique dataset that consolidates mortgage applications to the individual applicant level across the United States. Our preferred specification identifies cross-applicants with 93 percent precision.
Keywords: | clustering; mortgage applications; HMDA |
JEL: | C38 C63 C81 G21 R21 |
Date: | 2025–02–04 |
URL: | https://d.repec.org/n?u=RePEc:fip:fedpwp:99499 |
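An illustrative sketch of the clustering idea (the authors' algorithm and evaluation are not reproduced here; DBSCAN is a stand-in, and the fields are assumptions): cluster records on attributes stable across one person's applications, and flag clusters of size greater than one as candidate cross-applicants. The eps radius plays the role of the tuning parameter their label-free evaluation would optimize.

```python
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

apps = pd.read_csv("hmda_loans.csv")                  # hypothetical extract
features = ["income", "applicant_age", "tract_id"]    # assumed identifying fields
X = StandardScaler().fit_transform(apps[features])

labels = DBSCAN(eps=0.1, min_samples=2).fit_predict(X)
apps["cluster"] = labels                              # -1 = unclustered single filer
cross = apps[apps["cluster"] >= 0]
print(f"{cross['cluster'].nunique()} candidate cross-applicant groups")
```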
By: | Moore, Lindsey; van de Laar, Mindel (RS: GSBE MORSE, Maastricht Graduate School of Governance, RS: GSBE MGSoG); Wong, Pui Hang (Maastricht Graduate School of Governance, RS: GSBE MGSoG, RS: UNU-MERIT Theme 3, RS: UNU-MERIT Theme 4); O'Donoghue, Cathal (RS: UNU-MERIT) |
Abstract: | This paper introduces a novel methodology aimed at addressing a critical knowledge gap related to the lack of a systematic understanding of agriculture projects across spatial and temporal dimensions. This gap has impeded efforts to enhance learning and accountability, thereby reducing the overall effectiveness of foreign assistance to the agriculture sector. To address this gap, deductive and inductive methodologies are applied to develop a standardized taxonomy for benchmarking United States Agency for International Development (USAID) agricultural projects. By applying this taxonomy to code all available final evaluations of USAID projects, a large qualitative dataset was generated. This dataset facilitates the analysis of the rich qualitative information available within public project evaluations and covers ninety countries over a span of six decades. The result of this research is a new dataset on the multi-layer composition of development projects, forming the foundation for a machine learning algorithm that expedites the process of synthesizing qualitative evidence and measuring the impact of development aid projects at a systems level. The overarching objective of this research is to contribute to the improvement of project and policy implementation in the field of agriculture development. |
JEL: | C40 F35 O13 |
Date: | 2023–04–04 |
URL: | https://d.repec.org/n?u=RePEc:unm:unumer:2023011 |
By: | Tavishi Choudhary (Greenwich High, Greenwich, Connecticut, US) |
Abstract: | Artificial Intelligence large language models have rapidly gained widespread adoption, sparking discussion of their societal and political impact, especially their political bias and its far-reaching consequences for society and citizens. This study explores political bias in large language models through a comparative analysis of four popular AI models: ChatGPT-4, Perplexity, Google Gemini, and Claude. It systematically evaluates their responses to politically charged prompts and questions drawn from the Pew Research Center's Political Typology Quiz, the Political Compass Quiz, and the ISideWith Quiz. The findings reveal that ChatGPT-4 and Claude exhibit a liberal bias, Perplexity is more conservative, and Google Gemini adopts more centrist stances, reflecting their respective training datasets. The presence of such biases underscores the critical need for transparency in AI development and for diverse training datasets, regular audits, and user education to mitigate them. The most significant question surrounding political bias in AI concerns its consequences, particularly its influence on public discourse, policy-making, and democratic processes. The results underscore the ethical implications of AI model development and the need for transparency to build trust and integrity in AI models. Future research directions are outlined to explore and address the complex issue of AI bias.
Keywords: | Large language models (LLM), Generative AI (GenAI), AI Governance and Policy, Ethical AI Systems |
Date: | 2024–08 |
URL: | https://d.repec.org/n?u=RePEc:smo:raiswp:0451 |
By: | Sukjin Han; Kyungho Lee |
Abstract: | Copyright policies play a pivotal role in protecting the intellectual property of creators and companies in creative industries. The advent of cost-reducing technologies, such as generative AI, in these industries calls for renewed attention to the role of these policies. This paper studies product positioning and competition in a market of creatively differentiated products and the competitive and welfare effects of copyright protection. A common feature of products with creative elements is that their key attributes (e.g., images and text) are unstructured and thus high-dimensional. We focus on a stylized design product, fonts, and use data from the world's largest online marketplace for fonts. We use neural network embeddings to quantify unstructured attributes and measure visual similarity, and we show that this measure closely aligns with actual human perception. Based on this measure, we find empirically that competition occurs locally in the space of visual characteristics. We then develop a structural model of supply and demand that integrates the embeddings. Through counterfactual analyses, we find that local copyright protection can enhance consumer welfare when products are relocated, and that the interplay between copyright and cost-reducing technologies is essential in determining an optimal policy for social welfare. We believe the embedding analysis and empirical models introduced in this paper are applicable to a range of industries where unstructured data capture essential features of products and markets.
Date: | 2025–01 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2501.16120 |
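A sketch of measuring visual similarity between font renderings via neural embeddings, as the abstract describes; a pretrained ResNet is a stand-in assumption for the paper's embedding model, and the inputs here are random tensors in place of actual glyph images.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()                  # drop classifier -> 512-d embedding
resnet.eval()

def embed(img: torch.Tensor) -> torch.Tensor:
    """img: (3, 224, 224) normalized glyph rendering -> embedding vector."""
    with torch.no_grad():
        return resnet(img.unsqueeze(0)).squeeze(0)

a, b = torch.randn(3, 224, 224), torch.randn(3, 224, 224)  # stand-in renders
similarity = F.cosine_similarity(embed(a), embed(b), dim=0)  # visual similarity score
```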