on Big Data |
By: | Mahalakshmi Manian (Research Scholar); Parthajit Kayal ((corresponding author), Assistant Professor Madras School of Economics, Chennai) |
Abstract: | This research investigates the phenomenon of economic or financial bubbles within the Indian stock market context, characterized by pronounced asset price inflation exceeding the intrinsic worth of the underlying assets. Leveraging data from the NIFTY 500 index spanning the period 2003 to 2021, the study utilizes the Phillips, Shi, and Yu (PSY) method (Phillips et al., 2015b), which employs a right-tailed unit root test, to discern the presence of financial bubbles. Subsequently, machine learning algorithms are employed to predict real-time occurrences of such bubbles. Analysis reveals the manifestation of financial bubbles within the Indian stock market, notably in the years 2007 and 2017. Moreover, empirical evidence underscores the superior predictive efficacy of Artificial Neural Networks, Random Forest, and Gradient Boosting algorithms vis-à-vis conventional statistical methodologies in forecasting financial bubble occurrences within the Indian stock market. Policymakers should use advanced machine learning techniques for real-time financial bubble detection to improve regulation and mitigate market risks. |
Keywords: | Financial Bubbles; Machine Learning; K-nearest Neighbour; Random Forest Classifier; Artificial Neural Network; Naïve Bayes |
JEL: | G1 G2 G3 C1 C5 |
Date: | 2024–10 |
URL: | https://d.repec.org/n?u=RePEc:mad:wpaper:2024-270 |
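The PSY detector referenced above computes a right-tailed ADF statistic on forward-expanding windows and takes the supremum (the SADF statistic); explosive behavior pushes the statistic above its critical value. The sketch below is a minimal illustration under simplifying assumptions (constant-only ADF regression, no lag augmentation, no PSY critical values; the function names and the synthetic series are ours, not the paper's):

```python
import numpy as np

def adf_tstat(y):
    """t-statistic on rho in: diff(y)_t = alpha + rho * y_{t-1} + e_t."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

def sadf(y, min_window=20):
    """Sup-ADF: maximum ADF t-stat over forward-expanding windows."""
    return max(adf_tstat(y[:n]) for n in range(min_window, len(y) + 1))

# a random walk (no bubble) versus a mildly explosive AR(1) (bubble-like)
rng = np.random.default_rng(0)
rw = np.cumsum(rng.normal(size=200))
explosive = np.empty(200)
explosive[0] = 1.0
for t in range(1, 200):
    explosive[t] = 1.03 * explosive[t - 1] + rng.normal()
```

In a real application one would compare `sadf` against the right-tail critical values simulated in Phillips, Shi, and Yu (2015), which are omitted here.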
By: | Sebastian Bell; Ali Kakhbod; Martin Lettau; Abdolreza Nazemi |
Abstract: | Machine learning methods in asset pricing are often criticized for their black box nature. We study this issue by predicting corporate bond returns using interpretable machine learning on a high-dimensional bond characteristics dataset. We achieve state-of-the-art performance while maintaining an interpretable model structure, overcoming the accuracy-interpretability trade-off. The estimation uncovers nonlinear relationships and economically meaningful interactions in bond pricing, notably related to term structure and macroeconomic uncertainty. Subsample analysis reveals stronger sensitivities to these effects for small firms and long-maturity bonds. Finally, we demonstrate how interpretable models enhance transparency in portfolio construction by providing ex ante insights into portfolio composition. |
JEL: | C45 C55 G11 G12 |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:nbr:nberwo:33320 |
By: | Robert Kelchen; Dubravka Ritter; Douglas A. Webber |
Abstract: | In this paper, we assemble the most comprehensive dataset to date on the characteristics of colleges and universities, including dates of operation, institutional setting, student body, staff, and finance data from 2002 to 2023. We provide an extensive description of what is known and unknown about closed colleges compared with institutions that did not close. Using these data, we first develop a series of predictive models of financial distress, utilizing factors like operational revenue/expense patterns, sources of revenue, metrics of liquidity and leverage, enrollment/staff patterns, and prior signs of significant financial strain. We benchmark these models against existing federal government screening mechanisms such as financial responsibility scores and heightened cash monitoring. We document a high degree of missing data among colleges that eventually close and show that this is a key impediment to identifying at-risk institutions. We then show that modern machine learning techniques, combined with richer data, are far more effective at predicting college closures than linear probability models, and considerably more effective than existing accountability metrics. Our preferred model, which combines an off-the-shelf machine learning algorithm with the richest set of explanatory variables, can significantly improve predictive accuracy even for institutions with complete data, but is particularly helpful for predicting instances of financial distress for institutions with spotty data. Finally, we conduct simulations using our estimates to contemplate likely increases in future closures, showing that enrollment challenges resulting from an impending demographic cliff are likely to significantly increase annual college closures under reasonable scenarios. |
Keywords: | higher education; college; university; enrollment; tuition; revenue; budget; closure; fiscal challenge; demographic cliff |
JEL: | I2 I22 I23 |
Date: | 2024–12–02 |
URL: | https://d.repec.org/n?u=RePEc:fip:fedpwp:99207 |
By: | Kaiji Chen; Mr. Yunhui Zhao |
Abstract: | We construct a daily Chinese Housing Market Sentiment Index by applying GPT-4o to Chinese news articles. Our method outperforms traditional models in several validation tests, including a test based on a suite of machine learning models. Applying this index to household-level data, we find that after monetary easing, an important group of homebuyers (who have a college degree and are aged between 30 and 50) in cities with more optimistic housing sentiment have lower responses in non-housing consumption, whereas for homebuyers in other age-education groups, such a pattern does not exist. This suggests that current monetary easing might be more effective in boosting non-housing consumption than in the past for China due to weaker crowding-out effects from pessimistic housing sentiment. The paper also highlights the need for complementary structural reforms to enhance monetary policy transmission in China, a lesson relevant for other similar countries. Methodologically, it offers a tool for monitoring housing sentiment and lays out some principles for applying generative AI models, adaptable to other studies globally. |
Keywords: | Chinese Housing Market Sentiment; Generative AI; Monetary Policy Transmission; Consumption; Crowding-Out |
Date: | 2024–12–23 |
URL: | https://d.repec.org/n?u=RePEc:imf:imfwpa:2024/264 |
By: | Wendy E. Dunn; Raakin Kabir; Ellen E. Meade; Nitish R. Sinha |
Abstract: | In an era increasingly shaped by artificial intelligence (AI), the public’s understanding of economic policy may be filtered through the lens of generative AI models (also called large language models or LLMs). Generative AI models offer the promise of quickly ingesting and interpreting large amounts of textual information. |
Date: | 2024–12–06 |
URL: | https://d.repec.org/n?u=RePEc:fip:fedgfn:2024-12-06-1 |
By: | Matthias R. Fengler (University of St. Gallen - SEPS: Economics and Political Sciences; Swiss Finance Institute); Minh Tri Phan (University of St. Gallen (HSG)) |
Abstract: | We investigate the topics in the Management’s Discussion and Analysis (MD&A) section of 10-K filings. Our approach extracts MD&A topics by clustering words around anchor words that broadly define potential themes. The resulting topics are intelligible, distinct and multi-faceted, shedding light on why classical topic models applied to 10-K filings might lack interpretability. We extract two loading series from the MD&As: topic prevalence and topic sentiment. We find that topic prevalence exhibits significant variation throughout the sample period, while sentiment displays marked heterogeneity across topics. Linking MD&A topics to stock returns, we document non-uniform market perceptions toward the topic sentiment. |
Keywords: | 10-K filings, MD&A, natural language processing, topic modeling
Date: | 2024–10 |
URL: | https://d.repec.org/n?u=RePEc:chf:rpseri:rp24106 |
By: | Hai-Anh Dang; Ibrahima Sarr; Carlos Santiago Guzman Gutierrez; Theresa Beltramo; Paolo Verme |
Abstract: | Household consumption or income surveys do not typically cover refugee populations. In the rare cases where refugees are included, inconsistencies between different data sources could interfere with comparable poverty estimates. We test the performance of a recently developed cross-survey imputation method to estimate poverty for a sample of refugees in Colombia, combining household income surveys collected by the Government of Colombia and administrative (ProGres) data collected by the United Nations High Commissioner for Refugees in 2019 and 2022. We find that certain variable transformation methods can help resolve these inconsistencies. Estimation results with our preferred variable standardization method are robust to different imputation methods, including the normal linear regression method, the empirical distribution of the errors method, and the probit and logit methods. Several common machine learning techniques generally perform worse than our proposed imputation methods. We also find that we can reasonably impute poverty rates using an older household income survey and a more recent ProGres dataset for most of the poverty lines. These results provide relevant inputs into designing better surveys and administrative datasets on refugees in various country settings. |
Keywords: | colombia, imputation, poverty, refugees |
JEL: | C15 F22 I32 O15 O20 |
Date: | 2024 |
URL: | https://d.repec.org/n?u=RePEc:hic:wpaper:422 |
By: | Weinig, Max; Fritsche, Ulrich |
Abstract: | In recent years, there has been increasing interest in the analysis of narratives in macroeconomic research. Our paper contributes to this research by proposing a way to identify and extract economic narratives from media reports. To this end, the paper applies state-of-the-art bag-of-words text analysis methods to a large news corpus covering five years of news coverage, combined with results from a survey study on recent inflation narratives (Andre et al., 2023) in the US. This approach enables us to measure the prevalence and spread of inflation narratives over time and to examine the role of these narratives in aggregate macroeconomic expectations. Using Granger causality tests and local projections, we provide empirical evidence on the dynamics between inflation narratives and inflation expectations. Moreover, the paper highlights the vast heterogeneity across short-term and mid-term inflation expectations as well as socioeconomic groups. |
Keywords: | narratives, expectations, inflation, media, textual data, machine learning |
JEL: | D84 E31 E32 E52 E71 |
Date: | 2024 |
URL: | https://d.repec.org/n?u=RePEc:zbw:uhhwps:307613 |
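The Granger-causality step described in the abstract above amounts to a restricted-versus-unrestricted regression F test: do lags of the narrative series improve a forecast of expectations beyond the expectations series' own lags? A minimal sketch on synthetic data (the function name `granger_f` and all settings are ours, not the paper's):

```python
import numpy as np

def granger_f(x, y, lags=2):
    """F statistic for H0: lags of x add nothing to an AR(lags) model of y."""
    T = len(y)
    Y = y[lags:]
    own = np.column_stack(
        [np.ones(T - lags)] + [y[lags - i:T - i] for i in range(1, lags + 1)])
    full = np.column_stack(
        [own] + [x[lags - i:T - i] for i in range(1, lags + 1)])

    def ssr(A):
        beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
        r = Y - A @ beta
        return r @ r

    ssr_r, ssr_f = ssr(own), ssr(full)
    df2 = (T - lags) - full.shape[1]
    return ((ssr_r - ssr_f) / lags) / (ssr_f / df2)

# synthetic check: x leads y by one period, but not vice versa
rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = np.concatenate([[0.0], 0.8 * x[:-1] + 0.3 * rng.normal(size=299)])
f_xy = granger_f(x, y)  # expected to be large
f_yx = granger_f(y, x)  # expected to be small
```

Production work would use a packaged implementation (e.g. statsmodels' Granger causality tests) with proper lag selection and p-values; this sketch only shows the mechanics.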
By: | Tobias Schimanski (University of Zurich); Chiara Colesanti Senni (University of Zurich - Department of Finance); Glen Gostlow (University of Zurich - Department Finance); Jingwei Ni (ETH Zurich); Tingyu Yu (University of Zurich - Department Finance); Markus Leippold (University of Zurich; Swiss Finance Institute) |
Abstract: | Nature is an amorphous concept. Yet, it is essential for the planet's well-being to understand how the economy interacts with it. To address the growing demand for information on corporate nature disclosure, we provide datasets and classifiers to detect nature communication by companies. We ground our approach in the guidelines of the Taskforce on Nature-related Financial Disclosures (TNFD). Particularly, we focus on the specific dimensions of water, forest, and biodiversity. For each dimension, we create an expert-annotated dataset with 2, 200 text samples and train classifier models. Furthermore, we show that nature communication is more prevalent in hotspot areas and directly affected industries like agriculture and utilities. Our approach is the first to respond to calls to assess corporate nature communication on a large scale. |
Keywords: | Nature-related risks, TNFD, Natural Language Processing, Disclosure |
Date: | 2024–01 |
URL: | https://d.repec.org/n?u=RePEc:chf:rpseri:rp2495 |
By: | Tobias Schimanski (University of Zurich); Jingwei Ni (ETH Zurich); Mathias Kraus (University of Erlangen-Nuremberg-Friedrich Alexander Universität Erlangen Nürnberg); Elliott Ash (ETH Zürich); Markus Leippold (University of Zurich; Swiss Finance Institute) |
Abstract: | Advances towards more faithful and traceable answers of Large Language Models (LLMs) are crucial for various research and practical endeavors. One avenue in reaching this goal is basing the answers on reliable sources. However, this Evidence-Based QA has proven to work insufficiently with LLMs in terms of citing the correct sources (source quality) and truthfully representing the information within sources (answer attributability). In this work, we systematically investigate how to robustly fine-tune LLMs for better source quality and answer attributability. Specifically, we introduce a data generation pipeline with automated data quality filters, which can synthesize diversified high-quality training and testing data at scale. We further introduce four test sets to benchmark the robustness of fine-tuned specialist models. Extensive evaluation shows that fine-tuning on synthetic data improves performance both in- and out-of-distribution. Furthermore, we show that data quality, which can be drastically improved by the proposed quality filters, matters more than quantity in improving Evidence-Based QA. |
Date: | 2024–03 |
URL: | https://d.repec.org/n?u=RePEc:chf:rpseri:rp2494 |
By: | Munipalle, Pravith |
Abstract: | Bot trading, or algorithmic trading, has transformed modern financial markets by using advanced technologies like artificial intelligence and machine learning to execute trades with unparalleled speed and efficiency. This paper examines the mechanisms and types of trading bots, their impact on market liquidity, efficiency, and stability, and the ethical and regulatory challenges they pose. Key findings highlight the dual nature of bot trading—enhancing market performance while introducing systemic risks, such as those observed during the 2010 Flash Crash. Emerging technologies like blockchain and predictive analytics, along with advancements in AI, present opportunities for innovation but also underscore the need for robust regulations and ethical design. To provide deeper insights, we conducted an experiment analyzing the performance of different trading bot strategies in simulated market conditions, revealing the potential and pitfalls of these systems under varying scenarios. |
Date: | 2024–12–22 |
URL: | https://d.repec.org/n?u=RePEc:osf:osfxxx:p98zv |
By: | Markus Leippold (University of Zurich; Swiss Finance Institute); Qian Wang (University of Zurich - Department Finance; Inovest Partners AG); Min Yang (Swiss Finance Institute - University of Zurich) |
Abstract: | This paper explores the effectiveness of technical patterns in predicting asset prices and market movements, emphasizing the role of news sentiment. We employ an image recognition method to detect technical patterns in price images and assess whether this approach provides more information than traditional rule-based methods. Our findings indicate that many model-based patterns yield significant returns in the US market, whereas bottom-type patterns are less effective in the Chinese market. The model demonstrates high accuracy in training samples and strong out-of-sample performance. Our empirical analysis concludes that technical patterns remain effective in recent stock markets when combined with news sentiment, offering a profitable portfolio strategy. This study highlights the potential of image recognition methods in market data analysis and underscores the importance of sentiment in technical analysis. |
Date: | 2024–08 |
URL: | https://d.repec.org/n?u=RePEc:chf:rpseri:rp2488 |
By: | Chiara Colesanti Senni (University of Zurich - Department of Finance); Tobias Schimanski (University of Zurich); Julia Bingler (University of Oxford); Jingwei Ni (ETH Zurich); Markus Leippold (University of Zurich; Swiss Finance Institute) |
Abstract: | Company transition plans toward a low-carbon economy are key for effective capital allocation and risk management. This paper proposes a set of 64 indicators to comprehensively assess transition plans and develops a Large Language Model-based tool to automate the assessment of company disclosures. We evaluate our tool with experts from 26 institutions, including financial regulators, investors, and non-governmental organizations. We apply the tool to the sustainability reports from carbon-intensive Climate Action 100+ companies. Our results show that companies tend to disclose more information related to target setting (talk), but less information related to the concrete implementation of strategies (walk). In addition, companies that disclose more information tend to have lower emissions. Our results highlight the need for increased scrutiny of companies' efforts and potential greenwashing risks. The complexity of transition activities presents a major challenge for comprehensive large-scale assessments. As shown in this paper, novel and flexible approaches using Large Language Models can serve as a remedy. |
Keywords: | Climate disclosure, Large Language Models, RAG system, transition plans, human evaluation, CA100+ |
Date: | 2024–05 |
URL: | https://d.repec.org/n?u=RePEc:chf:rpseri:rp2492 |
By: | Barbaglia, Luca (European Commission - JRC); Bellia, Mario (European Commission - JRC); Di Girolamo, Francesca (European Commission - JRC); Rho, Caterina (European Commission - JRC) |
Abstract: | Digital and crypto currencies are becoming an integral part of financial markets. Nevertheless, regulation of these markets is still at an early stage, and the literature evaluating the impact of policy interventions is scarce. We investigate the reaction of crypto markets in the aftermath of policy announcements using textual information from news and sentiment analysis. Our findings are threefold. First, peaks in news about crypto-assets coincide with new developments in EU legislation, in particular about Central Bank Digital Currencies. Second, we find that both cryptocurrency returns and general stock market returns are positively related to news sentiment about crypto markets. Third, our event study shows that the introduction of regulation on digital and crypto currencies is perceived as a negative shock by financial markets, especially for digital currencies. |
Keywords: | cryptocurrencies, digital finance, text mining |
JEL: | C55 E42 G41 |
Date: | 2024–11 |
URL: | https://d.repec.org/n?u=RePEc:jrs:wpaper:202407 |
By: | Daniel Hartley; Jonathan D. Rose; Becky Schneirov |
Abstract: | We characterize the dynamics of neighborhood racial composition by using the k-medians machine learning technique to group neighborhoods into five different patterns according to the evolution of the Black population share of census tracts from 1950 through 1990. The procedure classifies tracts into groups that: always have a high Black population share, always have a low Black population share, have a steep increase in the Black population share from 1950-1960, or 1960-1970, and those that have a gradual increase in the Black population share from 1950-1990. We calculate the growth in median rents and home values in each of the five groups and find that those with steep increases in the Black population share show the smallest increases in home values and rent, implying that Black households that bought homes in these neighborhoods in 1950 or 1960 were likely to have lost money or barely broken even by 1990. |
Keywords: | blockbusting; neighborhood dynamics; Housing prices; cluster analysis; wealth-gap |
JEL: | C38 N22 N92 G21 R23 |
Date: | 2024–09 |
URL: | https://d.repec.org/n?u=RePEc:fip:fedhwp:99309 |
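The k-medians grouping of tract trajectories can be sketched in a few lines: assign each trajectory to the nearest center under the L1 distance, then update each center to the coordinate-wise median of its cluster. This is a hedged toy version with a deterministic farthest-first initialization and stylized synthetic trajectories, not the authors' data or exact procedure:

```python
import numpy as np

def farthest_first(X, k):
    """Deterministic init: greedily pick k mutually distant rows (L1)."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.abs(X - c).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(d.argmax())])
    return np.array(centers)

def k_medians(X, k, iters=50):
    """Lloyd-style k-medians: assign by L1 distance, update each center
    as the coordinate-wise median of its cluster."""
    centers = farthest_first(X, k)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([np.median(X[labels == j], axis=0)
                        if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# stylized decade-by-decade share trajectories, 1950-1990 (values are ours)
always_high = np.tile([0.85, 0.86, 0.88, 0.90, 0.90], (10, 1))
always_low = np.tile([0.02, 0.02, 0.03, 0.03, 0.04], (10, 1))
steep_1960s = np.tile([0.05, 0.10, 0.80, 0.85, 0.90], (10, 1))
X = np.vstack([always_high, always_low, steep_1960s])
labels = k_medians(X, k=3)
```

The L1/median pairing (rather than L2/mean, as in k-means) makes the cluster centers robust to outlier tracts, which is a common reason to prefer k-medians for trajectory data.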
By: | Jianghao Chu (JPMorgan Chase & Co., Jersey City, NJ 07310); Tae-Hwy Lee (Department of Economics, University of California Riverside); Aman Ullah (Department of Economics, University of California, Riverside) |
Abstract: | Carter Hill's numerous contributions (books and articles) in econometrics stand out especially in pedagogy. An important aspect of his pedagogy is to integrate "theory and practice" of econometrics, as coined in the titles of his popular books. The new methodology we propose in this paper is consistent with these contributions. In particular, we bring the maximum score regression of Manski (1975, 1985) to high dimension in theory and show that the "Asymmetric AdaBoost" provides the algorithmic implementation of high-dimensional maximum score regression in practice. Recent advances in machine learning research have not only expanded the horizon of econometrics by providing new methods but have also provided algorithmic counterparts to many traditional econometric methods. For example, Adaptive Boosting (AdaBoost), introduced by Freund and Schapire (1996), has gained enormous success in binary classification and prediction. In this paper, we introduce the Asymmetric AdaBoost and relate it to maximum score regression from an algorithmic perspective. The Asymmetric AdaBoost solves high-dimensional binary classification/prediction problems with state-dependent loss functions. It produces a nonparametric classifier by minimizing the "asymmetric exponential risk", a convex surrogate of the non-convex 0-1 risk. The convex risk function gives a huge computational advantage over the non-convex risk functions of Manski (1975, 1985), especially when the data are high-dimensional. The resulting nonparametric classifier is more robust than parametric classifiers, whose performance depends on correct model specification. We show that the risk of the classifier produced by Asymmetric AdaBoost approaches the Bayes risk, the infimum of the risk achievable by any classifier. Monte Carlo experiments show that the Asymmetric AdaBoost performs better than the commonly used LASSO-regularized logistic regression when the parametric assumption is violated and the sample size is large. We apply the Asymmetric AdaBoost to predict business cycle turning points as in Ng (2014). |
Keywords: | AdaBoost, Asymmetric Loss, Maximum Score Estimation, Binary Choice Models, High Dimensional Predictors |
JEL: | C1 C5 |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:ucr:wpaper:202414 |
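The asymmetric exponential risk in the abstract above can be illustrated with a toy cost-sensitive boosting loop over decision stumps. The sketch below is a simplified variant in the spirit of the method (a Viola-Jones-style per-round asymmetry factor, with `cost_pos` and all names being our assumptions), not the authors' exact algorithm:

```python
import numpy as np

def fit_stump(X, y, w):
    """Best weighted decision stump: (error, feature, threshold, sign)."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] <= thr, sign, -sign)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best

def asymmetric_adaboost(X, y, rounds=20, cost_pos=2.0):
    """Cost-sensitive AdaBoost sketch: each round multiplies the sample
    weights by an asymmetry factor so that missed positives cost more
    than missed negatives (state-dependent loss)."""
    n = len(y)
    w = np.ones(n) / n
    asym = np.exp(y * np.log(cost_pos) / (2 * rounds))  # spread cost over rounds
    ensemble = []
    for _ in range(rounds):
        w = w * asym
        w = w / w.sum()
        err, j, thr, sign = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(X[:, j] <= thr, sign, -sign)
        w = w * np.exp(-alpha * y * pred)  # exponential-risk reweighting
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    score = sum(a * np.where(X[:, j] <= thr, s, -s)
                for a, j, thr, s in ensemble)
    return np.sign(score)

# toy data with an axis-aligned decision boundary
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] > 0.3, 1.0, -1.0)
ens = asymmetric_adaboost(X, y)
```

Because the stump search is exhaustive over unique feature values, this sketch is only practical for small toy problems; the point is the asymmetric reweighting, not efficiency.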
By: | Christoph Engel (Max Planck Institute for Research on Collective Goods) |
Keywords: | comparative law, contract fulfilment, change in circumstances, experiment, large language model, GPT, manipulating language of stimulus material |
JEL: | C45 C81 C88 C91 D91 K12 K40 P50 |
Date: | 2024–07 |
URL: | https://d.repec.org/n?u=RePEc:mpg:wpaper:2024_12 |
By: | Leogrande, Angelo; Drago, Carlo; Mallardi, Giulio; Costantiello, Alberto; Magaletti, Nicola |
Abstract: | This article examines the propensity to patent across Italian regions, using ISTAT-BES data from 2004 to 2019, to analyze regional gaps and the determinants of innovative performance. Results show that the North-South gap in innovative performance has persisted over time, confirming the relevance of research intensity, digital infrastructure, and cultural employment for patenting activity. These relations are analyzed with a panel data econometric model, which singles out crucial positive drivers, such as R&D investment, and strongly negative factors, such as the limited mobility of graduates. The study makes two main contributions: first, a fine-grained view of regional differentiation through which the sub-national innovation system can be observed; second, a set of actionable policy recommendations for more inclusive innovation, with particular emphasis on less-performing regions. By focusing on these dynamics, the study addresses how regional characteristics and policies shape innovation and technological competitiveness in Italy. It thereby contributes to the debate on regional innovation systems and their role in European economic development, given the differentiated economic, institutional, and technological conditions across Italian regions. |
Date: | 2024–12–27 |
URL: | https://d.repec.org/n?u=RePEc:osf:socarx:nftv3 |
By: | Chiara Colesanti Senni (University of Zurich - Department of Finance); Saeid Vaghefi (University of Zurich); Tobias Schimanski (University of Zurich); Tushar Manekar (Zurich University); Markus Leippold (University of Zurich; Swiss Finance Institute) |
Abstract: | Nature-related disclosures by companies are insufficient. As long as they remain voluntary, this situation is unlikely to improve, even under well-intentioned initiatives like the Task Force on Nature-related Financial Disclosures (TNFD). These conclusions are based on our investigation into the decision-usefulness of such disclosures through the development of ASKNATURE, an AI-powered tool that analyzes company reports to assess their environmental impact. Our conversational AI prototype can answer challenging questions in two settings: (1) recommendations and guidelines from organizations such as the Task Force on Nature-related Financial Disclosures (TNFD) and (2) user-specific inquiries for Corporate Sustainability Reports (CSR). As an illustration, we apply ASKNATURE to the CSRs of the Nature Action 100 (NA100) companies. Based on the answers provided by our tool and in line with the double materiality paradigm, we classify companies' activities based on their impact direction: company-to-nature (C2N), nature-to-company (N2C), or neutral. Despite the unprecedented loss of biodiversity and significant depletion of natural capital, our sentiment analysis reveals that corporate disclosures predominantly report positive C2N impact. Moreover, there is minimal overlap between the countries mentioned in the reports and regions of environmental significance, which raises concerns about transparency. Consequently, we find that current CSR disclosures, although aligned with the TNFD, are not sufficiently decision-useful for stakeholders and lack legal enforceability. |
Date: | 2024–07 |
URL: | https://d.repec.org/n?u=RePEc:chf:rpseri:rp2490 |
By: | Markus Leippold (University of Zurich; Swiss Finance Institute); Saeid Vaghefi (University of Zurich); Veruska Muccione (University of Zurich - Department of Geography; University of Geneva - Institute for Environmental Sciences); Julia Bingler (University of Oxford); Dominik Stammbach (ETH Zürich); Chiara Colesanti Senni (University of Zurich - Department of Finance); Jingwei Ni (ETH Zurich); Tobias Wekhof (ETH Zürich - CER-ETH - Center of Economic Research at ETH Zurich); Tingyu Yu (University of Zurich - Department Finance); Tobias Schimanski (University of Zurich); Glen Gostlow (University of Zurich - Department Finance); Jürg Luterbacher (World Health Organization (WHO) - World Health Organization, Geneva); Christian Huggel (University of Zurich) |
Abstract: | This paper presents Climinator, a novel AI-based tool designed to automate the fact-checking of climate change claims. Utilizing an array of Large Language Models (LLMs) informed by authoritative sources like the IPCC reports and peer-reviewed scientific literature, Climinator employs an innovative Mediator-Advocate framework. This design allows Climinator to effectively synthesize varying scientific perspectives, leading to robust, evidence-based evaluations. Our model demonstrates remarkable accuracy when testing claims collected from Climate Feedback and Skeptical Science. Notably, when integrating an advocate with a climate science denial perspective in our framework, Climinator's iterative debate process reliably converges towards scientific consensus, underscoring its adeptness at reconciling diverse viewpoints into science-based, factual conclusions. While our research is subject to certain limitations and necessitates careful interpretation, our approach holds significant potential. We hope to stimulate further research and encourage exploring its applicability in other contexts, including political fact-checking and legal domains. |
Date: | 2024–03 |
URL: | https://d.repec.org/n?u=RePEc:chf:rpseri:rp2493 |
By: | Federico Giorgi (DEF, University of Rome "Tor Vergata"); Stefano Herzel (DEF, University of Rome "Tor Vergata"); Paolo Pigato (DEF, University of Rome "Tor Vergata") |
Abstract: | We propose an algorithm, based on Reinforcement Learning, to hedge the payoff of a European call option. The algorithm is first tested in a model where the problem has a well-known analytic solution, so that we can compare the strategy obtained by the algorithm to the theoretical optimal one. In a more realistic setting with transaction costs, the algorithm outperforms the standard delta hedging strategy. |
Keywords: | Reinforcement Learning; Dynamic Strategies; Risk management |
Date: | 2024–12–17 |
URL: | https://d.repec.org/n?u=RePEc:rtv:ceisrp:586 |