|
on Big Data |
By: | Zihan Chen; Lei Nico Zheng; Cheng Lu; Jialu Yuan; Di Zhu |
Abstract: | ChatGPT has demonstrated remarkable capabilities across various natural language processing (NLP) tasks. However, its potential for inferring dynamic network structures from temporal textual data, specifically financial news, remains an unexplored frontier. In this research, we introduce a novel framework that leverages ChatGPT's graph inference capabilities to enhance Graph Neural Networks (GNN). Our framework adeptly extracts evolving network structures from textual data, and incorporates these networks into graph neural networks for subsequent predictive tasks. The experimental results from stock movement forecasting indicate our model has consistently outperformed the state-of-the-art Deep Learning-based benchmarks. Furthermore, the portfolios constructed based on our model's outputs demonstrate higher annualized cumulative returns, alongside reduced volatility and maximum drawdown. This superior performance highlights the potential of ChatGPT for text-based network inferences and underscores its promising implications for the financial sector. |
Date: | 2023–05 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2306.03763&r=big |
By: | Dylan Brewer (Georgia Institute of Technology); Alyssa Carlson (Department of Economics, University of Missouri) |
Abstract: | We study approaches for adjusting machine learning methods when the training sample differs from the prediction sample on unobserved dimensions. The machine learning literature predominantly assumes selection only on observed dimensions. Common approaches are to weight or include variables that influence selection as solutions to selection on observables. Simulation results show that selection on unobservables increases mean squared prediction error using popular machine-learning algorithms. Common machine learning practices such as weighting or including variables that influence selection into the training or prediction sample often worsen sample selection bias. We propose two control-function approaches that remove the effects of selection bias before training and find that they reduce mean-squared prediction error in simulations. We apply these approaches to predicting the vote share of the incumbent in gubernatorial elections using previously observed re-election bids. We find that ignoring selection on unobservables leads to substantially higher predicted vote shares for the incumbent than when the control function approach is used. |
Keywords: | sample selection, machine learning, control function, inverse probability weighting |
JEL: | C13 C31 C55 D72 |
Date: | 2023–06 |
URL: | http://d.repec.org/n?u=RePEc:umc:wpaper:2310&r=big |
By: | Kucklick (Paderborn University); Priefer (Paderborn University); Beverungen (Paderborn University); Müller (Paderborn University) |
Abstract: | Information systems have proven their value in facilitating pricing decisions. Still, predicting prices for complex goods remains challenging due to information asymmetries. Beyond Search qualities that sellers can identify ex-ante of a purchase, these goods possess Experience qualities only identifiable ex-post. While research has discussed how information asymmetries cause market failure, it remains unclear what benefits Search and Experience qualities offer for information systems that enable pricing on online markets. In a Machine Learning-based study, we quantify their predictive power for online real estate pricing. We use Geographic Information Systems and Computer Vision to incorporate spatial and image data into a Machine Learning algorithm for price prediction. We find that these secondary use data can transform Experience qualities to Search qualities, increasing the predictive power by up to 15.4%. Our results suggest that secondary use data can provide valuable resources for improving the predictive power of pricing complex goods. |
Keywords: | information asymmetries, real estate appraisal, SEC theory, Machine Learning, Geographic Information Systems, Computer Vision |
JEL: | C45 R32 R00 |
Date: | 2023–06 |
URL: | http://d.repec.org/n?u=RePEc:pdn:dispap:112&r=big |
By: | Badruddoza, Syed; Fuad, Syed; Amin, Modhurima D. |
Keywords: | Research Methods/Statistical Methods, Food Consumption/Nutrition/Food Safety, Agricultural and Food Policy |
Date: | 2023 |
URL: | http://d.repec.org/n?u=RePEc:ags:aaea22:335782&r=big |
By: | Michael Cafarella; Gabriel Ehrlich; Tian Gao; John C. Haltiwanger; Matthew D. Shapiro; Laura Zhao |
Abstract: | This paper uses machine learning (ML) to estimate hedonic price indices at scale from item-level transaction and product characteristics. The procedure uses state-of-the-art approaches from hedonic econometrics and implements them with a neural network ML approach. Applying the methodology to Nielsen Retail Scanner data leads to a large hedonic adjustment to the Tornqvist index for food product groups: Cumulative food inflation over the period from 2007 through 2015 is reduced by half from 5.9% to 2.8% -- owing to quality adjustment. These results suggest that quality improvement via product turnover is important even in product groups that are not normally considered to feature rapid technological progress. The approach in the paper thus demonstrates the feasibility and importance of implementing hedonic adjustment at scale. |
JEL: | C81 E31 |
Date: | 2023–06 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:31315&r=big |
By: | von Schenk, Alicia; Klockmann, Victor; Bonnefon, Jean-François; Rahwan, Iyad; Köbis, Nils |
Abstract: | People are not very good at detecting lies, which may explain why they refrain from accusing others of lying, given the social costs attached to false accusations, both for the accuser and the accused. Here we consider how this social balance might be disrupted by the availability of lie-detection algorithms powered by Artificial Intelligence (AI). Will people elect to use lie-detection AI that outperforms humans, and if so, will they show less restraint in their accusations? To find out, we built a machine learning classifier whose accuracy (66.86%) was significantly better than human accuracy (46.47%) in a lie-detection task. We conducted an incentivized lie-detection experiment (N = 2040) in which we measured participants’ propensity to use the algorithm, as well as the impact of that use on accusation rates and accuracy. Our results reveal that (a) requesting predictions from the lie-detection AI and especially (b) receiving AI predictions that accuse others of lying increase accusation rates. Due to the low uptake of the algorithm (31.76% requests), we do not see an improvement in accuracy when the AI prediction becomes available for purchase. |
Date: | 2023–06–21 |
URL: | http://d.repec.org/n?u=RePEc:tse:iastwp:128164&r=big |
By: | von Schenk, Alicia; Klockmann, Victor; Bonnefon, Jean-François; Rahwan, Iyad; Köbis, Nils |
Abstract: | People are not very good at detecting lies, which may explain why they refrain from accusing others of lying, given the social costs attached to false accusations, both for the accuser and the accused. Here we consider how this social balance might be disrupted by the availability of lie-detection algorithms powered by Artificial Intelligence (AI). Will people elect to use lie-detection AI that outperforms humans, and if so, will they show less restraint in their accusations? To find out, we built a machine learning classifier whose accuracy (66.86%) was significantly better than human accuracy (46.47%) in a lie-detection task. We conducted an incentivized lie-detection experiment (N = 2040) in which we measured participants’ propensity to use the algorithm, as well as the impact of that use on accusation rates and accuracy. Our results reveal that (a) requesting predictions from the lie-detection AI and especially (b) receiving AI predictions that accuse others of lying increase accusation rates. Due to the low uptake of the algorithm (31.76% requests), we do not see an improvement in accuracy when the AI prediction becomes available for purchase. |
Date: | 2023–06 |
URL: | http://d.repec.org/n?u=RePEc:tse:wpaper:128163&r=big |
By: | Lubdhak Mondal; Udeshya Raj; Abinandhan S; Began Gowsik S; Sarwesh P; Abhijeet Chandra |
Abstract: | This study investigates the relationship between narratives conveyed through microblogging platforms, namely Twitter, and the value of crypto assets. Our study provides a unique technique to build narratives about cryptocurrency by combining topic modelling of short texts with sentiment analysis. First, we used an unsupervised machine learning algorithm to discover the latent topics within the massive and noisy textual data from Twitter, and then we revealed 4-5 cryptocurrency-related narratives, including financial investment, technological advancement related to crypto, financial and political regulations, crypto assets, and media coverage. In a number of situations, we noticed a strong link between our narratives and crypto prices. Our work connects the most recent innovation in economics, Narrative Economics, to a new area of study that combines topic modelling and sentiment analysis to relate consumer behaviour to narratives. |
Date: | 2023–06 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2306.05803&r=big |
By: | Anderson, Patrese; Baylis, Kathy; Davenport, Frank; Shukla, Shraddhanand |
Keywords: | International Development, Marketing, Research Methods/Statistical Methods |
Date: | 2023 |
URL: | http://d.repec.org/n?u=RePEc:ags:aaea22:335809&r=big |
By: | Philippe Goulet Coulombe; Maximilian Goebel |
Abstract: | When it comes to stock returns, any form of predictability can bolster risk-adjusted profitability. We develop a collaborative machine learning algorithm that optimizes portfolio weights so that the resulting synthetic security is maximally predictable. Precisely, we introduce MACE, a multivariate extension of Alternating Conditional Expectations that achieves the aforementioned goal by wielding a Random Forest on one side of the equation, and a constrained Ridge Regression on the other. There are two key improvements with respect to Lo and MacKinlay's original maximally predictable portfolio approach. First, it accommodates any (nonlinear) forecasting algorithm and predictor set. Second, it handles large portfolios. We conduct exercises at the daily and monthly frequency and report significant increases in predictability and profitability using very little conditioning information. Interestingly, predictability is found in bad as well as good times, and MACE successfully navigates the debacle of 2022. |
Date: | 2023–06 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2306.05568&r=big |
By: | Hendriks, Patrick; Sturm, Timo; Olt, Christian M.; Buxmann, Peter |
Abstract: | To make sense of their increasingly digital and complex environments, organizations strive for a future in which machine learning (ML) systems join humans in collaborative learning partnerships to complement each other’s learning capabilities. While these so-called artificial assistants enable their human partners (and vice versa) to gain insights about unique knowledge domains that would otherwise remain hidden from them, they may also disrupt and impede each other's learning. To explore the virtuous and vicious dynamics that affect organizational learning, we conduct a series of agent-based simulations of different learning modes between humans and artificial assistants in an organization. We find that aligning the learning of humans and artificial assistants and allowing them to influence each other’s learning processes equally leads to the highest organizational performance. |
Date: | 2023–06–16 |
URL: | http://d.repec.org/n?u=RePEc:dar:wpaper:138376&r=big |
By: | Shadi Haj-Yahia; Omar Mansour; Tomer Toledo |
Abstract: | Discrete choice models (DCM) are widely employed in travel demand analysis as a powerful theoretical econometric framework for understanding and predicting choice behaviors. DCMs are formed as random utility models (RUM), with their key advantage of interpretability. However, a core requirement for the estimation of these models is a priori specification of the associated utility functions, making them sensitive to modelers' subjective beliefs. Recently, machine learning (ML) approaches have emerged as a promising avenue for learning unobserved non-linear relationships in DCMs. However, ML models are considered "black box" and may not correspond with expected relationships. This paper proposes a framework that expands the potential of data-driven approaches for DCM by supporting the development of interpretable models that incorporate domain knowledge and prior beliefs through constraints. The proposed framework includes pseudo data samples that represent required relationships and a loss function that measures their fulfillment, along with observed data, for model training. The developed framework aims to improve model interpretability by combining ML's specification flexibility with econometrics and interpretable behavioral analysis. A case study demonstrates the potential of this framework for discrete choice analysis. |
Date: | 2023–05 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2306.00016&r=big |
By: | Julián Alonso Cárdenas-Cárdenas; Deicy J. Cristiano-Botia; Nicolás Martínez-Cortés |
Abstract: | We use Long Short-Term Memory (LSTM) neural networks, a deep learning technique, to forecast Colombian headline inflation one year ahead through two approaches. The first one uses only information from the target variable, while the second one incorporates additional information from some relevant variables. We apply a rolling sample within the traditional neural network construction process, selecting the hyperparameters with criteria for minimizing the forecast error. Our results show a better forecasting capacity of the network with information from additional variables, surpassing both the LSTM that uses only the target variable and ARIMA models optimized for forecasting (with and without explanatory variables). This improvement in forecasting accuracy is most pronounced over longer time horizons, specifically from the seventh to the twelfth month. |
Keywords: | Deep learning, Long Short-Term Memory neural networks, forecast, inflation |
JEL: | C45 C51 C52 C53 C61 E37 |
Date: | 2023–06 |
URL: | http://d.repec.org/n?u=RePEc:bdr:borrec:1241&r=big |
By: | Andrés García-Medina; Benito Rodríguez-Camejo |
Abstract: | This work aims to deal with the optimal allocation instability problem of Markowitz's modern portfolio theory in high dimensionality. We propose a combined strategy that considers covariance matrix estimators from Random Matrix Theory (RMT) and the machine learning allocation methodology known as Nested Clustered Optimization (NCO). The latter methodology is modified and reformulated in terms of the spectral clustering algorithm and Minimum Spanning Tree (MST) to solve internal problems inherent to the original proposal. Markowitz's classical mean-variance allocation and the modified NCO machine learning approach are tested on financial instruments listed on the Mexican Stock Exchange (BMV) in a moving window analysis from 2018 to 2022. The modified NCO algorithm achieves stable allocations by incorporating RMT covariance estimators. In particular, the allocation weights are positive, and their absolute values add up to the total capital without considering explicit restrictions in the formulation. Our results suggest that the risky short-position investment strategy can be avoided by means of RMT inference and statistical learning techniques. |
Date: | 2023–06 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2306.05667&r=big |
By: | Sebastian Galiani; Ramiro H. Gálvez; Ian Nachman |
Abstract: | This article presents a comprehensive analysis of trends in the publication and citation of economics scholarly research, with a focus on specialization within fields of economics research (i.e., applied, applied theory, econometrics methods, and theory). We collected detailed data on 24,273 articles published from 1970 to 2016 in highly regarded general research economics journals. We then used state-of-the-art machine learning and natural language processing techniques to further enrich the collected data. Our findings reveal significant disparities in article content and citations across fields of economics research. The analysis indicates growing specialization trends in theory and econometric methods. In contrast, applied papers are covering a wider range of topics and receiving an increasing proportion of extramural citations over time. By 2016, applied ranked among the most or second most cited field by any other field of economics research. These patterns are consistent with applied papers becoming more multidisciplinary. Applied theory articles have also demonstrated a growing breadth of topics covered (similar to applied articles); however, this has not been accompanied by an increase in extramural citations or in the share of citations received from other fields of economics research (as observed with theory articles). This makes it challenging to determine their specialization status. |
JEL: | A1 |
Date: | 2023–06 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:31295&r=big |
By: | Ilias Chronopoulos; Katerina Chrysikou; George Kapetanios; James Mitchell; Aristeidis Raftapostolos |
Abstract: | In this paper we study neural networks and their approximating power in panel data models. We provide asymptotic guarantees on deep feed-forward neural network estimation of the conditional mean, building on the work of Farrell et al. (2021), and explore latent patterns in the cross-section. We use the proposed estimators to forecast the progression of new COVID-19 cases across the G7 countries during the pandemic. We find significant forecasting gains over both linear panel and nonlinear time series models. Containment or lockdown policies, as instigated at the national-level by governments, are found to have out-of-sample predictive power for new COVID-19 cases. We illustrate how the use of partial derivatives can help open the "black-box" of neural networks and facilitate semi-structural analysis: school and workplace closures are found to have been effective policies at restricting the progression of the pandemic across the G7 countries. But our methods illustrate significant heterogeneity and time-variation in the effectiveness of specific containment policies. |
Date: | 2023–05 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2305.19921&r=big |
By: | Marica Valente; Timm Gries; Lorenzo Trapani |
Abstract: | We propose a new approach to detect and quantify informal employment resulting from irregular migration shocks. Focusing on a largely informal sector, agriculture, and on the exogenous variation from the Arab Spring wave on southern Italian coasts, we use machine-learning techniques to document abnormal increases in reported (vs. predicted) labor productivity on vineyards hit by the shock. Misreporting is largely heterogeneous across farms depending e.g. on size and grape quality. The shock resulted in a 6% increase in informal employment, equivalent to one undeclared worker for every three farms on average and 23,000 workers in total over 2011-2012. Misreporting causes significant increases in farm profits through lower labor costs, while having no impact on grape sales, prices, or wages of formal workers. |
Keywords: | Informal employment, Migration shocks, Farm labor, Machine learning |
JEL: | F22 J61 J43 J46 C53 |
Date: | 2023–09 |
URL: | http://d.repec.org/n?u=RePEc:inn:wpaper:2023-09&r=big |
By: | David Hirshleifer; Dat Mai; Kuntara Pukthuanthong |
Abstract: | A war-related factor model derived from textual analysis of media news reports explains the cross section of expected asset returns. Using a semi-supervised topic model to extract discourse topics from 7,000,000 New York Times stories spanning 160 years, the war factor predicts the cross section of returns across test assets derived from both traditional and machine learning construction techniques, and spanning 138 anomalies. Our findings are consistent with assets that are good hedges for war risk receiving lower risk premia, or with assets that are more positively sensitive to war prospects being more overvalued. The return premium on the war factor is incremental to standard effects. |
JEL: | G0 G02 G1 G10 G11 G4 G41 |
Date: | 2023–06 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:31348&r=big |
By: | Mehler, Maren F.; Vetter, Oliver A. |
Abstract: | Machine Learning (ML) technologies have become the foundation of a plethora of products and services. While the economic potential of such ML-infused solutions has become irrefutable, there is still uncertainty on pricing. Currently, software testing is one area to benefit from ML services assisting in the creation of test cases; a task both complex and demanding human-like outputs. Yet, little is known on the willingness to pay of users, inhibiting the suppliers' incentive to develop suitable tools. To provide insights into desired features and willingness to pay for such ML-based tools, we perform a choice-based conjoint analysis with 119 participants in Germany. Our results show that a high level of accuracy is particularly important for users, followed by ease of use and integration into existing environments. Thus, we not only guide future developers on which attributes to prioritize but also which characteristics of ML-based services are relevant for future research. |
Date: | 2023–06–14 |
URL: | http://d.repec.org/n?u=RePEc:dar:wpaper:138317&r=big |
By: | Pranjal Rawat |
Abstract: | This paper examines the impact of different payment rules on efficiency when algorithms learn to bid. We use a fully randomized experiment of 427 trials, where Q-learning bidders participate in up to 250,000 auctions for a commonly valued item. The findings reveal that the first price auction, where winners pay the winning bid, is susceptible to coordinated bid suppression, with winning bids averaging roughly 20% below the true values. In contrast, the second price auction, where winners pay the second highest bid, aligns winning bids with actual values, reduces the volatility during learning and speeds up convergence. Regression analysis, incorporating design elements such as payment rules, number of participants, algorithmic factors including the discount and learning rate, asynchronous/synchronous updating, feedback, and exploration strategies, discovers the critical role of payment rules on efficiency. Furthermore, machine learning estimators find that payment rules matter even more with few bidders, high discount factors, asynchronous learning, and coarse bid spaces. This paper underscores the importance of auction design in algorithmic bidding. It suggests that computerized auctions like Google AdSense, which rely on the first price auction, can mitigate the risk of algorithmic collusion by adopting the second price auction. |
Date: | 2023–06 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2306.09437&r=big |
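The two payment rules contrasted in the abstract above can be illustrated with a minimal sketch. It covers only the payment rules themselves, not the paper's Q-learning experiment; the function name `run_auction`, the tie-breaking rule, and the sample bids are hypothetical choices made for illustration.

```python
from typing import List, Tuple


def run_auction(bids: List[float], rule: str) -> Tuple[int, float]:
    """Return (winner index, payment) for a sealed-bid, single-item auction.

    rule="first":  the winner pays their own (winning) bid.
    rule="second": the winner pays the second-highest bid.
    Tie-breaking by lowest index is an assumption, not taken from the paper.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bids")
    # Rank bidder indices from highest to lowest bid; ties go to the lower index.
    order = sorted(range(len(bids)), key=lambda i: (-bids[i], i))
    winner, runner_up = order[0], order[1]
    if rule == "first":
        payment = bids[winner]
    elif rule == "second":
        payment = bids[runner_up]
    else:
        raise ValueError("rule must be 'first' or 'second'")
    return winner, payment


# Hypothetical bids from three bidders.
bids = [0.8, 0.6, 0.3]
first_price = run_auction(bids, "first")
second_price = run_auction(bids, "second")
print("first price:", first_price)
print("second price:", second_price)
```

Under the second-price rule the winner's payment does not depend on their own bid, which is the usual intuition for why bidders have less reason to shade bids there.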
By: | Saka, Orkun; Eichengreen, Barry; Aksoy, Cevat |
Abstract: | We ask whether epidemic exposure leads to a shift in financial technology usage and who participates in this shift. We exploit a dataset combining Gallup World Polls and Global Findex surveys for some 250,000 individuals in 140 countries, merging them with information on the incidence of epidemics and local 3G internet infrastructure. Epidemic exposure is associated with an increase in remote-access (online/mobile) banking and substitution from bank branch-based to ATM activity. The temporary nature of the effects we identify is more consistent with a demand channel than with a supply channel involving high initial fixed costs. Exploring heterogeneity using a machine-learning driven approach, we find that young, high-income earners in full-time employment have the greatest tendency to shift to online/mobile transactions in response to epidemics. Baseline effects are larger for individuals with better ex ante 3G signal coverage, highlighting the role of the digital divide in adaptation to new technologies necessitated by adverse external shocks. |
Keywords: | epidemics; fintech; banking |
JEL: | G20 G00 I10 |
Date: | 2022–04–01 |
URL: | http://d.repec.org/n?u=RePEc:ehl:lserod:118871&r=big |
By: | Mihnea Constantinescu (National Bank of Ukraine; University of Amsterdam) |
Abstract: | Forecasting economic activity during an invasion is a nontrivial exercise. The lack of timely statistical data and the expected nonlinear effect of military action challenge the use of established nowcasting and short-term forecasting methodologies. This study explores the use of Partial Least Squares (PLS) augmented with an additional sparsity step to nowcast quarterly Ukrainian GDP using Google search data. Model outputs are benchmarked against both static and dynamic factor models. Preliminary results outline the usefulness of PLS in capturing the effects of large shocks in a setting rich in data, but poor in statistics. |
Keywords: | nowcasting, quarterly GDP, Google Trends, machine learning, partial least squares, sparsity, Markov blanket |
JEL: | C38 C53 C55 E32 E37 |
Date: | 2023–06 |
URL: | http://d.repec.org/n?u=RePEc:ukb:wpaper:01/2023&r=big |
By: | Tania Babina; Anastassia Fedyk; Alex X. He; James Hodson |
Abstract: | We study the shifts in U.S. firms' workforce composition and organization associated with the use of AI technologies. To do so, we leverage a unique combination of worker resume and job postings datasets to measure firm-level AI investments and workforce composition variables, such as educational attainment, specialization, and hierarchy. We document that firms with higher initial shares of highly-educated workers and STEM workers invest more in AI. As firms invest in AI, they tend to transition to more educated workforces, with higher shares of workers with undergraduate and graduate degrees, and more specialization in STEM fields and IT skills. Furthermore, AI investments are associated with a flattening of the firms' hierarchical structure, with significant increases in the share of workers at the junior level and decreases in shares of workers in middle-management and senior roles. Overall, our results highlight that adoption of AI technologies is associated with significant reorganization of firms' workforces. |
JEL: | D22 E22 J01 J23 J24 |
Date: | 2023–06 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:31325&r=big |
By: | Grebe, Moritz; Kandemir, Sinem; Tillmann, Peter |
Abstract: | We assemble a data set of more than eight million German Twitter posts related to the war in Ukraine. Based on state-of-the-art methods of text analysis, we construct a daily index of uncertainty about the war as perceived by German Twitter. The approach also allows us to separate this index into uncertainty about sanctions against Russia, energy policy and other dimensions. We then estimate a VAR model with daily financial and macroeconomic data and identify an exogenous uncertainty shock. The increase in uncertainty has strong effects on financial markets and causes a significant decline in economic activity as well as an increase in expected inflation. We find the effects of uncertainty to be particularly strong in the first months of the war. |
Keywords: | war, Twitter, geopolitical risk, machine learning, business cycle |
JEL: | D8 E3 G1 |
Date: | 2023 |
URL: | http://d.repec.org/n?u=RePEc:zbw:imfswp:184&r=big |
By: | Hongyang Yang; Xiao-Yang Liu; Christina Dan Wang |
Abstract: | Large language models (LLMs) have shown the potential of revolutionizing natural language processing tasks in diverse domains, sparking great interest in finance. Accessing high-quality financial data is the first challenge for financial LLMs (FinLLMs). While proprietary models like BloombergGPT have taken advantage of their unique data accumulation, such privileged access calls for an open-source alternative to democratize Internet-scale financial data. In this paper, we present an open-source large language model, FinGPT, for the finance sector. Unlike proprietary models, FinGPT takes a data-centric approach, providing researchers and practitioners with accessible and transparent resources to develop their FinLLMs. We highlight the importance of an automatic data curation pipeline and the lightweight low-rank adaptation technique in building FinGPT. Furthermore, we showcase several potential applications as stepping stones for users, such as robo-advising, algorithmic trading, and low-code development. Through collaborative efforts within the open-source AI4Finance community, FinGPT aims to stimulate innovation, democratize FinLLMs, and unlock new opportunities in open finance. Two associated code repos are https://github.com/AI4Finance-Foundation/FinGPT and https://github.com/AI4Finance-Foundation/FinNLP |
Date: | 2023–06 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2306.06031&r=big |
By: | Alain Naef |
Abstract: | Most countries in the world use foreign exchange interventions, but measuring the success of the policy is difficult. By using a narrative approach, I identify interventions when the central bank manages to reverse the exchange rate based on pure luck. I separate them from interventions when the central bank actually impacted the exchange rate. Because intervention records are daily aggregates, an intervention might appear to have changed the direction of the exchange rate when the change is more likely to have been caused by market news. This analysis allows a better understanding of how successful central bank operations really are. I use new daily data on Bank of England interventions in the 1980s and 1990s. Some studies find that interventions work in up to 80% of cases. Yet, by accounting for intraday market-moving news, I find that in adverse conditions the Bank of England managed to influence the exchange rate in only 8% of cases. I use natural language processing to confirm the validity of the narrative approach. Using Lasso and a VAR analysis, I investigate what made the Bank of England intervene during that period. I find that only movements in the Deutschmark exchange rate, and not the US dollar exchange rate, made the Bank intervene. Also, I find that interest rate hikes were mostly a tool for currency management and were accompanied by large reserve sales. |
Keywords: | Intervention, Foreign Exchange, Natural Language Processing, Central Bank, Bank of England |
JEL: | F31 E5 N14 N24 |
Date: | 2023 |
URL: | http://d.repec.org/n?u=RePEc:bfr:banfra:911&r=big |
By: | Marlène Koffi; Matt Marx |
Abstract: | We analyze more than 70 million scientific articles to characterize the gender dynamics of commercializing science. The double-digit gender gap we report is explained neither by the quality of the science nor its ex-ante commercial potential, and is widest among papers with female last authors (i.e., lab heads) when publishing high-quality science. Using the PitchBook database, we show that when authors self-commercialize scientific discoveries via new ventures, no gap appears, raising the question of whether incumbent firms are unaware of, or ignore, scientific contributions by women. A natural experiment based on the Obama administration’s staggered introduction of open-access requirements for federally-funded research reveals that although easier access to scientific articles might facilitate commercialization, this benefit accrues primarily to male authors. Articles written with more “boastful” language are commercialized more often, and female scientists generally boast less, but even when they do their discoveries are commercialized no more often. We also observe gender homophily between scientific authors and commercializing inventors, the majority of whom are male. We conclude with the potential welfare effects of the gender gap: the disparity is more pronounced for higher-quality discoveries, as indicated by academic and patent citations or by predicted probabilities of commercialization derived from deep-learning algorithms. |
JEL: | J16 O31 |
Date: | 2023–06 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:31316&r=big |
By: | K. Peren Arin; Efstathios Polyzos; Marcel Thum |
Abstract: | Populist parties recently have shaken Western democracies, yet there is no consensus regarding the characteristics of populist voters. By using large-scale surveys from four European countries (France, Germany, Spain, and the U.K.), we investigate individual determinants of populist voting. Our methodological approach controls for model uncertainty by considering the responses to 100 questions that span social, economic, political, environmental, and psychological dimensions. We also include individual misperceptions across several domains. Our results show that left-wing populist voters are not religious, have lower misperceptions regarding foreign-national prisoners, distrust the police, are open to immigrants from poorer countries, and oppose dismantling the welfare state. The right-wing populist voters oppose incoming, racially diverse immigrants, distrust national and international institutions, and have high misperceptions regarding immigrant crimes and the share of social benefits in the GDP. Contrary to the previous literature, attitudes toward globalization, personality traits, labor-market status, and social media use are not consensus variables for either group. |
Keywords: | populism, random forest, Bayesian model averaging |
JEL: | C11 D72 P48 |
Date: | 2023 |
URL: | http://d.repec.org/n?u=RePEc:ces:ceswps:_10472&r=big |
By: | Zazueta, Jorge; Zazueta-Hernández, Jorge; Heredia, Andrea Chavez |
Abstract: | We provide an intuitive construction of a support vector machine (SVM) and explore the motivation behind using different tools for data classification. Beginning with linear classifiers, we build intuition on the subtlety of classification in increasingly non-linear circumstances and conclude with an example of bankruptcy prediction to illustrate the effectiveness and flexibility of support vector machines. |
Date: | 2023–06–23 |
URL: | http://d.repec.org/n?u=RePEc:osf:socarx:7z24k&r=big |
By: | De Marzo, Giordano; Mathew, Nanditha; Sbardella, Angelica |
Abstract: | Our study investigates the heterogeneity of skill demands within occupations, the firm activities that are associated with demand for broader skill sets, and the firm characteristics that are related to particular skills and different combinations of skills. We use a unique matched database of firm-level data and online job vacancy data for a developing economy, namely, India. Employing a multi-level machine learning technique and an innovative skill taxonomy, we identify and categorize skill requirements of firms. Our empirical analysis provides robust evidence of significant heterogeneity in skill requirements across firms within the same occupations. Additionally, we show that firms demanding diverse skills differ from their counterparts. Firms that are competitive in international markets, as well as those that are more innovative, require digital skills and specific combinations of digital and other skills. Our findings highlight the crucial role played by firms in defining the changing nature of work. |
Keywords: | employment creation, skill, platform workers |
Date: | 2023 |
URL: | http://d.repec.org/n?u=RePEc:ilo:ilowps:995271691902676&r=big |
By: | Albanesi, Stefania (University of Pittsburgh); Dias da Silva, António (European Central Bank); Jimeno, Juan F. (Bank of Spain); Lamo, Ana (European Central Bank); Wabitsch, Alena (University of Oxford) |
Abstract: | We examine the link between labour market developments and new technologies such as artificial intelligence (AI) and software in 16 European countries over the period 2011-2019. Using data for occupations at the 3-digit level in Europe, we find that on average employment shares have increased in occupations more exposed to AI. This is particularly the case for occupations with a relatively higher proportion of younger and skilled workers. This evidence is in line with the Skill Biased Technological Change theory. While there exists heterogeneity across countries, only very few countries show a decline in employment shares of occupations more exposed to AI-enabled automation. Country heterogeneity for this result seems to be linked to the pace of technology diffusion and education, but also to the level of product market regulation (competition) and employment protection laws. In contrast to the findings for employment, we find little evidence for a relationship between wages and potential exposures to new technologies. |
Keywords: | artificial intelligence, employment, skills, occupations |
JEL: | J23 O33 |
Date: | 2023–06 |
URL: | http://d.repec.org/n?u=RePEc:iza:izadps:dp16227&r=big |
By: | Shumin Ma; Zhiri Yuan; Qi Wu; Yiyan Huang; Xixu Hu; Cheuk Hang Leung; Dongdong Wang; Zhixiang Huang |
Abstract: | Classical Domain Adaptation methods acquire transferability by regularizing the overall distributional discrepancies between features in the source domain (labeled) and features in the target domain (unlabeled). They often do not differentiate whether the domain differences come from the marginals or the dependence structures. In many business and financial applications, the labeling function usually has different sensitivities to the changes in the marginals versus changes in the dependence structures. Measuring the overall distributional differences will not be discriminative enough in acquiring transferability. Without the needed structural resolution, the learned transfer is less optimal. This paper proposes a new domain adaptation approach in which one can measure the differences in the internal dependence structure separately from those in the marginals. By optimizing the relative weights among them, the new regularization strategy greatly relaxes the rigidness of the existing approaches. It allows a learning machine to pay special attention to places where the differences matter the most. Experiments on three real-world datasets show that the improvements are quite notable and robust compared to various benchmark domain adaptation models. |
Date: | 2023–05 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2305.19499&r=big |
By: | Mukherjee, Krishnendu |
Abstract: | To the best of my knowledge, this problem has never been addressed by any researcher. This paper studies the effect of K-means, the Gaussian Mixture Model (GMM), and the integrated use of autoencoder and K-means on the computational time, MIP gap, feasible route, subtour, and the optimum use of vehicles. Miller-Tucker-Zemlin (MTZ) subtour elimination constraint is considered in this regard. This paper also gives the concept of a “layer”, which could be effective to solve a large vehicle routing problem with a time window (VRPTW) quickly. |
Keywords: | Machine Learning, Deep Learning, Mixed Integer Linear Program, and Large VRPTW |
JEL: | C6 C61 C63 |
Date: | 2023–06–03 |
URL: | http://d.repec.org/n?u=RePEc:pra:mprapa:117513&r=big |
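For readers unfamiliar with the Miller-Tucker-Zemlin constraint mentioned in the abstract above, its textbook form for a single tour over nodes 1..n (node 1 as depot) can be written as follows. This is the standard formulation, not necessarily the exact variant used in the paper; the symbols x_{ij} (binary arc variable) and u_i (auxiliary visit-order variable) are notation assumed here for illustration.

```latex
% Textbook MTZ subtour elimination (assumed notation, not taken from the paper):
% x_{ij} = 1 if the route travels directly from node i to node j, 0 otherwise
% u_i    = auxiliary continuous variable encoding the visit position of node i
\[
  u_i - u_j + n\,x_{ij} \;\le\; n - 1,
  \qquad i, j \in \{2, \dots, n\},\; i \neq j .
\]
```

If x_{ij} = 1 the inequality forces u_j >= u_i + 1, so any cycle that avoids the depot would require the u values to increase strictly around the cycle, which is impossible.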
By: | Alvaro Arroyo; Alvaro Cartea; Fernando Moreno-Pino; Stefan Zohren |
Abstract: | One of the key decisions in execution strategies is the choice between a passive (liquidity providing) or an aggressive (liquidity taking) order to execute a trade in a limit order book (LOB). Essential to this choice is the fill probability of a passive limit order placed in the LOB. This paper proposes a deep learning method to estimate the fill times of limit orders posted in different levels of the LOB. We develop a novel model for survival analysis that maps time-varying features of the LOB to the distribution of fill times of limit orders. Our method is based on a convolutional-Transformer encoder and a monotonic neural network decoder. We use proper scoring rules to compare our method with other approaches in survival analysis, and perform an interpretability analysis to understand the informativeness of features used to compute fill probabilities. Our method significantly outperforms those typically used in survival analysis literature. Finally, we carry out a statistical analysis of the fill probability of orders placed in the order book (e.g., within the bid-ask spread) for assets with different queue dynamics and trading activity. |
Date: | 2023–06 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2306.05479&r=big |
By: | Lifan Zhao; Shuming Kong; Yanyan Shen |
Abstract: | Stock trend forecasting is a fundamental task of quantitative investment where precise predictions of price trends are indispensable. As an online service, stock data continuously arrive over time. It is practical and efficient to incrementally update the forecast model with the latest data which may reveal some new patterns recurring in the future stock market. However, incremental learning for stock trend forecasting still remains under-explored due to the challenge of distribution shifts (a.k.a. concept drifts). With the stock market dynamically evolving, the distribution of future data can slightly or significantly differ from incremental data, hindering the effectiveness of incremental updates. To address this challenge, we propose DoubleAdapt, an end-to-end framework with two adapters, which can effectively adapt the data and the model to mitigate the effects of distribution shifts. Our key insight is to automatically learn how to adapt stock data into a locally stationary distribution in favor of profitable updates. Complemented by data adaptation, we can confidently adapt the model parameters under mitigated distribution shifts. We cast each incremental learning task as a meta-learning task and automatically optimize the adapters for desirable data adaptation and parameter initialization. Experiments on real-world stock datasets demonstrate that DoubleAdapt achieves state-of-the-art predictive performance and shows considerable efficiency. |
Date: | 2023–06 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2306.09862&r=big |
By: | Jiti Gao; Bin Peng; Yanrong Yang |
Abstract: | In this paper, we propose a localized neural network (LNN) model and then develop the LNN based estimation and inferential procedures for dependent data in both cases with quantitative/qualitative outcomes. We explore the use of identification restrictions from a nonparametric regression perspective, and establish an estimation theory for the LNN setting under a set of mild conditions. The asymptotic distributions are derived accordingly, and we show that LNN automatically eliminates the dependence of data when calculating the asymptotic variances. The finding is important, as one can easily use different types of wild bootstrap methods to obtain valid inference practically. In particular, for quantitative outcomes, the proposed LNN approach yields closed-form expressions for the estimates of some key estimators of interest. Last but not least, we examine our theoretical findings through extensive numerical studies. |
Date: | 2023–06 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2306.05593&r=big |
By: | Barham, Tania (University of Colorado, Boulder); Cadena, Brian C. (University of Colorado, Boulder); Turner, Patrick S. (University of Notre Dame) |
Abstract: | This paper estimates experimental impacts of a supported work program on employment, earnings, benefit receipt, and other outcomes. Case managers addressed employment barriers and provided targeted financial assistance while participants were eligible for 30 weeks of subsidized employment. Program access increased employment rates by 21 percent and earnings by 30 percent while participants were receiving services. Though gains attenuated after services stopped, treatment group members experienced lasting improvements in employment stability, job quality, and well-being, and we estimate the program's marginal value of public funds to be 0.64. Post-program impacts are entirely concentrated among participants whose subsidized job was followed by unsubsidized employment with their host-site employer. This decomposition result suggests that encouraging employer learning about potential match quality is the key mechanism underlying the program's impact, and additional descriptive evidence supports this interpretation. Machine learning methods reveal little treatment effect heterogeneity in a broad sample of job seekers using a rich set of baseline characteristics from a detailed application survey. We conclude that subsidized employment programs with a focus on creating permanent job matches can be beneficial to a wide variety of unemployed workers in the low-wage labor market. |
Keywords: | subsidized employment, active labor market programs, randomized controlled trial |
JEL: | J24 J68 I38 H43 |
Date: | 2023–06 |
URL: | http://d.repec.org/n?u=RePEc:iza:izadps:dp16221&r=big |
By: | Shaun D'Souza; Dheeraj Shah; Amareshwar Allati; Parikshit Soni |
Abstract: | Retail sales and price projections are typically based on time series forecasting. For some product categories, the accuracy of demand forecasts achieved is low, negatively impacting inventory, transport, and replenishment planning. This paper presents our findings based on a proactive pilot exercise to explore ways to help retailers improve forecast accuracy for such product categories. We evaluated opportunities for algorithmic interventions to improve forecast accuracy based on a sample product category, Knitwear. The Knitwear product category has a current demand forecast accuracy from non-AI models of around 60%. We explored how to improve the forecast accuracy using a rack approach. To generate forecasts, our decision model dynamically selects the best algorithm from an algorithm rack based on performance for a given state and context. Outcomes from our AI/ML forecasting model built using advanced feature engineering show an increase in the accuracy of demand forecasts for the Knitwear product category of 20 percentage points, taking the overall accuracy to 80%. Because our rack comprises algorithms that cater to a range of customer data sets, the forecasting model can be easily tailored for specific customer contexts. |
Date: | 2023–06 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2306.07305&r=big |
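The "rack" idea in the abstract above (pick the best-performing algorithm from a set of candidates for the current context) can be illustrated with a minimal sketch. The abstract does not say which algorithms or selection metric the authors use, so the three toy forecasters, the mean-absolute-error criterion, the holdout scheme, and all function names below are assumptions made purely for illustration.

```python
# Minimal sketch of selecting a forecaster from an "algorithm rack".
# All forecasters and names here are hypothetical, not the paper's.

def naive(history, horizon):
    # Repeat the last observed value.
    return [history[-1]] * horizon

def moving_average(history, horizon, window=3):
    # Repeat the mean of the last `window` observations.
    avg = sum(history[-window:]) / window
    return [avg] * horizon

def seasonal_naive(history, horizon, period=4):
    # Repeat the last full seasonal cycle of length `period`.
    return [history[-period + (h % period)] for h in range(horizon)]

def mae(forecast, actual):
    # Mean absolute error between two equal-length sequences.
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)

def select_from_rack(history, rack, holdout):
    # Score every candidate on the last `holdout` points and keep the best.
    train, actual = history[:-holdout], history[-holdout:]
    scores = {name: mae(fn(train, holdout), actual) for name, fn in rack.items()}
    best = min(scores, key=scores.get)
    return best, scores

rack = {
    "naive": naive,
    "moving_average": moving_average,
    "seasonal_naive": seasonal_naive,
}
demand = [10, 20, 30, 40, 10, 20, 30, 40, 10, 20, 30, 40]
best_name, scores = select_from_rack(demand, rack, holdout=4)
print(best_name, scores)
```

On this strictly periodic toy series the seasonal forecaster reproduces the holdout exactly, so it is the one selected; with real data the selection would typically be repeated per product and per forecasting period.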
By: | Roland-Holst, David (Asian Development Bank Institute); Karymshakov, Kamalbek (Asian Development Bank Institute); Sulaimanova, Burulcha (Asian Development Bank Institute); Sultakeev, Kadyrbek (Asian Development Bank Institute) |
Abstract: | Infrastructure has always been a fundamental driver of long-term economic growth, but in recent decades information and communication technology (ICT) has supported and accelerated the growth of the global economy in ways beyond the imagining of our ancestors. We examine the role of ICT infrastructure in facilitating labor markets' access and remittance flows for workers from the Kyrgyz Republic. Using a combination of traditional high frequency macroeconomic data and real time internet search information from Google Trends, we take a novel approach to explaining the inflow of remittances to a developing country. In the first attempt to model remittance behavior with GTI data in this context, we use a gravity model. We also attempt to account for both origin and destination labor market conditions, using Kyrgyz language search words to identify both push and pull factors affecting migrant decisions. |
Keywords: | migration; remittances; infrastructure; internet |
JEL: | F22 F24 L86 O18 O33 |
Date: | 2022–12 |
URL: | http://d.repec.org/n?u=RePEc:ris:adbiwp:1348&r=big |
By: | Ruihua Ruan; Emmanuel Bacry; Jean-François Muzy |
Abstract: | Thanks to access to labeled orders in the CAC40 data from Euronext, we are able to analyse agents' behaviours in the market based on the orders they place. In this study, we construct a self-supervised learning model using triplet loss to effectively learn the representation of agent market orders. This learned representation makes various downstream tasks feasible. In this work, we apply the K-means clustering algorithm to the learned representation vectors of agent orders to identify distinct behaviour types within each cluster. |
Date: | 2023–06 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2306.05987&r=big |
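As a brief illustration of the triplet loss named in the abstract above: in its common form it pushes an anchor embedding closer to a "positive" example than to a "negative" one by at least a margin. The sketch below uses Euclidean distance and a margin of 1.0; both choices, along with the function names and toy vectors, are assumptions for illustration and are not taken from the paper.

```python
import math

def euclidean(u, v):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Common hinge form: max(0, d(anchor, positive) - d(anchor, negative) + margin).
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy embeddings (hypothetical): the positive lies at distance 1 from the
# anchor and the negative at distance 5.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
negative = [3.0, 4.0]

# Already well separated: 1 - 5 + 1 is negative, so the loss is clipped to zero.
easy = triplet_loss(anchor, positive, negative)

# Roles swapped: 5 - 1 + 1 = 5, so the loss is positive and would drive training.
hard = triplet_loss(anchor, negative, positive)
print(easy, hard)
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so only triplets that violate the margin contribute to learning.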
By: | Espasa, Antoni; Carlomagno Real, Guillermo |
Abstract: | The paper starts by commenting on the hard tasks of data treatment (mainly cleaning, classification, and aggregation) that are required at the beginning of any analysis with big data. Subsequently, it focuses on non-financial big data time series of high frequency that for many problems are aggregated at daily, hourly, or higher frequency levels of several minutes. Then, the paper discusses possible stylized facts present in these data. In this respect, it studies the relevant seasonal cycles (daily, weekly, monthly, and annual) and analyses how, for the data in question, these cycles could be affected by weather variables and by factors due to the annual composition of the calendar. Consequently, the paper investigates the possible main characteristics of these cycles and the types of responses to the exogenous weather and calendar factors that the data could show. The shorter cycles could change along the annual cycle and interact with the exogenous variables. The modelling strategy could require regime-switching, dynamic, non-linear structures, and interactions between the factors considered. Then the paper analyses the construction of explanatory variables that could be useful for taking into account all the above peculiarities. We propose the use of the automated procedure Autometrics to discover, in the words of Prof. Hendry, a parsimonious model not dominated by any other, which is able to explain all the characteristics of the data. The model can be used for structural analysis, forecasting, and, where applicable, to build real-time quantitative macroeconomic leading indicators. Finally, the paper includes an application to the daily series of jobless claims in Chile. |
Keywords: | Aggregation; Several Seasonality (Daily, Weekly, Monthly And Annual); Complex Annual Calendar Composition; Weather Variables; Interactive Effects; Switching Regimes; Multiplicative; Dynamic and Non-Linear Structures; Designing of Exogenous Variables; Autometrics; Macroeconomic Leading Indicators; Jobless Claims |
JEL: | C01 C22 C55 |
Date: | 2023–07–04 |
URL: | http://d.repec.org/n?u=RePEc:cte:wsrepe:37746&r=big |
By: | Lorenzo Bretscher (University of Lausanne; Swiss Finance Institute, and CEPR); Jesús Fernández-Villaverde (University of Pennsylvania; National Bureau of Economic Research (NBER)); Simon Scheidegger (University of Lausanne) |
Abstract: | This paper presents a dynamic stochastic general equilibrium model of Ricardian business cycles. Our model is Ricardian because countries (or, equivalently, regions) trade to take advantage of their comparative advantages. Their relative efficiencies, however, change over time stochastically. Similarly, country-specific shocks to demand, supply, and investment efficiency induce countries to engage in intra- and intertemporal substitutions in non-durable consumption, investment, services, and trade, generating business cycles. Finally, all agents have rational expectations about the stochastic components of the model. We solve the model globally using deep neural networks and calibrate it to the U.S., Europe, and China. Our quantitative results highlight the role of trading costs in shaping the responses of the economy to different shocks. |
Keywords: | International Trade, Business Cycles, General Equilibrium, Comparative Advantage, Deep Learning |
JEL: | C45 C63 F10 F40 |
Date: | 2023–01 |
URL: | http://d.repec.org/n?u=RePEc:chf:rpseri:rp2343&r=big |
By: | Abushama, Hala; Guo, Zhe; Siddig, Khalid; Kirui, Oliver K.; Abay, Kibrom A.; You, Liangzhi |
Keywords: | REPUBLIC OF THE SUDAN; EAST AFRICA; AFRICA SOUTH OF SAHARA; AFRICA; satellite observation; data; conflicts; economic activities; nitrogen dioxide; air quality; air pollution; Sudanese Armed Forces (SAF); Rapid Support Forces (RSF) |
Date: | 2023 |
URL: | http://d.repec.org/n?u=RePEc:fpr:ssspwp:7a&r=big |
By: | Katz, Lindsay; Chong, Michael; Alexander, Monica |
Abstract: | Patterns and trends in short-term mobility are important to understand, but data required to measure such movements are often not available from traditional sources. We collected daily data from Facebook’s Advertising Platform to measure short-term mobility across all states and provinces in the United States and Canada. We show that rates of short-term travel vary substantially over geographic area, but also by age and sex, with the highest rates of travel generally for males. Strong seasonal patterns are apparent in travel to many areas, with different regions experiencing either increased travel or decreased travel over winter, depending on climate. Further, some areas appear to show marked changes in mobility patterns since the onset of the pandemic. We used the traveler rates constructed from Facebook to adjust Covid-19 mortality rates over the period July 2020 to July 2021, and showed that accounting for travelers leads to on average a 3 per cent difference in implied mortality rates, with substantial variation across demographic groups and regions. |
Date: | 2023–06–16 |
URL: | http://d.repec.org/n?u=RePEc:osf:socarx:bev4p&r=big |
By: | Nicolas Kusz (UP1 EMS - Université Paris 1 Panthéon-Sorbonne - École de Management de la Sorbonne - UP1 - Université Paris 1 Panthéon-Sorbonne, PRISM Sorbonne - Pôle de recherche interdisciplinaire en sciences du management - UP1 - Université Paris 1 Panthéon-Sorbonne); Jean-François Lemoine (ESSCA Research Lab - ESSCA - Ecole Supérieure des Sciences Commerciales d'Angers, PRISM Sorbonne - Pôle de recherche interdisciplinaire en sciences du management - UP1 - Université Paris 1 Panthéon-Sorbonne, UP1 EMS - Université Paris 1 Panthéon-Sorbonne - École de Management de la Sorbonne - UP1 - Université Paris 1 Panthéon-Sorbonne) |
Abstract: | Voice-activated virtual assistants (Google, Alexa, etc.) are increasingly part of people's daily lives. Although their usefulness and functionality are increasing, companies that have developed voice applications (i.e. Alexa Skills or Google Actions) have not yet given sufficient attention to the voice interface between man and machine, namely the synthetic voice applied to the technology. Our research aims to measure the impact of the assistant's voice on consumers' reactions. Based on 15 user interviews, the results reveal that the type of voice of the assistant influences trust in the assistant and social presence. However, in contrast to studies conducted on virtual agents and chatbots, our research indicates that realism influences trust; a synthetic voice that sounds exactly like a human degrades the perception of the voice assistant. |
Keywords: | voice assistant, trust, social presence, HMI |
Date: | 2023–05–23 |
URL: | http://d.repec.org/n?u=RePEc:hal:journl:hal-04108783&r=big |