on Big Data |
By: | Mallory Avery (Department of Economics, Monash Business School, Monash University); Andreas Leibbrandt (Department of Economics, Monash Business School, Monash University); Joseph Vecci (Gothenburg University, Vasagatan, Gothenburg, Sweden) |
Abstract: | The use of Artificial Intelligence (AI) in recruitment is rapidly increasing and drastically changing how people apply to jobs and how applications are reviewed. In this paper, we use two field experiments to study how AI in recruitment impacts gender diversity in the male-dominated technology sector, both overall and separately for labor supply and demand. We find that the use of AI in recruitment changes the gender distribution of potential hires, in some cases more than doubling the fraction of top applicants that are women. This change is generated by better outcomes for women in both supply and demand. On the supply side, we observe that the use of AI reduces the gender gap in application completion rates. Complementary survey evidence suggests that this is driven by female jobseekers believing that there is less bias in recruitment when assessed by AI instead of human evaluators. On the demand side, we find that providing evaluators with applicants’ AI scores closes the gender gap in assessments that otherwise disadvantage female applicants. Finally, we show that the AI tool would have to be substantially biased against women to result in a lower level of gender diversity than found without AI. |
Keywords: | Artificial Intelligence, Gender, Diversity, Field Experiment |
JEL: | C93 |
Date: | 2023–05 |
URL: | http://d.repec.org/n?u=RePEc:mos:moswps:2023-09&r=big |
By: | Cero, Ian (University of Rochester Medical Center); Luo, Jiebo; Falligant, John |
Abstract: | A complete science of human behavior requires a comprehensive account of the verbal behavior those humans exhibit. Existing behavioral theories of such verbal behavior have produced compelling insight into language’s underlying function, but the expansive program of research those theories deserve has unfortunately been slow to develop. We argue that the status quo’s manually implemented and study-specific coding systems are too resource intensive to be worthwhile for most behavior analysts. These high input costs in turn discourage research on verbal behavior overall. We propose lexicon-based sentiment analysis as a more modern and efficient approach to the study of human verbal products, especially naturally-occurring ones (e.g., psychotherapy transcripts, social media posts). In the present discussion, we introduce the reader to principles of sentiment analysis, highlighting its usefulness as a behavior analytic tool for the study of verbal behavior. We conclude with an outline of approaches for handling some of the more complex forms of speech, like negation, sarcasm, and speculation. The appendix also provides a worked example of how sentiment analysis could be applied to existing questions in behavior analysis, complete with code that readers can incorporate into their own work. |
Date: | 2023–05–01 |
URL: | http://d.repec.org/n?u=RePEc:osf:osfxxx:gw97k&r=big |
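Lexicon-based sentiment analysis, as described in the abstract above, scores a text by summing the valences of the words it contains. The following is a minimal sketch that assumes a tiny hypothetical lexicon (`LEXICON`) and a crude one-word negation rule (`NEGATORS`). It does not reproduce the authors' lexicons or appendix code.

```python
# Hypothetical toy lexicon: word -> valence (illustrative values only).
LEXICON = {"good": 1.0, "great": 2.0, "calm": 1.0,
           "bad": -1.0, "awful": -2.0, "anxious": -1.5}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text: str) -> float:
    """Sum lexicon valences, flipping the sign of a word right after a negator."""
    tokens = [t.strip(".,!?;:").lower() for t in text.split()]
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            value = LEXICON[tok]
            # Crude negation rule: "not anxious" counts as the opposite of "anxious".
            if i > 0 and tokens[i - 1] in NEGATORS:
                value = -value
            score += value
    return score


print(sentiment_score("I feel good today"))             # 1.0
print(sentiment_score("I am not anxious, just calm."))  # 2.5
```

Real lexicons also handle intensifiers, sarcasm and wider negation scopes, which this one-token rule ignores.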
By: | Rian Dolphin; Barry Smyth; Ruihai Dong |
Abstract: | The financial domain has proven to be a fertile source of challenging machine learning problems across a variety of tasks including prediction, clustering, and classification. Researchers can access an abundance of time-series data and even modest performance improvements can be translated into significant additional value. In this work, we consider the use of case-based reasoning for an important task in this domain, by using historical stock returns time-series data for industry sector classification. We discuss why time-series data can present some significant representational challenges for conventional case-based reasoning approaches, and in response, we propose a novel representation based on stock returns embeddings, which can be readily calculated from raw stock returns data. We argue that this representation is well suited to case-based reasoning and evaluate our approach using a large-scale public dataset for the industry sector classification task, demonstrating substantial performance improvements over several baselines using more conventional representations. |
Date: | 2023–04 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2305.00245&r=big |
By: | Marcell T. Kurbucz; Péter Pósfay; Antal Jakovác |
Abstract: | The aim of this paper is to investigate the effect of a novel method called linear law-based feature space transformation (LLT) on the accuracy of intraday price movement prediction of cryptocurrencies. To do this, the 1-minute interval price data of Bitcoin, Ethereum, Binance Coin, and Ripple between 1 January 2019 and 22 October 2022 were collected from the Binance cryptocurrency exchange. Then, 14-hour nonoverlapping time windows were applied to sample the price data. The classification was based on the first 12 hours, and the two classes were determined based on whether the closing price rose or fell after the next 2 hours. These price data were first transformed with the LLT, then they were classified by traditional machine learning algorithms with 10-fold cross-validation. Based on the results, LLT greatly increased the accuracy for all cryptocurrencies, which emphasizes the potential of the LLT algorithm in predicting price movements. |
Date: | 2023–04 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2305.04884&r=big |
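The sampling scheme in the preceding abstract uses non-overlapping 14-hour windows of 1-minute prices. The first 12 hours serve as input, and the direction over the final 2 hours serves as the label. It can be sketched as follows. The LLT transformation itself is not shown. The reference price for the label (the last price of the 12-hour input segment) is an assumption.

```python
from typing import List, Tuple

MINUTES_PER_HOUR = 60


def make_windows(prices: List[float], window_h: int = 14,
                 feature_h: int = 12) -> List[Tuple[List[float], int]]:
    """Split a 1-minute price series into non-overlapping (features, label) samples.

    features: the first `feature_h` hours of the window.
    label: 1 if the window's closing price is above the last feature price
    (assumed reference point), else 0 (fell or unchanged).
    """
    w = window_h * MINUTES_PER_HOUR
    f = feature_h * MINUTES_PER_HOUR
    samples = []
    for start in range(0, len(prices) - w + 1, w):  # step of w -> no overlap
        window = prices[start:start + w]
        features = window[:f]
        label = int(window[-1] > features[-1])
        samples.append((features, label))
    return samples


# Usage with a synthetic, steadily rising series covering two full windows.
samples = make_windows([float(p) for p in range(2 * 14 * 60)])
print(len(samples), len(samples[0][0]), samples[0][1])  # 2 720 1
```

Incomplete trailing windows are dropped, so every sample has the same input length.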
By: | Moltrecht, Bettina; Wood, Thomas Andrew (fastdatascience.com); Scopel Hoffmann, Mauricio (Universidade Federal de Santa Maria); McElroy, Eoin (University of Leicester) |
Abstract: | Motivation: Retrospective questionnaire harmonisation allows researchers to pool and analyse information from multiple data sources, thereby increasing reproducibility in science. Currently, harmonisation of questionnaires relies on a multi-step process in which items are manually matched based on expert opinion. Harmony, a new natural-language processing tool, supports researchers with fast, reproducible and multilingual measurement harmonisation. Implementation: Harmony is a web tool that can be used from any major web browser by non-technical users. General features: Users can upload questionnaire metadata in text, PDF or Excel format. Once uploaded, Harmony extracts the relevant information from the files and presents it to the user. Users choose which items they would like to match. Harmony uses cosine similarity to find the closest matches between items and presents the findings, including a percentage match score. Availability: Harmony is open-source (//github.com/harmonydata/harmony) and freely available via //app.harmonydata.org/ under the MIT licence for non-commercial use. |
Date: | 2023–04–24 |
URL: | http://d.repec.org/n?u=RePEc:osf:osfxxx:9bmf3&r=big |
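Harmony's matching step ranks item pairs by cosine similarity and reports a percentage match score. The sketch below substitutes plain word-count vectors for whatever text representation Harmony actually uses. `cosine_similarity` and `best_matches` are hypothetical helpers, not part of Harmony's API.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between word-count vectors of two questionnaire items."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[word] for word, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values()) * sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def best_matches(items_a: List[str], items_b: List[str]) -> Dict[str, Tuple[str, float]]:
    """Map each item in A to its closest item in B and a percentage match score."""
    result = {}
    for a in items_a:
        best = max(items_b, key=lambda b: cosine_similarity(a, b))
        result[a] = (best, round(100 * cosine_similarity(a, best), 1))
    return result


print(best_matches(["I feel sad"], ["I feel happy", "I often feel sad"]))
# {'I feel sad': ('I often feel sad', 86.6)}
```

Swapping the word counts for sentence embeddings would leave the similarity and ranking logic unchanged. Only the vectors fed into it would differ.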
By: | Dangxing Chen; Weicheng Ye |
Abstract: | In this paper, we study the problem of establishing the accountability and fairness of transparent machine learning models through monotonicity. Although there have been numerous studies on individual monotonicity, pairwise monotonicity is often overlooked in the existing literature. This paper studies transparent neural networks in the presence of three types of monotonicity: individual monotonicity, weak pairwise monotonicity, and strong pairwise monotonicity. As a means of achieving monotonicity while maintaining transparency, we propose the monotonic groves of neural additive models. Using empirical examples, we demonstrate that monotonicity is often violated in practice and that monotonic groves of neural additive models are transparent, accountable, and fair. |
Date: | 2023–04 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2305.00799&r=big |
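Individual monotonicity requires that raising one feature, with the others held fixed, never lowers the model's output. The helper below is a hypothetical, sampling-based check of that property for any Python callable. It can reveal violations but cannot prove monotonicity. It also does not implement the monotonic groves of neural additive models proposed in the paper.

```python
from typing import Callable, Iterable, Sequence


def violates_individual_monotonicity(
    model: Callable[[Sequence[float]], float],
    points: Iterable[Sequence[float]],
    feature: int,
    deltas: Sequence[float] = (0.1, 0.5, 1.0),
    tol: float = 1e-12,
) -> bool:
    """Return True if raising `feature` by any delta lowers the output at any point."""
    for x in points:
        base = model(x)
        for d in deltas:
            bumped = list(x)
            bumped[feature] += d  # increase one feature, hold the others fixed
            if model(bumped) < base - tol:
                return True
    return False


def score(x: Sequence[float]) -> float:
    # Toy model: increasing in x[0], decreasing in x[1].
    return 2.0 * x[0] - x[1]


grid = [[0.0, 0.0], [1.0, 2.0]]
print(violates_individual_monotonicity(score, grid, feature=0))  # False
print(violates_individual_monotonicity(score, grid, feature=1))  # True
```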
By: | Sri Rajitha Tattikota (Madras School of Economics, Chennai, India); Naveen Srinivasan (Corresponding author; Professor, Madras School of Economics, Chennai, India) |
Abstract: | In this study we compare in-sample accuracy to evaluate the performance of econometric models and machine learning (ML) models on time-series data. We explore which techniques perform better for time-series classification, predicting the state (High, Medium, or Low) of each quarter from two macroeconomic variables in the United States: inflation and unemployment. We also investigate how machine learning techniques can be incorporated into time-series analysis to improve the efficiency of the predictions. We perform a comparative analysis of various models for this classification problem. In ML, logistic regression, k-nearest neighbors, support vector machines, gradient boosting and random forest models were explored. In econometrics, autoregressive moving average and autoregressive conditional heteroskedasticity models were explored. The results show that machine learning models are superior to traditional econometric models for time-series data. The best model for unemployment data was EGARCH in econometrics and k-nearest neighbors, for both 2 and 3 states, in ML. The best model for inflation data was EGARCH in econometrics and, in ML, linear SVM for 2 states and random forest for 3 states. Even though ML models lack interpretability and clarity about their exact internal processes, they proved exceptionally accurate in their predictions. Econometric modelling would be more suitable if the aim is only to understand the effects and interpret the causal effects in the data. |
Keywords: | Inflation, Unemployment, Econometric models, Machine Learning |
JEL: | C5 E24 E27 E31 E37 |
URL: | http://d.repec.org/n?u=RePEc:mad:wpaper:2021-207&r=big |
By: | Amanda M. Michaud |
Abstract: | This paper develops a quantitative framework to study the impact of expanding Unemployment Insurance (UI) to workers earning below eligibility thresholds. A model of how UI affects welfare and labor supply is developed and calibrated with microeconomic data, including consumption. The model predicts that currently ineligible workers would choose to stay on UI longer than currently eligible workers, and it quantifies the margins behind this difference. The model is applied to the Great Recession, using machine learning to identify ineligible workers in the data, and to an actual expansion during COVID-19, using administrative data. UI duration for the newly eligible under the expansion was 1.7 times longer than for the previously eligible, but one-third shorter than the model's economic incentives predict. This suggests caution in extrapolating from the COVID-19 data, and the model is used to predict the impacts of smaller-scale expansions during non-pandemic times. |
Keywords: | Labor supply; Business cycles; Unemployment insurance |
JEL: | J65 E32 E24 J20 |
Date: | 2023–03–27 |
URL: | http://d.repec.org/n?u=RePEc:fip:fedmoi:95879&r=big |
By: | Leland Bybee |
Abstract: | I introduce a survey of economic expectations formed by querying a large language model (LLM) about various financial and macroeconomic variables, based on a sample of news articles from the Wall Street Journal between 1984 and 2021. I find the resulting expectations closely match existing surveys, including the Survey of Professional Forecasters (SPF), the American Association of Individual Investors, and the Duke CFO Survey. Importantly, I document that LLM-based expectations match many of the deviations from full-information rational expectations exhibited in these existing survey series. The LLM's macroeconomic expectations exhibit the under-reaction commonly found in consensus SPF forecasts. Additionally, its return expectations are extrapolative, disconnected from objective measures of expected returns, and negatively correlated with future realized returns. Finally, using a sample of articles from outside the LLM's training period, I find that the correlation with existing survey measures persists, indicating that these results reflect generalization by the LLM rather than memorization. My results provide evidence for the potential of LLMs to help us better understand human beliefs and navigate possible models of nonrational expectations. |
Date: | 2023–05 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2305.02823&r=big |
By: | Carlos Moreno Pérez (Banco de España); Marco Minozzo (University of Verona) |
Abstract: | We study and measure uncertainty in the minutes of the meetings of the board of governors of the Central Bank of Mexico and relate it to monetary policy variables. In particular, we construct two uncertainty indices for the Spanish version of the minutes using unsupervised machine learning techniques. The first uncertainty index is constructed exploiting Latent Dirichlet Allocation (LDA), whereas the second uses the Skip-Gram model and K-Means. We also create uncertainty indices for the three main sections of the minutes. We find that higher uncertainty in the minutes is related to an increase in inflation and money supply. Our results also show that a unit shock in uncertainty leads to changes of the same sign but different magnitude in the inter-bank interest rate and the target interest rate. We also find that a unit shock in uncertainty leads to a depreciation of the Mexican peso with respect to the US dollar in the same period of the shock, which is followed by appreciation in the subsequent period. |
Keywords: | Central Bank of Mexico, central bank communication, Latent Dirichlet Allocation, monetary policy uncertainty, Structural Vector Autoregressive model, Word Embedding |
JEL: | C32 C45 D83 E52 |
Date: | 2022–08 |
URL: | http://d.repec.org/n?u=RePEc:bde:wpaper:2229&r=big |
By: | Pierre Brugière (CEREMADE - CEntre de REcherches en MAthématiques de la DEcision - Université Paris Dauphine-PSL - PSL - Université Paris sciences et lettres - CNRS - Centre National de la Recherche Scientifique); Gabriel Turinici (CEREMADE - CEntre de REcherches en MAthématiques de la DEcision - Université Paris Dauphine-PSL - PSL - Université Paris sciences et lettres - CNRS - Centre National de la Recherche Scientifique) |
Abstract: | We present in this paper a method to compute, using generative neural networks, an estimator of the "Value at Risk" for a financial asset. The method uses a Variational Auto Encoder with an 'energy' (a.k.a. Radon-Sobolev) kernel. The result behaves according to intuition and is in line with more classical methods. |
Date: | 2023–04–24 |
URL: | http://d.repec.org/n?u=RePEc:hal:journl:hal-03880381&r=big |
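The abstract compares its VAE-based estimate with 'more classical methods'. For reference, a historical-simulation Value at Risk estimator can be sketched in a few lines. This is not the paper's VAE approach. It assumes only a sample of returns, which could equally be draws from a fitted generative model. The quantile convention used here is one of several in common use.

```python
import math
from typing import Sequence


def historical_var(returns: Sequence[float], alpha: float = 0.05) -> float:
    """Historical-simulation Value at Risk at tail probability alpha.

    Returns a loss figure L (positive for a loss) read off the empirical
    alpha-quantile of returns, using the lower order-statistic convention.
    """
    if len(returns) == 0 or not 0.0 < alpha < 1.0:
        raise ValueError("need at least one return and 0 < alpha < 1")
    ordered = sorted(returns)
    k = max(0, math.ceil(alpha * len(ordered)) - 1)  # 0-based index of the quantile
    return -ordered[k]


daily_returns = [-0.04, -0.01, 0.02, 0.03]
print(historical_var(daily_returns, alpha=0.25))  # 0.04
```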
By: | Nadeem Malibari; Iyad Katib; Rashid Mehmood |
Abstract: | Applications of Reinforcement Learning (RL) in Finance Technology (Fintech) have attracted considerable attention lately. Reinforcement learning, through its broad competence and proficiency, has delivered remarkable results in the field of Fintech. The objective of this systematic survey is to perform an exploratory study of the relationship between reinforcement learning and Fintech, highlighting prediction accuracy, complexity, scalability, risks, profitability and performance. Major uses of reinforcement learning in finance or Fintech include portfolio optimization, credit risk reduction, investment capital management, profit maximization, effective recommendation systems, and better price-setting strategies. Several studies have addressed the actual contribution of reinforcement learning to the performance of financial institutions. The studies included in this survey were published from 2018 onward. The survey follows the PRISMA technique, which focuses on the reporting of reviews and is based on a checklist and a four-phase flow diagram. The survey indicates that RL-based strategies in Fintech perform considerably better than other state-of-the-art algorithms. The present work discusses the use of reinforcement learning algorithms in diverse decision-making challenges in Fintech and concludes that organizations dealing with finance can benefit greatly from robo-advising, smart order channelling, market making, hedging and options pricing, portfolio optimization, and optimal execution. |
Date: | 2023–04 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2305.07466&r=big |
By: | Patrick Bajari; Zhihao Cen; Victor Chernozhukov; Manoj Manukonda; Jin Wang; Ramon Huerta; Junbo Li; Ling Leng; George Monokroussos; Suhas Vijaykunar; Shan Wan |
Abstract: | Accurate, real-time measurements of price index changes using electronic records are essential for tracking inflation and productivity in today’s economic environment. We develop empirical hedonic models that can process large amounts of unstructured product data (text, images, prices, quantities) and output accurate hedonic price estimates and derived indices. To accomplish this, we generate abstract product attributes, or “features,” from text descriptions and images using deep neural networks, and then use these attributes to estimate the hedonic price function. Specifically, we convert textual information about the product to numeric features using large language models based on transformers, trained or fine-tuned using product descriptions, and convert the product image to numeric features using a residual network model. To produce the estimated hedonic price function, we again use a multi-task neural network trained to predict a product’s price in all time periods simultaneously. To demonstrate the performance of this approach, we apply the models to Amazon’s data for first-party apparel sales and estimate hedonic prices. The resulting models have high predictive accuracy, with R² ranging from 80% to 90%. Finally, we construct the AI-based hedonic Fisher price index, chained at the year-over-year frequency. We contrast the index with the CPI and other electronic indices. |
Date: | 2023–04–26 |
URL: | http://d.repec.org/n?u=RePEc:azt:cemmap:08/23&r=big |
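A chained Fisher index multiplies period-to-period links. Each link is the geometric mean of a Laspeyres and a Paasche index, F = sqrt(L × P). The sketch below uses observed prices and quantities for a fixed set of matched products. The paper instead feeds in hedonic price estimates from its neural networks, which are not modelled here.

```python
import math
from typing import List, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def fisher_link(p0, q0, p1, q1) -> float:
    """Fisher price index between two periods for a fixed set of matched products."""
    laspeyres = _dot(p1, q0) / _dot(p0, q0)  # base-period quantity weights
    paasche = _dot(p1, q1) / _dot(p0, q1)    # current-period quantity weights
    return math.sqrt(laspeyres * paasche)


def chained_fisher(prices: Sequence[Sequence[float]],
                   quantities: Sequence[Sequence[float]]) -> List[float]:
    """Chain consecutive Fisher links into an index normalised to 1.0 in period 0."""
    index = [1.0]
    for t in range(1, len(prices)):
        link = fisher_link(prices[t - 1], quantities[t - 1], prices[t], quantities[t])
        index.append(index[-1] * link)
    return index


# Two products whose prices double each year while quantities shift.
prices = [[1.0, 2.0], [2.0, 4.0], [4.0, 8.0]]
quantities = [[1.0, 1.0], [3.0, 1.0], [2.0, 5.0]]
print(chained_fisher(prices, quantities))  # [1.0, 2.0, 4.0]
```

Chaining lets the basket's weights update every period, which suits product assortments that turn over quickly.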
By: | Clarissa Laura Maria Spiess Bru (Paderborn University) |
Abstract: | Particularly in the wine industry, information asymmetry between consumers and wine producers regarding product characteristics leads prospective buyers to treat available information, such as market prices, professional reviews, and ratings, as reliable indicators of product quality when making purchase decisions. Nevertheless, few studies consider the textual dimension and content of wine reviews. This study explores the impact of reviews and of a defined language inventory, such as articles, verbs, or adjectives, on wine prices and ratings. Using 83,067 reviews from the professional wine critics' magazine "The Wine Enthusiast," the study conducts a seemingly unrelated regression (SUR) estimation, quantile regression (plots), and review text analysis with the content analysis tool LIWC-22 to examine the simultaneous impact of linguistic categories on wine prices and ratings. The results indicate that a higher word count and more positive sentiment in the tasting note are significantly positively associated with a higher wine rating. Further, specific categories have a statistically significant positive impact on ratings but a negligible effect on wine prices. Consequently, a subsequent instrumental variables estimation is conducted to control for endogeneity and test for the effect of reviews on wine prices, revealing a significant positive influence. These findings could have practical strategic implications for wine market communication, marketing, and purchasing decisions, as vintners and prospective buyers could associate linguistic indicators in reviews with wine quality. |
Keywords: | Professional Reviews, Information Asymmetry, Text Analysis, Prices and Ratings, Quantile regression, Seemingly Unrelated Regression, Instrumental Variables Estimation |
JEL: | C31 L66 M30 O13 Q13 |
Date: | 2023–05 |
URL: | http://d.repec.org/n?u=RePEc:pdn:dispap:105&r=big |
By: | Simon Briole (CEE-M - Centre d'Economie de l'Environnement - Montpellier - CNRS - Centre National de la Recherche Scientifique - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement - Institut Agro Montpellier - Institut Agro - Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement - UM - Université de Montpellier); Augustin Colette (INERIS - Institut National de l'Environnement Industriel et des Risques); Emmanuelle Lavaine (CEE-M - Centre d'Economie de l'Environnement - Montpellier - CNRS - Centre National de la Recherche Scientifique - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement - Institut Agro Montpellier - Institut Agro - Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement - UM - Université de Montpellier) |
Abstract: | While a sharp decline in air pollution has been documented during early Covid-19 lockdown periods, the stability and homogeneity of this effect are still under debate. Building on pollution data with a very high level of resolution, this paper estimates the impact of lockdown policies on PM2.5 exposure in France over the whole year 2020. Our analyses highlight a surprising and undocumented increase in exposure to particulate pollution during lockdown periods. This result is observed during both lockdown periods, in early spring and late fall, and is robust to several identification strategies and model specifications. Combining administrative datasets with machine learning techniques, this paper also highlights strong spatial heterogeneity in lockdown effects, especially according to long-term pollution exposure. |
Keywords: | air pollution, PM2.5, lockdown, spatial heterogeneity, machine learning, Covid-19 |
Date: | 2023–04–28 |
URL: | http://d.repec.org/n?u=RePEc:hal:wpceem:hal-04084912&r=big |
By: | Andrea L. Eisfeldt; Gregor Schubert; Miao Ben Zhang |
Abstract: | What are the effects of recent advances in Generative AI on the value of firms? Our study offers a quantitative answer to this question for U.S. publicly traded companies based on the exposures of their workforce to Generative AI. Our novel firm-level measure of workforce exposure to Generative AI is validated by data from earnings calls, and has intuitive relationships with firm and industry-level characteristics. Using Artificial Minus Human portfolios that are long firms with higher exposures and short firms with lower exposures, we show that higher-exposure firms earned excess returns that are 0.4% higher on a daily basis than returns of firms with lower exposures following the release of ChatGPT. Although this release was generally received by investors as good news for more exposed firms, there is wide variation across and within industries, consistent with the substantive disruptive potential of Generative AI technologies. |
JEL: | E0 G0 |
Date: | 2023–05 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:31222&r=big |
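An Artificial Minus Human portfolio is a long-short portfolio sorted on workforce exposure to Generative AI. The following minimal sketch computes one period's return. It assumes equal weights and a top-versus-bottom split at fraction `frac`. Both are hypothetical choices, since the abstract does not state the paper's sorting or weighting rules.

```python
from statistics import mean
from typing import Dict


def amh_return(exposure: Dict[str, float], returns: Dict[str, float],
               frac: float = 0.2) -> float:
    """One-period return of a long-short portfolio sorted on AI exposure.

    Long the top `frac` of firms by exposure, short the bottom `frac`,
    equal-weighted within each leg (hypothetical choices).
    """
    ranked = sorted(exposure, key=exposure.get)  # lowest exposure first
    n = max(1, int(len(ranked) * frac))
    if 2 * n > len(ranked):
        raise ValueError("legs would overlap; lower frac or add firms")
    low, high = ranked[:n], ranked[-n:]
    return mean(returns[f] for f in high) - mean(returns[f] for f in low)


exposure = {"A": 0.1, "B": 0.5, "C": 0.9, "D": 0.3, "E": 0.7}
daily = {"A": 0.01, "B": 0.00, "C": 0.03, "D": -0.01, "E": 0.02}
print(amh_return(exposure, daily, frac=0.4))  # long C and E, short A and D
```

Averaging such daily long-short returns around an event date, such as the ChatGPT release, is one way to compare high- and low-exposure firms.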
By: | Carlos Moreno Pérez (Banco de España); Marco Minozzo (University of Verona) |
Abstract: | This paper investigates the reactions of US financial markets to press news from January 2019 to 1 May 2020. To this end, we deduce the content and sentiment of the news by developing apposite indices from the headlines and snippets of The New York Times, using unsupervised machine learning techniques. In particular, we use Latent Dirichlet Allocation to infer the content (topics) of the articles, and Word Embedding (implemented with the Skip-gram model) and K-Means to measure their sentiment (uncertainty). In this way, we arrive at the definition of a set of daily topic-specific uncertainty indices. These indices are then used to find explanations for the behaviour of the US financial markets by implementing a batch of EGARCH models. In substance, we find that two topic-specific uncertainty indices, one related to COVID-19 news and the other to trade war news, explain the bulk of the movements in the financial markets from the beginning of 2019 to end-April 2020. Moreover, we find that the topic-specific uncertainty index related to the economy and the Federal Reserve is positively related to the financial markets, meaning that our index is able to capture actions of the Federal Reserve during periods of uncertainty. |
Keywords: | COVID-19, EGARCH, Latent Dirichlet Allocation, investor attention, uncertainty indices, Word Embedding |
JEL: | D81 G15 C58 C45 |
Date: | 2022–08 |
URL: | http://d.repec.org/n?u=RePEc:bde:wpaper:2228&r=big |
By: | Jiaju Miao; Pawel Polak |
Abstract: | Asset-specific factors are commonly used to forecast financial returns and quantify asset-specific risk premia. Using various machine learning models, we demonstrate that the information contained in these factors leads to even larger economic gains in terms of forecasts of sector returns and the measurement of sector-specific risk premia. To capitalize on the strong predictive results of individual models for the performance of different sectors, we develop a novel online ensemble algorithm that learns to optimize predictive performance. The algorithm continuously adapts over time to determine the optimal combination of individual models by solely analyzing their most recent prediction performance. This makes it particularly suited for time series problems, rolling window backtesting procedures, and systems of potentially black-box models. We derive the optimal gain function, express the corresponding regret bounds in terms of the out-of-sample R-squared measure, and derive the optimal learning rate for the algorithm. Empirically, the new ensemble outperforms both individual machine learning models and their simple averages in providing better measurements of sector risk premia. Moreover, it allows for performance attribution of different factors across various sectors, without conditioning on a specific model. Finally, by utilizing monthly predictions from our ensemble, we develop a sector rotation strategy that significantly outperforms the market. The strategy remains robust against various financial factors, periods of financial distress, and conservative transaction costs. Notably, the strategy's efficacy persists over time, exhibiting consistent improvement throughout an extended backtesting period and yielding substantial profits during the economic turbulence of the COVID-19 pandemic. |
Date: | 2023–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2304.09947&r=big |
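The abstract describes an online ensemble that reweights its member models using only their recent forecast performance. The paper's gain function and derived learning rate are not given here. The sketch below therefore substitutes a standard exponential-weights (Hedge-style) combination with squared-error loss. It adds a hypothetical `decay` factor that fades older losses.

```python
import math
from typing import List, Sequence


class ExpWeightsEnsemble:
    """Exponentially weighted online combination of K forecasters (Hedge-style)."""

    def __init__(self, n_models: int, learning_rate: float = 1.0, decay: float = 1.0):
        self.eta = learning_rate
        self.decay = decay  # < 1.0 fades old losses so recent performance dominates
        self.log_w = [0.0] * n_models  # log-weights, for numerical stability

    def weights(self) -> List[float]:
        m = max(self.log_w)
        raw = [math.exp(lw - m) for lw in self.log_w]
        total = sum(raw)
        return [r / total for r in raw]

    def predict(self, forecasts: Sequence[float]) -> float:
        # Combined forecast = weighted average of the individual forecasts.
        return sum(w * f for w, f in zip(self.weights(), forecasts))

    def update(self, forecasts: Sequence[float], realized: float) -> None:
        # Penalize each model by its latest squared forecast error.
        for k, f in enumerate(forecasts):
            self.log_w[k] = self.decay * self.log_w[k] - self.eta * (f - realized) ** 2


ens = ExpWeightsEnsemble(n_models=2, learning_rate=5.0, decay=0.9)
print(ens.predict([1.0, 3.0]))  # equal weights -> 2.0
ens.update([1.0, 3.0], realized=1.0)  # model 0 was exact, model 1 missed by 2
print(ens.weights())  # almost all weight now on model 0
```

Because the weights depend only on realized forecast errors, the member models can be treated as black boxes, which fits the rolling-window setting described in the abstract.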
By: | Dam, John; Rickon, Henry |
Abstract: | This literature review aims to elucidate the nuanced relationship between data openness and innovation within the field of Artificial Intelligence (AI). As the significance of AI continues to expand across various sectors, understanding the role of open data in fostering innovation becomes increasingly critical. Through this review, we systematically explore and analyze the wealth of existing literature on the topic. We address key concepts, theoretical perspectives, and empirical findings, shedding light on the multi-dimensional facets of data openness, including accessibility and usability, and their impact on AI innovation. Furthermore, the review highlights the practical implications and potential strategies to leverage data openness in propelling AI innovation. We also identify existing gaps and limitations in current literature, suggesting avenues for future research. This comprehensive review contributes to the evolving discourse in AI studies, offering valuable insights to researchers, data managers, and AI practitioners alike. |
Date: | 2023–05–15 |
URL: | http://d.repec.org/n?u=RePEc:osf:thesis:a3zwu&r=big |
By: | Gutierrez-Lythgoe, Antonio |
Abstract: | Research in the field of Artificial Intelligence has made considerable progress in recent years, demonstrating its effectiveness in predicting and classifying discrete decisions. However, these advances have been relatively underutilized in economic research due to the lack of links with economic theories that explain the decision-making process of agents. In this paper, we propose a microeconomic framework for decision trees, a machine learning technique, to establish a more solid connection with economic theory and encourage its application in the field of discrete choice. To do so, we rely on data from the 2019 EU-SILC for Spain. Through comparison with a conventional multinomial logit model, we demonstrate the usefulness of this economic perspective for studying the sociodemographic factors associated with self-employment in Spain. The results suggest that incorporating economic foundations can significantly improve the accuracy of predictions and the ability to draw individual sociodemographic profiles for self-employment. |
Keywords: | Artificial Intelligence; Machine Learning; Microeconomics; Self-employment; multinomial logit |
JEL: | C45 C53 J24 J62 L26 |
Date: | 2023–05 |
URL: | http://d.repec.org/n?u=RePEc:pra:mprapa:117275&r=big |