By: | Bo Cowgill (Columbia University) |
Abstract: | Where should better learning technology (such as machine learning or AI) improve decisions? I develop a model of decision-making in which better learning technology is complementary with experimentation. Noisy, inconsistent decision-making introduces quasi-experimental variation into training datasets, which complements learning. The model makes heterogeneous predictions about when machine learning algorithms can improve on biased human decisions. These algorithms can remove human biases exhibited in historical training data, but only if the human training decisions are sufficiently noisy; otherwise, the algorithms will codify or exacerbate existing biases. Algorithms need only a small amount of noise to correct biases that cause large productivity distortions. As the amount of noise increases, machine learning can correct both large and increasingly small productivity distortions. The theoretical conditions necessary to completely eliminate bias are extreme and unlikely to appear in real datasets. The model provides theoretical microfoundations for why learning from biased historical datasets may lead to a decrease (if not a full elimination) of bias, as has been documented in several empirical settings. The model makes heterogeneous predictions about the use of human expertise in machine learning. Expert-labeled training datasets may be suboptimal if experts are insufficiently noisy, as prior research suggests. I discuss implications for regulation, labor markets, and business strategy. (An illustrative simulation sketch follows this entry.)
Keywords: | machine learning, training data, decision algorithm, decision-making, human biases |
JEL: | C44 C45 D80 O31 O33 |
Date: | 2019–08 |
URL: | http://d.repec.org/n?u=RePEc:upj:weupjo:19-309&r=all |
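The mechanism described above, noisy and biased screening decisions generating quasi-experimental variation, can be illustrated with a minimal simulation. The setup below (candidate quality, a group-level penalty, Gaussian decision noise, and the chosen parameter values) is purely hypothetical and not the author's model; it only shows that once noise is large enough, outcomes become observable for candidates the screener is biased against, which is the variation a learner needs to undo the bias.

```python
import numpy as np

# Hypothetical illustration (not Cowgill's actual model): a biased, noisy
# screener hires when perceived quality exceeds a threshold. Noise makes the
# screener occasionally hire candidates it is biased against, so their true
# outcomes enter the training data ("quasi-experimental" variation).
rng = np.random.default_rng(0)
n = 200_000
group = rng.integers(0, 2, n)            # 1 = group the screener is biased against
quality = rng.normal(0.0, 1.0, n)        # true productivity
bias = 0.5                               # penalty applied to group 1
threshold = 0.0

for noise_sd in [0.0, 0.25, 0.5, 1.0]:
    noise = rng.normal(0.0, noise_sd, n)
    perceived = quality - bias * group + noise
    hired = perceived > threshold
    # Share of hired group-1 candidates whose true quality is below the *biased*
    # bar: with zero noise this share is 0, so the data never reveal the bias.
    marginal = hired & (group == 1) & (quality < threshold + bias)
    share = marginal.sum() / max(hired[group == 1].sum(), 1)
    print(f"noise_sd={noise_sd:.2f}  share of group-1 hires below the biased bar: {share:.3f}")
```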
By: | David Arnold; Will S. Dobbie; Peter Hull |
Abstract: | There is growing concern that the rise of algorithmic decision-making can lead to discrimination against legally protected groups, but measuring such algorithmic discrimination is often hampered by a fundamental selection challenge. We develop new quasi-experimental tools to overcome this challenge and measure algorithmic discrimination in the setting of pretrial bail decisions. We first show that the selection challenge reduces to the challenge of measuring four moments: the mean latent qualification of white and Black individuals and the race-specific covariance between qualification and the algorithm’s treatment recommendation. We then show how these four moments can be estimated by extrapolating quasi-experimental variation across as-good-as-randomly assigned decision-makers. Estimates from New York City show that a sophisticated machine learning algorithm discriminates against Black defendants, even though defendant race and ethnicity are not included in the training data. The algorithm recommends releasing white defendants before trial at an 8 percentage point (11 percent) higher rate than Black defendants with identical potential for pretrial misconduct, with this unwarranted disparity explaining 77 percent of the observed racial disparity in algorithmic recommendations. We find a similar level of algorithmic discrimination with regression-based recommendations, using a model inspired by a widely used pretrial risk assessment tool. |
JEL: | C26 J15 K42 |
Date: | 2020–12 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:28222&r=all |
By: | Andrew J Tiffin |
Abstract: | Machine learning tools are well known for their success in prediction. But prediction is not causation, and causal discovery is at the core of most questions concerning economic policy. Recently, however, the literature has focused more on issues of causality. This paper gently introduces some leading work in this area, using a concrete example—assessing the impact of a hypothetical banking crisis on a country’s growth. By enabling consideration of a rich set of potential nonlinearities, and by allowing individually-tailored policy assessments, machine learning can provide an invaluable complement to the skill set of economists within the Fund and beyond. |
Keywords: | machine learning; financial crises; exchange rate flexibility; machine-learning literature; instrumental-variables approach; treatment variable; confidence interval; ML technique
Date: | 2019–11–01 |
URL: | http://d.repec.org/n?u=RePEc:imf:imfwpa:2019/228&r=all |
By: | Racine Ly; Fousseini Traore; Khadim Dia |
Abstract: | This paper applies a recurrent neural network (RNN) method to forecast cotton and oil prices. We show how these new tools from machine learning, particularly Long Short-Term Memory (LSTM) models, complement traditional methods. Our results show that machine learning methods fit the data reasonably well but do not systematically outperform classical methods such as Autoregressive Integrated Moving Average (ARIMA) models in out-of-sample forecasts. However, averaging the forecasts from the two types of models provides better results than either method alone. For cotton, the Root Mean Squared Error (RMSE) of the average forecast was 0.21 percent and 21.49 percent lower than that of the ARIMA and the LSTM, respectively. For oil, forecast averaging does not provide improvements in terms of RMSE. We suggest using a forecast averaging method and extending our analysis to a wide range of commodity prices. (A minimal forecast-averaging sketch follows this entry.)
Date: | 2021–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2101.03087&r=all |
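A minimal sketch of the forecast-averaging idea in the abstract above: given out-of-sample forecasts from an ARIMA model and an LSTM (here just stand-in arrays, not fitted models), the combined forecast is their simple average and each is evaluated by RMSE. All numbers are invented placeholders.

```python
import numpy as np

# Forecast averaging: compare RMSE of two individual forecasts and their mean.
def rmse(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.sqrt(np.mean((actual - forecast) ** 2))

actual       = np.array([72.1, 73.0, 71.8, 74.2, 75.0])   # hypothetical prices
arima_fcst   = np.array([71.5, 72.4, 72.5, 73.1, 74.0])
lstm_fcst    = np.array([72.9, 73.8, 71.0, 75.0, 76.2])
average_fcst = 0.5 * (arima_fcst + lstm_fcst)

for name, f in [("ARIMA", arima_fcst), ("LSTM", lstm_fcst), ("Average", average_fcst)]:
    print(f"{name:8s} RMSE = {rmse(actual, f):.3f}")
```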
By: | Alexandre Miot |
Abstract: | Adversarial samples have drawn a lot of attention from the machine learning community in the past few years. An adversarial sample is an artificial data point obtained through an imperceptible modification of a genuine sample and designed to mislead a model. Surprisingly, little financial research has addressed this topic from a concrete trading point of view. We show that such adversarial samples can be implemented in a trading environment and have a negative impact on certain market participants. This could have far-reaching implications for financial markets, from both a trading and a regulatory point of view. (A generic adversarial-perturbation sketch follows this entry.)
Date: | 2020–12 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2101.03128&r=all |
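For readers unfamiliar with adversarial samples, the standard fast-gradient-sign (FGSM) construction is sketched below on a toy, untrained "trading signal" classifier. This is the textbook recipe, not the specific attack studied in the paper above, and the network and feature vector are arbitrary placeholders.

```python
import torch
import torch.nn as nn

# FGSM: perturb the input in the direction of the loss gradient's sign.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))  # untrained toy net
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(1, 10, requires_grad=True)        # one feature vector (e.g. recent returns)
y = torch.tensor([1])                             # label the attacker wants to move away from

loss = loss_fn(model(x), y)
loss.backward()

epsilon = 0.05                                    # imperceptible perturbation budget
x_adv = (x + epsilon * x.grad.sign()).detach()    # FGSM step

print("original prediction:   ", model(x).argmax(dim=1).item())
print("adversarial prediction:", model(x_adv).argmax(dim=1).item())
```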
By: | S. Van Cranenburgh; S. Wang; A. Vij; F. Pereira; J. Walker |
Abstract: | Since its inception, the choice modelling field has been dominated by theory-driven models. The recent emergence and growing popularity of machine learning models offer an alternative, data-driven approach. Machine learning models, techniques and practices could help overcome problems and limitations of the current theory-driven modelling paradigm, e.g. the ad hoc search for the optimal model specification and theory-driven choice models' inability to work with text and image data. However, despite the potential of machine learning to improve choice modelling practices, the choice modelling field has been somewhat hesitant to embrace it. The aim of this paper is to facilitate (further) integration of machine learning in the choice modelling field. To achieve this objective, we make the case that such integration is beneficial for the choice modelling field and shed light on where its benefits can be found. Specifically, we take the following approach. First, we clarify the similarities and differences between the two modelling paradigms. Second, we provide a literature overview on the use of machine learning for choice modelling. Third, we reinforce the strengths of the current theory-driven modelling paradigm and compare it with the machine learning modelling paradigm. Fourth, we identify opportunities for embracing machine learning for choice modelling, while recognising the strengths of the current theory-driven paradigm. Finally, we put forward a vision of the future relationship between theory-driven choice models and machine learning.
Date: | 2021–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2101.11948&r=all |
By: | Maximilien Germain (LPSM, EDF R&D); Huyên Pham (LPSM, FiME Lab); Xavier Warin (EDF R&D, FiME Lab)
Abstract: | This paper presents machine learning techniques and deep reinforcement learning-based algorithms for the efficient resolution of nonlinear partial differential equations and dynamic optimization problems arising in investment decisions and derivative pricing in financial engineering. We survey recent results in the literature, present new developments, notably in the fully nonlinear case, and compare the different schemes illustrated by numerical tests on various financial applications. We conclude by highlighting some future research directions.
Date: | 2021–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2101.08068&r=all |
By: | Maximilien Germain (LPSM (UMR_8001) - Laboratoire de Probabilités, Statistiques et Modélisations - SU - Sorbonne Université - CNRS - Centre National de la Recherche Scientifique - UP - Université de Paris; EDF R&D); Huyên Pham (LPSM (UMR_8001) - Laboratoire de Probabilités, Statistiques et Modélisations - UPD7 - Université Paris Diderot - Paris 7 - SU - Sorbonne Université - CNRS - Centre National de la Recherche Scientifique; FiME Lab - Laboratoire de Finance des Marchés d'Energie - EDF R&D - CREST - Université Paris Dauphine-PSL - PSL - Université Paris sciences et lettres); Xavier Warin (EDF R&D; FiME Lab - Laboratoire de Finance des Marchés d'Energie - EDF R&D - CREST - Université Paris Dauphine-PSL - PSL - Université Paris sciences et lettres)
Abstract: | This paper presents machine learning techniques and deep reinforcement learning-based algorithms for the efficient resolution of nonlinear partial differential equations and dynamic optimization problems arising in investment decisions and derivative pricing in financial engineering. We survey recent results in the literature, present new developments, notably in the fully nonlinear case, and compare the different schemes illustrated by numerical tests on various financial applications. We conclude by highlighting some future research directions.
Date: | 2021–01–19 |
URL: | http://d.repec.org/n?u=RePEc:hal:wpaper:hal-03115503&r=all |
By: | Fajar, Muhammad |
Abstract: | International tourism is one indicator of tourism development. Tourism development is important for the national economy, since tourism can boost foreign exchange, create business opportunities, and provide employment opportunities. Forecasts of future foreign tourist numbers are used as an input for planning tourism strategies and programs. In this paper, the hybrid Singular Spectrum Analysis - Extreme Learning Machine (SSA-ELM) is used to forecast the number of foreign tourists. The data used are the numbers of foreign tourists from January 1980 to December 2017, taken from Badan Pusat Statistik (Statistics Indonesia). This research concludes that the hybrid SSA-ELM performs very well at forecasting the number of foreign tourists, as shown by a MAPE of 4.91 percent on eight out-of-sample observations. (A minimal ELM sketch follows this entry.)
Keywords: | foreign tourist, singular spectrum analysis, extreme learning machine |
JEL: | C22 C45 C51 E17 |
Date: | 2019–10–31 |
URL: | http://d.repec.org/n?u=RePEc:pra:mprapa:105044&r=all |
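A minimal extreme learning machine (ELM) sketch for a univariate series: a random hidden layer, least-squares output weights, one-step-ahead forecasts on a hold-out of eight observations, and MAPE. The SSA decomposition step of the hybrid SSA-ELM is omitted here, and the series is synthetic, purely for illustration.

```python
import numpy as np

# ELM on lagged values of a synthetic series; evaluate with out-of-sample MAPE.
rng = np.random.default_rng(1)
series = 100 + 10 * np.sin(np.arange(120) / 6) + rng.normal(0, 1, 120)

lags, hidden = 12, 50
X = np.column_stack([series[i:i + len(series) - lags] for i in range(lags)])
y = series[lags:]
X_train, y_train, X_test, y_test = X[:-8], y[:-8], X[-8:], y[-8:]   # 8 obs held out

W = rng.normal(size=(lags, hidden))          # random input weights (not trained)
b = rng.normal(size=hidden)
H_train = np.tanh(X_train @ W + b)           # hidden-layer activations
beta = np.linalg.pinv(H_train) @ y_train     # least-squares output weights

pred = np.tanh(X_test @ W + b) @ beta
mape = np.mean(np.abs((y_test - pred) / y_test)) * 100
print(f"out-of-sample MAPE: {mape:.2f}%")
```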
By: | Mehran Taghian; Ahmad Asadi; Reza Safabakhsh |
Abstract: | A wide variety of deep reinforcement learning (DRL) models have recently been proposed to learn profitable investment strategies. The rules learned by these models outperform previous strategies, especially in high-frequency trading environments. However, the quality of the features extracted from a long-term sequence of raw instrument prices has been shown to greatly affect the performance of the trading rules learned by these models. Employing a neural encoder-decoder structure to extract informative features from complex input time series has proved very effective in other popular tasks, such as neural machine translation and video captioning, in which models face a similar problem. The encoder-decoder framework extracts highly informative features from a long sequence of prices while learning how to generate outputs based on the extracted features. In this paper, a novel end-to-end model based on the neural encoder-decoder framework combined with DRL is proposed to learn single-instrument trading strategies from a long sequence of raw prices of the instrument. The proposed model consists of an encoder, a neural structure responsible for learning informative features from the input sequence, and a decoder, a DRL model responsible for learning profitable strategies based on the features extracted by the encoder. The parameters of the encoder and the decoder structures are learned jointly, which enables the encoder to extract features fitted to the task of the decoder DRL. In addition, the effects of different encoder structures and various forms of the input sequences on the performance of the learned strategies are investigated. Experimental results show that the proposed model outperforms other state-of-the-art models in highly dynamic environments. (A schematic encoder-decoder sketch follows this entry.)
Date: | 2021–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2101.03867&r=all |
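A schematic version of the encoder-decoder split described in the abstract above: a recurrent encoder compresses a long window of raw prices into a feature vector, and a small policy "decoder" maps that vector to action probabilities (buy/hold/sell). The layer sizes and the GRU choice are assumptions for illustration, not the authors' exact architecture, and the DRL training loop is omitted.

```python
import torch
import torch.nn as nn

# Encoder: raw price window -> feature vector. Decoder: feature vector -> action probs.
class PriceEncoder(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)

    def forward(self, prices):                 # prices: (batch, seq_len, 1)
        _, h_last = self.gru(prices)
        return h_last.squeeze(0)               # (batch, hidden)

class PolicyDecoder(nn.Module):
    def __init__(self, hidden=64, n_actions=3):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(hidden, 32), nn.ReLU(), nn.Linear(32, n_actions))

    def forward(self, features):
        return torch.softmax(self.head(features), dim=-1)

encoder, decoder = PriceEncoder(), PolicyDecoder()
window = torch.randn(8, 200, 1)                # batch of 8 price windows, 200 steps each
action_probs = decoder(encoder(window))        # both parts would be trained jointly
print(action_probs.shape)                      # torch.Size([8, 3])
```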
By: | Lopez, Claude; Roh, Hyeongyul; Butler, Brittney |
Abstract: | This report aims to identify disease categories with the highest economic and social costs and a low level of R&D investment. First, we combine data sets on diseases’ medical expenses, patient counts, death rates, and research funding. We then use text mining and machine learning methods to identify gaps between diseases’ social and economic costs and research investments in therapeutic areas. We find that only 25 percent of disease categories causing high economic and social costs received more than 1 percent of National Institutes of Health (NIH) funding over 12 years. In addition, rare diseases imposing high medical costs per patient collected 0.3 percent of research investments on average. A disease’s cost and impact on society are challenging to assess. Our results highlight that the different measures may lead to different conclusions if considered separately: a disease can have a very high cost per patient but a low death rate. They also show that merging information across data sets becomes more complicated when the sources do not focus on diseases specifically. Our analysis reveals that a formalized procedure to define the correspondence between data sets is needed to successfully develop a metric that allows a systematic assessment of diseases’ cost, impact on society, and investment level. Furthermore, simplifying the large-dimensional decision space will only be useful to the questions at hand if there is a clear order of priorities; in our case, costs came first, then funding. These priorities dictate how to merge the data sets.
Keywords: | Machine Learning, Disease Cost, NIH Funding, Health Innovation Gaps |
JEL: | C8 I1 I14 I18 |
Date: | 2021–01 |
URL: | http://d.repec.org/n?u=RePEc:pra:mprapa:105215&r=all |
By: | Nicholas Economides (Professor of Economics, NYU Stern School of Business, New York, New York 10012); Ioannis Lianos (Professor of Global Competition Law and Public Policy, Faculty of Laws, University College London, and Hellenic Competition Commission) |
Abstract: | We discuss how the acquisition of private information by default, without compensation, by digital platforms such as Google and Facebook creates a market failure and can be grounds for antitrust enforcement. To avoid the market failure, the default in the collection of personal information has to be changed by law to “opt-out.” This would allow the creation of a vibrant market for the sale of users’ personal information to digital platforms. Assuming that all parties are perfectly informed, users are better off in this functioning market and digital platforms are worse off compared to the default opt-in. However, just switching to a default opt-out will not restore competition to the but-for world, because of the immense market power and bargaining power towards an individual user that digital platforms have acquired. Digital platforms can use this power to reduce the compensation that a user would receive for his/her personal information compared to a competitive world. Additionally, the digital platforms are likely much better informed than the user in this market and can use this information to disadvantage users in the market for personal information.
Keywords: | personal information; Internet search; Google; Facebook; digital; privacy; restrictions of competition; exploitation; market failure; hold up; merger; abuse of a dominant position; unfair commercial practices; excessive data extraction; self-determination; behavioral manipulation; remedies; portability; opt-in; opt-out. |
JEL: | K21 L1 L12 L4 L41 L5 L86 L88 |
Date: | 2020–09 |
URL: | http://d.repec.org/n?u=RePEc:net:wpaper:2102&r=all |
By: | Arunav Das |
Abstract: | UK GDP data are published with a lag of more than a month and are often revised for prior periods. This paper contemplates breaking away from the historic GDP measure towards a more dynamic method that uses bank account, cheque, and credit card payment transactions as possible predictors for a faster, near real-time measure of GDP. Historical time series data available from various public sources on payment types, values, volumes, and nominal UK GDP were used for this analysis. Low-value payments were selected for a simple ordinary least squares (OLS) linear regression, with mixed results regarding the explanatory power of the model and its reliability as measured through the distribution and variance of the residuals. Future research could expand this work using datasets split by periods of economic shocks to further test the OLS method, or explore a generalised least squares method or an autoregression on the GDP time series itself. (A minimal OLS sketch follows this entry.)
Date: | 2021–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2101.06478&r=all |
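A minimal sketch of the OLS exercise described above: regress nominal GDP on low-value payment values and inspect the fit and the residuals. The figures are invented placeholders, not the UK payment statistics or ONS series used in the paper.

```python
import pandas as pd
import statsmodels.api as sm

# Simple OLS of GDP on low-value payments; summary() reports R^2 and coefficients,
# and the residuals can be inspected as discussed in the abstract.
df = pd.DataFrame({
    "gdp":                [520, 528, 535, 531, 544, 551, 560, 558],  # hypothetical GBP bn
    "low_value_payments": [310, 315, 322, 318, 330, 336, 342, 340],  # hypothetical GBP bn
})

X = sm.add_constant(df["low_value_payments"])
model = sm.OLS(df["gdp"], X).fit()
print(model.summary())          # explanatory power and coefficient estimates
print(model.resid.describe())   # residual distribution and spread
```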
By: | Ujwal Kandi; Sasikanth Gujjula; Venkatesh Buddha; V S Bhagavan |
Abstract: | As more and more data are created every day, data analysis can help make better decisions, and data generated in financial markets are no different. Here we examine how the global economy is affected by market sentiment influenced by the micro-blogging data (tweets) of the American President Donald Trump. The news feed is gathered from The Guardian and Bloomberg for the period between December 2016 and October 2019 and is used to identify the tweets that potentially influenced the markets, as measured by changes in equity indices.
Date: | 2021–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2101.03205&r=all |
By: | Sybrand Brekelmans; Georgios Petropoulos |
Abstract: | We study the nature and geography of occupational change in 24 European Union countries from 2002 to 2016. We evaluate how the composition of skills in the labour force depends on new technologies enabled by artificial intelligence and machine learning, and on institutional variables including educational attainment, labour legislation and product market regulations. We find that on average, EU countries have been through an upgrading of the skills of their... |
Date: | 2020–06 |
URL: | http://d.repec.org/n?u=RePEc:bre:wpaper:37146&r=all |
By: | Emre Alper; Michal Miktus |
Abstract: | Higher digital connectivity is expected to bring opportunities to leapfrog development in sub-Saharan Africa (SSA). Experience within the region demonstrates that if there is an adequate digital infrastructure and a supportive business environment, new forms of business spring up and create jobs for the educated as well as the less educated. The paper first confirms the global digital divide using the unsupervised K-means clustering algorithm. Next, it derives a composite digital connectivity index, in the spirit of De Muro-Mazziotta-Pareto, for about 190 economies. Descriptive analysis shows that the majority of SSA countries lag in digital connectivity, specifically in infrastructure, internet usage, and knowledge. Finally, using fractional logit regressions, we document that a better business-enabling and regulatory environment, financial access, and urbanization are associated with higher digital connectivity. (An illustrative clustering sketch follows this entry.)
Keywords: | information technology in revenue administration; infrastructure; population and demographics; machine learning; income; digital connectivity; EDAI SSA distribution; account ownership; ICT indicators database
Date: | 2019–09–27 |
URL: | http://d.repec.org/n?u=RePEc:imf:imfwpa:2019/210&r=all |
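An illustrative K-means step analogous to the digital-divide clustering described above: countries are grouped on a few connectivity indicators after standardization. The indicator names and the four rows of values are invented placeholders, not the paper's data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Cluster countries on standardized connectivity indicators.
indicators = np.array([
    # mobile subs/100, internet users %, fixed broadband/100, secondary enrolment %
    [120, 85, 35, 95],
    [110, 75, 28, 90],
    [ 80, 30,  2, 55],
    [ 70, 20,  1, 45],
])

X = StandardScaler().fit_transform(indicators)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # e.g. [1 1 0 0]: two connectivity clusters, suggesting a digital divide
```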
By: | Tae Wan Kim; Matloob Khushi |
Abstract: | Portfolio optimization is one of the fields that has attracted the most attention from machine learning research. Many researchers have attempted to solve this problem using deep reinforcement learning, owing to its inherent ability to handle the properties of financial markets. However, most of these approaches are hardly applicable to real-world trading, since they ignore or extremely simplify the realistic constraints of transaction costs. These constraints have a significantly negative impact on portfolio profitability. In our research, a conservative level of transaction fees and slippage is considered for a realistic experiment. To enhance performance under these constraints, we propose a novel Deterministic Policy Gradient with 2D Relative-attentional Gated Transformer (DPGRGT) model. Applying learnable relative positional embeddings for the time and asset axes, the model better understands the peculiar structure of financial data in the portfolio optimization domain. Also, gating layers and layer reordering are employed for stable convergence of Transformers in reinforcement learning. In our experiment using 20 years of U.S. stock market data, our model outperformed baseline models and demonstrated its effectiveness.
Date: | 2020–12 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2101.03138&r=all |
By: | Geoffrey Parker; Georgios Petropoulos; Marshall Van Alstyne |
Abstract: | Platform ecosystems rely on economies of scale, data-driven economies of scope, high-quality algorithmic systems and strong network effects that typically promote winner-take-most markets. Some platform firms have grown rapidly and their merger and acquisition strategies have been very important factors in their growth. Market dominance by big platforms has led to competition concerns that are difficult to assess with current merger policy tools. In this paper, we examine the acquisition... |
Date: | 2021–01 |
URL: | http://d.repec.org/n?u=RePEc:bre:wpaper:40796&r=all |
By: | Edvard Bakhitov; Amandeep Singh |
Abstract: | Recent advances in the literature have demonstrated that standard supervised learning algorithms are ill-suited for problems with endogenous explanatory variables. To correct for the endogeneity bias, many variants of nonparametric instrumental variable regression methods have been developed. In this paper, we propose an alternative algorithm called boostIV that builds on the traditional gradient boosting algorithm and corrects for the endogeneity bias. The algorithm is very intuitive and resembles an iterative version of the standard 2SLS estimator. Moreover, our approach is data driven, meaning that the researcher does not have to take a stance on either the form of the target function approximation or the choice of instruments. We demonstrate that our estimator is consistent under mild conditions. We carry out extensive Monte Carlo simulations to demonstrate the finite sample performance of our algorithm compared to other recently developed methods. We show that boostIV is at worst on par with the existing methods and on average significantly outperforms them. (A heavily simplified two-stage sketch follows this entry.)
Date: | 2021–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2101.06078&r=all |
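The sketch below conveys only the flavor of combining instruments with boosting; it is emphatically not the authors' boostIV algorithm. On simulated data with a confounder, the endogenous regressor is first projected on the instruments, and a gradient-boosting learner is then fit on that projected, exogenous variation, whereas a naive regression on the raw regressor is biased.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor

# Simulated data: instruments z, confounder u, true structural effect of x on y is 2.
rng = np.random.default_rng(0)
n = 5_000
z = rng.normal(size=(n, 2))                        # instruments
u = rng.normal(size=n)                             # unobserved confounder
x = z[:, 0] + 0.5 * z[:, 1] + u + rng.normal(size=n)
y = 2.0 * x + 2.0 * u + rng.normal(size=n)

naive_slope = LinearRegression().fit(x.reshape(-1, 1), y).coef_[0]
x_hat = LinearRegression().fit(z, x).predict(z)    # first stage: project x on instruments
boosted = GradientBoostingRegressor().fit(x_hat.reshape(-1, 1), y)

grid = np.array([[-1.0], [0.0], [1.0]])
print(f"naive OLS slope (biased upward): {naive_slope:.2f}")
print("IV-boosted predictions at x = -1, 0, 1:", np.round(boosted.predict(grid), 2))
# Predictions should lie near the structural values 2*x, i.e. roughly [-2, 0, 2],
# up to estimation noise, while the naive slope overstates the effect of 2.
```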
By: | Bertani, Filippo; Raberto, Marco; Teglio, Andrea; Cincotti, Silvano |
Abstract: | Digital technologies have experienced considerable development over the last thirty years, radically changing our economy and lives. In particular, the advent of new intangible technologies, represented by software, artificial intelligence, and deep learning algorithms, has deeply affected our production systems, from manufacturing to services, thanks also to further improvements in tangible computational assets. Investments in digital technologies have been increasing in most developed countries, raising the issue of forecasting potential scenarios and consequences deriving from this new technological wave. The contribution of this paper is both theoretical and related to model design. First, we present a new production function based on the concept of organizational units. Then, we enrich the macroeconomic model Eurace by integrating this new function into the production processes in order to investigate the potential effects of digital technology innovation at both the micro and macro level.
Keywords: | Elasticity of substitution, Elasticity augmenting approach, Digital transformation, Agent-based economics, Organizational unit. |
JEL: | C63 O33 |
Date: | 2021–01–15 |
URL: | http://d.repec.org/n?u=RePEc:pra:mprapa:105326&r=all |
By: | Simon Berset; Martin Huber; Mark Schelker |
Abstract: | We study the impact of fiscal revenue shocks on local fiscal policy. We focus on the very volatile revenues from the immovable property gains tax in the canton of Zurich, Switzerland, and analyze fiscal behavior following large and rare positive and negative revenue shocks. We apply causal machine learning strategies and implement the post-double-selection LASSO estimator to identify the causal effect of revenue shocks on public finances. We show that, overall, local policymakers predominantly smooth fiscal shocks. However, we also find some patterns consistent with fiscal conservatism, where positive shocks are smoothed while negative ones are mitigated by spending cuts. (A generic post-double-selection sketch follows this entry.)
Date: | 2021–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2101.07661&r=all |
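A generic post-double-selection LASSO recipe (in the Belloni-Chernozhukov-Hansen spirit), as mentioned in the abstract above. The data here are simulated placeholders, not the Zurich municipal finance data, and the variable names are only illustrative.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV

# Simulated example: a revenue shock affects spending, with many candidate controls.
rng = np.random.default_rng(0)
n, p = 1_000, 50
controls = rng.normal(size=(n, p))
shock = 0.5 * controls[:, 0] + rng.normal(size=n)             # treatment
spending = 0.3 * shock + 1.0 * controls[:, 0] + 0.5 * controls[:, 1] + rng.normal(size=n)

# Step 1: LASSO of the outcome on the controls. Step 2: LASSO of the treatment on
# the controls. Step 3: OLS of the outcome on the treatment plus the union of
# controls selected in either step.
sel_y = LassoCV(cv=5).fit(controls, spending).coef_ != 0
sel_d = LassoCV(cv=5).fit(controls, shock).coef_ != 0
selected = sel_y | sel_d

X = sm.add_constant(np.column_stack([shock, controls[:, selected]]))
fit = sm.OLS(spending, X).fit()
print(f"estimated effect of the revenue shock: {fit.params[1]:.3f}  (true value 0.3)")
```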
By: | Tiziana Carpi; Airo Hino; Stefano Maria Iacus; Giuseppe Porro |
Abstract: | This study analyzes the impact of the COVID-19 pandemic on subjective well-being as measured through Twitter data indicators for Japan and Italy. It turns out that, overall, subjective well-being dropped by 11.7% for Italy and 8.3% for Japan in the first nine months of 2020 compared to the last two months of 2019, and even more compared to the historical mean of the indexes. Through a data science approach we try to identify the possible causes of this drop by considering several explanatory variables, including climate and air quality data, numbers of COVID-19 cases and deaths, the Facebook COVID and flu symptoms global survey, Google Trends data and coronavirus-related searches, Google mobility data, policy intervention measures, economic variables and their Google Trends proxies, as well as health and stress proxy variables based on big data. We show that a simple static regression model is not able to capture the complexity of well-being, and we therefore propose a dynamic elastic net approach to show how different groups of factors may impact well-being in different periods, even over a short time span, revealing further country-specific aspects. Finally, a structural equation modeling analysis addresses the causal relationships among the COVID-19 factors and subjective well-being, showing that, overall, prolonged mobility restrictions, flu and COVID-like symptoms, economic uncertainty, social distancing, and news about the pandemic have negative effects on subjective well-being. (A rolling elastic net sketch follows this entry.)
Date: | 2021–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2101.07695&r=all |
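A rough sketch of the "dynamic" idea in the abstract above: re-estimate an elastic net on rolling windows so that the selected drivers of a well-being index can change over time. This is a generic rolling re-estimation, not the authors' exact specification; the variable names and data are random placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNetCV

# Re-fit an elastic net on successive 90-day windows and print the coefficients.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=270, freq="D")
X = pd.DataFrame(rng.normal(size=(270, 5)), index=dates,
                 columns=["mobility", "cases", "deaths", "searches", "stringency"])
wellbeing = -0.8 * X["mobility"] - 0.4 * X["cases"] + rng.normal(0, 0.5, 270)

window = 90
for start in range(0, 270 - window + 1, 90):
    Xw, yw = X.iloc[start:start + window], wellbeing.iloc[start:start + window]
    coefs = ElasticNetCV(cv=5).fit(Xw, yw).coef_
    print(dates[start].date(), dict(zip(X.columns, np.round(coefs, 2))))
```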
By: | Fajar, Muhammad; Prasetyo, Octavia Rizky; Nonalisa, Septiarida; Wahyudi, Wahyudi |
Abstract: | The outbreak of COVID-19 is having a significant impact on the contraction of Indonesia's economy, accompanied by an increase in unemployment. This study aims to predict the unemployment rate during the COVID-19 pandemic by making use of the Google Trends query share for the keyword “phk” (work termination) and the historical series from the official labor force survey conducted by Badan Pusat Statistik (Statistics Indonesia). The method used is ARIMAX. The results show that the ARIMAX model has good forecasting capability, as indicated by a MAPE of 13.46%. The forecasts show that during the COVID-19 pandemic period (March to June 2020) the open unemployment rate is expected to increase, ranging from 5.46% to 5.70%. The ARIMAX forecasts of the open unemployment rate during the COVID-19 period are consistent and close to reality, because using the Google Trends query index as an exogenous variable captures the current conditions of an unfolding phenomenon. This implies that a time series model built on the relationship between variables can reflect the current phenomenon if the required data are available in real time, rather than only as past historical data. (A bare-bones ARIMAX sketch follows this entry.)
Keywords: | Unemployment, Google Trends, PHK, ARIMAX |
JEL: | C22 C53 E24 E37 E39 J6 J64 |
Date: | 2020–11–30 |
URL: | http://d.repec.org/n?u=RePEc:pra:mprapa:105042&r=all |
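A bare-bones ARIMAX sketch analogous to the abstract above: an unemployment-rate series is modelled with a Google Trends query share as an exogenous regressor. Both series are invented placeholders and the (1,1,1) order is arbitrary, chosen only for illustration, not the specification estimated in the paper.

```python
import numpy as np
import statsmodels.api as sm

# ARIMAX via SARIMAX with an exogenous regressor, then a two-step-ahead forecast.
rng = np.random.default_rng(0)
unemp  = 5 + 0.3 * np.sin(np.arange(24) / 3) + rng.normal(0, 0.1, 24)   # survey rounds
trends = rng.uniform(20, 80, 24)                                        # "phk" query share

model = sm.tsa.statespace.SARIMAX(unemp, exog=trends, order=(1, 1, 1)).fit(disp=False)
future_trends = np.array([[85.0], [90.0]])         # assumed spike in "phk" searches
forecast = model.forecast(steps=2, exog=future_trends)
print(np.round(forecast, 2))
```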