on Big Data |
By: | Dylan Brewer (School of Economics, Georgia Institute of Technology); Alyssa Carlson (Department of Economics, University of Missouri) |
Abstract: | We study approaches for adjusting machine learning methods when the training sample differs from the prediction sample on unobserved dimensions. The machine learning literature predominantly assumes selection only on observed dimensions, and the common remedies for selection on observables are to re-weight by, or control for, variables that influence selection. Simulation results show that selection on unobservables increases mean squared prediction error for common machine learning algorithms. Common machine learning practices such as re-weighting or controlling for variables that influence selection into the training or testing sample often worsen sample selection bias. We suggest two control-function approaches that remove the effects of selection bias before training and find that they reduce mean squared prediction error in simulations with a high degree of selection. We apply these approaches to predicting the incumbent's vote share in gubernatorial elections using previously observed re-election bids. We find that ignoring selection on unobservables leads to substantially higher predicted vote shares for the incumbent than when the control-function approach is used. |
Keywords: | sample selection, machine learning, control function, inverse probability weighting |
JEL: | C13 C31 C55 D72 |
Date: | 2021–09 |
URL: | http://d.repec.org/n?u=RePEc:umc:wpaper:2114&r= |
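The control-function idea in the Brewer and Carlson abstract can be illustrated with the textbook Heckman-style correction: estimate a selection equation first, then give the learner the inverse Mills ratio as an extra feature so it can absorb the selection effect. The sketch below is a minimal illustration under stated assumptions, not the authors' two approaches: it assumes a first-stage probit index for each training row has already been estimated, and the names `add_control_function` and `selection_index` are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def inverse_mills(z: float) -> float:
    """Inverse Mills ratio phi(z) / Phi(z) for a unit observed in the training sample."""
    return _STD_NORMAL.pdf(z) / _STD_NORMAL.cdf(z)


def add_control_function(features, selection_index):
    """Append the inverse Mills ratio to each training row.

    `selection_index` holds hypothetical first-stage probit indices (w_i' gamma),
    one per row; estimating gamma itself is outside this sketch.
    """
    return [list(row) + [inverse_mills(z)] for row, z in zip(features, selection_index)]


# Hypothetical usage: two training rows with one feature each.
augmented = add_control_function([[1.0], [2.0]], [0.0, 1.5])
```

The augmented rows can then be passed to any learner; the correction term shrinks toward zero as a unit's selection probability approaches one.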
By: | Alexander Jaax; Frédéric Gonzales; Annabelle Mourougane |
Abstract: | The increasing importance of services trade in the global economy contrasts with the lack of timely data to monitor recent developments. The nowcasting models developed in this paper aim to provide insights into current changes in total services trade, as recorded in monthly statistics of the G7 countries. Combining machine-learning techniques and dynamic factor models, the methodology exploits traditional data and Google Trends search data. No single model outperforms the others, but a weighted average of the best models combining machine-learning with dynamic factor models seems to be a promising avenue. The best models improve one-step-ahead predictive performance relative to a simple benchmark by 30-35% on average across G7 countries and trade flows. Nowcasting models are estimated to have captured about 67% of the fall in services exports due to the COVID-19 shock and 60% of the fall in imports on average across G7 economies. |
Keywords: | Dynamic factor models, G7 economies, Machine learning |
JEL: | C4 C22 F17 |
Date: | 2021–09–23 |
URL: | http://d.repec.org/n?u=RePEc:oec:traaab:253-en&r= |
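The abstract reports that a weighted average of the best models looks promising but does not state the weighting scheme. One common choice, assumed here rather than taken from the paper, is to weight each model by the inverse of its past mean squared forecast error. The model names and error values below are hypothetical.

```python
def inverse_mse_weights(errors_by_model):
    """Weight each model by 1/MSE of its past one-step-ahead errors, normalised to sum to 1."""
    inverse_mse = {}
    for name, errors in errors_by_model.items():
        mse = sum(e * e for e in errors) / len(errors)
        inverse_mse[name] = 1.0 / mse
    total = sum(inverse_mse.values())
    return {name: v / total for name, v in inverse_mse.items()}


def combine(forecasts, weights):
    """Weighted average of the models' current nowcasts."""
    return sum(weights[name] * value for name, value in forecasts.items())


# Hypothetical past errors and current nowcasts for two models.
weights = inverse_mse_weights({"ml_model": [1.0, -1.0], "dfm": [2.0, -2.0]})
nowcast = combine({"ml_model": 10.0, "dfm": 20.0}, weights)
```

Here the model with a quarter of the other's MSE receives four times its weight (0.8 vs 0.2), so the combined nowcast is about 12.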
By: | Nathan Ratledge; Gabriel Cadamuro; Brandon De la Cuesta; Matthieu Stigler; Marshall Burke |
Abstract: | In many regions of the world, sparse data on key economic outcomes inhibits the development, targeting, and evaluation of public policy. We demonstrate how advancements in satellite imagery and machine learning can help ameliorate these data and inference challenges. In the context of an expansion of the electrical grid across Uganda, we show how a combination of satellite imagery and computer vision can be used to develop local-level livelihood measurements appropriate for inferring the causal impact of electricity access on livelihoods. We then show how ML-based inference techniques deliver more reliable estimates of the causal impact of electrification than traditional alternatives when applied to these data. We estimate that grid access improves village-level asset wealth in rural Uganda by 0.17 standard deviations, more than doubling the growth rate over our study period relative to untreated areas. Our results provide country-scale evidence on the impact of a key infrastructure investment, and provide a low-cost, generalizable approach to future policy evaluation in data sparse environments. |
JEL: | O11 O18 Q01 Q4 |
Date: | 2021–09 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:29237&r= |
By: | Jean-Charles Bricongne; Baptiste Meunier; Sylvain Pouget |
Abstract: | While official statistics provide lagged and aggregate information on the housing market, extensive information is publicly available on real-estate websites. By scraping these websites for the UK on a daily basis, this paper extracts a large database from which we build timelier and highly granular indicators. One original feature of the dataset is that it provides the sellers’ perspective, allowing us to compute innovative indicators of the housing market such as the number of newly posted offers or how prices fluctuate over time for existing offers. Matching selling prices in our dataset with transacted prices from the notarial database using machine learning techniques allows us to measure the negotiation margin of buyers, an innovation to the literature. During the Covid-19 crisis, these indicators demonstrate the freezing of the market and the “wait-and-see” behaviour of sellers. They also show that prices increased in rural regions after the lockdown but continued to decline in London. |
Keywords: | Housing, Real-time, Big Data, Web Scraping, High Frequency, United Kingdom |
JEL: | E01 R30 |
Date: | 2021 |
URL: | http://d.repec.org/n?u=RePEc:bfr:banfra:827&r= |
By: | Rüdiger Frey; Verena Köck |
Abstract: | In recent years a large literature on deep learning based methods for the numerical solution of partial differential equations has emerged; results for integro-differential equations, on the other hand, are scarce. In this paper we study deep neural network algorithms for solving linear and semilinear parabolic partial integro-differential equations with boundary conditions in high dimension. To show the viability of our approach we discuss several case studies from insurance and finance. |
Date: | 2021–09 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2109.11403&r= |
By: | Isaac K. Ofori (University of Insubria, Varese, Italy) |
Abstract: | A conspicuous lacuna in the literature on Sub-Saharan Africa (SSA) is the lack of clarity on variables key for driving and predicting inclusive growth. To address this, I train the machine learning algorithms for the Standard lasso, the Minimum Schwarz Bayesian Information Criterion (Minimum BIC) lasso, and the Adaptive lasso to study patterns in a dataset comprising 97 covariates of inclusive growth for 43 SSA countries. First, the regularization results show that only 13 variables are key for driving inclusive growth in SSA. Further, the results show that out of the 13, the poverty headcount (US$1.90) matters most. Second, the findings reveal that ‘Minimum BIC lasso’ is best for predicting inclusive growth in SSA. Policy recommendations are provided in line with the region’s green agenda and the coming into force of the African Continental Free Trade Area. |
Keywords: | Clean Fuel, Economic Growth, Machine Learning, Lasso, Sub-Saharan Africa, Regularization, Poverty. |
JEL: | C01 C14 C51 C52 C55 F43 O4 O55 |
Date: | 2021–01 |
URL: | http://d.repec.org/n?u=RePEc:abh:wpaper:21/044&r= |
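The Ofori abstract finds the "Minimum BIC lasso" best for prediction. The sketch below shows one common way to build such an estimator: fit the lasso by cyclic coordinate descent over a grid of penalties and keep the fit with the lowest BIC, counting non-zero coefficients as degrees of freedom. It is an assumption-laden illustration, not the author's implementation: it assumes centred covariates and outcome (no intercept), and the data and penalty grid are hypothetical. In practice covariates would also be standardised first.

```python
import math


def soft_threshold(rho, lam):
    """Soft-thresholding operator S(rho, lam) used by the lasso coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of rows; columns and y are assumed centred (no intercept).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - Xb, starting from b = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (feature j added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def bic(X, y, b):
    """BIC = n*log(RSS/n) + df*log(n), with df = number of non-zero coefficients."""
    n = len(y)
    rss = sum((y[i] - sum(x * c for x, c in zip(X[i], b))) ** 2 for i in range(n))
    df = sum(1 for c in b if c != 0.0)
    return n * math.log(max(rss / n, 1e-12)) + df * math.log(n)


def min_bic_lasso(X, y, lambdas):
    """Fit the lasso for each candidate penalty and keep the fit with the lowest BIC."""
    fits = []
    for lam in lambdas:
        b = lasso_cd(X, y, lam)
        fits.append((bic(X, y, b), lam, b))
    return min(fits, key=lambda fit: fit[0])
```

Larger penalties set more coefficients exactly to zero, which is how the lasso performs the variable selection that leaves only a subset of the 97 covariates.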
By: | Lin Li |
Abstract: | Financial trading aims to build profitable strategies to make wise investment decisions in the financial market. It has long attracted interest in the machine learning community. This paper proposes trading financial assets automatically using feature preprocessing techniques and the Recurrent Reinforcement Learning (RRL) algorithm. The strategy starts from technical indicators extracted from assets' market information. These technical indicators are then preprocessed by Principal Component Analysis (PCA) and the Discrete Wavelet Transform (DWT) and finally fed into the RRL algorithm, which does the trading. Extensive empirical evidence shows that the proposed strategy is not only effective and robust in its performance but can also mitigate the drawbacks of the initial trading using RRL. |
Date: | 2021–09 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2109.05283&r= |
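To make the DWT preprocessing step in the Lin Li abstract concrete, the sketch below applies a single-level orthonormal Haar transform to an indicator series and hard-thresholds the detail coefficients before reconstructing. The wavelet family, decomposition level and thresholding rule are assumptions; the abstract does not specify them, and the PCA step is not shown. The `denoise` helper and its `threshold` parameter are hypothetical.

```python
import math


def haar_dwt(x):
    """One level of the orthonormal Haar DWT; the series length must be even."""
    if len(x) % 2:
        raise ValueError("series length must be even")
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild the original series from both coefficient sets."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out


def denoise(x, threshold):
    """Zero out small detail (high-frequency) coefficients, then reconstruct."""
    approx, detail = haar_dwt(x)
    detail = [d if abs(d) > threshold else 0.0 for d in detail]  # hard thresholding
    return haar_idwt(approx, detail)
```

For example, `denoise([1.0, 1.2, 3.0, 3.0], 0.5)` smooths the small jump between the first two points, returning approximately `[1.1, 1.1, 3.0, 3.0]`.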
By: | Jens Ludwig; Sendhil Mullainathan |
Abstract: | Algorithms (in some form) are already widely used in the criminal justice system. We draw lessons from this experience for what is to come for the rest of society as machine learning diffuses. We find economists and other social scientists have a key role to play in shaping the impact of algorithms, in part through improving the tools used to build them. |
JEL: | C01 C54 C55 D8 H0 K0 |
Date: | 2021–09 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:29267&r= |
By: | Jascha Buchhorn; Berthold U. Wigger |
Abstract: | Using neural networks, the present study replicates previous results on the prediction of student dropout obtained with decision trees and logistic regressions. For this purpose, multilayer perceptrons are trained on the same data as in the initial study. It is shown that neural networks lead to a significant improvement in the prediction of students at risk. As early as the end of the first semester, potential dropouts can be identified with a probability of 95 percent. |
Keywords: | neural networks, student dropout, replication study |
Date: | 2021 |
URL: | http://d.repec.org/n?u=RePEc:ces:ceswps:_9300&r= |
By: | Pedro Salas-Rojo; Juan Gabriel Rodríguez |
Abstract: | This paper explores how inheritances influence the distribution of wealth (financial, non-financial and total) in four developed, but substantially different, countries: the United States, Canada, Italy and Spain. Following the inequality of opportunity literature, we first group individuals into types based on the inheritances received. Then, we estimate the between-types wealth inequality to approximate the part of overall wealth inequality explained by inheritances. After showing that traditional approaches lead to non-robust and arbitrary results, we apply Machine Learning methods to overcome this limitation. Among the available computing methods, we find random forests to be the most precise algorithm. Using this technique, we find that inheritances explain more than 65% of wealth inequality (Gini coefficient) in the US and Spain, and more than 40% in Italy and Canada. Finally, for the US and Italy, given the availability of parental education, we also include this circumstance in the analysis and study its interaction with inheritances. We observe that the effect of inheritances is more prominent in the middle of the wealth distribution, while parental education is more important for the asset-poor. |
Keywords: | Wealth inequality; inheritances; Machine Learning; inequality of opportunity; parental education |
JEL: | C60 D31 D63 G51 |
Date: | 2020–12 |
URL: | http://d.repec.org/n?u=RePEc:lis:lwswps:32&r= |
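The between-types measure in the Salas-Rojo and Rodríguez abstract can be sketched as follows: replace each person's wealth by the mean wealth of their type, compute the Gini coefficient of that smoothed distribution, and divide by the Gini of actual wealth. This sketch takes the type labels as given; how the paper builds types with random forests is not reproduced, and the data below are hypothetical. The Gini here uses the mean-absolute-difference formula and assumes a positive mean.

```python
from collections import defaultdict


def gini(values):
    """Gini coefficient: sum of |x_i - x_j| over all ordered pairs / (2 * n^2 * mean)."""
    n = len(values)
    mean = sum(values) / n
    if mean <= 0:
        raise ValueError("Gini requires a positive mean")
    total_abs_diff = sum(abs(a - b) for a in values for b in values)
    return total_abs_diff / (2 * n * n * mean)


def between_type_share(wealth, types):
    """Share of total Gini explained by differences between type means."""
    sums, counts = defaultdict(float), defaultdict(int)
    for w, t in zip(wealth, types):
        sums[t] += w
        counts[t] += 1
    smoothed = [sums[t] / counts[t] for t in types]  # each person gets their type's mean
    return gini(smoothed) / gini(wealth)


# Hypothetical example: two inheritance types with two people each.
share = between_type_share([1.0, 3.0, 5.0, 7.0], ["low", "low", "high", "high"])
```

In this toy example the overall Gini is 0.3125 and the smoothed Gini is 0.25, so type membership accounts for 80% of measured inequality.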
By: | Leogrande, Angelo; Costantiello, Alberto |
Abstract: | We estimate the relationships between innovation and human resources in Europe using the European Innovation Scoreboard of the European Commission for 36 countries over the period 2010-2019. We estimate panel data models with fixed effects, random effects, pooled OLS, dynamic panel and WLS. We find that Human Resources is positively associated with “Basic-school entrepreneurial education and training”, “Employment MHT manufacturing KIS services”, “Employment share Manufacturing (SD)”, “Lifelong learning”, “New doctorate graduates”, “R&D expenditure business sector”, “R&D expenditure public sector” and “Tertiary education”. Our results also show that “Human Resources” is negatively associated with “Government procurement of advanced technology products”, “Medium and high-tech product exports”, “SMEs innovating in-house” and “Venture capital”. In addition, we perform a cluster analysis with the k-means algorithm and find three clusters. The clustering shows that Central and Northern European countries have higher levels of Human Resources, while Southern and Eastern European countries have very low levels. Finally, we use seven machine learning algorithms to predict the value of Human Resources in European countries using data for the period 2014-2021, and we show that linear regression performs best. |
Keywords: | Innovation and Invention: Processes and Incentives, Management of Technological Innovation and R&D, Technological Change: Choices and Consequences, Diffusion Processes, Intellectual Property and Intellectual Capital, Open Innovation, Government Policy. |
JEL: | O30 O31 O32 O33 O34 O38 |
Date: | 2021–09 |
URL: | http://d.repec.org/n?u=RePEc:pra:mprapa:109749&r= |
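The k-means clustering mentioned in the Leogrande and Costantiello abstract alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points, until nothing changes. Below is a minimal sketch of that standard (Lloyd's) algorithm with a user-supplied initialisation; the points stand in for hypothetical country-level indicator vectors and are not the Scoreboard data.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, init=None, n_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = [list(c) for c in (init if init is not None else points[:k])]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it was
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D indicator vectors forming three well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0), (21.0, 0.0)]
labels, centroids = kmeans(points, 3, init=[(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)])
```

Because k-means only finds a local optimum, results depend on the initial centroids; in practice it is usually run from several random starts.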
By: | Lin William Cong; Ke Tang; Bing Wang; Jingyuan Wang |
Abstract: | We build a deep-learning-based SEIR-AIM model integrating the classical Susceptible-Exposed-Infectious-Removed epidemiology model with forecast modules of infection, community mobility, and unemployment. Through linking Google's multi-dimensional mobility index to economic activities, public health status, and mitigation policies, our AI-assisted model captures the populace's endogenous response to economic incentives and health risks. Besides serving as an effective predictive tool, the model reveals, using data from the United States, that the long-term effective reproduction number of COVID-19 equilibrates around one before mass vaccination. We identify a "policy frontier" and find reopening schools and workplaces to be the most effective. We also quantify protestors' employment-value-equivalence of the Black Lives Matter movement and find its public health impact to be negligible. |
Date: | 2021–09 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2109.10009&r= |
By: | Solveig Flaig; Gero Junike |
Abstract: | In this research, we show how to extend existing approaches that use generative adversarial networks (GANs) as economic scenario generators (ESG) to a whole internal model, with enough risk factors to model the full range of investments of an insurance company over the one-year horizon required by Solvency 2. For validation of this approach as well as for optimisation of the GAN architecture, we develop new performance measures and provide a consistent, data-driven framework. Finally, we demonstrate that the results of a GAN-based ESG are similar to those of regulatory-approved internal models in Europe. Therefore, GAN-based models can be seen as an assumption-free, data-driven alternative approach to market risk modelling. |
Date: | 2021–09 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2109.10072&r= |
By: | HOCUK Seyit; KUMAR Pradeep; MULDER Joris; PRUFER Patricia |
Abstract: | Economies of scale in data aggregation is a widely accepted concept. It refers to improved prediction accuracy when the number of observations on variables in a dataset increases. By contrast, the notion of economies of scope in data is more ambiguous. The classic economic interpretation refers to cost savings in the re-use of data for other purposes. Here, we introduce another interpretation of economies of scope, in data aggregation. It refers to improvements in prediction accuracy when the number of complementary variables in a dataset increases, not the number of observations on these variables. If economies of scope in data aggregation exist, the value of aggregated data pools of complementary variables is higher than the sum of values of the disaggregated datasets because more and better insights can be extracted from the aggregated dataset. Economies of scope in data aggregation are controversial in the economic research literature, partly because there is so far little empirical evidence for their existence. The objective of this project is to fill that gap. For this purpose we create an aggregated data pool of health and health-related variables. We run machine learning models on this data pool to predict health outcomes. We gradually increase the number of independent variables in the model to estimate the magnitude of economies of scope in the aggregation of variables. Our findings confirm the existence of economies of scope in the aggregation of health and health-related variables in order to improve the prediction accuracy of health outcomes. The evidence is based on a nation-wide household survey and medical consumption data from the Netherlands. |
Keywords: | Economies of scope in data aggregation, data economics, health economics |
Date: | 2021–09 |
URL: | http://d.repec.org/n?u=RePEc:ipt:decwpa:202101&r= |
By: | Pihnastyi, Oleh; Sytnikova, Anastasiya |
Abstract: | In this paper, the results of the model for forecasting the flow parameters of a distributed transport system of the conveyor type are briefly considered. It is shown that the model of the transport system based on the neural network can be successfully applied to predict the flow parameters of the transport system which consists of a very large number of sections. |
Keywords: | conveyor; forecasting model; neural network |
JEL: | C02 C14 C25 C44 D24 L23 Q21 |
Date: | 2021–09–03 |
URL: | http://d.repec.org/n?u=RePEc:pra:mprapa:109770&r= |
By: | Knut Are Aastveit; Tuva Marie Fastbø; Eleonora Granziera; Kenneth Sæterhagen Paulsen; Kjersti Næss Torstensen |
Abstract: | We use a novel data set covering all domestic debit card transactions in physical terminals by Norwegian households to nowcast quarterly Norwegian household consumption. These card payments data are free of sampling errors and are available weekly without delays, providing a valuable early indicator of household spending. To account for mixed-frequency data, we estimate various mixed-data sampling (MIDAS) regressions using predictors sampled at monthly and weekly frequency. We evaluate both point and density forecasting performance over the sample 2011Q4-2020Q1. Our results show that MIDAS regressions with debit card transactions data improve both point and density forecast accuracy over competitive standard benchmark models that use alternative high-frequency predictors. Finally, we illustrate the benefits of using the card payments data by obtaining a timely and relatively accurate nowcast of the first quarter of 2020, a quarter characterized by heightened uncertainty due to the COVID-19 pandemic. |
Keywords: | debit card transaction data, nowcasting, forecast evaluation, COVID-19 |
JEL: | C22 C52 C53 E27 |
Date: | 2020–11–08 |
URL: | http://d.repec.org/n?u=RePEc:bno:worpap:2020_17&r= |
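A MIDAS regression, as used in the Aastveit et al. abstract, links a quarterly target to weekly or monthly predictors through a parsimonious lag-weight function instead of one coefficient per high-frequency lag. The sketch below uses the exponential Almon weighting, one common parametrisation; the abstract does not say which weighting the authors use, and in a real application theta1, theta2, beta0 and beta1 would be estimated jointly by nonlinear least squares rather than fixed by hand as here.

```python
import math


def exp_almon_weights(n_lags, theta1, theta2):
    """Normalised exponential Almon weights: w_j proportional to exp(theta1*j + theta2*j^2)."""
    raw = [math.exp(theta1 * j + theta2 * j * j) for j in range(1, n_lags + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def midas_regressor(high_freq, theta1, theta2):
    """Collapse one quarter's high-frequency observations (most recent first) into one value."""
    weights = exp_almon_weights(len(high_freq), theta1, theta2)
    return sum(w * x for w, x in zip(weights, high_freq))


def midas_fit_value(beta0, beta1, high_freq, theta1, theta2):
    """Fitted quarterly value: y_t = beta0 + beta1 * sum_j w_j(theta) * x_{t,j}."""
    return beta0 + beta1 * midas_regressor(high_freq, theta1, theta2)
```

With theta1 = theta2 = 0 the weights are flat and the regressor reduces to a simple average; a negative theta1 puts more weight on the most recent weeks, which is what makes weekly card data useful for nowcasting.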
By: | Daron Acemoglu |
Abstract: | This essay discusses several potential economic, political and social costs of the current path of AI technologies. I argue that if AI continues to be deployed along its current trajectory and remains unregulated, it may produce various social, economic and political harms. These include: damaging competition, consumer privacy and consumer choice; excessively automating work, fueling inequality, inefficiently pushing down wages, and failing to improve worker productivity; and damaging political discourse, democracy's most fundamental lifeblood. Although there is no conclusive evidence suggesting that these costs are imminent or substantial, it may be useful to understand them before they are fully realized and become harder or even impossible to reverse, precisely because of AI's promising and wide-reaching potential. I also suggest that these costs are not inherent to the nature of AI technologies, but are related to how they are being used and developed at the moment - to empower corporations and governments against workers and citizens. As a result, efforts to limit and reverse these costs may need to rely on regulation and policies to redirect AI research. Attempts to contain them just by promoting competition may be insufficient. |
JEL: | J23 J31 L13 L40 O33 P16 |
Date: | 2021–09 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:29247&r= |
By: | Blagov, Boris; Müller, Henrik; Jentsch, Carsten; Schmidt, Torsten |
Abstract: | Corporate investment in Germany has been relatively weak for a prolonged period after the financial crisis. This was remarkable given that interest rates and overall economic activity, important determinants of corporate investment, developed quite favourably during that time. These developments highlight the fact that the dynamics of business cycles vary over time: each cycle is somewhat different. A promising new line of research to identify the driving factors of business cycles is the use of narratives (Shiller 2017, 2020). Widely shared stories capture expectations and beliefs about the workings of the economy that may influence economic behavior, such as investment decisions. In this paper, we use Latent Dirichlet Allocation (LDA) to identify topics from news (text) data related to corporate investment in Germany and to construct suitable indicators. Furthermore, we focus on isolating those investment narratives that show the potential to lead to substantial improvement of the forecasting performance of econometric models. In our analysis, we demonstrate the benefit of using media-based indicators to improve econometric forecasts of business equipment investment. Newspaper data carries important information both on the future developments of investment (forecasting) as well as on current developments (nowcasting). Moreover, the identified investment narrative enables the researcher to improve her/his understanding of the investment process in general and allows the researcher to incorporate exogenous developments as well as economic sentiment, news and other relevant events into the analysis. |
Keywords: | narrative economics, mixed-frequency, nowcasting via media data |
JEL: | C53 C82 E32 |
Date: | 2021 |
URL: | http://d.repec.org/n?u=RePEc:zbw:rwirep:921&r= |
By: | Greyling, Talita; Rossouw, Stephanié |
Abstract: | COVID-19 severely impacted world health and, as a consequence of the measures implemented to stop the spread of the virus, also irreversibly damaged the world economy. Research shows that receiving the COVID-19 vaccine is the most successful measure to combat the virus and could also address its indirect consequences. However, vaccine hesitancy is growing worldwide and the WHO names this hesitancy as one of the top ten threats to global health. Therefore, in this study, our primary aim is to uncover the explanatory variables related to people's attitudes to the COVID-19 vaccine and investigate changes in these attitudes and emotions over time. We derive our corpus data from vaccine-related tweets, harvested in real time from Twitter. Using Natural Language Processing, we derive the sentiment and emotions contained in the tweets to construct daily time-series data measuring the attitudes and emotions towards vaccines. Our analyses combine these with other daily data to derive a cross-country panel dataset from 1 February 2021 to 1 August 2021. To determine the robustness of the relationships between several variables and the sentiment (attitude) towards vaccines, we run various models, including POLS, panel fixed effects and instrumental variables estimations. Our results show that more information related to the safety and side-effects of the vaccines is needed to improve the attitude towards vaccines. Additionally, governments should increase people's trust in institutions and disseminate more information about vaccines in general, for example, via social media. The results of this study on how the public perceives the COVID-19 vaccine and what influences their attitude are of the utmost importance to policymakers, health workers, and stakeholders who communicate to the public during infectious disease outbreaks. Additionally, the global fight against COVID-19 might be lost if the attitude towards vaccines is not improved. |
Keywords: | COVID-19, Vaccines, Big Data, Attitudes |
JEL: | C55 I18 I31 J18 |
Date: | 2021 |
URL: | http://d.repec.org/n?u=RePEc:zbw:glodps:939&r= |
By: | Federico Maria Ferrara (LSE - London School of Economics and Political Science); Jörg Haas (Hertie School of Governance [Berlin]); Andrew Peterson (Poitiers UFR LL - Université de Poitiers - UFR Lettres et langues - Université de Poitiers, TECHNÉ - EA 6316 - Technologies Numériques pour l'éducation - Université de Poitiers); Thomas Sattler (UNIGE - Université de Genève) |
Abstract: | The economic imbalances that characterize the world economy have unequally distributed costs and benefits. That raises the question of how countries can run long-term external surpluses and deficits without significant opposition to the policies that generate them. We show that political discourse helps to secure public support for these policies and the resulting economic outcomes. First, a content analysis of 32,000 newspaper articles finds that the dominant interpretations of current account balances in Australia and Germany concur with very distinct perspectives: external surpluses are seen as evidence of competitiveness in Germany, while external deficits are interpreted as evidence of attractiveness for investments in Australia. Second, survey experiments in both countries suggest that exposure to these diverging interpretations has a causal effect on citizens' support for their country's economic strategy. Political discourse, thus, is crucial to provide the societal foundation of national growth strategies. |
Keywords: | survey experiments, text analysis, trade, capital flows, ideas, public opinion |
Date: | 2021 |
URL: | http://d.repec.org/n?u=RePEc:hal:journl:hal-02569351&r= |
By: | Stijn Claessens; Ricardo Correa; Juan M. Londono |
Abstract: | We investigate how central banks' governance frameworks influence their financial stability communication strategies and assess the effectiveness of these strategies in preventing a worsening of financial cycle conditions. We develop a simple conceptual framework of how central banks communicate about financial stability and how communication shapes the evolution of the financial cycle. We apply our framework using data on the governance characteristics of 24 central banks and the sentiment conveyed in their financial stability reports. We find robust evidence that communications by central banks participating in interagency financial stability committees more effectively mitigate a deterioration in financial conditions and avert a potential financial crisis. After observing a deterioration in conditions, such central banks also transmit a calmer message, suggesting that the ability to use policy tools other than communications strengthens incentives not to just "cry wolf". |
Keywords: | Financial Stability Governance; Natural Language Processing; Central Bank Communications; Financial Cycle |
JEL: | G15 G28 |
Date: | 2021–09–10 |
URL: | http://d.repec.org/n?u=RePEc:fip:fedgif:1328&r= |
By: | Jentsch, Carsten; Mammen, Enno; Müller, Henrik; Rieger, Jonas; Schötz, Christof |
Abstract: | Text mining is an active field of statistical research. In this paper we use two methods from text mining: the Poisson Reduced Rank Model (PRR, see Jentsch et al. 2020; Jentsch et al. 2021) and the Latent Dirichlet Allocation model (LDA, see Blei et al. 2003) for the statistical analysis of party manifesto texts from Germany. For the nine federal elections in Germany from 1990 to 2021, we analyze party manifestos that have been written by the parties to present their political positions and goals for the next legislative period of the German federal parliament (Bundestag). We use the models to quantify distances in the language of the manifestos and in the weight of importance the parties attribute to several political topics. The statistical analysis is purely data driven. No outside information, e.g., on the position of the parties, on the meaning of words, or on currently hot political topics, is used in fitting the statistical models. Outside information is only used when we interpret the statistical results. |
Keywords: | Poisson reduced-rank model, Latent Dirichlet Allocation, CDU, CSU, Union, SPD, Grüne, FDP, Linke, Kenia, Jamaica, Ampel, Deutschland, R2G, coalition |
Date: | 2021 |
URL: | http://d.repec.org/n?u=RePEc:zbw:docmaw:8&r= |