nep-big New Economics Papers
on Big Data
Issue of 2020‒11‒02
28 papers chosen by
Tom Coupé
University of Canterbury

  1. On the impact of publicly available news and information transfer to financial markets By Metod Jazbec; Barna Pásztor; Felix Faltings; Nino Antulov-Fantulin; Petter N. Kolm
  2. News media vs. FRED-MD for macroeconomic forecasting By Jon Ellingsen; Vegard H. Larsen; Leif Anders Thorsrud
  3. Object Recognition for Economic Development from Daytime Satellite Imagery By Klaus Ackermann; Alexey Chernikov; Nandini Anantharama; Miethy Zaman; Paul A Raschky
  4. Binary Choice with Asymmetric Loss in a Data-Rich Environment: Theory and an Application to Racial Justice By Andrii Babii; Xi Chen; Eric Ghysels; Rohit Kumar
  5. Data science in economics: comprehensive review of advanced machine learning and deep learning methods By Nosratabadi, Saeed; Mosavi, Amir; Duan, Puhong; Ghamisi, Pedram; Filip, Ferdinand; Band, Shahab S.; Reuter, Uwe; Gama, Joao; Gandomi, Amir H.
  6. Data science in economics: comprehensive review of advanced machine learning and deep learning methods By Nosratabadi, Saeed; Mosavi, Amir; Duan, Puhong; Ghamisi, Pedram; Filip, Ferdinand; Band, Shahab S.; Reuter, Uwe; Gama, Joao; Gandomi, Amir H.
  7. Data science in economics: comprehensive review of advanced machine learning and deep learning methods By Nosratabadi, Saeed; Mosavi, Amir; Duan, Puhong; Ghamisi, Pedram; Filip, Ferdinand; Band, Shahab S.; Reuter, Uwe; Gama, Joao; Gandomi, Amir H.
  8. Data science in economics: comprehensive review of advanced machine learning and deep learning methods By Nosratabadi, Saeed; Mosavi, Amir; Duan, Puhong; Ghamisi, Pedram; Filip, Ferdinand; Band, Shahab S.; Reuter, Uwe; Gama, Joao; Gandomi, Amir H.
  9. Data science in economics: comprehensive review of advanced machine learning and deep learning methods By Nosratabadi, Saeed; Mosavi, Amir; Duan, Puhong; Ghamisi, Pedram; Filip, Ferdinand; Band, Shahab S.; Reuter, Uwe; Gama, Joao; Gandomi, Amir H.
  10. Estimating Sleep and Work Hours from Alternative Data by Segmented Functional Classification Analysis, SFCA By Klaus Ackermann; Simon D Angus; Paul A Raschky
  11. Does Flagging POTUS’s Tweets Lead to Fewer or More Retweets? Preliminary Evidence from Machine Learning Models By Chipidza, Wallace; Yan, Jie
  12. Supporting Tool for The Transition of Existing Small and Medium Enterprises Towards Industry 4.0 By Miguel Baritto; Md Mashum Billal; S. M. Muntasir Nasim; Rumana Afroz Sultana; Mohammad Arani; Ahmed Jawad Qureshi
  13. Open data and data sharing: An economic analysis By Krotova, Alevtina; Mertens, Armin; Scheufen, Marc
  14. Reassessing the Resource Curse using Causal Machine Learning By Roland Hodler; Michael Lechner; Paul A. Raschky
  15. Assessing the Scoreboard of the EU Macroeconomic Imbalances Procedure: (Machine) Learning from Decisions By João Amador; Tiago Alves; Francisco Gonçalves
  16. Planes, Trains, and Automobiles: What Drives Human-Made Light? By Dickinson, Jeffrey
  17. Estimating Sleep & Work Hours from Alternative Data by Segmented Functional Classification Analysis (SFCA) By Klaus Ackermann; Simon D. Angus; Paul A. Raschky
  18. Position in inter-organizational networks and profitability and growth potential By Fumihiko Isada
  19. Robots and Worker Voice: An Empirical Exploration By Belloc, Filippo; Burdin, Gabriel; Landini, Fabio
  20. Hybrid Modelling Approaches for Forecasting Energy Spot Prices in EPEC market By Tahir Miriyev; Alessandro Contu; Kevin Schafers; Ion Gabriel Ion
  21. KrigHedge: GP Surrogates for Delta Hedging By Mike Ludkovski; Yuri Saporito
  22. COVID-19 and the Future of US Fertility: What Can We Learn from Google? By Wilde, Joshua; Chen, Wei; Lohmann, Sophie
  23. COVID-19 and the Future of US Fertility: What Can We Learn from Google? By Wilde, Joshua; Chen, Wei; Lohmann, Sophie
  24. A new algorithm for citation analysis By Gloria Gheno
  25. Parsimonious Quantile Regression of Financial Asset Tail Dynamics via Sequential Learning By Xing Yan; Weizhong Zhang; Lin Ma; Wei Liu; Qi Wu
  26. DATA DETERMINANTS OF THE ACTIVITY OF SMES AUTOMOBILE DEALERS By David Salvetat; Jean-Sébastien Lacam
  27. Are there rebound effects from electric vehicle adoption? Evidence from German household data By Huwe, Vera; Gessner, Johannes
  28. German forecasters' narratives: How informative are German business cycle forecast reports? By Müller, Karsten

  1. By: Metod Jazbec; Barna Pásztor; Felix Faltings; Nino Antulov-Fantulin; Petter N. Kolm
    Abstract: We quantify the propagation and absorption of large-scale publicly available news articles from the World Wide Web to financial markets. To extract publicly available information, we use the news archives from the Common Crawl, a nonprofit organization that crawls a large part of the web. We develop a processing pipeline to identify news articles associated with the constituent companies in the S&P 500 index, an equity market index that measures the stock performance of U.S. companies. Using machine learning techniques, we extract sentiment scores from the Common Crawl News data and employ tools from information theory to quantify the information transfer from public news articles to the U.S. stock market. Furthermore, we analyze and quantify the economic significance of the news-based information with a simple sentiment-based portfolio trading strategy. Our findings provide support for the view that information in publicly available news on the World Wide Web has a statistically and economically significant impact on events in financial markets.
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.12002&r=all
  2. By: Jon Ellingsen; Vegard H. Larsen; Leif Anders Thorsrud
    Abstract: Using a unique dataset of 22.5 million news articles from the Dow Jones Newswires Archive, we perform an in-depth real-time out-of-sample forecasting comparison study with one of the most widely used datasets in the newer forecasting literature, namely the FRED-MD dataset. Focusing on U.S. GDP, consumption and investment growth, our results suggest that the news data contains information not captured by the hard economic indicators, and that the news-based data are particularly informative for forecasting consumption developments.
    Keywords: Forecasting, Real-time, Machine Learning, News, Text data
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:bny:wpaper:0091&r=all
  3. By: Klaus Ackermann (SoDa Laboratories, Monash University); Alexey Chernikov (SoDa Laboratories, Monash University); Nandini Anantharama (SoDa Laboratories, Monash University); Miethy Zaman (SoDa Laboratories, Monash University); Paul A Raschky (SoDa Laboratories, Monash University)
    Abstract: Reliable data about the stock of physical capital and infrastructure in developing countries is typically very scarce. This is particularly a problem for data at the subnational level, where existing data is often outdated, not consistently measured, or incomplete in coverage. Traditional data collection methods are time- and labor-intensive and costly, which often prohibits developing countries from collecting this type of data. This paper proposes a novel method to extract infrastructure features from high-resolution satellite images. We collected high-resolution satellite images for 5 million 1km x 1km grid cells covering 21 African countries. We contribute to the growing body of literature in this area by training our machine learning algorithm on ground-truth data. We show that our approach strongly improves the predictive accuracy. Our methodology can build the foundation to then predict subnational indicators of economic development for areas where this data is either missing or unreliable.
    Keywords: satellite data, machine learning, physical capital, economic development, africa
    JEL: C55 O18 R11
    Date: 2020–09
    URL: http://d.repec.org/n?u=RePEc:ajr:sodwps:2020-02&r=all
  4. By: Andrii Babii; Xi Chen; Eric Ghysels; Rohit Kumar
    Abstract: The importance of asymmetries in prediction problems arising in economics has been recognized for a long time. In this paper, we focus on binary choice problems in a data-rich environment with general loss functions. In contrast to the asymmetric regression problems, the binary choice with general loss functions and high-dimensional datasets is challenging and not well understood. Econometricians have studied binary choice problems for a long time, but the literature does not offer computationally attractive solutions in data-rich environments. In contrast, the machine learning literature has many computationally attractive algorithms that form the basis for much of the automated procedures that are implemented in practice, but it is focused on symmetric loss functions that are independent of individual characteristics. One of the main contributions of our paper is to show that the theoretically valid predictions of binary outcomes with arbitrary loss functions can be achieved via a very simple reweighting of the logistic regression, or other state-of-the-art machine learning techniques, such as boosting or (deep) neural networks. We apply our analysis to racial justice in pretrial detention.
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.08463&r=all
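The reweighting idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' estimator: it assumes two constant misclassification costs (the hypothetical `COST_FN` and `COST_FP`), a single feature, toy data, and plain gradient descent, whereas the paper allows losses that depend on individual characteristics. Each observation is weighted by the cost of misclassifying it, and the fitted probability is then thresholded at 0.5.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_weighted_logit(xs, ys, weights, lr=0.5, steps=5000):
    """Minimise the weighted logistic loss of a one-feature model by gradient descent."""
    b, a = 0.0, 0.0  # intercept and slope
    total = sum(weights)
    for _ in range(steps):
        grad_b = grad_a = 0.0
        for x, y, w in zip(xs, ys, weights):
            err = w * (sigmoid(b + a * x) - y)
            grad_b += err
            grad_a += err * x
        b -= lr * grad_b / total
        a -= lr * grad_a / total
    return b, a


def predict(params, x):
    """Classify as 1 when the fitted probability exceeds 0.5."""
    b, a = params
    return 1 if sigmoid(b + a * x) > 0.5 else 0


# Toy data (hypothetical): overlapping classes on a single feature.
xs = [-2.0, -1.0, 0.0, 1.0, -1.0, 0.0, 1.0, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

# Assumed costs: missing a positive is five times as costly as a false alarm.
COST_FN, COST_FP = 5.0, 1.0

plain = fit_weighted_logit(xs, ys, [1.0] * len(xs))
costly = fit_weighted_logit(xs, ys, [COST_FN if y == 1 else COST_FP for y in ys])
```

Under these assumptions the cost-weighted fit has a larger intercept than the unweighted one, so borderline observations are more likely to be classified as positive.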
  5. By: Nosratabadi, Saeed; Mosavi, Amir; Duan, Puhong; Ghamisi, Pedram; Filip, Ferdinand; Band, Shahab S.; Reuter, Uwe; Gama, Joao; Gandomi, Amir H.
    Abstract: This paper provides a state-of-the-art investigation of advances in data science in emerging economic applications. The analysis was performed on novel data science methods in four individual classes of deep learning models, hybrid deep learning models, hybrid machine learning, and ensemble models. Application domains include a wide and diverse range of economics research from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. Prisma method, a systematic literature review methodology, was used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which, based on the accuracy metric, outperform other learning algorithms. It is further expected that the trends will converge toward the advancements of sophisticated hybrid deep learning models.
    Date: 2020–10–16
    URL: http://d.repec.org/n?u=RePEc:osf:metaar:haf2v&r=all
  6. By: Nosratabadi, Saeed; Mosavi, Amir; Duan, Puhong; Ghamisi, Pedram; Filip, Ferdinand; Band, Shahab S.; Reuter, Uwe; Gama, Joao; Gandomi, Amir H.
    Abstract: This paper provides a state-of-the-art investigation of advances in data science in emerging economic applications. The analysis was performed on novel data science methods in four individual classes of deep learning models, hybrid deep learning models, hybrid machine learning, and ensemble models. Application domains include a wide and diverse range of economics research from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. Prisma method, a systematic literature review methodology, was used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which, based on the accuracy metric, outperform other learning algorithms. It is further expected that the trends will converge toward the advancements of sophisticated hybrid deep learning models.
    Date: 2020–10–15
    URL: http://d.repec.org/n?u=RePEc:osf:lawarx:kczj5&r=all
  7. By: Nosratabadi, Saeed; Mosavi, Amir; Duan, Puhong; Ghamisi, Pedram; Filip, Ferdinand; Band, Shahab S.; Reuter, Uwe; Gama, Joao; Gandomi, Amir H.
    Abstract: This paper provides a state-of-the-art investigation of advances in data science in emerging economic applications. The analysis was performed on novel data science methods in four individual classes of deep learning models, hybrid deep learning models, hybrid machine learning, and ensemble models. Application domains include a wide and diverse range of economics research from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. Prisma method, a systematic literature review methodology, was used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which, based on the accuracy metric, outperform other learning algorithms. It is further expected that the trends will converge toward the advancements of sophisticated hybrid deep learning models.
    Date: 2020–10–16
    URL: http://d.repec.org/n?u=RePEc:osf:edarxi:5dwrt&r=all
  8. By: Nosratabadi, Saeed; Mosavi, Amir; Duan, Puhong; Ghamisi, Pedram; Filip, Ferdinand; Band, Shahab S.; Reuter, Uwe; Gama, Joao; Gandomi, Amir H.
    Abstract: This paper provides a state-of-the-art investigation of advances in data science in emerging economic applications. The analysis was performed on novel data science methods in four individual classes of deep learning models, hybrid deep learning models, hybrid machine learning, and ensemble models. Application domains include a wide and diverse range of economics research from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. Prisma method, a systematic literature review methodology, was used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which, based on the accuracy metric, outperform other learning algorithms. It is further expected that the trends will converge toward the advancements of sophisticated hybrid deep learning models.
    Date: 2020–10–16
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:9vdwf&r=all
  9. By: Nosratabadi, Saeed; Mosavi, Amir; Duan, Puhong; Ghamisi, Pedram; Filip, Ferdinand; Band, Shahab S.; Reuter, Uwe; Gama, Joao; Gandomi, Amir H.
    Abstract: This paper provides a state-of-the-art investigation of advances in data science in emerging economic applications. The analysis was performed on novel data science methods in four individual classes of deep learning models, hybrid deep learning models, hybrid machine learning, and ensemble models. Application domains include a wide and diverse range of economics research from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. Prisma method, a systematic literature review methodology, was used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which, based on the accuracy metric, outperform other learning algorithms. It is further expected that the trends will converge toward the advancements of sophisticated hybrid deep learning models.
    Date: 2020–10–16
    URL: http://d.repec.org/n?u=RePEc:osf:thesis:auyvc&r=all
  10. By: Klaus Ackermann (SoDa Laboratories, Monash University); Simon D Angus (SoDa Laboratories, Monash University); Paul A Raschky (SoDa Laboratories, Monash University)
    Abstract: Alternative data is increasingly adapted to predict human and economic behaviour. This paper introduces a new type of alternative data by re-conceptualising the internet as a data-driven insights platform at global scale. Using data from a unique internet activity and location dataset drawn from over 1.5 trillion observations of end-user internet connections, we construct a functional dataset covering over 1,600 cities during a 7 year period with temporal resolution of just 15min. To predict accurate temporal patterns of sleep and work activity from this data-set, we develop a new technique, Segmented Functional Classification Analysis (SFCA), and compare its performance to a wide array of linear, functional, and classification methods. To confirm the wider applicability of SFCA, in a second application we predict sleep and work activity using SFCA from US city-wide electricity demand functional data. Across both problems, SFCA is shown to out-perform current methods.
    Keywords: functional data analysis, time use, electricity demand, big data, alternative data
    JEL: C38 C53 C55 J22
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:ajr:sodwps:2020-04&r=all
  11. By: Chipidza, Wallace; Yan, Jie
    Abstract: There is vigorous debate as to whether influential social media platforms like Twitter and Facebook should censor objectionable posts by government officials in the United States and elsewhere. Although these platforms have resisted pressure to censor such posts in the past, Twitter recently flagged five posts by the United States President Donald J. Trump on the rationale that the tweets contained inaccurate or inflammatory content. In this paper, we examine preliminary evidence as to whether these posts were retweeted less or more than expected. We employ 10 machine learning (ML) algorithms to estimate the expected number of retweets based on 8 features of each tweet from historical data since President Trump was elected: number of likes, word count, readability, polarity, subjectivity, presence of link or multimedia content, time of day of posting, and number of days since Trump’s election. Our results indicate agreement from all 10 ML algorithms that the three flagged tweets for which we had retweet data were retweeted at higher rates than expected. These results suggest that flagging tweets by government officials might be counterproductive towards the spread of content deemed objectionable by social media platforms.
    Date: 2020–07–04
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:69hkb&r=all
  12. By: Miguel Baritto; Md Mashum Billal; S. M. Muntasir Nasim; Rumana Afroz Sultana; Mohammad Arani; Ahmed Jawad Qureshi
    Abstract: The rapid growth of Industry 4.0 technologies such as big data, cloud computing, smart sensors, machine learning (ML), radio-frequency identification (RFID), robotics, 3D-printing, and Internet of Things (IoT) offers Small and Medium Enterprises (SMEs) the chance to improve productivity and efficiency, reduce cost and provide better customer experience, among other benefits. The main purpose of this work is to propose a methodology to support SME managers in better understanding the specific requirements for the implementation of Industry 4.0 solutions and the derived benefits within their firms. The proposed methodology will help SME managers decide when and how to migrate to Industry 4.0.
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.12038&r=all
  13. By: Krotova, Alevtina; Mertens, Armin; Scheufen, Marc
    Abstract: Data is an important business resource. It forms the basis for various digital technologies such as artificial intelligence or smart services. However, access to data is unequally distributed in the market. Hence, some business ideas fail due to a lack of data sources. Although many governments have recognised the importance of open data and already make administrative data available to the public on a large scale, many companies are still reluctant to share their data with other firms and competitors. As a result, the economic potential of data is far from being fully exploited. Against this background, we analyse current developments in the area of open data. We compare the characteristics of open governmental and open company data in order to define the necessary framework conditions for data sharing. Subsequently, we examine the status quo of data sharing among firms. We use a qualitative analysis of survey data of European companies to derive the sufficient conditions to strengthen data sharing. Our analysis shows that government data is a public good, while company data can be seen as a club or private good. The latter frequently forms the core of companies' business models and is hence less suitable for data sharing. Finally, we find that promoting legal certainty and the economic impact are important policy steps for fostering data sharing.
    JEL: L21 L86 M21 O32
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:zbw:iwkpps:212020&r=all
  14. By: Roland Hodler (SoDa Laboratories, Monash University); Michael Lechner (SoDa Laboratories, Monash University); Paul A. Raschky (SoDa Laboratories, Monash University)
    Abstract: We reassess the effects of natural resources on economic development and conflict, applying a causal forest estimator and data from 3,800 Sub-Saharan African districts. We find that, on average, mining activities and higher world market prices of locally mined minerals both increase economic development and conflict. Consistent with the previous literature, mining activities have more positive effects on economic development and weaker effects on conflict in places with low ethnic diversity and high institutional quality. In contrast, the effects of changes in mineral prices vary little in ethnic diversity and institutional quality, but are non-linear and largest at relatively high prices.
    Keywords: resource curse, economic development, conflict, causal machine learning, Africa
    JEL: C21 O13 O55 Q34 R12
    Date: 2020–09
    URL: http://d.repec.org/n?u=RePEc:ajr:sodwps:2020-01&r=all
  15. By: João Amador; Tiago Alves; Francisco Gonçalves
    Abstract: This paper uses machine learning methods to identify the macroeconomic variables that are most relevant for the classification of countries along the categories of the EU Macroeconomic Imbalances Procedure (MIP). The random forest algorithm considers the 14 headline indicators of the MIP scoreboard and the set of past decisions taken by the European Commission when classifying countries along the macroeconomic imbalances categories. The algorithm identifies the current account balance, the net international investment position and the unemployment rate as key variables, mostly to classify countries that need corrective action, notably through economic adjustment programmes.
    JEL: C40 F15
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:ptu:wpaper:w202016&r=all
  16. By: Dickinson, Jeffrey
    Abstract: This paper expands on our understanding of the lights-income relationship by linking the newest generation of nighttime satellite images derived from the Visible Infrared Imaging Radiometer Suite, VIIRS, to nationwide panel data on population and income from 2012-2018 for both Brazil and the United States, including 3,104 US counties and 5,570 municípios. I leverage the quality and frequency of those data sources and the VIIRS lights images to decompose the links between population changes, GDP changes, and nighttime lights changes at the county and município level. I find decreasing marginal effects of GDP on nighttime light as well as decreasing marginal effects of population on nighttime light, a result which holds across many specifications and that is robust to sub-sample analysis and placebo tests. Interactions among controls also appear to be present. Using sub-sample analysis, I also find that nighttime light does a poor job of capturing less-wealthy areas. Finally, I use a between-county estimator to identify the effects of time-invariant infrastructure features on night-time light. I find roads, rail, ports, airports, and border crossings to be strong contributors to increases in light.
    Keywords: night-time light, GDP, population, infrastructure, regional development
    JEL: C82 O51 R10 R11 R12
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:103504&r=all
  17. By: Klaus Ackermann; Simon D. Angus; Paul A. Raschky
    Abstract: Alternative data is increasingly adapted to predict human and economic behaviour. This paper introduces a new type of alternative data by re-conceptualising the internet as a data-driven insights platform at global scale. Using data from a unique internet activity and location dataset drawn from over 1.5 trillion observations of end-user internet connections, we construct a functional dataset covering over 1,600 cities during a 7 year period with temporal resolution of just 15min. To predict accurate temporal patterns of sleep and work activity from this data-set, we develop a new technique, Segmented Functional Classification Analysis (SFCA), and compare its performance to a wide array of linear, functional, and classification methods. To confirm the wider applicability of SFCA, in a second application we predict sleep and work activity using SFCA from US city-wide electricity demand functional data. Across both problems, SFCA is shown to out-perform current methods.
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.08102&r=all
  18. By: Fumihiko Isada (Kansai University)
    Abstract: The purpose of this study is to empirically clarify the relationship between a company's position in the network structure of inter-company collaboration and its profitability and growth potential. In particular, by categorizing and analyzing companies located at the center of the network, we attempted to gain insight into the generation and growth process of so-called platformer companies. According to previous research, companies that are at the center of collaboration among various companies vary by industry and product characteristics. This research focuses on the Internet of Things (IoT), which is an industry in which inter-business relationships are important for collecting and utilizing big data across industries as an object of analysis. As a research method, we extracted the text information of newspaper articles related to IoT and analyzed the transition of the relationship between companies using social network analysis. Network analysis based on newspaper article data showed the possibility of one process to grow from a niche company to a platform leader and eventually to a large company.
    Keywords: Inter-organizational network, Platform leadership, Business Ecosystem, Social network analysis, Internet of Things
    JEL: M11 O32 L14
    URL: http://d.repec.org/n?u=RePEc:sek:iacpro:11713169&r=all
  19. By: Belloc, Filippo (University of Siena); Burdin, Gabriel (Leeds University Business School); Landini, Fabio (University of Parma)
    Abstract: The interplay between labour institutions and the adoption of automation technologies remains poorly understood. Specifically, there is little evidence on how the nature of industrial relations shapes technological choices at the workplace level. Using a large sample of more than 20,000 European establishments located in 28 countries, this paper documents conditional correlations between the presence of employee representation (ER) and the use of automation technologies. We find that ER is positively associated with robot usage. The presence of ER also correlates with the utilization of software-based artificial intelligence tools for data analytics. We extensively dig into the mechanisms through which ER may foster the use of robots by exploiting rich information on the de facto role played by ER bodies in relation to well-defined decision areas of management. Greater automation in establishments with ER does not seem to result from more adversarial employment relationships (as measured by past strike activity) or constraints on labour flexibility imposed by the interference of employee representatives with dismissal procedures. Interestingly, the positive effect of ER on robot usage is driven by workplaces operating in relatively centralized wage-setting environments, where one would expect a more limited influence of ER on wages. While our findings are exploratory and do not have a causal interpretation, they are suggestive that ER influences certain workplace practices, such as skill development, job redesign and working time management, that may be complementary to new automation technologies.
    Keywords: automation, robots, artificial intelligence, unions, employee representation, labor market institutions, European Company Survey
    JEL: J50 O32 O33
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:iza:izadps:dp13799&r=all
  20. By: Tahir Miriyev; Alessandro Contu; Kevin Schafers; Ion Gabriel Ion
    Abstract: In this work we considered several hybrid modelling approaches for forecasting energy spot prices in the EPEC market. Hybridization is performed by combining a Naive model, Fourier analysis, ARMA and GARCH models, a mean-reversion and jump-diffusion model, and Recurrent Neural Networks (RNN). The training data consisted of electricity prices for 2013-2014, and the test data covered 2015.
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.08400&r=all
  21. By: Mike Ludkovski; Yuri Saporito
    Abstract: We investigate a machine learning approach to option Greeks approximation based on Gaussian process (GP) surrogates. The method takes in noisily observed option prices, fits a nonparametric input-output map and then analytically differentiates the latter to obtain the various price sensitivities. Our motivation is to compute Greeks in cases where direct computation is expensive, such as in local volatility models, or can only ever be done approximately. We provide a detailed analysis of numerous aspects of GP surrogates, including choice of kernel family, simulation design, choice of trend function and impact of noise. We further discuss the application to Delta hedging, including a new Lemma that relates quality of the Delta approximation to discrete-time hedging loss. Results are illustrated with two extensive case studies that consider estimation of Delta, Theta and Gamma and benchmark approximation quality and uncertainty quantification using a variety of statistical metrics. Among our key take-aways are the recommendation to use Matern kernels, the benefit of including virtual training points to capture boundary conditions, and the significant loss of fidelity when training on stock-path-based datasets.
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.08407&r=all
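The surrogate idea in the abstract above (fit a GP to observed prices, then differentiate the fitted map analytically) can be sketched as follows. This is an illustrative toy, not the authors' setup: it assumes a squared-exponential kernel rather than the Matern family the abstract recommends, a constant trend, Black-Scholes call prices with zero interest rate as stand-in training data, and hypothetical values for the length scale `ELL` and noise variance `NOISE`.

```python
import math


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bs_d1(s, k=100.0, t=1.0, sigma=0.2):
    return (math.log(s / k) + 0.5 * sigma ** 2 * t) / (sigma * math.sqrt(t))


def bs_call(s, k=100.0, t=1.0, sigma=0.2):
    """Black-Scholes call price with zero interest rate (stand-in data source)."""
    d1 = bs_d1(s, k, t, sigma)
    return s * norm_cdf(d1) - k * norm_cdf(d1 - sigma * math.sqrt(t))


def bs_delta(s, k=100.0, t=1.0, sigma=0.2):
    return norm_cdf(bs_d1(s, k, t, sigma))


def rbf(a, b, ell):
    """Squared-exponential kernel with unit amplitude."""
    return math.exp(-0.5 * ((a - b) / ell) ** 2)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_gp(xs, ys, ell, noise):
    """Return the constant trend and the weights alpha = (K + noise I)^-1 (y - trend)."""
    trend = sum(ys) / len(ys)
    gram = [[rbf(a, b, ell) + (noise if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    return trend, solve(gram, [y - trend for y in ys])


def gp_price(x, xs, alpha, trend, ell):
    """Posterior mean of the price at spot x."""
    return trend + sum(rbf(x, xi, ell) * a for xi, a in zip(xs, alpha))


def gp_delta(x, xs, alpha, ell):
    """Analytic derivative of the posterior mean with respect to the spot x."""
    return sum(-(x - xi) / ell ** 2 * rbf(x, xi, ell) * a
               for xi, a in zip(xs, alpha))


ELL, NOISE = 10.0, 1e-6  # assumed hyperparameters, not tuned
spots = [80.0 + 5.0 * i for i in range(9)]
prices = [bs_call(s) for s in spots]
trend, alpha = fit_gp(spots, prices, ELL, NOISE)
delta_hat = gp_delta(100.0, spots, alpha, ELL)
```

Comparing `delta_hat` with `bs_delta(100.0)` gives a quick check of the approximation at the money; in practice the hyperparameters would be estimated rather than fixed by hand.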
  22. By: Wilde, Joshua (Max Planck Institute for Demographic Research); Chen, Wei (Fordham University); Lohmann, Sophie (Max Planck Institute for Demographic Research)
    Abstract: We use data from Google Trends to predict the effect of the COVID-19 pandemic on future births in the United States. First, we show that periods of above-normal search volume for Google keywords relating to conception and pregnancy in US states are associated with higher numbers of births in the following months. Excess searches for unemployment keywords have the opposite effect. Second, by employing simple statistical learning techniques, we demonstrate that including information on keyword search volumes in prediction models significantly improves forecast accuracy over a number of cross-validation criteria. Third, we use data on Google searches during the COVID-19 pandemic to predict changes in aggregate fertility rates in the United States at the state level through February 2021. Our analysis suggests that between November 2020 and February 2021, monthly US births will drop sharply by approximately 15%. For context, this would be a 50% larger decline than that following the Great Recession of 2008-2009, and similar in magnitude to the declines following the Spanish Flu pandemic of 1918-1919 and the Great Depression. Finally, we find heterogeneous effects of the COVID-19 pandemic across different types of mothers. Women with less than a college education, as well as Black or African American women, are predicted to have larger declines in fertility due to COVID-19. This finding is consistent with elevated caseloads of COVID-19 in low-income and minority neighborhoods, as well as with evidence suggesting larger economic impacts of the crisis among such households.
    Keywords: COVID-19, google, fertility, prediction, statistical learning
    JEL: J11 J13 I10 C53
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:iza:izadps:dp13776&r=all
  23. By: Wilde, Joshua; Chen, Wei; Lohmann, Sophie
    Abstract: We use data from Google Trends to predict the effect of the COVID-19 pandemic on future births in the United States. First, we show that periods of above-normal search volume for Google keywords relating to conception and pregnancy in US states are associated with higher numbers of births in the following months. Excess searches for unemployment keywords have the opposite effect. Second, by employing simple statistical learning techniques, we demonstrate that including information on keyword search volumes in prediction models significantly improves forecast accuracy over a number of cross-validation criteria. Third, we use data on Google searches during the COVID-19 pandemic to predict changes in aggregate fertility rates in the United States at the state level through February 2021. Our analysis suggests that between November 2020 and February 2021, monthly US births will drop sharply by approximately 15%. For context, this would be a 50% larger decline than that following the Great Recession of 2008-2009, and similar in magnitude to the declines following the Spanish Flu pandemic of 1918-1919 and the Great Depression. Finally, we find heterogeneous effects of the COVID-19 pandemic across different types of mothers. Women with less than a college education, as well as Black or African American women, are predicted to have larger declines in fertility due to COVID-19. This finding is consistent with elevated caseloads of COVID-19 in low-income and minority neighborhoods, as well as with evidence suggesting larger economic impacts of the crisis among such households.
    Date: 2020–10–05
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:2bgqs&r=all
  24. By: Gloria Gheno (Innovative Data Analysis)
    Abstract: The bibliographic coupling and co-citation analysis methodologies were proposed in the early 1960s and 1970s to study the structure and the production of scientific communities. Bibliographic coupling is fundamental to understanding the current state of a particular research area and its potential future direction, while co-citation analysis is used to map the roots of academic works, which are fundamental to the development of a specific research field. With the first method, papers which have a common reference are paired, and the strength of the link is given by the number of references in common. With the second, instead, the papers co-cited by one or more documents are grouped. Both methodologies assume that papers citing the same articles, or cited by the same article, have similar aspects. Because these two methodologies have been considered separately until now, I propose a new algorithm, based on bicluster analysis, which applies them together, and I create an index to measure the similarity of the elements of the obtained clusters. This new method therefore groups together the bibliographically coupled papers and the co-cited references. In the obtained bicluster, the references grouped together represent the roots of the trend to which the citing papers, grouped together, adhere. I apply this new method to economic papers, published between 2011 and 2020, which have "big data" among the keywords, so as to understand more exhaustively and rapidly the current state and the future direction of the study of big data in the economic sector.
    Keywords: Bibliographic coupling, bicluster, big data, co-citation analysis
    JEL: C19 N01 O10
    URL: http://d.repec.org/n?u=RePEc:sek:iacpro:11113161&r=all
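The two classical measures described in the abstract of entry 24 can be illustrated with a minimal sketch. It covers only plain bibliographic coupling and co-citation counts, not the author's bicluster algorithm or similarity index. The toy data and the names `citations`, `coupling_strength` and `cocitation_counts` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each citing paper maps to the set of references it cites.
citations = {
    "P1": {"R1", "R2", "R3"},
    "P2": {"R2", "R3"},
    "P3": {"R4"},
}


def coupling_strength(citations):
    """Bibliographic coupling: shared-reference count for each pair of citing papers."""
    strength = {}
    for a, b in combinations(sorted(citations), 2):
        shared = len(citations[a] & citations[b])
        if shared:  # keep only pairs with at least one common reference
            strength[(a, b)] = shared
    return strength


def cocitation_counts(citations):
    """Co-citation: number of citing papers that cite both references of a pair."""
    counts = Counter()
    for refs in citations.values():
        for pair in combinations(sorted(refs), 2):
            counts[pair] += 1
    return dict(counts)


coupling = coupling_strength(citations)
cocited = cocitation_counts(citations)
print(coupling)
print(cocited)
```

Here P1 and P2 are coupled because they share references, while R2 and R3 are co-cited because both citing papers list them together.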
  25. By: Xing Yan; Weizhong Zhang; Lin Ma; Wei Liu; Qi Wu
    Abstract: We propose a parsimonious quantile regression framework to learn the dynamic tail behaviors of financial asset returns. Our model captures both the time-varying characteristic and the asymmetrical heavy-tail property of financial time series well. It combines the merits of a popular sequential neural network model, i.e., LSTM, with a novel parametric quantile function that we construct to represent the conditional distribution of asset returns. Our model also captures individually the serial dependences of higher moments, rather than just the volatility. Across a wide range of asset classes, the out-of-sample forecasts of conditional quantiles or VaR of our model outperform the GARCH family. Further, the proposed approach does not suffer from the issue of quantile crossing, nor is it exposed to the ill-posedness of the parametric probability density function approach.
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.08263&r=all
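Two notions in the abstract of entry 25 can be made concrete with a small sketch: the pinball loss, which is the standard objective for quantile regression, and a check for quantile crossing. This is not the authors' LSTM model or their parametric quantile function. The function names and the example numbers are hypothetical.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average pinball (quantile) loss at quantile level tau, with 0 < tau < 1."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def has_quantile_crossing(quantiles_by_level):
    """Return True if a lower-level quantile forecast exceeds a higher-level one."""
    levels = sorted(quantiles_by_level)
    for low, high in zip(levels, levels[1:]):
        pairs = zip(quantiles_by_level[low], quantiles_by_level[high])
        if any(q_low > q_high for q_low, q_high in pairs):
            return True
    return False


# Hypothetical forecasts of the 5% and 50% quantiles for two periods.
consistent = {0.05: [-2.0, -1.5], 0.5: [0.0, 0.1]}
crossing = {0.05: [-2.0, 0.5], 0.5: [0.0, 0.1]}
print(has_quantile_crossing(consistent), has_quantile_crossing(crossing))
```

Quantile levels estimated separately can produce forecasts like `crossing`, where the 5% quantile lies above the median in the second period.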
  26. By: David Salvetat; Jean-Sébastien Lacam (ESSCA - Groupe ESSCA)
    Abstract: Many SMEs still seem reluctant to take on the management of large datasets, which appear too complex for them. However, our study reveals that the majority of small French car dealers are developing Big data and Smart data policies to improve the quality of their offers, the dynamism of their sales and their access to new opportunities. Yet not every policy has the same effects on the development of their business. Whereas Big data improves all the components of SME development in a global, short-term and operational way, Smart data presents itself as a more targeted, prospective and strategic approach.
    Keywords: Big data, smart data, development, automobile
    Date: 2020–10–05
    URL: http://d.repec.org/n?u=RePEc:hal:journl:hal-02965540&r=all
  27. By: Huwe, Vera; Gessner, Johannes
    Abstract: Widespread electric vehicle adoption is considered a major policy goal in order to decarbonize the transport sector. However, potential rebound effects both in terms of vehicle ownership and distance traveled might nullify the environmental edge of electric vehicles. Using cross-sectional household-level microdata from Germany, we identify rebound effects of electric vehicle adoption on both margins for specific subgroups of electric vehicle owners. As our data is cross-sectional, we resort to data-driven methods which are not yet commonly used in the economic literature. For the identification of changes in the number of cars owned after electric vehicle adoption, we predict counterfactual car ownership using a supervised learning approach. Furthermore, we investigate the effect of electric vehicle adoption on household mileage based on a genetic matching of households owning electric vehicles to similar owners of conventional cars. For the selection of covariates for matching, we contrast ad hoc variable selection based on the available literature with a data-driven variable selection method (double LASSO). We cannot verify a significant increase in the number of cars owned for households with one electric and one conventional vehicle. For the subgroup of households who substitute the electric car for a conventional vehicle, electric vehicle ownership is associated with a significant reduction in annual mileage of 23% of the sample mean. The result indicates a striving for behavior consistent with the environmentally-friendly car choice rather than a rebound effect. Our results are subgroup-specific and may not generalize to the overall population. Methodologically, we find that data-driven variable selection identifies a refined set of covariates and changes the magnitude of the estimation results substantially. It may thus be considered a useful complement, especially in settings with limited established theoretical or empirical knowledge.
    Keywords: Rebound Effect, Electric Vehicle Adoption, Variable Selection
    JEL: R41 Q55
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:zbw:zewdip:20048&r=all
  28. By: Müller, Karsten
    Abstract: Based on German business cycle forecast reports covering 10 German institutions for the period 1993-2017, the paper analyses the information content of German forecasters' narratives for German business cycle forecasts. The paper applies textual analysis to convert qualitative text data into quantitative sentiment indices. First, a sentiment analysis utilizes dictionary methods and text regression methods, using recursive estimation. Next, the paper analyses the different characteristics of sentiments. In a third step, sentiment indices are used to test the efficiency of numerical forecasts. Using 12-month-ahead fixed horizon forecasts, fixed-effects panel regression results suggest some informational content of sentiment indices for growth and inflation forecasts. Finally, a forecasting exercise analyses the predictive power of sentiment indices for GDP growth and inflation. The results suggest weak evidence, at best, for in-sample and out-of-sample predictive power of the sentiment indices.
    Keywords: Textual analysis, Sentiment, Macroeconomic forecasting, Forecast evaluation, Germany
    JEL: C53 E32 E37 E66
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:zbw:pp1859:23&r=all
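
The dictionary method mentioned in the abstract of entry 28 can be sketched as a net share of positive over negative terms. This is only an illustration, not the paper's procedure: the English word lists, the scoring rule and the name `sentiment_index` are hypothetical, and the paper itself works with German forecast reports and also uses text regression.

```python
import re

# Hypothetical mini-dictionaries; a real application would use a curated lexicon.
POSITIVE = {"growth", "recovery", "improve", "strong"}
NEGATIVE = {"recession", "decline", "weak", "risk"}


def sentiment_index(text):
    """Net sentiment in [-1, 1]: (positive - negative) / (positive + negative)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    positive = sum(token in POSITIVE for token in tokens)
    negative = sum(token in NEGATIVE for token in tokens)
    if positive + negative == 0:
        return 0.0  # no dictionary words found: treat the text as neutral
    return (positive - negative) / (positive + negative)


print(sentiment_index("Strong recovery, but weak exports."))
```

Scores computed per report in this way could then be aggregated into an index by institution and date.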

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.