nep-big New Economics Papers
on Big Data
Issue of 2019‒05‒27
fifteen papers chosen by
Tom Coupé
University of Canterbury

  1. Data Envelopment Analysis and Business Analytics: The Big Data Challenges and Some Solutions By Valentin Zelenyuk
  2. Political Entrenchment and GDP Misreporting By Ho Fai Chan; Bruno S. Frey; Ahmed Skali; Benno Torgler
  3. Artificial Intelligence, Automation and Work By Daron Acemoglu; Pascual Restrepo
  4. Time Series Analysis and Forecasting of the US Housing Starts using Econometric and Machine Learning Model By Sudiksha Joshi
  5. Convolutional Feature Extraction and Neural Arithmetic Logic Units for Stock Prediction By Shangeth Rajaa; Jajati Keshari Sahoo
  6. Predicting and Forecasting the Price of Constituents and Index of Cryptocurrency Using Machine Learning By Reaz Chowdhury; M. Arifur Rahman; M. Sohel Rahman; M. R. C. Mahdy
  7. The Informational Content of the Term-Spread in Forecasting the U.S. Inflation Rate: A Nonlinear Approach By Gogas, Periklis; Papadimitriou, Theophilos; Plakandaras, Vasilios; Gupta, Rangan
  8. The IAB-INCHER project of earned doctorates (IIPED): A supervised machine learning approach to identify doctorate recipients in the German integrated employment biography data By Heinisch, Dominik; Koenig, Johannes; Otto, Anne
  9. Sustainable Investing and the Cross-Section of Maximum Drawdown By Lisa R. Goldberg; Saad Mouti
  10. Machine Learning Tree and Exact Integration for Pricing American Options in High Dimension By Ludovic Goudenège; Andrea Molent; Antonino Zanette
  11. Essays on reporting and information processing By de Kok, Ties
  12. Conformal Prediction Interval Estimations with an Application to Day-Ahead and Intraday Power Markets By Christopher Kath; Florian Ziel
  13. Hedging crop yields against weather uncertainties -- a weather derivative perspective By Samuel Asante Gyamerah; Philip Ngare; Dennis Ikpe
  14. Transforming Naturally Occurring Text Data Into Economic Statistics: The Case of Online Job Vacancy Postings By Arthur Turrell; Bradley J. Speigner; Jyldyz Djumalieva; David Copple; James Thurgood
  15. Deep Learning–based Eco-driving System for Battery Electric Vehicles By Wu, Guoyuan; Ye, Fei; Hao, Peng; Esaid, Danial; Boriboonsomsin, Kanok; Barth, Matthew J.

  1. By: Valentin Zelenyuk (School of Economics and Centre for Efficiency and Productivity Analysis (CEPA) at The University of Queensland, Australia)
    Abstract: The goal of this article is three-fold. The first goal is to present a concise review of Data Envelopment Analysis (DEA) for the broader Business Analytics (BA) community. The second goal is to bring the key aspect, and thus the key challenge, of BA, namely 'big data', to the attention of the DEA community, which, apart from a few exceptions, appears to have largely circumvented this area despite the growing attention it receives in other areas of research and practice. The third, and most important, goal is to discuss possible solutions to the 'big data' problem of large dimensions in the context of DEA. To achieve this goal, we present some theoretical grounds and perform a new simulation study exploring price-based aggregation as a solution to one of the key challenges that 'big data' poses for DEA: the immense dimensionality problem.
    Keywords: Data Envelopment Analysis; Productivity; Efficiency; Business Analytics; Big Data
    Date: 2019–05
  2. By: Ho Fai Chan; Bruno S. Frey; Ahmed Skali; Benno Torgler
    Abstract: By examining discrepancies between officially reported GDP growth figures and the actual economic growth implied by satellite-based night time light (NTL) density, we investigate whether democracies manipulate officially reported GDP figures, and if so, whether such manipulation pays political dividends. We find that the over-reporting of growth rates does indeed precede increases in popular support, with around a 1% over-statement associated with a 0.5% increase in voter intentions for the incumbent. These results are robust to allowing the elasticity of official GDP statistics to NTL to be country specific, as well as accounting for the quality of governance, and checks and balances on executive power.
    Keywords: Manipulation; political entrenchment; electoral cycles; trust; popular support; GDP; night lights
    JEL: D72 D73 O43
    Date: 2019–05
  3. By: Daron Acemoglu (MIT); Pascual Restrepo (Boston University)
    Abstract: We summarize a framework for the study of the implications of automation and AI for the demand for labor, wages, and employment. Our task-based framework emphasizes the displacement effect that automation creates as machines and AI replace labor in tasks that it used to perform. This displacement effect tends to reduce the demand for labor and wages. But it is counteracted by a productivity effect, resulting from the cost savings generated by automation, which increases the demand for labor in non-automated tasks. The productivity effect is complemented by additional capital accumulation and the deepening of automation (improvements to existing machinery), both of which further increase the demand for labor. These countervailing effects are incomplete. Even when they are strong, automation increases output per worker more than wages and reduces the share of labor in national income. The more powerful countervailing force against automation is the creation of new labor-intensive tasks, which reinstates labor in new activities and tends to increase the labor share, counterbalancing the impact of automation. Our framework also highlights the constraints and imperfections that slow down the adjustment of the economy and the labor market to automation and weaken the resulting productivity gains from this transformation: a mismatch between the skill requirements of new technologies, and the possibility that automation is being introduced at an excessive rate, possibly at the expense of other productivity-enhancing technologies.
    Keywords: AI, automation, displacement effect, labor demand, inequality, productivity, tasks, technology, wages
    JEL: J23 J24
    Date: 2018–01–04
  4. By: Sudiksha Joshi
    Abstract: In this research paper, I perform time series analysis and forecast the monthly value of US housing starts for 2019 using several econometric methods (ARIMA(X), VARX, (G)ARCH) and machine learning algorithms (artificial neural networks, ridge regression, k-nearest neighbors, and support vector regression), and create an ensemble model. The ensemble model stacks the predictions of the individual models and returns a weighted average of all predictions. The analysis suggests that the ensemble model performs best among all the models, with the lowest prediction errors, while the econometric models have higher error rates.
    Date: 2019–05
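A minimal sketch of the stacking idea described in the abstract above: each base model's forecast is combined into a weighted average, here with weights inversely proportional to validation error. The model names, error figures, and forecast values below are invented for illustration; the paper's own weighting scheme may differ.

```python
# Combine base-model forecasts into a weighted-average ensemble, weighting
# each model by the inverse of its (hypothetical) validation RMSE.

def inverse_error_weights(errors):
    """Weights proportional to 1/error, normalised to sum to 1."""
    inv = [1.0 / e for e in errors]
    total = sum(inv)
    return [w / total for w in inv]

def ensemble_forecast(forecasts, weights):
    """Weighted average of the base models' forecasts for one period."""
    return sum(w * f for w, f in zip(weights, forecasts))

# Hypothetical validation RMSEs for ARIMA, VARX, neural net, ridge, kNN, SVR:
rmse = [80.0, 90.0, 60.0, 70.0, 100.0, 65.0]
weights = inverse_error_weights(rmse)

# Hypothetical one-month-ahead housing-starts forecasts (thousands of units):
forecasts = [1250.0, 1230.0, 1270.0, 1260.0, 1210.0, 1265.0]
print(round(ensemble_forecast(forecasts, weights), 1))
```

Note that the ensemble forecast always lies between the smallest and largest base forecast, and the lowest-error model receives the largest weight.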
  5. By: Shangeth Rajaa; Jajati Keshari Sahoo
    Abstract: Stock prediction has been a topic of intense study for many years. Finance experts and mathematicians have long sought ways to predict future stock prices in order to decide whether to buy or sell a stock for profit. Stock experts and economists usually analyse past stock values using technical indicators, sentiment analysis, and similar tools to predict the future stock price. In recent years, much research has applied machine learning extensively to predicting stock behaviour. In this paper we propose a data-driven deep learning approach that predicts future stock values from past prices, combining the feature-extraction capability of a convolutional neural network with Neural Arithmetic Logic Units.
    Date: 2019–05
  6. By: Reaz Chowdhury; M. Arifur Rahman; M. Sohel Rahman; M. R. C. Mahdy
    Abstract: At present, cryptocurrencies have become a global phenomenon in the financial sector, being among the most traded financial instruments worldwide. Cryptocurrency is not only one of the most complicated and abstruse fields among financial instruments, but is also deemed a perplexing problem in finance due to its high volatility. This paper applies machine learning techniques to the index and constituents of the cryptocurrency market with the goal of predicting and forecasting their prices. In particular, we predict and forecast the closing price of the cryptocurrency index 30 and of nine constituent cryptocurrencies using machine learning algorithms and models, so that it becomes easier for people to trade these currencies. We use several machine learning techniques and algorithms and compare the models with each other to obtain the best output. We believe that our work will help reduce the challenges and difficulties faced by people who invest in cryptocurrencies. Moreover, the results can play a major role in cryptocurrency portfolio management and in observing fluctuations in the prices of constituents of the cryptocurrency market. We also compare our approach with similar state-of-the-art works from the literature that apply machine learning to predicting and forecasting the prices of these currencies, and find that our best approach yields better and competitive results, thereby advancing the state of the art. Using such prediction and forecasting methods, people can more easily understand the trend, making it easier to trade in a difficult and challenging financial instrument like cryptocurrency.
    Date: 2019–05
  7. By: Gogas, Periklis (Democritus University of Thrace, Department of Economics); Papadimitriou, Theophilos (Democritus University of Thrace, Department of Economics); Plakandaras, Vasilios (Democritus University of Thrace, Department of Economics); Gupta, Rangan (University of Pretoria)
    Abstract: The difficulty of modelling inflation and the importance of uncovering its underlying data-generating process are reflected in an ample literature on inflation forecasting. In this paper we evaluate nonlinear machine learning and econometric methodologies for forecasting U.S. inflation based on autoregressive and structural models of the term structure. We employ two nonlinear methodologies: the econometric Least Absolute Shrinkage and Selection Operator (LASSO) and the machine learning Support Vector Regression (SVR) method. SVR has not previously been used to forecast inflation with the term spread as a regressor. We use a long monthly dataset spanning 1871:1 to 2015:3 that covers the entire history of inflation in the U.S. economy, and include OLS regression models as a benchmark. To evaluate the contribution of the term spread to inflation forecasting in different time periods, we measure the out-of-sample forecasting performance of all models using rolling-window regressions. Across various forecasting horizons, the empirical evidence suggests that the structural models do not outperform the autoregressive ones, regardless of the estimation method. We therefore conclude that term-spread models are no more accurate than autoregressive ones in forecasting inflation.
    Keywords: U.S. Inflation; forecasting; Support Vector Regression; LASSO
    JEL: C22 C45
    Date: 2019–05–15
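The autoregressive benchmark with rolling-window, out-of-sample forecasts described in the abstract above can be sketched as follows. This is a simplified AR(1) fitted by OLS; the window length and the inflation numbers are invented for illustration, and the paper uses richer specifications (LASSO, SVR) alongside this kind of benchmark.

```python
# Rolling-window, one-step-ahead AR(1) forecasting: refit the model on each
# window of past observations and predict the next value.

def ols_ar1(series):
    """Fit y_t = a + b * y_{t-1} by OLS on one window; return (a, b)."""
    x = series[:-1]
    y = series[1:]
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    b = cov / var
    a = my - b * mx
    return a, b

def rolling_forecasts(series, window):
    """One-step-ahead forecasts, refitting the AR(1) on each rolling window."""
    preds = []
    for t in range(window, len(series)):
        a, b = ols_ar1(series[t - window:t])
        preds.append(a + b * series[t - 1])
    return preds

# Hypothetical monthly inflation rates (percent):
inflation = [2.1, 2.3, 2.2, 2.5, 2.4, 2.6, 2.8, 2.7, 2.9, 3.0, 2.8, 3.1]
preds = rolling_forecasts(inflation, window=6)
print(preds)
```

Out-of-sample errors from forecasts like these, compared across model classes, are what drive the paper's conclusion.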
  8. By: Heinisch, Dominik; Koenig, Johannes; Otto, Anne (Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany])
    Abstract: "Only scarce information is available on doctorate recipients' career outcomes in Germany (BuWiN 2013). With the current information base, graduate students cannot make an informed decision about whether to start a doctorate (Benderly 2018, Blank 2017). Administrative labour market data could provide the necessary information but are incomplete in this respect. In this paper, we describe the record linkage of two datasets to close this information gap: data on doctorate recipients collected in the catalogue of the German National Library (DNB), and the German labour market biographies (IEB) from the German Institute of Employment Research. We use a machine learning based methodology which 1) improves the record linkage of datasets without unique identifiers, and 2) evaluates the quality of the record linkage. The machine learning algorithms are trained on a synthetic training and evaluation dataset. In an exemplary analysis we compare the employment status of female and male doctorate recipients in Germany." (Author's abstract, IAB-Doku) ((en))
    JEL: C81 E24 I20
    Date: 2019–05–21
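Record linkage without unique identifiers, as described in the abstract above, rests on comparing candidate record pairs with similarity features. A toy sketch of the idea follows: here a single string-similarity score with a fixed threshold stands in for the paper's trained classifier, and all names and the threshold are invented for illustration.

```python
# Link records from two datasets by fuzzy name matching: accept a pair when
# its string similarity exceeds a threshold. The paper instead learns the
# accept/reject decision with machine learning trained on synthetic data.
from difflib import SequenceMatcher

def name_similarity(a, b):
    """String similarity in [0, 1] between two normalised names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def link(library_records, employment_records, threshold=0.85):
    """Return matched (library, employment) pairs above the threshold."""
    pairs = []
    for lib in library_records:
        for emp in employment_records:
            if name_similarity(lib, emp) >= threshold:
                pairs.append((lib, emp))
    return pairs

# Invented example records (note the spelling variant Mueller vs Müller):
dnb = ["Mueller, Anna", "Weber, Hans"]
ieb = ["Müller, Anna", "Weber, Hans", "Schmidt, Karl"]
matches = link(dnb, ieb)
print(matches)
```

The spelling variant is matched despite the differing characters, which is exactly the situation a unique identifier would otherwise resolve.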
  9. By: Lisa R. Goldberg; Saad Mouti
    Abstract: We use supervised learning to identify factors that predict the cross-section of maximum drawdown for stocks in the US equity market. Our data run from January 1980 to June 2018 and our analysis includes ordinary least squares, penalized linear regressions, tree-based models, and neural networks. We find that the most important predictors tended to be consistent across models, and that non-linear models had better predictive power than linear models. Predictive power was higher in calm periods than stressed periods, and environmental, social, and governance indicators augmented predictive power for non-linear models.
    Date: 2019–05
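The target variable in the paper above, maximum drawdown, is the largest peak-to-trough loss over a price path. A minimal sketch of its computation (the prices below are invented for illustration):

```python
# Maximum drawdown: the largest fractional drop from a running peak to any
# subsequent trough along a price path.

def max_drawdown(prices):
    """Largest fractional peak-to-trough loss over the path."""
    peak = prices[0]
    mdd = 0.0
    for p in prices:
        peak = max(peak, p)                    # update the running peak
        mdd = max(mdd, (peak - p) / peak)      # drawdown relative to the peak
    return mdd

prices = [100, 120, 90, 95, 130, 80, 110]
print(max_drawdown(prices))  # about 0.3846, the drop from 130 to 80
```

The paper's supervised-learning models predict this quantity across the cross-section of US stocks.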
  10. By: Ludovic Goudenège; Andrea Molent; Antonino Zanette
    Abstract: In this paper we modify the Gaussian Process Regression Monte Carlo (GPR-MC) method introduced by Goudenège et al., proposing two efficient techniques for computing the price of American basket options. In particular, we consider baskets of assets that follow Black-Scholes dynamics. The proposed techniques, called GPR Tree (GPR-Tree) and GPR Exact Integration (GPR-EI), are both based on machine learning, exploited together with binomial trees or with a closed formula for integration. Both methods solve the backward dynamic programming problem by considering a Bermudan approximation of the American option. On the exercise dates, the value of the option is first computed as the maximum between the exercise value and the continuation value, and then approximated by means of Gaussian Process Regression. The two methods derive from the GPR-MC method and differ mainly in how the continuation value is approximated: by a single step of a binomial tree, or by integration against the probability density of the process. Numerical results show that both methods are accurate and reliable and improve on the GPR-MC method in handling American options on very large baskets of assets.
    Date: 2019–05
  11. By: de Kok, Ties (Tilburg University, School of Economics and Management)
    Abstract: The three essays collected in this PhD thesis concern internal and external reporting practices, narrative disclosures, recent advancements in reporting technologies, and the role of reporting in emerging markets. These essays utilize state-of-the-art empirical techniques drawn from computer science along with new data sources to study fundamental accounting questions. The first essay studies the relationship between reporting frequency and market pressure over social media in crowdfunding markets. The second essay studies the use of soft information in the context of internal bank lending decisions, in particular during a scenario of mandated changes to the location of decision rights. The third essay studies the information retrieval process for narrative disclosures for users who vary in their financial literacy by combining innovative tracking techniques deployed on Amazon Mechanical Turk with state-of-the-art machine learning techniques.
    Date: 2019
  12. By: Christopher Kath; Florian Ziel
    Abstract: In this paper we discuss Conformal Prediction (CP). While it originally stems from the world of machine learning, it has never been applied or analyzed in the context of short-term electricity price forecasting. We therefore elaborate on the aspects that make Conformal Prediction worth knowing, and explain why its simple yet very efficient idea has worked in other fields of application and why its characteristics are promising for short-term power applications as well. We compare its performance against state-of-the-art electricity price forecasting models such as quantile regression averaging (QRA) in an empirical out-of-sample study of three short-term electricity time series. We combine Conformal Prediction with various underlying point forecast models to demonstrate its versatility and its behavior under changing conditions. Our findings suggest that Conformal Prediction yields sharp and reliable prediction intervals in short-term power markets. We further inspect the effect of each of Conformal Prediction's model components and provide a path-based guideline for finding the best CP model for each market.
    Date: 2019–05
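The core mechanism behind the paper above, in its simplest split-conformal form, is: hold out a calibration set, take an empirical quantile of the point forecaster's absolute residuals, and widen the point forecast by that amount. A minimal sketch (the price figures are invented; the paper studies more elaborate CP variants):

```python
# Split conformal prediction: a distribution-free (1 - alpha) interval built
# from calibration residuals of any underlying point forecaster.
import math

def conformal_interval(cal_actuals, cal_preds, new_pred, alpha=0.1):
    """(1 - alpha) prediction interval around new_pred via split CP."""
    residuals = sorted(abs(a - p) for a, p in zip(cal_actuals, cal_preds))
    n = len(residuals)
    # Conformal quantile index: ceil((n + 1) * (1 - alpha)), capped at n.
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    q = residuals[k - 1]
    return new_pred - q, new_pred + q

# Hypothetical day-ahead electricity prices (EUR/MWh) on a calibration window:
actuals = [42.0, 45.5, 39.0, 50.0, 47.5, 41.0, 44.0, 48.0, 43.5, 46.0]
preds   = [43.0, 44.0, 40.5, 48.0, 47.0, 42.5, 44.5, 46.0, 44.0, 45.0]
lo, hi = conformal_interval(actuals, preds, new_pred=45.0, alpha=0.2)
print(lo, hi)
```

Because the interval width comes from the forecaster's own residuals, CP wraps around any point model, which is the versatility the abstract highlights.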
  13. By: Samuel Asante Gyamerah; Philip Ngare; Dennis Ikpe
    Abstract: The effects of weather on agriculture have become a major concern across the globe in recent years, hence the need for an effective weather risk management tool (weather derivatives) for agricultural stakeholders. However, most of these stakeholders are unwilling to pay the price of weather derivatives (WD) because of product-design and geographical basis risks in WD pricing models. Using a machine learning ensemble technique for crop yield forecasting and feature importance, the major weather variable affecting crop yields, average temperature, is empirically determined. This variable is used as the underlying index for WD to eliminate product-design basis risk. A model with a time-varying speed of mean reversion, a seasonal mean, and a local volatility that depends on the average temperature and on time over the contract period is proposed. Based on this model, pricing models for futures, options on futures, and basket futures on cumulative average temperature and growing degree-days are presented. Pricing futures on baskets reduces geographical basis risk, as buyers can select the most appropriate weather stations with their desired weight preference. With these pricing models, agricultural stakeholders can hedge their crops against the perils of weather.
    Date: 2019–05
  14. By: Arthur Turrell; Bradley J. Speigner; Jyldyz Djumalieva; David Copple; James Thurgood
    Abstract: Using a dataset of 15 million UK job adverts from a recruitment website, we construct new economic statistics measuring labour market demand. These data are ‘naturally occurring’, having originally been posted online by firms. They offer information on two dimensions of vacancies—region and occupation—that firm-based surveys do not usually, and cannot easily, collect. These data do not come with official classification labels, so we develop an algorithm which maps the free-form text of job descriptions into standard occupational classification codes. The created vacancy statistics give a plausible, granular picture of UK labour demand and permit the analysis of Beveridge curves and mismatch unemployment at the occupational level.
    JEL: E24 J63
    Date: 2019–05
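The core step in the paper above is mapping free-form job-ad text to standard occupation codes. A toy sketch of the idea using bag-of-words cosine similarity against per-occupation keyword profiles; the codes, keywords, and ad text below are invented, and the paper's actual algorithm is more sophisticated.

```python
# Classify a job advert by cosine similarity between its bag-of-words vector
# and a keyword profile for each occupation code.
import math
from collections import Counter

def bow(text):
    """Bag-of-words term counts for a piece of text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented occupation-code keyword profiles:
profiles = {
    "2136": bow("software developer programming python code engineer"),
    "6141": bow("nurse nursing care patient hospital ward"),
}

def classify(ad_text):
    """Return the occupation code whose profile best matches the ad."""
    scores = {code: cosine(bow(ad_text), prof) for code, prof in profiles.items()}
    return max(scores, key=scores.get)

print(classify("We are hiring a python developer to write clean code"))
```

Scaling this idea to 15 million adverts, with a proper classifier and the full occupational classification, yields the vacancy statistics the abstract describes.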
  15. By: Wu, Guoyuan; Ye, Fei; Hao, Peng; Esaid, Danial; Boriboonsomsin, Kanok; Barth, Matthew J.
    Abstract: Eco-driving strategies based on connected and automated vehicle (CAV) technology, such as Eco-Approach and Departure (EAD), have attracted significant worldwide interest due to their potential to save energy and reduce tail-pipe emissions. In this project, the research team developed and tested a deep learning–based trajectory-planning algorithm (DLTPA) for EAD. The DLTPA has two processes, offline (training) and online (implementation), and is composed of two major modules: 1) a solution feasibility checker that identifies whether there is a feasible trajectory subject to all the system constraints, e.g., maximum acceleration or deceleration; and 2) a regressor that predicts the speed of the next time step. Preliminary simulation with the microscopic traffic modeling software PTV VISSIM showed that the proposed DLTPA can achieve the optimal solution in terms of energy savings, with a better balance of energy savings versus computational effort, when compared with the baseline scenarios: one where no EAD is implemented, and one where the energy-optimal solution is provided by a graph-based trajectory-planning algorithm.
    Keywords: Engineering, Eco-driving, deep-learning, energy and emissions, VISSIM
    Date: 2019–05–01

This nep-big issue is ©2019 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at . For comments, please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.