nep-big New Economics Papers
on Big Data
Issue of 2018‒07‒23
eight papers chosen by
Tom Coupé
University of Canterbury

  1. Machine Learning Macroeconometrics: A Primer By Korobilis, Dimitris
  2. Assessing Autonomous Algorithmic Collusion: Q-Learning Under Sequential Pricing By Timo Klein
  3. Long Short-Term Memory Networks for CSI300 Volatility Prediction with Baidu Search Volume By Yu-Long Zhou; Ren-Jie Han; Qian Xu; Wei-Ke Zhang
  4. Double/de-biased machine learning using regularized Riesz representers By Victor Chernozhukov; Whitney K. Newey; James Robins
  5. Unravelling Airbnb: Predicting Price for New Listing By Paridhi Choudhary; Aniket Jain; Rahul Baijal
  6. Problems, solutions and new problems with the third wave of technological unemployment By Fabio D'Orlando
  7. Decision Sciences, Economics, Finance, Business, Computing, and Big Data: Connections By Chia-Lin Chang; Michael McAleer; Wing-Keung Wong
  8. Early Detection of Students at Risk - Predicting Student Dropouts Using Administrative Student Data and Machine Learning Methods By Johannes Berens; Simon Oster; Kerstin Schneider; Julian Burghoff

  1. By: Korobilis, Dimitris
    Abstract: This chapter reviews econometric methods for dealing with the challenges of inference in high-dimensional empirical macro models with possibly 'more parameters than observations'. These methods broadly include machine learning algorithms for Big Data, but also more traditional estimation algorithms for data with a short span of observations relative to the number of explanatory variables. While building mainly on a univariate linear regression setting, I show how machine learning ideas can be generalized to classes of models that are interesting to applied macroeconomists, such as time-varying parameter models and vector autoregressions.
    Date: 2018–07
    URL: http://d.repec.org/n?u=RePEc:esy:uefcwp:22666&r=big
  2. By: Timo Klein (University of Amsterdam)
    Abstract: A novel debate within competition policy and regulation circles is whether autonomous machine learning algorithms are able to tacitly collude on prices. Using a general framework, we show how autonomous Q-learning, a simple but well-established machine learning algorithm, is able to achieve supracompetitive profits in a stylized oligopoly environment with sequential price competition. This occurs without any communication or explicit instructions to collude, suggesting tacit collusion. The intuition is that the algorithm is able to learn and exploit the dynamics of Edgeworth price cycles, where periodic price increases reset a gradual downward spiral of price competition. The general framework used can guide future research into the capacity of various algorithms to collude in environments that are less stylized or more case-specific.
    Keywords: pricing algorithms; algorithmic collusion; machine learning; Q-learning; sequential pricing
    JEL: K21 L13 L49
    Date: 2018–06–21
    URL: http://d.repec.org/n?u=RePEc:tin:wpaper:20180056&r=big
  3. By: Yu-Long Zhou; Ren-Jie Han; Qian Xu; Wei-Ke Zhang
    Abstract: Intense volatility in financial markets affects people worldwide. Therefore, relatively accurate prediction of volatility is critical. We suggest that massive data sources resulting from human interaction with the Internet may offer a new perspective on the behavior of market participants in periods of large market movements. First, we select 28 finance-related keywords as indicators of public mood and macroeconomic factors. We then manually collect the daily search volume of those 28 keywords from the Baidu Index for the period from June 1, 2006 to October 29, 2017. We apply a Long Short-Term Memory (LSTM) neural network to forecast CSI300 volatility using those search volume data. Compared to the benchmark GARCH model, our forecast is more accurate, which demonstrates the effectiveness of the LSTM neural network in volatility forecasting.
    Date: 2018–05
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1805.11954&r=big
  4. By: Victor Chernozhukov (Institute for Fiscal Studies and MIT); Whitney K. Newey (Institute for Fiscal Studies and MIT); James Robins (Institute for Fiscal Studies)
    Abstract: We provide adaptive inference methods for linear functionals of L1-regularized linear approximations to the conditional expectation function. Examples of such functionals include average derivatives, policy effects, average treatment effects, and many others. The construction relies on building Neyman-orthogonal equations that are approximately invariant to perturbations of the nuisance parameters, including the Riesz representer for the linear functionals. We use L1-regularized methods to learn the approximations to the regression function and the Riesz representer, and construct the estimator for the linear functionals as the solution to the orthogonal estimating equations. We establish that under weak assumptions the estimator concentrates in a 1/√n neighborhood of the target with deviations controlled by the normal laws, and the estimator attains the semi-parametric efficiency bound in many cases. In particular, either the approximation to the regression function or the approximation to the Riesz representer can be “dense” as long as one of them is sufficiently “sparse”. Our main results are non-asymptotic and imply asymptotic uniform validity over large classes of models.
    Keywords: Approximate Sparsity vs. Density, Double/De-biased Machine Learning, Regularized Riesz Representers, Linear Functionals
    Date: 2018–03–02
    URL: http://d.repec.org/n?u=RePEc:ifs:cemmap:15/18&r=big
  5. By: Paridhi Choudhary; Aniket Jain; Rahul Baijal
    Abstract: This paper analyzes Airbnb listings in the city of San Francisco to better understand how different attributes, such as bedrooms, location and house type, amongst others, can be used to accurately predict a price for a new listing that is optimal in terms of the host's profitability yet affordable to guests. This model is intended to be helpful to the internal pricing tools that Airbnb provides to its hosts. Furthermore, additional analysis is performed to ascertain the likelihood of a listing's availability, for potential guests to consider while making a booking. The analysis begins with exploring and examining the data to make the transformations needed for a better understanding of the problem at large and to help us form hypotheses. Moving further, machine learning models that are intuitive to use are built to validate the hypotheses on pricing and availability, and experiments are run in that context to arrive at a viable solution. The paper then concludes with a discussion of the business implications, associated risks and future scope.
    Date: 2018–05
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1805.12101&r=big
  6. By: Fabio D'Orlando (University of Cassino and Lazio Meridionale)
    Abstract: The aim of this paper is to discuss possible solutions to the “third wave” of technological unemployment and their main drawbacks. The process has just started and will only be fully realized in the future, but its main novelty is already well known and concerns robots (and artificial intelligence) entering the production process. Robots do not simply increase labor productivity, in cooperation with humans, but can totally substitute labor, making it possible to produce commodities without the use of human input. This in turn generates technological unemployment. Past “compensation” theories have argued that technological unemployment could be reabsorbed thanks to wage reduction and demand (and production) increase. But these theories have ignored robots. If robots are more productive and less expensive than humans, wage reduction may be insufficient due to the minimum wage subsistence boundary; and, in any case, an increase in demand would only determine an increase in the production of goods by robots alone, without any impact on human employment. Meanwhile, the resulting mass unemployment will require redistributive policies. The paper discusses the most relevant among these policies, emphasizing their drawbacks and their unwanted implications, and proposes an alternative rooted in Tietenberg’s tradable permits approach.
    JEL: B12 D21 D30 E24 J64
    Date: 2018–07
    URL: http://d.repec.org/n?u=RePEc:csn:wpaper:2018-02&r=big
  7. By: Chia-Lin Chang (Department of Applied Economics and Department of Finance, National Chung Hsing University, Taiwan); Michael McAleer (Department of Quantitative Finance, National Tsing Hua University, Taiwan; Econometric Institute, Erasmus School of Economics, Erasmus University Rotterdam, The Netherlands; Department of Quantitative Economics, Complutense University of Madrid, Spain; and Institute of Advanced Sciences, Yokohama National University, Japan); Wing-Keung Wong (Department of Finance, Fintech Center, and Big Data Research Center, Asia University, Taiwan; Department of Medical Research, China Medical University Hospital; Department of Economics and Finance, Hang Seng Management College, Hong Kong, China; and Department of Economics, Lingnan University, Hong Kong, China)
    Abstract: This paper provides a review of some connecting literature in Decision Sciences, Economics, Finance, Business, Computing, and Big Data. We then discuss some research that is related to the six cognate disciplines. Academics could develop theoretical models and subsequent econometric and statistical models to estimate the parameters in the associated models. Moreover, they could then conduct simulations to examine whether the estimators or statistics in the new theories on estimation and hypothesis testing have small size and high power. Thereafter, academics and practitioners could apply their theories to analyze interesting problems and issues in the six disciplines and other cognate areas.
    Keywords: Decision sciences; Economics; Finance; Business; Computing; Big data; theoretical models; Econometric and statistical models; Applications.
    JEL: A10 G00 G31 O32
    Date: 2018–03
    URL: http://d.repec.org/n?u=RePEc:ucm:doicae:1809&r=big
  8. By: Johannes Berens (WIB, University of Wuppertal); Simon Oster (WIB, University of Wuppertal); Kerstin Schneider (WIB, University of Wuppertal and CESifo); Julian Burghoff (University of Düsseldorf)
    Abstract: High rates of student attrition in tertiary education are a major concern for universities and public policy, as dropout is not only costly for the students but also wastes public funds. To successfully reduce student attrition, it is imperative to understand which students are at risk of dropping out and what the underlying determinants of dropout are. We develop an early detection system (EDS) that uses machine learning and classic regression techniques to predict student success in tertiary education as a basis for a targeted intervention. The method developed in this paper is highly standardized and can be easily implemented in every German institution of higher education, as it uses student performance and demographic data collected, stored, and maintained by legal mandate at all German universities and therefore self-adjusts to the university where it is employed. The EDS uses regression analysis and machine learning methods, such as neural networks, decision trees and the AdaBoost algorithm, to identify student characteristics which distinguish potential dropouts from graduates. The EDS we present is tested and applied at a medium-sized state university with 23,000 students and a medium-sized private university of applied sciences with 6,700 students. The two institutions differ considerably in their organization, tuition fees and student-teacher ratios. Our results indicate a prediction accuracy at the end of the first semester of 79% for the state university and 85% for the private university of applied sciences. Furthermore, accuracy of the EDS increases with each completed semester as new performance data becomes available. After the fourth semester, the accuracy improves to 90% for the state university and 95% for the private university of applied sciences. On the day of enrollment the accuracy, relying only on demographic data, is 68% for the state university and 67% for the private university.
    Date: 2018–07
    URL: http://d.repec.org/n?u=RePEc:bwu:schdps:sdp18006&r=big

This nep-big issue is ©2018 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.