nep-big New Economics Papers
on Big Data
Issue of 2017‒11‒26
seven papers chosen by
Tom Coupé
University of Canterbury

  1. Financial Time Series Prediction Using Deep Learning By Ariel Navon; Yosi Keller
  2. The STEM requirements of "non-STEM" jobs: evidence from UK online vacancy postings and implications for skills & knowledge shortages By Grinis, Inna
  3. Shocks, social protection, and resilience: Evidence from Ethiopia By Knippenberg, Erwin; Hoddinott, John F.
  4. Agent-based model calibration using machine learning surrogates By Francesco Lamperti; Andrea Roventini; Amir Sani
  5. Principal Components and Regularized Estimation of Factor Models By Jushan Bai; Serena Ng
  6. Optimizing Variance-Bias Trade-Off in the TWANG Package for Estimation of Propensity Scores By Layla Parast; Daniel F. McCaffrey; Lane F. Burgette; Fernando Hoces de la Guardia; Daniela Golinelli; Jeremy N. V. Miles; Beth Ann Griffin
  7. Algorithmes de prix, intelligence artificielle et équilibres collusifs By Frédérique Marty

  1. By: Ariel Navon; Yosi Keller
    Abstract: In this work we present a data-driven, end-to-end Deep Learning approach for time series prediction, applied to financial time series. A Deep Learning scheme is derived to predict the temporal trends of stocks and ETFs on the NYSE or NASDAQ. Our approach is based on a neural network (NN) that is applied to raw financial data inputs and is trained to predict the temporal trends of stocks and ETFs. In order to handle commission-based trading, we derive an investment strategy that utilizes the probabilistic outputs of the NN and optimizes the average return. The proposed scheme is shown to provide statistically significant, accurate predictions of financial market trends, and the investment strategy is shown to be profitable under this challenging setup. The performance compares favorably with contemporary benchmarks over two years of back-testing.
    Date: 2017–11
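The abstract does not spell out how the NN's probabilistic outputs are turned into trades, so the following is only a minimal sketch of the commission-aware idea: assuming a symmetric expected price move of a given size and a fixed round-trip commission (both assumptions, not from the paper), trade only when the predicted up-probability clears the break-even threshold.

```python
# Illustrative sketch (not the paper's exact rule): convert a NN's predicted
# probability of an upward move into a trade decision that accounts for a
# fixed round-trip commission. `move` and `commission` are assumed inputs.

def breakeven_probability(move: float, commission: float) -> float:
    """Minimum up-probability at which the expected net return of a long
    trade is positive, assuming a symmetric move of size `move` and a
    round-trip cost `commission`.

    E[return] = p*move - (1-p)*move - commission > 0
              => p > 1/2 + commission / (2*move)
    """
    return 0.5 + commission / (2.0 * move)

def decide(prob_up: float, move: float, commission: float) -> str:
    """Go long/short only when the probability clears the break-even bar."""
    p_star = breakeven_probability(move, commission)
    if prob_up > p_star:
        return "long"
    if prob_up < 1.0 - p_star:
        return "short"
    return "stay out"
```

With a 1% expected move and a 0.2% commission, the break-even probability is 60%, so mildly confident predictions stay out of the market; this is one simple way an "average return optimizing" rule can absorb commissions.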
  2. By: Grinis, Inna
    Abstract: Do employers in "non-STEM" occupations (e.g. Graphic Designers, Economists) seek to hire STEM (Science, Technology, Engineering, and Mathematics) graduates with a higher probability than non-STEM ones for knowledge and skills that they have acquired through their STEM education (e.g. "Microsoft C#", "Systems Engineering") and not simply for their problem-solving and analytical abilities? This is an important question in the UK, where less than half of STEM graduates work in STEM occupations and where this apparent leakage from the "STEM pipeline" is often considered a waste of resources. To address this question, this paper goes beyond the discrete divide of occupations into STEM vs. non-STEM and measures STEM requirements at the level of jobs by examining the universe of UK online vacancy postings between 2012 and 2016. We design and evaluate machine learning algorithms that classify thousands of keywords collected from job adverts, and millions of vacancies, into STEM and non-STEM. 35% of all STEM jobs belong to non-STEM occupations, and 15% of all postings in non-STEM occupations are STEM. Moreover, STEM jobs are associated with higher wages within both STEM and non-STEM occupations, even after controlling for detailed occupations, education, experience requirements, employers, etc. Although our results indicate that the STEM pipeline breakdown may be less problematic than typically thought, we also find that many of the STEM requirements of "non-STEM" jobs could be acquired with STEM training that is less advanced than a full-time STEM education. Hence, a more efficient way of satisfying the STEM demand in non-STEM occupations could be to teach more STEM in non-STEM disciplines. We develop a simple abstract framework to show how this education policy could help reduce STEM shortages in both STEM and non-STEM occupations.
    Keywords: STEM Education; Skills Shortages; Machine Learning
    JEL: N0 R14 J01 J50
    Date: 2017–05–01
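The two-stage idea in the abstract (classify keywords, then classify vacancies from their keywords) can be sketched very simply. This is a toy stand-in for the paper's machine-learning pipeline: the keyword set below reuses only the examples quoted in the abstract plus made-up non-STEM terms, and the 50% threshold is an assumption, not the paper's rule.

```python
# Toy sketch of vacancy classification from already-classified keywords.
# STEM_KEYWORDS and the threshold are illustrative assumptions; the paper
# learns the keyword labels with machine-learning algorithms.
STEM_KEYWORDS = {"microsoft c#", "systems engineering", "matlab", "sql"}

def vacancy_is_stem(keywords, stem_set=STEM_KEYWORDS, threshold=0.5):
    """Label a vacancy STEM when at least `threshold` of its keywords
    are classified as STEM."""
    kws = [k.lower() for k in keywords]
    if not kws:
        return False
    stem_share = sum(k in stem_set for k in kws) / len(kws)
    return stem_share >= threshold
```

Measuring STEM content per job rather than per occupation is what lets the paper find STEM vacancies inside "non-STEM" occupations.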
  3. By: Knippenberg, Erwin; Hoddinott, John F.
    Abstract: The malign effect of shocks has long been a concern within economics, partly because they result in transitory welfare losses and partly because they may have persistent effects. In development discourse, this latter concern has spurred interest in the concept of resilience and how public interventions can enhance resilience. Within this context, we assess the impact of a social protection program, Ethiopia’s Productive Safety Net Program, on the longer term impacts of drought on household food security. We find that drought shocks reduce the number of months a household considers itself food secure and that these impacts persist for up to four years after the drought has ended. Using a Hausman instrumental variable estimator, we find that receipt of PSNP payments reduced the initial impact of drought shocks by 57 percent and eliminates their adverse impact on food security within two years. In this way, the PSNP strengthens the resilience of its beneficiaries against adverse shocks. This impact is largest for PSNP beneficiaries with little or no land. Results are robust to using an objective measure of drought derived from satellite data, the Standard Evapotranspiration Index. They are also robust to changes in sample composition, the presence of other interventions, and the estimator used.
    Keywords: food security; drought; resilience; household food security
    Date: 2017
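The abstract's "Hausman instrumental variable estimator" is more involved than can be shown here, but the basic IV logic it builds on fits in a few lines. A minimal sketch of a just-identified IV slope estimator, not the paper's actual specification:

```python
# Generic just-identified IV estimator: beta = cov(z, y) / cov(z, x),
# where z instruments an endogenous regressor x (here, think of program
# receipt instrumented by eligibility rules; variable roles are illustrative).

def cov(a, b):
    """Population covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n

def iv_slope(y, x, z):
    """Estimated effect of endogenous x on y, using instrument z."""
    return cov(z, y) / cov(z, x)
```

When the instrument is valid, this recovers the causal slope even if x is correlated with the error term, which is why the paper instruments PSNP receipt rather than regressing on it directly.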
  4. By: Francesco Lamperti (Scuola Superiore Sant'Anna, Pisa, Italy); Andrea Roventini (Scuola Superiore Sant'Anna, Pisa, Italy); Amir Sani (Université Panthéon-Sorbonne & CNRS, Paris, France)
    Abstract: Taking agent-based models (ABM) closer to the data is an open challenge. This paper explicitly tackles parameter space exploration and calibration of ABMs, combining supervised machine learning and intelligent sampling to build a surrogate meta-model. The proposed approach provides a fast and accurate approximation of model behaviour, dramatically reducing computation time. In this way, our machine-learning surrogate facilitates large-scale explorations of the parameter space, while providing a powerful filter to gain insights into the complex functioning of agent-based models. The algorithm introduced in this paper merges model simulation and output analysis into a surrogate meta-model, which substantially eases ABM calibration. We successfully apply our approach to the Brock and Hommes (1998) asset pricing model and to the “Island” endogenous growth model (Fagiolo and Dosi, 2003). Performance is evaluated against a relatively large out-of-sample set of parameter combinations, while employing different user-defined statistical tests for output analysis. The results demonstrate the capacity of machine learning surrogates to facilitate fast and precise exploration of agent-based models’ behaviour over their often rugged parameter spaces.
    Keywords: Agent-based model; calibration; machine learning; surrogate; meta-model
    JEL: C15 C52 C63
    Date: 2017–03
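The surrogate idea can be sketched in miniature: run the expensive model at a few parameter points, fit a cheap predictor on those runs, and use the predictor to screen many more candidates. Everything below is a toy, assuming a one-dimensional parameter, a made-up stand-in model, and a nearest-neighbour surrogate; the paper combines supervised learning with intelligent sampling.

```python
# Toy sketch of ABM surrogate calibration. `expensive_abm` is a hypothetical
# stand-in for a costly simulation whose "loss" is minimised at theta = 0.3.

def expensive_abm(theta):
    """Stand-in for a costly agent-based model run returning a calibration loss."""
    return (theta - 0.3) ** 2

def fit_surrogate(train_thetas):
    """Evaluate the expensive model on a few points, return a cheap
    1-nearest-neighbour predictor of the loss surface."""
    runs = [(t, expensive_abm(t)) for t in train_thetas]
    def surrogate(theta):
        # predict with the value of the nearest evaluated point
        return min(runs, key=lambda r: abs(r[0] - theta))[1]
    return surrogate

def screen(surrogate, candidates, tol):
    """Cheaply filter the parameter space: keep only candidates whose
    predicted loss is below `tol`."""
    return [t for t in candidates if surrogate(t) < tol]
```

Only the handful of training points pay the full simulation cost; the screening pass over the whole candidate grid uses the surrogate alone, which is the source of the speed-up described in the abstract.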
  5. By: Jushan Bai; Serena Ng
    Abstract: It is known that the common factors in a large panel of data can be consistently estimated by the method of principal components, and principal components can be constructed by iterative least squares regressions. Replacing least squares with ridge regressions turns out to have the effect of shrinking the singular values of the common component and possibly reducing its rank. The method is used in the machine learning literature to recover low-rank matrices. We study the procedure from the perspective of estimating a minimum-rank approximate factor model. We show that the constrained factor estimates are biased but can be more efficient in terms of mean-squared errors. Rank consideration suggests a data-dependent penalty for selecting the number of factors. The new criterion is more conservative in cases when the nominal number of factors is inflated by the presence of weak factors or large measurement noise. The framework is extended to incorporate a priori linear constraints on the loadings. We provide asymptotic results that can be used to test economic hypotheses.
    Date: 2017–08
  6. By: Layla Parast; Daniel F. McCaffrey; Lane F. Burgette; Fernando Hoces de la Guardia; Daniela Golinelli; Jeremy N. V. Miles; Beth Ann Griffin
    Abstract: While propensity score weighting has been shown to reduce bias in treatment effect estimation when selection bias is present, it has also been shown that such weighting can perform poorly if the estimated propensity score weights are highly variable.
    Keywords: Causal inference; Propensity score; Machine learning
    JEL: I
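TWANG itself is an R package, but why highly variable weights hurt can be illustrated in a few generic lines: the Kish effective sample size shows how weight variability shrinks the information behind a weighted estimate. This is a standard diagnostic, not TWANG's specific output.

```python
# Generic illustration of the variance side of the trade-off: extreme
# propensity-score weights collapse the effective sample size, inflating
# the variance of the weighted treatment-effect estimate.

def ipw_mean(outcomes, weights):
    """Weighted (Hajek-style) mean of outcomes under given weights."""
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)

def effective_sample_size(weights):
    """Kish approximation: ESS = (sum w)^2 / sum w^2.
    Equal weights give ESS = n; a few dominant weights drive ESS toward 1."""
    return sum(weights) ** 2 / sum(w * w for w in weights)
```

Trimming or smoothing the weights raises the ESS at the cost of some residual confounding, which is exactly the variance-bias trade-off the title refers to.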
  7. By: Frédérique Marty (OFCE Sciences Po)
    Abstract: Pricing algorithms implemented by competing firms can serve as a vehicle for collusion. The resources offered by Big Data, the ability to adjust prices in real time, and predictive analytics may make it possible to reach tacit collusion equilibria quickly and to sustain them durably. The use of artificial intelligence raises a specific issue, in that the algorithm may discover by itself the benefit of a tacit non-aggression agreement, and in that the analysis of its decision-making process is particularly difficult. As a result, sanctioning such collusion under the law of anticompetitive practices is not straightforward. The article therefore explores possible avenues of regulation, whether these proceed through audits or through the activation of liability rules.
    JEL: K21 K23 L41
    Date: 2017–05
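The worry that real-time price adjustment makes tacit collusion easier to sustain maps onto the textbook repeated-game incentive condition. A minimal sketch (standard theory, not taken from the article): collusion is self-enforcing when colluding forever beats a one-shot deviation followed by punishment, and faster algorithmic detection of deviations acts like a higher effective discount factor.

```python
# Standard repeated-game sustainability check with illustrative profit
# levels: pi_collude per period while colluding, pi_deviate in the period
# of a unilateral price cut, pi_punish per period thereafter.

def collusion_sustainable(pi_collude, pi_deviate, pi_punish, delta):
    """Tacit collusion is self-enforcing when the discounted value of
    colluding forever is at least the value of deviating once and being
    punished forever after (discount factor `delta` in (0, 1))."""
    stay = pi_collude / (1.0 - delta)
    cheat = pi_deviate + delta * pi_punish / (1.0 - delta)
    return stay >= cheat
```

In this framing, algorithms that detect and react to price cuts within minutes shorten the profitable deviation window, pushing the effective delta up and widening the set of sustainable collusive outcomes, which is the mechanism behind the article's concern.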

This nep-big issue is ©2017 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.