nep-big New Economics Papers
on Big Data
Issue of 2019‒04‒29
eight papers chosen by
Tom Coupé
University of Canterbury

  1. Forecasting in Big Data Environments: an Adaptable and Automated Shrinkage Estimation of Neural Networks (AAShNet) By Ali Habibnia; Esfandiar Maasoumi
  2. A neural network-based framework for financial model calibration By Shuaiqiang Liu; Anastasia Borovykh; Lech A. Grzelak; Cornelis W. Oosterlee
  3. Continuous-Time Mean-Variance Portfolio Optimization via Reinforcement Learning By Haoran Wang; Xun Yu Zhou
  4. Deep Generative Models for Reject Inference in Credit Scoring By Rogelio A. Mancisidor; Michael Kampffmeyer; Kjersti Aas; Robert Jenssen
  5. Deep Q-Learning for Nash Equilibria: Nash-DQN By Philippe Casgrain; Brian Ning; Sebastian Jaimungal
  6. Developing an Interactive Machine-Learning-based Approach for Sidewalk Digitalization By Luo, Ji; Wu, Guoyuan
  7. Evaluating Environmental Impact of Traffic Congestion in Real Time Based on Sparse Mobile Crowdsourced Data By Hao, Peng; Wang, Chao
  8. Understanding How Cities Can Link Smart Mobility Priorities Through Data By Shaheen, Susan PhD; Martin, Elliot PhD; Hoffman-Stapleton, Mikaela; Slowik, Peter

  1. By: Ali Habibnia (Virginia Tech); Esfandiar Maasoumi (Emory University)
    Abstract: This paper considers improved forecasting in possibly nonlinear dynamic settings with high-dimensional predictors ("big data" environments). To overcome the curse of dimensionality and manage data and model complexity, we examine shrinkage estimation of a deep neural network with skip-layer connections, trained by back-propagation, that expressly includes both linear and nonlinear components. This high-dimensional learning approach combines a sparsity (L1) penalty with a smoothness (L2) penalty, allowing high dimensionality and nonlinearity to be accommodated in one step. It selects the significant predictors as well as the topology of the neural network. We estimate optimal values of the shrinkage hyperparameters with a gradient-based optimization technique, yielding robust predictions with improved reproducibility, which has been an issue for some approaches. The result is statistically interpretable and unravels some of the network structure commonly left to a black box. An additional advantage is that the nonlinear part tends to be pruned if the underlying process is linear. In an application to forecasting equity returns, the proposed approach captures nonlinear dynamics between equities to enhance forecast performance. It offers an appreciable improvement over current univariate and multivariate models in RMSE and in actual portfolio performance.
    Date: 2019–04
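The combined L1/L2 (elastic-net-style) penalty on a skip-layer network described above can be sketched in a few lines of numpy. This is not the authors' code: the toy data, network size, decay constants, and learning rate are all illustrative inventions; only the structure (linear skip path plus nonlinear hidden path, penalized jointly) follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: ten candidate predictors, mostly linear signal plus a mild nonlinearity.
X = rng.normal(size=(200, 10))
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * np.tanh(X[:, 2]) + 0.01 * rng.normal(size=200)

# Skip-layer network: y_hat = X @ beta (linear skip path) + tanh(X @ W) @ v (nonlinear path).
H = 4                       # hidden units (illustrative)
beta = np.zeros(10)         # skip-layer (linear) weights
W = 0.1 * rng.normal(size=(10, H))
v = 0.1 * rng.normal(size=H)

lam1, lam2, lr = 1e-3, 1e-3, 0.05   # L1 weight, L2 weight, step size (illustrative)

def loss(beta, W, v):
    resid = y - (X @ beta + np.tanh(X @ W) @ v)
    penalty = lam1 * (np.abs(beta).sum() + np.abs(W).sum()) \
            + lam2 * (beta @ beta + (W * W).sum())
    return 0.5 * np.mean(resid ** 2) + penalty

loss0 = loss(beta, W, v)

# Plain (sub)gradient descent on the penalized objective.
for _ in range(500):
    h = np.tanh(X @ W)
    resid = y - (X @ beta + h @ v)
    g_beta = -X.T @ resid / len(y) + lam1 * np.sign(beta) + 2 * lam2 * beta
    g_v = -h.T @ resid / len(y)
    g_W = -X.T @ (np.outer(resid, v) * (1 - h ** 2)) / len(y) \
          + lam1 * np.sign(W) + 2 * lam2 * W
    beta -= lr * g_beta
    v -= lr * g_v
    W -= lr * g_W

# The L1 term shrinks weights of irrelevant predictors toward zero,
# which is the predictor-selection effect the abstract describes.
print(np.round(beta, 2))
```

In this sketch the L1 term drives weights of the noise predictors toward zero while the relevant linear coefficients survive, mirroring the paper's claim that the nonlinear part gets pruned when the process is (nearly) linear.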
  2. By: Shuaiqiang Liu; Anastasia Borovykh; Lech A. Grzelak; Cornelis W. Oosterlee
    Abstract: A data-driven approach called CaNN (Calibration Neural Network) is proposed to calibrate financial asset price models using an Artificial Neural Network (ANN). Determining optimal values of the model parameters is formulated as training hidden neurons within a machine-learning framework, based on available financial option prices. The framework consists of two parts: a forward pass, in which we train the weights of the ANN off-line by valuing options under many different asset-model parameter settings; and a backward pass, in which we evaluate the trained ANN solver on-line, aiming to find the weights of the neurons in the input layer. The rapid on-line learning of implied volatility by ANNs, combined with an adapted parallel global optimization method, tackles the computational bottleneck and provides a fast and reliable technique for calibrating model parameters while avoiding, as much as possible, getting stuck in local minima. Numerical experiments confirm that this machine-learning framework can calibrate the parameters of high-dimensional stochastic volatility models efficiently and accurately.
    Date: 2019–04
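The two-stage idea above — a pricer trained off-line, then inverted on-line to recover model parameters from market quotes — can be sketched as follows. Everything here is an illustrative stand-in: `surrogate_price` is a closed-form toy function playing the role of the trained ANN, and a one-dimensional grid search stands in for the paper's parallel global optimizer.

```python
import numpy as np

# Stand-in for the off-line-trained ANN pricer: maps (model parameter, strikes)
# -> option prices. In CaNN this would be a trained network; a closed-form toy
# function keeps the sketch self-contained.
def surrogate_price(sigma, strikes):
    return np.exp(-strikes) * sigma + 0.1 * sigma ** 2

strikes = np.linspace(0.8, 1.2, 5)
true_sigma = 0.3
market = surrogate_price(true_sigma, strikes)   # synthetic "market" quotes

# "Backward pass": freeze the pricer and search over the model parameter that
# best reproduces the observed quotes (grid search stands in for the paper's
# global optimizer).
grid = np.linspace(0.01, 1.0, 1000)
errors = [np.sum((surrogate_price(s, strikes) - market) ** 2) for s in grid]
calibrated = grid[int(np.argmin(errors))]
print(calibrated)
```

Because the expensive pricing step is amortized into the (here hypothetical) trained network, each evaluation inside the calibration loop is cheap, which is the source of the speed-up the abstract claims.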
  3. By: Haoran Wang; Xun Yu Zhou
    Abstract: We consider the continuous-time mean-variance (MV) portfolio optimization problem in a reinforcement learning (RL) setting. The problem falls into the entropy-regularized, relaxed stochastic control framework recently introduced in Wang et al. (2019). We derive the feedback exploration policy, which is a Gaussian distribution with time-decaying variance. Close connections between the entropy-regularized MV problem and the classical MV problem are also discussed, including the solvability equivalence and the convergence as exploration decays. Finally, we prove a policy improvement theorem (PIT) for the continuous-time MV problem under both entropy regularization and control relaxation. The PIT leads to an implementable RL algorithm for the continuous-time MV problem. In nearly all simulations, our algorithm outperforms, by a large margin, both an adaptive-control-based method that estimates the underlying parameters in real time and a state-of-the-art RL method that uses deep neural networks for continuous control problems.
    Date: 2019–04
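The key object above — a Gaussian exploration policy whose variance shrinks as the horizon approaches — can be illustrated in a few lines. The linear decay schedule, the temperature `lam`, and the constant mean action are all illustrative choices, not the paper's exact functional forms.

```python
import numpy as np

rng = np.random.default_rng(1)

T, steps = 1.0, 10
lam = 0.1          # exploration temperature (illustrative)
mean_action = 0.5  # stand-in for the model's feedback mean action

# Gaussian exploration policy whose variance shrinks to zero at the horizon;
# the paper derives a specific decay, here a simple linear one is used.
def policy_sample(t):
    var = lam * (T - t) / T
    return rng.normal(mean_action, np.sqrt(var))

times = np.linspace(0.0, T, steps, endpoint=False)
actions = [policy_sample(t) for t in times]
variances = [lam * (T - t) / T for t in times]

# Differential entropy of a Gaussian, 0.5 * log(2*pi*e*var), also decays:
# exploration is front-loaded and vanishes near the terminal time.
entropies = 0.5 * np.log(2 * np.pi * np.e * np.array(variances))
```

Early in the horizon the agent samples widely around the feedback mean; near the terminal time the policy collapses toward the deterministic classical MV control, which is the "convergence as exploration decays" the abstract mentions.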
  4. By: Rogelio A. Mancisidor; Michael Kampffmeyer; Kjersti Aas; Robert Jenssen
    Abstract: Credit scoring models trained only on accepted applications may be biased, and that bias can have statistical and economic consequences. Reject inference is the process of attempting to infer the creditworthiness of the rejected applications. In this research, we use deep generative models to develop two new semi-supervised Bayesian models for reject inference in credit scoring, in which we model the data-generating process as dependent on a Gaussian mixture. The goal is to improve the classification accuracy of credit scoring models by incorporating the rejected applications. Our proposed models infer the unknown creditworthiness of the rejected applications by exact enumeration of the two possible outcomes of the loan (default or non-default). The efficient stochastic gradient optimization technique used in deep generative models makes our models suitable for large data sets. Finally, the experiments in this research show that our proposed models outperform classical and alternative machine-learning models for reject inference in credit scoring.
    Date: 2019–04
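The "exact enumeration" step — marginalizing the unknown label over its two possible values instead of sampling it, as in standard semi-supervised variational models — can be sketched as follows. All quantities here are illustrative stand-ins, not the authors' model: the posterior `q_default` would come from a classifier head, and the two loss vectors stand in for the model's per-outcome objective.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy posterior over default for five rejected applicants, e.g. from a
# classifier head q(y | x); values here are random placeholders.
q_default = rng.uniform(0.0, 1.0, size=5)

# Per-outcome objective values for each applicant (stand-ins for the model's
# bound evaluated at y = default and y = non-default).
loss_default = rng.uniform(1.0, 2.0, size=5)
loss_nondefault = rng.uniform(1.0, 2.0, size=5)

# Exact enumeration: since the label has only two outcomes, sum the objective
# over both, weighted by the posterior, and subtract the entropy of q so the
# posterior is not pushed toward a degenerate point mass.
eps = 1e-12
entropy = -(q_default * np.log(q_default + eps)
            + (1 - q_default) * np.log(1 - q_default + eps))
unlabeled_obj = (q_default * loss_default
                 + (1 - q_default) * loss_nondefault
                 - entropy)
print(unlabeled_obj)
```

With only two outcomes the sum is cheap and exact, avoiding the extra variance a sampled discrete label would introduce into the stochastic gradients.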
  5. By: Philippe Casgrain; Brian Ning; Sebastian Jaimungal
    Abstract: Model-free learning for multi-agent stochastic games is an active area of research. Existing reinforcement learning algorithms, however, are often restricted to zero-sum games and are applicable only in small state-action spaces or other simplified settings. Here, we develop a new data-efficient Deep Q-learning methodology for model-free learning of Nash equilibria in general-sum stochastic games. The algorithm uses a local linear-quadratic expansion of the stochastic game, which leads to analytically solvable optimal actions. The expansion is parametrized by deep neural networks to give it sufficient flexibility to learn the environment without the need to experience all state-action pairs. We study symmetry properties of the algorithm stemming from label-invariant stochastic games and, as a proof of concept, apply our algorithm to learning optimal trading strategies in competitive electronic markets.
    Date: 2019–04
  6. By: Luo, Ji; Wu, Guoyuan
    Abstract: In urban areas, many socio-economic concerns have been raised regarding fatal collisions, traffic congestion, and deteriorated air quality, due to increased travel and logistics demand and the limitations of existing on-road transportation systems. Active transportation has been advocated as one promising remedy: it may not only mitigate congestion on local streets, but also promote physical fitness, foster community livability, and boost the local economy. To promote the active transportation mode, extensive work has focused on planning and developing pedestrian- and bicyclist-related programs, which require infrastructure, e.g., sidewalks, as a premise. A significant share of this effort goes to the setup, maintenance, and evaluation of the sidewalk inventory on a relatively large geographic scale (e.g., citywide or statewide), which lays a solid foundation for a variety of active-mobility-focused applications and related research.
    Keywords: Engineering
    Date: 2018–01–01
  7. By: Hao, Peng; Wang, Chao
    Abstract: Traffic congestion at arterial intersections and freeway bottlenecks degrades air quality and threatens public health. Conventionally, air pollutants are monitored by sparsely distributed Quality Assurance Air Monitoring Sites. Sparse mobile crowdsourced data, such as cellular network and Global Positioning System (GPS) data, contain a large amount of traffic information but have low sampling and penetration rates due to cost limits on data transmission and archiving. Nevertheless, these sparse mobile data provide a supplementary or alternative approach to evaluating the environmental impact of traffic congestion. This research establishes a framework for traffic-related air pollution evaluation using sparse mobile data and traffic volume data from the California Performance Measurement System (PeMS) and the Los Angeles Department of Transportation (LADOT). The proposed framework integrates a traffic state model, an emission model, and a dispersion model, yielding an effective tool for evaluating the environmental impact of traffic congestion on both arterials and freeways in an accurate, timely, and economical way. The proposed methods perform well in estimating monthly peak-hour fine particulate matter (PM2.5) concentrations, with an error of 2 µg/m³ relative to measurements from monitoring sites. The estimated spatial distribution of annual PM2.5 concentration also matches the concentration map from the California Communities Environmental Health Screening Tool (CalEnviroScreen) well, but at higher resolution. The proposed system will help transportation operators and public health officials alleviate the risk of air pollution, and can serve as a platform for the development of other potential applications.
    Keywords: Engineering
    Date: 2018–02–01
  8. By: Shaheen, Susan PhD; Martin, Elliot PhD; Hoffman-Stapleton, Mikaela; Slowik, Peter
    Abstract: This white paper presents a generalized evaluation framework for assessing project impacts within the context of transportation-related city projects. In support of this framework, we discuss a selection of metrics and data sources needed to evaluate the performance of smart city innovations. We first present a collection of projects and applications drawn from near-term smart city concepts or actual pilot projects underway (i.e., the Smart City Challenge, the Federal Transit Administration (FTA) Mobility on Demand (MOD) Sandbox, and other pilot projects operating in the regions of Los Angeles, Portland, and San Francisco). These projects are identified and explained in Section 2 of this report. Using these projects as the basis for hypothetical case studies, we present selected metrics that would be necessary to evaluate and monitor the performance of such innovations over time. We then identify the data needed to compute those metrics and highlight the gaps in known data resources that would have to be covered to enable their computation. The objective of this effort is to help guide future city planners, policy makers, and practitioners in understanding the design of key metrics and data needs at the outset of a project, to better facilitate the establishment of rigorous and thoughtful data collection requirements.
    Keywords: Engineering, Mobility, data, intelligent transportation systems, mobility on demand
    Date: 2018–04–01

This nep-big issue is ©2019 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at . For comments, please write to the director of NEP, Marco Novarese, at <>. Put “NEP” in the subject line; otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.