nep-big New Economics Papers
on Big Data
Issue of 2022‒03‒28
nineteen papers chosen by
Tom Coupé
University of Canterbury

  1. Public Opinion Toward Artificial Intelligence By Zhang, Baobao
  2. Machine Learning Models in Stock Market Prediction By Gurjeet Singh
  3. Inverse Selection By Markus Brunnermeier; Rohit Lamba; Carlos Segura-Rodriguez
  4. Fairness constraint in Structural Econometrics and Application to fair estimation using Instrumental Variables By Samuele Centorrino; Jean-Pierre Florens; Jean-Michel Loubes
  5. Competing Models By José Luis Montiel Olea; Pietro Ortoleva; Mallesh Pai; Andrea Prat
  6. Can LSTM outperform volatility-econometric models? By German Rodikov; Nino Antulov-Fantulin
  7. Traditional marketing analytics, big data analytics and big data system quality and the success of new product development By Ahmad Ibrahim Aljumah; Mohammed T. Nuseir; Md. Mahmudul Alam
  8. Economists in the 2008 Financial Crisis: Slow to See, Fast to Act By Levy, Daniel; Mayer, Tamir; Raviv, Alon
  9. HCMD-zero: Learning Value Aligned Mechanisms from Data By Jan Balaguer; Raphael Koster; Ari Weinstein; Lucy Campbell-Gillingham; Christopher Summerfield; Matthew Botvinick; Andrea Tacchetti
  10. Foreign Doctorate Students in Europe By Laureti, Lucio; Costantiello, Alberto; Matarrese, Marco Maria; Leogrande, Angelo
  11. Pricing options on flow forwards by neural networks in Hilbert space By Fred Espen Benth; Nils Detering; Luca Galimberti
  12. Political and Non-Political Officials in Local Government By Resce, Giuliano
  13. Nonparametric Adaptive Robust Control Under Model Uncertainty By Erhan Bayraktar; Tao Chen
  14. Neural Generalised AutoRegressive Conditional Heteroskedasticity By Zexuan Yin; Paolo Barucca
  15. Big data technologies: perceived benefits and costs for adopter and non-adopter enterprises By Claudio Vitari; E. Raguseo
  16. CO2 Emissions and Corporate Performance: Japan's Evidence with Double Machine Learning By Ryo Aruga; Keiichi Goshima; Takashi Chiba
  17. Differentially Private Estimation of Heterogeneous Causal Effects By Fengshi Niu; Harsha Nori; Brian Quistorff; Rich Caruana; Donald Ngwe; Aadharsh Kannan
  18. What Attitude Did the Japanese News Media Take toward the 2020 Tokyo Olympic Games? Sentiment Analysis of the Japanese Newspapers By Annaka, Susumu; Hara, Taketo
  19. Revisiting the role of secondary towns: Effects of different types of urban growth on poverty in Indonesia By John Gibson; Yi Jiang; Bambang Susantono

  1. By: Zhang, Baobao (Cornell University)
    Abstract: This chapter in the Oxford Handbook of AI Governance synthesizes and discusses research on public opinion toward artificial intelligence (AI). Understanding citizens' and consumers' attitudes toward AI is important from a normative standpoint because the public is a major stakeholder in shaping the future of the technology and should have a voice in policy discussions. Furthermore, the research could help us anticipate future political and consumer behavior. Survey data worldwide show that the public is increasingly aware of AI; however, they -- unlike AI researchers -- tend to anthropomorphize AI. Demographic differences correlate with trust in AI in general: those living in East Asia have higher levels of trust in AI, while women and those of lower socioeconomic status across different regions have lower levels of trust. Surveys that focus on particular AI applications, including facial recognition technology, personalization algorithms, lethal autonomous weapons, and workplace automation, add complexity to this research topic. I conclude this chapter by recommending four new topics for future studies: 1) institutional trust in actors building and deploying AI systems, 2) the impact of knowledge and experience on attitudes toward AI, 3) heterogeneity in attitudes toward AI, and 4) the relationship between attitudes and behavior.
    Date: 2021–10–07
  2. By: Gurjeet Singh
    Abstract: The paper focuses on predicting the Nifty 50 Index by using 8 Supervised Machine Learning Models. The techniques used for the empirical study are Adaptive Boost (AdaBoost), k-Nearest Neighbors (kNN), Linear Regression (LR), Artificial Neural Network (ANN), Random Forest (RF), Stochastic Gradient Descent (SGD), Support Vector Machine (SVM) and Decision Trees (DT). Experiments are based on historical data of the Nifty 50 Index of the Indian Stock Market from 22nd April, 1996 to 16th April, 2021, which is time series data of around 25 years. During the period there were 6220 trading days, excluding all the non-trading days. The entire trading dataset was divided into 4 subsets of different sizes: 25% of the entire data, 50% of the entire data, 75% of the entire data and the entire data. Each subset was further divided into 2 parts: training data and testing data. After applying 3 tests (Test on Training Data, Test on Testing Data and Cross Validation Test) to each subset, the prediction performance of the models was compared, and very interesting results were found. The evaluation results indicate that Adaptive Boost, k-Nearest Neighbors, Random Forest and Decision Trees underperformed as the size of the data set increased. Linear Regression and Artificial Neural Network showed almost similar prediction results among all the models, but Artificial Neural Network took more time in training and validating the model. Thereafter, Support Vector Machine performed better than the rest of the models, but as the size of the data set increased, Stochastic Gradient Descent performed better than Support Vector Machine.
    Date: 2022–02
  3. By: Markus Brunnermeier (Princeton University); Rohit Lamba (Pennsylvania State University); Carlos Segura-Rodriguez (Banco Central de Costa Rica)
    Abstract: Big data, machine learning and AI invert adverse selection problems. They allow insurers to infer statistical information and thereby reverse the information advantage from the insuree to the insurer. In a setting with a two-dimensional type space whose correlation can be inferred with big data we derive three results: First, a novel tradeoff between a belief gap and price discrimination emerges. The insurer tries to protect its statistical information by offering only a few screening contracts. Second, we show that forcing the insurance company to reveal its statistical information can be welfare improving. Third, we show in a setting with naive agents that do not perfectly infer statistical information from the price of offered contracts, price discrimination significantly boosts the insurer’s profits. We also discuss the significance of our analysis through three stylized facts: the rise of data brokers, the importance of consumer activism and regulatory forbearance, and the merits of a public data repository.
    Keywords: Insurance, Big Data, Informed Principal, Belief Gap, Price Discrimination
    JEL: G22 D82 D86 C55
    Date: 2020–04
  4. By: Samuele Centorrino; Jean-Pierre Florens; Jean-Michel Loubes
    Abstract: A supervised machine learning algorithm determines a model from a learning sample that will be used to predict new observations. To this end, it aggregates individual characteristics of the observations of the learning sample. But this information aggregation does not consider any potential selection on unobservables and any status-quo biases which may be contained in the training sample. The latter bias has raised concerns around the so-called fairness of machine learning algorithms, especially towards disadvantaged groups. In this chapter, we review the issue of fairness in machine learning through the lenses of structural econometrics models in which the unknown index is the solution of a functional equation and issues of endogeneity are explicitly accounted for. We model fairness as a linear operator whose null space contains the set of strictly fair indexes. A fair solution is obtained by projecting the unconstrained index into the null space of this operator or by directly finding the closest solution of the functional equation into this null space. We also acknowledge that policymakers may incur a cost when moving away from the status quo. Achieving approximate fairness is obtained by introducing a fairness penalty in the learning procedure and balancing more or less heavily the influence between the status quo and a full fair solution.
    Date: 2022–02
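As a reading aid for item 4, the projection idea in the abstract can be sketched in a finite-dimensional toy setting. This is not the authors' estimator: the fairness operator used here (the gap between two group means), the function names, and the penalty weight `rho` are all illustrative assumptions. With `rho=None` the index is projected exactly onto the null space of the operator; with a finite `rho` the result sits between the status quo and the fully fair solution.

```python
# Toy sketch (assumption: fairness means equal average index across two groups).

def fairness_functional(groups):
    """Vector a such that dot(a, phi) = mean(phi | group 1) - mean(phi | group 0)."""
    n1 = sum(groups)
    n0 = len(groups) - n1
    return [1.0 / n1 if g == 1 else -1.0 / n0 for g in groups]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def fair_index(phi, a, rho=None):
    """Minimise ||x - phi||^2 + rho * dot(a, x)^2 over x.

    rho=None gives the exact projection of phi onto the null space of a;
    rho=0.0 returns phi unchanged (the status quo).
    """
    gap = dot(a, phi)
    norm2 = dot(a, a)
    weight = 1.0 / norm2 if rho is None else rho / (1.0 + rho * norm2)
    return [p - weight * gap * ai for p, ai in zip(phi, a)]

groups = [0, 0, 1, 1]           # hypothetical group labels
phi = [1.0, 3.0, 4.0, 6.0]      # hypothetical unconstrained index
a = fairness_functional(groups)
fair = fair_index(phi, a)               # strictly fair: group gap is zero
approx = fair_index(phi, a, rho=1.0)    # approximately fair: gap shrinks
```

In this toy case the penalised solution has a closed form, so increasing `rho` moves the result continuously from the status quo toward the projected index.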
  5. By: José Luis Montiel Olea (Columbia University); Pietro Ortoleva (Princeton University); Mallesh Pai (Rice University); Andrea Prat (Columbia University)
    Abstract: Different agents compete to predict a variable of interest related to a set of covariates via an unknown data generating process. All agents are Bayesian, but may consider different subsets of covariates to make their prediction. After observing a common dataset, who has the highest confidence in her predictive ability? We characterize it and show that it crucially depends on the size of the dataset. With small data, typically it is an agent using a model that is small-dimensional, in the sense of considering fewer covariates than the true data generating process. With big data, it is instead typically large-dimensional, possibly using more variables than the true model. These features are reminiscent of model selection techniques used in statistics and machine learning. However, here model selection does not emerge normatively, but positively as the outcome of competition between standard Bayesian decision makers. The theory is applied to auctions of assets where bidders observe the same information but hold different priors.
    Keywords: Models. Low-dimensional Model, High-dimensional Model
    JEL: C20 C30
    Date: 2021–11
  6. By: German Rodikov; Nino Antulov-Fantulin
    Abstract: Volatility prediction for financial assets is one of the essential questions for understanding financial risks and quadratic price variation. However, although many novel deep learning models were recently proposed, they still have a "hard time" surpassing strong econometric volatility models. Why is this the case? The volatility prediction task is of non-trivial complexity due to noise, market microstructure, heteroscedasticity, exogenous and asymmetric effect of news, and the presence of different time scales, among others. In this paper, we analyze the class of long short-term memory (LSTM) recurrent neural networks for the task of volatility prediction and compare it with strong volatility-econometric models.
    Date: 2022–02
  7. By: Ahmad Ibrahim Aljumah (AAU - Al Ain University); Mohammed T. Nuseir (AAU - Al Ain University); Md. Mahmudul Alam (UUM - Universiti Utara Malaysia)
    Abstract: Purpose: This study investigates the impact of traditional marketing analytics and big data analytics on the success of a new product. Moreover, it assesses the mediating effects of the quality of the big data system. Design/methodology/approach: This study is based on primary data that were collected through an online questionnaire survey from large manufacturing firms operating in the UAE. Out of a total of 421 distributed samples, 327 samples were used for the final data analysis. The survey was conducted in March–April 2020, and data analysis was done via Structural Equation Modelling (SEM-PLS). Findings: It emerges that big data analysis (BDA), traditional marketing analysis (TMA) and big data system quality (BDSQ) are significant determinants of new product development (NPD) success. Meanwhile, the BDA and TMA significantly affect the BDSQ. Results of the mediating role of BDSQ in the relationship between the BDA and NPD, as well as TMA and NPD, are significant. Practical implications: There are significant policy implications for practitioners and researchers concerning the role of analytics, particularly big data analytics and big data system quality, when attempting to achieve success in developing new products. Originality/value: This is an original study based on primary data from the UAE.
    Keywords: Big data, Big data analytics, Organizational performance, Manufacturing industry, Ambidexterity, Business value of big data, UAE
    Date: 2021
  8. By: Levy, Daniel; Mayer, Tamir; Raviv, Alon
    Abstract: We study the economics and finance scholars’ reaction to the 2008 financial crisis using machine learning language analyses methods of Latent Dirichlet Allocation and dynamic topic modelling algorithms, to analyze the texts of 14,270 NBER working papers covering the 1999–2016 period. We find that academic scholars as a group were insufficiently engaged in crises’ studies before 2008. As the crisis unraveled, however, they switched their focus to studying the crisis, its causes, and consequences. Thus, the scholars were “slow-to-see,” but they were “fast-to-act.” Their initial response to the ongoing Covid-19 crisis is consistent with these conclusions.
    Keywords: 2008 Financial Crisis; Financial Crises; Economic Crisis; Great Recession; Textual Analysis; LDA Topic Modeling; Dynamic Topic Modeling; Machine Learning; Securitization; Repo; Sudden Stop
    JEL: A11 C38 C55 E32 E44 E52 E58 F30 G01 G20 G21 G28
    Date: 2022–02–13
  9. By: Jan Balaguer; Raphael Koster; Ari Weinstein; Lucy Campbell-Gillingham; Christopher Summerfield; Matthew Botvinick; Andrea Tacchetti
    Abstract: Artificial learning agents are mediating a larger and larger number of interactions among humans, firms, and organizations, and the intersection between mechanism design and machine learning has been heavily investigated in recent years. However, mechanism design methods make strong assumptions on how participants behave (e.g. rationality), or on the kind of knowledge designers have access to a priori (e.g. access to strong baseline mechanisms). Here we introduce HCMD-zero, a general purpose method to construct mechanism agents. HCMD-zero learns by mediating interactions among participants, while remaining engaged in an electoral contest with copies of itself, thereby accessing direct feedback from participants. Our results on the Public Investment Game, a stylized resource allocation game that highlights the tension between productivity, equality and the temptation to free-ride, show that HCMD-zero produces competitive mechanism agents that are consistently preferred by human participants over baseline alternatives, and does so automatically, without requiring human knowledge, and by using human data sparingly and effectively. Our detailed analysis shows HCMD-zero elicits consistent improvements over the course of training, and that it results in a mechanism with an interpretable and intuitive policy.
    Date: 2022–02
  10. By: Laureti, Lucio; Costantiello, Alberto; Matarrese, Marco Maria; Leogrande, Angelo
    Abstract: The determinants of the presence of “Foreign Doctorate Students” among 36 European Countries for the period 2010-2019 are analyzed in this article. Panel Data with Fixed Effects, Random Effects, WLS, Pooled OLS, and Dynamic Panel are used to investigate the data. We found that the presence of Foreign Doctorate Students is positively associated with “Attractive Research Systems”, “Finance and Support”, “Rule of Law”, “Sales Impacts”, “New Doctorate Graduates”, “Basic School Entrepreneurial Education and Training”, “Tertiary Education” and negatively associated with “Innovative Sales Share”, “Innovation Friendly Environment”, “Linkages”, “Trademark Applications”, “Government Procurement of Advanced Technology Products”, “R&D Expenditure Public Sectors”. A cluster analysis was then carried out through the application of the unsupervised k-Means algorithm optimized using the Silhouette coefficient with the identification of 5 clusters. Finally, eight different machine learning algorithms were used to predict the value of the "Foreign Doctorate Students" variable. The results show that the best predictor algorithm is the "Tree Ensemble Regression" with a predicted value growing at a rate of 114.03%.
    Keywords: Innovation, and Invention: Processes and Incentives; Management of Technological Innovation and R&D; Diffusion Processes; Open Innovation.
    JEL: O30 O31 O32 O33 O34
    Date: 2022–02–11
  11. By: Fred Espen Benth; Nils Detering; Luca Galimberti
    Abstract: We propose a new methodology for pricing options on flow forwards by applying infinite-dimensional neural networks. We recast the pricing problem as an optimization problem in a Hilbert space of real-valued functions on the positive real line, which is the state space for the term structure dynamics. This optimization problem is solved using a novel feedforward neural network architecture designed for approximating continuous functions on the state space. The proposed neural net is built upon the basis of the Hilbert space. We provide an extensive case study that shows excellent numerical efficiency, with superior performance over that of a classical neural net trained on sampling the term structure curves.
    Date: 2022–02
  12. By: Resce, Giuliano
    Abstract: This paper investigates the impact of non-political administrators on the financial management of local governments. The activity of prefectorial officials is compared with the activity of elected mayors exploiting data extracted from a panel of 7826 Italian municipalities from 2007 to 2018. To address the potential confounding effects and selection biases, we combine a Difference in Difference strategy with machine learning methods for counterfactual analysis. Results show that non-political administrators bring higher financial autonomy and higher collection capacity, raising more revenues at local level. This is consistent with the hypothesis that, since they do not respond to electoral incentives, non-political administrators have lower motivations to behave strategically, not taking their own interests about electoral successes into account when they have to choose the proportion of local versus external revenues for financing local expenditure.
    Keywords: Local Government, Electoral Incentives, Accountability
    JEL: D7 H2 H77
    Date: 2022–03–16
  13. By: Erhan Bayraktar; Tao Chen
    Abstract: We consider a discrete time stochastic Markovian control problem under model uncertainty. Such uncertainty not only comes from the fact that the true probability law of the underlying stochastic process is unknown, but the parametric family of probability distributions which the true law belongs to is also unknown. We propose a nonparametric adaptive robust control methodology to deal with such a problem. Our approach hinges on the following building concepts: first, using the adaptive robust paradigm to incorporate online learning and uncertainty reduction into the robust control problem; second, learning the unknown probability law through the empirical distribution, and representing uncertainty reduction in terms of a sequence of Wasserstein balls around the empirical distribution; third, using Lagrangian duality to convert the optimization over Wasserstein balls to a scalar optimization problem, and adopting a machine learning technique to achieve efficient computation of the optimal control. We illustrate our methodology by considering a utility maximization problem. Numerical comparisons show that the nonparametric adaptive robust control approach is preferable to the traditional robust frameworks.
    Date: 2022–02
  14. By: Zexuan Yin; Paolo Barucca
    Abstract: We propose Neural GARCH, a class of methods to model conditional heteroskedasticity in financial time series. Neural GARCH is a neural network adaptation of the GARCH(1,1) model in the univariate case, and the diagonal BEKK(1,1) model in the multivariate case. We allow the coefficients of a GARCH model to be time varying in order to reflect the constantly changing dynamics of financial markets. The time varying coefficients are parameterised by a recurrent neural network that is trained with stochastic gradient variational Bayes. We propose two variants of our model, one with normal innovations and the other with Student's t innovations. We test our models on a wide range of univariate and multivariate financial time series, and we find that the Neural Student's t model consistently outperforms the others.
    Date: 2022–02
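For readers less familiar with the baseline that item 14 adapts, the standard constant-coefficient GARCH(1,1) variance recursion can be sketched as below. This is only the textbook recursion, not the Neural GARCH model: in the paper the coefficients vary over time and are produced by a recurrent network, which is not shown here. The function name, the choice to start from the unconditional variance, and the sample inputs are assumptions made for illustration.

```python
# Textbook GARCH(1,1): sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]

def garch11_variance(returns, omega, alpha, beta):
    """Return the conditional variance for each period of `returns`.

    The first value is set to the unconditional variance
    omega / (1 - alpha - beta), an illustrative initialisation choice.
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns[:-1]:
        # Yesterday's squared return and yesterday's variance drive today's variance.
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2

# Hypothetical daily returns and coefficients, for illustration only.
variances = garch11_variance([0.01, -0.02, 0.015], omega=1e-5, alpha=0.1, beta=0.8)
```

A larger squared return in one period raises the variance for the next period, which is the volatility-clustering behaviour the coefficients are meant to capture.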
  15. By: Claudio Vitari (CERGAM - Centre d'Études et de Recherche en Gestion d'Aix-Marseille - UTLN - Université de Toulon - AMU - Aix Marseille Université); E. Raguseo
    Date: 2021
  16. By: Ryo Aruga (Associate Director and Economist, Institute for Monetary and Economic Studies, Bank of Japan); Keiichi Goshima (Assistant Professor, School of Commerce, Waseda University, and Economist, Institute for Monetary and Economic Studies, Bank of Japan; currently, UTokyo Economic Consulting and Adjunct Researcher, Research Institute of Business Administration, Waseda University); Takashi Chiba (Economist, Institute for Monetary and Economic Studies, Bank of Japan; currently, Sumitomo Mitsui Banking Corporation)
    Abstract: This paper empirically examines the relationship between CO2 emissions and corporate performance in terms of long-term performance, short-term performance, and cost of capital, using available firm-level data in the First Section of the Tokyo Stock Exchange from FY2011 to FY2019. To address potential biases in previous empirical studies, we employ double machine learning, which is one of the semiparametric models introduced by Chernozhukov et al. [2018], for our empirical analysis. We find that corporations with lower CO2 emissions have (i) better long-term corporate performance and (ii) lower cost of equity. These results suggest that investors estimate that corporations with lower CO2 emissions have lower business risks, setting their risk premium to be low, which results in higher market value of such corporations. In addition, our analysis indicates that corporations with lower CO2 emissions have higher short-term performance and lower cost of debt, but also shows that the results of previous studies of these relationships may contain biases and should be evaluated with caution.
    Keywords: CO2 Emissions, Corporate Performance, Double Machine Learning
    JEL: G30 M14 Q54
    Date: 2022–02
  17. By: Fengshi Niu; Harsha Nori; Brian Quistorff; Rich Caruana; Donald Ngwe; Aadharsh Kannan
    Abstract: Estimating heterogeneous treatment effects in domains such as healthcare or social science often involves sensitive data where protecting privacy is important. We introduce a general meta-algorithm for estimating conditional average treatment effects (CATE) with differential privacy (DP) guarantees. Our meta-algorithm can work with simple, single-stage CATE estimators such as S-learner and more complex multi-stage estimators such as DR and R-learner. We perform a tight privacy analysis by taking advantage of sample splitting in our meta-algorithm and the parallel composition property of differential privacy. In this paper, we implement our approach using DP-EBMs as the base learner. DP-EBMs are interpretable, high-accuracy models with privacy guarantees, which allow us to directly observe the impact of DP noise on the learned causal model. Our experiments show that multi-stage CATE estimators incur larger accuracy loss than single-stage CATE or ATE estimators and that most of the accuracy loss from differential privacy is due to an increase in variance, not biased estimates of treatment effects.
    Date: 2022–02
  18. By: Annaka, Susumu; Hara, Taketo
    Abstract: This paper uses text analysis to examine what attitude the Japanese news media took toward the 2020 Tokyo Olympic Games. It explores whether there is a difference in tone among the major Japanese newspapers' articles. The newspapers analyzed here are the morning editions of Asahi Shimbun, Mainichi Shimbun, and Yomiuri Shimbun, three top-selling newspapers in Japan. This study utilizes a list of semantic orientations of words to capture the attitudes of the newspapers toward the Games. It found that liberal papers regarded as opposed to holding the Olympics do not consistently show negative attitudes toward the Games. There was a clear difference in attitudes among newspapers only in sports articles after the Olympics began. This article is the first research on the relationship between the Japanese news media and the 2020 Tokyo Olympics.
    Date: 2021–10–12
  19. By: John Gibson (University of Waikato); Yi Jiang (Asian Development Bank); Bambang Susantono (Asian Development Bank)
    Abstract: There is increasing interest in assessing whether growth of big cities has effects that differ from effects of growth of secondary towns, especially for impacts on poverty. It can be difficult to study these issues with typical sub-national economic data for administrative units because urban growth often occurs outside of the administrative boundaries of cities. An emerging literature therefore uses remote sensing to measure patterns of urban growth without being restricted by limitations of data for administrative areas. We add to this literature by combining remote sensing data on night-time lights for 41 big cities and 497 districts in Indonesia with annual poverty estimates from socio-economic surveys, using spatial econometric models to examine effects of urban growth on poverty during 2011-19. We measure growth on both the extensive (lit area) and intensive (brightness within lit area) margins, and distinguish between growth of big cities and of secondary towns. The extensive margin growth of secondary towns is associated with lower rates of poverty but there is no similar effect for growth of big cities. The productivity advantages of big cities and concerns about agricultural land loss to expanding towns and cities may imply that urban growth patterns favouring big cities are warranted, while on the other hand these new results suggest, from a poverty reduction point of view, that policies to favour secondary towns may be warranted. Policymakers in countries like Indonesia therefore face difficult trade-offs when developing their urbanization strategies.
    Keywords: big cities; night-time lights; poverty; secondary towns; Indonesia
    JEL: O15 R12
    Date: 2022–02–28

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found on the NEP website. For comments please write to the director of NEP, Marco Novarese. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.