nep-big New Economics Papers
on Big Data
Issue of 2021‒02‒01
29 papers chosen by
Tom Coupé
University of Canterbury

  1. The Value Added of Machine Learning to Causal Inference: Evidence from Revisited Studies By Anna Baiardi; Andrea A. Naghi
  2. Big Data on Vessel Traffic: Nowcasting Trade Flows in Real Time By Serkan Arslanalp; Marco Marini; Patrizia Tumbarello
  3. Aggregation of Outputs and Inputs for DEA Analysis of Hospital Efficiency: Economics, Operations Research and Data Science Perspectives By Bao Hoang Nguyen; Valentin Zelenyuk
  4. Achieving Reliable Causal Inference with Data-Mined Variables: A Random Forest Approach to the Measurement Error Problem By Mochen Yang; Edward McFowland III; Gordon Burtch; Gediminas Adomavicius
  5. Sequential Deep Learning for Credit Risk Monitoring with Tabular Financial Data By Jillian M. Clements; Di Xu; Nooshin Yousefi; Dmitry Efimov
  6. Development of cloud, digital technologies and the introduction of chip technologies By Ali R. Baghirzade
  7. Trade sentiment and the stock market: new evidence based on big data textual analysis of Chinese media By Marlene Amstad; Leonardo Gambacorta; Chao He; Dora Xia
  8. Completing the Market: Generating Shadow CDS Spreads by Machine Learning By Nan Hu; Jian Li; Alexis Meyer-Cirkel
  9. Using Payments Data to Nowcast Macroeconomic Variables During the Onset of COVID-19 By James Chapman; Ajit Desai
  10. Machine Learning and Perceived Age Stereotypes in Job Ads: Evidence from an Experiment By Ian Burn; Daniel Firoozi; Daniel Ladd; David Neumark
  11. Deep reinforcement learning for portfolio management By Gang Huang; Xiaohua Zhou; Qingyang Song
  12. Long-term prediction intervals with many covariates By Sayar Karmakar; Marek Chudy; Wei Biao Wu
  13. Algorithms for Learning Graphs in Financial Markets By José Vinícius de Miranda Cardoso; Jiaxi Ying; Daniel Perez Palomar
  14. Adversarial Estimation of Riesz Representers By Victor Chernozhukov; Whitney Newey; Rahul Singh; Vasilis Syrgkanis
  15. Alternative Methods for Studying Consumer Payment Choice By Oz Shy
  16. Real-time Inflation Forecasting Using Non-linear Dimension Reduction Techniques By Niko Hauzenberger; Florian Huber; Karin Klieber
  17. Towards robust and speculation-reduction real estate pricing models based on a data-driven strategy By Vladimir Vargas-Calderón; Jorge E. Camargo
  18. In brief...Tackling domestic violence using machine learning By Jeffrey Grogger; Ria Ivandic; Tom Kirchmaier
  19. Identification of inferential parameters in the covariate-normalized linear conditional logit model By Philip Erickson
  20. Parenting Types By Rauh, C.; Renée, L.
  21. Whose Advice Counts More – Man or Machine? An Experimental Investigation of AI-based Advice Utilization By Mesbah, Neda; Tauchert, Christoph; Buxmann, Peter
  22. The Overlooked Insights from Correlation Structures in Economic Geography By Matias Nehuen Iglesias
  23. Data (r)evolution - The economics of algorithmic search and recommender services By Budzinski, Oliver; Gänßle, Sophia; Lindstädt-Dreusicke, Nadine
  24. Labour and technology at the time of Covid-19. Can artificial intelligence mitigate the need for proximity? By Carbonero, Francesco; Scicchitano, Sergio
  25. Assessing the Importance of an Attribute in a Demand System: Structural Model versus Machine Learning By Badruddoza, Syed; Amin, Modhurima; McCluskey, Jill
  26. Estimating the Impact of Weather on Agriculture By Jeffrey D. Michler; Anna Josephson; Talip Kilic; Siobhan Murray
  27. Measuring national happiness with music By Benetos, Emmanouil; Ragano, Alessandro; Sgroi, Daniel; Tuckwell, Anthony
  28. Modernizing Official Statistics with Big Data: A Case on PODES By Chaikal Nuryakin; Nandaru Annabil Gumelar; Muhammad Dhiya Ul-Haq; Riefhano Patonangi; Andhika Putra Pratama
  29. Robots, AI, and Related Technologies: A Mapping of the New Knowledge Base By Enrico Santarelli; Jacopo Staccioli; Marco Vivarelli

  1. By: Anna Baiardi; Andrea A. Naghi
    Abstract: A new and rapidly growing econometric literature is making advances in using machine learning methods for causal inference. Yet, the empirical economics literature has not started to fully exploit the strengths of these modern methods. We revisit influential empirical studies with causal machine learning methods and identify several advantages of using these techniques. We show that these advantages and their implications are empirically relevant and that the use of these methods can improve the credibility of causal analysis.
    Date: 2021–01
  2. By: Serkan Arslanalp; Marco Marini; Patrizia Tumbarello
    Abstract: Vessel traffic data based on the Automatic Identification System (AIS) are a big data source for nowcasting trade activity in real time. Using Malta as a benchmark, we develop indicators of trade and maritime activity based on AIS-based port calls. We test the quality of these indicators by comparing them with official trade and maritime statistics. We show that, if the challenges associated with port call data are overcome through appropriate filtering techniques, these emerging “big data” on vessel traffic could allow statistical agencies to complement existing data sources on trade and to introduce new statistics that are more timely (real time), offering an innovative way to measure trade activity. That, in turn, could facilitate faster detection of turning points in economic activity. The approach could be extended to create a real-time worldwide indicator of global trade activity.
    Keywords: Trade balance; Trade in goods; Imports; Exports; Big data; AIS data; trade statistics; cargo load indicator; trade activity; trade data
    Date: 2019–12–13
  3. By: Bao Hoang Nguyen (School of Economics and Centre for Efficiency and Productivity Analysis (CEPA) at The University of Queensland, Australia); Valentin Zelenyuk (School of Economics and Centre for Efficiency and Productivity Analysis (CEPA) at The University of Queensland, Australia)
    Abstract: Data envelopment analysis (DEA) has been widely recognised as a powerful tool for performance analysis over the last four decades. The application of DEA in empirical works, however, has become more challenging, especially in the modern era of big data, due to the so-called `curse of dimensionality'. Dimension reduction has been recently considered as a useful technique to deal with the `curse of dimensionality' in the context of DEA with large dimensions for inputs and outputs. In this study, we investigate the two most popular dimension reduction approaches: PCA-based aggregation and price-based aggregation for hospital efficiency analysis. Using data on public hospitals in Queensland, Australia, we find that the choice of price systems (with small variation in prices) does not significantly affect the DEA estimates under the price-based aggregation approach. Moreover, the estimated efficiency scores from DEA models are also robust with respect to the two different aggregation approaches.
    Keywords: Hospital efficiency, big wide data, DEA, PCA-based aggregation, price-based aggregation
    JEL: C24 C61 I11 I18
    Date: 2020–12
  4. By: Mochen Yang; Edward McFowland III; Gordon Burtch; Gediminas Adomavicius
    Abstract: Combining machine learning with econometric analysis is becoming increasingly prevalent in both research and practice. A common empirical strategy involves the application of predictive modeling techniques to 'mine' variables of interest from available data, followed by the inclusion of those variables into an econometric framework, with the objective of estimating causal effects. Recent work highlights that, because the predictions from machine learning models are inevitably imperfect, econometric analyses based on the predicted variables are likely to suffer from bias due to measurement error. We propose a novel approach to mitigate these biases, leveraging the ensemble learning technique known as the random forest. We propose employing random forest not just for prediction, but also for generating instrumental variables to address the measurement error embedded in the prediction. The random forest algorithm performs best when comprised of a set of trees that are individually accurate in their predictions, yet which also make 'different' mistakes, i.e., have weakly correlated prediction errors. A key observation is that these properties are closely related to the relevance and exclusion requirements of valid instrumental variables. We design a data-driven procedure to select tuples of individual trees from a random forest, in which one tree serves as the endogenous covariate and the other trees serve as its instruments. Simulation experiments demonstrate the efficacy of the proposed approach in mitigating estimation biases and its superior performance over three alternative methods for bias correction.
    Date: 2020–12
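The tree-as-instrument strategy this abstract describes can be sketched on synthetic data. The snippet below is an illustrative toy, not the authors' procedure (in particular, their data-driven tuple-selection step is omitted, and all data and names are invented): one tree's prediction serves as the error-ridden mined covariate, the remaining trees as instruments, and a simple 2SLS estimate is compared with naive OLS.

```python
# Toy sketch of the tree-as-instrument idea (synthetic data; an
# illustration of the mechanism, not the authors' full procedure).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def latent(Z):
    # The latent variable the forest will "mine" from raw features
    return Z[:, 0] - 0.5 * Z[:, 1] + 0.3 * Z[:, 2] * Z[:, 3]

# Train the mining model on a separate training sample
Z_tr = rng.normal(size=(5000, 5))
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(Z_tr, latent(Z_tr))

# Analysis sample: outcome depends on the true latent variable (effect = 2)
Z = rng.normal(size=(2000, 5))
y = 2.0 * latent(Z) + rng.normal(size=2000)

# Tree 0 plays the error-ridden endogenous covariate; the remaining
# trees, which make 'different' mistakes, play the instruments.
x_hat = rf.estimators_[0].predict(Z)
W = np.column_stack([np.ones(2000)] + [t.predict(Z) for t in rf.estimators_[1:]])

ones = np.ones(2000)
# Naive OLS on the mined variable suffers measurement-error bias
b_ols = np.linalg.lstsq(np.column_stack([ones, x_hat]), y, rcond=None)[0][1]

# 2SLS: project x_hat on the instruments, then regress y on the projection
x_proj = W @ np.linalg.lstsq(W, x_hat, rcond=None)[0]
b_iv = np.linalg.lstsq(np.column_stack([ones, x_proj]), y, rcond=None)[0][1]
print(b_ols, b_iv)  # estimates of the true effect of 2
```

The relevance/exclusion logic the abstract highlights appears here directly: the other trees predict the same latent variable (relevant) while their prediction errors are only weakly correlated with tree 0's (approximately excludable).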
  5. By: Jillian M. Clements; Di Xu; Nooshin Yousefi; Dmitry Efimov
    Abstract: Machine learning plays an essential role in preventing financial losses in the banking industry. Perhaps the most pertinent prediction task that can result in billions of dollars in losses each year is the assessment of credit risk (i.e., the risk of default on debt). Today, much of the gains from machine learning to predict credit risk are driven by gradient boosted decision tree models. However, these gains begin to plateau without the addition of expensive new data sources or highly engineered features. In this paper, we present our attempts to create a novel approach to assessing credit risk using deep learning that does not rely on new model inputs. We propose a new credit card transaction sampling technique to use with deep recurrent and causal convolution-based neural networks that exploits long historical sequences of financial data without costly resource requirements. We show that our sequential deep learning approach using a temporal convolutional network outperformed the benchmark non-sequential tree-based model, achieving significant financial savings and earlier detection of credit risk. We also demonstrate the potential for our approach to be used in a production environment, where our sampling technique allows for sequences to be stored efficiently in memory and used for fast online learning and inference.
    Date: 2020–12
  6. By: Ali R. Baghirzade
    Abstract: Hardly any other area of research has recently attracted as much attention as machine learning (ML), thanks to rapid advances in artificial intelligence (AI). This publication provides a short introduction to practical concepts and methods of machine learning, its problems and emerging research questions, as well as an overview of the actors involved, the application areas and the socio-economic framework conditions of the research. In expert circles, ML is regarded as a key technology for modern artificial intelligence techniques, which is why AI and ML are often used interchangeably, especially in an economic context. Machine learning and, in particular, deep learning (DL) open up entirely new possibilities in automatic language processing, image analysis, medical diagnostics, process management and customer management. An important aspect of this article is chipization. Due to the rapid pace of digitalization, the number of applications will continue to grow as digital technologies advance. In the future, machines will increasingly provide results that are important for decision making. To this end, it is important to ensure the safety, reliability and sufficient traceability of automated decision-making processes from the technological side. At the same time, it is necessary to ensure that ML applications are technically feasible and compatible with legal requirements such as responsibility and liability for algorithmic decisions. Their formulation and regulatory implementation is an important and complex issue that requires an interdisciplinary approach. Last but not least, public acceptance is critical to the continued diffusion of machine learning processes in applications. This requires widespread public discussion and the involvement of various social groups.
    Date: 2020–12
  7. By: Marlene Amstad; Leonardo Gambacorta; Chao He; Dora Xia
    Abstract: Trade tensions between China and the US have played an important role in driving swings in global stock markets, but their effects are difficult to quantify. We develop a novel trade sentiment index (TSI) based on textual analysis and machine learning applied to a big data pool that assesses the positive or negative tone of Chinese media coverage, and we evaluate its capacity to explain the behaviour of 60 global equity markets. We find that the TSI contributes around 10% of the model's capacity to explain stock price variability from January 2018 to June 2019 in countries that are more exposed to the China-US value chain. Most of the contribution comes from the tone extracted from social media (9%), while that obtained from traditional media explains only a modest part of stock price variability (1%). No equity market benefits from the China-US trade war, and Asian markets tend to be more negatively affected. In particular, we find that sectors most affected by tariffs, such as information technology, are particularly sensitive to the tone of trade tension.
    Keywords: stock returns, trade, sentiment, big data, neural network, machine learning
    JEL: F13 F14 G15 D80 C45 C55
    Date: 2021–01
  8. By: Nan Hu; Jian Li; Alexis Meyer-Cirkel
    Abstract: We compared the predictive performance of a series of machine learning and traditional methods for monthly CDS spreads, using firms’ accounting-based, market-based and macroeconomic variables over the period 2006 to 2016. We find that ensemble machine learning methods (Bagging, Gradient Boosting and Random Forest) strongly outperform other estimators, with Bagging particularly standing out in terms of accuracy. Traditional credit risk models using OLS techniques have the lowest out-of-sample prediction accuracy. The results suggest that non-linear machine learning methods, especially ensemble methods, add considerable value to existing credit risk prediction accuracy and enable CDS shadow pricing for companies missing those securities.
    Keywords: Credit default swap; Machine learning; Credit risk; Credit ratings; Stock markets; default probability; failure intensity; firm size proxy
    Date: 2019–12–27
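The OLS-versus-ensemble comparison the abstract reports can be illustrated on synthetic data. The snippet below is a hedged toy, not the paper's CDS dataset or code: it pits a linear model against a bagged-tree ensemble on a made-up "spread" with non-linear and interaction terms, mirroring the reported ranking of methods.

```python
# Hedged illustration on synthetic data (not the paper's CDS dataset):
# bagging vs. an OLS-style linear model on a nonlinear target.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)
n, p = 1500, 8
X = rng.normal(size=(n, p))
# A stand-in "spread" driven partly by nonlinearities and an interaction
y = np.abs(X[:, 0]) + X[:, 1] * X[:, 2] + 0.5 * X[:, 3] + rng.normal(scale=0.3, size=n)

X_tr, X_te, y_tr, y_te = X[:1000], X[1000:], y[:1000], y[1000:]
ols = LinearRegression().fit(X_tr, y_tr)
bag = BaggingRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

mse_ols = mean_squared_error(y_te, ols.predict(X_te))
mse_bag = mean_squared_error(y_te, bag.predict(X_te))
print(mse_ols > mse_bag)  # the ensemble wins out of sample here
```

The linear model cannot capture the absolute-value or interaction terms, so its out-of-sample error stays high; the bagged trees pick up part of that structure, which is the mechanism behind the paper's finding.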
  9. By: James Chapman; Ajit Desai
    Abstract: The COVID-19 pandemic and the resulting public health mitigation have caused large-scale economic disruptions globally. During this time, there is an increased need to predict the macroeconomy’s short-term dynamics to ensure the effective implementation of fiscal and monetary policy. However, economic prediction during a crisis is challenging because of the unprecedented economic impact, which increases the unreliability of traditionally used linear models that use lagged data. We help address these challenges by using timely retail payments system data in linear and nonlinear machine learning models. We find that compared to a benchmark, our model has a roughly 15 to 45% reduction in Root Mean Square Error when used for macroeconomic nowcasting during the global financial crisis. For nowcasting during the COVID-19 shock, our model predictions are much closer to the official estimates.
    Keywords: Econometric and statistical methods; Payment clearing and settlement systems
    JEL: C55 E52
    Date: 2021–01
  10. By: Ian Burn; Daniel Firoozi; Daniel Ladd; David Neumark
    Abstract: We explore whether ageist stereotypes in job ads are detectable using machine learning methods measuring the linguistic similarity of job-ad language to ageist stereotypes identified by industrial psychologists. We then conduct an experiment to evaluate whether this language is perceived as biased against older workers. We find that language classified by the machine learning algorithm as closely related to ageist stereotypes is perceived as ageist by experimental subjects. The scores assigned to the language related to ageist stereotypes are larger when responses are incentivized by rewarding participants for guessing how other respondents rated the language. These methods could potentially help enforce anti-discrimination laws by using job ads to predict or identify employers more likely to be engaging in age discrimination.
    JEL: J14 J71 K31
    Date: 2021–01
  11. By: Gang Huang; Xiaohua Zhou; Qingyang Song
    Abstract: The objective of this paper is to verify that current cutting-edge artificial intelligence technology, deep reinforcement learning, can be applied to portfolio management. We improve on the existing deep reinforcement learning portfolio model and make several innovations. Unlike many previous studies of discrete trading signals in portfolio management, we allow the agent to take short positions in a continuous action space, design an arbitrage mechanism based on Arbitrage Pricing Theory, and redesign the activation function for acquiring action vectors; in addition, we redesign the neural networks for reinforcement learning with reference to deep neural networks that process image data. In our experiments, we apply the model to several randomly selected portfolios, which include the CSI 300, representing the market's rate of return, and randomly selected constituents of the CSI 500. The experimental results show that no matter which stocks we select for our portfolios, we almost always obtain a higher return than the market itself; that is, we can beat the market using deep reinforcement learning.
    Date: 2020–12
  12. By: Sayar Karmakar; Marek Chudy; Wei Biao Wu
    Abstract: Accurate forecasting is a fundamental focus of the econometric time-series literature. Practitioners and policy makers often want to predict outcomes over an entire time horizon in the future rather than just a single $k$-step-ahead prediction. These series, apart from their own possible non-linear dependence, are often also influenced by many external predictors. In this paper, we construct prediction intervals for time-aggregated forecasts in a high-dimensional regression setting. Our approach is based on quantiles of residuals obtained by the popular LASSO routine. We allow for general heavy-tailed, long-memory, and nonlinear stationary error processes and stochastic predictors. Through a series of systematically arranged consistency results, we provide theoretical guarantees for our proposed quantile-based method in all of these scenarios. After validating our approach using simulations, we also propose a novel bootstrap-based method that can boost the coverage of the theoretical intervals. Finally, analyzing the EPEX Spot data, we construct prediction intervals for hourly electricity prices over horizons spanning 17 weeks and contrast them with selected Bayesian and bootstrap interval forecasts.
    Date: 2020–12
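The quantile-of-residuals construction the abstract describes can be sketched in a few lines. The snippet below is a crude i.i.d. stand-in for the paper's theory (which covers dependent, heavy-tailed errors and gives formal coverage guarantees); all data, the alpha value, and the resampling step are illustrative assumptions.

```python
# Minimal sketch: prediction interval for a time-aggregated LASSO
# forecast, built from quantiles of sums of resampled residuals.
# Simulated data; an i.i.d. simplification of the paper's setting.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p, h = 400, 50, 24            # observations, covariates, horizon
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [1.0, -0.8, 0.5, 0.4, -0.3]       # sparse truth
y = X @ beta + rng.standard_t(df=5, size=n)  # heavy-tailed errors

lasso = Lasso(alpha=0.1).fit(X[:-h], y[:-h])  # hold out the last h points
resid = y[:-h] - lasso.predict(X[:-h])

# Point forecast of the aggregate over the horizon, with an interval from
# quantiles of sums of h resampled residuals
point = lasso.predict(X[-h:]).sum()
sums = np.array([rng.choice(resid, size=h).sum() for _ in range(2000)])
lo, hi = point + np.quantile(sums, [0.05, 0.95])
print(lo <= y[-h:].sum() <= hi)  # did the nominal 90% interval cover?
```

The paper's contribution is precisely to justify this style of interval when the errors are dependent and long-memory, where naive i.i.d. resampling like the above would need the proposed corrections.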
  13. By: José Vinícius de Miranda Cardoso; Jiaxi Ying; Daniel Perez Palomar
    Abstract: In the past two decades, the field of applied finance has tremendously benefited from graph theory. As a result, novel methods ranging from asset network estimation to hierarchical asset selection and portfolio allocation are now part of practitioners' toolboxes. In this paper, we investigate the fundamental problem of learning undirected graphical models under Laplacian structural constraints from the point of view of financial market time series data. In particular, we present natural justifications, supported by empirical evidence, for the usage of the Laplacian matrix as a model for the precision matrix of financial assets, while also establishing a direct link that reveals how Laplacian constraints are coupled to meaningful physical interpretations related to the market index factor and to conditional correlations between stocks. Those interpretations lead to a set of guidelines that practitioners should be aware of when estimating graphs in financial markets. In addition, we design numerical algorithms based on the alternating direction method of multipliers to learn undirected, weighted graphs that take into account stylized facts that are intrinsic to financial data such as heavy tails and modularity. We illustrate how to leverage the learned graphs into practical scenarios such as stock time series clustering and foreign exchange network estimation. The proposed graph learning algorithms outperform the state-of-the-art methods in an extensive set of practical experiments. Furthermore, we obtain theoretical and empirical convergence results for the proposed algorithms. Along with the developed methodologies for graph learning in financial markets, we release an R package, called fingraph, accommodating the code and data to obtain all the experimental results.
    Date: 2020–12
  14. By: Victor Chernozhukov; Whitney Newey; Rahul Singh; Vasilis Syrgkanis
    Abstract: We provide an adversarial approach to estimating Riesz representers of linear functionals within arbitrary function spaces. We prove oracle inequalities based on the localized Rademacher complexity of the function space used to approximate the Riesz representer and the approximation error. These inequalities imply fast finite sample mean-squared-error rates for many function spaces of interest, such as high-dimensional sparse linear functions, neural networks and reproducing kernel Hilbert spaces. Our approach offers a new way of estimating Riesz representers with a plethora of recently introduced machine learning techniques. We show how our estimator can be used in the context of de-biasing structural/causal parameters in semi-parametric models, for automated orthogonalization of moment equations and for estimating the stochastic discount factor in the context of asset pricing.
    Date: 2020–12
  15. By: Oz Shy
    Abstract: Using machine learning techniques applied to consumer diary survey data, the author of this working paper examines methods for studying consumer payment choice. These techniques, especially when paired with regression analyses, provide useful information for understanding and predicting the payment choices consumers make.
    Keywords: studying consumer payment choice; point of sale; statistical learning; machine learning
    JEL: C19 E42
    Date: 2020–06–23
  16. By: Niko Hauzenberger; Florian Huber; Karin Klieber
    Abstract: In this paper, we assess whether using non-linear dimension reduction techniques pays off for forecasting inflation in real time. Several recent methods from the machine learning literature are adopted to map a large-dimensional dataset into a lower-dimensional set of latent factors. We model the relationship between inflation and these latent factors using state-of-the-art time-varying parameter (TVP) regressions with shrinkage priors. Using monthly real-time data for the US, our results suggest that adding such non-linearities yields forecasts that are on average highly competitive with those obtained from methods using linear dimension reduction techniques. Zooming in on model performance over time moreover reveals that controlling for non-linear relations in the data is of particular importance during recessionary episodes of the business cycle.
    Date: 2020–12
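The compress-then-forecast pipeline this abstract describes can be sketched with linear PCA against a non-linear alternative. This is a simplified stand-in on simulated data: the paper pairs the extracted factors with TVP regressions and shrinkage priors, whereas the sketch below uses plain OLS, and kernel PCA is only one of the several non-linear reducers the authors consider.

```python
# Sketch under assumptions: compress many predictors with linear PCA vs.
# non-linear kernel PCA before forecasting with a simple regression.
import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
n, p = 600, 40
X = rng.normal(size=(n, p))
f = X[:, :5].mean(axis=1)                      # latent driving factor
y = np.sin(2 * f) + 0.1 * rng.normal(size=n)   # non-linear "inflation"

X_tr, X_te, y_tr, y_te = X[:450], X[450:], y[:450], y[450:]

def factor_forecast_mse(reducer):
    # Extract factors on the training window, then forecast out of sample
    Z_tr = reducer.fit_transform(X_tr)
    Z_te = reducer.transform(X_te)
    model = LinearRegression().fit(Z_tr, y_tr)
    return mean_squared_error(y_te, model.predict(Z_te))

mse_pca = factor_forecast_mse(PCA(n_components=3))
mse_kpca = factor_forecast_mse(KernelPCA(n_components=3, kernel="rbf"))
print(mse_pca, mse_kpca)
```

Swapping the final `LinearRegression` for a time-varying parameter model with shrinkage is where the paper's actual gains come from; the sketch only shows the dimension-reduction plumbing.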
  17. By: Vladimir Vargas-Calderón; Jorge E. Camargo
    Abstract: In many countries, real estate appraisal is based on conventional methods that rely on appraisers' abilities to collect data, interpret it and model the price of a real estate property. With the increasing use of real estate online platforms and the large amount of information found therein, there exists the possibility of overcoming many drawbacks of conventional pricing models, such as subjectivity, cost and unfairness. In this paper we propose a data-driven real estate pricing model based on machine learning methods to estimate prices while reducing human bias. We test the model with 178,865 flat listings from Bogotá, collected from 2016 to 2020. Results show that the proposed state-of-the-art model is robust and accurate in estimating real estate prices. This case study serves as an incentive for local governments in developing countries to discuss and build real estate pricing models based on large data sets that increase fairness for all real estate market stakeholders and reduce price speculation.
    Date: 2020–11
  18. By: Jeffrey Grogger; Ria Ivandic; Tom Kirchmaier
    Abstract: Artificial intelligence could help to protect victims of domestic violence, according to research by Jeffrey Grogger, Ria Ivandic and Tom Kirchmaier.
    Keywords: covid-19,domestic abuse,crime
    Date: 2020–07
  19. By: Philip Erickson
    Abstract: The conditional logit model is a standard workhorse approach to estimating customers' product feature preferences using choice data. Using these models at scale, however, can result in numerical imprecision and optimization failure due to a combination of large-valued covariates and the softmax probability function. Standard machine learning approaches alleviate these concerns by applying a normalization scheme to the matrix of covariates, scaling all values to sit within some interval (such as the unit interval). While this type of normalization is innocuous when using models for prediction, it has the side effect of perturbing the estimated coefficients, which are necessary for researchers interested in inference. This paper shows that, for two common classes of normalizers, designated scaling and centered scaling, the data-generating non-scaled model parameters can be analytically recovered along with their asymptotic distributions. The paper also shows the numerical performance of the analytical results using an example of a scaling normalizer.
    Date: 2020–12
  20. By: Rauh, C.; Renée, L.
    Abstract: In this paper we measure parenting behavior through unsupervised machine learning in a panel following children from 5 to 29 months of age. The algorithm classifies parents into two distinct behavioral types: "active" and "laissez-faire". Parents of the active type tend to respond to their children's expressions and describe features of the environment to their children, while parents of the laissez-faire type are less likely to engage with their children. We find that parents' types are persistent over time and are systematically related to socio-economic characteristics. Moreover, children of active parents see their human capital improve relative to children of laissez-faire parents.
    Keywords: Parenting styles, human capital, latent Dirichlet allocation, inequality, machine learning
    Date: 2021–01–22
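The unsupervised typing step can be sketched with latent Dirichlet allocation, the method named in the entry's keywords. Everything below is invented for illustration (the behavior categories, counts and group sizes are not the study's data): two synthetic groups of parents with different behavior-count profiles are sorted into two latent types.

```python
# Toy sketch (invented behavior counts, not the study's data): sort
# parents into two latent types with latent Dirichlet allocation.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(4)
# Rows = parents; columns = session counts of three observed behaviors:
# responding to the child, describing the environment, not engaging
active = rng.poisson(lam=[8.0, 6.0, 1.0], size=(50, 3))
laissez = rng.poisson(lam=[1.0, 1.0, 8.0], size=(50, 3))
counts = np.vstack([active, laissez])

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
types = lda.transform(counts).argmax(axis=1)  # most likely type per parent
# Parents with similar behavior profiles should share a latent type
print(types[:50].mean(), types[50:].mean())
```

With well-separated profiles the dominant topic cleanly splits the two groups, which is the mechanism by which LDA recovers behavioral types from observed counts.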
  21. By: Mesbah, Neda; Tauchert, Christoph; Buxmann, Peter
    Date: 2021–01–05
  22. By: Matias Nehuen Iglesias
    Abstract: Measures of cooccurrence computed from cross-sectional data are used to rationalize connections among economic activities. In this work we show the grounds for unifying a multiplicity of similarity techniques applied in the literature, and we make precise the identification of cooccurrence with actual coexistence in space when one side of the cross-section consists of small administrative areas. All the similarity techniques studied here are akin to a correlation structure computed from spatial intensity, also known as locational correlation. We argue that these correlations offer objective tools to detect spatial patterns. Indeed, we show that when applied to data on employment by industry and county in the United States (2002-2007), the communities of networks derived from locational correlations detect spatial patterns long acknowledged in economic geography. By addressing critical open issues in the interpretation of cooccurrence indices, this work offers technical guides for their exploitation in economic geography studies.
    Keywords: Economic geography, co-location, spatial analysis, areal data, point data, correlation structures, distribution of economic activities
    Date: 2021–01
  23. By: Budzinski, Oliver; Gänßle, Sophia; Lindstädt-Dreusicke, Nadine
    Abstract: The paper analyses the economics behind algorithmic search and recommender services based upon personalized user data. Such services play a paramount role for online services such as marketplaces (e.g. Amazon), audio streaming (e.g. Spotify), video streaming (e.g. Netflix, YouTube), app stores, social networks (e.g. Instagram, TikTok, Facebook, Twitter) and many more. We start with a systematic analysis of search and recommendation services as a commercial good, highlighting the changes brought to these services by the systematic use of algorithms. We then discuss the benefits and risks for welfare arising from the widespread employment of algorithmic search and recommendation systems. In doing so, we summarize the existing economics literature and go beyond its insights, highlighting further research needs. Finally, we derive regulatory and managerial implications drawing on the current state of academic knowledge.
    Keywords: algorithmic search and recommender services, data economics, media economics, internet economics, digital economy, cultural economics, competition, antitrust, industry regulation, digital business ecosystems
    JEL: L86 L40 K21 K23 K24 L13 L51 L82 M21 Z10
    Date: 2021
  24. By: Carbonero, Francesco; Scicchitano, Sergio
    Abstract: Social distancing has become the key public policy implemented worldwide during the COVID-19 epidemic, and reducing the degree of proximity among workers has turned out to be an important dimension. An emerging literature looks at the role of automation in supporting the work of humans, but the potential of artificial intelligence (AI) to influence the need for physical proximity in the workplace has been left largely unexplored. Using a unique and innovative dataset that combines data on advancements of AI at the occupational level with information on the required proximity in the workplace and administrative employer-employee data on job flows, our results show that AI and proximity stand in an inverse U-shaped relationship at the sectoral level, with high advancements in AI negatively associated with proximity. We detect this pattern among sectors that were closed due to the lockdown measures as well as among sectors that remained open. We argue that, apart from the expected gains in productivity and competitiveness, preserving jobs and economic activities in a situation of high contagion may be an additional benefit of a policy favouring digitization.
    Keywords: artificial intelligence, automation, COVID-19, proximity
    Date: 2021
  25. By: Badruddoza, Syed (Washington State University); Amin, Modhurima (Washington State University); McCluskey, Jill (Washington State University)
    Abstract: Firms can prioritize among product attributes based on consumer valuations using market-level data. However, a structural estimation of market demand is challenging, especially when the data are updated in real time and instrumental variables are scarce. We find evidence that Random Forests (RF)—a machine-learning algorithm—can detect consumers’ sensitivity to product attributes similarly to the structural framework of Berry-Levinsohn-Pakes (BLP). Sensitivity to an attribute is measured by the absolute value of its coefficient. We check the RF’s capacity to rank the attributes when prices are endogenous, coefficients are random, and instrumental or demographic variables are unavailable. In our simulations, the BLP estimates correlate with the RF importance factor in ranking (68%) and magnitude (79%), and the rates increase with the sample size. Consumer sensitivity to endogenous variables (price) and variables with random coefficients is overestimated by the RF approach, but the ranking of variables with non-random coefficients matches BLP’s coefficients in 96% of cases. These estimates are derived pessimistically, by RF without parameter tuning. We conclude that machine learning does not replace the structural framework but provides firms with a sensible idea of consumers' ranking of product attributes.
    Keywords: Machine-Learning; Random Forests; Demand Estimation; BLP; Discrete Choice.
    JEL: C55 D11 Q11
    Date: 2019–12–04
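    The ranking exercise described in this abstract can be illustrated with a minimal sketch (not the authors' code): assuming scikit-learn's RandomForestRegressor, simulated logit-style market shares, and three attributes with known "true" sensitivities, an untuned forest's impurity-based importances should recover the ordering of the coefficient magnitudes.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Simulate market-level data: three product attributes whose true
# consumer sensitivities (coefficient magnitudes) are 3.0 > 2.0 > 0.5.
n = 2000
X = rng.normal(size=(n, 3))
true_coefs = np.array([3.0, -2.0, 0.5])
utility = X @ true_coefs + rng.normal(scale=0.5, size=n)
share = 1.0 / (1.0 + np.exp(-utility))  # logit-style market share

# Fit an untuned Random Forest (the abstract reports results without
# parameter tuning) and rank attributes by feature importance.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, share)
rank_rf = list(np.argsort(rf.feature_importances_)[::-1])
rank_true = list(np.argsort(np.abs(true_coefs))[::-1])
print(rank_rf, rank_true)
```

    Sensitivity here is measured, as in the abstract, by the absolute value of the coefficient; the sketch only shows importance-based ranking, not the overestimation of endogenous-price sensitivity the paper documents.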
  26. By: Jeffrey D. Michler; Anna Josephson; Talip Kilic; Siobhan Murray
    Abstract: We quantify the significance and magnitude of the effect of measurement error in satellite weather data on modeling agricultural production, agricultural productivity, and resilience outcomes. To provide rigor to our approach, we combine geospatial weather data from a variety of satellite sources with geo-referenced household survey data from four sub-Saharan African countries that are part of the World Bank Living Standards Measurement Study - Integrated Surveys on Agriculture (LSMS-ISA) initiative. Our goal is to provide systematic evidence on which weather metrics have strong predictive power over a large set of crops and countries and which metrics are useful only in highly specific settings.
    Date: 2020–12
  27. By: Benetos, Emmanouil (Queen Mary University of London and The Alan Turing Institute.); Ragano, Alessandro (University College Dublin.); Sgroi, Daniel (University of Warwick, ESRC CAGE Centre and IZA Bonn.); Tuckwell, Anthony (University of Warwick and ESRC CAGE Centre.)
    Abstract: We propose a new measure for national happiness based on the emotional content of a country’s most popular songs. Using machine learning to detect the valence of the UK’s chart-topping song of each year since the 1970s, we find that it reliably predicts the leading survey-based measure of life satisfaction. Moreover, we find that music valence is better able to predict life satisfaction than a recently-proposed measure of happiness based on the valence of words in books (Hills et al., 2019). Our results have implications for the role of music in society, and at the same time validate a new use of music as a measure of public sentiment.
    Keywords: subjective wellbeing; life satisfaction; national happiness; music information retrieval; machine learning
    JEL: N30 Z11 Z13
    Date: 2021
  28. By: Chaikal Nuryakin (Lembaga Penyelidikan Ekonomi dan Masyarakat (LPEM), Fakultas Ekonomi dan Bisnis, Universitas Indonesia); Nandaru Annabil Gumelar; Muhammad Dhiya Ul-Haq; Riefhano Patonangi; Andhika Putra Pratama
    Abstract: This study serves as an example of how Indonesia can improve its official data by using big data. In this case, we compare village potential data (PODES) published by Statistics Indonesia (BPS) with Google Places API data and ministerial data. We use the numbers of hospitals, high schools, and public health centers within Jakarta province as the variables. The results show that, despite counting the same facilities, there are discrepancies among all three sources, with a varying margin for each variable. We discuss our findings and offer suggestions in the hope of improving official data in Indonesia, an effort that big data can support, as this study exemplifies.
    Keywords: official statistics; big data; official data
    JEL: C8 D8 E2 R1
    Date: 2020
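    The cross-source comparison in this abstract amounts to computing a discrepancy margin per facility type. A minimal sketch, with hypothetical placeholder counts (the real figures come from PODES, the Google Places API, and ministerial records and are not reproduced here):

```python
# Hypothetical facility counts for Jakarta from three sources.
counts = {
    "hospitals":      {"PODES": 180, "Google Places": 195, "Ministry": 190},
    "high_schools":   {"PODES": 1200, "Google Places": 1150, "Ministry": 1230},
    "health_centers": {"PODES": 340, "Google Places": 310, "Ministry": 335},
}

def discrepancy_margin(row):
    """Relative spread between the largest and smallest count."""
    vals = list(row.values())
    return (max(vals) - min(vals)) / min(vals)

margins = {k: round(discrepancy_margin(v), 3) for k, v in counts.items()}
print(margins)
```

    A "varying margin for each variable", as the abstract puts it, then corresponds to the spread differing across facility types.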
  29. By: Enrico Santarelli (Department of Economics, University of Bologna – Department of Economics and Management, University of Luxembourg); Jacopo Staccioli (Dipartimento di Politica Economica, DISCE, Università Cattolica del Sacro Cuore – Institute of Economics, Scuola Superiore Sant’Anna, Pisa); Marco Vivarelli (Dipartimento di Politica Economica, DISCE, Università Cattolica del Sacro Cuore – UNU-MERIT, Maastricht, The Netherlands – IZA, Bonn, Germany)
    Abstract: Using the entire population of USPTO patent applications published between 2002 and 2019, and leveraging both patent classification and semantic analysis, this paper aims to map the current knowledge base centred on robotics and AI technologies. These technologies are investigated both as a whole and distinguishing core and related innovations, along a 4-level core-periphery architecture. Merging patent applications with the Orbis IP firm-level database allows us to put forward a threefold analysis based on industry of activity, geographic location, and firm productivity. In a nutshell, results show that: (i) rather than representing a technological revolution, the new knowledge base is strictly linked to the previous technological paradigm; (ii) the new knowledge base is characterised by a considerable – but not impressively widespread – degree of pervasiveness; (iii) robotics and AI are strictly related, converging (particularly among the related technologies) and jointly shaping a new knowledge base that should be considered as a whole, rather than consisting of two separate GPTs; (iv) the U.S. technological leadership turns out to be confirmed.
    Keywords: Robotics, Artificial Intelligence, General Purpose Technology, Technological Paradigm, Industry 4.0, Patents full-text
    JEL: O33
    Date: 2021–01

This nep-big issue is ©2021 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.