nep-big New Economics Papers
on Big Data
Issue of 2023‒04‒10
twenty-two papers chosen by
Tom Coupé
University of Canterbury

  1. How will Language Modelers like ChatGPT Affect Occupations and Industries? By Ed Felten; Manav Raj; Robert Seamans
  2. DSE Stock Price Prediction using Hidden Markov Model By Raihan Tanvir; Md Tanvir Rouf Shawon; Md. Golam Rabiul Alam
  3. The Regulatory Quality and ESG Model at World Level By Costantiello, Alberto; Leogrande, Angelo
  4. Artificial Intelligence and the Economics of Decision-Making By Naudé, Wim
  5. Biased auctioneers By Aubry, Mathieu; Kräussl, Roman; Manso, Gustavo; Spaenjers, Christophe
  6. Does Environmental Policy Uncertainty Hinder Investments Towards a Low-Carbon Economy? By Joëlle Noailly; Laura Nowzohour; Matthias van den Heuvel
  7. From International to Regional Commodity Price Pass-through Using Self-Driven Recurrent Networks By Ramos; Pablo Negri; Martín Breitkopf; María Laura Ojeda
  8. Machine Learning as a Tool for Hypothesis Generation By Jens Ludwig; Sendhil Mullainathan
  9. Regulation relevant to (long-form) audio recordings gathered in Namibia By Leon, Mathilde; Cristia, Alejandrina
  10. Artificial intelligence and unemployment: New insights By Mihai Mutascu
  11. Examining Competitive Strategy of Manufacturing Firms in Digital Evolution with Platform Theory (Japanese) By MOTOHASHI Kazuyuki
  12. A portrait of AI adopters across countries: Firm characteristics, assets’ complementarities and productivity By Flavio Calvino; Luca Fontanelli
  13. A Deep Reinforcement Learning Trader without Offline Training By Boian Lazov
  14. Identification-robust inference for the LATE with high-dimensional covariates By Yukun Ma
  15. Understanding Model Complexity for temporal tabular and multi-variate time series, case study with Numerai data science tournament By Thomas Wong; Prof. Mauricio Barahona
  16. Not lost in translation: The implications of machine translation technologies for language professionals and for broader society By Francesca Borgonovi; Justine Hervé; Helke Seitz
  17. Global High-Resolution Estimates of the United Nations Human Development Index Using Satellite Imagery and Machine-learning By Luke Sherman; Jonathan Proctor; Hannah Druckenmiller; Heriberto Tapia; Solomon M. Hsiang
  18. Using machine learning to monitor the equity of large-scale policy interventions: The Dutch decentralisation of the Social Domain By Verhagen, Mark D.
  19. Regulatory costs and market power By Singla, Shikhar
  20. Determinants of Heat Risk in an Aging Population: A Machine Learning Approach By Klauber, Hannah; Koch, Nicolas
  21. Artificial intelligence in science: An emerging general method of invention By Stefano Bianchini; Moritz Müller; Pierre Pelletier
  22. Superhuman Artificial Intelligence Can Improve Human Decision Making by Increasing Novelty By Minkyu Shin; Jin Kim; Bas van Opheusden; Thomas L. Griffiths

  1. By: Ed Felten; Manav Raj; Robert Seamans
    Abstract: Recent dramatic increases in AI language modeling capabilities have led to many questions about the effect of these technologies on the economy. In this paper we present a methodology to systematically assess the extent to which occupations, industries and geographies are exposed to advances in AI language modeling capabilities. We find that the top occupations exposed to language modeling include telemarketers and a variety of post-secondary teachers such as English language and literature, foreign language and literature, and history teachers. We find the top industries exposed to advances in language modeling are legal services and securities, commodities, and investments.
    Date: 2023–03
  2. By: Raihan Tanvir; Md Tanvir Rouf Shawon; Md. Golam Rabiul Alam
    Abstract: Stock market forecasting is a classic problem that has been thoroughly investigated using machine learning and artificial neural network based tools and techniques. Interesting aspects of this problem include its time dependence as well as its volatility and other complex relationships. To capture these aspects, hidden Markov models (HMMs) have been utilized to anticipate the price of stocks. We demonstrate the Maximum A Posteriori (MAP) HMM method for predicting stock prices for the next day based on previous data. An HMM is trained by analyzing the fractional change in the stock price as well as the intraday high and low values. It is then utilized to produce a MAP estimate across all possible stock prices for the next day. The approach demonstrated in our work is quite general and can be used to predict the stock price for any company, given that the HMM is trained on that company's stock data. We evaluated the accuracy of our models using some widely used accuracy metrics for regression problems and obtained satisfactory results.
    Date: 2023–01
  3. By: Costantiello, Alberto; Leogrande, Angelo
    Abstract: In this article, we analyse the determinants of Regulatory Quality (RQ) for 193 countries in the period 2011-2020. We use the ESG (Environment, Social, Governance) database of the World Bank. We apply OLS, Panel Data with Fixed Effects and Panel Data with Random Effects. We find that the variables with the most positive impact on RQ include “GHG Net Emission”, “Mean Drought Index”, and “Heat Index”, and that the variables with the most negative impact include “Renewable Energy Consumption”, “Voice and Accountability” and “Rule of Law”. Furthermore, we apply the k-Means algorithm optimized with the Elbow Method and find five clusters. In addition, we compare eight machine learning algorithms for predicting the value of RQ and find that the best predictor is Polynomial Regression. The predicted level of RQ for the analysed countries is expected to decrease by 1.29%. Finally, we present a network analysis based on the Euclidean distance and find a structure of seven networks using augmented data.
    Keywords: Analysis of Collective Decision-Making, General, Political Processes: Rent-Seeking, Lobbying, Elections, Legislatures, and Voting Behaviour, Bureaucracy, Administrative Processes in Public Organizations, Corruption, Positive Analysis of Policy Formulation, Implementation.
    JEL: D7 D70 D72 D73 D78
    Date: 2023–03–14
  4. By: Naudé, Wim (RWTH Aachen University)
    Abstract: Artificial Intelligence (AI) scientists are challenged to create intelligent, autonomous agents that can make rational decisions. In this challenge, they confront two questions: what decision theory to follow and how to implement it in AI systems. This paper provides answers to these questions and makes three contributions. The first is to discuss how economic decision theory – Expected Utility Theory (EUT) – can help AI systems with utility functions to deal with the problem of instrumental goals, the possibility of utility function instability, and coordination challenges in multi-actor and human-agent collective settings. The second contribution is to show that using EUT restricts AI systems to narrow applications, which are "small worlds" where concerns about AI alignment may lose urgency and be better labelled as safety issues. This paper's third contribution points to several areas where economists may learn from AI scientists as they implement EUT. These include consideration of procedural rationality, overcoming computational difficulties, and understanding decision-making in disequilibrium situations.
    Keywords: economics, artificial intelligence, expected utility theory, decision-theory
    JEL: D01 C60 C45 O33
    Date: 2023–03
  5. By: Aubry, Mathieu; Kräussl, Roman; Manso, Gustavo; Spaenjers, Christophe
    Abstract: We construct a neural network algorithm that generates price predictions for art at auction, relying on both visual and non-visual object characteristics. We find that higher automated valuations relative to auction house pre-sale estimates are associated with substantially higher price-to-estimate ratios and lower buy-in rates, pointing to estimates' informational inefficiency. The relative contribution of machine learning is higher for artists with less dispersed and lower average prices. Furthermore, we show that auctioneers' prediction errors are persistent both at the artist and at the auction house level, and hence directly predictable themselves using information on past errors.
    Keywords: art, auctions, experts, asset valuation, biases, machine learning, computer vision
    JEL: C50 D44 G12 Z11
    Date: 2023
  6. By: Joëlle Noailly; Laura Nowzohour; Matthias van den Heuvel
    Abstract: We use machine learning algorithms to construct a novel news-based index of US environmental and climate policy uncertainty (EnvPU) available on a monthly basis over the 1990-2019 period. We find that our EnvPU index spikes during the environmental spending disputes of the 1995-1996 government shutdown, in the early 2010s due to the failure of the national cap-and-trade climate bill, and during the Trump presidency. We examine how elevated levels of environmental policy uncertainty relate to investments in the low-carbon economy. In firm-level estimations, we find that a rise in the EnvPU index is associated with a reduced probability for cleantech startups to receive venture capital (VC) funding. In financial markets, a rise in our EnvPU index is associated with higher stock volatility for firms with above-average green revenue shares. At the macro level, shocks in our index lead to declines in the number of cleantech VC deals and higher volatility of the main benchmark clean energy exchange-traded fund. Overall, our results are consistent with the notion that policy uncertainty has adverse effects on investments for the low-carbon economy.
    Date: 2022–09–05
  7. By: Ramos; Pablo Negri; Martín Breitkopf; María Laura Ojeda
    Keywords: Recurrent Neural Networks, Regional Commodities Prices, Shock Simulations
    JEL: C45 Q11
    Date: 2021–11
  8. By: Jens Ludwig; Sendhil Mullainathan
    Abstract: While hypothesis testing is a highly formalized activity, hypothesis generation remains largely informal. We propose a systematic procedure to generate novel hypotheses about human behavior, which uses the capacity of machine learning algorithms to notice patterns people might not. We illustrate the procedure with a concrete application: judge decisions about who to jail. We begin with a striking fact: The defendant’s face alone matters greatly for the judge’s jailing decision. In fact, an algorithm given only the pixels in the defendant’s mugshot accounts for up to half of the predictable variation. We develop a procedure that allows human subjects to interact with this black-box algorithm to produce hypotheses about what in the face influences judge decisions. The procedure generates hypotheses that are both interpretable and novel: They are not explained by demographics (e.g. race) or existing psychology research; nor are they already known (even if tacitly) to people or even experts. Though these results are specific, our procedure is general. It provides a way to produce novel, interpretable hypotheses from any high-dimensional dataset (e.g. cell phones, satellites, online behavior, news headlines, corporate filings, and high-frequency time series). A central tenet of our paper is that hypothesis generation is in and of itself a valuable activity, and we hope this encourages future work in this largely “pre-scientific” stage of science.
    JEL: B4 C01
    Date: 2023–03
  9. By: Leon, Mathilde; Cristia, Alejandrina (Centre National de la Recherche Scientifique)
    Abstract: In the context of research using machine-learning tools on audio-recordings gathered in several countries, the LAAC Team sought to systematize regulation relevant to such data. The most important legal issue is data protection. Data protection is an important part of using and operating technology to protect human rights, and both at the international level and at other levels in many countries, a great deal of regulation has been created to address it. In addition, we also considered regulation referencing issues on which there are fewer regulations as of yet: informed consent, machine-learning bias and the possibility of discrimination, duty to report illegal activities, and intellectual property (potentially) emerging from aboriginal resources. In this document, we provide an overview of international and national law applicable to the protection of data collected in Namibia. Whenever possible, we explain in what way a given piece of regulation is relevant to long-form audio-recordings.
    Date: 2023–03–02
  10. By: Mihai Mutascu (LEO - Laboratoire d'Économie d'Orléans - UO - Université d'Orléans - UT - Université de Tours)
    Date: 2021–03
  11. By: MOTOHASHI Kazuyuki
    Abstract: As the use of AI and big data advances in the manufacturing industry, we are seeing changes in manufacturing, including the shift to digital services. In this paper, we use platform theory to examine the current state of digital innovation in the manufacturing industry and competitive strategies with other industries, including Internet platformers like GAFA. The IoT applications of existing companies such as Komatsu and GE are not in a position to directly compete with B2C Internet platformers, but the network effects of the platform economy have the power to rapidly transform the industrial structure. For manufacturing companies, it is essential to promote the development of digital services that utilize customer data. In addition, the importance of strategies that focus on building an ecosystem that includes companies in other industries is increasing, and the ability to predict the future through scenario analysis and the ability to respond flexibly to changes in circumstances are required.
    Date: 2023–03
  12. By: Flavio Calvino; Luca Fontanelli
    Abstract: This report analyses the use of artificial intelligence (AI) in firms across 11 countries. Based on harmonised statistical code (AI diffuse) applied to official firm-level surveys, it finds that the use of AI is prevalent in ICT and Professional Services and more widespread across large – and to some extent across young – firms. AI users tend to be more productive, especially the largest ones. Complementary assets, including ICT skills, high-speed digital infrastructure, and the use of other digital technologies, which are significantly related to the use of AI, appear to play a critical role in the productivity advantages of AI users.
    Keywords: AI, artificial intelligence, productivity, technology adoption
    Date: 2023–04–11
  13. By: Boian Lazov
    Abstract: In this paper we pursue the question of a fully online trading algorithm (i.e. one that does not need offline training on previously gathered data). For this task we use Double Deep $Q$-learning in the episodic setting with Fast Learning Networks approximating the expected reward $Q$. Additionally, we define the possible terminal states of an episode in such a way as to introduce a mechanism to conserve some of the money in the trading pool when market conditions are seen as unfavourable. Some of this money is taken as profit and some is reused at a later time according to certain criteria. After describing the algorithm, we test it using the 1-minute-tick data for Cardano's price on Binance. We see that the agent performs better than trading with randomly chosen actions on each timestep, and that it does so when tested on the whole dataset as well as on different subsets capturing different market trends.
    Date: 2023–03
  14. By: Yukun Ma
    Abstract: This paper investigates the local average treatment effect (LATE) with high-dimensional covariates, regardless of the strength of identification. We propose a novel test statistic for the high-dimensional LATE, and show that our test has uniformly correct asymptotic size. Applying the double/debiased machine learning (DML) method to estimate nuisance parameters, we develop easy-to-implement algorithms for inference/confidence interval of the high-dimensional LATE. Simulations indicate that our test is efficient in the strongly identified LATE model.
    Date: 2023–02
  15. By: Thomas Wong; Prof. Mauricio Barahona
    Abstract: In this paper, we explore the use of different feature engineering and dimensionality reduction methods in multi-variate time-series modelling. Using a feature-target cross-correlation time series dataset created from the Numerai tournament, we demonstrate that, in the over-parameterised regime, both the performance and the predictions from different feature engineering methods converge to the same equilibrium, which can be characterised by the reproducing kernel Hilbert space. We suggest a new ensemble method, which combines different random non-linear transforms followed by ridge regression for modelling high-dimensional time series. Compared to some commonly used deep learning models for sequence modelling, such as LSTM and transformers, our method is more robust (lower model variance over different random seeds and less sensitive to the choice of architecture) and more efficient. An additional advantage of our method is model simplicity, as there is no need to use sophisticated deep learning frameworks such as PyTorch. The learned feature rankings are then applied to the temporal tabular prediction problem in the Numerai tournament, and the predictive power of feature rankings obtained from our method is better than that of the baseline prediction model based on moving averages.
    Date: 2023–03
  16. By: Francesca Borgonovi; Justine Hervé; Helke Seitz
    Abstract: The paper discusses the implications of recent advances in artificial intelligence for knowledge workers, focusing on possible complementarities and substitution between machine translation tools and language professionals. The emergence of machine translation tools could enhance social welfare through enhanced opportunities for inter-language communication but also create new threats because of persisting low levels of accuracy and quality in the translation output. The paper uses data on online job vacancies to map the evolution of the demand for language professionals between 2015 and 2019 in 10 countries and illustrates the set of skills that are considered important by employers seeking to hire language professionals through job vacancies posted online.
    JEL: J21 J23 J28 Z13
    Date: 2023–03–30
  17. By: Luke Sherman; Jonathan Proctor; Hannah Druckenmiller; Heriberto Tapia; Solomon M. Hsiang
    Abstract: The United Nations Human Development Index (HDI) is arguably the most widely used alternative to gross domestic product for measuring national development. This is in large part due to its multidimensional nature, as it incorporates not only income, but also education and health. However, the low country-level resolution of the global HDI data released by the Human Development Report Office of the United Nations Development Programme (N = 191 countries) has limited its use at the local level. Recent efforts used labor-intensive survey data to produce HDI estimates for first-level administrative units (e.g., states/provinces). Here, we build on recent advances in machine learning and satellite imagery to develop the first global estimates of HDI for second-level administrative units (e.g., municipalities/counties, N = 61,591) and for a global 0.1 × 0.1 degree grid (N = 806,361). To accomplish this we develop and validate a generalizable downscaling technique based on satellite imagery that allows for training and prediction with observations of arbitrary shape and size. This enables us to train a model using provincial administrative data and generate HDI estimates at the municipality and grid levels. Our results indicate that more than half of the global population was previously assigned to the incorrect HDI quintile within each country, due to aggregation bias resulting from lower resolution estimates. We also illustrate how these data can improve decision-making. We make these high resolution HDI estimates publicly available in the hope that they increase understanding of human wellbeing globally and improve the effectiveness of policies supporting sustainable development. We also make available the satellite features and software necessary to increase the spatial resolution of any other global-scale administrative data that is detectable via imagery.
    JEL: C1 C8 I32 R1
    Date: 2023–03
  18. By: Verhagen, Mark D.
    Abstract: Since individuals' social contexts vary strongly, large-scale policy interventions will likely have heterogeneous effects throughout a population. However, policy interventions are often assessed in narrow ways, either through aggregate effects or along a select number of a-priori hypothesised groups. Historically, such a narrow approach was a necessity due to data and computational constraints. However, the availability of registry data and novel methods from the machine learning domain allow for a more rigorous, hypothesis-free approach to monitoring policy effects. I illustrate how these developments can revolutionise our measurement and understanding of policy interventions by studying the nationwide 2015 decentralisation of the social domain in the Netherlands. This policy intervention delegated responsibilities to administer social care from the national to the municipal level. The decentralisation was criticised beforehand for the risk of producing inequitable effects across demographic groups or regions, but rigorous empirical follow-up remains lacking. Using machine learning methods on entire population data in the Netherlands, I find the policy induced strongly heterogeneous effects that include evidence of local capture and strong urban/rural divides. More generally, I provide a case study of how machine learning methods can be effectively used to monitor large-scale policy interventions.
    Date: 2023–03–16
  19. By: Singla, Shikhar
    Abstract: Industry concentration and markups in the US have been rising over the last 3-4 decades. However, the causes remain largely unknown. This paper uses machine learning on regulatory documents to construct a novel dataset on compliance costs to examine the effect of regulations on market power. The dataset is comprehensive and consists of all significant regulations at the 6-digit NAICS level from 1970-2018. We find that regulatory costs have increased by $1 trillion during this period. We document that an increase in regulatory costs results in lower (higher) sales, employment, markups, and profitability for small (large) firms. The regulation-driven increase in concentration is associated with lower elasticity of entry with respect to Tobin's Q, lower productivity and investment after the late 1990s. We estimate that increased regulations can explain 31-37% of the rise in market power. Finally, we uncover the political economy of rulemaking. While large firms are opposed to regulations in general, they push for the passage of regulations that have an adverse impact on small firms.
    Keywords: Market Power, Competition, Concentration, Machine Learning, Regulations
    JEL: L51 L11 C45 D4
    Date: 2023
  20. By: Klauber, Hannah (Mercator Research Institute on Global Commons and Climate Change (MCC)); Koch, Nicolas (Mercator Research Institute on Global Commons and Climate Change (MCC))
    Abstract: This paper identifies individual and regional risk factors for hospitalizations caused by heat within the German population over 65 years of age. Using administrative insurance claims data and a machine-learning-based regression model, we causally estimate heterogeneous heat effects and explore the geographic, morbidity, and socioeconomic correlates of heat vulnerability. Our results indicate that health effects distribute highly unevenly across the population. The most vulnerable are more likely to suffer from chronic diseases such as dementia and Alzheimer's disease and live in rural areas with more old-age poverty and less nursing care. We project that unabated climate change might bring heat to areas with particularly vulnerable populations, which could lead to a five-fold increase in heat-related hospitalization by 2100.
    Keywords: heat, climate change, hospitalization, risk factors, adaptation, machine learning
    JEL: I14 I18 Q51 Q54 Q58
    Date: 2023–03
  21. By: Stefano Bianchini (BETA - Bureau d'Économie Théorique et Appliquée - AgroParisTech - UNISTRA - Université de Strasbourg - UL - Université de Lorraine - CNRS - Centre National de la Recherche Scientifique - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement); Moritz Müller (BETA - Bureau d'Économie Théorique et Appliquée - AgroParisTech - UNISTRA - Université de Strasbourg - UL - Université de Lorraine - CNRS - Centre National de la Recherche Scientifique - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement); Pierre Pelletier (BETA - Bureau d'Économie Théorique et Appliquée - AgroParisTech - UNISTRA - Université de Strasbourg - UL - Université de Lorraine - CNRS - Centre National de la Recherche Scientifique - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement)
    Abstract: This paper offers insights into the diffusion and impact of artificial intelligence in science. More specifically, we show that neural network-based technology meets the essential properties of emerging technologies in the scientific realm. It is novel, because it shows discontinuous innovations in the originating domain and is put to new uses in many application domains; it is fast-growing, its dimensions being subject to rapid change; it is coherent, because it detaches from its technological parents, and integrates and is accepted in different scientific communities; and it has a prominent impact on scientific discovery, but a high degree of uncertainty and ambiguity associated with this impact. Our findings suggest that intelligent machines diffuse in the sciences, reshape the nature of the discovery process and affect the organization of science. We propose a new conceptual framework that considers artificial intelligence as an emerging general method of invention and, on this basis, derive its policy implications.
    Keywords: Artificial intelligence, Emerging technologies, Method of invention, Scientific discovery, Novelty
    Date: 2022–12
  22. By: Minkyu Shin; Jin Kim; Bas van Opheusden; Thomas L. Griffiths
    Abstract: How will superhuman artificial intelligence (AI) affect human decision making? And what will be the mechanisms behind this effect? We address these questions in a domain where AI already exceeds human performance, analyzing more than 5.8 million move decisions made by professional Go players over the past 71 years (1950-2021). To address the first question, we use a superhuman AI program to estimate the quality of human decisions across time, generating 58 billion counterfactual game patterns and comparing the win rates of actual human decisions with those of counterfactual AI decisions. We find that humans began to make significantly better decisions following the advent of superhuman AI. We then examine human players' strategies across time and find that novel decisions (i.e., previously unobserved moves) occurred more frequently and became associated with higher decision quality after the advent of superhuman AI. Our findings suggest that the development of superhuman AI programs may have prompted human players to break away from traditional strategies and induced them to explore novel moves, which in turn may have improved their decision-making.
    Date: 2023–03

This nep-big issue is ©2023 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.