nep-big New Economics Papers
on Big Data
Issue of 2023‒10‒30
seventeen papers chosen by
Tom Coupé, University of Canterbury


  1. PAMS: Platform for Artificial Market Simulations By Masanori Hirano; Ryosuke Takata; Kiyoshi Izumi
  2. Leveraging Deep Learning and Online Source Sentiment for Financial Portfolio Management By Paraskevi Nousi; Loukia Avramelou; Georgios Rodinos; Maria Tzelepi; Theodoros Manousis; Konstantinos Tsampazis; Kyriakos Stefanidis; Dimitris Spanos; Emmanouil Kirtas; Pavlos Tosidis; Avraam Tsantekidis; Nikolaos Passalis; Anastasios Tefas
  3. Artificial intelligence, complementary assets and productivity: evidence from French firms By Flavio Calvino; Luca Fontanelli
  4. Forecasting Inflation from Disaggregated Data: The Colombian case By Wilmer Martínez-Rivera; Eliana R. González-Molano; Edgar Caicedo-García
  5. Big data analytics and exports: Evidence for manufacturing firms from 27 EU countries By Wagner, Joachim
  6. The Productivity Effects of Regional Anchors on Local Firms in Swedish Regions between 2007 and 2019 – Evidence from an Expert-informed Machine-Learning Approach By Nilsson, Magnus; Schubert, Torben; Miörner, Johan
  7. Artificial Intelligence and Workers' Well-Being By Giuntella, Osea; König, Johannes; Stella, Luca
  8. Long-term effects of early adverse labour market conditions: A Causal Machine Learning approach By Petru Crudu
  9. Evaluation of Reinforcement Learning Techniques for Trading on a Diverse Portfolio By Ishan S. Khare; Tarun K. Martheswaran; Akshana Dassanaike-Perera; Jonah B. Ezekiel
  10. Assessing Look-Ahead Bias in Stock Return Predictions Generated By GPT Sentiment Analysis By Paul Glasserman; Caden Lin
  11. Artificial Intelligence and Employment: A Look into the Crystal Ball By Dario Guarascio; Jelena Reljic; Roman Stollinger
  12. Artificial Intelligence and Employment: A Look into the Crystal Ball By Dario Guarascio; Jelena Reljic; Roman Stoellinger
  13. Cite-seeing and reviewing: A study on citation bias in peer review. By Stelmakh, Ivan; Rastogi, Charvi; Liu, Ryan; Chawla, Shuchi; Shah, Nihar; Echenique, Federico
  14. Using Large Language Models for Qualitative Analysis can Introduce Serious Bias By Julian Ashwin; Aditya Chhabra; Vijayendra Rao
  15. What do telecommunications policy academics have to fear from GPT-3? By Howell, Bronwyn E.; Potgieter, Petrus H.
  16. Data sharing or algorithm sharing? By Bruno Carballa-Smichowski; Yassine Lefouili; Andrea Mantovani; Carlo Reggiani
  17. Inflation news coverage, expectations and risk premium By Perico Ortiz, Daniel

  1. By: Masanori Hirano; Ryosuke Takata; Kiyoshi Izumi
    Abstract: This paper presents a new artificial market simulation platform, PAMS: Platform for Artificial Market Simulations. PAMS is a Python-based simulator that integrates easily with deep learning and supports a variety of simulations that require users to modify the simulator easily. In this paper, we demonstrate PAMS's effectiveness through a study using agents that predict future prices by deep learning.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.10729&r=big
  2. By: Paraskevi Nousi; Loukia Avramelou; Georgios Rodinos; Maria Tzelepi; Theodoros Manousis; Konstantinos Tsampazis; Kyriakos Stefanidis; Dimitris Spanos; Emmanouil Kirtas; Pavlos Tosidis; Avraam Tsantekidis; Nikolaos Passalis; Anastasios Tefas
    Abstract: Financial portfolio management describes the task of distributing funds and conducting trading operations on a set of financial assets, such as stocks, index funds, foreign exchange or cryptocurrencies, aiming to maximize the profit while minimizing the loss incurred by said operations. Deep Learning (DL) methods have consistently excelled at various tasks, and automated financial trading is one of the most complex of these. This paper aims to provide insight into various DL methods for financial trading, under both the supervised and reinforcement learning schemes. At the same time, taking into consideration sentiment information regarding the traded assets, we discuss and demonstrate the usefulness of these methods through corresponding research studies. Finally, we discuss commonly found problems in training such financial agents and equip the reader with the necessary knowledge to avoid these problems and apply the discussed methods in practice.
    Date: 2023–07
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.16679&r=big
  3. By: Flavio Calvino; Luca Fontanelli
    Abstract: In this work we characterise French firms using artificial intelligence (AI) and explore the link between AI use and productivity. We distinguish AI users that source AI from external providers (AI buyers) from those developing their own AI systems (AI developers). AI buyers tend to be larger than other firms, while AI developers are also younger. The share of firms using AI is highest in the ICT sector, which exhibits a particularly high share of developers. Complementary assets, including skills, digital capabilities and infrastructure, play a key role for AI use, with AI buyers and developers leveraging different types of human capital. Overall, AI users tend to be more productive; however, this appears largely related to the self-selection of more productive and digital-intensive firms into AI use. This is not the case for AI developers, for which the positive link between AI use and productivity remains evident beyond selection, suggesting a positive effect of AI on their productivity.
    Keywords: Technology Diffusion; Artificial Intelligence; Digitalisation; Productivity.
    Date: 2023–10–13
    URL: http://d.repec.org/n?u=RePEc:ssa:lemwps:2023/35&r=big
  4. By: Wilmer Martínez-Rivera; Eliana R. González-Molano; Edgar Caicedo-García
    Abstract: Based on monthly disaggregated Consumer Price Index (CPI) item series and macroeconomic series, we explore the advantages of forecasting inflation at the disaggregated level and then aggregating the forecasts. We compare the performance of this approach with the forecast obtained by modeling aggregated inflation directly. For the aggregate level, we implement techniques and models suited to working with many predictors, such as dimension reduction, shrinkage methods, and machine learning models. We also implement traditional time-series models. For the disaggregated data, we use their lags and a set of macroeconomic variables as explanatory variables. Direct and recursive forecast techniques are also explored. The sample period of the analysis is from 2011 to 2022, with out-of-sample forecasting and evaluation from 2017. In addition, we evaluate the forecast accuracy during the COVID-19 period. We find that the disaggregate analysis reduces the forecast error relative to the aggregate one. **** SUMMARY: This article analyses monthly aggregated and disaggregated Consumer Price Index (CPI) data for Colombia. It explores the advantages of forecasting at the disaggregated level and then aggregating the forecasts, and compares the results with the forecasts obtained by analysing the aggregated data. The forecasts are based on fitting models and techniques that include dimension-reduction models, variable-selection models, machine learning models, and traditional ARIMA time-series models. The sample period is 2011 to 2022, with out-of-sample forecasts computed from 2017 to 2022.
    Keywords: Inflation, disaggregated data, aggregated forecasts
    JEL: C52 E17 E31
    Date: 2023–10
    URL: http://d.repec.org/n?u=RePEc:bdr:borrec:1251&r=big
  5. By: Wagner, Joachim
    Abstract: The use of big data analytics (including data mining and predictive analytics) by firms can be expected to increase productivity and reduce trade costs, which should be positively related to export activities. This paper uses firm-level data from the Flash Eurobarometer 486 survey, conducted from February to May 2020, to investigate the link between the use of big data analytics and export activities in manufacturing enterprises from the 27 member countries of the European Union. We find that firms which use big data analytics export more often, export more often to various destinations all over the world, and export to a larger number of destinations. The estimated big data analytics premia for exports are statistically highly significant after controlling for firm size, firm age, patents, and country. Furthermore, the size of these premia can be considered to be large. Successful exporters tend to use big data analytics.
    Keywords: Big data analytics, exports, firm level data, Flash Eurobarometer 486
    JEL: D22 F14
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:zbw:kcgwps:28&r=big
  6. By: Nilsson, Magnus (CIRCLE, Lund University); Schubert, Torben (CIRCLE, Lund University); Miörner, Johan (CIRCLE, Lund University)
    Abstract: This paper analyses the impact of regional anchors on local firms in Swedish regions. Departing from previous idiographic research, we adopt a nomothetic research design relying on a stepwise expert-informed supervised machine learning approach to identify the population of anchor firms in the Swedish economy between 2007 and 2019. We find support for positive anchor effects on the productivity of other firms in the region. These effects are moderated by regional and anchor conditions. We find that the effects are greater when there are multiple anchors within the same industry and that the effects are larger in economically weaker regions.
    Keywords: anchor-tenant; productivity; machine learning; anchor firms; Sweden
    JEL: D24 O30 R11 R12
    Date: 2023–10–10
    URL: http://d.repec.org/n?u=RePEc:hhs:lucirc:2023_008&r=big
  7. By: Giuntella, Osea (University of Pittsburgh); König, Johannes (DIW Berlin); Stella, Luca (Free University of Berlin)
    Abstract: This study explores the relationship between artificial intelligence (AI) and workers' well-being and mental health using longitudinal survey data from Germany (2000-2020). We construct a measure of individual exposure to AI technology based on the occupation in which workers in our sample were first employed and explore an event study design and a difference-in-differences approach to compare AI-exposed and non-exposed workers. Before AI became widely available, there is no evidence of differential pre-trends in workers' well-being and concerns about their economic futures. Since 2015, however, with the increasing adoption of AI in firms across Germany, we find that AI-exposed workers have become less satisfied with their life and job and more concerned about job security and their personal economic situation. However, we find no evidence of a significant impact of AI on workers' mental health, anxiety, or depression.
    Keywords: artificial intelligence, future of work, well-being, mental health
    JEL: I10 J28 O30
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:iza:izadps:dp16485&r=big
  8. By: Petru Crudu (Department of Economics, University of Venice Cà Foscari)
    Abstract: This study estimates the long-term causal effects of completing education during adverse labour market conditions, measuring outcomes 35 years post-education. To achieve this, the study combines historical regional unemployment rates with detailed SHARE microdata for European cohorts completing education between 1960 and 1990 in a novel database. A systematic heterogeneity analysis is conducted by leveraging the Causal Forest, a causal machine learning estimator that allows estimates at various aggregation levels. Furthermore, the causal link is validated using an instrumental variable approach. The main findings reveal that a one-percentage-point increase in the unemployment rate at the time of completing education leads to a significant decline in earnings (-5.2%) and self-perceived health (-2.23%) after 35 years. The heterogeneity analysis uncovers that the results are primarily driven by less educated individuals and highlights a permanent disadvantage for women in labour market participation. This study also provides evidence that systematic divergence in life trajectories can be explained by search theory and human capital models. Overall, the research suggests that the consequences of limited post-education opportunities can be permanent, underscoring the importance of identifying vulnerable groups for effective policy interventions.
    Keywords: Long-term Effects, Unemployment, Heterogeneous Effects, GRF
    JEL: J31 I1 J24 I24 E24
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:ven:wpaper:2023:21&r=big
  9. By: Ishan S. Khare; Tarun K. Martheswaran; Akshana Dassanaike-Perera; Jonah B. Ezekiel
    Abstract: This work seeks to answer key research questions regarding the viability of reinforcement learning over the S&P 500 index. The on-policy techniques of Value Iteration (VI) and State-action-reward-state-action (SARSA) are implemented along with the off-policy technique of Q-Learning. The models are trained and tested on a dataset comprising multiple years of stock market data from 2000-2023. The analysis presents the results and findings from training and testing the models using two different time periods: one including the COVID-19 pandemic years and one excluding them. The results indicate that including market data from the COVID-19 period in the training dataset leads to superior performance compared to the baseline strategies. During testing, the on-policy approaches (VI and SARSA) outperform Q-learning, highlighting the influence of bias-variance tradeoff and the generalization capabilities of simpler policies. However, it is noted that the performance of Q-learning may vary depending on the stability of future market conditions. Future work is suggested, including experiments with updated Q-learning policies during testing and trading diverse individual stocks. Additionally, the exploration of alternative economic indicators for training the models is proposed.
    Date: 2023–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.03202&r=big
  10. By: Paul Glasserman; Caden Lin
    Abstract: Large language models (LLMs), including ChatGPT, can extract profitable trading signals from the sentiment in news text. However, backtesting such strategies poses a challenge because LLMs are trained on many years of data, and backtesting produces biased results if the training and backtesting periods overlap. This bias can take two forms: a look-ahead bias, in which the LLM may have specific knowledge of the stock returns that followed a news article, and a distraction effect, in which general knowledge of the companies named interferes with the measurement of a text's sentiment. We investigate these sources of bias through trading strategies driven by the sentiment of financial news headlines. We compare trading performance based on the original headlines with de-biased strategies in which we remove the relevant company's identifiers from the text. In-sample (within the LLM training window), we find, surprisingly, that the anonymized headlines outperform, indicating that the distraction effect has a greater impact than look-ahead bias. This tendency is particularly strong for larger companies--companies about which we expect an LLM to have greater general knowledge. Out-of-sample, look-ahead bias is not a concern but distraction remains possible. Our proposed anonymization procedure is therefore potentially useful in out-of-sample implementation, as well as for de-biased backtesting.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.17322&r=big
  11. By: Dario Guarascio; Jelena Reljic; Roman Stollinger
    Abstract: This study provides evidence of the employment impact of AI exposure in European regions, addressing one of the many gaps in the emerging literature on AI's effects on employment in Europe. Building upon the occupation-based AI-exposure indicators proposed by Felten et al. (2018, 2019, 2021), which are mapped to the European occupational classification (ISCO), following Albanesi et al. (2023), we analyse the regional employment dynamics between 2011 and 2018. After controlling for a wide range of supply and demand factors, our findings indicate that, on average, AI exposure has a positive impact on regional employment. Put differently, European regions characterised by a relatively larger share of AI-exposed occupations display, all else being equal and once potential endogeneity concerns are mitigated, a more favourable employment tendency over the period 2011-2018. We also find evidence of a moderating effect of robot density on the AI-employment nexus, which however lacks a causal underpinning.
    Keywords: Artificial intelligence; industrial robots; labour; regional employment; occupations.
    Date: 2023–10–06
    URL: http://d.repec.org/n?u=RePEc:ssa:lemwps:2023/34&r=big
  12. By: Dario Guarascio; Jelena Reljic; Roman Stoellinger
    Abstract: This study provides evidence of the employment impact of AI exposure in European regions, addressing one of the many gaps in the emerging literature on AI's effects on employment in Europe. Building upon the occupation-based AI-exposure indicators proposed by Felten et al. (2018, 2019, 2021), which are mapped to the European occupational classification (ISCO), following Albanesi et al. (2023), we analyse the regional employment dynamics between 2011 and 2018. After controlling for a wide range of supply and demand factors, our findings indicate that, on average, AI exposure has a positive impact on regional employment. Put differently, European regions characterised by a relatively larger share of AI-exposed occupations display, all else being equal and once potential endogeneity concerns are mitigated, a more favourable employment tendency over the period 2011-2018. We also find evidence of a moderating effect of robot density on the AI-employment nexus, which however lacks a causal underpinning.
    Keywords: Artificial intelligence; industrial robots; labour; regional employment; occupations
    JEL: J21 J23 O33 R1
    Date: 2023–10
    URL: http://d.repec.org/n?u=RePEc:sap:wpaper:wp243&r=big
  13. By: Stelmakh, Ivan; Rastogi, Charvi; Liu, Ryan; Chawla, Shuchi; Shah, Nihar; Echenique, Federico
    Abstract: Citations play an important role in researchers' careers as a key factor in the evaluation of scientific impact. Many anecdotes advise authors to exploit this fact and cite prospective reviewers in an attempt to obtain a more positive evaluation of their submission. In this work, we investigate whether such a citation bias actually exists: does the citation of a reviewer's own work in a submission cause them to be positively biased towards the submission? In conjunction with the review process of two flagship conferences in machine learning and algorithmic economics, we execute an observational study to test for citation bias in peer review. In our analysis, we carefully account for various confounding factors such as paper quality and reviewer expertise, and apply different modeling techniques to alleviate concerns regarding model mismatch. Overall, our analysis involves 1,314 papers and 1,717 reviewers and detects citation bias in both venues we consider. In terms of effect size, by citing a reviewer's work, a submission has a non-trivial chance of getting a higher score from the reviewer: the expected increase in the score is approximately 0.23 on a 5-point Likert item. For reference, a one-point increase of a score by a single reviewer improves the position of a submission by 11% on average.
    Keywords: Humans, Prospective Studies, Peer Review, Bias, Research Personnel, Machine Learning, Peer Review, Research
    Date: 2023–01–01
    URL: http://d.repec.org/n?u=RePEc:cdl:econwp:qt3883h8j1&r=big
  14. By: Julian Ashwin; Aditya Chhabra; Vijayendra Rao
    Abstract: Large Language Models (LLMs) are quickly becoming ubiquitous, but the implications for social science research are not yet well understood. This paper asks whether LLMs can help us analyse large-N qualitative data from open-ended interviews, with an application to transcripts of interviews with Rohingya refugees in Cox's Bazaar, Bangladesh. We find that a great deal of caution is needed in using LLMs to annotate text, as there is a risk of introducing biases that can lead to misleading inferences. Here we mean bias in the technical sense: the errors that LLMs make in annotating interview transcripts are not random with respect to the characteristics of the interview subjects. Training simpler supervised models on high-quality human annotations with flexible coding leads to less measurement error and bias than LLM annotations. Therefore, given that some high-quality annotations are necessary in order to assess whether an LLM introduces bias, we argue that it is probably preferable to train a bespoke model on these annotations than it is to use an LLM for annotation.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.17147&r=big
  15. By: Howell, Bronwyn E.; Potgieter, Petrus H.
    Abstract: Artificial intelligence (AI) tools such as ChatGPT and GPT-3 have shot to prominence recently (Lin 2023), as dramatic advances have shown them to be capable of writing plausible output that is difficult to distinguish from human-authored content. Unsurprisingly, this has led to concerns about their use by students in tertiary education contexts (Swiecki et al. 2022) and it has led to them being banned in some school districts in the United States (e.g. Rosenblatt 2023; Clarridge 2023) and from at least one top-ranking international university (e.g. Reuters 2023). There are legitimate reasons for such fears to be held, as it is difficult to differentiate students' own written work presented for assessment from that produced by the AI tools. Successfully embedding them into educational contexts requires an understanding of the tools, what they are, and what they can and cannot do. Despite their powerful modelling and description capabilities, these tools have (at least currently) significant issues and limitations (Zhang & Li 2021). As telecommunications policy academics charged with research-led teaching and the supervision of both undergraduate and research students, we need to be certain that our graduates are capable of understanding the complexities of current issues in this incredibly dynamic field and applying their learnings appropriately in industry and policy environments. We must be reasonably certain that the grades we assign are based on the students' own work and understanding. To this end, we engaged in an experiment with the current (Q1 of 2023) version of the AI tool to assess how well it coped with questions on a core and current topic in telecommunications policy education: the effects of access regulation (local loop unbundling) on broadband investment and uptake. We found that while the outputs were well-written and appeared plausible, there were significant systematic errors which, once academics are aware of them, can be exploited to avoid the risk of AI use severely undermining the credibility of the assessments we make of students' written work, at least for the time being and in respect of the version of chatbot software we used.
    Keywords: Artificial Intelligence (AI), ChatGPT, GPT-3, Academia
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:zbw:itse23:277972&r=big
  16. By: Bruno Carballa-Smichowski (European Commission’s Joint Research Centre, Seville, Spain); Yassine Lefouili (Toulouse School of Economics, Toulouse, France); Andrea Mantovani (TBS Business School, Toulouse, France); Carlo Reggiani (European Commission’s Joint Research Centre, Seville, Spain and Department of Economics, University of Manchester, Manchester, United Kingdom)
    Abstract: Data combination and analytics can generate valuable insights for firms and society as a whole. Multiple firms can do so by means of new technologies that bring the algorithm to the data (“algorithm sharing”) or, more conventionally, by sharing the data (“data sharing”). Algorithm-sharing technologies are gaining traction because of their advantages in terms of privacy, security, and environmental impact. We present a model that allows us to study the economic incentives generated by these technologies for both firms and a platform facilitating data combination. We find that, first, the platform chooses data sharing unless algorithm sharing’s analytics are sufficiently superior to those associated with data sharing. Second, we identify the properties of the analytics benefit function that ensure that algorithm sharing results in a higher total data contribution. Third, we highlight scenarios in which, in the presence of data externalities, there can be a mismatch between the choice of the platform and the preference of a social planner.
    Keywords: data sharing, algorithm sharing, data platforms, federated learning, data externalities.
    JEL: D43 K21 L11 L13 L41 L86 M21 M31
    URL: http://d.repec.org/n?u=RePEc:net:wpaper:2308&r=big
  17. By: Perico Ortiz, Daniel
    Abstract: This paper investigates the effects of inflation news coverage on market-based inflation expectations and outcomes in the inflation-protected securities market. We employ a large corpus of news headlines from top U.S. newspapers and market data on the U.S. yield curve and inflation-protected securities. Our results indicate that news coverage, particularly regarding specific topics, exerts a significant influence on inflation compensation, expectations, and risk premiums. We observe that the impact of news diminishes as the maturity increases and varies across different news topics. This study contributes to the understanding of media influence on financial markets, specifically in shaping inflation expectations.
    Keywords: Inflation, expectations, risk premium, newspapers, term structure
    JEL: C22 D83 D84 E13 E31 E65
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:zbw:iwqwdp:052023&r=big

This nep-big issue is ©2023 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.