nep-big New Economics Papers
on Big Data
Issue of 2025–04–21
six papers chosen by
Tom Coupé, University of Canterbury


  1. A Geospatial Approach to Measuring Economic Activity By Anton Yang; Jianwei Ai; Costas Arkolakis
  2. Understanding Labor Market Demand in Real Time in Argentina and Uruguay By Evelyn Vezza; Zunino, Gonzalo; Laguinge, Luis; Harry Edmund Moroz; Ignacio Raul Apella; Marla Hillary Spivack
  3. News article analysis using Naive Bayes classifier By Ana Vujovic
  4. BANK BEHAVIOR IN DETERMINING SUPPLY OF CREDIT IN INDONESIA By Danny Hermawan; Cicilia Anggadewi Harun; Wicaksono Aryo Pradipto; Yulian Zifar Ayustira; Alvin Andhika Zulen; Amin Endah Sulistiawati; Ade Dwi Aryani; Sintia Aurida
  5. The Impact of Media Attention on the Illiquidity of Stocks: Evidence from the Global FinTech Sector By Gaar, Eduard; Moritz, Valentin; Schiereck, Dirk
  6. Revisions in concurrent seasonal adjustments of daily and weekly economic time series By Webel, Karsten

  1. By: Anton Yang (Yale University); Jianwei Ai (Renmin University of China); Costas Arkolakis (Yale University)
    Abstract: We introduce a new methodology to detect and measure economic activity using geospatial data and apply it to steel production, a major industrial pollution source worldwide. Combining plant output data with geospatial data, such as ambient air pollutants, nighttime lights, and temperature, we train machine learning models to predict plant locations and output. We identify about 40% (70%) of plants missing from the training sample within a 1 km (5 km) radius and achieve R² above 0.8 for output prediction at a 1 km grid and at the plant level, as well as for both regional and time series validations. Our approach can be adapted to other industries and regions, and used by policymakers and researchers to track and measure industrial activity in near real time.
    Date: 2025–04–03
    URL: https://d.repec.org/n?u=RePEc:cwl:cwldpp:2435
  2. By: Evelyn Vezza; Zunino, Gonzalo; Laguinge, Luis; Harry Edmund Moroz; Ignacio Raul Apella; Marla Hillary Spivack
    Abstract: This paper explores how job vacancy data can enhance labor market information systems (LMISs) in Argentina and Uruguay where, as in many countries, data on in-demand skills is lacking. By analyzing job postings collected over four years in Argentina and Uruguay, this study assesses the potential of vacancy data to fill labor market data gaps. The findings reveal that vacancy data capture labor market dynamics across time and geography, showing a strong correlation with traditional labor market indicators such as employment and unemployment. However, the data are biased towards higher-skilled occupations. Despite these limitations, the large volume of postings allows for robust inferences and provides valuable insights into skills demand. The study presents three key applications of the data: 1) using postings as a leading indicator of labor market health; 2) identifying in-demand skills; and 3) mapping similarities between occupations to improve the information available to job counselors to provide advice about job transitions. Finally, the paper contributes methodologically by developing both a manually created skills taxonomy and an experimental machine learning approach to classifying skills. The machine learning method, while less comprehensive, highlights in-demand skills and can complement the manual approach by keeping it up to date with minimal input. Overall, the paper demonstrates the potential of job vacancy data to improve LMISs and inform labor market policies in Argentina and Uruguay with immediate practical applications for labor market analysis, skills development, and workforce training.
    Date: 2025–03–17
    URL: https://d.repec.org/n?u=RePEc:wbk:wbrwps:11086
  3. By: Ana Vujovic (National Bank of Serbia)
    Abstract: The paper presents the Naive Bayes classifier (NBC), one of the standard models used for solving classification problems, in the context of textual analysis. The model is examined first from a theoretical perspective and then from a practical one. An empirical study was conducted with the aim of carrying out a thematic classification of news articles using the NBC. The results of our research confirm that the NBC has a high predictive power despite the simplified assumptions on which it is based. These findings suggest a potential for further application of the NBC in the thematic classification of texts, which may have significant implications for economic research.
    Keywords: Naive Bayes classifier, thematic classification, natural language processing
    JEL: C13 E37
    Date: 2025–03
    URL: https://d.repec.org/n?u=RePEc:nsb:bilten:27
  4. By: Danny Hermawan (Bank Indonesia); Cicilia Anggadewi Harun (Bank Indonesia); Wicaksono Aryo Pradipto (Bank Indonesia); Yulian Zifar Ayustira (Bank Indonesia); Alvin Andhika Zulen (Bank Indonesia); Amin Endah Sulistiawati (Bank Indonesia); Ade Dwi Aryani (Bank Indonesia); Sintia Aurida (Bank Indonesia)
    Abstract: With constant disruptions in the economy stemming from global market turbulence, technological changes, and the shift toward a more sustainable way of life, understanding banking behavior has become a priority for maintaining financial stability. This study examines the credit allocation behavior of banks in Indonesia, influenced by economic conditions, regulatory frameworks, technological advancements, and sector-specific challenges. Bank credit plays a vital role in macroeconomic stability, and economic fluctuations impact banks' procyclical credit behavior. The Indonesian banking sector faces complex pressures and sectoral risks, emphasizing the need for solid policies from Bank Indonesia to maintain financial system stability. This research addresses two main questions: how client relationships affect credit supply decisions and how structural changes such as interest rates, climate change, and cybersecurity influence bank behavior. Utilizing primary and secondary data as well as machine learning (ML) methods, the study reveals insights into credit supply practices in Indonesian banks and the potential of big data and ML for a detailed assessment of credit distribution patterns. The findings highlight the importance of stricter oversight, technological integration, and sector-specific strategies, especially for SMEs and high-risk sectors such as tourism and mining. The study emphasizes integrating green finance, RegTech, and SupTech to enhance banking sector resilience and align credit activities with sustainability goals. By applying these insights, Indonesia can create a stable credit environment, support economic growth, and ensure banks are prepared to manage evolving risks in the financial landscape.
    Keywords: bank behavior, credit growth, credit supply, machine learning
    JEL: E51 G21 G28
    Date: 2024
    URL: https://d.repec.org/n?u=RePEc:idn:wpaper:wp112024
  5. By: Gaar, Eduard; Moritz, Valentin; Schiereck, Dirk
    Abstract: As a result of technological innovations in data processing, the exploitation of Internet usage data in relation to search engines or social networks is becoming increasingly intriguing for understanding and anticipating stock market movements. We analyze the impact of three alternative investor attention variables, i.e., Google search volume, Wikipedia page views, and stock market-relevant news on the rapidly growing FinTech sector. The result of the simultaneous correlation analysis reveals a highly significant correlation between the trading activities of the FinTech sector and the three investor attention variables. The time-delayed regression analysis complements the results by identifying substantial changes of the effects within one week considering the order of magnitude and sign. Furthermore, multivariate regression analysis highlights that the explanatory power for future stock trading activities and illiquidity primarily depends on Google search volume and stock market-relevant news volume, while the simultaneous correlations are best explained by the number of visits to the corresponding Wikipedia page.
    Date: 2025–03–18
    URL: https://d.repec.org/n?u=RePEc:dar:wpaper:153634
  6. By: Webel, Karsten
    Abstract: The COVID-19 outbreak in 2020 has fostered in many countries the development of new weekly economic indices for the timely tracking of pandemic-related turmoils and other forms of rapid economic changes. Such indices often utilise information from daily and weekly economic time series that normally exhibit complex forms of seasonal behaviour. The latter dynamics were initially removed with ad hoc or experimental methods due to the urgent need of instant results and hence the lack of time for inventing and approving more sophisticated alternatives. This, nevertheless, has in turn inspired recent developments of seasonal adjustment methods tailored to the specifics of infra-monthly time series. Although sound theoretical descriptions of these tailored methods are already available, their performance has not been evaluated empirically in great detail so far. To fill this gap, we consider real-time data vintages of several infra-monthly economic time series for Germany and analyse the cross-vintage stability of holiday-related deterministic pretreatment effects as well as the revisions in various concurrent signal estimates obtained with experimental STL-based and selected elaborate methods, such as the extended ARIMA model-based and X-11 approaches. Our main findings are that the tailored methods tend to outperform the experimental ones in terms of computational speed, that the considered pretreatment routines yield generally stable parameter estimates across data vintages, and that the extended ARIMA model-based approach generates the smallest and least volatile revisions in many cases.
    Keywords: extended ARIMA model-based approach, extended X-11 approach, Google trends, JDemetra+, real-time analysis, signal extraction, stability analysis, STL approach
    JEL: C01 C02 C14 C22 C40 C50
    Date: 2025
    URL: https://d.repec.org/n?u=RePEc:zbw:bubdps:315494

This nep-big issue is ©2025 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.