nep-big New Economics Papers
on Big Data
Issue of 2023‒11‒06
24 papers chosen by
Tom Coupé, University of Canterbury


  1. Identifying Nascent High-Growth Firms Using Machine Learning By Stephanie Houle; Ryan Macdonald
  2. A machine learning approach for assessing labor supply to the online labor market By Fung, Esabella
  3. Collecting, generating and analyzing national statistics with AI: what benefits and costs? By Rim, Maria J.; Kwon, Youngsun
  4. Big Data y Algoritmos para la Medición de la Pobreza y el Desarrollo By Walter Sosa Escudero
  5. Beds in Health Facilities in the Italian Regions: A Socio-Economic Approach By Leogrande, Angelo; Costantiello, Alberto; Leogrande, Domenico; Anobile, Fabio
  6. CAD: Clustering And Deep Reinforcement Learning Based Multi-Period Portfolio Management Strategy By Zhengyong Jiang; Jeyan Thiayagalingam; Jionglong Su; Jinjun Liang
  7. Fuzzy firm name matching: Merging Amadeus firm data to PATSTAT By Leon Bremer
  8. Machine learning applied to active fixed-income portfolio management: a Lasso logit approach. By Mercedes de Luis; Emilio Rodríguez; Diego Torres
  9. Learning from experts: Energy efficiency in residential buildings By Billio, Monica; Casarin, Roberto; Costola, Michele; Veggente, Veronica
  10. Artificial Intelligence and Central Bank Communication: The Case of the ECB By Nicolas Fanta; Roman Horvath
  11. The fundamental properties, stability and predictive power of distributional preferences By Ernst Fehr; Thomas Epper; Julien Senn
  12. Estimating Impact with Surveys versus Digital Traces: Evidence from Randomized Cash Transfers in Togo By Emily Aiken; Suzanne Bellue; Joshua Blumenstock; Dean Karlan; Christopher R. Udry
  13. A methodology for calculating the unmet passenger demand in the air transportation industry. By Rafael Bernardo Carmona Benitez; Maria
  14. Understanding Urban Economies, Land Use, and Social Dynamics in the City: Big Data and Measurement By Saiz, Albert; Salazar-Miranda, Arianna
  15. Combining Deep Learning and GARCH Models for Financial Volatility and Risk Forecasting By Jakub Michańków; Łukasz Kwiatkowski; Janusz Morajda
  16. Machine Learning Who to Nudge: Causal vs Predictive Targeting in a Field Experiment on Student Financial Aid Renewal By Susan Athey; Niall Keleher; Jann Spiess
  17. How Does Artificial Intelligence Improve Human Decision-Making? Evidence from the AI-Powered Go Program By Sukwoong Choi; Hyo Kang; Namil Kim; Junsik Kim
  18. Inherited inequality: a general framework and an application to South Africa By Brunori, Paolo; Ferreira, Francisco H. G.; Salas-Rojo, Pedro
  19. Hedging Properties of Algorithmic Investment Strategies using Long Short-Term Memory and Time Series models for Equity Indices By Jakub Michańków; Paweł Sakowski; Robert Ślepaczuk
  20. Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams By Ethan Callanan; Amarachi Mbakwe; Antony Papadimitriou; Yulong Pei; Mathieu Sibue; Xiaodan Zhu; Zhiqiang Ma; Xiaomo Liu; Sameena Shah
  21. Do Search Engines Increase Concentration in Media Markets? By Joan Calzada; Nestor Duch-Brown; Ricard Gil
  22. One question at a time! A text mining analysis of the ECB Q&A session By Angino, Siria; Robitu, Robert
  23. Social Media Publicity and New Product Entry via Entrepreneurs By Tong Guo; Boya Xu; Daniel Yi Xu
  24. Stringent COVID-19 government restrictions were associated with a marked increase in Twitter activity in Europe By Millard, Joe; Akimova, Evelina Tamerlanov; Ding, Xuejie; Leasure, Douglas; Zhao, Bo; Mills, Melinda

  1. By: Stephanie Houle; Ryan Macdonald
    Abstract: Predicting which firms will grow quickly and why has been the subject of research studies for many decades. Firms that grow rapidly have the potential to usher in new innovations, products or processes (Kogan et al. 2017), become superstar firms (Haltiwanger et al. 2013) and impact the aggregate labour share (Autor et al. 2020; De Loecker et al. 2020). We explore the use of supervised machine learning techniques to identify a population of nascent high-growth firms using Canadian administrative firm-level data. We apply a suite of supervised machine learning algorithms (elastic net model, random forest and neural net) to determine whether a large set of variables on Canadian firm tax filing financial and employment data, state variables (e.g., industry, geography) and indicators of firm complexity (e.g., multiple industrial activities, foreign ownership) can predict which firms will be high-growth firms over the next three years. The results suggest that the machine learning classifiers can select a sub-population of nascent high-growth firms that includes the majority of actual high-growth firms plus a group of firms that shared similar attributes but failed to attain high-growth status.
    Keywords: Econometric and statistical methods; Firm dynamics
    JEL: C55 C81 L25
    Date: 2023–10
    URL: http://d.repec.org/n?u=RePEc:bca:bocawp:23-53&r=big
  2. By: Fung, Esabella
    Abstract: The online labor market, comprising companies such as Upwork and Amazon Mechanical Turk and their freelancer workforce, has expanded worldwide over the past 15 years and has changed the labor market landscape. Although qualitative studies have identified factors related to the global supply to the online labor market, few data modeling studies have quantified the importance of these factors. This study applied tree-based supervised learning techniques (decision tree regression, random forest, and gradient boosting) to systematically evaluate the online labor supply with 70 features related to climate, population, economics, education, health, language, and technology adoption. To provide machine learning explainability, SHAP, based on Shapley values, was used to identify features with high marginal contributions. The top 5 contributing features indicate the tight integration of technology adoption, language, and human migration patterns with the online labor market supply.
    Keywords: business, boosting, commerce and trade, digital divide, economics, ensemble learning, globalization, machine learning, random forest, social factors, statistical learning, sharing economy
    JEL: C60 F14 F16 J11 J22 M2
    Date: 2023–10–09
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:118844&r=big
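SHAP, used in entry 2, rests on Shapley values. As a minimal illustration of that idea (not the study's code and not the `shap` package's API), the sketch below computes exact Shapley values by averaging each feature's marginal contribution over all feature orderings. The value function, feature names and numbers are hypothetical. Exact enumeration is exponential in the number of features, which is why practical SHAP implementations use specialised algorithms such as TreeSHAP for tree ensembles.

```python
from itertools import permutations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values: average marginal contribution of each feature
    over all orderings. `value` maps a frozenset of features to a payoff."""
    totals = {f: 0.0 for f in features}
    for order in permutations(features):
        coalition = frozenset()
        for f in order:
            with_f = coalition | {f}
            totals[f] += value(with_f) - value(coalition)
            coalition = with_f
    n_orders = factorial(len(features))
    return {f: totals[f] / n_orders for f in features}

def toy_model(coalition):
    """Hypothetical prediction when only the features in `coalition`
    are set to their observed values (the rest stay at a baseline)."""
    score = 0.0
    if "internet_use" in coalition:
        score += 3.0
    if "english" in coalition:
        score += 2.0
    # Interaction term: Shapley values split it equally between the two features.
    if "internet_use" in coalition and "english" in coalition:
        score += 1.0
    if "migration" in coalition:
        score += 0.5
    return score

phi = shapley_values(["internet_use", "english", "migration"], toy_model)
print(phi)  # {'internet_use': 3.5, 'english': 2.5, 'migration': 0.5}
```

The attributions add up to the difference between the full prediction and the baseline (6.5 here), which is the efficiency property that makes Shapley-based explanations additive.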
  3. By: Rim, Maria J.; Kwon, Youngsun
    Abstract: The paper addresses the increasing adoption of digital transformation in public sector organizations, focusing mainly on its impact on national statistical offices (NSOs). The emergence of data-driven strategies powered by artificial intelligence (AI) disrupts the conventional labour-intensive approaches of NSOs. This necessitates a delicate balance between real-time information and statistical accuracy, and motivates the exploration of AI applications such as machine learning in data processing. Despite its potential benefits, the cooperation between AI and human resources requires in-depth examination to leverage their combined strengths effectively. The paper proposes an integrative review and multi-case study approach to contribute to a deeper understanding of the benefits and costs of AI adoption in national statistical processes, facilitate the acceleration of digital transformation, and provide valuable insights for policymakers and practitioners in optimizing the use of AI in collecting, generating and analyzing national statistics.
    Keywords: Digital transformation, national statistics, artificial intelligence, human resources, data-driven strategy
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:zbw:itse23:278015&r=big
  4. By: Walter Sosa Escudero (UDESA, CONICET, CEDLAS-IIE-UNLP)
    Abstract: The revolution brought about by the combination of big data, machine learning and artificial intelligence has invaded every field of knowledge and, as one would expect, the measurement of well-being is no exception. Naturally, one must urgently ask whether the enormous problems of quantifying poverty or inequality might find a quick and effective solution in the combination of massive big data with powerful machine learning and artificial intelligence algorithms. This note is a technically accessible introduction to the achievements and challenges of using big data and machine learning to measure poverty, development, inequality and other social dimensions. It is based on Sosa Escudero, Anauati and Brau (2022), a comprehensive and technical article that studies in detail the state of the art in the use of machine learning for development and well-being studies, to which we refer the reader for further details and specific references.
    Date: 2023–10
    URL: http://d.repec.org/n?u=RePEc:dls:wpaper:0319&r=big
  5. By: Leogrande, Angelo; Costantiello, Alberto; Leogrande, Domenico; Anobile, Fabio
    Abstract: In this article, we consider the determinants of beds in healthcare facilities (BEDS) in the Italian regions between 2004 and 2022, using the ISTAT-BES database. We use different econometric techniques, i.e., Panel Data with Fixed Effects, Panel Data with Random Effects, Pooled Ordinary Least Squares (OLS), Weighted Least Squares (WLS), and Dynamic Panel at 1 Stage. The results show that the level of BEDS is positively associated, among others, with "General Doctors with a Number of Clients over the Threshold" and "Life Satisfaction", and negatively associated, among others, with "Trust in Parties" and "Positive Judgment on Future Prospects". Furthermore, we apply clustering with the k-Means algorithm optimized with the Silhouette Coefficient and find two clusters in terms of BEDS. Finally, we compare eight machine-learning algorithms and find that the best predictor is the Artificial Neural Network (ANN).
    Date: 2023–10–05
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:9sjcr&r=big
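The clustering step in entry 5 (k-Means with the number of clusters chosen by the Silhouette Coefficient) can be sketched without libraries: run Lloyd's k-means for several candidate k and keep the k with the highest mean silhouette. This is an illustrative sketch under assumptions, not the authors' implementation: the one-dimensional data, the deterministic initialisation and the candidate range of k are all hypothetical.

```python
def kmeans_1d(xs, k, iters=100):
    """Lloyd's k-means on 1-D data (k >= 2), initialised at evenly spaced
    order statistics so that results are deterministic."""
    s = sorted(xs)
    centers = [s[i * (len(s) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(xs)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: abs(x - centers[j])) for x in xs]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def mean_silhouette(xs, labels):
    """Average silhouette s(i) = (b - a) / max(a, b), where a is the mean
    distance to the own cluster and b to the nearest other cluster."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # silhouette undefined for a single cluster
    scores = []
    for i, x in enumerate(xs):
        own = [abs(x - y) for j, y in enumerate(xs) if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: silhouette set to 0 by convention
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(abs(x - y) for y, lab in zip(xs, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def best_k(xs, k_values):
    """Pick the number of clusters with the highest mean silhouette."""
    return max(k_values, key=lambda k: mean_silhouette(xs, kmeans_1d(xs, k)))

# Hypothetical beds-per-1,000-inhabitants values for eight regions.
beds = [2.9, 3.0, 3.1, 3.2, 4.0, 4.1, 4.2, 4.3]
print(best_k(beds, range(2, 5)))  # 2 for this toy data
```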
  6. By: Zhengyong Jiang; Jeyan Thiayagalingam; Jionglong Su; Jinjun Liang
    Abstract: In this paper, we present a novel trading strategy that integrates reinforcement learning methods with clustering techniques for portfolio management in multi-period trading. Specifically, we use a clustering method to group stocks into clusters based on their financial indices. We then use the Asynchronous Advantage Actor-Critic (A3C) algorithm to determine the trading actions for stocks within each cluster. Finally, we employ the Deep Deterministic Policy Gradient (DDPG) algorithm to generate the portfolio weight vector, which determines the amount of stocks to buy, sell, or hold according to the trading actions of different clusters. To the best of our knowledge, our approach is the first to combine clustering methods and reinforcement learning methods for portfolio management in the context of multi-period trading. Our proposed strategy is evaluated using a series of back-tests on four datasets, comprising a total of 800 stocks, obtained from the Shanghai Stock Exchange and the National Association of Securities Dealers Automated Quotations (NASDAQ). Our results demonstrate that our approach outperforms conventional portfolio management techniques, such as the Robust Median Reversion strategy, the Passive Aggressive Median Reversion strategy, and several machine learning methods, across various metrics. In our back-test experiments, our proposed strategy yields an average return of 151% over 360 trading periods with 800 stocks, compared to the highest return of 124% achieved by other techniques over identical trading periods and stocks.
    Date: 2023–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2310.01319&r=big
  7. By: Leon Bremer (Vrije Universiteit Amsterdam)
    Abstract: When merging firms across large databases in the absence of common identifiers, text algorithms can help. I propose a high-performance fuzzy firm name matching algorithm that uses existing computational methods and works even under hardware restrictions. The algorithm consists of four steps, namely (1) cleaning, (2) similarity scoring, (3) a decision rule based on supervised machine learning, and (4) group identification using community detection. The algorithm is applied to merging firms in the Amadeus Financials and Subsidiaries databases, containing firm-level business and ownership information, to applicants in PATSTAT, a worldwide patent database. In this application, the algorithm vastly outperforms an exact string match, increasing the number of matched firms in the Amadeus Financials (Subsidiaries) database by 116% (160%). 53% (74%) of this improvement is due to cleaning, and another 41% (50%) improvement is due to similarity matching. 18.1% of all patent applications since 1950 are matched to firms in the Amadeus databases, compared to 2.6% for an exact name match.
    Keywords: Fuzzy name matching, supervised machine learning, name disambiguation, patents
    JEL: C81 C88 O34
    Date: 2023–10–12
    URL: http://d.repec.org/n?u=RePEc:tin:wpaper:20230055&r=big
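The four steps described in entry 7 can be sketched with the Python standard library. This is a simplified illustration, not the author's code: `difflib.SequenceMatcher` stands in for the paper's similarity scores, a fixed threshold replaces the supervised decision rule, connected components (union-find) replace community detection, and the legal-form list, threshold and firm names are hypothetical.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical list of legal-form tokens stripped during cleaning.
LEGAL_FORMS = {"gmbh", "ag", "bv", "nv", "ltd", "limited", "inc", "sa", "spa", "plc", "co"}

def clean(name):
    """Step 1: lower-case, drop punctuation and legal-form tokens."""
    tokens = re.sub(r"[^a-z0-9 ]", "", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)

def similarity(a, b):
    """Step 2: string similarity in [0, 1] (difflib ratio as a stand-in)."""
    return SequenceMatcher(None, a, b).ratio()

def is_match(score, threshold=0.9):
    """Step 3: decision rule. The paper trains a supervised classifier;
    a fixed threshold is used here purely for illustration."""
    return score >= threshold

def group_names(names, threshold=0.9):
    """Step 4: group matched names. Connected components (union-find) are
    a simpler stand-in for the community detection used in the paper."""
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    cleaned = [clean(n) for n in names]
    for i, j in combinations(range(len(names)), 2):
        if is_match(similarity(cleaned[i], cleaned[j]), threshold):
            parent[find(i)] = find(j)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())

names = ["Siemens AG", "SIEMENS A.G.", "Simens AG", "Bayer AG", "Bayer Ltd.", "Philips N.V."]
print(group_names(names))
# [['Siemens AG', 'SIEMENS A.G.', 'Simens AG'], ['Bayer AG', 'Bayer Ltd.'], ['Philips N.V.']]
```

Cleaning alone merges the first two Siemens spellings and both Bayer entries; the misspelled "Simens" is only caught by the similarity step, mirroring the split between cleaning and similarity gains reported in the abstract.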
  8. By: Mercedes de Luis (Banco de España); Emilio Rodríguez (Banco de España); Diego Torres (Banco de España)
    Abstract: The use of quantitative methods constitutes a standard component of institutional investors’ portfolio management toolkit. In the last decade, several empirical studies have employed probabilistic or classification models to predict stock market excess returns, model bond ratings and default probabilities, as well as to forecast yield curves. To the authors’ knowledge, little research exists into their application to active fixed-income management. This paper contributes to filling this gap by comparing a machine learning algorithm, the Lasso logit regression, with a passive (buy-and-hold) investment strategy in the construction of a duration management model for high-grade bond portfolios, specifically focusing on US treasury bonds. Additionally, a two-step procedure is proposed, together with a simple ensemble averaging aimed at minimising the potential overfitting of traditional machine learning algorithms. A method to select thresholds that translate probabilities into signals based on conditional probability distributions is also introduced. A large set of financial and economic variables is used as an input to obtain a signal for active duration management relative to a passive benchmark portfolio. As a first result, most of the variables selected by the model are related to financial flows and economic fundamentals, but the parameters seem to be unstable over time, thereby suggesting that variable relevance may be time dependent. Backtesting of the model, which was carried out on a sovereign bond portfolio denominated in US dollars, resulted in a small but statistically significant outperformance of the benchmark index in the out-of-sample dataset after controlling for overfitting. These results support the case for incorporating quantitative tools in the active portfolio management process for institutional investors, while paying special attention to potential overfitting and unstable parameters.
    Quantitative tools should be viewed as a complementary input to the qualitative and fundamental analysis, together with the portfolio manager’s expertise, in order to make better-informed investment decisions.
    Keywords: machine learning, probabilistic or classification models, Lasso logit regressions, active fixed-income management, absolute excess return, Sharpe ratios, duration management
    JEL: C45 C51 C53 E37 G11
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:bde:wpaper:2324&r=big
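The Lasso logit at the core of entry 8 can be sketched without libraries: an L1-penalised logistic regression fitted by proximal gradient descent (ISTA), followed by a threshold rule that turns a probability into a duration signal. Everything specific here is hypothetical: the features, the target definition ("yields fell"), the penalty, the step size and the 0.5 threshold. The paper's two-step procedure, ensemble averaging and conditional-distribution threshold selection are not reproduced.

```python
import math

def predict_proba(b, w, row):
    """Logistic probability for one observation."""
    z = b + sum(wj * xj for wj, xj in zip(w, row))
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(x, t):
    """Proximal operator of the L1 penalty: shrink x towards 0 by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def lasso_logit(X, y, lam, lr=0.1, iters=5000):
    """L1-penalised logistic regression via proximal gradient descent (ISTA).
    The intercept b is left unpenalised; lam controls sparsity of w."""
    n, k = len(X), len(X[0])
    b, w = 0.0, [0.0] * k
    for _ in range(iters):
        grad_b, grad_w = 0.0, [0.0] * k
        for row, target in zip(X, y):
            err = predict_proba(b, w, row) - target  # gradient of the log-loss
            grad_b += err / n
            for j in range(k):
                grad_w[j] += err * row[j] / n
        b -= lr * grad_b
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return b, w

def duration_signal(prob, threshold=0.5):
    """Hypothetical rule: overweight duration versus the benchmark when the
    predicted probability of falling yields exceeds the threshold."""
    return "overweight duration" if prob > threshold else "underweight duration"

# Hypothetical standardised predictors: a flows indicator and pure noise.
X = [[-2, 0.3], [-1, -0.5], [-0.5, 0.8], [0.5, -0.2], [1, 0.6], [2, -0.7]]
y = [0, 0, 0, 1, 1, 1]  # 1 = yields fell over the next period (hypothetical)
b, w = lasso_logit(X, y, lam=0.05)
print(w, duration_signal(predict_proba(b, w, [1.5, 0.0])))
```

A large enough penalty sets every coefficient to exactly zero, which is the variable-selection behaviour that makes the Lasso attractive when, as the abstract notes, variable relevance may shift over time.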
  9. By: Billio, Monica; Casarin, Roberto; Costola, Michele; Veggente, Veronica
    Abstract: Measuring and reducing energy consumption constitutes a crucial concern in public policies aimed at mitigating global warming. The real estate sector faces the challenge of enhancing building efficiency, where insights from experts play a pivotal role in the evaluation process. This research employs a machine learning approach to analyze expert opinions, seeking to extract the key determinants influencing potential residential building efficiency and establishing an efficient prediction framework. The study leverages open Energy Performance Certificate databases from two countries with distinct latitudes, namely the UK and Italy, to investigate whether enhancing energy efficiency necessitates different intervention approaches. The findings reveal the existence of non-linear relationships between efficiency and building characteristics, which cannot be captured by conventional linear modeling frameworks. By offering insights into the determinants of residential building efficiency, this study provides guidance to policymakers and stakeholders in formulating effective and sustainable strategies for energy efficiency improvement.
    Keywords: Energy efficiency, Energy Performance Certificate, Machine learning, Tree-based models, big data
    JEL: C10 C53 C50
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:zbw:safewp:403&r=big
  10. By: Nicolas Fanta (Institute of Economic Studies, Charles University, Prague); Roman Horvath (Institute of Economic Studies, Charles University, Prague)
    Abstract: We examine whether artificial intelligence (AI) can decipher the European Central Bank's communication. Employing 1769 inter-meeting verbal communication events of the European Central Bank's Governing Council members, we construct an AI-based indicator evaluating whether communication leans towards easing, tightening or maintaining the monetary policy stance. We find that our AI-based indicator closely replicates similar indicators based on human expert judgment, but at much higher speed and much lower cost. Using our AI-based indicator and a number of robustness checks, our regression results show that ECB communication matters for future monetary policy even after controlling for financial market expectations and lagged monetary policy decisions.
    Keywords: Artificial intelligence, central bank communication, monetary policy
    JEL: E52 E58
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:fau:wpaper:wp2023_29&r=big
  11. By: Ernst Fehr; Thomas Epper; Julien Senn
    Abstract: Parsimony is a desirable feature of economic models, but almost all human behaviors are characterized by vast individual variation that appears to defy parsimony. How much parsimony do we need to give up to capture the fundamental aspects of a population’s distributional preferences and to maintain high predictive ability? Using a Bayesian nonparametric clustering method that makes the trade-off between parsimony and descriptive accuracy explicit, we show that three preference types (an inequality averse, an altruistic and a predominantly selfish type) capture the essence of behavioral heterogeneity. These types independently emerge in four different data sets and are strikingly stable over time. They predict out-of-sample behavior as well as a model that permits all individuals to differ, and substantially better than a representative agent model and a state-of-the-art machine learning algorithm. Thus, a parsimonious model with three stable types captures key characteristics of distributional preferences and has excellent predictive power.
    Keywords: Distributional preferences, altruism, inequality aversion, preference heterogeneity, stability, out-of-sample prediction, parsimony, bayesian nonparametrics
    JEL: D31 D63 C49 C90
    Date: 2023–10
    URL: http://d.repec.org/n?u=RePEc:zur:econwp:439&r=big
  12. By: Emily Aiken; Suzanne Bellue; Joshua Blumenstock; Dean Karlan; Christopher R. Udry
    Abstract: Do non-traditional digital trace data and traditional survey data yield similar estimates of the impact of a cash transfer program? In a randomized controlled trial of Togo’s COVID-19 Novissi program, endline survey data indicate positive treatment effects on beneficiary food security, mental health, and self-perceived economic status. However, impact estimates based on mobile phone data – processed with machine learning to predict beneficiary welfare – do not yield similar results, even though related data and methods do accurately predict wealth and consumption in prior cross-sectional analysis in Togo. This limitation likely arises from the underlying difficulty of using mobile phone data to predict short-term changes in wellbeing within a rural population with fairly homogeneous baseline levels of poverty. We discuss the implications of these results for using new digital data sources in impact evaluation.
    JEL: C55 I32 I38
    Date: 2023–10
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:31751&r=big
  13. By: Rafael Bernardo Carmona Benitez (Universidad Anahuac Mexico (Mexico)); Maria (Universidad Anahuac Mexico (Mexico))
    Abstract: A methodology to estimate unmet demand is developed using machine learning algorithms. The unmet demand in an origin-destination airport pair (OD pair) is the number of passengers who could not fly because of the economic conditions of supply and demand. Forecasting unmet demand is important for strategic planning decisions such as opening new routes, increasing or decreasing the number of services, and choosing aircraft. The first contribution of this paper is a single-class methodology to unconstrain, or detruncate, passenger (pax) demand in order to estimate the market size of an OD pair. This methodology combines time-series methods with the bootstrap distribution function and machine learning algorithms, and it uses socioeconomic variables at the community-zone and airport levels to forecast the market size of an OD pair. The second contribution of this paper is a methodology that estimates the unmet demand of an OD pair. Its advantage is the ability to simulate unmet demand based on statistical analysis with a confidence level of (1-α)%. The calculations are evaluated by describing the distribution of the historical market-size data, because distribution functions make it possible to calculate pax demand without knowing the parameters that influence it. Finally, the third contribution of this paper is an approach to identify new airline OD pairs that could be considered potential airline markets, based on the calculated OD pair unmet demand. The proposed methodology is applied to the US air pax industry as a case study. The results indicate that hub airports are under extreme competition, while small and primary airports located in big cities are not under competition in some quarters of the year, meaning that socioeconomic factors among airports change with the seasonality of the year.
    Keywords: air transport, dissatisfied passengers, air travel, passenger complaints
    Date: 2023–05
    URL: http://d.repec.org/n?u=RePEc:amj:wpaper:23003&r=big
  14. By: Saiz, Albert (Massachusetts Institute of Technology); Salazar-Miranda, Arianna (MIT)
    Abstract: Recent advancements in data collection have expanded the tools and information available for urban and spatial-based research. This paper presents an overview of spatial big data sources used in urban science and urban economics, with the goal of directing and enriching future research by other applied economists. We structure our discussion around data origins and analytical methods, discussing geographic information maps, GPS and CDR, textual repositories, social media, credit card transactions, street imagery, sensor readings, volumetric data, street patterns, transportation metrics, public records, geocoded historical data, business analytics, real estate transactions, and crowdsourced input. While aiming to provide an overarching perspective, we also touch upon common challenges in urban big data research, especially those unique to data collection, analysis, and inference.
    Keywords: urban big data
    JEL: J01 C80 R00 R32 R58
    Date: 2023–10
    URL: http://d.repec.org/n?u=RePEc:iza:izadps:dp16501&r=big
  15. By: Jakub Michańków; Łukasz Kwiatkowski; Janusz Morajda
    Abstract: In this paper, we develop a hybrid approach to forecasting the volatility and risk of financial instruments by combining common econometric GARCH time series models with deep learning neural networks. For the latter, we employ Gated Recurrent Unit (GRU) networks, whereas four different specifications are used as the GARCH component: standard GARCH, EGARCH, GJR-GARCH and APARCH. Models are tested using daily logarithmic returns on the S&P 500 index as well as gold and Bitcoin prices, with the three assets representing quite distinct volatility dynamics. As the main volatility estimator, also underlying the target function of our hybrid models, we use the price-range-based Garman-Klass estimator, modified to incorporate the opening and closing prices. Volatility forecasts from the hybrid models are used to evaluate the assets' risk with Value-at-Risk (VaR) and Expected Shortfall (ES) at two tolerance levels, 5% and 1%. Gains from combining the GARCH and GRU approaches are discussed in the context of both the volatility and the risk forecasts. In general, the hybrid solutions produce more accurate point volatility forecasts, although this does not necessarily translate into superior VaR and ES forecasts.
    Date: 2023–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2310.01063&r=big
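The two risk ingredients named in entry 15 can be sketched as follows: the textbook Garman-Klass variance from open, high, low and close prices, and one-period VaR and ES computed from a volatility forecast. Assumptions to flag: the paper's specific modification of Garman-Klass is not given in the abstract, so the standard formula is used; returns are assumed normal with zero mean, which may differ from the paper's distributional choices; and the price bar is hypothetical. In the paper, sigma would come from the hybrid GARCH-GRU forecast rather than from a single day's prices.

```python
import math
from statistics import NormalDist

def garman_klass_variance(open_, high, low, close):
    """Textbook Garman-Klass one-period variance from OHLC prices:
    0.5 * ln(H/L)^2 - (2 ln 2 - 1) * ln(C/O)^2."""
    hl = math.log(high / low)
    co = math.log(close / open_)
    return 0.5 * hl ** 2 - (2 * math.log(2) - 1) * co ** 2

def normal_var_es(sigma, alpha, mu=0.0):
    """One-period VaR and ES, reported as positive losses, assuming returns
    are N(mu, sigma^2); alpha is the tolerance level (e.g. 0.05 or 0.01)."""
    std = NormalDist()
    z = std.inv_cdf(alpha)                   # lower-tail quantile, about -1.645 at 5%
    var = -(mu + sigma * z)                  # loss exceeded with probability alpha
    es = -(mu - sigma * std.pdf(z) / alpha)  # mean loss beyond the VaR
    return var, es

# Hypothetical daily bar: open 100, high 102, low 99, close 101.
sigma = math.sqrt(garman_klass_variance(100.0, 102.0, 99.0, 101.0))
for alpha in (0.05, 0.01):
    var, es = normal_var_es(sigma, alpha)
    print(f"alpha={alpha}: VaR={var:.4f}, ES={es:.4f}")
```

For a unit volatility this gives the familiar normal multipliers (VaR about 1.645 and ES about 2.063 at 5%), so ES always exceeds VaR at the same tolerance level.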
  16. By: Susan Athey; Niall Keleher; Jann Spiess
    Abstract: In many settings, interventions may be more effective for some individuals than others, so that targeting interventions may be beneficial. We analyze the value of targeting in the context of a large-scale field experiment with over 53,000 college students, where the goal was to use "nudges" to encourage students to renew their financial-aid applications before a non-binding deadline. We begin with baseline approaches to targeting. First, we target based on a causal forest that estimates heterogeneous treatment effects and then assigns to treatment the students estimated to have the highest treatment effects. Next, we evaluate two alternative targeting policies, one targeting students with low predicted probability of renewing financial aid in the absence of the treatment, the other targeting those with high probability. The predicted baseline outcome is not the ideal criterion for targeting, nor is it a priori clear whether to prioritize low, high, or intermediate predicted probability. Nonetheless, targeting on low baseline outcomes is common in practice, for example because the relationship between individual characteristics and treatment effects is often difficult or impossible to estimate with historical data. We propose hybrid approaches that incorporate the strengths of both predictive approaches (accurate estimation) and causal approaches (correct criterion); we show that targeting intermediate baseline outcomes is most effective, while targeting based on low baseline outcomes is detrimental. In one year of the experiment, nudging all students improved early filing by an average of 6.4 percentage points over a baseline average of 37% filing, and we estimate that targeting half of the students using our preferred policy attains around 75% of this benefit.
    Date: 2023–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2310.08672&r=big
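The comparison of targeting rules in entry 16 can be sketched mechanically: pick half of the students by low, high or intermediate predicted baseline probability, then measure what share of the treat-everyone benefit each rule captures. This shows only the bookkeeping, under loudly stated assumptions: the predicted probabilities and the true effect function are hypothetical, and the effect function is chosen to peak at intermediate probabilities, so the ranking it produces follows from that assumption rather than from the paper's evidence. "Closest to the median prediction" is one hypothetical way to operationalise "intermediate"; the paper's causal-forest and hybrid policies are not reproduced.

```python
def target(scores, share, rule):
    """Indices of the `share` fraction of units chosen by `rule`:
    'low' / 'high' pick the lowest / highest predicted baseline probability,
    'middle' picks those closest to the median prediction."""
    n = round(len(scores) * share)
    idx = range(len(scores))
    if rule == "low":
        order = sorted(idx, key=lambda i: scores[i])
    elif rule == "high":
        order = sorted(idx, key=lambda i: -scores[i])
    elif rule == "middle":
        med = sorted(scores)[len(scores) // 2]
        order = sorted(idx, key=lambda i: abs(scores[i] - med))
    else:
        raise ValueError(f"unknown rule: {rule}")
    return set(order[:n])

def captured_benefit(selected, effects):
    """Share of the treat-everyone benefit obtained by treating only `selected`."""
    return sum(effects[i] for i in selected) / sum(effects)

# Hypothetical predicted probabilities of filing early without a nudge.
p = [i / 100 for i in range(1, 100)]
# Hypothetical true nudge effects, largest for students "on the margin".
tau = [0.2 * q * (1 - q) for q in p]
for rule in ("low", "high", "middle"):
    print(rule, round(captured_benefit(target(p, 0.5, rule), tau), 2))
```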
  17. By: Sukwoong Choi; Hyo Kang; Namil Kim; Junsik Kim
    Abstract: We study how humans learn from AI, exploiting the introduction of an AI-powered Go program (APG) that unexpectedly outperformed the best professional player. We compare the move quality of professional players to that of APG's superior solutions around its public release. Our analysis of 749,190 moves demonstrates significant improvements in players' move quality, accompanied by a decrease in the number and magnitude of errors. The effect is pronounced in the early stages of the game, where uncertainty is highest. In addition, younger players and those in AI-exposed countries experience greater improvement, suggesting potential inequality in learning from AI. Further, while players of all levels learn, less skilled players derive higher marginal benefits. These findings have implications for managers seeking to adopt and utilize AI effectively within their organizations.
    Date: 2023–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2310.08704&r=big
  18. By: Brunori, Paolo; Ferreira, Francisco H. G.; Salas-Rojo, Pedro
    Abstract: Scholars have sought to quantify the extent of inequality which is inherited from past generations in many different ways, including a large body of work on intergenerational mobility and inequality of opportunity. This paper makes three contributions to that broad literature. First, we show that many of the most prominent approaches to measuring mobility or inequality of opportunity fit within a general framework which involves, as a first step, a calculation of the extent to which inherited circumstances can predict current incomes. The importance of prediction has led to recent applications of machine learning tools to solve the model selection challenge in the presence of competing upward and downward biases. Our second contribution is to apply transformation trees to the computation of inequality of opportunity. Because the algorithm is built on a likelihood maximization that involves splitting the sample into groups with the most salient differences between their conditional cumulative distributions, it is particularly well-suited to measuring ex-post inequality of opportunity, following Roemer (1998). Our third contribution is to apply the method to data from South Africa, arguably the world’s most unequal country, and find that almost three-quarters of its current inequality is inherited from predetermined circumstances, with race playing the largest role, but parental background also making an important contribution.
    Keywords: inequality; opportunity; mobility; transformation trees; South Africa
    JEL: D31 D63 J62
    Date: 2023–09–01
    URL: http://d.repec.org/n?u=RePEc:ehl:lserod:120308&r=big
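The general framework in entry 18 (first predict current income from inherited circumstances, then ask how much inequality those predictions reproduce) can be sketched in its simplest ex-ante "types" form. This is a swapped-in, simpler technique, not the paper's transformation-tree estimator of ex-post inequality of opportunity: predicted income is the mean of each circumstance cell, inequality is measured by the mean log deviation (MLD), and the incomes and circumstance labels are hypothetical.

```python
import math
from collections import defaultdict

def mean_log_deviation(values):
    """MLD = mean of ln(mu / y_i); defined for strictly positive incomes."""
    mu = sum(values) / len(values)
    return sum(math.log(mu / y) for y in values) / len(values)

def iop_share(incomes, circumstances):
    """Step 1: predict income from inherited circumstances (here, the mean of
    each circumstance cell). Step 2: inequality of the predictions divided by
    total inequality, i.e. the share attributable to circumstances."""
    cells = defaultdict(list)
    for y, c in zip(incomes, circumstances):
        cells[c].append(y)
    cell_mean = {c: sum(ys) / len(ys) for c, ys in cells.items()}
    predicted = [cell_mean[c] for c in circumstances]
    return mean_log_deviation(predicted) / mean_log_deviation(incomes)

# Hypothetical incomes with (population group, parental education) circumstances.
incomes = [10, 12, 30, 34, 60, 70]
circumstances = [("A", "low"), ("A", "low"), ("B", "low"), ("B", "low"), ("B", "high"), ("B", "high")]
print(round(iop_share(incomes, circumstances), 3))
```

Because the MLD decomposes exactly into between-cell and within-cell parts, the share lies between 0 (circumstances predict nothing) and 1 (all inequality is between circumstance cells); richer prediction models, such as the trees discussed in the abstract, replace the cell means in step 1.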
  19. By: Jakub Michańków (Cracow University of Economics, Department of Informatics; University of Warsaw, Faculty of Economic Sciences, Quantitative Finance Research Group, Department of Quantitative Finance); Paweł Sakowski (University of Warsaw, Faculty of Economic Sciences, Quantitative Finance Research Group, Department of Quantitative Finance); Robert Ślepaczuk (University of Warsaw, Faculty of Economic Sciences, Quantitative Finance Research Group, Department of Quantitative Finance)
    Abstract: This paper proposes a novel approach to hedging portfolios of risky assets when financial markets are affected by financial turmoil. We introduce a new approach to diversification, applied not at the level of single assets but at the level of ensemble algorithmic investment strategies (AIS) built on the prices of these assets. We employ four diverse types of models (LSTM - Long Short-Term Memory, ARIMA-GARCH - Autoregressive Integrated Moving Average - Generalized Autoregressive Conditional Heteroskedasticity, momentum, and contrarian) to generate price forecasts, which are then used to produce investment signals in single and complex AIS. In this way, we are able to verify the diversification potential of different types of investment strategies consisting of various assets (energy commodities, precious metals, cryptocurrencies, or soft commodities) in hedging ensemble AIS built for equity indices (S&P 500 index). The empirical data used in this study cover the period between 2004 and 2022. Our main conclusion is that LSTM-based strategies outperform the other models and that the best diversifier for the AIS built for the S&P 500 index is the AIS built for Bitcoin. Finally, we test the LSTM model on higher-frequency (1-hour) data and find that it outperforms the results obtained using daily data.
    Keywords: machine learning, recurrent neural networks, long short-term memory, algorithmic investment strategies, testing architecture, loss function, walk-forward optimization, over-optimization
    JEL: C4 C14 C45 C53 C58 G13
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:war:wpaper:2023-25&r=big
  20. By: Ethan Callanan; Amarachi Mbakwe; Antony Papadimitriou; Yulong Pei; Mathieu Sibue; Xiaodan Zhu; Zhiqiang Ma; Xiaomo Liu; Sameena Shah
    Abstract: Large Language Models (LLMs) have demonstrated remarkable performance on a wide range of Natural Language Processing (NLP) tasks, often matching or even beating state-of-the-art task-specific models. This study aims to assess the financial reasoning capabilities of LLMs. We leverage mock exam questions from the Chartered Financial Analyst (CFA) Program to conduct a comprehensive evaluation of ChatGPT and GPT-4 in financial analysis, considering Zero-Shot (ZS), Chain-of-Thought (CoT), and Few-Shot (FS) scenarios. We present an in-depth analysis of the models' performance and limitations, and estimate whether they would have a chance of passing the CFA exams. Finally, we outline potential strategies and improvements to enhance the applicability of LLMs in finance. In this respect, we hope this work paves the way for future studies to continue enhancing LLMs for financial reasoning through rigorous evaluation.
    Date: 2023–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2310.08678&r=big
  21. By: Joan Calzada; Nestor Duch-Brown; Ricard Gil
    Abstract: Search engines are important channels of access to the news content of traditional newspapers, with Google alone responsible for 35% of online visits to news outlets in the European Union. Yet the effects of Google Search on market competition and information diversity have received scant attention. Using daily traffic data for 606 news outlets from 15 European countries, we analyze Google’s capacity to influence organic search visits by exploiting exogenous variation in news outlets’ indexation caused by nine core algorithm updates rolled out by Google between 2018 and 2020. We find that Google core updates overall reduced the number of keywords (queries) for which news outlets occupy one of the top 10 organic search result positions. Given the positive impact that the number of top keywords has on traffic, this led to a decrease in the overall number of visits to news outlets. Finally, when studying the impact of Google core updates on media market concentration, we find that the three “big” core updates identified in this period reduced market concentration by 1%, but this effect was offset by the rest of the updates. Similarly, in the context of Spain, we find that the three “big” core updates reduced monthly keyword concentration by 4%.
    Keywords: search engines, market concentration, Google, news sites, Europe
    JEL: D43 L50 L82 M31
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:ces:ceswps:_10671&r=big
  22. By: Angino, Siria; Robitu, Robert
    Abstract: News media play a fundamental role in the communication between central banks and the public. Besides stimulating institutional transparency, news media reporting on a central bank’s activities is also the main source of information about the institution for most citizens. To better understand how this intermediation process works, this paper explores the Q&A sessions of European Central Bank (ECB) press conferences, where journalists have an opportunity to set the discussion and inquire into the central bank’s thinking. Using a structural topic model on a novel dataset consisting of all questions asked at ECB press conferences since May 2012, we systematically examine the topics the ECB is questioned about and uncover differences in focus between outlets from different geographical areas and with different types of audiences. We find that international outlets devote more attention to technical topics relevant to market participants, while domestic media in the European Union (EU) focus more on national affairs and the more political dimensions of the ECB’s activities.
    Keywords: Central bank communication, European Central Bank, media, Structural Topic Modelling
    JEL: E52 E58 E59
    Date: 2023–10
    URL: http://d.repec.org/n?u=RePEc:ecb:ecbwps:20232852&r=big
  23. By: Tong Guo (Fuqua School of Business, Duke University, 100 Fuqua Drive, Durham, NC 27708); Boya Xu (Fuqua School of Business, Duke University, 100 Fuqua Drive, Durham, NC 27708); Daniel Yi Xu (Department of Economics, Duke University, 213 Social Science Building, 419 Chapel Drive, Durham, NC 27708)
    Abstract: We study the early-stage adoption of impossible meat products by local businesses, overcoming the common challenges in understanding new product entries via local intermediaries: 1) empirically tracking intermediary decisions at scale in a timely manner is difficult, if not impossible; 2) marketing communications for the innovation are largely endogenous to unobserved demand shocks, making it hard to causally identify the factors driving innovation adoption. Focusing on the key producers in their US market debut between 2015 and 2019, we construct a novel location-specific adoption metric that accurately measures the decisions of local intermediaries, and link it to comprehensive marketing communications extracted from a social media corpus using Natural Language Processing. Using an identification strategy that interacts global shocks in news content with pre-determined local shares of topic-specific news consumption, we find that local news mentioning the innovation increases the regional adoption of impossible meat products by intermediaries. Interestingly, news content about producer financials appears to be as important as content about sustainability in driving local adoption of impossible meat products. We conjecture that financial news plays a role in boosting the perceived market potential of the innovation both by signaling the trustworthiness of the technology (thus lowering uncertainty) and by reinforcing its trendiness (thus providing free marketing to small businesses that adopted the innovation). We further explore heterogeneity in news topics by socio-economic conditions and timing.
    Keywords: innovation, new product entry, entrepreneurship, social media marketing, news, sustainability, health, natural language processing, topic modeling
    JEL: C14 C81 D40 L10 L11 M31 O33
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:net:wpaper:2306&r=big
  24. By: Millard, Joe; Akimova, Evelina Tamerlanov; Ding, Xuejie; Leasure, Douglas; Zhao, Bo; Mills, Melinda
    Abstract: The COVID-19 pandemic has had an unprecedented effect on health, well-being, and socioeconomic conditions worldwide. Its consequences included changes in social media activity, disrupted schedules, and potentially disrupted sleep. We use Twitter data to explore changes in daytime and nighttime online activity at the onset of the COVID-19 pandemic in 2020. Using a pseudo-random sample of 2,489 users across 6 cities in the UK (Aberdeen, Belfast, Bristol, Cardiff, London, and Manchester), 4 cities in Italy (Milan, Naples, Rome, and Turin), and 4 cities in Sweden (Göteborg, Malmo, Stockholm, and Uppsala), we test the extent to which the COVID-19 pandemic changed online activity in Europe. Using a dataset of ~24 million tweets, we show that tweet activity increased by ~20% in 2020 relative to the previous non-pandemic year of 2019. We further show that tweet activity is associated with the degree of government response to COVID-19, particularly during the day, and that the stringency of restrictions was the strongest predictive component of the change in tweet count.
    Date: 2023–10–03
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:g9apk&r=big

This nep-big issue is ©2023 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.