nep-big New Economics Papers
on Big Data
Issue of 2023‒10‒23
twenty-one papers chosen by
Tom Coupé, University of Canterbury


  1. Can I Trust the Explanations? Investigating Explainable Machine Learning Methods for Monotonic Models By Dangxing Chen
  2. Automatic Product Classification in International Trade: Machine Learning and Large Language Models By Marra de Artiñano, Ignacio; Riottini Depetris, Franco; Volpe Martincus, Christian
  3. Short-Term Stock Price Forecasting using exogenous variables and Machine Learning Algorithms By Albert Wong; Steven Whang; Emilio Sagre; Niha Sachin; Gustavo Dutra; Yew-Wei Lim; Gaetan Hains; Youry Khmelevsky; Frank Zhang
  4. Transformers versus LSTMs for electronic trading By Paul Bilokon; Yitao Qiu
  5. Startup success prediction and VC portfolio simulation using CrunchBase data By Mark Potanin; Andrey Chertok; Konstantin Zorin; Cyril Shtabtsovsky
  6. A Comprehensive Review on Financial Explainable AI By Wei Jie Yeo; Wihan van der Heever; Rui Mao; Erik Cambria; Ranjan Satapathy; Gianmarco Mengaldo
  7. Mean Absolute Directional Loss as a New Loss Function for Machine Learning Problems in Algorithmic Investment Strategies By Jakub Michańków; Paweł Sakowski; Robert Ślepaczuk
  8. AI Adoption in America: Who, What, and Where By Kristina McElheran; J. Frank Li; Erik Brynjolfsson; Zachary Krof; Emin Dinlersoz; Lucia Foster; Nikolas Zolas
  9. Mean Absolute Directional Loss as a New Loss Function for Machine Learning Problems in Algorithmic Investment Strategies By Jakub Michańków; Paweł Sakowski; Robert Ślepaczuk
  10. Using the Press to Construct a New Indicator of Inflation Perception in France By De Bandt Olivier; Bricongne Jean-Charles; Denes Julien; Dhenin Alexandre; De Gaye Annabelle; Robert Pierre-Antoine
  11. Quantifying Credit Portfolio sensitivity to asset correlations with interpretable generative neural networks By Sergio Caprioli; Emanuele Cagliero; Riccardo Crupi
  12. Electricity price forecasting on the day-ahead market using machine learning By Léonard Tschora; Erwan Pierre; Marc Plantevit; Céline Robardet
  13. Forecasting Global Maize Prices From Regional Productions By Rotem Zelingher; David Makowski
  14. Stock Market Sentiment Classification and Backtesting via Fine-tuned BERT By Jiashu Lou
  15. New News is Bad News By Paul Glasserman; Harry Mamaysky; Jimmy Qin
  16. Univariate Forecasting for REIT with Deep Learning: A Comparative Analysis with an ARIMA Model By Axelsson, Birger; Song, Han-Suck
  17. An Ensemble Method of Deep Reinforcement Learning for Automated Cryptocurrency Trading By Shuyang Wang; Diego Klabjan
  18. Platform Competition and Information Sharing By Georgios Petropoulos; Bertin Martens; Geoffrey Parker; Marshall Van Alstyne
  19. Estimating the Long-term Effects of a Fruit Fly Eradication Program Using Satellite Imagery By Salazar, Lina; Agurto Adrianzen, Marcos; Alvarez, Luis
  20. Stock Volatility Prediction Based on Transformer Model Using Mixed-Frequency Data By Wenting Liu; Zhaozhong Gui; Guilin Jiang; Lihua Tang; Lichun Zhou; Wan Leng; Xulong Zhang; Yujiang Liu
  21. Beyond Citations: Text-Based Metrics for Assessing Novelty and its Impact in Scientific Publications By Sam Arts; Nicola Melluso; Reinhilde Veugelers

  1. By: Dangxing Chen
    Abstract: In recent years, explainable machine learning methods have been very successful. Despite their success, most explainable machine learning methods are applied to black-box models without any domain knowledge. By incorporating domain knowledge, science-informed machine learning models have demonstrated better generalization and interpretation. But do we obtain consistent scientific explanations if we apply explainable machine learning methods to science-informed machine learning models? This question is addressed in the context of monotonic models that exhibit three different types of monotonicity. To demonstrate monotonicity, we propose three axioms. Accordingly, this study shows that when only individual monotonicity is involved, the baseline Shapley value provides good explanations; however, when strong pairwise monotonicity is involved, the Integrated Gradients method provides reasonable explanations on average.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.13246&r=big
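    For readers unfamiliar with the attribution methods named above, the sketch below shows Integrated Gradients on a toy, monotonically increasing model; the two-feature model, the zero baseline, and the numerical gradients are illustrative assumptions, not the paper's setup.
```python
# Minimal Integrated Gradients sketch: average the gradient along the straight
# path from a baseline to the input, then scale by (input - baseline). The toy
# model and zero baseline are hypothetical, not taken from the paper.
import numpy as np

def f(x):
    # Toy model, monotonically increasing in both inputs.
    return np.log1p(np.exp(x[0])) + 2.0 * x[1]

def grad_f(x, eps=1e-5):
    # Central-difference gradient of f at x.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def integrated_gradients(x, baseline, steps=200):
    alphas = np.linspace(0.0, 1.0, steps)
    avg_grad = np.mean([grad_f(baseline + a * (x - baseline)) for a in alphas], axis=0)
    return (x - baseline) * avg_grad

x, baseline = np.array([1.0, 0.5]), np.zeros(2)
attr = integrated_gradients(x, baseline)
print(attr, attr.sum(), f(x) - f(baseline))  # attributions roughly sum to f(x) - f(baseline)
```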
  2. By: Marra de Artiñano, Ignacio; Riottini Depetris, Franco; Volpe Martincus, Christian
    Abstract: Accurately classifying products is essential in international trade. Virtually all countries categorize products into tariff lines using the Harmonized System (HS) nomenclature for both statistical and duty collection purposes. In this paper, we apply and assess several different algorithms to automatically classify products based on text descriptions. To do so, we use agricultural product descriptions from several public agencies, including customs authorities and the United States Department of Agriculture (USDA). We find that while traditional machine learning (ML) models tend to perform well within the dataset in which they were trained, their precision drops dramatically when implemented outside of it. In contrast, large language models (LLMs) such as GPT 3.5 show a consistently good performance across all datasets, with accuracy rates ranging between 60% and 90% depending on HS aggregation levels. Our analysis highlights the valuable role that artificial intelligence (AI) can play in facilitating product classification at scale and, more generally, in enhancing the categorization of unstructured data.
    Keywords: Product Classification;machine learning;Large Language Models;Trade
    JEL: F10 C55 C81 C88
    Date: 2023–07
    URL: http://d.repec.org/n?u=RePEc:idb:brikps:12962&r=big
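    As a rough illustration of the kind of traditional machine learning baseline the abstract contrasts with LLMs, the sketch below trains a TF-IDF plus logistic regression classifier mapping product descriptions to 2-digit HS chapters; the toy descriptions and labels are invented for illustration.
```python
# A minimal "traditional ML" text classifier for product descriptions: TF-IDF
# features plus logistic regression predicting a 2-digit HS chapter. The toy
# descriptions and labels below are invented, not the paper's data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

descriptions = [
    "fresh apples in bulk cartons",
    "frozen boneless beef cuts",
    "roasted coffee beans, not decaffeinated",
    "durum wheat seed for sowing",
]
hs_chapters = ["08", "02", "09", "10"]  # 2-digit HS chapter labels

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(descriptions, hs_chapters)
print(clf.predict(["green coffee beans in jute bags"]))
```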
  3. By: Albert Wong; Steven Whang; Emilio Sagre; Niha Sachin; Gustavo Dutra; Yew-Wei Lim; Gaetan Hains; Youry Khmelevsky; Frank Zhang
    Abstract: Creating accurate predictions in the stock market has always been a significant challenge in finance. With the rise of machine learning in forecasting, this research paper compares four machine learning models and their accuracy in forecasting three well-known stocks traded on the NYSE in the short term, from March 2020 to May 2022. We develop, tune, and deploy XGBoost, Random Forest, Multi-layer Perceptron, and Support Vector Regression models, and report which models produce the highest accuracy on our evaluation metrics: RMSE, MAPE, MTT, and MPE. Using a training data set of 240 trading days, we find that XGBoost gives the highest accuracy despite running longer (up to 10 seconds). Results from this study may improve with further tuning of the individual parameters or the introduction of more exogenous variables.
    Date: 2023–05
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.00618&r=big
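    The sketch below illustrates the comparison design described above: the same lagged features are fed to XGBoost, Random Forest, MLP, and SVR models, which are then scored with RMSE and MAPE on a hold-out window. The simulated random-walk prices stand in for the paper's NYSE data, and the xgboost package is assumed to be installed.
```python
# Fit four regressors on identical lagged-price features and compare RMSE/MAPE
# on a hold-out window. The random-walk series is a stand-in for real prices.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, mean_absolute_percentage_error
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 1, 300))
X = np.column_stack([prices[i:i + 240] for i in range(5)])  # 5 lagged prices per row
y = prices[5:245]                                           # next price
X_train, X_test, y_train, y_test = X[:200], X[200:], y[:200], y[200:]

models = {
    "XGBoost": XGBRegressor(n_estimators=200),
    "RandomForest": RandomForestRegressor(n_estimators=200),
    "MLP": MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000),
    "SVR": SVR(C=10.0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    mape = mean_absolute_percentage_error(y_test, pred)
    print(f"{name}: RMSE={rmse:.3f}  MAPE={mape:.3%}")
```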
  4. By: Paul Bilokon; Yitao Qiu
    Abstract: With the rapid development of artificial intelligence, long short-term memory (LSTM), a kind of recurrent neural network (RNN), has been widely applied to time series prediction. Like the RNN, the Transformer is designed to handle sequential data. As the Transformer achieved great success in natural language processing (NLP), researchers became interested in its performance on time series prediction, and plenty of Transformer-based solutions for long time series forecasting have appeared recently. However, when it comes to financial time series prediction, LSTM is still a dominant architecture. The question this study therefore asks is whether Transformer-based models can be applied to financial time series prediction and beat LSTM. To answer it, various LSTM-based and Transformer-based models are compared on multiple financial prediction tasks based on high-frequency limit order book data. A new LSTM-based model called DLSTM is built, and a new architecture for the Transformer-based model is designed to adapt it to financial prediction. The experimental results show that the Transformer-based model has only a limited advantage in absolute price sequence prediction, while the LSTM-based models show better and more robust performance on difference sequence prediction, such as price difference and price movement.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.11400&r=big
  5. By: Mark Potanin; Andrey Chertok; Konstantin Zorin; Cyril Shtabtsovsky
    Abstract: Predicting startup success presents a formidable challenge due to the inherently volatile landscape of the entrepreneurial ecosystem. The advent of extensive databases like Crunchbase, jointly with available open data, enables the application of machine learning and artificial intelligence for more accurate predictive analytics. This paper focuses on startups at their Series B and Series C investment stages, aiming to predict key success milestones such as achieving an Initial Public Offering (IPO), attaining unicorn status, or executing a successful Merger and Acquisition (M&A). We introduce a novel deep learning model for predicting startup success, integrating a variety of factors such as funding metrics, founder features, and industry category. A distinctive feature of our research is the use of a comprehensive backtesting algorithm designed to simulate the venture capital investment process. This simulation allows for a robust evaluation of our model's performance against historical data, providing actionable insights into its practical utility in real-world investment contexts. Evaluating our model on Crunchbase data, we achieved a 14-fold capital growth and successfully identified high-potential startups at their Series B round, including Revolut, DigitalOcean, Klarna, GitHub, and others. Our empirical findings illuminate the importance of incorporating diverse feature sets in enhancing the model's predictive accuracy. In summary, our work demonstrates the considerable promise of deep learning models and alternative unstructured data in predicting startup success and sets the stage for future advancements in this research area.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.15552&r=big
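    A toy version of the backtesting idea is sketched below: invest a fixed ticket in every startup the model scores above a threshold at its Series B round and compute the resulting capital multiple. The scores, ownership share, and exit values are invented placeholders; the paper's model and Crunchbase data are not reproduced.
```python
# Toy venture-capital backtest: invest a fixed ticket in every startup scoring
# above a threshold, then measure the capital multiple from (hypothetical)
# exit proceeds. All numbers below are illustrative assumptions.
def backtest(candidates, score_threshold=0.7, ticket=1.0, ownership=0.05):
    invested, returned = 0.0, 0.0
    for c in candidates:
        if c["score"] < score_threshold:
            continue
        invested += ticket
        # Proceeds if the startup exits (IPO / M&A / unicorn valuation), else 0.
        returned += ownership * c.get("exit_value", 0.0)
    multiple = returned / invested if invested else 0.0
    return invested, returned, multiple

portfolio = [
    {"name": "A", "score": 0.91, "exit_value": 400.0},  # exits at 400 (toy units)
    {"name": "B", "score": 0.85, "exit_value": 0.0},    # write-off
    {"name": "C", "score": 0.40, "exit_value": 900.0},  # filtered out by the model
]
print(backtest(portfolio))
```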
  6. By: Wei Jie Yeo; Wihan van der Heever; Rui Mao; Erik Cambria; Ranjan Satapathy; Gianmarco Mengaldo
    Abstract: The success of artificial intelligence (AI), and deep learning models in particular, has led to their widespread adoption across various industries due to their ability to process huge amounts of data and learn complex patterns. However, due to their lack of explainability, there are significant concerns regarding their use in critical sectors, such as finance and healthcare, where decision-making transparency is of paramount importance. In this paper, we provide a comparative survey of methods that aim to improve the explainability of deep learning models within the context of finance. We categorize the collection of explainable AI methods according to their corresponding characteristics, and we review the concerns and challenges of adopting explainable AI methods, together with future directions we deemed appropriate and important.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.11960&r=big
  7. By: Jakub Michańków (Cracow University of Economics, Department of Informatics; University of Warsaw, Faculty of Economic Sciences, Quantitative Finance Research Group, Department of Quantitative Finance); Paweł Sakowski (University of Warsaw, Faculty of Economic Sciences, Quantitative Finance Research Group, Department of Quantitative Finance); Robert Ślepaczuk (University of Warsaw, Faculty of Economic Sciences, Quantitative Finance Research Group, Department of Quantitative Finance)
    Abstract: This paper investigates the issue of an adequate loss function in the optimization of machine learning models used in the forecasting of financial time series for the purpose of algorithmic investment strategies (AIS) construction. We propose the Mean Absolute Directional Loss (MADL) function, solving important problems of classical forecast error functions in extracting information from forecasts to create efficient buy/sell signals in algorithmic investment strategies. Finally, based on the data from two different asset classes (cryptocurrencies: Bitcoin and commodities: Crude Oil), we show that the new loss function enables us to select better hyperparameters for the LSTM model and obtain more efficient investment strategies, regarding risk-adjusted return metrics on the out-of-sample data.
    Keywords: machine learning, recurrent neural networks, long short-term memory, algorithmic investment strategies, testing architecture, loss function, walk-forward optimization, over-optimization
    JEL: C4 C14 C45 C53 C58 G13
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:war:wpaper:2023-23&r=big
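    A formulation of MADL consistent with the abstract's description is sketched below: the loss rewards predicting the direction of the next return, weighted by the magnitude of the realized return, so minimizing it favors directionally correct signals on large moves. The exact definition should be taken from the paper; this version, and the toy return vectors, are assumptions. To use it when training an LSTM, the same expression would be rewritten with the framework's tensor operations (for example tf.sign and tf.abs in Keras).
```python
# Sketch of a Mean Absolute Directional Loss: negative when the direction of
# the realized return is predicted correctly, positive otherwise, weighted by
# the absolute realized return. Assumed formulation; see the paper for details.
import numpy as np

def madl(y_true, y_pred):
    return np.mean(-np.sign(y_true * y_pred) * np.abs(y_true))

returns_true = np.array([0.012, -0.008, 0.004, -0.015])
returns_pred = np.array([0.010, 0.002, 0.001, -0.020])
print(madl(returns_true, returns_pred))  # lower (more negative) is better
```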
  8. By: Kristina McElheran; J. Frank Li; Erik Brynjolfsson; Zachary Krof; Emin Dinlersoz; Lucia Foster; Nikolas Zolas
    Abstract: We study the early adoption and diffusion of five AI-related technologies (automated-guided vehicles, machine learning, machine vision, natural language processing, and voice recognition) as documented in the 2018 Annual Business Survey of 850,000 firms across the United States. We find that fewer than 6% of firms used any of the AI-related technologies we measure, though most very large firms reported at least some AI use. Weighted by employment, average adoption was just over 18%. Among dynamic young firms, AI use was highest alongside more-educated, more-experienced, and younger owners, including owners motivated by bringing new ideas to market or helping the community. AI adoption was also more common in startups displaying indicators of high-growth entrepreneurship, such as venture capital funding, recent innovation, and growth-oriented business strategies. Adoption was far from evenly spread across America: a handful of “superstar” cities and emerging technology hubs led startups’ use of AI. These patterns of early AI use foreshadow economic and social impacts far beyond its limited initial diffusion, with the possibility of a growing “AI divide” if early patterns persist.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:cen:wpaper:23-48&r=big
  9. By: Jakub Michańków; Paweł Sakowski; Robert Ślepaczuk
    Abstract: This paper investigates the issue of an adequate loss function in the optimization of machine learning models used in the forecasting of financial time series for the purpose of algorithmic investment strategies (AIS) construction. We propose the Mean Absolute Directional Loss (MADL) function, solving important problems of classical forecast error functions in extracting information from forecasts to create efficient buy/sell signals in algorithmic investment strategies. Finally, based on the data from two different asset classes (cryptocurrencies: Bitcoin and commodities: Crude Oil), we show that the new loss function enables us to select better hyperparameters for the LSTM model and obtain more efficient investment strategies, with regard to risk-adjusted return metrics on the out-of-sample data.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.10546&r=big
  10. By: De Bandt Olivier; Bricongne Jean-Charles; Denes Julien; Dhenin Alexandre; De Gaye Annabelle; Robert Pierre-Antoine
    Abstract: The paper applies Natural Language Processing techniques (NLP) to the quasi-universe of newspaper articles for France, concentrating on the period 2004-2022, in order to measure inflation attention as well as perceptions by households and firms for that country. The indicator, constructed along the lines of a balance of opinions, is well correlated with actual HICP inflation. It also exhibits good forecasting properties for the European Commission survey on households’ inflation expectations, as well as overall HICP inflation. The method used is a supervised approach that we describe step-by-step. It performs better on our data than the Latent-Dirichlet-Allocation (LDA)-based approach of Angelico et al. (2022). The indicator can be used as an early real-time indicator of future inflation developments and expectations. It also provides a new set of indicators at a time when central banks monitor inflation through new types of surveys of households and firms.
    Keywords: Inflation, Natural Language Processing, Households and Firms, Expectations, Machine Learning
    JEL: C53 C55 D84 E31 E58
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:bfr:banfra:921&r=big
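    A toy sketch of the balance-of-opinions construction follows: assuming each article has already been classified as signalling rising, falling, or stable prices (the supervised NLP step described in the paper is not reproduced here), the monthly indicator is the share of "rising" articles minus the share of "falling" ones. The month labels and counts below are invented.
```python
# Monthly balance-of-opinions indicator from pre-classified articles:
# share of "rising" articles minus share of "falling" articles per month.
from collections import defaultdict

# (month, label) pairs; labels are assumed to come from a trained classifier.
classified = [
    ("2022-01", "rising"), ("2022-01", "rising"), ("2022-01", "stable"),
    ("2022-02", "rising"), ("2022-02", "falling"), ("2022-02", "falling"),
]

counts = defaultdict(lambda: {"rising": 0, "falling": 0, "stable": 0})
for month, label in classified:
    counts[month][label] += 1

for month, c in sorted(counts.items()):
    total = sum(c.values())
    balance = (c["rising"] - c["falling"]) / total
    print(month, round(balance, 2))
```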
  11. By: Sergio Caprioli; Emanuele Cagliero; Riccardo Crupi
    Abstract: In this research, we propose a novel approach for the quantification of credit portfolio Value-at-Risk (VaR) sensitivity to asset correlations with the use of synthetic financial correlation matrices generated with deep learning models. In previous work, Generative Adversarial Networks (GANs) were employed to demonstrate the generation of plausible correlation matrices that capture the essential characteristics observed in empirical correlation matrices estimated on asset returns. Instead of GANs, we employ Variational Autoencoders (VAEs) to achieve a more interpretable latent space representation. Through our analysis, we reveal that the VAE latent space can be a useful tool to capture the crucial factors impacting portfolio diversification, particularly in relation to credit portfolio sensitivity to asset correlation changes.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.08652&r=big
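    The sketch below shows the VAE idea in miniature: the off-diagonal entries of a correlation matrix are encoded into a low-dimensional latent space and decoded back, with the usual reconstruction-plus-KL objective. Layer sizes, the latent dimension, and the random training batch are placeholders rather than the paper's specification.
```python
# Compact VAE over vectorized correlation matrices (off-diagonal entries).
# Architecture sizes and the dummy data are illustrative placeholders.
import torch
import torch.nn as nn

class CorrVAE(nn.Module):
    def __init__(self, n_inputs, latent_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_inputs, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, n_inputs), nn.Tanh())  # correlations in (-1, 1)

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.decoder(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    recon = ((x - x_hat) ** 2).sum(dim=1).mean()
    kl = (-0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(dim=1)).mean()
    return recon + kl

# Dummy batch: 32 "correlation matrices" of 10 assets, vectorized (45 off-diagonals).
x = torch.rand(32, 45) * 2 - 1
model = CorrVAE(n_inputs=45)
x_hat, mu, logvar = model(x)
print(vae_loss(x, x_hat, mu, logvar).item())
```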
  12. By: Léonard Tschora (LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information, DM2L - Data Mining and Machine Learning); Erwan Pierre; Marc Plantevit; Céline Robardet (LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information, DM2L - Data Mining and Machine Learning)
    Abstract: The price of electricity on the European market is very volatile. This is due both to its mode of production by different sources, each with its own constraints (volume of production, dependence on the weather, or production inertia), and to the difficulty of its storage. Being able to predict next-day prices is an important issue that would allow the development of intelligent uses of electricity. In this article, we investigate the capabilities of different machine learning techniques to accurately predict electricity prices. Specifically, we extend current state-of-the-art approaches by considering previously unused predictive features such as the price histories of neighboring countries. We show that these features significantly improve the quality of forecasts, even in the current period when sudden changes are occurring. We also develop an analysis of the contribution of the different features to model predictions using SHAP values, in order to shed light on how models make their predictions and to build user confidence in models.
    Date: 2022–05
    URL: http://d.repec.org/n?u=RePEc:hal:journl:hal-03621974&r=big
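    The SHAP analysis mentioned at the end of the abstract can be illustrated as follows: fit a tree-based model and compute per-feature SHAP contributions with a TreeExplainer. The synthetic features (including a neighbouring country's lagged price, which the paper highlights as a useful predictor) are invented, and the shap package is assumed to be installed.
```python
# SHAP feature-contribution sketch on a gradient-boosting model trained on
# synthetic electricity-price features; the data and model are illustrative.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([
    rng.normal(50, 10, n),   # lagged domestic price
    rng.normal(48, 10, n),   # neighbouring country's lagged price
    rng.normal(0, 1, n),     # demand / load proxy
])
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + 5 * X[:, 2] + rng.normal(0, 2, n)

model = GradientBoostingRegressor().fit(X, y)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:100])
print(np.abs(shap_values).mean(axis=0))  # average absolute contribution per feature
```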
  13. By: Rotem Zelingher (ECO-PUB - Economie Publique - AgroParisTech - Université Paris-Saclay - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement); David Makowski (MIA Paris-Saclay - Mathématiques et Informatique Appliquées - AgroParisTech - Université Paris-Saclay - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement)
    Abstract: This study analyses the quality of six regression algorithms in forecasting the monthly price of maize in its primary international trading market, using publicly available data of agricultural production at a regional scale. The forecasting process is done between one and twelve months ahead, using six different forecasting techniques. Three (CART, RF, and GBM) are tree-based machine learning techniques that capture the relative influence of maize-producing regions on global maize price variations. Additionally, we consider two types of linear models—standard multiple linear regression and vector autoregressive (VAR) model. Finally, TBATS serves as an advanced time-series model that holds the advantages of several commonly used time-series algorithms. The predictive capabilities of these six methods are compared by cross-validation. We find RF and GBM have superior forecasting abilities relative to the linear models. At the same time, TBATS is more accurate for short time forecasts when the time horizon is shorter than three months. On top of that, all models are trained to assess the marginal contribution of each producing region to the most extreme price shocks that occurred through the past 60 years of data in both positive and negative directions, using Shapley decompositions. Our results reveal a strong influence of North-American yield variation on the global price, except for the last months preceding the new-crop season.
    Keywords: Price forecasting, Regional production
    Date: 2022–04–28
    URL: http://d.repec.org/n?u=RePEc:hal:journl:hal-03764942&r=big
  14. By: Jiashu Lou
    Abstract: With the rapid development of big data and computing devices, low-latency automatic trading platforms based on real-time information acquisition have become main components of the stock trading market, so quantitative trading has received widespread attention. In trading markets that are not strongly efficient, human emotions and expectations always dominate market trends and trading decisions. This paper therefore starts from sentiment theory, taking the East Money platform as an example: we crawl user comment titles from its stock message board and clean the data. We then construct a BERT-based natural language processing model and fine-tune it on existing annotated data sets. The experimental results show that the fine-tuned model improves, to varying degrees, on both the original model and the baseline model. Based on this model, the crawled user comments are labeled with sentiment polarity, and the resulting labels are combined with the Alpha191 factor model in a regression, which yields significant results. The regression model is then used to predict the average price change over the next five days, which serves as a signal to guide automatic trading. The experiments show that incorporating sentiment factors increased the return rate by 73.8% compared to the baseline over the trading period, and by 32.41% compared to the original Alpha191 model. Finally, we discuss the advantages and disadvantages of incorporating sentiment factors into quantitative trading and point to possible directions for future research.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.11979&r=big
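    A minimal sketch of the fine-tuning step follows, using Hugging Face Transformers to adapt a pretrained BERT checkpoint to two sentiment classes. The checkpoint name, the two toy comment titles, and the training arguments are placeholders; the paper's East Money corpus is Chinese, so a Chinese checkpoint such as bert-base-chinese would be the natural substitute.
```python
# Fine-tune a pretrained BERT checkpoint for binary sentiment polarity.
# Checkpoint, toy data and training arguments are illustrative placeholders.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import Dataset

checkpoint = "bert-base-uncased"  # swap for a Chinese checkpoint for forum data
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

data = Dataset.from_dict({
    "text": ["strong earnings, holding for the long run",   # toy positive title
             "results are terrible, cutting losses today"], # toy negative title
    "label": [1, 0],
})
data = data.map(lambda batch: tokenizer(batch["text"], truncation=True,
                                        padding="max_length", max_length=32),
                batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-sentiment", num_train_epochs=1,
                           per_device_train_batch_size=2, report_to="none"),
    train_dataset=data,
)
trainer.train()
```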
  15. By: Paul Glasserman; Harry Mamaysky; Jimmy Qin
    Abstract: An increase in the novelty of news predicts negative stock market returns and negative macroeconomic outcomes over the next year. We quantify news novelty - changes in the distribution of news text - through an entropy measure, calculated using a recurrent neural network applied to a large news corpus. Entropy is a better out-of-sample predictor of market returns than a collection of standard measures. Cross-sectional entropy exposure carries a negative risk premium, suggesting that assets that positively covary with entropy hedge the aggregate risk associated with shifting news language. Entropy risk cannot be explained by existing long-short factors.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.05560&r=big
  16. By: Axelsson, Birger (Department of Real Estate and Construction Management, Royal Institute of Technology); Song, Han-Suck (Department of Real Estate and Construction Management, Royal Institute of Technology)
    Abstract: This study aims to investigate whether the newly developed deep learning-based algorithms, specifically Long-Short Term Memory (LSTM), outperform traditional algorithms in forecasting Real Estate Investment Trust (REIT) returns. The empirical analysis conducted in this research compares the forecasting performance of LSTM and Autoregressive Integrated Moving Average (ARIMA) models using out-of-sample data. The results demonstrate that in general, the LSTM model does not exhibit superior performance over the ARIMA model for forecasting REIT returns. While the LSTM model showed some improvement over the ARIMA model for shorter forecast horizons, it did not demonstrate a significant advantage in the majority of forecast scenarios, including both recursive multi-step forecasts and rolling forecasts. The comparative evaluation reveals that neither the LSTM nor ARIMA model demonstrated satisfactory performance in predicting REIT returns out-of-sample for longer forecast horizons. This outcome aligns with the efficient market hypothesis, suggesting that REIT returns may exhibit a random walk behavior. While this observation does not exclude other potential factors contributing to the models' performance, it supports the notion of the presence of market efficiency in the REIT sector. The error rates obtained by both models were comparable, indicating the absence of a significant advantage for LSTM over ARIMA, as well as the challenges in accurately predicting REIT returns using these approaches. These findings emphasize the need for careful consideration when employing advanced deep learning techniques, such as LSTM, in the context of REIT return forecasting and financial time series. While LSTM has shown promise in various domains, its performance in the context of financial time series forecasting, particularly with a univariate regression approach using daily data, may be influenced by multiple factors. Potential reasons for the observed limitations of our LSTM model, within this specific framework, include the presence of significant noise in the daily data and the suitability of the LSTM model for financial time series compared to other problem domains. However, it is important to acknowledge that there could be additional factors that impact the performance of LSTM models in financial time series forecasting, warranting further investigation and exploration. This research contributes to the understanding of the applicability of deep learning algorithms in the context of REIT return forecasting and encourages further exploration of alternative methodologies for improved forecasting accuracy in this domain.
    Keywords: Forecasting; Equity REITs; deep learning; LSTM; ARIMA
    JEL: G17 G19
    Date: 2023–09–28
    URL: http://d.repec.org/n?u=RePEc:hhs:kthrec:2023_010&r=big
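    The ARIMA benchmark with rolling one-step forecasts can be sketched as follows; the simulated return series stands in for REIT returns and the (1, 0, 1) order is an arbitrary placeholder.
```python
# Rolling one-step ARIMA forecasts scored with RMSE on a hold-out window.
# The simulated series and the (1, 0, 1) order are illustrative assumptions.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(42)
returns = rng.normal(0, 0.01, 400)          # placeholder for daily REIT returns
train, test = returns[:300], returns[300:]

history = list(train)
preds = []
for obs in test:
    fit = ARIMA(history, order=(1, 0, 1)).fit()
    preds.append(fit.forecast(steps=1)[0])  # one-step-ahead forecast
    history.append(obs)                     # roll the estimation window forward

rmse = np.sqrt(np.mean((np.array(preds) - test) ** 2))
print(f"rolling one-step RMSE: {rmse:.5f}")
```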
  17. By: Shuyang Wang; Diego Klabjan
    Abstract: We propose an ensemble method to improve the generalization performance of trading strategies trained by deep reinforcement learning algorithms in a highly stochastic environment of intraday cryptocurrency portfolio trading. We adopt a model selection method that evaluates on multiple validation periods, and propose a novel mixture distribution policy to effectively ensemble the selected models. We provide a distributional view of the out-of-sample performance on granular test periods to demonstrate the robustness of the strategies in evolving market conditions, and retrain the models periodically to address non-stationarity of financial data. Our proposed ensemble method improves the out-of-sample performance compared with the benchmarks of a deep reinforcement learning strategy and a passive investment strategy.
    Date: 2023–07
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.00626&r=big
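    A toy sketch of the mixture-distribution idea follows: each selected model supplies a probability distribution over portfolio actions, the ensemble mixes those distributions (uniform weights are assumed here), and the action is sampled from the mixture. The three hand-written policies stand in for trained deep reinforcement learning agents.
```python
# Mixture-distribution policy ensemble: mix the action distributions of the
# selected policies and sample from the mixture. Policies below are placeholders.
import numpy as np

def mixture_policy(policies, state, weights=None, rng=None):
    rng = rng or np.random.default_rng()
    probs = np.array([p(state) for p in policies])  # each row sums to 1
    weights = np.ones(len(policies)) / len(policies) if weights is None else weights
    mixed = weights @ probs                          # mixture distribution over actions
    return rng.choice(len(mixed), p=mixed), mixed

# Placeholder "policies": distributions over {sell, hold, buy} given a state.
policies = [
    lambda s: np.array([0.2, 0.5, 0.3]),
    lambda s: np.array([0.1, 0.3, 0.6]),
    lambda s: np.array([0.3, 0.4, 0.3]),
]
action, mixed = mixture_policy(policies, state=None)
print(action, mixed)
```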
  18. By: Georgios Petropoulos; Bertin Martens; Geoffrey Parker; Marshall Van Alstyne
    Abstract: Digital platforms, empowered by artificial intelligence algorithms, facilitate efficient interactions between consumers and merchants that allow the collection of profiling information which drives innovation and welfare. Private incentives, however, lead to information asymmetries resulting in market failures. This paper develops a product differentiation model of competition between two platforms to study private and social incentives to share information. Sharing information can be welfare-enhancing because it solves the data bottleneck market failure. Our findings imply that there is scope for the introduction of a mandatory information sharing mechanism from big tech firms to their competitors that helps the latter to improve their network value proposition and become more competitive in the market. The price of information in this sharing mechanism matters. We show that price regulation of information sharing, such as that applied in the EU, increases the incentives of big platforms to collect and analyze more data. It has ambiguous effects on their competitors, which depend on the exact relationship between information and network value.
    Keywords: information sharing, digital platforms, data bottleneck, data portability
    JEL: D47 D82 K21 L21 L22 L40 L41 L43 L51 L86
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:ces:ceswps:_10663&r=big
  19. By: Salazar, Lina; Agurto Adrianzen, Marcos; Alvarez, Luis
    Abstract: This analysis applies a regression discontinuity approach combined with remote sensing data to measure the productivity impacts linked to a fruit fly eradication program implemented in Peru. For this purpose, satellite imagery was used to estimate a vegetation index over a 10-year span for a sample of 305 producers (155 treated and 150 controls). The results confirmed that program participation increased agricultural productivity in the short and long terms, in a range from 12% to 49%. However, quantile regression methods suggest that the most productive farmers obtained greater impacts.
    Keywords: Agricultural productivity;Impact Evaluation;Remote Sensing;Satellite Images;Peru
    JEL: Q12 Q16 O13
    Date: 2023–08
    URL: http://d.repec.org/n?u=RePEc:idb:brikps:13038&r=big
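    A generic sharp regression-discontinuity sketch is given below: fit a linear regression on each side of the eligibility cutoff within a bandwidth and take the gap in the satellite-derived vegetation index at the cutoff as the treatment effect. The running variable, cutoff, bandwidth, and data are all invented; the paper's actual design details are not reproduced.
```python
# Sharp RD sketch: local linear fits on both sides of a cutoff, treatment
# effect is the gap in the outcome at the cutoff. All data are simulated.
import numpy as np

rng = np.random.default_rng(7)
running = rng.uniform(-10, 10, 1000)              # distance to program boundary
treated = running >= 0
ndvi = 0.5 + 0.01 * running + 0.06 * treated + rng.normal(0, 0.03, 1000)

def side_intercept(x, y):
    # Linear fit y = a + b*x; the intercept a is the fitted value at the cutoff (x = 0).
    b, a = np.polyfit(x, y, 1)
    return a

bandwidth = 5.0
left = (running < 0) & (running > -bandwidth)
right = (running >= 0) & (running < bandwidth)
effect = side_intercept(running[right], ndvi[right]) - side_intercept(running[left], ndvi[left])
print(f"RD estimate at the cutoff: {effect:.3f}")  # should be near the simulated 0.06
```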
  20. By: Wenting Liu; Zhaozhong Gui; Guilin Jiang; Lihua Tang; Lichun Zhou; Wan Leng; Xulong Zhang; Yujiang Liu
    Abstract: With the increasing volume of high-frequency data in the information age, both challenges and opportunities arise in the prediction of stock volatility. On the one hand, predictions obtained with traditional methods that combine stock technical and macroeconomic indicators still leave room for improvement; on the other hand, macroeconomic indicators and people's search records on search engines, which reflect the topics they are interested in, intuitively have an impact on stock volatility. To assess the influence of these indicators, macroeconomic indicators and stock technical indicators are grouped into objective factors, while Baidu search indices capturing people's topics of interest are defined as subjective factors. To align data of different frequencies, we introduce the GARCH-MIDAS model. After mixing all the above data, we feed them into a Transformer model as part of the training data. Our experiments show that this model outperforms the baselines in terms of mean squared error: using both types of data under the Transformer model significantly reduces the mean squared error from 1.00 to 0.86.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.16196&r=big
  21. By: Sam Arts; Nicola Melluso; Reinhilde Veugelers
    Abstract: We use text mining to identify the origin and impact of new scientific ideas in the population of scientific papers from Microsoft Academic Graph (MAG). We validate the new techniques and their improvement over the traditional metrics based on citations. First, we collect scientific papers linked to Nobel prizes. These papers arguably introduced fundamentally new scientific ideas with a major impact on scientific progress. Second, we identify literature review papers which typically summarize prior scientific findings rather than pioneer new scientific insights. Finally, we illustrate that papers pioneering new scientific ideas are more likely to become highly cited. Our findings support the use of text mining both to measure novel scientific ideas at the time of publication and to measure the impact of these new ideas on later scientific work. Moreover, the results illustrate the significant improvement of the new text metrics over the traditional metrics based on paper citations. We provide open access to code and data for all scientific papers in MAG up to December 2020.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.16437&r=big
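    One simple text-based novelty measure in the spirit of this literature is sketched below: count the word combinations (bigrams) in a new publication that never appeared in any earlier publication. The tokenization, the toy prior corpus, and the definition are simplified placeholders; the authors' released code defines the actual metrics.
```python
# Toy text-based novelty measure: bigrams in a new paper that are absent from
# the prior corpus. Tokenization and corpus are simplified placeholders.
import re

def bigrams(text):
    tokens = re.findall(r"[a-z]+", text.lower())
    return set(zip(tokens, tokens[1:]))

prior_corpus = [
    "mutation analysis of bacterial genomes",
    "statistical analysis of gene expression",
]
new_paper = "clustered regularly interspaced short palindromic repeats enable genome editing"

prior_bigrams = set().union(*(bigrams(doc) for doc in prior_corpus))
new_bigrams = bigrams(new_paper) - prior_bigrams
print(len(new_bigrams), sorted(new_bigrams)[:3])
```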

This nep-big issue is ©2023 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.