nep-big New Economics Papers
on Big Data
Issue of 2023‒12‒11
twenty papers chosen by
Tom Coupé, University of Canterbury


  1. Critical AI Challenges in Legal Practice: An application to French Administrative Decisions By Khaoula Naili
  2. The Predictive Value of Data from Virtual Investment Communities By Abdel-Karim, Benjamin M.; Benlian, Alexander; Hinz, Oliver
  3. Predicting Market Value in Professional Soccer: Insights from Explainable Machine Learning Models By Chunyang Huang; Shaoliang Zhang
  4. A Neural Network Classifies Traumatic Brain Injury Outcomes: Glasgow Coma Triples Are Needed By Nabeel, Rao
  5. Using Machine Learning to Create a Property Tax Roll: Evidence from the City of Kananga, D.R. Congo By Bergeron, Austin; Fournier, Arnaud; Kabeya, John Kabeya; Tourek, Gabriel; Weigel, Jonathan L.
  6. Machine predictions and human decisions with variation in payoffs and skill: the case of antibiotic prescribing By Hannes Ullrich; Michael Allan Ribers
  7. Combining Deep Learning on Order Books with Reinforcement Learning for Profitable Trading By Koti S. Jaddu; Paul A. Bilokon
  8. Predicting risk/reward ratio in financial markets for asset management using machine learning By Reza Yarbakhsh; Mahdieh Soleymani Baghshah; Hamidreza Karimaghaie
  9. Neural Tangent Kernel in Implied Volatility Forecasting: A Nonlinear Functional Autoregression Approach By Chen, Ying; Grith, Maria; Lai, Hannah L. H.
  10. Forecasting Volatility with Machine Learning and Rough Volatility: Example from the Crypto-Winter By Siu Hin Tang; Mathieu Rosenbaum; Chao Zhou
  11. An Investigation into the use of artificial intelligence in property valuations in Zambia By Christopher Mulenga; Joseph Phiri
  12. Narrative-Driven Fluctuations in Sentiment: Evidence Linking Traditional and Social Media By Alistair Macaulay; Wenting Song
  13. Decoding Social Sentiment in DAO: A Comparative Analysis of Blockchain Governance Communities By Quan, Yutong; Wu, Xintong; Deng, Wanlin; Zhang, Luyao
  14. Round-Number Effects in Real Estate Prices: Evidence from Germany By Florian Englmaier; Andreas Roider; Lars Schlereth; Steffen Sebastian
  15. Enhancing Large Language Models with Climate Resources By Mathias Kraus; Julia Bingler; Markus Leippold; Tobias Schimanski; Chiara Colesanti Senni; Dominik Stammbach; Saeid Vaghefi; Nicolas Webersinke
  16. Portfolio Construction using Black-Litterman Model and Factors By Fanyu Zhao
  17. Improving out-of-sample Forecasts of Stock Price Indexes with Forecast Reconciliation and Clustering By George Athanasopoulos; Rob J Hyndman; Raffaele Mattera
  18. Fed Transparency and Policy Expectation Errors: A Text Analysis Approach By Eric Fischer; Rebecca McCaughrin; Saketh Prazad; Mark Vandergon
  19. Causal Inference on Investment Constraints and Non-stationarity in Dynamic Portfolio Optimization through Reinforcement Learning By Yasuhiro Nakayama; Tomochika Sawaki
  20. Mapping employee mobility and employer networks using professional network data By Breithaupt, Patrick; Hottenrott, Hanna; Rammer, Christian; Römer, Konstantin

  1. By: Khaoula Naili (Université de Franche-Comté, CRESE, F-25000 Besançon, France)
    Abstract: We use AI methods to evaluate the accuracy of several standard machine learning models for predicting judicial decision outcomes. We highlight the key steps and challenges in predicting judicial outcomes by applying these models to a database of administrative court decisions. These findings significantly contribute to our understanding of the potential advantages of AI in the context of predictive justice. We utilize AI methods to analyze administrative court decisions sourced from the database provided by the French Council of State. This analysis has been made possible due to the Council of State’s decision to make its decisions publicly accessible since March 2022. Our innovative approach pioneers the use of prediction models on the open data from the French Council of State, addressing the complexities associated with data analysis. Our primary objective is to assess the accuracy of these models in predicting outcomes in French administrative tribunals and identify the most effective model for forecasting administrative tribunal court decisions. The selected models are trained and evaluated on multi-class datasets, where decisions are traditionally categorized into various classes.
    Keywords: artificial intelligence, machine learning, natural language processing, predictive justice, legal text
    JEL: K4
    Date: 2023–11
    URL: http://d.repec.org/n?u=RePEc:crb:wpaper:2023-06&r=big
  2. By: Abdel-Karim, Benjamin M.; Benlian, Alexander; Hinz, Oliver
    Abstract: Optimal investment decisions by institutional investors require accurate predictions with respect to the development of stock markets. Motivated by previous research that revealed the unsatisfactory performance of existing stock market prediction models, this study proposes a novel prediction approach. Our proposed system combines Artificial Intelligence (AI) with data from Virtual Investment Communities (VICs) and leverages VICs’ ability to support the process of predicting stock markets. An empirical study with two different models using real data shows the potential of the AI-based system with VICs information as an instrument for stock market predictions. VICs can be a valuable addition but our results indicate that this type of data is only helpful in certain market phases.
    Date: 2023–11–20
    URL: http://d.repec.org/n?u=RePEc:dar:wpaper:141359&r=big
  3. By: Chunyang Huang; Shaoliang Zhang
    Abstract: This study presents an innovative method for predicting the market value of professional soccer players using explainable machine learning models. Using a dataset curated from the FIFA website, we employ an ensemble machine learning approach coupled with Shapley Additive exPlanations (SHAP) to provide detailed explanations of the models' predictions. The GBDT model achieves the highest mean R-Squared (0.8780) and the lowest mean Root Mean Squared Error (3,221,632.175), indicating its superior performance among the evaluated models. Our analysis reveals that specific skills such as ball control, short passing, finishing, interceptions, dribbling, and tackling are paramount within the skill dimension, whereas sprint speed and acceleration are critical in the fitness dimension, and reactions are preeminent in the cognitive dimension. Our results offer a more accurate, objective, and consistent framework for market value estimation, presenting useful insights for managerial decisions in player transfers.
    Date: 2023–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2311.04599&r=big
  4. By: Nabeel, Rao
    Abstract: Artificial intelligence (AI) is intelligence—perceiving, synthesizing, and inferring information—demonstrated by machines, as opposed to intelligence displayed by non-human animals and humans.
    Date: 2023–01–09
    URL: http://d.repec.org/n?u=RePEc:osf:osfxxx:a472n&r=big
  5. By: Bergeron, Austin; Fournier, Arnaud; Kabeya, John Kabeya; Tourek, Gabriel; Weigel, Jonathan L.
    Abstract: Developing countries often lack the financial resources to provide public goods. Property taxation has been identified as a promising source of local revenue, because it is relatively efficient, captures growth in real estate value, and can be progressive. However, many low-income countries do not collect property taxes effectively due to missing or incomplete property tax rolls. We use machine learning and computer vision models to construct a property tax roll in a large Congolese city. To train the algorithm and predict the value of all properties in the city, we rely on the value of 1,654 randomly chosen properties assessed by government land surveyors during in-person property appraisal visits, and property characteristics from administrative data or extracted from property photographs. The best machine learning algorithm, trained on property characteristics from administrative data, achieves a cross-validated R² of 60 per cent, and 22 per cent of the predicted values are within 20 per cent of the target value. The computer vision algorithms, trained on property picture features, perform less well, with only 9 per cent of the predicted values within 20 per cent of the target value for the best algorithm. We interpret the results as suggesting that simple machine learning methods can be used to construct a property tax roll, even in a context where information about properties is limited and the government can only collect a small number of property values using in-person property appraisal visits.
    Keywords: Economic Development
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:idq:ictduk:18184&r=big
  6. By: Hannes Ullrich; Michael Allan Ribers
    Abstract: We analyze how machine learning predictions may improve antibiotic prescribing in the context of the global health policy challenge of increasing antibiotic resistance. Estimating a binary antibiotic treatment choice model, we find variation in the skill to diagnose bacterial urinary tract infections and in how general practitioners trade off the expected cost of resistance against antibiotic curative benefits. In counterfactual analyses we find that providing machine learning predictions of bacterial infections to physicians increases prescribing efficiency. However, to achieve the policy objective of reducing antibiotic prescribing, physicians must also be incentivized. Our results highlight the potential misalignment of social and heterogeneous individual objectives in utilizing machine learning for prediction policy problems.
    Date: 2023–11–13
    URL: http://d.repec.org/n?u=RePEc:bdp:dpaper:0027&r=big
  7. By: Koti S. Jaddu; Paul A. Bilokon
    Abstract: High-frequency trading is prevalent, where automated decisions must be made quickly to take advantage of price imbalances and patterns in price action that forecast near-future movements. While many algorithms have been explored and tested, analytical methods fail to harness the whole nature of the market environment by focusing on a limited domain. With the ever-growing machine learning field, many large-scale end-to-end studies on raw data have been successfully employed to increase the domain scope for profitable trading but are very difficult to replicate. Combining deep learning on the order books with reinforcement learning is one way of breaking down large-scale end-to-end learning into more manageable and lightweight components for reproducibility, suitable for retail trading. The following work focuses on forecasting returns across multiple horizons using order flow imbalance and training three temporal-difference learning models for five financial instruments to provide trading signals. The instruments used are two foreign exchange pairs (GBPUSD and EURUSD), two indices (DE40 and FTSE100), and one commodity (XAUUSD). The performances of these 15 agents are evaluated through backtesting simulation, and successful models proceed through to forward testing on a retail trading platform. The results prove potential but require further minimal modifications for consistently profitable trading to fully handle retail trading costs, slippage, and spread fluctuation.
    Date: 2023–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2311.02088&r=big
  8. By: Reza Yarbakhsh; Mahdieh Soleymani Baghshah; Hamidreza Karimaghaie
    Abstract: Financial market forecasting remains a formidable challenge despite the surge in computational capabilities and machine learning advancements. While numerous studies have underscored the precision of computer-generated market predictions, many of these forecasts fail to yield profitable trading outcomes. This discrepancy often arises from the unpredictable nature of profit and loss ratios in the event of successful and unsuccessful predictions. In this study, we introduce a novel algorithm specifically designed for forecasting the profit and loss outcomes of trading activities. This is further augmented by an innovative approach for integrating these forecasts with previous predictions of market trends. This approach is designed for algorithmic trading, enabling traders to assess the profitability of each trade and calibrate the optimal trade size. Our findings indicate that this method significantly improves the performance of traditional trading strategies as well as algorithmic trading systems, offering a promising avenue for enhancing trading decisions.
    Date: 2023–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2311.09148&r=big
  9. By: Chen, Ying; Grith, Maria; Lai, Hannah L. H.
    Abstract: Implied volatility (IV) forecasting is inherently challenging due to its high dimensionality across various moneyness and maturity, and nonlinearity in both spatial and temporal aspects. We utilize implied volatility surfaces (IVS) to represent comprehensive spatial dependence and model the nonlinear temporal dependencies within a series of IVS. Leveraging advanced kernel-based machine learning techniques, we introduce the functional Neural Tangent Kernel (fNTK) estimator within the Nonlinear Functional Autoregression framework, specifically tailored to capture intricate relationships within implied volatilities. We establish the connection between fNTK and kernel regression, emphasizing its role in contemporary nonparametric statistical modeling. Empirically, we analyze S&P 500 Index options from January 2009 to December 2021, encompassing more than 6 million European calls and puts, thereby showcasing the superior forecast accuracy of fNTK. We demonstrate the significant economic value of having an accurate implied volatility forecaster within trading strategies. Notably, short delta-neutral straddle trading, supported by fNTK, achieves a Sharpe ratio ranging from 1.45 to 2.02, resulting in a relative enhancement in trading outcomes ranging from 77% to 583%.
    Keywords: Implied Volatility Surfaces; Neural Networks; Neural Tangent Kernel; Implied Volatility Forecasting; Nonlinear Functional Autoregression; Option Trading Strategies
    JEL: C14 C45 C58 G11 G13 G17
    Date: 2023–10–24
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:119022&r=big
  10. By: Siu Hin Tang; Mathieu Rosenbaum; Chao Zhou
    Abstract: We extend the application and test the performance of a recently introduced volatility prediction framework encompassing LSTM and rough volatility. Our asset class of interest is cryptocurrencies, at the beginning of the "crypto-winter" in 2022. We first show that to forecast volatility, a universal LSTM approach trained on a pool of assets outperforms traditional models. We then consider a parsimonious parametric model based on rough volatility and Zumbach effect. We obtain similar prediction performances with only five parameters whose values are non-asset-dependent. Our findings provide further evidence on the universality of the mechanisms underlying the volatility formation process.
    Date: 2023–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2311.04727&r=big
  11. By: Christopher Mulenga; Joseph Phiri
    Abstract: Real estate valuations, especially the case of mass valuation where statistical analysis methods are applied. New methods of determination of real estate value should be explored. Artificial intelligence provides an alternative to the computer-applied method of multiple linear regressions. The computerization of real estate values has been in existence since the 2000s with the consideration of various artificial intelligence techniques which include Artificial Neural Network, fuzzy logic, genetic algorithm, and expert system. Most properties comprise both physical and economic characteristics, which renders the conventional valuation methods cumbersome. In order to counter these challenges, soft computing techniques with higher data handling capabilities may be an optimum choice.
    Keywords: Artificial Intelligence; Fuzzy Logic; multiple regressions; statistical techniques
    JEL: R3
    Date: 2023–01–01
    URL: http://d.repec.org/n?u=RePEc:afr:wpaper:afres2023-024&r=big
  12. By: Alistair Macaulay; Wenting Song
    Abstract: This paper studies the role of narratives for macroeconomic fluctuations. Micro-founding narratives as directed acyclic graphs, we show how exposure to different narratives can affect expectations in an otherwise-standard macroeconomic framework. We identify such competing narratives in news media reports on the US yield curve inversion in 2019, using techniques in natural language processing. Linking this to data from Twitter, we show that exposure to the narrative of an imminent recession causes consumers to display a more pessimistic sentiment, while exposure to a more neutral narrative implies no such change in sentiment. Applying the same technique to media narratives on inflation, we estimate that a shift to a viral narrative of inflation damaging the real economy in 2021 accounts for 42% of the fall in consumer sentiment in the second half of the year.
    Date: 2022–06–29
    URL: http://d.repec.org/n?u=RePEc:oxf:wpaper:973&r=big
  13. By: Quan, Yutong; Wu, Xintong; Deng, Wanlin; Zhang, Luyao
    Abstract: Blockchain technology is leading a revolutionary transformation across diverse industries, with effective governance standing as a critical determinant for the success and sustainability of blockchain projects. Community forums, pivotal in engaging decentralized autonomous organizations (DAOs), wield a substantial impact on blockchain governance decisions. Concurrently, Natural Language Processing (NLP), particularly sentiment analysis, provides powerful insights from textual data. While prior research has explored the potential of NLP tools in social media sentiment analysis, a gap persists in understanding the sentiment landscape of blockchain governance communities. The evolving discourse and sentiment dynamics on the forums of top DAOs remain largely unknown. This paper delves deep into the evolving discourse and sentiment dynamics on the public forums of leading DeFi projects—Aave, Uniswap, Curve Dao, Aragon, Yearn.finance, Merit Circle, and Balancer—placing a primary focus on discussions related to governance issues. Despite differing activity patterns, participants across these decentralized communities consistently express positive sentiments in their Discord discussions, indicating optimism towards governance decisions. Additionally, our research suggests a potential interplay between discussion intensity and sentiment dynamics, indicating that higher discussion volumes may contribute to more stable and positive emotions. The insights gained from this study are valuable for decision-makers in blockchain governance, underscoring the pivotal role of sentiment analysis in interpreting community emotions and its evolving impact on the landscape of blockchain governance. This research significantly contributes to the interdisciplinary exploration of the intersection of blockchain and society, with a specific emphasis on the decentralized blockchain governance ecosystem. We provide our data and code for replicability as open access on GitHub.
    Date: 2023–10–31
    URL: http://d.repec.org/n?u=RePEc:osf:osfxxx:bq6tu&r=big
  14. By: Florian Englmaier; Andreas Roider; Lars Schlereth; Steffen Sebastian
    Abstract: Round numbers affect behavior in various domains, e.g., as prominent thresholds or focal points in bargaining. In line with earlier findings, residential real estate transactions in Germany cluster at round-number prices, but there are also interesting (presumably cultural) differences. We extend our analysis to the commercial real estate market, where stakes are even higher and market participants arguably more experienced. For the same type of object, professionals cluster significantly less on round-number prices compared to non-professionals. We employ machine learning and show that transactions of family homes and condominiums at round-number prices are 2–7% above their hedonic values.
    Keywords: round-number effects, focal points, residential real estate, commercial real estate, housing prices, machine learning
    JEL: D01 D91 C78 R31
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:ces:ceswps:_10746&r=big
  15. By: Mathias Kraus (University of Erlangen-Nuremberg); Julia Bingler (University of Oxford); Markus Leippold (University of Zurich; Swiss Finance Institute); Tobias Schimanski (University of Zurich); Chiara Colesanti Senni (ETH Zürich; University of Zurich); Dominik Stammbach (ETH Zürich); Saeid Vaghefi (University of Zurich); Nicolas Webersinke (Friedrich-Alexander-Universität Erlangen-Nürnberg)
    Abstract: Large language models (LLMs) have significantly transformed the landscape of artificial intelligence by demonstrating their ability to generate human-like text across diverse topics. However, despite their impressive capabilities, LLMs lack recent information and often employ imprecise language, which can be detrimental in domains where accuracy is crucial, such as climate change. In this study, we make use of recent ideas to harness the potential of LLMs by viewing them as agents that access multiple sources, including databases containing recent and precise information about organizations, institutions, and companies. We demonstrate the effectiveness of our method through a prototype agent that retrieves emission data from ClimateWatch (https://www.climatewatchdata.org/) and leverages general Google search. By integrating these resources with LLMs, our approach overcomes the limitations associated with imprecise language and delivers more reliable and accurate information in the critical domain of climate change. This work paves the way for future advancements in LLMs and their application in domains where precision is of paramount importance.
    Date: 2023–10
    URL: http://d.repec.org/n?u=RePEc:chf:rpseri:rp2399&r=big
  16. By: Fanyu Zhao
    Abstract: This paper presents a portfolio construction process, including mainly two parts, Factors Selection and Weight Allocations. For the factors selection part, we have chosen 20 factors by considering three aspects, the global market, different asset classes, and stock idiosyncratic characteristics. Each factor is proxied by a corresponding ETF. Then, we apply several weight allocation methods to those factors, including two fixed weight allocation methods, three optimisation methods, and a Black-Litterman model. In addition, we fit a Deep Learning model for generating views periodically and incorporating views with the prior to achieve dynamically updated weights by using the Black-Litterman model. In the end, the robustness checking shows how weights change as time evolves and variance increases. Results using shrinkage variance are provided to alleviate the impacts of representativeness of historical data, but this sadly has little impact. Overall, the portfolio built with the Deep Learning plus Black-Litterman model outperforms portfolios built with other weight allocation schemes, even though further improvement and robustness checking should be performed.
    Date: 2023–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2311.04475&r=big
  17. By: George Athanasopoulos; Rob J Hyndman; Raffaele Mattera
    Abstract: This paper discusses the use of forecast reconciliation with stock price time series and the corresponding stock index. The individual stock price series may be grouped using known meta-data or other clustering methods. We propose a novel forecasting framework that combines forecast reconciliation and clustering, to lead to better forecasts of both the index and the individual stock price series. The proposed approach is applied to the Dow Jones Industrial Average Index and its component stocks. The results demonstrate empirically that reconciliation improves forecasts of the stock market index and its constituents.
    Keywords: financial time series, hierarchical forecasting, clustering, unsupervised learning, prediction, machine learning, finance
    JEL: C53 C10
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:msh:ebswps:2023-17&r=big
  18. By: Eric Fischer; Rebecca McCaughrin; Saketh Prazad; Mark Vandergon
    Abstract: This paper seeks to estimate the extent to which market-implied policy expectations could be improved with further information disclosure from the FOMC. Using text analysis methods based on large language models, we show that if FOMC meeting materials with five-year lagged release dates—like meeting transcripts and Tealbooks—were accessible to the public in real time, market policy expectations could substantially improve forecasting accuracy. Most of this improvement occurs during easing cycles. For instance, at the six-month forecasting horizon, the market could have predicted as much as 125 basis points of additional easing during the 2001 and 2008 recessions, equivalent to a 40-50 percent reduction in mean squared error. This potential forecasting improvement appears to be related to incomplete information about the Fed’s reaction function, particularly with respect to financial stability concerns in 2008. In contrast, having enhanced access to meeting materials would not have improved the market’s policy rate forecasting during tightening cycles.
    Keywords: interest rates; monetary policy; central banks and their policies; sentiment analysis
    JEL: E43 E52 E58 C80
    Date: 2023–11–01
    URL: http://d.repec.org/n?u=RePEc:fip:fednsr:97356&r=big
  19. By: Yasuhiro Nakayama; Tomochika Sawaki
    Abstract: In this study, we have developed a dynamic asset allocation investment strategy using reinforcement learning techniques. To begin with, we have addressed the crucial issue of incorporating non-stationarity of financial time series data into reinforcement learning algorithms, which is a significant implementation in the application of reinforcement learning in investment strategies. Our findings highlight the significance of introducing certain variables such as regime change in the environment setting to enhance the prediction accuracy. Furthermore, the application of reinforcement learning in investment strategies provides a remarkable advantage of setting the optimization problem flexibly. This enables the integration of practical constraints faced by investors into the algorithm, resulting in efficient optimization. Our study has categorized the investment strategy formulation conditions into three main categories, including performance measurement indicators, portfolio management rules, and other constraints. We have evaluated the impact of incorporating these conditions into the environment and rewards in a reinforcement learning framework and examined how they influence investment behavior.
    Date: 2023–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2311.04946&r=big
  20. By: Breithaupt, Patrick; Hottenrott, Hanna; Rammer, Christian; Römer, Konstantin
    Abstract: The availability of social media data is growing and represents a new data source for economic research. This paper presents a detailed study on the use of data from a career-oriented social networking platform for measuring employee flows and employer networks. The employment data are exported from user profiles and linked to the Mannheim Enterprise Panel (MUP). The linked employer-employee (LEE) data consists of 14 million employments for 1.5 million employers. The platform-based LEE data is used to create annual employer networks comprised of data from 9 million employee flows. Plausibility checks confirm that career-oriented social networking data contain valuable data about employment, employee flows, and employer networks. Using such data provides opportunities for research on employee mobility, networks, and local ecosystems' role in economic performance at the employer and the regional level.
    Keywords: social networks, platform data, lee data, labour mobility, network analysis
    JEL: C81 J60 L14
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:zbw:zewdip:279575&r=big

This nep-big issue is ©2023 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.