nep-big New Economics Papers
on Big Data
Issue of 2023‒12‒18
twenty-six papers chosen by
Tom Coupé, University of Canterbury


  1. Large Language Models in Finance: A Survey By Yinheng Li; Shaofei Wang; Han Ding; Hang Chen
  2. Analysis of frequent trading effects of various machine learning models By Jiahao Chen; Xiaofei Li
  3. Critical AI Challenges in Legal Practice : An application to French Administrative Decisions By Khaoula Naili
  4. Predicting Re-Employment: Machine Learning Versus Assessments by Unemployed Workers and by Their Caseworkers By Gerard J. van den Berg; Max Kunaschk; Julia Lang; Gesine Stephan; Arne Uhlendorf
  5. A Hypothesis on Good Practices for AI-based Systems for Financial Time Series Forecasting: Towards Domain-Driven XAI Methods By Branka Hadji Misheva; Joerg Osterrieder
  6. Advancing Algorithmic Trading: A Multi-Technique Enhancement of Deep Q-Network Models By Gang Hu
  7. The Power of Trust: Designing Trustworthy Machine Learning Systems in Healthcare By Fecho, Mariska; Zöll, Anne
  8. Adaptive Modelling Approach for Row-Type Dependent Predictive Analysis (RTDPA): A Framework for Designing Machine Learning Models for Credit Risk Analysis in Banking Sector By Minati Rath; Hema Date
  9. Towards a data-driven debt collection strategy based on an advanced machine learning framework By Abel Sancarlos; Edgar Bahilo; Pablo Mozo; Lukas Norman; Obaid Ur Rehma; Mihails Anufrijevs
  10. A Gaussian Process Based Method with Deep Kernel Learning for Pricing High-dimensional American Options By Jirong Zhuang; Deng Ding; Weiguo Lu; Xuan Wu; Gangnan Yuan
  11. Do Two Wrongs Make a Right? Measuring the Effect of Publications on Science Careers By Donna K. Ginther; Carlos Zambrana; Patricia Oslund; Wan-Ying Chang
  12. AI-powered decision-making in facilitating insurance claim dispute resolution By Zhang, Wen; Shi, Jingwen; Wang, Xiaojun; Wynn, Henry
  13. Enhancing Actuarial Non-Life Pricing Models via Transformers By Alexej Brauer
  14. Natural Language Processing for Financial Regulation By Ixandra Achitouv; Dragos Gorduza; Antoine Jacquier
  15. Earnings Prediction Using Recurrent Neural Networks By Moritz Scherrmann; Ralf Elsas
  16. Predictive AI for SME and Large Enterprise Financial Performance Management By Ricardo Cuervo
  17. Deep Calibration of Market Simulations using Neural Density Estimators and Embedding Networks By Namid R. Stillman; Rory Baggott; Justin Lyon; Jianfei Zhang; Dingqiu Zhu; Tao Chen; Perukrishnen Vytelingum
  18. Using Domain-Specific Word Embeddings to Examine the Demand for Skills By Chaturvedi, Sugat; Mahajan, Kanika; Siddique, Zahra
  19. Error Analysis of Option Pricing via Deep PDE Solvers: Empirical Study By Rawin Assabumrungrat; Kentaro Minami; Masanori Hirano
  20. Harnessing Deep Q-Learning for Enhanced Statistical Arbitrage in High-Frequency Trading: A Comprehensive Exploration By Soumyadip Sarkar
  21. Predicting Patient Length of Stay Using Artificial Intelligence to Assist Healthcare Professionals in Resource Planning and Scheduling Decisions By Yazan Alnsour; Marina Johnson; Abdullah Albizri; Antoine Harfouche Harfouche
  22. Reinforcement Learning with Maskable Stock Representation for Portfolio Management in Customizable Stock Pools By Wentao Zhang; Yilei Zhao; Shuo Sun; Jie Ying; Yonggang Xie; Zitao Song; Xinrun Wang; Bo An
  23. EU Cohesion Policy on the Ground: Analyzing Small-Scale Effects Using Satellite Data By Krolage, Carla; Bachtrögler-Unger, Julia; Dolls, Mathias; Schüle, Paul; Taubenböck, Hannes; Weigand, Matthias
  24. How big is the media multiplier? Evidence from dyadic news data By Besley, Timothy; Fetzer, Thiemo; Mueller, Hannes
  25. Replicable Patent Indicators Using the Google Patents Public Datasets By George Abi Younes; Gaetan de Rassenfosse
  26. Multi-Label Topic Model for Financial Textual Data By Moritz Scherrmann

  1. By: Yinheng Li; Shaofei Wang; Han Ding; Hang Chen
    Abstract: Recent advances in large language models (LLMs) have opened new possibilities for artificial intelligence applications in finance. In this paper, we provide a practical survey focused on two key aspects of utilizing LLMs for financial tasks: existing solutions and guidance for adoption. First, we review current approaches employing LLMs in finance, including leveraging pretrained models via zero-shot or few-shot learning, fine-tuning on domain-specific data, and training custom LLMs from scratch. We summarize key models and evaluate their performance improvements on financial natural language processing tasks. Second, we propose a decision framework to guide financial professionals in selecting the appropriate LLM solution based on their use case constraints around data, compute, and performance needs. The framework provides a pathway from lightweight experimentation to heavy investment in customized LLMs. Lastly, we discuss limitations and challenges around leveraging LLMs in financial applications. Overall, this survey aims to synthesize the state-of-the-art and provide a roadmap for responsibly applying LLMs to advance financial AI.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2311.10723&r=big
  2. By: Jiahao Chen; Xiaofei Li
    Abstract: In recent years, high-frequency trading has emerged as a crucial strategy in stock trading. This study aims to develop an advanced high-frequency trading algorithm and compare the performance of three different mathematical models: the combination of the cross-entropy loss function and the quasi-Newton algorithm, the FCNN model, and the vector machine. The proposed algorithm employs neural network predictions to generate trading signals and execute buy and sell operations based on specific conditions. By harnessing the power of neural networks, the algorithm enhances the accuracy and reliability of the trading strategy. To assess the effectiveness of the algorithm, the study evaluates the performance of the three mathematical models. The combination of the cross-entropy loss function and the quasi-Newton algorithm is a widely utilized logistic regression approach. The FCNN model, on the other hand, is a deep learning algorithm that can extract and classify features from stock data. Meanwhile, the vector machine is a supervised learning algorithm recognized for achieving improved classification results by mapping data into high-dimensional spaces. By comparing the performance of these three models, the study aims to determine the most effective approach for high-frequency trading. This research makes a valuable contribution by introducing a novel methodology for high-frequency trading, thereby providing investors with a more accurate and reliable stock trading strategy.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2311.10719&r=big
  3. By: Khaoula Naili (Université de Franche-Comté, CRESE, Besançon, France)
    Abstract: We use AI methods to evaluate the accuracy of several standard machine learning models for predicting judicial decision outcomes. We highlight the key steps and challenges in predicting judicial outcomes by applying these models to a database of administrative court decisions. These findings significantly contribute to our understanding of the potential advantages of AI in the context of predictive justice. We utilize AI methods to analyze administrative court decisions sourced from the database provided by the French Council of State. This analysis has been made possible due to the Council of State’s decision to make its decisions publicly accessible since March 2022. Our innovative approach pioneers the use of prediction models on the open data from the French Council of State, addressing the complexities associated with data analysis. Our primary objective is to assess the accuracy of these models in predicting outcomes in French administrative tribunals and identify the most effective model for forecasting administrative tribunal court decisions. The selected models are trained and evaluated on multi-class datasets, where decisions are traditionally categorized into various classes.
    Keywords: Artificial intelligence, Machine learning, Natural language processing, Predictive justice, Legal text
    JEL: K4
    Date: 2023–12
    URL: http://d.repec.org/n?u=RePEc:afd:wpaper:2304&r=big
  4. By: Gerard J. van den Berg (University of Groningen, University Medical Center Groningen, IFAU Uppsala, ZEW, IZA, CEPR); Max Kunaschk (IAB Nuremberg); Julia Lang (IAB Nuremberg); Gesine Stephan (IAB Nuremberg, Friedrich-Alexander-Universität Erlangen-Nürnberg); Arne Uhlendorf (CNRS and CREST, IAB Nuremberg, DIW, IZA)
    Abstract: Predictions of whether newly unemployed individuals will become long-term unemployed are important for the planning and policy mix of unemployment insurance agencies. We analyze unique data on three sources of information on the probability of re-employment within 6 months (RE6), for the same individuals sampled from the inflow into unemployment. First, they were asked for their perceived probability of RE6. Second, their caseworkers revealed whether they expected RE6. Third, random-forest machine learning methods are trained on administrative data on the full inflow, to predict individual RE6. We compare the predictive performance of these measures and consider whether combinations improve this performance. We show that self-reported and caseworker assessments sometimes contain information not captured by the machine learning algorithm.
    Keywords: unemployment, expectations, prediction, random forest, unemployment insurance, information
    JEL: J64 J65 C55 C53 C41 C21
    Date: 2023–08–28
    URL: http://d.repec.org/n?u=RePEc:crs:wpaper:2023-09&r=big
  5. By: Branka Hadji Misheva; Joerg Osterrieder
    Abstract: Machine learning and deep learning have become increasingly prevalent in financial prediction and forecasting tasks, offering advantages such as enhanced customer experience, democratising financial services, improving consumer protection, and enhancing risk management. However, these complex models often lack transparency and interpretability, making them challenging to use in sensitive domains like finance. This has led to the rise of eXplainable Artificial Intelligence (XAI) methods aimed at creating models that are easily understood by humans. Classical XAI methods, such as LIME and SHAP, have been developed to provide explanations for complex models. While these methods have made significant contributions, they also have limitations, including computational complexity, inherent model bias, sensitivity to data sampling, and challenges in dealing with feature dependence. In this context, this paper explores good practices for deploying explainability in AI-based systems for finance, emphasising the importance of data quality, audience-specific methods, consideration of data properties, and the stability of explanations. These practices aim to address the unique challenges and requirements of the financial industry and guide the development of effective XAI tools.
    Date: 2023–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2311.07513&r=big
  6. By: Gang Hu
    Abstract: This study enhances a Deep Q-Network (DQN) trading model by incorporating advanced techniques like Prioritized Experience Replay, Regularized Q-Learning, Noisy Networks, Dueling, and Double DQN. Extensive tests on assets like BTC/USD and AAPL demonstrate superior performance compared to the original model, with marked increases in returns and Sharpe Ratio, indicating improved risk-adjusted rewards. Notably, convolutional neural network (CNN) architectures, both 1D and 2D, significantly boost returns, suggesting their effectiveness in market trend analysis. Across instruments, these enhancements have yielded stable and high gains, eclipsing the baseline and highlighting the potential of CNNs in trading systems. The study suggests that applying sophisticated deep learning within reinforcement learning can greatly enhance automated trading, urging further exploration into advanced methods for broader financial applicability. The findings advocate for the continued evolution of AI in finance.
    Date: 2023–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2311.05743&r=big
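Of the techniques listed in the abstract above, Double DQN is compact enough to sketch. The snippet below illustrates the standard Double DQN bootstrap target only; it is not the author's implementation, and the function name, the toy Q-values, and the discount factor are assumptions made for the example.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Return the Double DQN bootstrap target for one transition.

    The online network selects the greedy next action and the target
    network evaluates it. Separating selection from evaluation is what
    reduces the overestimation bias of plain Q-learning.
    """
    if len(next_q_online) != len(next_q_target) or not next_q_online:
        raise ValueError("Q-value lists must be non-empty and equally long")
    if done:
        return reward  # no bootstrapping past a terminal state
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-values for three actions (e.g. sell, hold, buy).
q_online = [0.2, 1.5, 0.9]
q_target = [0.4, 1.0, 2.0]
target = double_dqn_target(1.0, q_online, q_target, gamma=0.9)
```

With these toy numbers the online network prefers the second action, so the target is 1 + 0.9 × 1.0 = 1.9. A plain DQN target would instead bootstrap from the target network's own maximum (2.0).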
  7. By: Fecho, Mariska; Zöll, Anne
    Abstract: Machine Learning (ML) systems have an enormous potential to improve medical care, but skepticism about their use persists. Their inscrutability is a major concern which can lead to negative attitudes reducing end users' trust and resulting in rejection. Consequently, many ML systems in healthcare suffer from a lack of user-centricity. To overcome these challenges, we designed a user-centered, trustworthy ML system by applying design science research. The design includes meta-requirements and design principles instantiated by mockups. The design is grounded on our kernel theory, the Trustworthy Artificial Intelligence principles. In three design cycles, we refined the design through focus group discussions (N1=8), evaluation of existing applications, and an online survey (N2=40). Finally, an effectiveness test was conducted with end users (N3=80) to assess the perceived trustworthiness of our design. The results demonstrated that the end users did indeed perceive our design as more trustworthy.
    Date: 2023–12–10
    URL: http://d.repec.org/n?u=RePEc:dar:wpaper:138903&r=big
  8. By: Minati Rath; Hema Date
    Abstract: In many real-world datasets, rows may have distinct characteristics and require different modeling approaches for accurate predictions. In this paper, we propose an adaptive modeling approach for row-type dependent predictive analysis (RTDPA). Our framework enables the development of models that can effectively handle diverse row types within a single dataset. Our dataset from XXX bank contains two different risk categories, personal loan and agriculture loan. Each of them is categorised into four classes: standard, sub-standard, doubtful and loss. We applied tailored data pre-processing and feature engineering to the different row types. We selected traditional machine learning predictive models and advanced ensemble techniques. Our findings indicate that all predictive approaches consistently achieve a precision rate of no less than 90%. For RTDPA, the algorithms are applied separately for each row type, allowing the models to capture the specific patterns and characteristics of each row type. This approach enables targeted predictions based on the row type, providing a more accurate and tailored classification for the given dataset. Additionally, the suggested model consistently offers decision makers in the banking sector valuable and enduring strategic insights.
    Date: 2023–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2311.10799&r=big
  9. By: Abel Sancarlos; Edgar Bahilo; Pablo Mozo; Lukas Norman; Obaid Ur Rehma; Mihails Anufrijevs
    Abstract: The European debt purchase market as measured by the total book value of purchased debt approached 25bn euros in 2020 and it was growing at double-digit rates. This is an example of how big the debt collection and debt purchase industry has grown and the important impact it has in the financial sector. However, in order to ensure an adequate return during the debt collection process, a good estimation of the propensity to pay and/or the expected cashflow is crucial. These estimations can be employed, for instance, to create different strategies during the amicable collection to maximize quality standards and revenues. They can also be used to prioritize the cases in which a legal process is necessary when debtors are unreachable for an amicable negotiation. This work offers a solution for these estimations. Specifically, a new machine learning modelling pipeline is presented, showing how it outperforms current strategies employed in the sector. The solution contains a pre-processing pipeline and a model selector based on the best model calibration. Performance is validated with real historical data of the debt industry.
    Date: 2023–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2311.06292&r=big
  10. By: Jirong Zhuang; Deng Ding; Weiguo Lu; Xuan Wu; Gangnan Yuan
    Abstract: Machine learning methods, such as Gaussian process regression (GPR), have been widely used in recent years for pricing American options. The GPR is considered as a potential method for estimating the continuation value of an option in the regression-based Monte Carlo method. However, it has some drawbacks, such as high computational cost and unreliability in high-dimensional cases. In this paper, we apply the Deep Kernel Learning with variational inference to the regression-based Monte Carlo method in order to overcome those drawbacks for high-dimensional American option pricing, and test its performance under geometric Brownian motion and Merton's jump diffusion models.
    Date: 2023–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2311.07211&r=big
  11. By: Donna K. Ginther; Carlos Zambrana; Patricia Oslund; Wan-Ying Chang
    Abstract: This paper examines whether publication data matched to the Survey of Doctorate Recipients can be used for research purposes. We use Gold Standard data created to validate the publication match quality and compare these measures to publications assigned by a machine-learning algorithm developed by Thomson Reuters (now Clarivate). Our econometric model demonstrates that publications likely suffer from non-classical measurement error. Using horse race and instrumental variable models, we confirm that the Gold Standard data are relatively free from measurement error but show that the Clarivate data suffer from non-classical measurement error. We employ a variety of methods to adjust the Clarivate data for false negatives and false positives and demonstrate that with these adjustments the data produce estimates very similar to the Gold Standard. However, these adjustments are not as useful when publications are used as a dependent variable. We recommend using subsamples of the data that have better match quality when using the Clarivate data as a dependent variable.
    JEL: C26 J40 O30
    Date: 2023–11
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:31844&r=big
  12. By: Zhang, Wen; Shi, Jingwen; Wang, Xiaojun; Wynn, Henry
    Abstract: Leveraging Artificial Intelligence (AI) techniques to empower decision-making can promote social welfare by generating significant cost savings and promoting efficient utilization of public resources, besides revolutionizing commercial operations. This study investigates how AI can expedite dispute resolution in road traffic accident (RTA) insurance claims, benefiting all parties involved. Specifically, we devise and implement a disciplined AI-driven approach to derive the cost estimates and inform negotiation decision-making, compared to conventional practices that draw upon official guidance and lawyer experience. We build the investigation on 88 real-life RTA cases and detect an asymptotic relationship between the final judicial cost and the duration of the most severe injury, marked by a notable predicted R² value of 0.527. Further, we illustrate how various AI-powered toolkits can facilitate information processing and outcome prediction: (1) how regular expression (RegEx) collates precise injury information for subsequent predictive analysis; (2) how alternative natural language processing (NLP) techniques construct predictions directly from narratives. Our proposed RegEx framework enables automated information extraction that accommodates diverse report formats; different NLP methods deliver comparable plausible performance. This research unleashes AI’s untapped potential for social good to reinvent legal-related decision-making processes, support litigation efforts, and aid in the optimization of legal resource consumption.
    Keywords: professional service operation; insurance claim; civil litigation; AI; natural language processing
    JEL: C1
    Date: 2023–10–23
    URL: http://d.repec.org/n?u=RePEc:ehl:lserod:120649&r=big
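The abstract above mentions regular expressions that collate injury information from reports. The authors' RegEx framework is not described here, so the following is only a toy sketch of the general idea: the report wording, the pattern, the unit conversions, and the choice to treat the longest stated duration as a proxy for the most severe injury are all assumptions.

```python
import re
from typing import Optional

# Hypothetical pattern: a number followed by a time unit, as in "6 months".
# Real medical reports vary far more than this pattern allows for.
DURATION_PATTERN = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>day|week|month|year)s?\b",
    re.IGNORECASE,
)

# Rough conversions to days, chosen for illustration.
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}


def longest_injury_duration_days(report: str) -> Optional[float]:
    """Return the longest duration mentioned in the report, in days."""
    durations = [
        float(m.group("value")) * DAYS_PER_UNIT[m.group("unit").lower()]
        for m in DURATION_PATTERN.finditer(report)
    ]
    return max(durations) if durations else None


# Invented report text, not taken from the paper's data.
report = (
    "Claimant suffered whiplash lasting 6 months and "
    "bruising to the left knee that resolved within 3 weeks."
)
longest = longest_injury_duration_days(report)
```

A value extracted this way could then serve as the explanatory variable in a cost regression of the kind the abstract describes.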
  13. By: Alexej Brauer
    Abstract: Currently, there is a lot of research in the field of neural networks for non-life insurance pricing. The usual goal is to improve the predictive power via neural networks while building upon the generalized linear model, which is the current industry standard. Our paper contributes to this current journey via novel methods to enhance actuarial non-life models with transformer models for tabular data. We build here upon the foundation laid out by the combined actuarial neural network as well as the LocalGLMnet and enhance those models via the feature tokenizer transformer. The manuscript demonstrates the performance of the proposed methods on a real-world claim frequency dataset and compares them with several benchmark models such as generalized linear models, feed-forward neural networks, combined actuarial neural networks, LocalGLMnet, and pure feature tokenizer transformer. The paper shows that the new methods can achieve better results than the benchmark models while preserving certain generalized linear model advantages. The paper also discusses the practical implications and challenges of applying transformer models in actuarial settings.
    Date: 2023–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2311.07597&r=big
  14. By: Ixandra Achitouv; Dragos Gorduza; Antoine Jacquier
    Abstract: This article provides an understanding of Natural Language Processing techniques in the framework of financial regulation, more specifically in order to perform semantic matching search between rules and policy when no dataset is available for supervised learning. We outline how to outperform simple pre-trained sentence-transformer models using freely available resources and explain the mathematical concepts behind the key building blocks of Natural Language Processing.
    Date: 2023–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2311.08533&r=big
  15. By: Moritz Scherrmann; Ralf Elsas
    Abstract: Firm disclosures about future prospects are crucial for corporate valuation and compliance with global regulations, such as the EU's MAR and the US's SEC Rule 10b-5 and RegFD. To comply with disclosure obligations, issuers must identify nonpublic information with potential material impact on security prices as only new, relevant and unexpected information materially affects prices in efficient markets. Financial analysts, assumed to represent public knowledge on firms' earnings prospects, face limitations in offering comprehensive coverage and unbiased estimates. This study develops a neural network to forecast future firm earnings, using four decades of financial data, addressing analysts' coverage gaps and potentially revealing hidden insights. The model avoids selectivity and survivorship biases as it allows for missing data. Furthermore, the model is able to produce both fiscal-year-end and quarterly earnings predictions. Its performance surpasses benchmark models from the academic literature by a wide margin and outperforms analysts' forecasts for fiscal-year-end earnings predictions.
    Date: 2023–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2311.10756&r=big
  16. By: Ricardo Cuervo
    Abstract: Financial performance management is at the core of business management and has historically relied on financial ratio analysis using Balance Sheet and Income Statement data to assess company performance as compared with competitors. Little progress has been made in predicting how a company will perform or in assessing the risks (probabilities) of financial underperformance. In this study I introduce a new set of financial and macroeconomic ratios that supplement standard ratios of Balance Sheet and Income Statement. I also provide a set of supervised learning models (ML Regressors and Neural Networks) and Bayesian models to predict company performance. I conclude that the new proposed variables improve model accuracy when used in tandem with standard industry ratios. I also conclude that Feedforward Neural Networks (FNN) are simpler to implement and perform best across 6 predictive tasks (ROA, ROE, Net Margin, Op Margin, Cash Ratio and Op Cash Generation); although Bayesian Networks (BN) can outperform FNN under very specific conditions. BNs have the additional benefit of providing a probability density function in addition to the predicted (expected) value. The study findings have significant potential helping CFOs and CEOs assess risks of financial underperformance to steer companies in more profitable directions; supporting lenders in better assessing the condition of a company and providing investors with tools to dissect financial statements of public companies more accurately.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2311.05840&r=big
  17. By: Namid R. Stillman; Rory Baggott; Justin Lyon; Jianfei Zhang; Dingqiu Zhu; Tao Chen; Perukrishnen Vytelingum
    Abstract: The ability to construct a realistic simulator of financial exchanges, including reproducing the dynamics of the limit order book, can give insight into many counterfactual scenarios, such as a flash crash, a margin call, or changes in macroeconomic outlook. In recent years, agent-based models have been developed that reproduce many features of an exchange, as summarised by a set of stylised facts and statistics. However, the ability to calibrate simulators to a specific period of trading remains an open challenge. In this work, we develop a novel approach to the calibration of market simulators by leveraging recent advances in deep learning, specifically using neural density estimators and embedding networks. We demonstrate that our approach is able to correctly identify high probability parameter sets, both when applied to synthetic and historical data, and without reliance on manually selected or weighted ensembles of stylised facts.
    Date: 2023–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2311.11913&r=big
  18. By: Chaturvedi, Sugat (Ahmedabad University); Mahajan, Kanika (Ashoka University); Siddique, Zahra (University of Bristol)
    Abstract: We study the demand for skills by using text analysis methods on job descriptions in a large volume of ads posted on an online Indian job portal. We make use of domain-specific unlabeled data to obtain word vector representations (i.e., word embeddings) and discuss how these can be leveraged for labor market research. We start by carrying out a data-driven categorization of required skill words and construct gender associations of different skill categories using word embeddings. Next, we examine how different required skill categories correlate with log posted wages as well as explore how skills demand varies with firm size. We find that female skills are associated with lower posted wages, potentially contributing to observed gender wage gaps. We also find that large firms require a more extensive range of skills, implying that complementarity between female and male skills is greater among these firms.
    Keywords: text analysis, online job ads, gender, skills demand, machine learning
    JEL: J16 J23 J31 J63 J71 L2
    Date: 2023–11
    URL: http://d.repec.org/n?u=RePEc:iza:izadps:dp16593&r=big
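One common way to build gender associations from word embeddings is to compare a word's cosine similarity to a female anchor vector and to a male anchor vector. Whether the authors of the paper above use exactly this measure is not stated in the abstract, so treat the sketch below as an assumption. The three-dimensional vectors and the function names are invented for illustration.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gender_association(
    skill: Sequence[float],
    female_anchor: Sequence[float],
    male_anchor: Sequence[float],
) -> float:
    """Positive values lean female, negative values lean male."""
    return cosine(skill, female_anchor) - cosine(skill, male_anchor)


# Made-up 3-dimensional embeddings, for illustration only.
embeddings = {
    "female": [1.0, 0.0, 0.2],
    "male": [0.0, 1.0, 0.2],
    "skill_a": [0.9, 0.1, 0.3],
    "skill_b": [0.1, 0.8, 0.3],
}

scores = {
    word: gender_association(vec, embeddings["female"], embeddings["male"])
    for word, vec in embeddings.items()
    if word.startswith("skill_")
}
```

In practice the vectors would come from embeddings trained on the job-ad corpus, and the resulting scores could be averaged within each skill category.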
  19. By: Rawin Assabumrungrat; Kentaro Minami; Masanori Hirano
    Abstract: Option pricing, a fundamental problem in finance, often requires solving non-linear partial differential equations (PDEs). When dealing with multi-asset options, such as rainbow options, these PDEs become high-dimensional, leading to challenges posed by the curse of dimensionality. While deep learning-based PDE solvers have recently emerged as scalable solutions to this high-dimensional problem, their empirical and quantitative accuracy remains poorly understood, hindering their real-world applicability. In this study, we aimed to offer actionable insights into the utility of Deep PDE solvers for practical option pricing implementation. Through comparative experiments, we assessed the empirical performance of these solvers in high-dimensional contexts. Our investigation identified three primary sources of errors in Deep PDE solvers: (i) errors inherent in the specifications of the target option and underlying assets, (ii) errors originating from the asset model simulation methods, and (iii) errors stemming from the neural network training. Through ablation studies, we evaluated the individual impact of each error source. Our results indicate that the Deep BSDE method (DBSDE) is superior in performance and exhibits robustness against variations in option specifications. In contrast, some other methods are overly sensitive to option specifications, such as time to expiration. We also find that the performance of these methods improves in inverse proportion to the square root of batch size and the number of time steps. This observation can aid in estimating computational resources for achieving desired accuracies with Deep PDE solvers.
    Date: 2023–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2311.07231&r=big
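The scaling observation in the abstract above can be turned into a rough resource estimate. The sketch below reads the finding as "error is proportional to 1 / sqrt(batch size)" and covers batch size only; that reading, the function name, and the reference numbers are assumptions rather than results from the paper.

```python
import math


def required_batch_size(ref_batch: int, ref_error: float, target_error: float) -> int:
    """Batch size needed to reach target_error if error scales as 1/sqrt(batch).

    From error = c / sqrt(B) it follows that
    B_target = B_ref * (ref_error / target_error) ** 2.
    """
    if ref_batch <= 0 or ref_error <= 0 or target_error <= 0:
        raise ValueError("all inputs must be positive")
    return math.ceil(ref_batch * (ref_error / target_error) ** 2)


# Hypothetical reference run: batch size 256 gave a 2% pricing error.
needed = required_batch_size(ref_batch=256, ref_error=0.02, target_error=0.01)
```

Under this assumed scaling, halving the error requires quadrupling the batch size, so the hypothetical run above would need a batch size of 1024.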
  20. By: Soumyadip Sarkar
    Abstract: The realm of High-Frequency Trading (HFT) is characterized by rapid decision-making processes that capitalize on fleeting market inefficiencies. As the financial markets become increasingly competitive, there is a pressing need for innovative strategies that can adapt and evolve with changing market dynamics. Enter Reinforcement Learning (RL), a branch of machine learning where agents learn by interacting with their environment, making it an intriguing candidate for HFT applications. This paper dives deep into the integration of RL in statistical arbitrage strategies tailored for HFT scenarios. By leveraging the adaptive learning capabilities of RL, we explore its potential to unearth patterns and devise trading strategies that traditional methods might overlook. We delve into the intricate exploration-exploitation trade-offs inherent in RL and how they manifest in the volatile world of HFT. Furthermore, we confront the challenges of applying RL in non-stationary environments, typical of financial markets, and investigate methodologies to mitigate associated risks. Through extensive simulations and backtests, our research reveals that RL not only enhances the adaptability of trading strategies but also shows promise in improving profitability metrics and risk-adjusted returns. This paper, therefore, positions RL as a pivotal tool for the next generation of HFT-based statistical arbitrage, offering insights for both researchers and practitioners in the field.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2311.10718&r=big
  21. By: Yazan Alnsour (UWO - University of Wisconsin Oshkosh); Marina Johnson (MSU - Montclair State University [USA]); Abdullah Albizri (MSU - Montclair State University [USA]); Antoine Harfouche Harfouche (UPN - Université Paris Nanterre)
    Abstract: Artificial intelligence (AI) significantly revolutionizes and transforms the global healthcare industry by improving outcomes, increasing efficiency, and enhancing resource utilization. The applications of AI impact every aspect of healthcare operation, particularly resource allocation and capacity planning. This study proposes a multi-step AI-based framework and applies it to a real dataset to predict the length of stay (LOS) for hospitalized patients. The results show that the proposed framework can predict the LOS categories with an AUC of 0.85 and their actual LOS with a mean absolute error of 0.85 days. This framework can support decision-makers in healthcare facilities providing inpatient care to make better front-end operational decisions, such as resource capacity planning and scheduling decisions. Predicting LOS is pivotal in today's healthcare supply chain (HSC) systems where resources are scarce, and demand is abundant due to various global crises and pandemics. Thus, this research's findings have practical and theoretical implications in AI and HSC management.
    Keywords: Artificial Intelligence, Predictive Analytics, Length of Stay, Healthcare Supply Chain, Clinical Decision Support
    Date: 2023–01
    URL: http://d.repec.org/n?u=RePEc:hal:journl:hal-04263512&r=big
  22. By: Wentao Zhang; Yilei Zhao; Shuo Sun; Jie Ying; Yonggang Xie; Zitao Song; Xinrun Wang; Bo An
    Abstract: Portfolio management (PM) is a fundamental financial trading task, which explores the optimal periodic reallocation of capital across different stocks to pursue long-term profits. Reinforcement learning (RL) has recently shown its potential to train profitable agents for PM through interacting with financial markets. However, existing work mostly focuses on fixed stock pools, which is inconsistent with investors' practical demands. Specifically, the target stock pools of different investors vary dramatically because of their differing views on market states, and individual investors may temporarily adjust the stocks they wish to trade (e.g., adding one popular stock), which leads to customizable stock pools (CSPs). Existing RL methods require retraining RL agents even after a tiny change to the stock pool, which leads to high computational cost and unstable performance. To tackle this challenge, we propose EarnMore, a rEinforcement leARNing framework with Maskable stOck REpresentation to handle PM with CSPs through one-shot training in a global stock pool (GSP). Specifically, we first introduce a mechanism to mask out the representation of the stocks outside the target pool. Second, we learn meaningful stock representations through a self-supervised masking and reconstruction process. Third, a re-weighting mechanism is designed to make the portfolio concentrate on favorable stocks and neglect the stocks outside the target pool. Through extensive experiments on 8 subset stock pools of the US stock market, we demonstrate that EarnMore significantly outperforms 14 state-of-the-art baselines in terms of 6 popular financial metrics, with over 40% improvement in profit.
    Date: 2023–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2311.10801&r=big
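    The masking idea in the abstract above can be illustrated with a minimal sketch. This is not the EarnMore implementation: it assumes a plain softmax over per-stock scores, and the function name `masked_portfolio_weights` and its inputs are hypothetical. It only shows how stocks outside a target pool can be forced to zero weight while the remaining weights still sum to one.

```python
import numpy as np

def masked_portfolio_weights(scores, in_pool):
    """Softmax over scores, restricted to stocks in the target pool.

    scores  : 1-D array of per-stock scores (hypothetical policy outputs)
    in_pool : 1-D boolean array, True for stocks in the customizable pool
    """
    scores = np.asarray(scores, dtype=float)
    in_pool = np.asarray(in_pool, dtype=bool)
    if not in_pool.any():
        raise ValueError("target pool is empty")
    # Stocks outside the pool get -inf, so their softmax weight is exactly 0.
    masked = np.where(in_pool, scores, -np.inf)
    # Subtract the largest in-pool score for numerical stability.
    exp = np.exp(masked - masked[in_pool].max())
    return exp / exp.sum()

# Four stocks in the global pool; the second one is outside the target pool.
weights = masked_portfolio_weights([1.0, 2.0, 0.5, 3.0], [True, False, True, True])
print(weights)
```

    Because the mask is applied before normalization, changing the pool only changes the boolean vector, not the scoring model, which is the property the abstract attributes to one-shot training on a global pool.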
  23. By: Krolage, Carla; Bachtrögler-Unger, Julia; Dolls, Mathias; Schüle, Paul; Taubenböck, Hannes; Weigand, Matthias
    JEL: R11 O18 H54
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:zbw:vfsc23:277604&r=big
  24. By: Besley, Timothy; Fetzer, Thiemo; Mueller, Hannes
    Abstract: This paper estimates the size of the media multiplier, an easily generalizable model-based measure of how far media coverage magnifies the economic response to shocks. We combine monthly aggregated and anonymized credit card activity data from 114 card-issuing countries in 5 destination countries with a large corpus of news coverage in issuing countries reporting on violent events in the destinations. To define and quantify the media multiplier we estimate a model in which latent beliefs, shaped by either events or news coverage, drive card activity. According to the model, media coverage can more than triple the economic impact of an event. We document, through our model, that this effect is highly heterogeneous and depends on the broader media representation of countries in each other's news. We speculate about the role of the media in driving international travel patterns an.
    Keywords: media; economic behavior; new shocks; 101042703; International Growth Centre
    JEL: O10 F50 D80 F10 L80
    Date: 2023–11–13
    URL: http://d.repec.org/n?u=RePEc:ehl:lserod:120778&r=big
  25. By: George Abi Younes (Ecole polytechnique federale de Lausanne); Gaetan de Rassenfosse (Ecole polytechnique federale de Lausanne)
    Abstract: Recognizing the increasing accessibility and importance of patent data, the paper underscores the need for standardized and transparent data analysis methods. By leveraging the BigQuery language, we illustrate the construction and relevance of commonly used patent indicators derived from Google Patents Public Datasets. The indicators range from citation counts to more advanced metrics like patent text similarity. The code is available in an open Kaggle notebook, explaining operational intricacies and potential data issues. By providing clear, adaptable queries and emphasizing transparent methodologies, this paper hopes to contribute to the standardization and accessibility of patent analysis, offering a valuable resource for researchers and practitioners alike.
    Keywords: BigQuery language; data transparency; patent analytics; patent data
    JEL: O34
    Date: 2023–11
    URL: http://d.repec.org/n?u=RePEc:iip:wpaper:24&r=big
  26. By: Moritz Scherrmann
    Abstract: This paper presents a multi-label topic model for financial texts like ad-hoc announcements, 8-K filings, finance-related news or annual reports. I train the model on a new financial multi-label database consisting of 3,044 German ad-hoc announcements that are labeled manually using 20 predefined, economically motivated topics. The best model achieves a macro F1 score of more than 85%. Translating the data results in an English version of the model with similar performance. As an application of the model, I investigate differences in stock market reactions across topics. I find evidence for strong positive or negative market reactions for some topics, like announcements of new Large Scale Projects or Bankruptcy Filings, while I do not observe significant price effects for some other topics. Furthermore, in contrast to previous studies, the multi-label structure of the model makes it possible to analyze the effects of co-occurring topics on stock market reactions. In many cases, the reaction to a specific topic depends heavily on its co-occurrence with other topics. For example, if allocated capital from a Seasoned Equity Offering (SEO) is used for restructuring a company in the course of a Bankruptcy Proceeding, the market reacts positively on average. However, if that capital is used for covering unexpected, additional costs from the development of new drugs, the SEO implies negative reactions on average.
    Date: 2023–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2311.07598&r=big
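    For readers unfamiliar with the macro F1 score reported in the abstract above, the following is a minimal sketch of how it is commonly computed for multi-label predictions: F1 is computed per label and then averaged with equal weight. The toy label matrices and the function name `macro_f1` are illustrative assumptions and are not taken from the paper; labels with no true or predicted positives are scored as 0 here, which is one convention among several.

```python
import numpy as np

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores.

    y_true, y_pred : (n_documents, n_labels) binary indicator matrices
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = (y_true & y_pred).sum(axis=0)   # true positives per label
    fp = (~y_true & y_pred).sum(axis=0)  # false positives per label
    fn = (y_true & ~y_pred).sum(axis=0)  # false negatives per label
    denom = 2 * tp + fp + fn
    # Per-label F1 = 2TP / (2TP + FP + FN); empty labels are scored 0.
    f1 = np.divide(2 * tp, denom, out=np.zeros(len(denom)), where=denom > 0)
    return f1.mean()

# Toy data: 4 documents, 3 topics (made up for illustration).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

    Because every label counts equally, a rare topic that is predicted poorly lowers the macro F1 as much as a frequent one would.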

This nep-big issue is ©2023 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.