nep-big New Economics Papers
on Big Data
Issue of 2023‒04‒17
twenty-one papers chosen by
Tom Coupé
University of Canterbury

  1. Deep hybrid model with satellite imagery: how to combine demand modeling and computer vision for behavior analysis? By Qingyi Wang; Shenhao Wang; Yunhan Zheng; Hongzhou Lin; Xiaohu Zhang; Jinhua Zhao; Joan Walker
  2. Is COVID-19 reflected in AnaCredit dataset? A big data - machine learning approach for analysing behavioural patterns using loan level granular information By Anastasios Petropoulos; Evangelos Stavroulakis; Panagiotis Lazaris; Vasilis Siakoulis; Nikolaos Vlachogiannakis
  3. Forecasting the movements of Bitcoin prices: an application of machine learning algorithms By Hakan Pabuccu; Serdar Ongan; Ayse Ongan
  4. The Impact of Feature Selection and Transformation on Machine Learning Methods in Determining the Credit Scoring By Oguz Koc; Omur Ugur; A. Sevtap Kestel
  5. Comparing Out-of-Sample Performance of Machine Learning Methods to Forecast U.S. GDP Growth By Ba Chu; Shafiullah Qureshi
  6. Predicting Poverty with Missing Incomes By Paolo Verme
  7. Stock Trend Prediction: A Semantic Segmentation Approach By Shima Nabiee; Nader Bagherzadeh
  8. Langevin algorithms for Markovian Neural Networks and Deep Stochastic control By Pierre Bras; Gilles Pagès
  9. How to conduct impact evaluations in humanitarian and conflict settings By Aysegül Kayaoglu; Ghassan Baliki; Tilman Brück; Melodie Al Daccache; Dorothee Weiffen
  10. The Impact of Voice and Accountability in the ESG Framework in a Global Perspective By Costantiello, Alberto; Leogrande, Angelo
  11. Collusion and Artificial Intelligence: A computational experiment with sequential pricing algorithms under stochastic costs By Gonzalo Ballestero
  12. Research on CPI Prediction Based on Natural Language Processing By Xiaobin Tang; Nuo Lei
  13. Collusion and Artificial Intelligence: A computational experiment with sequential pricing algorithms under stochastic costs By Gonzalo Ballestero
  14. A parsimonious neural network approach to solve portfolio optimization problems without using dynamic programming By Pieter M. van Staden; Peter A. Forsyth; Yuying Li
  15. Improving CNN-base Stock Trading By Considering Data Heterogeneity and Burst By Keer Yang; Guanqun Zhang; Chuan Bi; Qiang Guan; Hailu Xu; Shuai Xu
  16. A Robustness Analysis of Newspaper-based Indices By Roman Valovic; Daniel Pastorek
  17. Style Miner: Find Significant and Stable Explanatory Factors in Time Series with Constrained Reinforcement Learning By Dapeng Li; Feiyang Pan; Jia He; Zhiwei Xu; Dandan Tu; Guoliang Fan
  18. Analysing the response of U.S. financial market to the Federal Open Market Committee statements and minutes based on computational linguistic approaches By Xuefan, Pan
  19. What Purpose Do Corporations Purport? Evidence from Letters to Shareholders By Raghuram Rajan; Pietro Ramella; Luigi Zingales
  20. Sentiment, Productivity, and Economic Growth By George M. Constantinides; Maurizio Montone; Valerio Potì; Stella Spilioti
  21. Strategic Trading in Quantitative Markets through Multi-Agent Reinforcement Learning By Hengxi Zhang; Zhendong Shi; Yuanquan Hu; Wenbo Ding; Ercan E. Kuruoglu; Xiao-Ping Zhang

  1. By: Qingyi Wang; Shenhao Wang; Yunhan Zheng; Hongzhou Lin; Xiaohu Zhang; Jinhua Zhao; Joan Walker
    Abstract: Classical demand modeling analyzes travel behavior using only low-dimensional numeric data (i.e., sociodemographics and travel attributes) but not high-dimensional urban imagery. However, travel behavior depends on the factors represented by both numeric data and urban imagery, thus necessitating a synergetic framework to combine them. This study creates a theoretical framework of deep hybrid models with a crossing structure consisting of a mixing operator and a behavioral predictor, thus integrating the numeric and imagery data into a latent space. Empirically, this framework is applied to analyze travel mode choice using the MyDailyTravel Survey from Chicago as the numeric inputs and satellite images as the imagery inputs. We found that deep hybrid models outperform both traditional demand models and recent deep learning models in predicting aggregate and disaggregate travel behavior with our supervision-as-mixing design. The latent space in deep hybrid models can be interpreted, because it reveals meaningful spatial and social patterns. The deep hybrid models can also generate new urban images that do not exist in reality and interpret them with economic theory, such as by computing substitution patterns and social welfare changes. Overall, the deep hybrid models demonstrate the complementarity between low-dimensional numeric and high-dimensional imagery data, and between traditional demand modeling and recent deep learning. The framework generalizes the latent classes and variables in classical hybrid demand models to a latent space, and leverages the computational power of deep learning for imagery while retaining economic interpretability grounded in microeconomic foundations.
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2303.04204&r=big
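    A minimal PyTorch sketch of the crossing structure described above, in which a mixing operator fuses numeric and image encodings into a shared latent space and a behavioral predictor maps that latent vector to mode-choice logits. The layer sizes, the concatenation-based fusion, and the use of a precomputed image embedding are illustrative assumptions, not the authors' architecture:

      import torch
      import torch.nn as nn

      class DeepHybridModel(nn.Module):
          def __init__(self, n_numeric, img_embed_dim, latent_dim, n_modes):
              super().__init__()
              self.numeric_encoder = nn.Sequential(nn.Linear(n_numeric, latent_dim), nn.ReLU())
              self.image_encoder = nn.Sequential(nn.Linear(img_embed_dim, latent_dim), nn.ReLU())
              # Mixing operator: fuse the two encodings into one latent vector.
              self.mixer = nn.Linear(2 * latent_dim, latent_dim)
              # Behavioral predictor: latent space -> travel mode logits.
              self.predictor = nn.Linear(latent_dim, n_modes)

          def forward(self, x_numeric, x_image):
              z = torch.cat([self.numeric_encoder(x_numeric),
                             self.image_encoder(x_image)], dim=-1)
              latent = torch.relu(self.mixer(z))
              return self.predictor(latent)  # logits over travel modes

      model = DeepHybridModel(n_numeric=10, img_embed_dim=512, latent_dim=64, n_modes=4)
      logits = model(torch.randn(8, 10), torch.randn(8, 512))  # batch of 8 travelers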
  2. By: Anastasios Petropoulos (Bank of Greece); Evangelos Stavroulakis (Bank of Greece); Panagiotis Lazaris (Bank of Greece); Vasilis Siakoulis (Bank of Greece); Nikolaos Vlachogiannakis (Bank of Greece)
    Abstract: In this study, we explore the impact of the COVID-19 pandemic on the default risk of loan portfolios of the Greek banking system, using cutting-edge machine learning technologies such as deep learning. Our analysis is based on loan-level monthly data, spanning a 42-month period, collected through the ECB AnaCredit database. Our dataset contains more than three million records, covering both the pre- and post-pandemic periods. We develop a series of credit rating models implementing state-of-the-art machine learning algorithms. Through an extensive validation process, we explore the best machine learning technique to build a behavioral credit scoring model, and subsequently we investigate the estimated sensitivities of various features in predicting default risk. To select the best candidate model, we compare the classification accuracy of the proposed methods over a 2-month out-of-time period. Our empirical results indicate that Deep Neural Networks (DNN) have superior predictive performance, signalling better generalization capacity than Random Forests, Extreme Gradient Boosting (XGBoost), and logistic regression. The proposed DNN model can accurately simulate the non-linearities caused by the pandemic outbreak in the evolution of default rates for Greek corporate customers. Under this multivariate setup we apply interpretability algorithms to isolate the impact of COVID-19 on the probability of default, controlling for the rest of the features of the DNN. Our results indicate that the impact of the pandemic peaks in the first year and then slowly decreases, though it has not yet returned to pre-COVID-19 levels. Furthermore, our empirical results also suggest different behavioral patterns between Stage 1 and Stage 2 loans, and that default rate sensitivities vary significantly across sectors. The current empirical work can facilitate a more in-depth analysis of the AnaCredit database, by providing robust statistical tools for more effective and responsive micro- and macro-supervision of credit risk.
    Keywords: Credit Risk; Deep Learning; AnaCredit; COVID-19
    JEL: G24 C38 C45 C55
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:bog:wpaper:315&r=big
  3. By: Hakan Pabuccu; Serdar Ongan; Ayse Ongan
    Abstract: Cryptocurrencies, such as Bitcoin, are among the most controversial and complex technological innovations in today's financial system. This study aims to forecast the movements of Bitcoin prices at a high degree of accuracy. To this aim, four different Machine Learning (ML) algorithms are applied, namely the Support Vector Machines (SVM), the Artificial Neural Network (ANN), the Naive Bayes (NB) and the Random Forest (RF), besides the logistic regression (LR) as a benchmark model. In order to test these algorithms, in addition to the existing continuous dataset, a discrete dataset was also created and used. For the evaluation of algorithm performance, the F statistic, accuracy statistic, the Mean Absolute Error (MAE), the Root Mean Square Error (RMSE) and the Root Absolute Error (RAE) metrics were used. The t test was used to compare the performances of the SVM, ANN, NB and RF with the performance of the LR. Empirical findings reveal that, while the RF has the highest forecasting performance in the continuous dataset, the NB has the lowest. In the discrete dataset, the ANN has the highest and the NB the lowest performance. Furthermore, the discrete dataset improves the overall forecasting performance of all algorithms (models) estimated.
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2303.04642&r=big
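    An illustrative scikit-learn horse race among the classifiers named in the abstract, fit on synthetic up/down labels; the feature construction and hyperparameters are placeholder assumptions, not the paper's setup:

      import numpy as np
      from sklearn.svm import SVC
      from sklearn.neural_network import MLPClassifier
      from sklearn.naive_bayes import GaussianNB
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import accuracy_score, f1_score

      rng = np.random.default_rng(0)
      X = rng.normal(size=(1000, 6))                           # e.g., technical indicators
      y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)    # 1 = price moves up
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

      models = {"SVM": SVC(), "ANN": MLPClassifier(max_iter=1000),
                "NB": GaussianNB(), "RF": RandomForestClassifier(),
                "LR (benchmark)": LogisticRegression()}
      for name, m in models.items():
          pred = m.fit(X_tr, y_tr).predict(X_te)
          print(f"{name}: acc={accuracy_score(y_te, pred):.3f}, F1={f1_score(y_te, pred):.3f}")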
  4. By: Oguz Koc; Omur Ugur; A. Sevtap Kestel
    Abstract: Banks utilize credit scoring as an important indicator of financial strength and eligibility for credit. Scoring models aim to assign statistical odds or probabilities for predicting the risk of nonpayment in relation to the many other factors that may be involved. This paper aims to illustrate the beneficial use of eight machine learning (ML) methods (Support Vector Machine, Gaussian Naive Bayes, Decision Trees, Random Forest, XGBoost, K-Nearest Neighbors, Multi-layer Perceptron Neural Networks, and Logistic Regression) in finding the default risk as well as the features contributing to it. An extensive comparison is made in three aspects: (i) which ML models, with and without their own wrapper feature selection, perform best; (ii) how feature selection combined with an appropriate data scaling method influences performance; (iii) which of the most successful combinations (algorithm, feature selection, and scaling) delivers the best validation indicators, such as accuracy rate, Type I and II errors, and AUC. Open-access credit scoring default risk data sets on German and Australian cases are taken into account, for which we determine the best method, scaling, and features contributing to default risk, and compare our findings with those in the related literature. We illustrate the positive contribution of the selection method and scaling on the performance indicators compared to the existing literature.
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2303.05427&r=big
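    A sketch of the comparison dimensions in the abstract as a scikit-learn pipeline, with a scaling step and a wrapper-style feature selection step ahead of the classifier; the particular scaler, selector, and classifier are examples only:

      from sklearn.pipeline import Pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.feature_selection import SelectFromModel
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_val_score
      from sklearn.datasets import make_classification

      X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
      pipe = Pipeline([
          ("scale", StandardScaler()),                                          # data scaling
          ("select", SelectFromModel(RandomForestClassifier(random_state=0))),  # feature selection
          ("clf", RandomForestClassifier(random_state=0)),                      # scoring model
      ])
      print(cross_val_score(pipe, X, y, scoring="roc_auc").mean())  # AUC, one of the paper's indicators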
  5. By: Ba Chu (Department of Economics, Carleton University); Shafiullah Qureshi (Department of Economics, Carleton University)
    Abstract: We run a 'horse race' among popular forecasting methods, including machine learning (ML) and deep learning (DL) methods, employed to forecast U.S. GDP growth. Given the unstable nature of GDP growth data, we implement a recursive forecasting strategy to calculate the out-of-sample performance metrics of forecasts for multiple subperiods. We use three sets of predictors: a large set of 224 predictors [of U.S. GDP growth] taken from a large quarterly macroeconomic database (namely, FRED-QD), a small set of nine strong predictors selected from the large set, and another small set including these nine strong predictors together with a high-frequency business condition index. We then obtain the following three main findings: (1) when forecasting with a large number of predictors with mixed predictive power, density-based ML methods (such as bagging or boosting) can outperform sparsity-based methods (such as Lasso) for long-horizon forecasts, but this is not necessarily the case for short-horizon forecasts; (2) density-based ML methods tend to perform better with a large set of predictors than with a small subset of strong predictors; and (3) parsimonious models using a strong high-frequency predictor can outperform sophisticated ML and DL models using a large number of low-frequency predictors, highlighting the important role of predictors in economic forecasting. We also find that ensemble ML methods (which are special cases of density-based ML methods) can outperform popular DL methods.
    Keywords: Lasso, Ridge Regression, Random Forest, Boosting Algorithms, Artificial Neural Networks, Dimensional Reduction Methods, MIDAS, GDP growth
    Date: 2021–10–30
    URL: http://d.repec.org/n?u=RePEc:car:carecp:21-12&r=big
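    A minimal expanding-window (recursive) out-of-sample exercise in the spirit of the horse race above, contrasting a sparsity-based method (Lasso) with a density-based ensemble (gradient boosting) on synthetic data with many weak predictors; all settings are illustrative:

      import numpy as np
      from sklearn.linear_model import Lasso
      from sklearn.ensemble import GradientBoostingRegressor

      rng = np.random.default_rng(1)
      X = rng.normal(size=(200, 50))                    # many predictors, mixed power
      y = X[:, :5].sum(axis=1) + rng.normal(size=200)   # only five truly matter

      errors = {"Lasso": [], "Boosting": []}
      for t in range(150, 200):                         # recursive one-step forecasts
          for name, model in [("Lasso", Lasso(alpha=0.1)),
                              ("Boosting", GradientBoostingRegressor())]:
              model.fit(X[:t], y[:t])                   # re-estimate on all data up to t
              errors[name].append((model.predict(X[t:t + 1])[0] - y[t]) ** 2)
      for name, e in errors.items():
          print(name, "out-of-sample RMSE:", np.sqrt(np.mean(e)))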
  6. By: Paolo Verme (World Bank)
    Abstract: Poverty prediction models are used by economists to address missing data issues in a variety of contexts such as poverty profiling, targeting with proxy-means tests, cross-survey imputations such as poverty mapping, or vulnerability analyses. Based on the models used by this literature, this paper conducts an experiment by artificially corrupting data with different patterns and shares of missing incomes. It then compares the capacity of classic econometric and machine learning models to predict poverty under these different scenarios. It finds that the quality of predictions and the choice of the optimal prediction model are dependent on the distribution of observed and unobserved incomes, the poverty line, the choice of objective function and policy preferences, and various other modeling choices. Logistic and random forest models are found to be more robust than other models to variations in these features, but no model invariably outperforms all others. The paper concludes with some reflections on the use of these models for predicting poverty.
    Keywords: Income modeling, Income Distributions, Poverty Predictions
    JEL: D31 D63 E64 O15
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:inq:inqwps:ecineq2023-642&r=big
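    A stylized version of the experiment design: artificially corrupt the data under two missingness patterns and compare a logistic classifier with a random forest at predicting poverty status for the corrupted observations. The data-generating process and missingness rates are placeholder assumptions:

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.metrics import accuracy_score

      rng = np.random.default_rng(2)
      X = rng.normal(size=(5000, 8))                           # household covariates
      income = np.exp(1 + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=5000))
      poor = (income < np.quantile(income, 0.2)).astype(int)   # poverty line = 20th percentile

      # Missing at random vs. missingness that rises as income falls.
      mcar = rng.random(5000) < 0.3
      mnar = rng.random(5000) < np.clip(0.6 - 0.1 * np.log(income), 0, 1)
      for label, miss in [("MCAR", mcar), ("income-dependent", mnar)]:
          train, test = ~miss, miss                            # predict poverty where income is missing
          for name, m in [("logit", LogisticRegression(max_iter=1000)),
                          ("forest", RandomForestClassifier(random_state=0))]:
              pred = m.fit(X[train], poor[train]).predict(X[test])
              print(label, name, "accuracy:", round(accuracy_score(poor[test], pred), 3))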
  7. By: Shima Nabiee; Nader Bagherzadeh
    Abstract: Market financial forecasting is a trending area in deep learning. Deep learning models are capable of tackling the classic challenges in stock market data, such as its extremely complicated dynamics as well as long-term temporal correlation. To capture the temporal relationship among these time series, recurrent neural networks are employed. However, it is difficult for recurrent models to learn to keep track of long-term information. Convolutional Neural Networks have been utilized to better capture the dynamics and extract features for both short- and long-term forecasting. However, semantic segmentation and its well-designed fully convolutional networks have never been studied for time-series dense classification. We present a novel approach to predict long-term daily stock price change trends with fully 2D-convolutional encoder-decoders. We generate input frames with daily prices for a time-frame of T days. The aim is to predict future trends by pixel-wise classification of the current price frame. We propose a hierarchical CNN structure to encode multiple price frames to multiscale latent representations in parallel using Atrous Spatial Pyramid Pooling blocks, and take those temporal coarse feature stacks into account in the decoding stages. Our hierarchical structure of CNNs makes it capable of capturing both long- and short-term temporal relationships effectively. The effect of increasing the input time horizon by adding parallel encoders has been studied, producing substantial changes in the output segmentation masks. We achieve an overall accuracy of 78.18% and an AUC of 0.88 for joint trend prediction over the next 20 days, surpassing other semantic segmentation approaches. We compared our proposed model with several deep models specifically designed for technical analysis and found that, for different output horizons, our proposed models outperformed the alternatives.
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2303.09323&r=big
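    A toy fully convolutional encoder-decoder for pixel-wise (dense) classification of a T-day price frame, illustrating the general idea; the paper's model additionally uses hierarchical parallel encoders with Atrous Spatial Pyramid Pooling blocks, which this sketch omits:

      import torch
      import torch.nn as nn

      class PriceSegNet(nn.Module):
          def __init__(self, n_classes=3):  # e.g., up / down / flat per pixel
              super().__init__()
              self.encoder = nn.Sequential(
                  nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
              self.decoder = nn.Sequential(
                  nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
                  nn.ConvTranspose2d(16, n_classes, 2, stride=2))

          def forward(self, x):                    # x: (batch, 1, T, n_series)
              return self.decoder(self.encoder(x))  # per-pixel class logits

      frames = torch.randn(4, 1, 32, 32)   # four 32-day price frames
      print(PriceSegNet()(frames).shape)   # -> torch.Size([4, 3, 32, 32])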
  8. By: Pierre Bras (LPSM (UMR_8001) - Laboratoire de Probabilités, Statistique et Modélisation - SU - Sorbonne Université - CNRS - Centre National de la Recherche Scientifique - UPCité - Université Paris Cité); Gilles Pagès (LPSM (UMR_8001) - Laboratoire de Probabilités, Statistique et Modélisation - UPD7 - Université Paris Diderot - Paris 7 - SU - Sorbonne Université - CNRS - Centre National de la Recherche Scientifique)
    Abstract: Stochastic Gradient Descent Langevin Dynamics (SGLD) algorithms, which add noise to the classic gradient descent, are known to improve the training of neural networks in some cases where the neural network is very deep. In this paper we study the possibilities of training acceleration for the numerical resolution of stochastic control problems through gradient descent, where the control is parametrized by a neural network. If the control is applied at many discretization times then solving the stochastic control problem reduces to minimizing the loss of a very deep neural network. We numerically show that Langevin algorithms improve the training on various stochastic control problems like hedging and resource management, and for different choices of gradient descent methods.
    Keywords: Langevin algorithm, SGLD, Markovian neural network, Stochastic control, Deep neural network, Stochastic optimization
    Date: 2022–12–22
    URL: http://d.repec.org/n?u=RePEc:hal:wpaper:hal-03980632&r=big
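    A minimal sketch of the Langevin idea: a plain gradient step plus Gaussian noise whose scale shrinks with the step size. The noise level sigma and the toy loss are assumptions for illustration:

      import torch

      def sgld_step(params, loss_fn, lr=1e-3, sigma=1e-2):
          """One Langevin update: theta <- theta - lr * grad + sigma * sqrt(lr) * N(0, I)."""
          loss = loss_fn()
          grads = torch.autograd.grad(loss, params)
          with torch.no_grad():
              for p, g in zip(params, grads):
                  p.add_(-lr * g + sigma * lr ** 0.5 * torch.randn_like(p))
          return loss.item()

      # Toy usage: the injected noise helps escape shallow local minima.
      theta = torch.tensor(3.0, requires_grad=True)
      for _ in range(1000):
          sgld_step([theta], lambda: (theta ** 2 - 1) ** 2 + 0.1 * torch.sin(5 * theta))
      print(theta.item())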
  9. By: Aysegül Kayaoglu (ISDC - International Security and Development Center, Germany; Department of Economics, Istanbul Technical University, Türkiye; IMIS, University of Osnabrück, Germany); Ghassan Baliki (ISDC - International Security and Development Center, Germany); Tilman Brück (Humboldt-University of Berlin, Germany; ISDC - International Security and Development Center, Berlin, Germany; Thaer-Institute, Humboldt-University of Berlin, Germany; Leibniz Institute of Vegetable and Ornamental Crops (IGZ), Germany); Melodie Al Daccache (American University of Beirut, Lebanon); Dorothee Weiffen (ISDC - International Security and Development Center, Germany)
    Abstract: Methodological, ethical and practical challenges make it difficult to use experimental and rigorous quasi-experimental approaches to conduct impact evaluations in humanitarian emergencies and conflict settings (HECS). This paper discusses recent developments in the design, measurement, data and analysis of impact evaluations that can overcome these challenges, and provides concrete examples from our recent research in which we analyse the impact of agricultural emergency interventions in post-war Syria. More specifically, the paper offers solutions in five areas. First, we discuss the challenges in designing rapid and rigorous impact evaluations in HECS, mainly showing alternative ways to construct counterfactuals in the absence of meaningful control groups. Second, we review how researchers can use additional data sources to create a counterfactual, or even data on treated units when data are difficult to collect; in some cases these sources provide ethical and methodological benefits in addition to cost-effectiveness. Third, we argue that finding and fine-tuning proxy measures for ‘unmeasurable’ concepts and outcomes such as resilience and fragility is crucial. Fourth, we highlight how adaptive machine learning algorithms help rigorous impact evaluations in HECS overcome drawbacks related to data availability and heterogeneity analysis. We provide an example from our recent work where we use honest causal forest estimation to test the heterogeneous impact of an agricultural intervention when sample sizes are small. Fifth, we discuss how standardisation across methods, data and measures ensures the external validity and transferability of the evidence to other complex settings where impact evaluation is challenging to conduct. Finally, the paper recommends how future research and policy can adapt these tools to ensure significant and effective learning in conflict-affected and humanitarian settings.
    Keywords: impact evaluation, research design, machine learning, conflict setting, humanitarian emergencies
    JEL: C18 C30 C80 D04 D74 Q34
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:hic:wpaper:387&r=big
  10. By: Costantiello, Alberto; Leogrande, Angelo
    Abstract: We estimate the value of Voice and Accountability (VA) in the context of the Environmental, Social and Governance (ESG) data of the World Bank, using data from 193 countries over the period 2011-2021. We use Panel Data with Fixed Effects, Panel Data with Random Effects and Pooled Ordinary Least Squares (OLS). We find that the level of VA is positively associated with, among others, “Maximum 5-Day Rainfall” and “Mortality Rate Under 5”, and negatively associated with, among others, “Adjusted Savings: Natural Resources Depletion” and “Annualized Average Growth Rate in Per Capita Real Survey Mean Consumption or Income”. Furthermore, we apply the k-Means algorithm optimized with the Elbow Method. We find the k-Means clustering uninformative due to the low variance of the variable among countries, which results in a hyper-concentration of elements in a single cluster. Finally, we compare eight machine-learning algorithms for the prediction of VA. Polynomial Regression is the best predictive algorithm according to R-Squared, MAE, MSE and RMSE. The level of VA is expected to grow by 2.92% on average for the treated countries.
    Keywords: Analysis of Collective Decision-Making: General; Political Processes: Rent-Seeking, Lobbying, Elections, Legislatures, and Voting Behaviour; Bureaucracy; Administrative Processes in Public Organizations; Corruption; Positive Analysis of Policy Formulation and Implementation
    JEL: D7 D70 D72 D73 D78
    Date: 2023–03–23
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:116805&r=big
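    A short sketch of the Elbow Method mentioned in the abstract: compute within-cluster inertia for each k and look for a kink; a nearly flat curve is the symptom reported when low cross-country variance concentrates elements in one cluster. The synthetic data stand in for the VA-related indicators:

      import numpy as np
      from sklearn.cluster import KMeans

      rng = np.random.default_rng(3)
      X = rng.normal(0, 0.05, size=(193, 4))   # 193 countries, low-variance indicators
      for k in range(1, 8):
          km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
          print(k, round(km.inertia_, 3))       # look for the "elbow" in this curve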
  11. By: Gonzalo Ballestero
    Keywords: Artificial Intelligence, Algorithmic Collusion, Competition Policy
    JEL: D43 L23
    Date: 2021–11
    URL: http://d.repec.org/n?u=RePEc:aep:anales:4433&r=big
  12. By: Xiaobin Tang; Nuo Lei
    Abstract: In the past, the seed keywords for CPI prediction were often selected based on empirical summaries of research and literature studies, an approach prone to omitting relevant variables and selecting invalid ones. In this paper, we design a keyword expansion technique for CPI prediction based on the cutting-edge NLP model PANGU. We improve CPI prediction ability using the corresponding web search index. Compared with natural language processing models based on unsupervised pre-training and supervised downstream fine-tuning, such as BERT and NEZHA, the PANGU model can be expanded to obtain more reliable CPI-related keywords through its excellent zero-shot learning capability, without being limited by a downstream fine-tuning data set. Finally, this paper empirically tests the predictive ability of the keywords obtained by this expansion method against historical CPI data.
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2303.05666&r=big
  13. By: Gonzalo Ballestero (Department of Economics, Universidad de San Andres)
    Abstract: Firms increasingly delegate their strategic decisions to algorithms. A potential concern is that algorithms may undermine competition by leading to pricing outcomes that are collusive, even without having been designed to do so. This paper investigates whether Q-learning algorithms can learn to collude in a setting with sequential price competition and stochastic marginal costs adapted from Maskin and Tirole (1988). Extending a model developed in Klein (2021), I find that sequential Q-learning algorithms lead to supracompetitive profits even though they compete under uncertainty, and this finding is robust to various extensions. The algorithms can coordinate on focal price equilibria or an Edgeworth cycle provided that uncertainty is not too large. However, as the market environment becomes more uncertain, price wars emerge as the only possible pricing pattern. Even though sequential Q-learning algorithms gain supracompetitive profits, uncertainty tends to make collusive outcomes more difficult to achieve.
    Keywords: Competition Policy
    Date: 2021–11
    URL: http://d.repec.org/n?u=RePEc:sad:ypaper:1&r=big
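    A stripped-down sequential-pricing simulation in the spirit of the paper: two Q-learning firms alternate moves on a small price grid under stochastic marginal costs. The demand rule, cost process, and learning parameters are illustrative assumptions, not the paper's calibration:

      import numpy as np

      rng = np.random.default_rng(4)
      prices = np.linspace(0.5, 2.0, 7)           # discrete price grid
      n = len(prices)
      Q = [np.zeros((n, n)), np.zeros((n, n))]    # Q[firm][rival_price, own_price]
      alpha, gamma, eps = 0.1, 0.95, 0.1
      state = 0                                   # index of the rival's last price

      for t in range(200_000):
          firm = t % 2                            # firms move sequentially
          cost = max(rng.normal(0.5, 0.1), 0.0)   # stochastic marginal cost
          if rng.random() < eps:
              a = int(rng.integers(n))            # explore
          else:
              a = int(Q[firm][state].argmax())    # exploit
          # Lower-priced firm captures the (unit) demand; ties split it.
          p_own, p_rival = prices[a], prices[state]
          share = 1.0 if p_own < p_rival else (0.5 if p_own == p_rival else 0.0)
          reward = (p_own - cost) * share
          Q[firm][state, a] += alpha * (reward + gamma * Q[firm][a].max()
                                        - Q[firm][state, a])
          state = a                               # own price becomes the rival's state

      # Prices persistently above cost would suggest supracompetitive outcomes.
      print("learned best responses:", prices[Q[0].argmax(axis=1)])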
  14. By: Pieter M. van Staden; Peter A. Forsyth; Yuying Li
    Abstract: We present a parsimonious neural network approach, which does not rely on dynamic programming techniques, to solve dynamic portfolio optimization problems subject to multiple investment constraints. The number of parameters of the (potentially deep) neural network remains independent of the number of portfolio rebalancing events, and in contrast to, for example, reinforcement learning, the approach avoids the computation of high-dimensional conditional expectations. As a result, the approach remains practical even when considering large numbers of underlying assets, long investment time horizons or very frequent rebalancing events. We prove convergence of the numerical solution to the theoretical optimal solution of a large class of problems under fairly general conditions, and present ground truth analyses for a number of popular formulations, including mean-variance and mean-conditional value-at-risk problems. We also show that it is feasible to solve Sortino ratio-inspired objectives (penalizing only the variance of wealth outcomes below the mean) in dynamic trading settings with the proposed approach. Using numerical experiments, we demonstrate that if the investment objective functional is separable in the sense of dynamic programming, the correct time-consistent optimal investment strategy is recovered, otherwise we obtain the correct pre-commitment (time-inconsistent) investment strategy. The proposed approach remains agnostic as to the underlying data generating assumptions, and results are illustrated using (i) parametric models for underlying asset returns, (ii) stationary block bootstrap resampling of empirical returns, and (iii) generative adversarial network (GAN)-generated synthetic asset returns.
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2303.08968&r=big
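    A sketch of the parsimonious idea: a single small network maps the state (time, wealth) to long-only portfolio weights through a softmax, so the parameter count is independent of the number of rebalancing events. The network sizes and state definition are illustrative:

      import torch
      import torch.nn as nn

      class PolicyNet(nn.Module):
          def __init__(self, n_assets, hidden=16):
              super().__init__()
              self.net = nn.Sequential(nn.Linear(2, hidden), nn.Tanh(),
                                       nn.Linear(hidden, n_assets))

          def forward(self, t, wealth):
              x = torch.stack([t, wealth], dim=-1)       # state = (time, wealth)
              return torch.softmax(self.net(x), dim=-1)  # weights sum to 1, no shorting

      policy = PolicyNet(n_assets=3)
      w = policy(torch.tensor([0.5]), torch.tensor([1.2]))  # weights at t = 0.5
      print(w, w.sum())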
  15. By: Keer Yang; Guanqun Zhang; Chuan Bi; Qiang Guan; Hailu Xu; Shuai Xu
    Abstract: In recent years, there have been quite a few attempts to apply intelligent techniques to financial trading, i.e., constructing automatic and intelligent trading frameworks based on historical stock prices. Due to the unpredictable, uncertain and volatile nature of financial markets, researchers have also resorted to deep learning to construct intelligent trading frameworks. In this paper, we propose to use CNN as the core functionality of such a framework, because it is able to learn the spatial dependency (i.e., between rows and columns) of the input data. However, unlike existing deep learning-based trading frameworks, we develop a novel normalization process to prepare the stock data. In particular, we first empirically observe that stock data is intrinsically heterogeneous and bursty, and then validate the heterogeneity and burst nature of stock data from a statistical perspective. Next, we design the data normalization method in such a way that the data heterogeneity is preserved and bursty events are suppressed. We verify our CNN-based trading framework together with the new normalization method on 29 stocks. Experiment results show that our approach outperforms competing approaches.
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2303.09407&r=big
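    One way to read the normalization goal above, sketched under assumed parameters: winsorize each stock's returns at its own tail quantiles, so bursty events are suppressed while cross-stock scale differences (the heterogeneity) are left intact. This is not necessarily the paper's exact procedure:

      import numpy as np

      def suppress_bursts(returns, q=0.01):
          """returns: (n_days, n_stocks). Clip each stock's returns at its own
          1st/99th percentiles, leaving cross-stock scale differences intact."""
          lo = np.quantile(returns, q, axis=0)
          hi = np.quantile(returns, 1 - q, axis=0)
          return np.clip(returns, lo, hi)

      rng = np.random.default_rng(5)
      r = rng.standard_t(df=3, size=(500, 29)) * 0.02   # heavy-tailed daily returns, 29 stocks
      print(np.abs(suppress_bursts(r)).max() <= np.abs(r).max())  # extremes are capped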
  16. By: Roman Valovic (Department of Informatics, Faculty of Business and Economics, Mendel University in Brno, Zemedelska 1, 613 00 Brno, Czech Republic); Daniel Pastorek (Department of Finance, Faculty of Business and Economics, Mendel University in Brno, Zemedelska 1, 613 00 Brno, Czech Republic)
    Abstract: In this paper, we subject the methodology for newspaper-based indices to several tests of robustness, to address the potential problems of this proposed approach. Firstly, we examine the strong dependency between the selected keywords and the entered query. We do this using state-of-the-art language models, such as BERT, to automatically select relevant articles to build the index. Secondly, we propose that the weighting of articles partly allows for the control of the context of the articles and potential errors in the incorrect identification of articles, which leads to more stable index results. Finally, we track composition changes in newspaper articles, which have been evolving over time. The implications of these tests may be of interest to the users of these indices as well as suggesting a future direction for this approach.
    Keywords: newspapers, economic-policy uncertainty, EPU index, NLP, text-mining, similarity search
    JEL: C80 D80 E66
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:men:wpaper:89_2023&r=big
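    One possible implementation of the model-based article selection step, using a sentence-embedding model to keep articles semantically close to an uncertainty query; the model name, query wording, and cutoff are assumptions rather than the paper's configuration:

      from sentence_transformers import SentenceTransformer, util

      model = SentenceTransformer("all-MiniLM-L6-v2")
      query = "uncertainty about government economic policy, regulation, taxes"
      articles = [
          "Lawmakers remain split on the tax bill, clouding the outlook for firms.",
          "The local team won its third straight match on Saturday.",
      ]
      q_emb = model.encode(query, convert_to_tensor=True)
      a_emb = model.encode(articles, convert_to_tensor=True)
      scores = util.cos_sim(q_emb, a_emb)[0]                        # cosine similarity to the query
      relevant = [a for a, s in zip(articles, scores) if s > 0.3]   # assumed cutoff
      print(relevant)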
  17. By: Dapeng Li; Feiyang Pan; Jia He; Zhiwei Xu; Dandan Tu; Guoliang Fan
    Abstract: In high-dimensional time-series analysis, it is essential to have a set of key factors (namely, the style factors) that explain the change of the observed variable. For example, volatility modeling in finance relies on a set of risk factors, and climate change studies in climatology rely on a set of causal factors. The ideal low-dimensional style factors should balance significance (high explanatory power) and stability (consistency, without significant fluctuations). However, previous supervised and unsupervised feature extraction methods can hardly address this tradeoff. In this paper, we propose Style Miner, a reinforcement learning method that generates style factors. We first formulate the problem as a Constrained Markov Decision Process with explanatory power as the return and stability as the constraint. Then, we design fine-grained immediate rewards and costs and use a Lagrangian heuristic to balance them adaptively. Experiments on real-world financial data sets show that Style Miner outperforms existing learning-based methods by a large margin and achieves a relative gain of about 10% in R-squared explanatory power compared to the industry-renowned factors proposed by human experts.
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2303.11716&r=big
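    A schematic of the Lagrangian heuristic described above: the scalarized reward trades explanatory power against an instability cost, and the multiplier is raised by dual ascent whenever average instability exceeds its budget. All quantities here are illustrative placeholders:

      def lagrangian_reward(explanatory_gain, instability, lam):
          # Scalarize the constrained objective: return minus lambda-weighted cost.
          return explanatory_gain - lam * instability

      def dual_ascent(lam, instability, budget, step=0.01):
          """Increase lambda when instability exceeds the allowed budget, never below 0."""
          return max(0.0, lam + step * (instability - budget))

      lam = 1.0
      for episode_instability, episode_gain in [(0.3, 0.8), (0.5, 0.9), (0.1, 0.7)]:
          r = lagrangian_reward(episode_gain, episode_instability, lam)
          lam = dual_ascent(lam, episode_instability, budget=0.2)
          print(round(r, 3), round(lam, 3))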
  18. By: Xuefan, Pan (University of Warwick)
    Abstract: I conduct content analysis and extend the existing models for analysing the reaction of the stock market and foreign currency markets to the release of Federal Open Market Committee (FOMC) statements and meeting minutes. The tone changes and uncertainty level of the monetary policy communication are constructed using a dictionary-based word-count approach at the whole-document level. I further apply the Latent Dirichlet Allocation (LDA) algorithm to investigate the differing impacts of topics in the meeting minutes. High-frequency data is used, as the analysis is an event study. I find that the tone change and uncertainty level have limited explanatory power for the magnitude of the effect of the release of FOMC documents, especially statements, on the financial market. The communication from the FOMC is more informative for the market during the zero lower bound period than over the whole sample period.
    Keywords: Monetary policy; Communication; Text Mining
    JEL: E52 E58
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:wrk:wrkesp:43&r=big
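    A minimal dictionary-based word-count sketch for document-level tone and uncertainty; the word lists are tiny placeholders, not the dictionaries used in the paper:

      HAWKISH = {"tighten", "raise", "inflationary", "restrictive"}
      DOVISH = {"accommodative", "lower", "easing", "stimulus"}
      UNCERTAIN = {"uncertain", "uncertainty", "unclear", "risks"}

      def score(text):
          words = [w.strip(".,;") for w in text.lower().split()]
          n = len(words)
          hawk = sum(w in HAWKISH for w in words)
          dove = sum(w in DOVISH for w in words)
          unc = sum(w in UNCERTAIN for w in words)
          # Net tone and uncertainty as shares of total words in the document.
          return {"tone": (hawk - dove) / n, "uncertainty": unc / n}

      print(score("The Committee judges that risks remain and may raise rates."))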
  19. By: Raghuram Rajan; Pietro Ramella; Luigi Zingales
    Abstract: Using natural language processing, we identify corporate goals stated in the shareholder letters of the 150 largest companies in the United States from 1955 to 2020. Corporate goals have proliferated, from fewer than one on average in 1955 to more than seven in 2020. While in 1955 profit maximization, market share growth, and customer service were the dominant goals, today almost all companies proclaim social and environmental goals as well. We examine why firms announce goals and when. We find goal announcements are associated with management’s responses to the firm’s (possibly changed) circumstances, with the changing power and preferences of key constituencies, as well as with management’s attempts to deflect scrutiny. While executive compensation is still overwhelmingly based on financial performance, we do observe a rise in bonus payments contingent on meeting social and environmental objectives. Firms that announce environmental and social goals tend to implement programs intended to achieve those goals, although their impact on outcomes is unclear. The evidence is consistent with firms focusing on shareholder interests while incorporating stakeholder interests as interim goals. Goals also do seem to be announced opportunistically to deflect attention and alleviate pressure on management.
    JEL: G30 L21
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:31054&r=big
  20. By: George M. Constantinides; Maurizio Montone; Valerio Potì; Stella Spilioti
    Abstract: Previous research finds correlation between sentiment and future economic growth, but disagrees on the channel that explains this result. In this paper, we shed new light on this issue by exploiting cross-country variation in sentiment and market efficiency. We find that sentiment shocks in G7 countries increase economic activity, but only temporarily and without affecting productivity. By contrast, sentiment shocks in non-G7 countries predict prolonged economic growth and a corresponding increase in productivity. The results suggest that sentiment can indeed create economic booms, but only in less advanced economies where noisy asset prices make sentiment and fundamentals harder to disentangle.
    JEL: G10 G30 F36 F43
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:31031&r=big
  21. By: Hengxi Zhang; Zhendong Shi; Yuanquan Hu; Wenbo Ding; Ercan E. Kuruoglu; Xiao-Ping Zhang
    Abstract: Due to the rapid dynamics and the mass of uncertainties in quantitative markets, how to take appropriate actions to make profits in stock trading remains a challenging issue. Reinforcement learning (RL), as a reward-oriented approach for optimal control, has emerged as a promising method to tackle this strategic decision-making problem in such a complex financial scenario. In this paper, we integrate two prior financial trading strategies, constant proportion portfolio insurance (CPPI) and time-invariant portfolio protection (TIPP), into the multi-agent deep deterministic policy gradient (MADDPG) framework and propose two specifically designed multi-agent RL (MARL) methods, CPPI-MADDPG and TIPP-MADDPG, for investigating strategic trading in quantitative markets. We then select 100 different shares in the real financial market to test these approaches. The experimental results show that the CPPI-MADDPG and TIPP-MADDPG approaches generally outperform the conventional ones.
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2303.11959&r=big
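    A sketch of the CPPI allocation rule that the CPPI-MADDPG agents build on: risky exposure is a multiple of the cushion of wealth above a protected floor. The multiplier and floor values are illustrative:

      def cppi_weights(wealth, floor, multiplier=3.0):
          """Return (risky, safe) allocation for one rebalancing step."""
          cushion = max(wealth - floor, 0.0)
          risky = min(multiplier * cushion, wealth)  # cap leverage at full wealth
          return risky, wealth - risky

      print(cppi_weights(wealth=100.0, floor=80.0))  # -> (60.0, 40.0)
      # TIPP differs by ratcheting the floor upward as wealth grows,
      # e.g. floor = max(floor, 0.8 * wealth) at each rebalancing date.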

This nep-big issue is ©2023 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.