nep-big 2022-09-26 papers

on Big Data

Issue of 2022‒09‒26
twenty-one papers chosen by
Tom Coupé
University of Canterbury

Using online vacancy and job applicants’ data to study skills dynamics By Bennett, Fidel; Escudero, Verónica; Liepmann, Hannah; Podjanin, Ana
Application of Convolutional Neural Networks with Quasi-Reversibility Method Results for Option Forecasting By Zheng Cao; Wenyu Du; Kirill V. Golubnichiy
Macroeconomic Predictions using Payments Data and Machine Learning By James T. E. Chapman; Ajit Desai
Deep Reinforcement Learning Approach for Trading Automation in The Stock Market By Taylan Kabbani; Ekrem Duman
Asset Allocation: From Markowitz to Deep Reinforcement Learning By Ricard Durall
Stock Performance Evaluation for Portfolio Design from Different Sectors of the Indian Stock Market By Jaydip Sen; Arpit Awad; Aaditya Raj; Gourav Ray; Pusparna Chakraborty; Sanket Das; Subhasmita Mishra
Using Online Vacancy and Job Applicants' Data to Study Skills Dynamics By Bennett, Fidel; Escudero, Veronica; Liepmann, Hannah; Podjanin, Ana
Big data forecasting of South African inflation By Byron Botha; Rulof Burger; Kevin Kotze; Neil Rankin; Daan Steenkamp
Index Tracking via Learning to Predict Market Sensitivities By Yoonsik Hong; Yanghoon Kim; Jeonghun Kim; Yongmin Choi
Machine Learning and the Implementable Efficient Frontier By Theis Ingerslev Jensen; Bryan T. Kelly; Semyon Malamud; Lasse Heje Pedersen
Efficient Market Hypothesis Test with Stock Tweets and Natural Language Processing Models By Bolin Mao; Chenhui Chu; Yuta Nakashima; Hajime Nagahara
How and When are High-Frequency Stock Returns Predictable? By Yacine Aït-Sahalia; Jianqing Fan; Lirong Xue; Yifeng Zhou
Does Environmental Policy Uncertainty Hinder Investments Towards a Low-Carbon Economy? By Joelle Noailly; Laura Nowzohour; Matthias van den Heuvel
Understanding intra-day price formation process by agent-based financial market simulation: calibrating the extended chiarella model By Kang Gao; Perukrishnen Vytelingum; Stephen Weston; Wayne Luk; Ce Guo
Urban Resilience and Social Security Uptake: New Zealand Evidence from the Global Financial Crisis and the COVID-19 Pandemic By Cochrane, William; Poot, Jacques; Roskruge, Matthew
Identifying Dominant Industrial Sectors in Market States of the S&P 500 Financial Data By Tobias Wand; Martin Heßler; Oliver Kamps
A Discussion of Discrimination and Fairness in Insurance Pricing By Mathias Lindholm; Ronald Richman; Andreas Tsanakas; Mario V. Wüthrich
Deep Weighted Monte Carlo: A hybrid option pricing framework using neural networks By Sándor Kunsági-Máté; Gábor Fáth; István Csabai; Gábor Molnár-Sáska
Next-Year Bankruptcy Prediction from Textual Data: Benchmark and Baselines By Henri Arno; Klaas Mulier; Joke Baeck; Thomas Demeester
What Purpose Do Corporations Purport? Evidence from Letters to Shareholders By Rajan, Raghuram G.; Ramella, Pietro; Zingales, Luigi
What Makes a Program Good? Evidence from Short-Cycle Higher Education Programs in Five Developing Countries By Lelys I. Dinarte Diaz; Maria Marta Ferreyra; Sergio S. Urzúa; Marina Bassi

Using online vacancy and job applicants’ data to study skills dynamics

By:	Bennett, Fidel; Escudero, Verónica; Liepmann, Hannah; Podjanin, Ana
Abstract:	This paper finds that big data on vacancies and applications to an online job board can be a promising data source for studying skills dynamics, especially in countries where alternative sources are scarce. To show this, we develop a skills taxonomy, assess the characteristics of such online data, and employ natural language processing and machine-learning techniques. The empirical implementation uses data from the Uruguayan job board BuscoJobs, but can be replicated with similar data from other countries.
Date:	2022
URL:	http://d.repec.org/n?u=RePEc:ilo:ilowps:995202692602676&r=

Application of Convolutional Neural Networks with Quasi-Reversibility Method Results for Option Forecasting

By:	Zheng Cao; Wenyu Du; Kirill V. Golubnichiy
Abstract:	This paper presents a novel way to apply mathematical finance and machine learning (ML) to forecast stock option prices. Following results from the paper Quasi-Reversibility Method and Neural Network Machine Learning to Solution of Black-Scholes Equations (which appeared in the AMS Contemporary Mathematics journal), we create and evaluate new empirical mathematical models for the Black-Scholes equation to analyze data for 92,846 companies. We solve the Black-Scholes (BS) equation forwards in time as an ill-posed inverse problem, using the Quasi-Reversibility Method (QRM), to predict the option price one day ahead. For each company, we have 13 elements including stock and option daily prices, volatility, minimizer, etc. Because the market is so complicated that no perfect model exists, we apply ML to train algorithms to make the best prediction. The current stage of research combines QRM with Convolutional Neural Networks (CNN), which learn information across a large number of data points simultaneously. We implement CNN to generate new results by validating and testing on sample market data. We test different ways of applying CNN and compare our CNN models with previous models to see if achieving a higher profit rate is possible.
Date:	2022–08
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2208.14385&r=
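
For orientation, the textbook form of the Black-Scholes equation that the abstract above refers to is given below. This is the standard statement, not necessarily the exact formulation used in the paper; the symbols V (option price), S (underlying price), sigma (volatility), and r (risk-free rate) are the usual conventions and are assumptions here.

```latex
\frac{\partial V}{\partial t}
  + \frac{1}{2}\sigma^{2} S^{2}\,\frac{\partial^{2} V}{\partial S^{2}}
  + r S\,\frac{\partial V}{\partial S}
  - r V = 0
```

The equation is normally solved backwards in time from the known payoff at expiry. Solving it forwards in time to forecast prices is what makes the problem ill-posed, which is why the abstract invokes a regularization technique such as QRM.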

Macroeconomic Predictions using Payments Data and Machine Learning

By:	James T. E. Chapman; Ajit Desai
Abstract:	Predicting the economy's short-term dynamics -- a vital input to economic agents' decision-making process -- often uses lagged indicators in linear models. This is typically sufficient during normal times but could prove inadequate during crisis periods. This paper aims to demonstrate that non-traditional and timely data such as retail and wholesale payments, with the aid of nonlinear machine learning approaches, can provide policymakers with sophisticated models to accurately estimate key macroeconomic indicators in near real-time. Moreover, we provide a set of econometric tools to mitigate overfitting and interpretability challenges in machine learning models to improve their effectiveness for policy use. Our models with payments data, nonlinear methods, and tailored cross-validation approaches help improve macroeconomic nowcasting accuracy by up to 40% -- with higher gains during the COVID-19 period. We observe that the contribution of payments data for economic predictions is small and linear during low and normal growth periods. However, the payments data contribution is large, asymmetrical, and nonlinear during strong negative or positive growth periods.
Date:	2022–09
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2209.00948&r=

Deep Reinforcement Learning Approach for Trading Automation in The Stock Market

By:	Taylan Kabbani; Ekrem Duman
Abstract:	Deep Reinforcement Learning (DRL) algorithms can scale to previously intractable problems. The automation of profit generation in the stock market is possible using DRL, by combining the financial asset price "prediction" step and the "allocation" step of the portfolio in one unified process to produce fully autonomous systems capable of interacting with their environment to make optimal decisions through trial and error. This work presents a DRL model to generate profitable trades in the stock market, effectively overcoming the limitations of supervised learning approaches. We formulate the trading problem as a Partially Observed Markov Decision Process (POMDP) model, considering the constraints imposed by the stock market, such as liquidity and transaction costs. We then solve the formulated POMDP problem using the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, reporting a 2.68 Sharpe Ratio on an unseen data set (test data). From the point of view of stock market forecasting and the intelligent decision-making mechanism, this paper demonstrates the superiority of DRL in financial markets over other types of machine learning and proves its credibility and advantages of strategic decision-making.
Date:	2022–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2208.07165&r=
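
The Sharpe ratio quoted in the abstract above is a risk-adjusted return measure. A minimal sketch of one common way to compute it from daily returns follows; the paper does not state its exact convention here, so the function name annualized_sharpe, the 252-day annualization, and the zero default risk-free rate are all assumptions for illustration.

```python
import numpy as np

def annualized_sharpe(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return over its sample standard deviation.

    Hypothetical helper; the annualization factor and risk-free rate are assumptions.
    """
    excess = np.asarray(daily_returns, dtype=float) - risk_free_daily
    # Scale by sqrt(periods) because the mean grows linearly and the std with the square root.
    return float(np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1))

# Made-up daily returns, for illustration only.
example_returns = [0.01, 0.02, 0.03]
sharpe_per_period = annualized_sharpe(example_returns, periods_per_year=1)
```

With periods_per_year=1 the function returns the plain per-period ratio, which makes the arithmetic easy to check by hand.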

Asset Allocation: From Markowitz to Deep Reinforcement Learning

By:	Ricard Durall
Abstract:	Asset allocation is an investment strategy that aims to balance risk and reward by constantly redistributing the portfolio's assets according to certain goals, risk tolerance, and investment horizon. Unfortunately, there is no simple formula that can find the right allocation for every individual. As a result, investors may use different asset allocation strategies to try to fulfil their financial objectives. In this work, we conduct an extensive benchmark study to determine the efficacy and reliability of a number of optimization techniques. In particular, we focus on traditional approaches based on Modern Portfolio Theory, and on machine-learning approaches based on deep reinforcement learning. We assess the model's performance under different market tendencies, i.e., both bullish and bearish markets. For reproducibility, we provide the code implementation in this repository.
Date:	2022–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2208.07158&r=

Stock Performance Evaluation for Portfolio Design from Different Sectors of the Indian Stock Market

By:	Jaydip Sen; Arpit Awad; Aaditya Raj; Gourav Ray; Pusparna Chakraborty; Sanket Das; Subhasmita Mishra
Abstract:	The stock market offers a platform where people buy and sell shares of publicly listed companies. Generally, stock prices are quite volatile; hence predicting them is a daunting task. Much research is still under way to improve the accuracy of stock price prediction. Portfolio construction refers to the optimal allocation of stocks from different sectors to achieve a maximum return while taking a minimum risk. A good portfolio can help investors earn maximum profit by taking a minimum risk. Beginning with Dow Jones Theory, much progress has been made in the area of building efficient portfolios. In this project, we have tried to predict the future value of a few stocks from six important sectors of the Indian economy and also built a portfolio. As part of the project, our team has conducted a study of the performance of various time series, machine learning, and deep learning models in stock price prediction on selected stocks from the chosen six important sectors of the economy. As part of building an efficient portfolio, we have studied multiple portfolio optimization theories beginning with Modern Portfolio Theory. We have built a minimum variance portfolio and an optimal risk portfolio for all six chosen sectors by using the daily stock prices over the past five years as training data and have also conducted backtesting to check the performance of the portfolio. We look forward to continuing our study in the area of stock price prediction and asset allocation and consider this project as the first stepping stone.
Date:	2022–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2208.07166&r=
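
To make the "minimum variance portfolio" mentioned in the abstract above concrete, here is a minimal sketch of the textbook closed-form solution, w = inv(C) 1 / (1' inv(C) 1), for a fully invested portfolio with short sales allowed. It is not the authors' implementation: the function name min_variance_weights and the randomly generated return data are assumptions for illustration only.

```python
import numpy as np

def min_variance_weights(cov):
    """Global minimum-variance weights for a fully invested portfolio (shorting allowed).

    Hypothetical helper implementing w = inv(C) 1 / (1' inv(C) 1).
    """
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    # Solve C x = 1 rather than forming the inverse explicitly.
    x = np.linalg.solve(cov, ones)
    return x / x.sum()

# Made-up daily returns for three stocks (rows are days), for illustration only.
rng = np.random.default_rng(0)
returns = rng.normal(0.0005, 0.01, size=(250, 3))
cov = np.cov(returns, rowvar=False)
weights = min_variance_weights(cov)
```

With uncorrelated assets the weights are proportional to the inverse variances, so variances of 1 and 4 give weights of 0.8 and 0.2. A long-only version would need a constrained optimizer instead of this closed form.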

Using Online Vacancy and Job Applicants' Data to Study Skills Dynamics

By:	Bennett, Fidel; Escudero, Veronica (ILO International Labour Organization); Liepmann, Hannah (ILO International Labour Organization); Podjanin, Ana (ILO International Labour Organization)
Abstract:	We assess whether online data on vacancies and applications to a job board are a suitable source for studying skills dynamics outside of Europe and the United States, where a rich literature has examined skills dynamics using online vacancy data. Yet, the knowledge on skills dynamics is scarce for other countries, irrespective of their level of development. We first propose a taxonomy that systematically aggregates three broad categories of skills – cognitive, socioemotional and manual – and fourteen commonly observed and recognizable skills sub-categories, which we define based on unique skills identified through keywords and expressions. Our aim is to develop a taxonomy that is comprehensive but succinct, suitable for the labour market realities of developing and emerging economies and adapted to online vacancies and applicants' data. Using machine-learning techniques, we then develop a methodology that allows implementing the skills taxonomy in online vacancy and applicants' data, thus capturing both the supply and the demand side. Implementing the methodology with Uruguayan data from the job board BuscoJobs, we assign skills to 64 per cent of applicants' employment spells and 94 per cent of vacancies. We consider this a successful implementation since the exploited text information often does not follow a standardized format. The advantage of our approach is its reliance on data that is currently available in many countries across the world, thereby allowing for country-specific analysis that does not need to assume that occupational skills bundles are the same across countries. To the best of our knowledge, we are the first to explore this approach in the context of emerging economies.
Keywords:	online data, job board, skills dynamics, skills taxonomy, natural language processing
JEL:	C81 J24 O33 O54
Date:	2022–08
URL:	http://d.repec.org/n?u=RePEc:iza:izadps:dp15506&r=

Big data forecasting of South African inflation

By:	Byron Botha (Codera Analytics); Rulof Burger (Department of Economics, University of Stellenbosch, Stellenbosch, 7601, South Africa.); Kevin Kotze (School of Economics, University of Cape Town); Neil Rankin (Predictive Insights, 3 Meson Street, Techno Park, Stellenbosch, 7600, South Africa.); Daan Steenkamp (Codera Analytics and Research Fellow, Department of Economics, Stellenbosch University.)
Abstract:	We investigate whether the use of statistical learning techniques and big data can enhance the accuracy of inflation forecasts. We make use of a large dataset for the disaggregated prices of consumption goods and services, which we partially reconstruct, and a large suite of different statistical learning and traditional time series models. We find that the statistical learning models are able to compete with most benchmarks over medium to longer horizons, despite the fact that we only have a relatively small sample of available data, but are usually inferior over shorter horizons. Our findings suggest that this result may be attributed to the ability of these models to make use of relevant information, when it is available, and may be particularly useful during periods of crisis, when deviations from the steady state are more persistent. We find that the accuracy of the central bank's near-term inflation forecasts compares favourably with that of other models, while the inclusion of off-model information, such as electricity tariff adjustments and other sources of within-month data, provides these models with a competitive advantage. Lastly, we also investigate the relative performance of the different models as we experienced the effects of the pandemic.
JEL:	C10 C11 C52 C55 E31
Date:	2022
URL:	http://d.repec.org/n?u=RePEc:ctn:dpaper:2022-03&r=

Index Tracking via Learning to Predict Market Sensitivities

By:	Yoonsik Hong; Yanghoon Kim; Jeonghun Kim; Yongmin Choi
Abstract:	A significant number of equity funds are preferred by index funds nowadays, and market sensitivities are instrumental in managing them. Index funds might replicate the index identically, which is, however, cost-ineffective and impractical. Moreover, to utilize market sensitivities to replicate the index partially, they must be predicted or estimated accurately. Accordingly, first, we examine deep learning models to predict market sensitivities. Also, we present pragmatic applications of data processing methods to aid training and generate target data for the prediction. Then, we propose a partial-index-tracking optimization model controlling the net predicted market sensitivities of the portfolios and index to be the same. These processes' efficacy is corroborated by the Korea Stock Price Index 200. Our experiments show a significant reduction of the prediction errors compared with historical estimations, and competitive tracking errors of replicating the index using fewer than half of the entire constituents. Therefore, we show that applying deep learning to predict market sensitivities is promising and that our portfolio construction methods are practically effective. Additionally, to our knowledge, this is the first study that addresses market sensitivities focused on deep learning.
Date:	2022–09
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2209.00780&r=

Machine Learning and the Implementable Efficient Frontier

By:	Theis Ingerslev Jensen (Copenhagen Business School); Bryan T. Kelly (Yale SOM; AQR Capital Management, LLC; National Bureau of Economic Research (NBER)); Semyon Malamud (Ecole Polytechnique Federale de Lausanne; Centre for Economic Policy Research (CEPR); Swiss Finance Institute); Lasse Heje Pedersen (AQR Capital Management, LLC; Copenhagen Business School - Department of Finance; New York University (NYU); Centre for Economic Policy Research (CEPR))
Abstract:	We propose that investment strategies should be evaluated based on their net-of-trading-cost return for each level of risk, which we term the "implementable efficient frontier." While numerous studies use machine learning return forecasts to generate portfolios, their agnosticism toward trading costs leads to excessive reliance on fleeting small-scale characteristics, resulting in poor net returns. We develop a framework that produces a superior frontier by integrating trading-cost-aware portfolio optimization with machine learning. The superior net-of-cost performance is achieved by learning directly about portfolio weights using an economic objective. Further, our model gives rise to a new measure of "economic feature importance."
Keywords:	asset pricing, machine learning, transaction costs, economic significance, investments
JEL:	C5 C61 G00 G11 G12
Date:	2022–08
URL:	http://d.repec.org/n?u=RePEc:chf:rpseri:rp2263&r=

Efficient Market Hypothesis Test with Stock Tweets and Natural Language Processing Models

By:	Bolin Mao (Kyoto Institute of Economic Research, Kyoto University); Chenhui Chu (Graduate School of Informatics, Kyoto University); Yuta Nakashima (Institute for Datability Science, Osaka University); Hajime Nagahara (Institute for Datability Science, Osaka University)
Abstract:	The efficient market hypothesis (EMH) plays a fundamental role in modern financial theory. Previous empirical studies have tested the weak and semi-strong forms of EMH with typical financial data, such as historical stock prices and annual earnings. However, few tests have been extended to include alternative data such as tweets. In this study, we use 1) two stock tweet datasets that have different features and 2) nine natural language processing (NLP)-based deep learning models to test the semi-strong form EMH in the United States stock market. None of our experimental results show that stock tweets with NLP-based models can prominently improve the daily stock price prediction accuracy compared with random guesses. Our experiment provides evidence that the semi-strong form of EMH holds in the United States stock market on a daily basis when considering stock tweet information with the NLP-based models.
Keywords:	Efficient Market Hypothesis Test, Daily Stock Price Prediction, Stock Tweet, Natural Language Processing
JEL:	C4 C5 G1
Date:	2022–09
URL:	http://d.repec.org/n?u=RePEc:kyo:wpaper:1082&r=

How and When are High-Frequency Stock Returns Predictable?

By:	Yacine Aït-Sahalia; Jianqing Fan; Lirong Xue; Yifeng Zhou
Abstract:	This paper studies the predictability of ultra high-frequency stock returns and durations to relevant price, volume and transactions events, using machine learning methods. We find that, contrary to low frequency and long horizon returns, where predictability is rare and inconsistent, predictability in high frequency returns and durations is large, systematic and pervasive over short horizons. We identify the relevant predictors constructed from trades and quotes data and examine what determines the variation in predictability across stocks' own characteristics and market environments. Next, we compute how the predictability improves with the timeliness of the data on a scale of milliseconds, providing a valuation of each millisecond gained. Finally, we simulate the impact of getting an (imperfect) peek at the incoming order flow, a look-ahead ability that is often attributed to the fastest high frequency traders, in terms of improving the predictability of the following returns and durations.
JEL:	C45 C53 C58 G12 G14 G17
Date:	2022–08
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:30366&r=

Does Environmental Policy Uncertainty Hinder Investments Towards a Low-Carbon Economy?

By:	Joelle Noailly; Laura Nowzohour; Matthias van den Heuvel
Abstract:	We use machine learning algorithms to construct a novel news-based index of US environmental and climate policy uncertainty (EnvPU) available on a monthly basis over the 1990-2019 period. We find that our EnvPU index spikes during the environmental spending disputes of the 1995-1996 government shutdown, in the early 2010s due to the failure of the national cap-and-trade climate bill, and during the Trump presidency. We examine how elevated levels of environmental policy uncertainty relate to investments in the low-carbon economy. In firm-level estimations, we find that a rise in the EnvPU index is associated with a reduced probability for cleantech startups to receive venture capital (VC) funding. In financial markets, a rise in our EnvPU index is associated with higher stock volatility for firms with above-average green revenue shares. At the macro level, shocks in our index lead to declines in the number of cleantech VC deals and higher volatility of the main benchmark clean energy exchange-traded fund. Overall, our results are consistent with the notion that policy uncertainty has adverse effects on investments for the low-carbon economy.
JEL:	C55 D81 E22 Q58
Date:	2022–08
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:30361&r=

Understanding intra-day price formation process by agent-based financial market simulation: calibrating the extended chiarella model

By:	Kang Gao; Perukrishnen Vytelingum; Stephen Weston; Wayne Luk; Ce Guo
Abstract:	This article presents XGB-Chiarella, a powerful new approach for deploying agent-based models to generate realistic intra-day artificial financial price data. This approach is based on agent-based models, calibrated by an XGBoost machine learning surrogate. Following the Extended Chiarella model, three types of trading agents are introduced in this agent-based model: fundamental traders, momentum traders, and noise traders. In particular, XGB-Chiarella focuses on configuring the simulation to accurately reflect real market behaviours. Instead of using the original Expectation-Maximisation algorithm for parameter estimation, the agent-based Extended Chiarella model is calibrated using an XGBoost machine learning surrogate. It is shown that the machine learning surrogate learned in the proposed method is an accurate proxy of the true agent-based market simulation. The proposed calibration method is superior to the original Expectation-Maximisation parameter estimation in terms of the distance between historical and simulated stylised facts. With the same underlying model, the proposed methodology is capable of generating realistic price time series in various stocks listed at three different exchanges, which indicates the universality of the intra-day price formation process. For the time scale (minutes) chosen in this paper, one agent per category is shown to be sufficient to capture the intra-day price formation process. The proposed XGB-Chiarella approach provides insights that the price formation process is comprised of the interactions between momentum traders, fundamental traders, and noise traders. It can also be used to enhance risk management by practitioners.
Date:	2022–08
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2208.14207&r=

Urban Resilience and Social Security Uptake: New Zealand Evidence from the Global Financial Crisis and the COVID-19 Pandemic

By:	Cochrane, William (University of Waikato); Poot, Jacques (University of Waikato); Roskruge, Matthew (Massey University)
Abstract:	This paper focuses on the spatial variation in the uptake of social security benefits following a large and detrimental exogenous shock. Specifically, we focus on the Global Financial Crisis (GFC) and the onset of the COVID-19 pandemic. We construct a two-period panel of 66 Territorial Authorities (TAs) of New Zealand (NZ) observed in 2008-09 and 2020-21. We find that, despite the totally different nature of the two shocks, the initial increase in benefit uptake due to the COVID-19 pandemic was of a similar magnitude as that of the GFC, and the spatial pattern was also quite similar. We link the social security data with 146 indicator variables across 15 domains that were obtained from population censuses that were held two years before each of the two periods. To identify urban characteristics that point to economic resilience, we formulate spatial panel regression models. Additionally, we use machine learning techniques. We find that the most resilient TAs had two years previously: (1) a low unemployment rate; and (2) a large public sector. Additionally, but with less predictive power, we find that TAs had a smaller increase in social security uptake after the shock when they had previously: (3) a high employment rate (or high female labour force participation rate); (4) a smaller proportion of the population stating ethnicities other than NZ European; (5) a smaller proportion of the population living in more deprived area units. We also find that interregional spillovers matter and that resilient regions cluster.
Keywords:	urban economic resilience, social security, Global Financial Crisis, COVID-19, panel data, model selection, spatial econometrics, machine learning
JEL:	C45 C52 H53 R23
Date:	2022–08
URL:	http://d.repec.org/n?u=RePEc:iza:izadps:dp15510&r=

Identifying Dominant Industrial Sectors in Market States of the S&P 500 Financial Data

By:	Tobias Wand; Martin Heßler; Oliver Kamps
Abstract:	Understanding and forecasting changing market conditions in complex economic systems like the financial market is of great importance to various stakeholders such as financial institutions and regulatory agencies. Based on the finding that the dynamics of sector correlation matrices of the S&P 500 stock market can be described by a sequence of distinct states via a clustering algorithm, we try to identify the industrial sectors dominating the correlation structure of each state. For this purpose, we use a method from Explainable Artificial Intelligence (XAI) on daily S&P 500 stock market data from 1992 to 2012 to assign relevance scores to every feature of each data point. To compare the significance of the features for the entire data set, we develop an aggregation procedure and apply a Bayesian change point analysis to identify the most significant sector correlations. We show that the correlation matrix of each state is dominated by only a few sector correlations. The energy and IT sectors in particular are identified as key factors in determining the state of the economy. Additionally, we show that a reduced surrogate model, using only the eight sector correlations with the highest XAI-relevance, can replicate 90% of the cluster assignments. In general, our findings imply an additional dimension reduction of the dynamics of the financial market.
Date:	2022–08
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2208.14106&r=

A Discussion of Discrimination and Fairness in Insurance Pricing

By:	Mathias Lindholm; Ronald Richman; Andreas Tsanakas; Mario V. Wüthrich
Abstract:	Indirect discrimination is an issue of major concern in algorithmic models. This is particularly the case in insurance pricing where protected policyholder characteristics are not allowed to be used for insurance pricing. Simply disregarding protected policyholder information is not an appropriate solution because this still allows for the possibility of inferring the protected characteristics from the non-protected ones. This leads to so-called proxy or indirect discrimination. Though proxy discrimination is qualitatively different from the group fairness concepts in machine learning, these group fairness concepts are proposed to 'smooth out' the impact of protected characteristics in the calculation of insurance prices. The purpose of this note is to share some thoughts about group fairness concepts in the light of insurance pricing and to discuss their implications. We present a statistical model that is free of proxy discrimination, thus, unproblematic from an insurance pricing point of view. However, we find that the canonical price in this statistical model does not satisfy any of the three most popular group fairness axioms. This seems puzzling and we welcome feedback on our example and on the usefulness of these group fairness axioms for non-discriminatory insurance pricing.
Date:	2022–09
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2209.00858&r=

Deep Weighted Monte Carlo: A hybrid option pricing framework using neural networks

By:	Sándor Kunsági-Máté; Gábor Fáth; István Csabai; Gábor Molnár-Sáska
Abstract:	Recent studies have demonstrated the efficiency of Variational Autoencoders (VAE) to compress high-dimensional implied volatility surfaces. The encoder part of the VAE plays the role of a calibration operation which maps the vol surface into a low dimensional latent space representing the most relevant implicit model parameters. The decoder part of the VAE performs a pricing operation and reconstructs the vol surface from the latent (model) space. Since this decoder module predicts volatilities of vanilla options directly, it does not provide any explicit information about the dynamics of the underlying asset. It is unclear how the latent model could be used to price exotic, non-vanilla options. In this paper we demonstrate an effective way to overcome this problem. We use a Weighted Monte Carlo approach to first generate paths from a simple a priori Brownian dynamics, and then calculate path weights to price options correctly. We develop and successfully train a neural network that is able to assign these weights directly from the latent space. Combining the encoder network of the VAE and this new "weight assigner" module we are able to build a dynamic pricing framework which cleanses the volatility surface from irrelevant noise fluctuations, and then can price not just vanillas, but also exotic options on this idealized vol surface. This pricing method can provide relative value signals for option traders as well.
Date:	2022–08
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2208.14038&r=
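
The weighted Monte Carlo step described in the abstract above (simulate paths from simple Brownian dynamics, then price with per-path weights) can be sketched as follows. This is a simplified illustration, not the authors' framework: it assumes zero interest rates, simulates only terminal prices under a driftless geometric Brownian motion, and uses uniform placeholder weights where the paper would use weights produced by its neural network. The function names simulate_terminal_prices and weighted_mc_price are hypothetical.

```python
import numpy as np

def simulate_terminal_prices(s0, sigma, maturity, n_paths, seed=0):
    """Terminal prices under driftless geometric Brownian motion (zero rates assumed)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    return s0 * np.exp(-0.5 * sigma**2 * maturity + sigma * np.sqrt(maturity) * z)

def weighted_mc_price(payoffs, weights):
    """Option price as a weighted average of path payoffs; weights are normalized to sum to 1."""
    payoffs = np.asarray(payoffs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return float(np.dot(weights, payoffs))

# A priori paths and call payoffs at strike 100 (illustrative parameters).
s_T = simulate_terminal_prices(s0=100.0, sigma=0.2, maturity=1.0, n_paths=10_000)
call_payoffs = np.maximum(s_T - 100.0, 0.0)

# Placeholder: uniform weights reduce the estimator to ordinary Monte Carlo.
uniform_weights = np.ones_like(call_payoffs)
price_uniform = weighted_mc_price(call_payoffs, uniform_weights)
```

Replacing uniform_weights with calibrated weights changes the implied distribution without re-simulating paths, which is the property that lets the same path set price many different payoffs.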

Next-Year Bankruptcy Prediction from Textual Data: Benchmark and Baselines

By:	Henri Arno; Klaas Mulier; Joke Baeck; Thomas Demeester
Abstract:	Models for bankruptcy prediction are useful in several real-world scenarios, and multiple research contributions have been devoted to the task, based on structured (numerical) as well as unstructured (textual) data. However, the lack of a common benchmark dataset and evaluation strategy impedes the objective comparison between models. This paper introduces such a benchmark for the unstructured data scenario, based on novel and established datasets, in order to stimulate further research into the task. We describe and evaluate several classical and neural baseline models, and discuss benefits and flaws of different strategies. In particular, we find that a lightweight bag-of-words model based on static in-domain word representations obtains surprisingly good results, especially when taking textual data from several years into account. These results are critically assessed, and discussed in light of particular aspects of the data and the task. All code to replicate the data and experimental results will be released.
Date:	2022–08
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2208.11334&r=
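
As a rough illustration of what a bag-of-words representation built on static word vectors might look like, the sketch below averages the vectors of the known tokens in a document and joins the per-year vectors of one firm. It is an assumption-laden toy, not the authors' model: the function names, the tiny two-dimensional embedding table and the choice to concatenate years are all hypothetical, and the abstract does not say how the several years of text are combined.

```python
from collections import Counter

def bag_of_words_vector(tokens, embeddings):
    """Average the static vectors of in-vocabulary tokens, weighted by count."""
    counts = Counter(t for t in tokens if t in embeddings)
    total = sum(counts.values())
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    if total == 0:
        return vec  # no known tokens: fall back to the zero vector
    for tok, c in counts.items():
        for k, x in enumerate(embeddings[tok]):
            vec[k] += c * x / total
    return vec

def multi_year_features(reports_by_year, embeddings):
    """Concatenate per-year document vectors, oldest year first (an assumption)."""
    feats = []
    for year in sorted(reports_by_year):
        feats.extend(bag_of_words_vector(reports_by_year[year], embeddings))
    return feats

# Hypothetical toy embedding table; real in-domain vectors would be learned.
toy_embeddings = {
    "loss": [1.0, 0.0],
    "growth": [0.0, 1.0],
    "debt": [1.0, 1.0],
}

single = bag_of_words_vector(["loss", "loss", "growth", "the"], toy_embeddings)
stacked = multi_year_features(
    {2021: ["debt", "loss"], 2020: ["growth"]}, toy_embeddings
)
```

The resulting feature vector could then be passed to any standard classifier; that step is left out because the abstract does not specify it.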

What Purpose Do Corporations Purport? Evidence from Letters to Shareholders

By:	Rajan, Raghuram G.; Ramella, Pietro; Zingales, Luigi
Abstract:	Using natural language processing, we identify and categorize the corporate goals in the shareholder letters of the 150 largest companies in the United States, from 1955 to 2020. Corporate goals have proliferated during this period from an average of two in 1955 to almost 10 in 2020. We find a variety of factors are associated with a corporation stating a specific goal including advertising a firm's strengths, promising improved performance, signaling a commitment to specific constituencies, building societal legitimacy, and conforming to the behavior of other corporations. In spite of the proliferation of corporate goals, executive compensation is still overwhelmingly based on shareholder value, as measured by stock prices and financial performance. Yet, we do observe a rise in bonus payments made contingent on social and environmental objectives, especially among the signatories of the 2019 Business Roundtable statement on corporate purpose.
Date:	2022
URL:	http://d.repec.org/n?u=RePEc:zbw:cbscwp:314&r=

What Makes a Program Good? Evidence from Short-Cycle Higher Education Programs in Five Developing Countries

By:	Lelys I. Dinarte Diaz; Maria Marta Ferreyra; Sergio S. Urzúa; Marina Bassi
Abstract:	Short-cycle higher education programs (SCPs) can play a central role in skill development and higher education expansion, yet their quality varies greatly within and among countries. In this paper we explore the relationship between programs’ practices and inputs (quality determinants) and student academic and labor market outcomes. We design and conduct a novel survey to collect program-level information on quality determinants and average outcomes for Brazil, Colombia, Dominican Republic, Ecuador, and Peru. Categories of quality determinants include training and curriculum, infrastructure, faculty, link with productive sector, costs and funding, and other practices on student admission and institutional governance. We also collect administrative, student-level data on higher education and formal employment for SCP students in Brazil and Ecuador and match it to survey data. Using machine learning methods, we select the quality determinants that predict outcomes at the program and student levels. Estimates indicate that some quality determinants may favor academic and labor market outcomes while others may hinder them. Two practices predict improvements in all labor market outcomes in Brazil and Ecuador—teaching numerical competencies and providing job market information—and one practice—teaching numerical competencies—additionally predicts improvements in labor market outcomes for all survey countries. Since quality determinants account for 20-40 percent of the explained variation in student-level outcomes, estimates indicate a role for quality determinants to shrink the quality gap among programs. These findings have implications for the design and replication of high-quality SCPs, their regulation, and the development of information systems.
JEL:	I2 J24
Date:	2022–08
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:30364&r=

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.