nep-big 2022-11-28 papers

on Big Data

Issue of 2022‒11‒28
thirty-one papers chosen by
Tom Coupé
University of Canterbury

Gender, Sex, and the Constraints of Machine Learning Methods By Lockhart, Jeffrey W
Predicting Politicians Misconduct: Evidence From Colombia By Gallego, J; Prem, M; Vargas, J. F.
Predicting Politicians' Misconduct: Evidence from Colombia By Gallego, Jorge; Prem, Mounu; Vargas, Juan F.
Modeling Machine Learning By Andrew Caplin; Daniel J. Martin; Philip Marx
Mapping the Knowledge Space: Exploiting Unassisted Machine Learning Tools By Florenta Teodoridis; Jino Lu; Jeffrey L. Furman
Recovering Missing Firm Characteristics with Attention-Based Machine Learning By Beckmeyer, Heiner; Wiedemann, Timo
Artificial Intelligence, the Evolution of the Healthcare Value Chain, and the Future of the Physician By David Dranove; Craig Garthwaite
Measuring the environmental impacts of artificial intelligence compute and applications: The AI footprint By OECD
A Multivariate Analysis of Technology and Education in the 21st Century: Antecedents and Determinants By Rocque, Sarvesh Raj
Asymptotic expansion and deep neural networks overcome the curse of dimensionality in the numerical approximation of Kolmogorov partial differential equations with nonlinear coefficients By Akihiko Takahashi; Toshihiro Yamada
Rating Triggers for Collateral-Inclusive XVA via Machine Learning and SDEs on Lie Groups By Kevin Kamm; Michelle Muniz
Reservoir Computing for Macroeconomic Forecasting with Mixed Frequency Data By Giovanni Ballarin; Petros Dellaportas; Lyudmila Grigoryeva; Marcel Hirt; Sophie van Huellen; Juan-Pablo Ortega
Twitter and Crime: The Effect of Social Movements on Gender-Based Violence By Michele Battisti; Ilpo Kauppinen; Britta Rude
Deep neural network expressivity for optimal stopping problems By Lukas Gonon
State-dependent asset allocation using neural networks By Bradrania, Reza; Pirayesh Neghab, Davood
Estimation of Heterogeneous Treatment Effects Using a Conditional Moment Based Approach By Xiaolin Sun
AI, Skill, and Productivity: The Case of Taxi Drivers By Kyogo Kanazawa; Daiji Kawaguchi; Hitoshi Shigeoka; Yasutora Watanabe
Newton Raphson Emulation Network for Highly Efficient Computation of Numerous Implied Volatilities By Geon Lee; Tae-Kyoung Kim; Hyun-Gyoon Kim; Jeonggyu Huh
The Proof is in the Pudding. Revealing the SDGs with Artificial Intelligence By Jean-Baptiste JACOUTON; Régis MARODON; Adeline LAULANIE
Multiresolution Signal Processing of Financial Market Objects By Ioana Boier
Supply Chain Characteristics as Predictors of Cyber Risk: A Machine-Learning Assessment By Kevin Hu; Retsef Levi; Raphael Yahalom; El Ghali Zerhouni
Incorporating Interactive Facts for Stock Selection via Neural Recursive ODEs By Qiang Gao; Xinzhu Zhou; Kunpeng Zhang; Li Huang; Siyuan Liu; Fan Zhou
Predicting the State of Synchronization of Financial Time Series using Cross Recurrence Plots By Mostafa Shabani; Martin Magris; George Tzagkarakis; Juho Kanniainen; Alexandros Iosifidis
Intergenerational Mobility in the Land of Inequality By Diogo G. C. Britto; Alexandre Fonseca; Paolo Pinotti; Breno Sampaio; Lucas Warwar
Market proximity, resilience, and food security: A cross-country empirical analysis By Alessandra Garbero; Tulia Gattone; Marco Letta; Pierluigi Montalbano
The Heterogeneous Response of Real Estate Asset Prices to a Global Shock By Heinger, Sandro; Koeniger, Winfried; Lechner, Michael
What’s that noise? Analysing sentiment-based variation in central bank communication By Bernd Hayo; Johannes Zahner
The Anatomy of Out-of-Sample Forecasting Accuracy By Daniel Borup; Philippe Goulet Coulombe; Erik Christian Montes Schütte; David E. Rapach; Sander Schwenk-Nebbe
Blowing against the Wind? A Narrative Approach to Central Bank Foreign Exchange Intervention By Naef, Alain
Measuring Gender Differences in Personalities through Natural Language in the Labor Force: Application of the 5-Factor Model By Dania Eugenidis; David Lenz
Analyzing the commentator network within the French YouTube environment By Kurt Maxwell Kusterer; Sylvain Mignot; Annick Vignes

Gender, Sex, and the Constraints of Machine Learning Methods

By:	Lockhart, Jeffrey W (University of Chicago)
Abstract:	Machine learning interacts with gender and sex in myriad ways, intentionally, unintentionally, and sometimes even against practitioners' concerted efforts. Some of these interactions are born out of the allure of a seemingly simple, unambiguous, binary variable ideally aligned with the technical needs and sensibilities of ML. Most of the time, gender lurks in ML systems without any explicit invitation, simply because these systems mine data for associations, and gendered associations are ubiquitous. And in a growing body of work, scholars are using ML to actively interrogate gender and sexuality, in turn shaping what they mean and how we think about them. Machine learning brings with it new paradigms of quantitative reasoning which hold the potential to either reinscribe or revolutionize gender in not only technical systems, but scientific knowledge as well. Throughout, the key is for people in and around machine learning to pay close attention to what the technology is actually doing with gender and sex.
Date:	2022–11–03
URL:	http://d.repec.org/n?u=RePEc:osf:socarx:zj468&r=big

Predicting Politicians Misconduct: Evidence From Colombia

By:	Gallego, J; Prem, M; Vargas, J. F.
Keywords:	Prediction, Corruption, Machine Learning, Colombia
Date:	2022–10–18
URL:	http://d.repec.org/n?u=RePEc:col:000092:020504&r=big

Predicting Politicians' Misconduct: Evidence from Colombia

By:	Gallego, Jorge; Prem, Mounu; Vargas, Juan F.
Abstract:	Corruption has pervasive effects on economic development and the well-being of the population. Despite being crucial and necessary, fighting corruption is not an easy task because it is a difficult phenomenon to measure and detect. However, recent advances in the field of artificial intelligence may help in this quest. In this article, we propose the use of machine learning models to predict municipality-level corruption in a developing country. Using data from disciplinary prosecutions conducted by an anti-corruption agency in Colombia, we trained four canonical models (Random Forests, Gradient Boosting Machine, Lasso, and Neural Networks), and ensemble their predictions, to predict whether or not a mayor will commit acts of corruption. Our models achieve acceptable levels of performance, based on metrics such as the precision and the area under the ROC curve, demonstrating that these tools are useful in predicting where misbehavior is most likely to occur. Moreover, our feature-importance analysis shows us which groups of variables are most important upon predicting corruption.
Date:	2022–10–18
URL:	http://d.repec.org/n?u=RePEc:osf:socarx:5dp8t&r=big

Modeling Machine Learning

By:	Andrew Caplin; Daniel J. Martin; Philip Marx
Abstract:	What do machines learn, and why? To answer these questions we import models of human cognition into machine learning. We propose two ways of modeling machine learners based on this join: feasibility-based and cost-based machine learning. We evaluate and estimate our models using a deep learning convolutional neural network that predicts pneumonia from chest X-rays. We find these predictions are consistent with our model of cost-based machine learning, and we recover the algorithm's implied costs of learning.
JEL:	C0 D80
Date:	2022–10
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:30600&r=big

Mapping the Knowledge Space: Exploiting Unassisted Machine Learning Tools

By:	Florenta Teodoridis; Jino Lu; Jeffrey L. Furman
Abstract:	Understanding factors affecting the direction of innovation is a central aim of research in the economics of innovation. Progress on this topic has been inhibited by difficulties in measuring distance and movement in knowledge space. We describe a methodology that infers the mapping of the knowledge landscape based on text documents. The approach is based on an unassisted machine learning technique, Hierarchical Dirichlet Process (HDP), which flexibly identifies patterns in text corpora. The resulting mapping of the knowledge landscape enables calculations of distance and movement, measures that are valuable in several contexts for research in innovation. We benchmark and demonstrate the benefits of this approach in the context of 44 years of USPTO data.
JEL:	C55 C80 O3 O31 O32
Date:	2022–10
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:30603&r=big

Recovering Missing Firm Characteristics with Attention-Based Machine Learning

By:	Beckmeyer, Heiner; Wiedemann, Timo
JEL:	G10
Date:	2022
URL:	http://d.repec.org/n?u=RePEc:zbw:vfsc22:264135&r=big

Artificial Intelligence, the Evolution of the Healthcare Value Chain, and the Future of the Physician

By:	David Dranove; Craig Garthwaite
Abstract:	Artificial intelligence (AI) is transforming production across all sectors of the economy, with the potential to both complement and substitute for traditional labor inputs. Healthcare is no exception. Dozens of recent academic studies demonstrate that AI can contribute to the healthcare value chain, by improving both diagnostic accuracy and treatment recommendations. In these ways, AI may either complement or substitute for physicians. We argue that AI represents the culmination of decades of efforts to enhance medical decision making. Using an historical lens that considers long-standing institutional features of healthcare markets, we identify numerous obstacles to the implementation of AI in medical care, and identify which specialties are most at risk for substitution by AI.
JEL:	I11 I19 O32 O38
Date:	2022–10
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:30607&r=big

Measuring the environmental impacts of artificial intelligence compute and applications: The AI footprint

By:	OECD
Abstract:	Artificial intelligence (AI) systems can use massive computational resources, raising sustainability concerns. This report aims to improve understanding of the environmental impacts of AI, and help measure and decrease AI’s negative effects while enabling it to accelerate action for the good of the planet. It distinguishes between the direct environmental impacts of developing, using and disposing of AI systems and related equipment, and the indirect costs and benefits of using AI applications. It recommends the establishment of measurement standards, expanding data collection, identifying AI-specific impacts, looking beyond operational energy use and emissions, and improving transparency and equity to help policy makers make AI part of the solution to sustainability challenges.
Date:	2022–11–15
URL:	http://d.repec.org/n?u=RePEc:oec:stiaab:341-en&r=big

A Multivariate Analysis of Technology and Education in the 21st Century: Antecedents and Determinants

By:	Rocque, Sarvesh Raj
Abstract:	Globally, educational systems are undergoing a restructuring in which emerging technologies and information sciences will play a significant role. Education will undergo the most significant changes in over a century as a result of new technology and mobile devices with cutting-edge capabilities. As emerging technologies continue to advance, mobile learning methods are becoming increasingly popular. Literature reviews conducted by the author indicate that as this field of study continues to develop, more and more researchers are investigating how technology impacts learning, how it influences teaching methods, and how teachers are evaluated. Further, this paper examines the educational benefits of using technology to facilitate independent learning. Technology is helping to facilitate a fundamental rethinking of what should be taught and how it should be taught rather than serving as an adjunct to learning and teaching. An undertaking of this magnitude represents both an exciting opportunity and a serious responsibility. In an effort to meet this challenge, this article examines several key antecedents and determinants associated with education and technology. A number of terms have been integrated into the educational field of vision in recent years, including portal connectivity, artificial intelligence, big data, machine learning, mobile technologies, and intelligent learning patterns. Consequently, society and education have undergone unprecedented changes. Consequently, the use of technology in education is likely to follow a hockey stick pattern, according to the author's research. In simple terms, in light of the rapid development of technology in the field of education, the way in which knowledge is delivered and the capability to learn new things will undergo a great deal of change.
Keywords:	Technology, Information Technology, Integrated learning, Technology and Education, ICT- Enabled Education
JEL:	I21 I25 Q55
Date:	2022–10–16
URL:	http://d.repec.org/n?u=RePEc:pra:mprapa:115239&r=big

Asymptotic expansion and deep neural networks overcome the curse of dimensionality in the numerical approximation of Kolmogorov partial differential equations with nonlinear coefficients

By:	Akihiko Takahashi (University of Tokyo); Toshihiro Yamada (Hitotsubashi University, Japan Science and Technology Agency (JST))
Abstract:	This paper proposes a new spatial approximation method without the curse of dimensionality for solving high-dimensional partial differential equations (PDEs) by using an asymptotic expansion method with a deep learning-based algorithm. In particular, the mathematical justification of the spatial approximation is provided, and a numerical example for a 100-dimensional Kolmogorov PDE shows the effectiveness of our method.
Date:	2022–11
URL:	http://d.repec.org/n?u=RePEc:cfi:fseres:cf546&r=big

Rating Triggers for Collateral-Inclusive XVA via Machine Learning and SDEs on Lie Groups

By:	Kevin Kamm; Michelle Muniz
Abstract:	In this paper, we model the rating process of an entity by using a geometrical approach. We model rating transitions as an SDE on a Lie group. Specifically, we focus on calibrating the model to both historical data (rating transition matrices) and market data (CDS quotes) and compare the most popular choices of changes of measure to switch from the historical probability to the risk-neutral one. For this, we show how the classical Girsanov theorem can be applied in the Lie group setting. Moreover, we overcome some of the imperfections of rating matrices published by rating agencies, which are computed with the cohort method, by using a novel Deep Learning approach. This leads to an improvement of the entire scheme and makes the model more robust for applications. We apply our model to compute bilateral credit and debit valuation adjustments of a netting set under a CSA with thresholds depending on ratings of the two parties.
Date:	2022–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2211.00326&r=big

Reservoir Computing for Macroeconomic Forecasting with Mixed Frequency Data

By:	Giovanni Ballarin; Petros Dellaportas; Lyudmila Grigoryeva; Marcel Hirt; Sophie van Huellen; Juan-Pablo Ortega
Abstract:	Macroeconomic forecasting has recently started embracing techniques that can deal with large-scale datasets and series with unequal release periods. The aim is to exploit the information contained in heterogeneous data sampled at different frequencies to improve forecasting exercises. Currently, MIxed-DAta Sampling (MIDAS) and Dynamic Factor Models (DFM) are the two main state-of-the-art approaches that allow modeling series with non-homogeneous frequencies. We introduce a new framework called the Multi-Frequency Echo State Network (MFESN), which originates from a relatively novel machine learning paradigm called reservoir computing (RC). Echo State Networks are recurrent neural networks with random weights and trainable readout. They are formulated as nonlinear state-space systems with random state coefficients where only the observation map is subject to estimation. This feature makes the estimation of MFESNs considerably more efficient than DFMs. In addition, the MFESN modeling framework allows the incorporation of many series, as opposed to MIDAS models, which are prone to the curse of dimensionality. Our discussion encompasses hyperparameter tuning, penalization, and nonlinear multistep forecast computation. In passing, a new DFM aggregation scheme with Almon exponential structure is also presented, bridging MIDAS and dynamic factor models. All methods are compared in extensive multistep forecasting exercises targeting US GDP growth. We find that our ESN models achieve comparable or better performance than MIDAS and DFMs at a much lower computational cost.
Date:	2022–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2211.00363&r=big
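
Editor's illustration: the abstract above describes Echo State Networks as recurrent networks whose state weights are random and fixed, with only the readout estimated. The sketch below is a single-frequency toy version of that idea in plain Python, not the authors' MFESN. The function names, the reservoir size, the weight scaling, the ridge penalty, and the sine-wave data are all assumptions made for illustration.

```python
import math
import random


def make_reservoir(n_units, n_inputs, scale=0.5, seed=0):
    """Draw fixed random input and recurrent weights (never trained)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-scale, scale) for _ in range(n_inputs)]
            for _ in range(n_units)]
    # Each recurrent row has absolute sum at most `scale` < 1, a simple
    # sufficient condition for the state map to be a contraction.
    w_res = [[rng.uniform(-scale, scale) / n_units for _ in range(n_units)]
             for _ in range(n_units)]
    return w_in, w_res


def run_reservoir(inputs, w_in, w_res):
    """Iterate the state equation x_t = tanh(W_in u_t + W_res x_{t-1})."""
    n = len(w_res)
    state = [0.0] * n
    states = []
    for u in inputs:
        state = [
            math.tanh(
                sum(w_in[i][k] * u[k] for k in range(len(u)))
                + sum(w_res[i][j] * state[j] for j in range(n))
            )
            for i in range(n)
        ]
        states.append(state)
    return states


def fit_ridge_readout(states, targets, ridge=1e-3):
    """Estimate the linear readout by ridge regression (normal equations)."""
    n = len(states[0])
    a = [[sum(s[i] * s[j] for s in states) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(s[i] * y for s, y in zip(states, targets)) for i in range(n)]
    # Gaussian elimination with partial pivoting, then back substitution.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (b[i] - sum(a[i][j] * w[j] for j in range(i + 1, n))) / a[i][i]
    return w


def predict(states, w):
    """Apply the trained readout to each reservoir state."""
    return [sum(wi * si for wi, si in zip(w, s)) for s in states]


# Toy one-step-ahead forecasting task on a sine wave (illustrative data only).
series = [math.sin(0.3 * t) for t in range(201)]
inputs = [[v] for v in series[:-1]]
targets = series[1:]

w_in, w_res = make_reservoir(n_units=10, n_inputs=1)
states = run_reservoir(inputs, w_in, w_res)
weights = fit_ridge_readout(states, targets)
fitted = predict(states, weights)

mse = sum((f - y) ** 2 for f, y in zip(fitted, targets)) / len(targets)
baseline = sum(y * y for y in targets) / len(targets)  # MSE of predicting zero
```

Only `fit_ridge_readout` involves estimation; the reservoir weights are drawn once and left untouched, which is the property the abstract credits for the low computational cost.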

Twitter and Crime: The Effect of Social Movements on Gender-Based Violence

By:	Michele Battisti; Ilpo Kauppinen; Britta Rude
Abstract:	This paper asks whether social movements taking place on Twitter affect gender-based violence (GBV). Using Twitter data and machine learning methods, we construct a novel data set on the prevalence of Twitter conversations about GBV. We then link this data to weekly crime reports at the federal state level from the United States. We exploit the high-frequency nature of our data and an event study design to establish a causal impact of Twitter social movements on GBV. Our results show that tweets related to GBV lead to a decrease in reported crime rates. The evidence shows that perpetrators commit these crimes less due to increased social pressure and perceived social costs. The results indicate that social media could significantly decrease reported GBV and might facilitate the signaling of social norms.
Keywords:	Economics of gender, US, domestic abuse, public policy, criminal law, illegal behavior and the enforcement of law
JEL:	J12 J16 J78 K14 K42 O51
Date:	2022
URL:	http://d.repec.org/n?u=RePEc:ces:ifowps:_381&r=big

Deep neural network expressivity for optimal stopping problems

By:	Lukas Gonon
Abstract:	This article studies deep neural network expression rates for optimal stopping problems of discrete-time Markov processes on high-dimensional state spaces. A general framework is established in which the value function and continuation value of an optimal stopping problem can be approximated with error at most $\varepsilon$ by a deep ReLU neural network of size at most $\kappa d^{\mathfrak{q}} \varepsilon^{-\mathfrak{r}}$. The constants $\kappa,\mathfrak{q},\mathfrak{r} \geq 0$ do not depend on the dimension $d$ of the state space or the approximation accuracy $\varepsilon$. This proves that deep neural networks do not suffer from the curse of dimensionality when employed to solve optimal stopping problems. The framework covers, for example, exponential Lévy models, discrete diffusion processes and their running minima and maxima. These results mathematically justify the use of deep neural networks for numerically solving optimal stopping problems and pricing American options in high dimensions.
Date:	2022–10
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2210.10443&r=big

State-dependent asset allocation using neural networks

By:	Bradrania, Reza; Pirayesh Neghab, Davood
Abstract:	Changes in market conditions present challenges for investors as they cause performance to deviate from the ranges predicted by long-term averages of means and covariances. The aim of conditional asset allocation strategies is to overcome this issue by adjusting portfolio allocations to hedge changes in the investment opportunity set. This paper proposes a new approach to conditional asset allocation that is based on machine learning; it analyzes historical market states and asset returns and identifies the optimal portfolio choice in a new period when new observations become available. In this approach, we directly relate state variables to portfolio weights, rather than firstly modeling the return distribution and subsequently estimating the portfolio choice. The method captures nonlinearity among the state (predicting) variables and portfolio weights without assuming any particular distribution of returns and other data, without fitting a model with a fixed number of predicting variables to data and without estimating any parameters. The empirical results for a portfolio of stock and bond indices show the proposed approach generates a more efficient outcome compared to traditional methods and is robust in using different objective functions across different sample periods.
Keywords:	asset allocation; portfolio optimization; market state; machine learning; neural networks; performance ratio
JEL:	C1 C10 C15 C18 C53 C55 C58 G0 G1 G11 G12 G17
Date:	2021–02–01
URL:	http://d.repec.org/n?u=RePEc:pra:mprapa:115254&r=big

Estimation of Heterogeneous Treatment Effects Using a Conditional Moment Based Approach

By:	Xiaolin Sun
Abstract:	We propose a new estimator for heterogeneous treatment effects in a partially linear model (PLM) with many exogenous covariates and a possibly endogenous treatment variable. The PLM has a parametric part that includes the treatment and the interactions between the treatment and exogenous characteristics, and a nonparametric part that contains those characteristics and many other covariates. The new estimator is a combination of a Robinson transformation to partial out the nonparametric part of the model, the Smooth Minimum Distance (SMD) approach to exploit all the information of the conditional mean independence restriction, and a Neyman-Orthogonalized first-order condition (FOC). With the SMD method, our estimator using only one valid binary instrument identifies both parameters. With the sparsity assumption, using regularized machine learning methods (i.e., the Lasso method) allows us to choose a relatively small number of polynomials of covariates. The Neyman-Orthogonalized FOC reduces the effect of the bias associated with the regularization method on estimates of the parameters of interest. Our new estimator allows for many covariates and is less biased, consistent, and $\sqrt{n}$-asymptotically normal under standard regularity conditions. Our simulations show that our estimator behaves well with different sets of instruments, but the GMM type estimators do not. We estimate the heterogeneous treatment effects of Medicaid on individual outcome variables from the Oregon Health Insurance Experiment. We find using our new method with only one valid instrument produces more significant and more reliable results for heterogeneous treatment effects of health insurance programs on economic outcomes than using GMM type estimators.
Date:	2022–10
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2210.15829&r=big

AI, Skill, and Productivity: The Case of Taxi Drivers

By:	Kyogo Kanazawa; Daiji Kawaguchi; Hitoshi Shigeoka; Yasutora Watanabe
Abstract:	We examine the impact of Artificial Intelligence (AI) on productivity in the context of taxi drivers. The AI we study assists drivers with finding customers by suggesting routes along which the demand is predicted to be high. We find that AI improves drivers’ productivity by shortening the cruising time, and such gain is accrued only to low-skilled drivers, narrowing the productivity gap between high- and low-skilled drivers by 14%. The result indicates that AI's impact on human labor is more nuanced and complex than a job displacement story, which was the primary focus of existing studies.
JEL:	J22 J24 L92 R41
Date:	2022–10
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:30612&r=big

Newton Raphson Emulation Network for Highly Efficient Computation of Numerous Implied Volatilities

By:	Geon Lee; Tae-Kyoung Kim; Hyun-Gyoon Kim; Jeonggyu Huh
Abstract:	In finance, implied volatility is an important indicator that reflects the market situation immediately. Many practitioners estimate volatility using iteration methods, such as the Newton-Raphson (NR) method. However, if numerous implied volatilities must be computed frequently, the iteration methods easily reach the processing speed limit. Therefore, we emulate the NR method as a network using PyTorch, a well-known deep learning package, and optimize the network further using TensorRT, a package for optimizing deep learning models. Comparing the optimized emulation method with the NR function in SciPy, a popular implementation of the NR method, we demonstrate that the emulation network is up to 1,000 times faster than the benchmark function.
Date:	2022–10
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2210.15969&r=big
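
Editor's illustration: for readers unfamiliar with the baseline that the abstract above sets out to accelerate, the sketch below shows a plain scalar Newton-Raphson iteration for the implied volatility of a European call under Black-Scholes. It is not the authors' emulation network and uses neither PyTorch nor TensorRT. The function names, starting value, tolerance, and sample inputs are assumptions made for illustration.

```python
import math


def norm_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _d1(spot, strike, rate, maturity, sigma):
    return ((math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * maturity)
            / (sigma * math.sqrt(maturity)))


def bs_call_price(spot, strike, rate, maturity, sigma):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = _d1(spot, strike, rate, maturity, sigma)
    d2 = d1 - sigma * math.sqrt(maturity)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def bs_vega(spot, strike, rate, maturity, sigma):
    """Derivative of the call price with respect to sigma."""
    return spot * norm_pdf(_d1(spot, strike, rate, maturity, sigma)) * math.sqrt(maturity)


def implied_vol_newton(price, spot, strike, rate, maturity,
                       sigma0=0.2, tol=1e-10, max_iter=50):
    """Solve bs_call_price(sigma) = price for sigma by Newton-Raphson."""
    sigma = sigma0
    for _ in range(max_iter):
        diff = bs_call_price(spot, strike, rate, maturity, sigma) - price
        if abs(diff) < tol:
            return sigma
        vega = bs_vega(spot, strike, rate, maturity, sigma)
        if vega < 1e-12:
            break  # derivative too flat for a stable Newton step
        sigma -= diff / vega  # Newton update: x <- x - f(x) / f'(x)
        if sigma <= 0.0:
            break  # iteration left the admissible region
    raise ValueError("Newton-Raphson did not converge")


# Round trip on illustrative inputs: price an option, then recover its volatility.
example_price = bs_call_price(100.0, 100.0, 0.01, 1.0, 0.25)
example_vol = implied_vol_newton(example_price, 100.0, 100.0, 0.01, 1.0)
```

Each option needs its own loop of this kind, which is why computing very many implied volatilities at high frequency becomes a throughput problem.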

The Proof is in the Pudding. Revealing the SDGs with Artificial Intelligence

By:	Jean-Baptiste JACOUTON; Régis MARODON; Adeline LAULANIE
Abstract:	The use of frontier technologies in the field of sustainability is likely to accompany its visibility, and the quality of information available to decision makers. This paper explores the possibility of using artificial intelligence to analyze Public Development Banks’ annual reports.
JEL:	Q
Date:	2022–10–05
URL:	http://d.repec.org/n?u=RePEc:avg:wpaper:en14520&r=big

Multiresolution Signal Processing of Financial Market Objects

By:	Ioana Boier
Abstract:	Financial markets are among the most complex entities in our environment, yet mainstream quantitative models operate at predetermined scale, rely on linear correlation measures, and struggle to recognize non-linear or causal structures. In this paper, we combine neural networks known to capture non-linear associations with a multiscale decomposition approach to facilitate a better understanding of financial market data substructures. Quantization keeps our decompositions calibrated to market at every scale. We illustrate our approach in the context of a wide spectrum of applications.
Date:	2022–10
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2210.15934&r=big

Supply Chain Characteristics as Predictors of Cyber Risk: A Machine-Learning Assessment

By:	Kevin Hu (Massachusetts Institute of Technology); Retsef Levi (Massachusetts Institute of Technology); Raphael Yahalom (Massachusetts Institute of Technology); El Ghali Zerhouni (Massachusetts Institute of Technology)
Abstract:	This paper provides the first large-scale data-driven analysis to evaluate the predictive power of different attributes for assessing risk of cyberattack data breaches. Furthermore, motivated by the rapid increase in third-party-enabled cyberattacks, the paper provides the first quantitative empirical evidence that digital supply-chain attributes are significant predictors of enterprise cyber risk. The paper leverages outside-in cyber risk scores that aim to capture the quality of the enterprise internal cybersecurity management, but augments these with supply chain features that are inspired by observed third party cyberattack scenarios, as well as concepts from network science research. The main quantitative result of the paper is to show that supply chain network features add significant detection power to predicting enterprise cyber risk, relative to merely using enterprise-only attributes. Particularly, compared to a base model that relies only on internal enterprise features, the supply chain network features improve the out-of-sample AUC by 2.3%. Given that each cyber data breach is a low probability high impact risk event, these improvements in the prediction power have significant value. Additionally, the model highlights several cybersecurity risk drivers related to third party cyberattack and breach mechanisms and provides important insights as to what interventions might be effective to mitigate these risks.
Date:	2022–10
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2210.15785&r=big

Incorporating Interactive Facts for Stock Selection via Neural Recursive ODEs

By:	Qiang Gao; Xinzhu Zhou; Kunpeng Zhang; Li Huang; Siyuan Liu; Fan Zhou
Abstract:	Stock selection attempts to rank a list of stocks for optimizing investment decision making, aiming at minimizing investment risks while maximizing profit returns. Recently, researchers have developed various (recurrent) neural network-based methods to tackle this problem. Without exceptions, they primarily leverage historical market volatility to enhance the selection performance. However, these approaches greatly rely on discrete sampled market observations, which either fail to consider the uncertainty of stock fluctuations or predict continuous stock dynamics in the future. Besides, some studies have considered the explicit stock interdependence derived from multiple domains (e.g., industry and shareholder). Nevertheless, the implicit cross-dependencies among different domains are under-explored. To address such limitations, we present a novel stock selection solution -- StockODE, a latent variable model with Gaussian prior. Specifically, we devise a Movement Trend Correlation module to expose the time-varying relationships regarding stock movements. We design Neural Recursive Ordinary Differential Equation Networks (NRODEs) to capture the temporal evolution of stock volatility in a continuous dynamic manner. Moreover, we build a hierarchical hypergraph to incorporate the domain-aware dependencies among the stocks. Experiments conducted on two real-world stock market datasets demonstrate that StockODE significantly outperforms several baselines, such as up to 18.57% average improvement regarding Sharpe Ratio.
Date:	2022–10
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2210.15925&r=big

Predicting the State of Synchronization of Financial Time Series using Cross Recurrence Plots

By:	Mostafa Shabani; Martin Magris; George Tzagkarakis; Juho Kanniainen; Alexandros Iosifidis
Abstract:	Cross-correlation analysis is a powerful tool for understanding the mutual dynamics of time series. This study introduces a new method for predicting the future state of synchronization of the dynamics of two financial time series. To this end, we use the cross-recurrence plot analysis as a nonlinear method for quantifying the multidimensional coupling in the time domain of two time series and for determining their state of synchronization. We adopt a deep learning framework for methodologically addressing the prediction of the synchronization state based on features extracted from dynamically sub-sampled cross-recurrence plots. We provide extensive experiments on several stocks, major constituents of the S&P 100 index, to empirically validate our approach. We find that the task of predicting the state of synchronization of two time series is in general rather difficult, but for certain pairs of stocks attainable with very satisfactory performance.
Date:	2022–10
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2210.14605&r=big

Intergenerational Mobility in the Land of Inequality

By:	Diogo G. C. Britto; Alexandre Fonseca; Paolo Pinotti; Breno Sampaio; Lucas Warwar
Abstract:	We provide the first estimates of intergenerational income mobility for a developing country, namely Brazil. We measure formal income from tax and employment registries, and we train machine learning models on census and survey data to predict informal income. The data reveal a much higher degree of persistence than previous estimates available for developed economies: a 10 percentile increase in parental income rank is associated with a 5.5 percentile increase in child income rank, and persistence is even higher in the top 5%. Children born to parents in the first income quintile face a 46% chance of remaining at the bottom when adults. We validate these estimates using two novel mobility measures that rank children and parents without the need to impute informal income. We document substantial heterogeneity in mobility across individual characteristics - notably gender and race - and across Brazilian regions. Leveraging children who migrate at different ages, we estimate that causal place effects explain 57% of the large spatial variation in mobility. Finally, assortative mating plays a strong role in household income persistence, and parental income is also strongly associated with several key long-term outcomes such as education, teenage pregnancy, occupation, mortality, and victimization.
Keywords:	intergenerational mobility, inequality, Brazil, migration, place effects
JEL:	J62 D31 I31 R23
Date:	2022
URL:	http://d.repec.org/n?u=RePEc:ces:ceswps:_10004&r=big

Market proximity, resilience, and food security: A cross-country empirical analysis

By:	Alessandra Garbero (International Fund for Agricultural Development); Tulia Gattone (Department of Social Sciences and Economics, Sapienza University of Rome); Marco Letta (Department of Social Sciences and Economics, Sapienza University of Rome); Pierluigi Montalbano (Department of Social Sciences and Economics, Sapienza University of Rome)
Abstract:	Scholars advocate that proximity to final markets increases food security, but empirical evidence is scarce. We shed light on this issue by applying a hybrid empirical approach – which combines machine learning algorithms, vulnerability models and mediation analysis – to a new cross-country household dataset made available by the International Fund for Agricultural Development in 2017-2018. Specifically, we find positive and statistically significant associations among proximity to markets, resilience and food security. We tested the plausibility of the exclusion restriction that market proximity does not affect food security fluctuations other than through its impact on resilience capacity by implementing an instrumental variable approach and a mediation analysis. The latter method reveals that market proximity accounts for a significant share of the positive correlation between household resilience and food security outcomes. The dampening role played by market proximity in decreasing welfare fluctuations is also confirmed when replacing food security outcomes with income ones. Overall, these findings suggest that policymakers should prioritize interventions to improve infrastructure and access to markets as a means to boost household resilience and, in turn, decrease welfare fluctuations and vulnerability to food insecurity.
Keywords:	rural development, market chain, vulnerability, resilience, food security
JEL:	Q12 O12 C31 C3
Date:	2022–11
URL:	http://d.repec.org/n?u=RePEc:saq:wpaper:9/22&r=big

The Heterogeneous Response of Real Estate Asset Prices to a Global Shock

By:	Heiniger, Sandro; Koeniger, Winfried; Lechner, Michael
Abstract:	We estimate the transmission of the pandemic shock in 2020 to prices in the residential and commercial real estate market by causal machine learning, using new granular data at the municipal level for Germany. We exploit differences in the incidence of Covid infections or short-time work at the municipal level for identification. In contrast to evidence for other countries, we find that the pandemic had only temporary negative effects on rents for some real estate types and increased asset prices of real estate particularly in the top price segment of commercial real estate.
Keywords:	Real estate, Asset prices, Rents, Covid pandemic, Short-time work, Affordability crisis
JEL:	E21 E22 G12 G51 R21 R31
Date:	2022–11
URL:	http://d.repec.org/n?u=RePEc:usg:econwp:2022:14&r=big

What’s that noise? Analysing sentiment-based variation in central bank communication

By:	Bernd Hayo (Marburg University); Johannes Zahner (Marburg University)
Abstract:	To which degree can variation in sentiment-based indicators of central bank communication be attributed to changes in macroeconomic, financial, and monetary variables; idiosyncratic speaker effects; sentiment persistence; and random ‘noise’? Using the Loughran and McDonald (2011) dictionary on a text corpus containing more than 10,000 speeches and press statements, we construct sentiment-based indicators for the ECB and the Fed. An analysis of variance (ANOVA) shows that sentiment is strongly persistent and influenced by speaker-specific effects. With about 80% of the variation in sentiment being due to noise, our findings cast doubt on the reliability of conclusions based on variation in dictionary-based indicators.
Keywords:	Sentiment index, monetary policy, central banks, Loughran and McDonald (2011) dictionary, information content of sentiment indices
JEL:	C55 E58 E61 Z13
Date:	2022
URL:	http://d.repec.org/n?u=RePEc:mar:magkse:202241&r=big

The Anatomy of Out-of-Sample Forecasting Accuracy

By:	Daniel Borup; Philippe Goulet Coulombe; Erik Christian Montes Schütte; David E. Rapach; Sander Schwenk-Nebbe
Abstract:	We develop metrics based on Shapley values for interpreting time-series forecasting models, including “black-box” models from machine learning. Our metrics are model agnostic, so that they are applicable to any model (linear or nonlinear, parametric or nonparametric). Two of the metrics, iShapley-VI and oShapley-VI, measure the importance of individual predictors in fitted models for explaining the in-sample and out-of-sample predicted target values, respectively. The third metric is the performance-based Shapley value (PBSV), our main methodological contribution. PBSV measures the contributions of individual predictors in fitted models to the out-of-sample loss and thereby anatomizes out-of-sample forecasting accuracy. In an empirical application forecasting US inflation, we find important discrepancies between individual predictor relevance according to the in-sample iShapley-VI and out-of-sample PBSV. We use simulations to analyze potential sources of the discrepancies, including overfitting, structural breaks, and evolving predictor volatilities.
Keywords:	variable importance; out-of-sample performance; Shapley value; loss function; machine learning; inflation
JEL:	C22 C45 C53 E37 G17
Date:	2022–11–07
URL:	http://d.repec.org/n?u=RePEc:fip:fedawp:94993&r=big

Blowing against the Wind? A Narrative Approach to Central Bank Foreign Exchange Intervention

By:	Naef, Alain
Abstract:	Most countries in the world use foreign exchange intervention, but measuring the success of the policy is difficult. By using a narrative approach, I identify interventions when the central bank manages to reverse the exchange rate based on pure luck. I separate them from interventions when the central bank actually impacted the exchange rate. Because intervention records are daily aggregates, an intervention might appear to have changed the direction of the exchange rate, when the change is more likely to have been caused by market news. This analysis allows a better understanding of how successful central bank operations really are. I use new daily data on Bank of England interventions in the 1980s and 1990s. Some studies find that interventions work in up to 80% of cases. Yet, by accounting for intraday market-moving news, I find that, in adverse conditions, the Bank of England managed to influence the exchange rate in only 8% of cases. I use natural language processing to confirm the validity of the narrative approach. Using lasso and a VAR analysis, I investigate what makes the Bank of England intervene. I find that only movements in the Deutschmark exchange rate, and not in the US dollar exchange rate, made the Bank intervene. Also, I find that interest rate hikes were mostly a tool for currency management and were accompanied by large reserve sales.
Date:	2022–10–08
URL:	http://d.repec.org/n?u=RePEc:osf:socarx:u59gc&r=big

Measuring Gender Differences in Personalities through Natural Language in the Labor Force: Application of the 5-Factor Model

By:	Dania Eugenidis (Justus Liebig University Giessen); David Lenz (Justus Liebig University Giessen)
Abstract:	Gender stereotypes still play a major role in the perception and representation of people in the workplace. Measuring the effects of those stereotypes quantitatively is very hard, though. Traditional methods, such as questionnaires, struggle to provide the full picture, for example through misunderstanding, omission or incorrect answering of questions. However, evidence-based policy making requires accurate indicators of gender inequalities to promote equality. We present a framework measuring gender stereotypes at the company level using publicly available big data. Specifically, we analyse the one million websites of all German companies using natural language processing with regard to differences in their portrayal of genders through the use of certain terms. We then contextualize the gender stereotype measures following the personality traits of the Five Factor Model and their sublevels. Statistical analysis of the results indicates significant stereotypes within personality traits for large portions of the sample. The qualitative differences in gender presentation are mostly consistent with those found in the literature, which serves as a validation for the presented framework. The presented approach complements traditional quantitative measurement techniques by capturing a mainly latent level of inequality. The fully automated and comprehensive analysis of the linguistic portrayal of gender stereotypes in a corporate context comes at low cost, with little delay, and at a granular level.
Date:	2022
URL:	http://d.repec.org/n?u=RePEc:mar:magkse:202240&r=big

Analyzing the commentator network within the French YouTube environment

By:	Kurt Maxwell Kusterer (LISIS - Laboratoire Interdisciplinaire Sciences, Innovations, Sociétés - CNRS - Centre National de la Recherche Scientifique - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement - Université Gustave Eiffel); Sylvain Mignot (UCL - Université catholique de Lille); Annick Vignes (CAMS - Centre d'Analyse et de Mathématique sociales - EHESS - École des hautes études en sciences sociales - CNRS - Centre National de la Recherche Scientifique)
Abstract:	YouTube is the largest video hosting platform. The site emerged in 2005 and has achieved a continuous pattern of growth since its conception (Burgess & Green 2018). A high number of creators, viewers, subscribers and commentators act in this specific ecosystem, which generates a huge amount of money. In this article, YouTube is considered as a bilateral network between the videos and the commentators. Analyzing a detailed data set focused on French YouTubers, we consider each comment as a link between a commentator and a video. The main objective of this paper is to understand the determinants of the creation of these links. That is to say, what can explain the choice of an agent to comment on a specific video instead of another one, taking into account characteristics of commentators, videos, topics, channels as well as recommendations. This work differs from classic NLP studies, which use text-mining techniques to analyze the content of the comments and the kind of information they diffuse.
Keywords:	YouTube ecosystem, Behavioral analysis, Network analysis of Web links
Date:	2022–11–08
URL:	http://d.repec.org/n?u=RePEc:hal:journl:hal-03799185&r=big

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.

Predicting Politicians Misconduct: Evidence From Colombia

By:	Gallego, J; Prem, M; Vargas, J. F.
Keywords:	Prediction, Corruption, Machine Learning, Colombia
Date:	2022–10–18
URL:	http://d.repec.org/n?u=RePEc:col:000092:020504&r=big

Recovering Missing Firm Characteristics with Attention-Based Machine Learning

By:	Beckmeyer, Heiner; Wiedemann, Timo
JEL:	G10
Date:	2022
URL:	http://d.repec.org/n?u=RePEc:zbw:vfsc22:264135&r=big