nep-cmp New Economics Papers
on Computational Economics
Issue of 2021‒03‒15
thirty-six papers chosen by



  1. Understanding the performance of machine learning models to predict credit default: a novel approach for supervisory evaluation By Andrés Alonso; José Manuel Carbó
  2. Forex exchange rate forecasting using deep recurrent neural networks By Dautel, Alexander Jakob; Härdle, Wolfgang Karl; Lessmann, Stefan; Seow, Hsin-Vonn
  3. DeepSets and their derivative networks for solving symmetric PDEs By Maximilien Germain; Mathieu Laurière; Huyên Pham; Xavier Warin
  4. Panel semiparametric quantile regression neural network for electricity consumption forecasting By Xingcai Zhou; Jiangyan Wang
  5. History-Augmented Collaborative Filtering for Financial Recommendations By Baptiste Barreau; Laurent Carlier
  6. Standing on the Shoulders of Machine Learning: Can We Improve Hypothesis Testing? By Gary Cornwall; Jeff Chen; Beau Sauley
  7. The LOB Recreation Model: Predicting the Limit Order Book from TAQ History Using an Ordinary Differential Equation Recurrent Neural Network By Zijian Shi; Yu Chen; John Cartlidge
  8. Matching supply and demand of electricity network-supportive flexibility: A case study with three comprehensible matching algorithms By Erik Heilmann; Andreas Zeiselmair; Thomas Estermann
  9. A Simple but Powerful Simulated Certainty Equivalent Approximation Method for Dynamic Stochastic Problems By Yongyang Cai; Kenneth L. Judd
  10. Explainable AI in Credit Risk Management By Branka Hadji Misheva; Joerg Osterrieder; Ali Hirsa; Onkar Kulkarni; Stephen Fung Lin
  11. Slow-Growing Trees By Philippe Goulet Coulombe
  12. Portfolio Construction as Linearly Constrained Separable Optimization By Nicholas Moehle; Jack Gindi; Stephen Boyd; Mykel Kochenderfer
  13. Neighborhood Effects and Housing Vouchers By Morris A. Davis; Jesse M. Gregory; Daniel A. Hartley; Kegon T.K. Tan
  14. Gender distribution across topics in Top 5 economics journals: A machine learning approach By J. Ignacio Conde-Ruiz; Juan José Ganuza; Manu Garcia; Luis A. Puch
  15. Thinking outside the container: A machine learning approach to forecasting trade flows By Stamer, Vincent
  16. Trading Signals In VIX Futures By M. Avellaneda; T. N. Li; A. Papanicolaou; G. Wang
  17. An Agent-Based Modelling Approach to Brain Drain By Furkan Gürsoy; Bertan Badur
  18. Service Data Analytics and Business Intelligence By Wu, Desheng Dang; Härdle, Wolfgang Karl
  19. Disambiguation by namesake risk assessment By Doherr, Thorsten
  20. Fairness in Credit Scoring: Assessment, Implementation and Profit Implications By Nikita Kozodoi; Johannes Jacob; Stefan Lessmann
  21. No-Transaction Band Network: A Neural Network Architecture for Efficient Deep Hedging By Shota Imaki; Kentaro Imajo; Katsuya Ito; Kentaro Minami; Kei Nakagawa
  22. Can Machine Learning Catch the COVID-19 Recession? By Philippe Goulet Coulombe; Massimiliano Marcellino; Dalibor Stevanovic
  23. Confronting Machine Learning With Financial Research By Kristof Lommers; Ouns El Harzli; Jack Kim
  24. Time Matters: Exploring the Effects of Urgency and Reaction Speed in Automated Traders By Henry Hanifan; Ben Watson; John Cartlidge; Dave Cliff
  25. A mathematical model for automatic differentiation in machine learning By Bolte, Jérôme; Pauwels, Edouard
  26. Tail-risk protection: Machine Learning meets modern Econometrics By Spilak, Bruno; Härdle, Wolfgang Karl
  27. Forecasting the Stability and Growth Pact compliance using Machine Learning. By Kéa Baret; Amélie Barbier-Gauchard; Théophilos Papadimitriou
  28. Big data and machine learning in central banking By Sebastian Doerr; Leonardo Gambacorta; José María Serena Garralda
  29. Return on Investment on AI: The Case of Capital Requirement By Henri Fraisse; Matthias Laporte
  30. Prediction of Attrition in Large Longitudinal Studies: Tree-based methods versus Multinomial Logistic Models By Best, Katherine Laura; Speyer, Lydia Gabriela; Murray, Aja Louise; Ushakova, Anastasia
  31. Answering the Queen: Machine learning and financial crises By Jérémy Fouliard; Michael Howell; Hélène Rey
  32. A Machine Learning Based Regulatory Risk Index for Cryptocurrencies By Ni, Xinwen; Härdle, Wolfgang Karl; Xie, Taojun
  33. Using Machine Learning for Measuring Democracy: An Update By Klaus Gründler; Tommy Krieger
  34. On the Subbagging Estimation for Massive Data By Tao Zou; Xian Li; Xuan Liang; Hansheng Wang
  35. Make it or Break it: Vaccination Intention at the Time of Covid-19 By Jacques Bughin; Michele Cincera; Kelly Peters; Dorota Reykowska; Marcin Zyszkiewicz; Rafal Ohme
  36. The European venture capital landscape: An EIF perspective. Volume VI: The impact of VC on the exit and innovation outcomes of EIF-backed start-ups By Pavlova, Elitsa; Signore, Simone

  1. By: Andrés Alonso (Banco de España); José Manuel Carbó (Banco de España)
    Abstract: In this paper we study the performance of several machine learning (ML) models for credit default prediction. We do so by using a unique and anonymized database from a major Spanish bank. We compare the statistical performance of a simple and traditionally used model like logistic regression (Logit) with more advanced ones like Lasso-penalized logistic regression, Classification And Regression Trees (CART), Random Forest, XGBoost and deep neural networks. Following the process deployed for the supervisory validation of Internal Ratings-Based (IRB) systems, we examine the benefits of using ML in terms of predictive power, both in classification and in calibration. Running a simulation exercise for different sample sizes and numbers of features, we are able to isolate the information advantage associated with access to large amounts of data and measure the ML model advantage. Although ML models outperform Logit both in classification and in calibration, more complex ML algorithms do not necessarily predict better. We then translate this statistical performance into economic impact by estimating the savings in regulatory capital when ML models are used instead of a simpler model like Lasso to compute the risk-weighted assets. Our benchmark results show that implementing XGBoost could yield savings from 12.4% to 17% in regulatory capital requirements under the IRB approach. This leads us to conclude that the potential benefits in economic terms for institutions would be significant, which justifies further research to better understand all the risks embedded in ML models.
    Keywords: machine learning, credit risk, prediction, probability of default, IRB system
    JEL: C45 C38 G21
    Date: 2021–01
    URL: http://d.repec.org/n?u=RePEc:bde:wpaper:2105&r=all
  2. By: Dautel, Alexander Jakob; Härdle, Wolfgang Karl; Lessmann, Stefan; Seow, Hsin-Vonn
    Abstract: Deep learning has substantially advanced the state of the art in computer vision, natural language processing, and other fields. The paper examines the potential of deep learning for exchange rate forecasting. We systematically compare long short-term memory networks and gated recurrent units to traditional recurrent network architectures as well as feedforward networks in terms of their directional forecasting accuracy and the profitability of trading model predictions. Empirical results indicate the suitability of deep networks for exchange rate forecasting in general but also evidence the difficulty of implementing and tuning corresponding architectures. Especially with regard to trading profit, a simpler neural network may perform as well as, if not better than, a more complex deep neural network.
    Keywords: Deep learning, Financial time series forecasting, Recurrent neural networks, Foreign exchange rates
    JEL: C14 C22 C45
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:zbw:irtgdp:2020006&r=all
  3. By: Maximilien Germain (LPSM); Mathieu Laurière (LPSM); Huyên Pham (LPSM); Xavier Warin
    Abstract: Machine learning methods for solving nonlinear partial differential equations (PDEs) are a highly topical area, and different algorithms proposed in the literature show efficient numerical approximation in high dimension. In this paper, we introduce a class of PDEs that are invariant to permutations, called symmetric PDEs. Such problems are widespread, ranging from cosmology to quantum mechanics, and include option pricing/hedging in multi-asset markets with exchangeable payoffs. Our main application actually comes from the particle approximation of mean-field control problems. We design deep learning algorithms based on certain types of neural networks, named PointNet and DeepSet (and their associated derivative networks), for simultaneously computing an approximation of the solution to symmetric PDEs and its gradient. We illustrate the performance and accuracy of the PointNet/DeepSet networks compared to classical feedforward ones, and provide several numerical results of our algorithm for the examples of mean-field systemic risk, a mean-variance problem and a min/max linear quadratic McKean-Vlasov control problem.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.00838&r=all
  4. By: Xingcai Zhou; Jiangyan Wang
    Abstract: China has made great achievements in the electric power industry during the long-term deepening of reform and opening up. However, owing to complex regional economic, social and natural conditions, electricity resources are not evenly distributed, which accounts for the electricity deficiency in some regions of China. It is therefore desirable to develop a robust electricity forecasting model. Motivated by this, we propose a Panel Semiparametric Quantile Regression Neural Network (PSQRNN) that combines an artificial neural network with semiparametric quantile regression. The PSQRNN can explore potential linear and nonlinear relationships among the variables, capture the unobserved provincial heterogeneity, and maintain the interpretability of parametric models, all at the same time. The PSQRNN is trained by combining penalized quantile regression, with LASSO and ridge penalties, with the backpropagation algorithm. To evaluate the prediction accuracy, an empirical analysis of provincial electricity consumption in China from 1999 to 2018 is conducted under three scenarios. The results show that the PSQRNN model performs better for electricity consumption forecasting when economic and climatic factors are taken into account. Finally, forecasts of provincial electricity consumption in China for the next five years (2019-2023) are reported.
    Date: 2021–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.00711&r=all
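The quantile-regression component that the PSQRNN builds on rests on the pinball (quantile) loss. As a minimal, library-free sketch of why minimizing this loss yields conditional quantiles (the toy data and names are invented for illustration, not taken from the paper):

```python
def pinball_loss(y_true, y_pred, tau):
    """Quantile (pinball) loss at level tau: the objective minimized
    by quantile regression, whether linear, semiparametric, or neural."""
    diff = y_true - y_pred
    return tau * diff if diff >= 0 else (tau - 1.0) * diff

# minimizing the average pinball loss over a sample picks out the
# empirical tau-quantile: here tau = 0.5 recovers the median, so the
# fit is robust to the outlier 100.0
data = [1.0, 2.0, 3.0, 4.0, 100.0]
avg_loss = lambda q, tau: sum(pinball_loss(y, q, tau) for y in data) / len(data)
best_median = min(data, key=lambda q: avg_loss(q, 0.5))  # -> 3.0
```

The asymmetric weighting (tau on underprediction, 1 - tau on overprediction) is what lets the same loss target any quantile level, e.g. tau = 0.9 for an upper consumption scenario.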
  5. By: Baptiste Barreau (MICS - Mathématiques et Informatique pour la Complexité et les Systèmes - CentraleSupélec - Université Paris-Saclay, BNPP CIB GM Lab - BNP Paribas CIB Global Markets Data & AI Lab); Laurent Carlier (BNPP CIB GM Lab - BNP Paribas CIB Global Markets Data & AI Lab)
    Abstract: In many businesses, and particularly in finance, the behavior of a client might drastically change over time. It is consequently crucial for recommender systems used in such environments to be able to adapt to these changes. In this study, we propose a novel collaborative filtering algorithm that captures the temporal context of a user-item interaction through the users' and items' recent interaction histories to provide dynamic recommendations. The algorithm, designed with issues specific to the financial world in mind, uses a custom neural network architecture that tackles the non-stationarity of users' and items' behaviors. The performance and properties of the algorithm are monitored in a series of experiments on a G10 bond request for quotation proprietary database from BNP Paribas Corporate and Institutional Banking.
    Keywords: matrix factorization, collaborative filtering, context-aware, time, neural networks
    Date: 2020–09
    URL: http://d.repec.org/n?u=RePEc:hal:journl:hal-03144669&r=all
  6. By: Gary Cornwall; Jeff Chen; Beau Sauley
    Abstract: In this paper we update the hypothesis testing framework by drawing upon modern computational power and classification models from machine learning. We show that a simple classification algorithm such as a boosted decision stump can be used to recover the full size-power trade-off for any single test statistic. This recovery implies an equivalence, under certain conditions, between the basic building block of modern machine learning and hypothesis testing. Second, we show that more complex algorithms such as the random forest and gradient boosted machine can serve as mapping functions in place of the traditional null distribution. This allows multiple test statistics and other information to be evaluated simultaneously, thus forming a pseudo-composite hypothesis test. Moreover, we show how practitioners can make the relative costs of Type I and Type II errors explicit to contextualize the test within a specific decision framework. To illustrate this approach we revisit the case of testing for unit roots, a difficult problem in time series econometrics for which existing tests are known to exhibit low power. Using a simulation framework common to the literature, we show that this approach can improve the overall accuracy of the traditional unit root test(s) by seventeen percentage points, and the sensitivity by thirty-six percentage points.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.01368&r=all
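The paper's core observation, that a thresholding classifier such as a decision stump recovers a test's size-power trade-off, can be sketched in a few lines. This is a toy simulation (the statistic, sample size, and alternative mean are illustrative choices, not taken from the paper):

```python
import random

random.seed(0)

def t_stat(sample, mu0=0.0):
    # one-sample t-type statistic: scaled deviation of the mean from mu0
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - mu0) / (var / n) ** 0.5

# simulate the statistic under the null (mean 0) and an alternative (mean 0.5)
null_stats = [t_stat([random.gauss(0.0, 1.0) for _ in range(30)]) for _ in range(2000)]
alt_stats = [t_stat([random.gauss(0.5, 1.0) for _ in range(30)]) for _ in range(2000)]

# a decision stump is just a threshold on the statistic; sweeping the
# threshold traces out the test's entire size-power curve
def size_power(threshold):
    size = sum(s > threshold for s in null_stats) / len(null_stats)
    power = sum(s > threshold for s in alt_stats) / len(alt_stats)
    return size, power

curve = [size_power(t) for t in (0.0, 1.0, 1.645, 2.0, 3.0)]
```

Each point on `curve` pairs a false-positive rate (size) with a detection rate (power); a fitted stump simply selects one threshold on this curve, which is the sense in which the classifier and the classical test are equivalent.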
  7. By: Zijian Shi; Yu Chen; John Cartlidge
    Abstract: In an order-driven financial market, the price of a financial asset is discovered through the interaction of orders - requests to buy or sell at a particular price - that are posted to the public limit order book (LOB). Therefore, LOB data is extremely valuable for modelling market dynamics. However, LOB data is not freely accessible, which poses a challenge to market participants and researchers wishing to exploit this information. Fortunately, trades and quotes (TAQ) data - orders arriving at the top of the LOB, and trades executing in the market - are more readily available. In this paper, we present the LOB recreation model, a first attempt from a deep learning perspective to recreate the top five price levels of the LOB for small-tick stocks using only TAQ data. Volumes of orders sitting deep in the LOB are predicted by combining outputs from: (1) a history compiler that uses a Gated Recurrent Unit (GRU) module to selectively compile prediction relevant quote history; (2) a market events simulator, which uses an Ordinary Differential Equation Recurrent Neural Network (ODE-RNN) to simulate the accumulation of net order arrivals; and (3) a weighting scheme to adaptively combine the predictions generated by (1) and (2). By the paradigm of transfer learning, the source model trained on one stock can be fine-tuned to enable application to other financial assets of the same class with much lower demand on additional data. Comprehensive experiments conducted on two real world intraday LOB datasets demonstrate that the proposed model can efficiently recreate the LOB with high accuracy using only TAQ data as input.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.01670&r=all
  8. By: Erik Heilmann (University of Kassel); Andreas Zeiselmair (Technical University of Munich); Thomas Estermann (Technical University of Munich)
    Abstract: Due to an ongoing energy transition, electricity networks are increasingly challenged by situations where local electrical power demands are high but local generation is low and vice versa. This finally leads to a growing number of technical problems. To solve these problems in the short-term, the electrical power of load and generation must be adjusted as available flexibility. In zonal electricity systems, one often discussed concept to utilize flexibility is local flexibility markets. Based on auction theory, we provide a comprehensible framework for the use of network-supportive flexibility in general. In this context, we discuss the problem of matching supply and demand. We introduce three matching approaches that can be applied and adapted for different network situations. In addition to a qualitative description of the three approaches, we present a case study of an exemplary distribution network and explore different scenarios to demonstrate the utility of the algorithms. We compare the three approaches on a qualitative level with quantitative inputs from the case study. The comparison considers the specific cost, flexible energy, ensured demand coverage, data minimization, computational effort and the transferability of the three approaches.
    Keywords: local flexibility markets, matching, multi-dimensional winner determination, electricity network operation
    JEL: D44 L94 Q41
    Date: 2021
    URL: http://d.repec.org/n?u=RePEc:mar:magkse:202110&r=all
  9. By: Yongyang Cai; Kenneth L. Judd
    Abstract: We introduce a novel simulated certainty equivalent approximation (SCEQ) method for solving dynamic stochastic problems. Our examples show that this method only requires a desktop computer to solve high-dimensional finite- or infinite-horizon, stationary or nonstationary dynamic stochastic problems with hundreds of state variables, a wide state space, and occasionally binding constraints. The SCEQ method is simple, stable, and efficient, which makes it suitable for solving complex economic problems that cannot be solved by other algorithms.
    JEL: C61 C63 C68 E31 E52 Q54 Q58
    Date: 2021–02
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:28502&r=all
  10. By: Branka Hadji Misheva; Joerg Osterrieder; Ali Hirsa; Onkar Kulkarni; Stephen Fung Lin
    Abstract: Artificial Intelligence (AI) has created the single biggest technology revolution the world has ever seen. For the finance sector, it provides great opportunities to enhance customer experience, democratize financial services, ensure consumer protection and significantly improve risk management. While it is easier than ever to run state-of-the-art machine learning models, designing and implementing systems that support real-world finance applications has been challenging, in large part because such models lack the transparency and explainability that are important factors in establishing reliable technology. This motivates research on the topic with a specific focus on applications in credit risk management. In this paper, we apply two advanced post-hoc, model-agnostic explainability techniques, Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), to machine learning (ML)-based credit scoring models applied to the open-access data set offered by the US-based P2P lending platform Lending Club. Specifically, we use LIME to explain instances locally and SHAP to obtain both local and global explanations. We discuss the results in detail and present multiple comparison scenarios using the various kernels available for explaining the graphs generated from SHAP values. We also discuss the practical challenges associated with implementing these state-of-the-art eXplainable AI (XAI) methods and document them for future reference. We have made an effort to document every technical aspect of this research, while at the same time providing a general summary of the conclusions.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.00949&r=all
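SHAP values are the model-explanation analogue of the classic Shapley value from cooperative game theory. As a hedged, library-free sketch of the underlying idea (the toy "credit score", its feature names, and the numbers are invented for illustration; the SHAP library itself uses far more efficient approximations than this brute-force enumeration):

```python
from itertools import permutations

def shapley_values(features, value_fn):
    """Exact Shapley decomposition of value_fn: average each feature's
    marginal contribution over all orderings of the features."""
    phi = {f: 0.0 for f in features}
    orders = list(permutations(features))
    for order in orders:
        included = []
        prev = value_fn(frozenset())
        for f in order:
            included.append(f)
            cur = value_fn(frozenset(included))
            phi[f] += (cur - prev) / len(orders)
            prev = cur
    return phi

# toy credit score with purely additive feature effects; for an additive
# model the Shapley value recovers each effect exactly, and the values
# always sum to the full model output (the "efficiency" property)
effects = {"income": 0.4, "utilization": -0.3, "age": 0.1}
score = lambda subset: sum(effects[f] for f in subset)
phi = shapley_values(list(effects), score)
```

For a real scoring model the `value_fn` would evaluate the model with absent features marginalized out, which is exactly the expensive step that SHAP's kernel and tree approximations address.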
  11. By: Philippe Goulet Coulombe
    Abstract: Random Forest's performance can be matched by a single slow-growing tree (SGT), which uses a learning rate to tame CART's greedy algorithm. SGT exploits the view that CART is an extreme case of an iterative weighted least square procedure. Moreover, a unifying view of Boosted Trees (BT) and Random Forests (RF) is presented. Greedy ML algorithms' outcomes can be improved using either "slow learning" or diversification. SGT applies the former to estimate a single deep tree, and Booging (bagging stochastic BT with a high learning rate) uses the latter with additive shallow trees. The performance of this tree ensemble quaternity (Booging, BT, SGT, RF) is assessed on simulated and real regression tasks.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.01926&r=all
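The "slow learning" idea the abstract describes can be illustrated with a minimal sketch: take each greedy CART split at only a fraction of its full strength, refitting to residuals each round. This toy uses additive one-split stumps on a single feature purely for illustration; the paper's SGT grows a single deep tree, and Booging bags boosted trees.

```python
def best_stump(xs, resid):
    """Single greedy CART step: the one split on x that most reduces
    the squared error of the residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, resid) if x <= t]
        right = [r for x, r in zip(xs, resid) if x > t]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    return best[1:]  # threshold, left value, right value

def slow_fit(xs, ys, lr=0.1, rounds=200):
    """Apply each greedy split at a fraction `lr` of full strength,
    refitting to the residuals every round."""
    pred = [0.0] * len(ys)
    for _ in range(rounds):
        resid = [y - p for y, p in zip(ys, pred)]
        t, lv, rv = best_stump(xs, resid)
        pred = [p + lr * (lv if x <= t else rv) for x, p in zip(xs, pred)]
    return pred

# toy step function: slow learning recovers it essentially exactly,
# since each round shrinks the remaining residual by a factor (1 - lr)
xs = [i / 10 for i in range(20)]
ys = [0.0 if x < 1.0 else 1.0 for x in xs]
fitted = slow_fit(xs, ys)
```

Setting `lr=1.0` and `rounds=1` recovers ordinary greedy CART as the extreme case the abstract alludes to.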
  12. By: Nicholas Moehle; Jack Gindi; Stephen Boyd; Mykel Kochenderfer
    Abstract: Mean-variance portfolio optimization problems often involve separable nonconvex terms, including penalties on capital gains, integer share constraints, and minimum position and trade sizes. We propose a heuristic algorithm for this problem based on the alternating direction method of multipliers (ADMM). This method allows for solve times in tens to hundreds of milliseconds with around 1000 securities and 100 risk factors. We also obtain a bound on the achievable performance. Our heuristic and bound are both derived from similar results for other optimization problems with a separable objective and affine equality constraints. We discuss a concrete implementation in the case where the separable terms in the objective are piecewise-quadratic, and we demonstrate their effectiveness empirically in realistic tax-aware portfolio construction problems.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.05455&r=all
  13. By: Morris A. Davis; Jesse M. Gregory; Daniel A. Hartley; Kegon T.K. Tan
    Abstract: Researchers and policy-makers have explored the possibility of restricting the use of housing vouchers to neighborhoods that may positively affect the outcomes of children. Using the framework of a dynamic model of optimal location choice, we estimate preferences over neighborhoods of likely recipients of housing vouchers in Los Angeles. We combine simulations of the model with estimates of how locations affect adult earnings of children to understand how a voucher policy that restricts neighborhoods in which voucher-recipients may live affects both the location decisions of households and the adult earnings of children. We show the model can nearly replicate the impact of the Moving to Opportunity experiment on the adult wages of children. Simulations suggest a policy that restricts housing vouchers to the top 20% of neighborhoods maximizes expected aggregate adult earnings of children of households offered these vouchers.
    JEL: I24 I31 I38 J13 R23 R38
    Date: 2021–02
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:28508&r=all
  14. By: J. Ignacio Conde-Ruiz; Juan José Ganuza; Manu Garcia; Luis A. Puch
    Abstract: We analyze all the articles published in the Top 5 economics journals between 2002 and 2019 in order to find gender differences in their research approach. Using an unsupervised machine learning algorithm (the Structural Topic Model) developed by Roberts et al. (2019), we jointly characterize the set of latent topics that best fits our data (the set of abstracts) and how the documents/abstracts are allocated across latent topics. These latent topics are mixtures over words, where each word has a probability of belonging to a topic, after controlling for year and journal. The latent topics may capture research fields but also other, more subtle characteristics related to the way in which the articles are written. Using only data-driven methods, we find that female authors are unevenly distributed across these latent topics. The gender differences in research approach that we find are "automatically" generated from the research articles themselves, without an arbitrary allocation to particular categories (such as JEL codes or research areas).
    Keywords: machine learning, structural topic model, gender, research fields
    JEL: I20 J16
    Date: 2021–02
    URL: http://d.repec.org/n?u=RePEc:upf:upfgen:1771&r=all
  15. By: Stamer, Vincent
    Abstract: Global container ship movements may reliably predict global trade flows. Aggregating both movements at sea and port call events produces a wealth of explanatory variables. The machine learning algorithm partial least squares can map these explanatory time series to unilateral imports and exports, as well as bilateral trade flows. Applying out-of-sample and time series methods on monthly trade data of 75 countries, this paper shows that the new shipping indicator outperforms benchmark models for the vast majority of countries. This holds true for predictions for the current and subsequent month even if one limits the analysis to data during the first half of the month. This makes the indicator available at least as early as other leading indicators.
    Keywords: Trade, Forecasting, Machine Learning, Container Shipping
    JEL: F17 C53
    Date: 2021
    URL: http://d.repec.org/n?u=RePEc:zbw:ifwkwp:2179&r=all
  16. By: M. Avellaneda; T. N. Li; A. Papanicolaou; G. Wang
    Abstract: We propose a new approach for trading VIX futures. We assume that the term structure of VIX futures follows a Markov model. The trading strategy selects a multi-tenor position by maximizing the expected utility for a day-ahead horizon given the current shape and level of the VIX futures term structure. Computationally, we model the functional dependence between the VIX futures curves, the VIX futures positions, and the expected utility as a deep neural network with five hidden layers. Out-of-sample backtests of the VIX futures trading strategy suggest that this approach gives rise to reasonable portfolio performance, and to positions in which the investor can be either long or short VIX futures contracts depending on the market environment.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.02016&r=all
  17. By: Furkan Gürsoy; Bertan Badur
    Abstract: The phenomenon of brain drain, that is, the emigration of highly skilled people, has many undesirable effects, particularly for developing countries. In this study, an agent-based model is developed to understand the dynamics of such emigration. We hypothesise that skilled people's emigration decisions are based on several factors, including the overall economic and social difference between the home and host countries, people's ability and capacity to obtain good jobs and start a life abroad, and the barriers to moving abroad. Furthermore, the social network of individuals also plays a significant role. The model is validated using qualitative and quantitative pattern matching with real-world observations. Sensitivity and uncertainty analyses are performed in addition to several scenario analyses. Linear and random forest response surface models are created to provide quick predictions of the number of emigrants as well as to understand the effect sizes of individual parameters. Overall, the study provides an abstract model in which brain drain dynamics can be explored. Findings from the simulation outputs show that the future socioeconomic state of the country is more important than the current state, that a lack of barriers results in a large number of emigrants, and that network effects compound emigration. Upon further development and customisation, future versions can assist the decision-making of social policymakers regarding brain drain.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.03234&r=all
  18. By: Wu, Desheng Dang; Härdle, Wolfgang Karl
    Abstract: With growing economic globalization, the modern service sector is in great need of business intelligence for data analytics and computational statistics. The joint application of big data analytics, computational statistics and business intelligence has great potential to make the engineering of advanced service systems more efficient. The purpose of this COST issue is to publish high-quality research papers (including reviews) that address the challenges of service data analytics with business intelligence in the face of uncertainty and risk. High-quality contributions that are not yet published or that are not under review by other journals or peer-reviewed conferences have been collected. The resulting topic-oriented special issue includes research on business intelligence and computational statistics, data-driven financial engineering, service data analytics and algorithms for optimizing business engineering. It also covers implementation issues of managing the service process, computational statistics for risk analysis, and novel theoretical and computational models and data mining algorithms for risk management related business applications.
    Keywords: Data Analytics, Business Intelligence Systems
    JEL: C00
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:zbw:irtgdp:2020002&r=all
  19. By: Doherr, Thorsten
    Abstract: Most bibliometric databases provide only names as the handle to individuals' careers, leading to the issue of namesakes. We introduce a universal method to assess the risk of linking documents of different individuals who share the same name, with the goal of collecting the documents into personalized clusters. A theoretical setup for the probability of drawing a namesake, depending on the number of namesakes in the population and the size of the observed unit, replaces the need for training datasets, thereby avoiding a namesake bias caused by the inherent underestimation of namesakes in training/benchmark data. A Poisson model based on a master sample of unambiguously identified individuals estimates the main component, the number of namesakes for any given name. To implement the algorithm, we reduce the complexity in the data by resolving similarity in properties. At the core of the implementation is a mechanism returning the unit size of the intersected mutual properties linking two documents. Because of the high computational demands of this mechanism, we also discuss means of optimizing the procedure.
    Keywords: homonymy, namesakes, disambiguation, scientific careers, inventors, patents, publications
    JEL: C18 C36
    Date: 2021
    URL: http://d.repec.org/n?u=RePEc:zbw:zewdip:21021&r=all
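The probabilistic core described in the abstract can be illustrated with a toy calculation. Assume, as a simplification for illustration (this is not the paper's actual estimator), that the number of namesakes of a given person in a population of size N is Poisson with mean λ, and that each namesake independently falls into an observed unit of size n with probability n/N:

```python
import math

def p_namesake_in_unit(pop_size, unit_size, expected_namesakes):
    """Probability that an observed unit of `unit_size` people contains
    at least one namesake of a given individual, when the number of
    namesakes in the population is Poisson(expected_namesakes) and each
    one lands in the unit independently with probability unit_size/pop_size."""
    p = unit_size / pop_size
    # P(no namesake in the unit) = E[(1 - p)^K] = exp(-lambda * p),
    # using the probability generating function of the Poisson distribution
    return 1.0 - math.exp(-expected_namesakes * p)

# a fairly common name (10 expected namesakes in a population of one
# million) observed within a 1,000-person unit
risk = p_namesake_in_unit(1_000_000, 1_000, 10.0)
```

The key input, the expected number of namesakes per name, is what the paper's Poisson model estimates from a master sample of unambiguously identified individuals.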
  20. By: Nikita Kozodoi; Johannes Jacob; Stefan Lessmann
    Abstract: The rise of algorithmic decision-making has spawned much research on fair machine learning (ML). Financial institutions use ML for building risk scorecards that support a range of credit-related decisions. Yet, the literature on fair ML in credit scoring is scarce. The paper makes two contributions. First, we provide a systematic overview of algorithmic options for incorporating fairness goals in the ML model development pipeline. In this scope, we also consolidate the space of statistical fairness criteria and examine their adequacy for credit scoring. Second, we perform an empirical study of different fairness processors in a profit-oriented credit scoring setup using seven real-world data sets. The empirical results substantiate the evaluation of fairness measures, identify more and less suitable options to implement fair credit scoring, and clarify the profit-fairness trade-off in lending decisions. Specifically, we find that multiple fairness criteria can be approximately satisfied at once and identify separation as a proper criterion for measuring the fairness of a scorecard. We also find fair in-processors to deliver a good balance between profit and fairness. More generally, we show that algorithmic discrimination can be reduced to a reasonable level at a relatively low cost.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.01907&r=all
  21. By: Shota Imaki; Kentaro Imajo; Katsuya Ito; Kentaro Minami; Kei Nakagawa
    Abstract: Deep hedging (Buehler et al. 2019) is a versatile framework to compute the optimal hedging strategy of derivatives in incomplete markets. However, this optimal strategy is hard to train due to action dependence, that is, the appropriate hedging action at the next step depends on the current action. To overcome this issue, we leverage the idea of a no-transaction band strategy, which is an existing technique that gives optimal hedging strategies for European options and the exponential utility. We theoretically prove that this strategy is also optimal for a wider class of utilities and derivatives including exotics. Based on this result, we propose a no-transaction band network, a neural network architecture that facilitates fast training and precise evaluation of the optimal hedging strategy. We experimentally demonstrate that for European and lookback options, our architecture quickly attains a better hedging strategy in comparison to a standard feed-forward network.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.01775&r=all
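To make the band idea concrete, here is a minimal illustrative sketch (not the authors' code): under a no-transaction band strategy, the trader holds their position whenever it already lies inside a band around the benchmark hedge, and trades only to the nearest band edge otherwise. In the paper's architecture the band edges would be produced by a neural network from market features; here they are hypothetical constants.

```python
def no_transaction_band_step(current_holding, band_lower, band_upper):
    """Clamp the next holding into the band [band_lower, band_upper].

    If the current holding already lies inside the band, do not trade
    (avoiding transaction costs); otherwise trade to the nearest edge.
    """
    if current_holding < band_lower:
        return band_lower          # buy up to the lower edge
    if current_holding > band_upper:
        return band_upper          # sell down to the upper edge
    return current_holding         # inside the band: no transaction

# Hypothetical band [0.3, 0.6] around a benchmark hedge ratio:
print(no_transaction_band_step(0.20, 0.3, 0.6))  # -> 0.3 (buy to the edge)
print(no_transaction_band_step(0.45, 0.3, 0.6))  # -> 0.45 (no trade)
print(no_transaction_band_step(0.70, 0.3, 0.6))  # -> 0.6 (sell to the edge)
```

Because the output depends on the current holding only through this clamp, the construction sidesteps the action dependence that makes the unconstrained strategy hard to train.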
  22. By: Philippe Goulet Coulombe; Massimiliano Marcellino; Dalibor Stevanovic
    Abstract: Based on evidence gathered from a newly built large macroeconomic data set for the UK, labeled UK-MD and comparable to similar datasets for the US and Canada, the most promising avenue for forecasting during the pandemic appears to be allowing for general forms of nonlinearity by using machine learning (ML) methods. But not all nonlinear ML methods are alike. For instance, some cannot extrapolate (like regular trees and forests) and some can (when complemented with linear dynamic components). This and other crucial aspects of ML-based forecasting in unprecedented times are studied in an extensive pseudo-out-of-sample exercise.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.01201&r=all
  23. By: Kristof Lommers; Ouns El Harzli; Jack Kim
    Abstract: This study aims to examine the challenges and applications of machine learning for financial research. Machine learning algorithms have been developed for data environments that differ substantially from the one we encounter in finance. Not only do difficulties arise from some of the idiosyncrasies of financial markets, but there is also a fundamental tension between the underlying paradigm of machine learning and the research philosophy of financial economics. Given the peculiar features of financial markets and the empirical framework of social science, various adjustments have to be made to conventional machine learning methodology. We discuss some of the main challenges of machine learning in finance and examine how they could be addressed. Despite these challenges, we argue that machine learning can be unified with financial research to become a robust complement to the econometrician's toolbox. Moreover, we discuss its various applications in the research process, such as estimation, empirical discovery, testing, causal inference and prediction.
    Date: 2021–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.00366&r=all
  24. By: Henry Hanifan; Ben Watson; John Cartlidge; Dave Cliff
    Abstract: We consider issues of time in automated trading strategies in simulated financial markets containing a single exchange with public limit order book and continuous double auction matching. In particular, we explore two effects: (i) reaction speed - the time taken for trading strategies to calculate a response to market events; and (ii) trading urgency - the sensitivity of trading strategies to approaching deadlines. Much of the literature on trading agents focuses on optimising pricing strategies only and ignores the effects of time, while real-world markets continue to experience a race to zero latency, as automated trading systems compete to quickly access information and act in the market ahead of others. We demonstrate that modelling reaction speed can significantly alter previously published results, with simple strategies such as SHVR outperforming more complex adaptive algorithms such as AA. We also show that adding a pace parameter to ZIP traders (ZIP-Pace, or ZIPP) can create a sense of urgency that significantly improves profitability.
    Date: 2021–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.00600&r=all
  25. By: Bolte, Jérôme; Pauwels, Edouard
    Abstract: Automatic differentiation, as implemented today, does not have a simple mathematical model adapted to the needs of modern machine learning. In this work we articulate the relationships between differentiation of programs as implemented in practice and differentiation of nonsmooth functions. To this end we provide a simple class of functions and a nonsmooth calculus, and show how they apply to stochastic approximation methods. We also exhibit the issue of artificial critical points created by algorithmic differentiation and show how standard methods avoid these points with probability one.
    Date: 2021–02–01
    URL: http://d.repec.org/n?u=RePEc:tse:wpaper:125195&r=all
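A standard toy illustration of such an artificial critical point (a sketch, not code from the paper): the program below computes the identity function, whose true derivative is 1 everywhere, yet differentiating it rule by rule with the usual convention relu'(0) = 0 returns a spurious zero derivative at the origin.

```python
def relu(x):
    return max(x, 0.0)

def relu_grad(x):
    # the convention implemented by most autodiff frameworks: relu'(0) = 0
    return 1.0 if x > 0 else 0.0

def f(x):
    # f(x) = relu(x) - relu(-x) is identically equal to x,
    # so its true derivative is 1 everywhere
    return relu(x) - relu(-x)

def autodiff_f_prime(x):
    # the chain rule applied term by term, as an autodiff tool would do it
    return relu_grad(x) * 1.0 - relu_grad(-x) * (-1.0)

print(f(0.5))                 # 0.5: f agrees with the identity function
print(autodiff_f_prime(1.0))  # 1.0: correct away from the kink
print(autodiff_f_prime(0.0))  # 0.0: artificial critical point (true slope is 1)
```

The point x = 0 is Lebesgue-null, which is why stochastic approximation methods can be shown to avoid such points with probability one.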
  26. By: Spilak, Bruno; Härdle, Wolfgang Karl
    Abstract: Tail risk protection is a focus of the financial industry and requires solid mathematical and statistical tools, especially when a trading strategy is derived from it. The recent hype around machine learning (ML) has raised the need to display and understand the functionality of ML tools. In this paper, we present a dynamic tail risk protection strategy that targets a maximum predefined level of risk measured by Value-at-Risk while controlling for participation in bull market regimes. We propose different weak classifiers, parametric and non-parametric, that estimate the exceedance probability of the risk level, from which we derive trading signals to hedge tail events. We then compare the approaches on both statistical and trading-strategy performance. Finally, we propose an ensemble classifier that produces a meta tail risk protection strategy, improving both generalization and trading performance.
    JEL: C00
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:zbw:irtgdp:2020015&r=all
  27. By: Kéa Baret; Amélie Barbier-Gauchard; Théophilos Papadimitriou
    Abstract: Since the reinforcement of the Stability and Growth Pact (1996), the European Commission closely monitors public finance in the EU members. A failure to comply with the 3% limit rule on the public deficit by a country triggers an audit. In this paper, we present a machine-learning-based forecasting model for compliance with the 3% limit rule. To do so, we use data spanning the period from 2006 to 2018 (a turbulent period including the Global Financial Crisis and the Sovereign Debt Crisis) for the 28 EU Member States. A set of eight features is identified as predictors from 141 variables through a feature selection procedure. The forecasting is performed using Support Vector Machines (SVM). The proposed model reached 91.7% forecasting accuracy and outperformed the Logit model that we used as a benchmark.
    Keywords: Fiscal Rules; Fiscal Compliance; Stability and Growth Pact; Machine learning.
    JEL: E62 H11 H60 H68
    Date: 2021
    URL: http://d.repec.org/n?u=RePEc:ulp:sbbeta:2021-01&r=all
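The abstract does not specify which feature selection procedure is used; as an illustrative sketch only, a simple filter approach ranks candidate variables by absolute correlation with the compliance label and keeps the top k. Feature names and values below are hypothetical toy data, not the paper's 141 variables.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def select_top_k(features, target, k):
    """Rank candidate predictors by |correlation| with the target, keep top k."""
    ranked = sorted(features, key=lambda name: -abs(pearson(features[name], target)))
    return ranked[:k]

# Toy data: compliance label (1 = complies with the 3% rule) and three
# hypothetical candidate indicators for six country-year observations.
target = [1, 1, 1, 0, 0, 0]
features = {
    "deficit_ratio": [1.0, 2.0, 2.5, 3.5, 4.0, 5.0],   # strongly related
    "gdp_growth":    [2.0, 1.5, 1.8, 0.5, 0.2, 0.1],   # strongly related
    "noise":         [0.3, 0.9, 0.1, 0.8, 0.2, 0.7],   # unrelated
}
print(select_top_k(features, target, 2))  # keeps the two informative predictors
```

The selected subset would then feed the SVM classifier in place of the full variable set.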
  28. By: Sebastian Doerr; Leonardo Gambacorta; José María Serena Garralda
    Abstract: This paper reviews the use of big data and machine learning in central banking, drawing on a recent survey conducted among the members of the Irving Fisher Committee (IFC). The majority of central banks discuss the topic of big data formally within their institution. Big data is used with machine learning applications in a variety of areas, including research, monetary policy and financial stability. Central banks also report using big data for supervision and regulation (suptech and regtech applications). Data quality, sampling and representativeness are major challenges for central banks, and so is legal uncertainty around data privacy and confidentiality. Several institutions report constraints in setting up an adequate IT infrastructure and in developing the necessary human capital. Cooperation among public authorities could improve central banks' ability to collect, store and analyse big data.
    Keywords: big data, central banks, machine learning, artificial intelligence, data science
    JEL: G17 G18 G23 G32
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:bis:biswps:930&r=all
  29. By: Henri Fraisse; Matthias Laporte
    Abstract: Taking advantage of granular data, we measure the change in bank capital requirements resulting from the implementation of AI techniques to predict corporate defaults. For each of the largest banks operating in France, we design an algorithm to build pseudo-internal models of credit risk management for a range of methodologies extensively used in AI (random forest, gradient boosting, ridge regression, deep learning). We compare these models to the traditional model usually in place, which essentially relies on a combination of logistic regression and expert judgement. The comparison is made along two sets of criteria: (i) the ability to pass the compliance tests used by regulators during on-site model-validation missions, and (ii) the induced changes in capital requirements. The models show noticeable differences in their ability to pass the regulatory tests and to reduce capital requirements. While displaying an ability to pass compliance tests similar to that of the traditional model, neural networks provide the strongest incentive for banks to apply AI models to their internal credit risk models for corporate businesses, as they lead in some cases to sizeable reductions in capital requirements.
    Keywords: Artificial Intelligence, Credit Risk, Regulatory Requirement
    JEL: C4 C55 G21 K35
    Date: 2021
    URL: http://d.repec.org/n?u=RePEc:bfr:banfra:809&r=all
  30. By: Best, Katherine Laura; Speyer, Lydia Gabriela; Murray, Aja Louise; Ushakova, Anastasia
    Abstract: Identifying predictors of attrition is essential for designing longitudinal studies so that attrition bias can be minimised, and for identifying the variables that can be used as auxiliary variables in statistical techniques to help correct for non-random drop-out. This paper provides a comparative overview of predictive techniques that can be used to model attrition and to identify important risk factors that aid its prediction. Logistic regression and several tree-based machine learning methods were applied to Wave 2 dropout in an illustrative sample of 5,000 individuals from a large UK longitudinal study, Understanding Society. Each method was evaluated on accuracy, AUC-ROC, plausibility of key assumptions, and interpretability. Our results suggest a 10% improvement in accuracy for random forest compared to logistic regression methods. However, given the differences in estimation procedures, we suggest that both models could be used in conjunction to provide the most comprehensive understanding of attrition predictors.
    Date: 2021–03–02
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:tyszr&r=all
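Since the models are scored on AUC-ROC, a minimal sketch of that metric may help: AUC equals the probability that a randomly chosen dropout case receives a higher predicted risk than a randomly chosen stayer. Labels and model scores below are hypothetical toy values, not the study's data.

```python
def auc_roc(labels, scores):
    """AUC via the rank (Mann-Whitney) formulation: the fraction of
    (positive, negative) pairs ranked correctly, ties counting half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy predicted dropout probabilities from two hypothetical models
# on the same six cases (1 = dropped out at Wave 2, 0 = stayed).
labels  = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]   # one mis-ranked pair: AUC = 8/9
model_b = [0.7, 0.6, 0.9, 0.3, 0.2, 0.1]   # all pairs ranked correctly: AUC = 1.0
print(auc_roc(labels, model_a))
print(auc_roc(labels, model_b))
```

Unlike raw accuracy, this ranking view is insensitive to the classification threshold, which matters when drop-out is a minority class.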
  31. By: Jérémy Fouliard; Michael Howell; Hélène Rey
    Abstract: Financial crises cause economic, social and political havoc. Macroprudential policies are gaining traction but are still severely under-researched compared to monetary policy and fiscal policy. We use the general framework of sequential predictions, also called online machine learning, to forecast crises out-of-sample. Our methodology is based on model averaging and is "meta-statistic", since we can incorporate any predictive model of crises in our set of experts and test its ability to add information. We are able to predict systemic financial crises twelve quarters ahead out-of-sample with a high signal-to-noise ratio in most cases. We analyse which experts provide the most information for our predictions at each point in time and for each country, allowing us to gain some insights into the economic mechanisms underlying the build-up of risk in economies.
    JEL: E37 E44 G01
    Date: 2021–02
    URL: http://d.repec.org/n?u=RePEc:bis:biswps:926&r=all
  32. By: Ni, Xinwen; Härdle, Wolfgang Karl; Xie, Taojun
    Abstract: Cryptocurrencies’ values often respond aggressively to major policy changes, but none of the existing indices informs on the market risks associated with regulatory changes. In this paper, we quantify the risks originating from new regulations on FinTech and cryptocurrencies (CCs), and analyse their impact on market dynamics. Specifically, a Cryptocurrency Regulatory Risk IndeX (CRRIX) is constructed based on policy-related news coverage frequency. The unlabeled news data are collected from the top online CC news platforms and further classified using a Latent Dirichlet Allocation model and Hellinger distance. Our results show that the machine-learning-based CRRIX successfully captures major policy-changing moments. The movements for both the VCRIX, a market volatility index, and the CRRIX are synchronous, meaning that the CRRIX could be helpful for all participants in the cryptocurrency market. The algorithms and Python code are available for research purposes on www.quantlet.de.
    Keywords: Cryptocurrency,Regulatory Risk,Index,LDA,News Classification
    JEL: C45 G11 G18
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:zbw:irtgdp:2020013&r=all
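As an illustrative sketch of the distance used to compare LDA topic mixtures (the three-topic mixtures and topic names below are hypothetical, not the authors' data):

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (e.g. the LDA
    topic mixtures of two news articles); 0 = identical, 1 = disjoint support."""
    return math.sqrt(0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                               for pi, qi in zip(p, q)))

# Hypothetical mixtures over three LDA topics, say
# [regulation, markets, technology]:
regulatory_article = [0.8, 0.1, 0.1]
market_article     = [0.1, 0.8, 0.1]
similar_article    = [0.7, 0.2, 0.1]
print(hellinger(regulatory_article, market_article))   # ~0.58: far apart
print(hellinger(regulatory_article, similar_article))  # ~0.10: close
```

Articles whose topic mixture lies close (in this distance) to a policy-related reference mixture would then be counted towards the index.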
  33. By: Klaus Gründler; Tommy Krieger
    Abstract: We provide a comprehensive overview of the literature on the measurement of democracy and present an extensive update of the Machine Learning indicator of Gründler and Krieger (2016, European Journal of Political Economy). Four improvements are particularly notable: First, we produce a continuous and a dichotomous version of the Machine Learning democracy indicator. Second, we calculate intervals that reflect the degree of measurement uncertainty. Third, we refine the conceptualization of the Machine Learning Index. Finally, we largely expand the data coverage by providing democracy indicators for 186 countries in the period from 1919 to 2019.
    Keywords: data aggregation, democracy indicators, machine learning, measurement issues, regime classifications, support vector machines
    JEL: C38 C43 C82 E02 P16
    Date: 2021
    URL: http://d.repec.org/n?u=RePEc:ces:ceswps:_8903&r=all
  34. By: Tao Zou; Xian Li; Xuan Liang; Hansheng Wang
    Abstract: This article introduces subbagging (subsample aggregating) estimation approaches for big data analysis with memory constraints of computers. Specifically, for the whole dataset with size $N$, $m_N$ subsamples are randomly drawn, and each subsample with a subsample size $k_N\ll N$ to meet the memory constraint is sampled uniformly without replacement. Aggregating the estimators of $m_N$ subsamples can lead to subbagging estimation. To analyze the theoretical properties of the subbagging estimator, we adapt the incomplete $U$-statistics theory with an infinite order kernel to allow overlapping drawn subsamples in the sampling procedure. Utilizing this novel theoretical framework, we demonstrate that via a proper hyperparameter selection of $k_N$ and $m_N$, the subbagging estimator can achieve $\sqrt{N}$-consistency and asymptotic normality under the condition $(k_Nm_N)/N\to \alpha \in (0,\infty]$. Compared to the full sample estimator, we theoretically show that the $\sqrt{N}$-consistent subbagging estimator has an inflation rate of $1/\alpha$ in its asymptotic variance. Simulation experiments are presented to demonstrate the finite sample performances. An American airline dataset is analyzed to illustrate that the subbagging estimate is numerically close to the full sample estimate, and can be computationally fast under the memory constraint.
    Date: 2021–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.00631&r=all
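A minimal sketch of the subbagging recipe for the simplest estimator, the sample mean (illustrative only; the sample sizes are hypothetical and chosen so that $k_N m_N / N = 5$):

```python
import random

def subbagging_estimate(data, k_n, m_n, estimator, seed=0):
    """Average an estimator over m_n subsamples, each of size k_n drawn
    uniformly without replacement from the full sample (subsamples may
    overlap with one another, as in the paper's sampling scheme)."""
    rng = random.Random(seed)
    estimates = [estimator(rng.sample(data, k_n)) for _ in range(m_n)]
    return sum(estimates) / m_n

def sample_mean(xs):
    return sum(xs) / len(xs)

# Toy check: the subbagged mean should sit close to the full-sample mean.
# Here N = 1000, k_N = 100, m_N = 50, so only 100 values need to be held
# in memory at a time even though k_N * m_N = 5000 draws are used overall.
data = list(range(1000))                  # full-sample mean: 499.5
estimate = subbagging_estimate(data, k_n=100, m_n=50, estimator=sample_mean)
print(estimate)                           # close to 499.5
```

Each call to `estimator` only ever sees `k_n` observations, which is the point of the construction under a memory constraint; the theory in the paper quantifies the resulting variance inflation of $1/\alpha$.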
  35. By: Jacques Bughin; Michele Cincera; Kelly Peters; Dorota Reykowska; Marcin Zyszkiewicz; Rafal Ohme
    Abstract: This research updates early studies on the intention to be vaccinated against the Covid-19 virus among a representative sample of adults in six European countries (France, Germany, Italy, Spain, Sweden, and the UK), differentiated into groups of “acceptors”, “refusers”, and “hesitant”. The research relies on a set of traditional logistic and more complex classification techniques, such as Neural Networks and Random Forests, to determine common predictors of vaccination preferences. The findings highlight that socio-demographics are not a reliable measure of vaccination propensity once one controls for different risk perceptions, and illustrate the key role of institutional and peer trust for vaccination success. Policymakers should differentiate vaccine promotion techniques according to “acceptors”, “refusers”, and “hesitant”, while restoring much broader trust in their actions since the start of the pandemic, if vaccination coverage is to close part of the gap to the level of herd immunity.
    Keywords: Covid-19, vaccine strategy, deconfinement, priority groups, random-forest, tree classification
    Date: 2021–01
    URL: http://d.repec.org/n?u=RePEc:ict:wpaper:2013/320284&r=all
  36. By: Pavlova, Elitsa; Signore, Simone
    Abstract: We use competing risks methods to investigate the causal link between venture capital (VC) investments supported by the EIF and the exit prospects and patenting activity of young and innovative firms. Using a novel dataset covering European start-ups receiving VC financing in the years 2007 to 2014, we generate a counterfactual group of non-VC-backed young and innovative firms via a combination of exact and propensity score matching. To offset the limited set of observables allowed by our data, we introduce novel measures based on machine learning, network theory, and satellite imagery analysis to estimate treatment propensity. Our estimates indicate that start-ups receiving EIF VC experienced a significant threefold increase in their likelihood of exiting via M&A. We find a similarly large effect in the case of IPOs, albeit only weakly significant. Moreover, we find that EIF VC contributed to a 13 percentage point higher incidence of patenting activity during the five years following the investment date. Overall, our work provides meaningful evidence of the positive effects of EIF's VC activity on the exit prospects and innovative capacity of young and innovative businesses in Europe.
    Keywords: EIF,venture capital,public intervention,exit strategy,innovation,start-ups,machine learning,geospatial analysis,network theory
    JEL: G24 G34 M13 O32 O38
    Date: 2021
    URL: http://d.repec.org/n?u=RePEc:zbw:eifwps:202170&r=all

General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.