nep-big New Economics Papers
on Big Data
Issue of 2019‒09‒02
25 papers chosen by
Tom Coupé
University of Canterbury

  1. Will this time be different? A review of the literature on the Impact of Artificial Intelligence on Employment, Incomes and Growth By Bertin Martens; Songul Tolan
  2. Intertemporal Evidence on the Strategy of Populism By Gennaro, Gloria; Lecce, Giampaolo; Morelli, Massimo
  3. Challenges and Opportunities in the Future Applications of IoT Technology By Attia, Tarek M.
  4. The impact of data access regimes on artificial intelligence and machine learning By Bertin Martens
  5. Machine Learning vs Traditional Forecasting Methods: An Application to South African GDP By Lisa-Cheree Martin
  6. The Promise and Pitfalls of Conflict Prediction: Evidence from Colombia and Indonesia By Bazzi, Samuel; Blair, Robert; Blattman, Christopher; Dube, Oeindrila; Gudgeon, Matthew; Peck, Richard
  7. Modulations Recognition using Deep Neural Network in Wireless Communications By Mossad, Omar S.; ElNainay, Mustafa; Torki, Marwan
  8. Intra-day Equity Price Prediction using Deep Learning as a Measure of Market Efficiency By David Byrd; Tucker Hybinette Balch
  9. Does the estimation of the propensity score by machine learning improve matching estimation? The case of Germany’s programmes for long term unemployed By Goller, Daniel; Lechner, Michael; Moczall, Andreas; Wolff, Joachim
  10. On deep calibration of (rough) stochastic volatility models By Christian Bayer; Blanka Horvath; Aitor Muguruza; Benjamin Stemper; Mehdi Tomas
  11. Fair and Unbiased Algorithmic Decision Making: Current State and Future Challenges By Songul Tolan
  12. AlphaStock: A Buying-Winners-and-Selling-Losers Investment Strategy using Interpretable Deep Reinforcement Attention Networks By Jingyuan Wang; Yang Zhang; Ke Tang; Junjie Wu; Zhang Xiong
  13. Process orientation in the modern controlling By Károly Szóka; Brigitta Kovács
  14. Nonparametric estimation of causal heterogeneity under high-dimensional confounding By Michael Zimmert; Michael Lechner
  15. Inference on weighted average value function in high-dimensional state space By Victor Chernozhukov; Whitney Newey; Vira Semenova
  16. Multidimensional Self-Organizing Chord-Based Networking for Internet of Things By Abdel Ghafar, Ahmed Ismail; Vazquez Castro, Ángeles; Essam Khedr, Mohamed
  17. Estimating the Economy-Wide Rebound Effect Using Empirically Identified Structural Vector Autoregressions By Stephan B. Bruns; Alessio Moneta; David I. Stern
  18. An artificial neural network augmented GARCH model for Islamic stock market volatility: Do asymmetry and long memory matter? By Manel Hamdi; Walid Chkili
  19. Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics By Mark Weber; Giacomo Domeniconi; Jie Chen; Daniel Karl I. Weidele; Claudio Bellei; Tom Robinson; Charles E. Leiserson
  20. Machine Learning With Kernels for Portfolio Valuation and Risk Management By Lotfi Boudabsa; Damir Filipović
  21. Predicting Consumer Default: A Deep Learning Approach By Albanesi, Stefania; Vamossy, Domonkos
  22. Back to the Future - Changing Job Profiles in the Digital Age By Stephany, Fabian; Lorenz, Hanno
  23. Predicting Consumer Default: A Deep Learning Approach By Stefania Albanesi; Domonkos F. Vamossy
  24. The potential of tax microdata for tax policy By Seán Kennedy
  25. Results of a survey on standardization activities: Japanese institutions' standardization activities in 2017 (Implementation, knowledge source, organizational structure, and interest to artificial intelligence) By TAMURA Suguru

  1. By: Bertin Martens (European Commission – JRC); Songul Tolan (European Commission – JRC)
    Abstract: There is a long-standing economic research literature on the impact of technological innovation and automation in general on employment and economic growth. Traditional economic models trade off a negative displacement or substitution effect against a positive complementarity effect on employment. Economic history since the industrial revolution strongly supports the view that the net effect on employment and incomes is positive, though recent evidence points to a declining labour share in total income. There are concerns that with artificial intelligence (AI) "this time may be different". The state-of-the-art task-based model creates an environment where humans and machines compete for the completion of tasks. It emphasizes the labour substitution effects of automation. This has been tested on robot data, with mixed results. However, the economic characteristics of rival robots are not comparable with non-rival and scalable AI algorithms that may constitute a general purpose technology and may itself accelerate the pace of innovation. These characteristics give a hint that this time might indeed be different. However, there is as yet very little empirical evidence that relates AI or Machine Learning (ML) to employment and incomes. General growth models can only present a wide range of highly diverging and hypothetical scenarios, from growth implosion to an optimistic future with growth acceleration. Even extreme scenarios of displacement of men by machines offer hope for an overall wealthier economic future. The literature is clearer on the negative implications that automation may have for income equality. Redistributive policies to counteract this trend will have to incorporate behavioural responses to such policies. We conclude that there are some elements that suggest that the nature of AI/ML is different from previous technological change but there is no empirical evidence yet to underpin this view.
    Keywords: labour markets, employment, technological change, task-based model, artificial intelligence, income distribution
    JEL: J62 O33
    Date: 2018–08
  2. By: Gennaro, Gloria; Lecce, Giampaolo; Morelli, Massimo
    Abstract: Do candidates use populism to maximize the impact of political campaigns? Is the supply of populism strategic? We apply automated text analysis to all available 2016 US Presidential campaign speeches and 2018 midterm campaign programs using a continuous index of populism. This novel dataset shows that the use of populist rhetoric is responsive to the level of expected demand for populism in the local audience. In particular, we provide evidence that current U.S. President Donald Trump uses more populist rhetoric in swing states and in locations where economic insecurity is prevalent. These findings were confirmed when the analysis was extended to recent legislative campaigns, wherein candidates tended towards populism when campaigning in fiercely competitive districts where constituents were experiencing high levels of economic insecurity. We also show that pandering is more common for candidates who can credibly sustain anti-elite positions, such as those with shorter political careers. Finally, our results suggest that a populist strategy is rewarded by voters since higher levels of populism are associated with higher shares of the vote, precisely in competitive districts where voters are experiencing economic insecurity.
    Keywords: American Politics; Electoral Campaign; populism; Text Analysis
    JEL: D7
    Date: 2019–06
  3. By: Attia, Tarek M.
    Abstract: The advent of the internet of things (IoT) has influenced and revolutionized information systems and computing technologies. IoT is a computing concept in which physical objects used in daily life identify themselves by connecting to the internet. Physical objects embedded with electronics, radio-frequency identification, software, sensors, actuators and smart objects converge with the internet to accumulate and share data in IoT. IoT is expected to bring in extreme changes and solutions to most daily problems in the real world. Thus, IoT provides connectivity for everyone and everything at any time. IoT embeds some intelligence in internet-connected objects to communicate, exchange information, take decisions, invoke actions and provide amazing services. It has an imperative economic and societal impact for the future construction of information, network, and communication technology. In the upcoming years, IoT is expected to bridge various technologies to enable new applications by connecting physical objects together in support of intelligent decision making. As the most cost-effective and performant source of positioning and timing information in outdoor environments, global navigation satellite systems (GNSS) have become an essential element of major contemporary technology developments, notably including IoT, Big Data, Smart Cities and Multimodal Logistics. By 2020, there will be more than 20 billion interconnected IoT devices, and the market size may reach $1.5 trillion. Projections for the impact of IoT on the Internet and economy are impressive, with some anticipating as many as 100 billion connected IoT devices and a global economic impact of more than $11 trillion by 2025. Regulators can play a role in encouraging the development and adoption of the IoT by preventing abuse of market dominance, protecting users and protecting Internet networks while promoting efficient markets and the public interest.
Regulators can consider several measures to foster development of the IoT: encouraging the development of LTE‐A and 5G wireless networks while keeping the need for IoT‐specific spectrum under review; universal IPv6 adoption by governments in their own services and procurements, along with incentives for private-sector adoption; increasing interoperability through competition law and giving users a right of easy access to personal data; and supporting global standardization and deployment of remotely provisioned SIMs for greater machine-to-machine competition. Particular attention will be needed from regulators to IoT privacy and security issues, which are key to encouraging public trust in and adoption of the technology. This paper focuses specifically on the essential technologies that enable the implementation of IoT, the general layered architecture of IoT, the market for IoT and GNSS technologies and their impact on the world economy, the application domains of IoT and, finally, policy and regulatory implications and best practices.
    Keywords: Internet of Things (IoT), Global Navigation Satellite Systems (GNSS), Applications, Marketing, Policy and Regulation
    Date: 2019
  4. By: Bertin Martens (European Commission – JRC)
    Abstract: Digitization triggered a steep drop in the cost of information. The resulting data glut created a bottleneck because human cognitive capacity is unable to cope with large amounts of information. Artificial intelligence and machine learning (AI/ML) triggered a similar drop in the cost of machine-based decision-making and helps in overcoming this bottleneck. Substantial change in the relative price of resources puts pressure on ownership and access rights to these resources. This explains pressure on access rights to data. ML thrives on access to big and varied datasets. We discuss the implications of access regimes for the development of AI in its current form of ML. The economic characteristics of data (non-rivalry, economies of scale and scope) favour data aggregation in big datasets. Non-rivalry implies the need for exclusive rights in order to incentivise data production when it is costly. The balance between access and exclusion is at the centre of the debate on data regimes. We explore the economic implications of several modalities for access to data, ranging from exclusive monopolistic control to monopolistic competition and free access. Regulatory intervention may push the market beyond voluntary exchanges, either towards more openness or reduced access. This may generate private costs for firms and individuals. Society can choose to do so if the social benefits of this intervention outweigh the private costs. We briefly discuss the main EU legal instruments that are relevant for data access and ownership, including the General Data Protection Regulation (GDPR) that defines the rights of data subjects with respect to their personal data and the Database Directive (DBD) that grants ownership rights to database producers. 
These two instruments leave a wide legal no-man's land where data access is ruled by bilateral contracts and Technical Protection Measures that give exclusive control to de facto data holders, and by market forces that drive access, trade and pricing of data. The absence of exclusive rights might facilitate data sharing and access or it may result in a segmented data landscape where data aggregation for ML purposes is hard to achieve. It is unclear if incompletely specified ownership and access rights maximize the welfare of society and facilitate the development of AI/ML.
    Keywords: digital data, ownership and access rights, trade in data, machine learning, artificial intelligence
    JEL: L00
    Date: 2018–09
  5. By: Lisa-Cheree Martin (Department of Economics, Stellenbosch University)
    Abstract: This study employs traditional autoregressive and vector autoregressive forecasting models, as well as machine learning methods of forecasting, in order to compare the performance of each of these techniques. Each technique is used to forecast the percentage change of quarterly South African Gross Domestic Product, quarter-on-quarter. It is found that machine learning methods outperform traditional methods according to the chosen criteria of minimising root mean squared error and maximising correlation with the actual trend of the data. Overall, the outcomes suggest that machine learning methods are a viable option for policy-makers to use, in order to aid their decision-making process regarding trends in macroeconomic data. As this study is limited by data availability, it is recommended that policy-makers consider further exploration of these techniques.
    Keywords: Machine learning, Forecasting, Elastic-net, Random Forests, Support Vector Machines, Recurrent Neural Networks
    JEL: C32 C45 C53 C88
    Date: 2019
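The comparison this abstract describes can be illustrated with a minimal sketch on simulated quarterly growth data. To keep it self-contained, a Nadaraya-Watson kernel regression stands in for the paper's machine learning methods (elastic net, random forests, SVMs, recurrent networks), an AR(1) fitted by OLS is the traditional benchmark, and both are scored by root mean squared error on one-step-ahead forecasts:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated quarterly growth series (%) with a mild nonlinearity in the lag
n = 120
g = np.zeros(n)
for t in range(1, n):
    g[t] = 0.5 * g[t - 1] + 0.3 * np.tanh(g[t - 1]) + rng.normal(0, 0.5)

def rmse(actual, pred):
    return float(np.sqrt(np.mean((actual - pred) ** 2)))

# One-step-ahead pairs: fit on the first 100 quarters, evaluate on the last 20
X_tr, y_tr = g[:99], g[1:100]
X_te, y_te = g[99:-1], g[100:]

# Traditional benchmark: AR(1) by ordinary least squares
A = np.column_stack([np.ones_like(X_tr), X_tr])
beta, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
ar_pred = beta[0] + beta[1] * X_te

# ML stand-in: Nadaraya-Watson kernel regression on the lagged value
def kernel_predict(x0, xs, ys, h=0.3):
    w = np.exp(-0.5 * ((x0 - xs) / h) ** 2)
    return np.sum(w * ys) / np.sum(w)

ml_pred = np.array([kernel_predict(x, X_tr, y_tr) for x in X_te])

print(f"AR(1) RMSE:  {rmse(y_te, ar_pred):.3f}")
print(f"kernel RMSE: {rmse(y_te, ml_pred):.3f}")
```

The paper's second criterion, correlation with the actual trend, could be added with `np.corrcoef(y_te, ml_pred)[0, 1]` on the same forecast vectors.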
  6. By: Bazzi, Samuel; Blair, Robert; Blattman, Christopher; Dube, Oeindrila; Gudgeon, Matthew; Peck, Richard
    Abstract: Policymakers can take actions to prevent local conflict before it begins, if such violence can be accurately predicted. We examine the two countries with the richest available sub-national data: Colombia and Indonesia. We assemble two decades of fine-grained violence data by type, alongside hundreds of annual risk factors. We predict violence one year ahead with a range of machine learning techniques. Models reliably identify persistent, high-violence hot spots. Violence is not simply autoregressive, as detailed histories of disaggregated violence perform best. Rich socio-economic data also substitute well for these histories. Even with such unusually rich data, however, the models poorly predict new outbreaks or escalations of violence. "Best case" scenarios with panel data fall short of workable early-warning systems.
    Keywords: Civil War; Colombia; conflict; Forecasting; Indonesia; Machine Learning; prediction
    JEL: C52 C53 D74
    Date: 2019–06
  7. By: Mossad, Omar S.; ElNainay, Mustafa; Torki, Marwan
    Abstract: Automatic modulation recognition is one of the most important aspects of cognitive radios (CRs). Unlicensed users or secondary users (SUs) tend to classify incoming signals to recognize the type of users in the system. Once the available users are detected and classified accurately, the CR can modify its transmission parameters to avoid any interference with the licensed users or primary users (PUs). In this paper, we propose a deep learning technique to detect the modulation schemes used in a number of sampled transmissions. This approach uses a deep neural network that consists of a large number of convolutional filters to extract the distinct features that separate the various modulation classes. The training is performed to improve the overall classification accuracy with a major focus on the misclassified classes. The results demonstrate that our approach outperforms the recently proposed Convolutional, Long Short Term Memory (LSTM), Deep Neural Network (CLDNN) in terms of overall classification accuracy. Moreover, the classification accuracy obtained by the proposed approach is greater than that of the CLDNN algorithm at the highest signal-to-noise ratio used.
    Keywords: modulation recognition,deep learning,convolutional neural networks
    Date: 2019
  8. By: David Byrd; Tucker Hybinette Balch
    Abstract: In finance, the weak form of the Efficient Market Hypothesis asserts that historic stock price and volume data cannot inform predictions of future prices. In this paper we show that, to the contrary, future intra-day stock prices could be predicted effectively until 2009. We demonstrate this using two different profitable machine learning-based trading strategies. However, the effectiveness of both approaches diminish over time, and neither of them are profitable after 2009. We present our implementation and results in detail for the period 2003-2017 and propose a novel idea: the use of such flexible machine learning methods as an objective measure of relative market efficiency. We conclude with a candidate explanation, comparing our returns over time with high-frequency trading volume, and suggest concrete steps for further investigation.
    Date: 2019–08
  9. By: Goller, Daniel; Lechner, Michael; Moczall, Andreas; Wolff, Joachim
    Abstract: Matching-type estimators using the propensity score are the major workhorse in active labour market policy evaluation. This work investigates whether machine learning algorithms for estimating the propensity score lead to more credible estimation of average treatment effects on the treated using a radius matching framework. Considering two popular methods, the results are ambiguous: We find that using LASSO-based logit models to estimate the propensity score delivers more credible results than conventional methods in small and medium-sized high-dimensional datasets. However, using Random Forests to estimate the propensity score may lead to a deterioration of performance in situations with a low treatment share. The application reveals a positive effect of the training programme on days in employment for the long-term unemployed. While the choice of the "first stage" is highly relevant for settings with a low number of observations and few treated, machine learning and conventional estimation become more similar in larger samples and at higher treatment shares.
    Keywords: Programme evaluation, active labour market policy, causal machine learning, treatment effects, radius matching, propensity score
    JEL: J68 C21
    Date: 2019–08
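The two-step design this abstract describes can be illustrated with a minimal sketch on simulated data. For self-containedness, a plain (unpenalised) logit fitted by gradient ascent stands in for the paper's LASSO-logit first stage, followed by radius matching on the estimated propensity score; the data-generating process, radius and step size are illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=(n, 2))
# Treatment assignment depends on the covariates (confounding)
p_true = 1 / (1 + np.exp(-(0.8 * x[:, 0] - 0.5 * x[:, 1])))
d = rng.binomial(1, p_true)
# Outcome: the simulated treatment effect is 1.0, with x[:, 0] as a confounder
y = 1.0 * d + x[:, 0] + rng.normal(size=n)

# First stage: propensity score via logistic regression (gradient ascent)
X = np.column_stack([np.ones(n), x])
beta = np.zeros(3)
for _ in range(500):
    p = 1 / (1 + np.exp(-X @ beta))
    beta += 0.1 * X.T @ (d - p) / n   # average log-likelihood gradient
pscore = 1 / (1 + np.exp(-X @ beta))

# Second stage: radius matching on the propensity score (ATT)
radius = 0.05
effects = []
for i in np.where(d == 1)[0]:
    controls = np.where((d == 0) & (np.abs(pscore - pscore[i]) < radius))[0]
    if len(controls) > 0:
        effects.append(y[i] - y[controls].mean())
att = float(np.mean(effects))
print(f"Radius-matching ATT estimate: {att:.2f} (simulated true effect: 1.0)")
```

A naive treated-control mean difference would be badly biased upward here because x[:, 0] drives both treatment and outcome; matching on the estimated score removes most of that bias.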
  10. By: Christian Bayer; Blanka Horvath; Aitor Muguruza; Benjamin Stemper; Mehdi Tomas
    Abstract: Techniques from deep learning play an increasingly important role in the calibration of financial models. The pioneering paper by Hernandez [Risk, 2017] was a catalyst for resurfacing interest in research in this area. In this paper we advocate an alternative (two-step) approach using deep learning techniques solely to learn the pricing map -- from model parameters to prices or implied volatilities -- rather than directly the calibrated model parameters as a function of observed market data. Having a fast and accurate neural-network-based approximating pricing map (first step), we can then (second step) use traditional model calibration algorithms. In this work we showcase a direct comparison of different potential approaches to the learning stage and present algorithms that provide a sufficient accuracy for practical use. We provide a first neural network-based calibration method for rough volatility models for which calibration can be done on the fly. We demonstrate the method via a hands-on calibration engine on the rough Bergomi model, for which classical calibration techniques are difficult to apply due to the high cost of all known numerical pricing methods. Furthermore, we display and compare different types of sampling and training methods and elaborate on their advantages under different objectives. As a further application we use the fast pricing method for a Bayesian analysis of the calibrated model.
    Date: 2019–08
  11. By: Songul Tolan (European Commission – JRC)
    Abstract: Machine learning algorithms are now frequently used in sensitive contexts that substantially affect the course of human lives, such as credit lending or criminal justice. This is driven by the idea that ‘objective’ machines base their decisions solely on facts and remain unaffected by human cognitive biases, discriminatory tendencies or emotions. Yet, there is overwhelming evidence showing that algorithms can inherit or even perpetuate human biases in their decision making when they are based on data that contains biased human decisions. This has led to a call for fairness-aware machine learning. However, fairness is a complex concept which is also reflected in the attempts to formalize fairness for algorithmic decision making. Statistical formalizations of fairness lead to a long list of criteria that are each flawed (or even harmful) in different contexts. Moreover, inherent trade-offs in these criteria make it impossible to unify them in one general framework. Thus, fairness constraints in algorithms have to be specific to the domains to which the algorithms are applied. In the future, research in algorithmic decision making systems should be aware of data and developer biases and add a focus on transparency to facilitate regular fairness audits.
    Keywords: fairness, machine learning, algorithmic bias, algorithmic transparency
    Date: 2018–12
  12. By: Jingyuan Wang; Yang Zhang; Ke Tang; Junjie Wu; Zhang Xiong
    Abstract: Recent years have witnessed the successful marriage of finance innovations and AI techniques in various finance applications including quantitative trading (QT). Despite great research efforts devoted to leveraging deep learning (DL) methods for building better QT strategies, existing studies still face serious challenges especially from the side of finance, such as the balance of risk and return, the resistance to extreme loss, and the interpretability of strategies, which limit the application of DL-based strategies in real-life financial markets. In this work, we propose AlphaStock, a novel reinforcement learning (RL) based investment strategy enhanced by interpretable deep attention networks, to address the above challenges. Our main contributions are summarized as follows: i) We integrate deep attention networks with a Sharpe ratio-oriented reinforcement learning framework to achieve a risk-return balanced investment strategy; ii) We suggest modeling interrelationships among assets to avoid selection bias and develop a cross-asset attention mechanism; iii) To our best knowledge, this work is among the first to offer an interpretable investment strategy using deep reinforcement learning models. The experiments on long-periodic U.S. and Chinese markets demonstrate the effectiveness and robustness of AlphaStock over diverse market states. It turns out that AlphaStock tends to select the stocks as winners with high long-term growth, low volatility, high intrinsic value, and being undervalued recently.
    Date: 2019–07
  13. By: Károly Szóka (University of Sopron, Alexandre Lamfalussy Faculty of Economics); Brigitta Kovács (University of Sopron, Alexandre Lamfalussy Faculty of Economics, Doctoral School)
    Abstract: Controlling is a combination of target-oriented control activity, the methods used and other soft factors as well. Controlling is constantly changing because it must always meet current challenges. Each company receives and generates a great deal of data; Big Data and Data Mining help with predictive, secure and user-friendly analysis. One of the most important trends is Industry 4.0 and compliance with Digital Business Models through intelligent networks and Cyber-Physical Systems. The controller is responsible for identifying and evaluating business changes and requirements, and will support the management in implementing them. We review the importance of Industry 4.0 and how the process orientation of controlling has to develop under modern Digital Business Models. In the paper, we illustrate how this can be achieved by taking the Industry 4.0 strategy into account and how digitalisation can help.
    Keywords: Controlling, Industry 4.0, Process orientation, Digital Business Model
    JEL: M49 O11 O20
    Date: 2019–07
  14. By: Michael Zimmert; Michael Lechner
    Abstract: This paper considers the practically important case of nonparametrically estimating heterogeneous average treatment effects that vary with a limited number of discrete and continuous covariates in a selection-on-observables framework where the number of possible confounders is very large. We propose a two-step estimator for which the first step is estimated by machine learning. We show that this estimator has desirable statistical properties like consistency, asymptotic normality and rate double robustness. In particular, we derive the coupled convergence conditions between the nonparametric and the machine learning steps. We also show that estimating population average treatment effects by averaging the estimated heterogeneous effects is semi-parametrically efficient. We apply the new estimator in an empirical example: the effects of mothers' smoking during pregnancy on the resulting birth weight.
    Date: 2019–08
  15. By: Victor Chernozhukov; Whitney Newey; Vira Semenova
    Abstract: This paper gives a consistent, asymptotically normal estimator of the expected value function when the state space is high-dimensional and the first-stage nuisance functions are estimated by modern machine learning tools. First, we show that the value function is orthogonal to the conditional choice probability; therefore, this nuisance function needs to be estimated only at the $n^{-1/4}$ rate. Second, we give a correction term for the transition density of the state variable. The resulting orthogonal moment is robust to misspecification of the transition density and does not require this nuisance function to be consistently estimated. Third, we generalize this result by considering the weighted expected value. In this case, the orthogonal moment is doubly robust in the transition density and additional second-stage nuisance functions entering the correction term. We complete the asymptotic theory by providing bounds on second-order asymptotic terms.
    Date: 2019–08
  16. By: Abdel Ghafar, Ahmed Ismail; Vazquez Castro, Ángeles; Essam Khedr, Mohamed
    Abstract: IoT is a term recently coined in the ICT research and industrial community to express the involvement of devices of different capabilities and functionalities in the daily activities of people and organizations. With the enormous amount of data generated by highly dynamic users, the problem of storing, looking up, validating and manipulating data becomes crucial for the success of future networks. A multidimensional chord peer-to-peer network, an extension of the successful chord technology, is proposed to cope with the dynamism of IoT networks. Novel approaches have been developed to tackle the high frequency of nodes joining and leaving or failing in the network and to deal with big data, data similarity, filtering and geo-data.
    Keywords: Internet of Things,peer to peer networks,multidimensional chord networks,distributed resource sharing
    Date: 2019
  17. By: Stephan B. Bruns; Alessio Moneta; David I. Stern
    Abstract: The size of the economy-wide rebound effect is crucial for estimating the contribution that energy efficiency improvements can make to reducing greenhouse gas emissions and for understanding the drivers of energy use. Existing estimates, which vary widely, are based on computable general equilibrium models or partial equilibrium econometric estimates. The former depend on many a priori assumptions and the parameter values adopted, and the latter do not include all mechanisms that might increase or reduce the rebound and mostly do not credibly identify the rebound effect. Using a structural vector autoregressive (SVAR) model, we identify the dynamic causal impact of structural shocks, including an energy efficiency shock, applying identification methods developed in machine learning. In this manner, we are able to estimate the rebound effect with a minimum of a priori assumptions. We apply the SVAR to U.S. monthly and quarterly data, finding that after four years rebound is around 100%. This implies that policies to encourage cost-reducing energy efficiency innovation are not likely to significantly reduce energy use and greenhouse gas emissions in the long run.
    Keywords: Energy efficiency; Rebound effect; Structural VAR; Impulse response functions; Independent component analysis.
    Date: 2019–08–19
  18. By: Manel Hamdi (International Financial Group-Tunisia, Faculty of Economics and Management of Tunis, University of Tunis); Walid Chkili (International Financial Group-Tunisia, Faculty of Economics and Management of Tunis, University of Tunis)
    Abstract: The aim of this paper is to study the volatility and forecast accuracy of the Islamic stock market. For this purpose, we construct a new hybrid GARCH-type model based on an artificial neural network (ANN). This model is applied to daily prices for DW Islamic markets during the period June 1999–December 2016. Our in-sample results show that the volatility of the Islamic stock market can be better described by the FIAPARCH approach, which takes into account asymmetry and long memory features. For the out-of-sample analysis, we apply a hybrid forecasting model that combines the FIAPARCH approach and the artificial neural network (ANN). Empirical results show that the proposed hybrid model (FIAPARCH-ANN) outperforms all single models such as GARCH, FIGARCH and FIAPARCH in terms of all performance criteria used in our study.
    Date: 2019–08–21
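The GARCH building block that the hybrid models above extend can be sketched in a few lines. This is a plain GARCH(1,1) variance recursion on simulated returns, not the paper's FIAPARCH-ANN hybrid, and all parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

def garch11_filter(r, omega, alpha, beta):
    # Conditional variance recursion: sigma2_t = omega + alpha*r_{t-1}^2 + beta*sigma2_{t-1}
    sigma2 = np.empty(len(r))
    sigma2[0] = np.var(r)  # common initialization: full-sample variance
    for t in range(1, len(r)):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

# Simulate 1000 returns from a GARCH(1,1) with persistence alpha + beta = 0.95
n = 1000
omega, alpha, beta = 0.05, 0.1, 0.85
r = np.zeros(n)
s2 = omega / (1 - alpha - beta)  # start at the unconditional variance
for t in range(n):
    r[t] = np.sqrt(s2) * rng.normal()
    s2 = omega + alpha * r[t] ** 2 + beta * s2

sig2 = garch11_filter(r, omega, alpha, beta)
print(f"mean conditional variance: {sig2.mean():.2f}, "
      f"unconditional: {omega / (1 - alpha - beta):.2f}")
```

FIAPARCH adds fractional integration (long memory) and a power/asymmetry term to this recursion, and the ANN component is layered on top for forecasting, but the volatility-clustering mechanism is the same.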
  19. By: Mark Weber; Giacomo Domeniconi; Jie Chen; Daniel Karl I. Weidele; Claudio Bellei; Tom Robinson; Charles E. Leiserson
    Abstract: Anti-money laundering (AML) regulations play a critical role in safeguarding financial systems, but bear high costs for institutions and drive financial exclusion for those on the socioeconomic and international margins. The advent of cryptocurrency has introduced an intriguing paradox: pseudonymity allows criminals to hide in plain sight, but open data gives more power to investigators and enables the crowdsourcing of forensic analysis. Meanwhile advances in learning algorithms show great promise for the AML toolkit. In this workshop tutorial, we motivate the opportunity to reconcile the cause of safety with that of financial inclusion. We contribute the Elliptic Data Set, a time series graph of over 200K Bitcoin transactions (nodes), 234K directed payment flows (edges), and 166 node features, including ones based on non-public data; to our knowledge, this is the largest labelled transaction data set publicly available in any cryptocurrency. We share results from a binary classification task predicting illicit transactions using variations of Logistic Regression (LR), Random Forest (RF), Multilayer Perceptrons (MLP), and Graph Convolutional Networks (GCN), with GCN being of special interest as an emergent new method for capturing relational information. The results show the superiority of Random Forest (RF), but also invite algorithmic work to combine the respective powers of RF and graph methods. Lastly, we consider visualization for analysis and explainability, which is difficult given the size and dynamism of real-world transaction graphs, and we offer a simple prototype capable of navigating the graph and observing model performance on illicit activity over time. With this tutorial and data set, we hope to a) invite feedback in support of our ongoing inquiry, and b) inspire others to work on this societally important challenge.
    Date: 2019–07
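The GCN mentioned in the abstract above captures relational information by propagating node features over the transaction graph. As a rough illustration only — not the authors' implementation, and using an invented four-node toy graph with made-up two-dimensional features — a single propagation step, H' = ReLU(D^-1/2 (A+I) D^-1/2 H W), can be sketched in NumPy:

```python
import numpy as np

# Toy transaction graph: 4 nodes, payment edges made symmetric (hypothetical)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.array([[1.0, 0.0],    # hypothetical 2-dim node features
              [0.0, 1.0],
              [1.0, 1.0],
              [0.5, 0.5]])

def gcn_layer(A, H, W):
    """One graph-convolution step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))   # symmetric normalisation
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

W = np.eye(2)           # identity weights, just to expose the propagation
H1 = gcn_layer(A, H, W)
# Each row of H1 now mixes the node's own features with its neighbours'
```

Stacking such layers (with learned W matrices) and a final classifier head yields the kind of relational illicit-transaction model the abstract compares against RF and LR.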
  20. By: Lotfi Boudabsa (Ecole Polytechnique Fédérale de Lausanne - School of Basic Sciences); Damir Filipović (Ecole Polytechnique Fédérale de Lausanne; Swiss Finance Institute)
    Abstract: We introduce a computational framework for dynamic portfolio valuation and risk management building on machine learning with kernels. We learn the replicating martingale of a portfolio from a finite sample of its terminal cumulative cash flow. The learned replicating martingale is given in closed form thanks to a suitable choice of the kernel. We develop an asymptotic theory and prove convergence and a central limit theorem. We also derive finite sample error bounds and concentration inequalities. Numerical examples show good results for a relatively small training sample size.
    Keywords: dynamic portfolio valuation, kernel ridge regression, learning theory, reproducing kernel Hilbert space, portfolio risk management
    Date: 2019–06
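The closed-form learning step described in the abstract above is kernel ridge regression in a reproducing kernel Hilbert space. As a hedged, self-contained sketch — with an invented one-dimensional sample standing in for the portfolio's terminal cash flows, and a Gaussian kernel chosen for illustration — the fit reduces to solving one linear system:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training sample: x = risk factor, y = noisy terminal cash flow
X = rng.uniform(-1, 1, size=(50, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(50)

def gauss_kernel(A, B, sigma=0.5):
    """Gaussian RBF kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

lam = 1e-3                                            # ridge penalty
K = gauss_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # closed-form fit

X_test = np.array([[0.0], [0.5]])
y_hat = gauss_kernel(X_test, X) @ alpha               # kernel ridge prediction
```

The closed form is what makes the learned replicating martingale cheap to evaluate; the paper's error bounds quantify how fast such estimates converge as the sample grows.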
  21. By: Albanesi, Stefania; Vamossy, Domonkos
    Abstract: We develop a model to predict consumer default based on deep learning. We show that the model consistently outperforms standard credit scoring models, even though it uses the same data. Our model is interpretable and is able to provide a score to a larger class of borrowers relative to standard credit scoring models while accurately tracking variations in systemic risk. We argue that these properties can provide valuable insights for the design of policies targeted at reducing consumer default and alleviating its burden on borrowers and lenders, as well as macroprudential regulation.
    Keywords: Consumer default; credit scores; deep learning; macroprudential policy
    JEL: C45 D1 E27 E44 G21 G24
    Date: 2019–08
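The deep model in the abstract above maps borrower attributes to a default probability, which can then be expressed on a familiar score scale. Purely as an illustrative sketch — not the authors' architecture, with invented features and untrained random weights — the forward pass of such a scorer looks like:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical borrower features: [utilisation, delinquencies, scaled income]
x = np.array([0.8, 2.0, 0.3])

# A tiny feed-forward scorer with random (untrained) weights, showing how a
# deep model maps raw features to a default probability and a score
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)
W2, b2 = rng.standard_normal(4), 0.0

h = np.tanh(W1 @ x + b1)                       # hidden layer
p_default = 1 / (1 + np.exp(-(W2 @ h + b2)))   # sigmoid output in (0, 1)
score = 300 + 550 * (1 - p_default)            # map onto a 300-850-style scale
```

In practice the weights would be trained on repayment histories; the interpretability and systemic-risk-tracking properties the abstract claims come from how such a trained model responds to its inputs.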
  22. By: Stephany, Fabian; Lorenz, Hanno
    Abstract: The uniqueness of human labour is in question in times of smart technologies, and the 250-year-old debate on technological unemployment has reawakened. Frey and Osborne (2013) estimate that half of US employment will be automated by algorithms within the next 20 years. Other follow-up studies conclude that only a small fraction of workers will be replaced by digital technologies. The main contribution of our work is to show that the diversity of previous findings regarding the degree of job automation is, to a large extent, driven by model selection rather than by controlling for personal characteristics or tasks. For our case study, we consult Austrian experts in machine learning and industry professionals on the susceptibility of the Austrian labour market to digital technologies. Our results indicate that, while clerical computer-based routine jobs are likely to change in the next decade, professional activities, such as the processing of complex information, are less prone to digital change.
    Keywords: Classification,Employment,GLM,Technological Change
    JEL: E24 J24 J31 J62 O33
    Date: 2019
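The GLM approach referenced in the keywords above typically means a binomial model with a logit link, relating task characteristics to the probability that an occupation is judged susceptible to automation. As a minimal sketch with invented task features and labels — not the study's actual data or specification — such a model can be fitted by gradient ascent on the log-likelihood:

```python
import numpy as np

# Hypothetical task-based features per occupation:
# [share of routine tasks, share of complex-information tasks]
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.2],   # clerical / routine
              [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]])  # professional
y = np.array([1, 1, 1, 0, 0, 0])   # 1 = judged susceptible to automation

def fit_logit(X, y, lr=0.5, steps=2000):
    """Binomial GLM with logit link, fitted by plain gradient ascent."""
    Xb = np.hstack([np.ones((len(X), 1)), X])   # intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-Xb @ w))           # predicted probabilities
        w += lr * Xb.T @ (y - p) / len(y)       # log-likelihood gradient
    return w

w = fit_logit(X, y)
# Predicted susceptibility for a heavily routine occupation
p_routine = 1 / (1 + np.exp(-(w[0] + w[1] * 0.85 + w[2] * 0.15)))
```

The study's point that results hinge on model selection can be probed exactly here: swapping this GLM for another classifier on the same features can move the estimated share of automatable jobs substantially.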
  23. By: Stefania Albanesi; Domonkos F. Vamossy
    Abstract: We develop a model to predict consumer default based on deep learning. We show that the model consistently outperforms standard credit scoring models, even though it uses the same data. Our model is interpretable and is able to provide a score to a larger class of borrowers relative to standard credit scoring models while accurately tracking variations in systemic risk. We argue that these properties can provide valuable insights for the design of policies targeted at reducing consumer default and alleviating its burden on borrowers and lenders, as well as macroprudential regulation.
    JEL: C45 D14 D18 E44 G0 G2
    Date: 2019–08
  24. By: Seán Kennedy
    Abstract: This paper explores one distinctive form of the ‘big data’ of economics – individual tax record microdata – and its potential for tax policy analysis. The paper draws on OECD collaborations with Slovenia and Ireland in 2018 in which tax microdata was used. Most empirical economics is based on survey data. However, the current trend of low and falling response rates has placed a question mark over the future value of survey practice generally. By contrast, this paper discusses the increasing use of tax microdata in economic research and the new types of policy analysis it makes possible. In the future, best-practice tax policy analysis is likely to combine tax microdata with survey and national accounts data. Policymakers will need to understand the advantages of these combined data in order to address future policy challenges, including protecting tax revenues in an era of population ageing and supporting fairness given the changing nature of economic mobility.
    Keywords: big data, economic mobility, income distributions, income inequality, tax administration data, tax policy analysis
    JEL: D31 H24
    Date: 2019–09–09
  25. By: TAMURA Suguru
    Abstract: This study discusses the results of a survey on standardization activities. Currently, standardization is similar to platform formation in that it serves as a central theme for a firm's strategy. Furthermore, the management structure of standardization is of interest for better understanding the organizational structures of firms. Data on selected Japanese institutions' standardization activities in 2017 are collected using a questionnaire survey. The survey covers three main categories: (1) degree of standardization activities, (2) knowledge sources for standard formation, and (3) organization of standardization activities. Particular focus is placed on standardization activities with regard to artificial intelligence. To the best of my knowledge, this is the first comprehensive survey of its kind on standardization activities.
    Date: 2019–08

This nep-big issue is ©2019 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at . For comments, please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject line, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.