nep-big New Economics Papers
on Big Data
Issue of 2019‒09‒16
27 papers chosen by
Tom Coupé
University of Canterbury

  1. US monetary policy since the 1950s and the changing content of FOMC minutes By Pierre L Siklos
  2. Boosting the Hodrick-Prescott Filter By Peter C.B. Phillips; Zhentao Shi
  3. Deep Prediction Of Investor Interest: a Supervised Clustering Approach By Baptiste Barreau; Laurent Carlier; Damien Challet
  4. De-biased Machine Learning for Compliers By Rahul Singh; Liyang Sun
  5. Validating Weak-form Market Efficiency in United States Stock Markets with Trend Deterministic Price Data and Machine Learning By Samuel Showalter; Jeffrey Gropp
  6. On Artificial Intelligence’s Razor’s Edge: On the Future of Democracy and Society in the Artificial Age By Julia M. Puaschunder
  7. Systemic Risk Clustering of China Internet Financial Based on t-SNE Machine Learning Algorithm By Mi Chuanmin; Xu Runjie; Lin Qingtong
  8. Usage of artificial neural networks in data classification By Elda Xhumari; Julian Fejzaj
  9. Automatic Financial Trading Agent for Low-risk Portfolio Management using Deep Reinforcement Learning By Wonsup Shin; Seok-Jun Bu; Sung-Bae Cho
  10. Multiway Cluster Robust Double/Debiased Machine Learning By Harold D. Chiang; Kengo Kato; Yukun Ma; Yuya Sasaki
  11. Strategic Realignment within Smart Ecosystems: Organizational Preparedness for Smart Cities and the Sharing Economy By Musabbir Chowdhury
  12. Combining Family History and Machine Learning to Link Historical Records By Joseph Price; Kasey Buckles; Jacob Van Leeuwen; Isaac Riley
  13. Artificial Intelligence Market Disruption By Julia M. Puaschunder
  14. Tree-based Control Methods: Consequences of Moving the US Embassy By Nicolaj Nørgaard Mühlbach
  15. The macroeconomic consequences of artificial intelligence: A theoretical framework By Huang, Xu; Hu, Yan; Dong, Zhiqiang
  16. Tehran Stock Exchange Prediction Using Sentiment Analysis of Online Textual Opinions By Arezoo Hatefi Ghahfarrokhi; Mehrnoush Shamsfard
  17. Myopic Agents in Assessments of Economic Conditions: Application of Weakly Supervised Learning and Text Mining By Masahiro Kato
  18. Machine Learning in Least-Squares Monte Carlo Proxy Modeling of Life Insurance Companies By Anne-Sophie Krah; Zoran Nikolić; Ralf Korn
  19. Mortality rate forecasting: can recurrent neural networks beat the Lee-Carter model? By Gábor Petneházi; József Gáll
  20. Using data mining techniques on Moodle data for classification of students' learning styles By Alda Kika; Loreta Leka; Suela Maxhelaku; Ana Ktona
  21. Deep Prediction of Investor Interest: a Supervised Clustering Approach By Baptiste Barreau; Laurent Carlier; Damien Challet
  22. Using Wasserstein Generative Adversarial Networks for the Design of Monte Carlo Simulations By Susan Athey; Guido Imbens; Jonas Metzger; Evan Munro
  23. Teaching with Tableau: Infusing Analytics into your Course By Adam Villa
  24. Regression to the Mean's Impact on the Synthetic Control Method: Bias and Sensitivity Analysis By Nicholas Illenberger; Dylan S. Small; Pamela A. Shaw
  25. Developing an Adaptive Chinese Near-Synonym Corpus for Word of Mouth Classification By Chihli Hung; Jheng-Hua Huang
  26. Mapping the potential of EU regions to contribute to Industry 4.0 By Pierre-Alexandre Balland; Ron Boschma
  27. Misclassification Errors in Remote Sensing Data and Land Use Modeling By Ji, Yongjie

  1. By: Pierre L Siklos
    Abstract: Content analysis is used to analyze 60 years of FOMC minutes. Since there is no unique algorithm to quantify content, two different algorithms are applied. Wordscores compares content relative to a chosen benchmark, while DICTION is an alternative algorithm specifically designed to capture various elements of the sentiment or tone conveyed in a text. The resulting indicators are then incorporated into a VAR. The content of FOMC minutes is found to be significantly related to the state of the economy, notably real GDP growth and changes in the fed funds rate. However, the relationship between content and macroeconomic conditions changes after 1993, when minutes are made public with a lag. Both content indicators also suggest substantive changes in the content of FOMC minutes since the 1950s in terms of the FOMC’s dovishness or hawkishness.
    Keywords: FOMC minutes, Wordscores, DICTION, monetary policy stance, vector autoregression
    JEL: E58 E52 E31 E37
    Date: 2019–09
    URL: http://d.repec.org/n?u=RePEc:een:camaaa:2019-69&r=all
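    Sketch: the abstract's final step, feeding a text-derived tone indicator into a VAR alongside macro series, could look roughly as follows in Python. The CSV file and column names are hypothetical placeholders; statsmodels is assumed.
      import pandas as pd
      from statsmodels.tsa.api import VAR

      # Hypothetical quarterly data: a text-derived tone index plus macro series.
      df = pd.read_csv("fomc_tone_macro.csv", index_col="date", parse_dates=True)
      data = df[["tone_index", "gdp_growth", "d_fed_funds"]].dropna()

      model = VAR(data)
      lag_order = model.select_order(maxlags=8).aic   # pick the lag length by AIC
      res = model.fit(lag_order)
      print(res.summary())

      # Impulse responses: how a shock to the tone index feeds into the macro series.
      irf = res.irf(12)
      irf.plot(impulse="tone_index")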
  2. By: Peter C.B. Phillips (Cowles Foundation, Yale University); Zhentao Shi (The Chinese University of Hong Kong)
    Abstract: The Hodrick-Prescott (HP) filter is one of the most widely used econometric methods in applied macroeconomic research. The technique is nonparametric and seeks to decompose a time series into a trend and a cyclical component unaided by economic theory or prior trend specification. Like all nonparametric methods, the HP filter depends critically on a tuning parameter that controls the degree of smoothing. Yet in contrast to modern nonparametric methods and applied work with these procedures, empirical practice with the HP filter almost universally relies on standard settings for the tuning parameter that have been suggested largely by experimentation with macroeconomic data and heuristic reasoning about the form of economic cycles and trends. As recent research has shown, standard settings may not be adequate in removing trends, particularly stochastic trends, in economic data. This paper proposes an easy-to-implement practical procedure of iterating the HP smoother that is intended to make the filter a smarter smoothing device for trend estimation and trend elimination. We call this iterated HP technique the boosted HP filter in view of its connection to L_2-boosting in machine learning. The paper develops limit theory to show that the boosted HP filter asymptotically recovers trend mechanisms that involve unit root processes, deterministic polynomial drifts, and polynomial drifts with structural breaks – the most common trends that appear in macroeconomic data and current modeling methodology. In doing so, the boosted filter provides a new mechanism for consistently estimating multiple structural breaks. A stopping criterion is used to automate the iterative HP algorithm, making it a data-determined method that is ready for modern data-rich environments in economic research. The methodology is illustrated using three real data examples that highlight the differences between simple HP filtering, the data-determined boosted filter, and an alternative autoregressive approach. These examples show that the boosted HP filter is helpful in analyzing a large collection of heterogeneous macroeconomic time series that manifest various degrees of persistence, trend behavior, and volatility.
    Keywords: Boosting, Cycles, Empirical macroeconomics, Hodrick-Prescott filter, Machine learning, Nonstationary time series, Trends, Unit root processes
    JEL: C22 E20
    Date: 2019–05
    URL: http://d.repec.org/n?u=RePEc:cwl:cwldpp:2192&r=all
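    Sketch: the boosted HP filter described above amounts to re-applying the HP smoother to the remaining cycle and accumulating the fitted trends. A minimal Python version, assuming statsmodels' hpfilter; a fixed iteration count stands in for the paper's automated stopping criterion, which is not reproduced here.
      import numpy as np
      from statsmodels.tsa.filters.hp_filter import hpfilter

      def boosted_hp(y, lamb=1600, iterations=5):
          """Iterate the HP smoother on the remaining cycle (L2-boosting style).

          A fixed iteration count stands in for the paper's data-driven stopping rule.
          """
          y = np.asarray(y, dtype=float)
          trend = np.zeros_like(y)
          cycle = y.copy()
          for _ in range(iterations):
              c, t = hpfilter(cycle, lamb=lamb)   # hpfilter returns (cycle, trend)
              trend += t                          # accumulate the fitted trend
              cycle = c                           # keep smoothing what is left over
          return trend, cycle

      # Example on a simulated random walk with drift.
      rng = np.random.default_rng(0)
      y = np.cumsum(0.2 + rng.standard_normal(200))
      bhp_trend, bhp_cycle = boosted_hp(y)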
  3. By: Baptiste Barreau (MICS - Mathématiques et Informatique pour la Complexité et les Systèmes - CentraleSupélec, BNPP CIB GM Lab - BNP Paribas CIB Global Markets Data & AI Lab); Laurent Carlier (BNPP CIB GM Lab - BNP Paribas CIB Global Markets Data & AI Lab); Damien Challet (MICS - Mathématiques et Informatique pour la Complexité et les Systèmes - CentraleSupélec)
    Abstract: We propose a novel deep learning architecture suitable for the prediction of investor interest for a given asset in a given timeframe. This architecture performs both investor clustering and modelling at the same time. We first verify its superior performance on a simulated scenario inspired by real data and then apply it to a large proprietary database from BNP Paribas Corporate and Institutional Banking.
    Keywords: investor activity prediction,deep learning,neural networks,mixture of experts,clustering
    Date: 2019–09–02
    URL: http://d.repec.org/n?u=RePEc:hal:wpaper:hal-02276055&r=all
  4. By: Rahul Singh; Liyang Sun
    Abstract: Instrumental variable identification is a concept in causal statistics for estimating the counterfactual effect of treatment D on output Y controlling for covariates X using observational data. Even when measurements of (Y,D) are confounded, the treatment effect on the subpopulation of compliers can nonetheless be identified if an instrumental variable Z is available, which is independent of (Y,D) conditional on X and the unmeasured confounder. We introduce a de-biased machine learning (DML) approach to estimating complier parameters with high-dimensional data. Complier parameters include local average treatment effect, average complier characteristics, and complier counterfactual outcome distributions. In our approach, the de-biasing is itself performed by machine learning, a variant called de-biased machine learning via regularized Riesz representers (DML-RRR). We prove our estimator is consistent, asymptotically normal, and semi-parametrically efficient. In experiments, our estimator outperforms state of the art alternatives. We use it to estimate the effect of 401(k) participation on the distribution of net financial assets.
    Date: 2019–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1909.05244&r=all
  5. By: Samuel Showalter; Jeffrey Gropp
    Abstract: The Efficient Market Hypothesis has been a staple of economics research for decades. In particular, weak-form market efficiency -- the notion that past prices cannot predict future performance -- is strongly supported by econometric evidence. In contrast, machine learning algorithms implemented to predict stock prices have been touted, to varying degrees, as successful. Moreover, some data scientists boast the ability to garner above-market returns using price data alone. This study endeavors to connect existing econometric research on weak-form efficient markets with data science innovations in algorithmic trading. First, a traditional exploration of stationarity in stock index prices over the past decade is conducted with Augmented Dickey-Fuller and Variance Ratio tests. Then, an algorithmic trading platform is implemented with the use of five machine learning algorithms. The econometric findings identify potential stationarity, hinting that technical evaluation may be possible, though the algorithmic trading results find little predictive power in any machine learning model, even when using trend-specific metrics. Accounting for transaction costs and risk, no system achieved above-market returns consistently. Our findings reinforce the validity of weak-form market efficiency.
    Date: 2019–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1909.05151&r=all
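    Sketch: a stripped-down version of the two-step design, assuming statsmodels and scikit-learn. The simulated price series, the feature set, and the single random-forest model are illustrative stand-ins for the paper's data and its five algorithms.
      import numpy as np
      import pandas as pd
      from statsmodels.tsa.stattools import adfuller
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.metrics import accuracy_score

      # Simulated stand-in for an index price series.
      rng = np.random.default_rng(1)
      prices = pd.Series(np.cumprod(1 + 0.0005 + 0.01 * rng.standard_normal(2500)))

      # Step 1: Augmented Dickey-Fuller test on log prices (a unit root is consistent with weak-form efficiency).
      adf_stat, pvalue, *_ = adfuller(np.log(prices))
      print(f"ADF statistic {adf_stat:.2f}, p-value {pvalue:.3f}")

      # Step 2: predict next-day direction from simple trend-style features.
      ret = prices.pct_change()
      X = pd.DataFrame({
          "ret_1": ret,
          "ma_gap": prices / prices.rolling(10).mean() - 1,   # price vs. its 10-day average
          "mom_5": prices.pct_change(5),
      })
      y = (ret.shift(-1) > 0).astype(int)
      data = pd.concat([X, y.rename("up")], axis=1).dropna()

      split = int(len(data) * 0.7)
      train, test = data.iloc[:split], data.iloc[split:]
      clf = RandomForestClassifier(n_estimators=200, random_state=0)
      clf.fit(train[X.columns], train["up"])
      print("out-of-sample accuracy:", accuracy_score(test["up"], clf.predict(test[X.columns])))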
  6. By: Julia M. Puaschunder (The New School, NY)
    Abstract: The introduction of Artificial Intelligence in our contemporary society imposes historically unique challenges for humankind. The emerging autonomy of AI holds unique potentials of eternal life of robots, AI and algorithms alongside unprecedented economic superiority, data storage and computational advantages. However, the introduction of AI to society also raises ethical questions. What is the social impact of robots, algorithms, blockchain and AI entering the workforce and our daily lives on the economy and human society? Should AI become eternal, or is there a virtue in switching off AI at a certain point? If so, we may have to define a ‘virtue of killing’ and a ‘right to destroy’ that may draw from legal but also philosophical sources to answer the question of how to handle the abyss of killing with ethical grace and fair style. In light of robots already having gained citizenship and being attributed as quasi-human under Common Law jurisdiction, should AI and robots be granted full citizen rights, such as voting rights? Or should we simply reap the benefits of AI and consider defining a democracy with different classes having diversified access to public choice and voting, as practiced in the ancient Athenian city state, which became the cradle of Western civilization and of the democratic traditions that spread around the globe? Or should we legally justify AI slaves in order to economically reap their benefits, as was common in ancient Rome, whose practice became the Roman Law legal foundation for Continental and some Scandinavian Law traditions and inspired many different codifications around the world? Finally, we may also draw from the Code Napoléon, the French Code Civil established under Napoleon in 1804, which defined male and female as two classes of human with substantial differences in rights and power, and which, to this day, counts among the few documents that have influenced the whole world in legal and societal ways. In asking critical questions and unraveling the ethical boundary conditions of our future artificial world, the paper takes a descriptive, rather than normative, theoretical angle aimed at aiding a successful introduction of AI into our contemporary workforce, democracy and society.
    Keywords: AI, Artificial Intelligence, Athenian city state, Code Civil, Code Napoléon, Democracy, Right to destroy, Roman Law, Slavery, Society, Workforce
    Date: 2019–04
    URL: http://d.repec.org/n?u=RePEc:smo:cpaper:5jp&r=all
  7. By: Mi Chuanmin; Xu Runjie; Lin Qingtong
    Abstract: With the rapid development of Internet finance, a large number of studies have shown that Internet financial platforms exhibit different systemic risk characteristics when they are subject to macroeconomic shocks or fragile internal crises. From the perspective of the regional development of Internet finance, this paper applies the t-SNE machine learning algorithm to mine China's Internet finance development index covering 31 provinces and 335 cities and regions. The results display peak and thick-tail characteristics, and on this basis the paper proposes three risk classes of Internet financial systemic risk, providing more regionally targeted recommendations for managing the systemic risk of Internet finance.
    Date: 2019–08
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1909.03808&r=all
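    Sketch: a minimal version of the embedding-plus-grouping step, assuming scikit-learn. The feature matrix and the use of k-means to form three risk classes are illustrative assumptions, not the paper's exact procedure.
      import numpy as np
      from sklearn.manifold import TSNE
      from sklearn.preprocessing import StandardScaler
      from sklearn.cluster import KMeans

      # Hypothetical feature matrix: one row per region, columns are index components.
      rng = np.random.default_rng(0)
      X = rng.standard_normal((335, 12))

      # Embed the regions in 2D with t-SNE, then split the embedding into three risk classes.
      emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(StandardScaler().fit_transform(X))
      risk_class = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(emb)
      print(np.bincount(risk_class))   # number of regions per risk class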
  8. By: Elda Xhumari (University of Tirana, Faculty of Natural Sciences, Department of Informatics); Julian Fejzaj (University of Tirana, Faculty of Natural Sciences, Department of Informatics)
    Abstract: Data classification is broadly defined as the process of organizing data into categories so that it can be used and protected more efficiently. Data classification is performed for different purposes; one of the most common is preserving data privacy. Data classification often involves a number of attributes determining the type of data, its confidentiality, and its integrity. Neural networks help solve many different problems. They are very good at data classification problems and can classify any data with arbitrary precision.
    Keywords: Artificial Neural Networks, Data Classification, Naïve Bayes, Discriminant Analysis, Nearest Neighbor
    JEL: C45
    Date: 2019–07
    URL: http://d.repec.org/n?u=RePEc:sek:iacpro:9211565&r=all
  9. By: Wonsup Shin; Seok-Jun Bu; Sung-Bae Cho
    Abstract: The autonomous trading agent is one of the most actively studied areas of artificial intelligence for solving the capital market portfolio management problem. The two primary goals of portfolio management are maximizing profit and restraining risk. However, most approaches to this problem solely take account of maximizing returns. Therefore, this paper proposes a deep reinforcement learning based trading agent that can manage the portfolio considering not only profit maximization but also risk restraint. We also propose a new target policy that allows the trading agent to learn to prefer low-risk actions. The new target policy can be reflected in the update by adjusting the greediness for the optimal action through a hyperparameter. The proposed trading agent is verified on data from the cryptocurrency market, which is the best test ground for our trading agents because a huge amount of data accumulates every minute and market volatility is extremely large. During the test period, our agents achieved a return of 1800% and provided the least risky investment strategy among the existing methods. Another experiment shows that the agent can maintain robust, generalized performance even when market volatility is large or the training period is short.
    Date: 2019–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1909.03278&r=all
  10. By: Harold D. Chiang; Kengo Kato; Yukun Ma; Yuya Sasaki
    Abstract: This paper investigates double/debiased machine learning (DML) under multiway clustered sampling environments. We propose a novel multiway cross fitting algorithm and a multiway DML estimator based on this algorithm. Simulations indicate that the proposed procedure has favorable finite sample performance.
    Date: 2019–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1909.03489&r=all
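    Sketch: one way to read "multiway cross fitting" for a partially linear model is to build folds by crossing random partitions of the two cluster dimensions and to train the nuisance functions only on observations that share neither test fold. A hedged illustration with scikit-learn; this is not the authors' exact algorithm, and their cluster-robust variance estimator is not reproduced.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      def multiway_dml_theta(Y, D, X, c1, c2, K=2, seed=0):
          """Partialling-out DML with folds built by crossing splits of two cluster ids (numpy inputs)."""
          rng = np.random.default_rng(seed)
          g1 = {c: rng.integers(K) for c in np.unique(c1)}   # random fold label per cluster (dimension 1)
          g2 = {c: rng.integers(K) for c in np.unique(c2)}   # random fold label per cluster (dimension 2)
          f1 = np.array([g1[c] for c in c1])
          f2 = np.array([g2[c] for c in c2])

          res_y, res_d = np.zeros_like(Y, dtype=float), np.zeros_like(D, dtype=float)
          for a in range(K):
              for b in range(K):
                  test = (f1 == a) & (f2 == b)
                  train = (f1 != a) & (f2 != b)          # exclude both cluster dimensions of the test fold
                  my = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[train], Y[train])
                  md = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[train], D[train])
                  res_y[test] = Y[test] - my.predict(X[test])
                  res_d[test] = D[test] - md.predict(X[test])
          return float(res_d @ res_y / (res_d @ res_d))    # point estimate of theta

      # Tiny synthetic example with two crossed cluster dimensions (e.g. firms x years), true theta = 0.5.
      rng = np.random.default_rng(1)
      n = 2000
      X = rng.standard_normal((n, 5))
      D = X[:, 0] + rng.standard_normal(n)
      Y = 0.5 * D + X[:, 1] + rng.standard_normal(n)
      firms, years = rng.integers(50, size=n), rng.integers(10, size=n)
      print(multiway_dml_theta(Y, D, X, firms, years))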
  11. By: Musabbir Chowdhury (Niagara College Canada)
    Abstract: Intelligent technologies such as blockchain, the internet of things, data analytics, artificial intelligence, and sensor fusion, all of which are necessary for smart cities and the sharing economy, are now widespread. The Four Pillars of Productivity (4POP) framework is applied to determine the appropriate business positioning, given that these modern cities will very soon start to emerge and will make even greater use of the sharing economy. The financial gain, convenience, and overall quality-of-life improvements that the sharing economy can offer need to be fully realized. This will involve the sharing of almost all resources and skills, both in the home and work environments. Alignment with intelligent technology trends is considered; these trends include coordination of logistics and operations, digital governance, corporate culture, and smart urbanization effects on behavior and business practices. The paper also addresses the increased systematic risk and cybersecurity implications that come with complexity and uncertainty.
    Keywords: Smart city, sharing economy, Intelligent technologies, Four Pillars of Productivity Framework
    Date: 2019–07
    URL: http://d.repec.org/n?u=RePEc:sek:iacpro:8710707&r=all
  12. By: Joseph Price; Kasey Buckles; Jacob Van Leeuwen; Isaac Riley
    Abstract: A key challenge for research on many questions in the social sciences is that it is difficult to link historical records in a way that allows investigators to observe people at different points in their life or across generations. In this paper, we develop a new approach that relies on millions of record links created by individual contributors to a large, public, wiki-style family tree. First, we use these “true” links to inform the decisions one needs to make when using traditional linking methods. Second, we use the links to construct a training data set for use in supervised machine learning methods. We describe the procedure we use and illustrate the potential of our approach by linking individuals across the 100% samples of the US decennial censuses from 1900, 1910, and 1920. We obtain an overall match rate of about 70 percent, with a false positive rate of about 12 percent. This combination of high match rate and accuracy represents a point beyond the current frontier for record linking methods.
    JEL: C81 J1 N01
    Date: 2019–09
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:26227&r=all
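    Sketch: the supervised-learning step could look roughly like the following, training a classifier on pairwise comparison features with labels taken from the family-tree links. The file name, feature names, and gradient-boosting model are hypothetical, not the authors' specification.
      import pandas as pd
      from sklearn.ensemble import GradientBoostingClassifier
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import precision_score, recall_score

      # Hypothetical training table: one row per candidate pair of census records,
      # with similarity features and a label taken from the wiki-style family tree.
      pairs = pd.read_csv("candidate_pairs.csv")
      features = ["first_name_sim", "last_name_sim", "age_diff", "same_birthplace"]

      X_tr, X_te, y_tr, y_te = train_test_split(pairs[features], pairs["is_true_link"],
                                                test_size=0.3, random_state=0)
      clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
      pred = clf.predict(X_te)
      print("precision:", precision_score(y_te, pred), "recall:", recall_score(y_te, pred))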
  13. By: Julia M. Puaschunder (The New School, Department of Economics)
    Abstract: The introduction of Artificial Intelligence in our contemporary society imposes historically unique challenges for humankind. The emerging autonomy of AI holds unique potentials of eternal life of robots, AI and algorithms alongside unprecedented economic superiority, data storage and computational advantages. Yet to this day, it remains unclear what impact AI taking over the workforce will have on economic growth.
    Keywords: AI, AI-GDP Index, AI market entry, Artificial Intelligence, capital, economic growth, endogenous growth, exogenous growth, Global Connectivity Index, GDP, Gross Domestic Product, labor, law and economics, society, State of the Mobile Internet Connectivity, workforce
    Date: 2019–07
    URL: http://d.repec.org/n?u=RePEc:smo:dpaper:01jp&r=all
  14. By: Nicolaj Nørgaard Mühlbach
    Abstract: We recast the synthetic control method for evaluating policies as a counterfactual prediction problem and replace its linear regression with a non-parametric model inspired by machine learning. The proposed method enables us to achieve more accurate counterfactual predictions. We apply our method to a highly debated policy: the move of the US embassy to Jerusalem. In Israel and Palestine, we find that the average number of weekly conflicts has increased by roughly 103% over the 48 weeks since the move was announced on December 6, 2017. Using conformal inference tests, we justify our model and find the increase to be statistically significant.
    Date: 2019–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1909.03968&r=all
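    Sketch: the core idea, learning the treated unit's outcome from donor units over the pre-treatment period with a tree ensemble and predicting the post-treatment counterfactual, might be set up as follows. The data file, week indexing, and random forest are illustrative assumptions, and the paper's conformal inference tests are not reproduced.
      import pandas as pd
      from sklearn.ensemble import RandomForestRegressor

      # Hypothetical weekly panel: donor series in columns, 'treated' is the outcome of interest.
      panel = pd.read_csv("weekly_conflicts.csv", index_col="week")
      announce = 100                                # index of the announcement week (hypothetical)

      donors = panel.drop(columns=["treated"])
      pre, post = panel.index < announce, panel.index >= announce

      model = RandomForestRegressor(n_estimators=500, random_state=0)
      model.fit(donors[pre], panel.loc[pre, "treated"])       # learn treated outcome from donors, pre-period only
      counterfactual = model.predict(donors[post])            # predicted no-treatment path after the announcement

      effect = panel.loc[post, "treated"].to_numpy() - counterfactual
      print("average post-treatment effect:", effect.mean())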
  15. By: Huang, Xu; Hu, Yan; Dong, Zhiqiang
    Abstract: The authors explore the impact of artificial intelligence on the economy by improving the neoclassical production function and the task-based model. Based on the capital accumulation of artificial intelligence and technological progress, they present a theoretical model that explores the effect of alternative and complementary artificial intelligence on wages, capital prices, labor share, capital share and economic growth. The model shows that artificial intelligence capital lowers the capital prices and increases wages. In addition, if artificial intelligence and labor force are complementary, artificial intelligence capital has a positive impact on labor share, but if artificial intelligence and labor force can substitute each other, labor share is negatively influenced by artificial intelligence capital. The authors extend the task-based model and find that technological progress increases both wages and labor share by generating new tasks. In the long run, without consideration of exogenous technology, as the artificial intelligence capital accumulates, per capita output, per capita traditional capital and per capita artificial intelligence capital grow at the same rate, and economic growth finally reaches steady state equilibrium. With exogenous technology considered, artificial intelligence technology improves, and sustained economic growth is achieved.
    Keywords: artificial intelligence,automation,economic growth,share of labor
    JEL: J23 J24
    Date: 2019
    URL: http://d.repec.org/n?u=RePEc:zbw:ifwedp:201948&r=all
  16. By: Arezoo Hatefi Ghahfarrokhi; Mehrnoush Shamsfard
    Abstract: In this paper, we investigate the impact of the social media data in predicting the Tehran Stock Exchange (TSE) variables for the first time. We consider the closing price and daily return of three different stocks for this investigation. We collected our social media data from Sahamyab.com/stocktwits for about three months. To extract information from online comments, we propose a hybrid sentiment analysis approach that combines lexicon-based and learning-based methods. Since lexicons that are available for the Persian language are not practical for sentiment analysis in the stock market domain, we built a particular sentiment lexicon for this domain. After designing and calculating daily sentiment indices using the sentiment of the comments, we examine their impact on the baseline models that only use historical market data and propose new predictor models using multi regression analysis. In addition to the sentiments, we also examine the comments volume and the users' reliabilities. We conclude that the predictability of various stocks in TSE is different depending on their attributes. Moreover, we indicate that for predicting the closing price only comments volume and for predicting the daily return both the volume and the sentiment of the comments could be useful. We demonstrate that Users' Trust coefficients have different behaviors toward the three stocks.
    Date: 2019–08
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1909.03792&r=all
  17. By: Masahiro Kato
    Abstract: We reveal the psychological bias of economic agents in their judgments of future economic conditions by applying behavioral economics and weakly supervised learning. In the Economy Watchers Survey, a dataset published by the Japanese government, there are assessments of current and future economic conditions by people with various occupations. Although this dataset gives essential insights regarding economic policy to the Japanese government and the central bank of Japan, there is no clear definition of future economic conditions. Hence, in the survey, respondents answer their assessments based on their own interpretations of the future. In our research, we classify the text data using learning from positive and unlabeled data (PU learning), a method of weakly supervised learning. The dataset is composed of several periods, and we develop a new PU learning algorithm for efficient training with this dataset. Through empirical analysis, we show the interpretation of the classification results from the viewpoint of behavioral economics.
    Date: 2019–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1909.03348&r=all
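    Sketch: a standard PU-learning baseline (Elkan-Noto style): fit a classifier that treats unlabeled examples as negatives, estimate the labeling propensity on held-out positives, and rescale the scores. This is a generic illustration assuming scikit-learn, not the authors' new multi-period algorithm.
      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import train_test_split

      def pu_fit(X, s, seed=0):
          """Elkan-Noto style PU learning: s=1 labeled positive, s=0 unlabeled (numpy inputs)."""
          X_tr, X_hold, s_tr, s_hold = train_test_split(X, s, test_size=0.2, random_state=seed,
                                                        stratify=s)
          g = LogisticRegression(max_iter=1000).fit(X_tr, s_tr)       # models P(s=1 | x)
          # c = P(s=1 | y=1), estimated as the mean score on held-out labeled positives.
          c = g.predict_proba(X_hold[s_hold == 1])[:, 1].mean()
          return g, c

      def pu_predict_proba(g, c, X):
          return np.clip(g.predict_proba(X)[:, 1] / c, 0, 1)          # estimated P(y=1 | x)

      # Usage: g, c = pu_fit(X, s); probs = pu_predict_proba(g, c, X_new)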
  18. By: Anne-Sophie Krah; Zoran Nikolić; Ralf Korn
    Abstract: Under the Solvency II regime, life insurance companies are asked to derive their solvency capital requirements from the full loss distributions over the coming year. Since the industry is currently far from being endowed with sufficient computational capacities to fully simulate these distributions, the insurers have to rely on suitable approximation techniques such as the least-squares Monte Carlo (LSMC) method. The key idea of LSMC is to run only a few wisely selected simulations and to process their output further to obtain a risk-dependent proxy function of the loss. In this paper, we present and analyze various adaptive machine learning approaches that can take over the proxy modeling task. The studied approaches range from ordinary and generalized least-squares regression variants over GLM and GAM methods to MARS and kernel regression routines. We justify the combinability of their regression ingredients in a theoretical discourse. Further, we illustrate the approaches in slightly disguised real-world experiments and perform comprehensive out-of-sample tests.
    Date: 2019–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1909.02182&r=all
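    Sketch: the simplest member of the proxy family studied here is an ordinary least-squares polynomial regression of simulated losses on risk factors, checked out of sample. The data below are simulated stand-ins for LSMC fitting points, not a real proxy model; scikit-learn is assumed.
      import numpy as np
      from sklearn.preprocessing import PolynomialFeatures
      from sklearn.linear_model import LinearRegression
      from sklearn.pipeline import make_pipeline
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import mean_squared_error

      # Simulated stand-in for LSMC fitting points: risk-factor scenarios and noisy loss estimates.
      rng = np.random.default_rng(0)
      risk_factors = rng.uniform(-1, 1, size=(5000, 4))          # e.g. rates, equity, lapse, longevity
      true_loss = 2 * risk_factors[:, 0]**2 - risk_factors[:, 1] * risk_factors[:, 2] + risk_factors[:, 3]
      loss = true_loss + rng.standard_normal(5000)                # inner-simulation noise

      X_tr, X_te, y_tr, y_te = train_test_split(risk_factors, loss, test_size=0.3, random_state=0)
      proxy = make_pipeline(PolynomialFeatures(degree=3), LinearRegression()).fit(X_tr, y_tr)
      print("out-of-sample MSE:", mean_squared_error(y_te, proxy.predict(X_te)))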
  19. By: Gábor Petneházi; József Gáll
    Abstract: This article applies a long short-term memory recurrent neural network to mortality rate forecasting. The model can be trained jointly on the mortality rate history of different countries, ages, and sexes. The RNN-based method seems to outperform the popular Lee-Carter model.
    Date: 2019–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1909.05501&r=all
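    Sketch: an LSTM trained on sliding windows of log mortality rates to predict the next year's rate, assuming TensorFlow/Keras. The synthetic data and window length are illustrative, and the paper's joint training across countries, ages, and sexes is not reproduced.
      import numpy as np
      import tensorflow as tf

      # Hypothetical log mortality rates: one row per series (e.g. a country-sex-age group), one column per year.
      rng = np.random.default_rng(0)
      rates = np.log(np.clip(0.01 * np.exp(-0.01 * np.arange(60))
                             + 0.001 * rng.standard_normal((200, 60)), 1e-5, None))

      window = 10
      X = np.stack([rates[:, t:t + window] for t in range(rates.shape[1] - window)], axis=1).reshape(-1, window, 1)
      y = np.stack([rates[:, t + window] for t in range(rates.shape[1] - window)], axis=1).reshape(-1)

      model = tf.keras.Sequential([
          tf.keras.layers.LSTM(32, input_shape=(window, 1)),
          tf.keras.layers.Dense(1),
      ])
      model.compile(optimizer="adam", loss="mse")
      model.fit(X, y, epochs=10, batch_size=64, validation_split=0.2, verbose=0)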
  20. By: Alda Kika (University of Tirana, Faculty of Natural Sciences); Loreta Leka (University of Tirana, Faculty of Natural Sciences); Suela Maxhelaku (University of Tirana, Faculty of Natural Sciences); Ana Ktona (University of Tirana, Faculty of Natural Sciences)
    Abstract: Building an adaptive e-learning system based on learning styles is a very challenging task. Two approaches are mainly used to determine students' learning styles: questionnaires, or data mining techniques applied to LMS log data. In order to build an adaptive Moodle LMS based on learning styles, we aim to construct and use a mixed approach. 63 students from two courses that attended the same subject, "User interface", completed the ILS (Index of Learning Styles) questionnaire based on the Felder-Silverman model. This learning style model is used to assess preferences on four dimensions (active/reflective, sensing/intuitive, visual/verbal, and sequential/global). Moodle keeps detailed logs of all activities that students perform, which can be used to predict the learning style for each dimension. In this paper we have analyzed students' log data from the Moodle LMS using data mining techniques for classifying their learning styles, focusing on one dimension of the Felder-Silverman learning style: visual/verbal. Several classification algorithms provided by WEKA, such as the J48 decision tree classifier, Naive Bayes, and PART, are compared. A 10-fold cross validation was used to evaluate the selected classifiers. The experiments showed that Naive Bayes reached the best result at 71.18% accuracy.
    Keywords: Learning styles; Felder-Silverman learning style model; Weka; Moodle; data mining
    Date: 2019–07
    URL: http://d.repec.org/n?u=RePEc:sek:iacpro:9211567&r=all
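    Sketch: the classification step translated from WEKA to scikit-learn, comparing Naive Bayes with a decision tree under 10-fold cross-validation. The feature file and column names are hypothetical.
      import pandas as pd
      from sklearn.naive_bayes import GaussianNB
      from sklearn.tree import DecisionTreeClassifier
      from sklearn.model_selection import cross_val_score

      # Hypothetical numeric features extracted from Moodle logs, labeled visual/verbal via the ILS questionnaire.
      logs = pd.read_csv("moodle_features.csv")
      X, y = logs.drop(columns=["style_visual_verbal"]), logs["style_visual_verbal"]

      for name, clf in [("Naive Bayes", GaussianNB()),
                        ("Decision tree (J48-like)", DecisionTreeClassifier(random_state=0))]:
          scores = cross_val_score(clf, X, y, cv=10)          # 10-fold cross-validation
          print(f"{name}: mean accuracy {scores.mean():.3f}")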
  21. By: Baptiste Barreau; Laurent Carlier; Damien Challet
    Abstract: We propose a novel deep learning architecture suitable for the prediction of investor interest for a given asset in a given timeframe. This architecture performs both investor clustering and modelling at the same time. We first verify its superior performance on a simulated scenario inspired by real data and then apply it to a large proprietary database from BNP Paribas Corporate and Institutional Banking.
    Date: 2019–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1909.05289&r=all
  22. By: Susan Athey; Guido Imbens; Jonas Metzger; Evan Munro
    Abstract: Researchers often use artificial data to assess the performance of new econometric methods. In many cases the data generating processes used in these Monte Carlo studies do not resemble real data sets and instead reflect many arbitrary decisions made by the researchers. As a result, potential users of the methods are rarely persuaded by these simulations that the new methods are as attractive as the simulations make them out to be. We discuss the use of Wasserstein Generative Adversarial Networks (WGANs) as a method for systematically generating artificial data that closely mimic any given real data set without the researcher having many degrees of freedom. We apply the methods to compare twelve different estimators for average treatment effects under unconfoundedness in three different settings. We conclude in this example that (i) there is no single estimator that outperforms the others in all three settings, and (ii) systematic simulation studies can be helpful for selecting among competing methods.
    Date: 2019–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1909.02210&r=all
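    Sketch: a compact WGAN with weight clipping fitted to a toy tabular data set, assuming PyTorch. This illustrates the generic technique only, not the authors' architecture, tuning, or estimator comparisons.
      import torch
      import torch.nn as nn

      def mlp(in_dim, out_dim):
          return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                               nn.Linear(64, 64), nn.ReLU(),
                               nn.Linear(64, out_dim))

      def train_wgan(real, noise_dim=8, iters=1000, clip=0.01, n_critic=5, batch=128):
          """Original WGAN with weight clipping (Arjovsky et al.), toy tabular version."""
          d = real.shape[1]
          G, C = mlp(noise_dim, d), mlp(d, 1)
          opt_g = torch.optim.RMSprop(G.parameters(), lr=5e-5)
          opt_c = torch.optim.RMSprop(C.parameters(), lr=5e-5)
          for _ in range(iters):
              for _ in range(n_critic):                       # several critic steps per generator step
                  idx = torch.randint(0, real.shape[0], (batch,))
                  fake = G(torch.randn(batch, noise_dim)).detach()
                  loss_c = C(fake).mean() - C(real[idx]).mean()
                  opt_c.zero_grad()
                  loss_c.backward()
                  opt_c.step()
                  for p in C.parameters():                    # crude enforcement of the Lipschitz constraint
                      p.data.clamp_(-clip, clip)
              loss_g = -C(G(torch.randn(batch, noise_dim))).mean()
              opt_g.zero_grad()
              loss_g.backward()
              opt_g.step()
          return G

      # Toy "real" data set standing in for rows (X, D, Y) of an observational study.
      real = torch.randn(5000, 3) @ torch.tensor([[1., .5, .2], [0., 1., .3], [0., 0., 1.]])
      generator = train_wgan(real)
      synthetic = generator(torch.randn(1000, 8)).detach()    # artificial data mimicking the real rows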
  23. By: Adam Villa (Providence College)
    Abstract: Tableau is a software application that can help you visualize and understand your data. It can connect to almost any data type and allows users to quickly drag and drop data items to create visualizations that can be shared across platforms. Tableau can be infused into any course that works with data, and several examples of data analysis in different disciplines will be demonstrated and explained. This talk will showcase some of the software's capabilities, including a variety of visualizations, tables, dashboards, and stories. It is also an excellent tool for a data analytics course and provides a great supplemental application for a database systems course, as Tableau can interface with most popular database systems. At the end of the talk, sample class exercises and projects from an introductory data analytics course will also be discussed and presented.
    Keywords: Data Analytics, Teaching Tool, Software
    Date: 2019–06
    URL: http://d.repec.org/n?u=RePEc:sek:iacpro:9011181&r=all
  24. By: Nicholas Illenberger; Dylan S. Small; Pamela A. Shaw
    Abstract: To make informed policy recommendations from observational data, we must be able to discern true treatment effects from random noise and effects due to confounding. Difference-in-Differences techniques that match treated units to control units based on pre-treatment outcomes, such as the synthetic control approach, have been presented as principled methods to account for confounding. However, we show that the use of synthetic controls or other matching procedures can introduce regression-to-the-mean (RTM) bias into estimates of the average treatment effect on the treated. Through simulations, we show that RTM bias can lead to inflated type I error rates as well as decreased power in typical policy evaluation settings. Further, we provide a novel correction for RTM bias which can reduce bias and attain appropriate type I error rates. This correction can be used to perform a sensitivity analysis that determines how results may be affected by RTM. We use our proposed correction and sensitivity analysis to reanalyze data concerning the effects of California's Proposition 99, a large-scale tobacco control program, on statewide smoking rates.
    Date: 2019–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1909.04706&r=all
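    Sketch: a toy simulation of the regression-to-the-mean mechanism the paper analyzes. With no treatment anywhere, donor units matched to the treated unit on a noisy pre-period outcome drift back toward their own lower mean, producing a spurious "effect". The setup and numbers are purely illustrative, and the paper's correction is not reproduced.
      import numpy as np

      rng = np.random.default_rng(0)
      reps, n_donors = 2000, 200
      spurious = []

      for _ in range(reps):
          # Treated unit drawn from a higher-mean population; no treatment effect anywhere.
          mu_t, mu_d = 1.0, rng.standard_normal(n_donors)
          pre_t, post_t = mu_t + rng.standard_normal(), mu_t + rng.standard_normal()
          pre_d, post_d = mu_d + rng.standard_normal(n_donors), mu_d + rng.standard_normal(n_donors)

          # Pick the 5 donors whose pre-period outcomes best match the treated unit.
          controls = np.argsort(np.abs(pre_d - pre_t))[:5]
          spurious.append((post_t - pre_t) - (post_d[controls] - pre_d[controls]).mean())

      print("mean spurious 'treatment effect':", np.mean(spurious))   # noticeably above zero despite no treatment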
  25. By: Chihli Hung (Chung Yuan Christian University); Jheng-Hua Huang (Chung Yuan Christian University)
    Abstract: Word of mouth (WOM) is the subjective opinion of consumers about a brand, a product or a service. Its impact on consumers' purchasing decisions is greater than that of a product's marketing activities. Word of mouth classification is an effective means of document organization in an era of big data. However, existing approaches to WOM classification mainly depend on the bag of words (BOW) in the vector space model (VSM), which usually suffers from the curse of dimensionality when dealing with large amounts of documents. We compared characters, context, and homophones, and integrated thesauruses to establish an adaptable Chinese near-synonym corpus. Subsequently, lexical replacement was applied, and the adaptable Chinese near-synonym corpus was created for classifying WOM documents. Two static corpora, the Ministry of Education's Revised Mandarin Chinese Dictionary and the Extended Chinese Synonym Forest, were used as benchmarks of comparison for the proposed adaptable near-synonym corpus in the classification and evaluation stage. Evaluations were conducted by calculating recall, precision, F-measure, accuracy, and the area under receiver operating characteristic curves (AUC). The results indicate that the classification accuracy of the adaptable near-synonym corpus proposed in this research exceeds that of the static corpora when used in the fields of movies, leisure and travel, food, and cosmetics.
    Keywords: Near-Synonym, Adaptive Corpus, Word of Mouth Classification
    JEL: C63
    Date: 2019–07
    URL: http://d.repec.org/n?u=RePEc:sek:iacpro:8711023&r=all
  26. By: Pierre-Alexandre Balland; Ron Boschma
    Abstract: This paper aims to identify the future Industry 4.0 centers of knowledge production in Europe. We expect Industry 4.0 Technologies (I4Ts) to thrive in regions where they can draw on local resources from related technologies. We use OECD-REGPAT data to identify I4T-related technologies, and find that I4Ts are located at the periphery of the knowledge space. Regions with a high potential in terms of I4T-related technologies were more likely to diversify successfully in new I4Ts in the period 2002-2016. We find big differences across EU regions: some show high but most regions show weak I4T potential.
    Keywords: Industry 4.0, relatedness, patents, knowledge space, regional diversification, EU regions
    JEL: B52 O33 R11
    Date: 2019–09
    URL: http://d.repec.org/n?u=RePEc:egu:wpaper:1925&r=all
  27. By: Ji, Yongjie
    Keywords: Research Methods/ Statistical Methods
    Date: 2019–06–25
    URL: http://d.repec.org/n?u=RePEc:ags:aaea19:291218&r=all

This nep-big issue is ©2019 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.