nep-big 2020-12-21 papers

on Big Data

Issue of 2020‒12‒21
24 papers chosen by
Tom Coupé
University of Canterbury

Biased Programmers? Or Biased Data? A Field Experiment in Operationalizing AI Ethics By Bo Cowgill; Fabrizio Dell'Acqua; Samuel Deng; Daniel Hsu; Nakul Verma; Augustin Chaintreau
Every Corporation Owns Its Image: Corporate Credit Ratings via Convolutional Neural Networks By Bojing Feng; Wenfang Xue; Bindang Xue; Zeyu Liu
Explainable AI for Interpretable Credit Scoring By Lara Marie Demajo; Vince Vella; Alexiei Dingli
New Directions for Data-Driven Transport Safety By ITF
Competition analysis on the over-the-counter credit default swap market By Louis Abraham
Impact of weather factors on migration intention using machine learning algorithms By John Aoga; Juhee Bae; Stefanija Veljanoska; Siegfried Nijssen; Pierre Schaus
Voting: A machine learning approach By Burka, Dávid; Puppe, Clemens; Szepesváry, László; Tasnádi, Attila
Double machine learning for (weighted) dynamic treatment effects By Hugo Bodory; Martin Huber; Lukáš Lafférs
It’s in the News: Developing a Real Time Index for Economic Uncertainty Based on Finnish News Titles By Avela, Aleksi; Lehmus, Markku
Using neural networks to model long-term dependencies in occupancy behavior By Kleinebrahm, Max; Torriti, Jacopo; McKenna, Russell; Ardone, Armin; Fichtner, Wolf
Serie de Machine Learning Análisis de Componentes Principales (PCA) By Sergio A. Pernice
What can we learn about mortgage supply from online data? By Agnese Carella; Federica Ciocchetta; Valentina Michelangeli; Federico Maria Signoretti
Reading the city through its neighbourhoods: Deep text embeddings of Yelp reviews as a basis for determining similarity and change By Olson, Alex; Calderon-Figueroa, Fernando; Bidian, Olimpia; Silver, Daniel; Sanner, Scott
Artificial Intelligence in Agribusiness is Growing in Emerging Markets By Peter Cook; Felicity O'Neill
Stata/SQL/Python integration to emulate prospective cohort studies from big register data By Matteo Marrazzo; Nicola Orsini
Revolutionieren Big Data und KI die Versicherungswirtschaft? 24. Kölner Versicherungssymposium am 14. November 2019 By Müller-Peters, Horst (Ed.); Schmidt, Jan-Philipp (Ed.); Völler, Michaele (Ed.)
First Time Around: Local Conditions and Multi-dimensional Integration of Refugees By Aksoy, Cevat Giray; Poutvaara, Panu; Schikora, Felicitas
First Time Around: Local Conditions and Multi-Dimensional Integration of Refugees By Cevat Giray Aksoy; Panu Poutvaara; Felicitas Schikora
First Time around: Local Conditions and Multi-Dimensional Integration of Refugees By Aksoy, Cevat Giray; Poutvaara, Panu; Schikora, Felicitas
A Random Forest a Day Keeps the Doctor Away By Markus Eyting
Constructing trading strategy ensembles by classifying market states By Michal Balcerak; Thomas Schmelzer
COVID-19 and the stock market: evidence from Twitter By Rahul Goel; Lucas Javier Ford; Maksym Obrizan; Rajesh Sharma
Leveraging Big Data to Advance Gender Equality By Ahmed Nauraiz Rana
Open Banking: Credit Market Competition When Borrowers Own the Data By Zhiguo He; Jing Huang; Jidong Zhou

Biased Programmers? Or Biased Data? A Field Experiment in Operationalizing AI Ethics

By:	Bo Cowgill; Fabrizio Dell'Acqua; Samuel Deng; Daniel Hsu; Nakul Verma; Augustin Chaintreau
Abstract:	Why do biased predictions arise? What interventions can prevent them? We evaluate 8.2 million algorithmic predictions of math performance from approximately 400 AI engineers, each of whom developed an algorithm under a randomly assigned experimental condition. Our treatment arms modified programmers' incentives, training data, awareness, and/or technical knowledge of AI ethics. We then assess out-of-sample predictions from their algorithms using randomized audit manipulations of algorithm inputs and ground-truth math performance for 20K subjects. We find that biased predictions are mostly caused by biased training data. However, one-third of the benefit of better training data comes through a novel economic mechanism: Engineers exert greater effort and are more responsive to incentives when given better training data. We also assess how performance varies with programmers' demographic characteristics, and their performance on a psychological test of implicit bias (IAT) concerning gender and careers. We find no evidence that female, minority and low-IAT engineers exhibit lower bias or discrimination in their code. However, we do find that prediction errors are correlated within demographic groups, which creates performance improvements through cross-demographic averaging. Finally, we quantify the benefits and tradeoffs of practical managerial or policy interventions such as technical advice, simple reminders, and improved incentives for decreasing algorithmic bias.
Date:	2020–12
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2012.02394&r=all

Every Corporation Owns Its Image: Corporate Credit Ratings via Convolutional Neural Networks

By:	Bojing Feng; Wenfang Xue; Bindang Xue; Zeyu Liu
Abstract:	Credit rating is an analysis of the credit risks associated with a corporation, which reflect the level of the riskiness and reliability in investing. There have emerged many studies that implement machine learning techniques to deal with corporate credit rating. However, the ability of these models is limited by enormous amounts of data from financial statement reports. In this work, we analyze the performance of traditional machine learning models in predicting corporate credit rating. For utilizing the powerful convolutional neural networks and enormous financial data, we propose a novel end-to-end method, Corporate Credit Ratings via Convolutional Neural Networks, CCR-CNN for brevity. In the proposed model, each corporation is transformed into an image. Based on this image, CNN can capture complex feature interactions of data, which are difficult to be revealed by previous machine learning models. Extensive experiments conducted on the Chinese public-listed corporate rating dataset which we build, prove that CCR-CNN outperforms the state-of-the-art methods consistently.
Date:	2020–12
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2012.03744&r=all

Explainable AI for Interpretable Credit Scoring

By:	Lara Marie Demajo; Vince Vella; Alexiei Dingli
Abstract:	With the ever-growing achievements in Artificial Intelligence (AI) and the recent boosted enthusiasm in Financial Technology (FinTech), applications such as credit scoring have gained substantial academic interest. Credit scoring helps financial experts make better decisions regarding whether or not to accept a loan application, such that loans with a high probability of default are not accepted. Apart from the noisy and highly imbalanced data challenges faced by such credit scoring models, recent regulations such as the 'right to explanation' introduced by the General Data Protection Regulation (GDPR) and the Equal Credit Opportunity Act (ECOA) have added the need for model interpretability to ensure that algorithmic decisions are understandable and coherent. An interesting concept that has been recently introduced is eXplainable AI (XAI), which focuses on making black-box models more interpretable. In this work, we present a credit scoring model that is both accurate and interpretable. For classification, state-of-the-art performance on the Home Equity Line of Credit (HELOC) and Lending Club (LC) Datasets is achieved using the Extreme Gradient Boosting (XGBoost) model. The model is then further enhanced with a 360-degree explanation framework, which provides different explanations (i.e. global, local feature-based and local instance-based) that are required by different people in different situations. Evaluation through the use of functionally-grounded, application-grounded and human-grounded analysis shows that the explanations provided are simple and consistent, and that they satisfy the six predetermined hypotheses testing for correctness, effectiveness, easy understanding, detail sufficiency and trustworthiness.
Date:	2020–12
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2012.03749&r=all

New Directions for Data-Driven Transport Safety

By:	ITF
Abstract:	This report explores how seamless data collection, analysis and sharing can unlock innovations in transport safety. Most interventions to improve transport safety are reactions to incidents. Connected vehicles, smartphone apps, ubiquitous sensors, data sharing and machine learning make proactive transport safety interventions possible and prevent crashes before they happen. Drawing on the Safe System approach, this report examines how transport stakeholders can make better decisions by using more relevant and timely data.
Date:	2019–05–23
URL:	http://d.repec.org/n?u=RePEc:oec:itfaac:83-en&r=all

Competition analysis on the over-the-counter credit default swap market

By:	Louis Abraham
Abstract:	We study two questions related to competition on the OTC CDS market using data collected as part of the EMIR regulation. First, we study the competition between central counterparties through collateral requirements. We present models that successfully estimate the initial margin requirements. However, our estimations are not precise enough to use them as input to a predictive model for CCP choice by counterparties in the OTC market. Second, we model counterpart choice on the interdealer market using a novel semi-supervised predictive task. We present our methodology as part of the literature on model interpretability before arguing for the use of conditional entropy as the metric of interest to derive knowledge from data through a model-agnostic approach. In particular, we justify the use of deep neural networks to measure conditional entropy on real-world datasets. We create the Razor entropy using the framework of algorithmic information theory and derive an explicit formula that is identical to our semi-supervised training objective. Finally, we borrow concepts from game theory to define top-k Shapley values. This novel method of payoff distribution satisfies most of the properties of Shapley values, and is of particular interest when the value function is monotone submodular. Unlike classical Shapley values, top-k Shapley values can be computed in quadratic time of the number of features instead of exponential. We implement our methodology and report the results on our particular task of counterpart choice. Finally, we present an improvement to the node2vec algorithm that could for example be used to further study intermediation. We show that the neighbor sampling used in the generation of biased walks can be performed in logarithmic time with a quasilinear time pre-computation, unlike the current implementations that do not scale well.
Date:	2020–12
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2012.01883&r=all
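
The abstract above contrasts top-k Shapley values with classical Shapley values, whose exact computation is exponential in the number of features. As a rough illustration of that baseline only (not the authors' top-k method, which the abstract does not specify), here is a minimal sketch of the exact classical computation. The function name `shapley_values`, the toy `COVERAGE` data and the `coverage_value` function are all hypothetical, chosen only because a coverage function is monotone submodular.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact classical Shapley values by enumerating every coalition.

    `value` maps a frozenset of players to a number. The loop visits
    all 2^(n-1) coalitions per player, hence the exponential cost.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical toy data: each "feature" covers a set of items.
COVERAGE = {"a": {1, 2}, "b": {2, 3}, "c": {4}}


def coverage_value(coalition):
    """Number of distinct items covered (monotone submodular)."""
    covered = set()
    for feature in coalition:
        covered |= COVERAGE[feature]
    return len(covered)


phi = shapley_values(list(COVERAGE), coverage_value)
print(phi)
```

In this toy case the values sum to the value of the full coalition (4 covered items), which is the efficiency property of Shapley values.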

Impact of weather factors on migration intention using machine learning algorithms

By:	John Aoga; Juhee Bae; Stefanija Veljanoska; Siegfried Nijssen; Pierre Schaus
Abstract:	Growing attention in the empirical literature has been paid to the incidence of climate shocks and change in migration decisions. Previous literature leads to different results and uses a multitude of traditional empirical approaches. This paper proposes a tree-based Machine Learning (ML) approach to analyze the role of weather shocks in an individual's intention to migrate in six agriculture-dependent countries: Burkina Faso, Ivory Coast, Mali, Mauritania, Niger, and Senegal. We run several tree-based algorithms (e.g., XGB, Random Forest) using the train-validation-test workflow to build robust and noise-resistant approaches. Then we determine the important features and show in which direction they influence the migration intention. This ML-based estimation accounts for features such as weather shocks captured by the Standardized Precipitation-Evapotranspiration Index (SPEI) for different timescales and various socioeconomic features/covariates. We find that (i) weather features improve the prediction performance although socioeconomic characteristics have more influence on migration intentions, (ii) a country-specific model is necessary, and (iii) international moves are influenced more by the longer timescales of SPEIs while general moves (which include internal moves) are influenced more by the shorter timescales.
Date:	2020–12
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2012.02794&r=all

Voting: A machine learning approach

By:	Burka, Dávid; Puppe, Clemens; Szepesváry, László; Tasnádi, Attila
Abstract:	Voting rules can be assessed from quite different perspectives: the axiomatic, the pragmatic, in terms of computational or conceptual simplicity, susceptibility to manipulation, and many other aspects. In this paper, we take the machine learning perspective and ask how 'well' a few prominent voting rules can be learned by a neural network. To address this question, we train the neural network to choose Condorcet, Borda, and plurality winners, respectively. Remarkably, our statistical results show that, when trained on a limited (but still reasonably large) sample, the neural network mimics most closely the Borda rule, no matter on which rule it was previously trained. The main overall conclusion is that the necessary training sample size for a neural network varies significantly with the voting rule, and we rank a number of popular voting rules in terms of the sample size required.
Keywords:	voting,social choice,neural networks,machine learning,Borda count
Date:	2020
URL:	http://d.repec.org/n?u=RePEc:zbw:kitwps:145&r=all
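
For readers unfamiliar with the three rules named in the abstract above, the following minimal sketch shows how plurality, Borda, and Condorcet winners can be computed from ranked ballots. It is only an illustration of the standard textbook definitions, not the authors' neural-network setup. The function names, the alphabetical tie-breaking, and the seven-voter example profile are assumptions made for this sketch.

```python
from collections import Counter


def plurality_winner(ballots):
    """Candidate ranked first most often (ties broken alphabetically)."""
    firsts = Counter(ballot[0] for ballot in ballots)
    return min(firsts, key=lambda c: (-firsts[c], c))


def borda_winner(ballots):
    """Candidate with the highest Borda score.

    With m candidates, a ballot gives m-1 points to its top choice,
    m-2 to the next, and so on down to 0 (ties broken alphabetically).
    """
    scores = Counter()
    for ballot in ballots:
        m = len(ballot)
        for position, candidate in enumerate(ballot):
            scores[candidate] += m - 1 - position
    return min(scores, key=lambda c: (-scores[c], c))


def condorcet_winner(ballots):
    """Candidate who beats every rival in a strict pairwise majority.

    Returns None when no such candidate exists.
    """
    candidates = sorted(ballots[0])
    for c in candidates:
        beats_all = all(
            2 * sum(b.index(c) < b.index(d) for b in ballots) > len(ballots)
            for d in candidates
            if d != c
        )
        if beats_all:
            return c
    return None


# Hypothetical profile: each ballot ranks candidates from best to worst.
ballots = (
    [["A", "B", "C"]] * 3
    + [["B", "C", "A"]] * 2
    + [["C", "B", "A"]] * 2
)

print(plurality_winner(ballots), borda_winner(ballots), condorcet_winner(ballots))
```

On this profile plurality selects A, while Borda and Condorcet both select B, which shows that the rules can disagree on the same set of ballots.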

Double machine learning for (weighted) dynamic treatment effects

By:	Hugo Bodory; Martin Huber; Lukáš Lafférs
Abstract:	We consider evaluating the causal effects of dynamic treatments, i.e. of multiple treatment sequences in various periods, based on double machine learning to control for observed, time-varying covariates in a data-driven way under a selection-on-observables assumption. To this end, we make use of so-called Neyman-orthogonal score functions, which imply the robustness of treatment effect estimation to moderate (local) misspecifications of the dynamic outcome and treatment models. This robustness property permits approximating outcome and treatment models by double machine learning even under high dimensional covariates and is combined with data splitting to prevent overfitting. In addition to effect estimation for the total population, we consider weighted estimation that permits assessing dynamic treatment effects in specific subgroups, e.g. among those treated in the first treatment period. We demonstrate that the estimators are asymptotically normal and $\sqrt{n}$-consistent under specific regularity conditions and investigate their finite sample properties in a simulation study. Finally, we apply the methods to the Job Corps study in order to assess different sequences of training programs under a large set of covariates.
Date:	2020–12
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2012.00370&r=all

It’s in the News: Developing a Real Time Index for Economic Uncertainty Based on Finnish News Titles

By:	Avela, Aleksi; Lehmus, Markku
Abstract:	Uncertainty may affect economic behavior of individuals and firms in a wide variety of ways, with typically negative consequences for economic growth. It is due to this fact, combined with rising political uncertainty observed lately in many countries, that uncertainty has gained increasing attention in economic literature, too. In this paper, we construct a measure of economic uncertainty for Finland based on Finnish news titles, collected from the YLE’s (the Finnish broadcasting company) website. To construct the index, we utilize machine learning and natural language processing (NLP) techniques, and in this paper, specifically, a transformed naive Bayes text classifier. On the basis of the model evaluation, the constructed uncertainty index seems helpful in giving a timely assessment of the current state of the Finnish economy. We find a strong negative correlation between our index and the consumer confidence index by Statistics Finland, and most remarkably, our index seems to lead the consumer confidence index by one month.
Keywords:	Economic uncertainty, Nowcasting, Machine learning, Natural language processing, Naive Bayes
JEL:	C45 C53 C61 E71
Date:	2020–12–08
URL:	http://d.repec.org/n?u=RePEc:rif:wpaper:84&r=all

Using neural networks to model long-term dependencies in occupancy behavior

By:	Kleinebrahm, Max; Torriti, Jacopo; McKenna, Russell; Ardone, Armin; Fichtner, Wolf
Abstract:	Models simulating household energy demand based on different occupant and household types and their behavioral patterns have received increasing attention over the last years due to the need to better understand fundamental characteristics that shape the demand side. Most of the models described in the literature are based on Time Use Survey data and Markov chains. Due to the nature of the underlying data and the Markov property, it is not sufficiently possible to consider long-term dependencies over several days in occupant behavior. An accurate mapping of long-term dependencies in behavior is of increasing importance, e.g. for the determination of flexibility potentials of individual households urgently needed to compensate supply-side fluctuations of renewable based energy systems. The aim of this study is to bridge the gap between social practice theory, energy related activity modelling and novel machine learning approaches. The weaknesses of existing approaches are addressed by combining time use survey data with mobility data, which provide information about individual mobility behavior over periods of one week. In social practice theory, emphasis is placed on the sequencing and repetition of practices over time. This suggests that practices have a memory. Transformer models based on the attention mechanism and Long short-term memory (LSTM) based neural networks define the state of the art in the field of natural language processing (NLP) and are for the first time introduced in this paper for the generation of weekly activity profiles. In a first step an autoregressive model is presented, which generates synthetic weekly mobility schedules of individual occupants and thereby captures long-term dependencies in mobility behavior. In a second step, an imputation model enriches the weekly mobility schedules with detailed information about energy relevant at home activities.
The weekly activity profiles build the basis for multiple use cases one of which is modelling consistent electricity, heat and mobility demand profiles of households. The approach developed provides the basis for making high-quality weekly activity data available to the general public without having to carry out complex application procedures.
Keywords:	activity modelling,mobility behavior,neural networks,synthetic data
Date:	2020
URL:	http://d.repec.org/n?u=RePEc:zbw:kitiip:49&r=all

Serie de Machine Learning Análisis de Componentes Principales (PCA)

By:	Sergio A. Pernice
Abstract:	In this document we present the technique of Principal Component Analysis (PCA). It is part of the series of documents on machine learning and of the content of the course “Métodos de Machine Learning para Economistas” in the Maestría en Economía at UCEMA.
Keywords:	Principal component analysis, unsupervised learning
Date:	2020–11
URL:	http://d.repec.org/n?u=RePEc:cem:doctra:770&r=all

What can we learn about mortgage supply from online data?

By:	Agnese Carella (Bank of Italy); Federica Ciocchetta (Bank of Italy); Valentina Michelangeli (Bank of Italy); Federico Maria Signoretti (Bank of Italy)
Abstract:	We exploit a novel dataset on mortgages offered by banks through Italy’s main online mortgage broker, which works with banks representing over 80 per cent of mortgages granted, to gain an up-to-date assessment of loan supply conditions. Characteristics of mortgages are reported for about 85,000 borrower-contract profiles, constant over time, available at the beginning of each month starting from March 2018. We document that riskier applications, characterized by high loan-to-value ratios and long maturity, are, on average, offered by a smaller number of banks that charge higher interest rates. Online banks tend to provide better price conditions than traditional intermediaries. We use the online rates offered to nowcast bank-level official (MIR) interest rate statistics, available only several weeks later. By using both regression analysis and machine learning algorithms, we show that the rates offered have significant predictive content for fixed-rate contracts, also after controlling for time-varying demand conditions, market reference rates, and unobserved time-invariant bank characteristics. Machine learning algorithms provide further improvements over regression models in out of sample predictions.
Keywords:	mortgage, experimental data, risk-taking, nowcasting
JEL:	G21 C81
Date:	2020–11
URL:	http://d.repec.org/n?u=RePEc:bdi:opques:qef_583_20&r=all

Reading the city through its neighbourhoods: Deep text embeddings of Yelp reviews as a basis for determining similarity and change

By:	Olson, Alex; Calderon-Figueroa, Fernando; Bidian, Olimpia; Silver, Daniel; Sanner, Scott
Abstract:	This paper develops novel methods for using Yelp reviews as a window into the collective representations of a city and its neighbourhoods. Basing analysis on social media data such as Yelp is a challenging task because review data is highly sparse and direct analysis may fail to uncover hidden trends. To this end, we propose a deep autoencoder approach for embedding the language of neighbourhood-based business reviews into a reduced dimensional space that facilitates similarity comparison of neighbourhoods and their change over time. Our model improves performance in distinguishing real and fake neighbourhood descriptions derived from real reviews, increasing performance in the task from an average accuracy of 0.46 to 0.77. This improvement in performance indicates that this novel application of embedded language analysis permits us to uncover comparative trends in neighbourhood change through the lens of their venues' reviews, providing a computational methodology for reading a city through its neighbourhoods. The resulting toolkit makes it possible to examine a city's current sociological trends in terms of its neighbourhoods' collective identities.
Date:	2020–12–02
URL:	http://d.repec.org/n?u=RePEc:osf:socarx:8jbvg&r=all

Artificial Intelligence in Agribusiness is Growing in Emerging Markets

By:	Peter Cook; Felicity O'Neill
Keywords:	Agriculture - Agribusiness Agriculture - Agricultural Knowledge & Information Systems Agriculture - Food Markets Information and Communication Technologies - ICT Applications Information and Communication Technologies - Information Technology International Economics and Trade - Access to Markets International Economics and Trade - Foreign Direct Investment
Date:	2020–05
URL:	http://d.repec.org/n?u=RePEc:wbk:wboper:34304&r=all

Stata/SQL/Python integration to emulate prospective cohort studies from big register data

By:	Matteo Marrazzo (Karolinska Institutet); Nicola Orsini (Karolinska Institutet)
Abstract:	The possibilities of using Stata to interrogate and analyze big data are not widely known among health researchers. However, the ability to meld different programming tools is becoming gradually more important with the increasing mainstream availability of big data sources. The aim of this presentation is to illustrate, using existing commands such as odbc and python, how to emulate and analyze large prospective cohorts from a collection of big national registers, harvesting the power of the different engines available (for example, SQL to handle relational databases and the preprocess phase, Stata to easily perform advanced statistical analyses and Python to implement well-known modules and packages for data manipulation and plots). I use a case study in pharmaco-epidemiology to illustrate the potential of using Stata to both design and analyze such complex and large datasets.
Date:	2020–08–20
URL:	http://d.repec.org/n?u=RePEc:boc:ncon19:12&r=all

Revolutionieren Big Data und KI die Versicherungswirtschaft? 24. Kölner Versicherungssymposium am 14. November 2019

By:	Müller-Peters, Horst (Ed.); Schmidt, Jan-Philipp (Ed.); Völler, Michaele (Ed.)
Abstract:	The question of whether big data and artificial intelligence (AI) are revolutionizing the insurance industry has occupied our society, and the insurance sector in particular, for some time. Recent advances in AI and in the analysis of large volumes of data, together with the media attention they have attracted, are immense. Big data and artificial intelligence were therefore also the promising themes of this year's 24th Kölner Versicherungssymposium of TH Köln on 14 November 2019: ivwKöln invited participants to a professional exchange, put together an attractive program of talks, and arranged networking opportunities for guests from research and practice. This proceedings volume comprises the content of the talks given by the various speakers.
Keywords:	insurance industry, artificial intelligence, big data
Date:	2020
URL:	http://d.repec.org/n?u=RePEc:zbw:thkivw:72020&r=all

First Time Around: Local Conditions and Multi-dimensional Integration of Refugees

By:	Aksoy, Cevat Giray; Poutvaara, Panu; Schikora, Felicitas
Abstract:	We study the causal effect of local labor market conditions and attitudes towards immigrants at the time of arrival on refugees’ multi-dimensional integration outcomes (economic, linguistic, navigational, political, psychological, and social). Using a unique dataset on refugees, we leverage a centralized allocation policy in Germany where refugees were exogenously assigned to live in specific counties. We find that high initial local unemployment negatively affects refugees’ economic and social integration: they are less likely to be in education or employment and they earn less. We also show that favorable attitudes towards immigrants promote refugees’ economic and social integration. The results suggest that attitudes toward immigrants are as important as local unemployment rates in shaping refugees’ integration outcomes. Using a machine learning classifier algorithm, we find that our results are driven by older people and those with secondary or tertiary education. Our findings highlight the importance of both initial economic and social conditions for facilitating refugee integration, and have implications for the design of centralized allocation policies.
Date:	2020–11–30
URL:	http://d.repec.org/n?u=RePEc:osf:socarx:nsr8q&r=all

First Time Around: Local Conditions and Multi-Dimensional Integration of Refugees

By:	Cevat Giray Aksoy; Panu Poutvaara; Felicitas Schikora
Abstract:	We study the causal effect of local labor market conditions and attitudes towards immigrants at the time of arrival on refugees’ multi-dimensional integration outcomes (economic, linguistic, navigational, political, psychological, and social). Using a unique dataset on refugees, we leverage a centralized allocation policy in Germany where refugees were exogenously assigned to live in specific counties. We find that high initial local unemployment negatively affects refugees’ economic and social integration: they are less likely to be in education or employment and they earn less. We also show that favorable attitudes towards immigrants promote refugees’ economic and social integration. The results suggest that attitudes toward immigrants are as important as local unemployment rates in shaping refugees’ integration outcomes. Using a machine learning classifier algorithm, we find that our results are driven by older people and those with secondary or tertiary education. Our findings highlight the importance of both initial economic and social conditions for facilitating refugee integration, and have implications for the design of centralized allocation policies.
Keywords:	international migration, refugees, integration, allocation policy
JEL:	F22 J15 J24
Date:	2020
URL:	http://d.repec.org/n?u=RePEc:ces:ceswps:_8747&r=all

First Time around: Local Conditions and Multi-Dimensional Integration of Refugees

By:	Aksoy, Cevat Giray (European Bank for Reconstruction and Development); Poutvaara, Panu (University of Munich); Schikora, Felicitas (DIW Berlin)
Abstract:	We study the causal effect of local labor market conditions and attitudes towards immigrants at the time of arrival on refugees' multi-dimensional integration outcomes (economic, linguistic, navigational, political, psychological, and social). Using a unique dataset on refugees, we leverage a centralized allocation policy in Germany where refugees were exogenously assigned to live in specific counties. We find that high initial local unemployment negatively affects refugees' economic and social integration: they are less likely to be in education or employment and they earn less. We also show that favorable attitudes towards immigrants promote refugees' economic and social integration. The results suggest that attitudes toward immigrants are as important as local unemployment rates in shaping refugees' integration outcomes. Using a machine learning classifier algorithm, we find that our results are driven by older people and those with secondary or tertiary education. Our findings highlight the importance of both initial economic and social conditions for facilitating refugee integration, and have implications for the design of centralized allocation policies.
Keywords:	international migration, refugees, integration, allocation policy
JEL:	F22 J15 J24
Date:	2020–11
URL:	http://d.repec.org/n?u=RePEc:iza:izadps:dp13914&r=all

A Random Forest a Day Keeps the Doctor Away

By:	Markus Eyting (Johannes Gutenberg University)
Abstract:	Using a unique dataset from a German health check-up provider including detailed individual questionnaire data as well as medical test data, I apply a random forest to predict several health risk factors. I evaluate the prediction performance using various metrics and find decent prediction qualities across all outcomes. By identifying the most relevant predictor variables, I compile concise and validated questionnaire tools to identify individuals’ blood pressure, blood glucose, and cholesterol levels, their risk of a coronary heart disease, whether or not they suffer from plaque or a metabolic syndrome as well as their relative fitness levels. In a second step, I compare the prediction results to physician predictions of the same patient observations. I find that the random forest outperforms the physicians if predictions are based on the same information set. When additionally providing the physicians with the random forest predictions for a particular patient observation, the physicians align with the random forest predictions. Finally, while the random forest considers various psychological scales, the physicians focus on family health history information instead.
Date:	2020–12–07
URL:	http://d.repec.org/n?u=RePEc:jgu:wpaper:2026&r=all

Constructing trading strategy ensembles by classifying market states

By:	Michal Balcerak; Thomas Schmelzer
Abstract:	Rather than directly predicting future prices or returns, we follow a more recent trend in asset management and classify the state of a market based on labels. We use numerous standard labels and even construct our own ones. The labels rely on future data to be calculated, and can be used as a target for training a market state classifier using an appropriate set of market features, e.g. moving averages. The construction of those features relies on their label separation power. Only a set of reasonably distinct features can approximate the labels. For each label we use a specific neural network to classify the state using the market features from our feature space. Each classifier gives a probability to buy or to sell and combining all their recommendations (here only done in a linear way) results in what we call a trading strategy. There are many such strategies and some of them are somewhat dubious and misleading. We construct our own metric based on past returns but penalising for a low number of transactions or small capital involvement. Only top score-performance-wise trading strategies end up in final ensembles. Using the Bitcoin market we show that the strategy ensembles outperform both in returns and risk-adjusted returns in the out-of-sample period. Even more so we demonstrate that there is a clear correlation between the success achieved in the past (if measured in our custom metric) and the future.
Date:	2020–12
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2012.03078&r=all

COVID-19 and the stock market: evidence from Twitter

By:	Rahul Goel; Lucas Javier Ford; Maksym Obrizan; Rajesh Sharma
Abstract:	COVID-19 has had a much larger impact on the financial markets compared to previous epidemics because news is transferred over social networks at the speed of light. Using Twitter's API, we compiled a unique dataset with more than 26 million COVID-19 related Tweets collected from February 2nd until May 1st, 2020. We find that more frequent use of the word "stock" in daily Tweets is associated with a substantial decline in log returns of three key US indices - Dow Jones Industrial Average, S&P500, and NASDAQ. The results remain virtually unchanged in multiple robustness checks.
Date:	2020–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2011.08717&r=all

Leveraging Big Data to Advance Gender Equality

By:	Ahmed Nauraiz Rana
Keywords:	Gender - Gender and Development Information and Communication Technologies - ICT Applications
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:wbk:wboper:34308&r=all

Open Banking: Credit Market Competition When Borrowers Own the Data

By:	Zhiguo He; Jing Huang; Jidong Zhou
Abstract:	Open banking facilitates data sharing consented by customers who generate the data, with a regulatory goal of promoting competition between traditional banks and challenger fintech entrants. We study lending market competition when sharing banks' customer data enables better borrower screening or targeting by fintech lenders. Open banking could make the entire financial industry better off yet leave all borrowers worse off, even if borrowers could choose whether to share their data. We highlight the importance of equilibrium credit quality inference from borrowers' endogenous sign-up decisions. When data sharing triggers privacy concerns by facilitating exploitative targeted loans, the equilibrium sign-up population can grow with the degree of privacy concerns.
JEL:	D18 G21 L13 L15 L51
Date:	2020–11
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:28118&r=all

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.