nep-big 2022-10-17 papers

on Big Data

Issue of 2022–10–17
23 papers chosen by
Tom Coupé, University of Canterbury

Forecasting World Trade Using Big Data and Machine Learning Techniques By Andrei Dubovik; Adam Elbourne; Bram Hendriks; Mark Kattenberg
Artificial Intelligence, Surveillance, and Big Data By David Karpa; Torben Klarl; Michael Rochlitz
Weak Supervision in Analysis of News: Application to Economic Policy Uncertainty By Paul Trust; Ahmed Zahran; Rosane Minghim
Housing Boom and Headline Inflation: Insights from Machine Learning By Mr. Yunhui Zhao; Yang Liu; Di Yang
Predicting Performances of Mutual Funds using Deep Learning and Ensemble Techniques By Nghia Chu; Binh Dao; Nga Pham; Huy Nguyen; Hien Tran
Artificial Intelligence Models and Employee Lifecycle Management: A Systematic Literature Review By Saeed Nosratabadi; Roya Khayer Zahed; Vadim Vitalievich Ponkratov; Evgeniy Vyacheslavovich Kostyrin
What does machine learning say about the drivers of inflation? By Emanuel Kohlscheen
Understanding and Predicting Systemic Corporate Distress: A Machine-Learning Approach By Ms. Burcu Hacibedel; Ritong Qu
The boosted HP filter is more general than you might think By Ziwei Mei; Peter C. B. Phillips; Zhentao Shi
Dynamic Early Warning and Action Model By Hannes Mueller; Christopher Rauh; Alessandro Ruggieri
Interpreting and predicting the economy flows: A time-varying parameter global vector autoregressive integrated the machine learning model By Yukang Jiang; Xueqin Wang; Zhixi Xiong; Haisheng Yang; Ting Tian
Noisy Night Lights Data: Effects on Research Findings for Developing Countries By Omoniyi Alimi; Geua Boe-Gibson; John Gibson
RESHAPE: Explaining Accounting Anomalies in Financial Statement Audits by enhancing SHapley Additive exPlanations By Ricardo Müller; Marco Schreyer; Timur Sattarov; Damian Borth
Tree-Based Learning in RNNs for Power Consumption Forecasting By Roberto Baviera; Pietro Manzoni
Computing XVA for American basket derivatives by Machine Learning techniques By Ludovic Goudenege; Andrea Molent; Antonino Zanette
Shaping the transition: Artificial intelligence and social dialogue By Clara Krämer; Sandrine Cazes
An Attention Free Long Short-Term Memory for Time Series Forecasting By Hugo Inzirillo; Ludovic De Villelongue
Two-stage Modeling for Prediction with Confidence By Dangxing Chen
Intergenerational Mobility in the Land of Inequality By Paolo Pinotti; Diogo G. C. Britto; Alexandre Fonseca; Breno Sampaio; Lucas Warwar
Do not judge a business idea by its cover: The relation between topics in business ideas and incorporation probability By Jessica Birkholz
Not all data are created equal - Data sharing and privacy By Michiel Bijlsma; Carin van der Cruijsen; Nicole Jonker
Using Topic Modeling in Innovation Studies: The Case of a Small Innovation System under Conditions of Pandemic Related Change By Jessica Birkholz; Jutta Günther; Mariia Shkolnykova
Smiles in Profiles: Improving Fairness and Efficiency Using Estimates of User Preferences in Online Marketplaces By Susan Athey; Dean Karlan; Emil Palikot; Yuan Yuan

Forecasting World Trade Using Big Data and Machine Learning Techniques

By:	Andrei Dubovik (CPB Netherlands Bureau for Economic Policy Analysis); Adam Elbourne (CPB Netherlands Bureau for Economic Policy Analysis); Bram Hendriks (CPB Netherlands Bureau for Economic Policy Analysis); Mark Kattenberg (CPB Netherlands Bureau for Economic Policy Analysis)
Abstract:	We compare machine learning techniques to a large Bayesian VAR for nowcasting and forecasting world merchandise trade. We focus on how the predictive performance of the machine learning models changes when they have access to a big dataset with 11,017 data series on key economic indicators. The machine learning techniques used include lasso, random forest and linear ensembles. We additionally compare the accuracy of the forecasts during and outside the Great Financial Crisis. We find no statistically significant differences in forecasting accuracy whether with respect to the technique, the dataset used - small or big - or the time period.
JEL:	F17 C53 C55
Date:	2022–10
URL:	https://d.repec.org/n?u=RePEc:cpb:discus:441

Artificial Intelligence, Surveillance, and Big Data

By:	David Karpa; Torben Klarl; Michael Rochlitz
Abstract:	The most important resource to improve technologies in the field of artificial intelligence is data. Two types of policies are crucial in this respect: privacy and data-sharing regulations, and the use of surveillance technologies for policing. Both types of policies vary substantially across countries and political regimes. In this paper, we examine how authoritarian and democratic political institutions can influence the quality of research in artificial intelligence, and the availability of large-scale datasets to improve and train deep learning algorithms. We focus mainly on the Chinese case, and find that - ceteris paribus - authoritarian political institutions continue to have a negative effect on innovation. They can, however, have a positive effect on research in deep learning, via the availability of large-scale datasets that have been obtained through government surveillance. We propose a research agenda to study which of the two effects might dominate in a race for leadership in artificial intelligence between countries with different political institutions, such as the United States and China.
Keywords:	Artificial intelligence, political institutions, big data, surveillance, innovation, China
JEL:	O25 O31 O38 P16 P51
Date:	2021–11
URL:	https://d.repec.org/n?u=RePEc:atv:wpaper:2108

Weak Supervision in Analysis of News: Application to Economic Policy Uncertainty

By:	Paul Trust; Ahmed Zahran; Rosane Minghim
Abstract:	The need for timely data analysis for economic decisions has prompted most economists and policy makers to search for non-traditional supplementary sources of data. In that context, text data is being explored to enrich traditional data sources because it is easy to collect and highly abundant. Our work focuses on studying the potential of textual data, in particular news pieces, for measuring economic policy uncertainty (EPU). Economic policy uncertainty is defined as the public's inability to predict the outcomes of their decisions under new policies and future economic fundamentals. Quantifying EPU is of great importance to policy makers, economists, and investors since it influences their expectations about the future economic fundamentals with an impact on their policy, investment and saving decisions. Most previous work using news articles for measuring EPU is either manual or based on a simple keyword search. Our work proposes a machine learning based solution involving weak supervision to classify news articles with regard to economic policy uncertainty. Weak supervision is shown to be an efficient machine learning paradigm for applying machine learning models in low resource settings with no or scarce training sets, leveraging domain knowledge and heuristics. We further generated a weak supervision based EPU index that we used to conduct extensive econometric analysis along with the Irish macroeconomic indicators to validate whether our generated index foreshadows weaker macroeconomic performance.
Date:	2022–08
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2209.05383

Housing Boom and Headline Inflation: Insights from Machine Learning

By:	Mr. Yunhui Zhao; Yang Liu; Di Yang
Abstract:	Inflation has been rising during the pandemic against supply chain disruptions and a multi-year boom in global owner-occupied house prices. We present some stylized facts pointing to house prices as a leading indicator of headline inflation in the U.S. and eight other major economies with fast-rising house prices. We then apply machine learning methods to forecast inflation in two housing components (rent and owner-occupied housing cost) of the headline inflation and draw tentative inferences about inflationary impact. Our results suggest that for most of these countries, the housing components could have a relatively large and sustained contribution to headline inflation, as inflation is just starting to reflect the higher house prices. Methodologically, for the vast majority of countries we analyze, machine-learning models outperform the VAR model, suggesting some potential value for incorporating such models into inflation forecasting.
Keywords:	Housing Price Inflation; Rent; Owner-Occupied Housing; Machine Learning; Forecast; machine-learning model; machine learning method; housing boom; D. forecasting result; Inflation; Housing prices; Housing; Consumer price indexes; Global; Europe; Australia and New Zealand; North America; Caribbean; VAR model
Date:	2022–07–28
URL:	https://d.repec.org/n?u=RePEc:imf:imfwpa:2022/151

Predicting Performances of Mutual Funds using Deep Learning and Ensemble Techniques

By:	Nghia Chu; Binh Dao; Nga Pham; Huy Nguyen; Hien Tran
Abstract:	Predicting fund performance is beneficial to both investors and fund managers, and yet is a challenging task. In this paper, we have tested whether deep learning models can predict fund performance more accurately than traditional statistical techniques. Fund performance is typically evaluated by the Sharpe ratio, which represents the risk-adjusted performance to ensure meaningful comparability across funds. We calculated the annualised Sharpe ratios based on the monthly returns time series data for more than 600 open-end mutual funds investing in listed large-cap equities in the United States. We find that long short-term memory (LSTM) and gated recurrent units (GRUs) deep learning methods, both trained with modern Bayesian optimization, provide higher accuracy in forecasting funds' Sharpe ratios than traditional statistical ones. An ensemble method, which combines forecasts from LSTM and GRUs, achieves the best performance of all models. There is evidence to say that deep learning and ensembling offer promising solutions in addressing the challenge of fund performance forecasting.
Date:	2022–09
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2209.09649

Artificial Intelligence Models and Employee Lifecycle Management: A Systematic Literature Review

By:	Saeed Nosratabadi; Roya Khayer Zahed; Vadim Vitalievich Ponkratov; Evgeniy Vyacheslavovich Kostyrin
Abstract:	Background/Purpose: The use of artificial intelligence (AI) models for data-driven decision-making in different stages of employee lifecycle (EL) management is increasing. However, there is no comprehensive study that addresses contributions of AI in EL management. Therefore, the main goal of this study was to address this theoretical gap and determine the contribution of AI models to EL. Methods: This study applied the PRISMA method, a systematic literature review model, to ensure that the maximum number of publications related to the subject can be accessed. The output of the PRISMA model led to the identification of 23 related articles, and the findings of this study were presented based on the analysis of these articles. Results: The findings revealed that AI algorithms were used in all stages of EL management (i.e., recruitment, on-boarding, employability and benefits, retention, and off-boarding). It was also disclosed that Random Forest, Support Vector Machines, Adaptive Boosting, Decision Tree, and Artificial Neural Network algorithms outperform other algorithms and were the most used in the literature. Conclusion: Although the use of AI models in solving EL problems is increasing, research on this topic is still in its infancy, and more research on this topic is necessary.
Date:	2022–09
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2209.07335

What does machine learning say about the drivers of inflation?

By:	Emanuel Kohlscheen
Abstract:	This paper examines the drivers of CPI inflation through the lens of a simple, but computationally intensive machine learning technique. More specifically, it predicts inflation across 20 advanced countries between 2000 and 2021, relying on 1,000 regression trees that are constructed based on six key macroeconomic variables. This agnostic, purely data driven method delivers (relatively) good outcome prediction performance. Out of sample root mean square errors (RMSE) systematically beat even the in-sample benchmark econometric models. Partial effects of inflation expectations on CPI outcomes are also elicited in the paper. Overall, the results highlight the role of expectations for inflation outcomes in advanced economies, even though their importance appears to have declined somewhat during the last 10 years.
Date:	2022–08
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2208.14653

Understanding and Predicting Systemic Corporate Distress: A Machine-Learning Approach

By:	Ms. Burcu Hacibedel; Ritong Qu
Abstract:	In this paper, we study systemic non-financial corporate sector distress using firm-level probabilities of default (PD), covering 55 economies, and spanning the last three decades. Systemic corporate distress is identified by elevated PDs across a large portion of the firms in an economy. A machine-learning based early warning system is constructed to predict the onset of distress in one year’s time. Our results show that credit expansion, monetary policy tightening, overvalued stock prices, and debt-linked balance-sheet weaknesses predict corporate distress. We also find that systemic corporate distress events are associated with contractions in GDP and credit growth in advanced and emerging markets, to different degrees, and that these contractions are milder than those associated with financial crises.
Keywords:	Nonfinancial sector; Probability of default; Early warning systems; Macroprudential policy; balance-sheet weakness; appendix B constructing predictor; distress events; appendix C machine learning model; PD indices; Corporate sector; Banking crises; Credit; Financial statements; Global
Date:	2022–07–29
URL:	https://d.repec.org/n?u=RePEc:imf:imfwpa:2022/153

The boosted HP filter is more general than you might think

By:	Ziwei Mei; Peter C. B. Phillips; Zhentao Shi
Abstract:	The global financial crisis and Covid recession have renewed discussion concerning trend-cycle discovery in macroeconomic data, and boosting has recently upgraded the popular HP filter to a modern machine learning device suited to data-rich and rapid computational environments. This paper sheds light on its versatility in trend-cycle determination, explaining in a simple manner both HP filter smoothing and the consistency delivered by boosting for general trend detection. Applied to a universe of time series in FRED databases, boosting outperforms other methods in timely capturing downturns at crises and recoveries that follow. With its wide applicability the boosted HP filter is a useful automated machine learning addition to the macroeconometric toolkit.
Date:	2022–09
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2209.09810

Dynamic Early Warning and Action Model

By:	Hannes Mueller; Christopher Rauh; Alessandro Ruggieri
Abstract:	This document presents the outcome of two modules developed for the UK Foreign, Commonwealth & Development Office (FCDO): 1) a forecast model which uses machine learning and text downloads to predict outbreaks and intensity of internal armed conflict; and 2) a decision-making module that embeds these forecasts into a model of preventing armed conflict damages. The outcome is a quantitative benchmark which should provide a testing ground for internal FCDO debates on both strategic levels (i.e. the process of deciding on country priorities) and operational levels (i.e. identifying critical periods by the country experts). Our method allows the FCDO to simulate policy interventions and changes in its strategic focus. We show, for example, that the FCDO should remain engaged in recently stabilized armed conflicts and re-think its development focus in countries with the highest risks. The total expected economic benefit of reinforced preventive efforts, as defined in this report, would bring monthly savings in expected costs of 26 billion USD with a monthly gain to the UK of 630 million USD.
JEL:	C44 D74 E17
Date:	2022–06
URL:	https://d.repec.org/n?u=RePEc:bge:wpaper:1355

Interpreting and predicting the economy flows: A time-varying parameter global vector autoregressive integrated the machine learning model

By:	Yukang Jiang; Xueqin Wang; Zhixi Xiong; Haisheng Yang; Ting Tian
Abstract:	The paper proposes a time-varying parameter global vector autoregressive (TVP-GVAR) framework for predicting and analysing economic variables of developed regions. We want to provide an easily accessible approach for economic applications, where a variety of machine learning models can be incorporated for out-of-sample prediction. A LASSO-type technique is selected for numerically efficient model selection based on mean squared errors (MSEs). We show the convincing in-sample performance of our proposed model in all economic variables and relatively high precision out-of-sample predictions with different-frequency economic inputs. Furthermore, the time-varying orthogonal impulse responses provide novel insights into the connectedness of economic variables at critical time points across developed regions. We also derive the corresponding asymptotic bands (the confidence intervals) for the orthogonal impulse response functions under standard assumptions.
Date:	2022–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2209.05998

Noisy Night Lights Data: Effects on Research Findings for Developing Countries

By:	Omoniyi Alimi (University of Waikato); Geua Boe-Gibson (University of Waikato); John Gibson (University of Waikato)
Abstract:	Night lights data are increasingly used by economists, especially for developing country research. Many of these countries have limited capacity to generate timely and accurate sub-national statistics on economic activity so satellite data seem attractive. Most studies have used Defense Meteorological Satellite Program (DMSP) data that are flawed by blurring, lack of calibration, and top- and bottom-coding. These noisy data are only weakly related to traditional economic activity measures for lower-level spatial units. More accurate data from VIIRS (the Visible Infrared Imaging Radiometer Suite) have been available since 2012 but are rarely used by economists. This paper examines how recent published findings for developing countries based on DMSP data for very small spatial units change when the more accurate VIIRS night lights data are used. Our first example finds that economic activity is far more concentrated in low-lying, flood-prone, urban areas than is apparent with the DMSP data. Our second example shows that urbanization, as proxied by night lights, is not ceteris paribus associated with better child nutritional outcomes in Nigeria, contrary to claims in a study using DMSP data. In both examples, spatially mean-reverting errors in the DMSP data cause econometric bias that distorts policy implications.
Keywords:	Anthropometrics; DMSP; flooding; night lights; satellite data; VIIRS
JEL:	C80 O12 Q54
Date:	2022–09–30
URL:	https://d.repec.org/n?u=RePEc:wai:econwp:22/12

RESHAPE: Explaining Accounting Anomalies in Financial Statement Audits by enhancing SHapley Additive exPlanations

By:	Ricardo Müller; Marco Schreyer; Timur Sattarov; Damian Borth
Abstract:	Detecting accounting anomalies is a recurrent challenge in financial statement audits. Recently, novel methods derived from Deep-Learning (DL) have been proposed to audit the large volumes of a statement's underlying accounting records. However, due to their vast number of parameters, such models exhibit the drawback of being inherently opaque. At the same time, the concealing of a model's inner workings often hinders its real-world application. This observation holds particularly true in financial audits since auditors must reasonably explain and justify their audit decisions. Nowadays, various Explainable AI (XAI) techniques have been proposed to address this challenge, e.g., SHapley Additive exPlanations (SHAP). However, in unsupervised DL as often applied in financial audits, these methods explain the model output at the level of encoded variables. As a result, the explanations of Autoencoder Neural Networks (AENNs) are often hard for human auditors to comprehend. To mitigate this drawback, we propose RESHAPE, which explains the model output on an aggregated attribute-level. In addition, we introduce an evaluation framework to compare the versatility of XAI methods in auditing. Our experimental results show empirical evidence that RESHAPE results in versatile explanations compared to state-of-the-art baselines. We envision such attribute-level explanations as a necessary next step in the adoption of unsupervised DL techniques in financial auditing.
Date:	2022–09
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2209.09157

Tree-Based Learning in RNNs for Power Consumption Forecasting

By:	Roberto Baviera; Pietro Manzoni
Abstract:	A Recurrent Neural Network that operates on several time lags, called an RNN(p), is the natural generalization of an Autoregressive ARX(p) model. It is a powerful forecasting tool when different time scales can influence a given phenomenon, as happens in the energy sector where hourly, daily, weekly and yearly interactions coexist. The cost-effective BPTT is the industry standard as a learning algorithm for RNNs. We prove that, when training RNN(p) models, other learning algorithms turn out to be much more efficient in terms of both time and space complexity. We also introduce a new learning algorithm, the Tree Recombined Recurrent Learning, that leverages a tree representation of the unrolled network and appears to be even more effective. We present an application of RNN(p) models for power consumption forecasting on the hourly scale: experimental results demonstrate the efficiency of the proposed algorithm and the excellent predictive accuracy achieved by the selected model both in point and in probabilistic forecasting of the energy consumption.
Date:	2022–09
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2209.01378

Computing XVA for American basket derivatives by Machine Learning techniques

By:	Ludovic Goudenege; Andrea Molent; Antonino Zanette
Abstract:	Total value adjustment (XVA) is the change in value to be added to the price of a derivative to account for the bilateral default risk and the funding costs. In this paper, we compute such a premium for American basket derivatives whose payoff depends on multiple underlyings. In particular, in our model, those underlyings are supposed to follow the multidimensional Black-Scholes stochastic model. In order to determine the XVA, we follow the approach introduced by Burgard and Kjaer and afterward applied by Arregui et al. for the one-dimensional American derivatives. The evaluation of the XVA for basket derivatives is particularly challenging as the presence of several underlyings leads to a high-dimensional control problem. We tackle such an obstacle by resorting to Gaussian Process Regression, a machine learning technique that allows one to address the curse of dimensionality effectively. Moreover, the use of numerical techniques, such as control variates, turns out to be a powerful tool to improve the accuracy of the proposed methods. The paper includes the results of several numerical experiments that confirm the goodness of the proposed methodologies.
Date:	2022–09
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2209.06485

Shaping the transition: Artificial intelligence and social dialogue

By:	Clara Krämer; Sandrine Cazes
Abstract:	Rapid advances in the development and adoption of artificial intelligence (AI) technologies provide new opportunities but also raise fears about disruptive labour market and workplace transitions. This working paper examines how social dialogue can shape the AI transition in beneficial ways for both workers and firms. It highlights that social dialogue can generally help foster inclusive labour markets and ease technological transitions, and presents new descriptive evidence together with ongoing initiatives from social partners showing that social dialogue has an important role to play in the AI transition as well. The paper also discusses how AI adoption may affect social dialogue itself, e.g. by adding new pressures on weakening labour relations systems and posing practical challenges to social partners, such as insufficient AI-related expertise and resources to respond to the AI transition. Based on these insights, the paper suggests a few measures for policy makers who would like to support social partners’ efforts in shaping the AI transition.
JEL:	J01 J08 J51 O3
Date:	2022–10–03
URL:	https://d.repec.org/n?u=RePEc:oec:elsaab:279-en

An Attention Free Long Short-Term Memory for Time Series Forecasting

By:	Hugo Inzirillo; Ludovic De Villelongue
Abstract:	Deep learning is playing an increasingly important role in time series analysis. We focused on time series forecasting using an attention-free mechanism, a more efficient framework, and proposed a new architecture for time series prediction for which linear models seem to be unable to capture the time dependence. We proposed an architecture built using attention-free LSTM layers that outperform linear models for conditional variance prediction. Our findings confirm the validity of our model, which also improved the prediction capacity of an LSTM while improving the efficiency of the learning task.
Date:	2022–09
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2209.09548

Two-stage Modeling for Prediction with Confidence

By:	Dangxing Chen
Abstract:	The use of neural networks has been very successful in a wide variety of applications. However, it has recently been observed that it is difficult to generalize the performance of neural networks under the condition of distributional shift. Several efforts have been made to identify potential out-of-distribution inputs. Although existing literature has made significant progress with regard to images and textual data, finance has been overlooked. The aim of this paper is to investigate the distribution shift in the credit scoring problem, one of the most important applications of finance. For the potential distribution shift problem, we propose a novel two-stage model. Using the out-of-distribution detection method, data is first separated into confident and unconfident sets. As a second step, we utilize the domain knowledge with a mean-variance optimization in order to provide reliable bounds for unconfident samples. Using empirical results, we demonstrate that our model offers reliable predictions for the vast majority of datasets. It is only a small portion of the dataset that is inherently difficult to judge, and we leave them to the judgment of human beings. Based on the two-stage model, highly confident predictions have been made and potential risks associated with the model have been significantly reduced.
Date:	2022–09
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2209.08848

Intergenerational Mobility in the Land of Inequality

By:	Paolo Pinotti (Bocconi University); Diogo G. C. Britto (Bocconi University); Alexandre Fonseca (Federal Revenue of Brazil); Breno Sampaio (Universidade Federal de Pernambuco); Lucas Warwar (Universidade Federal de Pernambuco)
Abstract:	We provide the first estimates of intergenerational income mobility for a developing country, namely Brazil. We measure formal income from tax and employment registries, and we train machine learning models on census and survey data to predict informal income. The data reveal a much higher degree of persistence than previous estimates available for developed economies: a 10 percentile increase in parental income rank is associated with a 5.5 percentile increase in child income rank, and persistence is even higher in the top 5%. Children born to parents in the first income quintile face a 46% chance of remaining at the bottom when adults. We validate these estimates using two novel mobility measures that rank children and parents without the need to impute informal income. We document substantial heterogeneity in mobility across individual characteristics - notably gender and race - and across Brazilian regions. Leveraging children who migrate at different ages, we estimate that causal place effects explain 57% of the large spatial variation in mobility. Finally, assortative mating plays a strong role in household income persistence, and parental income is also strongly associated with several key long-term outcomes such as education, teenage pregnancy, occupation, mortality, and victimization.
Keywords:	Intergenerational Mobility, Inequality, Brazil, Migration, Place Effects
JEL:	J62 D31 I31 R23
Date:	2022–10
URL:	https://d.repec.org/n?u=RePEc:crm:wpaper:2322

Do not judge a business idea by its cover: The relation between topics in business ideas and incorporation probability

By:	Jessica Birkholz
Abstract:	It is of key importance to identify innovative business ideas in an early stage, so that funding resources can be adequately allocated according to their economic potential. Traditional indicators do not reliably discriminate business ideas with high degree of innovativeness and high incorporation chances from those with low degree of innovativeness and low incorporation prospects. Therefore, this paper examines the content of business idea descriptions to improve the estimations of the incorporation probability. The paper aims to answer two questions: 1) Are there differences in topic prevalence in innovative and non-innovative business ideas?, and 2) How does the composition of topics related to a business idea influence its incorporation probability? Structural topic modeling and classification tree analysis are applied on business idea descriptions from a competition in Bremen, Germany, from 2003 until 2019. The results show that business idea descriptions are a rich source of information to identify both innovative ideas and those with higher incorporation prospects.
Keywords:	Innovative entrepreneurship, business idea, machine learning, incorporation
JEL:	O30 L26 O38
Date:	2021–11
URL:	https://d.repec.org/n?u=RePEc:atv:wpaper:2109

Not all data are created equal - Data sharing and privacy

By:	Michiel Bijlsma; Carin van der Cruijsen; Nicole Jonker
Abstract:	The COVID-19 pandemic has increased our online presence and unleashed a new discussion on sharing sensitive personal data. Upcoming European legislation will facilitate data sharing in several areas, following the lead of the revised payments directive (PSD2), which enables payments data sharing with third parties. However, little is known about what drives consumers’ preferences with different types of data, as preferences may differ according to the type of data, type of usage or type of firm using the data. Using a discrete-choice survey approach among a representative group of Dutch consumers, we find that next to health data, people are hesitant to share their financial data on payments, wealth and pensions, compared to other types of consumer data. Second, consumers are especially cautious about sharing their data when they are not used anonymously. Third, consumers are more hesitant to share their data with BigTechs, webshops and insurers than they are with banks. Fourth, a financial reward can trigger data sharing by consumers. Last, we show that attitudes towards data usage depend on personal characteristics, consumers’ digital skills, online behaviour and their trust in the firms using the data.
Keywords:	consumer data; data sharing; banks; BigTechs; insurers; webshop; trust; digital skills
JEL:	D12 E42 G21 G22 G23
Date:	2021–11
URL:	https://d.repec.org/n?u=RePEc:dnb:dnbwpp:728

Using Topic Modeling in Innovation Studies: The Case of a Small Innovation System under Conditions of Pandemic Related Change

By:	Jessica Birkholz; Jutta Günther; Mariia Shkolnykova
Abstract:	It is a challenge to empirically investigate rapidly developing situations. An economic crisis is such a situation in which firms exit, enter, and create new business models. The current pandemic has caused a turbulent situation with hardship, but at the same time with creative potential of innovative change. It calls for empirical analyses, but firm level data based on surveys is hard to collect given the high speed of developments. An alternative data source is news articles that report on innovation issues and are assessed by text mining techniques. This is exemplified in this chapter. It shows how topic modeling can be used to scrutinize the shift of innovation topics since the beginning of the COVID-19 crisis. The results apply to a small innovation system in Germany and confirm that innovation priorities change during a crisis and that many different actors are involved.
Keywords:	Topic modeling, innovation, structural change, crisis
JEL:	O30 R11 R58
Date:	2021–01
URL:	https://d.repec.org/n?u=RePEc:atv:wpaper:2101

Smiles in Profiles: Improving Fairness and Efficiency Using Estimates of User Preferences in Online Marketplaces

By:	Susan Athey; Dean Karlan; Emil Palikot; Yuan Yuan
Abstract:	Online platforms often face challenges being both fair (i.e., non-discriminatory) and efficient (i.e., maximizing revenue). Using computer vision algorithms and observational data from a micro-lending marketplace, we find that choices made by borrowers creating online profiles impact both of these objectives. We further support this conclusion with a web-based randomized survey experiment. In the experiment, we create profile images using Generative Adversarial Networks that differ in a specific feature and estimate its impact on lender demand. We then counterfactually evaluate alternative platform policies and identify particular approaches to influencing the changeable profile photo features that can ameliorate the fairness-efficiency tension.
Date:	2022–09
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2209.01235

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the Griffith Business School of Griffith University in Australia.