nep-big 2022-10-31 papers

on Big Data

Issue of 2022‒10‒31
35 papers chosen by
Tom Coupé
University of Canterbury

Proxying Economic Activity with Daytime Satellite Imagery: Filling Data Gaps across Time and Space By Lehnert, Patrick; Niederberger, Michael; Backes-Gellner, Uschi; Bettinger, Eric
Asset Pricing and Deep Learning By Chen Zhang
How communication makes the difference between a cartel and tacit collusion: a machine learning approach By Maximilian Andres; Lisa Bruttel; Jana Friedrichsen
Counterfactual Reconciliation: Incorporating Aggregation Constraints For More Accurate Causal Effect Estimates By Cengiz, Doruk; Tekgüç, Hasan
Chaotic Hedging with Iterated Integrals and Neural Networks By Ariel Neufeld; Philipp Schmocker
Achieving fairness with a simple ridge penalty By Scutari, Marco; Panero, Francesca; Proissl, Manuel
Finding Needles in Haystacks: Multiple-Imputation Record Linkage Using Machine Learning By John M. Abowd; Joelle Hillary Abramowitz; Margaret Catherine Levenstein; Kristin McCue; Dhiren Patki; Trivellore Raghunathan; Ann Michelle Rodgers; Matthew D. Shapiro; Nada Wasi; Dawn Zinsser
Using National Payment System Data to Nowcast Economic Activity in Azerbaijan By Ilkin Huseynov; Nazrin Ramazanova; Hikmat Valirzayev
Small Area Estimation of Monetary Poverty in Mexico Using Satellite Imagery and Machine Learning By Newhouse, David Locke; Merfeld, Joshua David; Ramakrishnan, Anusha Pudugramam; Swartz, Tom; Lahiri, Partha
Using Knowledge Distillation to improve interpretable models in a retail banking context By Maxime Biehler; Mohamed Guermazi; Célim Starck
Automatic Identification and Classification of Share Buybacks and their Effect on Short-, Mid- and Long-Term Returns By Thilo Reintjes
The Community Explorer: Bringing Populations' Diversity into Policy Discussions, One County at a Time By Lopez, Claude; Roh, Hyeongyul; Switek, Maggie
AI-Assisted Discovery of Quantitative and Formal Models in Social Science By Julia Balla; Sihao Huang; Owen Dugan; Rumen Dangovski; Marin Soljacic
The impact of artificial intelligence on the nature and quality of jobs By Laura Nurski; Mia Hoffmann
Bayesian Modeling of Time-varying Parameters Using Regression Trees By Niko Hauzenberger; Florian Huber; Gary Koop; James Mitchell
With big data come big problems: pitfalls in measuring basis risk for crop index insurance By Matthieu Stigler; Apratim Dey; Andrew Hobbs; David Lobell
What news can really tell us? Evidence from a news-based sentiment index for financial markets analysis By Anna Marszal
How Artificial Intelligence Can Help Advance Post-Secondary Learning in Emerging Markets By Baloko Makala; Maud Schmitt; Alejandro Caballero
Natural Disasters and Economic Dynamics: Evidence from the Kerala Floods By Beyer, Robert Carl Michael; Narayanan, Abhinav; Thakur, Gogol Mitra
Consumer Privacy and the Value of Consumer Data By Mehmet Canayaz; Ilja Kantorovitch; Roxana Mihet
A gender perspective on artificial intelligence and jobs: The vicious cycle of digital inequality By Estrella Gomez-Herrera; Sabine Köszegi
To Be or Not to Be: The Entrepreneur in Neo-Schumpeterian Growth Theory By Henrekson, Magnus; Johansson, Dan; Karlsson, Johan
Detecting asset price bubbles using deep learning By Francesca Biagini; Lukas Gonon; Andrea Mazzon; Thilo Meyer-Brandis
Optimal consumption-investment choices under wealth-driven risk aversion By Ruoxin Xiao
Sentiment Analysis of ESG disclosures on Stock Market By Sudeep R. Bapat; Saumya Kothari; Rushil Bansal
Feature-Rich Long-term Bitcoin Trading Assistant By Jatin Nainani; Nirman Taterh; Md Ausaf Rashid; Ankit Khivasara
Qualitative Analysis at Scale: An Application to Aspirations in Cox's Bazaar, Bangladesh By Ashwin, Julian; Rao, Vijayendra; Biradavolu, Monica Rao; Haque, Arshia; Khan, Afsana Iffat; Krishnan, Nandini; Nagy, Peer Sebastian
A Missed Opportunity to Further Build Trust in AI: A Landscape Analysis of OECD.AI By Susan Ariel Aaronson
The ECB press conference: a textual analysis By Pavelkova, Andrea
The Economic Impact of Covid-19 and Associated Lockdown Measures in China By Charpe, Matthieu
Wage expectation, information and the decision to become a nurse By Philipp Kugler
Wicked Problems Might Inspire Greater Data Sharing By Susan Ariel Aaronson
How Well Can Real-Time Indicators Track the Economic Impacts of a Crisis Like COVID-19? By Ten, Gi Khan; Merfeld, Joshua David; Hirfrfot, Kibrom Tafere; Newhouse, David Locke; Pape, Utz Johann
PayTech and the D(ata) N(etwork) A(ctivities) of BigTech Platforms By Jonathan Chiu; Thorsten V. Koeppl
Silence is not Golden Anymore? Social media activity and stock market valuation in Europe By Christophe J. GODLEWSKI; Katarzyna BYRKA-KITA; Renata GOLA; Jacek CYPRYJANSKI

Proxying Economic Activity with Daytime Satellite Imagery: Filling Data Gaps across Time and Space

By:	Lehnert, Patrick (University of Zurich); Niederberger, Michael (University of Zurich); Backes-Gellner, Uschi (University of Zurich); Bettinger, Eric (Stanford University)
Abstract:	This paper develops a novel procedure for proxying economic activity with daytime satellite imagery across time periods and spatial units for which reliable data on economic activity are otherwise not available. In developing this unique proxy, we apply machine-learning techniques to a historical time series of daytime satellite imagery dating back to 1984. Compared to satellite data on night light intensity, another common economic proxy, our proxy more precisely predicts economic activity at smaller regional levels and over longer time horizons. We demonstrate our measure's usefulness for the example of Germany, where East German data on economic activity are unavailable for detailed regional levels and historical time series. Our procedure is generalizable to any region in the world, and it has great potential for analyzing historical economic developments, evaluating local policy reforms, and controlling for economic activity at highly disaggregated regional levels in econometric applications.
Keywords:	daytime satellite imagery, Landsat, machine learning, economic activity, land cover
JEL:	E01 E23 O18 R11 R14
Date:	2022–09
URL:	http://d.repec.org/n?u=RePEc:iza:izadps:dp15555&r=

Asset Pricing and Deep Learning

By:	Chen Zhang (SenseTime Research)
Abstract:	Traditional machine learning methods have been widely studied in financial innovation. My study focuses on the application of deep learning methods to asset pricing. I investigate various deep learning methods for asset pricing, especially for risk premia measurement. All models take the same set of predictive signals (firm characteristics, systematic risks and macroeconomics). I demonstrate the high performance of a range of state-of-the-art (SOTA) deep learning methods and find that RNNs with memory mechanisms and attention have the best predictive performance. Furthermore, I demonstrate large economic gains to investors using deep learning forecasts. The results of my comparative experiments highlight the importance of domain knowledge and financial theory when designing deep learning models. I also show that return prediction tasks bring new challenges to deep learning. The time-varying distribution causes a distribution shift problem, which is essential for financial time series prediction. I demonstrate that deep learning methods can improve asset risk premium measurement. Given the rapid growth of deep learning research, these methods can continue to advance the study of the financial mechanisms underlying asset pricing. I also propose a promising research approach: learning from data and uncovering the underlying economic mechanisms through explainable artificial intelligence (AI) methods. My findings not only justify the value of deep learning in the rapidly growing fintech sector, but also highlight its prospects and advantages over traditional machine learning methods.
Date:	2022–09
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2209.12014&r=

How communication makes the difference between a cartel and tacit collusion: a machine learning approach

By:	Maximilian Andres (University of Potsdam); Lisa Bruttel (University of Potsdam); Jana Friedrichsen (Humboldt-Universität zu Berlin, WZB Berlin Social Science Center, DIW Berlin)
Abstract:	This paper sheds new light on the role of communication for cartel formation. Using machine learning to evaluate free-form chat communication among firms in a laboratory experiment, we identify typical communication patterns for both explicit cartel formation and indirect attempts to collude tacitly. We document that firms are less likely to communicate explicitly about price fixing and more likely to use indirect messages when sanctioning institutions are present. This effect of sanctions on communication reinforces the direct cartel-deterring effect of sanctions as collusion is more difficult to reach and sustain without an explicit agreement. Indirect messages have no, or even a negative, effect on prices.
Keywords:	cartel, collusion, communication, machine learning, experiment
JEL:	C92 D43 L41
Date:	2022–10
URL:	http://d.repec.org/n?u=RePEc:pot:cepadp:53&r=

Counterfactual Reconciliation: Incorporating Aggregation Constraints For More Accurate Causal Effect Estimates

By:	Cengiz, Doruk; Tekgüç, Hasan
Abstract:	We extend the scope of the forecast reconciliation literature and use its tools in the context of causal inference. Researchers are interested in both the average treatment effect on the treated and treatment effect heterogeneity. We show that ex post correction of the counterfactual estimates using the aggregation constraints that stem from the hierarchical or grouped structure of the data is likely to yield more accurate estimates. Building on the geometric interpretation of forecast reconciliation, we provide additional insights into the exact factors determining the size of the accuracy improvement due to the reconciliation. We experiment with U.S. GDP and employment data. We find that the reconciled treatment effect estimates tend to be closer to the truth than the original (base) counterfactual estimates even in cases where the aggregation constraints are non-linear. Consistent with our theoretical expectations, improvement is greater when machine learning methods are used.
Keywords:	Forecast Reconciliation; Non-linear Constraints; Causal Machine Learning Methods; Counterfactual Estimation; Difference-in-Differences
JEL:	C53
Date:	2022–06
URL:	http://d.repec.org/n?u=RePEc:pra:mprapa:114478&r=
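
To make the idea of ex post correction with aggregation constraints concrete, here is a minimal sketch of the simplest linear case: one total and its parts, reconciled by orthogonal (OLS) projection onto the constraint that the total equals the sum of the parts. The function name and the numbers are hypothetical, and the projection is a standard forecast reconciliation technique assumed here for illustration; it is not taken from the paper, which also covers richer hierarchies and non-linear constraints.

```python
def reconcile_total(total_hat, parts_hat):
    """Project incoherent estimates onto the constraint total == sum(parts).

    The constraint is c'y = 0 with c = (1, -1, ..., -1), so the orthogonal
    projection subtracts c * (c'y) / (c'c), where c'c = n + 1.
    """
    n = len(parts_hat)
    gap = total_hat - sum(parts_hat)   # how far the base estimates are from adding up
    shift = gap / (n + 1)              # the gap is spread equally over all n + 1 series
    return total_hat - shift, [p + shift for p in parts_hat]


# Hypothetical base counterfactual estimates: a total and two regions.
total, parts = reconcile_total(100.0, [60.0, 30.0])
print(total, parts)  # the reconciled total now equals the sum of the reconciled parts
```

Equal weighting treats every base estimate as equally reliable. Weighted variants shift more of the adjustment onto the noisier series.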

Chaotic Hedging with Iterated Integrals and Neural Networks

By:	Ariel Neufeld; Philipp Schmocker
Abstract:	In this paper, we extend the Wiener-Ito chaos decomposition to the class of diffusion processes whose drift and diffusion coefficients are of linear growth. By omitting the orthogonality in the chaos expansion, we are able to show that every $p$-integrable functional, for $p \in [1,\infty)$, can be represented as a sum of iterated integrals of the underlying process. Using a truncated sum of this expansion and (possibly random) neural networks for the integrands, whose parameters are learned in a machine learning setting, we show that every financial derivative can be approximated arbitrarily well in the $L^p$-sense. Moreover, the hedging strategy of the approximating financial derivative can be computed in closed form.
Date:	2022–09
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2209.10166&r=

Achieving fairness with a simple ridge penalty

By:	Scutari, Marco; Panero, Francesca; Proissl, Manuel
Abstract:	In this paper, we present a general framework for estimating regression models subject to a user-defined level of fairness. We enforce fairness as a model selection step in which we choose the value of a ridge penalty to control the effect of sensitive attributes. We then estimate the parameters of the model conditional on the chosen penalty value. Our proposal is mathematically simple, with a solution that is partly in closed form and produces estimates of the regression coefficients that are intuitive to interpret as a function of the level of fairness. Furthermore, it is easily extended to generalised linear models, kernelised regression models and other penalties, and it can accommodate multiple definitions of fairness. We compare our approach with the regression model from Komiyama et al. (in: Proceedings of machine learning research. 35th international conference on machine learning (ICML), vol 80, pp 2737–2746, 2018), which implements a provably optimal linear regression model and with the fair models from Zafar et al. (J Mach Learn Res 20:1–42, 2019). We evaluate these approaches empirically on six different data sets, and we find that our proposal provides better goodness of fit and better predictive accuracy for the same level of fairness. In addition, we highlight a source of bias in the original experimental evaluation in Komiyama et al. (in: Proceedings of machine learning research. 35th international conference on machine learning (ICML), vol 80, pp 2737–2746, 2018).
Keywords:	fairness; generalised linear models; linear regression; logistic regression; ridge regression; EPSRC and MRC Centre for Doctoral Training in Statistical Science; University of Oxford (Grant EP/L016710/1)
JEL:	C1
Date:	2022–09–18
URL:	http://d.repec.org/n?u=RePEc:ehl:lserod:116916&r=
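
The two steps described in the abstract (pick the ridge penalty that meets a fairness level, then estimate conditional on it) can be sketched in a deliberately simplified setting. Everything below is an assumption made for illustration and is not the authors' implementation: there is one predictor `x` and one sensitive attribute `s`, the predictor is first stripped of its linear dependence on `s`, and "fairness" is measured as the share of fitted variance attributable to `s`. The name `fair_ridge` and the data are hypothetical.

```python
def _center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def fair_ridge(y, x, s, max_share, tol=1e-10):
    """Return (penalty, alpha, beta, share) with share <= max_share.

    alpha is the ridge-penalised coefficient on the sensitive attribute s,
    beta the unpenalised coefficient on the part of x orthogonal to s.
    """
    if not 0.0 < max_share <= 1.0:
        raise ValueError("max_share must lie in (0, 1]")
    y, x, s = _center(y), _center(x), _center(s)
    gamma = _dot(s, x) / _dot(s, s)
    u = [xi - gamma * si for xi, si in zip(x, s)]   # x with s partialled out
    beta = _dot(u, y) / _dot(u, u)
    var_u = beta * beta * _dot(u, u)
    if var_u == 0.0:
        raise ValueError("x explains nothing once s is removed")

    def alpha(lam):
        return _dot(s, y) / (_dot(s, s) + lam)

    def share(lam):
        var_s = alpha(lam) ** 2 * _dot(s, s)
        return var_s / (var_s + var_u)

    if share(0.0) <= max_share:          # already fair enough without a penalty
        return 0.0, alpha(0.0), beta, share(0.0)
    lo, hi = 0.0, 1.0
    while share(hi) > max_share:         # the share falls as the penalty grows
        hi *= 2.0
    while hi - lo > tol * (1.0 + hi):    # bisect for the smallest adequate penalty
        mid = (lo + hi) / 2.0
        if share(mid) > max_share:
            lo = mid
        else:
            hi = mid
    return hi, alpha(hi), beta, share(hi)


# Hypothetical data in which y = x + 2 * s exactly.
s = [0, 0, 0, 1, 1, 1]
x = [1, 2, 3, 2, 3, 4]
y = [1, 2, 3, 4, 5, 6]
print(fair_ridge(y, x, s, max_share=0.1))
```

Because the penalty only shrinks the coefficient on the sensitive attribute, the coefficient on the remaining predictor stays interpretable as the level of fairness is tightened.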

Finding Needles in Haystacks: Multiple-Imputation Record Linkage Using Machine Learning

By:	John M. Abowd; Joelle Hillary Abramowitz; Margaret Catherine Levenstein; Kristin McCue; Dhiren Patki; Trivellore Raghunathan; Ann Michelle Rodgers; Matthew D. Shapiro; Nada Wasi; Dawn Zinsser
Abstract:	This paper considers the problem of record linkage between a household-level survey and an establishment-level frame in the absence of unique identifiers. Linkage between frames in this setting is challenging because the distribution of employment across establishments is highly skewed. To address these difficulties, this paper develops a probabilistic record linkage methodology that combines machine learning (ML) with multiple imputation (MI). This ML-MI methodology is applied to link survey respondents in the Health and Retirement Study to their workplaces in the Census Business Register. The linked data reveal new evidence that non-sampling errors in household survey data are correlated with respondents’ workplace characteristics.
Keywords:	administrative data; machine learning; multiple imputation; probabilistic record linkage; survey data
JEL:	C13 C18 C81
Date:	2021–10–01
URL:	http://d.repec.org/n?u=RePEc:fip:fedbwp:94891&r=

Using National Payment System Data to Nowcast Economic Activity in Azerbaijan

By:	Ilkin Huseynov (Central Bank of the Republic of Azerbaijan); Nazrin Ramazanova (Central Bank of the Republic of Azerbaijan); Hikmat Valirzayev (Central Bank of the Republic of Azerbaijan)
Abstract:	This study examines whether payment system data can be useful for tracking economic activity in Azerbaijan. We utilise the transactional payment system data at the sectoral level and employ a Dynamic Factor Model (DFM) and Machine Learning (ML) techniques to nowcast quarter-over-quarter and year-over-year nominal gross domestic product. We compared the nowcasting performance of these models against the benchmark model in terms of the out-of-sample root mean square error at three different horizons during the quarter. The results suggest that ML and DFM models have higher predictability than the benchmark model and can significantly lower nowcast errors. Although our payment time series is still too short to obtain statistically robust results, the findings indicate that variables at a higher frequency in such data can be helpful in assessing the current state of the economy and have the potential to provide a faster estimate of the economic activity.
Keywords:	Payment data, Nowcasting, ML, DFM
JEL:	C32 C38 C52 C53 E42
Date:	2022–10–12
URL:	http://d.repec.org/n?u=RePEc:gii:giihei:heidwp23-2022&r=

Small Area Estimation of Monetary Poverty in Mexico Using Satellite Imagery and Machine Learning

By:	Newhouse, David Locke; Merfeld, Joshua David; Ramakrishnan, Anusha Pudugramam; Swartz, Tom; Lahiri, Partha
Abstract:	Estimates of poverty are an important input into policy formulation in developing countries. The accurate measurement of poverty rates is therefore a first-order problem for development policy. This paper shows that combining satellite imagery with household surveys can improve the precision and accuracy of estimated poverty rates in Mexican municipalities, a level at which the survey is not considered representative. It also shows that a household-level model outperforms other common small area estimation methods. However, poverty estimates in 2015 derived from geospatial data remain less accurate than 2010 estimates derived from household census data. These results indicate that the incorporation of household survey data and widely available satellite imagery can improve on existing poverty estimates in developing countries when census data are old or when patterns of poverty are changing rapidly, even for small subgroups.
Date:	2022–09–14
URL:	http://d.repec.org/n?u=RePEc:wbk:wbrwps:10175&r=

Using Knowledge Distillation to improve interpretable models in a retail banking context

By:	Maxime Biehler; Mohamed Guermazi; Célim Starck
Abstract:	This article sets forth a review of knowledge distillation techniques with a focus on their applicability to retail banking contexts. Predictive machine learning algorithms used in banking environments, especially in risk and control functions, are generally subject to regulatory and technical constraints limiting their complexity. Knowledge distillation gives the opportunity to improve the performances of simple models without burdening their application, using the results of other - generally more complex and better-performing - models. Parsing recent advances in this field, we highlight three main approaches: Soft Targets, Sample Selection and Data Augmentation. We assess the relevance of a subset of such techniques by applying them to open source datasets, before putting them to the test on the use cases of BPCE, a major French institution in the retail banking sector. As such, we demonstrate the potential of knowledge distillation to improve the performance of these models without altering their form and simplicity.
Date:	2022–09
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2209.15496&r=

Automatic Identification and Classification of Share Buybacks and their Effect on Short-, Mid- and Long-Term Returns

By:	Thilo Reintjes
Abstract:	This thesis investigates share buybacks, specifically share buyback announcements. It addresses how to recognize such announcements, the excess return of share buybacks, and the prediction of returns after a share buyback announcement. We illustrate two NLP approaches for the automated detection of share buyback announcements. Even with very small amounts of training data, we can achieve an accuracy of up to 90%. This thesis utilizes these NLP methods to generate a large dataset consisting of 57,155 share buyback announcements. By analyzing this dataset, this thesis aims to show that most companies that announce a share buyback underperform the MSCI World. A minority of companies, however, significantly outperform the MSCI World. This significant overperformance leads to a net gain when looking at the averages of all companies. If the benchmark index is adjusted for the respective size of the companies, the average overperformance disappears, and the majority underperforms by an even greater margin. However, companies that announce a share buyback with a volume of at least 1% of their market cap deliver, on average, a significant overperformance, even when using an adjusted benchmark. Companies that announce share buybacks in times of crisis also fare better than the overall market. Additionally, the generated dataset was used to train 72 machine learning models. These models yielded many strategies that achieve an accuracy of up to 77% and generate large excess returns. A variety of performance indicators could be improved across six different time frames, and a significant overperformance was identified. This was achieved by training several models for different tasks and time frames and by combining these models, fusing weak learners into one strong learner to generate a significant improvement.
Date:	2022–09
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2209.12863&r=

The Community Explorer: Bringing Populations' Diversity into Policy Discussions, One County at a Time

By:	Lopez, Claude (Milken Institute); Roh, Hyeongyul (Milken Institute); Switek, Maggie (Milken Institute)
Abstract:	The Community Explorer provides new insights and data on the characteristics and diversity of the US population. Using machine learning methods, it synthesizes the information of 751 variables across 3,142 counties from the US Census Bureau's American Community Survey into 17 communities. Each one of these communities has a distinctive profile that combines demographic, socio-economic, and cultural behavioral determinants while not being geographically bounded. We encourage policy makers and researchers to make use of the results of our analysis. The Community Explorer dashboard provides the location of these profiles, allowing for targeted deployment of community interventions and, more broadly, increasing the understanding of socioeconomic gaps within the US.
Keywords:	diversity, communities, development, economic well-being
JEL:	D31 J08 J10 R10
Date:	2022–09
URL:	http://d.repec.org/n?u=RePEc:iza:izapps:pp190&r=

AI-Assisted Discovery of Quantitative and Formal Models in Social Science

By:	Julia Balla; Sihao Huang; Owen Dugan; Rumen Dangovski; Marin Soljacic
Abstract:	In social science, formal and quantitative models, such as ones describing economic growth and collective action, are used to formulate mechanistic explanations, provide predictions, and uncover questions about observed phenomena. Here, we demonstrate the use of a machine learning system to aid the discovery of symbolic models that capture nonlinear and dynamical relationships in social science datasets. By extending neuro-symbolic methods to find compact functions and differential equations in noisy and longitudinal data, we show that our system can be used to discover interpretable models from real-world data in economics and sociology. Augmenting existing workflows with symbolic regression can help uncover novel relationships and explore counterfactual models during the scientific process. We propose that this AI-assisted framework can bridge parametric and non-parametric models commonly employed in social science research by systematically exploring the space of nonlinear models and enabling fine-grained control over expressivity and interpretability.
Date:	2022–10
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2210.00563&r=

The impact of artificial intelligence on the nature and quality of jobs

By:	Laura Nurski; Mia Hoffmann
Abstract:	Policymakers should strengthen the role of social partners in the adoption of AI technology to protect workers’ bargaining power.
Date:	2022–07
URL:	http://d.repec.org/n?u=RePEc:bre:wpaper:node_8191&r=

Bayesian Modeling of Time-varying Parameters Using Regression Trees

By:	Niko Hauzenberger; Florian Huber; Gary Koop; James Mitchell
Abstract:	In light of widespread evidence of parameter instability in macroeconomic models, many time-varying parameter (TVP) models have been proposed. This paper proposes a nonparametric TVP-VAR model using Bayesian Additive Regression Trees (BART). The novelty of this model arises from the law of motion driving the parameters being treated nonparametrically. This leads to great flexibility in the nature and extent of parameter change, both in the conditional mean and in the conditional variance. In contrast to other nonparametric and machine learning methods that are black box, inference using our model is straightforward because, in treating the parameters rather than the variables nonparametrically, the model remains conditionally linear in the mean. Parsimony is achieved through adopting nonparametric factor structures and use of shrinkage priors. In an application to US macroeconomic data, we illustrate the use of our model in tracking both the evolving nature of the Phillips curve and how the effects of business cycle shocks on inflationary measures vary nonlinearly with movements in uncertainty.
Date:	2022–09
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2209.11970&r=

With big data come big problems: pitfalls in measuring basis risk for crop index insurance

By:	Matthieu Stigler; Apratim Dey; Andrew Hobbs; David Lobell
Abstract:	New satellite sensors will soon make it possible to estimate field-level crop yields, showing a great potential for agricultural index insurance. This paper identifies an important threat to better insurance from these new technologies: data with many fields and few years can yield downward biased estimates of basis risk, a fundamental metric in index insurance. To demonstrate this bias, we use state-of-the-art satellite-based data on agricultural yields in the US and in Kenya to estimate and simulate basis risk. We find a substantive downward bias leading to a systematic overestimation of insurance quality. In this paper, we argue that big data in crop insurance can lead to a new situation where the number of variables $N$ largely exceeds the number of observations $T$. In such a situation where $T\ll N$, conventional asymptotics break, as evidenced by the large bias we find in simulations. We show how the high-dimension, low-sample-size (HDLSS) asymptotics, together with the spiked covariance model, provide a more relevant framework for the $T\ll N$ case encountered in index insurance. More precisely, we derive the asymptotic distribution of the relative share of the first eigenvalue of the covariance matrix, a measure of systematic risk in index insurance. Our formula accurately approximates the empirical bias simulated from the satellite data, and provides a useful tool for practitioners to quantify bias in insurance quality.
Date:	2022–09
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2209.14611&r=
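
The direction of the bias can be seen in a small simulation. The sketch below is an illustration under stated assumptions, not the authors' derivation: yields are simulated as independent standard normal noise with no common factor, so the true share of the first eigenvalue is 1/N, and the sample share is computed by power iteration on the T x T Gram matrix, which has the same nonzero eigenvalues as the N x N sample covariance matrix up to a common scale. The function name is hypothetical.

```python
import random


def first_eigenvalue_share(rows, iters=500):
    """Share of the largest eigenvalue in the trace of the sample covariance.

    rows holds T observations (years), each a list of N field-level yields.
    """
    t, n = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / t for j in range(n)]
    xc = [[r[j] - means[j] for j in range(n)] for r in rows]
    # T x T Gram matrix of the centred data; the scale cancels in the share.
    gram = [[sum(a * b for a, b in zip(xc[i], xc[k])) for k in range(t)]
            for i in range(t)]
    trace = sum(gram[i][i] for i in range(t))
    v = [(-1.0) ** i * (i + 1) for i in range(t)]  # start vector, not constant
    for _ in range(iters):                         # power iteration
        w = [sum(gram[i][k] * v[k] for k in range(t)) for i in range(t)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    w = [sum(gram[i][k] * v[k] for k in range(t)) for i in range(t)]
    top = sum(a * b for a, b in zip(v, w))         # Rayleigh quotient
    return top / trace


rng = random.Random(0)
n_fields, n_years = 50, 5
rows = [[rng.gauss(0.0, 1.0) for _ in range(n_fields)] for _ in range(n_years)]
print("true share:", 1.0 / n_fields)
print("sample share:", first_eigenvalue_share(rows))
```

With T = 5 years the centred data have rank at most 4, so the sample share cannot fall below 1/4 even though the true share here is 1/50. Overstating the systematic share in this way corresponds to understating basis risk.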

What news can really tell us? Evidence from a news-based sentiment index for financial markets analysis

By:	Anna Marszal (Narodowy Bank Polski)
Abstract:	This study presents a state-of-the-art approach to measuring financial market sentiment, namely, extracting it from news headlines. The sentiment index is constructed by analysing over 124,000 news items for the 2020-2021 period using natural language processing methods. Its informational power is validated by the strong correlation with the VIX index as well as by the occurrence of common periods of higher volatility of both measures. These findings reinforce the treatment of the news-based index as a true sentiment indicator and contribute to its usage independently of any financial instruments. Additionally, the direction of significant correlation coefficients between the sentiment indicator and selected financial assets is consistent with the natural logic of capital flows in financial markets. At the same time, the developed tool makes it possible to identify not only market sentiment, but also the main factors contributing to its direction and the time periods in which they are of most significance. It is necessary to understand that the analysed period is specific as it coincides with the outbreak and development of the COVID-19 pandemic. This was reflected in the results that highlight coronavirus as the dominant topic throughout the dataset.
Keywords:	market sentiment, natural language processing, lexicon-based models, VADER, risk aversion, risk appetite, VIX index, news, volatility
JEL:	C6 C8 G4
Date:	2022
URL:	http://d.repec.org/n?u=RePEc:nbp:nbpmis:349&r=

How Artificial Intelligence Can Help Advance Post-Secondary Learning in Emerging Markets

By:	Baloko Makala; Maud Schmitt; Alejandro Caballero
Keywords:	Education - Educational Technology and Distance Education; Social Protections and Labor - Employment and Unemployment; Social Protections and Labor - Skills Development and Labor Force Training; Social Protections and Labor - Vocational & Technical Education; Science and Technology Development - Technology Innovation
Date:	2021–01
URL:	http://d.repec.org/n?u=RePEc:wbk:wboper:35054&r=

Natural Disasters and Economic Dynamics: Evidence from the Kerala Floods

By:	Beyer, Robert Carl Michael; Narayanan, Abhinav; Thakur, Gogol Mitra
Abstract:	Exceptionally high rainfall in the Indian state of Kerala caused major flooding in 2018. This paper estimates the short-run causal impact of the disaster on the economy, using a difference-in-difference approach. Monthly nighttime light intensity, a proxy for aggregate economic activity, suggests that activity declined for three months during the disaster but boomed subsequently. Automated teller machine transactions, a proxy for consumer demand, declined and credit disbursal increased, with households borrowing more for housing and less for consumption. In line with other results, both household income and expenditure declined during the floods. Despite a strong wage recovery after the floods, spending remained lower relative to the unaffected districts. The paper argues that increased labor demand due to reconstruction efforts increased wages after the floods and provides corroborating evidence: (i) rural labor markets tightened, (ii) poorer households benefited more, and (iii) wages increased most where government relief was strongest. The findings confirm the presence of interesting economic dynamics during and right after natural disasters that remain in the shadow when analyzed with annual data.
Date:	2022–06–13
URL:	http://d.repec.org/n?u=RePEc:wbk:wbrwps:10084&r=

Consumer Privacy and the Value of Consumer Data

By:	Mehmet Canayaz (Pennsylvania State University - Smeal College of Business); Ilja Kantorovitch (EPFL); Roxana Mihet (Swiss Finance Institute - HEC Lausanne)
Abstract:	We analyze how the adoption of the California Consumer Privacy Act (CCPA), which limits consumer personal data acquisition, processing, and trade, affects voice-AI firms. To derive theoretical predictions, we use a general equilibrium model where firms produce intermediate goods using labor and data in the form of intangible capital, which can be traded subject to a cost representing regulatory and technical challenges. Firms differ in their ability to collect data internally, driven by the size of their customer base and reliance on data. When the introduction of the CCPA increases the cost of trading data, sophisticated firms with small customer bases are hit the hardest. Such firms have a low ability to collect in-house data and high reliance on data and cannot adequately substitute the previously externally purchased data. We utilize novel and hand-collected data on voice-AI firms to provide empirical support for our theoretical predictions. We empirically show that sophisticated firms with voice-AI products experience lower returns on assets than their industry peers after the introduction of the CCPA, and firms with weak customer bases experience the strongest distortionary effects.
Keywords:	Privacy, Voice Data, In-House Data, Big Data, Intangible Capital, Product Sentiment
JEL:	D80 G30 G31 G38 L20 O30
Date:	2022–08
URL:	http://d.repec.org/n?u=RePEc:chf:rpseri:rp2268&r=

A gender perspective on artificial intelligence and jobs: The vicious cycle of digital inequality

By:	Estrella Gomez-Herrera; Sabine Köszegi
Abstract:	How do gender stereotypes and gendered work segregation, and digitalisation and automation, result in a vicious cycle of digital gender inequality?
Date:	2022–08
URL:	http://d.repec.org/n?u=RePEc:bre:wpaper:node_8264&r=

To Be or Not to Be: The Entrepreneur in Neo-Schumpeterian Growth Theory

By:	Henrekson, Magnus (Research Institute of Industrial Economics (IFN)); Johansson, Dan (Örebro University School of Business); Karlsson, Johan (Centre for Family Entrepreneurship and Ownership (CeFEO))
Abstract:	Based on a review of 700+ peer-reviewed articles since 1990, identified using text mining methodology and supervised machine learning, we analyze how neo-Schumpeterian growth theorists relate to the entrepreneur-centered view of Schumpeter (1934) and the entrepreneurless framework of Schumpeter (1942). The literature leans heavily towards Schumpeter (1942); innovation returns are modeled as following an ex ante known probability distribution. By assuming that innovation outcomes are (probabilistically) deterministic, the entrepreneur becomes redundant. Abstracting from genuine uncertainty implies that central issues regarding the economic function of the entrepreneur are overlooked, such as the roles of proprietary resources, skills, and profits.
Keywords:	Creative destruction; Economic growth; Entrepreneur; Innovation; Judgment; Knightian uncertainty
JEL:	B40 O10 O30
Date:	2022–10–01
URL:	http://d.repec.org/n?u=RePEc:hhs:iuiwop:1441&r=

Detecting asset price bubbles using deep learning

By:	Francesca Biagini; Lukas Gonon; Andrea Mazzon; Thilo Meyer-Brandis
Abstract:	In this paper we employ deep learning techniques to detect financial asset bubbles by using observed call option prices. The proposed algorithm is widely applicable and model-independent. We test the accuracy of our methodology in numerical experiments within a wide range of models and apply it to market data of tech stocks in order to assess if asset price bubbles are present. In addition, we provide a theoretical foundation of our approach in the framework of local volatility models. To this purpose, we give a new necessary and sufficient condition for a process with time-dependent local volatility function to be a strict local martingale.
Date:	2022–10
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2210.01726&r=

Optimal consumption-investment choices under wealth-driven risk aversion

By:	Ruoxin Xiao
Abstract:	CRRA utility, in which the risk aversion coefficient is constant, is common in economic models, but wealth-driven risk aversion rarely appears in investors' investment problems. This paper focuses on numerical solutions, obtained with a neural network, to the optimal consumption-investment choices under wealth-driven risk aversion. A jump-diffusion model is used to simulate the artificial data needed to train the neural network. The WDRA model is set up to describe the investment problem, and two parameters need to be optimized: the rate at which wealth is invested in the risky assets and the consumption over the investment horizon. Under this model, an LSTM neural network with a single objective function is implemented and shows promising results.
Date:	2022–10
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2210.00950&r=

Sentiment Analysis of ESG disclosures on Stock Market

By:	Sudeep R. Bapat; Saumya Kothari; Rushil Bansal
Abstract:	In this paper, we look at the impact of Environment, Social and Governance (ESG) related news articles and social media data on stock market performance. We pick four stocks of companies that are widely known in their domains to understand the complete effect of ESG, since this newly adopted investment style remains restricted to stocks with widespread information. We summarise live data from both Twitter tweets and newspaper articles and create a sentiment index using a dictionary technique based on online information for the month of July 2022. We look at the stock price data for all four companies and calculate the percentage change in each of them. We also compare the overall sentiment of each company to its percentage change over a specific historical period.
Date:	2022–10
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2210.00731&r=
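
As a rough illustration of the dictionary technique mentioned in the abstract above, the sketch below scores each text by counting words from a positive and a negative word list, then averages the scores into an index. The word lists, function names, and sample texts are hypothetical. The abstract does not say which dictionary or aggregation rule the authors use.

```python
import re

# Hypothetical word lists; a real study would use an established lexicon.
POSITIVE = {"sustainable", "improve", "clean", "award"}
NEGATIVE = {"pollution", "fine", "scandal", "lawsuit"}


def sentiment_score(text):
    """Return (positive - negative) / (positive + negative), in [-1, 1]."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def sentiment_index(texts):
    """Average the per-text scores; an empty collection scores 0."""
    scores = [sentiment_score(text) for text in texts]
    return sum(scores) / len(scores) if scores else 0.0


sample_texts = [
    "Company wins award for clean energy",
    "Lawsuit over pollution scandal",
    "Quarterly results announced",
]
print(sentiment_index(sample_texts))
```

Normalising by the number of matched words keeps long articles and short tweets on the same scale, which is one possible design choice among several.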

Feature-Rich Long-term Bitcoin Trading Assistant

By:	Jatin Nainani (K. J. Somaiya College of Engineering); Nirman Taterh (K. J. Somaiya College of Engineering); Md Ausaf Rashid (K. J. Somaiya College of Engineering); Ankit Khivasara (K. J. Somaiya College of Engineering)
Abstract:	Predicting, studying and analyzing financial indices has long been of major interest to the financial community. Recently, there has been growing interest in the deep learning community in reinforcement learning, which has surpassed many previous benchmarks in a number of fields. Our method provides a feature-rich environment for the reinforcement learning agent to work in. Because the aim is to provide long-term profits to the user, we took into consideration the most reliable technical indicators. We have also developed a custom indicator intended to give the user better insight into the Bitcoin market. The Bitcoin market follows the emotions and sentiments of traders, so another element of our trading environment is the overall daily Sentiment Score of the market on Twitter. The agent is tested over a period of 685 days, which includes the volatile period of Covid-19. It has been capable of providing reliable recommendations that give an average profit of about 69%. Finally, the agent is also capable of suggesting the optimal actions to the user through a website. Users of the website can also access visualizations of the indicators to help inform their decisions.
Date:	2022–09
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2209.12664&r=

Qualitative Analysis at Scale: An Application to Aspirations in Cox's Bazaar, Bangladesh

By:	Ashwin,Julian; Rao,Vijayendra; Biradavolu,Monica Rao; Haque,Arshia; Khan,Afsana Iffat; Krishnan,Nandini; Nagy,Peer Sebastian
Abstract:	Qualitative work has found limited use in economics largely because it is difficult to analyze at scale due to the careful reading of text and human coding it requires. This paper presents a framework with which to extend a small set of hand-coding to a much larger set of documents using natural language processing and thus to analyze qualitative data at scale. The paper shows how to assess the robustness and reliability of this approach and demonstrates that it can allow the identification of meaningful patterns in the data that the original hand-coded sample is too small to identify. The approach is applied to data collected among Rohingya refugees and their Bangladeshi hosts in Cox’s Bazaar, Bangladesh, to build on work in anthropology and philosophy that distinguishes between ambition (specific goals), aspiration (transforming values), and navigational capacity, which is the ability to achieve ambitions and aspirations. The findings demonstrate that these distinctions can have important policy implications.
Date:	2022–05–16
URL:	http://d.repec.org/n?u=RePEc:wbk:wbrwps:10046&r=

A Missed Opportunity to Further Build Trust in AI: A Landscape Analysis of OECD.AI

By:	Susan Ariel Aaronson (George Washington University)
Abstract:	OECD.AI is the world's best source for information on public policies dedicated to AI, trustworthy AI and international efforts to advance cooperation in AI. However, the web site is also a missed opportunity to ascertain best practice and to build trust in AI not just for citizens of reporting nations but for the world. The author came to that conclusion after examining the documentation that nations placed online at the OECD.AI website. She utilized a landscape analysis to group the policies reported to the OECD by country and type, by whether the initiative was evaluated or reported on, and by whether it provided new insights about best practice, trust in AI, and/or trustworthy AI. Some 61 countries and the EU reported to the OECD on their AI initiatives (for a total of 62). Although the members of the OECD are generally high and high-middle income nations, the 62 governments providing information to OECD.AI represent a mix of AI capacity, income level, economic system, and location. Some 814 initiatives were placed on the website as of August 2022, but 4 were duplicative and some 30 were blank, leaving 780. Countries claimed that 48 of these initiatives were evaluated. However, we actually found only four evaluations (and one in progress) with a clear evaluative methodology. Two initiatives were labeled evaluations but did not include a methodology. Many of the other 42 were reports rather than evaluations. In addition, only a small percentage (41 initiatives or 5% of all initiatives) were designed to build trust in AI or to create trustworthy AI systems. National policymakers, and not the OECD Secretariat, decide what each of the 62 governments chooses to put on the site. These officials don't list every initiative their country implements to foster AI. But their choices reveal their priorities. Most of the documentation focuses on what they are doing to build domestic AI capacity and a supportive governance context for AI. We also found relatively few efforts to build international cooperation on AI, or to strengthen other countries' AI capacity. Taken in sum, these efforts are important but reveal little effort to build international trust in AI.
Keywords:	AI (artificial intelligence) trust, trustworthy, policies, innovation
JEL:	A1
Date:	2022–10
URL:	http://d.repec.org/n?u=RePEc:gwi:wpaper:2022-10&r=

The ECB press conference: a textual analysis

By:	Pavelkova, Andrea
Abstract:	The aim of central bank communication is to provide information on monetary policy and the economic outlook in a timely manner to the public. While research on central bank communication and specifically the European Central Bank’s press conference has shown that it has the potential to move markets, in-depth textual analysis of key communication tools creates room for further analysis. Focusing on the press conferences of the ECB, this paper employs structural topic modelling (STM) and finds that topics within the introductory statement and the Q&A are significantly different, with a nearly equal split of topics unique to both parts. The split of topics suggests that the Q&A does not only provide clarification of what has been said in the introductory statement, but also allows journalists to enquire about the discussion within the Governing Council as well as the ECB’s stance on broader economic issues.
Keywords:	central bank communication, ECB press conference, natural language processing, structural topic model, text analysis
JEL:	E50 E52 E58
Date:	2022–10
URL:	http://d.repec.org/n?u=RePEc:ecb:ecbwps:20222742&r=

The Economic Impact of Covid-19 and Associated Lockdown Measures in China

By:	Charpe, Matthieu
Abstract:	This paper assesses at the local level the economic impact of Covid-19 and associated lockdown measures in China using high frequency nighttime lights data. According to a model of monthly light intensity, lights dropped by between 13 and 18 percent in early 2020. This corresponds to a decline in economic activity of between 9 and 12 percent and a decline in employment of between 2.6 and 3.6 percent. At the local level, the majority of administrative entities followed a v-shaped recovery, while a smaller number followed a u-shaped recovery or a double dip. At province level, light intensity is explained by the number of cases and a lockdown measure. In particular, the increase in the stringency index from 0 in December 2019 to 78 in April 2020 explains a decline in lights of 7.4 percent.
Keywords:	Covid-19, lockdown, China, nighttime lights, big data
JEL:	O11 O18 R11 R12
Date:	2022–10–03
URL:	http://d.repec.org/n?u=RePEc:pra:mprapa:114861&r=
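
The abstract above maps a 13 to 18 percent drop in lights to a 9 to 12 percent drop in economic activity and a 2.6 to 3.6 percent drop in employment. The short calculation below backs out the ratios implied by those figures. It assumes, as an illustration only, that the lower bounds pair with each other and the upper bounds pair with each other; the paper's own elasticity estimates are not stated in the abstract.

```python
# Ranges reported in the abstract, in percent.
light_drop_pct = (13.0, 18.0)
activity_drop_pct = (9.0, 12.0)
employment_drop_pct = (2.6, 3.6)


def implied_ratios(outcome_drops, light_drops):
    """Ratio of outcome decline to light decline at each end of the range."""
    return [outcome / light for outcome, light in zip(outcome_drops, light_drops)]


activity_ratios = implied_ratios(activity_drop_pct, light_drop_pct)
employment_ratios = implied_ratios(employment_drop_pct, light_drop_pct)

print("activity per unit of lights:", [round(r, 2) for r in activity_ratios])
print("employment per unit of lights:", [round(r, 2) for r in employment_ratios])
```

Under that pairing assumption, the reported ranges are consistent with roughly constant proportional mappings from lights to activity and from lights to employment.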

Wage expectation, information and the decision to become a nurse

By:	Philipp Kugler
Abstract:	In light of skilled-labor shortage in nursing, the effect of a change in the wage of nurses on their labor supply is intensely discussed in recent literature. However, most results show a wage elasticity close to zero. Using extensive data of former German 9th graders, I analyze the role of the expected wage as an incentive to become a nurse. To estimate a causal effect, I select controls and their functional form using post-double-selection, which is a data driven selection method based on regression shrinkage via the lasso. Contrary to common perceptions, the expected wage plays a positive and statistically significant role in the decision to become a nurse. Further, understating a nurse's wage decreases the probability of becoming one. Concerning omitted variable bias, I assess the sensitivity of the results using a novel approach. It evaluates the minimum strength that unobserved confounders would need to change the conclusion. The sensitivity analysis shows that potential unobserved confounders would have to be very strong to overrule the conclusions. The empirical results lead to two important policy implications. First, increasing the wage may help to overcome the shortage observed in many countries. Second, providing information on the (relative) wage may be a successful strategy to attract more individuals into this profession.
Keywords:	health professional, expected wage, wage information, machine learning, sensitivity analysis
JEL:	I11 I21 J24 J31
Date:	2022–01–19
URL:	http://d.repec.org/n?u=RePEc:iaw:iawdip:135&r=

Wicked Problems Might Inspire Greater Data Sharing

By:	Susan Ariel Aaronson (George Washington University)
Abstract:	Global public goods are goods and services with benefits and costs that potentially extend to all countries, people, and generations. Global data sharing can also help solve what scholars call wicked problems: problems so complex that they require innovative, cost-effective and global mitigating strategies. Wicked problems are problems that no one knows how to solve without creating further problems. Hence, policymakers must find ways to encourage greater data sharing among entities that hold large troves of various types of data, while protecting that data from theft, manipulation, etc. Many factors impede global data sharing for public good purposes; this analysis focuses on two. First, policymakers generally don't think about data as a global public good; they view data as a commercial asset that they should nurture and control. While they may understand that data can serve the public interest, they are more concerned with using data to serve their country's economic interest. Second, many leaders of civil society and business see the data they have collected as proprietary data. So far, many leaders of private entities with troves of data are not convinced that their organization will benefit from such sharing. At the same time, companies voluntarily share some data for social good purposes. However, data cannot meet its public good purpose if it is not shared among societal entities. Moreover, if data is treated as a sovereign asset, policymakers are unlikely to encourage data sharing across borders oriented towards addressing shared problems. Consequently, society will be less able to use data as both a commercial asset and as a resource to enhance human welfare. This paper discusses why the world has made so little progress encouraging a vision of data as a global public good. As UNCTAD noted, data generated in one country can also provide social value in other countries, which would call for sharing of data at the international level through a set of shared and accountable rules (UNCTAD: 2021). Moreover, the world is drowning in data, yet much of that data remains hidden and underutilized. But guilt is a great motivator. The author suggests a new agency, the Wicked Problems Agency, to act as a counterweight to that opacity and to create a demand and a market for data sharing in the public good.
Keywords:	data, AI, public good, wicked problems, data-sharing
JEL:	C45
Date:	2022–09
URL:	http://d.repec.org/n?u=RePEc:gwi:wpaper:2022-09&r=

How Well Can Real-Time Indicators Track the Economic Impacts of a Crisis Like COVID-19?

By:	Ten,Gi Khan; Merfeld,Joshua David; Hirfrfot,Kibrom Tafere; Newhouse,David Locke; Pape,Utz Johann
Abstract:	This paper presents evidence on the extent to which a set of real-time indicators tracked changes in gross domestic product across 142 countries in 2020. The real-time indicators include Google mobility, Google search trends, food price information, nitrogen dioxide, and nighttime lights. Google mobility and staple food prices both declined sharply in March and April, followed by a rapid recovery that returned to baseline levels by July and August. Mobility and staple food prices fell less in low-income countries. Nitrogen dioxide levels show a similar pattern, with a steep fall and rapid recovery in high-income and upper-middle-income countries but not in low-income and lower-middle-income countries. In April and May, Google search terms reflecting economic distress and religiosity spiked in some regions but not others. Data on nighttime lights show no clear drop in March outside East Asia. Linear models selected using the Least Absolute Shrinkage and Selection Operator explain about a third of the variation in annual gross domestic product growth rates across 72 countries. In a smaller subset of higher income countries, real-time indicators explain about 40 percent of the variation in quarterly gross domestic product growth. Overall, mobility and food price data, as well as pollution data in more developed countries, appeared to be best at capturing the widespread economic disruption experienced during the summer of 2020. The results indicate that these real-time indicators can track a substantial percentage of both annual and quarterly changes in gross domestic product.
Date:	2022–06–13
URL:	http://d.repec.org/n?u=RePEc:wbk:wbrwps:10080&r=

PayTech and the D(ata) N(etwork) A(ctivities) of BigTech Platforms

By:	Jonathan Chiu (Bank of Canada); Thorsten V. Koeppl
Abstract:	Why do BigTech platforms introduce payment services? Digital platforms often run business models where activities on the platform generate data that can be monetized off the platform. There is a trade-off between the value of such data and the privacy concerns of users, since platforms need to compensate users for their privacy loss by subsidizing activities. The nature of complementarities between data and payments determines the introduction of payments. When data help to provide better payments (data-driven payments), platforms have too little incentive to adopt. When payments generate additional data (payments-driven data), platforms may adopt payments inefficiently.
Keywords:	BigTech, Payments, Privacy, Digital Platform, Data
JEL:	D8 E42 L1
Date:	2022–05
URL:	http://d.repec.org/n?u=RePEc:qed:wpaper:1490&r=

Silence is not Golden Anymore? Social media activity and stock market valuation in Europe

By:	Christophe J. GODLEWSKI (LaRGE Research Center, Université de Strasbourg); Katarzyna BYRKA-KITA (Institute of Economics and Finance, Uniwersytet Szczecinski); Renata GOLA (Institute of Economics and Finance, Uniwersytet Szczecinski); Jacek CYPRYJANSKI (Institute of Economics and Finance, Uniwersytet Szczecinski)
Abstract:	We investigate the link between social media activity and market valuation of listed European companies over the period January 2018 – June 2020. Using a large novel dataset from 39 European capital markets, we first provide a comprehensive “big picture” of the social media activity of European listed companies. Second, we show that greater Twitter activity is associated with increased shareholders’ returns. Third, we find that portfolios with a larger number of tweets posted by a company exhibit larger market risks. Our findings support the idea that investors should consider social media activity when implementing investment strategies.
Keywords:	stock markets, valuation, CAPM, Twitter, social media, investor attention, information asymmetry, disclosure.
Date:	2022
URL:	http://d.repec.org/n?u=RePEc:lar:wpaper:2022-04&r=

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.