nep-big New Economics Papers
on Big Data
Issue of 2021‒08‒23
23 papers chosen by
Tom Coupé
University of Canterbury

  1. Machine Learning and Mobile Phone Data Can Improve the Targeting of Humanitarian Assistance By Emily Aiken; Suzanne Bellue; Dean Karlan; Christopher R. Udry; Joshua Blumenstock
  2. Enrichment of the Banque de France’s monthly business survey: lessons from textual analysis of business leaders’ comments By Gerardin Mathilde; Ranvier Martial
  3. What We Teach About Race and Gender: Representation in Images and Text of Children’s Books By Anjali Adukia; Alex Eble; Emileigh Harrison; Hakizumwami Birali Runesha; Teodora Szasz
  4. InfoGram and Admissible Machine Learning By Subhadeep Mukhopadhyay
  5. Two-Stage Sector Rotation Methodology Using Machine Learning and Deep Learning Techniques By Tugce Karatas; Ali Hirsa
  6. Trade When Opportunity Comes: Price Movement Forecasting via Locality-Aware Attention and Adaptive Refined Labeling By Liang Zeng; Lei Wang; Hui Niu; Jian Li; Ruchen Zhang; Zhonghao Dai; Dewei Zhu; Ling Wang
  7. Adoption of Machine Learning Systems for Medical Diagnostics in Clinics: A Qualitative Interview Study By Pumplun, Luisa; Fecho, Mariska; Wahl, Nihal; Peters, Felix; Buxmann, Peter
  8. Combining Machine Learning Classifiers for Stock Trading with Effective Feature Extraction By A. K. M. Amanat Ullah; Fahim Imtiaz; Miftah Uddin Md Ihsan; Md. Golam Rabiul Alam; Mahbub Majumdar
  9. Contracting, pricing, and data collection under the AI flywheel effect By Huseyin Gurkan; Francis de Véricourt
  10. Cognitive Architectures for Artificial Intelligence Ethics By Steve J. Bickley; Benno Torgler
  11. Persuading Investors: A Video-Based Study By Allen Hu; Song Ma
  12. Building a Foundation for Data-Driven, Interpretable, and Robust Policy Design using the AI Economist By Alexander Trott; Sunil Srinivasa; Douwe van der Wal; Sebastien Haneuse; Stephan Zheng
  13. Analysis of Data Mining Process for Improvement of Production Quality in Industrial Sector By Hamza Saad; Nagendra Nagarur; Abdulrahman Shamsan
  14. Riding the Tide Toward the Digital Era: The Imminent Alliance of AI and Mental Health in the Philippines By Kee, Shaira Limson; Garganera, John Patrick; Maravilla, Nicholle Mae Amor; Garganera, Wilbert; Fermin, Jamie Ledesma; AlDahoul, Nouar; Karim, Hezerul Abdul; Tan, Myles Joshua
  15. Adoption of ICTs in Agri-Food Logistics: Potential and Limitations for Supply Chain Sustainability By Cédric Vernier; Denis Loeillet; Rallou Thomopoulos; Catherine Macombe
  16. Stochastic loss reserving with mixture density neural networks By Muhammed Taher Al-Mudafer; Benjamin Avanzi; Greg Taylor; Bernard Wong
  17. Third-Degree Price Discrimination in the Age of Big Data By Charlson, G.
  18. Learning from Zero: How to Make Consumption-Saving Decisions in a Stochastic Environment with an AI Algorithm By Rui (Aruhan) Shi
  19. Gay Politics Goes Mainstream: Democrats, Republicans, and Same-Sex Relationships By Raquel Fernández; Sahar Parsa
  20. The making of data commodities: data analytics as an embedded process By Aaltonen, Aleksi Ville; Alaimo, Cristina; Kallinikos, Jannis
  21. Tilted Platforms: Rental Housing Technology and the Rise of Urban Big Data Oligopolies By Geoff Boeing; Max Besbris; David Wachsmuth; Jake Wegmann
  22. Improving Inference from Simple Instruments through Compliance Estimation By Stephen Coussens; Jann Spiess
  23. Seeing the Forest for the Trees: using hLDA models to evaluate communication in Banco Central do Brasil By Angelo M. Fasolo; Flávia M. Graminho; Saulo B. Bastos

  1. By: Emily Aiken; Suzanne Bellue; Dean Karlan; Christopher R. Udry; Joshua Blumenstock
    Abstract: The COVID-19 pandemic has devastated many low- and middle-income countries (LMICs), causing widespread food insecurity and a sharp decline in living standards. In response to this crisis, governments and humanitarian organizations worldwide have mobilized targeted social assistance programs. Targeting is a central challenge in the administration of these programs: given available data, how does one rapidly identify the individuals and families with the greatest need? This challenge is particularly acute in the large number of LMICs that lack recent and comprehensive data on household income and wealth. Here we show that non-traditional “big” data from satellites and mobile phone networks can improve the targeting of anti-poverty programs. Our approach uses traditional survey-based measures of consumption and wealth to train machine learning algorithms that recognize patterns of poverty in non-traditional data; the trained algorithms are then used to prioritize aid to the poorest regions and mobile subscribers. We evaluate this approach by studying Novissi, Togo’s flagship emergency cash transfer program, which used these algorithms to determine eligibility for a rural assistance program that disbursed millions of dollars in COVID-19 relief aid. Our analysis compares outcomes – including exclusion errors, total social welfare, and measures of fairness – under different targeting regimes. Relative to the geographic targeting options considered by the Government of Togo at the time, the machine learning approach reduces errors of exclusion by 4-21%. Relative to methods that require a comprehensive social registry (a hypothetical exercise; no such registry exists in Togo), the machine learning approach increases exclusion errors by 9-35%. These results highlight the potential for new data sources to contribute to humanitarian response efforts, particularly in crisis settings when traditional data are missing or out of date.
    JEL: C55 I32 I38 O12 O38
    Date: 2021–07
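    The two-step approach described in the abstract (train on survey-based consumption measures, then rank subscribers by predicted poverty) can be sketched as follows. Everything here is a synthetic stand-in: the features, the regressor, and the budget are illustrative assumptions, not the authors' actual data or algorithms.

    ```python
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)

    # Synthetic stand-ins: phone-usage features and survey-measured consumption
    n_survey, n_subscribers, n_features = 500, 2000, 8
    X_survey = rng.normal(size=(n_survey, n_features))
    consumption = X_survey @ rng.uniform(0.5, 1.5, n_features) + rng.normal(0, 1, n_survey)

    # Step 1: learn the poverty signal from the surveyed subsample
    model = GradientBoostingRegressor(random_state=0).fit(X_survey, consumption)

    # Step 2: predict consumption for all subscribers and target the poorest
    X_all = rng.normal(size=(n_subscribers, n_features))
    predicted = model.predict(X_all)
    budget = 200  # number of transfers the program can fund
    eligible = np.argsort(predicted)[:budget]  # lowest predicted consumption first
    ```

    The evaluation in the paper then compares the exclusion errors of such ML-based eligibility lists against geographic targeting and registry-based alternatives.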
  2. By: Gerardin Mathilde; Ranvier Martial
    Abstract: In the context of the Banque de France’s monthly business survey, this document presents the main findings of a textual analysis of business leaders’ comments. First, the richness of these data is illustrated via an elementary sentiment index and the identification of the main social movements since 2009 by means of keywords. The article then presents two statistical applications whose reproducibility is discussed. The first, applied to the 2018 yellow vests movement and the 2019 strikes, aims to estimate the impact on GDP of an event whose effect is unequivocal. The second, based on the study of Brexit, aims to characterize, using a supervised learning model and word vectors, the effects of a complex event with multiple impacts.
    Keywords: Textual Analysis, Business Survey, Sentiment Index, Keywords, Word Vectors, Brexit, Social Movements.
    JEL: C21 C45 C52 E32 D22
    Date: 2021
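    An "elementary sentiment index" of the kind mentioned can be built by counting positive and negative keywords in each month's comments. The sketch below uses made-up English keyword lists purely for illustration; the actual French lexicons and scoring used by the authors are not reproduced here.

    ```python
    # Illustrative keyword lists; the authors' actual lexicons are assumptions here.
    POSITIVE = {"growth", "improvement", "recovery", "strong", "increase"}
    NEGATIVE = {"strike", "decline", "weak", "disruption", "decrease"}

    def sentiment_index(comments):
        """Net share of positive vs. negative keywords across a list of comments."""
        pos = neg = 0
        for comment in comments:
            for word in comment.lower().split():
                w = word.strip(".,;:!?")
                if w in POSITIVE:
                    pos += 1
                elif w in NEGATIVE:
                    neg += 1
        total = pos + neg
        return 0.0 if total == 0 else (pos - neg) / total

    print(sentiment_index(["Strong recovery in orders.", "A strike caused some disruption."]))  # → 0.0
    ```

    Tracking such an index over time, and spikes in movement-related keywords, is what allows the identification of social movements such as the 2018 yellow vests.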
  3. By: Anjali Adukia; Alex Eble; Emileigh Harrison; Hakizumwami Birali Runesha; Teodora Szasz
    Abstract: Books shape how children learn about society and social norms, in part through the representation of different characters. To better understand the messages children encounter in books, we introduce new artificial intelligence methods for systematically converting images into data. We apply these image tools, along with established text analysis methods, to measure the representation of race, gender, and age in children’s books commonly found in US schools and homes over the last century. We find that more characters with darker skin color appear over time, but "mainstream" award-winning books, which are twice as likely to be checked out from libraries, persistently depict more lighter-skinned characters even after conditioning on perceived race. Across all books, children are depicted with lighter skin than adults. Over time, females are increasingly present but are more represented in images than in text, suggesting greater symbolic inclusion in pictures than substantive inclusion in stories. Relative to their growing share of the US population, Black and Latinx people are underrepresented in the mainstream collection; males, particularly White males, are persistently overrepresented. Our data provide a view into the "black box" of education through children’s books in US schools and homes, highlighting what has changed and what has endured.
    JEL: I21 I24 J15 J16 Z1
    Date: 2021–08
  4. By: Subhadeep Mukhopadhyay
    Abstract: We have entered a new era of machine learning (ML), in which the most accurate algorithm with superior predictive power may not even be deployable unless it is admissible under regulatory constraints. This has led to great interest in developing fair, transparent and trustworthy ML methods. The purpose of this article is to introduce a new information-theoretic learning framework (admissible machine learning) and algorithmic risk-management tools (InfoGram, L-features, ALFA-testing) that can guide an analyst to redesign off-the-shelf ML methods to be regulatory-compliant while maintaining good prediction accuracy. We illustrate our approach using several real-data examples from the financial sector, biomedical research, marketing campaigns, and the criminal justice system.
    Date: 2021–08
  5. By: Tugce Karatas; Ali Hirsa
    Abstract: Market indicators such as CPI and GDP have been used for decades to identify the stage of the business cycle and the investment attractiveness of sectors under given market conditions. In this paper, we propose a two-stage methodology that consists of predicting ETF prices for each sector using market indicators and ranking sectors based on their predicted rates of return. We start by choosing sector-specific macroeconomic indicators and implement the Recursive Feature Elimination algorithm to select the most important features for each sector. Using our prediction tool, we implement different Recurrent Neural Network models to predict future ETF prices for each sector. We then rank the sectors based on their predicted rates of return. We select the best-performing model by evaluating the annualized return, annualized Sharpe ratio, and Calmar ratio of the portfolios that include the top four sectors ranked by the model. We also test the robustness of the model's performance with respect to look-back and look-ahead windows. Our empirical results show that our methodology beats the equally weighted portfolio's performance even in the long run. We also find that Echo State Networks exhibit outstanding performance compared with the other models while being faster to implement than other RNN models.
    Date: 2021–08
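    The first stage, selecting sector-specific indicators with Recursive Feature Elimination, can be sketched with scikit-learn. The data below are synthetic and the base estimator is an assumption; the authors' actual indicators and RNN prediction models are not reproduced.

    ```python
    import numpy as np
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(1)

    # Synthetic stand-in for macroeconomic indicators and one sector's ETF return
    n_obs, n_indicators = 300, 10
    X = rng.normal(size=(n_obs, n_indicators))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(0, 0.1, n_obs)  # only two matter

    # Recursively drop the weakest indicator until four remain
    selector = RFE(LinearRegression(), n_features_to_select=4).fit(X, y)
    selected = np.flatnonzero(selector.support_)
    print(selected)  # the truly informative indicators 0 and 3 should survive
    ```

    The selected indicators would then feed the second-stage RNN price forecasts used to rank sectors.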
  6. By: Liang Zeng; Lei Wang; Hui Niu; Jian Li; Ruchen Zhang; Zhonghao Dai; Dewei Zhu; Ling Wang
    Abstract: Price movement forecasting aims at predicting the future trends of financial assets based on current market conditions and other relevant information. Recently, machine learning (ML) methods have become increasingly popular and have achieved promising results for price movement forecasting in both academia and industry. Most existing ML solutions formulate the forecasting problem as a classification (to predict the direction) or a regression (to predict the return) problem over the entire set of training data. However, due to the extremely low signal-to-noise ratio and stochastic nature of financial data, good trading opportunities are extremely scarce. As a result, without careful selection of potentially profitable samples, such ML methods are prone to capturing the patterns of noise instead of real signals. To address these issues, we propose a novel framework, LARA (Locality-Aware Attention and Adaptive Refined Labeling), which contains the following three components: 1) Locality-aware attention automatically extracts the potentially profitable samples by attending to their label information in order to construct a more accurate classifier on these selected samples. 2) Adaptive refined labeling further iteratively refines the labels, alleviating the noise in the samples. 3) Equipped with metric learning techniques, locality-aware attention enjoys task-specific distance metrics and distributes attention over potentially profitable samples more effectively. To validate our method, we conduct comprehensive experiments on three real-world financial markets: ETFs, China's A-share stock market, and the cryptocurrency market. LARA achieves superior performance compared with time-series analysis methods and a set of machine-learning-based competitors on the Qlib platform. Extensive ablation studies and experiments demonstrate that LARA indeed captures more reliable trading opportunities.
    Date: 2021–07
  7. By: Pumplun, Luisa; Fecho, Mariska; Wahl, Nihal; Peters, Felix; Buxmann, Peter
    Date: 2021
  8. By: A. K. M. Amanat Ullah; Fahim Imtiaz; Miftah Uddin Md Ihsan; Md. Golam Rabiul Alam; Mahbub Majumdar
    Abstract: The unpredictability and volatility of the stock market render it challenging to make a substantial profit using any generalized scheme. This paper discusses our machine learning model, which can make a significant amount of profit in the US stock market by performing live trading on the Quantopian platform while using resources free of cost. Our top approach was to use ensemble learning with four classifiers: Gaussian Naive Bayes, Decision Tree, Logistic Regression with L1 regularization, and Stochastic Gradient Descent, to decide whether to go long or short on a particular stock. Our best model performed daily trades between July 2011 and January 2019, generating 54.35% profit. Finally, our work showcased that mixtures of weighted classifiers perform better than any individual predictor at making trading decisions in the stock market.
    Date: 2021–07
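    The four-classifier ensemble the authors describe maps directly onto scikit-learn's voting ensemble. The features and labels below are synthetic stand-ins for the paper's engineered trading features, and the hyperparameters are assumptions.

    ```python
    import numpy as np
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression, SGDClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(2)

    # Synthetic stand-in for per-stock features; label 1 = go long, 0 = go short
    X = rng.normal(size=(400, 6))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

    ensemble = VotingClassifier(
        estimators=[
            ("nb", GaussianNB()),
            ("tree", DecisionTreeClassifier(max_depth=4, random_state=0)),
            ("l1", LogisticRegression(penalty="l1", solver="liblinear")),
            ("sgd", SGDClassifier(random_state=0)),
        ],
        voting="hard",  # majority vote decides long vs. short
    ).fit(X[:300], y[:300])

    accuracy = ensemble.score(X[300:], y[300:])
    print(round(accuracy, 2))
    ```

    The paper's finding is that such weighted mixtures beat each constituent classifier on out-of-sample trading decisions.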
  9. By: Huseyin Gurkan (ESMT European School of Management and Technology); Francis de Véricourt (ESMT European School of Management and Technology)
    Abstract: This paper explores how firms that lack expertise in machine learning (ML) can leverage the so-called AI Flywheel effect. This effect designates a virtuous cycle by which, as an ML product is adopted and new user data are fed back to the algorithm, the product improves, enabling further adoptions. However, managing this feedback loop is difficult, especially when the algorithm is contracted out. Indeed, the additional data that the AI Flywheel effect generates may change the provider's incentives to improve the algorithm over time. We formalize this problem in a simple two-period moral hazard framework that captures the main dynamics among ML, data acquisition, pricing, and contracting. We find that the firm's decisions crucially depend on how the amount of data on which the machine is trained interacts with the provider's effort. If this effort has a more (less) significant impact on accuracy for larger volumes of data, the firm underprices (overprices) the product. Interestingly, these distortions sometimes improve social welfare, which accounts for the customer surplus and profits of both the firm and provider. Further, the interaction between incentive issues and the positive externalities of the AI Flywheel effect has important implications for the firm's data collection strategy. In particular, the firm can boost its profit by increasing the product's capacity to acquire usage data only up to a certain level. If the product collects too much data per user, the firm's profit may actually decrease, i.e., more data is not necessarily better.
    Keywords: Data, machine learning, data product, pricing, incentives, contracting
    Date: 2020–03–03
  10. By: Steve J. Bickley; Benno Torgler
    Abstract: As artificial intelligence (AI) thrives and propagates through modern life, a key question to ask is how to include humans in future AI. Despite human involvement at every stage of the production process, from conception and design through to implementation, modern AI is still often criticized for its black box characteristics. Sometimes, we do not know what really goes on inside or how and why certain conclusions are reached. Future AI will face many dilemmas and ethical issues unforeseen by their creators, beyond those commonly discussed (e.g., trolley problems and variants of it), to which solutions cannot be hard-coded and are often still up for debate. Given the sensitivity of such social and ethical dilemmas and their implications for human society at large, when and if our AI makes the wrong choice we need to understand how it got there in order to make corrections and prevent recurrences. This is particularly true in situations where human livelihoods are at stake (e.g., health, well-being, finance, law) or when major individual or household decisions are taken. Doing so requires opening up the black box of AI, especially as AI systems act, interact, and adapt in a human world and interact with other AI in this world. In this article, we argue for the application of cognitive architectures for ethical AI, in particular for their potential contributions to AI transparency, explainability, and accountability. We need to understand how our AI gets to the solutions it does, and we should seek to do this at a deeper level in terms of the machine equivalents of motivations, attitudes, values, and so on. The path to future AI is long and winding, but it could arrive faster than we think. In order to harness the positive potential outcomes of AI for humans and society (and avoid the negatives), we need to understand AI more fully in the first place, and we expect this will simultaneously contribute towards a greater understanding of its human counterparts as well.
    Keywords: Artificial Intelligence; Ethics; Cognitive Architectures; Intelligent Systems; Ethical AI; Society
    Date: 2021–08
  11. By: Allen Hu; Song Ma
    Abstract: Persuasive communication functions not only through content but also delivery, e.g., facial expression, tone of voice, and diction. This paper examines the persuasiveness of delivery in start-up pitches. Using machine learning (ML) algorithms to process full pitch videos, we quantify persuasion in visual, vocal, and verbal dimensions. Positive (i.e., passionate, warm) pitches increase funding probability. Yet conditional on funding, high-positivity startups underperform. Women are more heavily judged on delivery when evaluating single-gender teams, but they are neglected when co-pitching with men in mixed-gender teams. Using an experiment, we show persuasion delivery works mainly through leading investors to form inaccurate beliefs.
    JEL: C55 D91 G24 G4 G41
    Date: 2021–07
  12. By: Alexander Trott; Sunil Srinivasa; Douwe van der Wal; Sebastien Haneuse; Stephan Zheng
    Abstract: Optimizing economic and public policy is critical to address socioeconomic issues and trade-offs, e.g., improving equality, productivity, or wellness, and poses a complex mechanism design problem. A policy designer needs to consider multiple objectives, policy levers, and behavioral responses from strategic actors who optimize for their individual objectives. Moreover, real-world policies should be explainable and robust to simulation-to-reality gaps, e.g., due to calibration issues. Existing approaches are often limited to a narrow set of policy levers or objectives that are hard to measure, do not yield explicit optimal policies, or do not consider strategic behavior, for example. Hence, it remains challenging to optimize policy in real-world scenarios. Here we show that the AI Economist framework enables effective, flexible, and interpretable policy design using two-level reinforcement learning (RL) and data-driven simulations. We validate our framework on optimizing the stringency of US state policies and Federal subsidies during a pandemic, e.g., COVID-19, using a simulation fitted to real data. We find that log-linear policies trained using RL significantly improve social welfare, based on both public health and economic outcomes, compared to past outcomes. Their behavior can be explained, e.g., well-performing policies respond strongly to changes in recovery and vaccination rates. They are also robust to calibration errors, e.g., infection rates that are over or underestimated. As of yet, real-world policymaking has not seen adoption of machine learning methods at large, including RL and AI-driven simulations. Our results show the potential of AI to guide policy design and improve social welfare amidst the complexity of the real world.
    Date: 2021–08
  13. By: Hamza Saad; Nagendra Nagarur; Abdulrahman Shamsan
    Abstract: Background and Objective: Many industries run high-precision, complex processes whose data must be analyzed to discover defects before they grow. Big data may contain many variables, with missing data, that play a vital role in understanding what affects quality, so process specialists may struggle to define which variables have a direct effect on the process. The aim of this study was to build an integrated data analysis using data mining and quality tools to improve the quality of production and of the process. Materials and Methods: Data were collected in several steps to reduce missing values. Specialists in the production process recommended the most important variables from the big data, and predictor screening was then used to confirm 16 of 71 variables. Seven important variables were combined into an output variable called the textile quality score. After testing ten algorithms, boosted tree and random forest were selected to extract knowledge. In the voting process, three variables were confirmed for use as input factors in the design of experiments. The design response was estimated by data mining and the results were confirmed by the quality specialists. A central composite (response surface) design was run 17 times to extract the main effects and interactions on the textile quality score. Results: The study found that machine productivity has a negative effect on quality, which was validated by management. After applying changes, the efficiency of production improved by 21%. Conclusion: The results confirmed a big improvement in quality processes in the industrial sector. The efficiency of production improved by 21%, the weaving process improved by 23%, and the overall process improved by 17.06%.
    Date: 2021–08
  14. By: Kee, Shaira Limson; Garganera, John Patrick; Maravilla, Nicholle Mae Amor; Garganera, Wilbert; Fermin, Jamie Ledesma; AlDahoul, Nouar; Karim, Hezerul Abdul; Tan, Myles Joshua
    Abstract: From a public health perspective, this opinion article discusses why it is necessary to integrate Artificial Intelligence (AI) into mental health practices in the Philippines. The use of AI systems is an optimal solution to the rising demand for more accessible, cost-efficient, and inclusive healthcare. Given recent developments, the Philippines is deemed to have sufficient capacity to adopt this agenda. This article serves as a call for the introduction of advanced detection tools and predictive analytics in the medical field, especially in the mental health discipline.
    Date: 2021–08–03
  15. By: Cédric Vernier (UMR ITAP - Information – Technologies – Analyse Environnementale – Procédés Agricoles - Montpellier SupAgro - Institut national d’études supérieures agronomiques de Montpellier - Institut Agro - Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement); Denis Loeillet (UR GECO - Fonctionnement écologique et gestion durable des agrosystèmes bananiers et ananas - Cirad - Centre de Coopération Internationale en Recherche Agronomique pour le Développement); Rallou Thomopoulos (UMR IATE - Ingénierie des Agro-polymères et Technologies Émergentes - Cirad - Centre de Coopération Internationale en Recherche Agronomique pour le Développement - UM2 - Université Montpellier 2 - Sciences et Techniques - Montpellier SupAgro - Centre international d'études supérieures en sciences agronomiques - UM - Université de Montpellier - Montpellier SupAgro - Institut national d’études supérieures agronomiques de Montpellier - Institut Agro - Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement); Catherine Macombe (UMR ITAP - Information – Technologies – Analyse Environnementale – Procédés Agricoles - Montpellier SupAgro - Institut national d’études supérieures agronomiques de Montpellier - Institut Agro - Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement)
    Abstract: A major challenge of Sustainable Development Goal 12, "Responsible Consumption and Production", is to reduce food losses along production and supply chains. This is particularly critical for fresh food products, due to their perishable and fragile nature, which makes the coordination of actors all the more crucial to avoid waste and losses. The rise of new technologies, referred to as "Industry 4.0" and powered by the internet of things, big data analytics and artificial intelligence, could bring new solutions to meet these needs. Information and communication technologies (ICTs) allow for frequent exchanges of huge amounts of information between actors in agrofood chains to coordinate their activities. The aim of the chapter is to provide a state-of-the-art analysis of ICTs used in agrofood supply chains, with a special focus on the case of fresh fruits and vegetables, to analyze the potential and weaknesses that exist in different forms of supply chains for ICTs to become a "resource" (precious, rare, non-imitable, and non-substitutable), and to suggest promising ICTs in this context.
    Keywords: food sustainability,food supply chain,innovation in food,food waste and SDGs,ICTs,objectifs de développement durable
    Date: 2021–06
  16. By: Muhammed Taher Al-Mudafer; Benjamin Avanzi; Greg Taylor; Bernard Wong
    Abstract: Neural networks offer a versatile, flexible and accurate approach to loss reserving. However, such applications have focused primarily on the (important) problem of fitting accurate central estimates of the outstanding claims. In practice, properties regarding the variability of outstanding claims are equally important (e.g., quantiles for regulatory purposes). In this paper we fill this gap by applying a Mixture Density Network ("MDN") to loss reserving. The approach combines a neural network architecture with a mixture Gaussian distribution to achieve simultaneously an accurate central estimate along with flexible distributional choice. Model fitting is done using a rolling-origin approach. Our approach consistently outperforms the classical over-dispersed Poisson (ODP) model for both central estimates and quantiles of interest, when applied to a wide range of simulated environments of various complexity and specifications. We further extend the MDN approach by proposing two extensions. Firstly, we present a hybrid GLM-MDN approach called "ResMDN". This hybrid approach balances the tractability and ease of understanding of a traditional GLM model on the one hand with the additional accuracy and distributional flexibility provided by the MDN on the other. We show that it can successfully improve on the errors of the baseline ccODP, although there is generally a loss of performance compared to the MDN in the examples we considered. Secondly, we allow for explicit projection constraints, so that actuarial judgement can be directly incorporated into the modelling process. Throughout, we focus on aggregate loss triangles, and show that our methodologies are tractable and that they outperform traditional approaches even with relatively limited amounts of data. We use both simulated data, to validate properties, and real data, to illustrate and ascertain the practicality of the approaches.
    Date: 2021–08
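    The core of an MDN is a network head that outputs mixture weights, means, and scales for each observation, scored by the Gaussian-mixture negative log-likelihood. The numpy sketch below shows only that loss computation on synthetic data; the paper's architecture and reserving triangles are not reproduced.

    ```python
    import numpy as np

    def mixture_nll(y, logits, means, log_sigmas):
        """Negative log-likelihood of y under a Gaussian mixture.

        logits, means, log_sigmas: arrays of shape (n, k), as produced by a
        network head for each of n claims and k mixture components.
        """
        weights = np.exp(logits - logits.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)            # softmax weights
        sigmas = np.exp(log_sigmas)
        z = (y[:, None] - means) / sigmas
        comp_density = np.exp(-0.5 * z**2) / (sigmas * np.sqrt(2 * np.pi))
        return -np.log((weights * comp_density).sum(axis=1)).mean()

    # Data drawn from a two-component mixture: a correctly located head scores well
    rng = np.random.default_rng(3)
    y = np.where(rng.random(1000) < 0.5, rng.normal(0, 1, 1000), rng.normal(5, 1, 1000))
    n = len(y)
    good = mixture_nll(y, np.zeros((n, 2)), np.tile([0.0, 5.0], (n, 1)), np.zeros((n, 2)))
    bad = mixture_nll(y, np.zeros((n, 2)), np.tile([10.0, 15.0], (n, 1)), np.zeros((n, 2)))
    print(good < bad)  # the correctly located mixture has lower loss
    ```

    Because the fitted output is a full distribution, quantiles of the outstanding claims follow directly from the mixture, which is what distinguishes the MDN from a central-estimate-only network.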
  17. By: Charlson, G.
    Abstract: A platform holds information on the demographics of its users and wants to maximise total surplus. The data generate a probability over which of two products a buyer prefers, with different data segmentations being more or less informative. The platform reveals segmentations of the data to two firms, one popular and one niche, preferring to reveal no information rather than to reveal the consumer's type completely and for certain. The platform can improve profits by revealing to both firms a segmentation in which the niche firm is relatively popular, but still less popular than the other firm, and can potentially do even better by revealing information asymmetrically. The platform has an incentive to provide more granular data in markets in which the niche firm is particularly unpopular or in which broad demographic categories are not particularly revelatory of type, suggesting that the profit associated with big data techniques differs depending on market characteristics.
    Keywords: Strategic interaction, network games, interventions, industrial organisation, platforms, hypergraphs
    JEL: D40 L10 L40
    Date: 2021–08–19
  18. By: Rui (Aruhan) Shi
    Abstract: This exercise offers an innovative learning mechanism to model an economic agent's decision-making process using a deep reinforcement learning algorithm. In particular, this AI agent is born into an economic environment with no information on the underlying economic structure or its own preferences. I model how the AI agent learns from scratch in terms of how it collects and processes information. It is able to learn in real time through constantly interacting with the environment and adjusting its actions accordingly (i.e., online learning). I illustrate that the economic agent under deep reinforcement learning adapts to changes in a given environment in real time. AI agents differ in their ways of collecting and processing information, and this leads to different learning behaviours and welfare distinctions. The chosen economic structure can be generalised to other decision-making processes and economic models.
    Keywords: expectation formation, exploration, deep reinforcement learning, bounded rationality, stochastic optimal growth
    JEL: C45 D83 D84 E21 E70
    Date: 2021
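    The flavour of such an agent, which learns a consumption rule purely from interaction with no knowledge of the model, can be conveyed by a tabular Q-learning toy. The paper uses deep reinforcement learning on a stochastic growth model; the discretized wealth grid, income process, and parameters below are purely illustrative assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(5)

    # Discretized wealth states and consumption choices; log-utility reward
    wealth_levels = np.arange(1, 11)            # states: wealth 1..10
    actions = np.array([0.2, 0.5, 0.8])         # consume this fraction of wealth
    Q = np.zeros((len(wealth_levels), len(actions)))
    alpha, gamma, eps = 0.1, 0.95, 0.1

    def step(w_idx, a_idx):
        """One interaction: consume, save, receive stochastic income."""
        wealth = wealth_levels[w_idx]
        consume = actions[a_idx] * wealth
        saved = wealth - consume
        income = rng.choice([0, 1, 2])           # stochastic labor income
        next_wealth = int(np.clip(round(saved + income), 1, 10))
        return np.log(consume), next_wealth - 1  # reward, next state index

    w = 4  # start at wealth 5
    for t in range(50000):
        a = rng.integers(len(actions)) if rng.random() < eps else int(Q[w].argmax())
        r, w_next = step(w, a)
        Q[w, a] += alpha * (r + gamma * Q[w_next].max() - Q[w, a])
        w = w_next

    print(actions[Q.argmax(axis=1)])  # learned consumption fraction per wealth level
    ```

    The agent never sees the transition law or its own utility function in closed form; it only observes rewards, which is the "learning from zero" idea the abstract describes.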
  19. By: Raquel Fernández; Sahar Parsa
    Abstract: Attitudes towards same-sex relationships in the US have changed radically over a relatively short period of time. After remaining fairly constant for over two decades, opinions became more favorable starting in 1992—a presidential election year in which the Democratic and Republican parties took opposing stands over the status of gay people in society. What roles did political parties and their leaders play in this process of cultural change? Using a variety of techniques including machine learning, we show that the partisan opinion gap emerged substantially prior to 1992—in the mid-1980s—and did not increase as a result of the political debates in 1992–93. Furthermore, we identify people with a college-and-above education as the potential "leaders" of the process of partisan divergence.
    JEL: P16 Z1 Z13
    Date: 2021–07
  20. By: Aaltonen, Aleksi Ville; Alaimo, Cristina; Kallinikos, Jannis
    Abstract: This paper studies the process by which data are generated, managed, and assembled into tradable objects we call data commodities. We link the making of such objects to the open and editable nature of digital data and to the emerging big data industry in which they are diffused items of exchange, repurposing, and aggregation. We empirically investigate the making of data commodities in the context of an innovative telecommunications operator, analyzing its efforts to produce advertising audiences by repurposing data from the network infrastructure. The analysis unpacks the processes by which data are repurposed and aggregated into novel data-based objects that acquire organizational and industry relevance through carefully maintained metrics and practices of data management and interpretation. Building from our findings, we develop a process theory that explains the transformations data undergo on their way to becoming commodities and shows how these transformations are related to organizational practices and to the editable, portable, and recontextualizable attributes of data. The theory complements the standard picture of data encountered in data science and analytics and renews and extends the promise of a constructivist IS research into the age of datafication. The results provide practitioners, regulators included, vital insights concerning data management practices that produce commodities from data.
    Keywords: advertising audience; analytics; big data; case study; data commodities; data-based objects; social practices
    JEL: J50
    Date: 2021–08–06
  21. By: Geoff Boeing; Max Besbris; David Wachsmuth; Jake Wegmann
    Abstract: This article interprets emerging scholarship on rental housing platforms -- particularly the best-known and most widely used short- and long-term rental housing platforms -- and considers how the technological processes connecting both short-term and long-term rentals to the platform economy are transforming cities. It discusses potential policy approaches to more equitably distribute benefits and mitigate harms. We argue that information technology is not value-neutral. While rental housing platforms may empower data analysts and certain market participants, the same cannot be said for all users or society at large. First, user-generated online data frequently reproduce the systematic biases found in traditional sources of housing information. Evidence is growing that the information broadcasting potential of rental housing platforms may increase rather than mitigate sociospatial inequality. Second, technology platforms curate and shape information according to their creators' own financial and political interests. The question of which data -- and people -- are hidden or marginalized on these platforms is just as important as the question of which data are available. Finally, important differences in benefits and drawbacks exist between short-term and long-term rental housing platforms, but are underexplored in the literature: this article unpacks these differences and proposes policy recommendations.
    Date: 2021–08
  22. By: Stephen Coussens; Jann Spiess
    Abstract: Instrumental variables (IV) regression is widely used to estimate causal treatment effects in settings where receipt of treatment is not fully random, but there exists an instrument that generates exogenous variation in treatment exposure. While IV regression can recover consistent treatment effect estimates, these estimates are often noisy. Building upon earlier work in biostatistics (Joffe and Brensinger, 2003) and relating to an evolving literature in econometrics (including Abadie et al., 2019; Huntington-Klein, 2020; Borusyak and Hull, 2020), we study how to improve the efficiency of IV estimates by exploiting the predictable variation in the strength of the instrument. In the case where both the treatment and instrument are binary and the instrument is independent of baseline covariates, we study weighting each observation according to its estimated compliance (that is, its conditional probability of being affected by the instrument), which we motivate from a (constrained) solution of the first-stage prediction problem implicit to IV. The resulting estimator can leverage machine learning to estimate compliance as a function of baseline covariates. We derive the large-sample properties of a specific implementation of a weighted IV estimator in the potential outcomes and local average treatment effect (LATE) frameworks, and provide tools for inference that remain valid even when the weights are estimated nonparametrically. With both theoretical results and a simulation study, we demonstrate that compliance weighting meaningfully reduces the variance of IV estimates when first-stage heterogeneity is present, and that this improvement often outweighs any difference between the compliance-weighted and unweighted IV estimands. These results suggest that in a variety of applied settings, the precision of IV estimates can be substantially improved by incorporating compliance estimation.
    Date: 2021–08
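    The compliance-weighting idea in this abstract can be sketched in a few lines: simulate a binary instrument whose first-stage strength varies with a baseline covariate, estimate each observation's compliance probability, and compare weighted and unweighted Wald-style IV estimates. This is a hypothetical illustration, not the authors' estimator; the data-generating process, the crude binned compliance estimator, and the `iv_wald` helper are all assumptions made for demonstration.

    ```python
    # Hedged sketch of compliance-weighted IV (not the paper's exact estimator):
    # weight each observation by its estimated first-stage strength.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 20000
    x = rng.uniform(size=n)                     # baseline covariate
    z = rng.integers(0, 2, size=n)              # binary instrument
    compliance = 0.8 * x                        # P(complier | x): strong only for high x
    is_complier = rng.uniform(size=n) < compliance
    d = np.where(is_complier, z, rng.integers(0, 2, size=n))  # binary treatment
    tau = 2.0                                   # true (homogeneous) treatment effect
    y = tau * d + x + rng.normal(size=n)        # outcome

    # First stage: estimate the compliance score E[D|Z=1,x] - E[D|Z=0,x] by binning x.
    bins = np.clip((x * 10).astype(int), 0, 9)
    score = np.zeros(n)
    for b in range(10):
        m = bins == b
        score[m] = d[m & (z == 1)].mean() - d[m & (z == 0)].mean()
    w = np.clip(score, 0, None)                 # weights proportional to estimated compliance

    def iv_wald(y, d, z, w):
        """Weighted Wald estimator: cov_w(y, z) / cov_w(d, z)."""
        wm = lambda v: np.average(v, weights=w)
        num = wm(y * z) - wm(y) * wm(z)
        den = wm(d * z) - wm(d) * wm(z)
        return num / den

    print("unweighted IV:", iv_wald(y, d, z, np.ones(n)))
    print("compliance-weighted IV:", iv_wald(y, d, z, w))
    ```

    With a homogeneous effect both estimands equal 2, so the comparison isolates the variance reduction from down-weighting observations where the instrument is weak.
    
    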
  23. By: Angelo M. Fasolo; Flávia M. Graminho; Saulo B. Bastos
    Abstract: Central bank communication is a key tool in managing inflation expectations. This paper proposes a hierarchical Latent Dirichlet Allocation (hLDA) model combined with feature selection techniques to allow an endogenous selection of topic structures associated with documents published by Banco Central do Brasil's Monetary Policy Committee (Copom). These computational linguistic techniques allow building measures of the content and tone of Copom's minutes and statements. The effects of the tone are measured in different dimensions such as inflation, inflation expectations, economic activity, and economic uncertainty. Beyond the impact on the economy, the hLDA model is used to evaluate the coherence between the statements and the minutes of Copom's meetings.
    Date: 2021–08
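    The tone measures described in this abstract come from an hLDA model; as a much simpler stand-in, the basic idea of scoring the tone of central bank text can be illustrated with a dictionary-based score. The word lists, example sentences, and the `tone` helper below are hypothetical, chosen only to make the sketch self-contained.

    ```python
    # Hypothetical, simplified tone score for minutes-like text (the paper itself
    # uses hLDA topics, not word dictionaries): (hawkish - dovish) / (hawkish + dovish).
    HAWKISH = {"inflation", "tightening", "risks", "pressure", "hike"}
    DOVISH = {"easing", "slack", "stimulus", "cut", "accommodation"}

    def tone(text: str) -> float:
        """Net hawkish term share among tone-bearing words; 0.0 if none appear."""
        words = [w.strip(".,;:").lower() for w in text.split()]
        h = sum(w in HAWKISH for w in words)
        d = sum(w in DOVISH for w in words)
        return 0.0 if h + d == 0 else (h - d) / (h + d)

    minutes = "The Committee sees inflation pressure and risks, warranting tightening."
    statement = "Given economic slack, the Committee maintains monetary stimulus."
    print(tone(minutes), tone(statement))
    ```

    A score near +1 flags hawkish text and near -1 dovish text, which is the kind of document-level tone series the paper then relates to inflation expectations and activity.
    
    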

This nep-big issue is ©2021 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at <>. For comments please write to the director of NEP, Marco Novarese, at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.