By: | Wee Ling Tan; Stephen Roberts; Stefan Zohren |
Abstract: | We introduce a novel approach to options trading strategies using a highly scalable and data-driven machine learning algorithm. In contrast to traditional approaches that often require specifications of underlying market dynamics or assumptions on an option pricing model, our models depart fundamentally from the need for these prerequisites, directly learning non-trivial mappings from market data to optimal trading signals. Backtesting on more than a decade of option contracts for equities listed on the S&P 100, we demonstrate that deep learning models trained according to our end-to-end approach exhibit significant improvements in risk-adjusted performance over existing rules-based trading strategies. We find that incorporating turnover regularization into the models leads to further performance enhancements, even at prohibitively high levels of transaction costs. |
Date: | 2024–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2407.21791 |
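To make the turnover regularization concrete, here is a minimal sketch of a Sharpe-style trading loss with a turnover penalty. This is an assumed formulation in PyTorch, not the authors' implementation; the cost weight and tensor shapes are illustrative.

```python
# Minimal sketch (assumed, not the paper's code) of a Sharpe-style trading loss
# with a turnover penalty on the position changes.
import torch

def trading_loss(positions, returns, cost_weight=1e-3):
    """positions: (T, N) signals in [-1, 1]; returns: (T, N) next-period returns."""
    pnl = (positions * returns).sum(dim=1)                        # portfolio P&L per period
    turnover = (positions[1:] - positions[:-1]).abs().sum(dim=1)  # trading activity
    net_pnl = pnl[1:] - cost_weight * turnover                    # cost-adjusted P&L
    sharpe = net_pnl.mean() / (net_pnl.std() + 1e-8)              # annualization omitted
    return -sharpe                                                # minimize negative Sharpe
```

Training a signal network against a loss of this shape trades raw risk-adjusted return off against trading activity, which is the mechanism the abstract credits for robustness to transaction costs.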
By: | Haowei Ni; Shuchen Meng; Xupeng Chen; Ziqing Zhao; Andi Chen; Panfeng Li; Shiyao Zhang; Qifu Yin; Yuanqing Wang; Yuxi Chan |
Abstract: | Accurate stock market predictions following earnings reports are crucial for investors. Traditional methods, particularly classical machine learning models, struggle with these predictions because they cannot effectively process and interpret extensive textual data contained in earnings reports and often overlook nuances that influence market movements. This paper introduces an advanced approach by employing Large Language Models (LLMs) instruction fine-tuned with a novel combination of instruction-based techniques and quantized low-rank adaptation (QLoRA) compression. Our methodology integrates 'base factors', such as financial metric growth and earnings transcripts, with 'external factors', including recent market indices performances and analyst grades, to create a rich, supervised dataset. This comprehensive dataset enables our models to achieve superior predictive performance in terms of accuracy, weighted F1, and Matthews correlation coefficient (MCC), especially evident in the comparison with benchmarks such as GPT-4. We specifically highlight the efficacy of the llama-3-8b-Instruct-4bit model, which showcases significant improvements over baseline models. The paper also discusses the potential of expanding the output capabilities to include a 'Hold' option and extending the prediction horizon, aiming to accommodate various investment styles and time frames. This study not only demonstrates the power of integrating cutting-edge AI with fine-tuned financial data but also paves the way for future research in enhancing AI-driven financial analysis tools. |
Date: | 2024–08 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2408.06634 |
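The QLoRA recipe referenced above combines 4-bit quantization of the base model with trainable low-rank adapters. A hedged sketch with Hugging Face transformers, bitsandbytes, and peft follows; the model id, rank, and target modules are assumptions rather than the paper's exact configuration.

```python
# Hypothetical QLoRA setup; model id, rank, and target modules are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed base model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit quantized base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb)
tokenizer = AutoTokenizer.from_pretrained(model_id)

lora = LoraConfig(                          # low-rank adapters trained on top
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()          # only adapter weights are trainable
```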
By: | Teng Ye; Jingnan Zheng; Junhui Jin; Jingyi Qiu; Wei Ai; Qiaozhu Mei |
Abstract: | While small businesses are increasingly turning to online crowdfunding platforms for essential funding, over 40% of these campaigns may fail to raise any money, especially those from low socio-economic areas. We utilize the latest advancements in AI technology to identify crucial factors that influence the success of crowdfunding campaigns and to improve their fundraising outcomes by strategically optimizing these factors. Our best-performing machine learning model accurately predicts the fundraising outcomes of 81.0% of campaigns, primarily based on their textual descriptions. Interpreting the machine learning model allows us to provide actionable suggestions on improving the textual description before launching a campaign. We demonstrate that by augmenting just three aspects of the narrative using a large language model, a campaign becomes more preferable to 83% of human evaluators, and its likelihood of securing financial support increases by 11.9%. Our research uncovers effective strategies for crafting descriptions for small business fundraising campaigns and opens up a new realm in integrating large language models into crowdfunding methodologies. |
Date: | 2024–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2407.09480 |
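As a stand-in for the kind of text-based success model described above (the paper does not specify this exact pipeline), a TF-IDF plus logistic regression baseline makes the idea of predicting outcomes from descriptions, then inspecting influential features, concrete; the texts and labels below are hypothetical.

```python
# Assumed baseline, not the authors' model: predict fundraising success from
# campaign descriptions and inspect which n-grams drive the prediction.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["We are a family-run bakery serving our neighborhood for ten years ...",
         "Help our bookstore survive a difficult season ..."]   # placeholder texts
raised_money = [1, 0]   # 1 = campaign raised funds, 0 = raised nothing

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    LogisticRegression(max_iter=1000),
).fit(texts, raised_money)

# Crude interpretability pass: which n-grams push predictions up or down
vocab = clf.named_steps["tfidfvectorizer"].get_feature_names_out()
weights = clf.named_steps["logisticregression"].coef_[0]
```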
By: | Gregory Yampolsky; Dhruv Desai; Mingshu Li; Stefano Pasquali; Dhagash Mehta |
Abstract: | The explainability of black-box machine learning algorithms, commonly known as Explainable Artificial Intelligence (XAI), has become crucial for financial and other regulated industrial applications due to regulatory requirements and the need for transparency in business practices. Among the various paradigms of XAI, Explainable Case-Based Reasoning (XCBR) stands out as a pragmatic approach that elucidates the output of a model by referencing actual examples from the data used to train or test the model. Despite its potential, XCBR has been relatively underexplored for many algorithms, such as tree-based models, until recently. We start by observing that most XCBR methods are defined based on the distance metric learned by the algorithm. By utilizing a recently proposed technique to extract the distance metric learned by Random Forests (RFs), which is both geometry- and accuracy-preserving, we investigate various XCBR methods. These methods amount to identifying special points from the training datasets, such as prototypes, critics, counter-factuals, and semi-factuals, to explain the predictions for a given query of the RF. We evaluate these special points using various evaluation metrics to assess their explanatory power and effectiveness. |
Date: | 2024–08 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2408.06679 |
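For intuition, the classical random-forest proximity (the share of trees in which two points fall in the same leaf) can already be used to retrieve explanatory training examples for a query; the paper's geometry- and accuracy-preserving metric refines this idea. A sketch of the classical version:

```python
# Classical RF proximity as a similarity metric for case-based explanations;
# this is the simple textbook construction, not the paper's refined metric.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

train_leaves = rf.apply(X)       # (n_samples, n_trees): leaf index per tree
query_leaves = rf.apply(X[:1])   # leaves reached by a query point

# Proximity to every training point = share of trees where leaves coincide
prox = (train_leaves == query_leaves).mean(axis=1)
prototypes = np.argsort(prox)[::-1][:5]   # most similar training cases
```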
By: | Achintya Gopal |
Abstract: | The use of machine learning for statistical modeling (and thus, generative modeling) has grown in popularity with the proliferation of time series models, text-to-image models, and especially large language models. Fundamentally, the goal of classical factor modeling is statistical modeling of stock returns, and in this work, we explore using deep generative modeling to enhance classical factor models. Prior work has explored the use of deep generative models in order to model hundreds of stocks, leading to accurate risk forecasting and alpha portfolio construction; however, that specific model does not allow for easy factor modeling interpretation in that the factor exposures cannot be deduced. In this work, we introduce NeuralFactors, a novel machine-learning-based approach to factor analysis where a neural network outputs factor exposures and factor returns, trained using the same methodology as variational autoencoders. We show that this model outperforms prior approaches both in terms of log-likelihood performance and computational efficiency. Further, we show that this method is competitive with prior work in generating realistic synthetic data, covariance estimation, risk analysis (e.g., value at risk, or VaR, of portfolios), and portfolio optimization. Finally, due to the connection to classical factor analysis, we analyze how the factors our model learns cluster together and show that the factor exposures could be used for embedding stocks. |
Date: | 2024–08 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2408.01499 |
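A simplified schematic of the factor structure described above, in PyTorch: a network maps stock characteristics to factor exposures B, and returns are modeled as r_t = B f_t + eps_t. The architecture and sizes are assumptions, and the VAE-style inference of factor returns used in the paper is omitted here.

```python
# Schematic only: exposures come from a network, factor returns are taken as given.
import torch
import torch.nn as nn

class FactorModel(nn.Module):
    def __init__(self, n_chars, n_factors):
        super().__init__()
        self.exposure_net = nn.Sequential(    # characteristics -> factor exposures
            nn.Linear(n_chars, 64), nn.ReLU(), nn.Linear(64, n_factors)
        )
        self.log_sigma = nn.Parameter(torch.zeros(1))  # idiosyncratic volatility

    def forward(self, chars, factor_returns):
        B = self.exposure_net(chars)                   # (n_stocks, n_factors)
        mean = B @ factor_returns                      # (n_stocks,)
        return torch.distributions.Normal(mean, self.log_sigma.exp())

# Training maximizes the Gaussian log-likelihood of realized returns:
#   loss = -model(chars, f_t).log_prob(r_t).mean()
```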
By: | Jian-Qiao Zhu; Joshua C. Peterson; Benjamin Enke; Thomas L. Griffiths |
Abstract: | Understanding how people behave in strategic settings--where they make decisions based on their expectations about the behavior of others--is a long-standing problem in the behavioral sciences. We conduct the largest study to date of strategic decision-making in the context of initial play in two-player matrix games, analyzing over 90,000 human decisions across more than 2,400 procedurally generated games that span a much wider space than previous datasets. We show that a deep neural network trained on these data predicts people's choices better than leading theories of strategic behavior, indicating that there is systematic variation that is not explained by those theories. We then modify the network to produce a new, interpretable behavioral model, revealing what the original network learned about people: their ability to optimally respond and their capacity to reason about others are dependent on the complexity of individual games. This context-dependence is critical in explaining deviations from the rational Nash equilibrium, response times, and uncertainty in strategic decisions. More broadly, our results demonstrate how machine learning can be applied beyond prediction to further help generate novel explanations of complex human behavior. |
Date: | 2024–08 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2408.07865 |
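A toy version of the predictive setup described above (far smaller than the paper's model, with placeholder data): a network reads both players' payoff matrices for a 3x3 game and outputs a distribution over the row player's three actions.

```python
# Toy sketch, not the paper's architecture: predict initial play from payoffs.
import torch
import torch.nn as nn

net = nn.Sequential(               # input: flattened 3x3 payoffs for both players
    nn.Linear(18, 64), nn.ReLU(),
    nn.Linear(64, 3),              # logits over the row player's three actions
)

payoffs = torch.randn(32, 18)            # batch of random games (placeholder)
choices = torch.randint(0, 3, (32,))     # observed human choices (placeholder)
loss = nn.functional.cross_entropy(net(payoffs), choices)
loss.backward()                          # one gradient step of training
```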
By: | Hué, Sullivan (Aix-Marseille University - Aix-Marseille School of Economics); Hurlin, Christophe (University of Orleans); Pérignon, Christophe (HEC Paris); Saurin, Sébastien (University of Orleans, Laboratoire d'économie d'Orléans, Students) |
Abstract: | In credit scoring, machine learning models are known to outperform standard parametric models. As they condition access to credit, banking supervisors and internal model validation teams need to monitor their predictive performance and to identify the features with the highest impact on performance. To facilitate this, we introduce the XPER methodology to decompose a performance metric (e.g., AUC, R^2) into specific contributions associated with the various features of a classification or regression model. XPER is theoretically grounded on Shapley values and is both model-agnostic and performance metric-agnostic. Furthermore, it can be implemented either at the model level or at the individual level. Using a novel dataset of car loans, we decompose the AUC of a machine-learning model trained to forecast the default probability of loan applicants. We show that a small number of features can explain a surprisingly large part of the model performance. Furthermore, we find that the features that contribute the most to the predictive performance of the model may not be the ones that contribute the most to individual forecasts (SHAP). We also show how XPER can be used to deal with heterogeneity issues and significantly boost out-of-sample performance. |
Keywords: | Machine learning; Explainability; Performance metric; Shapley value |
JEL: | C40 C52 |
Date: | 2022–11–22 |
URL: | https://d.repec.org/n?u=RePEc:ebg:heccah:1463 |
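To see the Shapley arithmetic underlying a decomposition like XPER's, here is a brute-force attribution of AUC to features on synthetic data, retraining on every feature subset. Note that XPER itself decomposes the performance of one fixed model without retraining, so this illustrates only the Shapley logic:

```python
# Brute-force Shapley attribution of AUC across features (illustration only).
from itertools import combinations
from math import comb
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=4, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

def auc_of(subset):
    if not subset:
        return 0.5                              # no features: uninformative AUC
    m = LogisticRegression().fit(Xtr[:, subset], ytr)
    return roc_auc_score(yte, m.predict_proba(Xte[:, subset])[:, 1])

n = X.shape[1]
phi = np.zeros(n)
for j in range(n):
    others = [k for k in range(n) if k != j]
    for size in range(n):
        for S in combinations(others, size):
            w = 1.0 / (n * comb(n - 1, size))   # Shapley weight for |S| = size
            phi[j] += w * (auc_of(list(S) + [j]) - auc_of(list(S)))
# phi sums (up to sampling noise) to AUC(all features) - 0.5
```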
By: | Thiago Christiano Silva; Paulo Victor Berri Wilhelm; Diego Raphael Amancio |
Abstract: | This study examines the effects of deglobalization trends on international trade networks and their role in improving forecasts for economic growth. Using section-level trade data from more than 200 countries from 2010 to 2022, we identify significant shifts in the network topology driven by rising trade policy uncertainty. Our analysis highlights key global players through centrality rankings, with the United States, China, and Germany maintaining consistent dominance. Using a horse race of supervised regressors, we find that network topology descriptors evaluated from section-specific trade networks substantially enhance the quality of a country's economic growth forecast. We also find that non-linear models, such as Random Forest, eXtreme Gradient Boosting, and Light Gradient Boosting Machine, outperform traditional linear models used in the economics literature. Using SHapley Additive exPlanations values to interpret these non-linear models' predictions, we find that about half of the most important features originate from the network descriptors, underscoring their vital role in refining forecasts. Moreover, this study emphasizes the significance of recent economic performance, population growth, and the primary sector's influence in shaping economic growth predictions, offering novel insights into the intricacies of economic growth forecasting. |
Date: | 2024–08 |
URL: | https://d.repec.org/n?u=RePEc:bcb:wpaper:597 |
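A sketch of the feature-construction step implied above, with networkx and scikit-learn: compute centrality descriptors on a weighted, directed trade network and stack them with macro covariates as inputs to a boosted regressor. The edges, covariates, and targets below are placeholders.

```python
# Assumed data shapes: network descriptors plus macro covariates feed a regressor.
import networkx as nx
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

edges = [("USA", "CHN", 120.0), ("CHN", "DEU", 80.0), ("DEU", "USA", 95.0)]
G = nx.DiGraph()
G.add_weighted_edges_from(edges)                    # exports weighted by trade value

pagerank = nx.pagerank(G, weight="weight")          # centrality descriptors
in_strength = dict(G.in_degree(weight="weight"))

countries = list(G.nodes)
macro = np.random.rand(len(countries), 3)           # placeholder macro covariates
X = np.column_stack([
    [pagerank[c] for c in countries],
    [in_strength[c] for c in countries],
    macro,
])
growth = np.random.rand(len(countries))             # placeholder GDP growth targets
model = GradientBoostingRegressor().fit(X, growth)
```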
By: | Ugo Bolletta; Laurens Cherchye; Thomas Demuynck; Bram De Rock; Luca Paolo Merlino |
Abstract: | We propose a method to identify individuals’ marriage markets assuming that observed marriages are stable. We aim to learn about (the relative importance of) the individual’s observable characteristics defining these markets. First, we use a nonparametric revealed preference approach to construct inner and outer bound approximations of these markets from observed marriages. We then use machine learning to estimate a robust boundary between them (as a linear function of individual characteristics). We demonstrate the usefulness of our method using Dutch household data and quantify the trade-off between the characteristics, such as age, education, and wages, that define individuals’ marriage markets. |
Keywords: | marriage market, identification, revealed preferences, machine learning, support vector machine (SVM). |
Date: | 2024–07 |
URL: | https://d.repec.org/n?u=RePEc:eca:wpaper:2013/376857 |
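A minimal sketch of the final step, on synthetic data: fit a linear SVM that separates points inside an individual's marriage market (inner bound) from points outside it (outer bound); the fitted coefficients then quantify the relative weight of each characteristic.

```python
# Synthetic illustration of the SVM boundary between inner- and outer-bound points.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
inner = rng.normal(0.0, 1.0, size=(50, 3))    # (age gap, education gap, wage gap)
outer = rng.normal(3.0, 1.0, size=(50, 3))

X = np.vstack([inner, outer])
y = np.array([1] * 50 + [0] * 50)             # 1 = inside the market, 0 = outside

svm = LinearSVC(C=1.0).fit(X, y)
# svm.coef_ gives the relative weight of each characteristic in the boundary
```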
By: | Michel Alexandre; Thiago Christiano Silva; Francisco Aparecido Rodrigues |
Abstract: | In this study, we propose a method for the identification of influential edges in financial networks. In our approach, the critical edges are those whose removal would cause a large impact on the systemic risk of the financial network. We apply this framework to a comprehensive Brazilian data set to identify critical bank-firm edges. In our data set, banks and firms are connected through two financial networks: the interbank network and the bank-firm loan network. We find that at least 18% of the edges are critical, in the sense that they have a significant impact on the systemic risk of the network. We then employ machine learning (ML) techniques to predict the critical status and, for a large initial shock, the sign of the impact of bank-firm edges on the systemic risk. The level of accuracy obtained in these prediction exercises is very high (above 90%). Posterior analysis through Shapley values shows: i) the PageRank of the edge’s destination node (the firm) is the main driver of the critical status of the edges; and ii) the sign of the edges’ impact depends on the degree of the edge’s origin node (the bank). |
Date: | 2024–08 |
URL: | https://d.repec.org/n?u=RePEc:bcb:wpaper:594 |
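The edge-criticality idea can be sketched as a removal loop; the systemic-risk function below is a crude stand-in (a PageRank concentration proxy), not the paper's measure, and the network is randomly generated.

```python
# Illustrative edge-removal loop with a placeholder systemic-risk proxy.
import networkx as nx

G = nx.gnm_random_graph(30, 80, seed=1, directed=True)   # placeholder network

def systemic_risk(graph):
    # Stand-in proxy: total PageRank mass held by the top decile of nodes.
    pr = nx.pagerank(graph)
    top = sorted(pr.values(), reverse=True)[: max(1, len(pr) // 10)]
    return sum(top)

base = systemic_risk(G)
impact = {}
for u, v in list(G.edges):
    G.remove_edge(u, v)
    impact[(u, v)] = systemic_risk(G) - base    # signed change in risk
    G.add_edge(u, v)

critical = [e for e, d in impact.items() if abs(d) > 1e-3]
```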
By: | Kishore Mullangi (Visa Inc.) |
Abstract: | This study uses accelerated testing and modern technology to improve the security and efficiency of payment processing systems. The primary goals are to identify and evaluate advances in blockchain, AI, machine learning, and biometric authentication in terms of protection and performance. The study uses secondary data to demonstrate the revolutionary power of these technologies and the importance of automated, continuous, AI-driven testing. The main findings are that blockchain provides secure, decentralized transactions; AI and ML improve real-time fraud detection; and biometric authentication reduces unauthorized access. Faster testing methods identify and fix vulnerabilities, ensuring system integrity and meeting changing regulatory demands. The study emphasizes the need for constant monitoring and investment in advanced testing technologies in the face of challenges around cybersecurity threats, regulatory compliance, interoperability, scalability, and user experience. The policy implications are that integrating these technologies and tackling the associated problems can considerably improve the resilience and reliability of payment processing systems, ensuring a secure and seamless user experience in a digital financial ecosystem. |
Keywords: | Payment Processing, Blockchain, Accelerated Testing, Security Enhancement, Financial Transactions, Risk Management, Fraud Prevention, Machine Learning, Automation |
Date: | 2023–07–27 |
URL: | https://d.repec.org/n?u=RePEc:hal:journl:hal-04647281 |
By: | Klatt, Nikolina |
Abstract: | How do judicial decisions influence political discourse, particularly in areas as contentious as abortion rights? This study investigates how the overturning of Roe v. Wade affected the narrative strategies of U.S. representatives on social media, focusing on variations by party affiliation and geography. While there is literature on the influence of judicial decisions on public opinion and policy, the effect on political narratives remains underexplored. To address this gap, the study analyzes 5,293 tweets from U.S. representatives in 2022 using supervised text classification and statistical modeling to identify shifts in narrative strategies. The study found that the leaked opinion draft acted as a catalyst, prompting an increase in stories of decline (narratives that emphasize a worsening situation), particularly among Republicans. This study provides empirical evidence of how political narratives evolve in response to landmark judicial changes and insights into the strategic use of narratives by political actors in digital communication. |
Date: | 2024 |
URL: | https://d.repec.org/n?u=RePEc:zbw:wzbtod:301155 |
By: | Cordier, J.; Geissler, A.; Vogel, J. |
Abstract: | This study addresses the challenges of using high-dimensional claims data, typically represented by categorical features, for prediction tasks. Traditional one-hot encoding methods lead to computational inefficiencies and sparse data issues. To overcome these challenges, we propose using entity embedding, a technique that has shown promise in natural language processing, to transform categorical claims data into dense, low-dimensional vectors as input for downstream prediction tasks. Our study focuses on predicting hospitalizations for patients with Chronic Obstructive Pulmonary Disease using the Word2Vec Continuous Bag-of-Words model. Our findings indicate that entity embedding enhances model performance, achieving an AUC of 0.92 compared to 0.91 with one-hot encoding, and improves specificity from 0.55 to 0.60 for a recall of 0.95. Additionally, entity embedding significantly reduces required computation power. These results suggest that entity embedding not only captures the dynamics of medical events more effectively but also enhances the efficiency of training predictive models, making it a valuable tool for healthcare and insurance analytics. |
Date: | 2024–08 |
URL: | https://d.repec.org/n?u=RePEc:yor:hectdg:24/09 |
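A sketch of the entity-embedding step with gensim: each patient's chronological claim codes are treated as a sentence and embedded with CBOW (sg=0); the codes and dimensions here are assumptions.

```python
# Entity embedding of claim codes via Word2Vec CBOW; codes are hypothetical.
import numpy as np
from gensim.models import Word2Vec

patients = [
    ["J44.9", "R05", "J44.1", "Z99.81"],     # hypothetical COPD claim histories
    ["J44.9", "J96.01", "Z99.81"],
]
w2v = Word2Vec(sentences=patients, vector_size=64, window=5, min_count=1, sg=0)

def embed(history):
    # One dense patient vector: average of embedded claim codes,
    # replacing a sparse one-hot representation in the downstream model.
    return np.mean([w2v.wv[c] for c in history], axis=0)

x = embed(patients[0])
```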
By: | Andras Komaromi; Xiaomin Wu; Ran Pan; Yang Liu; Pablo Cisneros; Anchal Manocha; Hiba El Oirghi |
Abstract: | The International Monetary Fund (IMF) has expanded its online learning program, offering over 100 Massive Open Online Courses (MOOCs) to support economic and financial policymaking worldwide. This paper explores the application of Artificial Intelligence (AI), specifically Large Language Models (LLMs), to analyze qualitative feedback from participants in these courses. By fine-tuning a pre-trained LLM on expert-annotated text data, we develop models that efficiently classify open-ended survey responses with accuracy comparable to human coders. The models’ robust performance across multiple languages, including English, French, and Spanish, demonstrates their versatility. Key insights from the analysis include a preference for shorter, modular content, with variations across genders, and the significant impact of language barriers on learning outcomes. These and other findings from unstructured learner feedback inform the continuous improvement of the IMF's online courses, aligning with its capacity development goals to enhance economic and financial expertise globally. |
Date: | 2024–08–02 |
URL: | https://d.repec.org/n?u=RePEc:imf:imfwpa:2024/166 |
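An assumed recipe for the classification step (the paper does not publish code here): fine-tune a multilingual encoder on expert-annotated responses with Hugging Face transformers. The base model, labels, and examples below are illustrative.

```python
# Hypothetical fine-tuning setup for classifying open-ended survey responses.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import Dataset

model_id = "xlm-roberta-base"                       # assumed multilingual base
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=4)

data = Dataset.from_dict({
    "text": ["Shorter videos would help", "Les quiz étaient trop longs"],
    "label": [0, 1],                                # expert-coded feedback themes
}).map(lambda b: tok(b["text"], truncation=True, padding="max_length",
                     max_length=64), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3),
    train_dataset=data,
)
trainer.train()
```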
By: | Jianqing Fan; Weining Wang; Yue Zhao |
Abstract: | High-dimensional covariates often admit a linear factor structure. To effectively screen correlated covariates in high dimensions, we propose a conditional variable screening test based on non-parametric regression using neural networks, chosen for their representation power. We ask whether individual covariates have additional contributions given the latent factors or, more generally, a set of variables. Our test statistics are based on the estimated partial derivative of the regression function with respect to the candidate variable for screening, given an observable proxy for the latent factors. Hence, our test reveals how much predictors contribute additionally to the non-parametric regression after accounting for the latent factors. Our derivative estimator is the convolution of a deep neural network regression estimator and a smoothing kernel. We demonstrate that when the neural network size diverges with the sample size, unlike estimating the regression function itself, it is necessary to smooth the partial derivative of the neural network estimator to recover the desired convergence rate for the derivative. Moreover, our screening test achieves asymptotic normality under the null after finely centering our test statistics, which makes the biases negligible, as well as consistency for local alternatives under mild conditions. We demonstrate the performance of our test in a simulation study and two real-world applications. |
Date: | 2024–08–21 |
URL: | https://d.repec.org/n?u=RePEc:azt:cemmap:17/24 |
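In generic notation (assumed here, not the paper's), the smoothed-derivative construction reads:

```latex
% Kernel-smoothed partial derivative of the neural-network regression estimate:
% convolving the raw derivative of \hat{m}_{NN} with a kernel K_h restores the
% convergence rate needed for the screening statistic.
\widehat{\partial_j m}(x) \;=\; \int K_h(u - x)\, \partial_j \hat{m}_{\mathrm{NN}}(u)\, \mathrm{d}u
```

That is, rather than differentiating the raw network fit, the estimator averages the network's derivative locally with a bandwidth-h kernel, which is the smoothing step the abstract argues is necessary when the network size grows with the sample.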
By: | Julian Ashwin; Paul Beaudry; Martin Ellison |
Abstract: | Neural networks offer a promising tool for the analysis of nonlinear economies. In this paper, we derive conditions for the global stability of nonlinear rational expectations equilibria under neural network learning. We demonstrate the applicability of the conditions in analytical and numerical examples where the nonlinearity is caused by monetary policy targeting a range, rather than a specific value, of inflation. If shock persistence is high or there is inertia in the structure of the economy, then the only rational expectations equilibria that are learnable may involve inflation spending long periods outside its target range. Neural network learning is also useful for solving and selecting between multiple equilibria and steady states in other settings, such as when there is a zero lower bound on the nominal interest rate. |
JEL: | C45 E19 E47 |
Date: | 2024–08 |
URL: | https://d.repec.org/n?u=RePEc:nbr:nberwo:32807 |
By: | Margherita Borella; Francisco A. Bullano; Mariacristina De Nardi; Benjamin Krueger; Elena Manresa |
Abstract: | While health affects many economic outcomes, its dynamics are still poorly understood. We use k-means clustering, a machine learning technique, and data from the Health and Retirement Study to identify health types during middle and old age. We identify five health types: the vigorous resilient, the fair-health resilient, the fair-health vulnerable, the frail resilient, and the frail vulnerable. They are characterized by different initial health and different health and mortality trajectories. Our five health types account for 84% of the variation in health trajectories and are not explained by observable characteristics, such as age, marital status, education, gender, race, health-related behaviors, and health insurance status, but rather by one’s past health dynamics. We also show that health types are important drivers of health and mortality heterogeneity and dynamics. Our results underscore the importance of better understanding health type formation and of modeling it appropriately to properly evaluate the effects of health on people’s decisions and the implications of policy reforms. |
JEL: | I1 |
Date: | 2024–08 |
URL: | https://d.repec.org/n?u=RePEc:nbr:nberwo:32799 |
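The clustering step can be sketched in a few lines of scikit-learn; the trajectory matrix below is a random placeholder rather than HRS data, and the last line computes the between-cluster share of variation, analogous to the 84% reported above.

```python
# k-means clustering of health trajectories into five types (placeholder data).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
trajectories = rng.normal(size=(1000, 12))   # e.g., health index at 12 survey waves

km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(trajectories)
health_type = km.labels_                     # one of five types per person

# Share of trajectory variation explained by cluster membership
explained = 1 - km.inertia_ / np.sum((trajectories - trajectories.mean(0)) ** 2)
```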
By: | Brian Jabarian |
Abstract: | In this article, we explore the transformative potential of integrating generative AI, particularly Large Language Models (LLMs), into behavioral and experimental economics to enhance internal validity. By leveraging AI tools, researchers can improve adherence to key exclusion restrictions and, in particular, ensure the internal validity of measures of mental models, which often require human intervention in the incentive mechanism. We present a case study demonstrating how LLMs can enhance experimental design, participant engagement, and the validity of measuring mental models. |
Date: | 2024–06 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2407.12032 |
By: | Breen, Casey; Fatehkia, Masoomali; Yan, Jiani; Zhao, Xinyi; Leasure, Douglas R. (Leverhulme Centre for Demographic Science, University of Oxford); Weber, Ingmar (Qatar Computing Research Institute); Kashyap, Ridhi |
Abstract: | The digital revolution has ushered in many societal and economic benefits. Yet access to digital technologies such as mobile phones and the internet remains highly unequal, especially by gender in the context of low- and middle-income countries. While national-level estimates are increasingly available for many countries, reliable, quantitative estimates of digital gender inequalities at the subnational level are lacking. These estimates, however, are essential for monitoring gaps within countries and implementing targeted interventions within the global sustainable development goals, which emphasize the need to close inequalities both between and within countries. We develop estimates of internet and mobile adoption by gender and digital gender gaps at the subnational level for 2,158 regions in 118 low- and middle-income countries (LMICs), a context where digital penetration is low and national-level gender gaps disfavoring women are large. We construct these estimates by applying machine-learning algorithms to Facebook user counts, geospatial data, development indicators, and population composition data. We calibrate and assess the performance of these algorithms using ground-truth data from subnationally-representative household survey data from 31 LMICs. Our results reveal striking disparities in access to mobile and internet technologies between and within LMICs, with implications for policy formulation and infrastructure investment. These disparities contribute to a global context where women are 21% less likely to use the internet and 17% less likely to own mobile phones than men, corresponding to over 385 million more men than women owning a mobile phone and over 360 million more men than women using the internet. |
Date: | 2024–08–15 |
URL: | https://d.repec.org/n?u=RePEc:osf:socarx:qnzsw |
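A hedged sketch of the pipeline shape: train a regressor on covariates for surveyed regions, validate against survey ground truth, and predict for unsurveyed regions. Features and targets below are synthetic placeholders.

```python
# Assumed pipeline shape: predict digital gender gaps where surveys are missing.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_surveyed = rng.random((300, 8))      # FB audience counts, nightlights, dev indices
gap_surveyed = rng.random(300)         # ground-truth female/male internet-use ratio

model = GradientBoostingRegressor()
r2 = cross_val_score(model, X_surveyed, gap_surveyed, cv=5, scoring="r2")

model.fit(X_surveyed, gap_surveyed)
X_unsurveyed = rng.random((1858, 8))   # regions lacking survey ground truth
gap_hat = model.predict(X_unsurveyed)  # subnational estimates for those regions
```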
By: | Juan Imbet (DRM - Dauphine Recherches en Management - Université Paris Dauphine-PSL - PSL - Université Paris Sciences et Lettres - CNRS - Centre National de la Recherche Scientifique); J. Anthony Cookson (Leeds School of Business [Boulder] - University of Colorado [Boulder]); Corbin Fox; Christoph Schiller; Javier Gil-Bazo |
Abstract: | Social media fueled a bank run on Silicon Valley Bank (SVB), and the effects were felt broadly in the U.S. banking industry. We employ comprehensive Twitter data to show that preexisting exposure to social media predicts bank stock market losses in the run period even after controlling for bank characteristics related to run risk (i.e., mark-to-market losses and uninsured deposits). Moreover, we show that social media amplifies these bank run risk factors. During the run period, we find the intensity of Twitter conversation about a bank predicts stock market losses at the hourly frequency. This effect is stronger for banks with bank run risk factors. At even higher frequency, tweets in the run period with negative sentiment translate into immediate stock market losses. These high-frequency effects are stronger when tweets are authored by members of the Twitter startup community (who are likely depositors) and contain keywords related to contagion. These results are consistent with depositors using Twitter to communicate in real time during the bank run. |
Keywords: | Bank Runs, Social Media, Social Finance, FinTech |
Date: | 2024 |
URL: | https://d.repec.org/n?u=RePEc:hal:journl:hal-04660083 |
By: | Dhammika Dharmapala; Aziz Huq |
Abstract: | U.S. law requires the Attorney General to collect data on hate crime victimization from states and municipalities, but states and localities are under no obligation to cooperate by gathering or sharing information. Data production hence varies considerably across jurisdictions. This paper addresses the ensuing “missing data” problem by imputing unreported hate crimes using Google search rates for a racial epithet. As a benchmark of accurate hate crime data, it uses two alternative definitions of which jurisdictions more effectively collect hate crime data: all states that were not part of the erstwhile Confederacy, and those states with statutory provisions relating to hate crime reporting. We regress rates of racially-motivated hate crimes with African-American victims on Google searches and other relevant variables over 2004-2015 at the state-year level for each group of benchmark states. Adding the Google search rate for the epithet substantially enhances the capacity of such models to predict hate crime rates among benchmark states. We use the results of these regressions to impute hate crime rates, out-of-sample, to non-benchmark jurisdictions that do not robustly report hate crimes. The results imply a substantial number of unreported hate crimes, concentrated in particular jurisdictions. The analysis also illustrates how internet search rates can be a source of data on attitudes that are otherwise hard to measure. |
Keywords: | hate crimes, victimization, internet search, crime reporting |
JEL: | K42 |
Date: | 2024 |
URL: | https://d.repec.org/n?u=RePEc:ces:ceswps:_11245 |
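The imputation logic can be sketched with statsmodels (variable names are assumptions): fit the model on benchmark states that report reliably, then predict out-of-sample for the rest.

```python
# Benchmark-state regression and out-of-sample imputation (hypothetical panel).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({                        # hypothetical state-year observations
    "hate_crime_rate": [3.2, 1.1, None, None],
    "search_rate": [0.8, 0.3, 0.9, 0.5],   # Google searches for the epithet
    "benchmark": [True, True, False, False],
})

fit = smf.ols("hate_crime_rate ~ search_rate", data=df[df.benchmark]).fit()
df.loc[~df.benchmark, "imputed_rate"] = fit.predict(df[~df.benchmark])
```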
By: | Marín Llanes, Lucas (Universidad de los Andes); Fernández Sierra, Manuel (Universidad de los Andes); Vélez Lesmes, María Alejandra (Universidad de los Andes); Martínez González, Eduard (Superintendencia de Economía Solidaria); Murillo Sandoval, Paulo (Universidad del Tolima) |
Abstract: | This study investigates the socio-economic effects of Colombia’s recent coca cultivation boom, exploiting municipal variations in production incentives following the 2014 announcement of the coca crop substitution program. Using a difference-in-differences strategy with satellite-derived night-time light data as a proxy for economic activity, we find that a one standard deviation increase in coca crops resulted in a 2.5% to 3.1% increase in municipality-level GDP. We also estimate local GDP multipliers, showing that each additional dollar from coca leaf and coca base sales raises GDP by $1.17 to $2.30 and $0.86 to $1.63, respectively. Although the coca boom did not significantly affect local fiscal revenues, violence indicators, or land used for agricultural production, it had substantial environmental impacts, with deforestation rates increasing by 104% and a 302% rise in land conversion from coca cultivation to cattle pastures in the Colombian Amazon. Our findings underscore the significance of illicit economies in providing short-term economic gains and acting as catalysts for economic activity. |
Keywords: | Illicit Economies; Economic growth; Coca Cultivation; Deforestation; Colombia |
JEL: | K42 O13 O17 Q34 Q56 |
Date: | 2024–08–20 |
URL: | https://d.repec.org/n?u=RePEc:col:000089:021186 |
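A sketch of the difference-in-differences specification with statsmodels; the panel and column names are hypothetical, and standard errors are clustered by municipality.

```python
# DiD sketch: night-lights proxy on coca exposure x post-2014, with fixed effects.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({                      # hypothetical municipality-year panel
    "municipality": ["A", "A", "B", "B"] * 3,
    "year": [2012, 2016] * 6,
    "coca_intensity": [0.0, 0.0, 1.2, 1.2] * 3,
    "log_lights": [1.0, 1.1, 1.0, 1.4] * 3,
})
df["post"] = (df["year"] >= 2014).astype(int)

did = smf.ols(
    "log_lights ~ coca_intensity:post + C(municipality) + C(year)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["municipality"]})
print(did.params["coca_intensity:post"])   # the DiD coefficient of interest
```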
By: | Nils Wernerfelt; Anna Tuchman; Bradley Shapiro; Robert Moakler |
Abstract: | Third-party cookies and related ‘offsite’ tracking technologies are frequently used to share user data across applications in support of ad delivery. These data are viewed as highly valuable for online advertisers, but their usage faces increasing headwinds. In this paper, we quantify the benefit to advertisers from using such offsite tracking data in their ad delivery. With this goal in mind, we conduct a large-scale, randomized experiment that includes more than 70,000 advertisers on Facebook and Instagram. We first estimate advertising effectiveness at baseline across our broad sample. We then estimate the change in effectiveness of the same campaigns were advertisers to lose the ability to optimize ad delivery with offsite data. In each of these cases, we use recently developed deconvolution techniques to flexibly estimate the underlying distribution of effects. We find a median cost per incremental customer at baseline of $38.16 that under the median loss in effectiveness would rise to $49.93, a 31% increase. Further, we find ads targeted using offsite data generate more long-term customers per dollar than those without, and losing offsite data disproportionately hurts small scale advertisers. Taken together, our results suggest that offsite data bring large benefits to a wide range of advertisers. |
JEL: | L15 L40 L49 L59 M30 M31 M37 M38 |
Date: | 2024–08 |
URL: | https://d.repec.org/n?u=RePEc:nbr:nberwo:32765 |
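The deconvolution step can be sketched generically: treat each advertiser's estimated effect as the true effect plus Gaussian noise of known standard error, and recover the effect distribution on a grid by EM. This is a generic estimator for illustration, not necessarily the authors' exact technique.

```python
# Grid-based deconvolution of a distribution of effects from noisy estimates.
import numpy as np
from scipy.stats import norm

est = np.random.normal(0.2, 0.5, 5000)       # per-advertiser estimates (placeholder)
se = np.full(5000, 0.4)                      # known sampling standard errors

grid = np.linspace(-1.5, 2.0, 60)            # support for the latent effects
w = np.full(len(grid), 1 / len(grid))        # mixture weights to estimate

lik = norm.pdf(est[:, None], loc=grid[None, :], scale=se[:, None])  # (n, K)
for _ in range(200):                         # EM updates on the weights
    resp = lik * w                           # E-step: unnormalized responsibilities
    resp /= resp.sum(axis=1, keepdims=True)
    w = resp.mean(axis=0)                    # M-step: reweight the grid
# (grid, w) approximates the distribution of true effects across advertisers
```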