nep-big New Economics Papers
on Big Data
Issue of 2021‒03‒22
twenty-two papers chosen by
Tom Coupé
University of Canterbury

  1. Artificial Intelligence and Big Data in Sustainable Entrepreneurship By Steve J. Bickley; Alison Macintyre; Benno Torgler
  2. Forecasting commodity prices using long-short-term memory neural networks By Ly, Racine; Traore, Fousseini; Dia, Khadim
  3. DeepSets and their derivative networks for solving symmetric PDEs * By Maximilien Germain; Mathieu Laurière; Huyên Pham; Xavier Warin
  4. A Survey of Forex and Stock Price Prediction Using Deep Learning By Zexin Hu; Yiqi Zhao; Matloob Khushi
  5. Predicting the Behavior of Dealers in Over-The-Counter Corporate Bond Markets By Yusen Lin; Jinming Xue; Louiqa Raschid
  6. A Neural Network Ensemble Approach for GDP Forecasting By Luigi Longo; Massimo Riccaboni; Armando Rungi
  7. Artificial Intelligence and Energy Intensity in China’s Industrial Sector: Effect and Transmission Channel By Liu, Liang; Yang, Kun; Fujii, Hidemichi; Liu, Jun
  8. Gender Distribution across Topics in Top 5 Economics Journals: A Machine Learning Approach By J. Ignacio Conde-Ruiz; Juan José Ganuza; Manu García; Luis A. Puch
  9. The Gender Pay Gap Revisited with Big Data: Do Methodological Choices Matter? By Anthony Strittmatter; Conny Wunsch
  10. DoubleML -- An Object-Oriented Implementation of Double Machine Learning in R By Philipp Bach; Victor Chernozhukov; Malte S. Kurz; Martin Spindler
  11. Do Word Embeddings Really Understand Loughran-McDonald's Polarities? By Mengda Li; Charles-Albert Lehalle
  12. Prediction of financial time series using LSTM and data denoising methods By Qi Tang; Tongmei Fan; Ruchen Shi; Jingyan Huang; Yidan Ma
  13. Coordinating Human and Machine Learning for Effective Organizational Learning By Sturm, Timo; Gerlach, Jin; Pumplun, Luisa; Mesbah, Neda; Peters, Felix; Tauchert, Christoph; Nan, Ning; Buxmann, Peter
  14. Causal Reinforcement Learning: An Instrumental Variable Approach By Jin Li; Ye Luo; Xiaowei Zhang
  15. The impact of online machine-learning methods on long-term investment decisions and generator utilization in electricity markets By Alexander J. M. Kell; A. Stephen McGough; Matthew Forshaw
  16. Autocalibration and Tweedie-dominance for Insurance Pricing with Machine Learning By Michel Denuit; Arthur Charpentier; Julien Trufin
  17. The Social Dilemma of Big Data: Donating Personal Data to Promote Social Welfare By Kirsten Hillebrand; Lars Hornuf
  18. Statistical Arbitrage Risk Premium by Machine Learning By Raymond C. W. Leung; Yu-Man Tam
  19. A Novel Data Governance Scheme Based on the Behavioral Economics Theory By Hou, Bohan
  20. Optimal Targeting in Fundraising: A Machine Learning Approach By Tobias Cagala; Ulrich Glogowsky; Johannes Rincke; Anthony Strittmatter
  21. Backcasting, Nowcasting, and Forecasting Residential Repeat-Sales Returns: Big Data meets Mixed Frequency By Matteo Garzoli; Alberto Plazzi; Rossen I. Valkanov
  22. Application of Legal Instruments of Protection in the Field of Personal Data – Human Rights between Challenges and Limits By Madalina Botina; Marilena Marin

  1. By: Steve J. Bickley; Alison Macintyre; Benno Torgler
    Abstract: The recent acceleration and ongoing development in Artificial Intelligence (AI) and its related (and/or enabling) digital technologies present new challenges and considerable opportunities on which businesses and individuals may capitalise. In the era of Big Data (BD) – and with increasing societal value being placed on sustainable business to minimise or mitigate the impacts of climate change – customers and regulators alike are turning to organizations to tackle large and complex sustainable development goals. AI and BD can help interpret and monitor the environment, identify which problems need attention, design strategies, generate decisions, and action the tactics. A key challenge in sustainable entrepreneurship is the failure to extend ‘systems thinking’ beyond a limited number of issues and to take the time to understand the relationships between business processes, macro-ecological processes, boundary conditions, and tipping points. The recent and substantial increase in data availability simultaneously advances the potential for AI and BD to enhance ecological sustainability through validation and testing of beliefs and hunches, offering empirical guidance at every stage of decision making, and comparing inputs against outcomes – particularly in fast-changing and highly uncertain environments. To prepare, we must strategize by looking to and engaging with the market, our clients, and our customers for guidance. Only then can we proceed to develop viable and sustainable business models and plans. To reap the rewards of progress in AI, BD, and related technologies, we need to find ways to race with the emerging technologies while also identifying ways to act in symbiosis with them. The demands of adapting to AI and BD are no different from those posed by past disruptive technologies such as the automobile, radio, and the Internet.
    Keywords: Artificial Intelligence; Big Data; Entrepreneurship; Sustainability; Expert Systems
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:cra:wpaper:2021-11&r=all
  2. By: Ly, Racine; Traore, Fousseini; Dia, Khadim
    Abstract: This paper applies a recurrent neural network (RNN) method to forecast cotton and oil prices. We show how these new tools from machine learning, particularly Long-Short Term Memory (LSTM) models, complement traditional methods. Our results show that machine learning methods fit the data reasonably well but do not systematically outperform classical methods such as the Autoregressive Integrated Moving Average (ARIMA) or naïve models in terms of out-of-sample forecasts. However, averaging the forecasts from the two types of models provides better results than either method alone. Compared to the ARIMA and the LSTM, the Root Mean Squared Error (RMSE) of the average forecast was 0.21 and 21.49 percent lower, respectively, for cotton. For oil, forecast averaging does not improve the RMSE. We suggest using a forecast averaging method and extending our analysis to a wide range of commodity prices.
    Keywords: WORLD; forecasting; models; prices; commodities; machine learning; neural networks; cotton; oils; Recurrent Neural networks; LSTM; commodity prices; Long-Short Term Memory; Autoregressive Integrated Moving Average (ARIMA)
    Date: 2021
    URL: http://d.repec.org/n?u=RePEc:fpr:ifprid:2000&r=all
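    A minimal sketch of the forecast-averaging step described above, assuming NumPy is available: combine out-of-sample forecasts from two models with equal weights and compare root mean squared errors. The forecast values are made-up placeholders standing in for ARIMA and LSTM predictions, not the paper's data.
      import numpy as np

      def rmse(y_true, y_pred):
          return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

      # hypothetical out-of-sample price forecasts (placeholders, not the paper's data)
      y_true  = np.array([1.52, 1.55, 1.60, 1.58, 1.63])
      f_arima = np.array([1.50, 1.56, 1.57, 1.60, 1.61])
      f_lstm  = np.array([1.55, 1.52, 1.62, 1.55, 1.66])
      f_avg   = 0.5 * (f_arima + f_lstm)   # equal-weight forecast combination

      for name, f in [("ARIMA", f_arima), ("LSTM", f_lstm), ("average", f_avg)]:
          print(f"{name:8s} RMSE = {rmse(y_true, f):.4f}")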
  3. By: Maximilien Germain (EDF - EDF, LPSM (UMR_8001) - Laboratoire de Probabilités, Statistiques et Modélisations - SU - Sorbonne Université - CNRS - Centre National de la Recherche Scientifique - UP - Université de Paris); Mathieu Laurière (ORFE - Department of Operations Research and Financial Engineering - Princeton University, School of Engineering and Applied Science); Huyên Pham (LPSM (UMR_8001) - Laboratoire de Probabilités, Statistiques et Modélisations - UPD7 - Université Paris Diderot - Paris 7 - SU - Sorbonne Université - CNRS - Centre National de la Recherche Scientifique, FiME Lab - Laboratoire de Finance des Marchés d'Energie - Université Paris Dauphine-PSL - PSL - Université Paris sciences et lettres - CREST - EDF R&D - EDF R&D - EDF - EDF); Xavier Warin (EDF - EDF, FiME Lab - Laboratoire de Finance des Marchés d'Energie - Université Paris Dauphine-PSL - PSL - Université Paris sciences et lettres - CREST - EDF R&D - EDF R&D - EDF - EDF)
    Abstract: Machine learning methods for solving nonlinear partial differential equations (PDEs) are a highly topical area, and various algorithms proposed in the literature achieve efficient numerical approximation in high dimension. In this paper, we introduce a class of PDEs that are invariant to permutations, which we call symmetric PDEs. Such problems are widespread, ranging from cosmology to quantum mechanics, and include option pricing/hedging in multi-asset markets with exchangeable payoffs. Our main application comes from the particle approximation of mean-field control problems. We design deep learning algorithms based on specific types of neural networks, named PointNet and DeepSet (and their associated derivative networks), for simultaneously computing an approximation of the solution to symmetric PDEs and its gradient. We illustrate the performance and accuracy of the PointNet/DeepSet networks compared to classical feedforward ones, and provide several numerical results of our algorithm for the examples of a mean-field systemic risk problem, a mean-variance problem, and a min/max linear-quadratic McKean-Vlasov control problem.
    Keywords: Permutation-invariant PDEs, symmetric neural networks, exchangeability, deep backward scheme, mean-field control
    Date: 2021–02–27
    URL: http://d.repec.org/n?u=RePEc:hal:wpaper:hal-03154116&r=all
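    A minimal sketch of the permutation-invariance idea behind DeepSets, assuming PyTorch is installed: a shared network phi is applied to every particle, the per-particle outputs are averaged (a symmetric operation), and a network rho maps the aggregate to the output. The layer sizes and toy input are illustrative and are not the architecture used in the paper.
      import torch
      import torch.nn as nn

      class DeepSet(nn.Module):
          def __init__(self, in_dim=1, hidden=32, out_dim=1):
              super().__init__()
              self.phi = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(),
                                       nn.Linear(hidden, hidden), nn.Tanh())
              self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.Tanh(),
                                       nn.Linear(hidden, out_dim))

          def forward(self, x):               # x: (batch, n_particles, in_dim)
              h = self.phi(x).mean(dim=1)     # symmetric aggregation over particles
              return self.rho(h)              # invariant to particle permutations

      net = DeepSet()
      x = torch.randn(4, 10, 1)
      perm = torch.randperm(10)
      assert torch.allclose(net(x), net(x[:, perm, :]), atol=1e-5)  # same output after permuting particles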
  4. By: Zexin Hu; Yiqi Zhao; Matloob Khushi
    Abstract: The prediction of stock and foreign exchange (Forex) prices has long been an active and profitable area of study. Deep learning applications have been shown to yield better accuracy and returns in financial prediction and forecasting. In this survey we selected papers from the DBLP database for comparison and analysis. We classified papers according to the deep learning method used, including: convolutional neural networks (CNN), long short-term memory (LSTM), deep neural networks (DNN), recurrent neural networks (RNN), reinforcement learning, and other deep learning methods such as HAN, NLP-based models, and WaveNet. Furthermore, this paper reviewed the dataset, variables, model, and results of each article. The survey presented the results through the most commonly used performance metrics: RMSE, MAPE, MAE, MSE, accuracy, Sharpe ratio, and return rate. We find that recent models combining LSTM with other methods, for example DNNs, are widely researched. Reinforcement learning and other deep learning methods yielded strong returns and performance. We conclude that the use of deep learning-based methods for financial modeling has been rising exponentially in recent years.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.09750&r=all
  5. By: Yusen Lin; Jinming Xue; Louiqa Raschid
    Abstract: Unlike trading on public exchanges, e.g., the New York Stock Exchange (NYSE), trading in Over-The-Counter (OTC) markets is facilitated by broker-dealers. Dealers play an important role in stabilizing prices and providing liquidity in OTC markets. We apply machine learning methods to model and predict the trading behavior of OTC dealers for US corporate bonds. We create sequences of daily historical transaction reports for each dealer over a vocabulary of US corporate bonds. Using this history of dealer activity, we predict the future trading decisions of the dealer. We consider a range of neural network-based prediction models. We propose an extension, the Pointwise-Product ReZero (PPRZ) Transformer model, and demonstrate the improved performance of our model. We show that individual history provides the best predictive model for the most active dealers. For less active dealers, a collective model provides improved performance. Further, clustering dealers based on their similarity can improve performance. Finally, prediction accuracy varies based on the activity level of both the bond and the dealer.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.09098&r=all
  6. By: Luigi Longo (IMT School for advanced studies); Massimo Riccaboni (IMT School for advanced studies); Armando Rungi (IMT School for advanced studies)
    Abstract: We propose an ensemble learning methodology to forecast the future US GDP growth release. Our approach combines a Recurrent Neural Network (RNN) with a Dynamic Factor model accounting for time-variation in mean with a Generalized Autoregressive Score (DFM-GAS). The analysis is based on a set of predictors encompassing a wide range of variables measured at different frequencies. The forecast exercise is aimed at evaluating the predictive ability of each component of the ensemble by considering variations in mean, potentially caused by recessions affecting the economy. Thus, we show how the combination of RNN and DFM-GAS improves forecasts of the US GDP growth rate in the aftermath of the 2008-09 global financial crisis. We find that a neural network ensemble markedly reduces the root mean squared error for the short-term forecast horizon.
    Keywords: macroeconomic forecasting; machine learning; neural networks; dynamic factor model; Covid-19 crisis
    JEL: C53 E37
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:ial:wpaper:2/2021&r=all
  7. By: Liu, Liang; Yang, Kun; Fujii, Hidemichi; Liu, Jun
    Abstract: The continued development of artificial intelligence (AI) has changed production methods but may also pose challenges related to energy consumption; in addition, the effectiveness of AI differs across industries. Thus, to develop efficient policies, it is necessary to discuss the effect of AI adoption on energy intensity and to identify industries that are more significantly affected. Using data on industrial robots installed in 16 Chinese industrial subsectors from 2006 to 2016, this paper investigates both the effect of AI on energy intensity and the channel through which this effect is transmitted. The empirical results show, first, that boosting applications of AI can significantly reduce energy intensity by both increasing output value and reducing energy consumption, especially for energy intensities at high quantiles. Second, compared with the impacts in capital-intensive sectors (e.g., basic metals, pharmaceuticals, and cosmetics), the negative impacts of AI on energy intensity in labor-intensive sectors (e.g., textiles and paper) and technology-intensive sectors (e.g., industrial machinery and transportation equipment) are more pronounced. Finally, the impact of AI on energy intensity is primarily achieved through its facilitation of technological progress; this accounts for 78.3% of the total effect. To reduce energy intensity, the Chinese government should effectively promote the development of AI and its use in industry, especially in labor-intensive and technology-intensive sectors.
    Keywords: artificial intelligence; energy intensity; energy consumption; industrial robot; China
    JEL: L6 O13 O14 O32
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:106333&r=all
  8. By: J. Ignacio Conde-Ruiz; Juan José Ganuza; Manu García; Luis A. Puch
    Abstract: We analyze all articles published in the Top 5 economics journals between 2002 and 2019 in order to identify gender differences in research approach. Using an unsupervised machine learning algorithm (the Structural Topic Model) developed by Roberts et al. (2019), we jointly characterize the set of latent topics that best fits our data (the set of abstracts) and how the documents/abstracts are allocated across latent topics. These latent topics are mixtures over words, where each word has a probability of belonging to a topic, after controlling for year and journal.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:fda:fdaddt:2021-07&r=all
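    The Structural Topic Model used in the paper is an R package; as a hedged stand-in, the sketch below fits a plain LDA topic model to a toy set of abstracts with scikit-learn to illustrate the underlying idea that each document is a mixture over latent topics. The corpus and the number of topics are purely illustrative.
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.decomposition import LatentDirichletAllocation

      abstracts = [
          "we estimate labor supply elasticities with panel data",
          "a general equilibrium model of monetary policy and inflation",
          "gender differences in bargaining and household decisions",
      ]  # toy corpus; the paper uses all Top 5 abstracts from 2002 to 2019

      X = CountVectorizer(stop_words="english").fit_transform(abstracts)
      lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
      doc_topics = lda.transform(X)     # each abstract as a mixture over topics
      print(doc_topics.round(2))        # rows sum to one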
  9. By: Anthony Strittmatter; Conny Wunsch
    Abstract: The vast majority of existing studies that estimate the average unexplained gender pay gap use unnecessarily restrictive linear versions of the Blinder-Oaxaca decomposition. Using a notably rich and large data set of 1.7 million employees in Switzerland, we investigate how the methodological improvements made possible by such big data affect estimates of the unexplained gender pay gap. We study the sensitivity of the estimates with regard to i) the availability of observationally comparable men and women, ii) model flexibility when controlling for wage determinants, and iii) the choice of different parametric and semi-parametric estimators, including variants that make use of machine learning methods. We find that these three factors matter greatly. Blinder-Oaxaca estimates of the unexplained gender pay gap decline by up to 39% when we enforce comparability between men and women and use a more flexible specification of the wage equation. Semi-parametric matching yields estimates that, when compared with the Blinder-Oaxaca estimates, are up to 50% smaller and also less sensitive to the way wage determinants are included.
    Keywords: gender inequality, gender pay gap, common support, model specification, matching estimator, machine learning
    JEL: J31 C21
    Date: 2021
    URL: http://d.repec.org/n?u=RePEc:ces:ceswps:_8912&r=all
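    A minimal sketch of the linear Blinder-Oaxaca decomposition that the paper takes as its starting point: the raw mean wage gap splits into an "explained" part (group differences in characteristics) and an "unexplained" part (group differences in coefficients). The data below are simulated purely for illustration and do not reflect the Swiss data set.
      import numpy as np

      rng = np.random.default_rng(0)
      n = 1000
      X_m = np.column_stack([np.ones(n), rng.normal(12, 2, n)])   # constant, years of education
      X_w = np.column_stack([np.ones(n), rng.normal(11, 2, n)])
      y_m = X_m @ np.array([1.0, 0.08]) + rng.normal(0, 0.1, n)   # simulated log wages, men
      y_w = X_w @ np.array([0.9, 0.07]) + rng.normal(0, 0.1, n)   # simulated log wages, women

      b_m, *_ = np.linalg.lstsq(X_m, y_m, rcond=None)             # OLS by group
      b_w, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)

      gap = y_m.mean() - y_w.mean()
      explained   = (X_m.mean(0) - X_w.mean(0)) @ b_m             # differences in characteristics
      unexplained = X_w.mean(0) @ (b_m - b_w)                     # differences in coefficients
      print(round(gap, 4), round(explained + unexplained, 4))     # identical by construction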
  10. By: Philipp Bach; Victor Chernozhukov; Malte S. Kurz; Martin Spindler
    Abstract: The R package DoubleML implements the double/debiased machine learning framework of Chernozhukov et al. (2018). It provides functionalities to estimate parameters in causal models based on machine learning methods. The double machine learning framework consists of three key ingredients: Neyman orthogonality, high-quality machine learning estimation, and sample splitting. Estimation of nuisance components can be performed by various state-of-the-art machine learning methods that are available in the mlr3 ecosystem. DoubleML makes it possible to perform inference in a variety of causal models, including partially linear and interactive regression models and their extensions to instrumental variable estimation. The object-oriented implementation of DoubleML enables high flexibility in model specification and makes it easily extensible. This paper serves as an introduction to the double machine learning framework and the R package DoubleML. In reproducible code examples with simulated and real data sets, we demonstrate how DoubleML users can perform valid inference based on machine learning methods.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.09603&r=all
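    A minimal sketch of the double/debiased machine learning idea in a partially linear model y = theta*d + g(x) + u, written with plain scikit-learn rather than the DoubleML package API: cross-fit machine learning estimates of E[y|x] and E[d|x], then regress the y-residual on the d-residual. The data-generating process and learners are illustrative assumptions.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.model_selection import KFold

      rng = np.random.default_rng(1)
      n = 2000
      x = rng.normal(size=(n, 5))
      d = np.sin(x[:, 0]) + rng.normal(scale=0.5, size=n)              # treatment
      y = 0.5 * d + np.cos(x[:, 1]) + rng.normal(scale=0.5, size=n)    # true theta = 0.5

      res_y, res_d = np.zeros(n), np.zeros(n)
      for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(x):
          ml_y = RandomForestRegressor().fit(x[train], y[train])       # nuisance E[y|x]
          ml_d = RandomForestRegressor().fit(x[train], d[train])       # nuisance E[d|x]
          res_y[test] = y[test] - ml_y.predict(x[test])                # partial out x from y
          res_d[test] = d[test] - ml_d.predict(x[test])                # partial out x from d

      theta_hat = (res_d @ res_y) / (res_d @ res_d)                    # Neyman-orthogonal score
      print(theta_hat)                                                 # should be close to 0.5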
  11. By: Mengda Li; Charles-Albert Lehalle
    Abstract: In this paper we perform a rigorous mathematical analysis of the word2vec model, especially when it is equipped with the Skip-gram learning scheme. Our goal is to explain how embeddings, which are now widely used in NLP (Natural Language Processing), are influenced by the distribution of terms in the documents of the considered corpus. We use a mathematical formulation to shed light on how the decision to use such a model makes implicit assumptions about the structure of the language. We show how Markovian assumptions, which we discuss, lead to a very clear theoretical understanding of the formation of embeddings, and in particular of the way they capture what we call frequentist synonyms. These assumptions allow us to produce generative models and to conduct an explicit analysis of the loss function commonly used by these NLP techniques. Moreover, we produce synthetic corpora with different levels of structure and show empirically whether the word2vec algorithm succeeds in learning them. This leads us to empirically assess the capability of such models to capture structure in a corpus of around 42 million financial news articles covering 12 years. To that end, we rely on the Loughran-McDonald Sentiment Word Lists widely used on financial texts, and we show that embeddings are prone to mixing terms with opposite polarity because of the way they can treat antonyms as frequentist synonyms. Besides, we study the non-stationarity of such a financial corpus, which has surprisingly not been documented in the literature. We do so via time series of cosine similarities between groups of polarized words or company names, and show that embeddings indeed capture a mix of English semantics and the joint distribution of words that is difficult to disentangle.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.09813&r=all
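    A minimal sketch, assuming gensim (4.x) is installed, of training a skip-gram word2vec model on a toy corpus and measuring the cosine similarity of two words of opposite polarity, in the spirit of the paper's point that antonyms appearing in near-identical contexts can be treated as "frequentist synonyms". The corpus and hyperparameters are illustrative.
      from gensim.models import Word2Vec

      sentences = [
          ["profits", "rose", "sharply", "beating", "expectations"],
          ["profits", "fell", "sharply", "missing", "expectations"],
          ["the", "company", "reported", "strong", "growth"],
          ["the", "company", "reported", "weak", "growth"],
      ] * 50   # toy corpus; the paper uses roughly 42 million financial news items

      model = Word2Vec(sentences, vector_size=20, window=2, sg=1,   # sg=1 selects skip-gram
                       min_count=1, epochs=50, seed=0)
      # antonyms sharing contexts can end up with embeddings that are close together
      print(model.wv.similarity("rose", "fell"))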
  12. By: Qi Tang; Tongmei Fan; Ruchen Shi; Jingyan Huang; Yidan Ma
    Abstract: To address the difficulties that existing models face with the non-stationary and nonlinear characteristics of high-frequency financial time series, in particular their weak generalization ability, this paper proposes a prediction model that combines data denoising methods, namely the wavelet transform (WT) and singular spectrum analysis (SSA), with a long short-term memory (LSTM) neural network. The financial time series is decomposed and reconstructed by WT and SSA to remove noise, and the resulting smoothed sequence, which retains the effective information, is fed into the LSTM to obtain predictions. Taking the Dow Jones Industrial Average (DJIA) as the research object, five-minute closing prices are used for short-term (1 hour), medium-term (3 hours) and long-term (6 hours) forecasts. Based on root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and absolute percentage error standard deviation (SDAPE), the experimental results show that at the short-, medium- and long-term horizons, data denoising can greatly improve the accuracy and stability of the prediction and can effectively improve the generalization ability of the LSTM prediction model. Because WT and SSA can extract useful information from the original sequence and avoid overfitting, the hybrid models better capture the pattern of the DJIA closing price, and the WT-LSTM model outperforms both the benchmark LSTM model and the SSA-LSTM model.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.03505&r=all
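    A minimal sketch, assuming PyWavelets and NumPy are installed, of the wavelet-transform denoising step applied before the series is fed to the LSTM: decompose the series, soft-threshold the detail coefficients, and reconstruct. The toy price path, wavelet choice, and threshold rule are illustrative assumptions, not the paper's settings.
      import numpy as np
      import pywt

      rng = np.random.default_rng(0)
      price = 100 + np.cumsum(rng.normal(0, 0.2, 512))               # noisy toy price path

      coeffs = pywt.wavedec(price, "db4", level=3)                   # wavelet decomposition
      thr = 0.5 * np.std(coeffs[-1])                                 # illustrative threshold
      coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
      denoised = pywt.waverec(coeffs, "db4")[: price.size]           # smoothed input for the LSTM
      print(price[:5].round(2), denoised[:5].round(2))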
  13. By: Sturm, Timo; Gerlach, Jin; Pumplun, Luisa; Mesbah, Neda; Peters, Felix; Tauchert, Christoph; Nan, Ning; Buxmann, Peter
    Date: 2021–03–11
    URL: http://d.repec.org/n?u=RePEc:dar:wpaper:125653&r=all
  14. By: Jin Li; Ye Luo; Xiaowei Zhang
    Abstract: In the standard data analysis framework, data is first collected (once and for all), and then data analysis is carried out. With the advancement of digital technology, decision-makers constantly analyze past data and generate new data through the decisions they make. In this paper, we model this as a Markov decision process and show that the dynamic interaction between data generation and data analysis leads to a new type of bias -- reinforcement bias -- that exacerbates the endogeneity problem in standard data analysis. We propose a class of instrumental variable (IV)-based reinforcement learning (RL) algorithms to correct for the bias and establish their asymptotic properties by incorporating them into a two-timescale stochastic approximation framework. A key contribution of the paper is the development of new techniques that allow for the analysis of the algorithms in general settings where the noise features time dependency. We use the techniques to derive sharper results on finite-time trajectory stability bounds: with a polynomial rate, the entire future trajectory of the iterates from the algorithm falls within a ball that is centered at the true parameter and is shrinking at a (different) polynomial rate. We also use the techniques to provide inference formulas, which are rarely available for RL algorithms. These formulas highlight how the strength of the IV and the degree of the noise's time dependency affect the inference.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.04021&r=all
  15. By: Alexander J. M. Kell; A. Stephen McGough; Matthew Forshaw
    Abstract: Electricity supply must be matched with demand at all times. This helps reduce the chance of issues such as load-frequency control problems and electricity blackouts. To gain a better understanding of the load that is likely to be required over the next 24h, estimations under uncertainty are needed. This is especially difficult in a decentralized electricity market with many micro-producers which are not under central control. In this paper, we investigate the impact of eleven offline learning and five online learning algorithms for predicting the electricity demand profile over the next 24h. We achieve this through integration within the long-term agent-based model, ElecSim. By predicting the electricity demand profile over the next 24h, we can simulate the predictions made for a day-ahead market. Once we have made these predictions, we sample from the residual distributions and perturb the electricity market demand using the simulation, ElecSim. This enables us to understand the impact of errors on the long-term dynamics of a decentralized electricity market. We show that we can reduce the mean absolute error by 30% using an online algorithm when compared to the best offline algorithm, whilst reducing the tendered national grid reserve required. This reduction in national grid reserves leads to savings in costs and emissions. We also show that large errors in prediction accuracy have a disproportionate effect on investments made over a 17-year time frame, as well as on the electricity mix.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.04327&r=all
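    A minimal sketch of the offline-versus-online comparison, assuming scikit-learn is available: an offline model is fit once on a training window, while an online model is updated observation by observation with partial_fit, so it can track drift in demand. The synthetic hourly demand series and models are illustrative, not those used with ElecSim.
      import numpy as np
      from sklearn.linear_model import LinearRegression, SGDRegressor
      from sklearn.metrics import mean_absolute_error

      rng = np.random.default_rng(0)
      hours = np.arange(2000)
      X = np.column_stack([np.sin(2 * np.pi * hours / 24), np.cos(2 * np.pi * hours / 24)])
      y = 30 + 10 * X[:, 0] + 0.01 * hours + rng.normal(0, 1, hours.size)   # drifting toy demand

      X_tr, y_tr, X_te, y_te = X[:1000], y[:1000], X[1000:], y[1000:]

      offline = LinearRegression().fit(X_tr, y_tr)                 # fit once, never updated
      pred_off = offline.predict(X_te)

      online = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0).fit(X_tr, y_tr)
      pred_on = []
      for xi, yi in zip(X_te, y_te):                               # predict, then learn from the new point
          pred_on.append(online.predict(xi.reshape(1, -1))[0])
          online.partial_fit(xi.reshape(1, -1), [yi])

      print(mean_absolute_error(y_te, pred_off), mean_absolute_error(y_te, pred_on))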
  16. By: Michel Denuit; Arthur Charpentier; Julien Trufin
    Abstract: Boosting techniques and neural networks are particularly effective machine learning methods for insurance pricing. Often in practice, there are nevertheless endless debates about the choice of the right loss function to be used to train the machine learning model, as well as about the appropriate metric to assess the performances of competing models. Also, the sum of fitted values can depart from the observed totals to a large extent, and this often confuses actuarial analysts. The lack of balance inherent to training models by minimizing deviance outside the familiar GLM with canonical link setting has been empirically documented in Wüthrich (2019, 2020), who attributes it to the early stopping rule in gradient descent methods for model fitting. The present paper aims to further study this phenomenon when learning proceeds by minimizing Tweedie deviance. It is shown that minimizing deviance involves a trade-off between the integral of weighted differences of lower partial moments and the bias measured on a specific scale. Autocalibration is then proposed as a remedy. This new method to correct for bias adds an extra local GLM step to the analysis. Theoretically, it is shown that it implements the autocalibration concept in pure premium calculation and ensures that balance also holds on a local scale, not only at portfolio level as with existing bias-correction techniques. The convex order appears to be the natural tool to compare competing models, shedding new light on the diagnostic graphs and associated metrics proposed by Denuit et al. (2019).
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.03635&r=all
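    A minimal sketch of the balance diagnostic discussed in the abstract, assuming scikit-learn is available: fit a boosted model by minimizing Poisson deviance (a Tweedie deviance) on simulated claim counts and compare the sum of fitted values with the observed total, which need not match. The paper's autocalibration remedy, an extra local GLM step on the model score, is not reproduced here.
      import numpy as np
      from sklearn.ensemble import HistGradientBoostingRegressor

      rng = np.random.default_rng(0)
      n = 10000
      x = rng.normal(size=(n, 4))
      claims = rng.poisson(np.exp(0.2 + 0.5 * x[:, 0]))            # toy claim counts

      model = HistGradientBoostingRegressor(loss="poisson", max_iter=20)  # few iterations, as with early stopping
      fitted = model.fit(x, claims).predict(x)

      print("observed total:", claims.sum())
      print("fitted total:  ", round(fitted.sum(), 1))             # can depart from the observed total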
  17. By: Kirsten Hillebrand; Lars Hornuf
    Abstract: When using digital devices and services, individuals provide their personal data to organizations in exchange for gains in various domains of life. Organizations use these data to run technologies such as smart assistants, augmented reality, and robotics. Most often, these organizations seek to make a profit. Individuals can, however, also provide personal data to public databases that enable nonprofit organizations to promote social welfare if sufficient data are contributed. Regulators have therefore called for efficient ways to help the public collectively benefit from its own data. Implementing an online experiment among 1,696 US citizens, we find that individuals would donate their data even when the data are at risk of being leaked. The willingness to provide personal data depends on the risk level of a data leak but not on the realistic impact of the data on social welfare. Individuals are less willing to donate their data to private industry than to academia or the government. Finally, individuals are not sensitive to whether the data are processed by a human-supervised or a self-learning smart assistant.
    Keywords: data philanthropy, sustainable development, decision-making, privacy, environmental protection, public health
    JEL: C71 H41 I18 O35 Q56
    Date: 2021
    URL: http://d.repec.org/n?u=RePEc:ces:ceswps:_8926&r=all
  18. By: Raymond C. W. Leung; Yu-Man Tam
    Abstract: How to hedge factor risks without knowing the identities of the factors? We first prove a general theoretical result: even if the exact set of factors cannot be identified, any risky asset can use some portfolio of similar peer assets to hedge against its own factor exposures. A long position of a risky asset and a short position of a "replicate portfolio" of its peers represent that asset's factor residual risk. We coin the expected return of an asset's factor residual risk as its Statistical Arbitrage Risk Premium (SARP). The challenge in empirically estimating SARP is finding the peers for each asset and constructing the replicate portfolios. We use the elastic-net, a machine learning method, to project each stock's past returns onto that of every other stock. The resulting high-dimensional but sparse projection vector serves as investment weights in constructing the stocks' replicate portfolios. We say a stock has high (low) Statistical Arbitrage Risk (SAR) if it has low (high) R-squared with its peers. The key finding is that "unique" stocks have both a higher SARP and higher excess returns than "ubiquitous" stocks: in the cross-section, high SAR stocks have a monthly SARP (monthly excess returns) that is 1.101% (0.710%) greater than low SAR stocks. The average SAR across all stocks is countercyclical. Our results are robust to controlling for various known priced factors and characteristics.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.09987&r=all
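    A minimal sketch of the replicate-portfolio construction described above, assuming scikit-learn is available: project one stock's returns onto its peers' returns with an elastic net; the fitted coefficients act as replicate-portfolio weights, the R-squared measures Statistical Arbitrage Risk (low R-squared meaning high SAR), and the residual is the factor residual risk. The simulated returns and penalty parameters are illustrative.
      import numpy as np
      from sklearn.linear_model import ElasticNet

      rng = np.random.default_rng(0)
      T, N = 252, 200                                      # trading days, peer stocks
      factor = rng.normal(0, 0.01, T)                      # common factor driving returns
      peers = factor[:, None] * rng.uniform(0.5, 1.5, N) + rng.normal(0, 0.01, (T, N))
      target = 1.2 * factor + rng.normal(0, 0.01, T)       # the stock to be replicated

      enet = ElasticNet(alpha=1e-4, l1_ratio=0.5, max_iter=10000).fit(peers, target)
      weights = enet.coef_                                 # sparse replicate-portfolio weights
      r2 = enet.score(peers, target)                       # high R^2 = low SAR, low R^2 = high SAR
      residual = target - peers @ weights - enet.intercept_   # factor residual risk
      print((weights != 0).sum(), "peers selected; R^2 =", round(r2, 3))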
  19. By: Hou, Bohan
    Abstract: The digital economy has become one of the most important sectors in global GDP. Personal data is a new asset class that creates value through the applications of cybertechnologies and Artificial Intelligence. However, there are increasing concerns over the privacy invasions and human rights violations associated with the exploitation of personal data. Nations have enacted various data laws to balance data fluidity and privacy protection. However, most laws have inherent limitations and underenforcement issues that prevent them from achieving their aims and protection principles. Utilizing a behavioral economics theoretical framework, this study categorizes the issues and their causes as Information Asymmetry, Bounded Rationality, Power Imbalance, and Technical Incapacity. The study makes a novel contribution by proposing a global data governance scheme to address the limitations of data laws. The scheme adopts a Libertarian Paternalism approach and develops seven principles in the framework design. Elements and components in the scheme include individuals, data controllers, privacy rating frameworks, meta-data and privacy configuration, reports, Automated Consent Management (ACM), Bureaus, and signatures, etc. The components will operate on an interoperable and global data management platform. Visual diagrams are developed to describe the various forms of interaction between components and procedures. A balance between privacy protection and data fluidity is found through experimental scenarios such as Ordinary Data Request, Sensitive Data Request, Inconsistency Checks, Data Rights Exercise, Monitored Data Transfer, Broadcast and Notice. The scenarios analyzed are not exhaustive but serve as a meaningful starting point to inspire further designs and discussion among scholars.
    Date: 2021–01–25
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:2b9dc&r=all
  20. By: Tobias Cagala; Ulrich Glogowsky; Johannes Rincke; Anthony Strittmatter
    Abstract: This paper studies optimal targeting as a means to increase fundraising efficacy. We randomly provide potential donors with an unconditional gift and use causal-machine learning techniques to "optimally" target this fundraising tool to the predicted net donors: individuals who, in expectation, give more than their solicitation costs. With this strategy, our fundraiser avoids lossy solicitations, significantly boosts available funds, and, consequently, can increase service and goods provision. Further, to realize these gains, the charity can merely rely on readily available data. We conclude that charities that refrain from fundraising targeting waste significant resources.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.10251&r=all
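    A simplified, hedged sketch of the targeting rule: predict each potential donor's expected donation with a machine learning model and solicit only predicted net donors, i.e. those whose expected donation exceeds the solicitation cost. The paper itself uses causal machine learning to target an unconditional gift; this sketch only predicts donation levels on simulated data to illustrate the targeting logic.
      import numpy as np
      from sklearn.ensemble import GradientBoostingRegressor

      rng = np.random.default_rng(0)
      n, cost = 5000, 2.0                                  # solicitation cost per mailing
      x = rng.normal(size=(n, 4))                          # donor characteristics
      donation = np.maximum(0, 1 + 2 * x[:, 0] + rng.normal(0, 2, n))   # donation if solicited

      train = rng.random(n) < 0.5
      model = GradientBoostingRegressor().fit(x[train], donation[train])
      expected = model.predict(x[~train])

      target = expected > cost                             # solicit predicted net donors only
      solicit_all_net = donation[~train].sum() - cost * (~train).sum()
      targeted_net = donation[~train][target].sum() - cost * target.sum()
      print(round(solicit_all_net, 1), round(targeted_net, 1))   # targeting avoids lossy solicitations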
  21. By: Matteo Garzoli (University of Lugano); Alberto Plazzi (Swiss Finance Institute; Universita' della Svizzera italiana); Rossen I. Valkanov (University of California, San Diego (UCSD) - Rady School of Management)
    Abstract: The Case-Shiller is the reference repeat-sales index for the U.S. residential real estate market, yet it is released with a two-month delay. We find that incorporating recent information from 71 financial and macro predictors improves backcasts, nowcasts, and short-term forecasts of the index returns. Combining individual forecasts with recently proposed weighting schemes delivers large improvements in forecast accuracy at all horizons. Additional gains obtain with mixed-data sampling methods that exploit the daily frequency of financial variables, reducing the mean square forecast error by as much as 13% compared to a simple autoregressive benchmark. The forecast improvements are largest during periods of economic turmoil, throughout the 2020 COVID-19 pandemic period, and in more populous metropolitan areas.
    Keywords: Real estate, Case-Shiller, MIDAS, Forecasting, Big Data
    JEL: C22 C53 R30
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:chf:rpseri:rp2121&r=all
  22. By: Madalina Botina (Ovidius University of Constanta, Romania); Marilena Marin (Ovidius University of Constanta, Romania)
    Abstract: This paper analyzes a situation that may arise in the matter of personal data, concerning the protection of such data as well as the applicable legislation, with reference to the legal instruments for the protection of personal data. Since the implementation of legal texts also implies confronting reality and the factual situation, the working hypothesis we propose is that of the limitations that arise in the exercise of what we generically call “human rights”. These limitations, and the capacity of the legislation to deal in particular with respect for human rights, are the challenges we analyze in our paper. As a working method, we chose qualitative analysis, observation and comparison, using various types of normative acts applicable in European countries. As the subject of analysis, we preferred the legislation of the European Union, as well as the Romanian legislation.
    Keywords: human rights, personal data, sensitive data, regulation, directive, legal instruments
    Date: 2021–01
    URL: http://d.repec.org/n?u=RePEc:smo:conswp:032mb&r=all

This nep-big issue is ©2021 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.