nep-big New Economics Papers
on Big Data
Issue of 2019‒03‒25
eighteen papers chosen by
Tom Coupé
University of Canterbury

  1. The Market for Data Privacy By Ramadorai, Tarun; Uettwiller, Antoine; Walther, Ansgar
  2. Leveraging Loyalty Programs Using Competitor Based Targeting By Hollenbeck, Brett; Taylor, Wayne
  3. Controlling Algorithmic Collusion: short review of the literature, undecidability, and alternative approaches By João E. Gata
  4. Machine Learning Risk Models By Zura Kakushadze; Willie Yu
  5. Data-driven Neural Architecture Learning For Financial Time-series Forecasting By Dat Thanh Tran; Juho Kanniainen; Moncef Gabbouj; Alexandros Iosifidis
  6. A novel machine learning approach for identifying the drivers of domestic electricity users' price responsiveness By Peiyang Guo; Jacqueline CK Lam; Victor OK Li
  7. BBVA big data on online credit card transactions: The patterns of domestic and cross-border e-commerce By OECD
  8. Countries' perceptions of China's Belt and Road initiative: A big data analysis By Alicia Garcia-Herrero; Jianwei Xu
  9. Multimodal Deep Learning for Finance: Integrating and Forecasting International Stock Markets By Sang Il Lee; Seong Joon Yoo
  10. A fast method for pricing American options under the variance gamma model By Weilong Fu; Ali Hirsa
  11. Forecasting Equity Index Volatility by Measuring the Linkage among Component Stocks By Qiu, Yue; Xie, Tian; Yu, Jun; Zhou, Qiankun
  12. Estimating the Heterogeneous Impact of the Free Movement of Persons on Relative Wage Mobility By Naguib, Costanza
  13. China's Response to Nuclear Safety Post-Fukushima: Genuine or Rhetoric? By Jacqueline CK Lam; Lawrence YL Cheung; Y. Han; SS Wang
  14. Digitalization for Energy Access in Sub-Saharan Africa : Challenges, Opportunities and Potential Business Models By Mazzoni, Davide
  15. Digitalization for Energy Access in Sub-Saharan Africa : Challenges, Opportunities and Potential Business Models By Davide Mazzoni
  16. Bayesian MIDAS Penalized Regressions: Estimation, Selection, and Prediction By Matteo Mogliani
  17. Administrative Data Linking and Statistical Power Problems in Randomized Experiments By Sarah Tahamont; Zubin Jelveh; Aaron Chalfin; Shi Yan; Benjamin Hansen
  18. Use of Evidence to Drive Decision-Making in Government By Julieta Lugo-Gil; Dana Jean-Baptiste; Livia Frasso Jaramillo

  1. By: Ramadorai, Tarun; Uettwiller, Antoine; Walther, Ansgar
    Abstract: We scrape a comprehensive set of US firms' privacy policies to facilitate research on the supply of data privacy. We analyze these data with the help of expert legal evaluations, and also acquire data on firms' web tracking activities. We find considerable and systematic variation in privacy policies along multiple dimensions including ease of access, length, readability, and quality, both within and between industries. Motivated by a simple theory of big data acquisition and usage, we analyze the relationship between firm size, knowledge capital intensity, and privacy supply. We find that large firms with intermediate data intensity have longer, legally watertight policies, but are more likely to share user data with third parties.
    Keywords: data markets; privacy; third-party sharing; web tracking
    JEL: D8 K2 L1
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:cpr:ceprdp:13588&r=all
  2. By: Hollenbeck, Brett; Taylor, Wayne
    Abstract: Loyalty programs are widely used by firms but their effectiveness is subject to debate. These programs provide discounts and perks to loyal customers, are costly to administer, and have uncertain effectiveness at increasing spending or stealing business from rivals. We use a large new dataset on retail purchases before and after joining a loyalty program (LP) at the customer level to evaluate what determines LP effectiveness. We exploit detailed spatial data on customer and store locations, including locations of competing firms. A simple analysis shows that location relative to competitors is the strongest predictor of LP effectiveness, suggesting that LPs work primarily through business stealing and not through other demand expansion. We next estimate what variables best predict LP effectiveness using high-dimensional data on spatial relationships between customers, the focal firm’s stores, and competing stores as well as customers’ historical spending patterns. We use LASSO regularization to show that spatial relationships are more predictive of LP effects than are past sales data. Finally, we show how firms can use this type of predictive analytics model to leverage customer and competitor location data to substantially increase the performance of their LP through spatially driven targeting rules.
    Keywords: Loyalty programs, predictive analytics, spatial models, retail competition, machine learning
    JEL: C45 C52 L13 L21 M31
    Date: 2019
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:92900&r=all
  3. By: João E. Gata
    Abstract: Algorithms have played an increasingly important role in economic activity, as they become faster and smarter. Together with the increasing use of ever larger data sets, they may lead to significant changes in the way markets work. These developments have been raising concerns not only over the rights to privacy and consumers’ autonomy, but also over competition. Infringements of antitrust laws involving the use of algorithms have occurred in the past. However, current concerns are of a different nature as they relate to the role algorithms can play as facilitators of collusive behavior in repeated games, and the role increasingly sophisticated algorithms can play as autonomous implementers of pricing strategies, learning to collude without any explicit instructions provided by human agents. In particular, it is recognized that the use of ‘learning algorithms’ can facilitate tacit collusion and lead to an increased blurring of borders between tacit and explicit collusion. Several authors who have addressed the possibilities for achieving tacit collusion equilibrium outcomes by algorithms interacting autonomously have also considered some form of ex-ante assessment and regulation over the type of algorithms used by firms. By using well-known results in the theory of computation, I show that such an option faces serious challenges to its effectiveness due to undecidability results. Ex-post assessment may be constrained as well. Notwithstanding several challenges faced by current software testing methodologies, competition law enforcement and policy have much to gain from an interdisciplinary collaboration with computer science and mathematics.
    Keywords: Collusion, Antitrust, Algorithms, Finite Automaton, Turing Machine, Church-Turing Thesis, Halting Problem, Recursiveness, Undecidability.
    JEL: D43 D83 K21 L41
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:ise:remwps:wp0772019&r=all
  4. By: Zura Kakushadze; Willie Yu
    Abstract: We give an explicit algorithm and source code for constructing risk models based on machine learning techniques. The resultant covariance matrices are not factor models. Based on empirical backtests, we compare the performance of these machine learning risk models to other constructions, including statistical risk models, risk models based on fundamental industry classifications, and also those utilizing multilevel clustering based industry classifications.
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1903.06334&r=all
  5. By: Dat Thanh Tran; Juho Kanniainen; Moncef Gabbouj; Alexandros Iosifidis
    Abstract: Forecasting based on financial time-series is a challenging task since most real-world data exhibit nonstationarity and nonlinear dependencies. In addition, different data modalities often embed different nonlinear relationships which are difficult to capture by human-designed models. To tackle the supervised learning task in financial time-series prediction, we propose the application of a recently formulated algorithm that adaptively learns a mapping function, realized by a heterogeneous neural architecture composed of Generalized Operational Perceptrons, given a set of labeled data. With a modified objective function, the proposed algorithm can accommodate the frequently observed imbalanced data distribution problem. Experiments on a large-scale Limit Order Book dataset demonstrate that the proposed algorithm outperforms related algorithms, including tensor-based methods which have access to a broader set of input information.
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1903.06751&r=all
  6. By: Peiyang Guo (Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong); Jacqueline CK Lam (Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong); Victor OK Li (Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong)
    Keywords: Time-based electricity pricing, price responsiveness, high-potential users, variable selection, Time of Use, machine learning
    JEL: Q41
    Date: 2018–08
    URL: http://d.repec.org/n?u=RePEc:enp:wpaper:eprg1824&r=all
  7. By: OECD
    Abstract: This report uses a standard gravity setup to analyse the determinants of e-commerce, using data on online credit card payments by private Spanish customers of the multinational bank BBVA. The results show that the gravity model applies well to credit card payments, explaining up to 95% of the variation in the data. The analysis finds potentially large border effects for trade between any two regions or countries, implying that individuals tend to purchase more from their home region or domestically than from other places. The estimates also suggest that the effect of distance might be slightly less important for e-commerce transactions than for offline trade, although the death of distance hypothesis is clearly rejected by the data.
    Date: 2019–03–08
    URL: http://d.repec.org/n?u=RePEc:oec:stiaab:278-en&r=all
  8. By: Alicia Garcia-Herrero (Adjunct Professor, Department of Economics, Hong Kong University of Science and Technology; Chief Economist for Asia Pacific, NATIXIS; Institute for Emerging Market Studies, Hong Kong University of Science and Technology); Jianwei Xu (Associate professor, Beijing Normal University)
    Abstract: Drawing on a global database of media articles, we quantitatively assess perceptions of China's Belt and Road Initiative (BRI) in different countries and regions. We find that the BRI is generally positively received. All regions as a whole, except South Asia, have a positive perception of the BRI, but there are marked differences at the country level, with some countries in all regions having very negative views. Interestingly, there is no significant difference in perceptions of the BRI between countries that officially participate in the BRI and those that do not. We also use our dataset of media articles to identify the topics that are most frequently associated with the BRI. The most common topics are trade and investment. Finally, we use regression analysis to identify how the frequency with which these topics are discussed in the news affects the perceptions of the BRI in different countries. We find that the more frequently trade is mentioned in the media, the more negative a country's perception of the BRI tends to be. On the other hand, while investment under the BRI seems also to attract attention in the media, it is not statistically relevant for countries' perceptions of the BRI.
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:hku:wpaper:201959&r=all
  9. By: Sang Il Lee; Seong Joon Yoo
    Abstract: Stock prices are influenced by numerous factors. We present a method to combine these factors and we validate the method by taking the international stock market as a case study. In today's increasingly international economy, return and volatility spillover effects across international equity markets are major macroeconomic drivers of stock dynamics. Thus, foreign market information is one of the most important factors in forecasting domestic stock prices. However, the cross-correlation between domestic and foreign markets is so complex that it would be extremely difficult to express it explicitly with a dynamical equation. In this study, we develop stock return prediction models that can jointly consider international markets, using multimodal deep learning. Our contributions are three-fold: (1) we visualize the transfer information between South Korea and US stock markets using scatter plots; (2) we incorporate the information into stock prediction using multimodal deep learning; (3) we conclusively show that both early and late fusion models achieve a significant performance boost in comparison with single modality models. Our study indicates that considering international stock markets jointly can improve prediction accuracy, and deep neural networks are very effective for such tasks.
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1903.06478&r=all
  10. By: Weilong Fu; Ali Hirsa
    Abstract: We investigate methods for pricing American options under the variance gamma model. The variance gamma process is a pure jump process which is constructed by replacing the calendar time by the gamma time in a Brownian motion with drift, which makes it a time-changed Brownian motion. In general, the finite difference method and the simulation method can be used for pricing under this model, but their speed is not satisfactory. So there is a need for fast but accurate approximation methods. In the case of the Black-Merton-Scholes model, there are fast approximation methods, but they cannot be utilized for the variance gamma model. We develop a new fast method inspired by the quadratic approximation method, while reducing the error by making use of a machine learning technique on pre-calculated quantities. We compare the performance of our proposed method with those of the existing methods and show that this method is efficient and accurate for practical use.
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1903.07519&r=all
  11. By: Qiu, Yue (WISE and School of Economics, Xiamen University); Xie, Tian (School of Economics, Singapore Management University); Yu, Jun (School of Economics and Lee Kong Chian School of Business, Singapore Management University); Zhou, Qiankun (Department of Economics, Louisiana State University)
    Abstract: The linkage among the realized volatilities across component stocks is important when modeling and forecasting the relevant index volatility. In this paper, the linkage is measured via an extended Common Correlated Effects (CCE) approach under a panel heterogeneous autoregression model where unobserved common factors in errors are assumed. Consistency of the CCE estimator is obtained. The common factors are extracted using the principal component analysis. Empirical studies show that realized volatility models exploiting the linkage effects lead to significantly better out-of-sample forecast performance, for example, an up to 32% increase in the pseudo R2. We also conduct various forecasting exercises on the linkage variables that compare conventional regression methods with popular machine learning techniques.
    Keywords: Volatility Forecasting; Heterogeneous autoregression; Common correlated effect; Factor analysis; Random forest
    JEL: C31 C32 G12 G17
    Date: 2019–03–02
    URL: http://d.repec.org/n?u=RePEc:ris:smuesw:2019_007&r=all
  12. By: Naguib, Costanza
    Abstract: We analyse the impact of an inflow of foreign workers on positional wage mobility in a small open economy like Switzerland. We exploit the quasi-natural experiment constituted by the entry into force of the Agreement on the Free Movement of Persons between Switzerland and the EU on 1st June 2002. We compute conditional average treatment effects with machine learning methods, and we find evidence of relevant heterogeneity in the impact of this policy on wage mobility.
    Keywords: Wage mobility, Bilateral Agreements, causal forest, Conditional Average Treatment Effect
    JEL: C14 J31
    Date: 2019–02
    URL: http://d.repec.org/n?u=RePEc:usg:econwp:2019:03&r=all
  13. By: Jacqueline CK Lam (Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong); Lawrence YL Cheung (Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Shatin, NT, Hong Kong); Y. Han (Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong); SS Wang (Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong)
    Keywords: nuclear safety, media focus, computational text analysis, regulatory governance, safety management
    JEL: C89 Q42 Q48
    Date: 2018–11
    URL: http://d.repec.org/n?u=RePEc:enp:wpaper:eprg1834&r=all
  14. By: Mazzoni, Davide
    Abstract: Innovative business models supported by digital technologies, together with the widening connectivity and data collection, are already making a big contribution to fostering access to electricity and clean cooking in Sub-Saharan Africa. This paper gives an overview of the actual state of energy access in Sub-Saharan Africa and the current technologies used to provide it, followed by a description of the key trends and drivers of the ongoing African digital transformation. A deep analysis of the Pay-as-you-go business model in the off-grid solar sector will shed light on how this transformation started some years ago and the way it is affecting society in many ways. Strengths and opportunities, as well as weaknesses and risks of the model, are provided through a screening of the most representative business experiences in East and West Africa, financial aspects and market analysis. The perspectives of both companies and end-users have been considered here. The last section gives recommendations to policy-makers on how to ride the wave of digitalization to foster access to clean and reliable energy, by acting on the electrification planning, regulations, business environment, distribution channels and mobile money environment.
    Keywords: Resource/Energy Economics and Policy
    Date: 2019–03–19
    URL: http://d.repec.org/n?u=RePEc:ags:feemfe:285024&r=all
  15. By: Davide Mazzoni (Fondazione Eni Enrico Mattei)
    Abstract: Innovative business models supported by digital technologies, together with the widening connectivity and data collection, are already making a big contribution to fostering access to electricity and clean cooking in Sub-Saharan Africa. This paper gives an overview of the actual state of energy access in Sub-Saharan Africa and the current technologies used to provide it, followed by a description of the key trends and drivers of the ongoing African digital transformation. A deep analysis of the Pay-as-you-go business model in the off-grid solar sector will shed light on how this transformation started some years ago and the way it is affecting society in many ways. Strengths and opportunities, as well as weaknesses and risks of the model, are provided through a screening of the most representative business experiences in East and West Africa, financial aspects and market analysis. The perspectives of both companies and end-users have been considered here. The last section gives recommendations to policy-makers on how to ride the wave of digitalization to foster access to clean and reliable energy, by acting on the electrification planning, regulations, business environment, distribution channels and mobile money environment.
    Keywords: Energy Access, Digitalization, PAYGO, Business Models, Africa, Digital Transformation
    JEL: O13 O33 O55 M13 Q40 Q48
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:fem:femwpa:2019.02&r=all
  16. By: Matteo Mogliani
    Abstract: We propose a new approach to mixed-frequency regressions in a high-dimensional environment that resorts to Group Lasso penalization and Bayesian techniques for estimation and inference. To improve the sparse recovery ability of the model, we also consider a Group Lasso with a spike-and-slab prior. Penalty hyper-parameters governing the model shrinkage are automatically tuned via an adaptive MCMC algorithm. Simulations show that the proposed models have good selection and forecasting performance, even when the design matrix presents high cross-correlation. When applied to U.S. GDP data, the results suggest that financial variables may have some, although limited, short-term predictive content.
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1903.08025&r=all
  17. By: Sarah Tahamont; Zubin Jelveh; Aaron Chalfin; Shi Yan; Benjamin Hansen
    Abstract: The increasing availability of administrative data has led to a particularly exciting innovation in public policy research, that of the “low-cost” randomized trial in which administrative data are used to measure outcomes in lieu of costly primary data collection. Linking data from an experimental intervention to administrative records that track outcomes of interest typically requires matching datasets without a common unique identifier. In order to minimize mistaken linkages, researchers will often use “exact matching” (retaining an individual only if all their demographic variables match exactly in two or more datasets) in order to ensure that speculative matches do not lead to errors in an analytic dataset. We argue that when this approach is used to detect the presence of a binary outcome, this seemingly conservative approach leads to attenuated estimates of treatment effects, and critically, to underpowered experiments. For marginally powered studies, which are common in empirical social science, exact matching is particularly problematic. In this paper, we derive an analytic result for the consequences of linking errors on statistical power and show how the problem varies across different combinations of relevant inputs, including the matching error rate, the outcome density and the sample size. We conclude on an optimistic note by showing that machine learning-based probabilistic matching algorithms allow researchers to recover a considerable share of the statistical power that is lost to errors in data linking.
    JEL: C1 C12 K42
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:25657&r=all
  18. By: Julieta Lugo-Gil; Dana Jean-Baptiste; Livia Frasso Jaramillo
    Abstract: This report presents the findings from the Policy Analysis and Decision-Making Capacity project, funded by the Office of Science and Data Policy within the Office of the Assistant Secretary for Planning and Evaluation at the U.S. Department of Health and Human Services.
    Keywords: evidence-based decision-making, evaluation, data use, knowledge brokers, learning agendas, evidence producers
    JEL: I
    URL: http://d.repec.org/n?u=RePEc:mpr:mprres:b84e939e19cf4e058572a738a03d1a4d&r=all

This nep-big issue is ©2019 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.