nep-big New Economics Papers
on Big Data
Issue of 2019‒05‒13
thirteen papers chosen by
Tom Coupé
University of Canterbury

  1. Estimation and Updating Methods for Hedonic Valuation By Michael Mayer; Steven C. Bourassa; Martin Hoesli; Donato Flavio Scognamiglio
  2. Battling Antibiotic Resistance: Can Machine Learning Improve Prescribing? By Michael A. Ribers; Hannes Ullrich
  3. Computing a Data Dividend By Eric Bax
  4. Compliance effects of risk-based tax audits By Knut Løyland; Oddbjørn Raaum; Gaute Torsvik; Arnstein Øvrum
  5. Decision Making with Machine Learning and ROC Curves By Kai Feng; Han Hong; Ke Tang; Jingyuan Wang
  6. Impact of Artificial Intelligence on Businesses: from Research, Innovation, Market Deployment to Future Shifts in Business Models By Neha Soni; Enakshi Khular Sharma; Narotam Singh; Amita Kapoor
  7. Working women and caste in India: A study of social disadvantage using feature attribution By Kuhu Joshi; Chaitanya K. Joshi
  8. Land rights, agricultural productivity, and deforestation in Viet Nam By Carney Conor; Abman Ryan
  9. Economics of Voluntary Information Sharing By Liberti, Jose; Sturgess, Jason; Sutherland, Andrew
  10. Statistical Learning and Exchange Rate Forecasting By Emilio Colombo; Matteo Pelagatti
  11. Themes and topics in parliamentary oversight hearings: a new direction in textual data analysis By Sanders, James; Lisi, Giulio; Schonhardt-Bailey, Cheryl
  12. The rhetoric of recessions: how British newspapers talk about the poor when unemployment rises, 1896–2000 By McArthur, Daniel; Reeves, Aaron
  13. A Policy Compass for Ecological Economics By Michèle Friend

  1. By: Michael Mayer (Consult AG Bern); Steven C. Bourassa (Florida Atlantic University); Martin Hoesli (University of Geneva - Geneva School of Economics and Management (GSEM); Swiss Finance Institute; University of Geneva - Research Center for Statistics; University of Aberdeen - Business School); Donato Flavio Scognamiglio (University of Berne, Institut für Finanzmanagement)
    Abstract: Purpose – We use a large and rich data set consisting of over 123,000 single-family houses sold in Switzerland between 2005 and 2017 to investigate the accuracy and volatility of different methods for estimating and updating hedonic valuation models. Design/methodology/approach – We apply six estimation methods (linear least squares, robust regression, mixed effects regression, random forests, gradient boosting, and neural networks) and two updating methods (moving and extending windows). Findings – The gradient boosting method yields the greatest accuracy while the robust method provides the least volatile predictions. There is a clear trade-off across methods depending on whether the goal is to improve accuracy or avoid volatility. The choice between moving and extending windows has only a modest effect on the results. Originality/value – This paper compares a range of linear and machine learning techniques in the context of moving or extending window scenarios that are used in practice but which have not been considered in prior research. The techniques include robust regression, which has not previously been used in this context. The data updating allows for analysis of the volatility in addition to the accuracy of predictions. The results should prove useful in improving hedonic models used by property tax assessors, mortgage underwriters, valuation firms, and regulatory authorities.
    Keywords: Hedonic models, Appraisal accuracy, Appraisal volatility, Machine learning, Robust regression, Mixed effects models, Random forests, Gradient boosting, Neural networks
    JEL: R31 C45 C53
    Date: 2018–12
  2. By: Michael A. Ribers; Hannes Ullrich
    Abstract: Antibiotic resistance constitutes a major health threat. Predicting bacterial causes of infections is key to reducing antibiotic misuse, a leading cause of antibiotic resistance. We combine administrative and microbiological laboratory data from Denmark to train a machine learning algorithm predicting bacterial causes of urinary tract infections. Based on predictions, we develop policies to improve prescribing in primary care, highlighting the relevance of physician expertise and time-variant patient distributions for policy implementation. The proposed policies delay prescriptions for some patients until test results are known and give them instantly to others. We find that machine learning can reduce antibiotic use by 7.42 percent without reducing the number of treated bacterial infections. As Denmark is one of the most conservative countries in terms of antibiotic use, targeting a 30 percent reduction in prescribing by 2020, this result is likely to be a lower bound of what can be achieved elsewhere.
    Keywords: Antibiotic prescribing; prediction policy; machine learning; expert decision-making
    JEL: C10 I11 I18 L38 O38 Q28
    Date: 2019
  3. By: Eric Bax
    Abstract: Quality data is a fundamental contributor to success in statistics and machine learning. If a statistical assessment or machine learning leads to decisions that create value, data contributors may want a share of that value. This paper presents methods to assess the value of individual data samples, and of sets of samples, to apportion value among different data contributors. We use Shapley values for individual samples and Owen values for combined samples, and show that these values can be computed in polynomial time in spite of their definitions having numbers of terms that are exponential in the number of samples.
    Date: 2019–05
  4. By: Knut Løyland; Oddbjørn Raaum; Gaute Torsvik; Arnstein Øvrum
    Abstract: Tax administrations use machine learning to predict risk scores as a basis for selecting individual taxpayers for audit. Audits detect noncompliance immediately, but may also alter future filing behavior. This analysis is the first to estimate compliance effects of audits among high-risk wage earners. We exploit a sharp audit assignment discontinuity in Norway based on individual taxpayers' risk scores. Additional data from a random audit allow us to estimate how the audit effect varies across the risk score distribution. We show that the current risk score audit threshold is set far above the one that maximizes net public revenue.
    Keywords: tax audits, tax revenue, tax reporting decisions, income tax, machine learning, risk profiling
    JEL: D04 H26 H83
    Date: 2019
  5. By: Kai Feng; Han Hong; Ke Tang; Jingyuan Wang
    Abstract: The Receiver Operating Characteristic (ROC) curve is a representation of the statistical information discovered in binary classification problems and is a key concept in machine learning and data science. This paper studies the statistical properties of ROC curves and their implications for model selection. We analyze the implications of different models of incentive heterogeneity and information asymmetry on the relation between human decisions and the ROC curves. Our theoretical discussion is illustrated in the context of a large data set of pregnancy outcomes and doctor diagnoses from the Pre-Pregnancy Checkups of reproductive age couples in Henan Province provided by the Chinese Ministry of Health.
    Date: 2019–05
  6. By: Neha Soni; Enakshi Khular Sharma; Narotam Singh; Amita Kapoor
    Abstract: The fast pace of artificial intelligence (AI) and automation is propelling strategists to reshape their business models. This is fostering the integration of AI in business processes, but the consequences of this adoption are underexplored and need attention. This paper focuses on the overall impact of AI on businesses - from research, innovation, market deployment to future shifts in business models. To assess this overall impact, we design a three-dimensional research model, based upon Neo-Schumpeterian economics and its three forces viz. innovation, knowledge, and entrepreneurship. The first dimension deals with research and innovation in AI. In the second dimension, we explore the influence of AI on the global market and the strategic objectives of the businesses and finally, the third dimension examines how AI is shaping business contexts. Additionally, the paper explores AI implications on actors and its dark sides.
    Date: 2019–05
  7. By: Kuhu Joshi; Chaitanya K. Joshi
    Abstract: Women belonging to the socially disadvantaged caste-groups in India have historically been engaged in labour-intensive, blue-collar work. We study whether there has been any change in the ability to predict a woman's work-status and work-type based on her caste by interpreting machine learning models using feature attribution. We find that caste is now a less important determinant of work for the younger generation of women compared to the older generation. Moreover, younger women from disadvantaged castes are now more likely to be working in white-collar jobs.
    Date: 2019–04
  8. By: Carney Conor; Abman Ryan
    Abstract: This paper studies the relationship between land tenure for smallholder agriculture and deforestation in Viet Nam. We combine high resolution satellite data on deforestation with rich household and commune-level, biannual panel data. We study two margins of tenure security, whether a household has any land title (extensive) and the share of a household’s land held in title (intensive). Using a household-fixed effects model, we find the increases in crop production and land investment associated with holding land title are driven by the intensive margin. We then aggregate the survey data to the commune-level and find evidence that marginal increases in extensive tenure (share of households with any land title) increase deforestation holding constant the average intensive tenure (average share of land held in tenure among those with land title). We find some evidence that increasing the intensive margin of tenure (holding constant the extensive tenure) decreases deforestation. These results present a more nuanced view of the tenure-deforestation relationship than is prevalent in the existing literature.
    Keywords: Agricultural productivity,Deforestation,Land tenure
    Date: 2018
  9. By: Liberti, Jose; Sturgess, Jason; Sutherland, Andrew
    Abstract: We show that lenders join a U.S. commercial credit bureau when information asymmetries between incumbents and entrants create an adverse selection problem that hinders market entry. Lenders also delay joining when information asymmetries protect them from competition in existing markets, consistent with lenders trading off new market entry against heightened competition. We exploit shocks to information coverage to show that lenders enter new markets after joining the bureau in a pattern consistent with this trade-off. Our results illuminate why intermediaries voluntarily share information and show how financial technology that mitigates information asymmetries can shape the boundaries of lending.
    Keywords: information sharing, adverse selection, specialization, financial intermediation, collateral, credit bureaus, fintech
    JEL: D43 D82 G21 G23 G32
    Date: 2018
  10. By: Emilio Colombo; Matteo Pelagatti
    Abstract: This study uses the most innovative tools recently proposed in the statistical learning literature to assess the ability of standard exchange rate models to predict the exchange rate in the short and long run. Our results show that statistical learning methods display impressive performances, consistently outperforming the random walk in forecasting the exchange rate at different forecasting horizons, with the exception of the very short term (a period of 1-2 months). We use these tools to compare the predictive ability of different exchange rate models and model specifications. We find that sticky price versions of the monetary model with the error correction specification exhibit the best performance. We also explore the functioning of statistical learning models by developing measures of variable importance and by analyzing the kind of relationship that links each variable with the outcome. This allows us to improve our understanding of the relationship between the exchange rate and economic fundamentals, which appears complex and characterized by strong non-linearities.
    JEL: F37 C53
    Date: 2019
  11. By: Sanders, James; Lisi, Giulio; Schonhardt-Bailey, Cheryl
    Abstract: This paper contributes to the growing empirical work on deliberation in legislatures by proposing a novel approach to analysing parliamentary hearings using both thematic and topic modelling textual analysis software. We explore variations in deliberative quality across economic policy type (fiscal policy, monetary policy and financial stability) and across parliamentary chambers (Commons and Lords) in UK select committee oversight hearings during the 2010–2015 Parliament. Our overall focus is not only to suggest a multi-method approach to the textual analysis of parliamentary data, but also to explore more substantive aspects of parliamentary oversight, such as: (1) the extent to which oversight varies between unelected and elected policy makers; and (2) whether parliamentarians conduct oversight more forcefully or more along partisan lines when they are challenging fellow politicians as opposed to central bank officials. Our findings suggest consistent differences in deliberative styles between types of hearings (fiscal, monetary, financial stability) and between chambers (Commons, Lords).
    JEL: C1
    Date: 2018–04–19
  12. By: McArthur, Daniel; Reeves, Aaron
    Abstract: Recessions appear to coincide with an increasingly stigmatising presentation of poverty in parts of the media. Previous research on the connection between high unemployment and media discourse has often relied on case studies of periods when stigmatising rhetoric about the poor was increasing. We build on earlier work on how economic context affects media representations of poverty by creating a unique dataset that measures how often stigmatising descriptions of the poor are used in five centrist and right-wing British newspapers between 1896 and 2000. Our results suggest stigmatising rhetoric about the poor increases when unemployment rises, except at the peak of very deep recessions (e.g. the 1930s and 1980s). This pattern is consistent with the idea that newspapers deploy deeply embedded Malthusian explanations for poverty when those ideas resonate with the economic context, and so this stigmatising rhetoric of recessions is likely to recur during future economic crises.
    Keywords: poverty; print media; recession; stigma; unemployment
    JEL: N0
    Date: 2019–04–09
  13. By: Michèle Friend
    Abstract: A policy compass indicates the direction in which an institution is going in terms of three general qualities. The three qualities are: suppression, harmony and passion. Any formal institution can develop a policy compass to examine the discrepancy between what the institution would like to do (suggested in its mandate) and the actual performance and situation it finds itself in. The latter is determined through an aggregation of statistical data and facts. These are made robust and stable using meta-requirements of convergence. Here, I present a version of the compass adapted to embed the central ideas of ecological economics: that society is dependent on the environment, and that economic activity is dependent on society; that we live in a world subject to at least the first two laws of thermodynamics; that the planet we live on is limited in space and resources; that some of our practices have harmful and irreversible consequences on the natural environment; that there are values other than value in exchange, such as intrinsic value and use value. In this paper, I explain how to construct a policy compass in general. This is followed by the adaptation for ecological economics. The policy compass is original, and so is the adaptation. The compass is inspired by the work of Anthony Friend, Rob Hoffman, Satish Kumar, Georgescu-Roegen, Stanislav Schmelev, Peter Söderbaum and Arild Vatn. In the conclusion, I discuss the accompanying conception of sustainability.
    Date: 2019–03

This nep-big issue is ©2019 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.