nep-big New Economics Papers
on Big Data
Issue of 2019‒12‒09
nineteen papers chosen by
Tom Coupé
University of Canterbury

  1. Fairness and Accountability Design Needs for Algorithmic Support in High-Stakes Public Sector Decision-Making By Veale, Michael; Van Kleek, Max; Binns, Reuben
  2. Spread the Word: International Spillovers from Central Bank Communication By Hanna Armelius; Christoph Bertsch; Isaiah Hull; Xin Zhang
  3. AI and Robotics Implications for the Poor By von Braun, Joachim
  4. Multi-Scale RCNN Model for Financial Time-series Classification By Liu Guang; Wang Xiaojie; Li Ruifan
  5. Insights from self-organizing maps for predicting accessibility demand for healthcare infrastructure By Mayaud, Jerome; Anderson, Sam; Tran, Martino; Radic, Valentina
  6. Towards Quantification of Explainability in Explainable Artificial Intelligence Methods By Sheikh Rabiul Islam; William Eberle; Sheikh K. Ghafoor
  7. Investigating the Impacts of Customer Experience and Attribute Performances on Overall Ratings using Online Review Data: Nonlinear Estimation and Visualization with a Neural Network By Toshikuni Sato
  8. Exports and imports in Zimbabwe: recent insights from artificial neural networks By NYONI, THABANI
  9. Logics and practices of transparency and opacity in real-world applications of public sector machine learning By Veale, Michael
  10. Crowdsourced Quantification and Visualization of Urban Mobility Space Inequality By Szell, Michael
  11. Passive Diagnosis of Mental Health Disorders Incorporating an Empathic Dialogue System By Delahunty, Fionn; Arcan, Mihael; Johansson, Robert
  12. Entrepreneurship Intention Prediction using Decision Tree and Support Vector Machine By Siahaan, Andysah Putera Utama; Nasution, Muhammad Dharma Tuah Putra
  13. Modeling UK Mortgage Demand Using Online Searches By Jaroslav Pavlicek; Ladislav Kristoufek
  14. The complexity of the intangible digital economy: an agent-based model By Bertani, Filippo; Ponta, Linda; Raberto, Marco; Teglio, Andrea; Cincotti, Silvano
  15. Using Facebook Ad Data to Track the Global Digital Gender Gap By Fatehkia, Masoomali; Kashyap, Ridhi; Weber, Ingmar
  16. Deep Reinforcement Learning for Trading By Zihao Zhang; Stefan Zohren; Stephen Roberts
  17. Investigating bankruptcy prediction models in the presence of extreme class imbalance and multiple stages of economy By Sheikh Rabiul Islam; William Eberle; Sheikh K. Ghafoor; Sid C. Bundy; Douglas A. Talbert; Ambareen Siraj
  18. Mapping without a map: Exploring the UK business landscape using unsupervised learning By Stathoulopoulos, Kostas; Mateos-Garcia, Juan
  19. A method for dealing with regional differences in population size when interpreting slopes in Google Trends query data By McCallum, Malcolm L

  1. By: Veale, Michael; Van Kleek, Max; Binns, Reuben
    Abstract: Cite as: Michael Veale, Max Van Kleek and Reuben Binns (2018) Fairness and Accountability Design Needs for Algorithmic Support in High-Stakes Public Sector Decision-Making. ACM Conference on Human Factors in Computing Systems (CHI'18). doi: 10.1145/3173574.3174014 Calls for heightened consideration of fairness and accountability in algorithmically-informed public decisions—like taxation, justice, and child protection—are now commonplace. How might designers support such human values? We interviewed 27 public sector machine learning practitioners across 5 OECD countries regarding challenges understanding and imbuing public values into their work. The results suggest a disconnect between organisational and institutional realities, constraints and needs, and those addressed by current research into usable, transparent and 'discrimination-aware' machine learning—absences likely to undermine practical initiatives unless addressed. We see design opportunities in this disconnect, such as in supporting the tracking of concept drift in secondary data sources, and in building usable transparency tools to identify risks and incorporate domain knowledge, aimed both at managers and at the `street-level bureaucrats' on the frontlines of public service. We conclude by outlining ethical challenges and future directions for collaboration in these high-stakes applications.
    Date: 2018–02–04
  2. By: Hanna Armelius; Christoph Bertsch; Isaiah Hull; Xin Zhang
    Abstract: We construct a novel text dataset to measure the sentiment component of communications for 23 central banks over the 2002-2017 period. Our analysis yields three results. First, comovement in sentiment across central banks is not reducible to trade or financial flow exposures. Second, sentiment shocks generate cross-country spillovers in sentiment, policy rates, and macroeconomic variables; and the Fed appears to be a uniquely influential generator of such spillovers, even among prominent central banks. And third, geographic distance is a robust and economically significant determinant of comovement in central bank sentiment, while shared language and colonial ties have weaker predictive power.
    Keywords: communication, monetary policy, international policy transmission
    JEL: E52 E58 F42
    Date: 2019–12
  3. By: von Braun, Joachim
    Abstract: Artificial intelligence and robotics (AI/R) have the potential to bring great change to livelihoods. While individual impacts of AI/R on, for instance, employment have been the subject of much research, work on how AI/R may affect the poor is scarce. This paper aims to draw attention to how AI/R may impact the poor and marginalized and highlights research needs. A thought experiment compares the future situation of the poor in an AI/R scenario to a scenario without AI/R. A framework is established that depicts poverty and marginality conditions of health, education, public services, work, and small businesses including farming, as well as voice and empowerment of the poor. This conceptual framework identifies points of entry of AI/R and is complemented by a more detailed discussion of the ways in which changes through AI/R in these areas may relate positively or adversely to the livelihoods of the poor. The paper concludes that empirical scenarios and modelling analyses are needed to better understand the different components of the emerging technological and institutional AI/R innovations and to identify how they will shape the livelihoods of poor households and communities.
    Keywords: Agribusiness, Financial Economics, Health Economics and Policy, Labor and Human Capital, Research and Development/Tech Change/Emerging Technologies, Teaching/Communication/Extension/Profession
    Date: 2019–12–03
  4. By: Liu Guang; Wang Xiaojie; Li Ruifan
    Abstract: Financial time-series classification (FTC) is extremely valuable for investment management. Over the past decades, it has drawn attention from a wide range of research areas, especially Artificial Intelligence (AI). Existing research has largely focused on exploring the effects of either the Multi-Scale (MS) property or the Temporal Dependency (TD) within financial time-series. Unfortunately, most previous studies fail to combine these two properties effectively and often fall short in accuracy and profitability. To effectively combine and exploit both properties of financial time-series, we propose a Multi-Scale Temporal Dependent Recurrent Convolutional Neural Network (MSTD-RCNN) for FTC. In the proposed method, MS features are simultaneously extracted by convolutional units to precisely describe the state of the financial market, while the TD and the complementarity across different scales are captured through a Recurrent Neural Network. The proposed method is evaluated on three financial time-series datasets sourced from the Chinese stock market. Extensive experimental results indicate that our model achieves state-of-the-art performance in trend classification and simulated trading compared with classical and advanced baseline models.
    Date: 2019–11
  5. By: Mayaud, Jerome; Anderson, Sam; Tran, Martino; Radic, Valentina
    Abstract: As urban populations grow worldwide, it becomes increasingly important to critically analyse accessibility – the ease with which residents can reach key places or opportunities. The combination of ‘big data’ and advances in computational techniques such as machine learning (ML) could be a boon for urban accessibility studies, yet their application remains limited in this field. In this study, we aim to more robustly relate socio-economic factors to healthcare accessibility across a city experiencing rapid population growth, using a novel combination of clustering methods. We applied a powerful ML clustering tool, the self-organising map (SOM), in conjunction with principal component analysis (PCA), to examine how income shifts over time (2016–2022) could affect accessibility equity to healthcare for senior populations (65+ years) in the City of Surrey, Canada. We characterised accessibility levels to hospitals and walk-in clinics using door-to-door travel times, and combined this with high-resolution census data. Higher income clusters are projected to become more prevalent across the city over the study period, in some cases encroaching on previously low income areas. However, low income clusters have on average much better accessibility to healthcare facilities than high income clusters, and their accessibility levels are projected to increase between 2016 and 2022. By attributing temporal differences through cross-term analysis, we show that population growth will be the biggest accessibility challenge in neighbourhoods with existing access to healthcare, whereas income change (both positive and negative) will be most challenging in poorly connected neighbourhoods. A dual accessibility problem may therefore arise in Surrey. First, large senior populations will reside in areas with access to numerous, and close-by, clinics, putting pressure on existing facilities for specialised services. Second, lower-income seniors will increasingly reside in areas poorly connected to healthcare services; since these populations are likely to be highly reliant on public transportation, accessibility equity may suffer. To our knowledge, this study is the first to apply a combination of PCA and SOM techniques in the context of urban accessibility, and it demonstrates the value of this clustering approach for drawing planning policy recommendations from large multivariate datasets.
    Date: 2018–10–28
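The self-organising map used in the study above can be illustrated with a toy example. Below is a minimal pure-Python sketch of SOM training (best-matching-unit search plus neighbourhood update); the grid size, decay schedules, and two-cluster toy data are illustrative assumptions, not the authors' implementation.

```python
import math
import random

def train_som(data, grid_w=4, grid_h=4, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a tiny self-organising map on low-dimensional data (toy sketch)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise each grid node's weight vector randomly.
    nodes = {(i, j): [rng.random() for _ in range(dim)]
             for i in range(grid_w) for j in range(grid_h)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)              # decaying learning rate
        sigma = sigma0 * (1 - frac) + 0.5  # decaying neighbourhood radius
        for x in data:
            # Find the best-matching unit (node with the closest weight vector).
            bmu = min(nodes, key=lambda n: sum((w - v) ** 2
                                               for w, v in zip(nodes[n], x)))
            # Pull the BMU and its grid neighbours toward the sample.
            for n, w in nodes.items():
                d2 = (n[0] - bmu[0]) ** 2 + (n[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))
                nodes[n] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return nodes

def bmu_of(nodes, x):
    """Return the grid coordinate of the best-matching unit for a sample."""
    return min(nodes, key=lambda n: sum((w - v) ** 2 for w, v in zip(nodes[n], x)))
```

After training, samples from distinct clusters land on different map regions, which is what makes the SOM useful for characterising neighbourhood types from census variables.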
  6. By: Sheikh Rabiul Islam; William Eberle; Sheikh K. Ghafoor
    Abstract: Artificial Intelligence (AI) has become an integral part of domains such as security, finance, healthcare, medicine, and criminal justice. Explaining the decisions of AI systems in human terms is a key challenge, due to the high complexity of the models as well as the potential implications for human interests, rights, and lives. While Explainable AI is an emerging field of research, there is no consensus on the definition, quantification, and formalization of explainability; in fact, the quantification of explainability is an open challenge. In our previous work, we incorporated domain knowledge for better explainability; however, we were unable to quantify the extent of explainability. In this work, we (1) briefly analyze the definitions of explainability from the perspective of different disciplines (e.g., psychology, social science), properties of explanation, explanation methods, and human-friendly explanations; and (2) propose and formulate an approach to quantify the extent of explainability. Our experimental results suggest a reasonable and model-agnostic way to quantify explainability.
    Date: 2019–11
  7. By: Toshikuni Sato
    Abstract: This study investigates interpretable neural networks for marketing and consumer behavior research, using customer reviews instead of measurement scales to better understand customer experiences. Service attribute ratings are used to measure attribute performances and to compare the influence of customer experience and service performance on overall satisfaction. Although many researchers have investigated word-of-mouth reviews and their practical applications, the detailed contents of those reviews have generally been disregarded, possibly because of their high dimensionality. To solve this problem, this study proposes several useful neural-network methods for specifying the expected assumptions based on prior knowledge or theories in consumer behavior research. Because neural networks help estimate nonlinear relationships between objective and predictive variables, a partial dependence plot is used to visualize the estimated functions and marginal effects. Empirical results not only provide a highly accurate neural-network model but also yield clearer marketing implications.
    Date: 2019–11
  8. By: NYONI, THABANI
    Abstract: This study, which is the first of its kind in the case of Zimbabwe, attempts to model and forecast Zimbabwe’s exports and imports using annual time series data ranging over the period 1975 – 2017. In order to analyze Zimbabwe’s export and import dynamics, the study employed the Neural Network approach, a deep-learning technique which has not been applied in this area in the case of Zimbabwe. The hyperbolic tangent function was selected and applied as the activation function of the neural networks used in this study. The neural networks were evaluated using the most common forecast evaluation statistics, i.e. the Error, MSE and MAE, and it was clearly shown that they yielded reliable forecasts of Zimbabwe’s exports and imports over the period 2018 – 2027. The main results of the study indicate that imports will continue to outperform exports over the out-of-sample period. Amongst other policy recommendations, the study encourages Zimbabwean policy makers to intensify export growth promotion policies and strategies, such as clearly identifying export drivers as well as export diversification, if persistent current account deficits in Zimbabwe are to be dealt with effectively.
    Keywords: ANNs; exports; forecast; hyperbolic tangent function; imports; trade deficits; Zimbabwe
    JEL: F13 P33 Q17
    Date: 2019–11–04
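To make the abstract's setup concrete: a feedforward network with the hyperbolic tangent activation, together with the MSE and MAE evaluation statistics the study cites. This is a minimal pure-Python sketch of a one-hidden-layer forward pass with hypothetical weights; the paper's actual architecture and training procedure are not reproduced here.

```python
import math

def tanh_forward(x, w1, b1, w2, b2):
    """Forward pass of a one-hidden-layer network with tanh activation.
    w1, b1 are hidden-layer weights/biases; w2, b2 the output layer's."""
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    return sum(wo * h for wo, h in zip(w2, hidden)) + b2

def mse(actual, predicted):
    """Mean squared error, one of the forecast evaluation statistics used."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    """Mean absolute error, another common forecast evaluation statistic."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```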
  9. By: Veale, Michael
    Abstract: Presented as a talk at the 4th Workshop on Fairness, Accountability and Transparency in Machine Learning (FAT/ML 2017), Halifax, Nova Scotia, Canada. Machine learning systems are increasingly used to support public sector decision-making across a variety of sectors. Given concerns around accountability in these domains, and amidst accusations of intentional or unintentional bias, there have been increased calls for transparency of these technologies. Few, however, have considered how logics and practices concerning transparency have been understood by those involved in the machine learning systems already being piloted and deployed in public bodies today. This short paper distils insights about transparency on the ground from interviews with 27 such actors, largely public servants and relevant contractors, across 5 OECD countries. Considering transparency and opacity in relation to trust and buy-in, better decision-making, and the avoidance of gaming, it seeks to provide useful insights for those hoping to develop socio-technical approaches to transparency that might be useful to practitioners on-the-ground.
    Date: 2017–11–18
  10. By: Szell, Michael
    Abstract: Most cities are car-centric, allocating a privileged amount of urban space to cars at the expense of sustainable mobility like cycling. Simultaneously, privately owned vehicles are vastly underused, wasting valuable opportunities for accommodating more people in a livable urban environment by occupying spacious parking areas. Since a data-driven quantification and visualization of such urban mobility space inequality is lacking, here we explore how crowdsourced data can help to advance its understanding. In particular, we describe how the open-source online platform What the Street!? uses massive user-generated data from OpenStreetMap for the interactive exploration of city-wide mobility spaces. Using polygon packing and graph algorithms, the platform rearranges all parking and mobility spaces of cars, rails, and bicycles of a city to be directly comparable, making mobility space inequality accessible to a broad public. This crowdsourced method confirms a prevalent imbalance between modal share and space allocation in 23 cities worldwide, typically discriminating against bicycles. Analyzing the guesses of the platform’s visitors about mobility space distributions, we find that this discrimination is consistently underestimated in the public opinion. Finally, we discuss a visualized scenario in which extensive parking areas are regained through fleets of shared, autonomous vehicles. We outline how such accessible visualization platforms can help urban planners and policy makers reclaim road and parking space for pushing forward sustainable transport solutions.
    Date: 2018–03–28
  11. By: Delahunty, Fionn; Arcan, Mihael; Johansson, Robert
    Abstract: Depression and anxiety are the two most prevalent mental health disorders worldwide, impacting the lives of millions of people each year. Current screening methods require individuals to manually complete psychometric questionnaires. In this work we develop a deep learning approach to predict psychometric scores from textual data through the use of psycholinguistic features. Data are collected via a dialogue system, in which we develop and incorporate an approach to model empathy, which aims to allow for appropriate use of these systems in a clinical setting. Following a public evaluation, we demonstrate that our approach to modelling empathy can outperform a similarly trained non-empathic approach. Additionally, we show that our deep learning prediction approach performed well on evaluation data but has difficulty generalizing to experimentally collected data. Limitations and implications of this work are discussed.
    Date: 2019–10–07
  12. By: Siahaan, Andysah Putera Utama (Universitas Pembangunan Panca Budi); Nasution, Muhammad Dharma Tuah Putra
    Abstract: This study discusses a model for predicting the entrepreneurship intention of alumni. The data are obtained from the database of an online job market, an alumni tracer, and survey responses from alumni. The research applies the C4.5 decision tree algorithm to obtain a prediction model of entrepreneurship intention. Essential indicators include Self-efficacy, Need for Achievement, Advisory Quotient, Locus of Control, and Passion. The predictive model found that the best predictor was Self-efficacy, which contributed to entrepreneurship intention with a value of 79.7 percent. The authors recommend that educational institutions foster candidates' interest through curriculum improvement, field practice, or learning models inside and outside the classroom.
    Date: 2018–06–30
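The C4.5 algorithm mentioned above grows a tree by repeatedly splitting on the attribute that most reduces label entropy (full C4.5 additionally normalises this into a gain ratio). A minimal pure-Python sketch of that splitting criterion, on hypothetical categorical data:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting on one categorical attribute --
    the criterion C4.5 builds on, before its gain-ratio correction."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(r[attr] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr] == value]
        gain -= len(subset) / n * entropy(subset)
    return gain
```

A perfectly separating attribute yields a gain equal to the full label entropy; the tree-growing step simply picks the attribute with the highest score at each node.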
  13. By: Jaroslav Pavlicek (Institute of Economic Studies, Faculty of Social Sciences, Charles University, Opletalova 26, 110 00, Prague, Czech Republic); Ladislav Kristoufek (Institute of Economic Studies, Faculty of Social Sciences, Charles University, Opletalova 26, 110 00, Prague, Czech Republic)
    Abstract: The internet has become the primary source of information for most of the population in modern economies, and as such, it provides an enormous amount of readily available data. Among these are data on internet search queries, which have been shown to improve forecasting models for various economic and financial series. In the aftermath of the global financial crisis, modeling and forecasting mortgage demand and subsequent approvals have become a central issue in the banking sector as well as for governments and regulators. Here, we provide new insights into the dynamics of the UK mortgage market, specifically the demand for mortgages measured by new mortgage approvals, and whether or how models of this market can be improved by incorporating the online searches of potential mortgage applicants. Because online searches are expected to be one of the last steps before an actual application for a large share of the population, their use as a predictor is intuitively appealing. We compare two baseline models – an autoregressive model and a structural model with relevant macroeconomic variables – with their extensions utilizing online searches on Google. We find that the extended models better explain the number of new mortgage approvals and markedly improve their nowcasting and forecasting performance.
    Keywords: Mortgage, online data, Google Trends, forecasting
    JEL: C22 C52 C53 C82 E27 E51
    Date: 2019–07
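The model comparison described above can be sketched as an autoregressive regression augmented with a search-volume regressor. Below is a minimal pure-Python sketch using ordinary least squares via the normal equations; the variable names and toy series are hypothetical, and the paper's actual specification is richer.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X b = X'y),
    solved with Gaussian elimination and partial pivoting."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][c] * beta[c]
                              for c in range(r + 1, k))) / A[r][r]
    return beta

def ar1_with_search(approvals, search):
    """Regress approvals_t on [intercept, approvals_{t-1}, search_t]:
    an AR(1) baseline extended with a Google-search regressor."""
    X = [[1.0, approvals[t - 1], search[t]] for t in range(1, len(approvals))]
    return ols(X, approvals[1:])
```

Comparing the fit of this extended regression against the plain AR(1) baseline is the spirit of the paper's nowcasting exercise.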
  14. By: Bertani, Filippo; Ponta, Linda; Raberto, Marco; Teglio, Andrea; Cincotti, Silvano
    Abstract: During the last decades, we have witnessed a strong development of intangible digital technologies. Software, artificial intelligence and algorithms are increasingly affecting both production systems and our lives; economists have started to figure out the long-run complex economic implications of this new technological wave. In this paper, we address this question through the agent-based modelling approach. In particular, we enrich the macroeconomic model Eurace with the concept of intangible digital technology and investigate its effects both at the micro and macro level. Results show the emergence of the relevant stylized facts observed in the business domain, such as increasing returns, winner-take-most phenomena and market lock-in. At the macro level, our main finding is an increasing unemployment level, since the sizeable decrease of the employment rate in the mass-production system, provided by the higher productivity of digital assets, is usually not counterbalanced by the new jobs created in the digital sector.
    Keywords: Intangible assets, Digital transformation, Technological unemployment, Agent-based economics
    JEL: C63 D24 O33
    Date: 2019–11–21
  15. By: Fatehkia, Masoomali; Kashyap, Ridhi; Weber, Ingmar
    Abstract: Gender equality in access to the internet and mobile phones has become increasingly recognised as a development goal. Monitoring progress towards this goal however is challenging due to the limited availability of gender-disaggregated data, particularly in low-income countries. In this data sparse context, we examine the potential of a source of digital trace `big data' -- Facebook's advertisement audience estimates -- that provides aggregate data on Facebook users by demographic characteristics covering the platform's over 2 billion users to measure and `nowcast' digital gender gaps. We generate a unique country-level dataset combining `online' indicators of Facebook users by gender, age and device type, `offline' indicators related to a country's overall development and gender gaps, and official data on gender gaps in internet and mobile access where available. Using this dataset, we predict internet and mobile phone gender gaps from official data using online indicators, as well as online and offline indicators. We find that the online Facebook gender gap indicators are highly correlated with official statistics on internet and mobile phone gender gaps. For internet gender gaps, models using Facebook data do better than those using offline indicators alone. Models combining online and offline variables however have the highest predictive power. Our approach demonstrates the feasibility of using Facebook data for real-time tracking of digital gender gaps. It enables us to improve geographical coverage for an important development indicator, with the biggest gains made for low-income countries for which existing data are most limited.
    Date: 2018–03–06
  16. By: Zihao Zhang; Stefan Zohren; Stephen Roberts
    Abstract: We adopt Deep Reinforcement Learning algorithms to design trading strategies for continuous futures contracts. Both discrete and continuous action spaces are considered and volatility scaling is incorporated to create reward functions which scale trade positions based on market volatility. We test our algorithms on the 50 most liquid futures contracts from 2011 to 2019, and investigate how performance varies across different asset classes including commodities, equity indices, fixed income and FX markets. We compare our algorithms against classical time series momentum strategies, and show that our method outperforms such baseline models, delivering positive profits despite heavy transaction costs. The experiments show that the proposed algorithms can follow large market trends without changing positions and can also scale down, or hold, through consolidation periods.
    Date: 2019–11
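The volatility scaling mentioned above can be sketched simply: positions are multiplied by the ratio of a target volatility to recent realised volatility, so exposure shrinks when markets turn turbulent. The target level, annualisation factor, and leverage cap below are illustrative assumptions, not the paper's exact parameters.

```python
import math

def realized_vol(returns):
    """Sample standard deviation of a return series (daily volatility)."""
    m = sum(returns) / len(returns)
    return math.sqrt(sum((r - m) ** 2 for r in returns) / (len(returns) - 1))

def vol_scaled_position(action, returns, target_vol=0.10, ann=252):
    """Scale a trading action in [-1, 1] so the resulting position targets
    a fixed annualised volatility; exposure is capped at 1x leverage."""
    sigma = realized_vol(returns) * math.sqrt(ann)
    return action * min(target_vol / sigma, 1.0)
```

In calm markets the scaling factor hits the cap and the raw action passes through; in volatile markets the same action is cut to a fraction of its size.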
  17. By: Sheikh Rabiul Islam; William Eberle; Sheikh K. Ghafoor; Sid C. Bundy; Douglas A. Talbert; Ambareen Siraj
    Abstract: In the area of credit risk analytics, current Bankruptcy Prediction Models (BPMs) struggle with (a) the availability of comprehensive and real-world data sets and (b) the presence of extreme class imbalance in the data (i.e., very few samples for the minority class), which degrades the performance of the prediction model. Moreover, little research has compared the relative performance of well-known BPMs on public datasets while addressing the class imbalance problem. In this work, we apply eight classes of well-known BPMs, as suggested by a review of decades of literature, to a new public dataset, the Freddie Mac Single-Family Loan-Level Dataset, with resampling of the minority class (i.e., adding synthetic minority samples) to tackle class imbalance. Additionally, we apply some recent AI techniques (e.g., tree-based ensemble techniques) that demonstrate potentially better results on models trained with resampled data. In addition, from the analysis of 19 years (1999-2017) of data, we discover that models behave differently when presented with sudden changes in the economy (e.g., a global financial crisis) resulting in abrupt fluctuations in the national default rate. In summary, this study should aid practitioners and researchers in determining the appropriate model with respect to data that contains a class imbalance and various economic stages.
    Date: 2019–11
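The resampling step described above can be illustrated with simple random oversampling — a plain stand-in for synthetic-minority methods such as SMOTE that the abstract alludes to. A minimal pure-Python sketch on hypothetical data:

```python
import random
from collections import Counter

def oversample_minority(X, y, seed=0):
    """Duplicate randomly drawn minority-class samples until every class
    matches the majority class count. Unlike SMOTE, this copies existing
    rows rather than synthesising new ones."""
    rng = random.Random(seed)
    counts = Counter(y)
    majority = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(majority - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out
```

On a 4:1 imbalanced toy set this yields a balanced 4:4 training sample, which is the shape of input the resampled BPMs above are trained on.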
  18. By: Stathoulopoulos, Kostas; Mateos-Garcia, Juan
    Abstract: Policy interventions have to be timely and tailored to specific sectors of the economic ecosystem to maximise their potential impact. We propose a system based on open data that offers policy makers two capabilities. First, it enables them to explore the digital and tech company space with high granularity through keywords, specific technologies or company names, and identify relevant organisations and those most similar to them. Second, it provides an overview of the ecosystem by creating thematic topics that characterise the activities of these companies. We demonstrate the effectiveness of this system in three activity areas not currently captured in the SIC codes.
    Date: 2017–11–24
  19. By: McCallum, Malcolm L
    Abstract: A quandary exists when comparing trend lines of Google Trends query data among different countries. This approach provides directionality and speed of change, but it does not account for the quantity of movement occurring when comparing large regions to small ones. This study applies the physical concept of momentum to the analysis of Google Trends results to provide a method for comparing trends among countries. By accounting for the volume of interest along with the direction and rate of interest gain/loss, one is able to make accurate quantitative statements about how the public in differently sized regions may shift interests and opinion on different issues. Momentum allows us to identify how countries have responded and how they may respond in the future without the erroneous assumption that the behaviors of large and small populations are equally flexible and responsive to new ideas.
    Date: 2018–06–02
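Reading the physics analogy above literally, interest "momentum" would be population size (mass) times the slope of the Google Trends series (velocity). The functions below are a hypothetical sketch of that idea, with made-up region data; they are not the author's exact formula.

```python
def interest_momentum(population, trend_slope):
    """'Momentum' of public interest: population ('mass') multiplied by the
    rate of change of a search-interest series ('velocity')."""
    return population * trend_slope

def compare_regions(regions):
    """Rank regions by interest momentum rather than by raw slope, so a
    shallow trend in a populous country can outweigh a steep one in a
    small country."""
    return sorted(regions,
                  key=lambda r: interest_momentum(r["pop"], r["slope"]),
                  reverse=True)
```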

This nep-big issue is ©2019 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found on the NEP website. For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.