nep-big New Economics Papers
on Big Data
Issue of 2021‒01‒04
twelve papers chosen by
Tom Coupé
University of Canterbury

  1. Excited and aroused: The predictive importance of simple choice process metrics By Steffen Q. Mueller; Patrick Ring; Maria Fischer
  2. Double machine learning for sample selection models By Michela Bia; Martin Huber; Lukáš Lafférs
  3. Impact of Weather Factors on Migration Intention using Machine Learning Algorithms By Juhee Bae; John Aoga; Stefanija Veljanoska; Siegfried Nijssen; Pierre Schaus
  4. Privacy and antitrust in digital platforms By Nicholas Economides; Ioannis Lianos
  5. Mobile applications aiming to facilitate immigrants’ societal integration and overall level of integration, health and mental health. Does artificial intelligence enhance outcomes? By Drydakis, Nick
  6. Non-Cognitive Skills in Training Curricula and Heterogeneous Wage Returns By Fabienne Kiener; Ann-Sophie Gnehm; Uschi Backes-Gellner
  7. Automated Creation of a High-Performing Algorithmic Trader via Deep Learning on Level-2 Limit Order Book Data By Aaron Wray; Matthew Meades; Dave Cliff
  8. Start Spreading the News: News Sentiment and Economic Activity in Australia By Kim Nguyen; Gianni La Cava
  9. Expectations formation of household inflation expectations in India By Singh, Gaurav Kumar
  10. Machine Learning or Econometrics for Credit Scoring: Let’s Get the Best of Both Worlds By Elena Ivona DUMITRESCU; Sullivan HUE; Christophe HURLIN; Sessi TOKPAVI
  11. Competition Problems and Governance of Non-personal Agricultural Machine Data: Comparing Voluntary Initiatives in the US and EU By ATIK Can; MARTENS Bertin
  12. Developing Artificial Intelligence Sustainably By Gordon Myers; Kiril Nejkov

  1. By: Steffen Q. Mueller (Chair for Economic Policy, University of Hamburg); Patrick Ring (Social and Behavioral Approaches to Global Problems, Kiel Institute for the World Economy); Maria Fischer (Department of Psychology, Kiel University)
    Abstract: We conduct a lottery experiment to assess the predictive importance of simple choice process metrics (SCPMs) in forecasting risky 50/50 gambling decisions using different types of machine learning algorithms as well as traditional choice modeling approaches. The SCPMs are recorded during a fixed pre-decision phase and are derived from tracking subjects’ eye movements, pupil sizes, skin conductance, and cardiovascular and respiratory signals. Our study demonstrates that SCPMs provide relevant information for predicting gambling decisions, but we do not find forecasting accuracy to be substantially affected by adding SCPMs to standard choice data. Instead, our results show that forecasting accuracy highly depends on differences in subject-specific risk preferences and is largely driven by including information on lottery design variables. As a key result, we find evidence for dynamic changes in the predictive importance of psychophysiological responses that appear to be linked to habituation and resource-depletion effects. Subjects’ willingness to gamble and choice-revealing arousal signals both decrease as the experiment progresses. Moreover, our findings highlight the importance of accounting for previous lottery payoff characteristics when investigating the role of emotions and cognitive bias in repeated decision-making scenarios.
    Keywords: Repeated decision making, eye-tracking, psychophysiological responses, machine learning, forecasting
    JEL: C44 C45 C53 D81 D87 D91
    Date: 2020–12–16
  2. By: Michela Bia; Martin Huber; Lukáš Lafférs
    Abstract: This paper considers treatment evaluation when outcomes are only observed for a subpopulation due to sample selection or outcome attrition/non-response. For identification, we combine a selection-on-observables assumption for treatment assignment with either selection-on-observables or instrumental variable assumptions concerning the outcome attrition/sample selection process. To control in a data-driven way for potentially high dimensional pre-treatment covariates that motivate the selection-on-observables assumptions, we adapt the double machine learning framework to sample selection problems. That is, we make use of (a) Neyman-orthogonal and doubly robust score functions, which imply the robustness of treatment effect estimation to moderate regularization biases in the machine learning-based estimation of the outcome, treatment, or sample selection models and (b) sample splitting (or cross-fitting) to prevent overfitting bias. We demonstrate that the proposed estimators are asymptotically normal and root-n consistent under specific regularity conditions concerning the machine learners. The estimator is available in the causalweight package for the statistical software R.
    Date: 2020–11
  3. By: Juhee Bae (University of Skovde, Sweden); John Aoga (University of Abomey-Calavi, Bénin); Stefanija Veljanoska (Université de Rennes 1, France); Siegfried Nijssen (ICTEAM, Université catholique de Louvain); Pierre Schaus (ICTEAM, Université catholique de Louvain)
    Abstract: The empirical literature has paid growing attention to the effect of climate shocks and climate change on migration decisions. Previous studies reach different results and use a multitude of traditional empirical approaches. This paper proposes a tree-based Machine Learning (ML) approach to analyze the role of weather shocks in an individual’s intention to migrate in six agriculture-dependent economies: Burkina Faso, Ivory Coast, Mali, Mauritania, Niger, and Senegal. We run several tree-based algorithms (e.g., XGB, Random Forest) using a train-validation-test workflow to build robust and noise-resistant models. We then identify the important features and show in which direction they influence the migration intention. This ML-based estimation accounts for features such as weather shocks, captured by the Standardized Precipitation-Evapotranspiration Index (SPEI) at different timescales, and various socioeconomic features/covariates. We find that (i) weather features improve the prediction performance, although socioeconomic characteristics have more influence on migration intentions, (ii) country-specific models are necessary, and (iii) international moves are influenced more by the longer SPEI timescales, while general moves (which include internal moves) are influenced more by the shorter timescales.
    Keywords: Migration, Weather shocks, Machine learning, Tree-based algorithms
    Date: 2020–11–02
  4. By: Nicholas Economides (Professor of Economics, NYU Stern School of Business, New York, New York 10012); Ioannis Lianos (President, Hellenic Competition Commission and Professor of Global Competition Law and Public Policy, Faculty of Laws, University College London)
    Abstract: Dominant digital platforms such as Google and Facebook collect users’ personal information by default, precipitating a market failure in the market for personal information. We establish the economic harms from the market failure. We discuss conditions for eliminating the market failure and various remedies to restore competition.
    Keywords: personal information; Internet search; Google; Facebook; digital; privacy; restrictions of competition; exploitation; market failure; data dominance; abuse of a dominant position; unfair commercial practices; excessive data extraction; self-determination; behavioral manipulation; remedies; portability; opt-out.
    JEL: K21 L1 L12 L4 L41 L5 L86 L88
    Date: 2021–01
  5. By: Drydakis, Nick
    Abstract: Using panel data on immigrant populations from European, Asian and African countries, the study estimates positive associations between the number of mobile applications in use aiming to facilitate immigrants’ societal integration (m-Integration) and increased level of integration (Ethnosizer), good overall health (EQ-VAS) and mental health (CESD-20). It is estimated that the patterns are gender sensitive. In addition, it is found that m-Integration applications in relation to translation and voice assistants, public services, and medical services provide the highest returns on immigrants’ level of integration and health/mental health status. For instance, translation and voice assistant applications are associated with a 4% increase in integration and a 0.8% increase in good overall health. Moreover, m-Integration applications aided by artificial intelligence (AI) are associated with increased health/mental health and integration levels among immigrants. We suggest that AI, by providing customized search results, peer-reviewed e-learning, professional coaching on pronunciation, real-time translations, and virtual communication for finding possible explanations for health conditions, might bring better-quality services that meet immigrants’ needs. This is the first known study to introduce the term ‘m-Integration’, quantify associations between applications, health/mental health and integration for immigrants, and assess AI’s role in enhancing the aforementioned outcomes.
    Keywords: Mobile Applications, m-Integration, m-Health, Artificial Intelligence, Integration, Immigrants, Refugees, Health, Mental Health
    JEL: O3 O31 I1 J15
    Date: 2020
  6. By: Fabienne Kiener; Ann-Sophie Gnehm; Uschi Backes-Gellner
    Abstract: For non-cognitive skills, economics research has focused primarily on social skills as one element. One important, largely unexplored element is self-competence, the ability to act responsibly for oneself. We therefore study returns to self-competence, adding heterogeneous and complementary returns to the literature on non-cognitive skills. Using texts of training curricula as a data source, we apply machine-learning methods to identify self-competence in occupations. Combining these measures with labor market data, we find heterogeneous returns to self-competence: a medium level of self-competence has the strongest wage returns compared to low or high levels, but when cognitive requirements are high, high self-competence also pays off.
    Keywords: non-cognitive skills, human capital, text as data, curricula content analyses, vocational education and training
    JEL: I26 J24 M53
    Date: 2020–12
  7. By: Aaron Wray; Matthew Meades; Dave Cliff
    Abstract: We present results demonstrating that an appropriately configured deep learning neural network (DLNN) can automatically learn to be a high-performing algorithmic trading system, operating purely from training-data inputs generated by passive observation of an existing successful trader T. That is, we can point our black-box DLNN system at trader T and successfully have it learn from T's trading activity, such that it trades at least as well as T. Our system, called DeepTrader, takes inputs derived from Level-2 market data, i.e. the market's Limit Order Book (LOB) or Ladder for a tradeable asset. Unusually, DeepTrader makes no explicit prediction of future prices. Instead, we train it purely on input-output pairs where in each pair the input is a snapshot S of Level-2 LOB data taken at the time when T issued a quote Q (i.e. a bid or an ask order) to the market; and DeepTrader's desired output is to produce Q when it is shown S. That is, we train our DLNN by showing it the LOB data S that T saw at the time when T issued quote Q, and in doing so our system comes to behave like T, acting as an algorithmic trader issuing specific quotes in response to specific LOB conditions. We train DeepTrader on large numbers of these S/Q snapshot/quote pairs, and then test it in a variety of market scenarios, evaluating it against other algorithmic trading systems in the public-domain literature, including two that have repeatedly been shown to outperform human traders. Our results demonstrate that DeepTrader learns to match or outperform such existing algorithmic trading systems. We analyse the successful DeepTrader network to identify what features it is relying on, and which features can be ignored. We propose that our methods can in principle create an explainable copy of an arbitrary trader T via "black-box" deep learning methods.
    Date: 2020–11
  8. By: Kim Nguyen (Reserve Bank of Australia); Gianni La Cava (Reserve Bank of Australia)
    Abstract: In times of crisis, real-time indicators of economic activity are a critical input to timely and well-targeted policy responses. The COVID-19 pandemic is the most recent example of a crisis where events with little historical precedent played out rapidly and unpredictably. To address this need for real-time indicators we develop a new indicator of 'news sentiment' based on a combination of text analysis, machine learning and newspaper articles. The news sentiment index complements other timely economic indicators and has the advantage of potentially being updated on a daily basis. It captures key macroeconomic events, such as economic downturns, and typically moves ahead of survey-based measures of sentiment. Changes in sentiment expressed in monetary policy-related news can also partly explain unexpected changes in monetary policy. This suggests that news captures important, but unobserved, information about the risks to the RBA's forecasts that the RBA responds to when setting interest rates. An event study in the days around monetary policy decisions suggests that an unexpected tightening in monetary policy is associated with weaker news sentiment, though the effects on sentiment are temporary and not particularly strong.
    Keywords: news media; sentiment; economic activity; text analysis; machine learning
    JEL: E32 E52
    Date: 2020–12
  9. By: Singh, Gaurav Kumar
    Abstract: Inflation expectations data are commonly used to address a number of important questions, primarily related to the formation of inflation expectations. This work presents such an empirical analysis of the Reserve Bank of India’s (RBI) inflation expectations data for the Indian urban population. First, we apply a battery of tests to verify the assumptions of rationality of household expectations. The tests lead to the outright rejection of the assumptions. On the other hand, the inflation forecasts by professional forecasters seem to support the rational expectations assumptions. Second, using a regression model, we find that the inflation forecasts by the professionals forecast actual inflation better than what could be predicted from the most recently available actual inflation data. Finally, using a sticky information model (Mankiw and Reis (2001, 2002), Carroll (2003)), we also find support for Carroll’s contention that relevant macroeconomic information about future inflation flows from experts to households, not vice versa. Additionally, if the sticky information model describes the formation of household inflation expectations, it is natural to expect that more news about inflation in the news channels would lead to a reduction in disagreement. Our empirical analysis using Google Trends data supports this hypothesis.
    Date: 2020–12–14
  10. By: Elena Ivona DUMITRESCU; Sullivan HUE; Christophe HURLIN; Sessi TOKPAVI
    Keywords: Risk management, Credit scoring, Machine learning, Interpretability
    Date: 2020
  11. By: ATIK Can; MARTENS Bertin (European Commission – JRC)
    Abstract: The arrival of digital data in agriculture opens the possibility to realise productivity gains through precision farming. It also raises questions about the distribution of these gains between farmers and agricultural service providers. Farmers’ control of the data is often perceived as a means to appropriate a larger share of these gains. We show how data-driven agricultural business models lock farm data into machines and devices that reduce competition in downstream agricultural services markets. Personal data protection regulation is not applicable to non-personal agricultural machine data. Voluntary data charters in the EU and US emulate GDPR-like principles to give farmers more control over their data but do not really change market-based outcomes due to their legal design. Third-party platforms are a necessary intermediary because farmers cannot achieve the benefits from applications that depend on economies of scale and scope in data aggregation. The low marginal value of individual farm data in such applications puts farmers in a weak bargaining position. Neutral intermediaries that are not vertically integrated into agricultural machines, inputs or services may circumvent monopolistic data lock-ins provided they can access the data. Unless they find a way to generate and monetise economies of scale and scope with their data, their business model may not be sustainable. Regulatory intervention that facilitates portability and interoperability might be useful for farmers to overcome data lock-ins, but designing data access rights is a complicated issue as many parties contribute data to the production process and may claim access rights. Minor changes in who gets access to which data under which conditions may have significant effects on stakeholders. We conclude that digital agriculture still has some way to go to reach equitable and efficient solutions to data access rights. Similar situations are likely to occur in other industries that rely on non-personal machine data.
    Keywords: Digital data, data access and ownership, digital agriculture
    Date: 2020–12
  12. By: Gordon Myers; Kiril Nejkov
    Keywords: Information and Communication Technologies - ICT Policy and Strategies; Private Sector Development - Business Ethics, Leadership and Values; Private Sector Development - Emerging Markets; Science and Technology Development - Technology Innovation
    Date: 2020–03
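
As an illustration of the cross-fitting and doubly robust scoring described in item 2, here is a minimal Python sketch. It is not the authors' estimator (that one is in the causalweight package for R), and it covers only the standard average-treatment-effect case without sample selection. The simulated data, the cell-mean "learners", and all function names (simulate, fit_nuisances, cross_fitted_aipw) are assumptions made for this example.

```python
import random
from statistics import mean

def simulate(n, seed=0):
    """Toy data (assumed): binary covariate x, treatment d depends on x, true effect = 2.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)
        p = 0.3 + 0.4 * x                      # true propensity score
        d = 1 if rng.random() < p else 0
        y = 1.0 + 2.0 * d + 1.5 * x + rng.gauss(0, 1)
        rows.append((x, d, y))
    return rows

def fit_nuisances(train):
    """Stand-in 'learners': cell means for P(D=1|x) and E[Y|D=d,x]."""
    prop, mu = {}, {}
    for x in (0, 1):
        cell = [r for r in train if r[0] == x]
        prop[x] = mean(r[1] for r in cell)
        for d in (0, 1):
            mu[(d, x)] = mean(r[2] for r in cell if r[1] == d)
    return prop, mu

def cross_fitted_aipw(rows, k=2):
    """Average the doubly robust score; nuisances are fit on the other folds only."""
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        prop, mu = fit_nuisances(train)
        for x, d, y in test:
            p = prop[x]
            # Outcome-model difference plus inverse-probability-weighted residuals.
            score = (mu[(1, x)] - mu[(0, x)]
                     + d * (y - mu[(1, x)]) / p
                     - (1 - d) * (y - mu[(0, x)]) / (1 - p))
            scores.append(score)
    return mean(scores)

data = simulate(4000)
ate_hat = cross_fitted_aipw(data)
print(round(ate_hat, 2))
```

On this simulated data the estimate should land close to the true effect of 2.0. In applied work the cell means would be replaced by flexible machine learners, which is where the robustness of the score to regularization bias matters.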

This nep-big issue is ©2021 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
For comments please write to the director of NEP, Marco Novarese. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.