nep-big New Economics Papers
on Big Data
Issue of 2022‒04‒18
23 papers chosen by
Tom Coupé
University of Canterbury

  1. The Value of Artificial Intelligence for Healthcare Decision Making: Lessons Learned By Danielle Whicher; Thomas Rapp
  2. Machine Learning for Stock Prediction Based on Fundamental Analysis By Yuxuan Huang; Luiz Fernando Capretz; Danny Ho
  3. Nonsmooth Implicit Differentiation for Machine Learning and Optimization By Bolte, Jérôme; Le, Tam; Pauwels, Edouard; Silveti-Falls, Antonio
  4. Nonsmooth Implicit Differentiation for Machine Learning and Optimization By Bolte, Jérôme; Le, Tam; Pauwels, Edouard; Silveti-Falls, Antonio
  5. Closure operators: Complexity and applications to classification and decision-making By Hamed Hamze Bajgiran; Federico Echenique
  6. Towards accountability in machine learning applications: A system-testing approach By Wan, Wayne Xinwei; Lindenthal, Thies
  7. Using Past Violence and Current News to Predict Changes in Violence By Mueller, H.; Rauh, C.
  8. Data-intensive innovation and the State: evidence from AI firms in China By Beraja, Martin; Yang, David Y.; Yuchtman, Noam
  9. Examining spatial disparities in electric vehicle charging station placements using machine learning By Roy, Avipsa; Law, Mankin
  10. Time Dependency, Data Flow, and Competitive Advantage By Ehsan Valavi; Joel Hestness; Marco Iansiti; Newsha Ardalani; Feng Zhu; Karim R. Lakhani
  11. Artificial intelligence and firm-level productivity By Czarnitzki, Dirk; Fernández, Gastón P.; Rammer, Christian
  12. Inspection-L: Practical GNN-Based Money Laundering Detection System for Bitcoin By Wai Weng Lo; Siamak Layeghy; Marius Portmann
  13. GAM(L)A: An econometric model for interpretable Machine Learning By Emmanuel Flachaire; Gilles Hacheme; Sullivan Hué; Sébastien Laurent
  14. Expanding the Measurement of Culture with a Sample of Two Billion Humans By Obradovich, Nick; Özak, Ömer; Martín, Ignacio; Ortuño-Ortín, Ignacio; Awad, Edmond; Cebrián, Manuel; Cuevas, Rubén; Desmet, Klaus; Rahwan, Iyad; Cuevas, Ángel
  15. Academic Offer of Advanced Digital Skills in 2020-21. International Comparison. Focus on Artificial Intelligence, High Performance Computing, Cybersecurity and Data Science By Riccardo Righi; Montserrat Lopez-Cobo; Michail Papazoglou; Sofia Samoili; Melisande Cardona; Miguel Vazquez-Prada Baillet; Giuditta De-Prato
  16. Narrative Fragmentation and the Business Cycle By Bertsch, Christoph; Hull, Isaiah; Zhang, Xin
  17. Making use of supercomputers in financial machine learning. By Philippe Cotte; Pierre Lagier; Vincent Margot; Christophe Geissler
  18. Performance of long short-term memory artificial neural networks in nowcasting during the COVID-19 crisis By Daniel Hopp
  19. Collusion and Artificial Intelligence: A Computational Experiment with Sequential Pricing Algorithms under Stochastic Costs By Gonzalo Ballestero
  20. Calibration alternatives to logistic regression and their potential for transferring the dispersion of discriminatory power into uncertainties of probabilities of default By Wosnitza, Jan Henrik
  21. Greenwashing in the US metal industry? A novel approach combining SO2 concentrations from satellite data, a plant-level firm database and web text mining By Schmidt, Sebastian; Kinne, Jan; Lautenbach, Sven; Blaschke, Thomas; Lenz, David; Resch, Bernd
  22. Paying Students to Stay in School By Andrew McKendrick
  23. A Dataset of Geolocated Villages and Gram Panchayat Election Candidates in Uttar Pradesh By Srivastava, Aryan; Kalra, Aarushi; Tiwari, Saket

  1. By: Danielle Whicher; Thomas Rapp
    Abstract: Interest and investment in the development of tools or methods that rely on artificial intelligence (AI) algorithms to improve health or healthcare are increasing.
    Keywords: Artificial Intelligence, Healthcare, Health improvement
  2. By: Yuxuan Huang; Luiz Fernando Capretz; Danny Ho
    Abstract: The application of machine learning to stock prediction has attracted considerable attention in recent years. A large body of research has been conducted in this area, and multiple existing results show that machine learning methods can successfully predict stocks using their historical data. Most existing approaches focus on short-term prediction using historical prices and technical indicators. In this paper, we prepared 22 years' worth of quarterly stock financial data and investigated three machine learning algorithms for stock prediction based on fundamental analysis: a feed-forward neural network (FNN), random forest (RF), and an adaptive neural fuzzy inference system (ANFIS). In addition, we applied RF-based feature selection and bootstrap aggregation to improve model performance and aggregate predictions from different models. Our results show that the RF model achieves the best prediction results, and that feature selection improves the test performance of the FNN and ANFIS. Moreover, the aggregated model outperforms all baseline models, as well as the benchmark DJIA index, by an acceptable margin over the test period. Our findings demonstrate that machine learning models can aid fundamental analysts in decision-making regarding stock investment.
    Date: 2022–01
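The RF-based feature selection and bootstrap aggregation described in the abstract above can be sketched as follows. The synthetic data, median importance threshold, and model sizes are illustrative stand-ins, not the paper's actual setup:

```python
# Sketch: RF-importance feature selection, then bootstrap-aggregated prediction.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))          # stand-in for quarterly fundamentals
y = 2 * X[:, 0] + X[:, 1] - X[:, 2] + rng.normal(scale=0.1, size=400)

# Step 1: rank features by RF importance and keep those above the median.
selector = SelectFromModel(RandomForestRegressor(n_estimators=100, random_state=0),
                           threshold="median").fit(X, y)
X_sel = selector.transform(X)

# Step 2: bootstrap aggregation -- average predictions of models trained
# on resampled versions of the selected data.
preds = []
for seed in range(10):
    idx = rng.integers(0, len(X_sel), len(X_sel))   # bootstrap sample
    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    model.fit(X_sel[idx], y[idx])
    preds.append(model.predict(X_sel))
y_hat = np.mean(preds, axis=0)
print(X_sel.shape, round(float(np.corrcoef(y, y_hat)[0, 1]), 2))
```

Averaging over bootstrap replicates is what reduces the variance of any single model's predictions.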
  3. By: Bolte, Jérôme; Le, Tam; Pauwels, Edouard; Silveti-Falls, Antonio
    Abstract: In view of training increasingly complex learning architectures, we establish a nonsmooth implicit function theorem with an operational calculus. Our result applies to most practical problems (i.e., definable problems) provided that a nonsmooth form of the classical invertibility condition is fulfilled. This approach allows for formal subdifferentiation: for instance, replacing derivatives by Clarke Jacobians in the usual differentiation formulas is fully justified for a wide class of nonsmooth problems. Moreover this calculus is entirely compatible with algorithmic differentiation (e.g., backpropagation). We provide several applications such as training deep equilibrium networks, training neural nets with conic optimization layers, or hyperparameter-tuning for nonsmooth Lasso-type models. To show the sharpness of our assumptions, we present numerical experiments showcasing the extremely pathological gradient dynamics one can encounter when applying implicit algorithmic differentiation without any hypothesis.
    Date: 2022–03
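For orientation, the smooth implicit function theorem that the paper generalizes yields the derivative formula below; the paper's result justifies the same expression with Clarke Jacobians in place of the classical Jacobians for definable nonsmooth problems.

```latex
% If F(\theta, x) = 0 implicitly defines x(\theta) and \partial_x F is
% invertible at the solution, then
\frac{\mathrm{d}x}{\mathrm{d}\theta}(\theta)
  = -\bigl[\partial_x F\bigl(\theta, x(\theta)\bigr)\bigr]^{-1}\,
    \partial_\theta F\bigl(\theta, x(\theta)\bigr)
```

This is the formula that algorithmic differentiation of implicit layers (e.g. deep equilibrium networks) evaluates in practice.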
  4. By: Bolte, Jérôme; Le, Tam; Pauwels, Edouard; Silveti-Falls, Antonio
    Abstract: In view of training increasingly complex learning architectures, we establish a nonsmooth implicit function theorem with an operational calculus. Our result applies to most practical problems (i.e., definable problems) provided that a nonsmooth form of the classical invertibility condition is fulfilled. This approach allows for formal subdifferentiation: for instance, replacing derivatives by Clarke Jacobians in the usual differentiation formulas is fully justified for a wide class of nonsmooth problems. Moreover this calculus is entirely compatible with algorithmic differentiation (e.g., backpropagation). We provide several applications such as training deep equilibrium networks, training neural nets with conic optimization layers, or hyperparameter-tuning for nonsmooth Lasso-type models. To show the sharpness of our assumptions, we present numerical experiments showcasing the extremely pathological gradient dynamics one can encounter when applying implicit algorithmic differentiation without any hypothesis.
    Date: 2022–03
  5. By: Hamed Hamze Bajgiran; Federico Echenique
    Abstract: We study the complexity of closure operators, with applications to machine learning and decision theory. In machine learning, closure operators emerge naturally in data classification and clustering. In decision theory, they can model equivalence of choice menus, and therefore situations with a preference for flexibility. Our contribution is to formulate a notion of complexity of closure operators, which translates into the complexity of a classifier in machine learning, or of a utility function in decision theory.
    Date: 2022–02
  6. By: Wan, Wayne Xinwei; Lindenthal, Thies
    Abstract: A rapidly expanding universe of technology-focused startups is trying to change and improve the way real estate markets operate. The undisputed predictive power of machine learning (ML) models often plays a crucial role in the 'disruption' of traditional processes. However, an accountability gap prevails: How do the models arrive at their predictions? Do they do what we hope they do - or are corners cut? Training ML models is a software development process at heart. We suggest following a dedicated software testing framework to verify that the ML model performs as intended. Illustratively, we augment two ML image classifiers with a system-testing procedure based on local interpretable model-agnostic explanation (LIME) techniques. Analyzing the classifications sheds light on some of the factors that determine the behavior of the systems.
    Keywords: machine learning,accountability gap,computer vision,real estate,urban studies
    JEL: C52 R30
    Date: 2022
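The LIME idea at the core of the testing procedure above can be sketched in a few lines: perturb an instance, query the black-box model, and fit a proximity-weighted linear surrogate whose coefficients serve as local explanations. The black-box function, kernel width, and instance here are hypothetical:

```python
# Sketch of a LIME-style local surrogate explanation.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def black_box(X):
    # Hypothetical classifier score standing in for the ML model under test.
    return 1 / (1 + np.exp(-(3 * X[:, 0] - 2 * X[:, 1])))

x0 = np.array([0.5, -0.5, 0.1])                 # instance to explain
# Perturb the instance, query the black box, weight samples by proximity.
Z = x0 + rng.normal(scale=0.3, size=(500, 3))
weights = np.exp(-np.sum((Z - x0) ** 2, axis=1) / 0.5)
surrogate = Ridge(alpha=1.0).fit(Z, black_box(Z), sample_weight=weights)
print(np.round(surrogate.coef_, 2))             # local feature attributions
```

A system test then checks that the attributions match domain expectations, e.g. that the irrelevant third feature receives near-zero weight.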
  7. By: Mueller, H.; Rauh, C.
    Abstract: This article proposes a new method for predicting escalations and de-escalations of violence using a model that relies on conflict history and text features. The text features are generated from over 3.5 million newspaper articles using a so-called topic model. We show that the combined model relies to a large extent on conflict dynamics, but that text contributes meaningfully to the prediction of rare outbreaks of violence in previously peaceful countries. Given the very powerful dynamics of the conflict trap, these cases are particularly important for prevention efforts.
    Keywords: Conflict, prediction, machine learning, LDA, topic model, battle deaths, ViEWS prediction competition, random forest
    JEL: F21 C53 C55
    Date: 2022–03–22
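The text pipeline described above (topic model over articles, topic shares fed to a classifier) can be sketched as follows. The document-term counts, topic count, and labels are synthetic stand-ins for the newspaper corpus and violence outcomes:

```python
# Sketch: LDA topic shares as features for a random-forest predictor.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic document-term counts standing in for the newspaper articles.
counts = rng.poisson(1.0, size=(200, 50))
word_mass = counts[:, :5].sum(axis=1)
labels = (word_mass > word_mass.mean()).astype(int)   # stand-in escalation flag

# Step 1: compress each article into topic shares with LDA.
lda = LatentDirichletAllocation(n_components=5, random_state=0)
topic_shares = lda.fit_transform(counts)              # each row sums to 1

# Step 2: feed topic shares (plus, in the paper, conflict-history covariates)
# into a random forest that predicts (de-)escalation.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(topic_shares, labels)
print(topic_shares.shape, clf.score(topic_shares, labels))
```

The dimensionality reduction is what makes millions of articles usable as a small set of interpretable predictors.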
  8. By: Beraja, Martin; Yang, David Y.; Yuchtman, Noam
    Abstract: Artificial intelligence (AI) innovation is data-intensive. States have historically collected large amounts of data, which is now being used by AI firms. Gathering comprehensive information on firms and government procurement contracts in China’s facial recognition AI industry, we first study how government data shapes AI innovation. We find evidence of a precise mechanism: because data is sharable across uses, economies of scope arise. Firms awarded public security AI contracts providing access to more government data produce more software for both government and commercial purposes. In a directed technical change model incorporating this mechanism, we then study the trade-offs presented by states’ AI procurement and data provision policies. Surveillance states’ demand for AI may incidentally promote growth, but distort innovation, crowd out resources, and infringe on civil liberties. Government data provision may be justified when economies of scope are strong and citizens’ privacy concerns are limited.
    Keywords: data; innovation; artificial intelligence; China; economies of scope; directed technical change; industrial policy; privacy; surveillance
    JEL: O30 P00 E00 L50 L63 O23 O40
    Date: 2021–03–31
  9. By: Roy, Avipsa; Law, Mankin
    Abstract: Electric vehicles (EV) are an emerging mode of transportation with the potential to reshape the transportation sector by significantly reducing carbon emissions, thereby promoting a cleaner environment and pushing the boundaries of climate progress. Nevertheless, significant hurdles to the widespread adoption of electric vehicles in the United States remain, ranging from the high cost of EVs to the inequitable placement of EV charging stations (EVCS). A deeper understanding of the underlying complex interactions of the social, economic, and demographic factors that may lead to such emerging disparities in EVCS placement is therefore necessary to mitigate accessibility issues and improve EV usage among people of all ages and abilities. In this study, we develop a machine learning framework to examine spatial disparities in EVCS placements using a predictive approach. We first identify the essential socioeconomic factors that may contribute to spatial disparities in EVCS access. Second, using these factors along with ground truth data from existing EVCS placements, we predict future EVCS density at multiple spatial scales using machine learning algorithms and compare their predictive accuracy to identify the most optimal spatial resolution for our predictions. Finally, we compare the most accurately predicted EVCS placement density with a spatial inequity indicator to quantify how equitable these placements would be for Orange County, California. Our method achieved the highest predictive accuracy (94.9%) of EVCS placement density at a spatial resolution of 3 km using random forests. Our results indicate that a total of 74.18% of predicted EVCS placements in Orange County will lie within a low spatial equity zone, indicating that populations with the lowest accessibility may require the highest investments in EVCS placements. Within the low spatial equity areas, 14.86% of the area will have a low density of predicted EVCS placements, 50.32% will have a medium density, and only 9% will have a high density. The findings from this study highlight a generalizable framework to quantify inequities in EVCS placements that will enable policymakers to identify underserved communities and facilitate targeted infrastructure investments for widespread EV usage and adoption for all.
    Date: 2022–02–22
  10. By: Ehsan Valavi; Joel Hestness; Marco Iansiti; Newsha Ardalani; Feng Zhu; Karim R. Lakhani
    Abstract: Data is fundamental to machine learning-based products and services and is considered strategic due to its externalities for businesses, governments, non-profits, and more generally for society. It is well established that the value of organizations (businesses, government agencies and programs, and even industries) scales with the volume of available data. What is often less appreciated is that the value of data for making useful organizational predictions ranges widely and is prominently a function of data characteristics and underlying algorithms. In this research, our goal is to study how the value of data changes over time and how this change varies across contexts and business areas (e.g. next word prediction in the context of history, sports, politics). We focus on data from Reddit and compare the time-dependency of data value across various Reddit topics (subreddits). We make this comparison by measuring the rate at which user-generated text data loses its relevance to the algorithmic prediction of conversations. We show that different subreddits have different rates of relevance decline over time. Relating the text topics to various business areas of interest, we argue that competing in a business area in which data value decays rapidly alters strategies for acquiring competitive advantage. When data value decays rapidly, access to a continuous flow of data is more valuable than access to a fixed stock of data. In this kind of setting, improving user engagement and increasing the user base help create and maintain a competitive advantage.
    Date: 2022–03
  11. By: Czarnitzki, Dirk; Fernández, Gastón P.; Rammer, Christian
    Abstract: Artificial Intelligence (AI) is often regarded as the next general-purpose technology, with rapid, penetrating, and far-reaching use across a broad number of industrial sectors. A main feature of a new general-purpose technology is to enable new ways of production that may increase productivity. So far, however, only a few studies have investigated the likely productivity effects of AI at the firm level, presumably because of a lack of data. We exploit unique survey data on firms' adoption of AI technology and estimate its productivity effects with a sample of German firms. We employ both a cross-sectional dataset and a panel database. To address the potential endogeneity of AI adoption, we also implement an IV approach. We find positive and significant effects of the use of AI on firm productivity. This finding holds for different measures of AI usage, i.e., an indicator variable of AI adoption, and the intensity with which firms use AI methods in their business processes.
    Keywords: Artificial Intelligence,Productivity,CIS data
    JEL: O14 O31 O33 L25 M15
    Date: 2022
  12. By: Wai Weng Lo; Siamak Layeghy; Marius Portmann
    Abstract: Criminals have become increasingly experienced in using cryptocurrencies, such as Bitcoin, for money laundering. Cryptocurrencies can hide criminal identities and transfer hundreds of millions of dollars of dirty funds through criminal digital wallets. However, this is something of a paradox, because cryptocurrencies are gold mines for open-source intelligence, giving law enforcement agencies more power to conduct forensic analyses. This paper proposes Inspection-L, a graph neural network (GNN) framework based on self-supervised Deep Graph Infomax (DGI), combined with a random forest (RF), to detect illicit transactions for anti-money laundering (AML). To the best of our knowledge, our proposal is the first to apply self-supervised GNNs to the problem of AML in Bitcoin. The proposed method was evaluated on the Elliptic dataset, and the results show that our approach outperforms the state of the art in terms of key classification metrics, which demonstrates the potential of self-supervised GNNs in detecting illicit cryptocurrency transactions.
    Date: 2022–03
  13. By: Emmanuel Flachaire; Gilles Hacheme; Sullivan Hué; Sébastien Laurent
    Abstract: Despite their high predictive performance, random forest and gradient boosting are often considered black boxes or uninterpretable models, which has raised concerns from practitioners and regulators. As an alternative, we propose in this paper to use partial linear models that are inherently interpretable. Specifically, this article introduces GAM-lasso (GAMLA) and GAM-autometrics (GAMA), denoted GAM(L)A for short. GAM(L)A combines parametric and non-parametric functions to accurately capture the linearities and non-linearities prevailing between dependent and explanatory variables, and a variable selection procedure to control for overfitting. Estimation relies on a two-step procedure building upon the double residual method. We illustrate the predictive performance and interpretability of GAM(L)A on a regression and a classification problem. The results show that GAM(L)A outperforms parametric models augmented by quadratic, cubic and interaction effects. Moreover, the results also suggest that the performance of GAM(L)A is not significantly different from that of random forest and gradient boosting.
    Date: 2022–03
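The double residual (Robinson-type) step underlying the estimator above can be sketched on a toy partially linear model y = zβ + g(x) + ε: partial the nonparametric component x out of both y and z, then recover β from a residual-on-residual regression. The data, smoother, and parameters here are illustrative, not the paper's:

```python
# Sketch: double residual estimation of a partially linear model.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(-2, 2, size=(n, 1))            # enters nonlinearly via sin()
z = rng.normal(size=(n, 1)) + 0.5 * x          # enters linearly, correlated with x
y = 2 * z[:, 0] + np.sin(3 * x[:, 0]) + rng.normal(scale=0.1, size=n)

# Step 1: partial x out of both y and z with a nonparametric smoother.
smooth = KNeighborsRegressor(n_neighbors=25)
ry = y - smooth.fit(x, y).predict(x)
rz = z - smooth.fit(x, z).predict(x)

# Step 2: the residual-on-residual fit recovers the linear coefficient (true value 2).
beta = float(LinearRegression().fit(rz, ry).coef_[0])
print(round(beta, 2))
```

The nonlinear part g(x) can then be estimated from y minus the fitted linear component, which is what keeps the model interpretable.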
  14. By: Obradovich, Nick; Özak, Ömer; Martín, Ignacio; Ortuño-Ortín, Ignacio; Awad, Edmond; Cebrián, Manuel; Cuevas, Rubén; Desmet, Klaus; Rahwan, Iyad; Cuevas, Ángel
    Abstract: Culture has played a pivotal role in human evolution. Yet, the ability of social scientists to study culture is limited by the currently available measurement instruments. Scholars of culture must regularly choose between scalable but sparse survey-based methods or restricted but rich ethnographic methods. Here, we demonstrate that massive online social networks can advance the study of human culture by providing quantitative, scalable, and high-resolution measurement of behaviorally revealed cultural values and preferences. We employ publicly available data across nearly 60,000 topic dimensions drawn from two billion Facebook users across 225 countries and territories. We first validate that cultural distances calculated from this measurement instrument correspond to traditional survey-based and objective measures of cross-national cultural differences. We then demonstrate that this expanded measure enables rich insight into the cultural landscape globally at previously impossible resolution. We analyze the importance of national borders in shaping culture and compare subnational divisiveness to gender divisiveness across countries. The global collection of massive data on human behavior provides a high-dimensional complement to traditional cultural metrics. Further, the granularity of the measure presents enormous promise to advance scholars' understanding of additional fundamental questions in the social sciences. The measure enables detailed investigation into the geopolitical stability of countries, social cleavages within both small and large-scale human groups, the integration of migrant populations, and the disaffection of certain population groups from the political process, among myriad other potential future applications.
    Keywords: Culture,Cultural Distance,Identity,Regional Culture,Gender Differences
    JEL: C80 F1 J1 O10 R10 Z10
    Date: 2022
  15. By: Riccardo Righi (European Commission - JRC); Montserrat Lopez-Cobo (European Commission - JRC); Michail Papazoglou (European Commission - JRC); Sofia Samoili (European Commission - JRC); Melisande Cardona (European Commission - JRC); Miguel Vazquez-Prada Baillet (European Commission - JRC); Giuditta De-Prato (European Commission - JRC)
    Abstract: The European Commission, as part of its policy to foster digital transformation and succeed in the digital decade, promotes digital skills as a key factor in improving economic competitiveness and social justice. This report provides evidence on the availability of higher education offerings in Artificial intelligence, High performance computing, Cybersecurity, and Data science in the academic year 2020-2021, so as to anticipate possible gaps (or abundance) in their offer. Following a keyword-based query methodology that captures the inclusion of advanced digital skills in programmes’ syllabuses, we monitor the availability of master’s programmes and study their characteristics, such as their scope (broad or specialised), the education fields in which digital skills are taught (e.g., Information and communication technologies; Business, administration and law), and the content areas covered by the programmes. The EU’s offer of AI-related specialised master’s programmes is higher than that of the US. Even though the field of education dominating the offer of AI master’s programmes is Information and communication technologies, noticeable shares are also observed for Engineering, manufacturing and construction. In Cybersecurity, the EU is the only area showing a positive trend during the last year, involving both broad and specialised master’s programmes. Despite this, the EU’s related offer is still lower than that of the US and the UK. Regarding Data science master’s programmes, the US keeps its leading position.
    Keywords: digital skills, education, artificial intelligence, cybersecurity, high performance computing, digital transformation
    Date: 2022–04
  16. By: Bertsch, Christoph (Research Department, Central Bank of Sweden); Hull, Isaiah (Research Department, Central Bank of Sweden); Zhang, Xin (Research Department, Central Bank of Sweden)
    Abstract: According to Shiller (2017), economic and financial narratives often emerge as a consequence of their virality, rather than their veracity, and constitute an important, but understudied, driver of aggregate fluctuations. Using a unique dataset of newspaper articles over the 1950-2019 period and state-of-the-art methods from natural language processing, we characterize the properties of business cycle narratives. Our main finding is that narratives tend to consolidate around a dominant explanation during expansions and fragment into competing explanations during contractions. We also show that the existence of past reference events is strongly associated with increased narrative consolidation.
    Keywords: Natural Language Processing; Machine Learning; Narrative Economics
    JEL: C63 D84 E32 E70
    Date: 2021–01–01
  17. By: Philippe Cotte (Advestis); Pierre Lagier (Fujitsu Laboratories of Europe Ltd. - Fujitsu Laboratories Ltd.); Vincent Margot (Advestis); Christophe Geissler (Advestis)
    Abstract: This article is the result of a collaboration between Fujitsu and Advestis. The collaboration aims at refactoring and running an algorithm, based on systematic exploration and producing investment recommendations, on the Fugaku high-performance computer, to see whether a very high number of cores could allow for a deeper exploration of the data compared to a cloud machine, hopefully resulting in better predictions. We found that an increase in the number of explored rules results in a net increase in the predictive performance of the final ruleset. Also, in the particular case of this study, we found that using more than around 40 cores does not bring a significant computation time gain. However, the origin of this limitation is explained by a threshold-based search heuristic used to prune the search space. We have evidence that for similar data sets with less restrictive thresholds, the number of cores actually used could very well be much higher, allowing parallelization to have a much greater effect.
    Keywords: RIPE,Portfolio Management,rule-based algorithm,Expert Systems,High Performance Computing,Parallel Programming,Multiprocessing,XAI
    Date: 2022–02–28
  18. By: Daniel Hopp
    Abstract: The COVID-19 pandemic has demonstrated the increasing need of policymakers for timely estimates of macroeconomic variables. A prior UNCTAD research paper examined the suitability of long short-term memory artificial neural networks (LSTMs) for performing economic nowcasting of this nature. Here, the LSTM's performance during the COVID-19 pandemic is compared and contrasted with that of the dynamic factor model (DFM), a commonly used methodology in the field. Three separate variables, global merchandise export values and volumes and global services exports, were nowcast with actual data vintages, and performance was evaluated for the second, third, and fourth quarters of 2020 and the first and second quarters of 2021. In terms of both mean absolute error and root mean square error, the LSTM obtained better performance in two-thirds of variable/quarter combinations, and displayed more gradual forecast evolutions with more consistent narratives and smaller revisions. Additionally, a methodology for introducing interpretability to LSTMs is presented and made available in the accompanying nowcast_lstm Python library, which is now also available in R, MATLAB, and Julia.
    Date: 2022–03
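For readers unfamiliar with the LSTM mechanics behind the nowcasts above, a single forward pass can be written out in plain numpy. This is a generic textbook LSTM cell, not the nowcast_lstm library's implementation; sizes and weights are arbitrary:

```python
# Sketch: one-layer LSTM forward pass over a sequence, gates stacked as [i, f, g, o].
import numpy as np

def lstm_forward(x_seq, W, U, b, h0, c0):
    """Run an LSTM over x_seq and return the final hidden state."""
    h, c = h0, c0
    H = h0.shape[0]
    for x in x_seq:
        zs = W @ x + U @ h + b
        i = 1 / (1 + np.exp(-zs[0:H]))        # input gate
        f = 1 / (1 + np.exp(-zs[H:2*H]))      # forget gate
        g = np.tanh(zs[2*H:3*H])              # candidate cell state
        o = 1 / (1 + np.exp(-zs[3*H:4*H]))    # output gate
        c = f * c + i * g                     # cell state carries long memory
        h = o * np.tanh(c)
    return h  # in a nowcasting model, fed to a linear output head

rng = np.random.default_rng(0)
D, H = 3, 4                                   # input and hidden sizes
W = rng.normal(scale=0.5, size=(4 * H, D))
U = rng.normal(scale=0.5, size=(4 * H, H))
b = np.zeros(4 * H)
h = lstm_forward(rng.normal(size=(12, D)), W, U, b, np.zeros(H), np.zeros(H))
print(h.shape)
```

The gated cell state is what lets the model weigh a long history of monthly indicators when nowcasting a quarterly series.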
  19. By: Gonzalo Ballestero (Universidad de San Andrés)
    Abstract: Firms increasingly delegate their strategic decisions to algorithms. A potential concern is that algorithms may undermine competition by leading to pricing outcomes that are collusive, even without having been designed to do so. This paper investigates whether Q-learning algorithms can learn to collude in a setting with sequential price competition and stochastic marginal costs adapted from Maskin and Tirole (1988). Extending a previous model developed in Klein (2021), I find that sequential Q-learning algorithms lead to supracompetitive profits even though they compete under uncertainty, and this finding is robust to various extensions. The algorithms can coordinate on focal price equilibria or an Edgeworth cycle, provided that uncertainty is not too large. However, as the market environment becomes more uncertain, price wars emerge as the only possible pricing pattern. Even though sequential Q-learning algorithms earn supracompetitive profits, uncertainty tends to make collusive outcomes more difficult to achieve.
    Keywords: Competition Policy, Artificial Intelligence, Algorithmic Collusion
    JEL: D43 K21 L13
    Date: 2022–02
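A stripped-down version of alternating-move Q-learning pricing can be sketched as below. This toy is far simpler than the paper's Maskin-Tirole environment: the demand function, cost, price grid, and learning parameters are all invented for illustration, and the bootstrap term simplifies the state transition:

```python
# Sketch: two Q-learning pricing algorithms moving in alternation.
import numpy as np

rng = np.random.default_rng(0)
prices = np.array([1.0, 1.5, 2.0])        # discretized price grid (unit cost 0.5)
n = len(prices)
alpha, gamma, eps = 0.1, 0.9, 0.1

def profit(p_own, p_rival):
    # Toy share-of-demand function: the cheaper firm sells more.
    share = np.exp(-p_own) / (np.exp(-p_own) + np.exp(-p_rival))
    return (p_own - 0.5) * share

# One Q-table per firm: state = rival's current price index, action = own price.
# (Simplification: the bootstrap term treats the firm's own action as the next
# state rather than tracking the rival's reply exactly.)
Q = [np.zeros((n, n)), np.zeros((n, n))]
state = 0
for t in range(30000):
    firm = t % 2                          # firms move in alternation
    if rng.random() < eps:
        a = int(rng.integers(n))          # explore
    else:
        a = int(np.argmax(Q[firm][state]))  # exploit
    r = profit(prices[a], prices[state])
    # Standard Q-learning update on the visited (state, action) pair.
    Q[firm][state, a] += alpha * (r + gamma * Q[firm][a].max() - Q[firm][state, a])
    state = a                             # own price becomes the rival's state
greedy = [[int(np.argmax(Q[f][s])) for s in range(n)] for f in range(2)]
print(greedy)                             # learned price index per rival price
```

Inspecting the greedy policies after training is how such experiments diagnose whether play settles on high focal prices or on price-war cycles.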
  20. By: Wosnitza, Jan Henrik
    Abstract: The transformation of credit scores into probabilities of default plays an important role in credit risk estimation. The linear logistic regression has developed into a standard calibration approach in the banking sector. With the advent of machine learning techniques in the discriminatory phase of credit risk models, however, the standard calibration approach is currently under scrutiny again. In particular, the assumptions behind the linear logistic regression provide critics with a target. Previous literature has converted the calibration problem into a regression task without any loss of generality. In this paper, we draw on recent academic results in order to suggest two new one-parametric families of differentiable functions as candidates for this regression. The derivation of these two families of differentiable functions is based on the maximum entropy principle and, thus, they rely on a minimum number of assumptions. We compare the performance of four calibration approaches on a real-world data set and find that one of the new one-parametric families outperforms the linear logistic regression. Furthermore, we develop an approach in order to quantify the part of the general estimation error of probabilities of default that stems from the statistical dispersion of the discriminatory power.
    Keywords: Calibration,credit score,cumulative accuracy profile,logistic regression,margin of conservatism,probability of default
    JEL: G17 G21 G33
    Date: 2022
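The baseline being revisited above, linear logistic calibration of credit scores into probabilities of default, can be sketched on synthetic data; the score distributions and default rate here are invented for illustration:

```python
# Sketch: linear logistic calibration of credit scores to PDs.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic portfolio: defaulters tend to have lower credit scores.
scores = np.concatenate([rng.normal(600, 50, 5000),   # non-defaulters
                         rng.normal(520, 50, 500)])   # defaulters
default = np.concatenate([np.zeros(5000), np.ones(500)])

# Standardize the score, then fit the linear logistic calibration curve.
mu, sd = scores.mean(), scores.std()
X = ((scores - mu) / sd).reshape(-1, 1)
calib = LogisticRegression().fit(X, default)

grid = ((np.array([450.0, 550.0, 650.0]) - mu) / sd).reshape(-1, 1)
pd_curve = calib.predict_proba(grid)[:, 1]
print(np.round(pd_curve, 3))  # estimated PD at scores 450, 550, 650
```

The paper's alternative families would replace this two-parameter sigmoid with maximum-entropy-derived one-parameter curves fitted to the same score-to-PD regression.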
  21. By: Schmidt, Sebastian; Kinne, Jan; Lautenbach, Sven; Blaschke, Thomas; Lenz, David; Resch, Bernd
    Abstract: This Discussion Paper deals with the issue of greenwashing, i.e. the false portrayal of companies as environmentally friendly. The analysis focuses on the US metal industry, which is a major emission source of sulfur dioxide (SO2), one of the most harmful air pollutants. One way to monitor the distribution of atmospheric SO2 concentrations is through satellite data from the Sentinel-5P programme, which represents a major advance due to its unprecedented spatial resolution. In this paper, Sentinel-5P remote sensing data was combined with a plant-level firm database to investigate the relationship between the US metal industry and SO2 concentrations using a spatial regression analysis. Additionally, this study considered web text data, classifying companies based on their websites in order to depict their self-portrayal on the topic of sustainability. In doing so, we investigated the topic of greenwashing, i.e. whether or not a positive self-portrayal regarding sustainability is related to lower local SO2 concentrations. Our results indicated a general, positive correlation between the number of employees in the metal industry and local SO2 concentrations. The web-based analysis showed that only 8% of companies in the metal industry could be classified as engaged in sustainability based on their websites. The regression analyses indicated that these self-reported 'sustainable' companies had a weaker effect on local SO2 concentrations compared to their 'non-sustainable' counterparts, which we interpreted as an indication of the absence of general greenwashing in the US metal industry. However, the large share of firms without a website and lack of specificity of the text classification model were limitations to our methodology.
    Keywords: Sentinel-5P,air pollution,natural language processing,spatial regression
    JEL: Q53 Q56 R11
    Date: 2022
  22. By: Andrew McKendrick
    Abstract: I examine the impact of the Education Maintenance Allowance, a conditional cash transfer in England that was available nationally from 2004 to 2011, on a range of short- and long-term outcomes. Average treatment effects are identified, assuming unconfoundedness, using Inverse Probability Weighting Regression Adjustment. Treatment effect heterogeneity is examined using Causal Forests, a new machine learning approach. I find beneficial impacts of EMA on retention, university attendance and, for the first time, insecure work, as measured by the probability of being on a “zero hours” contract. Other outcomes (educational attainment, risky behaviours, and labour market outcomes) are found not to be impacted.
    Keywords: Education Maintenance Allowance, Causal Forest, Heterogeneity, Labour Market Outcomes, Job Security, Risky Behaviours
    JEL: H52 I12 I28 J22
    Date: 2022
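Inverse Probability Weighting Regression Adjustment, the identification strategy named above, can be sketched on simulated data with a known treatment effect of 1; the data-generating process and models below are illustrative, not the paper's:

```python
# Sketch: IPWRA estimation of an average treatment effect under unconfoundedness.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 2))                        # observed confounders
p = 1 / (1 + np.exp(-(x[:, 0] - 0.5 * x[:, 1])))   # true propensity score
t = (rng.random(n) < p).astype(int)                # treatment (e.g. receiving EMA)
y = 1.0 * t + x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)  # true ATE = 1

# Step 1: model the propensity score and build inverse-probability weights.
ps = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]
w = np.where(t == 1, 1 / ps, 1 / (1 - ps))

# Step 2: weighted outcome regressions in each arm (the "RA" part),
# then average the predicted outcome difference over the full sample.
m1 = LinearRegression().fit(x[t == 1], y[t == 1], sample_weight=w[t == 1])
m0 = LinearRegression().fit(x[t == 0], y[t == 0], sample_weight=w[t == 0])
ate = float(np.mean(m1.predict(x) - m0.predict(x)))
print(round(ate, 2))
```

The combination is doubly robust: the estimate stays consistent if either the propensity model or the outcome models are correctly specified.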
  23. By: Srivastava, Aryan; Kalra, Aarushi; Tiwari, Saket
    Abstract: Village Council or Gram Panchayat (henceforth, Panchayat) elections provide highly localized political shocks, making them suitable for answering various research questions in a causal framework. We collect data for Panchayat elections held in the Indian state of Uttar Pradesh in the years 2021 and 2015. While election candidates’ data are openly available on the Uttar Pradesh State Election Commission website, they are not easily accessible. Moreover, the villages are not geolocated. This motivated us to scrape the state website to obtain the data and geolocate the villages, allowing us to match them with our social media data, which consist of users' activity and their geo-locations. These data are a valuable resource for researchers interested in questions related to political economy in the developing world. A link to the GitHub repository with the code and data can be found here.
    Date: 2022–03–01

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at <>. For comments, please write to the director of NEP, Marco Novarese, at <>. Put “NEP” in the subject line; otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.