nep-big New Economics Papers
on Big Data
Issue of 2022‒05‒02
thirty-six papers chosen by
Tom Coupé
University of Canterbury

  1. How effective are sanctions on North Korea? Popular DMSP night-lights data may bias evaluations due to blurring and poor low-light detection By John Gibson; Bonggeun Kim; Geua Boe-Gibson
  2. Impacts of Droughts and Floods on Agricultural Productivity in New Zealand as Measured from Space By Elodie Blanc; Ilan Noy
  3. Who pays for gifts to physicians? Heterogeneous effects of industry payments on drug costs By Melissa Newham; Marica Valente
  4. Satisfaction with the Environmental Condition in the Italian Regions between 2004 and 2020 By Laureti, Lucio; Costantiello, Alberto; Leogrande, Angelo
  5. Machine learning model to project the impact of Ukraine crisis By Javad T. Firouzjaee; Pouriya Khaliliyan
  6. Making use of supercomputers in financial machine learning By Philippe Cotte; Pierre Lagier; Vincent Margot; Christophe Geissler
  7. Improving Macroeconomic Model Validity and Forecasting Performance with Pooled Country Data using Structural, Reduced Form, and Neural Network Model By Cameron Fen; Samir Undavia
  8. Offline Deep Reinforcement Learning for Dynamic Pricing of Consumer Credit By Raad Khraishi; Ramin Okhrati
  9. Sparsification and Filtering for Spatial-temporal GNN in Multivariate Time-series By Yuanrong Wang; Tomaso Aste
  10. HiSA-SMFM: Historical and Sentiment Analysis based Stock Market Forecasting Model By Ishu Gupta; Tarun Kumar Madan; Sukhman Singh; Ashutosh Kumar Singh
  11. Artificial Intelligence in Autonomous Vehicles: towards trustworthy systems By FERNANDEZ LLORCA David; GOMEZ Emilia
  12. Trust Predicts Compliance with COVID-19 Containment Policies: Evidence from Ten Countries Using Big Data By Sarracino, Francesco; Greyling, Talita; O'Connor, Kelsey J.; Peroni, Chiara; Rossouw, Stephanié
  13. Predicting Agri-food Quality across Space: A Machine Learning Model for the Acknowledgment of Geographical Indications By Resce, Giuliano; Vaquero-Pineiro, Cristina
  14. Artificial Intelligence and international trade: Some preliminary implications By Janos Ferencz; Javier López González; Irene Oliván García
  15. Learning Transferrable Representations of Career Trajectories for Economic Prediction By Keyon Vafa; Emil Palikot; Tianyu Du; Ayush Kanodia; Susan Athey; David M. Blei
  16. Detecting data-driven robust statistical arbitrage strategies with deep neural networks By Ariel Neufeld; Julian Sester; Daiying Yin
  17. Honest calibration assessment for binary outcome predictions By Timo Dimitriadis; Lutz Duembgen; Alexander Henzi; Marius Puke; Johanna Ziegel
  18. Satellite Image and Machine Learning based Knowledge Extraction in the Poverty and Welfare Domain By Ola Hall; Mattias Ohlsson; Thorsteinn Rögnvaldsson
  19. AI-tocracy By Beraja, Martin; Kao, Andrew; Yang, David Y.; Yuchtman, Noam
  20. Automatic Debiased Machine Learning for Dynamic Treatment Effects By Rahul Singh; Vasilis Syrgkanis
  21. Artificial intelligence and productivity: global evidence from AI patent and bibliometric data. By Aleksandra Parteka; Aleksandra Kordalska
  22. Fast Simulation-Based Bayesian Estimation of Heterogeneous and Representative Agent Models using Normalizing Flow Neural Networks By Cameron Fen
  23. Measuring the Impact of Taxes and Public Services on Property Values: A Double Machine Learning Approach By Isaiah Hull; Anna Grodecka-Messi
  24. Stock Price Prediction using Sentiment Analysis and Deep Learning for Indian Markets By Narayana Darapaneni; Anwesh Reddy Paduri; Himank Sharma; Milind Manjrekar; Nutan Hindlekar; Pranali Bhagat; Usha Aiyer; Yogesh Agarwal
  25. Fusion of Sentiment and Asset Price Predictions for Portfolio Optimization By Mufhumudzi Muthivhi; Terence L. van Zyl
  26. In-Firm Planning and Business Processes Management Using Deep Neural Networks By Fedor Zagumennov
  27. Explainable Machine Learning for Predicting Homicide Clearance in the United States By Gian Maria Campedelli
  28. DAMNETS: A Deep Autoregressive Model for Generating Markovian Network Time Series By Jase Clarkson; Mihai Cucuringu; Andrew Elliott; Gesine Reinert
  29. A Comparative Study on Forecasting of Retail Sales By Md Rashidul Hasan; Muntasir A Kabir; Rezoan A Shuvro; Pankaz Das
  30. Graph similarity learning for change-point detection in dynamic networks By Deborah Sulem; Henry Kenlay; Mihai Cucuringu; Xiaowen Dong
  31. High-Resolution Peak Demand Estimation Using Generalized Additive Models and Deep Neural Networks By Jonathan Berrisch; Michał Narajewski; Florian Ziel
  32. Bayesian Bilinear Neural Network for Predicting the Mid-price Dynamics in Limit-Order Book Markets By Martin Magris; Mostafa Shabani; Alexandros Iosifidis
  33. DeepTrust: A Reliable Financial Knowledge Retrieval Framework For Explaining Extreme Pricing Anomalies By Pok Wah Chan
  34. Multi-Objective reward generalization: Improving performance of Deep Reinforcement Learning for selected applications in stock and cryptocurrency trading By Federico Cornalba; Constantin Disselkamp; Davide Scassola; Christopher Helf
  35. Measuring anomalies in cigarette sales by using official data from Spanish provinces: Are there only the anomalies detected by the Empty Pack Surveys (EPS) used by Transnational Tobacco Companies (TTCs)? By Pedro Cadahia; Antonio A. Golpe; Juan M. Martín Álvarez; E. Asensio
  36. Machine Learning Simulates Agent-Based Model Towards Policy By Bernardo Alves Furtado; Gustavo Onofre Andreão

  1. By: John Gibson (University of Waikato); Bonggeun Kim (Seoul National University); Geua Boe-Gibson (University of Waikato)
    Abstract: The effect of sanctions on economic activity in targeted countries is increasingly studied with satellite-detected night-lights data because conventional economic activity data for such countries are either unavailable or untrustworthy. Many studies use data from the Defense Meteorological Satellite Program (DMSP), designed for observing clouds for short-term weather forecasts rather than for long-run observation of economic activity on earth. The DMSP data are flawed by blurring, and bottom-coding due to poor low-light detection. These errors may bias evaluation of sanction effectiveness. To show this we use a difference-in-differences analysis of impacts on night-lights of the shutdown of the Kaesong Industrial Zone in North Korea, which South Korea closed in 2016 in response to North Korea’s nuclear tests. We estimate impacts of 40–80% declines in luminosity, depending on the choice of comparison region, and these effects are always precisely estimated if data from the accurate Visible Infrared Imaging Radiometer Suite (VIIRS) on the Suomi-NPP satellite are used. Yet with the more widely used DMSP data, apparent impacts are imprecisely estimated and are far smaller. A decomposition suggests much of the attenuation in estimated treatment effects if DMSP data are used comes from false zeroes, which are also likely to matter to evaluations in other poorly lit places.
    Keywords: DMSP; mean-reverting error; night lights; sanctions; VIIRS; North Korea
    JEL: C80 F51 O11
    Date: 2022–03–30
  2. By: Elodie Blanc; Ilan Noy
    Abstract: This study estimates the impact of excess precipitation (or the absence of rainfall) on productivity of agricultural land parcels in New Zealand. This type of post-disaster damage assessment aims to allow for quantification of disaster damage when on-the-ground assessment of damage is too costly or too difficult to conduct. It can also serve as a retroactive data collection tool for disaster loss databases where data collection did not happen at the time of the event. To this end, we use satellite-derived observations of terrestrial vegetation (the Enhanced Vegetation Index – EVI) over the growing season. We pair these data at the land parcel level, identifying five land use types (three types of pasture, and annual and perennial crops), with precipitation records, which we use to identify both excessively dry and excessively wet episodes. Using regression analyses, we then examine whether these episodes had any observable impact on agricultural productivity. Overall, we find statistically significant declines in agricultural productivity that are associated with both floods and droughts. The average impact of these events over the affected parcels, however, is not very large: usually less than 1%, but quite different across years and across regions. This average hides a heterogeneity of impacts, with some parcels experiencing a much more significant decline in the EVI.
    Keywords: satellite-derived data, crop productivity, drought, flood
    JEL: Q15 Q54 C23
    Date: 2022
  3. By: Melissa Newham; Marica Valente
    Abstract: This paper estimates the impact of gifts - monetary or in-kind payments - from pharmaceutical firms on physicians' prescription decisions and drug costs in the US. Using exhaustive micro data on prescriptions for anti-diabetic drugs from Medicare Part D, we find that payments cause physicians to prescribe more brand drugs. On average, for every dollar spent, payments generate a $6 increase in drug costs. We then estimate heterogeneous causal effects via machine-learning methods. We find large heterogeneity in responses to payments across physicians. Differences are predominantly explained by the insurance coverage of patients: physicians prescribe more brand drugs in response to payments when patients benefit from subsidies that reduce out-of-pocket drug costs. Finally, we estimate that a gift ban would reduce drug costs to treat diabetes by 3%.
    Date: 2022–03
  4. By: Laureti, Lucio; Costantiello, Alberto; Leogrande, Angelo
    Abstract: In this article, the “Satisfaction with the Environmental Condition” in the 20 Italian regions between 2004 and 2020 was estimated using ISTAT-BES data. The data were analyzed using the following econometric techniques: Panel Data with Random Effects, Panel Data with Fixed Effects, Dynamic Panel, Pooled OLS, and WLS. The results show that satisfaction with the environmental situation is positively associated with the variables "People with at least high school diploma", "Satisfaction with leisure time", and "Concern for the deterioration of the landscape", and negatively associated with "Gross disposable income per capita", "Dissatisfaction with the landscape of the place of life", and "Perception of the risk of crime". A cluster analysis was then carried out using the unsupervised k-Means algorithm optimized through the Silhouette coefficient, and 3 clusters were found. A comparative analysis was then carried out between eight different machine learning algorithms to predict the trend of satisfaction with the environmental situation. The analysis showed that the Tree Ensemble Regression algorithm is the best predictor and estimates a reduction in the variable of 0.05%. Subsequently, using augmented data, a further prediction was made with an estimated result equal to -1.93%.
    Keywords: Environmental Economics, Valuation of Environmental Effects, Sustainability, Government Policy, Ecological Economics.
    JEL: Q5 Q51 Q56 Q57 Q58
    Date: 2022–03–19
  5. By: Javad T. Firouzjaee; Pouriya Khaliliyan
    Abstract: Russia's attack on Ukraine on Thursday 24 February 2022 shook financial markets and intensified the geopolitical crisis. In this paper, we select some main economic indexes, such as Gold, Oil (WTI), NDAQ, and known currencies, that are involved in this crisis and try to find the quantitative effect of this war on them. To quantify the war effect, we use the correlations and relationships between these economic indices, create datasets, and compare the forecast results with real data. To study war effects, we use Machine Learning Linear Regression. We carry out empirical experiments on these economic index datasets to evaluate and predict the toll of this war and its effects on the main economic indexes.
    Date: 2022–03
  6. By: Philippe Cotte; Pierre Lagier; Vincent Margot; Christophe Geissler
    Abstract: This article is the result of a collaboration between Fujitsu and Advestis. This collaboration aims at refactoring and running an algorithm based on systematic exploration producing investment recommendations on a high-performance computer of the Fugaku, to see whether a very high number of cores could allow for a deeper exploration of the data compared to a cloud machine, hopefully resulting in better predictions. We found that an increase in the number of explored rules results in a net increase in the predictive performance of the final ruleset. Also, in the particular case of this study, we found that using more than around 40 cores does not bring a significant computation time gain. However, the origin of this limitation is explained by a threshold-based search heuristic used to prune the search space. We have evidence that for similar data sets with less restrictive thresholds, the number of cores actually used could very well be much higher, allowing parallelization to have a much greater effect.
    Date: 2022–03
  7. By: Cameron Fen; Samir Undavia
    Abstract: We show that pooling countries across a panel dimension to macroeconomic data can improve by a statistically significant margin the generalization ability of structural, reduced form, and machine learning (ML) methods to produce state-of-the-art results. Using GDP forecasts evaluated on an out-of-sample test set, this procedure reduces root mean squared error by 12% across horizons and models for certain reduced-form models and by 24% across horizons for dynamic structural general equilibrium models. Removing US data from the training set and forecasting out-of-sample country-wise, we show that reduced-form and structural models are more policy-invariant when trained on pooled data, and outperform a baseline that uses US data only. Given the comparative advantage of ML models in a data-rich regime, we demonstrate that our recurrent neural network model and automated ML approach outperform all tested baseline economic models. Robustness checks indicate that our outperformance is reproducible, numerically stable, and generalizable across models.
    Date: 2022–03
  8. By: Raad Khraishi; Ramin Okhrati
    Abstract: We introduce a method for pricing consumer credit using recent advances in offline deep reinforcement learning. This approach relies on a static dataset and requires no assumptions on the functional form of demand. Using both real and synthetic data on consumer credit applications, we demonstrate that our approach using the conservative Q-Learning algorithm is capable of learning an effective personalized pricing policy without any online interaction or price experimentation.
    Date: 2022–03
  9. By: Yuanrong Wang; Tomaso Aste
    Abstract: We propose an end-to-end architecture for multivariate time-series prediction that integrates a spatial-temporal graph neural network with a matrix filtering module. This module generates filtered (inverse) correlation graphs from multivariate time series before inputting them into a GNN. In contrast with existing sparsification methods adopted in graph neural networks, our model explicitly leverages time-series filtering to overcome the low signal-to-noise ratio typical of complex systems data. We present a set of experiments, where we predict future sales from a synthetic time-series sales dataset. The proposed spatial-temporal graph neural network displays superior performance with respect to baseline approaches, with no graphical information, and with fully connected, disconnected graphs and unfiltered graphs.
    Date: 2022–03
  10. By: Ishu Gupta; Tarun Kumar Madan; Sukhman Singh; Ashutosh Kumar Singh
    Abstract: One of the pillars of a country's economy is the stock market. Over the years, people have been investing in stock markets to earn as much profit as possible from the amount of money that they possess. Hence, it is vital to have a prediction model which can accurately predict future stock prices. With the help of machine learning, this is not an impossible task, as the various machine learning techniques, if modeled properly, may be able to provide the best prediction values. This would enable investors to decide whether to buy, sell or hold a share. The aim of this paper is to predict the future of the financial stocks of a company with improved accuracy. In this paper, we propose the use of historical as well as sentiment data to efficiently predict stock prices by applying LSTM. Analysis of the existing research in the area of sentiment analysis has found a strong correlation between the movement of stock prices and the publication of news articles. Therefore, in this paper, we have integrated these factors to predict stock prices more accurately.
    Date: 2022–03
  11. By: FERNANDEZ LLORCA David (European Commission - JRC); GOMEZ Emilia (European Commission - JRC)
    Abstract: As Artificial Intelligence (AI) is the main enabler of Autonomous Vehicles (AVs), and autonomous mobility is a scenario of high-risk nature, sectorial regulations of AVs are expected to be aligned with the AI Act. Beyond requirements of safety and robustness, other important criteria to be considered include human agency and oversight, security, privacy, data governance, transparency, explainability, diversity, fairness, social and environmental wellbeing and accountability. These trustworthy requirements for AVs have a heterogeneous level of maturity and bring new research and development challenges in different areas. A specific analysis of the evaluation criteria for trustworthy AI in the context of autonomous driving is needed. There is a window of opportunity to define a European approach to AVs in future implementing acts, by including requirements of trustworthy AI systems in future procedures for the type-approval of AVs at EU level.
    Keywords: Artificial Intelligence, Autonomous Vehicles
    Date: 2022–03
  12. By: Sarracino, Francesco (STATEC Research – National Institute of Statistics and Economic Studies); Greyling, Talita (University of Johannesburg); O'Connor, Kelsey J. (STATEC Research – National Institute of Statistics and Economic Studies); Peroni, Chiara (STATEC Research – National Institute of Statistics and Economic Studies); Rossouw, Stephanié (Auckland University of Technology)
    Abstract: Previous evidence indicates trust is an important correlate of compliance with COVID-19 containment policies. However, this conclusion hinges on two crucial assumptions: first, that compliance does not change over time, and second, that mobility or self-reported measures are good proxies for compliance. This study is the first to use a time-varying measure of compliance to study the relationship between compliance and trust in others and institutions over the period from March 2020 to January 2021 in ten mostly European countries. We calculate a time-varying measure of compliance as the association between containment policies and people's mobility behavior using data from the Oxford Policy Tracker and Google. Additionally, we develop measures of trust in others and national institutions by applying emotion analysis to Twitter data. We test the predictive role of our trust measures using various panel estimation techniques. Our findings demonstrate that compliance does change over time and that increasing (decreasing) trust in others predicts increasing (decreasing) compliance. This evidence indicates compliance should not be taken for granted, and confirms the importance of cultivating trust in others. Nurturing trust in others, through ad-hoc policies such as community activity programs and urban design to facilitate social interactions, can foster compliance with public policies.
    Keywords: compliance, COVID-19, trust, big data, Twitter
    JEL: D91 I18 H12
    Date: 2022–03
  13. By: Resce, Giuliano; Vaquero-Pineiro, Cristina
    Abstract: Geographical Indications (GIs), such as Protected Designation of Origin (PDO) and Protected Geographical Indication (PGI), offer a unique protection scheme to preserve high-quality agri-food productions and support rural development, and they have been recognised as a powerful tool to enhance sustainable development and ecological economic transactions at the territorial level. However, not all the areas with traditional agri-food products are acknowledged with a GI. Examining the Italian wine sector with a geo-referenced and machine learning framework, we show that municipalities which obtain a GI within the following 10 years (2002-2011) can be predicted using a large set of (lagged) municipality-level data (1981-2001). We find that the Random Forest algorithm is the best model to make out-of-sample predictions of municipalities which obtain GIs. Among the features used, the local wine growing tradition, proximity to capital cities, local employment and education rates emerge as crucial in the prediction of GI certifications. This evidence can support policy makers and stakeholders to target rural development policies and investment allocation, and it offers strong policy implications for the future reforms of this quality scheme.
    Keywords: Geographical Indications, Rural Development, Agri-Food Production, Machine Learning, Geo-Referenced Data
    JEL: C53 Q18
    Date: 2022–04–11
  14. By: Janos Ferencz; Javier López González; Irene Oliván García
    Abstract: Artificial intelligence (AI) has strong potential to spur innovation, help firms create new value from data, and reduce trade costs. Growing interest in the economic and societal impacts of AI has also prompted interest in the trade implications of this new technology. While AI technologies have the potential to fundamentally change trade and international business models, trade itself can also be an important mechanism through which countries and firms access the inputs needed to build AI systems, whether goods, services, people or data, and through which they can deploy AI solutions globally. This paper explores the interlinkages between AI technologies and international trade and outlines key trade policy considerations for policy makers seeking to harness the full potential of AI technologies.
    Keywords: Data flows, Digital trade, Innovations, Regional Trade Agreements, Trade policy
    JEL: F13 F14 O33
    Date: 2022–04–22
  15. By: Keyon Vafa; Emil Palikot; Tianyu Du; Ayush Kanodia; Susan Athey; David M. Blei
    Abstract: Understanding career trajectories -- the sequences of jobs that individuals hold over their working lives -- is important to economists for studying labor markets. In the past, economists have estimated relevant quantities by fitting predictive models to small surveys, but in recent years large datasets of online resumes have also become available. These new datasets provide job sequences of many more individuals, but they are too large and complex for standard econometric modeling. To this end, we adapt ideas from modern language modeling to the analysis of large-scale job sequence data. We develop CAREER, a transformer-based model that learns a low-dimensional representation of an individual's job history. This representation can be used to predict jobs directly on a large dataset, or can be "transferred" to represent jobs in smaller and better-curated datasets. We fit the model to a large dataset of resumes, 24 million people who are involved in more than a thousand unique occupations. It forms accurate predictions on held-out data, and it learns useful career representations that can be fine-tuned to make accurate predictions on common economics datasets.
    Date: 2022–02
  16. By: Ariel Neufeld; Julian Sester; Daiying Yin
    Abstract: We present an approach, based on deep neural networks, that allows identifying robust statistical arbitrage strategies in financial markets. Robust statistical arbitrage strategies refer to self-financing trading strategies that enable profitable trading under model ambiguity. The presented novel methodology does not suffer from the curse of dimensionality, nor does it depend on the identification of cointegrated pairs of assets, and is therefore applicable even in high-dimensional financial markets or in markets where classical pairs trading approaches fail. Moreover, we provide a method to build an ambiguity set of admissible probability measures that can be derived from observed market data. Thus, the approach can be considered as being model-free and entirely data-driven. We showcase the applicability of our method by providing empirical investigations with highly profitable trading performances even in 50 dimensions, during financial crises, and when the cointegration relationship between asset pairs ceases to persist.
    Date: 2022–03
  17. By: Timo Dimitriadis; Lutz Duembgen; Alexander Henzi; Marius Puke; Johanna Ziegel
    Abstract: Probability predictions from binary regressions or machine learning methods ought to be calibrated: If an event is predicted to occur with probability $x$, it should materialize with approximately that frequency, which means that the so-called calibration curve $p(x)$ should equal the bisector for all $x$ in the unit interval. We propose honest calibration assessment based on novel confidence bands for the calibration curve, which are valid only subject to the natural assumption of isotonicity. Besides testing the classical goodness-of-fit null hypothesis of perfect calibration, our bands facilitate inverted goodness-of-fit tests whose rejection allows for the sought-after conclusion of a sufficiently well specified model. We show that our bands have a finite sample coverage guarantee, are narrower than existing approaches, and adapt to the local smoothness and variance of the calibration curve $p$. In an application to model predictions of an infant having a low birth weight, the bounds give informative insights on model calibration.
    Date: 2022–03
  18. By: Ola Hall; Mattias Ohlsson; Thorsteinn Rögnvaldsson
    Abstract: Recent advances in artificial intelligence and machine learning have created a step change in how to measure human development indicators, in particular asset-based poverty. The combination of satellite imagery and machine learning has the capability to estimate poverty at a level similar to what is achieved with workhorse methods such as face-to-face interviews and household surveys. An increasingly important issue beyond static estimations is whether this technology can contribute to scientific discovery and consequently new knowledge in the poverty and welfare domain. A foundation for achieving scientific insights is domain knowledge, which in turn translates into explainability and scientific consistency. We review the literature focusing on three core elements relevant in this context: transparency, interpretability, and explainability, and investigate how they relate to the poverty, machine learning and satellite imagery nexus. Our review of the field shows that the status of the three core elements of explainable machine learning (transparency, interpretability and domain knowledge) is varied and does not completely fulfill the requirements set up for scientific insights and discoveries. We argue that explainability is essential to support wider dissemination and acceptance of this research, and explainability means more than just interpretability.
    Date: 2022–03
  19. By: Beraja, Martin; Kao, Andrew; Yang, David Y.; Yuchtman, Noam
    Abstract: Can frontier innovation be sustained under autocracy? We argue that innovation and autocracy can be mutually reinforcing when: (i) the new technology bolsters the autocrat's power; and (ii) the autocrat's demand for the technology stimulates further innovation in applications beyond those benefiting it directly. We test for such a mutually reinforcing relationship in the context of facial recognition AI in China. To do so, we gather comprehensive data on AI firms and government procurement contracts, as well as on social unrest across China during the last decade. We first show that autocrats benefit from AI: local unrest leads to greater government procurement of facial recognition AI, and increased AI procurement suppresses subsequent unrest. We then show that AI innovation benefits from autocrats' suppression of unrest: the contracted AI firms innovate more both for the government and commercial markets. Taken together, these results suggest the possibility of sustained AI innovation under the Chinese regime: AI innovation entrenches the regime, and the regime's investment in AI for political control stimulates further frontier innovation.
    Keywords: artificial intelligence; autocracy; innovation; data; China; surveillance; political unrest; Global Professorships program
    JEL: O30 P00 E00 L50 L63 O40
    Date: 2021–11–02
  20. By: Rahul Singh; Vasilis Syrgkanis
    Abstract: We extend the idea of automated debiased machine learning to the dynamic treatment regime. We show that the multiply robust formula for the dynamic treatment regime with discrete treatments can be re-stated in terms of a recursive Riesz representer characterization of nested mean regressions. We then apply a recursive Riesz representer estimation learning algorithm that estimates de-biasing corrections without the need to characterize what the correction terms look like (for instance, products of inverse probability weighting terms), as is done in prior work on doubly robust estimation in the dynamic regime. Our approach defines a sequence of loss minimization problems, whose minimizers are the multipliers of the de-biasing correction, hence circumventing the need for solving auxiliary propensity models and directly optimizing for the mean squared error of the target de-biasing correction.
    Date: 2022–03
  21. By: Aleksandra Parteka (Gdansk University of Technology, Gdansk, Poland); Aleksandra Kordalska (Gdansk University of Technology, Gdansk, Poland)
    Abstract: In this paper we analyse the effects of technological innovation in the artificial intelligence (AI) domain on productivity. We embed the recently released data on patents and publications related to AI into an augmented panel model of productivity growth, estimated for OECD countries, and compared to a non-OECD sample. Our instrumental variables' estimates, accounting for AI endogeneity, provide evidence in favour of the modern (AI) productivity paradox. We show that the development of AI technologies remains a niche innovation phenomenon with a negligible role in the officially recorded productivity growth process. This general result, i.e. the lack of a strong relationship between AI and productivity growth, is robust to changes in the country sample, in the way we quantify labour productivity or the creation of AI technology, in the specification of the empirical model (control variables) or in estimation methods.
    Keywords: technological innovation, productivity paradox, productivity growth, artificial intelligence, patents
    JEL: O33 O47
    Date: 2022–01
  22. By: Cameron Fen
    Abstract: This paper proposes a simulation-based deep learning Bayesian procedure for the estimation of macroeconomic models. This approach is able to derive posteriors even when the likelihood function is not tractable. Because the likelihood is not needed for Bayesian estimation, filtering is also not needed. This allows Bayesian estimation of HANK models with upwards of 800 latent states as well as estimation of representative agent models that are solved with methods that don't yield a likelihood--for example, projection and value function iteration approaches. I demonstrate the validity of the approach by estimating a 10 parameter HANK model solved via the Reiter method that generates 812 covariates per time step, where 810 are latent variables, showing this can handle a large latent space without model reduction. I also estimate the algorithm with an 11-parameter model solved via value function iteration, which cannot be estimated with Metropolis-Hastings or even conventional maximum likelihood estimators. In addition, I show the posteriors estimated on Smets-Wouters 2007 are higher quality and faster using simulation-based inference compared to Metropolis-Hastings. This approach helps address the computational expense of Metropolis-Hastings and allows solution methods which don't yield a tractable likelihood to be estimated.
    Date: 2022–03
  23. By: Isaiah Hull; Anna Grodecka-Messi
    Abstract: How do property prices respond to changes in local taxes and local public services? Attempts to measure this, starting with Oates (1969), have suffered from a lack of local public service controls. Recent work attempts to overcome such data limitations through the use of quasi-experimental methods. We revisit this fundamental problem, but adopt a different empirical strategy that pairs the double machine learning estimator of Chernozhukov et al. (2018) with a novel dataset of 947 time-varying local characteristic and public service controls for all municipalities in Sweden over the 2010-2016 period. We find that properly controlling for local public service and characteristic controls more than doubles the estimated impact of local income taxes on house prices. We also exploit the unique features of our dataset to demonstrate that tax capitalization is stronger in areas with greater municipal competition, providing support for a core implication of the Tiebout hypothesis. Finally, we measure the impact of public services, education, and crime on house prices and the effect of local taxes on migration.
    Date: 2022–03
  24. By: Narayana Darapaneni; Anwesh Reddy Paduri; Himank Sharma; Milind Manjrekar; Nutan Hindlekar; Pranali Bhagat; Usha Aiyer; Yogesh Agarwal
    Abstract: Stock market prediction has been an active area of research for a considerable period. The arrival of computing, followed by machine learning, has increased the speed of research and opened new avenues. In this study, we aimed to predict the future movement of share prices using historical prices together with sentiment data. Two models were used. The first was an LSTM with historical prices as the independent variable. The second was a Random Forest model whose main input was sentiment captured using an Intensity Analyzer; macro parameters such as gold and oil prices, the USD exchange rate and Indian Govt. Securities yields were also added to improve accuracy. Prices of four stocks, viz. Reliance, HDFC Bank, TCS and SBI, were predicted using these two models. The results were evaluated using the RMSE metric.
    Date: 2022–04
  25. By: Mufhumudzi Muthivhi; Terence L. van Zyl
    Abstract: The fusion of public sentiment data in the form of text with stock price prediction is a topic of increasing interest within the financial community. However, the research literature seldom explores the application of investor sentiment in the Portfolio Selection problem. This paper aims to unpack and develop an enhanced understanding of the sentiment aware portfolio selection problem. To this end, the study uses a Semantic Attention Model to predict sentiment towards an asset. We select the optimal portfolio through a sentiment-aware Long Short Term Memory (LSTM) recurrent neural network for price prediction and a mean-variance strategy. Our sentiment portfolio strategies achieved on average a significant increase in revenue above the non-sentiment aware models. However, the results show that our strategy does not outperform traditional portfolio allocation strategies from a stability perspective. We argue that an improved fusion of sentiment prediction with a combination of price prediction and portfolio optimization leads to an enhanced portfolio selection strategy.
    Date: 2022–03
  26. By: Fedor Zagumennov (Plekhanov Russian University of Economics, Department of Industrial Economics, Moscow, Russia); Andrei Bystrov (Plekhanov Russian University of Economics, Department of Industrial Economics, Moscow, Russia); Alexey Radaykin (Plekhanov Russian University of Economics, Department of Industrial Economics, Moscow, Russia)
    Abstract: Objective - The objective of this paper is to consider using machine learning approaches for in-firm processes prediction and to give an estimation of such values as effective production quantities. Methodology - The research methodology used is a synthesis of a deep-learning model, which is used to predict half of real business data for comparison with the remaining half. The structure of the convolutional neural network (CNN) model is provided, as well as the results of experiments with real orders, procurements, and income data. The key finding in this paper is that a convolutional approach combined with long-short memory is better than a purely convolutional method of prediction. Findings - This research also considers the use of such technologies on business digital platforms. Based on the results, guidelines are formulated for implementation in particular ERP systems or web business platforms. Novelty - This paper describes the practical usage of 1-dimensional (1D) convolutional neural networks and a mixed approach with convolutional and long-short memory networks for in-firm planning tasks such as income prediction, procurements, and order demand analysis. Type of Paper - Empirical.
    Keywords: Business; Neural Networks; CNN; Platform
    JEL: C45 C49
    Date: 2021–12–31
  27. By: Gian Maria Campedelli
    Abstract: Purpose: To explore the potential of Explainable Machine Learning in the prediction and detection of drivers of cleared homicides at the national- and state-levels in the United States. Methods: First, nine algorithmic approaches are compared to assess the best performance in predicting cleared homicides country-wise, using data from the Murder Accountability Project. The most accurate algorithm among all (XGBoost) is then used for predicting clearance outcomes state-wise. Second, SHAP, a framework for Explainable Artificial Intelligence, is employed to capture the most important features in explaining clearance patterns both at the national and state levels. Results: At the national level, XGBoost achieves the best performance overall. Substantial predictive variability is detected state-wise. In terms of explainability, SHAP highlights the relevance of several features in consistently predicting investigation outcomes. These include homicide circumstances, weapons, victims' sex and race, as well as number of involved offenders and victims. Conclusions: Explainable Machine Learning proves to be a helpful framework for predicting homicide clearance. SHAP outcomes suggest a more organic integration of the two theoretical perspectives that have emerged in the literature. Furthermore, jurisdictional heterogeneity highlights the importance of developing ad hoc state-level strategies to improve police performance in clearing homicides.
    Date: 2022–03
  28. By: Jase Clarkson; Mihai Cucuringu; Andrew Elliott; Gesine Reinert
    Abstract: In this work, we introduce DAMNETS, a deep generative model for Markovian network time series. Time series of networks are found in many fields such as trade or payment networks in economics, contact networks in epidemiology or social media posts over time. Generative models of such data are useful for Monte-Carlo estimation and data set expansion, which is of interest for both data privacy and model fitting. Using recent ideas from the Graph Neural Network (GNN) literature, we introduce a novel GNN encoder-decoder structure in which an encoder GNN learns a latent representation of the input graph, and a decoder GNN uses this representation to simulate the network dynamics. We show using synthetic data sets that DAMNETS can replicate features of network topology across time observed in the real world, such as changing community structure and preferential attachment. DAMNETS outperforms competing methods on all of our measures of sample quality over several real and synthetic data sets.
    Date: 2022–03
  29. By: Md Rashidul Hasan; Muntasir A Kabir; Rezoan A Shuvro; Pankaz Das
    Abstract: Predicting product sales of large retail companies is a challenging task considering volatile nature of trends, seasonalities, events as well as unknown factors such as market competitions, change in customer's preferences, or unforeseen events, e.g., COVID-19 outbreak. In this paper, we benchmark forecasting models on historical sales data from Walmart to predict their future sales. We provide a comprehensive theoretical overview and analysis of the state-of-the-art timeseries forecasting models. Then, we apply these models on the forecasting challenge dataset (M5 forecasting by Kaggle). Specifically, we use a traditional model, namely, ARIMA (Autoregressive Integrated Moving Average), and recently developed advanced models e.g., Prophet model developed by Facebook, light gradient boosting machine (LightGBM) model developed by Microsoft and benchmark their performances. Results suggest that ARIMA model outperforms the Facebook Prophet and LightGBM model while the LightGBM model achieves huge computational gain for the large dataset with negligible compromise in the prediction accuracy.
    Date: 2022–03
  30. By: Deborah Sulem; Henry Kenlay; Mihai Cucuringu; Xiaowen Dong
    Abstract: Dynamic networks are ubiquitous for modelling sequential graph-structured data, e.g., brain connectome, population flows and messages exchanges. In this work, we consider dynamic networks that are temporal sequences of graph snapshots, and aim at detecting abrupt changes in their structure. This task is often termed network change-point detection and has numerous applications, such as fraud detection or physical motion monitoring. Leveraging a graph neural network model, we design a method to perform online network change-point detection that can adapt to the specific network domain and localise changes with no delay. The main novelty of our method is to use a siamese graph neural network architecture for learning a data-driven graph similarity function, which allows to effectively compare the current graph and its recent history. Importantly, our method does not require prior knowledge on the network generative distribution and is agnostic to the type of change-points; moreover, it can be applied to a large variety of networks, that include for instance edge weights and node attributes. We show on synthetic and real data that our method enjoys a number of benefits: it is able to learn an adequate graph similarity function for performing online network change-point detection in diverse types of change-point settings, and requires a shorter data history to detect changes than most existing state-of-the-art baselines.
    Date: 2022–03
  31. By: Jonathan Berrisch; Michał Narajewski; Florian Ziel
    Abstract: This paper presents a method for estimating high-resolution electricity peak demand given lower resolution data. The technique won a data competition organized by the British distribution network operator Western Power Distribution. The exercise was to estimate the minimum and maximum load values in a single substation in a one-minute resolution as precisely as possible. In contrast, the data was given in half-hourly and hourly resolutions. The winning method combines generalized additive models (GAM) and deep artificial neural networks (DNN) which are popular in load forecasting. We provide an extensive analysis of the prediction models, including the importance of input parameters with a focus on load, weather, and seasonal effects. In addition, we provide a rigorous evaluation study that goes beyond the competition frame to analyze the robustness. The results show that the proposed methods are superior, not only in the single competition month but also in the meaningful evaluation study.
    Date: 2022–03
  32. By: Martin Magris; Mostafa Shabani; Alexandros Iosifidis
    Abstract: The prediction of financial markets is a challenging yet important task. In modern electronically-driven markets traditional time-series econometric methods often appear incapable of capturing the true complexity of the multi-level interactions driving the price dynamics. While recent research has established the effectiveness of traditional machine learning (ML) models in financial applications, their intrinsic inability in dealing with uncertainties, which is a great concern in econometrics research and real business applications, constitutes a major drawback. Bayesian methods naturally appear as a suitable remedy conveying the predictive ability of ML methods with the probabilistically-oriented practice of econometric research. By adopting a state-of-the-art second-order optimization algorithm, we train a Bayesian bilinear neural network with temporal attention, suitable for the challenging time-series task of predicting mid-price movements in ultra-high-frequency limit-order book markets. By addressing the use of predictive distributions to analyze errors and uncertainties associated with the estimated parameters and model forecasts, we thoroughly compare our Bayesian model with traditional ML alternatives. Our results underline the feasibility of the Bayesian deep learning approach and its predictive and decisional advantages in complex econometric tasks, prompting future research in this direction.
    Date: 2022–03
  33. By: Pok Wah Chan
    Abstract: Extreme pricing anomalies may occur unexpectedly without a trivial cause, and equity traders typically experience a meticulous process to source disparate information and analyze its reliability before integrating it into the trusted knowledge base. We introduce DeepTrust, a reliable financial knowledge retrieval framework on Twitter to explain extreme price moves at speed, while ensuring data veracity using state-of-the-art NLP techniques. Our proposed framework consists of three modules, specialized for anomaly detection, information retrieval and reliability assessment. The workflow starts with identifying anomalous asset price changes using machine learning models trained with historical pricing data, and retrieving correlated unstructured data from Twitter using enhanced queries with dynamic search conditions. DeepTrust extrapolates information reliability from tweet features, traces of generative language model, argumentation structure, subjectivity and sentiment signals, and refine a concise collection of credible tweets for market insights. The framework is evaluated on two self-annotated financial anomalies, i.e., Twitter and Facebook stock price on 29 and 30 April 2021. The optimal setup outperforms the baseline classifier by 7.75% and 15.77% on F0.5-scores, and 10.55% and 18.88% on precision, respectively, proving its capability in screening unreliable information precisely. At the same time, information retrieval and reliability assessment modules are analyzed individually on their effectiveness and causes of limitations, with identified subjective and objective factors that influence the performance. As a collaborative project with Refinitiv, this framework paves a promising path towards building a scalable commercial solution that assists traders to reach investment decisions on pricing anomalies with authenticated knowledge from social media platforms in real-time.
    Date: 2022–03
  34. By: Federico Cornalba; Constantin Disselkamp; Davide Scassola; Christopher Helf
    Abstract: We investigate the potential of Multi-Objective, Deep Reinforcement Learning for stock and cryptocurrency trading. More specifically, we build on the generalized setting à la Fontaine and Friedman arXiv:1809.06364 (where the reward weighting mechanism is not specified a priori, but embedded in the learning process) by complementing it with computational speed-ups, and adding the cumulative reward's discount factor to the learning process. Firstly, we verify that the resulting Multi-Objective algorithm generalizes well, and we provide preliminary statistical evidence showing that its prediction is more stable than the corresponding Single-Objective strategy's. Secondly, we show that the Multi-Objective algorithm has a clear edge over the corresponding Single-Objective strategy when the reward mechanism is sparse (i.e., when non-null feedback is infrequent over time). Finally, we discuss the generalization properties of the discount factor. The entirety of our code is provided in open source format.
    Date: 2022–03
  35. By: Pedro Cadahia; Antonio A. Golpe; Juan M. Martín Álvarez; E. Asensio
    Abstract: There is literature that questions the veracity of the studies commissioned by the transnational tobacco companies (TTC) to measure the illicit tobacco trade. Furthermore, there are studies that indicate that the Empty Pack Surveys (EPS) ordered by the TTCs are oversized. The novelty of this study is that, in addition to detecting the anomalies analyzed in the EPSs, there are provinces in which cigarette sales are higher than reasonable values, something that the TTCs ignore. This study analyzed simultaneously, firstly, if the EPSs established in each of the 47 Spanish provinces were fulfilled. Second, anomalies observed in provinces where sales exceed expected values are measured. To achieve the objective of the paper, provincial data on cigarette sales, price and GDP per capita are used. These data are modeled with machine learning techniques widely used to detect anomalies in other areas. The results reveal that the provinces in which sales below reasonable values are observed (as detected by the EPSs) present a clear geographical pattern. Furthermore, the values provided by the EPSs in Spain, as indicated in the previous literature, are slightly oversized. Finally, there are regions bordering other countries or with a high tourist influence in which the observed sales are higher than the expected values.
    Date: 2022–03
  36. By: Bernardo Alves Furtado; Gustavo Onofre Andreão
    Abstract: Public Policies are not intrinsically positive or negative. Rather, policies provide varying levels of effects across different recipients. Methodologically, computational modeling enables the application of a combination of multiple influences on empirical data, thus allowing for heterogeneous response to policies. We use a random forest machine learning algorithm to emulate an agent-based model (ABM) and evaluate competing policies across 46 Metropolitan Regions (MRs) in Brazil. In doing so, we use input parameters and output indicators of 11,076 actual simulation runs and one million emulated runs. As a result, we obtain the optimal (and non-optimal) performance of each region over the policies. Optimum is defined as a combination of production and inequality indicators for the full ensemble of MRs. Results suggest that MRs already have embedded structures that favor optimal or non-optimal results, but they also illustrate which policy is more beneficial to each place. In addition to providing MR-specific policies' results, the use of machine learning to simulate an ABM reduces the computational burden, while allowing for a much larger variation among model parameters. The coherence of results within the context of larger uncertainty -- vis-à-vis those of the original ABM -- suggests an additional test of robustness of the model. At the same time, the exercise indicates which parameters policymakers should intervene on in order to work towards the optimum of MRs.
    Date: 2022–03

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.