nep-big New Economics Papers
on Big Data
Issue of 2021‒12‒13
nineteen papers chosen by
Tom Coupé
University of Canterbury

  1. Forecasting Regional Milk Production Quantity: A Comparison of Regression Models and Machine Learning By Baaken, Dominik; Hess, Sebastian
  2. Algorithmic Collusion: Insights from Deep Learning By Matthias Hettich
  3. Machine Learning, Behavioral Targeting and Regression Discontinuity Designs By Narayanan, Sridhar; Kalyanam, Kirthi
  4. What is holding back artificial intelligence adoption in Europe? By Mia Hoffmann; Laura Nurski
  5. Sub-interval images. Big Data By Harin, Alexander
  6. Artificial Intelligence, Growth and Employment: The Role of Policy By Philippe Aghion; Céline Antonin; Simon Bunel
  7. Home sweet home: Assessment of Readiness of Croatian Companies to Introduce I4.0 Technologies By Rajka Hrbić; Tomislav Grebenar
  8. Forecasting Crude Oil Price Using Event Extraction By Jiangwei Liu; Xiaohong Huang
  9. Free Trade Agreements and the Movement of Business People By Thierry Mayer; Hillel Rapoport; Camilo Umana Dajud
  10. The Determinants of Landscape and Cultural Heritage Among Italian Regions in the Period 2004-2019 By Leogrande, Angelo; Costantiello, Alberto; Laureti, Lucio; Leogrande, Domenico
  11. The “Human Factor” in Prisoner’s Dilemma Cooperation By Iván Barreda-Tarrazona; Ainhoa Jaramillo-Gutiérrez; Marina Pavan; Gerardo Sabater-Grande
  12. Fiscal rules’ compliance and Social Welfare. By Kea BARET
  13. Evolution of the EU market share of robotics: Data and Methodology By Nestor Duch-Brown; Fiammetta Rossetti; Richard Haarburger
  14. FinRL: Deep Reinforcement Learning Framework to Automate Trading in Quantitative Finance By Xiao-Yang Liu; Hongyang Yang; Jiechao Gao; Christina Dan Wang
  15. A Universal End-to-End Approach to Portfolio Optimization via Deep Learning By Chao Zhang; Zihao Zhang; Mihai Cucuringu; Stefan Zohren
  16. Environmental News Emotion and Air Pollution in China By Sébastien Marchand; Damien Cubizol; Elda Nasho Ah-Pine; Huanxiu Guo
  17. Deep Structural Estimation: With an Application to Option Pricing By Hui Chen; Antoine Didisheim; Simon Scheidegger
  18. Smells Like Animal Spirits: The Effect of Corporate Sentiment on Investment By Gianni La Cava
  19. Consumer and employer discrimination in professional sports markets – New evidence from Major League Baseball By Wolfgang Maennig; Steffen Q. Mueller

  1. By: Baaken, Dominik; Hess, Sebastian
    Keywords: Livestock Production/Industries
    Date: 2021–08
  2. By: Matthias Hettich
    Abstract: Increasingly, firms use algorithms powered by artificial intelligence to set prices. Previous research simulated interactions among Q-learning algorithms in an oligopoly model of price competition. The algorithms learn collusive strategies but require a long time that corresponds to several years to do so. We show that pricing algorithms using deep learning (DQN) can collude significantly faster. The availability of these more powerful pricing algorithms enables simulations in larger markets. Collusion disappears in wide oligopolies with up to 10 firms. However, incorporating knowledge of the learning behavior by reformulating the state representation increases the ability to collude effectively.
    Keywords: Algorithmic Pricing, Collusion, Artificial Intelligence, Reinforcement Learning, DQN
    JEL: D21 D43 D83 L12 L13
    Date: 2021–11
  3. By: Narayanan, Sridhar (Stanford University); Kalyanam, Kirthi (Santa Clara University)
    Abstract: The availability of behavioral data on customers and advances in machine learning methods have enabled scoring and targeting of customers in a variety of domains, including pricing, advertising, recommendation and personal selling. Typically, such targeting involves first training a machine learning algorithm on a training dataset, using that algorithm to score current or potential customers, and when the score crosses a threshold, a treatment such as an offer, an advertisement or a recommendation is assigned. In this paper, we highlight regression discontinuity designs (RDD) as a low-cost alternative to obtaining causal estimates in settings where machine learning is used for behavioral targeting. Our investigation leads to several new insights. Under appropriate conditions, RDD recovers the local average treatment effect (LATE). Further, we show that RDD recovers the average treatment effect (ATE) when: (1) The score is orthogonal to the slope of the treatment and (2) When the selection threshold is equal to the mean value of the score. We also show that RDD can estimate the bounds on the ATE even if we are unable to get point estimates of the ATE. That RDD can estimate ATE or bounds on ATE is a novel perspective that has been understudied in the literature. We also distinguish between two types of scoring: Intercept versus slope based and highlight the practical value of RDD in each context. Finally, we apply RDD in an empirical context where a machine learning based score was used to select consumers for retargeted display advertising. We obtain LATE estimates of the impact of the retargeted advertising program on both online and offline purchases, and also estimate bounds on the ATE. Our LATE estimates and ATE bounds add to the understanding of the effectiveness of retargeting programs in particular on offline purchases which has received less attention.
    Date: 2021–10
  4. By: Mia Hoffmann; Laura Nurski
    Abstract: Artificial intelligence (AI) is considered a key driver of future economic development, expected to increase labour productivity and economic growth worldwide. To realise these gains, AI technologies need to be adopted by companies and integrated into their operations. However, it is unclear what the current level of AI adoption by European firms actually is. Estimates vary widely because of uneven data collection and lack of a standard definition and taxonomy...
    Date: 2021–11
  5. By: Harin, Alexander
    Abstract: A systematic introduction to sub-interval images (or SI-images or S-IIs) is presented here. General outlook of possible use of the SI-analysis for Big Data is given. Basic notions of S-IIs are formulated including cuboids of gravity and sub-interval copies of databases. Two concepts of SII-indexing are proposed for Big Data databases. The S-IIs can be used in, e.g., search, and recognition in databases in, e.g., accounting and audit, micro- and macroeconomics, especially in Big Data databases.
    Keywords: mathematic; databases; Big Data; utility theory; prospect theory; economics;
    JEL: C02 C1 D8
    Date: 2021–11–22
  6. By: Philippe Aghion (Harvard University [Cambridge]); Céline Antonin (OFCE - Observatoire français des conjonctures économiques - Sciences Po - Sciences Po); Simon Bunel
    Abstract: In this survey paper, we argue that the effects of artificial intelligence (AI) and automation on growth and employment depend to a large extent on institutions and policies. We develop a two‑fold analysis. In a first section, we survey the most recent literature to show that AI can spur growth by replacing labor by capital, both in the production of goods and services and in the production of ideas. Yet, we argue that AI may inhibit growth if combined with inappropriate competition policy. In a second section, we discuss the effect of robotization on employment in France over the 1994‑2014 period. Based on our empirical analysis on French data, we first show that robotization reduces aggregate employment at the employment zone level, and second that non‑educated workers are more negatively affected by robotization than educated workers. This finding suggests that inappropriate labor market and education policies reduce the positive impact that AI and automation could have on employment.
    Keywords: Artificial intelligence,Growth,Automation,Robots,Employment
    Date: 2019–12–18
  7. By: Rajka Hrbić (The Croatian National Bank, Croatia); Tomislav Grebenar (The Croatian National Bank, Croatia)
    Abstract: The main aim of this paper is to estimate the potential and inclination of Croatian companies towards technology and innovation, as well as to analyze the advantages, limitations and risks involved in this significant technological leap. We analyzed 7,147 Croatian business entities operating in different industries. The starting point of this research is to identify other entities that could be users of I4.0 or its elements, based on the similarity of their indicators to those of a sample of 58 identified I4.0 companies. For this purpose, we developed a machine learning model using the eXtreme Gradient Boosting (XGBoost) algorithm, an approach that has not been used in any similar research. This research shows that the main difference between I4.0 and traditional industry is mostly observable in significantly better business performance in investment indicators, cost efficiency, technical equipment and market competitiveness. The riskiness of I4.0 companies is significantly lower than that of traditional ones. We identified 141 companies (1.97% of the total analyzed sample) as potential users of I4.0, which account for around 27% of the total assets of the analyzed sample and around 26% of revenues.
    Keywords: Industry 4.0, eXtreme Gradient Boosting (XGBoost), artificial intelligence, robotics, high-tech companies, machine learning, impacts of I4.0 on business results
    JEL: C45 D22 D24 O14 O32 O33
    Date: 2021–03
  8. By: Jiangwei Liu; Xiaohong Huang
    Abstract: Research on crude oil price forecasting has attracted tremendous attention from scholars and policymakers due to its significant effect on the global economy. Besides supply and demand, crude oil prices are largely influenced by various factors, such as economic development, financial markets, conflicts, wars, and political events. Most previous research treats crude oil price forecasting as a time series or econometric variable prediction problem. Although there has recently been research considering the effects of real-time news events, most of this work uses raw news headlines or topic models to extract text features without profoundly exploring the event information. In this study, a novel crude oil price forecasting framework, AGESL, is proposed to deal with this problem. In our approach, an open domain event extraction algorithm is utilized to extract underlying related events, and a text sentiment analysis algorithm is used to extract sentiment from massive news. Then a deep neural network integrating the news event features, sentiment features, and historical price features is built to predict future crude oil prices. Empirical experiments are performed on West Texas Intermediate (WTI) crude oil price data, and the results show that our approach obtains superior performance compared with several benchmark methods.
    Date: 2021–11
  9. By: Thierry Mayer; Hillel Rapoport; Camilo Umana Dajud
    Abstract: Many of the measures to contain Covid-19 severely reduced business travel. Using provisions to ease the movement of business visitors in trade agreements, we show that removing barriers to the movement of business people promotes trade. To do this, we first document the increasing complexity of Free Trade Agreements. We then develop an algorithm that combines machine learning and text analysis techniques to examine the content of FTAs. We use the algorithm to determine which FTAs include provisions to facilitate the movement of business people and whether those provisions are included in dispute settlement mechanisms. Using these data and accounting for the overall depth of FTAs, we show that provisions facilitating business travel indeed facilitate business travel (but not permanent migration) and, eventually, increase bilateral trade flows.
    Keywords: Covid-19;Business travel;Free Trade Agreements;Machine Learning;Text Analysis
    JEL: F10 F13 F14 F15 F20
    Date: 2021–12
  10. By: Leogrande, Angelo; Costantiello, Alberto; Laureti, Lucio; Leogrande, Domenico
    Abstract: We estimate the Landscape and Cultural Heritage among Italian regions in the period 2004-2019 using data from ISTAT-BES. We use Panel Data with Fixed Effects, Panel Data with Random Effects, Pooled OLS, WLS, and Dynamic Panel models. We find that the Landscape and Cultural Heritage is negatively associated with “Dissatisfaction with the landscape of the place of life”, “Illegal building”, “Density and relevance of the museum heritage”, “Internal material consumption”, “Erosion of the rural space due to abandonment”, “Availability of urban green”, and positively associated with “Pressure from mining activities”, “Erosion of the rural space by urban dispersion”, “Concern about the deterioration of the landscape”, “Diffusion of agritourism farms”, “Current expenditure of the Municipalities for culture”. Secondly, we perform a cluster analysis with the k-Means algorithm optimized with the Silhouette Coefficient and find two clusters in terms of “Concern about the deterioration of the landscape”. Finally, we use eight different machine learning algorithms to predict the level of “Concern about the deterioration of the landscape” and find that Tree Ensemble Regression is the best predictor.
    Keywords: Environmental Economics; Valuation of Environmental Effects; Pollution Control Adoption and Costs; Sustainability; Government Policy.
    JEL: Q50 Q51 Q52 Q56 Q58
    Date: 2021–11–24
  11. By: Iván Barreda-Tarrazona (LEE and Department of Economics, Universitat Jaume I, Castellón, Spain); Ainhoa Jaramillo-Gutiérrez (LEE and Department of Economics, Universitat Jaume I, Castellón, Spain); Marina Pavan (LEE & Economics Department, Universitat Jaume I, Castellón-Spain); Gerardo Sabater-Grande (LEE and Department of Economics, Universitat Jaume I, Castellón, Spain)
    Abstract: We design a rich setting to study cooperation in the finitely Repeated Prisoners’ Dilemma (RPD), controlling for beliefs, emotions, and personal characteristics. In the baseline, the subjects play one-shot and repeated games with other human subjects. In the treatment, participants play against an artificial intelligence (AI) trained upon data from the previous “all human” sessions to mimic human decisions. We design the experiment so that our sessions are homogeneous in terms of gender composition, altruism, and reasoning ability. In all games, we elicit players’ beliefs regarding cooperation using an incentive compatible method. Besides, after each individual decision, we collect self-reported information on the main reason for it (rational or emotional). We find that expectations of partner cooperation at the beginning of each task are not significantly different between treatments. Despite this, we observe that initial human cooperation is actually much higher with other humans than with an AI. Cooperation continues to be higher in all periods of the RPD tasks: cooperation rates range between 60% and 80% in the baseline, while they range between 20% and 40% in the AI treatment. Last, decisions appear to be less emotion-driven in the AI treatment. Lack of empathy with, rather than fear of, the machine seems to be driving the results.
    Keywords: cooperation, prisoner’s dilemma, artificial intelligence, experiment
    JEL: C91 C73
    Date: 2021
  12. By: Kea BARET
    Abstract: This paper studies the side-effects of fiscal rules’ compliance on social welfare. It considers national Budget Balance Rules’ (BBR) compliance effects on macroeconomic indicators and social welfare proxy indicators in OECD countries between 2004 and 2015. Instead of fiscal rules strength or fiscal rules presence effectiveness, we focus on fiscal rules’ compliance to assess the impact of fiscal rules’ performance on social welfare. The paper shows that governments seem to operate a reallocation of their spending to ensure both BBR’s compliance and economic objectives. Nevertheless, governments’ choices regarding their public spending composition seem to lead to an increase in social inequalities, suggesting that governments ultimately face a trade-off between fiscal rules’ compliance and social objectives. The analysis constitutes the first use of a causal Machine Learning approach, namely the Double/Debiased Machine Learning recently developed by Chernozhukov et al. [2018], applied to fiscal rules’ performance assessment issues. This method allows us to highlight the key determinants of national BBR’s compliance as well as to assess the compliance’s effect on different macroeconomic and social indicators. We take care of voter preferences by computing a new proxy variable through a Latent Factor Analysis approach and show that voter preferences appear as a key determinant for BBR’s compliance, giving empirical proof that Wyplosz [2012]’s bias may matter when assessing fiscal rules’ performance.
    Keywords: Fiscal rules’ compliance; Social Welfare; Fiscal Surveillance; Machine learning.
    JEL: E61 H11 H50 H61 H62
    Date: 2021
  13. By: Nestor Duch-Brown (European Commission - JRC); Fiammetta Rossetti (European Commission - JRC); Richard Haarburger (European Commission - JRC)
    Abstract: One of the objectives of the AI WATCH is to calculate the EU robotics market shares over the past ten years. To this end, this report, first, provides a review of the robotics industry, and looks at the official definitions of both industrial and services robots. Second, the report offers a review of the scientific and institutional literature looking at the economic impact of robotics. Third, it describes the different statistical data sources, identified through a comprehensive search, offering relevant information about the robotics industry. Fourth, it provides an initial overview of the European robotics market shares from the different data sources identified. The identification of existing robotics data sources will contribute to the construction of a methodology to assess the EU shares concerning adoption and production of robots. The main objective is to establish the basis for a suitable database that will allow tracing the evolution of EU shares in the global robotics market over the past ten years, ideally disentangling between industrial and service robots. This report sketches such methodology, while it also identifies the main data gaps and challenges to integrating the heterogeneous information from different data sources into a coherent database, in order to derive consistent estimates of the EU market share in robotics. Such methodology will have to account for data challenges (e.g., missing data, development of sound merging techniques) so that the EU trends of robotics can be assessed along the most important dimensions (i.e. demand vs supply, industrial vs service robots), and aiming to provide relevant information to the policies of the European Commission for Robotics and Artificial Intelligence.
    Keywords: Robots, robotics industry, industrial robots, service robots
    Date: 2021–11
  14. By: Xiao-Yang Liu; Hongyang Yang; Jiechao Gao; Christina Dan Wang
    Abstract: Deep reinforcement learning (DRL) has been envisioned to have a competitive edge in quantitative finance. However, there is a steep development curve for quantitative traders to obtain an agent that automatically positions to win in the market, namely to decide where to trade, at what price and what quantity, due to the error-prone programming and arduous debugging. In this paper, we present the first open-source framework FinRL as a full pipeline to help quantitative traders overcome the steep learning curve. FinRL is featured with simplicity, applicability and extensibility under the key principles, full-stack framework, customization, reproducibility and hands-on tutoring. Embodied as a three-layer architecture with modular structures, FinRL implements fine-tuned state-of-the-art DRL algorithms and common reward functions, while alleviating the debugging workloads. Thus, we help users pipeline the strategy design at a high turnover rate. At multiple levels of time granularity, FinRL simulates various markets as training environments using historical data and live trading APIs. Being highly extensible, FinRL reserves a set of user-import interfaces and incorporates trading constraints such as market friction, market liquidity and investor's risk-aversion. Moreover, serving as practitioners' stepping stones, typical trading tasks are provided as step-by-step tutorials, e.g., stock trading, portfolio allocation, cryptocurrency trading, etc.
    Date: 2021–11
  15. By: Chao Zhang; Zihao Zhang; Mihai Cucuringu; Stefan Zohren
    Abstract: We propose a universal end-to-end framework for portfolio optimization where asset distributions are directly obtained. The designed framework circumvents the traditional forecasting step and avoids the estimation of the covariance matrix, lifting the bottleneck for generalizing to a large number of instruments. Our framework has the flexibility of optimizing various objective functions, including the Sharpe ratio, the mean-variance trade-off, etc. Further, we allow for short selling and study several constraints attached to objective functions. In particular, we consider cardinality, maximum position for individual instruments, and leverage. These constraints are formulated into objective functions by utilizing several neural layers, and gradient ascent can be adopted for optimization. To ensure the robustness of our framework, we test our methods on two datasets. Firstly, we look at a synthetic dataset where we demonstrate that weights obtained from our end-to-end approach are better than classical predictive methods. Secondly, we apply our framework on a real-life dataset with historical observations of hundreds of instruments with a testing period of more than 20 years.
    Date: 2021–11
  16. By: Sébastien Marchand (CERDI - Centre d'Études et de Recherches sur le Développement International - CNRS - Centre National de la Recherche Scientifique - UCA - Université Clermont Auvergne); Damien Cubizol (CERDI - Centre d'Études et de Recherches sur le Développement International - CNRS - Centre National de la Recherche Scientifique - UCA - Université Clermont Auvergne); Elda Nasho Ah-Pine (CleRMa - Clermont Recherche Management - ESC Clermont-Ferrand - École Supérieure de Commerce (ESC) - Clermont-Ferrand - UCA - Université Clermont Auvergne); Huanxiu Guo (The Institute of Economics and Finance - Nanjing Audit University)
    Abstract: In 2013, the Chinese central government launched a war on air pollution. As a new and major source of information, the Internet plays an important role in diffusing environmental news emotion and shaping people's perceptions and emotions regarding the pollution. How could the government make use of the environmental news emotion as an informal regulation of pollution? The paper investigates the causal relationship between web news emotion (defined by the emotional tone of web news) and air pollution (SO2, NO2, PM2.5 and PM10) by exploiting the central government's war on air pollution. We combine daily monitoring data of air pollution at different levels (cities and counties, respectively the second and third administrative levels in China) with the GDELT database that allows us to have information on Chinese web news media (e.g. emotional tone of web news on air pollution). We find that a decrease of the emotional tone in web news (i.e. more negative emotions in the articles) can help to reduce air pollution at both city and county level. We attribute this effect to the context of China's war on air pollution in which the government makes use of the environmental news emotion as an informal regulation of pollution.
    Keywords: News emotion,Air pollution,Mass media,The internet,Government,China
    Date: 2021–11
  17. By: Hui Chen; Antoine Didisheim; Simon Scheidegger
    Abstract: We propose a novel structural estimation framework in which we train a surrogate of an economic model with deep neural networks. Our methodology alleviates the curse of dimensionality and speeds up the evaluation and parameter estimation by orders of magnitude, which significantly enhances one's ability to conduct analyses that require frequent parameter re-estimation. As an empirical application, we compare two popular option pricing models (the Heston and the Bates model with double-exponential jumps) against a non-parametric random forest model. We document that: a) the Bates model produces better out-of-sample pricing on average, but both structural models fail to outperform random forest for large areas of the volatility surface; b) random forest is more competitive at short horizons (e.g., 1-day), for short-dated options (with less than 7 days to maturity), and on days with poor liquidity; c) both structural models outperform random forest in out-of-sample delta hedging; d) the Heston model's relative performance has deteriorated significantly after the 2008 financial crisis.
    Keywords: Deep Learning, Structural Estimation, Option Pricing, Parameter Stability
    JEL: C45 C52 C58 C61 G17
    Date: 2021–02
  18. By: Gianni La Cava (Reserve Bank of Australia)
    Abstract: Economists have long been interested in the effect of business sentiment on economic activity. Using text analysis, I construct a new company-level indicator of sentiment based on the net balance of positive and negative words in Australian company disclosures. I find that company-level investment is very sensitive to changes in this corporate sentiment indicator, even controlling for fundamentals, such as Tobin's Q and expected profits, as well as controlling for measures of company-level uncertainty. I explore the mechanisms that link investment to sentiment. The conditional relationship could be because sentiment proxies for private information held by managers about the future prospects of the company or because of animal spirits among managers relative to investors. I find that the effect of sentiment on investment is relatively persistent, which is consistent with the private information story, albeit less persistent than other news shocks, such as Tobin's Q. But the effect of sentiment on investment is not any stronger at 'opaque' companies in which managers are likely to be better informed than investors, which argues against the private information story. Corporate investment has been weak in Australia since the global financial crisis (GFC) and demand-side factors, such as lower sales growth, explain more than half of this persistent weakness. Low sentiment and heightened uncertainty weighed on investment during the GFC but have been less important factors since then.
    Keywords: investment; sentiment; text analysis; animal spirits; business cycle
    JEL: D22 D25 D84 D91 E22 E71
    Date: 2021–11
  19. By: Wolfgang Maennig (Chair for Economic Policy, University of Hamburg); Steffen Q. Mueller (Chair for Economic Policy, University of Hamburg)
    Abstract: We investigate the relationship between consumer discrimination, racial matching strategies, and employer discrimination in Major League Baseball (MLB) from 1985 to 2016. To this end, we assess the extent to which both fan attendance and team performance respond to changes in teams’ and their local market areas’ racial compositions. We innovate by using a significantly enhanced data set with individual player data that we derive by combining web scraping with facial recognition techniques to identify player race, and by using County-level Census data instead of Metropolitan Statistical Area data. We find that fans in both MLB Leagues developed a taste for racial diversity in the late 1980s; since the 2000s, discrimination has started to increase again. However, this discrimination does not fully rationalize the performance gap across athletes of different race and ethnicity; employer discrimination is not primarily driven by fans’ racial preferences.
    Keywords: Consumer preferences, Discrimination, Race, Ethnicity, Facial recognition, Ticket sales
    JEL: C5 J1 Z2
    Date: 2021–12–07
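
Item 18 above describes a company-level sentiment indicator built from the net balance of positive and negative words in company disclosures. A minimal sketch of such a net-balance score might look like the following. The word lists, the function name `net_balance`, and the normalization by the number of matched words are assumptions made for illustration; they are not taken from the paper, which may use a different dictionary and scaling.

```python
import re

# Hypothetical word lists for illustration only; the paper's dictionary is not given here.
POSITIVE = {"strong", "growth", "improved", "confident"}
NEGATIVE = {"weak", "loss", "decline", "uncertain"}


def net_balance(text: str) -> float:
    """Return (positive - negative) / (positive + negative) word counts.

    Returns 0.0 when no word in the text matches either list.
    The normalization is an assumption, not the paper's formula.
    """
    # Lowercase the text and split it into alphabetic tokens.
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return (pos - neg) / total if total else 0.0
```

Under these assumptions, a score near 1 would indicate predominantly positive wording in a disclosure, a score near -1 predominantly negative wording, and 0 either a balanced text or one with no matched words.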

This nep-big issue is ©2021 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found on the NEP website. For comments, please write to the director of NEP, Marco Novarese. Put “NEP” in the subject line, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.