nep-big New Economics Papers
on Big Data
Issue of 2020‒11‒16
thirty papers chosen by
Tom Coupé
University of Canterbury

  1. Textual Information and IPO Underpricing: A Machine Learning Approach By Katsafados, Apostolos G.; Androutsopoulos, Ion; Chalkidis, Ilias; Fergadiotis, Manos; Leledakis, George N.; Pyrgiotakis, Emmanouil G.
  2. Machine Predictions and Human Decisions with Variation in Payoffs and Skills By Michael Allan Ribers; Hannes Ullrich
  3. Deep Learning for Individual Heterogeneity By Max H. Farrell; Tengyuan Liang; Sanjog Misra
  4. Financial Data Analysis Using Expert Bayesian Framework For Bankruptcy Prediction By Amir Mukeri; Habibullah Shaikh; Dr. D. P. Gaikwad
  5. Stock Price Prediction Using CNN and LSTM-Based Deep Learning Models By Sidra Mehtab; Jaydip Sen
  6. Transition to Industry 4.0 in the Visegrád Countries By Septimiu Szabo
  7. Machine learning in credit risk: measuring the dilemma between prediction and supervisory cost By Andrés Alonso; José Manuel Carbó
  8. The impact of data visualisation on the use of shopper insight in the marketing decision-making of small food producers By Stefan Penczynski; Konrad Maliszewski; Andrew Fearne
  9. Distributional Aspects of Microcredit Expansions By Christiansen, T.; Weeks, M.
  10. Evaluating data augmentation for financial time series classification By Elizabeth Fons; Paula Dawson; Xiao-jun Zeng; John Keane; Alexandros Iosifidis
  11. News Media vs. FRED-MD for Macroeconomic Forecasting By Jon Ellingsen; Vegard H. Larsen; Leif Anders Thorsrud
  12. Top Lights: Bright cities and their contribution to economic development By Richard Bluhm; Melanie Krause
  13. Boosting Renewable Energy Technology Uptake in Ireland: A Machine Learning Approach By Sanghamitra Mukherjee
  14. The future of theory: should social protection board the big data train? By Waterschoot, Cedric
  15. Extracting Information from Different Expectations By Andrew B. Martinez
  16. Social Media and Newsroom Production Decisions By Julia Cagé; Nicolas Hervé; Béatrice Mazoyer
  17. Estimation of supply and demand of tertiary education places in advanced digital profiles in the EU: Focus on Artificial Intelligence, High Performance Computing, Cybersecurity and Data Science By Alvaro Gomez Losada; Montserrat Lopez-Cobo; Sofia Samoili; Georgios Alaveras; Miguel Vazquez-Prada Baillet; Melisande Cardona; Riccardo Righi; Lukasz Ziemba; Giuditta De-Prato
  18. Comparison of ARIMA, ETS, NNAR and hybrid models to forecast the second wave of COVID-19 hospitalizations in Italy By Perone, G.
  19. A Predictive and Prescriptive Analytics Framework for Efficient E-Commerce Order Delivery By Kandula, Shanthan; Krishnamoorthy, Srikumar; Roy, Debjit
  20. Learning from Forecast Errors: A New Approach to Forecast Combination By Tae-Hwy Lee; Ekaterina Seregina
  21. Macroeconomic expectations: news sentiment analysis By Nataliia Ostapenko
  22. A deep neural network algorithm for semilinear elliptic PDEs with applications in insurance mathematics By Stefan Kremsner; Alexander Steinicke; Michaela Sz\"olgyenyi
  23. A Narrative Approach to Creating Instruments with Unstructured and Voluminous Text: An Application to Policy Uncertainty By Michael Ryan
  24. Fear and Volatility in Digital Assets By Faizaan Pervaiz; Christopher Goh; Ashley Pennington; Samuel Holt; James West; Shaun Ng
  25. Event-Driven Learning of Systematic Behaviours in Stock Markets By Xianchao Wu
  26. Deep reinforced learning enables solving discrete-choice life cycle models to analyze social security reforms By Antti J. Tanskanen
  27. Expanding the Measurement of Culture with a Sample of Two Billion Humans By Obradovich, Nick; Özak, Ömer; Martín, Ignacio; Ortuño-Ortín, Ignacio; Awad, Edmond; Cebrián, Manuel; Cuevas, Rubén; Desmet, Klaus; Rahwan, Iyad; Cuevas, Ángel
  28. Discrete-time portfolio optimization under maximum drawdown constraint with partial information and deep learning resolution By Carmine De Franco; Johann Nicolle; Huy\^en Pham
  29. Tracking Inflation on a Daily Basis By Alvarez, Santiago E.; Lein, Sarah M.
  30. Communicating corporate LGBTQ advocacy: A computational comparison of the global CSR discourse By Zhou, Alvin

  1. By: Katsafados, Apostolos G.; Androutsopoulos, Ion; Chalkidis, Ilias; Fergadiotis, Manos; Leledakis, George N.; Pyrgiotakis, Emmanouil G.
    Abstract: This study examines the predictive power of textual information from S-1 filings in explaining IPO underpricing. Our empirical approach differs from previous research, as we utilize several machine learning algorithms to predict whether an IPO will be underpriced or not. We analyze a large sample of 2,481 U.S. IPOs from 1997 to 2016, and we find that textual information can effectively complement traditional financial variables in terms of prediction accuracy. In fact, models that use both textual data and financial variables as inputs have superior performance compared to models using a single type of input. We attribute our findings to the fact that textual information can reduce the ex-ante valuation uncertainty of IPO firms, thus leading to more accurate estimates.
    Keywords: Initial public offerings; First-day returns; Machine learning; Natural language processing
    JEL: G02 G14 G30 G32
    Date: 2020–10–27
  2. By: Michael Allan Ribers; Hannes Ullrich
    Abstract: Human decision-making differs due to variation in both incentives and available information. This generates substantial challenges for the evaluation of whether and how machine learning predictions can improve decision outcomes. We propose a framework that incorporates machine learning on large-scale administrative data into a choice model featuring heterogeneity in decision maker payoff functions and predictive skill. We apply our framework to the major health policy problem of improving the efficiency in antibiotic prescribing in primary care, one of the leading causes of antibiotic resistance. Our analysis reveals large variation in physicians’ skill to diagnose bacterial infections and in how physicians trade off the externality inherent in antibiotic use against its curative benefit. Counterfactual policy simulations show the combination of machine learning predictions with physician diagnostic skill achieves a 25.4 percent reduction in prescribing.
    Keywords: Prediction policy, expert decision-making, machine learning, antibiotic prescribing
    JEL: C10 C55 I11 I18 Q28
    Date: 2020
  3. By: Max H. Farrell; Tengyuan Liang; Sanjog Misra
    Abstract: We propose a methodology for effectively modeling individual heterogeneity using deep learning while still retaining the interpretability and economic discipline of classical models. We pair a transparent, interpretable modeling structure with rich data environments and machine learning methods to estimate heterogeneous parameters based on potentially high dimensional or complex observable characteristics. Our framework is widely applicable, covering numerous settings of economic interest. We recover, as special cases, well-known examples such as average treatment effects and parametric components of partially linear models. However, we also seamlessly deliver new results for diverse examples such as price elasticities, willingness-to-pay, and surplus measures in choice models, average marginal and partial effects of continuous treatment variables, fractional outcome models, count data, heterogeneous production function components, and more. Deep neural networks are well-suited to structured modeling of heterogeneity: we show how the network architecture can be designed to match the global structure of the economic model, giving novel methodology for deep learning as well as, more formally, improved rates of convergence. Our results on deep learning have consequences for other structured modeling environments and applications, such as for additive models. Our inference results are based on an influence function we derive, which we show to be flexible enough to encompass all settings with a single, unified calculation, removing any requirement for case-by-case derivations. The usefulness of the methodology in economics is shown in two empirical applications: the response of 401(k) participation rates to firm matching and the impact of prices on subscription choices for an online service. Extensions to instrumental variables and multinomial choices are shown.
    Date: 2020–10
  4. By: Amir Mukeri; Habibullah Shaikh; Dr. D. P. Gaikwad
    Abstract: In recent years, bankruptcy forecasting has gained a lot of attention from researchers as well as practitioners in the field of financial risk management. Various approaches to bankruptcy prediction, proposed in the past and currently in practice, rely on accounting ratios together with statistical modeling or machine learning methods. These models have had varying degrees of success. Models such as Linear Discriminant Analysis or Artificial Neural Networks employ discriminative classification techniques and lack an explicit provision for including prior expert knowledge. In this paper, we propose an alternative route of generative modeling using an Expert Bayesian framework. The biggest advantage of the proposed framework is the explicit inclusion of expert judgment in the modeling process. The proposed methodology also provides a way to quantify uncertainty in prediction. As a result, a model built using the Bayesian framework is highly flexible, interpretable and intuitive in nature. The proposed approach is well suited for highly regulated or safety-critical applications such as finance or medical diagnosis, where accuracy is not the only concern for decision makers: they and other stakeholders are also interested in the uncertainty of the prediction as well as the interpretability of the model. We empirically demonstrate these benefits of the proposed framework using Stan, a probabilistic programming language. We find that the proposed model is either comparable or superior to other existing methods, and that the resulting model has a much lower False Positive Rate than many existing state-of-the-art methods. The corresponding R code for the experiments is available at a Github repository.
    Date: 2020–10
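As a minimal illustration of the generative idea behind such an Expert Bayesian framework (an expert prior on the default rate, updated by observed data, with the posterior quantifying uncertainty), here is a stdlib-only sketch. The conjugate Beta-Binomial model, the prior values and the counts are all illustrative assumptions, not the paper's Stan specification:

```python
from math import sqrt

def posterior_default_rate(expert_a, expert_b, defaults, firms):
    """Update an expert Beta(a, b) prior on the bankruptcy rate with
    observed default counts (conjugate Beta-Binomial update)."""
    a = expert_a + defaults
    b = expert_b + (firms - defaults)
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(var)  # posterior mean and std (prediction uncertainty)

# Expert believes roughly a 10% default rate (Beta(2, 18));
# data: 30 defaults observed among 200 firms.
mean, sd = posterior_default_rate(2, 18, 30, 200)
```

The posterior standard deviation is exactly the kind of uncertainty quantification the abstract argues regulated decision makers need alongside point accuracy.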
  5. By: Sidra Mehtab; Jaydip Sen
    Abstract: Designing robust and accurate predictive models for stock price prediction has been an active area of research for a long time. While on one side, the supporters of the efficient market hypothesis claim that it is impossible to forecast stock prices accurately, many researchers believe otherwise. There exist propositions in the literature that have demonstrated that if properly designed and optimized, predictive models can very accurately and reliably predict future values of stock prices. This paper presents a suite of deep learning based models for stock price prediction. We use the historical records of the NIFTY 50 index listed in the National Stock Exchange of India, during the period from December 29, 2008 to July 31, 2020, for training and testing the models. Our proposition includes two regression models built on convolutional neural networks and three long and short term memory network based predictive models. To forecast the open values of the NIFTY 50 index records, we adopted a multi step prediction technique with walk forward validation. In this approach, the open values of the NIFTY 50 index are predicted on a time horizon of one week, and once a week is over, the actual index values are included in the training set before the model is trained again, and the forecasts for the next week are made. We present detailed results on the forecasting accuracies for all our proposed models. The results show that while all the models are very accurate in forecasting the NIFTY 50 open values, the univariate encoder decoder convolutional LSTM with the previous two weeks data as the input is the most accurate model. On the other hand, a univariate CNN model with previous one week data as the input is found to be the fastest model in terms of its execution speed.
    Date: 2020–10
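The walk-forward validation scheme described in the abstract (forecast a horizon ahead, fold the actuals back into the training set, refit, repeat) can be sketched generically. The naive persistence "model" below is a placeholder assumption, not the authors' CNN or LSTM:

```python
def walk_forward_forecast(series, train_size, horizon, fit_predict):
    """Multi-step walk-forward validation: forecast `horizon` steps ahead,
    then add the actual observations to the training window and refit."""
    history = list(series[:train_size])
    forecasts = []
    t = train_size
    while t < len(series):
        step = min(horizon, len(series) - t)
        forecasts.extend(fit_predict(history, step))
        history.extend(series[t:t + step])  # actuals join the training set
        t += step
    return forecasts

# Toy "model": persist the last observed value over the whole horizon.
naive = lambda hist, h: [hist[-1]] * h
preds = walk_forward_forecast(list(range(20)), train_size=10, horizon=5,
                              fit_predict=naive)
```

Any fitted model can be dropped in via `fit_predict(history, horizon)`; the loop structure is what the abstract's weekly retraining describes.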
  6. By: Septimiu Szabo
    Abstract: The brief provides an analysis of the progress made by the Visegrád Four countries in their transition to Industry 4.0, a concept encompassing new digital technologies such as automation and robotisation, 3D printing, machine learning and artificial intelligence. As their share of manufacturing in GDP has been historically high, the economies and the workforces of these four countries are likely to be affected by the technological disruption expected to take place in the coming decades. The automotive industry, one of the trademarks of the Visegrád economies, is the most advanced in the transition, having started to replace some of the more predictable manual and routine labour tasks with industrial robots. Other dimensions of Industry 4.0 are less advanced and need particular attention. Most domestic firms lag behind in the integration of new technologies and do not have a clear vision of their digital transformation. The level of advanced digital skills among the workforce is also rather low, and the prevalence of digital public services remains limited. Also in the light of the economic recovery following the COVID-19 pandemic, various policies can support the digital transition, including investing in cross-cutting technologies, facilitating innovative firms' access to risk financing and promoting entrepreneurship.
    Keywords: Industry 4.0, automation, robotisation, Visegrad, technological transition, innovation
    JEL: J2 L6 O3 O4
    Date: 2020–06
  7. By: Andrés Alonso (Banco de España); José Manuel Carbó (Banco de España)
    Abstract: New reports show that the financial sector is increasingly adopting machine learning (ML) tools to manage credit risk. In this environment, supervisors face the challenge of allowing credit institutions to benefit from technological progress and financial innovation, while at the same time ensuring compatibility with regulatory requirements and that technological neutrality is observed. We propose a new framework for supervisors to measure the costs and benefits of evaluating ML models, aiming to shed more light on this technology's alignment with the regulation. We follow three steps. First, we identify the benefits by reviewing the literature. We observe that ML delivers predictive gains of up to 20% in default classification compared with traditional statistical models. Second, we use the process for validating internal ratings-based (IRB) systems for regulatory capital to detect ML's limitations in credit risk management. We identify up to 13 factors that might constitute a supervisory cost. Finally, we propose a methodology for evaluating these costs. For illustrative purposes, we compute the benefits by estimating the predictive gains of six ML models using a public database on credit default. We then calculate a supervisory cost function through a scorecard in which we assign weights to each factor for each ML model, based on how the model is used by the financial institution and the supervisor's risk tolerance. From a supervisory standpoint, having a structured methodology for assessing ML models could increase transparency and remove an obstacle to innovation in the financial industry.
    Keywords: artificial intelligence, machine learning, credit risk, interpretability, bias, IRB models
    JEL: C53 D81 G17
    Date: 2020–10
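The scorecard step (weights assigned to each supervisory-cost factor, reflecting model use and risk tolerance) amounts to a weighted average. A sketch in which the factor names, scores and weights are invented for illustration; the paper identifies up to 13 such factors:

```python
def supervisory_cost(factor_scores, weights):
    """Weighted scorecard: each factor gets a concern score in [0, 1]
    and a weight reflecting the supervisor's risk tolerance and how
    the institution uses the model; returns a normalised cost in [0, 1]."""
    assert set(factor_scores) == set(weights)
    total_w = sum(weights.values())
    return sum(weights[f] * factor_scores[f] for f in factor_scores) / total_w

# Three illustrative factors out of the 13 the paper discusses.
scores = {"interpretability": 1.0, "bias": 0.5, "data_quality": 0.0}
weights = {"interpretability": 3, "bias": 2, "data_quality": 1}
cost = supervisory_cost(scores, weights)
```

Comparing this cost against the estimated predictive gain of each ML model is the cost-benefit trade-off in the title.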
  8. By: Stefan Penczynski (School of Economics and CBESS, University of East Anglia, Norwich.); Konrad Maliszewski (Norwich Business School and CBESS, University of East Anglia, Norwich.); Andrew Fearne (Norwich Business School, University of East Anglia, Norwich.)
    Abstract: Recent advances in machine learning and the availability of big data have made the study of Business Intelligence Systems (BIS) increasingly relevant. BI, which includes processes and methods for improving decision making with the use of fact-based support systems, is reported to be widely used across sectors and different business functions. However, most of the research effort centres around the question of how BIS can deliver value to an organisation (Trieu, 2017). Since one of the key determinants of organisational impact is the actual use of the system, many studies investigate the factors that facilitate more effective use of the information provided. The data presentation format is a promising area of research (Kelton, Pennington and Tuttle, 2010; Luo, 2019). However, studies conducted thus far tend to rely on laboratory experiments with students and ignore objective usage data, thus reducing the validity and reliability of the findings (see e.g. Luo, 2019). What is more, the empirical research studies predominantly examine BIS use among large businesses, neglecting the specific context of SMEs (Arnott, Lizama and Song, 2017; Popovič, Puklavec and Oliveira, 2019). In this research we conduct an online experiment to study the impact of experience on the previously identified relationships between information presentation format, task characteristics and individual differences.
    Keywords: Data visualisation, Business Intelligence Systems (BIS), Improving decision making, Small and Medium Enterprises (SME)
    Date: 2020–11
  9. By: Christiansen, T.; Weeks, M.
    Abstract: Various poverty reduction strategies are being implemented in the pursuit of eliminating extreme poverty. One such strategy is increased access to microcredit in poor areas around the world. Microcredit, typically defined as the supply of small loans to underserved entrepreneurs that originally aimed at displacing expensive local money-lenders, has been both praised and criticized as a development tool (Banerjee et al., 2015c). This paper presents an analysis of heterogeneous impacts from increased access to microcredit using data from three randomised trials. In the spirit of recognising that, in general, the impact of a policy intervention varies conditional on an unknown set of factors, we investigate in particular whether heterogeneity presents itself as groups of winners and losers, and whether such subgroups share characteristics across RCTs. We find no evidence of impacts, neither average nor distributional, from increased access to microcredit on consumption levels. In contrast, the lack of average effects on profits seems to mask heterogeneous impacts. The findings are, however, not robust to the specific machine learning algorithm applied. Switching from the better performing Elastic Net to the worse performing Random Forest leads to a sharp increase in the variance of the estimates. In this context, methods for evaluating the relative performance of machine learning algorithms developed by Chernozhukov et al. (2019) provide a disciplined way for the analyst to counter the uncertainty as to which algorithm to deploy.
    Keywords: Machine learning methods, microcredit, development policy, treatment effects, random forest, elastic net
    JEL: D14 G21 I38 O12 O16 P36
    Date: 2020–11–03
  10. By: Elizabeth Fons; Paula Dawson; Xiao-jun Zeng; John Keane; Alexandros Iosifidis
    Abstract: Data augmentation methods in combination with deep neural networks have been used extensively in computer vision on classification tasks, achieving great success; however, their use in time series classification is still at an early stage. This is even more so in the field of financial prediction, where data tends to be small, noisy and non-stationary. In this paper we evaluate several augmentation methods applied to stocks datasets using two state-of-the-art deep learning models. The results show that several augmentation methods significantly improve financial performance when used in combination with a trading strategy. For a relatively small dataset ($\approx30K$ samples), augmentation methods achieve up to $400\%$ improvement in risk adjusted return performance; for a larger stock dataset ($\approx300K$ samples), results show up to $40\%$ improvement.
    Date: 2020–10
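A minimal example of one simple augmentation method of the kind evaluated in this literature, additive Gaussian jitter, using only the standard library. The return series, noise level and seed are illustrative assumptions, not the paper's setup:

```python
import random

def jitter(series, sigma=0.01, seed=0):
    """Additive Gaussian-noise augmentation: perturb each observation
    with N(0, sigma^2) noise to create a synthetic training sample."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [x + rng.gauss(0.0, sigma) for x in series]

returns = [0.01, -0.02, 0.005, 0.015]
augmented = jitter(returns, sigma=0.005)
```

In training, several such perturbed copies of each window would be added to the (small, noisy) financial dataset before fitting the deep model.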
  11. By: Jon Ellingsen; Vegard H. Larsen; Leif Anders Thorsrud
    Abstract: Using a unique dataset of 22.5 million news articles from the Dow Jones Newswires Archive, we perform an in-depth real-time out-of-sample forecasting comparison study with one of the most widely used datasets in the newer forecasting literature, namely the FRED-MD dataset. Focusing on U.S. GDP, consumption and investment growth, our results suggest that the news data contain information not captured by the hard economic indicators, and that the news-based data are particularly informative for forecasting consumption developments.
    Keywords: forecasting, real-time, machine learning, news, text data
    JEL: C53 C55 E27 E37
    Date: 2020
  12. By: Richard Bluhm (SoDa Laboratories, Monash University); Melanie Krause (SoDa Laboratories, Monash University)
    Abstract: Tracking the development of cities in emerging economies is difficult with conventional data. Even the commonly-used satellite images of nighttime light intensity fail to capture the true brightness of larger cities. This paper shows that nighttime lights can be used as a reliable proxy for economic activity at the city level, provided they are first corrected for top-coding. We present a stylized model of urban luminosity and empirical evidence which both suggest that these ‘top lights’ can be characterized by a Pareto distribution. We then propose a correction procedure which recovers the full distribution of city lights. Our results show that the brightest cities account for nearly a third of global economic activity. Applying this approach to cities in Sub-Saharan Africa, we find that primate cities are outgrowing secondary cities but are changing from within. Poorer neighborhoods are developing and sub-centers are emerging, with the side effect that Africa’s cities are also becoming increasingly fragmented.
    Keywords: Development, urban growth, night lights, top-coding, inequality
    JEL: O10 O18 R11 R12
    Date: 2020–11
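The correction rests on replacing censored observations with the conditional mean of a Pareto tail above the cap, which is alpha * cap / (alpha - 1) for tail index alpha > 1. A sketch, where the cap of 63 (the saturation value of DMSP-style imagery) and the tail index are assumptions for illustration; in practice alpha would be estimated from the uncensored upper part of the distribution:

```python
def correct_top_coding(values, cap, alpha):
    """Replace top-coded observations (censored at `cap`) with the mean
    of a Pareto(alpha) tail above the cap: alpha * cap / (alpha - 1)."""
    assert alpha > 1, "Pareto mean exists only for alpha > 1"
    tail_mean = alpha * cap / (alpha - 1)
    return [tail_mean if v >= cap else v for v in values]

lights = [5.0, 12.0, 63.0, 63.0, 40.0]  # two pixels saturate at the cap
corrected = correct_top_coding(lights, cap=63.0, alpha=3.0)
```

The corrected series recovers the brightness that censoring removed, which is why the brightest cities' share of economic activity rises after the fix.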
  13. By: Sanghamitra Mukherjee
    Abstract: This study explores the impact of socio-demographic, behavioural, and built-environment characteristics on residential renewable energy technology adoption. It provides new insights on factors influencing uptake using nearest neighbour and random forest machine learning models at a granular spatial scale. Being computationally inexpensive and having good classification performance, these models serve as useful baseline prediction tools. Data is sourced from an Irish survey of consumer perceptions of three key technologies – electric vehicles, solar photovoltaic panels, and heat pumps – and general attitudes towards sustainability, innovation, risk, and time. We demonstrate that utility bills, residence period, attitudes to sustainability, satisfaction with household heating, and perceptions of hassle have the biggest influence on current uptake. Urban areas, typically having better access to information and resources, are likely to see the biggest uptake first. Additionally, compatibility of household infrastructure, technical interest, and social approval are the most important predictors of potential uptake. These results may inform policy in other early adopter markets as well. Overall, policy makers must be cognisant of the stage of adoption their country is currently at. Accordingly, a holistic approach to tackling low adoption must include measures that not only enhance adoption capabilities via rebates and financial measures, but also support the opportunity and intent to purchase such technologies.
    Keywords: Renewable energy technology adoption; Consumer behaviour; Machine learning; Heat pumps; Solar PVs; Electric vehicles
    JEL: D1 D9 O3 Q4
    Date: 2020–09
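A nearest-neighbour baseline of the kind used here can be sketched in a few lines. The household features, labels and k below are invented for illustration and bear no relation to the Irish survey data:

```python
def knn_predict(train, query, k=3):
    """Classify a household as adopter (1) / non-adopter (0) by majority
    vote among the k nearest neighbours (squared Euclidean distance)."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

# Toy features: (monthly utility bill / 100, sustainability attitude 0-1)
train = [((3.0, 0.9), 1), ((2.8, 0.8), 1),
         ((1.0, 0.2), 0), ((1.2, 0.1), 0)]
adopts = knn_predict(train, (2.9, 0.85), k=3)
```

As the abstract notes, such models are computationally inexpensive, which is what makes them useful baseline prediction tools at a granular spatial scale.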
  14. By: Waterschoot, Cedric
    Abstract: Applications of big data have been surging as of late, and the field of public policy does not stand on the sideline while this dramatic wave of new technologies makes its way across the disciplines. However, theory-driven fields may experience radical change, as data fundamentalists claim that the end of theory will come due to the nature and practicality of big data. In this paper, the position of social protection is examined with regard to the effects of the already observed shift towards such computational methods. I argue that such a dramatic end of theory will not come for social protection policy, as the specialists and theorists take up the role of interpreters of the data, performing the needed task of translating the vast collection of information into a usable form or result. Vital to this position is contact with the political economy, a task that cannot produce a fruitful outcome without the interpreter. To strengthen this position with regard to social protection and big data, two examples are outlined: 'Citizen Based Analytics' in New Zealand and the Big Data Quality Task Team of the UNECE.
    Date: 2020–10–19
  15. By: Andrew B. Martinez (Office of Macroeconomic Analysis, US Department of the Treasury)
    Abstract: Long-term expectations are believed to play a crucial role in driving future inflation and guiding monetary policy responses. However, expectations are not directly observed and the available measures can present a wide range of results. To understand what drives these differences, we examine the evolution of alternative consumer price inflation expectations in the United States between 2003 and 2019. We show that inflation forecasts can be improved by incorporating the differential between survey and market-based measures of expectations. Next, we decompose and extract the differentials in rigidity and information between measures of expectations. While both information and rigidities play a role, the information differential is more important. Using machine learning methods, we find that up to half of the information differential is explained by real-time changes in measures of liquidity. This also explains some past forecast improvements and helps predict the divergence in long-term inflation expectations in 2020.
    Date: 2020–10
  16. By: Julia Cagé (Sciences Po Paris, Department of Economics, 28 rue des Saints Pères, 75007 Paris, France, and CEPR (London)); Nicolas Hervé (Institut National de l'Audiovisuel, 28 avenue des Frères Lumière, 94366 Bry-sur-Marne, France); Béatrice Mazoyer (CentraleSupélec, Université Paris-Saclay, 91190 Gif-sur-Yvette, France, and Institut National de l'Audiovisuel, 28 avenue des Frères Lumière, 94366 Bry-sur-Marne, France)
    Abstract: Social media affects not only the way we consume news, but also the way news is produced, including by traditional media outlets. In this paper, we study the propagation of information from social media to mainstream media, and investigate whether news editors are influenced in their editorial decisions by the popularity of stories on social media. To do so, we build a novel dataset including a representative sample of all tweets produced in French between July 2018 and July 2019 (1.8 billion tweets, around 70% of all tweets in French during the period) and the content published online by about 200 mainstream media during the same time period, and develop novel algorithms to identify and link events on social and mainstream media. To isolate the causal impact of popularity, we rely on the structure of the Twitter network and propose a new instrument based on the interaction between measures of user centrality and news pressure at the time of the event. We show that story popularity has a positive effect on media coverage, and that this effect varies depending on media outlets' characteristics. These findings shed new light on our understanding of how editors decide on the coverage of stories, and question the welfare effects of social media.
    Keywords: Internet, Information spreading, Network analysis, Social media, Twitter, Text analysis
    JEL: C31 D85 L14 L15 L82 L86
    Date: 2020–10
  17. By: Alvaro Gomez Losada (European Commission - JRC); Montserrat Lopez-Cobo (European Commission - JRC); Sofia Samoili (European Commission - JRC); Georgios Alaveras (European Commission - JRC); Miguel Vazquez-Prada Baillet (European Commission - JRC); Melisande Cardona (European Commission - JRC); Riccardo Righi (European Commission - JRC); Lukasz Ziemba (European Commission - JRC); Giuditta De-Prato (European Commission - JRC)
    Abstract: In order to investigate the extent to which the education offer of advanced digital skills in Europe matches labour market needs, this study estimates the supply and demand of university places for studies covering the technological domains of Artificial Intelligence (AI), High Performance Computing (HPC), Cybersecurity (CS) and Data Science (DS), in the EU27, United Kingdom and Norway. The difference between demand and supply of tertiary education places (Bachelor and Master or equivalent level) in the mentioned technological domains is referred to in this report as unmet students' demand for places, or unmet demand. Demanded places, available places and unmet demand are estimated for the following dimensions: (a) the tertiary education level at which this demand is observed: Bachelor and Master or equivalent programmes; (b) the programme's scope, or the depth with which education programmes address the technological domain: broad and specialised; and (c) the main fields of education where this tuition is offered: Business Administration and Law; Natural Sciences and Mathematics; Information and Communication Technology (ICT); and Engineering, Manufacturing and Construction, with the remaining fields grouped together in a fifth category. From these estimations, it is concluded that the number of available places in the EU27, at Bachelor level, reaches 587,000 for studies with AI content, with 106,000 places offered in HPC, 307,000 places in CS and 444,000 places offered in the domain of DS. At Master level this demand is comparatively lower, except for the DS domain, where it equals the offer at Bachelor level. DS outnumbers AI in demand for places at Master level, with 602,000 and 535,000 demanded places, respectively. The unmet demand for AI, HPC, CS and DS in the EU27 at MSc level is approximately 150,000, 33,000, 59,000 and 167,000 places, respectively. At BSc level, the unmet demand reaches 273,000, 53,000, 159,000 and 213,000 places, respectively.
Another finding is that the unmet demand for broad academic programmes is higher than for specialised programmes across all technological domains and education levels (Bachelor and Master). Higher availability of places for the AI, HPC, CS and DS domains is found for academic programmes taught in the ICT field of education, both at Bachelor and Master levels. For Bachelor studies, Germany and Finland are estimated to be the countries with the highest unmet demand in AI, HPC, CS and DS, whether with a broad or specialised scope. The United Kingdom is the only studied country offering places in all fields of education and technological domains at Bachelor level; at Master level, this is also found in Germany, Ireland, France and Portugal.
    Keywords: digital skills, higher education, education supply, education demand, artificial intelligence, high-performance computing, cybersecurity, data science, digital transformation
    Date: 2020–09
  18. By: Perone, G.
    Abstract: Coronavirus disease (COVID-19) is a severe ongoing novel pandemic that emerged in Wuhan, China, in December 2019. As of October 13, the outbreak has spread rapidly across the world, affecting over 38 million people and causing over 1 million deaths. In this article, I analyse several time series forecasting methods to predict the spread of the COVID-19 second wave in Italy, over the period after October 13, 2020. I used an autoregressive model (ARIMA), an exponential smoothing state space model (ETS), a neural network autoregression model (NNAR), and the following hybrid combinations of them: ARIMA-ETS, ARIMA-NNAR, ETS-NNAR, and ARIMA-ETS-NNAR. Regarding the data, I forecast the number of patients hospitalized with mild symptoms and in intensive care units (ICU). The data refer to the period February 21, 2020 – October 13, 2020 and are extracted from the website of the Italian Ministry of Health. The results show that i) the hybrid models, except for ARIMA-ETS, are better at capturing the linear and non-linear epidemic patterns, outperforming the respective single models; and ii) the number of COVID-19-related patients hospitalized with mild symptoms and in ICU will rapidly increase in the coming weeks, reaching the peak in about 50-60 days, i.e. in mid-December 2020, at least. To tackle the upcoming COVID-19 second wave it is necessary to enhance social distancing, hire healthcare workers and provide sufficient hospital facilities, protective equipment, and ordinary and intensive care beds.
    Keywords: COVID-19; outbreak; second wave; Italy; hybrid forecasting models; ARIMA; ETS; NNAR.
    JEL: C22 C53 I18
    Date: 2020–11
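The hybrid combinations described in the abstract are averages of single-model forecasts. A minimal, self-contained numpy sketch of the idea, with a hand-rolled AR(1) and simple exponential smoothing standing in for the paper's ARIMA and ETS (all function names and parameters are illustrative, not the paper's implementation):

```python
import numpy as np

def ar1_forecast(y, h):
    # Fit y_t = c + phi * y_{t-1} by least squares, then iterate h steps ahead.
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    c, phi = np.linalg.lstsq(X, y[1:], rcond=None)[0]
    preds, last = [], y[-1]
    for _ in range(h):
        last = c + phi * last
        preds.append(last)
    return np.array(preds)

def ses_forecast(y, h, alpha=0.3):
    # Simple exponential smoothing: flat h-step-ahead forecast at the final level.
    level = y[0]
    for v in y[1:]:
        level = alpha * v + (1 - alpha) * level
    return np.full(h, level)

def hybrid_forecast(y, h):
    # Hybrid model = equal-weight average of the single-model forecasts.
    return 0.5 * (ar1_forecast(y, h) + ses_forecast(y, h))
```

The averaging step is what lets the hybrid capture both the linear (AR) and smoothed-level components the single models pick up separately.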
  19. By: Kandula, Shanthan; Krishnamoorthy, Srikumar; Roy, Debjit
    Abstract: Achieving timely last-mile delivery is often the most challenging part of e-commerce order fulfillment. Effective management of last-mile operations can result in significant cost savings and lead to increased customer satisfaction. Currently, due to the lack of customer availability information, the schedules followed by delivery agents are optimized for the shortest tour distance. As a result, orders are not delivered in customer-preferred time periods, leading to missed deliveries. Missed deliveries are undesirable since they incur additional costs. In this paper, we propose a decision support framework intended to improve delivery success rates while reducing delivery costs. Our framework generates delivery schedules by predicting the appropriate time periods for order delivery. More specifically, the proposed framework works in two stages. In the first stage, delivery success for every order throughout the delivery shift is predicted using machine learning models. The predictions are used as input for an optimization scheme, which generates delivery schedules in the second stage. The proposed framework is evaluated on two real-world datasets collected from a large e-commerce platform. The results indicate the effectiveness of the decision support framework in enabling savings of up to 10.6% in delivery costs when compared to current industry practice.
    Date: 2020–11–05
  20. By: Tae-Hwy Lee (Department of Economics, University of California Riverside); Ekaterina Seregina (University of California Riverside)
    Abstract: This paper studies forecast combination (as an expert system) using the precision matrix estimation of forecast errors when the latter admit the approximate factor model. This approach incorporates the fact that experts often use common sets of information and hence tend to make common mistakes. This premise is evidenced in many empirical results. For example, the European Central Bank's Survey of Professional Forecasters on Euro-area real GDP growth demonstrates that professional forecasters tend to jointly understate or overstate GDP growth. Motivated by this stylized fact, we develop a novel framework which exploits the factor structure of forecast errors and the sparsity in the precision matrix of the idiosyncratic components of the forecast errors. The proposed algorithm is called Factor Graphical Model (FGM). Our approach overcomes the challenge of obtaining forecasts that contain unique information, which was shown to be necessary to achieve a "winning" forecast combination. In simulation, we demonstrate the merits of the FGM in comparison with the equal-weighted forecasts and the standard graphical methods in the literature. An empirical application to forecasting macroeconomic time series in a big data environment highlights the advantage of the FGM approach in comparison with the existing methods of forecast combination.
    Keywords: High-dimensionality, Graphical Lasso, Approximate Factor Model, Nodewise Regression, Precision Matrix
    JEL: C13 C38 C55
    Date: 2020–09
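The core of precision-based forecast combination can be sketched with the classical Bates-Granger weights. The FGM itself estimates the precision matrix via a factor structure plus a sparse graphical estimator; this minimal numpy sketch substitutes the plain sample inverse, so it illustrates the combination step only (all names illustrative):

```python
import numpy as np

def combination_weights(errors):
    """Bates-Granger combination weights from the precision matrix of forecast errors.

    `errors`: (T, k) array of past forecast errors from k experts. The FGM
    would replace the sample inverse below with a factor-model-plus-graphical
    estimate of the precision matrix; here we use the plain inverse.
    """
    sigma = np.cov(errors, rowvar=False)
    theta = np.linalg.inv(sigma)                 # precision matrix
    ones = np.ones(errors.shape[1])
    return theta @ ones / (ones @ theta @ ones)  # weights sum to one

def combine(forecasts, w):
    # Combined forecast: weighted average of the k expert forecasts.
    return forecasts @ w
```

More precise experts (smaller error variance, after accounting for correlation) receive larger weights, which is exactly the margin on which a better precision-matrix estimate pays off.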
  21. By: Nataliia Ostapenko
    Abstract: I investigate the role that news sentiment plays in the macroeconomy. Using an approach that combines Doc2Vec embeddings and Latent Dirichlet Allocation with lexicon-based models, I show that the news the media choose to report and the tone of these reports contain important information for household unemployment, interest rate, and inflation expectations. Topic time series derived from the news, and the sentiments they express, are employed to estimate how the news affects the macroeconomy.
    Keywords: expectations, sentiment, news, Latent Dirichlet Allocation (LDA), Doc2Vec
    JEL: E52 E31 E00
    Date: 2020–08–13
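The lexicon-based tone component of such an approach can be sketched in a few lines of Python. The word lists below are illustrative stand-ins, not the lexicon used in the paper:

```python
# Minimal lexicon-based tone score: (pos - neg) / (pos + neg) over matched words.
# POS/NEG are tiny illustrative word lists, not a published sentiment lexicon.
POS = {"growth", "gain", "improve", "strong", "recovery"}
NEG = {"recession", "loss", "decline", "weak", "unemployment"}

def tone(text):
    words = text.lower().split()
    pos = sum(w in POS for w in words)
    neg = sum(w in NEG for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total
```

Aggregating this score over the articles assigned to each LDA topic per month yields the topic-sentiment time series the abstract refers to.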
  22. By: Stefan Kremsner; Alexander Steinicke; Michaela Szölgyenyi
    Abstract: In insurance mathematics optimal control problems over an infinite time horizon arise when computing risk measures. Their solutions correspond to solutions of deterministic semilinear (degenerate) elliptic partial differential equations. In this paper we propose a deep neural network algorithm for solving such partial differential equations in high dimensions. The algorithm is based on the correspondence of elliptic partial differential equations to backward stochastic differential equations with random terminal time.
    Date: 2020–10
  23. By: Michael Ryan (University of Waikato)
    Abstract: We quantify the effects of policy uncertainty on the economy using a proxy structural vector autoregression (SVAR). Our instrument in the proxy SVAR is a set of exogenous uncertainty events constructed using a text-based narrative approach. Usually the narrative approach involves manually reading texts, which is difficult in our application as our text—the parliamentary record—is unstructured and lengthy. To deal with such circumstances, we develop a procedure using a natural language technique, latent Dirichlet allocation. Our procedure extends the possible applications of the narrative identification approach. We find that the effects of policy uncertainty are significant, and are underestimated by alternative identification methods.
    Keywords: Latent Dirichlet allocation; narrative identification; policy uncertainty; Proxy SVAR
    JEL: C32 C36 C63 D80 E32 L50
    Date: 2020–11–03
  24. By: Faizaan Pervaiz; Christopher Goh; Ashley Pennington; Samuel Holt; James West; Shaun Ng
    Abstract: We show that Bitcoin implied volatility on a 5-minute time horizon is modestly predictable from price, volatility momentum and alternative data, including sentiment and engagement. Lagged Bitcoin index price and volatility movements contribute to the model alongside Google Trends, with markets often responding several hours later. The code and datasets used in this paper are publicly available.
    Date: 2020–10
  25. By: Xianchao Wu
    Abstract: It is reported that financial news, especially financial events expressed in news, provide information for investors' long/short decisions and influence the movements of stock markets. Motivated by this, we leverage financial event streams to train a classification neural network that detects latent event-stock linkages and stock markets' systematic behaviours in the U.S. stock market. Our proposed pipeline includes (1) a combined event extraction method that utilizes Open Information Extraction and neural co-reference resolution, (2) a BERT/ALBERT-enhanced representation of events, and (3) an extended hierarchical attention network with attention at the event, news and temporal levels. Our pipeline achieves significantly better accuracies and higher simulated annualized returns than state-of-the-art models when applied to predicting the Standard & Poor's 500, Dow Jones and Nasdaq indices and 10 individual stocks.
    Date: 2020–10
  26. By: Antti J. Tanskanen
    Abstract: Discrete-choice life cycle models can be used, e.g., to estimate how social security reforms change the employment rate. Optimal employment choices over the life course of an individual can be solved within the framework of life cycle models, which enables estimating how a social security reform influences the employment rate. Life cycle models have mostly been solved with dynamic programming, which is not feasible when the state space is large, as is often the case in a realistic life cycle model. Solving such models requires approximate methods, such as reinforcement learning algorithms. We compare how well a deep reinforcement learning algorithm, ACKTR, and dynamic programming solve a relatively simple life cycle model. We find that the average utility is almost the same under both algorithms; however, the details of the best policies found by the different algorithms differ to a degree. In the baseline model, representing the current Finnish social security scheme, we find that reinforcement learning yields essentially as good results as dynamic programming. We then analyze a straightforward social security reform and find that the employment changes due to the reform are almost the same under both methods. Our results suggest that reinforcement learning algorithms are of significant value in analyzing complex life cycle models.
    Date: 2020–10
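The dynamic-programming benchmark in such comparisons amounts to backward induction over the life course. A toy, self-contained sketch with a single work/not-work choice per period and illustrative payoff parameters (far simpler than the Finnish-scheme model in the paper):

```python
# Toy discrete-choice life cycle model: each period the agent chooses to work
# (wage net of disutility of labour) or not (unemployment benefit), and the
# optimal policy is found by backward induction (dynamic programming).
# All parameter values are illustrative.
WAGE, BENEFIT, DISUTILITY, BETA, T = 1.0, 0.4, 0.3, 0.96, 40

def solve_by_dp():
    value = 0.0          # continuation value after the final period
    policy = []
    for _ in range(T):   # iterate backwards from the last period
        work = WAGE - DISUTILITY + BETA * value
        idle = BENEFIT + BETA * value
        policy.append(work >= idle)
        value = max(work, idle)
    policy.reverse()     # policy[t] = True if working is optimal in period t
    return value, policy
```

A reinforcement learning solver such as ACKTR replaces this exact backward pass with policy-gradient updates on simulated episodes, which is what makes large state spaces tractable at the cost of approximation error.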
  27. By: Obradovich, Nick; Özak, Ömer; Martín, Ignacio; Ortuño-Ortín, Ignacio; Awad, Edmond; Cebrián, Manuel; Cuevas, Rubén; Desmet, Klaus; Rahwan, Iyad; Cuevas, Ángel
    Abstract: Culture has played a pivotal role in human evolution. Yet, the ability of social scientists to study culture is limited by the currently available measurement instruments. Scholars of culture must regularly choose between scalable but sparse survey-based methods or restricted but rich ethnographic methods. Here, we demonstrate that massive online social networks can advance the study of human culture by providing quantitative, scalable, and high-resolution measurement of behaviorally revealed cultural values and preferences. We employ publicly available data across nearly 60,000 topic dimensions drawn from two billion Facebook users across 225 countries and territories. We first validate that cultural distances calculated from this measurement instrument correspond to traditional survey-based and objective measures of cross-national cultural differences. We then demonstrate that this expanded measure enables rich insight into the cultural landscape globally at previously impossible resolution. We analyze the importance of national borders in shaping culture, explore unique cultural markers that identify subnational population groups, and compare subnational divisiveness to gender divisiveness across countries. The global collection of massive data on human behavior provides a high-dimensional complement to traditional cultural metrics. Further, the granularity of the measure presents enormous promise to advance scholars' understanding of additional fundamental questions in the social sciences. The measure enables detailed investigation into the geopolitical stability of countries, social cleavages within both small and large-scale human groups, the integration of migrant populations, and the disaffection of certain population groups from the political process, among myriad other potential future applications.
    Keywords: Culture, Cultural Distance, Identity, Regional Culture, Gender Differences
    JEL: C80 J10 J16 O10 R10 Z10
    Date: 2020
  28. By: Carmine De Franco; Johann Nicolle; Huyên Pham
    Abstract: We study a discrete-time portfolio selection problem with partial information and a maximum drawdown constraint. Drift uncertainty in the multidimensional framework is modeled by a prior probability distribution. In this Bayesian framework, we derive the dynamic programming equation using an appropriate change of measure, and obtain semi-explicit results in the Gaussian case. The latter case, with a CRRA utility function, is completely solved numerically using recent deep learning techniques for stochastic optimal control problems. We emphasize the informative value of the learning strategy versus the non-learning one by providing empirical performance and sensitivity analysis with respect to the uncertainty of the drift. Furthermore, we show numerical evidence of the close relationship between the non-learning strategy and a no-short-sale constrained Merton problem, by illustrating the convergence of the former towards the latter as the maximum drawdown constraint vanishes.
    Date: 2020–10
  29. By: Alvarez, Santiago E.; Lein, Sarah M. (University of Basel)
    Abstract: Using online data for prices and real-time debit card transaction data on changes in expenditures for Switzerland allows us to track inflation on a daily basis. While the daily price index fluctuates around the official price index in normal times, it drops immediately after the lockdown related to the COVID-19 pandemic. Official statistics reflect this drop only with a lag, specifically because data collection takes time and is impeded by lockdown conditions. Such daily real-time information can be useful to gauge the relative importance of demand and supply shocks and thus inform policymakers who need to determine appropriate policy measures.
    Keywords: Daily price index, scraped online price data, debit card expenditures, real-time information.
    Date: 2020–08
  30. By: Zhou, Alvin (University of Pennsylvania)
    Abstract: Corporations are increasingly engaging in political and social issues through corporate social responsibility (CSR) initiatives, in contentious areas such as lesbian, gay, bisexual, transgender, queer (LGBTQ) advocacy. This article systematically, comparatively, and computationally examines this intersection, and contributes to the literature by 1) examining the global LGBTQ CSR discourse constructed by Fortune Global 500 companies (136,820 words) with semantic network analysis and structural topic modeling, 2) surveying non-profit organizations’ guidelines and comparing corporate values with them, and 3) exploring how stakeholder expectations and institutional factors influence CSR communication. Results indicate 6 corporate topics and 9 non-profit topics, which were explicated by referencing organizations’ original writing. It is further shown that stakeholder expectations and institutional factors not only affect whether organizations report LGBTQ efforts, but also affect what topics these companies highlight in their CSR communication. Companies in democratic countries with substantial stakeholder expectations emphasize areas that need high investment and exceed legal obligations.
    Date: 2020–10–28

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project is available online. For comments, please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.