nep-big New Economics Papers
on Big Data
Issue of 2020‒10‒26
27 papers chosen by
Tom Coupé
University of Canterbury

  1. State of the Art Survey of Deep Learning and Machine Learning Models for Smart Cities and Urban Sustainability By Saeed Nosratabadi; Amir Mosavi; Ramin Keivani; Sina Ardabili; Farshid Aram
  2. Stock Price Prediction Using Machine Learning and LSTM-Based Deep Learning Models By Sidra Mehtab; Jaydip Sen; Abhishek Dutta
  3. LASSO DEA for small and big data By Ya Chen; Mike Tsionas; Valentin Zelenyuk
  4. Monitoring War Destruction from Space: A Machine Learning Approach By Hannes Mueller; Andre Groger; Jonathan Hersh; Andrea Matranga; Joan Serrat
  5. The relationship between nuclear energy consumption and economic growth: evidence from Switzerland By Cosimo Magazzino; Marco Mele; Nicolas Schneider; Guillaume Vallet
  6. Using Machine Learning and Alternative Data to Predict Movements in Market Risk By Thomas Dierckx; Jesse Davis; Wim Schoutens
  7. A Deep Learning Approach for Dynamic Balance Sheet Stress Testing By Anastasios Petropoulos; Vassilis Siakoulis; Konstantinos P. Panousis; Theodoros Christophides; Sotirios Chatzis
  8. Predicting Corruption Crimes with Machine Learning. A Study for the Italian Municipalities By Guido de Blasio; Alessio D'Ignazio; Marco Letta
  9. Firm-Level Risk Exposures and Stock Returns in the Wake of COVID-19 By Steven J. Davis; Stephen Hansen; Cristhian Seminario-Amez
  10. Nowcasting GDP growth using data reduction methods: Evidence for the French economy By Olivier Darne; Amelie Charles
  11. Perception of Artificial Intelligence in Spain By Albarrán, Irene; Molina, José Manuel; Gijón, Covadonga
  12. Evaluation of company investment value based on machine learning By Junfeng Hu; Xiaosa Li; Yuru Xu; Shaowu Wu; Bin Zheng
  13. The Consequences of the COVID-19 Job Losses: Who Will Suffer Most and by How Much? By Andreas Gulyas; Krzysztof Pytka
  14. No data? No problem! A Search-based Recommendation System with Cold Starts By Pedro M. Gardete; Carlos D. Santos
  15. Sentiment of tweets and socio-economic characteristics as the determinants of voting behavior at the regional level. Case study of 2019 Polish parliamentary election By Grzegorz Krochmal
  16. Stock2Vec: A Hybrid Deep Learning Framework for Stock Market Prediction with Representation Learning and Temporal Convolutional Network By Xing Wang; Yijun Wang; Bin Weng; Aleksandr Vinel
  17. Asset Price Forecasting using Recurrent Neural Networks By Hamed Vaheb
  18. Tail-risk protection: Machine Learning meets modern Econometrics By Bruno Spilak; Wolfgang Karl H\"ardle
  19. The Rise of Fintech: A Cross-Country Perspective By Oskar KOWALEWSKI; Paweł PISANY
  20. Influence of big data and analysis of orientation effect on firm performance By Susanto, Stefanny Magdalena
  21. Machine Learning Classification of Price Extrema Based on Market Microstructure Features: A Case Study of S&P500 E-mini Futures By Artur Sokolovsky; Luca Arnaboldi
  22. A Pound Centric look at the Pound vs. Krona Exchange Rate Movement from 1844 to 1965 By Andrew Clark
  23. Common factors of withdrawn and prohibited mergers in the European Union By Bernhardt, Lea
  24. A Deep Learning Framework for Predicting Digital Asset Price Movement from Trade-by-trade Data By Qi Zhao
  25. COVID-19 and the future of US fertility: what can we learn from Google? By Joshua Wilde; Wei Chen; Sophie Lohmann
  26. Economic Value of Data: Quantification Using Online Experiments By Lukasz Grzybowski; Frank Verboven
  27. The Determinants of Economic Competitiveness By Kluge, Jan; Lappoehn, Sarah; Plank, Kerstin

  1. By: Saeed Nosratabadi; Amir Mosavi; Ramin Keivani; Sina Ardabili; Farshid Aram
    Abstract: Deep learning (DL) and machine learning (ML) methods have recently contributed to the advancement of models in the various aspects of prediction, planning, and uncertainty analysis of smart cities and urban development. This paper presents the state of the art of DL and ML methods used in this realm. Through a novel taxonomy, the advances in model development and new application domains in urban sustainability and smart cities are presented. Findings reveal that five DL and ML methods have been most applied to address the different aspects of smart cities: artificial neural networks; support vector machines; decision trees; ensembles, Bayesians, hybrids, and neuro-fuzzy systems; and deep learning. It is also found that energy, health, and urban transport are the main smart-city domains whose problems DL and ML methods have been used to address.
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.02670&r=all
  2. By: Sidra Mehtab; Jaydip Sen; Abhishek Dutta
    Abstract: Prediction of stock prices has long been an important area of research. While supporters of the efficient market hypothesis believe that it is impossible to predict stock prices accurately, there are formal propositions demonstrating that accurate modeling and appropriate variable design can yield models that predict stock prices and stock price movement patterns very accurately. In this work, we propose a hybrid modeling approach for stock price prediction, building several machine learning and deep learning-based models. For the purpose of our study, we used NIFTY 50 index values of the National Stock Exchange (NSE) of India for the period December 29, 2014 to July 31, 2020. We built eight regression models using training data consisting of NIFTY 50 index records from December 29, 2014 to December 28, 2018. Using these regression models, we predicted the open values of NIFTY 50 for the period December 31, 2018 to July 31, 2020. We then augmented the predictive power of our forecasting framework by building four deep learning-based regression models using long short-term memory (LSTM) networks with a novel approach of walk-forward validation. We exploit the power of LSTM regression models in forecasting future NIFTY 50 open values using four models that differ in their architecture and in the structure of their input data. Extensive results are presented on various metrics for all the regression models. The results clearly indicate that the LSTM-based univariate model that uses one week of prior data as input for predicting the next week's open value of the NIFTY 50 time series is the most accurate model.
    Date: 2020–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2009.10819&r=all
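    Code sketch: a minimal illustration of walk-forward LSTM validation in the spirit of this paper, using Keras on synthetic data. The look-back window, layer sizes, and training schedule are assumptions for illustration, not the authors' exact architecture.
```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def make_windows(series, lookback=5):
    # Turn a 1-D price series into (samples, lookback, 1) inputs and next-step targets.
    X = np.array([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X[..., None], y

prices = np.cumsum(np.random.randn(300)) + 100.0   # stand-in for NIFTY 50 open values
X, y = make_windows(prices, lookback=5)

model = Sequential([LSTM(32, input_shape=(5, 1)), Dense(1)])
model.compile(optimizer="adam", loss="mse")

# Walk-forward validation: keep re-training on an expanding window, predict one step ahead.
preds = []
for t in range(200, len(X)):
    model.fit(X[:t], y[:t], epochs=5, verbose=0)
    preds.append(model.predict(X[t:t + 1], verbose=0)[0, 0])
```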
  3. By: Ya Chen (Hefei University of Technology, China); Mike Tsionas (Lancaster University, United Kingdom); Valentin Zelenyuk (School of Economics and Centre for Efficiency and Productivity Analysis (CEPA) at The University of Queensland, Australia)
    Abstract: In data envelopment analysis (DEA), the curse of dimensionality problem may jeopardize the accuracy or even the relevance of results when there is a relatively large dimension of inputs and outputs, even for relatively large samples. Recently, a machine learning approach based on the least absolute shrinkage and selection operator (LASSO) for variable selection was combined with SCNLS (a special case of DEA), and dubbed LASSO-SCNLS, as a way to circumvent the curse of dimensionality problem. In this paper, we revisit this interesting approach by considering various data generating processes. We also explore a more advanced version of LASSO, the so-called elastic net (EN) approach, adapt it to DEA and propose the EN-DEA. Our Monte Carlo simulations provide additional and, to some extent, new evidence and conclusions. In particular, we find that none of the considered approaches clearly dominates the others. To circumvent the curse of dimensionality of DEA in the context of big wide data, we also propose a simplified two-step approach which we call LASSO+DEA. We find that the proposed simplified approach could be more useful than the existing more sophisticated approaches for reducing very large dimensions into sparser, more parsimonious DEA models that attain greater discriminatory power and suffer less from the curse of dimensionality.
    Keywords: Data envelopment analysis; Data enabled analytics; Sign-constrained convex nonparametric least squares (SCNLS); Machine learning; LASSO; Elastic net; Big wide data
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:qld:uqcepa:152&r=all
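    Code sketch: a toy version of the proposed two-step LASSO+DEA idea, assuming a single output and an input-oriented CCR formulation (my choices, not necessarily the authors'): LASSO screens the inputs, then a small linear program computes each DMU's efficiency on the reduced set.
```python
import numpy as np
from scipy.optimize import linprog
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, m = 50, 10                      # 50 DMUs, 10 candidate inputs
X = rng.uniform(1, 10, (n, m))     # inputs
y = X[:, :3].sum(axis=1) + rng.normal(0, 0.5, n)   # output driven by 3 inputs

# Step 1 (LASSO): inputs with non-zero coefficients survive the screening.
lasso = LassoCV(cv=5).fit(X, y)
keep = np.flatnonzero(lasso.coef_)
X_red = X[:, keep]

# Step 2 (DEA): input-oriented CCR efficiency for each DMU on the reduced inputs.
def ccr_efficiency(X, y, j0):
    n, m = X.shape
    c = np.r_[1.0, np.zeros(n)]                   # minimise theta
    A_in = np.c_[-X[j0], X.T]                     # sum_j lam_j x_ij <= theta * x_i,j0
    A_out = np.c_[np.zeros(1), -y[None, :]]       # sum_j lam_j y_j >= y_j0
    res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.r_[np.zeros(m), -y[j0]],
                  bounds=[(0, None)] * (n + 1))
    return res.x[0]

scores = [ccr_efficiency(X_red, y, j) for j in range(n)]
```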
  4. By: Hannes Mueller; Andre Groger; Jonathan Hersh; Andrea Matranga; Joan Serrat
    Abstract: Existing data on building destruction in conflict zones rely on eyewitness reports or manual detection, which makes them generally scarce, incomplete, and potentially biased. This lack of reliable data imposes severe limitations on media reporting, humanitarian relief efforts, human rights monitoring, reconstruction initiatives, and academic studies of violent conflict. This article introduces an automated method of measuring destruction in high-resolution satellite images using deep learning techniques combined with data augmentation to expand training samples. We apply this method to the Syrian civil war and reconstruct the evolution of damage in major cities across the country. The approach allows destruction data to be generated with unprecedented scope, resolution, and frequency - limited only by the available satellite imagery - which can decisively alleviate data limitations.
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.05970&r=all
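    Code sketch: the general pattern of fine-tuning a CNN with data augmentation for damaged/undamaged satellite tiles. The ResNet-18 backbone and the specific transforms are illustrative assumptions, not details taken from the paper.
```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# Augmentation expands scarce labelled tiles, echoing the paper's training-sample expansion.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(90),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

# Fine-tune a standard CNN backbone for binary damaged / undamaged classification.
net = models.resnet18(weights="IMAGENET1K_V1")
net.fc = nn.Linear(net.fc.in_features, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
```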
  5. By: Cosimo Magazzino (Roma Tre University); Marco Mele (UNITE - Universita degli studi di Teramo - University of Teramo [Italie]); Nicolas Schneider; Guillaume Vallet (CREG - Centre de recherche en économie de Grenoble - UGA [2020-....] - Université Grenoble Alpes [2020-....])
    Abstract: This study aims to investigate the relationship between nuclear energy consumption and economic growth in Switzerland over the period 1970–2018. We use data on capital, labour, and exports within a multivariate framework. Starting from the consideration that Switzerland has decided to phase out nuclear energy by 2034, we examine the effect of this structural economic-energy change in the country. To do so, two distinct estimation tools are employed. The first model, using a time-series approach, analyzes the relationship through bivariate and multivariate causality tests. The second, using a machine learning methodology, tests the results of the econometric modelling through an artificial neural network process. This empirical procedure represents our original contribution with respect to previous energy-GDP papers. The results, in the logarithmic propagation of neural networks, suggest a careful analysis of the process that will lead to the abandonment of nuclear energy in Switzerland, to avoid adverse effects on economic growth.
    Keywords: nuclear energy consumption,GDP,employment,capital stock,time-series,artificial neural networks
    Date: 2020–09–01
    URL: http://d.repec.org/n?u=RePEc:hal:journl:halshs-02951860&r=all
  6. By: Thomas Dierckx; Jesse Davis; Wim Schoutens
    Abstract: Using machine learning and alternative data for the prediction of financial markets has been a popular topic in recent years. Many financial variables, such as stock price, historical volatility and trade volume, have already been through extensive investigation. Remarkably, we found no existing research on the prediction of an asset's market implied volatility within this context. This forward-looking measure gauges the sentiment on the future volatility of an asset, and is deemed one of the most important parameters in the world of derivatives. The ability to predict this statistic may therefore provide a competitive edge to practitioners of market making and asset management alike. Consequently, in this paper we investigate Google News statistics and Wikipedia site traffic as alternative data sources to quantitative market data, and consider Logistic Regression, Support Vector Machines and AdaBoost as machine learning models. We show that movements in market implied volatility can indeed be predicted with the help of machine learning techniques. Although the employed alternative data appear not to enhance predictive accuracy, we reveal preliminary evidence of non-linear relationships between features obtained from Wikipedia page traffic and movements in market implied volatility.
    Date: 2020–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2009.07947&r=all
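    Code sketch: comparing the three named classifiers on a synthetic stand-in for the market and alternative-data features; the feature construction and target definition are assumed for illustration.
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Stand-in features: market data plus news/Wikipedia counts; target: implied vol up vs. down.
X, y = make_classification(n_samples=500, n_features=12, random_state=0)

for name, clf in [("logit", LogisticRegression(max_iter=1000)),
                  ("svm", SVC()),
                  ("adaboost", AdaBoostClassifier())]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```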
  7. By: Anastasios Petropoulos; Vassilis Siakoulis; Konstantinos P. Panousis; Theodoros Christophides; Sotirios Chatzis
    Abstract: In the aftermath of the financial crisis, supervisory authorities have considerably improved their approaches to financial stress testing. However, they have received significant criticism from market participants due to the methodological assumptions and simplifications employed, which are considered not to reflect real conditions accurately. First and foremost, current stress testing methodologies attempt to simulate the risks underlying a financial institution's balance sheet by using several satellite models, making their integration a challenging task with significant estimation errors. Secondly, they still suffer from not employing advanced statistical techniques, like machine learning, which better capture the nonlinear nature of adverse shocks. Finally, the static balance sheet assumption that is often employed implies that the management of a bank passively monitors the realization of the adverse scenario but does nothing to mitigate its impact. To address this criticism, we introduce in this study a novel deep learning approach for dynamic balance sheet stress testing. Experimental results give strong evidence that deep learning applied to big financial/supervisory datasets creates a state-of-the-art paradigm, capable of simulating real-world scenarios more efficiently.
    Date: 2020–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2009.11075&r=all
  8. By: Guido de Blasio (Structural Economic Analysis Directorate, Bank of Italy); Alessio D'Ignazio (Structural Economic Analysis Directorate, Bank of Italy); Marco Letta
    Abstract: Using police archives, we apply machine learning algorithms to predict corruption crimes in Italian municipalities during the period 2012-2014. We correctly identify over 70% (slightly less than 80%) of the municipalities that will experience corruption episodes (an increase in corruption crimes). We show that algorithmic predictions could strengthen the ability of Italy's 2012 anti-corruption law to fight white-collar crime.
    Keywords: crime prediction, white-collar crimes, machine learning, classification trees, policy targeting
    JEL: C52 D73 H70 K10
    Date: 2020–09
    URL: http://d.repec.org/n?u=RePEc:saq:wpaper:16/20&r=all
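    Code sketch: a classification-tree setup for a rare-outcome prediction task of this kind, on synthetic data; the tree depth, class weighting, and imbalance ratio are assumptions.
```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

# Stand-in for municipality features; positives = corruption episode observed.
X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)

tree = DecisionTreeClassifier(max_depth=5, class_weight="balanced")
pred = cross_val_predict(tree, X, y, cv=5)
print("share of corruption cases identified:", recall_score(y, pred))
```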
  9. By: Steven J. Davis; Stephen Hansen; Cristhian Seminario-Amez
    Abstract: Firm-level stock returns differ enormously in reaction to COVID-19 news. We characterize these reactions using the Risk Factors discussions in pre-pandemic 10-K filings and two text-analytic approaches: expert-curated dictionaries and supervised machine learning (ML). Bad COVID-19 news lowers returns for firms with high exposures to travel, traditional retail, aircraft production and energy supply — directly and via downstream demand linkages — and raises them for firms with high exposures to healthcare policy, e-commerce, web services, drug trials and materials that feed into supply chains for semiconductors, cloud computing and telecommunications. Monetary and fiscal policy responses to the pandemic strongly impact firm-level returns as well, but differently than pandemic news. Despite methodological differences, dictionary and ML approaches yield remarkably congruent return predictions. Importantly though, ML operates on a vastly larger feature space, yielding richer characterizations of risk exposures and outperforming the dictionary approach in goodness-of-fit. By integrating elements of both approaches, we uncover new risk factors and sharpen our explanations for firm-level returns. To illustrate the broader utility of our methods, we also apply them to explain firm-level returns in reaction to the March 2020 Super Tuesday election results.
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:ces:ceswps:_8594&r=all
  10. By: Olivier Darne (LEMNA - Laboratoire d'économie et de management de Nantes Atlantique - IUML - FR 3473 Institut universitaire Mer et Littoral - UBS - Université de Bretagne Sud - UM - Le Mans Université - UA - Université d'Angers - CNRS - Centre National de la Recherche Scientifique - IFREMER - Institut Français de Recherche pour l'Exploitation de la Mer - UN - Université de Nantes - ECN - École Centrale de Nantes - IEMN-IAE Nantes - Institut d'Économie et de Management de Nantes - Institut d'Administration des Entreprises - Nantes - UN - Université de Nantes); Amelie Charles (Audencia Business School)
    Abstract: In this paper, we propose bridge models to nowcast the French gross domestic product (GDP) quarterly growth rate. The bridge models, which allow economic interpretation, are specified using a machine learning approach via Lasso-based regressions and an econometric approach based on an automatic general-to-specific procedure. These approaches allow us to select explanatory variables from a large set of soft data. A recursive forecast study is carried out to assess forecasting performance. It turns out that the bridge models constructed using both variable-selection approaches outperform benchmark models and perform similarly in the out-of-sample forecasting exercise. Finally, combining the forecasts of the two approaches also delivers interesting forecasting performance.
    Date: 2020–09
    URL: http://d.repec.org/n?u=RePEc:hal:journl:hal-02948802&r=all
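    Code sketch: a Lasso-based bridge-equation nowcast with a recursive out-of-sample loop, on synthetic quarterly data; the data layout and dimensions are assumptions.
```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
T, K = 80, 40                                 # 80 quarters, 40 monthly survey indicators
surveys = rng.normal(size=(T, K))             # soft data, aggregated to quarterly frequency
gdp = surveys[:, :4] @ np.array([0.5, 0.3, -0.2, 0.1]) + rng.normal(0, 0.3, T)

# Lasso picks a handful of surveys for the bridge equation; recursive nowcast loop.
nowcasts = []
for t in range(60, T):
    model = LassoCV(cv=5).fit(surveys[:t], gdp[:t])
    nowcasts.append(model.predict(surveys[t:t + 1])[0])
```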
  11. By: Albarrán, Irene; Molina, José Manuel; Gijón, Covadonga
    Abstract: This paper analyses individuals' perception of AI in Spain and the factors associated with it. Data on 6,308 individuals from a Spanish survey (CIS, 2018) are used. The data include several measures of perception, innovation, place of residence (autonomous region and province), gender, age, educational level, and other socioeconomic and technical variables. A binary logit regression model is formulated and estimated for the attitude towards robots and artificial intelligence and its possible determinants. The results indicate that people have a negative attitude if they are not interested in scientific discoveries and technological developments, and if AI and robots are not helpful at work.
    Keywords: perception,innovation,artificial intelligence,survey data,binary logit
    JEL: C21 C25 D12 D83 L63 L86 L96 P36
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:zbw:itso20:224843&r=all
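    Code sketch: a binary logit of attitude towards AI on survey covariates via statsmodels; the variable names are hypothetical stand-ins for the CIS survey items, and the data are simulated.
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Stand-in for the CIS microdata; outcome: 1 = positive attitude towards AI/robots.
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "interest_in_science": rng.integers(0, 2, 500),
    "age": rng.integers(18, 85, 500),
    "female": rng.integers(0, 2, 500),
})
y = (0.8 * df["interest_in_science"] - 0.01 * df["age"]
     + rng.logistic(size=500) > 0).astype(int)

logit = sm.Logit(y, sm.add_constant(df)).fit()
print(logit.summary())
```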
  12. By: Junfeng Hu; Xiaosa Li; Yuru Xu; Shaowu Wu; Bin Zheng
    Abstract: In this paper, company investment value evaluation models are established based on comprehensive company information. After data mining and extraction of a set of 436 feature parameters, an optimal subset of features is obtained by dimension reduction through tree-based feature selection, followed by 5-fold cross-validation using XGBoost and LightGBM models. The results show that the Root-Mean-Square Error (RMSE) reached 3.098 and 3.059, respectively. In order to further improve stability and generalization capability, Bayesian Ridge Regression has been used to train a stacking model based on the XGBoost and LightGBM models, improving the RMSE to 3.047. Finally, the importance of different features to the LightGBM model is analysed.
    Date: 2020–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.01996&r=all
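    Code sketch: stacking XGBoost and LightGBM under a Bayesian Ridge meta-learner with 5-fold cross-validation, as the abstract describes, here on synthetic data rather than the paper's 436 features.
```python
from lightgbm import LGBMRegressor
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

X, y = make_regression(n_samples=1000, n_features=50, noise=3.0, random_state=0)

stack = StackingRegressor(
    estimators=[("xgb", XGBRegressor()), ("lgbm", LGBMRegressor())],
    final_estimator=BayesianRidge(),   # Bayesian Ridge combines the two base models
    cv=5,
)
rmse = -cross_val_score(stack, X, y, cv=5,
                        scoring="neg_root_mean_squared_error").mean()
print("stacked RMSE:", rmse)
```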
  13. By: Andreas Gulyas; Krzysztof Pytka
    Abstract: Using the universe of Austrian unemployment insurance records until May 2020, we document that the composition of UI claimants during the Covid-19 outbreak is substantially different compared to past times. Using a machine-learning algorithm from Gulyas and Pytka (2020), we identify individual earnings losses conditional on worker and job characteristics. Covid-19-related job terminations are associated with lower losses in earnings and wages compared to the Great Recession, but similar employment losses. We further derive an accurate but simple policy rule targeting individuals vulnerable to long-term wage losses.
    Keywords: Covid-19 , Job displacement, Earnings losses, Causal machine learning
    Date: 2020–09
    URL: http://d.repec.org/n?u=RePEc:bon:boncrc:crctr224_2020_212&r=all
  14. By: Pedro M. Gardete; Carlos D. Santos
    Abstract: Recommendation systems are essential ingredients in producing matches between products and buyers. Despite their ubiquity, they face two important challenges. First, they are data-intensive, a feature that precludes sophisticated recommendations by some types of sellers, including those selling durable goods. Second, they often focus on estimating fixed evaluations of products by consumers while ignoring state-dependent behaviors identified in the Marketing literature. We propose a recommendation system based on consumer browsing behaviors, which bypasses the "cold start" problem described above, and takes into account the fact that consumers act as "moving targets," behaving differently depending on the recommendations suggested to them along their search journey. First, we recover the consumers' search policy function via machine learning methods. Second, we include that policy into the recommendation system's dynamic problem via a Bellman equation framework. When compared with the seller's own recommendations, our system produces a profit increase of 33%. Our counterfactual analyses indicate that browsing history along with past recommendations feature strong complementary effects in value creation. Moreover, managing customer churn effectively is a big part of value creation, whereas recommending alternatives in a forward-looking way produces moderate effects.
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.03455&r=all
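    Code sketch: a toy Bellman recursion of the kind the system builds on, with states standing in for coarse browsing histories and actions for recommendations; the transition and reward matrices are random placeholders, not the paper's estimated policy function.
```python
import numpy as np

# Toy Bellman recursion: states = coarse browsing histories, actions = recommendations.
n_states, n_actions, beta = 20, 5, 0.95
rng = np.random.default_rng(3)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # consumer response
R = rng.uniform(0, 1, (n_states, n_actions))                      # expected profit

V = np.zeros(n_states)
for _ in range(500):
    Q = R + beta * P @ V        # value of each recommendation in each state
    V = Q.max(axis=1)
policy = Q.argmax(axis=1)       # profit-maximising recommendation per state
```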
  15. By: Grzegorz Krochmal
    Abstract: This work is dedicated to finding the determinants of voting behavior in Poland at the poviat level. The 2019 parliamentary election is analyzed and an attempt is made to explain the vote share of the winning party (Law and Justice). Sentiment analysis of tweets in Polish (original) and English (machine translations), collected in the period around the election, is applied. Among the multiple machine learning approaches tested, the best classification accuracy is achieved by the Huggingface BERT model on machine-translated tweets. OLS regression, with the sentiment of tweets and selected socio-economic features as independent variables, is used to explain the Law and Justice vote share in poviats. The sentiment of tweets is found to be a significant predictor, as stipulated by the literature.
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.03493&r=all
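    Code sketch: the two stages in miniature, scoring tweets with an off-the-shelf transformers sentiment pipeline (the paper's exact BERT variant is not specified here) and regressing vote share on poviat-level sentiment; the regression data are synthetic.
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from transformers import pipeline

# Stage 1: score machine-translated tweets with an off-the-shelf sentiment model.
clf = pipeline("sentiment-analysis")
res = clf(["The government delivered on its promises.",
           "Another scandal, nothing ever changes."])
print(res)  # e.g. [{'label': 'POSITIVE', ...}, {'label': 'NEGATIVE', ...}]

# Stage 2: OLS of vote share on poviat-level average sentiment plus controls.
rng = np.random.default_rng(4)
df = pd.DataFrame({"sentiment": rng.uniform(-1, 1, 380),      # one row per poviat
                   "unemployment": rng.uniform(0.02, 0.15, 380)})
vote_share = 0.4 + 0.1 * df["sentiment"] + rng.normal(0, 0.05, 380)
ols = sm.OLS(vote_share, sm.add_constant(df)).fit()
print(ols.summary())
```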
  16. By: Xing Wang; Yijun Wang; Bin Weng; Aleksandr Vinel
    Abstract: We propose a global hybrid deep learning framework to predict daily prices in the stock market. Using representation learning, we derive an embedding called Stock2Vec, which gives us insight into the relationships among different stocks, while temporal convolutional layers automatically capture effective temporal patterns both within and across series. Evaluated on the S&P 500, our hybrid framework integrates both advantages and achieves better performance on the stock price prediction task than several popular benchmark models.
    Date: 2020–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.01197&r=all
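    Code sketch: a minimal PyTorch module combining a learned stock embedding (the Stock2Vec idea) with dilated causal convolutions (the TCN part); layer sizes and the fusion by concatenation are my assumptions, not the paper's architecture.
```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    def __init__(self, c_in, c_out, k, dilation):
        super().__init__()
        self.pad = (k - 1) * dilation
        self.conv = nn.Conv1d(c_in, c_out, k, dilation=dilation, padding=self.pad)
    def forward(self, x):
        return self.conv(x)[:, :, :-self.pad]   # drop look-ahead padding -> causal

class Stock2VecTCN(nn.Module):
    def __init__(self, n_stocks, emb_dim=8, n_feats=5, hidden=32):
        super().__init__()
        self.emb = nn.Embedding(n_stocks, emb_dim)   # learned per-stock representation
        self.tcn = nn.Sequential(
            CausalConv1d(n_feats, hidden, 3, dilation=1), nn.ReLU(),
            CausalConv1d(hidden, hidden, 3, dilation=2), nn.ReLU(),
        )
        self.head = nn.Linear(hidden + emb_dim, 1)
    def forward(self, series, stock_id):
        # series: (batch, n_feats, T); stock_id: (batch,)
        h = self.tcn(series)[:, :, -1]               # temporal features at last step
        e = self.emb(stock_id)                       # Stock2Vec embedding
        return self.head(torch.cat([h, e], dim=1))

net = Stock2VecTCN(n_stocks=500)
out = net(torch.randn(8, 5, 60), torch.randint(0, 500, (8,)))  # (8, 1) price predictions
```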
  17. By: Hamed Vaheb
    Abstract: This thesis serves three primary purposes. The first is to forecast two stocks, Goldman Sachs (GS) and General Electric (GE). To forecast stock prices, we use a long short-term memory (LSTM) model whose inputs include the prices of two other stocks that are closely correlated with GS; other models, such as ARIMA, serve as benchmarks. Empirical results highlight the practical challenges of using LSTM for forecasting stocks; one of the main difficulties is a recurring lag which we call the "forecasting lag". The second purpose is to develop a more general and objective perspective on time series forecasting, so that it can be applied to assist in an arbitrary forecasting task using ANNs. To this end, we distinguish previous works by certain criteria and summarise the effective information they contain. The summarised information is then unified and expressed through a common terminology that can be applied to the different steps of a time series forecasting task. The final purpose of this thesis is to elaborate on a mathematical framework on which ANNs are based, namely the one introduced in the book "Neural Networks in a Mathematical Framework" by Anthony L. Caterini, in which the structure of a generic neural network is presented and the gradient descent algorithm (which incorporates backpropagation) is expressed in terms of that framework. Finally, we apply this framework to the specific architecture on which our implementations are based, recurrent neural networks. The book proves its theorems mostly for the classification case; we instead prove the corresponding theorems for the regression case, which matches our problem.
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.06417&r=all
  18. By: Bruno Spilak; Wolfgang Karl H\"ardle
    Abstract: Tail risk protection is in the focus of the financial industry and requires solid mathematical and statistical tools, especially when a trading strategy is derived. The recent hype around machine learning (ML) has raised the need to display and understand the functionality of ML tools. In this paper, we present a dynamic tail risk protection strategy that targets a maximum predefined level of risk measured by Value-at-Risk, while controlling for participation in bull market regimes. We propose different weak classifiers, parametric and non-parametric, that estimate the exceedance probability of the risk level, from which we derive trading signals in order to hedge tail events. We then compare the different approaches in terms of both statistical and trading-strategy performance. Finally, we propose an ensemble classifier that produces a meta tail risk protection strategy, improving both generalization and trading performance.
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.03315&r=all
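    Code sketch: weak classifiers for VaR-exceedance probability combined into a soft-voting ensemble, with a simple de-risking rule; the features, probability threshold, and classifier choices are illustrative assumptions.
```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
returns = rng.standard_t(df=4, size=2000) * 0.01      # fat-tailed daily returns
var_level = np.quantile(returns, 0.05)                # fixed 5% VaR threshold

# Weak classifiers predict next-day exceedance from simple lagged features.
feats = np.column_stack([np.abs(returns[:-1]), returns[:-1] ** 2])
label = (returns[1:] < var_level).astype(int)

ensemble = VotingClassifier(
    [("logit", LogisticRegression()), ("gb", GradientBoostingClassifier())],
    voting="soft",
).fit(feats[:1500], label[:1500])

# Trading signal: de-risk whenever the predicted exceedance probability is high.
p = ensemble.predict_proba(feats[1500:])[:, 1]
exposure = np.where(p > 0.1, 0.0, 1.0)
```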
  19. By: Oskar KOWALEWSKI (IESEG School of Management & LEM-CNRS 9221); Paweł PISANY (Institute of Economics, Polish Academy of Sciences)
    Abstract: This study investigates the determinants of fintech company creation and activity using a cross-country sample that includes developed and developing countries. Using a random effect negative binomial model and explainable machine learning algorithms, we show the positive role of technology advancements in each economy, quality of research, and more importantly, the level of university-industry collaboration. Additionally, we find that demographic factors may play a role in fintech creation and activity. Some fintech companies may find the quality and stringency of regulation to be an obstacle. Our results also show the sophisticated interactions between the banking sector and fintech companies that we may describe as a mix of cooperation and competition.
    Keywords: fintech, innovation, start up, developed countries, developing countries
    JEL: G21 G23 L26 O30
    Date: 2020–07
    URL: http://d.repec.org/n?u=RePEc:ies:wpaper:f202007&r=all
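    Code sketch: a pooled negative binomial for fintech-creation counts via statsmodels, as a simplification of the paper's random-effects specification; the regressors and data are hypothetical.
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Stand-in country-year panel; outcome = count of fintech companies created.
rng = np.random.default_rng(6)
df = pd.DataFrame({
    "tech_readiness": rng.uniform(0, 1, 200),
    "univ_industry_collab": rng.uniform(0, 1, 200),
})
fintechs = rng.poisson(np.exp(1 + 2 * df["tech_readiness"]))

# Pooled negative binomial as a simplification of the random-effects version.
nb = sm.NegativeBinomial(fintechs, sm.add_constant(df)).fit()
print(nb.summary())
```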
  20. By: Susanto, Stefanny Magdalena
    Abstract: Good firm performance can be assessed through the various approaches and theories that firms apply, one of which is the use of strategic collaboration to reduce costs and increase productivity through technological capabilities, knowledge, and resources (Moaniba, Su, & Lee, 2020).
    Date: 2020–09–29
    URL: http://d.repec.org/n?u=RePEc:osf:osfxxx:sjdpf&r=all
  21. By: Artur Sokolovsky; Luca Arnaboldi
    Abstract: The study introduces an automated trading system for S&P500 E-mini futures (ES) based on state-of-the-art machine learning. Concretely, we extract a set of scenarios from the tick market data to train the model and then use the predictions to model trading. We define the scenarios from the local extrema of the price action. Price extrema are a commonly traded pattern; however, to the best of our knowledge, there is no study presenting a pipeline for automated classification and profitability evaluation. Our study fills this gap with a broad evaluation of the approach, which yields an average Sharpe ratio of 6.32. However, we do not take into account order execution queues, which of course affect the result in a live-trading setting. The obtained performance results give us confidence that this approach is worthwhile.
    Date: 2020–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2009.09993&r=all
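    Code sketch: labelling local price extrema as training scenarios with scipy, on a synthetic price path; the comparison window is an assumed parameter.
```python
import numpy as np
from scipy.signal import argrelextrema

rng = np.random.default_rng(7)
price = np.cumsum(rng.normal(size=5000)) + 3000.0     # stand-in for ES tick mid-prices

# Label local extrema of the price action; these become the training scenarios.
order = 50                                            # comparison window, an assumption
minima = argrelextrema(price, np.less_equal, order=order)[0]
maxima = argrelextrema(price, np.greater_equal, order=order)[0]
labels = np.zeros(len(price), dtype=int)
labels[minima], labels[maxima] = 1, 2                 # 0 = neither, 1 = trough, 2 = peak
```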
  22. By: Andrew Clark (Department of Economics, University of Reading)
    Abstract: A longitudinal (1844-1965) study of the Pound-Krona exchange rate is conducted utilizing London Times news sentiment, gold prices, GDP, and other relevant metrics to create a dynamic systems state-based model that predicts the yearly Pound-Krona exchange rate. The resulting model slightly outperforms a naive random walk forecast.
    Keywords: Econometrics, Machine Learning, Dynamic Systems, Complex Systems
    JEL: C32 C53 C63 E17 F31
    Date: 2020–10–09
    URL: http://d.repec.org/n?u=RePEc:rdg:emxxdp:em-dp2020-22&r=all
  23. By: Bernhardt, Lea (Helmut Schmidt University, Hamburg)
    Abstract: In this paper, we analyse the final decisions on merger cases prepared by the European Commission (EC) since 1990 and build a unique subsample of all non-cleared cases. These comprise all merger notifications that were either withdrawn by the notifying parties or prohibited by the European Commission. We find a sudden decline in prohibitions and withdrawals since 2002 and explore three judicial defeats of the European Commission as determining factors behind these developments. We also find a higher likelihood of withdrawal or prohibition for cases registered in sectors comprising firms in the business of information and communication or transportation and storage. When classifying the documents with a supervised machine learning algorithm, we are able to automatically identify the cleared versus the non-cleared cases with over 90% accuracy. Finally, we find that network effects, high market shares and the risk of collusion are the main competitive concerns contributing to prohibition decisions in the information and communications sector.
    Keywords: mergers; competition policy; EU Commission; classification; network effects
    JEL: G34 K21 L40
    Date: 2020–10–08
    URL: http://d.repec.org/n?u=RePEc:ris:vhsuwp:2020_184&r=all
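    Code sketch: a TF-IDF plus logistic regression pipeline for classifying cleared versus non-cleared decision documents; the snippets and labels are invented placeholders, and the paper's actual algorithm may differ.
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Stand-in decision documents; labels: 1 = non-cleared (withdrawn/prohibited), 0 = cleared.
docs = ["the merger raises serious doubts regarding network effects",
        "the transaction does not significantly impede effective competition",
        "high combined market shares and risk of coordinated effects",
        "the commission clears the concentration unconditionally"] * 50
labels = [1, 0, 1, 0] * 50

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
print("CV accuracy:", cross_val_score(clf, docs, labels, cv=5).mean())
```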
  24. By: Qi Zhao
    Abstract: This paper presents a deep learning framework based on a Long Short-Term Memory network (LSTM) that predicts the price movement of cryptocurrencies from trade-by-trade data. The main focus of this study is on predicting short-term price changes over a fixed time horizon from a look-back period. By carefully designing features and searching in detail for the best hyper-parameters, the model is trained to achieve high performance on nearly a year of trade-by-trade data. The optimal model delivers stable, high performance (over 60% accuracy) on out-of-sample test periods. In a realistic trading simulation setting, the predictions made by the model can easily be monetized. Moreover, this study shows that the LSTM model can extract universal features from trade-by-trade data, as the learned parameters maintain their high performance on other cryptocurrency instruments that were not included in the training data. This study exceeds existing research in terms of the scale and precision of the data used, as well as the prediction accuracy achieved.
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.07404&r=all
  25. By: Joshua Wilde (Max Planck Institute for Demographic Research, Rostock, Germany); Wei Chen (Max Planck Institute for Demographic Research, Rostock, Germany); Sophie Lohmann (Max Planck Institute for Demographic Research, Rostock, Germany)
    Abstract: We use data from Google Trends to predict the effect of the COVID-19 pandemic on future births in the United States. First, we show that periods of above-normal search volume for Google keywords relating to conception and pregnancy in US states are associated with higher numbers of births in the following months. Excess searches for unemployment keywords have the opposite effect. Second, by employing simple statistical learning techniques, we demonstrate that including information on keyword search volumes in prediction models significantly improves forecast accuracy over a number of cross-validation criteria. Third, we use data on Google searches during the COVID-19 pandemic to predict changes in aggregate fertility rates in the United States at the state level through February 2021. Our analysis suggests that between November 2020 and February 2021, monthly US births will drop sharply by approximately 15%. For context, this would be a 50% larger decline than that following the Great Recession of 2008-2009, and similar in magnitude to the declines following the Spanish Flu pandemic of 1918-1919 and the Great Depression. Finally, we find heterogeneous effects of the COVID-19 pandemic across different types of mothers. Women with less than a college education, as well as Black or African American women, are predicted to have larger declines in fertility due to COVID-19. This finding is consistent with elevated caseloads of COVID-19 in low-income and minority neighborhoods, as well as with evidence suggesting larger economic impacts of the crisis among such households.
    Keywords: fertility
    JEL: J1 Z0
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:dem:wpaper:wp-2020-034&r=all
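    Code sketch: pulling state-level search interest with the (unofficial) pytrends library and lagging it against births; the keyword, state, and lag length are assumptions, and the birth series must be supplied separately.
```python
import pandas as pd
import statsmodels.api as sm
from pytrends.request import TrendReq

# Pull state-level search interest for a conception-related keyword.
trends = TrendReq(hl="en-US")
trends.build_payload(["pregnancy test"], geo="US-TX",
                     timeframe="2017-01-01 2020-09-30")
searches = trends.interest_over_time()["pregnancy test"].resample("M").mean()

# Relate search volume to births roughly nine months later.
df = pd.DataFrame({"searches": searches})
df["searches_lag9"] = df["searches"].shift(9)   # conception searches lead births
# With a monthly state birth series `births` aligned to df.index:
# ols = sm.OLS(births, sm.add_constant(df["searches_lag9"])).fit()
```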
  26. By: Lukasz Grzybowski (Telecom ParisTech, Department of Economics and Social Sciences, 46 rue Barrault, 75013 Paris, France); Frank Verboven (University of Leuven and CEPR (London), Naamsestraat 69, 3000 Leuven, Belgium)
    Abstract: We conduct incentive-compatible choice experiments to measure the economic value of social media data. We focus on the value that users place on their personal data related to the three biggest social media platforms: Facebook, Instagram and Twitter. We find that the median Willingness to Accept (WTA) of users for the entire "stock" of data is $300 each for Facebook and Instagram, 20% higher than the $250 valuation for Twitter. We use data valuations from a recent data breach settlement involving Facebook to provide information interventions. We find that reducing these information frictions about the value of data can make users revise their valuations upwards; they do not reduce their valuation if the amount is less than their initial WTA. Finally, a framing that makes users think about giving their data up immediately makes them value it much higher. We do not find a significant impact of a potential 'present bias' in users' decisions, especially relative to the immediate data sharing intervention.
    Keywords: value of data; choice experiments, privacy
    JEL: O30
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:net:wpaper:2013&r=all
  27. By: Kluge, Jan (Institute for Advanced Studies, Vienna, Austria); Lappoehn, Sarah (Institute for Advanced Studies, Vienna, Austria); Plank, Kerstin (Institute for Advanced Studies, Vienna, Austria)
    Abstract: This paper aims at identifying relevant indicators of TFP growth in EU countries during the recovery phase following the 2008/09 economic crisis. We proceed in three steps: First, we estimate TFP growth by means of Stochastic Frontier Analysis (SFA). Second, we decompose TFP growth to obtain measures of changes in technical progress (CTP), technical efficiency (CTE), scale efficiency (CSC) and allocative efficiency (CAE). Third, we use BART – a non-parametric Bayesian technique from the realm of statistical learning – to identify relevant predictors of TFP and its components from the Global Competitiveness Reports. We find that only a few indicators prove to be stable predictors. In particular, indicators that characterize technological readiness, such as broadband internet access, are outstandingly important for pushing technical progress, while indicators of innovation seem to speed up CTP only in higher-income economies. The results presented in this paper can serve as guidelines for policymakers, as they identify areas in which further action could be taken to increase economic growth. Concerning the bigger picture, it becomes obvious that advanced machine learning techniques may not be able to replace sound economic theory, but they help separate the wheat from the chaff when it comes to selecting the most relevant indicators of economic competitiveness.
    Keywords: Competitiveness, TFP growth, Stochastic Frontier Analysis, BART
    JEL: C23 E24 O47
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:ihs:ihswps:24&r=all

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.