nep-big New Economics Papers
on Big Data
Issue of 2019‒03‒04
thirteen papers chosen by
Tom Coupé
University of Canterbury

  1. Data Science for Entrepreneurship Research : Studying Demand Dynamics for Entrepreneurial Skills in the Netherlands By Prüfer, Jens; Prüfer, Patricia
  2. Financial series prediction using Attention LSTM By Sangyeon Kim; Myungjoo Kang
  3. Working Paper: Improved Stock Price Forecasting Algorithm based on Feature-weighed Support Vector Regression by using Grey Correlation Degree By Quanxi Wang
  4. Global Stock Market Prediction Based on Stock Chart Images Using Deep Q-Network By Jinho Lee; Raehyun Kim; Yookyung Koh; Jaewoo Kang
  6. Artificial intelligence, algorithmic pricing and collusion By Calvano, Emilio; Calzolari, Giacomo; Denicolò, Vincenzo; Pastorello, Sergio
  7. Competition and competition policy: at the junction of the future and past By Shastitko, Andrey (Шаститко, Андрей); Kurdin, Alexander (Курдин, Александр); Markova, Olga (Маркова, Ольга); Meleshkina, Anna (Мелешкина, Анна); Morosanova, Anastasia (Моросанова, Анастасия); Pavlova, Natalia (Павлова, Наталья); Shpakova, Anastasia (Шпакова, Анастасия)
  8. Risk and Rationality: The Relative Importance of Probability Weighting and Choice Set Dependence By Marius Brulhart; Olivier Cadot; Alexander Himbert
  9. Big Data and Firm Dynamics By Farboodi, Maryam; Mihet, Roxana; Philippon, Thomas; Veldkamp, Laura
  10. Measuring Rice Yield from Space: The Case of Thai Binh Province, Viet Nam By Guan, Kaiyu; Hien, Ngo The; Rao, Lakshman Nagraj
  11. Productivity Panics – Polemics and Realities By Auerbach, Paul
  12. Discovering Language of the Stocks By Marko Poženel; Dejan Lavbič
  13. Machine Learning Estimation of Heterogeneous Causal Effects: Empirical Monte Carlo Evidence By Knaus, Michael C.; Lechner, Michael; Strittmatter, Anthony

  1. By: Prüfer, Jens (Tilburg University, Center For Economic Research); Prüfer, Patricia (Tilburg University, Center For Economic Research)
    Abstract: The recent rise of big data and artificial intelligence (AI) is changing markets, politics, organizations, and societies. It also affects the domain of research. Supported by new statistical methods that rely on computational power and computer science (data science methods), we are now able to analyze data sets that can be huge, multidimensional, unstructured, and diversely sourced. In this paper, we describe the most prominent data science methods suitable for entrepreneurship research and provide links to literature and Internet resources for self-starters. We survey how data science methods have been applied in the entrepreneurship research literature. As a showcase of data science techniques, based on a dataset of 95% of all job vacancies in the Netherlands over a 6-year period with 7.7 million data points, we provide an original analysis of the demand dynamics for entrepreneurial skills in the Netherlands. We show which entrepreneurial skills are particularly important for which type of profession. Moreover, we find that demand for both entrepreneurial and digital skills has increased for managerial positions, but not for others. We also find that entrepreneurial skills were significantly more in demand than digital skills over the entire period 2012-2017 and that, for managers, the absolute importance of entrepreneurial skills has increased even more than that of digital skills, despite the impact of datafication on the labor market. We conclude that further study of entrepreneurial skills in the general population, outside the domain of entrepreneurs, is a rewarding subject for future research.
    Keywords: data science; machine learning; entrepreneurship; entrepreneurial skills; big data; artificial intelligence
    JEL: L26 C50 C87 O32
    Date: 2019
  2. By: Sangyeon Kim; Myungjoo Kang
    Abstract: Financial time series prediction, especially with machine learning techniques, is an extensive field of study. In recent times, deep learning methods have performed outstandingly on various industrial problems, especially in time series analysis, with better predictions than other machine learning methods. Moreover, many researchers have used deep learning methods to predict financial time series with various models in recent years. In this paper, we compare various deep learning models for financial time series prediction: multilayer perceptron (MLP), one-dimensional convolutional neural networks (1D CNN), stacked long short-term memory (stacked LSTM), attention networks, and weighted attention networks. In particular, attention LSTM is used not only for prediction but also for visualizing intermediate outputs to analyze the reasons behind a prediction; we therefore show an example of understanding the model's prediction intuitively through attention vectors. In addition, we focus on time and factors, which makes it easy to understand why certain trends are predicted when accessing a given time series table. We also modify the loss functions of the attention models with weighted categorical cross entropy; our proposed model produces a 0.76 hit ratio, which is superior to those of other methods for predicting the trends of the KOSPI 200.
    Date: 2019–02
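The two quantities named in this abstract, weighted categorical cross entropy and hit ratio, can be illustrated with a minimal sketch. The abstract does not give the authors' class weights or label definition, so the two-class setup (0 = down, 1 = up), the weights, and the toy probabilities below are all assumptions for illustration only.

```python
import math


def weighted_categorical_cross_entropy(probs, labels, class_weights):
    """Mean of -w[y] * log(p[y]) over all samples.

    probs: list of per-class probability lists (each sums to 1)
    labels: list of integer class indices
    class_weights: one non-negative weight per class (illustrative values)
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        total += -class_weights[y] * math.log(max(p[y], eps))
    return total / len(labels)


def hit_ratio(probs, labels):
    """Fraction of samples whose highest-probability class equals the label."""
    hits = sum(1 for p, y in zip(probs, labels) if p.index(max(p)) == y)
    return hits / len(labels)


# Toy data (assumed, not from the paper): four predictions over two classes.
probs = [[0.7, 0.3], [0.4, 0.6], [0.2, 0.8], [0.6, 0.4]]
labels = [0, 1, 1, 1]

# Weighting class 1 twice as heavily makes errors on "up" days cost more.
loss = weighted_categorical_cross_entropy(probs, labels, class_weights=[1.0, 2.0])
ratio = hit_ratio(probs, labels)  # three of the four predictions are correct
```

Setting all class weights to 1.0 recovers ordinary categorical cross entropy, which is one way to check the weighted version against a known baseline.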
  3. By: Quanxi Wang
    Abstract: With the widespread engineering application of artificial intelligence and big data decision-making, much previously tedious financial data processing and analysis has become more convenient and effective. This paper aims to improve the accuracy of stock price forecasting by improving the support vector regression algorithm with grey correlation analysis (GCA). The article first divides the factors affecting stock price movements into behavioral factors and technical factors. The behavioral factors mainly include weather indicators and emotional indicators; the technical factors mainly include the daily closing data and the HS 300 Index. Grey correlation analysis then measures the relationship between the stock price and its impact factors on each trading day, and this relationship is transformed into a feature weight for each impact factor. The impact-factor data for all trading days are weighted by these feature weights, and finally support vector regression (SVR) is applied. Forecasts from the revised stock trading data were compared with forecasts from the unmodified trading data on the evaluation indicators MSE, MAE, SCC, and DS, and the forecast results were found to be significantly improved.
    Date: 2019–02
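The step of turning grey correlation degrees into feature weights can be sketched as follows. The abstract does not say which variant of grey correlation analysis the author uses, so this sketch assumes the textbook grey relational degree with min-max normalization and a distinguishing coefficient of 0.5; the function names and the toy series are hypothetical, and the final SVR step is not shown.

```python
def min_max(series):
    """Scale a series to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


def grey_relational_degrees(reference, factors, rho=0.5):
    """Grey relational degree of each factor series with the reference series.

    rho is the distinguishing coefficient (0.5 is a common default, assumed here).
    """
    ref = min_max(reference)
    norm = [min_max(f) for f in factors]
    # Absolute differences between the reference and each factor, per period.
    deltas = [[abs(r - v) for r, v in zip(ref, f)] for f in norm]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:
        return [1.0] * len(factors)  # every factor matches the reference exactly
    degrees = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees.append(sum(coeffs) / len(coeffs))
    return degrees


def feature_weights(degrees):
    """Normalize the degrees so that the weights sum to one."""
    total = sum(degrees)
    return [d / total for d in degrees]


# Toy data (assumed): a price series and two candidate impact factors.
price = [10, 11, 12, 13]
factor_a = [1, 2, 3, 4]   # moves in step with the price
factor_b = [4, 1, 3, 2]   # moves erratically
degrees = grey_relational_degrees(price, [factor_a, factor_b])
weights = feature_weights(degrees)
```

In a feature-weighted regression, each factor column would then be multiplied by its weight before fitting, so that factors more closely related to the price carry more influence.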
  4. By: Jinho Lee; Raehyun Kim; Yookyung Koh; Jaewoo Kang
    Abstract: We applied a Deep Q-Network with a Convolutional Neural Network function approximator, which takes stock chart images as input, to make global stock market predictions. Our model not only yields profit in the stock market of the country where it was trained but generally yields profit in global stock markets. We trained our model only in the US market and tested it in 31 different countries over 12 years. The portfolios constructed based on our model's output generally yield about 0.1 to 1.0 percent return per transaction prior to transaction costs in 31 countries. The results show that some patterns in stock chart images tend to predict the same future stock price movements across global stock markets. Moreover, the results show that future stock prices can be predicted even if the training and testing procedures are done in different countries. Training could be done in relatively large and liquid markets (e.g., the USA) and the model tested in small markets. This result demonstrates that artificial-intelligence-based stock price forecasting models can be used in relatively small markets (emerging countries) even though they do not have a sufficient amount of data for training.
    Date: 2019–02
  5. By: Jacques Bughin; Tobias Kretschmer; Nicolas van Zeebroeck
    Abstract: With the increasing availability of digital technologies, many firms are planning to develop digitally-enabled business models. Digital technologies can give an impulse to realign strategies through two channels: Initial use of digital technologies may help firms spot their potential and encourage firms to develop digitally-supported business models, or emerging digital technologies may present a threat to firms, who then initiate a process of strategic renewal to relieve the pressure. We study how the adoption of new digital technologies is associated with changes to the strategy of the firm, and how both are shaped by a firm’s perception of the competitive stress created by new technological developments. Using two detailed survey-based datasets on firms’ expectations, adoption and strategy renewal for a wide range of AI and digital technologies, we find a strong positive association between the degree of strategy change and the adoption of advanced digital technologies. This relationship does not seem mediated by the level of competitive stress from digital technology, which is itself strongly associated with strategy change. Our results suggest a tight coupling between (technological) structure and strategy.
    Keywords: Digital transformation, Strategic Organization Design, Technology adoption, Strategic renewal, Digital strategy, Big data, Artificial Intelligence
    Date: 2019–01
  6. By: Calvano, Emilio; Calzolari, Giacomo; Denicolò, Vincenzo; Pastorello, Sergio
    Abstract: Increasingly, pricing algorithms are supplanting human decision making in real marketplaces. To inform the competition policy debate on the possible consequences of this development, we experiment with pricing algorithms powered by Artificial Intelligence (AI) in controlled environments (computer simulations), studying the interaction among a number of Q-learning algorithms in a workhorse oligopoly model of price competition with Logit demand and constant marginal costs. In this setting the algorithms consistently learn to charge supra-competitive prices, without communicating with one another. The high prices are sustained by classical collusive strategies with a finite phase of punishment followed by a gradual return to cooperation. This finding is robust to asymmetries in cost or demand and to changes in the number of players.
    Keywords: artificial intelligence; Collusion; Pricing-Algorithms; Q-Learning; Reinforcement Learning
    JEL: D43 D83 L13 L41
    Date: 2018–12
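The setting described in this abstract, Q-learning agents setting prices under logit demand with constant marginal cost, can be sketched in a few lines. This is an illustrative toy, not the authors' simulation: the demand parameters, price grid, learning rate, discount factor, exploration schedule, and the choice of last period's prices as the state are all assumptions, and the short run below is not meant to reproduce the paper's findings.

```python
import math
import random


def logit_demand(prices, a=2.0, a0=0.0, mu=0.25):
    """Logit market share of each firm; a0 is the quality of the outside good."""
    weights = [math.exp((a - p) / mu) for p in prices]
    denom = sum(weights) + math.exp(a0 / mu)
    return [w / denom for w in weights]


def profits(prices, cost=1.0):
    """Per-firm profit with constant marginal cost."""
    return [(p - cost) * q for p, q in zip(prices, logit_demand(prices))]


def q_update(q_old, reward, best_next, alpha=0.1, delta=0.95):
    """Standard Q-learning update for a single state-action cell."""
    return (1 - alpha) * q_old + alpha * (reward + delta * best_next)


def simulate(periods=2000, grid=(1.2, 1.4, 1.6, 1.8, 2.0), seed=0):
    """Two epsilon-greedy Q-learners; the state is last period's pair of prices."""
    rng = random.Random(seed)
    n = len(grid)
    # One Q-table per firm, indexed as Q[firm][state][action].
    Q = [{(i, j): [0.0] * n for i in range(n) for j in range(n)} for _ in range(2)]
    state = (0, 0)
    for t in range(periods):
        eps = math.exp(-0.005 * t)  # exploration probability decays over time
        actions = []
        for firm in range(2):
            if rng.random() < eps:
                actions.append(rng.randrange(n))       # explore
            else:
                row = Q[firm][state]
                actions.append(row.index(max(row)))    # exploit
        rewards = profits([grid[a] for a in actions])
        next_state = tuple(actions)
        for firm in range(2):
            cell = Q[firm][state]
            best_next = max(Q[firm][next_state])
            cell[actions[firm]] = q_update(cell[actions[firm]], rewards[firm], best_next)
        state = next_state
    return [grid[a] for a in state]


final_prices = simulate()
```

The agents never exchange messages here; each observes only past prices and its own profit, which is the sense in which any coordination that emerges in such a setup happens without communication.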
  7. By: Shastitko, Andrey (Шаститко, Андрей) (The Russian Presidential Academy of National Economy and Public Administration); Kurdin, Alexander (Курдин, Александр) (The Russian Presidential Academy of National Economy and Public Administration); Markova, Olga (Маркова, Ольга) (The Russian Presidential Academy of National Economy and Public Administration); Meleshkina, Anna (Мелешкина, Анна) (The Russian Presidential Academy of National Economy and Public Administration); Morosanova, Anastasia (Моросанова, Анастасия) (The Russian Presidential Academy of National Economy and Public Administration); Pavlova, Natalia (Павлова, Наталья) (The Russian Presidential Academy of National Economy and Public Administration); Shpakova, Anastasia (Шпакова, Анастасия) (The Russian Presidential Academy of National Economy and Public Administration)
    Abstract: The report reviews four current areas of competition policy, related both to new phenomena and to traditional antitrust topics. 1) Digital transformation leads to the emergence of new business strategies; however, new sources of risk of restricting competition arise: algorithmic pricing, big data, multilateral markets, platforms. 2) Imports of technology and political sanctions can reduce the effectiveness of the use of antitrust immunity for holders of exclusive rights to the results of intellectual activity (RID). A gradual transition to a new regime of antimonopoly policy in the field of RID circulation is needed. 3) The development of supranational antitrust promotes the use of the complementary capabilities of different countries' antimonopoly authorities. Barriers to supranational antitrust are the heterogeneity of the participating States and the lack of sustainability of supranational antitrust authorities. 4) The problem of bilateral monopoly does not lose relevance. High transaction costs and negative externalities from the parties' failure to reach an agreement are the basis for government intervention using the comparative advantages of the antimonopoly authority.
    Date: 2019–01
  8. By: Marius Brulhart; Olivier Cadot; Alexander Himbert
    Abstract: Does international trade help or hinder the economic development of border regions relative to interior regions? Theory tends to suggest that trade helps, but it can also predict the reverse. The question is policy relevant as regions near land borders are generally poorer, and sometimes more prone to civil conflict, than interior regions. We therefore estimate how changes in bilateral trade volumes affect economic activity along roads running inland from international borders, using satellite night-light measurements for 2,186 border-crossing roads in 138 countries. We observe a significant ‘border shadow’: on average, lights are 37 percent dimmer at the border than 200 kilometers inland. We find this difference to be reduced by trade expansion as measured by exports and instrumented with tariffs on the opposite side of the border. At the mean, a doubling of exports to a particular neighbor country reduces the gradient of light from the border by some 23 percent. This qualitative finding applies to developed and developing countries, and to rural and urban border regions. Proximity to cities on either side of the border amplifies the effects of trade. We provide evidence that local export-oriented production is a significant mechanism behind the observed effects.
    Keywords: Trade liberalization, border regions, economic geography, night lights data
    JEL: F14 F15 R11 R12
    Date: 2019–02
  9. By: Farboodi, Maryam; Mihet, Roxana; Philippon, Thomas; Veldkamp, Laura
    Abstract: We study a model where firms accumulate data as a valuable intangible asset. Data accumulation affects firms' dynamics. It increases the skewness of the firm size distribution as large firms generate more data and invest more in active experimentation. On the other hand, small data-savvy firms can overtake more traditional incumbents, provided they can finance their initial money-losing growth. Our model can be used to estimate the market and social value of data.
    Keywords: Big Data; firm size
    Date: 2019–01
  10. By: Guan, Kaiyu (University of Illinois); Hien, Ngo The (Ministry of Agriculture and Rural Development); Rao, Lakshman Nagraj (Asian Development Bank)
    Abstract: Despite a growing interest in using satellite data to estimate paddy rice yield in Southeast Asia, significant cloud coverage has led to a scarcity of usable optical data for such analysis. In this paper, we study the feasibility of using two alternative sources of satellite data to circumvent the cloud cover problem and estimate yield in Thai Binh Province, Viet Nam during the second growing season of 2015: (i) surface reflectance fusion data, which integrate Landsat and Moderate Resolution Imaging Spectroradiometer (MODIS) images, and (ii) L-band radar backscatter data from the Advanced Land Observing Satellite 2 (ALOS-2) PALSAR-2 sensors. Our findings indicate that although Landsat-MODIS fusion data are not necessarily beneficial for paddy rice mapping when compared with only using Landsat data, fusion data allow us to estimate the peak value of various vegetation indexes and derive the best empirical relationship between these indexes and yield data from the field. We also find that the L-band radar data not only perform worse in paddy rice mapping than optical data, but also contribute little to rice yield estimation.
    Keywords: agriculture; ALOS-2; crop cutting; crop yield; Fusion; Landsat; MODIS; paddy rice; remote sensing; Viet Nam
    JEL: C40 O13 Q18
    Date: 2018–03–23
  11. By: Auerbach, Paul (Kingston University London)
    Abstract: Widespread uneasiness has emerged concerning a perceived slowdown in productivity growth. The question posed here is whether our destiny is indeed tied to inexorable movements in productivity and innovation, whatever these things may be, or can we build a future contingent upon collective choices and guided by human needs and desires?
    Keywords: Artificial Intelligence; innovation; productivity; Schumpeter; technological change; total factor productivity
    JEL: O10 O30 O33 O40 O47
    Date: 2019–02–25
  12. By: Marko Poženel; Dejan Lavbič
    Abstract: Stock prediction has always been an attractive area for researchers and investors since the financial gains can be substantial. However, stock prediction can be a challenging task since stocks are influenced by a multitude of factors whose influence varies rapidly through time. This paper proposes a novel approach (Word2Vec) for stock trend prediction combining NLP and Japanese candlesticks. First, we create a simple language of Japanese candlesticks from the source OHLC data. Then, sentences of words are used to train the NLP Word2Vec model, where the training data classification also takes into account trading commissions. Finally, the model is used to predict trading actions. The proposed approach was compared to three trading models (Buy & Hold, MA and MACD) according to the yield achieved. We first evaluated Word2Vec on the shares of three companies, Apple, Microsoft and Coca-Cola, where it outperformed the comparative models. Next we evaluated Word2Vec on stocks from the Russell Top 50 Index, where our Word2Vec method was also very successful in the test phase and only fell behind the Buy & Hold method in the validation phase. Word2Vec achieved positive results in all scenarios, while the average yields of MA and MACD were still lower compared to Word2Vec.
    Date: 2019–02
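The first step in this abstract, creating a simple language of Japanese candlesticks from OHLC data, can be sketched as a mapping from each bar to a short token. The abstract does not describe the authors' vocabulary, so the three-letter encoding below (direction, body size, dominant shadow), the 0.3 body threshold, and the sliding-window sentences are hypothetical choices made only to illustrate the idea.

```python
def candle_word(o, h, l, c, small_body=0.3):
    """Encode one OHLC bar as a token (hypothetical vocabulary).

    Letter 1: U (close above open), D (close below open), N (unchanged)
    Letter 2: S (body under small_body of the high-low range), L (otherwise)
    Letter 3: T (upper shadow longer), B (lower shadow longer), E (equal)
    """
    bar_range = h - l
    if bar_range <= 0:
        return "FLAT"  # no price movement within the bar
    body = abs(c - o)
    direction = "U" if c > o else "D" if c < o else "N"
    size = "S" if body / bar_range < small_body else "L"
    upper = h - max(o, c)
    lower = min(o, c) - l
    shadow = "T" if upper > lower else "B" if lower > upper else "E"
    return direction + size + shadow


def sentences(bars, length=3):
    """Sliding windows of consecutive candle words, one 'sentence' per window."""
    words = [candle_word(*bar) for bar in bars]
    return [words[i:i + length] for i in range(len(words) - length + 1)]


# Toy bars (assumed): tuples of (open, high, low, close).
bars = [
    (10.0, 12.0, 9.5, 11.5),
    (11.5, 11.8, 10.0, 11.4),
    (11.4, 12.0, 11.0, 11.9),
    (11.9, 11.9, 11.9, 11.9),
]
words = [candle_word(*bar) for bar in bars]
sents = sentences(bars, length=3)
```

Token lists of this shape are what a word-embedding model expects as input, so `sents` could be passed to a Word2Vec implementation in place of natural-language sentences.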
  13. By: Knaus, Michael C.; Lechner, Michael; Strittmatter, Anthony
    Abstract: We investigate the finite sample performance of causal machine learning estimators for heterogeneous causal effects at different aggregation levels. We employ an Empirical Monte Carlo Study that relies on arguably realistic data generation processes (DGPs) based on actual data. We consider 24 different DGPs, eleven different causal machine learning estimators, and three aggregation levels of the estimated effects. In the main DGPs, we allow for selection into treatment based on a rich set of observable covariates. We provide evidence that the estimators can be categorized into three groups. The first group performs consistently well across all DGPs and aggregation levels. These estimators have multiple steps to account for the selection into the treatment and the outcome process. The second group shows competitive performance only for particular DGPs. The third group is clearly outperformed by the other estimators.
    Keywords: Causal Forest; Causal machine learning; conditional average treatment effects; Lasso; Random Forest; selection-on-observables
    JEL: C21
    Date: 2018–12

This nep-big issue is ©2019 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.