nep-big New Economics Papers
on Big Data
Issue of 2019‒01‒07
fifteen papers chosen by
Tom Coupé
University of Canterbury

  1. Predicting Distresses using Deep Learning of Text Segments in Annual Reports By Rastin Matin; Casper Hansen; Christian Hansen; Pia Mølgaard
  2. Nowcasting private consumption: traditional indicators, uncertainty measures, credit cards and some internet data By María Gil; Javier J. Pérez; A. Jesús Sánchez; Alberto Urtasun
  3. Leveraging Financial News for Stock Trend Prediction with Attention-Based Recurrent Neural Network By Huicheng Liu
  4. Artificial Intelligence Methods for Knowledge Management Systems By Begler, A.; Gavrilova, T.
  5. Bridging the Digital Divide: Making the Digital Economy Benefit to the Entire Society By Zhang, Bin; Jin, Zhiye; Peng, Zhidao
  6. A Big data analytical framework for portfolio optimization By Dhanya Jothimani; Ravi Shankar; Surendra S. Yadav
  7. Trade Selection with Supervised Learning and OCA By David Saltiel; Eric Benhamou
  8. Benchmarking Deep Sequential Models on Volatility Predictions for Financial Time Series By Qiang Zhang; Rui Luo; Yaodong Yang; Yuanyuan Liu
  9. The ETS challenges: a machine learning approach to the evaluation of simulated financial time series for improving generation processes By Javier Franco-Pedroso; Joaquin Gonzalez-Rodriguez; Maria Planas; Jorge Cubero; Rafael Cobo; Fernando Pablos
  10. Multitask Learning Deep Neural Network to Combine Revealed and Stated Preference Data By Shenhao Wang; Jinhua Zhao
  11. Retail forecasting: research and practice By Fildes, Robert; Ma, Shaohui; Kolassa, Stephan
  12. Portfolio Optimization for Cointelated Pairs: SDEs vs. Machine Learning By Babak Mahdavi-Damghani; Konul Mustafayeva; Stephen Roberts; Cristin Buescu
  13. Economics of Human-AI Ecosystem: Value Bias and Lost Utility in Multi-Dimensional Gaps By Daniel Muller
  14. Practical Deep Reinforcement Learning Approach for Stock Trading By Zhuoran Xiong; Xiao-Yang Liu; Shan Zhong; Hongyang Yang; Anwar Walid
  15. A method for measuring detailed demand for workers' competences By Pater, Robert; Szkola, Jaroslaw; Kozak, Marcin

  1. By: Rastin Matin; Casper Hansen; Christian Hansen; Pia Mølgaard
    Abstract: Corporate distress models typically only employ the numerical financial variables in the firms' annual reports. We develop a model that employs the unstructured textual data in the reports as well, namely the auditors' reports and managements' statements. Our model consists of a convolutional recurrent neural network which, when concatenated with the numerical financial variables, learns a descriptive representation of the text that is suited for corporate distress prediction. We find that the unstructured data provides a statistically significant enhancement of the distress prediction performance, in particular for large firms where accurate predictions are of the utmost importance. Furthermore, we find that auditors' reports are more informative than managements' statements and that a joint model including both managements' statements and auditors' reports displays no enhancement relative to a model including only auditors' reports. Our model demonstrates a direct improvement over existing state-of-the-art models.
    Date: 2018–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1811.05270&r=all
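    A minimal sketch, assuming a tf.keras stack, of the kind of convolutional recurrent text encoder fused with numerical financial variables that the abstract describes; all layer sizes, vocabulary sizes and variable names are illustrative, not the authors':
      import tensorflow as tf
      from tensorflow.keras import layers, Model

      VOCAB_SIZE, SEQ_LEN, N_NUMERIC = 20000, 500, 12    # illustrative sizes

      text_in = layers.Input(shape=(SEQ_LEN,), name="report_tokens")
      x = layers.Embedding(VOCAB_SIZE, 64)(text_in)      # token embeddings
      x = layers.Conv1D(64, 5, activation="relu")(x)     # local n-gram features
      x = layers.MaxPooling1D(2)(x)
      x = layers.GRU(32)(x)                              # sequence summary

      num_in = layers.Input(shape=(N_NUMERIC,), name="financial_ratios")
      h = layers.Concatenate()([x, num_in])              # fuse text and numbers
      h = layers.Dense(32, activation="relu")(h)
      out = layers.Dense(1, activation="sigmoid", name="distress_prob")(h)

      model = Model([text_in, num_in], out)
      model.compile(optimizer="adam", loss="binary_crossentropy",
                    metrics=["AUC"])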
  2. By: María Gil (Banco de España); Javier J. Pérez (Banco de España); A. Jesús Sánchez (Instituto Complutense de Estudios Internacionales (UCM) and GEN); Alberto Urtasun (Banco de España)
    Abstract: The focus of this paper is on nowcasting and forecasting quarterly private consumption. The selection of real-time, monthly indicators focuses on standard (“hard”/“soft”) and less-standard variables. Among the latter group we analyse: i) proxy indicators of economic and policy uncertainty; ii) payment card transactions, as measured at point-of-sale (POS) terminals, and ATM withdrawals; iii) indicators based on consumption-related search queries retrieved by means of the Google Trends application. We estimate a suite of mixed-frequency, time-series models at the monthly frequency on a real-time database with Spanish data, and conduct out-of-sample forecasting exercises to assess the relative merits of the different groups of indicators. Some results stand out: i) “hard” and payment card indicators are the best performers when taken individually, and more so when combined; ii) nonetheless, “soft” indicators are helpful for detecting qualitative signals over the nowcasting horizon; iii) Google-based and uncertainty indicators add value when combined with traditional indicators, most notably at estimation horizons beyond the nowcasting one, which would be consistent with their capturing information about future consumption decisions; iv) combinations of models that include the best-performing indicators tend to beat broader-based combinations.
    Keywords: private consumption, nowcasting, forecasting, uncertainty, Google Trends.
    JEL: E27 C32 C53
    Date: 2018–12
    URL: http://d.repec.org/n?u=RePEc:bde:wpaper:1842&r=all
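    As a concrete, heavily simplified illustration of the nowcasting setup, the bridge-equation sketch below regresses quarterly consumption growth on quarterly averages of monthly indicators such as card payments and a Google Trends index; the file and column names are hypothetical:
      import pandas as pd
      import statsmodels.api as sm

      # Hypothetical inputs: monthly indicators and quarterly consumption.
      monthly = pd.read_csv("indicators_monthly.csv", parse_dates=["date"],
                            index_col="date")
      xq = monthly[["pos_payments", "atm_withdrawals",
                    "gtrends_consumption"]].resample("Q").mean()
      cons = pd.read_csv("consumption_quarterly.csv", parse_dates=["date"],
                         index_col="date")["consumption_growth"]

      # Fit on the overlapping sample, then nowcast the latest quarter.
      fit = sm.OLS(cons, sm.add_constant(xq.loc[cons.index])).fit()
      nowcast = fit.predict(sm.add_constant(xq).iloc[[-1]])
      print(nowcast)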
  3. By: Huicheng Liu
    Abstract: Stock market prediction is one of the most attractive research topics, since successful prediction of the market's future movement leads to significant profit. Traditional short-term stock market predictions are usually based on the analysis of historical market data, such as stock prices, moving averages or daily returns. However, financial news also contains useful information on public companies and the market. Existing methods in the finance literature exploit sentiment signal features, which are limited in that they do not consider factors such as events and the news context. We address this issue by leveraging deep neural models to extract rich semantic features from news text. In particular, a bidirectional LSTM is used to encode the news text and capture the context information, and a self-attention mechanism is applied to distribute attention over the most relevant words, news items and days. In terms of predicting directional changes in both the Standard & Poor's 500 index and individual companies' stock prices, we show that this technique is competitive with other state-of-the-art approaches, demonstrating the effectiveness of recent NLP technology advances for computational finance.
    Date: 2018–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1811.06173&r=all
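    A compact PyTorch sketch of the bidirectional LSTM with additive self-attention over news tokens that the abstract describes; hyperparameters are placeholders, not the paper's:
      import torch
      import torch.nn as nn

      class NewsEncoder(nn.Module):
          def __init__(self, vocab=30000, emb=128, hid=64):
              super().__init__()
              self.emb = nn.Embedding(vocab, emb)
              self.lstm = nn.LSTM(emb, hid, batch_first=True,
                                  bidirectional=True)
              self.att = nn.Linear(2 * hid, 1)       # one score per token
              self.clf = nn.Linear(2 * hid, 2)       # up / down movement

          def forward(self, tokens):                 # tokens: (batch, seq)
              h, _ = self.lstm(self.emb(tokens))     # (batch, seq, 2*hid)
              w = torch.softmax(self.att(h), dim=1)  # attention over tokens
              return self.clf((w * h).sum(dim=1))    # weighted context

      logits = NewsEncoder()(torch.randint(0, 30000, (4, 50)))  # smoke test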
  4. By: Begler, A.; Gavrilova, T.
    Abstract: The paper presents work in progress on the use of artificial intelligence (AI) methods in knowledge management (KM) systems. Such methods are supposed to improve KM, for example through automated knowledge discovery with data mining and natural language processing techniques (Liebowitz, 2001) or through continuously reinterpreting the meaning of information with a sense-making injection (Malhotra, 2001). Recent papers focus on the implementation of different AI technologies for knowledge management, such as big data (Sumbal et al., 2017; Pauleen et al., 2017), ontology-based methods (Zhang et al., 2015; Remolona et al., 2017), and intelligent agents (Kravchenko & Kureichik, 2014; Chang et al., 2017; Kadhim et al., 2017). However, a systemic understanding of their application is still lacking. In this paper we aim to answer the question: what is the role of different types of artificial intelligence methods in knowledge management systems dedicated to solving particular tasks? To do this, several steps are taken. First, an analytical framework for existing cases of AI application was constructed. The framework consists of four embedded dimensions: organizational context and environment, KM processes and tools, KM system architecture, and AI technology implementation. For every dimension, a set of characteristics was defined by which use cases can be analyzed. Second, based on these characteristics, an analysis of published KM systems with incorporated AI technologies was performed. The analysis followed the Tranfield et al. (2013) methodology, with three stages: planning, conducting, and reporting the review. For the planning stage, the analytical framework was used, along with an analysis of the relevant literature. At the conducting stage, a keyword search in the Scopus database yielded 174 papers as an initial result; exclusion criteria were then applied, leaving 83 papers for further analysis. This is the current stage of the research. Once it is finished, a synthesis of the patterns will be performed to create a model of AI technology use in KM systems.
    Keywords: artificial intelligence, knowledge management system, system architecture, knowledge management,
    Date: 2018
    URL: http://d.repec.org/n?u=RePEc:sps:wpaper:15106&r=all
  5. By: Zhang, Bin; Jin, Zhiye; Peng, Zhidao
    Abstract: With the development of information technology, the connotation of the Digital Divide has evolved constantly. At present, we have entered the new era of the "Digital Economy", and newer information technologies such as Big Data, Artificial Intelligence, the Internet of Things and Cloud Computing have been widely developed and applied. These new technologies should also be included in the measurement of the Digital Divide. At the same time, the physical gap in traditional information technology has been greatly reduced; where physical access conditions are similar, the gap in digital technology skills and use stands out. Under such circumstances, the measurement of the Digital Divide should be more concerned with Digital Literacy and Digital Experience. Against the background of the Digital Economy, the existence of the Digital Divide means that there is a huge first-mover advantage for the party in the more advanced position. Countries, regions and communities with faster information development will be able to use information dividends promptly to promote their own economic development. The party that lags far behind, however, will have fewer opportunities to participate in the information-based Digital Economy. Its economic development is at a disadvantage and, because more work and social activities in the new economy are closely tied to information technology, the information-poor also have fewer opportunities to participate in online education, training, entertainment, shopping and communication, which exacerbates social inequalities. In this study, qualitative and quantitative analysis were used to study a Digital Divide evaluation system for the era of the Digital Economy. In the qualitative part, we summarize the definition of the Digital Economy, the definition of the Digital Divide and the measurement theory of the regional Digital Divide by reviewing the literature, laying a solid theoretical foundation for the research. Starting from six aspects (Digital Technology Infrastructure; ICT Readiness; Economic Development; Government Innovation Support; Education and Digital Literacy; Digital Contents and Applications) we put forward research hypotheses and build the corresponding evaluation system model. In the quantitative part, empirical research methods were used to verify the hypotheses and the model. Using a large amount of domestic research data collected from the China Statistical Yearbook and processed with the SPSS statistical analysis software, the paper proposes a complete index system for evaluating informatization and the Digital Divide in the Digital Economy era, with weights assigned using Factor Analysis, the Analytic Hierarchy Process and expert interviews. On this basis, the article assesses the current state of informatization development and the regional Digital Divide by calculating the index. Through clustering analysis and average deviation analysis, we analyze the causes of the formation of the Digital Divide, understand the gaps in regional informatization and digital development, and identify weaknesses in digital development. We then put forward suggestions that can effectively help bridge the Digital Divide and address the "information gap", "knowledge divide" and "rich-poor divide" between regions caused by gaps in development and application levels. The study provides a reference for bridging the Digital Divide and promoting balanced regional informational, economic and cultural development. This will enable digital technology to be better utilized in promoting the development of the Digital Economy. Giving full play to the connectivity of the Internet will allow the Digital Economy to benefit more regions and enhance the well-being of the entire society.
    Keywords: Digital Divide,Digital Economy,Evaluation Index System,Policy Suggestion
    Date: 2018
    URL: http://d.repec.org/n?u=RePEc:zbw:itsb18:190412&r=all
  6. By: Dhanya Jothimani; Ravi Shankar; Surendra S. Yadav
    Abstract: With the advent of Web 2.0, various types of data are being produced every day. This has led to the big data revolution. Huge amounts of structured and unstructured data are produced in financial markets. Processing these data could help an investor make an informed investment decision. In this paper, a framework is developed to incorporate both structured and unstructured data for portfolio optimization. Portfolio optimization consists of three processes: asset selection, asset weighting and asset management. This framework proposes to achieve the first two processes using a five-stage methodology. The stages comprise shortlisting stocks using Data Envelopment Analysis (DEA), incorporating qualitative factors using text mining, stock clustering, stock ranking, and optimizing the portfolio using heuristics. This framework would help investors select appropriate assets for a portfolio, invest in them so as to minimize risk and maximize return, and monitor their performance.
    Date: 2018–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1811.07188&r=all
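    The framework's final stage, heuristic portfolio optimization, can be illustrated with a minimal long-only mean-variance weighting of a pre-shortlisted set of stocks via scipy; the DEA and text-mining stages are assumed to have been run upstream, and the return data here is randomly generated:
      import numpy as np
      from scipy.optimize import minimize

      rets = np.random.default_rng(0).normal(0.001, 0.02, (250, 5))
      mu, cov = rets.mean(axis=0), np.cov(rets.T)
      lam = 3.0                                    # risk aversion

      def neg_utility(w):                          # -(mu'w - lam/2 * w'Cw)
          return -(w @ mu - 0.5 * lam * w @ cov @ w)

      n = len(mu)
      res = minimize(neg_utility, np.full(n, 1 / n),
                     bounds=[(0, 1)] * n,          # long-only
                     constraints={"type": "eq",
                                  "fun": lambda w: w.sum() - 1})
      print("weights:", res.x.round(3))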
  7. By: David Saltiel; Eric Benhamou
    Abstract: In recent years, state-of-the-art methods for supervised learning have increasingly exploited gradient boosting techniques, with mainstream efficient implementations such as XGBoost or LightGBM. One of the key points in producing proficient methods is feature selection (FS): selecting the right, genuinely informative features. When facing hundreds of candidate features, it becomes critical to select the best ones. While filter and wrapper methods have come to some maturity, embedded methods are truly necessary to find the best feature set, as they are hybrid methods combining feature filtering and wrapping. In this work, we tackle the problem of finding, through machine learning, the best a priori trades from an algorithmic strategy. We derive this new method using coordinate ascent optimization with block variables. We compare our method to Recursive Feature Elimination (RFE) and Binary Coordinate Ascent (BCA), and show on a real-life example the capacity of this method to select good trades a priori. Not only does this method outperform the initial trading strategy, as it avoids taking losing trades; it also surpasses the other methods, having the smallest feature set and the highest score at the same time. The interest of this method goes beyond this simple trade classification problem, as it is a very general way to determine an optimal feature set using information about relationships between features, combined with coordinate ascent optimization.
    Date: 2018–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1812.04486&r=all
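    A hedged sketch of the block-wise coordinate-ascent idea: cycle over predefined feature blocks and, within each block, toggle features one at a time, keeping any change that raises the cross-validated score. This illustrates the general technique, not the authors' exact algorithm:
      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.ensemble import GradientBoostingClassifier
      from sklearn.model_selection import cross_val_score

      X, y = make_classification(n_samples=400, n_features=12, random_state=0)
      blocks = [range(0, 4), range(4, 8), range(8, 12)]  # feature blocks
      selected = set(range(X.shape[1]))                  # start from all

      def score(feats):
          model = GradientBoostingClassifier(random_state=0)
          return cross_val_score(model, X[:, sorted(feats)], y, cv=3).mean()

      best = score(selected)
      for _ in range(2):                # a few ascent sweeps over the blocks
          for block in blocks:
              for f in block:
                  trial = selected ^ {f}               # toggle feature f
                  if trial:
                      s = score(trial)
                      if s > best:
                          selected, best = trial, s
      print(sorted(selected), round(best, 3))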
  8. By: Qiang Zhang; Rui Luo; Yaodong Yang; Yuanyuan Liu
    Abstract: Volatility is a measure of the price movements of stocks or options which indicates the uncertainty within financial markets. As an indicator of the level of risk or the degree of variation, volatility is important for analysing financial markets, and it is taken into consideration in various decision-making processes in financial activities. On the other hand, recent advances in deep learning techniques have shown strong capabilities in modelling sequential data, such as speech and natural language. In this paper, we empirically study the applicability of the latest deep architectures to the volatility modelling problem, through which we aim to provide empirical guidance for future theoretical analysis of the marriage between deep learning techniques and financial applications. We examine both traditional approaches and deep sequential models on the task of volatility prediction, including the most recent variants of convolutional and recurrent networks, such as dilated architectures. Experiments with real-world stock price datasets are performed on a set of 1314 daily stock series covering 2018 transaction days. The evaluation and comparison are based on the negative log-likelihood (NLL) of real-world stock price time series. The results show that dilated neural models, including dilated CNNs and dilated RNNs, produce the most accurate estimates and predictions, outperforming various widely used deterministic models in the GARCH family and several recently proposed stochastic models. In addition, the study validates their high flexibility and rich expressive power.
    Date: 2018–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1811.03711&r=all
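    The evaluation criterion is easy to make concrete: the sketch below computes the Gaussian negative log-likelihood of returns under a volatility forecast, using a hand-rolled GARCH(1,1) filter as the baseline predictor (returns and parameters are illustrative, not from the paper's dataset):
      import numpy as np

      r = np.random.default_rng(1).normal(0, 0.01, 1000)  # dummy returns

      def garch_variance(r, omega=1e-6, alpha=0.05, beta=0.9):
          var = np.empty_like(r)
          var[0] = r.var()
          for t in range(1, len(r)):
              # sigma2_t = omega + alpha * r2_{t-1} + beta * sigma2_{t-1}
              var[t] = omega + alpha * r[t - 1] ** 2 + beta * var[t - 1]
          return var

      def gaussian_nll(r, var):         # average per-observation NLL
          return 0.5 * np.mean(np.log(2 * np.pi * var) + r ** 2 / var)

      print("NLL:", gaussian_nll(r, garch_variance(r)))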
  9. By: Javier Franco-Pedroso; Joaquin Gonzalez-Rodriguez; Maria Planas; Jorge Cubero; Rafael Cobo; Fernando Pablos
    Abstract: This paper presents an evaluation framework that attempts to quantify the "degree of realism" of simulated financial time series, whatever the simulation method may be, with the aim of discovering unknown characteristics that are not being properly reproduced by such methods, in order to improve them. For that purpose, the evaluation framework is posed as a machine learning problem in which given time series examples have to be classified as simulated or real financial time series. The "challenge" is proposed as an open competition, similar to those published on the Kaggle platform, in which participants must submit their classification results along with a description of the features and classifiers used. The results of these "challenges" have revealed some interesting properties of financial data and have led to substantial improvements in the simulation methods we are researching, some of which are described in this work.
    Date: 2018–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1811.07792&r=all
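    The challenge task itself reduces to a standard classification pipeline; a toy version, with synthetic stand-ins for the organizers' series, featurizes each series with stylized-fact statistics (fat tails, volatility clustering) and fits a linear classifier:
      import numpy as np
      from scipy.stats import kurtosis
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(2)
      real = rng.standard_t(df=4, size=(100, 500)) * 0.01  # heavy tails
      sim = rng.normal(0, 0.01, size=(100, 500))           # Gaussian "sim"

      def featurize(series):
          vol_clust = [np.corrcoef(np.abs(s[:-1]), np.abs(s[1:]))[0, 1]
                       for s in series]
          return np.column_stack([kurtosis(series, axis=1),
                                  series.std(axis=1), vol_clust])

      X = np.vstack([featurize(real), featurize(sim)])
      y = np.r_[np.ones(100), np.zeros(100)]
      print(cross_val_score(LogisticRegression(), X, y, cv=5).mean())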
  10. By: Shenhao Wang; Jinhua Zhao
    Abstract: How to combine revealed preference (RP) and stated preference (SP) data to analyze travel behavior is an enduring question. This study presents a new approach that uses a multitask learning deep neural network (MTLDNN) to combine RP and SP data, incorporating the traditional nested logit approach as a special case. Based on a combined RP and SP survey in Singapore examining the demand for autonomous vehicles (AV), we designed, estimated and compared one hundred MTLDNN architectures, with three major findings. First, the traditional nested logit approach to combining RP and SP can be regarded as a special case of MTLDNN, and is only one of a large number of possible MTLDNN architectures; the nested logit approach imposes a proportional parameter constraint under the MTLDNN framework. Second, out of the 100 MTLDNN models tested, the best one has one shared layer and five domain-specific layers with weak regularization, but the nested logit approach with the proportional parameter constraint rivals the best model. Third, the proportional parameter constraint works well in the nested logit model, but is too restrictive for deeper architectures. Overall, this study introduces the MTLDNN model for combining RP and SP data, relates the nested logit approach to the hyperparameter space of MTLDNN, and explores hyperparameter training and architecture design for joint demand analysis.
    Date: 2019–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1901.00227&r=all
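    The shared-plus-task-specific structure is straightforward to express; a minimal tf.keras sketch with one shared layer feeding separate RP and SP heads follows (widths, depths and input size are placeholders, not the paper's selected architecture; in training, RP and SP observations would each be routed to their own head's loss):
      import tensorflow as tf
      from tensorflow.keras import layers, Model

      inp = layers.Input(shape=(20,), name="choice_attributes")
      shared = layers.Dense(64, activation="relu", name="shared")(inp)

      rp = layers.Dense(32, activation="relu")(shared)   # RP-specific
      rp_out = layers.Dense(4, activation="softmax", name="rp_choice")(rp)

      sp = layers.Dense(32, activation="relu")(shared)   # SP-specific
      sp_out = layers.Dense(4, activation="softmax", name="sp_choice")(sp)

      model = Model(inp, [rp_out, sp_out])
      model.compile(optimizer="adam",
                    loss={"rp_choice": "sparse_categorical_crossentropy",
                          "sp_choice": "sparse_categorical_crossentropy"})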
  11. By: Fildes, Robert; Ma, Shaohui; Kolassa, Stephan
    Abstract: This paper first introduces the forecasting problems faced by large retailers, from the strategic to the operational, from the store to the competing channels of distribution, as sales are aggregated over products to brands to categories and to the company overall. Aggregated forecasting that supports strategic decisions is discussed on three levels: the aggregate retail sales in a market, in a chain, and in a store. Product-level forecasts usually relate to operational decisions, where the hierarchy of sales data across time, product and the supply chain is examined. Various characteristics and influential factors which affect product-level retail sales are discussed. The data-rich environment at lower product hierarchies makes data pooling an often appropriate strategy to improve forecasts, but success depends on the data characteristics and common factors influencing sales and potential demand. Marketing mix and promotions pose an important challenge, both to the researcher and the practicing forecaster. Online review information adds further complexity, so that forecasters potentially face a dimensionality problem of too many variables and too little data. The paper goes on to examine evidence on the alternative methods used to forecast product sales and their comparative forecasting accuracy. For many of the complex methods proposed, there is very little evidence of their value, which poses further research questions. In contrast, some ambitious econometric methods have been shown to outperform all the simpler alternatives, including those used in practice. New product forecasting methods are examined separately, where limited evidence is available as to how effective the various approaches are. The paper concludes with some evidence describing company forecasting practice, offering conclusions as to the research gaps but also the barriers to improved practice.
    Keywords: retail forecasting; product hierarchies; big data; marketing analytics; user-generated web content; new products; comparative accuracy; forecasting practice
    JEL: L81 M20 M30
    Date: 2019–10
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:89356&r=all
  12. By: Babak Mahdavi-Damghani; Konul Mustafayeva; Stephen Roberts; Cristin Buescu
    Abstract: We investigate the problem of dynamic portfolio optimization in a continuous-time, finite-horizon setting for a portfolio of two stocks and one risk-free asset, where the stocks follow the Cointelation model. The proposed optimization methods are twofold. In what we call the Stochastic Differential Equation approach, we compute the optimal weights using a mean-variance criterion and power utility maximization, and show that dynamically switching between these two optimal strategies by introducing a triggering function can further improve portfolio returns. We contrast this with a machine learning clustering methodology inspired by the band-wise Gaussian mixture model. The first benefit of the machine learning approach over the Stochastic Differential Equation approach is that it achieves the same results through a simpler channel; the second is its flexibility with respect to regime change.
    Date: 2018–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1812.10183&r=all
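    For intuition about the underlying dynamics, here is a small Euler-scheme simulation of a cointelated pair in the spirit of the model the paper uses, under the assumption that S1 follows a geometric Brownian motion and S2 mean-reverts towards S1 with correlated noise; the exact specification and all parameter values are illustrative:
      import numpy as np

      rng = np.random.default_rng(3)
      n, dt = 252, 1 / 252
      mu, sigma, k, rho = 0.05, 0.2, 5.0, 0.5

      s1, s2 = np.empty(n + 1), np.empty(n + 1)
      s1[0] = s2[0] = 100.0
      for t in range(n):
          z1 = rng.normal()
          z2 = rho * z1 + np.sqrt(1 - rho ** 2) * rng.normal()  # corr. shocks
          s1[t + 1] = s1[t] * (1 + mu * dt + sigma * np.sqrt(dt) * z1)
          s2[t + 1] = (s2[t] + k * (s1[t] - s2[t]) * dt
                       + sigma * s2[t] * np.sqrt(dt) * z2)
      print(s1[-1], s2[-1])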
  13. By: Daniel Muller
    Abstract: In recent years, artificial intelligence (AI) decision-making and autonomous systems have become an integrated part of the economy, industry, and society. The evolving economy of the human-AI ecosystem raises concerns regarding the risks and values embedded in AI systems. This paper investigates the dynamics of the creation and exchange of values and points out gaps in the perception of the cost-value, knowledge, space and time dimensions. It shows aspects of value bias in human perceptions of achievements and costs that become encoded in AI systems. It also proposes rethinking hard goal definitions and cost-optimal problem-solving principles through the lens of effectiveness and efficiency in the development of trusted machines. The paper suggests a value-driven, cost-aware strategy and principles for problem-solving and for planning effective research progress to address real-world problems that involve diverse forms of achievements, investments, and survival scenarios.
    Date: 2018–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1811.06606&r=all
  14. By: Zhuoran Xiong; Xiao-Yang Liu; Shan Zhong; Hongyang Yang; Anwar Walid
    Abstract: Stock trading strategy plays a crucial role in investment companies. However, it is challenging to obtain an optimal strategy in the complex and dynamic stock market. We explore the potential of deep reinforcement learning to optimize stock trading strategy and thus maximize investment return. Thirty stocks are selected as our trading stocks, and their daily prices are used as the training and trading market environment. We train a deep reinforcement learning agent and obtain an adaptive trading strategy. The agent's performance is evaluated and compared with the Dow Jones Industrial Average and the traditional min-variance portfolio allocation strategy. The proposed deep reinforcement learning approach is shown to outperform the two baselines in terms of both the Sharpe ratio and cumulative returns.
    Date: 2018–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1811.07522&r=all
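    A bare-bones sketch of the kind of market environment such an agent interacts with: the state stacks prices, holdings and cash, the action is the number of shares to trade, and the reward is the change in portfolio value. Prices here are dummy data, and a real implementation would add transaction costs and position limits:
      import numpy as np

      class TradingEnv:
          def __init__(self, prices, cash=1e6):     # prices: (days, stocks)
              self.prices, self.cash0 = prices, cash

          def reset(self):
              self.t, self.cash = 0, self.cash0
              self.holdings = np.zeros(self.prices.shape[1])
              return self._state()

          def _state(self):
              return np.concatenate([self.prices[self.t], self.holdings,
                                     [self.cash]])

          def step(self, action):                   # shares to buy/sell
              p = self.prices[self.t]
              before = self.cash + self.holdings @ p
              self.holdings += action
              self.cash -= action @ p
              self.t += 1
              after = self.cash + self.holdings @ self.prices[self.t]
              done = self.t == len(self.prices) - 1
              return self._state(), after - before, done

      env = TradingEnv(np.abs(np.random.default_rng(4)
                              .normal(100, 10, (50, 30))))
      state = env.reset()
      state, reward, done = env.step(np.zeros(30))  # smoke test: do nothing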
  15. By: Pater, Robert; Szkola, Jaroslaw; Kozak, Marcin
    Abstract: There is an increasing need to analyse the demand for skills in labour markets. While most studies aggregate skills into groups or use available proxies for them, the authors analyse companies' demand for individual competences. Such an analysis better reflects reality, because companies usually require particular competences from future workers rather than generally defined groups of skills. However, no method has existed to analyse on a large scale which competences are required by employers: at a detailed level there are hundreds of competences, so this demand cannot be measured in a sample survey. The authors propose a method for a continuous and efficient analysis of the demand for new workers' competences, based on gathering internet job offers and analysing them with data mining and text analysis tools. They applied it to analyse transversal competences on the Polish labour market from November 2012 to December 2015, using the detailed European Commission classification of transversal competences. They found that, within the general groups of competences, companies required only certain ones, especially 'language and communication competences', and neglected others. The companies' requirements were countercyclical: they increased during recession and decreased during economic expansion. However, the structure of the demanded competences did not change during the analysed period, suggesting that it is relatively stable, at least over the business cycle. The method can be used continuously, and various institutions can analyse and publish up-to-date information on the current demand for competences as well as tendencies in this demand.
    Keywords: online data,skill demand,text analysis,vacancy market,worker competence,worker competency
    JEL: I20 J63
    Date: 2018
    URL: http://d.repec.org/n?u=RePEc:zbw:ifwedp:201883&r=all
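    The core measurement step can be pictured as a dictionary scan over scraped postings; in the sketch below, a few hand-picked phrases stand in for the European Commission's transversal-competence classification:
      from collections import Counter

      competences = ["communication", "teamwork", "problem solving",
                     "time management", "foreign language"]
      job_offers = [
          "We require excellent communication and teamwork skills.",
          "Candidates should show problem solving and time management.",
      ]  # in practice: thousands of scraped job postings

      counts = Counter()
      for text in job_offers:
          low = text.lower()
          counts.update(c for c in competences if c in low)
      print(counts.most_common())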

This nep-big issue is ©2019 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.