nep-big New Economics Papers
on Big Data
Issue of 2021‒04‒12
25 papers chosen by
Tom Coupé
University of Canterbury

  1. Comparing hundreds of machine learning classifiers and discrete choice models in predicting travel behavior: an empirical benchmark By Shenhao Wang; Baichuan Mo; Stephane Hess; Jinhua Zhao
  2. A Stochastic Time Series Model for Predicting Financial Trends using NLP By Pratyush Muthukumar; Jie Zhong
  3. The Application of Machine Learning Algorithms for Spatial Analysis: Predicting of Real Estate Prices in Warsaw By Dawid Siwicki
  4. Applications of Machine Learning in Document Digitisation By Christian M. Dahl; Torben S. D. Johansen; Emil N. Sørensen; Christian E. Westermann; Simon F. Wittrock
  5. Distributional Offline Continuous-Time Reinforcement Learning with Neural Physics-Informed PDEs (SciPhy RL for DOCTR-L) By Igor Halperin
  6. The Effect of Sport in Online Dating: Evidence from Causal Machine Learning By Boller, Daniel; Lechner, Michael; Okasa, Gabriel
  7. Future of work: ethics By David Pastor-Escuredo
  8. Information Communication & Computation Technology (ICCT) as a Strategic Tool for Industry Sectors By Aithal, Sreeramana; L. M., Madhushree
  9. Assessing Sensitivity of Machine Learning Predictions. A Novel Toolbox with an Application to Financial Literacy By Falco J. Bargagli Stoffi; Kenneth De Beckker; Joana E. Maldonado; Kristof De Witte
  10. Machine Learning in International Trade Research - Evaluating the Impact of Trade Agreements By Holger Breinlich; Valentina Corradi; Nadia Rocha; Michele Ruta; Joao M.C. Santos Silva; Tom Zylkin
  11. Real Estate Appraisal in Brazil By Marzagão, Thiago; Ferreira, Rodrigo; Sales, Leonardo
  12. The VIX index under scrutiny of machine learning techniques and neural networks By Ali Hirsa; Joerg Osterrieder; Branka Hadji Misheva; Wenxin Cao; Yiwen Fu; Hanze Sun; Kin Wai Wong
  13. Embeddings and Attention in Predictive Modeling By Kevin Kuo; Ronald Richman
  14. Forecasting with Deep Learning: S&P 500 index By Firuz Kamalov; Linda Smail; Ikhlaas Gurrib
  15. Artificial Neural Network and Analytical Hierarchy Process Integration: A Tool to Estimate Business Strategy of Bank By Mochammad Ridwan Ristyawan
  16. A Big Data Analysis of the Ethereum Network: from Blockchain to Google Trends By Dorsa Mohammadi Arezooji
  17. Accurate Stock Price Forecasting Using Robust and Optimized Deep Learning Models By Jaydip Sen; Sidra Mehtab
  18. Stock price forecast with deep learning By Firuz Kamalov; Linda Smail; Ikhlaas Gurrib
  19. Transient Information Adaptation of Artificial Intelligence: Towards Sustainable Data Processes in Complex Projects By Dacre, Nicholas; Kockum, Fredrik; Senyo, PK
  20. Predicting Authoritarian Crackdowns: A Machine Learning Approach By Zhong, Weifeng; Chan, Julian
  21. Predicting Inflation with Neural Networks By Livia Paranhos
  22. The value of big data for analyzing growth dynamics of technology based new ventures By Maksim Malyy; Zeljko Tekic; Tatiana Podladchikova
  23. Big Data in Finance By Itay Goldstein; Chester S. Spatt; Mao Ye
  24. DoubleML -- An Object-Oriented Implementation of Double Machine Learning in Python By Philipp Bach; Victor Chernozhukov; Malte S. Kurz; Martin Spindler
  25. Behavioral Economics Approach to Interpretable Deep Image Classification. Rationally Inattentive Utility Maximization Explains Deep Image Classification By Kunal Pattanayak; Vikram Krishnamurthy

  1. By: Shenhao Wang; Baichuan Mo; Stephane Hess; Jinhua Zhao
    Abstract: Researchers have compared machine learning (ML) classifiers and discrete choice models (DCMs) in predicting travel behavior, but the generalizability of the findings is limited by the specifics of data, contexts, and authors' expertise. This study seeks to provide a generalizable empirical benchmark by comparing hundreds of ML and DCM classifiers in a highly structured manner. The experiments evaluate both prediction accuracy and computational cost by spanning four hyper-dimensions, including 105 ML and DCM classifiers from 12 model families, 3 datasets, 3 sample sizes, and 3 outputs. This experimental design leads to an immense number of 6,970 experiments, which are corroborated with a meta dataset of 136 experiment points from 35 previous studies. This study is hitherto the most comprehensive and almost exhaustive comparison of the classifiers for travel behavioral prediction. We found that the ensemble methods and deep neural networks achieve the highest predictive performance, but at a relatively high computational cost. Random forests are the most computationally efficient, balancing between prediction and computation. While discrete choice models offer accuracy with only 3-4 percentage points lower than the top ML classifiers, they have much longer computational time and become computationally impossible with large sample size, high input dimensions, or simulation-based estimation. The relative ranking of the ML and DCM classifiers is highly stable, while the absolute values of the prediction accuracy and computational time have large variations. Overall, this paper suggests using deep neural networks, model ensembles, and random forests as baseline models for future travel behavior prediction. For choice modeling, the DCM community should switch more attention from fitting models to improving computational efficiency, so that the DCMs can be widely adopted in the big data context.
    Date: 2021–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2102.01130&r=all
  2. By: Pratyush Muthukumar; Jie Zhong
    Abstract: Stock price forecasting is a highly complex and vitally important field of research. Recent advancements in deep neural network technology allow researchers to develop highly accurate models to predict financial trends. We propose a novel deep learning model called ST-GAN, or Stochastic Time-series Generative Adversarial Network, that analyzes both financial news texts and financial numerical data to predict stock trends. We utilize cutting-edge technology like the Generative Adversarial Network (GAN) to learn the correlations among textual and numerical data over time. We develop a new method of training a time-series GAN directly using the learned representations of Naive Bayes' sentiment analysis on financial text data alongside technical indicators from numerical data. Our experimental results show significant improvement over various existing models and prior research on deep neural networks for stock price forecasting.
    Date: 2021–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2102.01290&r=all
  3. By: Dawid Siwicki (Faculty of Economic Sciences, University of Warsaw)
    Abstract: The principal aim of this paper is to investigate the potential of machine learning algorithms in the context of predicting housing prices. The most important issue in modelling spatial data is spatial heterogeneity, which can bias the results when it is not taken into consideration. The purpose of this research is to compare the predictive power of the following methods: linear regression, artificial neural network, random forest, extreme gradient boosting and spatial error model. The evaluation was conducted using train, validation and test splits and k-fold cross-validation. We also examined the ability of the above models to identify spatial dependencies by calculating Moran’s I for residuals obtained on in-sample and out-of-sample data.
    Keywords: spatial analysis, machine learning, housing market, random forest, gradient boosting
    JEL: C31 C45 C52 C53 C55 R31
    Date: 2021
    URL: http://d.repec.org/n?u=RePEc:war:wpaper:2021-05&r=all
  4. By: Christian M. Dahl; Torben S. D. Johansen; Emil N. Sørensen; Christian E. Westermann; Simon F. Wittrock
    Abstract: Data acquisition forms the primary step in all empirical research. The availability of data directly impacts the quality and extent of conclusions and insights. In particular, larger and more detailed datasets provide convincing answers even to complex research questions. The main problem is that 'large and detailed' usually implies 'costly and difficult', especially when the data medium is paper and books. Human operators and manual transcription have been the traditional approach for collecting historical data. We instead advocate the use of modern machine learning techniques to automate the digitisation process. We give an overview of the potential for applying machine digitisation for data collection through two illustrative applications. The first demonstrates that unsupervised layout classification applied to raw scans of nurse journals can be used to construct a treatment indicator. Moreover, it allows an assessment of assignment compliance. The second application uses attention-based neural networks for handwritten text recognition in order to transcribe age and birth and death dates from a large collection of Danish death certificates. We describe each step in the digitisation pipeline and provide implementation insights.
    Date: 2021–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2102.03239&r=all
  5. By: Igor Halperin
    Abstract: This paper addresses distributional offline continuous-time reinforcement learning (DOCTR-L) with stochastic policies for high-dimensional optimal control. A soft distributional version of the classical Hamilton-Jacobi-Bellman (HJB) equation is given by a semilinear partial differential equation (PDE). This 'soft HJB equation' can be learned from offline data without assuming that the latter correspond to a previous optimal or near-optimal policy. A data-driven solution of the soft HJB equation uses methods of Neural PDEs and Physics-Informed Neural Networks developed in the field of Scientific Machine Learning (SciML). The suggested approach, dubbed 'SciPhy RL', thus reduces DOCTR-L to solving neural PDEs from data. Our algorithm called Deep DOCTR-L converts offline high-dimensional data into an optimal policy in one step by reducing it to supervised learning, instead of relying on value iteration or policy iteration methods. The method enables a computable approach to the quality control of obtained policies in terms of both their expected returns and uncertainties about their values.
    Date: 2021–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2104.01040&r=all
  6. By: Boller, Daniel; Lechner, Michael; Okasa, Gabriel
    Abstract: Online dating has emerged as a key platform for human mating. Previous research focused on socio-demographic characteristics to explain human mating in online dating environments, neglecting the commonly recognized relevance of sport. This research investigates the effect of sport activity on human mating by exploiting a unique data set from an online dating platform. Thereby, we leverage recent advances in the causal machine learning literature to estimate the causal effect of sport frequency on the contact chances. We find that for male users, doing sport on a weekly basis increases the probability of receiving a first message from a woman by 50%, relative to not doing sport at all. For female users, we do not find evidence for such an effect. In addition, for male users the effect increases with higher income.
    Keywords: Online dating, sports economics, big data, causal machine learning, effect heterogeneity, Modified Causal Forest
    JEL: J12 Z29 C21 C45
    Date: 2021–04
    URL: http://d.repec.org/n?u=RePEc:usg:econwp:2021:04&r=all
  7. By: David Pastor-Escuredo
    Abstract: Work must be reshaped in the upcoming new era characterized by new challenges and the presence of new technologies and computational tools. Over-automation seems to be the driver of the digitalization process. Substitution is the paradigm leading Artificial Intelligence and robotics development against human cognition. Digital technology should be designed to enhance human skills and make more productive use of human cognition and capacities. Digital technology is also characterized by scalability because of its easy and inexpensive deployment. Thus, automation can lead to the absence of jobs and a scalable negative impact on human development and the performance of business. A look at digitalization through the lens of the Sustainable Development Goals can tell us how digitalization impacts different sectors and areas, considering society as a complex interconnected system. Here, reflections on how AI and Data impact the future of work and sustainable development are provided, grounded in an ethical core that comprises human-level principles and also systemic principles.
    Date: 2021–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2104.02580&r=all
  8. By: Aithal, Sreeramana; L. M., Madhushree
    Abstract: Information Communication and Computation Technology (ICCT) and Nanotechnology (NT) are recently identified Universal technologies of the 21st century and are expected to substantially contribute to the development of the society by solving the basic needs, advanced wants, and dreamy desires of human beings. In this paper, the possibilities of using ICCT and its underlying ten most important emerging technologies like Artificial intelligence, Big data & business analytics, Cloud computing, Digital marketing, 3D printing, Internet of Things, Online ubiquitous education, Optical computing, Storage technology, and Virtual & Augmented Reality are explored. The emerging trends of applications of the above underlying technologies of ICCT in the primary, secondary, tertiary and quaternary industry sectors of the society are discussed, analysed, and predicted using a newly developed predictive analysis model. The advantages, benefits, constraints, and disadvantages of such technologies in fulfilling the desires of human beings to lead a luxurious and comfortable lifestyle are identified and discussed from various stakeholders' points of view. The paper also focuses on the potential applications of ICCT as a strategic tool for survival, sustainability, differentiation, and development of various primary, secondary, tertiary, and quaternary industries.
    Keywords: ICCT, Universal technology, Emerging trends, Information science & technology, Industry sectors, ICCT as a strategic tool
    JEL: M0 M15 O3 O32 O33
    Date: 2019–11–15
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:105619&r=all
  9. By: Falco J. Bargagli Stoffi; Kenneth De Beckker; Joana E. Maldonado; Kristof De Witte
    Abstract: Despite their popularity, machine learning predictions are sensitive to potential unobserved predictors. This paper proposes a general algorithm that assesses how the omission of an unobserved variable with high explanatory power could affect the predictions of the model. Moreover, the algorithm extends the usage of machine learning from pointwise predictions to inference and sensitivity analysis. In the application, we show how the framework can be applied to data with inherent uncertainty, such as students' scores in a standardized assessment on financial literacy. First, using Bayesian Additive Regression Trees (BART), we predict students' financial literacy scores (FLS) for a subgroup of students with missing FLS. Then, we assess the sensitivity of predictions by comparing the predictions and performance of models with and without a highly explanatory synthetic predictor. We find no significant difference in the predictions and performance of the augmented model (i.e., the model with the synthetic predictor) and the original model. This evidence sheds light on the stability of the predictive model used in the application. The proposed methodology can be used, above and beyond our motivating empirical example, in a wide range of machine learning applications in social and health sciences.
    Date: 2021–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2102.04382&r=all
  10. By: Holger Breinlich (University of Surrey, CEP and CEPR); Valentina Corradi (University of Surrey); Nadia Rocha (World Bank); Michele Ruta (World Bank); Joao M.C. Santos Silva (University of Surrey); Tom Zylkin (University of Richmond)
    Abstract: Modern trade agreements contain a large number of provisions besides tariff reductions, in areas as diverse as services trade, competition policy, trade-related investment measures, or public procurement. Existing research has struggled with overfitting and severe multicollinearity problems when trying to estimate the effects of these provisions on trade flows. In this paper, we develop a new method to estimate the impact of individual provisions on trade flows that does not require ad hoc assumptions on how to aggregate individual provisions. Building on recent developments in the machine learning and variable selection literature, we propose data-driven methods for selecting the most important provisions and quantifying their impact on trade flows. We find that provisions related to antidumping, competition policy, technical barriers to trade, and trade facilitation are associated with enhancing the trade-increasing effect of trade agreements.
    JEL: F14 F15 F17
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:sur:surrec:0521&r=all
  11. By: Marzagão, Thiago; Ferreira, Rodrigo; Sales, Leonardo
    Abstract: Brazilian banks commonly use linear regression to appraise real estate: they regress price on features like area, location, etc., and use the resulting model to estimate the market value of the target property. But Brazilian banks do not test the predictive performance of those models, which for all we know are no better than random guesses. That introduces huge inefficiencies in the real estate market. Here we propose a machine learning approach to the problem. We use real estate data scraped from 15 thousand online listings to fit a boosted trees model. The resulting model has a median absolute error of 8.16%. We provide all data and source code.
    Date: 2021–04–08
    URL: http://d.repec.org/n?u=RePEc:osf:osfxxx:zrgv6&r=all
  12. By: Ali Hirsa; Joerg Osterrieder; Branka Hadji Misheva; Wenxin Cao; Yiwen Fu; Hanze Sun; Kin Wai Wong
    Abstract: The CBOE Volatility Index, known by its ticker symbol VIX, is a popular measure of the market's expected volatility on the S&P 500 Index, calculated and published by the Chicago Board Options Exchange (CBOE). It is also often referred to as the fear index or the fear gauge. The current VIX index value quotes the expected annualized change in the S&P 500 index over the following 30 days, based on options-based theory and current options-market data. Despite its theoretical foundation in option price theory, CBOE's Volatility Index is prone to inadvertent and deliberate errors because it is a weighted average of out-of-the-money calls and puts, which could be illiquid. Many claims of market manipulation have been brought up against VIX in recent years. This paper discusses several approaches to replicate the VIX index as well as VIX futures by using a subset of relevant options as well as neural networks that are trained to automatically learn the underlying formula. Using subset selection approaches on top of the original CBOE methodology, as well as building machine learning and neural network models including Random Forests, Support Vector Machines, feed-forward neural networks, and long short-term memory (LSTM) models, we will show that a small number of options is sufficient to replicate the VIX index. Once we are able to actually replicate the VIX using a small number of S&P options, we will be able to exploit potential arbitrage opportunities between the VIX index and its underlying derivatives. The results are intended to help investors better understand the options market and, more importantly, to give guidance to the US regulators and CBOE that have been investigating those manipulation claims for several years.
    Date: 2021–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2102.02119&r=all
  13. By: Kevin Kuo; Ronald Richman
    Abstract: We explore in depth how categorical data can be processed with embeddings in the context of claim severity modeling. We develop several models that range in complexity from simple neural networks to state-of-the-art attention based architectures that utilize embeddings. We illustrate the utility of learned embeddings from neural networks as pretrained features in generalized linear models, and discuss methods for visualizing and interpreting embeddings. Finally, we explore how attention based models can contextually augment embeddings, leading to enhanced predictive performance.
    Date: 2021–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2104.03545&r=all
  14. By: Firuz Kamalov; Linda Smail; Ikhlaas Gurrib
    Abstract: Stock price prediction has been the focus of a large amount of research but an acceptable solution has so far escaped academics. Recent advances in deep learning have motivated researchers to apply neural networks to stock prediction. In this paper, we propose a convolution-based neural network model for predicting the future value of the S&P 500 index. The proposed model is capable of predicting the next-day direction of the index based on the previous values of the index. Experiments show that our model outperforms a number of benchmarks achieving an accuracy rate of over 55%.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.14080&r=all
  15. By: Mochammad Ridwan Ristyawan (Department of Management, Faculty of Economics and Business, Universitas Tanjungpura, 78124, Pontianak, Indonesia)
    Abstract: Objective - Disruption has been occurring in financial services. Thus, rethinking banking strategy is needed to achieve sustainable innovation in organizations. Studies note that formulating strategy requires very costly, time-consuming, and comprehensive analysis. The purpose of this study is to present an integrated intelligence algorithm for estimating the bank's strategy in Indonesia. Methodology - This study uses a model that integrates two basic modules: Artificial Neural Network (ANN) and Analytical Hierarchy Process (AHP). AHP is capable of handling a multi-level decision-making structure with the use of five expert judgments in the pairwise comparison process. Meanwhile, ANN is utilized as an inductive algorithm for discovering the predictive strategy of the bank and for explaining which strategic factors should be improved going forward. Findings and Novelty - The empirical results indicate that the integration of ANN and AHP was able to predict the business strategy of the bank in five scenarios. Strategy 5 was the best choice for the bank, and Innovate Like Fintechs (ILF) was the most important factor under consideration. The strategy choice was appropriate for the condition of the bank's factors. This framework can be implemented to help bankers decide on bank operations. Type of Paper - Empirical
    Keywords: Bank's strategy, ANN, AHP, BSC, Indonesia.
    JEL: M15 O32
    Date: 2021–03–31
    URL: http://d.repec.org/n?u=RePEc:gtr:gatrjs:jfbr179&r=all
  16. By: Dorsa Mohammadi Arezooji
    Abstract: First, a big data analysis of the transactions and smart contracts made on the Ethereum blockchain is performed, revealing interesting trends in motion. Next, these trends are compared with the public's interest in Ether and Bitcoin, measured by the volume of online searches. An analysis of the crypto prices and search trends suggests the existence of big players (and not the regular users), manipulating the market after a drop in prices. Lastly, a cross-correlation study of crypto prices and search trends reveals the pairs providing more accurate and timely predictions of Ether prices.
    Date: 2021–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2104.01764&r=all
  17. By: Jaydip Sen; Sidra Mehtab
    Abstract: Designing robust frameworks for precise prediction of future prices of stocks has always been considered a very challenging research problem. The advocates of the classical efficient market hypothesis affirm that it is impossible to accurately predict the future prices in an efficiently operating market due to the stochastic nature of the stock price variables. However, numerous propositions exist in the literature with varying degrees of sophistication and complexity that illustrate how algorithms and models can be designed for making efficient, accurate, and robust predictions of stock prices. We present a gamut of ten deep learning models of regression for precise and robust prediction of the future prices of the stock of a critical company in the auto sector of India. Using very granular stock price data collected at 5-minute intervals, we train the models based on the records from 31st Dec, 2012 to 27th Dec, 2013. The testing of the models is done using records from 30th Dec, 2013 to 9th Jan 2015. We explain the design principles of the models and analyze the results of their performance based on accuracy in forecasting and speed of execution.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.15096&r=all
  18. By: Firuz Kamalov; Linda Smail; Ikhlaas Gurrib
    Abstract: In this paper, we compare various approaches to stock price prediction using neural networks. We analyze the performance of fully connected, convolutional, and recurrent architectures in predicting the next-day value of the S&P 500 index based on its previous values. We further expand our analysis by including three different optimization techniques: Stochastic Gradient Descent, Root Mean Square Propagation, and Adaptive Moment Estimation. The numerical experiments reveal that a single-layer recurrent neural network with the RMSprop optimizer produces optimal results, with validation and test Mean Absolute Error of 0.0150 and 0.0148 respectively.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2103.14081&r=all
  19. By: Dacre, Nicholas; Kockum, Fredrik; Senyo, PK
    Abstract: Large-scale projects increasingly operate in complicated settings whilst drawing on an array of complex data-points, which require precise analysis for accurate control and interventions to mitigate possible project failure. Coupled with a growing tendency to rely on new information systems and processes in change projects, 90% of megaprojects globally fail to achieve their planned objectives. Renewed interest in the concept of Artificial Intelligence (AI), against a backdrop of disruptive technological innovations, seeks to enhance project managers’ cognitive capacity through the project lifecycle and enhance project excellence. However, despite growing interest, there remain limited empirical insights into project managers’ ability to leverage AI for cognitive load enhancement in complex settings. As such, this research adopts an exploratory sequential linear mixed methods approach to address unresolved empirical issues on transient adaptations of AI in complex projects, and the impact on cognitive load enhancement. Initial thematic findings from semi-structured interviews with domain experts suggest that in order to leverage AI technologies and processes for sustainable cognitive load enhancement with complex data over time, project managers require improved knowledge and access to relevant technologies that mediate data processes in complex projects, but equally reflect application across different project phases. These initial findings support further hypothesis testing through a larger quantitative study incorporating structural equation modelling to examine the relationship between artificial intelligence and project managers’ cognitive load with project data in complex contexts.
    Date: 2020–09–02
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:pagbm&r=all
  20. By: Zhong, Weifeng; Chan, Julian (Mercury Publication)
    Abstract: Abstract not available.
    Date: 2020–02–10
    URL: http://d.repec.org/n?u=RePEc:ajw:wpaper:10464&r=all
  21. By: Livia Paranhos
    Abstract: This paper applies neural network models to forecast inflation. The use of a particular recurrent neural network, the long short-term memory model, or LSTM, that summarizes macroeconomic information into common components is a major contribution of the paper. Results from an exercise with US data indicate that the estimated neural nets usually present better forecasting performance than standard benchmarks, especially at long horizons. The LSTM in particular is found to outperform the traditional feed-forward network at long horizons, suggesting an advantage of the recurrent model in capturing the long-term trend of inflation. This finding can be rationalized by the so-called long memory of the LSTM, which incorporates relatively old information in the forecast as long as accuracy is improved, while economizing on the number of estimated parameters. Interestingly, the neural nets containing macroeconomic information capture well the features of inflation during and after the Great Recession, possibly indicating a role for nonlinearities and macro information in this episode. The estimated common components used in the forecast seem able to capture the business cycle dynamics, as well as information on prices.
    Date: 2021–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2104.03757&r=all
  22. By: Maksim Malyy (Skolkovo Institute of Science and Technology); Zeljko Tekic (Skolkovo Institute of Science and Technology; HSE University, Graduate School of Business); Tatiana Podladchikova (Skolkovo Institute of Science and Technology)
    Abstract: This study demonstrates that web-search traffic information, in particular Google Trends data, is a credible novel source of high-quality and easy-to-access data for analyzing the growth trajectories of technology-based new ventures (TBNVs). Utilizing a diverse sample of 241 US-based TBNVs, we comparatively analyze the relationship between companies' evolution curves represented by search activity on the one hand and by valuations achieved through rounds of venture investments on the other. The results suggest that a TBNV's growth dynamics are positively and strongly correlated with its web search traffic across the sample. This correlation is more robust when a company a) is more successful (in terms of valuation achieved), especially if it is a "unicorn"; b) is consumer-oriented (i.e., b2c); and c) develops products in the form of a digital platform. Further analysis based on fuzzy-set Qualitative Comparative Analysis (fsQCA) shows that for the most successful companies ("unicorns") and consumer-oriented digital platforms (i.e., b2c digital platform companies) the proposed approach may be extremely reliable, while for other high-growth TBNVs it is useful for analyzing their growth dynamics, albeit to a more limited degree. The proposed methodological approach opens a wide range of possibilities for analyzing, researching and predicting the growth of recently formed growth-oriented companies, in practice and academia.
    Date: 2021–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2104.03053&r=all
  23. By: Itay Goldstein; Chester S. Spatt; Mao Ye
    Abstract: Big data is revolutionizing the finance industry and has the potential to significantly shape future research in finance. This special issue contains articles following the 2019 NBER/ RFS conference on big data. In this Introduction to the special issue, we define the “Big Data” phenomenon as a combination of three features: large size, high dimension, and complex structure. Using the articles in the special issue, we discuss how new research builds on these features to push the frontier on fundamental questions across areas in finance – including corporate finance, market microstructure, and asset pricing. Finally, we offer some thoughts for future research directions.
    JEL: G12 G14 G3
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:28615&r=all
  24. By: Philipp Bach; Victor Chernozhukov; Malte S. Kurz; Martin Spindler
    Abstract: DoubleML is an open-source Python library implementing the double machine learning framework of Chernozhukov et al. (2018) for a variety of causal models. It contains functionalities for valid statistical inference on causal parameters when the estimation of nuisance parameters is based on machine learning methods. The object-oriented implementation of DoubleML provides a high flexibility in terms of model specifications and makes it easily extendable. The package is distributed under the MIT license and relies on core libraries from the scientific Python ecosystem: scikit-learn, numpy, pandas, scipy, statsmodels and joblib. Source code, documentation and an extensive user guide can be found at https://github.com/DoubleML/doubleml-for-py and https://docs.doubleml.org.
    Date: 2021–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2104.03220&r=all
  25. By: Kunal Pattanayak; Vikram Krishnamurthy
    Abstract: Are deep convolutional neural networks (CNNs) for image classification consistent with utility maximization behavior with information acquisition costs? This paper demonstrates the remarkable result that a deep CNN behaves equivalently (in terms of necessary and sufficient conditions) to a rationally inattentive utility maximizer, a model extensively used in behavioral economics to explain human decision making. This implies that a deep CNN has a parsimonious representation in terms of simple intuitive human-like decision parameters, namely, a utility function and an information acquisition cost. Also, the reconstructed utility function that rationalizes the decisions of the deep CNNs yields a useful preference order amongst the image classes (hypotheses).
    Date: 2021–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2102.04594&r=all
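
Editorial illustration for item 3 (not taken from the paper): the abstract by Siwicki describes checking model residuals for spatial dependence with Moran's I. A minimal sketch of that statistic in plain Python is given here. The function names morans_i and chain_weights, the chain-shaped neighbour matrix, and the residual values are hypothetical and chosen only to keep the example self-contained; the abstract does not state which spatial weights matrix or software the author used.

```python
from typing import List, Sequence


def morans_i(values: Sequence[float], weights: Sequence[Sequence[float]]) -> float:
    """Moran's I = (n / S0) * sum_ij w_ij * d_i * d_j / sum_i d_i^2,
    where d_i is the deviation of values[i] from the mean and S0 is the
    sum of all spatial weights."""
    n = len(values)
    if n == 0 or len(weights) != n or any(len(row) != n for row in weights):
        raise ValueError("weights must be an n x n matrix matching values")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    denom = sum(d * d for d in dev)
    if s0 == 0 or denom == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    num = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / s0) * (num / denom)


def chain_weights(n: int) -> List[List[float]]:
    """Hypothetical neighbour structure: locations on a line, where each
    location is a neighbour of the one before and the one after it."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1.0
        w[i + 1][i] = 1.0
    return w


if __name__ == "__main__":
    w = chain_weights(4)
    # Made-up residuals: similar values sit next to each other.
    clustered = [1.0, 1.0, -1.0, -1.0]
    # Made-up residuals: neighbouring values always have opposite signs.
    alternating = [1.0, -1.0, 1.0, -1.0]
    print("clustered residuals:  ", morans_i(clustered, w))
    print("alternating residuals:", morans_i(alternating, w))
```

In this reading, a value of Moran's I well above zero for residuals would suggest that a model leaves spatial structure unexplained, while values near the expectation under no autocorrelation, -1/(n-1), would suggest the residuals are spatially unstructured. A formal test would additionally need a reference distribution, for example from permutations, which this sketch omits.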

This nep-big issue is ©2021 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.