nep-big New Economics Papers
on Big Data
Issue of 2020‒11‒09
25 papers chosen by
Tom Coupé
University of Canterbury

  1. Eye in the Sky: Private Satellites and Government Macro Data By Abhiroop Mukherjee; George Panayotov; Janghoon Shon
  2. Interpretable Neural Networks for Panel Data Analysis in Economics By Yucheng Yang; Zhong Zheng; Weinan E
  3. Who Sees the Future? A Deep Learning Language Model Demonstrates the Vision Advantage of Being Small By Vicinanza, Paul; Goldberg, Amir; Srivastava, Sameer B.
  4. Is Image Encoding Beneficial for Deep Learning in Finance? An Analysis of Image Encoding Methods for the Application of Convolutional Neural Networks in Finance By Dan Wang; Tianrui Wang; Ionuț Florescu
  5. Deep Reinforcement Learning for Asset Allocation in US Equities By Miquel Noguer i Alonso; Sonam Srivastava
  6. Data science in economics: comprehensive review of advanced machine learning and deep learning methods By Nosratabadi, Saeed; Mosavi, Amir; Duan, Puhong; Ghamisi, Pedram; Filip, Ferdinand; Band, Shahab S.; Reuter, Uwe; Gama, Joao; Gandomi, Amir H.
  7. The Economy, the Pandemic, and Machine Learning By Patrick T. Harker
  8. The Knowledge Graph for Macroeconomic Analysis with Alternative Big Data By Yucheng Yang; Yue Pang; Guanhua Huang; Weinan E
  9. Automated coding using machine-learning and remapping the U.S. nonprofit sector: A guide and benchmark By Ma, Ji
  10. How to Talk When a Machine is Listening: Corporate Disclosure in the Age of AI By Sean Cao; Wei Jiang; Baozhong Yang; Alan L. Zhang
  11. A random forest-based approach to identifying the most informative seasonality tests By Ollech, Daniel; Webel, Karsten
  12. Bridging the gap between Markowitz planning and deep reinforcement learning By Eric Benhamou; David Saltiel; Sandrine Ungari; Abhishek Mukhopadhyay
  13. The Pandemic, Automation, and Artificial Intelligence: Executive Briefing: AI and Machine Learning By Patrick T. Harker
  14. Deciphering Federal Reserve Communication via Text Analysis of Alternative FOMC Statements By Taeyoung Doh; Dongho Song; Shu-Kuei X. Yang
  15. Contracting, pricing, and data collection under the AI flywheel effect By Huseyin Gurkan; Francis de Véricourt
  16. Big data for poverty measurement: insights from a scoping review By Stubbers, Michaëla; Holvoet, Nathalie
  17. Oil-Price Uncertainty and the U.K. Unemployment Rate: A Forecasting Experiment with Random Forests Using 150 Years of Data By Rangan Gupta; Christian Pierdzioch; Afees A. Salisu
  18. Reliance on Science by Inventors: Hybrid Extraction of In-text Patent-to-Article Citations By Matt Marx; Aaron Fuegi
  19. Theory-based residual neural networks: A synergy of discrete choice models and deep neural networks By Shenhao Wang; Baichuan Mo; Jinhua Zhao
  20. When Bots Take Over the Stock Market: Evasion Attacks Against Algorithmic Traders By Elior Nehemya; Yael Mathov; Asaf Shabtai; Yuval Elovici
  21. Transnational machine learning with screens for flagging bid-rigging cartels By Huber, Martin; Imhof, David
  22. The Role of Information Provision for Attitudes Towards Immigration: An Experimental Investigation. By Patrick Bareinz; Silke Uebelmesser
  23. Are Crises Predictable? A Review of the Early Warning Systems in Currency and Stock Markets By Peiwan Wang; Lu Zong
  24. How did Japan cope with COVID-19? Big Data and purchasing behavior (Japanese) By KONISHI Yoko; SAITO Takashi; ISHIKAWA Toshiki; IGEI Naoya
  25. Motif: an open-source R tool for pattern-based spatial analysis By Nowosad, Jakub

  1. By: Abhiroop Mukherjee (Liwei Huang Associate Professor of Business, Department of Finance; The Hong Kong University of Science and Technology); George Panayotov (Associate Professor of Finance; The Hong Kong University of Science and Technology); Janghoon Shon (Ph.D. Candidate in Finance; The Hong Kong University of Science and Technology)
    Abstract: Relying on government announcements for macro information creates a potential conflict of interest and macro uncertainty. Many private entities have started using alternative data strategies to predict macro numbers. If alternative data strategies can work for macro predictions, we can get a sense of what is happening in the economy without having to rely on the government to inform us. We find a reduction in implied volatility and in price jumps in the U.S. crude oil market as a result of satellite-based inventory estimates. These findings point to a future in which the resolution of macro uncertainty is smoother, and governments have less control over macro information.
    Keywords: Technology, satellites, macroeconomic data
    Date: 2020–09
    URL: http://d.repec.org/n?u=RePEc:hku:briefs:202042&r=all
  2. By: Yucheng Yang; Zhong Zheng; Weinan E
    Abstract: A lack of interpretability and transparency prevents economists from using advanced tools like neural networks in their empirical work. In this paper, we propose a new class of interpretable neural network models that can achieve both high prediction accuracy and interpretability in regression problems with time series cross-sectional data. Our model can essentially be written as a simple function of a limited number of interpretable features. In particular, we incorporate a class of interpretable functions named persistent change filters as part of the neural network. We apply this model to predicting individuals' monthly employment status using high-dimensional administrative data from China. We achieve an accuracy of 94.5% on the out-of-sample test set, comparable to the most accurate conventional machine learning methods. Furthermore, the model's interpretability allows us to understand the mechanism that underlies its ability to predict employment status from administrative data: an individual's employment status is closely related to whether she pays various types of insurance. Our work is a useful step towards overcoming the "black box" problem of neural networks and provides a promising new tool for economists studying administrative and proprietary big data.
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.05311&r=all
  3. By: Vicinanza, Paul (Stanford U); Goldberg, Amir (Stanford U); Srivastava, Sameer B. (U of California, Berkeley)
    Abstract: Which groups are most likely to become visionaries that define the future of their field? Because vision is difficult to measure, prior work has reached conflicting conclusions: one perspective emphasizes the benefits of being large, established, and central, while another stresses the value of being small, upstart, and peripheral. We propose that this tension can be resolved by disentangling vision--the capacity to generate contextually novel ideas that foretell the future of a field--from the traces of vision that result in tangible innovation. Using Bidirectional Encoder Representations from Transformers (BERT), we develop a novel method to identify the visionaries in a field from conversational text data. Applying this method to a corpus of over 100,000 quarterly earnings calls conducted by 6,000 firms from 2011 to 2016, we develop a measure--prescience--that identifies novel ideas which later become commonplace. Prescience is predictive of firms’ stock market returns: A one standard deviation increase in prescience is associated with a 4% increase in annual returns, and firms exhibiting especially high levels of prescience (above the 95th percentile) reap especially high returns. Moreover, contrary to theories of incumbent advantage, we find that small firms are more likely to possess prescience than large firms. The method we develop can be readily extended to other domains to identify visionary individuals and groups based on the language they use rather than the artifacts they produce.
    Date: 2020–05
    URL: http://d.repec.org/n?u=RePEc:ecl:stabus:3869&r=all
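    The paper's prescience measure is built on BERT; as a rough illustration of the embedding-based idea (not the authors' estimator — the model choice, mean pooling, cosine scoring, and all sentences below are assumptions), one can score an utterance as novel relative to contemporaneous language yet similar to later language:

        import numpy as np
        import torch
        from transformers import AutoTokenizer, AutoModel

        tok = AutoTokenizer.from_pretrained("bert-base-uncased")
        bert = AutoModel.from_pretrained("bert-base-uncased")

        def embed(sents):
            """Mean-pooled BERT embeddings, one vector per sentence."""
            batch = tok(sents, padding=True, truncation=True, return_tensors="pt")
            with torch.no_grad():
                hidden = bert(**batch).last_hidden_state   # (n, seq_len, 768)
            mask = batch["attention_mask"].unsqueeze(-1)   # zero out padding tokens
            return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

        def cos(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

        # hypothetical snippets from early (2011) and later (2016) earnings calls
        early = embed(["Mobile is a side project for us.",
                       "Our catalog business drives nearly all revenue."])
        later = embed(["Mobile-first is now the default for every product we ship.",
                       "Most engagement comes from the app."])
        utterance = embed(["We believe mobile will become the primary platform."])[0]

        novelty = 1 - cos(utterance, early.mean(0))   # far from contemporaneous language
        foresight = cos(utterance, later.mean(0))     # close to future language
        print("prescience-style score:", novelty * foresight)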
  4. By: Dan Wang; Tianrui Wang; Ionuț Florescu
    Abstract: In 2012, the SEC mandated that all corporate filings for any company doing business in the US be entered into the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system. In this work we investigate ways to analyze the data available through the EDGAR database. This may serve portfolio managers (pension funds, mutual funds, insurance, hedge funds) seeking automated insights into the companies they invest in, to better manage their portfolios. The analysis is based on artificial neural networks applied to the data. In particular, one of the most popular machine learning methods, the Convolutional Neural Network (CNN) architecture, originally developed to interpret and classify images, is now being used to interpret financial data. This work investigates the best way to input data collected from SEC filings into a CNN architecture. We incorporate accounting principles and mathematical methods into the design of three image encoding methods. Specifically, two methods are derived from accounting principles (Sequential Arrangement, Category Chunk Arrangement) and one uses a purely mathematical technique (Hilbert Vector Arrangement). We analyze fundamental financial data as well as financial ratio data and study companies from the financial, healthcare and IT sectors in the United States. We find that using imaging techniques to input data into a CNN works better for financial ratio data but is not significantly better than simply using the 1D input directly for fundamental data. We do not find the Hilbert Vector Arrangement technique to be significantly better than the other imaging techniques.
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.08698&r=all
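    A minimal sketch of two of the encodings named above, assuming 64 standardized features on an 8x8 grid (the grid size, feature order, and data are hypothetical; the paper's exact constructions may differ):

        import numpy as np

        def d2xy(n, d):
            """Map distance d along a Hilbert curve to (x, y) on an n x n grid
            (n a power of two); the standard iterative construction."""
            x = y = 0
            s, t = 1, d
            while s < n:
                rx = 1 & (t // 2)
                ry = 1 & (t ^ rx)
                if ry == 0:                      # rotate the quadrant
                    if rx == 1:
                        x, y = s - 1 - x, s - 1 - y
                    x, y = y, x
                x, y = x + s * rx, y + s * ry
                t //= 4
                s *= 2
            return x, y

        features = np.random.rand(64)        # hypothetical standardized fundamentals/ratios

        seq_img = features.reshape(8, 8)     # Sequential Arrangement: row-major grid

        hilbert_img = np.zeros((8, 8))       # Hilbert Vector Arrangement: neighbors in
        for d, v in enumerate(features):     # the vector stay neighbors in the image
            x, y = d2xy(8, d)
            hilbert_img[y, x] = v
        # either 8x8 "image" can now be fed to a small CNN instead of the raw 1D vector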
  5. By: Miquel Noguer i Alonso; Sonam Srivastava
    Abstract: Reinforcement learning is a machine learning approach concerned with solving dynamic optimization problems in an almost model-free way by maximizing a reward function over state and action spaces. This property makes it an exciting area of research for financial problems. Asset allocation, where the goal is to obtain the asset weights that maximize rewards in a given state of the market while accounting for risk and transaction costs, is easily framed as a reinforcement learning problem. It is first a prediction problem for expected returns and the covariance matrix, and then an optimization problem over returns, risk, and market impact. Investors and financial researchers have long worked with approaches like mean-variance optimization, minimum variance, risk parity, and equal weighting, along with several methods for making expected return and covariance matrix predictions more robust. This paper demonstrates the application of reinforcement learning to create a financial, model-free solution to the asset allocation problem, learning to solve it using time series and deep neural networks. We demonstrate this on daily data for the top 24 stocks in the US equities universe with daily rebalancing, using deep reinforcement models with different architectures: Long Short-Term Memory networks, Convolutional Neural Networks, and Recurrent Neural Networks, which we compare with more traditional portfolio management approaches. The deep reinforcement learning approach shows better results than traditional approaches using a simple reward function and given only the time series of stock prices. In finance, generalization from training to test error is never guaranteed, but we can say that the modeling framework can deal with time series prediction and asset allocation, including transaction costs.
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.04404&r=all
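    A minimal sketch of how asset allocation can be framed for reinforcement learning, with the reward equal to realized portfolio return net of transaction costs (the window length, cost rate, long-only constraint, and simulated returns are assumptions, not the authors' setup):

        import numpy as np

        class PortfolioEnv:
            """Minimal sketch: state = trailing return window, action = weights,
            reward = portfolio return minus proportional transaction costs."""
            def __init__(self, returns, window=30, cost=0.001):
                self.r, self.window, self.cost = returns, window, cost
                self.reset()

            def reset(self):
                self.t = self.window
                self.w = np.full(self.r.shape[1], 1 / self.r.shape[1])  # equal start
                return self.r[self.t - self.window:self.t]

            def step(self, action):
                w = np.clip(action, 0, None)
                w = w / w.sum()                              # long-only, fully invested
                turnover = np.abs(w - self.w).sum()
                reward = w @ self.r[self.t] - self.cost * turnover
                self.w, self.t = w, self.t + 1
                done = self.t >= len(self.r)
                return self.r[self.t - self.window:self.t], reward, done

        # hypothetical daily returns for 24 tickers; an LSTM/CNN policy would map the
        # state window to the action vector and be trained on the cumulative reward
        env = PortfolioEnv(np.random.normal(0.0004, 0.01, size=(1000, 24)))
        state = env.reset()
        state, reward, done = env.step(np.random.rand(24))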
  6. By: Nosratabadi, Saeed; Mosavi, Amir; Duan, Puhong; Ghamisi, Pedram; Filip, Ferdinand; Band, Shahab S.; Reuter, Uwe; Gama, Joao; Gandomi, Amir H.
    Abstract: This paper provides a state-of-the-art investigation of advances in data science for emerging economic applications. The analysis covers novel data science methods in four classes: deep learning models, hybrid deep learning models, hybrid machine learning, and ensemble models. Application domains include a wide and diverse range of economics research, from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. The PRISMA method, a systematic literature review methodology, was used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which, on the accuracy metric, outperform other learning algorithms. It is further expected that the trends will converge toward the advancement of sophisticated hybrid deep learning models.
    Date: 2020–10–15
    URL: http://d.repec.org/n?u=RePEc:osf:osfxxx:yc6e2&r=all
  7. By: Patrick T. Harker
    Abstract: The U.S. economy is recovering more strongly than originally anticipated, but significant risks related to COVID-19 and fiscal policy remain, said Patrick T. Harker, president and CEO of the Federal Reserve Bank of Philadelphia. Harker, delivering a keynote address virtually at the Official Monetary and Financial Institutions Forum, focused on artificial intelligence and machine learning.
    Keywords: COVID-19
    Date: 2020–09–29
    URL: http://d.repec.org/n?u=RePEc:fip:fedpsp:88805&r=all
  8. By: Yucheng Yang; Yue Pang; Guanhua Huang; Weinan E
    Abstract: The current knowledge system of macroeconomics is built on interactions among a small number of variables, since traditional macroeconomic models can mostly handle only a handful of inputs. Recent work using big data suggests that a much larger number of variables are active in driving the dynamics of the aggregate economy. In this paper, we introduce a knowledge graph (KG) that consists of linkages not only between traditional economic variables but also to new alternative big data variables. We extract these new variables and linkages by applying advanced natural language processing (NLP) tools to the massive textual data of academic literature and research reports. As one example of the potential applications, we use the KG as prior knowledge to select variables for economic forecasting models in macroeconomics. Compared to statistical variable selection methods, KG-based methods achieve significantly higher forecasting accuracy, especially for long-run forecasts.
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.05172&r=all
  9. By: Ma, Ji (The University of Texas at Austin)
    Abstract: This research developed a machine-learning classifier that reliably automates the coding process using the National Taxonomy of Exempt Entities as a schema and remapped the U.S. nonprofit sector. I achieved 90% overall accuracy for classifying the nonprofits into nine broad categories and 88% for classifying them into 25 major groups. The intercoder reliabilities between algorithms and human coders measured by kappa statistics are in the "almost perfect" range of 0.80--1.00. The results suggest that a state-of-the-art machine-learning algorithm can approximate human coders and substantially improve researchers' productivity. I also reassigned multiple category codes to over 439 thousand nonprofits and discovered a considerable amount of organizational activities that were previously ignored. The classifier is an essential methodological prerequisite for large-N and Big Data analyses, and the remapped U.S. nonprofit sector can serve as an important instrument for asking or reexamining fundamental questions of nonprofit studies.
    Date: 2020–10–10
    URL: http://d.repec.org/n?u=RePEc:osf:osfxxx:pt3q9&r=all
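    A schematic of the classification-plus-kappa workflow described above, assuming TF-IDF features and logistic regression (the author's actual features and algorithm are not specified here; the mission statements and categories are toy data, so the printed kappa is not meaningful):

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import cohen_kappa_score
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline

        # hypothetical mission statements with hand-assigned broad categories
        docs = ["food bank serving low-income families",
                "community youth orchestra and music lessons",
                "free clinic providing primary health care",
                "scholarship fund for first-generation students"] * 50
        labels = ["human services", "arts", "health", "education"] * 50

        X_tr, X_te, y_tr, y_te = train_test_split(docs, labels, random_state=0)
        clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                            LogisticRegression(max_iter=1000))
        clf.fit(X_tr, y_tr)
        # kappa treats the model as one coder and the human labels as the other
        print(cohen_kappa_score(y_te, clf.predict(X_te)))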
  10. By: Sean Cao; Wei Jiang; Baozhong Yang; Alan L. Zhang
    Abstract: This paper analyzes how corporate disclosure has been reshaped by machine processors, employed by algorithmic traders, robot investment advisors, and quantitative analysts. Our findings indicate that increasing machine and AI readership, proxied by machine downloads, motivates firms to prepare filings that are more friendly to machine parsing and processing. Moreover, firms with high expected machine downloads manage textual sentiment and audio emotion in ways catered to machine and AI readers, such as by differentially avoiding words that are perceived as negative by computational algorithms as compared to those by human readers, and by exhibiting speech emotion favored by machine learning software processors. The publication of Loughran and McDonald (2011) is instrumental in attributing the change in the measured sentiment to machine and AI readership. While existing research has explored how investors and researchers apply machine learning and computational tools to quantify qualitative information from disclosure and news, this study is the first to identify and analyze the feedback effect on corporate disclosure decisions, i.e., how companies adjust the way they talk knowing that machines are listening.
    JEL: G14 G30
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:27950&r=all
  11. By: Ollech, Daniel; Webel, Karsten
    Abstract: Virtually every seasonal adjustment software package includes an ensemble of seasonality tests for assessing whether a given time series is in fact a candidate for seasonal adjustment. However, such tests are certain to produce either the same result or conflicting results, raising the question of whether there is a method capable of identifying the most informative tests in order (1) to eliminate the seemingly non-informative ones in the former case and (2) to reach a final decision in the more severe latter case. We argue that identifying the seasonal status of a given time series is essentially a classification problem and thus can be solved with machine learning methods. Using simulated seasonal and non-seasonal ARIMA processes that are representative of the Bundesbank's time series database, we compare popular methods with respect to accuracy, interpretability and the availability of unbiased variable importance measures, and find random forests of conditional inference trees to be the method that best balances these key requirements. Applying this method to the seasonality tests implemented in the seasonal adjustment software JDemetra+ finally reveals that the modified QS and Friedman tests yield by far the most informative results.
    Keywords: binary classification, conditional inference trees, correlated predictors, JDemetra+, simulation study, supervised machine learning
    JEL: C12 C14 C22 C45 C63
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:zbw:bubdps:552020&r=all
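    A toy version of the paper's design: simulate seasonal and non-seasonal series, compute a few test-like statistics, and rank them by importance. Note that scikit-learn's RandomForestClassifier stands in for the conditional inference forests used in the paper, and the three features below are simple stand-ins for the seasonality tests in JDemetra+:

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.inspection import permutation_importance

        rng = np.random.default_rng(0)

        def stats(y, period=12):
            """Toy stand-ins for seasonality tests: spread of seasonal means,
            autocorrelation at the seasonal lag, and overall variance."""
            seasonal_means = y.reshape(-1, period).mean(axis=0)
            acf_seasonal = np.corrcoef(y[:-period], y[period:])[0, 1]
            return [seasonal_means.std(), acf_seasonal, y.var()]

        X, label = [], []
        for _ in range(500):
            is_seasonal = int(rng.integers(2))
            t = np.arange(120)
            y = rng.normal(size=120) + is_seasonal * np.sin(2 * np.pi * t / 12)
            X.append(stats(y)); label.append(is_seasonal)
        X = np.array(X)

        rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, label)
        imp = permutation_importance(rf, X, label, n_repeats=20, random_state=0)
        print(dict(zip(["seasonal_spread", "seasonal_acf", "variance"],
                       imp.importances_mean.round(3))))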
  12. By: Eric Benhamou; David Saltiel; Sandrine Ungari; Abhishek Mukhopadhyay
    Abstract: Researchers in the asset management industry have mostly focused on techniques based on financial and risk planning, such as the Markowitz efficient frontier, minimum variance, maximum diversification, or equal risk parity. In parallel, another community in machine learning has started working on reinforcement learning, and more particularly deep reinforcement learning, to solve other decision-making problems for challenging tasks such as autonomous driving, robot learning, and, on a more conceptual side, game solving such as Go. This paper aims to bridge the gap between these two approaches by showing that Deep Reinforcement Learning (DRL) techniques can shed new light on portfolio allocation thanks to a more general optimization setting that casts portfolio allocation as an optimal control problem: not just a one-step optimization, but a continuous control optimization with a delayed reward. The advantages are numerous: (i) DRL maps market conditions directly to actions by design and hence should adapt to a changing environment; (ii) DRL does not rely on traditional financial risk assumptions, such as risk being represented by variance; (iii) DRL can incorporate additional data and be a multi-input method, as opposed to more traditional optimization methods. We present encouraging experimental results using convolutional networks.
    Date: 2020–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.09108&r=all
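    The Markowitz side of the bridge in its simplest closed form: unconstrained mean-variance weights proportional to Σ⁻¹μ, i.e. the one-step optimization that the DRL formulation generalizes (the return and covariance estimates below are hypothetical):

        import numpy as np

        mu = np.array([0.06, 0.04, 0.05])             # hypothetical expected returns
        Sigma = np.array([[0.04, 0.01, 0.00],
                          [0.01, 0.03, 0.01],
                          [0.00, 0.01, 0.05]])        # hypothetical covariance matrix

        w = np.linalg.solve(Sigma, mu)                # tangency direction: Sigma^{-1} mu
        w = w / w.sum()                               # normalize to a fully invested portfolio
        print(w.round(3))
        # A DRL agent replaces this one-step rule with a learned policy that maps the
        # state (e.g., recent returns and current weights) to weights directly,
        # optimizing a delayed, cost-adjusted reward instead.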
  13. By: Patrick T. Harker
    Abstract: Philadelphia Fed President and CEO Patrick T. Harker spoke about the economy, AI, and automation at a Global Interdependence Center event. Harker said the “pandemic has had the effect of accelerating trends that were already present in our society.” These trends include automation and racial disparities in unemployment.
    Keywords: COVID-19
    Date: 2020–10–06
    URL: http://d.repec.org/n?u=RePEc:fip:fedpsp:88840&r=all
  14. By: Taeyoung Doh; Dongho Song; Shu-Kuei X. Yang
    Abstract: We apply a natural language processing algorithm to FOMC statements to construct a new measure of monetary policy stance, including the tone and novelty of a policy statement. We exploit cross-sectional variations across alternative FOMC statements to identify the tone (for example, dovish or hawkish), and contrast the current and previous FOMC statements released after Committee meetings to identify the novelty of the announcement. We then use high-frequency bond prices to compute the surprise component of the monetary policy stance. Our text-based estimates of monetary policy surprises are not sensitive to the choice of bond maturities used in estimation, are highly correlated with forward guidance shocks in the literature, and are associated with lower stock returns after unexpected policy tightening. The key advantage of our approach is that we are able to conduct a counterfactual policy evaluation by replacing the released statement with an alternative statement, allowing us to perform a more detailed investigation at the sentence and paragraph level.
    Keywords: FOMC; Alternative FOMC statements; Counterfactual policy evaluation; Monetary policy stance; Text analysis; Natural language processing
    JEL: E30 E40 E50 G12
    Date: 2020–10–06
    URL: http://d.repec.org/n?u=RePEc:fip:fedkrw:88946&r=all
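    For illustration only, a dictionary-based tone score in the spirit of contrasting a released statement with an alternative (the authors use an NLP algorithm together with high-frequency bond prices; the word lists and sentences below are invented):

        import re

        HAWKISH = {"inflation", "tighten", "tighter", "raise", "overheating"}
        DOVISH = {"accommodative", "patient", "downside", "lower"}

        def tone(statement):
            """Signed tone: (+1 per hawkish word, -1 per dovish word), length-scaled."""
            words = re.findall(r"[a-z]+", statement.lower())
            score = sum((w in HAWKISH) - (w in DOVISH) for w in words)
            return score / max(len(words), 1)

        released = "The Committee will be patient as it assesses downside risks."
        alternative = "The Committee judges inflation pressures warrant a tighter stance."
        print(tone(released), tone(alternative))   # dovish (<0) vs. hawkish (>0)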
  15. By: Huseyin Gurkan (ESMT European School of Management and Technology); Francis de Véricourt (ESMT European School of Management and Technology)
    Abstract: This paper explores how firms that lack expertise in machine learning (ML) can leverage the so-called AI Flywheel effect. This effect designates a virtuous cycle by which, as an ML product is adopted and new user data are fed back to the algorithm, the product improves, enabling further adoption. However, managing this feedback loop is difficult, especially when the algorithm is contracted out. Indeed, the additional data that the AI Flywheel effect generates may change the provider's incentives to improve the algorithm over time. We formalize this problem in a simple two-period moral hazard framework that captures the main dynamics among machine learning, data acquisition, pricing, and contracting. We find that the firm's decisions crucially depend on how the amount of data on which the machine is trained interacts with the provider's effort. If this effort has a more (resp. less) significant impact on accuracy for larger volumes of data, the firm underprices (resp. overprices) the product. Interestingly, these distortions sometimes improve social welfare, which accounts for the customer surplus and the profits of both the firm and the provider. Further, the interaction between incentive issues and the positive externalities of the AI Flywheel effect has important implications for the firm's data collection strategy. In particular, the firm can boost its profit by increasing the product's capacity to acquire usage data only up to a certain level. If the product collects too much data per user, the firm's profit may actually decrease. As a result, the firm should consider reducing its product's data acquisition capacity when its initial dataset for training the algorithm is large enough.
    Keywords: Data, machine learning, data product, pricing, incentives, contracting
    Date: 2020–03–03
    URL: http://d.repec.org/n?u=RePEc:esm:wpaper:esmt-20-01_r1&r=all
  16. By: Stubbers, Michaëla; Holvoet, Nathalie
    Abstract: This research presents a scoping review of 53 systematically selected studies that employ big data to measure and monitor poverty concepts. The primary aim of the review is to explore if and how big data can be used to replace or complement national and international statistics in identifying, measuring, and monitoring poverty, economic development, and inequality at a macro level. The analysis reveals that (1) the relevance of the field has so far been driven by data availability; (2) researchers from different fields are involved, as the data types and analytics employed stem from various research domains, but researchers from the global south are underrepresented; (3) the main data types used are Call Detail Records (CDR) and satellite imagery, while night-light data are frequently associated with economic development; (4) the choice of certain data types rests on the hypothesis that manifestations of poverty and development leave traces that are captured by big data sources; (5) big data techniques are so far mainly applied for feature extraction, while classical statistical techniques are preferred for analysis. With this in mind, the review highlights the challenges and opportunities of using big data for development statistics and briefly discusses the implications for monitoring and evaluation, showing that it is highly unlikely that big data statistics will replace traditionally generated development data any time soon. Many barriers need to be overcome, including technical challenges, stability and sustainability issues, and institutional and legal aspects. In the meantime, big data undoubtedly offers a major opportunity to improve the accuracy, timeliness, and relevance of socio-economic indicators, especially where no data are available or where quality is highly questionable.
    Keywords: big data; poverty measurement; poverty
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:iob:dpaper:202003&r=all
  17. By: Rangan Gupta (Department of Economics, University of Pretoria, Pretoria, 0002, South Africa); Christian Pierdzioch (Department of Economics, Helmut Schmidt University, Holstenhofweg 85, P.O.B. 700822, 22008 Hamburg, Germany); Afees A. Salisu (Centre for Econometric & Allied Research, University of Ibadan, Ibadan, Nigeria)
    Abstract: We analyze the predictive role of oil-price uncertainty for changes in the UK unemployment rate using more than a century of monthly data covering the period from 1859 to 2020. To this end, we use a machine-learning technique known as random forests. Random forests make it possible to model the potentially nonlinear link between oil-price uncertainty and subsequent changes in the unemployment rate in an entirely data-driven way, while controlling for the impact of several other macroeconomic variables and other macroeconomic and financial uncertainties. Upon estimating random forests on rolling estimation windows, we find evidence that oil-price uncertainty predicts out-of-sample changes in the unemployment rate, and that the relative importance of oil-price uncertainty has undergone substantial swings over the history of the modern petroleum industry, which started with the drilling of the first oil well at Titusville (Pennsylvania, United States) in 1859.
    Keywords: Machine learning, Random forests, Oil uncertainty, Macroeconomic and financial uncertainties, Unemployment rate, United Kingdom
    JEL: C22 C53 E24 E43 F31 G10 Q02
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:pre:wpaper:202095&r=all
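    A sketch of the rolling-estimation-window design on simulated data (the data-generating process, window length, and forest settings are assumptions; the paper uses 150 years of actual monthly UK data):

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(1)
        T = 250
        oil_unc = np.abs(rng.normal(size=T))        # hypothetical uncertainty proxy
        d_unemp = np.empty(T)                       # hypothetical unemployment changes,
        d_unemp[0] = 0.0                            # driven by lagged oil uncertainty
        for t in range(1, T):
            d_unemp[t] = 0.4 * oil_unc[t - 1] + rng.normal(0, 0.3)

        X = np.column_stack([oil_unc[:-1], d_unemp[:-1]])   # information known at month t
        y = d_unemp[1:]                                     # change over the next month

        window = 120
        preds = []
        for t in range(window, len(y)):             # re-fit on each rolling window
            rf = RandomForestRegressor(n_estimators=100, random_state=0)
            rf.fit(X[t - window:t], y[t - window:t])
            preds.append(rf.predict(X[t:t + 1])[0])

        actual, preds = y[window:], np.array(preds)
        ss_res = ((actual - preds) ** 2).sum()
        ss_tot = ((actual - actual.mean()) ** 2).sum()
        print("out-of-sample R^2:", 1 - ss_res / ss_tot)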
  18. By: Matt Marx; Aaron Fuegi
    Abstract: We curate and characterize a complete set of citations from patents to scientific articles, including nearly 16 million from the full text of USPTO and EPO patents. Combining heuristics and machine learning, we achieve 25% higher performance than machine learning alone. At 99.4% accuracy, we achieve coverage of 87.6%, and coverage above 90% with accuracy above 93%. Performance is evaluated with a set of 5,939 randomly sampled, cross-verified "known good" citations, which the authors have never seen. We compare these "in-text" citations with the "official" citations on the front page of patents. In-text citations are more diverse temporally, geographically, and topically. They are less self-referential and less likely to be recycled from one patent to the next. That said, in-text citations have been overshadowed by front-page citations in the past few decades, dropping from 80% of all paper-to-patent citations to less than 40%. In replicating two published articles that use only front-page citations, we show that failing to capture citations in the body text leads to understating the relationship between academic science and commercial invention. All patent-to-article citations, as well as the known-good test set, are available at http://relianceonscience.org.
    JEL: O31 O32 O33 O34
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:27987&r=all
  19. By: Shenhao Wang; Baichuan Mo; Jinhua Zhao
    Abstract: Researchers often treat data-driven and theory-driven models as two disparate or even conflicting methods in travel behavior analysis. However, the two methods are highly complementary because data-driven methods are more predictive but less interpretable and robust, while theory-driven methods are more interpretable and robust but less predictive. Using their complementary nature, this study designs a theory-based residual neural network (TB-ResNet) framework, which synergizes discrete choice models (DCMs) and deep neural networks (DNNs) based on their shared utility interpretation. The TB-ResNet framework is simple, as it uses a (δ, 1-δ) weighting to take advantage of DCMs' simplicity and DNNs' richness, and to prevent underfitting from the DCMs and overfitting from the DNNs. This framework is also flexible: three instances of TB-ResNets are designed based on the multinomial logit model (MNL-ResNets), prospect theory (PT-ResNets), and hyperbolic discounting (HD-ResNets), which are tested on three data sets. Compared to pure DCMs, the TB-ResNets provide greater prediction accuracy and reveal a richer set of behavioral mechanisms, owing to the utility function augmented by the DNN component. Compared to pure DNNs, the TB-ResNets can modestly improve prediction and significantly improve interpretation and robustness, because the DCM component stabilizes the utility functions and input gradients. Overall, this study demonstrates that it is both feasible and desirable to synergize DCMs and DNNs by combining their utility specifications under a TB-ResNet framework. Although some limitations remain, this framework is an important first step to create mutual benefits between DCMs and DNNs for travel behavior modeling, with joint improvement in prediction, interpretation, and robustness.
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.11644&r=all
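    A minimal PyTorch sketch of the (δ, 1-δ) utility blend, assuming δ weights the linear DCM utility and 1-δ the DNN utility (the weighting direction, layer sizes, and data below are assumptions, not the paper's specification):

        import torch
        import torch.nn as nn

        class TBResNetSketch(nn.Module):
            """Utility per alternative = delta * (x @ beta) + (1 - delta) * DNN(x)."""
            def __init__(self, n_feat, delta=0.8):
                super().__init__()
                self.delta = delta
                self.beta = nn.Linear(n_feat, 1, bias=False)   # MNL-style linear utility
                self.dnn = nn.Sequential(nn.Linear(n_feat, 32), nn.ReLU(),
                                         nn.Linear(32, 1))     # flexible residual utility

            def forward(self, x):                # x: (batch, n_alternatives, n_feat)
                u = self.delta * self.beta(x) + (1 - self.delta) * self.dnn(x)
                return u.squeeze(-1)             # logits over alternatives

        x = torch.randn(64, 3, 5)                # hypothetical: 3 alternatives, 5 attributes
        choice = torch.randint(0, 3, (64,))      # hypothetical observed choices
        model = TBResNetSketch(n_feat=5)
        loss = nn.CrossEntropyLoss()(model(x), choice)   # softmax choice probabilities
        loss.backward()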
  20. By: Elior Nehemya; Yael Mathov; Asaf Shabtai; Yuval Elovici
    Abstract: In recent years, machine learning has become prevalent in numerous tasks, including algorithmic trading. Stock market traders utilize learning models to predict the market's behavior and execute an investment strategy accordingly. However, learning models have been shown to be susceptible to input manipulations called adversarial examples. Yet the trading domain remains largely unexplored in the context of adversarial learning, mainly because of the rapid changes in the market, which impair the attacker's ability to create a real-time attack. In this study, we present a realistic scenario in which an attacker manipulates algorithmic trading bots by altering the input data stream in real time. The attacker creates a universal perturbation that is agnostic to the target model and time of use, while also remaining imperceptible. We evaluate our attack on a real-world market data stream and target three different trading architectures. We show that our perturbation can fool the model at future, unseen data points, in both white-box and black-box settings. We believe these findings should serve as an alert to the finance community about the threats in this area and prompt further research on the risks associated with using automated learning models in the finance domain.
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.09246&r=all
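    A schematic of crafting one universal perturbation by gradient ascent on the victim's loss under a small-norm budget (a white-box, FGSM-style accumulation; the victim model, data, and budget are hypothetical, and this is not the authors' exact attack):

        import torch
        import torch.nn as nn

        # hypothetical victim: classifies buy/sell from a 30-step normalized price window
        victim = nn.Sequential(nn.Flatten(), nn.Linear(30, 16), nn.ReLU(), nn.Linear(16, 2))
        windows = torch.randn(256, 1, 30)           # hypothetical market windows
        labels = torch.randint(0, 2, (256,))        # victim's "correct" decisions

        eps = 0.05                                  # budget keeping the change imperceptible
        delta = torch.zeros(1, 1, 30)               # one perturbation reused on every window
        for _ in range(50):
            delta.requires_grad_(True)
            loss = nn.CrossEntropyLoss()(victim(windows + delta), labels)
            loss.backward()
            with torch.no_grad():                   # ascend the loss, stay inside the budget
                delta = (delta + 0.01 * delta.grad.sign()).clamp(-eps, eps)

        with torch.no_grad():
            flipped = (victim(windows + delta).argmax(1)
                       != victim(windows).argmax(1)).float().mean()
        print("fraction of decisions flipped:", flipped.item())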
  21. By: Huber, Martin; Imhof, David
    Abstract: We investigate whether statistical screening methods for detecting bid-rigging cartels, originally developed using Swiss data, transfer to Japan. We find that combining screens for the distribution of bids in tenders with machine learning to classify collusive vs. competitive tenders yields a correct classification rate of 88% to 93% when training and testing the method on Japanese data from the so-called Okinawa bid-rigging cartel. As in Switzerland, bid rigging in Okinawa reduced the variance and increased the asymmetry of the distribution of bids. When pooling the data from both countries for training and testing the classification models, we still obtain correct classification rates of 82% to 88%. However, when training the models on data from one country to test their performance on data from the other, rates drop substantially, because some screens for competitive Japanese tenders resemble those for collusive Swiss tenders. Our results thus suggest that a country's institutional context matters for the distribution of bids, such that country-specific training of classification models is preferable to applying trained models across borders, even though some screens turn out to be more stable across countries than others.
    Keywords: Bid rigging; screening methods; machine learning; random forest; ensemble methods
    JEL: C21 C45 C52 D22 D40 K40
    Date: 2020–10–26
    URL: http://d.repec.org/n?u=RePEc:fri:fribow:fribow00519&r=all
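    The two screens highlighted above can be computed per tender in a few lines: lower dispersion and stronger asymmetry of bids flag possible collusion (the bid values below are hypothetical):

        import numpy as np
        from scipy.stats import skew

        def screens(bids):
            """Per-tender screens: coefficient of variation and asymmetry of bids."""
            bids = np.asarray(bids, dtype=float)
            cv = bids.std(ddof=1) / bids.mean()
            return cv, skew(bids)

        competitive = [100.0, 104.5, 96.8, 110.2, 92.3]   # dispersed, roughly symmetric
        collusive = [100.0, 100.8, 101.1, 101.3, 101.6]   # cover bids huddle above the winner
        print(screens(competitive))
        print(screens(collusive))
        # In the paper, vectors of such screens per tender are fed to random forests
        # (and other ensembles) trained to label tenders as collusive vs. competitive.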
  22. By: Patrick Bareinz; Silke Uebelmesser
    Abstract: We conduct a survey experiment on the effect of information provision on attitudes towards immigration in Germany. The focus lies on two theory-based economic channels, labor market and welfare state concerns, and immigration policy preferences. Using probability-based representative survey data, we experimentally vary the quantity and the type of information provided to respondents. We find that a bundle of information on both the share and the unemployment rate of foreigners robustly decreases welfare state concerns about immigration. There are slightly less pronounced effects on the labor market and policy channels. Further data-driven analyses reveal heterogeneity in treatment effects. Our findings therefore suggest that careful composition and targeting of information interventions can increase their effectiveness in the public debate on immigration.
    Keywords: immigration attitudes, survey experiment, information provision, belief updating, welfare state, labor market, machine learning
    JEL: C90 D83 F22 J15
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:ces:ceswps:_8635&r=all
  23. By: Peiwan Wang; Lu Zong
    Abstract: This study explores and extends crisis predictability by systematically reviewing and comparing a full range of early warning models along two dimensions: crisis identification and predictive modeling. Based on empirical results for Chinese currency and stock markets, three findings emerge: (i) the SWARCH model, conditional on an elastic thresholding methodology, can most accurately classify crisis observations and greatly contributes to boosting predictive precision; (ii) stylized machine learning models are preferred, given their higher predictive precision and greater practical benefit; and (iii) leading factors signal crises in diverse ways across different types of markets and prediction periods.
    Date: 2020–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2010.10132&r=all
  24. By: KONISHI Yoko; SAITO Takashi; ISHIKAWA Toshiki; IGEI Naoya
    Abstract: Japan has been recognized as having successfully controlled the spread of the coronavirus disease (COVID-19) pandemic. This study aims to gather insights for combating the spread of infection in daily life by observing purchasing behavior. We use point-of-sale data from supermarkets, convenience stores, home centers, drug stores, and electronics retail stores for a nationwide analysis. Our analysis revealed the following. First, the Japanese actively prevented the spread of infection by voluntarily wearing masks, using alcohol-based disinfectants, and gargling. Second, people willingly stayed home during the semi-lockdown. Third, infection prevention essentials continued to be purchased during periods of both low and high levels of infection. We conclude that continuing to guard against the pandemic with masks, hand washing and sanitizing, and gargling, along with spending more time at home and maintaining safe distancing, will be effective in reducing the spread of the virus. Finally, infections and deaths were primarily concentrated in the metropolitan area and the Kansai region, where the nature of the spread of infection differed from that in small and middle-sized prefectures.
    Date: 2020–09
    URL: http://d.repec.org/n?u=RePEc:eti:rdpsjp:20037&r=all
  25. By: Nowosad, Jakub
    Abstract: *Context* Pattern-based spatial analysis provides methods to describe and quantitatively compare spatial patterns for categorical raster datasets. It allows for spatial search, change detection, and clustering of areas with similar patterns. *Objectives* We developed an R package **motif** as a set of open-source tools for pattern-based spatial analysis. *Methods* This package provides most of the functionality of existing software (except spatial segmentation), but also extends the existing ideas through support for multi-layer raster datasets. It accepts larger-than-RAM datasets and works across all of the major operating systems. *Results* In this study, we describe the software design of the tool, its capabilities, and present four case studies. They include calculation of spatial signatures based on land cover data for regular and irregular areas, search for regions with similar patterns of geomorphons, detection of changes in land cover patterns, and clustering of areas with similar spatial patterns of land cover and landforms. *Conclusions* The methods implemented in **motif** should be useful in a wide range of applications, including land management, sustainable development, environmental protection, forest cover change and urban growth monitoring, and agriculture expansion studies. The **motif** package homepage is https://nowosad.github.io/motif.
    Date: 2020–10–17
    URL: http://d.repec.org/n?u=RePEc:osf:ecoevo:kj7fu&r=all

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.