on Big Data
By: | Hofer, Martin (Vienna University of Economics and Business); Sako, Tomas (Freelance data scientist); Martinez, Jr., Arturo (Asian Development Bank); Addawe, Mildred (Asian Development Bank); Durante, Ron Lester (Asian Development Bank) |
Abstract: | The spatial granularity of poverty statistics can have a significant impact on the efficiency of targeting resources meant to improve the living conditions of the poor. However, achieving granularity typically requires increasing the sample sizes of surveys on household income and expenditure or living standards, an option that is not always practical for the government agencies that conduct these surveys. Previous studies that examined the use of innovative (geospatial) data sources, such as high-resolution satellite imagery, suggest that such methods may offer an alternative approach to producing granular poverty maps. This study outlines a computational framework to enhance the spatial granularity of government-published poverty estimates using a deep-layer computer vision technique applied to publicly available medium-resolution satellite imagery, household surveys, and census data from the Philippines and Thailand. By doing so, the study explores a potentially more cost-effective alternative method for poverty estimation. The results suggest that, even using publicly accessible satellite imagery, whose resolutions are not as fine as those of commercially sourced images, predictions after calibration generally aligned with the distributional structure of government-published poverty estimates. The study further contributes to the existing literature by examining the robustness of the resulting estimates to user-specified algorithmic parameters and model specifications. |
Keywords: | big data; computer vision; data for development; machine learning algorithm; official statistics; poverty; SDG |
JEL: | C19 D31 I32 O15 |
Date: | 2020–12–29 |
URL: | http://d.repec.org/n?u=RePEc:ris:adbewp:0629&r= |
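The computational framework above is only summarised in the abstract; the following is a minimal illustrative sketch, assuming a pretrained convolutional backbone (here torchvision's ResNet-18) is used as a fixed feature extractor for satellite tiles whose features are then regressed on published poverty rates. The file name, column names, and ridge regression are hypothetical choices, not the authors' specification.

```python
# Illustrative sketch only: extract CNN features from satellite tiles and
# regress them on published poverty rates (file and column names are assumed).
import numpy as np
import pandas as pd
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.linear_model import Ridge

# Pretrained backbone as a fixed feature extractor (not the authors' exact model).
backbone = models.resnet18(pretrained=True)
backbone.fc = torch.nn.Identity()   # drop the classification head
backbone.eval()

preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor(),
                        T.Normalize(mean=[0.485, 0.456, 0.406],
                                    std=[0.229, 0.224, 0.225])])

def tile_features(paths):
    """Return one feature vector per satellite image tile."""
    feats = []
    with torch.no_grad():
        for p in paths:
            x = preprocess(Image.open(p).convert("RGB")).unsqueeze(0)
            feats.append(backbone(x).squeeze(0).numpy())
    return np.vstack(feats)

# Hypothetical inputs: tile paths and matching area-level poverty rates.
df = pd.read_csv("areas.csv")               # columns: tile_path, poverty_rate
X = tile_features(df["tile_path"])
model = Ridge(alpha=1.0).fit(X, df["poverty_rate"])
print("In-sample R^2:", model.score(X, df["poverty_rate"]))
```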
By: | Puttanapong, Nattapong (Thammasat University); Martinez, Jr., Arturo (Asian Development Bank); Addawe, Mildred (Asian Development Bank); Bulan, Joseph (Asian Development Bank); Durante, Ron Lester (Asian Development Bank); Martillan, Marymell (Asian Development Bank) |
Abstract: | Poverty statistics are conventionally compiled using data from household income and expenditure surveys or living standards surveys. This study examines an alternative approach to estimating poverty by investigating whether readily available geospatial data can accurately predict the spatial distribution of poverty in Thailand. In particular, the geospatial data examined in this study include night light intensity, land cover, vegetation index, land surface temperature, built-up areas, and points of interest. The study also compares the predictive performance of various econometric and machine learning methods, such as generalized least squares, neural networks, random forests, and support vector regression. Results suggest that the intensity of night lights and other variables that approximate population density are highly associated with the proportion of an area’s population living in poverty. The random forest technique yielded the highest prediction accuracy among the methods considered in this study, perhaps owing to its capability to fit complex association structures even with small and medium-sized datasets. Moving forward, additional studies are needed to investigate whether the relationships observed here remain stable over time and can therefore be used to approximate the prevalence of poverty in years when household surveys on income and expenditure are not conducted but data on geospatial correlates of poverty are available. |
Keywords: | big data; computer vision; data for development; machine learning algorithm; multidimensional poverty; official statistics; poverty; SDG; Thailand |
JEL: | C19 D31 I32 O15 |
Date: | 2020–12–29 |
URL: | http://d.repec.org/n?u=RePEc:ris:adbewp:0630&r= |
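For concreteness, a random forest regression of the kind described in the abstract above could be sketched as follows; the covariate names, file name, and hyperparameters are hypothetical placeholders rather than the study's actual setup.

```python
# Illustrative sketch: predict area-level poverty rates from geospatial covariates.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

df = pd.read_csv("geospatial_covariates.csv")    # hypothetical area-level table
features = ["night_light_intensity", "ndvi", "land_surface_temp",
            "built_up_share", "poi_count"]        # hypothetical column names
X, y = df[features], df["poverty_rate"]

rf = RandomForestRegressor(n_estimators=500, random_state=0)
scores = cross_val_score(rf, X, y, cv=5, scoring="r2")
print("Cross-validated R^2:", scores.mean())

rf.fit(X, y)
# Feature importances indicate which covariates drive the predictions.
print(dict(zip(features, rf.feature_importances_)))
```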
By: | Zihao Zhang; Stefan Zohren |
Abstract: | We design multi-horizon forecasting models for limit order book (LOB) data using deep learning techniques. Unlike standard structures in which a single prediction is made, we adopt encoder-decoder models with sequence-to-sequence and attention mechanisms to generate a forecasting path. Our methods achieve performance comparable to state-of-the-art algorithms at short prediction horizons and, importantly, outperform them when generating predictions over long horizons by leveraging the multi-horizon setup. Because encoder-decoder models rely on recurrent neural layers, they generally suffer from a slow training process. To remedy this, we experiment with novel hardware, the so-called Intelligence Processing Units (IPUs) produced by Graphcore. IPUs are specifically designed for machine intelligence workloads with the aim of speeding up computation. We show that in our setup this leads to significantly faster training times than training the same models with GPUs. |
Date: | 2021–05 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2105.10430&r= |
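The abstract above does not spell out the architecture; below is a minimal PyTorch sketch of an encoder-decoder (sequence-to-sequence) network that maps a window of LOB features onto a multi-horizon forecast path. The dimensions, the greedy step-by-step decoding loop, and the absence of an attention layer are simplifying assumptions, not the authors' model.

```python
# Illustrative seq2seq sketch for multi-horizon forecasting (not the paper's model).
import torch
import torch.nn as nn

class Seq2SeqForecaster(nn.Module):
    def __init__(self, n_features, hidden=64, horizon=10):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTMCell(1, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        # x: (batch, lookback, n_features), e.g. recent limit order book snapshots
        _, (h, c) = self.encoder(x)
        h, c = h.squeeze(0), c.squeeze(0)
        y_prev = torch.zeros(x.size(0), 1)     # start token for the decoder
        outputs = []
        for _ in range(self.horizon):          # roll the decoder forward one step at a time
            h, c = self.decoder(y_prev, (h, c))
            y_prev = self.head(h)
            outputs.append(y_prev)
        return torch.cat(outputs, dim=1)       # (batch, horizon) forecast path

model = Seq2SeqForecaster(n_features=40, horizon=10)
dummy = torch.randn(32, 100, 40)               # 32 samples, 100-step lookback, 40 LOB features
print(model(dummy).shape)                      # torch.Size([32, 10])
```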
By: | Tim Leung; Theodore Zhao |
Abstract: | We present the method of complementary ensemble empirical mode decomposition (CEEMD) and Hilbert-Huang transform (HHT) for analyzing nonstationary financial time series. This noise-assisted approach decomposes any time series into a number of intrinsic mode functions, along with the corresponding instantaneous amplitudes and instantaneous frequencies. Different combinations of modes allow us to reconstruct the time series using components of different timescales. We then apply Hilbert spectral analysis to define and compute the associated instantaneous energy-frequency spectrum to illustrate the properties of various timescales embedded in the original time series. Using HHT, we generate a collection of new features and integrate them into machine learning models, such as regression tree ensemble, support vector machine (SVM), and long short-term memory (LSTM) neural network. Using empirical financial data, we compare several HHT-enhanced machine learning models in terms of forecasting performance. |
Date: | 2021–05 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2105.10871&r= |
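A minimal sketch of the decomposition step follows, using the PyEMD package's CEEMDAN (a close relative of the CEEMD variant named above) and SciPy's Hilbert transform to obtain instantaneous amplitudes and frequencies; the synthetic input series is a stand-in for an actual financial time series.

```python
# Illustrative sketch: decompose a series into IMFs and compute instantaneous
# amplitude/frequency via the Hilbert transform (synthetic data, not the paper's).
import numpy as np
from PyEMD import CEEMDAN              # pip install EMD-signal
from scipy.signal import hilbert

t = np.linspace(0, 1, 1000)
series = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t) \
         + 0.1 * np.random.randn(t.size)          # synthetic stand-in for returns

imfs = CEEMDAN()(series)               # intrinsic mode functions, fastest to slowest oscillations
fs = 1.0 / (t[1] - t[0])               # sampling frequency of the synthetic grid

for k, imf in enumerate(imfs):
    analytic = hilbert(imf)
    amplitude = np.abs(analytic)                          # instantaneous amplitude
    phase = np.unwrap(np.angle(analytic))
    frequency = np.diff(phase) / (2 * np.pi) * fs         # instantaneous frequency
    print(f"IMF {k}: mean amplitude {amplitude.mean():.3f}, "
          f"mean frequency {frequency.mean():.1f} Hz")

# These per-mode amplitudes/frequencies can then be fed as features into
# regression tree ensembles, SVMs, or LSTMs as described in the abstract.
```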
By: | Xin Jin |
Abstract: | This paper presents a framework for imitating the price behavior of the underlying stock for reinforcement learning of option prices. We use accessible features of equities pricing data to construct a non-deterministic Markov decision process that models stock price behavior driven by the principal investor's decision making. However, the low signal-to-noise ratio and instability inherent in equity markets make it challenging to determine the state transition (price change) that follows an action (the principal investor's decision) and to decide an action based on the current state (spot price). To overcome these challenges, we resort to a Bayesian deep neural network to compute the predictive distribution of the state transition induced by an action. Additionally, instead of exploring a state-action relationship to formulate a policy, we seek an episode-based visible-hidden state-action relationship to probabilistically imitate the principal investor's successive decision making. Our algorithm then maps the imitative principal investor's decisions to simulated stock price paths via a Bayesian deep neural network. Finally, the optimal option price is learned by reinforcement through maximizing the cumulative risk-adjusted return of a dynamically hedged portfolio over simulated price paths of the underlying. |
Date: | 2021–05 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2105.11376&r= |
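The Bayesian deep neural network in the abstract above is not specified; a common, lightweight way to obtain a predictive distribution over the next price change is Monte Carlo dropout, sketched below purely as an illustrative stand-in (the architecture and the encoding of the state are assumptions).

```python
# Illustrative Monte Carlo dropout sketch: approximate a predictive distribution
# over the next price change given the current state (not the paper's exact model).
import torch
import torch.nn as nn

class DropoutRegressor(nn.Module):
    def __init__(self, n_state, hidden=64, p=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_state, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, 1))

    def forward(self, x):
        return self.net(x)

def predictive_distribution(model, state, n_samples=200):
    """Keep dropout active at inference time and sample the output repeatedly."""
    model.train()                      # dropout stays on
    with torch.no_grad():
        draws = torch.stack([model(state) for _ in range(n_samples)])
    return draws.mean(0), draws.std(0)  # predictive mean and uncertainty

model = DropoutRegressor(n_state=5)
state = torch.randn(1, 5)              # hypothetical encoded state (spot price, etc.)
mean, std = predictive_distribution(model, state)
print(mean.item(), std.item())
```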
By: | Ozan Aksoy (Centre for Quantitative Social Sciences in the Social Research Institute, University College London) |
Abstract: | In this study I analyse, through machine learning, the content of all Friday khutbas (sermons) read to millions of citizens in thousands of mosques in Turkey since 2015. I focus on six non-religious and recurrent topics that feature in the sermons, namely business, family, nationalism, health, trust, and patience. I demonstrate that the content of the sermons responds strongly to events of national importance. I then link the Friday sermons with ~4.8 million tweets on these topics to study whether and how the content of sermons affects social media behaviour. I find generally large effects of the sermons on tweets, but there is also heterogeneity by topic: the effect is strongest for nationalism, patience, and health, and weakest for business. Overall, these results show that religious institutions in Turkey are influential in shaping the public’s social media content and that this influence is mainly prevalent on salient issues. More generally, they show that mass offline religious activity can have strong effects on social media behaviour. |
Keywords: | text-as-data analysis, computational social science, social media, religion, Islam, Turkey |
JEL: | C63 N35 Z12 |
Date: | 2021–05–01 |
URL: | http://d.repec.org/n?u=RePEc:qss:dqsswp:2117&r= |
By: | Huseyin Gurkan (ESMT European School of Management and Technology); Francis de Véricourt (ESMT European School of Management and Technology) |
Abstract: | This paper explores how firms that lack expertise in machine learning (ML) can leverage the so-called AI Flywheel effect. This effect designates a virtuous cycle by which, as an ML product is adopted and new user data are fed back to the algorithm, the product improves, enabling further adoptions. However, managing this feedback loop is difficult, especially when the algorithm is contracted out. Indeed, the additional data that the AI Flywheel effect generates may change the provider's incentives to improve the algorithm over time. We formalize this problem in a simple two-period moral hazard framework that captures the main dynamics among ML, data acquisition, pricing, and contracting. We find that the firm's decisions crucially depend on how the amount of data on which the machine is trained interacts with the provider's effort. If this effort has a more (less) significant impact on accuracy for larger volumes of data, the firm underprices (overprices) the product. Interestingly, these distortions sometimes improve social welfare, which accounts for the customer surplus and profits of both the firm and provider. Further, the interaction between incentive issues and the positive externalities of the AI Flywheel effect has important implications for the firm's data collection strategy. In particular, the firm can boost its profit by increasing the product's capacity to acquire usage data only up to a certain level. If the product collects too much data per user, the firm's profit may actually decrease, i.e., more data is not necessarily better. As a result, the firm should consider reducing its product's data acquisition capacity when its initial dataset to train the algorithm is large enough. |
Keywords: | Data, machine learning, data product, pricing, incentives, contracting |
Date: | 2020–03–03 |
URL: | http://d.repec.org/n?u=RePEc:esm:wpaper:esmt-20-01_r2&r= |
By: | Zeinab Rouhollahi |
Abstract: | Recently, financial institutions have been dealing with an increase in financial crime. In this context, financial services firms have started to improve their vigilance and to use new technologies and approaches to identify and predict financial fraud and crime possibilities. This task is challenging, as institutions need to upgrade their data and analytics capabilities to enable new technologies such as Artificial Intelligence (AI) to predict and detect financial crimes. In this paper, we take a step towards AI-enabled financial crime detection in general, and money laundering detection in particular, to address this challenge. We study and analyse recent work on financial crime detection and present a novel model to detect money laundering cases with minimal need for human intervention. |
Date: | 2021–05 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2105.10866&r= |
By: | Kok Fong See (Universiti Sains Malaysia, Malaysia); Shawna Grosskopf (Oregon State University, United States); Vivian Valdmanis (Western Michigan University, United States); Valentin Zelenyuk (School of Economics and Centre for Efficiency and Productivity Analysis (CEPA) at The University of Queensland, Australia) |
Abstract: | Not only does healthcare play a key role in a country’s economy, but it is also one of the fastest-growing sectors in most countries, resulting in rising expenditures. In turn, efficiency and productivity analyses of the healthcare industry have attracted attention from a wide variety of interested parties, including academics, hospital administrators, and policy makers. As a result, a very large number of studies of efficiency and productivity in the healthcare industry have appeared over the past three decades in a variety of outlets. In this paper, we conduct a comprehensive and systematic review of these studies with the aid of modern machine learning methods for bibliometric analysis. This approach facilitated our identification and analysis of patterns and clusters in the data from 477 efficiency and productivity articles on the healthcare industry from 1983 to 2019, produced by nearly 1,000 authors and published in a multitude of academic journals. Leveraging such ‘biblioanalytics’, combined with our own understanding of the field, we then highlight the trends and possible future of efficiency and productivity studies in healthcare. |
Date: | 2021–05 |
URL: | http://d.repec.org/n?u=RePEc:qld:uqcepa:161&r= |
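The 'biblioanalytics' above are not detailed in the abstract; one standard bibliometric-clustering step, TF-IDF vectorisation of article abstracts followed by k-means, is sketched below with hypothetical file and column names.

```python
# Illustrative sketch: cluster article abstracts to surface topical groups.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

df = pd.read_csv("healthcare_efficiency_articles.csv")   # hypothetical: column "abstract"
tfidf = TfidfVectorizer(stop_words="english", max_features=5000)
X = tfidf.fit_transform(df["abstract"])

km = KMeans(n_clusters=8, random_state=0, n_init=10).fit(X)
df["cluster"] = km.labels_

# Top terms per cluster give a rough label for each research strand.
terms = tfidf.get_feature_names_out()
for k, centre in enumerate(km.cluster_centers_):
    top = centre.argsort()[-8:][::-1]
    print(k, [terms[i] for i in top])
```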
By: | Yong Shi; Wei Dai; Wen Long; Bo Li |
Abstract: | The Gaussian process with a deep kernel is an extension of the classic GP regression model in which the kernel function is constructed by deploying deep learning techniques such as long short-term memory (LSTM) networks. A Gaussian process with a kernel learned by an LSTM, abbreviated GP-LSTM, has the advantage of capturing the complex dependency of financial sequential data while retaining the ability of probabilistic inference. However, to the best of our knowledge, the deep-kernel Gaussian process has not been applied to forecast conditional returns and volatility in financial markets. In this paper, a grid search algorithm for hyper-parameter optimization is integrated with GP-LSTM to predict both the conditional mean and the volatility of stock returns, which are then combined to calculate the conditional Sharpe ratio for constructing a long-short portfolio. The experiments are performed on a dataset covering all constituents of the Shenzhen Stock Exchange Component Index. Based on the empirical results, we find that the GP-LSTM model provides more accurate forecasts of stock returns and volatility, as jointly evaluated by the performance of the constructed portfolios. Further sub-period analysis indicates that the superiority of the GP-LSTM model over the benchmark models stems from better performance in highly volatile periods. |
Date: | 2021–05 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2105.12293&r= |
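The portfolio-construction step described above, combining predicted conditional means and volatilities into a conditional Sharpe ratio and going long the top-ranked stocks while shorting the bottom-ranked ones, can be sketched as follows; the predicted arrays are random placeholders for GP-LSTM outputs.

```python
# Illustrative sketch: build a long-short portfolio from predicted means and volatilities.
import numpy as np

rng = np.random.default_rng(0)
n_stocks = 100
pred_mean = rng.normal(0.0005, 0.002, n_stocks)   # placeholder for GP-LSTM mean forecasts
pred_vol = rng.uniform(0.01, 0.05, n_stocks)      # placeholder for GP-LSTM volatility forecasts

cond_sharpe = pred_mean / pred_vol                # conditional Sharpe ratio per stock
ranked = np.argsort(cond_sharpe)

k = 10                                            # decile-style long-short
weights = np.zeros(n_stocks)
weights[ranked[-k:]] = 1.0 / k                    # long the highest conditional Sharpe
weights[ranked[:k]] = -1.0 / k                    # short the lowest

realized = rng.normal(0.0005, 0.02, n_stocks)     # placeholder realized next-period returns
print("Portfolio return:", weights @ realized)
```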
By: | Joan Huang (Reserve Bank of Australia); John Simon (Reserve Bank of Australia) |
Abstract: | High-quality central bank communication can improve the effectiveness of monetary policy and is an essential element in providing greater central bank transparency. There is, however, no agreement on what high-quality communication looks like. To shed light on this, we investigate 3 important aspects of central bank communication. We focus on how different audiences perceive the readability and degree of reasoning within various economic publications; providing the reasons for decisions is a critical element of transparency. We find that there is little correlation between perceived readability and reasoning in the economic communications we analyse, which highlights that commonly used measures of readability can miss important aspects of communication. We also find that perceptions of communication quality can vary significantly between audiences; one size does not fit all. To dig deeper we use machine learning techniques and develop a model that predicts the way different audiences rate the readability of and reasoning within texts. The model highlights that simpler writing is not necessarily more readable nor more revealing of the author's reasoning. The results also show how readability and reasoning vary within and across documents; good communication requires a variety of styles within a document, each serving a different purpose, and different audiences need different styles. Greater central bank transparency and more effective communication require an emphasis not just on greater readability of a single document, but also on setting out the reasoning behind conclusions in a variety of documents that each meet the needs of different audiences. |
Keywords: | central bank communications; machine learning; natural language processing; readability; central bank transparency |
JEL: | C61 C83 D83 E58 Z13 |
Date: | 2021–05 |
URL: | http://d.repec.org/n?u=RePEc:rba:rbardp:rdp2021-05&r= |
By: | Boot, Arnoud W A; Hoffmann, Peter; Laeven, Luc; Ratnovski, Lev |
Abstract: | We study the effects of technological change on financial intermediation, distinguishing between innovations in information (data collection and processing) and communication (relationships and distribution). Both follow historic trends towards an increased use of hard information and less in-person interaction, trends which are now accelerating rapidly. We point to more recent innovations, such as the combination of data abundance and artificial intelligence, and the rise of digital platforms. We argue that the rise of new communication channels, in particular, can lead to the vertical and horizontal disintegration of the traditional bank business model. Specialized providers of financial services can chip away at activities that do not rely on access to balance sheets, while platforms can interpose themselves between banks and customers. We discuss the limitations of these challenges and the resulting policy implications. |
Keywords: | communication; financial innovation; financial intermediation; fintech; information |
JEL: | E58 G20 G21 O33 |
Date: | 2020–07 |
URL: | http://d.repec.org/n?u=RePEc:cpr:ceprdp:15004&r= |
By: | Sansone, Dario (University of Exeter); Zhu, Anna (RMIT University) |
Abstract: | Using high-quality nationwide social security data combined with machine learning tools, we develop predictive models of income support receipt intensities for any payment enrolee in the Australian social security system between 2014 and 2018. We show that off-the-shelf machine learning algorithms can significantly improve predictive accuracy compared with the simpler heuristic models or early warning systems currently in use. Specifically, the former predict the proportion of time individuals are on income support over the subsequent four years with greater accuracy, by at least 22% (a 14 percentage point increase in R2), compared with the latter. This gain can be achieved at no extra cost to practitioners, since the algorithms use administrative data already available to caseworkers. Consequently, our machine learning algorithms can improve the detection of long-term income support recipients, which can potentially provide governments with large savings in accrued welfare costs. |
Keywords: | income support, machine learning, Australia |
JEL: | C53 H53 I38 J68 |
Date: | 2021–05 |
URL: | http://d.repec.org/n?u=RePEc:iza:izadps:dp14377&r= |
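The comparison above pits off-the-shelf algorithms against simpler heuristics on R2; the sketch below illustrates that kind of comparison on placeholder administrative features (the real data are confidential, and the file, columns, and model choice are hypothetical).

```python
# Illustrative sketch: compare a simple persistence heuristic with gradient boosting
# for predicting the share of time on income support over the next four years.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("payment_histories.csv")       # hypothetical administrative extract
features = ["past_share_on_support", "age", "payment_type", "duration_current_spell"]
X = pd.get_dummies(df[features], columns=["payment_type"])
y = df["future_share_on_support"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Heuristic baseline: assume the past share of time on support simply persists.
r2_heuristic = r2_score(y_te, X_te["past_share_on_support"])

gbm = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
r2_ml = r2_score(y_te, gbm.predict(X_te))

print(f"Heuristic R^2: {r2_heuristic:.2f}  ML R^2: {r2_ml:.2f}")
```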
By: | Bianchi, Francesco; Ludvigson, Sydney C.; Ma, Sai |
Abstract: | This paper combines a data rich environment with a machine learning algorithm to provide estimates of time-varying systematic expectational errors ("belief distortions") about the macroeconomy embedded in survey responses. We find that such distortions are large on average even for professional forecasters, with all respondent-types over-weighting their own forecast relative to other information. Forecasts of inflation and GDP growth oscillate between optimism and pessimism by quantitatively large amounts. To investigate the dynamic relation of belief distortions with the macroeconomy, we construct indexes of aggregate (across surveys and respondents) expectational biases in survey forecasts. Over-optimism is associated with an increase in aggregate economic activity. Our estimates provide a benchmark to evaluate theories for which information capacity constraints, extrapolation, sentiments, ambiguity aversion, and other departures from full information rational expectations play a role in business cycles. |
Keywords: | beliefs; biases; expectations; machine learning |
JEL: | E17 E27 E32 E7 G4 |
Date: | 2020–07 |
URL: | http://d.repec.org/n?u=RePEc:cpr:ceprdp:15003&r= |
By: | Longden, Elaine (Tilburg University, School of Economics and Management) |
Date: | 2021 |
URL: | http://d.repec.org/n?u=RePEc:tiu:tiutis:e1d97882-8cf3-40a4-a82e-8ad900e59177&r= |
By: | Jiongyan Zhang |
Abstract: | To study regional economic development and urban expansion from the perspective of night-light remote sensing imagery, this paper uses NOAA night-light remote sensing image data from 1992 to 2013, together with ArcGIS software, to process the imagery, obtain basic pixel-level information for specific areas, and analyse these data across space and time. The analysis presents the trend of regional economic development in China in recent years and explores the urbanization effect brought about by the country's rapid economic growth. The results show that the pace of urbanization in China remains at its peak, with considerable potential and room for further development; at the same time, attention also needs to be paid to the imbalance of regional development. |
Date: | 2020–11 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2105.10459&r= |
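A minimal sketch of the pixel-level aggregation described above follows, assuming yearly GeoTIFF night-light composites read with rasterio rather than ArcGIS; the file naming scheme is hypothetical.

```python
# Illustrative sketch: aggregate night-light intensity per year for a region of interest.
import rasterio

years = range(1992, 2014)
totals = {}
for year in years:
    with rasterio.open(f"nightlights_{year}.tif") as src:   # hypothetical yearly composites
        band = src.read(1).astype(float)
        band[band < 0] = 0                                   # treat fill values as dark
        totals[year] = band.sum()                            # "sum of lights" for the region

for year, total in totals.items():
    print(year, total)
# A rising sum of lights over time is commonly read as a proxy for
# regional economic growth and urban expansion.
```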
By: | Haozhe Su; M. V. Tretyakov; David P. Newton |
Abstract: | Transition probability densities are fundamental to option pricing. Advancing recent work in deep learning, we develop novel transition density function generators by solving backward Kolmogorov equations in parametric space for cumulative probability functions, using neural networks to obtain accurate approximations of transition probability densities and creating ultra-fast transition density function generators offline that can be trained for any underlying. These are 'single solve', so they do not require recalculation when parameters are changed (e.g. recalibration of volatility) and are portable to other option pricing setups as well as to less powerful computers, where they can be accessed as quickly as closed-form solutions. We demonstrate the range of application for one-dimensional cases, exemplified by the Black-Scholes-Merton model; two-dimensional cases, exemplified by the Heston process; and finally a modified Heston model with time-dependent parameters that has no closed-form solution. |
Date: | 2021–05 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2105.10467&r= |
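For concreteness, in the one-dimensional Black-Scholes-Merton case mentioned above, the cumulative transition function that such networks approximate solves a backward Kolmogorov equation of the following standard form (written under a risk-neutral drift r as a reminder, not as the authors' exact parametrisation):

```latex
% Backward Kolmogorov equation for the Black--Scholes--Merton diffusion
% dS_t = r S_t\,dt + \sigma S_t\,dW_t, with u(t,s) = \Pr(S_T \le y \mid S_t = s):
\[
  \frac{\partial u}{\partial t}
  + r s \,\frac{\partial u}{\partial s}
  + \tfrac{1}{2}\sigma^2 s^2 \,\frac{\partial^2 u}{\partial s^2} = 0 ,
  \qquad u(T, s) = \mathbf{1}_{\{s \le y\}} ,
\]
% and the transition density is recovered as p(t,s;T,y) = \partial u / \partial y.
```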
By: | Aspachs, Oriol; Durante, Ruben; Graziano, Alberto; Mestres, Josep; Montalvo, Jose G; Reynal-Querol, Marta |
Abstract: | Official statistics on economic inequality are only available at low frequency and with considerable delay. This makes it challenging to assess the impact on inequality of fast-unfolding crises like the COVID-19 pandemic, and to rapidly evaluate and tailor policy responses. We propose a new methodology to track income inequality at high frequency using anonymized data from bank records for over three million account holders in Spain. Using this approach, we analyse how inequality evolved between February and July 2020 (compared to the same months of 2019). We first show that the wage distribution in our data matches very closely that from official labour surveys. We then document that, in the absence of government intervention, inequality would have increased dramatically, mainly due to job losses and wage cuts experienced by low-wage workers. The increase in pre-transfer inequality was especially pronounced among younger and foreign-born individuals, and in regions more dependent on tourism. Finally, we find that public transfers and unemployment insurance schemes were very effective at providing a safety net to the most affected segments of the population and at offsetting most of the increase in inequality. |
Keywords: | administrative data; COVID-19; high frequency data; inequality |
JEL: | C81 D63 E24 J31 |
Date: | 2020–07 |
URL: | http://d.repec.org/n?u=RePEc:cpr:ceprdp:15118&r= |
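One standard way to track the inequality dynamics described above at high frequency is to recompute a Gini coefficient on each month's anonymized wage distribution; a minimal sketch follows, with simulated incomes as placeholders for the bank-record data.

```python
# Illustrative sketch: Gini coefficient on (placeholder) anonymized monthly wage data.
import numpy as np

def gini(incomes):
    """Gini coefficient of a non-negative income array (sorted-values formula)."""
    x = np.sort(np.asarray(incomes, dtype=float))
    n = x.size
    cum = np.cumsum(x)
    return (n + 1 - 2 * (cum / cum[-1]).sum()) / n

rng = np.random.default_rng(0)
pre = np.sort(rng.lognormal(mean=7.5, sigma=0.5, size=100_000))  # placeholder pre-crisis wages
post = pre.copy()
post[:20_000] *= 0.4          # stylised job losses / wage cuts concentrated at the bottom

print("Pre-transfer Gini, before shock:", round(gini(pre), 3))
print("Pre-transfer Gini, after shock :", round(gini(post), 3))
```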
By: | El Youssefi Ahmed (USMBA - Université Sidi Mohamed Ben Abdellah - Fès [Université de Taza]); Abdelahad Chraibi (Alicante [Seclin]); Julien Taillard (Alicante [Seclin]); Ahlame Begdouri (USMBA - Université Sidi Mohamed Ben Abdellah - Fès [Université de Taza]) |
Abstract: | A patient's medical record represents their medical history and encloses important information about their health status within written reports. These reports usually contain measurements (among other information) that need to be reviewed before any new medical intervention, since they might influence the medical decision regarding the types of drugs that are prescribed or their dosage. In this paper, we introduce a method that automatically extracts measurements from textual medical discharge summaries, admission notes, progress notes, and primary care notes, without distinguishing between reports belonging to different services. To do so, we propose a system that uses Grobid-quantities to extract value/unit pairs and uses rules generated from an analysis of medical reports, together with text mining tools, to identify candidate measurements. These candidates are then classified using a trained Long Short-Term Memory (LSTM) network to determine which measurement corresponds to each value/unit pair. The results are promising: 95.13% accuracy, a precision of 92.38%, a recall of 94.01%, and an F1 score of 89.49%. |
Keywords: | Conditional Random Fields (CRF), Long Short Term Memory (LSTM), Natural Language Processing, Measurement, Medical report |
Date: | 2020–10–26 |
URL: | http://d.repec.org/n?u=RePEc:hal:journl:hal-03229520&r= |
By: | Timothy DeLise |
Abstract: | This research investigates pricing financial options based on the traditional martingale theory of arbitrage pricing applied to neural SDEs. We treat neural SDEs as universal Itô process approximators. In this way we can lift all assumptions on the form of the underlying price process, and compute theoretical option prices numerically. We propose a variation of the SDE-GAN approach by implementing the Wasserstein distance metric as a loss function for training. Furthermore, it is conjectured that the error of the option price implied by the learnt model can be bounded by the very Wasserstein distance metric that was used to fit the empirical data. |
Date: | 2021–05 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2105.13320&r= |
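Schematically, in the neural SDE framing above the underlying is modelled as an Itô process whose drift and diffusion are neural networks, and training minimises a Wasserstein distance between the law of simulated paths and the empirical law of observed data (the notation below is ours, not the paper's):

```latex
% Neural SDE: drift and diffusion parametrised by networks \mu_\theta, \sigma_\theta
\[
  dX_t = \mu_\theta(t, X_t)\,dt + \sigma_\theta(t, X_t)\,dW_t ,
\]
% trained by minimising the Wasserstein-1 distance between the law of the
% simulated paths \mathbb{P}_\theta and the empirical law \mathbb{P}_{\mathrm{data}}
% (Kantorovich--Rubinstein dual form):
\[
  \min_\theta \; W_1\!\left(\mathbb{P}_\theta, \mathbb{P}_{\mathrm{data}}\right)
  = \min_\theta \, \sup_{\|f\|_{L} \le 1}
    \left( \mathbb{E}_{\mathbb{P}_\theta}[f] - \mathbb{E}_{\mathbb{P}_{\mathrm{data}}}[f] \right) .
\]
```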