on Big Data |
By: | Michael Allan Ribers; Hannes Ullrich |
Abstract: | Antibiotic resistance constitutes a major health threat. Predicting bacterial causes of infections is key to reducing antibiotic misuse, a leading driver of antibiotic resistance. We train a machine learning algorithm on administrative and microbiological laboratory data from Denmark to predict diagnostic test outcomes for urinary tract infections. Based on predictions, we develop policies to improve prescribing in primary care, highlighting the relevance of physician expertise and policy implementation when patient distributions vary over time. The proposed policies delay antibiotic prescriptions for some patients until test results are known and give them instantly to others. We find that machine learning can reduce antibiotic use by 7.42 percent without reducing the number of treated bacterial infections. As Denmark is one of the most conservative countries in terms of antibiotic use, this result is likely to be a lower bound of what can be achieved elsewhere. |
Keywords: | antibiotic prescribing, prediction policy, machine learning, expert decision-making |
JEL: | C10 I11 I18 L38 O38 Q28 |
Date: | 2019 |
URL: | http://d.repec.org/n?u=RePEc:ces:ceswps:_7654&r=all |
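The following is a minimal sketch of the kind of prediction-based rule this abstract describes. It assumes a model that outputs, for each patient, a probability that the diagnostic test will be positive. The threshold value, the `Patient` type and the prescribe-everyone baseline are hypothetical simplifications. The paper compares its policies against actual physician decisions, and this sketch does not model the cost of delaying treatment.

```python
"""Hypothetical sketch of a prediction-based prescribing rule.

Assumptions (not from the paper): each patient has a predicted probability
of a bacterial infection and a test outcome that is only known later; the
threshold is illustrative, not an estimated value.
"""
from dataclasses import dataclass


@dataclass
class Patient:
    p_bacterial: float   # model's predicted probability of a positive test
    test_positive: bool  # laboratory outcome, known only after a delay


def apply_policy(patients, high=0.8):
    """Prescribe at once if p >= high; otherwise wait for the test and
    prescribe only if it is positive.

    Returns (prescriptions issued, bacterial infections treated)."""
    prescriptions = 0
    treated = 0
    for pt in patients:
        if pt.p_bacterial >= high:
            prescriptions += 1           # instant prescription
            treated += pt.test_positive  # counts only true bacterial cases
        elif pt.test_positive:
            prescriptions += 1           # delayed prescription after a positive test
            treated += 1
    return prescriptions, treated


def prescribe_all(patients):
    """Hypothetical baseline: everyone receives an antibiotic immediately."""
    treated = sum(pt.test_positive for pt in patients)
    return len(patients), treated
```

On a toy list such as `[Patient(0.95, True), Patient(0.9, False), Patient(0.3, False), Patient(0.2, True), Patient(0.1, False)]`, the rule issues 3 prescriptions instead of 5 and treats the same 2 bacterial infections, one of them after a delay.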
By: | Mitnik, Oscar A.; Sanchez, Raul; Yañez, Patricia |
Abstract: | This paper quantifies the impacts of transport infrastructure investments on economic activity in Haiti, proxied by satellite luminosity data. Our identification strategy exploits the differential timing of rehabilitation projects across various road segments of the primary road network. We combine multiple sources of non-traditional data and carefully address concerns related to unobserved heterogeneity. The results obtained across multiple specifications consistently indicate that receiving a road rehabilitation project leads to an increase in luminosity values of between 7 percent and 15 percent at the communal section level. Using the national-level elasticity between luminosity values and GDP, we estimate that these interventions translate into GDP increases of roughly 0.6 to 1.2 percent in the communal sections that benefited from a transport project. The findings also reveal some temporal and spatial variation: effects take some time to appear, and the gains accrue not to the richest or the poorest communities but to those in the middle of the income distribution. |
JEL: | O10 R40 O47 D04 |
Date: | 2019–06 |
URL: | http://d.repec.org/n?u=RePEc:idb:brikps:28&r=all |
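A worked version of the luminosity-to-GDP conversion, assuming it is a proportional scaling of the luminosity effect by a single national elasticity ε. The abstract does not report ε; the values below are only what the rounded figures imply.

```latex
\%\Delta\,\text{GDP} \approx \varepsilon \times \%\Delta\,\text{Luminosity},
\qquad
\hat{\varepsilon} \approx \frac{0.6}{7} \approx 0.086,
\qquad
\hat{\varepsilon} \approx \frac{1.2}{15} = 0.08 .
```

Both ends of the reported range are therefore consistent with an elasticity of roughly 0.08.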
By: | Jillian Grennan (Duke University - Fuqua School of Business; Duke Innovation & Entrepreneurship Initiative); Roni Michaely (University of Geneva - Geneva Finance Research Institute (GFRI); Swiss Finance Institute) |
Abstract: | Market intelligence FinTechs aggregate many data sources, including nontraditional ones, and synthesize such data using artificial intelligence to make investment recommendations. Using data from a market intelligence FinTech, we evaluate the relationship between the FinTech data coverage and market efficiency. We find an increase in price informativeness for stocks with higher FinTech coverage and that traditional sources of information have less impact on prices for those stocks. Consistent with FinTechs changing investors' behavior, we show a substitution between traditional information sources and FinTechs using internet click data. Overall, our results suggest the rise in FinTechs for investment recommendations benefits investors. |
Keywords: | Fintech, FinTechs (financial technology firms), Market intelligence, Artificial intelligence, Aggregators, Social media, Financial blogs, Information and market efficiency, Big data, Machine learning, Datamining, Data signal providers |
JEL: | D14 G11 G14 G23 |
Date: | 2019–03 |
URL: | http://d.repec.org/n?u=RePEc:chf:rpseri:rp1910&r=all |
By: | Simon Schnürch; Andreas Wagner |
Abstract: | This paper employs machine learning algorithms to forecast German electricity spot market prices. The forecasts utilize in particular bid and ask order book data from the spot market but also fundamental market data like renewable infeed and expected demand. Appropriate feature extraction for the order book data is developed. Using cross-validation to optimise hyperparameters, neural networks and random forests are proposed and compared to statistical reference models. The machine learning models outperform traditional approaches. |
Date: | 2019–06 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:1906.06248&r=all |
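For price series, cross-validation has to respect time order so that no future information leaks into training. The abstract does not state which scheme the authors use. The following is a hypothetical sketch of rolling-origin cross-validation selecting one hyperparameter. A simple moving-average forecaster stands in for the neural networks and random forests, and its window length plays the role of the hyperparameter.

```python
"""Hypothetical sketch: selecting a hyperparameter by rolling-origin
cross-validation. A moving-average forecaster stands in for the paper's
neural networks and random forests; data and candidates are illustrative."""
from statistics import fmean


def rolling_origin_splits(n, initial, horizon=1):
    """Yield (train_end, test_indices): train on [0, train_end) and test on
    the next `horizon` points, then move the origin forward by one step.
    Test points always come after the training data."""
    for train_end in range(initial, n - horizon + 1):
        yield train_end, range(train_end, train_end + horizon)


def moving_average_forecast(history, window):
    """Forecast the next value as the mean of the last `window` values."""
    return fmean(history[-window:])


def cv_mse(series, window, initial):
    """Mean squared forecast error across all rolling-origin test points."""
    errors = []
    for train_end, test_idx in rolling_origin_splits(len(series), initial):
        pred = moving_average_forecast(series[:train_end], window)
        errors.extend((series[t] - pred) ** 2 for t in test_idx)
    return fmean(errors)


def select_window(series, candidates, initial):
    """Return the candidate window with the lowest cross-validated MSE.
    `initial` must be at least max(candidates)."""
    return min(candidates, key=lambda w: cv_mse(series, w, initial))
```

On a steadily trending toy series such as `[1.0, 2.0, ..., 10.0]`, the shortest window wins, because longer averages lag behind the trend.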
By: | Laura Palagi (Department of Computer, Control and Management Engineering Antonio Ruberti (DIAG), University of Rome La Sapienza, Rome, Italy); Ruggiero Seccia (Department of Computer, Control and Management Engineering Antonio Ruberti (DIAG), University of Rome La Sapienza, Rome, Italy) |
Abstract: | Estimating the weights of Deep Feedforward Neural Networks (DFNNs) requires solving a very large nonconvex optimization problem that may have many local (non-global) minimizers, saddle points and large plateaus. Furthermore, the time needed to find good solutions to the training problem depends heavily on both the number of samples and the number of weights (variables). In this work, we show how Block Coordinate Descent (BCD) methods can be applied to improve the performance of state-of-the-art algorithms by avoiding bad stationary points and flat regions. We first describe a batch BCD method able to effectively tackle difficulties due to the network's depth; we then extend the algorithm with an online BCD scheme that scales with respect to both the number of variables and the number of samples. We report extensive numerical results on standard datasets using different deep networks, showing that applying (online) BCD methods in the training phase of DFNNs outperforms standard batch/online algorithms and improves both training and the generalization performance of the networks. |
Keywords: | Deep Feedforward Neural Networks ; Block coordinate decomposition ; Online Optimization ; Large scale optimization |
Date: | 2019 |
URL: | http://d.repec.org/n?u=RePEc:aeg:report:2019-06&r=all |
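Block coordinate descent splits the variables into blocks and minimizes over one block at a time while the others stay fixed. The following is a hypothetical sketch on a toy two-layer linear network f(x) = v·(w·x), with the layer-1 weights w ∈ R² as one block and the scalar output weight v as the other. The joint problem is nonconvex because of the product v·w, but each block subproblem is ordinary least squares with an exact solution. Consequently, the loss can never increase from one step to the next. The model and data are illustrative assumptions. The paper's batch and online BCD schemes for deep nonlinear networks are more involved.

```python
"""Hypothetical sketch of block coordinate descent on a toy two-layer
linear network f(x) = v * (w . x); not the paper's algorithm."""


def loss(w, v, X, y):
    return sum((yi - v * (w[0] * x0 + w[1] * x1)) ** 2
               for (x0, x1), yi in zip(X, y))


def update_v(w, X, y):
    """Exact minimizer over the output-layer block v with w fixed."""
    h = [w[0] * x0 + w[1] * x1 for x0, x1 in X]  # hidden activations
    return sum(hi * yi for hi, yi in zip(h, y)) / sum(hi * hi for hi in h)


def update_w(v, X, y):
    """Exact minimizer over the hidden-layer block w with v fixed:
    solve the 2x2 normal equations (Z^T Z) w = Z^T y, where Z = v * X."""
    Z = [(v * x0, v * x1) for x0, x1 in X]
    a = sum(z0 * z0 for z0, _ in Z)
    b = sum(z0 * z1 for z0, z1 in Z)
    d = sum(z1 * z1 for _, z1 in Z)
    r0 = sum(z0 * yi for (z0, _), yi in zip(Z, y))
    r1 = sum(z1 * yi for (_, z1), yi in zip(Z, y))
    det = a * d - b * b
    return ((d * r0 - b * r1) / det, (a * r1 - b * r0) / det)


def bcd(X, y, w=(1.0, 1.0), v=1.0, iters=20):
    """Alternate exact block updates and record the loss after each sweep."""
    history = [loss(w, v, X, y)]
    for _ in range(iters):
        v = update_v(w, X, y)  # block 2: output layer
        w = update_w(v, X, y)  # block 1: hidden layer
        history.append(loss(w, v, X, y))
    return w, v, history
```

With `X = [(1, 0), (0, 1), (1, 1), (2, 1)]` and `y = [2, -1, 1, 3]` (generated by y = 2x₀ − x₁), the loss falls from 6 to essentially zero after one sweep, and the product v·w recovers (2, −1).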
By: | Giovanni Barone-Adesi (University of Lugano; Swiss Finance Institute); Antonietta Mira (Università della Svizzera italiana - InterDisciplinary Institute of Data Science); Matteo Pisati (Universita' della Svizzera Italiana) |
Abstract: | The problem of market predictability can be decomposed into two parts: predictive models and predictors. First, we show how the joint use of model selection and machine learning models can dramatically increase our ability to forecast the equity premium out-of-sample. Second, we introduce batteries of powerful predictors that raise the monthly S&P500 R-squared to 24%. Finally, we demonstrate that predictability is a generalized characteristic of U.S. equity markets. For each of the three parts, we consider the potential of, and the challenges posed by, the new approaches in the asset pricing field. |
Keywords: | Markets Predictability, Machine Learning, Model Selection |
Date: | 2019–03 |
URL: | http://d.repec.org/n?u=RePEc:chf:rpseri:rp1915&r=all |
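Equity-premium forecasts are commonly judged by an out-of-sample R². This statistic compares the model's squared forecast errors with those of the prevailing historical mean, which is recomputed at each date from past data only. The following is a minimal sketch of that statistic. Whether the 24% figure in the abstract is computed exactly this way is an assumption.

```python
"""Sketch of the out-of-sample R^2 for return forecasts, benchmarked
against the prevailing historical mean; inputs are illustrative."""
from statistics import fmean


def oos_r2(returns, forecasts, start):
    """Out-of-sample R^2 over t = start, ..., len(returns) - 1.

    forecasts[t] must use only information available up to t - 1. The
    benchmark at t is the mean of returns[0:t]. Positive values mean the
    model beats the historical mean; values can be negative."""
    sse_model = 0.0
    sse_bench = 0.0
    for t in range(start, len(returns)):
        bench = fmean(returns[:t])  # prevailing-mean forecast
        sse_model += (returns[t] - forecasts[t]) ** 2
        sse_bench += (returns[t] - bench) ** 2
    return 1.0 - sse_model / sse_bench
```

For returns `[1.0, 3.0, 2.0, 4.0]` evaluated from `start=1`, a constant forecast of 3 gives an R² of 0.75. A forecast equal to the historical mean gives exactly 0.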
By: | Rishab Guha; Serena Ng |
Abstract: | This paper analyzes weekly scanner data collected for 108 groups at the county level between 2006 and 2014. The data display multi-dimensional weekly seasonal effects that are not exactly periodic but are cross-sectionally dependent. Existing univariate procedures are imperfect and yield adjusted series that continue to display strong seasonality upon aggregation. We suggest augmenting the univariate adjustments with a panel data step that pools information across counties. Machine learning tools are then used to remove the within-year seasonal variations. A demand analysis of the adjusted budget shares finds three factors: one that is trending, and two cyclical ones that are well aligned with the level and change in consumer confidence. The effects of the Great Recession vary across locations and product groups, with consumers substituting toward home cooking and away from non-essential goods. The adjusted data also reveal changes in spending in response to unanticipated shocks at the local level. The data are thus informative about both local and aggregate economic conditions once the seasonal effects are removed. The two-step methodology can be adapted to remove other types of nuisance variation, provided that this variation is cross-sectionally dependent. |
JEL: | E21 E32 |
Date: | 2019–05 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:25899&r=all |
By: | Suyong Song; Stephen S. Baek |
Abstract: | We study the association between physical appearance and family income using novel data that include 3-dimensional body scans. These scans mitigate the reporting and measurement errors found in most previous studies. We apply machine learning to extract intrinsic features of the human body and take into account the possible endogeneity of body shape. The estimation results show a significant relationship between physical appearance and family income, and the association differs by gender. This supports the hypothesis of a physical attractiveness premium and of its heterogeneity across genders. |
Date: | 2019–06 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:1906.06747&r=all |
By: | Pierrick Piette (SAF - Laboratoire de Sciences Actuarielle et Financière - UCBL - Université Claude Bernard Lyon 1 - Université de Lyon, LPSM UMR 8001 - Laboratoire de Probabilités, Statistique et Modélisation - UPD7 - Université Paris Diderot - Paris 7 - SU - Sorbonne Université - CNRS - Centre National de la Recherche Scientifique) |
Abstract: | On the one hand, recent advances in satellite imagery and remote sensing make it easy to follow crop conditions around the world in near-real time. On the other hand, it has been shown that governmental agricultural reports contain useful news for the commodities market, whose participants react to this valuable information. In this paper, we investigate whether some of the newsworthy information contained in the USDA reports can be forecast using satellite data. We focus on the corn futures market over the period 2000-2016. We first use statistical tests to check the well-documented presence of market reactions to the release of the monthly WASDE reports. Then we investigate the informational value of the early yield estimates published in these governmental reports. Finally, we propose an econometric model based on MODIS NDVI time series to forecast this valuable information. Results show that the market reacts rationally to the NASS early yield forecasts. Moreover, the modeled NDVI-based information is significantly correlated with the market reactions. To conclude, we propose some possible improvements to consider for a practical implementation. |
Keywords: | NDVI,USDA reports,MODIS,Market information,Corn,Commodities market |
Date: | 2019–06–06 |
URL: | http://d.repec.org/n?u=RePEc:hal:wpaper:hal-02149355&r=all |
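NDVI (Normalized Difference Vegetation Index) is the normalized difference between near-infrared and red surface reflectance. Healthy, dense vegetation reflects strongly in the near-infrared and absorbs red light, so it pushes the index toward 1. The following minimal sketch computes NDVI and a hypothetical seasonal anomaly against the same period in earlier years. The anomaly step is an illustrative assumption, not the econometric model proposed in the paper.

```python
"""Sketch: NDVI from red and near-infrared reflectances, plus a
hypothetical anomaly versus the historical average for the same period."""
from statistics import fmean


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); lies in [-1, 1] for
    non-negative reflectances."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR + Red must be non-zero")
    return (nir - red) / total


def ndvi_anomaly(current, past_same_period):
    """Hypothetical signal: this season's NDVI minus its historical mean."""
    return current - fmean(past_same_period)
```

For example, a pixel with NIR reflectance 0.5 and red reflectance 0.1 has an NDVI of about 0.67. Equal reflectances give 0.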
By: | Gill Newton (University of Cambridge) |
Abstract: | Part I of this paper describes a new 'Big Data' resource for historical mortality, the Family History Society burials dataset. This comprises 8.9 million individual records harmonised from Family History Society transcriptions of burial records in 4,200 English places, with coverage dates that vary by place, span roughly 1500 to 2000, and are concentrated in the period 1600 to 1850. Adult and child burials have been separately identified using family relationship information, and after 1812 more precise age information is recorded. Part II presents an exploratory analysis of burial seasonality and age at death using the Family History Society burials dataset. The seasonality of birth and baptism, which affects infant burial seasonality, is also considered using a subsample of four English counties (Suffolk, Cambridgeshire, Nottinghamshire and Lancashire). This research forms part of a Wellcome Trust funded research project led by Richard Smith at CAMPOP entitled ‘Migration, Mortality and Medicalisation: investigating the long-run epidemiological consequences of urbanisation 1600-1945’. |
Keywords: | seasonality, mortality, burials, baptisms, big data |
JEL: | N33 |
Date: | 2019–06–07 |
URL: | http://d.repec.org/n?u=RePEc:cmh:wpaper:34&r=all |
By: | André Binette; Karyne B. Charbonneau; Nicholas Curtis; Gabriela Galassi; Scott Counts; Justin Cranshaw |
Abstract: | Labour markets in Canada and around the world are evolving rapidly with the digital economy. Traditional data are adapting gradually but are not yet able to provide timely information on this evolution. |
Keywords: | Central bank research; Labour markets; Monetary Policy |
JEL: | C80 E24 J21 |
Date: | 2019–06 |
URL: | http://d.repec.org/n?u=RePEc:bca:bocsan:19-18&r=all |
By: | Wendy C.Y. LI; NIREI Makoto; YAMANA Kazufumi |
Abstract: | The Facebook-Cambridge Analytica data scandal demonstrates that there is no such thing as a free lunch in the digital world. Online platform companies exchange "free" digital goods and services for consumer data, reaping potentially significant economic benefits by monetizing data. The proliferation of "free" digital goods and services poses challenges not only to policymakers, who generally rely on prices to indicate a good's value, but also to corporate managers and investors, who need to know how to value data, a key input of digital goods and services. In this research, we first examine the data activities of seven major types of online platforms based on their underlying business models. We show the steps online platform companies take to create value from data, and present the data value chain to show the value-added activities involved in each step. We find that online platform companies vary in their degree of vertical integration in the data value chain, and that this variation can determine how they monetize their data and how much economic benefit they can capture. Unlike R&D, which may depreciate due to obsolescence, data can produce new value through data fusion, a unique feature that creates unprecedented measurement challenges. Our initial estimates indicate that data can have enormous value. Online platform companies can capture the most benefit from the data because they create its value and because consumers lack knowledge of the value of their own data. As trends such as 5G and the Internet of Things accelerate the accumulation of data in both variety and volume, the valuation of data will have important policy implications for investment, trade, and growth. |
Date: | 2019–03 |
URL: | http://d.repec.org/n?u=RePEc:eti:dpaper:19022&r=all |
By: | Jozef Barunik; Cathy Yi-Hsuan Chen; Jan Vecer |
Abstract: | We propose a way to quantify high-frequency market sentiment using high-frequency news from the NASDAQ news platform and support vector machine classifiers. News arrives at markets randomly, and the resulting news sentiment behaves like a stochastic process. To characterize the joint evolution of sentiment, price, and volatility, we introduce a unified continuous-time sentiment-driven stochastic volatility model. We provide closed-form formulas for the moments of the volatility and news sentiment processes and study the impact of news. Further, we implement a simulation-based method to calibrate the parameters. Empirically, we document that news sentiment raises the threshold of volatility reversion, sustaining high market volatility. |
Date: | 2019–05 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:1906.00059&r=all |
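The first step of this pipeline turns news text into a sentiment signal with a support vector machine. The following is a hypothetical sketch of that step. It trains a linear SVM with Pegasos-style full-batch subgradient descent on the regularized hinge loss (no bias term), using bag-of-words features. It then defines a sentiment index as the average predicted label of the news arriving in a time window. The vocabulary, training headlines, regularization value and index definition are all illustrative assumptions. The paper's features, classifier settings and sentiment construction may differ.

```python
"""Hypothetical sketch: linear SVM news classifier and a windowed
sentiment index; not the paper's implementation."""
from collections import Counter


def features(text):
    """Bag-of-words counts (lower-cased, whitespace tokenized)."""
    return Counter(text.lower().split())


def dot(w, x):
    return sum(w.get(k, 0.0) * c for k, c in x.items())


def train_linear_svm(docs, labels, lam=0.01, epochs=200):
    """Full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)) with step size 1/(lam*t).
    Labels are +1 (positive news) or -1 (negative news)."""
    xs = [features(d) for d in docs]
    n = len(xs)
    w = {}
    for t in range(1, epochs + 1):
        eta = 1.0 / (lam * t)
        grad = {k: lam * val for k, val in w.items()}  # regularizer part
        for x, y in zip(xs, labels):
            if y * dot(w, x) < 1:  # hinge loss active for this example
                for k, c in x.items():
                    grad[k] = grad.get(k, 0.0) - y * c / n
        for k, g in grad.items():
            w[k] = w.get(k, 0.0) - eta * g
    return w


def predict(w, text):
    """Classify one headline as +1 or -1 (ties go to +1)."""
    return 1 if dot(w, features(text)) >= 0 else -1


def sentiment_index(w, headlines):
    """Average predicted label, in [-1, 1], over one window's headlines."""
    return sum(predict(w, h) for h in headlines) / len(headlines)
```

Trained on toy headlines such as "earnings beat estimates" (+1) and "earnings miss estimates" (−1), words shared by both classes receive near-zero weight. The classification is therefore driven by words like "beat" and "miss".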