nep-big 2018-04-23 papers

on Big Data

Issue of 2018‒04‒23
nine papers chosen by
Tom Coupé
University of Canterbury

Incentive Compatible Estimators By Eliaz, Kfir; Spiegler, Ran
A Contemporary Sentiment Analysis Approach: Algorithm-Based Analysis of News Items within the Direct Real Estate Market in the US By Marcel Lang; Jessica Ruscheinsky; Jochen Hausler
Textual Analysis based Real-Estate-Sentiment Evaluation of Internet Data By Jessica Ruscheinsky; Marcel Lang; Wolfgang Schaefers
Selecting Directors Using Machine Learning By Isil Erel; Léa H. Stern; Chenhao Tan; Michael S. Weisbach
Using Supervised Learning to Select Audit Targets in Performance-Based Financing in Health: An Example from Zambia By Dhruv Grover; Sebastian Bauhoff; Jed Friedman
“A regional perspective on the accuracy of machine learning forecasts of tourism demand based on data characteristics” By Oscar Claveria; Enric Monte; Salvador Torra
Losses on Asset Returns Caused by Perception Gaps of Fundamental Values: Evidence from laboratory experiments By HIGASHIDA Keisaku; TANAKA Kenta; MANAGI Shunsuke
Predicting Value with Vacant Possession, Market Rent, and Value in Use for Housing in the Netherlands A case for investors in housing By van Sprundel; Werner Petrus Adrianus; Paul René Fran van Loon
Urban Growth, Spatial Change, Land Use, Housing and Population Relations: The Case of Ankara Province By Yesim Aliefendioglu; Sibel Canaz Sevgen; Gizem Var; Harun Tanrivermis

Incentive Compatible Estimators

By:	Eliaz, Kfir; Spiegler, Ran
Abstract:	We study a model in which a "statistician" takes an action on behalf of an agent, based on a random sample involving other people. The statistician follows a penalized regression procedure: the action that he takes is the dependent variable's estimated value given the agent's disclosed personal characteristics. We ask the following question: Is truth-telling an optimal disclosure strategy for the agent, given the statistician's procedure? We discuss possible implications of our exercise for the growing reliance on "machine learning" methods that involve explicit variable selection.
Date:	2018–03
URL:	http://d.repec.org/n?u=RePEc:cpr:ceprdp:12804&r=big

A Contemporary Sentiment Analysis Approach: Algorithm-Based Analysis of News Items within the Direct Real Estate Market in the US

By:	Marcel Lang; Jessica Ruscheinsky; Jochen Hausler
Abstract:	Among others, Ghysels et al. (2007) found that fundamental economic indicators alone are not able to fully explain the dynamics of commercial real estate returns in the United States. Even more clearly, Khadjeh Nassirtoussi et al. (2014) state that investors often change their purchasing behavior according to irrational and emotional assumptions. This work tries to investigate aspects of these yet insufficiently researched factors influencing the direct real estate market in more detail. With news being one of the major information sources for investors, it can be assumed that they might affect decision making processes and hence may also influence prices. This behavior should be especially interesting in the direct real estate market, as the buying process, compared to stocks, for example, is comparatively long. Accordingly, the question arises as to whether media can be used to explain market dynamics when sentiments are extracted from news items with different algorithmic approaches. This idea is tested by looking at the commercial real estate market in the US. Thus a data set of about 40,000 SNL news items was collected covering the time span from 2005 until 2015, where all news articles had to contain the keyword "Real Estate" and were geographically limited to being published in the United States. First of all, a Naïve Bayes classifying algorithm is applied to extract market sentiment, as it is among the most promising ones in performing neuro-linguistic programming tasks, according to Antweiler and Frank (2004). Outcomes are further compared to the results of a support vector machine algorithm developed by Cortes and Vapnik (1995), which, in short, creates a linear decision surface to allow for a sentiment classification of text samples. Both algorithms count among the most popular supervised learning methods. In order to quantify the impact of sentiment in the commercial real estate market, both the CoStar Composite Repeat Sale Indices, which represent all property classes and types, as well as the Moody’s/RCA Commercial Property Price Indices are applied to investigate the relationship between media sentiment and property returns in different sectors of the Real Estate market. To the best of our knowledge, this paper is the first work applying algorithm-based textual analysis on news articles to create a measure of media sentiment on the direct commercial real estate market in the US.
Keywords:	Algorithm-based evaluation; Sentiment measurement; Textual Analysis
JEL:	R3
Date:	2017–07–01
URL:	http://d.repec.org/n?u=RePEc:arz:wpaper:eres2017_212&r=big
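
As a rough illustration of the Naïve Bayes step mentioned in the abstract above, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one smoothing. It is not the authors' implementation: the function names, the tiny training set, and the "pos"/"neg" labels are all hypothetical and chosen only for demonstration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: (tokens, label). Not taken from the paper.
TRAINING = [
    (["prices", "rise", "strong"], "pos"),
    (["market", "recovery", "strong"], "pos"),
    (["prices", "fall", "weak"], "neg"),
    (["defaults", "rise", "weak"], "neg"),
]

def train_naive_bayes(labeled_docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def classify(model, tokens):
    """Return the label with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n / n_docs)  # log prior
        for token in tokens:
            if token in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                smoothed = (word_counts[label][token] + 1) / (total_words + len(vocab))
                score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

For example, `classify(train_naive_bayes(TRAINING), ["strong", "recovery"])` scores each class on the toy data and returns the higher-scoring label.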

Textual Analysis based Real-Estate-Sentiment Evaluation of Internet Data

By:	Jessica Ruscheinsky; Marcel Lang; Wolfgang Schaefers
Abstract:	Mathieu (2016) finds investor sentiment to be a significant factor in explaining REIT returns and REIT return volatility in the US. As Freybote and Seagraves (2016) show, particularly institutional investors tend to rely on the sentiment of specialized real estate investors, by analyzing the buy-sell-imbalance as an indicator of the demand for a particular asset. Based on the aforementioned factors, the objective of this paper is to complement the sentiment-investigating literature by applying two methodologies of textual analysis to real-estate-related newspaper headlines, in order to create sentiment measures and test relationships to US REIT prices. Furthermore, this study analyzes whether real-estate-related news reflects, causes or enhances market performance in the real estate sector. For this purpose, a set of about 130,000 newspaper articles from four different US newspapers, with a time frame from 2005 until 2015, was collected. Following the approach of Bollen et al. (2009), sentiment analysis is applied with a term-based methodology, by counting words that indicate positive or negative sentiment derived from different research approaches. Moreover, this dictionary-based methodology will be supplemented by and compared to the results of a machine learning tool, the "Google Prediction API". In consequence, qualitative information from news stories and posts is converted into a quantifiable measure achieved by analyzing the positive and negative tone of the information. To test the explanatory power of the created sentiment measures on REIT market movements in the US (FTSE EPRA/NAREIT), a regression model is employed. Due to the unique characteristics of REITs, variables to control for macroeconomic changes, the general stock market and also representatives for the direct real estate market are included in the model. Results show that the created real estate sentiment measures have significant effects on the REIT market. Different measures were found to have varying relationships. Furthermore, the created sentiment measures are more powerful in times of a decreasing REIT market than in increasing times. To the best of our knowledge, this is the first research work applying textual analysis to capture sentiment in the securitized real estate market in the US. Furthermore, the broad collection of newspaper articles from four different sources is unique, as normally only one or two different sources have been used in literature so far.
Keywords:	REIT market; Sentiment Analysis; Textual Analysis; US
JEL:	R3
Date:	2017–07–01
URL:	http://d.repec.org/n?u=RePEc:arz:wpaper:eres2017_195&r=big
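
The term-counting idea described in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: the word lists, the function name `headline_sentiment`, and the normalisation to the range [-1, 1] are hypothetical and are not taken from the paper, which does not list its dictionaries here.

```python
import re

# Hypothetical word lists for illustration only; not the paper's dictionaries.
POSITIVE = {"gain", "growth", "strong", "rise", "recovery"}
NEGATIVE = {"loss", "decline", "weak", "fall", "default"}

def headline_sentiment(headline: str) -> float:
    """Score a headline as (positive - negative) / (positive + negative).

    Returns 0.0 when the headline contains no dictionary words.
    """
    tokens = re.findall(r"[a-z]+", headline.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    total = pos + neg
    return (pos - neg) / total if total else 0.0
```

Scores of this kind could then be averaged per period and used as an explanatory variable in a regression, which is the general direction the abstract describes.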

Selecting Directors Using Machine Learning

By:	Isil Erel; Léa H. Stern; Chenhao Tan; Michael S. Weisbach
Abstract:	Can an algorithm assist firms in their hiring decisions of corporate directors? This paper proposes a method of selecting boards of directors that relies on machine learning. We develop algorithms with the goal of selecting directors that would be preferred by the shareholders of a particular firm. Using shareholder support for individual directors in subsequent elections and firm profitability as performance measures, we construct algorithms to make out-of-sample predictions of these measures of director performance. We then run tests of the quality of these predictions and show that, when compared with a realistic pool of potential candidates, directors predicted to do poorly by our algorithms indeed rank much lower in performance than directors who were predicted to do well. Deviations from the benchmark provided by the algorithms suggest that firm-selected directors are more likely to be male, have previously held more directorships, have fewer qualifications and larger networks. Machine learning holds promise for understanding the process by which existing governance structures are chosen, and has potential to help real world firms improve their governance.
JEL:	G34 M12 M51
Date:	2018–03
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:24435&r=big

Using Supervised Learning to Select Audit Targets in Performance-Based Financing in Health: An Example from Zambia

By:	Dhruv Grover (University of California, San Diego); Sebastian Bauhoff (Center for Global Development); Jed Friedman (World Bank)
Abstract:	Independent verification is a critical component of performance-based financing (PBF) in health care, in which facilities are offered incentives to increase the volume of specific services but the same incentives may lead them to over-report. We examine alternative strategies for targeted sampling of health clinics for independent verification. Specifically, we empirically compare several methods of random sampling and predictive modeling on data from a Zambian PBF pilot that contains reported and verified performance for quantity indicators of 140 clinics. Our results indicate that machine learning methods, particularly Random Forest, outperform other approaches and can increase the cost-effectiveness of verification activities.
Keywords:	performance-based financing, performance verification, audits, machine learning, health care finance, health care providers
JEL:	C20 C52 I15 I18
Date:	2018–04–11
URL:	http://d.repec.org/n?u=RePEc:cgd:wpaper:481&r=big

“A regional perspective on the accuracy of machine learning forecasts of tourism demand based on data characteristics”

By:	Oscar Claveria (AQR-IREA, University of Barcelona (UB). Tel. +34-934021825; Fax. +34-934021821. Department of Econometrics, Statistics and Applied Economics, University of Barcelona, Diagonal 690, 08034 Barcelona, Spain); Enric Monte (Department of Signal Theory and Communications, Polytechnic University of Catalunya (UPC)); Salvador Torra (Riskcenter-IREA, Department of Econometrics and Statistics, University of Barcelona (UB))
Abstract:	In this work we assess the role of data characteristics in the accuracy of machine learning (ML) tourism forecasts from a spatial perspective. First, we apply a seasonal-trend decomposition procedure based on non-parametric regression to isolate the different components of the time series of international tourism demand to all Spanish regions. This approach allows us to compute a set of measures to describe the features of the data. Second, we analyse the performance of several ML models in a recursive multiple-step-ahead forecasting experiment. In a third step, we rank all seventeen regions according to their characteristics and the obtained forecasting performance, and use the rankings as the input for a multivariate analysis to evaluate the interactions between time series features and the accuracy of the predictions. By means of dimensionality reduction techniques we summarise all the information into two components and project all Spanish regions into perceptual maps. We find that entropy and dispersion show a negative relation with accuracy, while the effect of other data characteristics on forecast accuracy is heavily dependent on the forecast horizon.
Keywords:	STL decomposition, non-parametric regression, time series features, forecast accuracy, machine learning, tourism demand, regional analysis
JEL:	C45 C51 C53 C63 E27 L83
Date:	2018–03
URL:	http://d.repec.org/n?u=RePEc:ira:wpaper:201805&r=big

Losses on Asset Returns Caused by Perception Gaps of Fundamental Values: Evidence from laboratory experiments

By:	HIGASHIDA Keisaku; TANAKA Kenta; MANAGI Shunsuke
Abstract:	A large number of studies have tackled the question of asset bubbles, in which whether or not market participants are able to calculate fundamental values is considered to play a key role in reducing bubbles. Contrary to the existing literature on uncertainty, this study conducts a series of laboratory experiments, wherein subjects cannot calculate objective expected returns with certainty. In such cases, gaps between objective and subjective expected returns (perception gaps) arise. The purpose of this study is to clarify (i) how asset prices fluctuate and (ii) if perception gaps lead to inefficient transactions. Moreover, (iii) we estimate the losses caused by perception gaps. Our estimation results indicate that perception gaps linger across rounds, and, accordingly, these gaps may generate earnings losses. Moreover, we find that the greater a perception gap of a subject, the greater is the inefficiency from his/her transactions. Traders now are using artificial intelligence (AI) for decision making. We also discuss policy implications on the introduction of AI into asset markets.
Date:	2018–02
URL:	http://d.repec.org/n?u=RePEc:eti:dpaper:18008&r=big

Predicting Value with Vacant Possession, Market Rent, and Value in Use for Housing in the Netherlands A case for investors in housing

By:	van Sprundel; Werner Petrus Adrianus; Paul René Fran van Loon
Abstract:	Real estate is considered to be an imperfect market. It does not meet the standards of a hypothetical perfectly competitive market. This is partly due to information asymmetry. The Dutch housing market, however, seems to be moving from a market with limited information to a market with an increasing amount of available information. This reasoning makes housing increasingly attractive for data analytics. Several applied data solutions have been programmed to date. Automated Valuation Models (AVMs) are one example, where data is mainly used to predict the value with vacant possession of owner-occupied housing. Most AVMs apply a hedonic pricing model where "value" is decomposed into housing and locational characteristics. These characteristics from rental and owner-occupied housing transactions and valuations have been collected from databases of Colliers International to construct an AVM. Indexation plays a crucial role in AVMs to correct for the difference in time of historic transactions. This is where a hedonic index for Dutch owner-occupied housing is constructed and tested with 7,500 repeated sales. Compared to the index of the Dutch Central Bureau of Statistics, this index is 32% more accurate in areas with a high urban density and 19% more accurate in areas with a low urban density. This is due to the inclusion of housing and locational characteristics. Apart from an AVM to predict the value with vacant possession for owner-occupied housing, a similar methodology can be used to construct an AVM to predict the market rent of rental housing. Market rent can be compared to the actual rent and the difference is an input for a machine learning algorithm that combines both AVMs and predicts the value in use. This algorithm has been trained and tested on anonymised data from valuation models. The accuracy becomes better with the inclusion of Colliers’ valuation experts giving their opinion on future trends of the ratio between value with vacant possession and value in use. This makes the algorithm particularly driven by "man and machine", where the experts rely on knowledge and experience. This combination has led to an algorithm to go from value with vacant possession to value in use (predicting the ratio) with a mean absolute error of 2.1% and a median absolute error of 0.9%, tested on 1,400 observations. The value in use finds its use case especially with investors who want to get a quick and accurate estimate of the value of their housing portfolio.
Keywords:	Machine Learning; Market Rent; Predicting; Value in Use; Value with Vacant Possession
JEL:	R3
Date:	2017–07–01
URL:	http://d.repec.org/n?u=RePEc:arz:wpaper:eres2017_152&r=big
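
The abstract above reports both a mean and a median absolute error. The sketch below shows how two such summary figures can be computed from predictions and realised values. It assumes that errors are expressed relative to the realised value, which the abstract does not state explicitly; the function names and the sample numbers in the usage note are hypothetical.

```python
from statistics import mean, median

def absolute_relative_errors(predicted, actual):
    """Return |predicted - actual| / |actual| for each pair of values."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return [abs(p - a) / abs(a) for p, a in zip(predicted, actual)]

def error_summary(predicted, actual):
    """Return (mean absolute error, median absolute error) as fractions."""
    errors = absolute_relative_errors(predicted, actual)
    return mean(errors), median(errors)
```

With hypothetical predictions of 102, 99 and 110 against realised values of 100 each, the relative errors are 2%, 1% and 10%, so the mean is pulled up by the single large miss while the median stays at 2%. A mean well above the median, as in the figures the abstract reports, is what a small number of large errors would produce.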

Urban Growth, Spatial Change, Land Use, Housing and Population Relations: The Case of Ankara Province

By:	Yesim Aliefendioglu; Sibel Canaz Sevgen; Gizem Var; Harun Tanrivermis
Abstract:	Urban development and shaping of spaces are closely associated with a city’s economic activities and demographic characteristics. After the announcement of Ankara as the capital city in 1923, the size of the city was understood to be very insufficient, a new zoning plan was started to combat the inadequate housing assets, and even cooperatives were established to accelerate housing construction for public officers. However, it is noteworthy that the development was too slow until the 1950s, after which date the urban scale grew rapidly due to the growing population and diversifying economic activities, new development areas turned to settlements, and, as a result, natural vegetation was rapidly destroyed. Despite the increasing housing stock in the city, the demand for and prices of real estate in general, and particularly housing, are very high, which encourages allocation of new settlement areas as residential housing areas. The new development plans made by the Municipalities, the Housing Development Administration, and the other public authorities provide for increased density in residential areas, and with the urban transformation and renewal projects, the current housing stock is being swiftly renewed. In these circumstances, space use and spatial development in the city are managed based on revenue or economic rent rather than public policy and sustainability goals, while fringe urban development also leads to increased infrastructure investment and local service costs. In this paper, the population, number of housing, and spatial development relationships in the 1923-2017 period were first dealt with based on macro data and the results of that study were analyzed in integration with remote sensing data, and the man-made structures and the development in residential areas identified with satellite data were comparatively evaluated. In the second stage, factors affecting the change in the total land assets in the province were examined with a regression model and it was established that the process of transformation of land to land lot was shaped by the impact of demographic factors. In order to regulate the use of space at the provincial and regional levels, there seems to be a requirement to primarily control the migration to the city; manage urban development based on public policies; restructure planning policies on equitable sharing of the economic rent created by planning activities in the city; and t
Keywords:	housing development; Land Use Change; public policy requirements; Urban space and development; vegetation and man-made structures
JEL:	R3
Date:	2017–07–01
URL:	http://d.repec.org/n?u=RePEc:arz:wpaper:eres2017_388&r=big

This nep-big issue is ©2018 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.