nep-big New Economics Papers
on Big Data
Issue of 2023‒11‒27
twenty papers chosen by
Tom Coupé, University of Canterbury


  1. Opportunistic Political Central Bank Coverage: Does media coverage of ECB's Monetary Policy Impact German Political Parties' Popularity? By Hugo Oriola; Matthieu Picault
  2. Deep Learning and Bayesian Calibration Approach to Hourly Passenger Occupancy Prediction in Beijing Metro: A Study Exploiting Cellular Data and Metro Conditions By Sun, He; Cabras, Stefano
  3. Deepfake Detection With and Without Content Warnings By Lewis, Andrew; Vu, Patrick; Duch, Raymond; Chowdhury, Areeq
  4. Learning Probability Distributions of Intraday Electricity Prices By Jozef Barunik; Lubos Hanus
  5. The inequalities of different dimensions of visible street urban green space provision: a machine learning approach By Wang, Ruoyu; Cao, Mengqiu; Yao, Yao; Wu, Wenjie
  6. Remote work across jobs, companies and space By Nicholas Bloom; Steven J. Davis; Stephen Hansen; Peter Lambert; Raffaella Sadun; Bledi Taska
  7. Sensory Experiences in Retail: Linking Visitors’ Review with Commercial Revitalization By Jeongseob Kim; Jiwoong Jeong
  8. How big is the real estate property? Using zero-shot vs. rule-based classification for size extraction in real estate contracts By Julia Angerer; Wolfgang Brunauer
  9. The impact of social media sentiment on US REITs – a glimpse through the lens of ESG-conscious investors By Sophia Bodensteiner
  10. Changing the Location Game – Improving Location Analytics with the Help of Explainable AI By Moritz Stang; Bastian Krämer; Marcelo Del Cajias; Wolfgang Schäfers
  11. Location Analysis and Pricing of Amenities By Anett Wins; Marcelo Del Cajias
  12. The Future of Real Estate Market? Exploring the Potential of Big Data Analytics in South Africa By Koech Cheruiyot; Lungile Gamede
  13. Multimodal Information Fusion for the Prediction of the Condition of Condominiums By Miroslav Despotovic; David Koch; Matthias Zeppelzauer; Stumpe Eric; Simon Thaler; Wolfgang A. Brunauer
  14. Prediction of energy consumption in existing public buildings using gray-box based models By Sinan Güne; Mustafa Tombul; Harun Tanrivermis
  15. Remote work across jobs, companies and space By Nicholas Bloom; Steven J. Davis; Stephen Hansen; Peter Lambert; Raffaella Sadun; Bledi Taska
  16. Visual Bias By Giulia Caprini
  17. Benefits and limitations of machine learning methods in the inhomogeneous real estate market of mixed-use asset class By Matthias Soot; Sabine Horvath; Hans-Berndt Neuner; Alexandra Weitkamp
  18. Maximizing Portfolio Predictability with Machine Learning By Michael Pinelis; David Ruppert
  19. The Fundamental Properties, Stability and Predictive Power of Distributional Preferences By Fehr, Ernst; Epper, Thomas; Senn, Julien
  20. Machine Learning for Blockchain: Literature Review and Open Research Questions By Zhang, Luyao

  1. By: Hugo Oriola; Matthieu Picault
    Abstract: We define the concept of Opportunistic Political Central Bank Coverage (OPCBC), which corresponds to an opportunistic modification of parties’ popularity induced by media coverage of monetary policy. More precisely, we suppose that the treatment of monetary policy in the press has a significant impact on the popularity of national political parties prior to an election. To investigate the existence of this concept, we collect monthly popularity ratings for six German political forces over the period between January 2005 and December 2021. Then, we measure media coverage through a textual analysis of more than 26,000 press articles from six different German newspapers. Finally, we estimate popularity functions for these German political parties in which we introduce our textual measures interacted with a dummy taking the value 1 in the month prior to an election. Our analysis underlines the existence of OPCBCs in Germany in the month preceding federal elections and elections to the European Parliament. This result is robust to the use of a SUR model, alternative pre-electoral periods, the implementation of two different tone analyses, the use of Google Trends data and the public’s interest in members of the ECB. Lastly, it seems that the existence of OPCBCs depends on the partisanship of the media studied.
    Keywords: European Central Bank; Press; Textual Analysis; Tone Analysis; Elections; Political Cycles; Germany
    JEL: E58 D72 P35
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:drm:wpaper:2023-30&r=big
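    A rough illustration of the kind of popularity function the abstract describes is sketched below: a monthly popularity rating regressed on a media-tone measure interacted with a pre-election dummy. All column names and the input file are hypothetical, and the paper's actual specification (including its SUR variant) will differ.

# Minimal sketch of a popularity function with a pre-election interaction
# term, in the spirit of the abstract. Column names (popularity, tone,
# pre_election, unemployment) and the data file are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("party_popularity_monthly.csv")  # hypothetical monthly panel

# Popularity regressed on the media-tone measure, a dummy for the month
# before an election, their interaction, and a macro control.
model = smf.ols(
    "popularity ~ tone + pre_election + tone:pre_election + unemployment",
    data=df,
).fit(cov_type="HC1")  # heteroskedasticity-robust standard errors

print(model.summary())
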
  2. By: Sun, He; Cabras, Stefano
    Abstract: In burgeoning urban landscapes, the proliferation of the populace necessitates swift and accurate urban transit solutions to cater to the citizens' commuting requirements. A pivotal aspect of fostering optimized traffic management and ensuring resilient responses to unanticipated passenger surges is precisely forecasting hourly occupancy levels within urban subway systems. This study embarks on delineating a two-tiered model designed to address this imperative adeptly: 1. Preliminary Phase - Employing a Feed Forward Neural Network (FFNN): In the initial phase, a Feed Forward Neural Network (FFNN) is employed to gauge the occupancy levels across various subway stations. The FFNN, a class of artificial neural networks, is well-suited for this task because it can learn from the data and make predictions or decisions without being explicitly programmed to perform the task. Through a series of interconnected nodes, known as neurons, arranged in layers, the FFNN processes the input data, adjusts its weights based on the error of its predictions, and optimizes the network for accurate forecasting. For the random process of occupation levels in time and space, this phase encapsulates the so-called process filtration, wherein the underlying patterns and dynamics of subway occupancy are captured and represented in a structured format, ready for subsequent analysis. The estimates garnered from this phase are pivotal and form the foundation for the subsequent modelling stage. 2. Subsequent Phase - Implementing a Bayesian Proportional-Odds Model with Hourly Random Effects: With the estimates from the FFNN at our disposal, the study transitions to the subsequent phase wherein a Bayesian Proportional-Odds Model is utilized. This model is particularly well suited to scenarios where the response variable is ordinal, as in the case of occupancy levels (Low, Medium, High). The Bayesian framework, underpinned by the principles of probability, facilitates the incorporation of prior probabilities on model parameters and updates this knowledge with observed data to make informed predictions. The unique feature of this model is the incorporation of a random effect for hours, which acknowledges the inherent variability across different hours of the day. This is paramount in urban transit systems where passenger influx varies significantly with the hour. The synergy of these two models facilitates calibrated estimations of occupancy levels, both conditionally (relative to the sample) and unconditionally (on a detached test set). This dual-phase methodology furnishes analysts with a robust and reliable insight into the quality of predictions propounded by this model. This, in turn, provides a data-driven foundation for making informed decisions in real-time traffic management, emergency response planning, and overall operational optimization of urban subway systems. The model expounded in this study is presently under scrutiny for potential deployment by the Beijing Metro Group Ltd. This initiative reflects a practical stride towards embracing sophisticated analytical models to ameliorate urban transit management, thereby contributing to the broader objective of fostering sustainable and efficient urban living environments amidst the surging urban populace.
    Keywords: Bayesian Model Calibration; Deep Learning; Integrated Nested Laplace Approximation; Proportional Odds Model; Spatial-Temporal Modelling
    Date: 2023–11–07
    URL: http://d.repec.org/n?u=RePEc:cte:wsrepe:38783&r=big
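    The two-stage structure described in the abstract can be caricatured in a few lines of Python: a feed-forward network produces an occupancy score, and an ordered-logit (proportional-odds) model maps it to Low/Medium/High, with hour dummies standing in for the paper's Bayesian hourly random effects (fitted with INLA in the original). The data file, column names and network size below are all hypothetical.

# Simplified two-stage sketch inspired by the abstract; it is NOT the
# authors' model, only a frequentist analogue with hypothetical data.
import pandas as pd
from sklearn.neural_network import MLPRegressor
from statsmodels.miscmodels.ordinal_model import OrderedModel

df = pd.read_csv("metro_hourly.csv")      # hypothetical cellular + metro data
features = ["cell_count", "hour", "weekday", "line_capacity"]

# Stage 1: feed-forward network regressing raw passenger counts on features.
ffnn = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
ffnn.fit(df[features], df["passenger_count"])
df["ffnn_score"] = ffnn.predict(df[features])

# Stage 2: proportional-odds (ordered logit) model of the ordinal occupancy
# level, using the FFNN score plus hour dummies as a crude stand-in for the
# paper's Bayesian hourly random effects.
endog = df["occupancy_level"].astype(
    pd.CategoricalDtype(categories=["Low", "Medium", "High"], ordered=True))
exog = pd.concat(
    [df[["ffnn_score"]],
     pd.get_dummies(df["hour"], prefix="h", drop_first=True).astype(float)],
    axis=1)
po_model = OrderedModel(endog, exog, distr="logit").fit(method="bfgs")
print(po_model.summary())
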
  3. By: Lewis, Andrew; Vu, Patrick; Duch, Raymond (University of Oxford); Chowdhury, Areeq
    Abstract: The rapid advancement of ‘deepfake’ video technology — which uses deep learning artificial intelligence algorithms to create fake videos that look real — has given urgency to the question of how policymakers and technology companies should moderate inauthentic content. We conduct an experiment to measure people’s alertness to and ability to detect a high-quality deepfake amongst a set of videos. First, we find that in a natural setting with no content warnings, individuals who are exposed to a deepfake video of neutral content are no more likely to detect anything out of the ordinary (32.9%) compared to a control group who viewed only authentic videos (34.1%). Second, we find that when individuals are given a warning that at least one video in a set of five videos is a deepfake, only 21.6% of respondents correctly identify the deepfake as the only inauthentic video, while the remainder erroneously select at least one genuine video as a deepfake.
    Date: 2023–10–15
    URL: http://d.repec.org/n?u=RePEc:osf:osfxxx:cb7rw&r=big
  4. By: Jozef Barunik; Lubos Hanus
    Abstract: We propose a novel machine learning approach to probabilistic forecasting of hourly intraday electricity prices. In contrast to recent advances in data-rich probabilistic forecasting that approximate the distributions with some features such as moments, our method is non-parametric and selects the best distribution from all possible empirical distributions learned from the data. The model we propose is a multiple output neural network with a monotonicity adjusting penalty. Such a distributional neural network can learn complex patterns in electricity prices from data-rich environments and it outperforms state-of-the-art benchmarks.
    Date: 2023–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2310.02867&r=big
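    One simple reading of a "multiple output neural network with a monotonicity adjusting penalty" is a network that predicts the price CDF on a fixed grid and is penalized whenever predicted CDF values decrease along the grid. The PyTorch sketch below illustrates that idea with dummy data; all sizes and the penalty weight are hypothetical and the paper's architecture may differ.

# Minimal sketch of a multi-output network predicting CDF values on a grid,
# with a penalty discouraging non-monotone (decreasing) steps.
import torch
import torch.nn as nn

N_FEATURES, N_GRID = 20, 50          # hypothetical sizes

class DistributionalNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(N_FEATURES, 128), nn.ReLU(),
            nn.Linear(128, N_GRID), nn.Sigmoid(),   # CDF values in (0, 1)
        )

    def forward(self, x):
        return self.body(x)

def loss_fn(cdf_pred, cdf_target, lam=1.0):
    mse = ((cdf_pred - cdf_target) ** 2).mean()
    # Penalize decreasing steps along the price grid (monotonicity penalty).
    steps = cdf_pred[:, 1:] - cdf_pred[:, :-1]
    return mse + lam * torch.relu(-steps).mean()

model = DistributionalNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(256, N_FEATURES)                        # dummy inputs
y = torch.sort(torch.rand(256, N_GRID), dim=1).values   # dummy monotone targets
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
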
  5. By: Wang, Ruoyu; Cao, Mengqiu; Yao, Yao; Wu, Wenjie
    Abstract: Awareness is growing that the uneven provision of street urban green space (UGS) may lead to environmental injustice. Most previous studies have focused on the overhead perspective of street UGS provision, and only a few have evaluated the disparities in visible street UGS provision. While a plethora of studies have focused on a single dimension of visible UGS provision, no previous studies have developed a framework for systematically evaluating visible street UGS provision. This study therefore proposes a novel 4 ‘A’ framework and aims to assess different dimensions (namely availability, accessibility, attractiveness and aesthetics) of visible street UGS provision, using Beijing as a case study. It investigates inequities in different dimensions of visible street UGS provision. In addition, it explores the extent to which a neighbourhood's economic level is associated with different dimensions of visible street UGS. Our results show that, in Beijing, the four chosen dimensions of visible street UGS provision differ significantly in terms of spatial distribution and the associations between them. Furthermore, we found that the values of the Gini index and Moran's I for attractiveness and aesthetics are higher than those for availability and accessibility, which indicates a more unequal distribution of visible street UGS from a qualitative perspective. We also found that a community's economic level is positively associated with attractiveness and aesthetics, while no evidence was found to support the claim that the economic level of a community is associated with availability and accessibility. This study suggests that visible street UGS provision is unequal; therefore, urban planning policy should pay more attention to disparities in visible street UGS provision, particularly in urban areas.
    Keywords: 4 ‘A’ framework; Beijing; disparity; machine learning; street view data; visible street urban green space
    JEL: C1
    Date: 2022–12–01
    URL: http://d.repec.org/n?u=RePEc:ehl:lserod:117694&r=big
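    The two summary statistics the abstract leans on, the Gini index and global Moran's I, can be computed with plain NumPy as sketched below. The neighbourhood scores, centroids and the binary weight matrix are random stand-ins, not the study's data.

# Sketch of the Gini index and global Moran's I for neighbourhood-level
# green-space scores, using illustrative random data.
import numpy as np

def gini(x):
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    lorenz = np.cumsum(x) / x.sum()
    return (n + 1 - 2 * lorenz.sum()) / n

def morans_i(x, w):
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return (x.size / w.sum()) * (w * np.outer(z, z)).sum() / (z ** 2).sum()

scores = np.random.rand(100)          # hypothetical attractiveness scores
coords = np.random.rand(100, 2)       # hypothetical neighbourhood centroids
d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
w = (d < 0.2).astype(float)           # simple distance-band spatial weights
np.fill_diagonal(w, 0.0)

print("Gini:", gini(scores), "Moran's I:", morans_i(scores, w))
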
  6. By: Nicholas Bloom; Steven J. Davis; Stephen Hansen; Peter Lambert; Raffaella Sadun; Bledi Taska
    Abstract: The pandemic catalyzed an enduring shift to remote work. To measure and characterize this shift, we examine more than 250 million job vacancy postings across five English-speaking countries. Our measurements rely on a state-of-the-art language-processing framework that we fit, test, and refine using 30,000 human classifications. We achieve 99% accuracy in flagging job postings that advertise hybrid or fully remote work, greatly outperforming dictionary methods and also outperforming other machine-learning methods. From 2019 to early 2023, the share of postings that say new employees can work remotely one or more days per week rose more than three-fold in the U.S. and by a factor of five or more in Australia, Canada, New Zealand and the U.K. These developments are highly non-uniform across and within cities, industries, occupations, and companies. Even when zooming in on employers in the same industry competing for talent in the same occupations, we find large differences in the share of job postings that explicitly offer remote work.
    Keywords: remote work, pandemic
    Date: 2023–03–17
    URL: http://d.repec.org/n?u=RePEc:cep:poidwp:067&r=big
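    For orientation, the sketch below shows a plain TF-IDF plus logistic-regression classifier for flagging remote-work postings. This is not the authors' language-model framework, only the kind of bag-of-words baseline the paper reports outperforming; the postings and labels are invented.

# Toy baseline classifier for remote-work postings (illustrative data only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.pipeline import make_pipeline

postings = ["Fully remote role, work from anywhere",
            "On-site position at our Chicago plant",
            "Hybrid schedule: 2 days per week in office",
            "Warehouse associate, night shift"]
labels = [1, 0, 1, 0]   # 1 = advertises hybrid or fully remote work

X_tr, X_te, y_tr, y_te = train_test_split(
    postings, labels, test_size=0.5, random_state=0, stratify=labels)
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te), zero_division=0))
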
  7. By: Jeongseob Kim; Jiwoong Jeong
    Abstract: This study explores the importance of sensory experiences of visitors to commercial streets and their role in vitalizing commercial districts, based on big data analysis using review data of social media. With the growth of online commerce, the vitality and function of commercial streets have been declining. According to sensory marketing theory, it is essential to develop commercial spaces that allow consumers to have positive experiences directly with their five senses, which is difficult to replace with online commerce, in order to attract more visitors. Urban designers and scholars have also emphasized the importance of sensory experiences in urban open spaces to revitalize commercial areas. Visitors evaluate their experiences with their five senses - sight, hearing, touch, smell, and taste - in offline commercial streets or stores, and unique and enjoyable sensory experiences can lead them to stay longer or revisit the places. Special visit experiences are often shared on social media, and these reviews contribute to attracting new visitors. Therefore, this study examines whether stores with positive sensory experiences on the commercial street attract more visitors. To explore the relationship between sensory experiences and commercial revitalization, this study analyzes Google review data for places of interest (POIs) in Seoul between 2017 and 2021 using text-mining techniques. Based on a dictionary-based classification of five sense-related experiences from the review data, the sensory experiences of each POI are quantified. Then, the sensory experiences of each POI are aggregated into those of a commercial block. Finally, the connection between floating population and sensory experiences at a commercial block level is analyzed using a spatial econometric model. The findings of this study could provide a new perspective on the characteristics of urban commercial districts and be the basis for developing revitalization strategies for commercial areas.
    Keywords: commercial revitalization; Retail; sensory experience; social media data
    JEL: R3
    Date: 2023–01–01
    URL: http://d.repec.org/n?u=RePEc:arz:wpaper:eres2023_138&r=big
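    A dictionary-based tagging of sensory experiences, as described in the abstract, boils down to counting sense-related keywords per review. The sketch below illustrates the idea; the keyword lists are invented stand-ins, not the study's actual sensory dictionary.

# Sketch of dictionary-based five-sense tagging of a review text.
SENSE_DICTIONARY = {
    "sight": {"view", "lighting", "decor", "colorful"},
    "hearing": {"music", "noisy", "quiet", "loud"},
    "touch": {"cozy", "comfortable", "texture", "warm"},
    "smell": {"aroma", "fragrant", "smell", "scent"},
    "taste": {"delicious", "sweet", "savory", "flavor"},
}

def sense_profile(review: str) -> dict:
    """Count keyword hits per sense for one review."""
    tokens = review.lower().split()
    return {sense: sum(t in words for t in tokens)
            for sense, words in SENSE_DICTIONARY.items()}

review = "Cozy cafe with quiet music, fragrant coffee and a delicious cake"
print(sense_profile(review))
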
  8. By: Julia Angerer; Wolfgang Brunauer
    Abstract: Due to the massive volume of real estate-related text documents, the need to process the data automatically is evident. Purchase contracts in particular contain valuable transaction and property description information, such as usable area. In this research project, a natural language processing (NLP) approach using open-source transformer-based models was investigated. The potential of pre-trained language models for zero-shot classification is highlighted, especially in cases where no training data is available. This approach is particularly relevant for analyzing purchase contracts in the legal domain, where it can be challenging to extract the information manually or to build comprehensive regular expression rules. A data set consisting of classified contract sentence parts, each containing one size and its context information, was created manually for model comparison. The experiments conducted in this study demonstrate that pre-trained language models can accurately classify sentence parts containing a size, with varying levels of performance across different models. The results suggest that pre-trained language models can be effective tools for processing textual data in the real estate and legal domains and can provide valuable insights into the underlying structures and patterns in such data. Overall, this research contributes to the understanding of the capabilities of pre-trained language models in NLP and highlights their potential for practical applications in real-world settings, particularly in the legal domain, where there is a large volume of textual data and annotated training data is not available.
    Keywords: contract documents; Information Extraction; Natural Language Processing; zero-shot classification
    JEL: R3
    Date: 2023–01–01
    URL: http://d.repec.org/n?u=RePEc:arz:wpaper:eres2023_304&r=big
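    Zero-shot classification of contract sentence parts with a pre-trained NLI model can be sketched with the Hugging Face pipeline API, as below. The candidate labels and the example sentence are illustrative; the paper's label set, models and (German-language) contract texts will differ.

# Sketch of zero-shot classification of a contract sentence part.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

sentence = "The apartment has a usable area of 87.5 square metres."
labels = ["usable area of the property", "purchase price",
          "parking space", "other"]

result = classifier(sentence, candidate_labels=labels)
print(result["labels"][0], result["scores"][0])  # top label and its score
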
  9. By: Sophia Bodensteiner
    Abstract: In recent years, social media platforms have become vibrant online venues where all kinds of market participants share their opinions on equity markets. This provision of opinions and thus of information has attracted the interest of the financial industry. Parallel to this trend, the finance and real estate industries have also recognized the importance of environmental, social and governance (ESG) factors, due in particular to increasing pressure from the public. Bringing these developments together, this paper uses a textual analysis approach to analyze the public’s opinion on social media regarding the ESG performance of real estate-related companies. The aim of the analysis is to examine how the public opinion on ESG is reflected in Twitter data and how it can be used to predict the performance of US REITs (Real Estate Investment Trusts). Therefore, using a three-step procedure, this paper first identifies ESG-related tweets, then measures the sentiment of those tweets using different natural language processing techniques and, using the results of the sentiment analysis, calculates the impact of those tweets on the performance of the corresponding company. The first step is achieved by employing a Global Vectors (GloVe) model, which allows tweets to be selected based on ESG-related keywords of the corpus. In the second step, a lexicon-based method is applied to create a sentiment index, which is the baseline for the following analysis. In addition, a CNN-LSTM-based sentiment index will be created, which might be more powerful in capturing the linguistic complexity of language in social media. Finally, the sentiment indices are compared to the performance of the corresponding company in order to determine any correlation and predictive power. Our results not only show a significant correlation between the sentiment indices and the performance of the companies, but also significant predictive power, with positive tweets being associated with better performance and vice versa. These findings suggest that Twitter data can be a valuable source for predicting ESG performance and that using word embedding models, such as GloVe, and lexicon-based methods for sentiment analysis can improve the accuracy of the results.
    Keywords: Esg; GloVe; Sentiment Analysis; Twitter
    JEL: R3
    Date: 2023–01–01
    URL: http://d.repec.org/n?u=RePEc:arz:wpaper:eres2023_162&r=big
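    The lexicon-based step can be sketched as: filter tweets by ESG-related keywords, score each with a sentiment lexicon, and aggregate into a daily index. The example below uses VADER as a generic lexicon scorer and a hand-written keyword list standing in for the GloVe-expanded vocabulary described in the abstract; the tweets are invented.

# Sketch of an ESG keyword filter plus lexicon-based daily sentiment index.
import pandas as pd
from nltk.sentiment import SentimentIntensityAnalyzer  # needs nltk.download("vader_lexicon")

tweets = pd.DataFrame({
    "date": ["2023-01-02", "2023-01-02", "2023-01-03"],
    "text": ["Great green-building retrofit by this REIT",
             "Board governance scandal hits the trust",
             "New solar panels on every property"],
})

esg_keywords = ("green", "governance", "solar", "emissions", "diversity")
esg_tweets = tweets[tweets["text"].str.lower().str.contains("|".join(esg_keywords))]

sia = SentimentIntensityAnalyzer()
esg_tweets = esg_tweets.assign(
    score=esg_tweets["text"].map(lambda t: sia.polarity_scores(t)["compound"]))
daily_index = esg_tweets.groupby("date")["score"].mean()
print(daily_index)
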
  10. By: Moritz Stang; Bastian Krämer; Marcelo Del Cajias; Wolfgang Schäfers
    Abstract: Besides its structural and economic characteristics, the location of a property is probably one of the most important determinants of its underlying value. In contrast to property valuations, there are hardly any approaches to date that evaluate the quality of a real estate location in an automated manner. The reasons are the complexity, the number of interactions and the non-linearities underlying the quality specifications of a certain location, which are difficult to represent with traditional econometric models. The aim of this paper is thus to present a newly developed data-driven approach for the assessment of real estate locations. By combining a state-of-the-art machine learning algorithm and the local post-hoc model-agnostic method of Shapley Additive Explanations, the newly developed SHAP location score is able to account for empirical complexities, especially for non-linearities and higher-order interactions. The SHAP location score represents an intuitive and flexible approach based on econometric modeling techniques and the basic assumptions of hedonic pricing theory. The approach can be applied post hoc to any common machine learning method and can be flexibly adapted to the respective needs. This constitutes a significant extension of traditional urban models and offers many advantages for a wide range of real estate players.
    Keywords: Automated Location Valuation Model; Explainable AI; Location Analytics; Machine Learning
    JEL: R3
    Date: 2023–01–01
    URL: http://d.repec.org/n?u=RePEc:arz:wpaper:eres2023_139&r=big
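    A minimal sketch of the underlying mechanics: fit a gradient-boosting hedonic model, compute SHAP values, and sum each property's contributions over the location-related features. Feature names and data below are synthetic, and the paper's SHAP location score may be constructed differently in detail.

# Sketch of a SHAP-based location contribution for a hedonic model.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "living_area": rng.uniform(40, 160, 500),
    "year_built": rng.integers(1950, 2022, 500),
    "dist_city_centre": rng.uniform(0, 20, 500),
    "dist_transit": rng.uniform(0, 5, 500),
})
y = 3000 + 25 * X["living_area"] - 80 * X["dist_city_centre"] + rng.normal(0, 200, 500)

model = GradientBoostingRegressor().fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)   # (n_obs, n_features)

location_cols = ["dist_city_centre", "dist_transit"]
idx = [X.columns.get_loc(c) for c in location_cols]
location_score = shap_values[:, idx].sum(axis=1)   # per-property location contribution
print(location_score[:5])
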
  11. By: Anett Wins; Marcelo Del Cajias
    Abstract: Modern location analysis evaluates location attractiveness almost in real time, combining the knowledge of local real estate experts and artificial intelligence. In this paper we develop an algorithm – the Amenities Magnet algorithm – that measures and benchmarks the attractiveness of locations based on the urban amenities’ footprint of the surrounding area, grouped according to relevance for residential purposes and taking distance information from Google and OpenStreetMap into account. As cities are continuously evolving, benchmarking locations’ amenity-wise change of attractiveness over time helps to detect upswing areas and thus supports investment decisions. According to the 15-minute city concept, the welfare of residents is proportional to the amenities accessible within a short walk or bike ride. Measuring individual scorings for the seven basic living needs results in a more detailed, disaggregated location assessment. Based on these insights, an advanced machine learning (ML) algorithm under the Gradient Boosting framework (XGBoost) is adapted to model residential rental prices for the Greater Manchester region, United Kingdom, and achieves improved predictive power. To extract interpretable results and quantify the contribution of certain amenities to rental prices, eXplainable Artificial Intelligence (XAI) methods are used. Tenants' willingness to pay (WTP) for accessibility to amenities varies by type. In Manchester, tram stops, bars, schools and proximity to the city center in particular emerged as relevant value drivers. Even if the results of the case study are not generally applicable, the methodology can be transferred to any market in order to reveal regional patterns.
    Keywords: Amenities Magnet algorithm; location analysis; residential rental pricing; XGBoost
    JEL: R3
    Date: 2023–01–01
    URL: http://d.repec.org/n?u=RePEc:arz:wpaper:eres2023_102&r=big
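    The 15-minute-city idea in the abstract can be made concrete with a simple amenity-accessibility score: count amenities reachable within a 15-minute walk, weighted by distance decay. The coordinates, categories and weights below are hypothetical, and the paper's Amenities Magnet algorithm is considerably more elaborate.

# Sketch of a distance-decay amenity accessibility score.
import numpy as np

WALK_SPEED_KM_MIN = 0.08                 # ~4.8 km/h walking speed
MAX_DIST_KM = 15 * WALK_SPEED_KM_MIN     # 15-minute walking radius

def haversine_km(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def amenity_score(home, amenities):
    """home = (lat, lon); amenities = list of (lat, lon, weight)."""
    score = 0.0
    for lat, lon, weight in amenities:
        d = haversine_km(home[0], home[1], lat, lon)
        if d <= MAX_DIST_KM:
            score += weight * (1 - d / MAX_DIST_KM)   # linear distance decay
    return score

home = (53.4808, -2.2426)                 # central Manchester (illustrative)
amenities = [(53.4830, -2.2400, 1.0),     # tram stop
             (53.4850, -2.2500, 0.5),     # bar
             (53.5300, -2.3000, 1.0)]     # school, outside the radius
print(amenity_score(home, amenities))
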
  12. By: Koech Cheruiyot; Lungile Gamede
    Abstract: Big data is noted as being useful in various business markets, including the real estate market. By making use of data analytics to obtain actionable information from analyzing significant quantities of data, businesses may gain an advantage over rivals in the market. This paper examines the current applications of big data analytics, its potential uses, as well as potential barriers to its use in the South African real estate market. A qualitative approach was adopted to administer semi-structured interviews to big data analytics specialists in the South African real estate market. Initial results show that the Proptech market is still in its infancy in general and that there is limited use of big data analytics in the South African real estate market in particular. However, there are benefits to using the technology, such as more efficient and effective customer service. Major challenges include the fact that the South African market is not ready to use it, since there is a lack of clarity and knowledge about how the levels of investment needed correlate with the accruing benefits. Challenges related to storage systems and costs, as well as the scarcity of skills in technologies that support big data and big data analytics, also prevail.
    Keywords: big data analytics; Proptech market; Real Estate Market; South Africa
    JEL: R3
    Date: 2023–01–01
    URL: http://d.repec.org/n?u=RePEc:afr:wpaper:afres2023-023&r=big
  13. By: Miroslav Despotovic; David Koch; Matthias Zeppelzauer; Stumpe Eric; Simon Thaler; Wolfgang A. Brunauer
    Abstract: Today's data analysis techniques allow for the combination of multiple different data modalities, which should also allow for more accurate feature extraction. In our research, we leverage the capacity of machine learning tools to build a model with shared neural network layers and multiple inputs that is more flexible and allows for more robust extraction of real estate attributes. The most common form of data for a real estate assessment is data structured in tables, such as size or year of construction, along with textual descriptions of the property. Other data that can easily be found in real estate listings are visual data such as exterior and interior photographs. In the presented approach, we fuse textual information and a variable quantity of interior photographs per condominium for condition assessment and investigate how multiple modalities can be efficiently combined using deep learning. We train and test the performance of a pre-trained convolutional neural network fine-tuned with a variable quantity of interior views of selected condominiums. In parallel, we train and test a pre-trained bidirectional encoder-transformer language model using text data from the same observations. Finally, we build an experimental neural network model using both modalities for the same task and compare the performance with the models trained with a single modality. Our initial assumption that coupling both networks would lead to worse performance compared to fine-tuned single-modal models was not confirmed, as we achieved better performance with the proposed multi-modal model despite the handicap of a very unbalanced dataset. The novelty here is the multimodal modeling of a variable quantity of real estate-related attributes in a unified model that integrates all available modalities and can thus use their complementary information. With the presented approach, we intend to extend the existing information extraction methods for automated valuation models, which in turn would contribute to higher transparency of valuation procedures and thus to more reliable statements about the value of real estate.
    Keywords: Avm; Computer vision; Hedonic Pricing; NLP
    JEL: R3
    Date: 2023–01–01
    URL: http://d.repec.org/n?u=RePEc:arz:wpaper:eres2023_22&r=big
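    The shared-layer, multiple-input idea can be sketched with the Keras functional API: one branch takes image embeddings, the other text embeddings, and shared dense layers predict the condition class. In the paper the branches are a fine-tuned CNN and a BERT-style encoder trained end to end; here they are stand-in embedding vectors and all dimensions are hypothetical.

# Sketch of a two-input fusion network for condition classification.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

IMG_DIM, TXT_DIM, N_CLASSES = 512, 768, 4

img_in = keras.Input(shape=(IMG_DIM,), name="image_embedding")
txt_in = keras.Input(shape=(TXT_DIM,), name="text_embedding")

img_branch = layers.Dense(128, activation="relu")(img_in)
txt_branch = layers.Dense(128, activation="relu")(txt_in)

fused = layers.Concatenate()([img_branch, txt_branch])   # information fusion
fused = layers.Dense(64, activation="relu")(fused)
out = layers.Dense(N_CLASSES, activation="softmax", name="condition")(fused)

model = keras.Model([img_in, txt_in], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Dummy data standing in for real condominium embeddings and condition labels.
x_img = np.random.rand(100, IMG_DIM)
x_txt = np.random.rand(100, TXT_DIM)
y = np.random.randint(0, N_CLASSES, 100)
model.fit([x_img, x_txt], y, epochs=2, batch_size=16, verbose=0)
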
  14. By: Sinan Güne; Mustafa Tombul; Harun Tanrivermis
    Abstract: Energy consumption prediction for buildings can help building owners and operators to reduce energy costs, reduce environmental impact, improve occupant comfort, and optimize building performance. This study aims to develop a model for energy consumption prediction in university campus buildings using machine learning techniques with time series and physics/engineering-based datasets. Time series energy consumption data sets from existing buildings, as well as building physics/engineering data, will be analyzed to estimate campus-scale energy consumption. Time series data will be used for heating/cooling and lighting, and physics/engineering data will be used for outdoor conditions such as outdoor air temperature and relative humidity, and building-specific characteristics such as building floor area, floor height, and material type. To improve prediction accuracy, a simulation study will be conducted using a physics-based approach, and a model will be developed. The results of this approach will be used as input for the data-based approach, and a hybrid model will be presented for prediction using deep learning techniques such as LSTM and RNN. Existing studies on energy consumption prediction of buildings generally use either models containing time series datasets on energy consumption or models containing building physical information. Considering that each of these data types impacts energy consumption, evaluating them together helps make more accurate consumption forecasts. However, evaluating these data together is a considerable challenge in itself. Within the scope of the study, predictions will be made using these two data types together, and the advantages and shortcomings of the model results compared to purely data-based models will be discussed. While previous research has primarily focused on either time series datasets or building physical information, this study is thought to be one of the first to evaluate these two data types together in order to provide more accurate energy consumption predictions and generalizable results.
    Keywords: Energy Consumption; Energy Efficiency; gray-box based model; Machine Learning
    JEL: R3
    Date: 2023–01–01
    URL: http://d.repec.org/n?u=RePEc:arz:wpaper:eres2023_255&r=big
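    One way to combine the two data types the abstract describes is a hybrid network: an LSTM branch over the consumption time series joined with a dense branch over static building/physics features. The Keras sketch below illustrates this; all shapes, feature counts and the target are hypothetical, and the study's gray-box model may be structured differently.

# Sketch of a hybrid LSTM + static-feature model for next-hour consumption.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

LOOKBACK, N_SERIES_FEATS, N_STATIC_FEATS = 24, 3, 5   # 24 hourly steps

series_in = keras.Input(shape=(LOOKBACK, N_SERIES_FEATS), name="time_series")
static_in = keras.Input(shape=(N_STATIC_FEATS,), name="building_physics")

h = layers.LSTM(32)(series_in)                 # temporal dynamics branch
s = layers.Dense(16, activation="relu")(static_in)   # building physics branch
joint = layers.Concatenate()([h, s])
out = layers.Dense(1, name="next_hour_kwh")(joint)

model = keras.Model([series_in, static_in], out)
model.compile(optimizer="adam", loss="mse")

x_series = np.random.rand(200, LOOKBACK, N_SERIES_FEATS)   # dummy sequences
x_static = np.random.rand(200, N_STATIC_FEATS)             # dummy building data
y = np.random.rand(200, 1)
model.fit([x_series, x_static], y, epochs=2, verbose=0)
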
  15. By: Nicholas Bloom; Steven J. Davis; Stephen Hansen; Peter Lambert; Raffaella Sadun; Bledi Taska
    Abstract: The pandemic catalyzed an enduring shift to remote work. To measure and characterize this shift, we examine more than 250 million job vacancy postings across five English-speaking countries. Our measurements rely on a state-of-the-art language-processing framework that we fit, test, and refine using 30,000 human classifications. We achieve 99% accuracy in flagging job postings that advertise hybrid or fully remote work, greatly outperforming dictionary methods and also outperforming other machine-learning methods. From 2019 to early 2023, the share of postings that say new employees can work remotely one or more days per week rose more than three-fold in the U.S. and by a factor of five or more in Australia, Canada, New Zealand and the U.K. These developments are highly non-uniform across and within cities, industries, occupations, and companies. Even when zooming in on employers in the same industry competing for talent in the same occupations, we find large differences in the share of job postings that explicitly offer remote work.
    Keywords: Covid-19, hybrid working, employment
    Date: 2023–07–14
    URL: http://d.repec.org/n?u=RePEc:cep:cepdps:dp1935&r=big
  16. By: Giulia Caprini
    Abstract: I study the non-verbal language of leading pictures in online news and its influence on readers’ opinions. I develop a visual vocabulary and use a dictionary approach to analyze around 300,000 photos published in US news in 2020. I document that the visual language of US media is politically partisan and significantly polarised. I then demonstrate experimentally that the news’ partisan visual language is not merely distinctive of outlets’ ideological positions, but also promotes them among readers. In a survey experiment, identical articles with images of opposing partisanships induce different opinions, tilted towards the pictures’ ideological poles. Moreover, as readers react more to images aligned with their viewpoint, the news’ visual bias causes issue polarization to increase. Finally, I find that media can effectively slant readers using neutral texts and partisan pictures: this result calls for the inclusion of image scrutiny in news assessments and fact checking, today largely text-based.
    Date: 2023–05–03
    URL: http://d.repec.org/n?u=RePEc:oxf:wpaper:1016&r=big
  17. By: Matthias Soot; Sabine Horvath; Hans-Berndt Neuner; Alexandra Weitkamp
    Abstract: Property rates, usually used in the income approach, can be determined in a reverse income approach model for every transaction where the net yield is known. The level of the property rates represents the risk of the asset that is traded. The level of the yield therefore depends on influencing parameters that can explain the risk. A classical approach to investigating these influences is a multiple linear regression model. In an inhomogeneous market, however, this classic approach leads to poor results. In this work, we compare different parametric and non-parametric methods to model the level of the rates. Thus, we present the application of Artificial Neural Networks (ANN) as well as Random Forest Regression (RFR) as non-parametric methods and compare the results with parametric approaches such as classic multiple linear regression (MLR) and Geographically Weighted Regression (GWR). The dataset consists of a submarket of mixed-use buildings (residential and commercial) in the federal state of Lower Saxony (Germany). The mixed-use asset class is traded only 200 times per year in this federal state of more than 8 million inhabitants. Therefore, the investigated sample (covering 5 years of data) comes from the official purchase price database. Besides the building characteristics (number of floors, year of construction and average rent per sqm), locational parameters are considered (standard land value, population forecast, and population structure). Due to the inhomogeneous rural, urban and socio-demographic environment, the models can be complex. The evaluation of the different approaches led to inhomogeneous results: no single best method can be determined for the dataset. Our goal is to understand and interpret the different results in view of how the methods work. Therefore, we investigate the results by means of the influencing parameters used (model size), the sample sizes and the influence/significance of the parameters on the result. The patterns found are discussed in a comparison of the methods and in the context of the data. We conclude our contribution by formulating possibilities and limitations.
    Keywords: Complexity; Machine Learning; mixed use buildings
    JEL: R3
    Date: 2023–01–01
    URL: http://d.repec.org/n?u=RePEc:arz:wpaper:eres2023_241&r=big
  18. By: Michael Pinelis; David Ruppert
    Abstract: We construct the maximally predictable portfolio (MPP) of stocks using machine learning. Solving for the optimal constrained weights in the multi-asset MPP gives portfolios with a high monthly coefficient of determination, given the sample covariance matrix of predicted return errors from a machine learning model. Various models for the covariance matrix are tested. The MPPs of S&P 500 index constituents with estimated returns from Elastic Net, Random Forest, and Support Vector Regression models can outperform or underperform the index depending on the time period. Portfolios that take advantage of the high predictability of the MPP's returns and employ a Kelly criterion style strategy consistently outperform the benchmark.
    Date: 2023–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2311.01985&r=big
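    The core of the maximally predictable portfolio is maximizing the R^2 of portfolio returns, i.e. the ratio w' S_hat w / w' S w, where S_hat is the covariance of predicted returns and S the covariance of realized returns; this is a generalized eigenvalue problem. The sketch below shows the unconstrained case with dummy data; the paper additionally imposes weight constraints and uses machine-learning return forecasts, both omitted here.

# Simplified, unconstrained sketch of a maximally predictable portfolio.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
T, N = 240, 10
returns = rng.normal(0, 0.05, (T, N))                       # realized returns (dummy)
predicted = returns * 0.3 + rng.normal(0, 0.04, (T, N))     # model forecasts (dummy)

S = np.cov(returns, rowvar=False)        # covariance of realized returns
S_hat = np.cov(predicted, rowvar=False)  # covariance of predicted returns

# Largest generalized eigenvalue of (S_hat, S) is the maximal R^2; the
# corresponding eigenvector (rescaled to sum to one) gives the MPP weights.
vals, vecs = eigh(S_hat, S)
w = vecs[:, -1]
w = w / w.sum()
print("max predictable R^2:", vals[-1])
print("weights:", np.round(w, 3))
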
  19. By: Fehr, Ernst (University of Zurich); Epper, Thomas (CNRS); Senn, Julien (University of Zurich)
    Abstract: Parsimony is a desirable feature of economic models but almost all human behaviors are characterized by vast individual variation that appears to defy parsimony. How much parsimony do we need to give up to capture the fundamental aspects of a population's distributional preferences and to maintain high predictive ability? Using a Bayesian nonparametric clustering method that makes the trade-off between parsimony and descriptive accuracy explicit, we show that three preference types - an inequality averse, an altruistic and a predominantly selfish type - capture the essence of behavioral heterogeneity. These types independently emerge in four different data sets and are strikingly stable over time. They predict out-of-sample behavior equally well as a model that permits all individuals to differ and substantially better than a representative agent model and a state-of-the-art machine learning algorithm. Thus, a parsimonious model with three stable types captures key characteristics of distributional preferences and has excellent predictive power.
    Keywords: distributional preferences, altruism, inequality aversion, preference heterogeneity, stability, out-of-sample prediction, parsimony, Bayesian nonparametrics
    JEL: D31 D63 C49 C90
    Date: 2023–10
    URL: http://d.repec.org/n?u=RePEc:iza:izadps:dp16535&r=big
  20. By: Zhang, Luyao
    Abstract: In this research, we explore the nexus between artificial intelligence (AI) and blockchain, two paramount forces steering the contemporary digital era. AI, replicating human cognitive functions, encompasses capabilities from visual discernment to complex decision-making, with significant applicability in sectors such as healthcare and finance. Its influence during the web2 epoch not only enhanced the prowess of user-oriented platforms but also prompted debates on centralization. Conversely, blockchain provides a foundational structure advocating for decentralized and transparent transactional archiving. Yet, the foundational principle of "code is law" in blockchain underscores an imperative need for the fluid adaptability that AI brings. Our analysis methodically navigates the corpus of literature on the fusion of blockchain with machine learning, emphasizing AI's potential to elevate blockchain's utility. Additionally, we chart prospective research trajectories, weaving together blockchain and machine learning in niche domains like causal machine learning, reinforcement mechanism design, and cooperative AI. These intersections aim to cultivate interdisciplinary pursuits in AI for Science, catering to a broad spectrum of stakeholders.
    Date: 2023–11–02
    URL: http://d.repec.org/n?u=RePEc:osf:osfxxx:g2q5t&r=big

This nep-big issue is ©2023 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.