nep-big New Economics Papers
on Big Data
Issue of 2023‒03‒06
eighteen papers chosen by
Tom Coupé
University of Canterbury

  1. In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation By Alicia Curth; Mihaela van der Schaar
  2. The private and external costs of Germany’s nuclear phase-out By Jarvis, Stephen; Deschenes, Olivier; Jha, Akshaya
  3. Income insecurity and mental health in pandemic times By Dirk Foremny; Pilar Sorribas-Navarro; Judit Vall Castelló
  4. A multiparametric procedure to understand how the Covid-19 influenced the real estate market By Laura Gabrielli; Aurora Ruggeri; Massimiliano Scarpa
  5. Human-AI Interaction – Investigating the Impact on Individuals and Organizations By Peters, Felix
  6. Measuring Regulatory Barriers Using Annual Reports of Firms By Haosen Ge
  7. A Bottom Up Industrial Taxonomy for the UK. Refinements and an Application By Juan Mateos-Garcia; George Richardson
  8. Expanding Corporate Use of Artificial Intelligence By Choi, Mincheol; Song, Danbee; Cho, Jaehan
  9. BEAST-Net: Learning novel behavioral insights using a neural network adaptation of a behavioral model By Shoshan, Vered; Hazan, Tamir; Plonsky, Ori
  10. Can Textual Analysis solve the Underpricing Puzzle? A US REIT Study By Nino Paulus; Marina Koelbl; Wolfgang Schäfers
  11. Understanding rental profit and mechanisms that yields rental and real estate prices using machine learning approach By Martin Regnaud; Julie Le Gallo; Marie Breuille
  12. Data-driven Approach for Static Hedging of Exchange Traded Options By Vikranth Lokeshwar Dhandapani; Shashi Jain
  13. Application of Pretrained Language Models in Modern Financial Research By Lee, Heungmin
  14. Estimating Very Large Demand Systems By Lanier, Joshua; Large, Jeremy; Quah, John
  15. Explainable AI in a Real Estate Context – Exploring the Determinants of Residential Real Estate Values By Bastian Krämer; Moritz Stang; Cathrine Nagl; Wolfgang Schäfers
  16. Human emotion recognition in the significance assessment of property attributes By Malgorzata Renigier-Biozor; Artur Janowski; Marek Walacik; Aneta Chmielewska
  17. Automatic Locally Robust Estimation with Generated Regressors By Juan Carlos Escanciano; Telmo P\'erez-Izquierdo
  18. Machine Learning in Building Documentation (ML-BAU-DOK) - Foundations for Information Extraction for Energy Efficiency and Life Cycle Analysis By Jonathan Rothenbusch; Konstantin Schütz; Feibai Huang; Björn-Martin Kurzrock

  1. By: Alicia Curth; Mihaela van der Schaar
    Abstract: Personalized treatment effect estimates are often of interest in high-stakes applications -- thus, before deploying a model estimating such effects in practice, one needs to be sure that the best candidate from the ever-growing machine learning toolbox for this task was chosen. Unfortunately, due to the absence of counterfactual information in practice, it is usually not possible to rely on standard validation metrics for doing so, leading to a well-known model selection dilemma in the treatment effect estimation literature. While some solutions have recently been investigated, systematic understanding of the strengths and weaknesses of different model selection criteria is still lacking. In this paper, instead of attempting to declare a global `winner', we therefore empirically investigate success and failure modes of different selection criteria. We highlight that there is a complex interplay between selection strategies, candidate estimators and the data-generating process (DGP) used for testing, and provide interesting insights into the relative (dis)advantages of different criteria, alongside desiderata for the design of further illuminating empirical studies in this context.
    Date: 2023–02
  2. By: Jarvis, Stephen; Deschenes, Olivier; Jha, Akshaya
    Abstract: Many countries have phased out nuclear power in response to concerns about nuclear waste and the risk of nuclear accidents. This paper examines the shutdown of more than half of the nuclear production capacity in Germany after the Fukushima accident in 2011. We use hourly data on power plant operations and a machine learning approach to estimate the impacts of the phase-out policy. We find that reductions in nuclear electricity production were offset primarily by increases in coal-fired production and net electricity imports. Our estimates of the social cost of the phase-out range from €3 to €8 billion per year. The majority of this cost comes from the increased mortality risk associated with exposure to the local air pollution emitted when burning fossil fuels. Policymakers would have to significantly overestimate the risk or cost of a nuclear accident to conclude that the benefits of the phase-out exceed its social costs. We discuss the likely role of behavioral biases in this setting, and highlight the importance of ensuring that policymakers and the public are informed about the health effects of local air pollution.
    JEL: C53 Q41 Q53
    Date: 2022–06–14
  3. By: Dirk Foremny (Universitat de Barcelona & IEB); Pilar Sorribas-Navarro (Universitat de Barcelona & IEB); Judit Vall Castelló (Universitat de Barcelona & IEB & CRES-UPF)
    Abstract: This paper provides novel evidence of the mental health effects of the Covid-19 outbreak. Between April 2020 and April 2022, we run four waves of a large representative survey in Spain, which we benchmark against a decade of pre-pandemic data. We document a large and sudden deterioration of mental health at the beginning of the pandemic, as the share of people reporting being depressed increased from 16% before the pandemic to 46% in April 2020. This effect is persistent over time, which translates into important and irreversible consequences, such as a surge in suicides. The effect is more pronounced for women, younger individuals and those with unstable incomes. Finally, using mediation analysis, event studies and machine learning techniques, we document the role of the labor market as an important driver of these effects, as women and the young are more exposed to unstable income sources.
    Keywords: Mental health, Gender, Inequality, Labor markets, Pandemic, Covid-19
    JEL: I14 H2 H12 E24
    Date: 2022
  4. By: Laura Gabrielli; Aurora Ruggeri; Massimiliano Scarpa
    Abstract: This paper, part of a more comprehensive research line, aims to discuss how the Covid-19 pandemic has affected demand in the real estate market in Padua, a medium-sized city that represents the typical Italian town. The authors have been investigating the real estate market in Padua for a few years, collecting information on the buildings on sale from related selling websites. This data collection procedure has been accomplished with the help of an automated web crawler developed in Python. For this reason, the authors are now able to compare the real estate market in Padua at different times. In particular, two databases are here put into a detailed comparison. Database A dates back to 2019 (II semester), capturing a pre-Covid-19 scenario, while database B is dated 2021 (II semester), representing the current situation. First of all, two forecasting algorithms to predict the market value of the properties as a function of their characteristics are developed using an Artificial Neural Network (ANN) approach. ANNs are a multi-parametric statistical technique employed to forecast a property's market value. The input neurons of the network, i.e. the independent variables, are the buildings' descriptive features and characteristics, while the output neuron is the market value, the dependent variable. ANN(A) is developed on database A, and ANN(B) is created on B. The comparison of the two forecasting functions represents the differences in demand two years after the first Covid-19 alerts. Since ANNs are a multi-parametric procedure, this methodology isolates each attribute's singular influence on the forecasted price. It is, therefore, possible to understand how the preferences of the demand have changed during the pandemic. Some characteristics are now more appreciated than before, such as external spaces like a terrace or a private garden. Also, systems and technologies seem more appealing now than before the pandemic, for example the presence of optical fibre or mechanical ventilation. Moreover, wider building typologies are more appreciated now, like villas, detached and semi-detached houses, or farmhouses. On the contrary, other characteristics are less appreciated. The location, for instance, is less influential than before in price formation. These changes in preferences can be attributed to the new lifestyle, since new habits have formed after the lockdown experience and the new smart-working schedules that the pandemic has led to.
    Keywords: Artificial Neural Network; COVID-19; Real Estate Valuation; Structural characteristics
    JEL: R3
    Date: 2022–01–01
  5. By: Peters, Felix
    Abstract: Artificial intelligence (AI) has become increasingly prevalent in consumer and business applications, equally affecting individuals and organizations. The emergence of AI-enabled systems, i.e., systems harnessing AI capabilities that are powered by machine learning (ML), is primarily driven by three technological trends and innovations: increased use of cloud computing allowing large-scale data collection, the development of specialized hardware, and the availability of software tools for developing AI-enabled systems. However, recent research has mainly focused on technological innovations, largely neglecting the interaction between humans and AI-enabled systems. Compared to previous technologies, AI-enabled systems possess some unique characteristics that make the design of human-AI interaction (HAI) particularly challenging. Examples of such challenges include the probabilistic nature of AI-enabled systems due to their dependence on statistical patterns identified in data and their ability to take over predictive tasks previously reserved for humans. Thus, it is widely agreed that existing guidelines for human-computer interaction (HCI) need to be extended to maximize the potential of this groundbreaking technology. This thesis attempts to tackle this research gap by examining both individual-level and organizational-level impacts of increasing HAI. Regarding the impact of HAI on individuals, two widely discussed issues are how the opacity of complex AI-enabled systems affects the user interaction and how the increasing deployment of AI-enabled systems affects performance on specific tasks. Consequently, papers A and B of this cumulative thesis address these issues. Paper A addresses the lack of user-centric research in the field of explainable AI (XAI), which is concerned with making AI-enabled systems more transparent for end-users.
It is investigated how individuals perceive explainability features of AI-enabled systems, i.e., features which aim to enhance transparency. To answer this research question, an online lab experiment with a subsequent survey is conducted in the context of credit scoring. The contributions of this study are two-fold. First, based on the experiment, it can be observed that individuals positively perceive explainability features and have a significant willingness to pay for them. Second, the theoretical model for explaining the purchase decision shows that increased perceived transparency leads to increased user trust and a more positive evaluation of the AI-enabled system. Paper B aims to identify task and technology characteristics that determine the fit between an individual's tasks and an AI-enabled system, as this is commonly believed to be the main driver for system utilization and individual performance. Based on a qualitative research approach in the form of expert interviews, AI-specific factors for task and technology characteristics, as well as the task-technology fit, are developed. The resulting theoretical model enables empirical research to investigate the relationship between task-technology fit and individual performance and can also be applied by practitioners to evaluate use cases of AI-enabled system deployment. While the first part of this thesis discusses individual-level impacts of increasing HAI, the second part is concerned with organizational-level impacts. Papers C and D address how the increasing use of AI-enabled systems within organizations affects organizational justice, i.e., the fairness of decision-making processes, and organizational learning, i.e., the accumulation and dissemination of knowledge. Paper C addresses the issue of organizational justice, as AI-enabled systems are increasingly supporting decision-making tasks that humans previously conducted on their own.
In detail, the study examines the effects of deploying an AI-enabled system in the candidate selection phase of the recruiting process. Through an online lab experiment with recruiters from multinational companies, it is shown that the introduction of so-called CV recommender systems, i.e., systems that identify suitable candidates for a given job, positively influences the procedural justice of the recruiting process. More specifically, the objectivity and consistency of the candidate selection process are strengthened, which constitute two essential components of procedural justice. Paper D examines how the increasing use of AI-enabled systems influences organizational learning processes. The study derives propositions from conducting a series of agent-based simulations. It is found that AI-enabled systems can take over explorative tasks, which enables organizations to counter the longstanding issue of learning myopia, i.e., the human tendency to favor exploitation over exploration. Moreover, it is shown that the ongoing reconfiguration of deployed AI-enabled systems represents an essential activity for organizations aiming to leverage their full potential. Finally, the results suggest that knowledge created by AI-enabled systems can be particularly beneficial for organizations in turbulent environments.
    Date: 2023
  6. By: Haosen Ge
    Abstract: Existing studies show that regulation is a major barrier to global economic integration. Nonetheless, identifying and measuring regulatory barriers remains a challenging task for scholars. I propose a novel approach to quantify regulatory barriers at the country-year level. Utilizing information from annual reports of publicly listed companies in the U.S., I identify regulatory barriers business practitioners encounter. The barrier information is first extracted from the text documents by a cutting-edge neural language model trained on a hand-coded training set. Then, I feed the extracted barrier information into a dynamic item response theory model to estimate the numerical barrier level of 40 countries between 2006 and 2015 while controlling for various channels of confounding. I argue that the results returned by this approach should be less likely to be contaminated by major confounders such as international politics. Thus, they are well-suited for future political science research.
    Date: 2023–01
  7. By: Juan Mateos-Garcia; George Richardson
    Abstract: In previous research, we used web data and machine learning methods to assess the limitations of the Standard Industrial Classification (SIC) taxonomy that measures the industrial structure of the UK, and developed a prototype taxonomy, based on a bottom-up analysis of business website descriptions, that could complement the SIC taxonomy and address some of its limitations. Here, we refine and improve that prototype taxonomy by doubling the number of SIC4 codes it covers, implementing a consequential evaluation strategy to select its clustering parameters, and generating measures of confidence about a company's assignment to a text sector based on the distribution of its neighbours and its distance in semantic (text) space. We deploy the resulting taxonomy to segment UK local economies based on their sectoral similarities and differences, and analyse their geography, sectoral composition and comparative performance on a variety of secondary indicators recently compiled to inform the UK Government's Levelling Up agenda. This analysis reveals significant links between the industrial composition of a local economy based on our taxonomy and a variety of social and economic outcomes, suggesting that policymakers should pay close attention to the industrial make-up of economies across the UK as they design and implement levelling-up strategies to reduce disparities between them.
    Keywords: Industrial taxonomy, web data, machine learning
    JEL: C80 L60 O25 O3
    Date: 2022–11
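The bottom-up taxonomy above rests on clustering company website descriptions in a semantic embedding space and reading assignment confidence off distances to cluster centres. As a rough illustration (not the authors' pipeline), here is a minimal Lloyd's-algorithm k-means on invented two-dimensional "embedding" vectors; all data and parameters are hypothetical:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: group description embeddings into 'text sectors'."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initial centres
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)            # assign each firm to its nearest centre
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Hypothetical 2-D "semantic" embeddings: two well-separated groups of firms
data_rng = np.random.default_rng(42)
X = np.vstack([data_rng.normal(0.0, 0.1, size=(20, 2)),
               data_rng.normal(5.0, 0.1, size=(20, 2))])
labels, centroids = kmeans(X, k=2)
# A confidence measure in the spirit of the paper could use the margin between
# a firm's distances to the two nearest centroids.
```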
  8. By: Choi, Mincheol (Korea Institute for Industrial Economics and Trade); Song, Danbee (Korea Institute for Industrial Economics and Trade); Cho, Jaehan (Korea Institute for Industrial Economics and Trade)
    Abstract: Despite interest in artificial intelligence (AI) as a transformative driver of economic growth, few Korean companies use AI. Companies currently using AI are increasing their AI investments and expenditures. Companies employ AI in a wide range of functions and fields, including automated operations, predictive analytics, product and service development, and sales and inventory management. Challenges to the application and use of AI exist in multiple, closely connected domains; they include human capital scarcity, inadequate funding, the difficulty of acquiring technology, and both the internal and external business environments confronting companies. This work analyzes the barriers to increased corporate adoption of AI technologies and proposes a set of policy suggestions to improve AI uptake at Korean companies.
    Keywords: artificial intelligence; AI; AI adoption; productivity; firm productivity; corporate productivity; STEM; skilled labor; information technology; IT; information and communications technology; ICT; R&D; research and development; innovation; innovation policy; AI policy; regulation; personal data
    JEL: E24 H52 I28 J24 O32 O38
    Date: 2021–04–04
  9. By: Shoshan, Vered; Hazan, Tamir; Plonsky, Ori (Technion - Israel Institute of Technology)
    Abstract: In this paper, we propose a behavioral model called BEAST-Net, which combines the basic logic of BEAST, a psychological theory-based behavioral model, with machine learning (ML) techniques. Our approach is to formalize BEAST mathematically as a differentiable function and parameterize it with a neural network, enabling us to learn the model parameters from data and optimize it using backpropagation. The resulting model, BEAST-Net, is able to scale to larger datasets and adapt to new data with greater ease, while retaining the psychological insights and interpretability of the original model. We evaluate BEAST-Net on the largest public benchmark dataset of human choice tasks and show that it outperforms several baselines, including the original BEAST model. Furthermore, we demonstrate that our model can be used to provide interpretable explanations for choice behavior, allowing us to derive new psychological insights from the data. Our work makes a significant contribution to the field of human decision making by showing that ML techniques can be used to improve the scalability and adaptability of psychological theory-based models while preserving their interpretability and ability to provide insights.
    Date: 2023–01–30
  10. By: Nino Paulus; Marina Koelbl; Wolfgang Schäfers
    Abstract: Although many theories aim to explain IPO underpricing, initial-day returns of US REIT IPOs remain a “puzzle”. For equity offerings, the literature provides several theories for the occurrence of underpricing; theories on asymmetric information and value uncertainty based on Rock’s “winner’s curse” (1986) and Beatty and Ritter (1986) were among the first. Since then, the literature on REIT IPOs has focused on indirect, quantitative proxies for information asymmetries between REITs and investors to determine IPO underpricing. This study, however, proposes textual analysis to exploit the qualitative information revealed in one of the most important documents of the IPO process – Form S-11 – as a direct measure of information asymmetries. To our knowledge, this is the first study to apply textual analysis to corporate disclosures of US REITs in order to explain IPO underpricing. Specifically, we determine the level of uncertain language in the prospectus, as well as its similarity to recently filed registration statements, to assess whether textual features can solve the underpricing puzzle. To do so, we have gathered all prospectuses and data regarding firm characteristics, offering characteristics, third-party certification and market conditions for US equity REIT IPOs from January 1996 to December 2019. Our overall sample includes 114 IPOs, for which we first clean the prospectuses to reduce linguistic complexity and facilitate the textual analysis procedure. Afterwards, the level of uncertainty is derived using the Loughran and McDonald (2011) sentiment wordlist for uncertainty, and the similarity is calculated by mapping the documents onto a vector space model, which enables us to measure the similarity between two vectors using cosine similarity (Hanley and Hoberg, 2010; Brown and Tucker, 2013). In accordance with Ferris et al. (2013), we assume that uncertain language makes it more difficult for potential investors to price the issue and thus increases underpricing. Furthermore, it is hypothesized that a higher similarity to previous filings indicates that the prospectus provides little useful information and thus does not resolve existing information asymmetries, leading to increased underpricing. Incorporating these measures into an OLS regression, we find, contrary to expectations, no statistically significant association between uncertain language in Form S-11 and initial-day returns. This result suggests that uncertain language in the prospectus does not reflect the issuer's expectations about the company's future prospects, but rather is necessary because of forecasting difficulties and litigation risk. Analyzing disclosure similarity instead, this study finds a statistically and economically significant impact of qualitative information on initial-day returns. Thus, REIT managers may reduce underpricing by voluntarily providing more information to potential investors in Form S-11. This demonstrates that textual analysis can indeed help to explain underpricing of US REIT IPOs, as qualitative information in Form S-11 decreases information asymmetries between US REIT managers and investors, thus reducing underpricing. Consequently, REIT managers are incentivized to provide as much information as possible to reduce underpricing (Sherman and Titman, 2002), while investors could use textual analysis to identify offerings that promise the highest returns.
    Keywords: Information Asymmetry; Initial public offering; Textual Analysis; Underpricing
    JEL: R3
    Date: 2022–01–01
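The similarity measure this abstract relies on, cosine similarity between document term-count vectors (Hanley and Hoberg, 2010), can be sketched in a few lines. The token lists below are invented stand-ins for cleaned prospectus text, not actual S-11 filings:

```python
from collections import Counter
from math import sqrt

def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between two tokenized documents via term-count vectors."""
    ca, cb = Counter(doc_a), Counter(doc_b)
    dot = sum(ca[t] * cb[t] for t in set(ca) & set(cb))
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

# Hypothetical cleaned prospectus excerpts (stand-ins for Form S-11 text)
s11_new = "the trust may be unable to achieve its investment objectives".split()
s11_old = "the trust may be unable to meet its stated investment objectives".split()
similarity = cosine_similarity(s11_new, s11_old)  # high: the two filings overlap heavily
```

In practice one would weight terms (e.g. TF-IDF) before computing the cosine, but the geometry is the same.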
  11. By: Martin Regnaud; Julie Le Gallo; Marie Breuille
    Abstract: In 2020, MeilleursAgents estimated that a French household needed, on average in France, 2 years and 10 months to amortize the cost of buying versus renting. In Paris, the same household would have to wait 6 years and 11 months to amortize its costs. These figures are of utmost importance for households deciding whether to buy or rent their main residence. From an operational perspective, estimating this time is made possible by a precise knowledge of rent-to-price ratios. The main objective of this contribution is to estimate those ratios across the whole French territory using observations of rents and transaction prices for the same housing between 2010 and 2021. Once those ratios are estimated, we highlight the factors that determine them using machine learning methods. Using coarsened exact matching, we estimate rent-to-price ratios on the whole territory. Then we compare two different approaches to identify the determinants of these ratios. The first approach explains the ratios using a linear regression model that predicts them from housing characteristics and geographical amenities. The second approach uses a gradient boosting decision tree model to predict the ratios. Hence, we can explain the role of each feature of the model thanks to explainability methods associated with tree models: feature importance and SHAP values. To proceed with this study, we use rental listings from the MeilleursAgents platform that are geolocated at the address level. The use of such listings is inspired by Chapelle & Eymeoud (2018), who show that web-scraped listings are unbiased compared to survey data such as those from the “Observatoire Locaux des Loyers” (OL) in France. Moreover, these surveys are limited to certain dense areas, whereas our study aims at comparing mechanisms on the whole French territory. These listings are matched with the national DV3F French database, which provides us with parcel-level geolocation. Matching these two sources between 2010 and 2021 provides us with 85’000 matched ratio observations. The (rent, price) couples are used to estimate rent-to-price ratios and to highlight differences in the influence of each factor depending on the territory and, thanks to our precise geolocation, inside urban areas. Our study makes two contributions. First, from a methodological point of view, using a gradient boosting model to estimate and explain rent-to-price ratios has never been done. The main advantage of this method compared to classic methods is a better handling of interactions and effect heterogeneity. The second contribution rests on the precise geolocation of our observations. These ratios are most of the time studied using ratios of average rents and average prices because of the scarcity of precisely geolocated data. Yet, Hill & Syed (2016) showed that such an approximation can lead to an error of up to 20% when estimating the ratio. They therefore advise using housing-level matching to control for feature heterogeneity between rented and sold housing. Our study is thus the first in France that allows an exact matching on this topic outside of the dense areas covered by the “Observatoire Locaux des Loyers”. We highlight the strong heterogeneity of rent-to-price ratios inside dense urban areas but also at a larger scale. To our knowledge, this study is the first to bring this phenomenon out at the national scale.
1. Chapelle G., Eymeoud J.-B., « Can Big Data Increase Our Knowledge of Local Rental Markets? Estimating the Cost of Density with Rents », SciencesPo Mimeo, 2018
2. Hill R.J., Syed I.A., « Hedonic price–rent ratios, user cost, and departures from equilibrium in the housing market », Regional Science and Urban Economics, pp. 60-72, 2016
    Keywords: Machine Learning; Rent profitability; Rent to price ratios; Web platform data
    JEL: R3
    Date: 2022–01–01
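The coarsened exact matching step described above can be illustrated with a toy example: rentals and sales are "coarsened" into bins on a few characteristics and matched when their bin signatures coincide, after which a rent-to-price ratio is computed per matched stratum. All bins, fields and figures below are invented for the sketch:

```python
from collections import defaultdict

def coarsen(listing, area_bins=(40, 70, 100), year_bins=(1950, 1990, 2010)):
    """Map a listing to a coarse stratum signature: (area bin, construction-year bin)."""
    a = sum(listing["area"] >= b for b in area_bins)
    y = sum(listing["year"] >= b for b in year_bins)
    return (a, y)

def rent_to_price_ratios(rentals, sales):
    """Mean annualized rent over mean price, within each stratum matched on both sides."""
    strata = defaultdict(lambda: {"rent": [], "price": []})
    for r in rentals:
        strata[coarsen(r)]["rent"].append(12 * r["monthly_rent"])
    for s in sales:
        strata[coarsen(s)]["price"].append(s["price"])
    return {sig: (sum(d["rent"]) / len(d["rent"])) / (sum(d["price"]) / len(d["price"]))
            for sig, d in strata.items() if d["rent"] and d["price"]}

# Invented listings: a small older flat and a larger recent one, on each side of the market
rentals = [{"area": 55, "year": 1975, "monthly_rent": 650},
           {"area": 85, "year": 2015, "monthly_rent": 1100}]
sales = [{"area": 60, "year": 1980, "price": 160000},
         {"area": 90, "year": 2018, "price": 320000}]
ratios = rent_to_price_ratios(rentals, sales)  # one ratio per matched stratum
```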
  12. By: Vikranth Lokeshwar Dhandapani; Shashi Jain
    Abstract: In this paper, we present a data-driven, explainable machine learning algorithm with efficient run-time for semi-static hedging of exchange-traded options, taking transaction costs into account. Further, we provide empirical evidence on the performance of hedging longer-term National Stock Exchange (NSE) Index options using a self-replicating portfolio of shorter-term options and a cash position, achieved by the automated algorithm, under different modeling assumptions and market conditions, including the Covid-stressed period. We also systematically assess the performance of the model using the Superior Predictive Ability (SPA) test, benchmarking against the static hedge proposed by Peter Carr and Liuren Wu and against industry-standard dynamic hedging. Finally, we perform a Profit and Loss (PnL) attribution analysis for the option to be hedged, the delta hedge, and the static hedge portfolio to identify the factors that explain the performance of static hedging.
    Date: 2023–02
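The core idea of a static hedge (replicating a longer-term option's payoff with a portfolio of shorter-term options) can be sketched as a least-squares fit over a grid of terminal prices. This is a simplification for illustration, not the paper's algorithm, which additionally handles transaction costs and run-time; all strikes and grids below are invented:

```python
import numpy as np

def static_hedge_weights(target_payoff, hedge_payoffs):
    """Least-squares weights w so that hedge_payoffs @ w tracks target_payoff
    across a grid of terminal prices (rows: price scenarios, columns: instruments)."""
    w, *_ = np.linalg.lstsq(hedge_payoffs, target_payoff, rcond=None)
    return w

# Invented example: replicate a call struck at 100 with calls at nearby strikes
S = np.linspace(50.0, 150.0, 201)                       # terminal index levels
target = np.maximum(S - 100.0, 0.0)                     # payoff to be hedged
strikes = np.array([90.0, 95.0, 100.0, 105.0, 110.0])
basis = np.maximum(S[:, None] - strikes[None, :], 0.0)  # shorter-term call payoffs
w = static_hedge_weights(target, basis)
```

Here the 100-strike is itself in the basis, so the fit puts all weight on it; with mismatched strikes the weights spread across neighbouring options.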
  13. By: Lee, Heungmin
    Abstract: In recent years, pretrained language models (PLMs) have emerged as a powerful tool for natural language processing (NLP) tasks. In this paper, we examine the potential of these models in the finance sector and the challenges they face in this domain. We also discuss the interpretability of these models and the ethical considerations associated with their deployment in finance. Our analysis shows that pretrained language models have the potential to revolutionize the way financial data is analyzed and processed. However, it is important to address the challenges and ethical considerations associated with their deployment to ensure that they are used in a responsible and accountable manner. Future research will focus on developing models that can handle the volatility of financial data, mitigate bias in the training data, and provide interpretable predictions. Overall, we believe that the future of AI in finance will be shaped by the continued development and deployment of pretrained language models.
    Date: 2023–02–01
  14. By: Lanier, Joshua; Large, Jeremy; Quah, John
    Abstract: We present a discrete choice, random utility model and a new estimation technique for analyzing consumer demand for large numbers of products. We allow the consumer to purchase multiple units of any product and to purchase multiple products at once (think of a consumer selecting a bundle of goods in a supermarket). In our model each product has an associated unobservable vector of attributes from which the consumer derives utility. Our model allows for heterogeneous utility functions across consumers, complex patterns of substitution and complementarity across products, and nonlinear price effects. The dimension of the attribute space is, by assumption, much smaller than the number of products, which effectively reduces the size of the consumption space and simplifies estimation. Nonetheless, because the number of bundles available is massive, a new estimation technique, which is based on the practice of negative sampling in machine learning, is needed to sidestep an intractable likelihood function. We prove consistency of our estimator, validate the consistency result through simulation exercises, and estimate our model using supermarket scanner data.
    Keywords: discrete choice, demand estimation, negative sampling, machine learning, scanner data
    JEL: C13 C34 D12 L20 L66
    Date: 2022–06
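The negative-sampling idea borrowed from machine learning can be illustrated on a toy multinomial-logit choice problem: observed choices are contrasted with uniformly sampled "negative" products, and a logistic discrimination between the two recovers the utility parameters without ever evaluating the full likelihood. Everything below (attributes, sample sizes, learning rate) is invented for the sketch and is far simpler than the authors' bundle-level estimator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented setup: products carry 2 attributes; utility is X @ theta_true, and
# observed choices follow a multinomial logit over all 500 products.
X = rng.normal(size=(500, 2))
theta_true = np.array([1.5, -0.8])
p = np.exp(X @ theta_true)
p /= p.sum()
chosen = rng.choice(len(X), size=2000, p=p)        # observed purchases
negative = rng.integers(0, len(X), size=2000)      # uniformly sampled negatives

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# For logit choice with uniform negatives, P(chosen beats negative) =
# sigmoid((x_pos - x_neg) @ theta), so gradient ascent on this pairwise
# logistic likelihood recovers theta.
theta = np.zeros(2)
diff = X[chosen] - X[negative]
for _ in range(500):
    grad = (sigmoid(-diff @ theta)[:, None] * diff).mean(axis=0)
    theta += 0.5 * grad
```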
  15. By: Bastian Krämer; Moritz Stang; Cathrine Nagl; Wolfgang Schäfers
    Abstract: Real estate is a heterogeneous commodity: no two properties are alike. Therefore, making assumptions about determinants and the way they influence the value of a property is difficult. Traditionally, parametric and thus assumption-based regression techniques are used to identify those dependencies. However, recent studies show that these relationships can only be mapped to a limited extent by such approaches. In contrast, modern Machine Learning (ML) approaches are less restrictive and able to identify complex patterns hidden in the data. Nevertheless, these algorithms are less transparent to human beings. An ML approach may be the best solution to predict the value of a property, but it fails at determining the factors driving that value. To overcome this limitation, explainable artificial intelligence (XAI) has emerged as an important new direction of research. So far, there has been almost no research applying XAI in the field of real estate. Therefore, we introduce two different state-of-the-art XAI approaches, namely Permutation Feature Importance (PFI) and Accumulated Local Effects (ALE) plots, in the context of real estate valuation. Focusing on the residential market, we use a dataset consisting of around 1.2 million observations in Germany. Our findings show that using XAI methods enables us to open the “black box” of ML models. In addition, we find several unexpected non-linear dependencies between real estate values and their hedonic characteristics, and therefore deliver important insights to better understand the fundamental functioning of residential real estate markets.
    Keywords: ALE Plots; Explainable AI; housing market; Machine Learning
    JEL: R3
    Date: 2022–01–01
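Of the two XAI methods, Permutation Feature Importance is the simpler to sketch. The toy "valuation model" and data below are invented for illustration; the idea is only that shuffling an informative feature degrades predictions, while shuffling an irrelevant one does not.

```python
import random
import statistics

def mse(model, X, y):
    return statistics.mean((model(x) - yi) ** 2 for x, yi in zip(X, y))

def permutation_importance(model, X, y, feature, rng):
    """PFI: shuffle one feature column and measure how much the
    model's prediction error worsens relative to the baseline."""
    base = mse(model, X, y)
    col = [x[feature] for x in X]
    rng.shuffle(col)
    X_perm = [list(x) for x in X]
    for row, v in zip(X_perm, col):
        row[feature] = v
    return mse(model, X_perm, y) - base

rng = random.Random(1)
# toy model: price depends only on feature 0 (think: living area)
model = lambda x: 3.0 * x[0]
X = [[i, rng.random()] for i in range(20)]
y = [3.0 * x[0] for x in X]

imp0 = permutation_importance(model, X, y, 0, random.Random(2))
imp1 = permutation_importance(model, X, y, 1, random.Random(2))
```

Here `imp0` is positive and `imp1` is zero, reproducing PFI's core diagnostic: features the model actually uses get nonzero importance.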
  16. By: Malgorzata Renigier-Biłozor; Artur Janowski; Marek Walacik; Aneta Chmielewska
    Abstract: One of the largest problems in real estate market analysis, including valuation, is determining the significance of the individual property attributes that may affect value or perceived attractiveness. The study attempts to assess the significance of selected real estate attributes based on the detection and analysis of the emotions of potential investors. Human facial expression is a carrier of information that can be recorded and interpreted effectively with artificial intelligence methods, machine learning and computer vision. Developing a reliable algorithm requires, in this case, identifying and investigating the factors that may affect the final solution, from behavioural aspects to technological possibilities. In the presented experiment, an approach that correlates the emotional states of buyers with visualizations of selected property attributes is utilized. The objective of this study is to develop an original method for assessing the significance of property attributes based on emotion recognition technology, as an alternative to the survey-based methods commonly used in real estate analysis and valuation. The empirical analysis determined the significance of mainstream property attributes from the intensity of the emotions evoked within a group of property clients. The significance ranking derived from unconsciously expressed facial emotions was verified against the answers given in a questionnaire. The results show that the consciously declared attribute ranking differs from the emotion-detection conclusions in several cases.
    Keywords: Artificial Intelligence; attribute significance; emotion recognition technology; human emotion detection
    JEL: R3
    Date: 2022–01–01
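The core aggregation step — ranking attributes by the intensity of the emotions they evoke — can be sketched as follows. The observations are invented for illustration; the paper's actual attribute set and emotion-detection pipeline are not reproduced here.

```python
from collections import defaultdict
from statistics import mean

# hypothetical recordings: (attribute shown, detected emotion intensity in [0, 1])
observations = [
    ("location", 0.8), ("location", 0.7),
    ("balcony", 0.4), ("balcony", 0.5),
    ("parking", 0.2), ("parking", 0.3),
]

def rank_attributes(obs):
    """Rank property attributes by the mean intensity of the emotions
    they evoked, as a stand-in for stated-preference survey rankings."""
    by_attr = defaultdict(list)
    for attr, intensity in obs:
        by_attr[attr].append(intensity)
    return sorted(by_attr, key=lambda a: mean(by_attr[a]), reverse=True)

ranking = rank_attributes(observations)
```

Comparing such a ranking against questionnaire answers is what reveals the gap between declared and unconsciously expressed preferences.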
  17. By: Juan Carlos Escanciano; Telmo Pérez-Izquierdo
    Abstract: Many economic and causal parameters of interest depend on generated regressors, including structural parameters in models with endogenous variables estimated by control functions and in models with sample selection. Inference with generated regressors is complicated by the very complex expressions for influence functions and asymptotic variances. To address this problem, we propose automatic Locally Robust/debiased GMM estimators in a general setting with generated regressors. Importantly, we allow the generated regressors to be produced by machine learners, such as Random Forests, Neural Nets, Boosting, and many others. We use our results to construct novel Doubly Robust estimators for the Counterfactual Average Structural Function and Average Partial Effects in models with endogeneity and sample selection, respectively.
    Date: 2023–01
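The debiasing idea behind Locally Robust estimators can be illustrated with the classic partialling-out recipe, a simple member of the Neyman-orthogonal family the paper generalizes. Everything below is a toy: simulated data, and ordinary least squares standing in for the machine learner that produces the nuisance fits.

```python
import random
from statistics import mean

def ols_slope(x, y):
    # slope of y on x through the (centered) least-squares fit
    mx, my = mean(x), mean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

def partialling_out(y, d, x):
    """Orthogonal 'partialling out': residualize outcome y and treatment d
    on the regressor x, then regress residual on residual. First-stage
    estimation error enters the final estimate only at second order."""
    ry = [yi - ols_slope(x, y) * xi for yi, xi in zip(y, x)]
    rd = [di - ols_slope(x, d) * xi for di, xi in zip(d, x)]
    return ols_slope(rd, ry)

rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]            # confounder / generated regressor
d = [xi + rng.gauss(0, 1) for xi in x]               # treatment depends on x
y = [2.0 * di + xi + rng.gauss(0, 0.1) for di, xi in zip(d, x)]
theta = partialling_out(y, d, x)                     # recovers ~2.0
```

A naive regression of `y` on `d` alone would be biased upward by the common dependence on `x`; the residual-on-residual step removes it.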
  18. By: Jonathan Rothenbusch; Konstantin Schütz; Feibai Huang; Björn-Martin Kurzrock
    Abstract: The construction and real estate industry features long product life cycles, a wide range of stakeholders and a high information density. Information is not only available in large quantities, but also in a very heterogeneous and user-specific manner. Digital building documentation is partially available in some companies and non-existent in others. The result is often analogue, unstructured building documentation, which makes the processing of data and information considerably more difficult and, in the worst case, leads to media disruptions between those involved. However, the benefits of lean, complete and targeted digital building documentation can be manifold. In particular, automated information extraction and further information retrieval are seen as having great potential. Information extraction, as the ultimate aim, requires a defined handling of analogue documents, transparent criteria regarding data quality and machine readability, as well as a clear classification system. The research project ML-BAU-DOK (funded by the Federal Office for Building and Regional Planning BBSR, SWD-) presents the necessary preparatory processes for advanced digital use of building documentation. First, a set of rules is created to digitize paper-based documentation in a targeted manner. The automated separation of mass documentation into individual documents, as well as the classification of documents into selected document classes, is handled using machine learning. The document classes are consolidated from current worldwide class standards and prioritized according to their information content. The project includes the evaluation of 600,000 document pages, which are analysed class-specifically with regard to two use cases: energy efficiency and life cycle analysis.
The methodology ensures transferability of the results to other use cases. The key result of ML-BAU-DOK is an algorithm that automatically separates individual documents from a mass scan, assigns the individual documents to defined document classes, and thus reduces the amount of scanning and filing required. This yields a classification system that enables information extraction as a subsequent goal and brings the construction and real estate industry closer to a Common Data Environment.
    Keywords: Document Classification; Document Separation; Heterogeneous Building Documentation; Machine Learning
    JEL: R3
    Date: 2022–01–01
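The document-classification step can be sketched as a naive-Bayes-style scorer over per-class word counts. The snippets and class names below are invented; the project's actual document classes and models are not specified in the abstract.

```python
from collections import Counter

# hypothetical labelled snippets from a scanned building file
TRAIN = [
    ("energy certificate heat demand insulation", "energy"),
    ("annual heat consumption energy efficiency", "energy"),
    ("floor plan room layout dimensions", "drawing"),
    ("site plan building footprint", "drawing"),
]

def train(samples):
    """Accumulate per-class word counts, the core of a naive Bayes classifier."""
    counts = {}
    for text, label in samples:
        counts.setdefault(label, Counter()).update(text.split())
    return counts

def classify(counts, text):
    # score each class by smoothed word-frequency overlap with the page text
    def score(label):
        c, total = counts[label], sum(counts[label].values())
        return sum((c[w] + 1) / (total + 1) for w in text.split())
    return max(counts, key=score)

model = train(TRAIN)
label = classify(model, "heat demand and energy consumption")
```

In the project pipeline, a separation step would first cut the mass scan into individual documents; a classifier like this then routes each document to its class.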

This nep-big issue is ©2023 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.