nep-big New Economics Papers
on Big Data
Issue of 2023‒01‒02
twenty-two papers chosen by
Tom Coupé
University of Canterbury

  1. pystacked: Stacking generalization and machine learning in Stata By Christian B. Hansen; Mark E. Schaffer; Achim Ahrens
  2. Accountability in Artificial Intelligence By Gil, Olga
  3. Data-gravity and Data Science: Educational Approaches and Solutions By Popov Alexandr; Deryabin Andrey; Gluhov Pavel
  4. Applying the SEM-Neural Network Method to Build a Model Forecasting Customer Experience of Digital Banking Services at Vietnamese Commercial Banks By Le, Anh Hoang; Le Nguyen Hoai Thi; Huong, Luong Tran Hoang; La Phu Hao; Nga, Nguyen Thi Thuy
  5. Identifying and characterising AI adopters: A novel approach based on big data By Flavio Calvino; Lea Samek; Mariagrazia Squicciarini; Cody Morris
  6. Graph-Regularized Tensor Regression: A Domain-Aware Framework for Interpretable Multi-Way Financial Modelling By Yao Lei Xu; Kriton Konstantinidis; Danilo P. Mandic
  7. Using Machine Learning for Efficient Flexible Regression Adjustment in Economic Experiments By John List; Ian Muir; Gregory Sun
  8. Empirical Asset Pricing via Ensemble Gaussian Process Regression By Damir Filipović; Puneet Pasricha
  9. Asymptotic study of stochastic adaptive algorithm in non-convex landscape By Sébastien Gadat; Ioana Gavra
  10. Digital Technologies for Digital Innovation: Unlocking Data and Knowledge to Drive Organizational Value Creation By Koppe, Timo
  11. The Regulation of Medical AI: Policy Approaches, Data, and Innovation Incentives By Ariel Dora Stern
  12. The Short-Term Predictability of Returns in Order Book Markets: a Deep Learning Perspective By Lorenzo Lucchese; Mikko Pakkanen; Almut Veraart
  13. ANALYSING (A)SYMMETRIES IN STUDENT ACCOMMODATION PRICING: EVIDENCE FROM EUROPEAN STUDENT ACCOMMODATION MARKET By Olayiwola Oladiran; Muhammad Abbas
  14. Management of Big data: An empirical investigation of the Too-Much-of-a-Good-Thing effect in medium and large firms By Claudio Vitari; Elisabetta Raguseo; Federico Pigni
  15. The Heterogeneous Response of Real Estate Asset Prices to a Global Shock By Sandro Heiniger; Winfried Koeniger; Michael Lechner
  16. Key challenges for the participatory governance of AI in public administration By Wong, Janis; Morgan, Deborah; Straub, Vincent John; Hashem, Youmna; Bright, Jonathan
  17. A probability transducer and decision-theoretic augmentation for machine-learning classifiers By Dyrland, Kjetil; Lundervold, Alexander Selvikvåg; Porta Mana, PierGianLuca
  18. Analysis of mechanisms for managing the quality of education in the Russian Federation based on "big data" By Malevanov Yuriy; Dozhdikov Anton; Ivanov Alexandr
  19. African time travellers: what can we learn from 500 years of written accounts? By Edward Kerby; Alexander Moradi; Hanjo Odendaal
  20. "Big Data Applications with Theoretical Models and Social Media in Financial Management" By Taiga Saito; Shivam Gupta
  21. Central Bank Communication about Climate Change By David M. Arseneau; Alejandro Drexler; Mitsuhiro Osada
  22. Urban Exodus? Understanding Human Mobility in Britain During the COVID-19 Pandemic Using Facebook Data By Rowe, Francisco; Calafiore, Alessia; Arribas-Bel, Dani; Samardzhiev, Krasen; Fleischmann, Martin

  1. By: Christian B. Hansen (University of Chicago); Mark E. Schaffer (Heriot-Watt University); Achim Ahrens (ETH Zürich)
    Abstract: pystacked implements stacked generalization (Wolpert 1992) for regression and binary classification via Python’s scikit-learn. Stacking combines multiple supervised machine learners—the “base” or “level-0” learners—into a single learner. The currently supported base learners include regularized regression, random forest, gradient boosted trees, support vector machines, and feed-forward neural nets (multilayer perceptron). pystacked can also be used as a ‘regular’ machine learning program to fit a single base learner and, thus, provides an easy-to-use API for scikit-learn’s machine learning algorithms.
    Date: 2022–11–30
    URL: http://d.repec.org/n?u=RePEc:boc:csug22:01&r=big
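The stacking idea in the pystacked abstract can be illustrated with a minimal pure-Python sketch. It is not pystacked's Stata syntax and does not call scikit-learn. Two toy base learners, a constant-mean predictor and a one-variable least-squares line, produce out-of-fold predictions via k-fold cross-validation, and a level-1 learner picks a convex weight on them by grid search. All function names, and the use of a grid-searched convex weight as the final learner, are illustrative assumptions.

```python
# Minimal stacking sketch (hypothetical helpers; not pystacked's interface).

def fit_mean(xs, ys):
    """Base learner 1: always predicts the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Base learner 2: one-variable ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def out_of_fold(fit, xs, ys, k):
    """Predict each point with a model trained on the other k - 1 folds."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(len(xs)):
            if i % k == fold:
                preds[i] = model(xs[i])
    return preds

def stack(xs, ys, k=5, grid=101):
    """Level-1 learner: weight w on the mean learner and 1 - w on the linear
    learner, chosen on a grid to minimise out-of-fold squared error."""
    p_mean = out_of_fold(fit_mean, xs, ys, k)
    p_lin = out_of_fold(fit_linear, xs, ys, k)

    def sse(w):
        return sum((y - (w * a + (1 - w) * b)) ** 2
                   for y, a, b in zip(ys, p_mean, p_lin))

    w = min((j / (grid - 1) for j in range(grid)), key=sse)
    # Refit both base learners on all the data and combine them with weight w.
    mean_model, lin_model = fit_mean(xs, ys), fit_linear(xs, ys)
    return w, lambda x: w * mean_model(x) + (1 - w) * lin_model(x)
```

On exactly linear data, `w, model = stack(list(range(20)), [2 * x + 1 for x in range(20)])` puts all weight on the linear learner (`w == 0.0`), and `model(30)` returns 61.0. The weights are chosen on out-of-fold rather than in-sample predictions so that a base learner cannot earn weight simply by overfitting the training data.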
  2. By: Gil, Olga
    Abstract: This work stresses the importance of AI accountability to citizens and explores how a fourth independent branch of government, or set of institutions, could be empowered to ensure that algorithms in today's democracies conform to constitutional principles. The purpose of this fourth branch of government in modern democracies could be to enshrine accountability for the development of artificial intelligence, including software-enabled technologies, and for the implementation of policies based on big data within a wider democratic regime. The work draws on the philosophy of science, political theory (ethics and ideas), and concepts derived from the study of democracy (responsibility and accountability) to analyse theoretically what artificial intelligence (AI) means for the governance of society and what the limitations of such AI governance are. The discussion shows that human ideas, as the cement of societies, make it problematic to enshrine the governance of artificial intelligence in the world of devices. On ethical grounds, the work stresses an existing trade-off between greater and faster technological advancement, or innovation, on the one hand, and human well-being on the other, where the latter is not guaranteed by default. This trade-off remains unresolved. The work contends that features of AI offer an opportunity to revise government priorities from a multilevel perspective, from the local to the upper levels.
    Date: 2022–09–07
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:wckuf&r=big
  3. By: Popov Alexandr (Russian Presidential Academy of National Economy and Public Administration); Deryabin Andrey (Russian Presidential Academy of National Economy and Public Administration); Gluhov Pavel (Russian Presidential Academy of National Economy and Public Administration)
    Abstract: In the context of fundamental changes in the economy and the labor market, it becomes important to introduce educational programs in data analysis and machine learning at all levels of education, with priority given to integrating mathematical, natural-science, and socio-humanitarian knowledge. The paper presents an overview and analysis of foreign experience and of the main topics of discussion in the development of data science and machine learning educational modules for adolescents.
    Keywords: data science; machine learning; educational programs; data literacy; data analysis; artificial intelligence; general education; vocational guidance; additional education; computer science; STEM; social science
    Date: 2021–01
    URL: http://d.repec.org/n?u=RePEc:rnp:wpaper:s21042&r=big
  4. By: Le, Anh Hoang (Ho Chi Minh University of Banking); Le Nguyen Hoai Thi; Huong, Luong Tran Hoang; La Phu Hao; Nga, Nguyen Thi Thuy
    Abstract: The customer experience has been improved by the recent growth of digital banking services (PwC, 2018). Identifying the variables that influence how customers experience these services is now a question of interest to researchers and commercial banks. To address this problem, this study focuses on identifying the factors that affect consumers' experiences with digital banking services at Vietnamese commercial banks. It is also the first study to combine the estimation of relationships through structural equation modeling (SEM) with machine learning through an artificial neural network (ANN) to create a predictive model of customer experience with digital banking services at Vietnamese commercial banks. The SEM estimation results indicate that perceived convenience, functional quality, service quality, brand awareness, perceived safety, and usability influence customers' experience with digital banking services. To help improve the customer experience of digital banking services at Vietnamese commercial banks, the study develops a customer experience forecasting model and offers some managerial implications.
    Date: 2022–11–13
    URL: http://d.repec.org/n?u=RePEc:osf:osfxxx:vrmp9&r=big
  5. By: Flavio Calvino; Lea Samek; Mariagrazia Squicciarini; Cody Morris
    Abstract: This work employs a novel approach to identify and characterise firms adopting Artificial Intelligence (AI), using different sources of large microdata. Focusing on the United Kingdom, the analysis combines data on Intellectual Property Rights, website information, online job postings, and firm-level financials for the first time. It shows that a significant share of AI adopters is active in Information and Communication Technologies and professional services, and is located in the South of the United Kingdom, particularly around London. Adopters tend to be highly productive and larger than other firms, while young adopters tend to hire AI workers more intensively. Human capital appears to play an important role, not only for AI adoption but also for firms’ productivity returns. Significant differences in the characteristics of AI adopters emerge when distinguishing between firms carrying out AI innovation, those with an AI core business, and those searching for AI talent.
    Keywords: artificial intelligence, productivity, technology adoption
    Date: 2022–12–19
    URL: http://d.repec.org/n?u=RePEc:oec:stiaaa:2022/06-en&r=big
  6. By: Yao Lei Xu; Kriton Konstantinidis; Danilo P. Mandic
    Abstract: Analytics of financial data is inherently a Big Data paradigm, as such data are collected over many assets, asset classes, countries, and time periods. This represents a challenge for modern machine learning models, as the number of model parameters needed to process such data grows exponentially with the data dimensions, an effect known as the curse of dimensionality. Recently, Tensor Decomposition (TD) techniques have shown promising results in reducing the computational costs associated with large-dimensional financial models while achieving comparable performance. However, tensor models are often unable to incorporate the underlying economic domain knowledge. To this end, we develop a novel Graph-Regularized Tensor Regression (GRTR) framework, whereby knowledge about cross-asset relations is incorporated into the model in the form of a graph Laplacian matrix. This is then used as a regularization tool to promote an economically meaningful structure within the model parameters. By virtue of tensor algebra, the proposed framework is shown to be fully interpretable, both coefficient-wise and dimension-wise. The GRTR model is validated in a multi-way financial forecasting setting and compared against competing models, and is shown to achieve improved performance at reduced computational costs. Detailed visualizations are provided to help the reader gain an intuitive understanding of the employed tensor operations.
    Date: 2022–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2211.05581&r=big
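A graph Laplacian penalty can be sketched on a plain linear regression, leaving out the tensor decomposition that is central to GRTR. With an adjacency matrix A over assets and Laplacian L = D - A, the penalty lam * beta' L beta equals lam times the sum over pairs i < j of A_ij (beta_i - beta_j)^2, so it pulls the coefficients of linked assets toward each other. The closed form beta = (X'X + lam L)^(-1) X'y, the toy data, and all function names below are illustrative assumptions, not the paper's model.

```python
# Graph-Laplacian-penalised least squares on a vector of coefficients
# (illustrative sketch; the paper's GRTR works with tensors).

def laplacian(adj):
    """L = D - A for a symmetric adjacency matrix A with zero diagonal."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def graph_regularized_ls(X, y, adj, lam):
    """Minimise ||y - X beta||^2 + lam * beta' L beta via the normal equations
    (X'X + lam L) beta = X'y."""
    p = len(X[0])
    L = laplacian(adj)
    lhs = [[sum(row[i] * row[j] for row in X) + lam * L[i][j] for j in range(p)]
           for i in range(p)]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(lhs, rhs)
```

With `X = [[1, 0], [0, 1], [1, 1], [2, 1]]`, `y = [1, 3, 4, 5]` and two linked assets (`adj = [[0, 1], [1, 0]]`), `lam = 0.0` recovers the exact fit `[1.0, 3.0]`, while `lam = 100.0` gives roughly `[1.795, 1.807]`: the penalty has pulled the two coefficients almost together.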
  7. By: John List; Ian Muir; Gregory Sun
    Abstract: This study investigates how to use regression adjustment to reduce variance in experimental data. We show that the estimators recommended in the literature satisfy an orthogonality property with respect to the parameters of the adjustment. This observation greatly simplifies the derivation of the asymptotic variance of these estimators and allows us to solve for the efficient regression adjustment in a large class of adjustments. Our efficiency results generalize a number of previous results known in the literature. We then discuss how this efficient regression adjustment can be feasibly implemented. We show the practical relevance of our theory in two ways. First, we use our efficiency results to improve common practices currently employed in field experiments. Second, we show how our theory allows researchers to robustly incorporate machine learning techniques into their experimental estimators to minimize variance.
    Date: 2022
    URL: http://d.repec.org/n?u=RePEc:feb:natura:00763&r=big
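For readers new to regression adjustment, the sketch below contrasts a plain difference in means with one common adjustment: fit a separate least-squares line of the outcome on a single covariate in each arm, then average the predicted treated-minus-control gap over the full sample. This fully interacted adjustment is a standard textbook estimator shown only as an assumed illustration; it is not the efficient adjustment derived by List, Muir and Sun, and the toy data and function names are made up.

```python
# Difference in means versus a fully interacted regression adjustment
# (textbook illustration, not the paper's efficient estimator).

def ols_line(xs, ys):
    """Intercept and slope of a one-covariate least-squares fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def split(values, d, arm):
    """Values belonging to treatment arm `arm` (1 = treated, 0 = control)."""
    return [v for v, di in zip(values, d) if di == arm]

def diff_in_means(y, d):
    t, c = split(y, d, 1), split(y, d, 0)
    return sum(t) / len(t) - sum(c) / len(c)

def regression_adjusted_effect(y, d, x):
    """Fit a line in each arm, then average the predicted treated-minus-control
    gap over every unit's covariate value."""
    a1, b1 = ols_line(split(x, d, 1), split(y, d, 1))
    a0, b0 = ols_line(split(x, d, 0), split(y, d, 0))
    return sum((a1 + b1 * xi) - (a0 + b0 * xi) for xi in x) / len(x)
```

With covariate values 0 to 7, treatment assigned to the units with x = 3, 5, 6, 7, and outcomes generated as y = 1 + 2x + 3d, the raw difference in means is 10.0 because treated units happen to have larger x, whereas the adjusted estimate recovers the effect of 3.0.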
  8. By: Damir Filipović; Puneet Pasricha
    Abstract: We introduce an ensemble learning method based on Gaussian Process Regression (GPR) for predicting conditional expected stock returns given stock-level and macro-economic information. Our ensemble learning approach significantly reduces the computational complexity inherent in GPR inference and lends itself to general online learning tasks. We conduct an empirical analysis on a large cross-section of US stocks from 1962 to 2016. We find that our method dominates existing machine learning models statistically and economically in terms of out-of-sample R-squared and Sharpe ratio of prediction-sorted portfolios. Exploiting the Bayesian nature of GPR, we introduce the mean-variance optimal portfolio with respect to the predictive uncertainty distribution of the expected stock returns. It appeals to an uncertainty-averse investor and significantly dominates the equal- and value-weighted prediction-sorted portfolios, which outperform the S&P 500.
    Date: 2022–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2212.01048&r=big
  9. By: Sébastien Gadat (TSE-R - Toulouse School of Economics - UT1 - Université Toulouse 1 Capitole - Université Fédérale Toulouse Midi-Pyrénées - EHESS - École des hautes études en sciences sociales - CNRS - Centre National de la Recherche Scientifique - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement); Ioana Gavra (IRMAR - Institut de Recherche Mathématique de Rennes - UR1 - Université de Rennes 1 - UNIV-RENNES - Université de Rennes - INSA Rennes - Institut National des Sciences Appliquées - Rennes - INSA - Institut National des Sciences Appliquées - UNIV-RENNES - Université de Rennes - ENS Rennes - École normale supérieure - Rennes - UR2 - Université de Rennes 2 - UNIV-RENNES - Université de Rennes - CNRS - Centre National de la Recherche Scientifique - Institut Agro Rennes Angers - Institut Agro - Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement)
    Abstract: This paper studies some asymptotic properties of adaptive algorithms widely used in optimization and machine learning, among them Adagrad and RMSProp, which are involved in most black-box deep learning algorithms. We adopt a non-convex optimization point of view, consider a one-time-scale parametrization, and treat the situation in which these algorithms may or may not be used with mini-batches. Taking the point of view of stochastic algorithms, we establish the almost sure convergence of these methods towards the set of critical points of the target function when a decreasing step size is used. With a mild extra assumption on the noise, we also obtain convergence towards the set of minimizers of the function. Along the way, we also obtain a “convergence rate” of the methods, in the vein of the works of [GL13].
    Keywords: Stochastic optimization, Stochastic adaptive algorithm, Convergence of random variables
    Date: 2022–08
    URL: http://d.repec.org/n?u=RePEc:hal:journl:hal-03857182&r=big
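A textbook RMSProp iteration with a decreasing step size gamma_n = gamma0 * n^(-alpha) can be written in a few lines. The sketch runs it on an assumed non-convex toy objective, f(x) = x^4/4 - x^2, whose gradient is observed with Gaussian noise; its critical points are 0 and ±√2, and its minimizers are ±√2. The parameter values and function names are assumptions for illustration, and the sketch does not reproduce the paper's one-time-scale parametrization or mini-batch setting. Adagrad differs in accumulating the sum of squared gradients instead of an exponential moving average.

```python
import math
import random

def rmsprop_decreasing(grad, x0, n_steps=10_000, gamma0=0.5, alpha=0.75,
                       beta=0.9, eps=1e-8, seed=0):
    """RMSProp with step size gamma_n = gamma0 * n**(-alpha) on a 1-D problem.
    `grad(x, rng)` must return a (possibly noisy) gradient estimate."""
    rng = random.Random(seed)
    x, v = float(x0), 0.0
    for n in range(1, n_steps + 1):
        g = grad(x, rng)
        v = beta * v + (1 - beta) * g * g      # moving average of squared gradients
        step = gamma0 * n ** (-alpha)          # decreasing step size
        x -= step * g / (math.sqrt(v) + eps)   # gradient rescaled by its running RMS
    return x

def noisy_double_well_grad(x, rng):
    """Gradient of f(x) = x**4 / 4 - x**2, plus N(0, 0.5**2) noise."""
    return x ** 3 - 2 * x + rng.gauss(0.0, 0.5)
```

Started at `x0 = 3.0`, `rmsprop_decreasing(noisy_double_well_grad, 3.0)` ends close to the minimizer √2 ≈ 1.414. The decreasing step size is what lets the iterate settle despite the gradient noise; with a constant step it would keep fluctuating around the minimizer.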
  10. By: Koppe, Timo
    Abstract: The rise of digitization has radically transformed innovation processes of today's companies and is increasingly challenging existing theories and practices. Digital innovation can describe both the use of digital technologies during the innovation process and the outcome of innovation. This thesis aims to improve the understanding of digital innovation in today's digitized world by contributing to the theoretical and practical knowledge along the four organizational activities of the digital innovation process: initiation, development, implementation, and exploitation. In doing so, the thesis pays special attention to the use of digital technologies and tools (e.g., machine learning, online crowdsourcing platforms, etc.) that unlock knowledge and data to facilitate new products, services, and other value streams. When initiating digital innovations, organizations seek to identify, assimilate, and apply valuable knowledge from within and outside the organization. This activity is crucial for organizations as it determines how they address the increasing pressure to innovate in their industries and markets while innovation processes themselves are changing and becoming more distributed and open. Papers A and B of this thesis address this phase by examining how digital technologies are changing knowledge gathering, e.g., through new ways of crowdsourcing ideas and facilitating cooperation and collaboration among users and innovation collectives. Paper A focuses on organizational culture as a critical backdrop of digital innovations and explores whether it influences the implementation of idea platforms and, in this way, facilitates the discovery of innovations. The paper reveals that the implementation of idea platforms is facilitated by a culture that emphasizes policies, procedures, and information management. 
Additionally, the paper highlights the importance of taking organizational culture into account when introducing a new technology or process that may be incompatible with the existing culture. Paper B examines newly formed innovation collectives and initiatives for developing ventilators to address shortages during the rise of the COVID-19 pandemic. The paper focuses on digital technologies enabling a transformation in the way innovation collectives form, communicate, and collaborate, all during a period of shutdown and social distancing. The paper underlines the role of digital technologies and collaboration platforms in networking, communication, and decentralized development. The results show that through the effective use of digital technologies, complex innovations are no longer developed only in large enterprises but also by innovation collectives that can involve dynamic sets of actors with diverse goals and capabilities. In addition, established organizations are increasingly confronted with community innovations that offer complex solutions based on the modular architecture characteristic of digital innovations. Such modular layered architectures are a critical concept in the development of digital innovations. The development phase of the digital innovation process encompasses the design, development, and adoption of technological artifacts, which are explored in Papers C and D of this thesis. Paper C focuses on the latter, the adoption of digital service artifacts in the plant and mechanical engineering industry. The paper presents an integrative model based on the Technology-Organization-Environment (TOE) framework that examines different contextual factors as important components of the introduction, adoption, and routinization of digital service innovations. The results provide a basis for studying the assimilation of digital service innovations and can serve as a reference model for informing managerial decisions. 
Paper D, in turn, focuses on the design and development of a technology artifact. The paper focuses on applying cloud-based machine learning services to implement a visual inspection system in the manufacturing industry. The results show, for one, the value of standardization and vendor-supplied IS architecture concepts in digital innovation and, for another, how such innovations can facilitate further innovations in manufacturing. The implementation of digital innovations marks the third phase of the digital innovation process, which is addressed in Paper E. It encompasses organizational changes that occur during digital innovation initiatives. This phase emphasizes change through digital innovation initiatives within the organization (e.g., strategy, structure, people, and technology) and across the organizational environment. Paper E investigates how digital service innovations impact industrial firms, relationships between firms and their customers, and product/service offerings. The paper uses work systems theory as a theoretical foundation to structure the results and analyze them through the lens of service systems. While this analysis helps to identify the organizational changes that result from the implementation of digital innovations, the paper also provides a basis for further research and supports practitioners with systematic analyses of organizational change. The last phase of the digital innovation process is about exploiting existing systems/data for new purposes and innovations. In this regard, it is important to better understand the improvements and effects in the domains beyond the sheer outcome of digital innovation, such as organizational learning or organizational change capabilities. Paper F of this thesis investigates the exploitation of digital innovations in the context of organizational learning. One aspect of this addresses how individuals within the organization leverage innovation to explore and exploit knowledge. 
Paper F utilizes the organizational learning perspective and examines the dynamics of human learning and machine learning to understand how organizations can benefit from their respective idiosyncrasies in enabling bilateral learning. The paper demonstrates how bilateral human-machine learning can improve the overall performance using a case study from the trading sector. Drawing on these findings, the paper offers new insights into the coordination of human learning and machine learning, and moreover, the collaboration between human and artificial intelligence in organizational routines.
    Date: 2022
    URL: http://d.repec.org/n?u=RePEc:dar:wpaper:135493&r=big
  11. By: Ariel Dora Stern
    Abstract: For those who follow health and technology news, it is difficult to go more than a few days without reading about a compelling new application of Artificial Intelligence (AI) to health care. AI has myriad applications in medicine and its adjacent industries, with AI-driven tools already in use in basic science, translational medicine, and numerous corners of health care delivery, including administrative work, diagnosis, and treatment. In diagnosis and treatment, a large and growing number of AI tools meet the statutory definition of a medical device or that of an in-vitro diagnostic. Those that do are subject to regulation by local authorities, resulting in both practical and strategic implications for manufacturers, along with a more complex set of innovation incentives. This chapter presents background on medical device regulation—especially as it relates to software products—and quantitatively describes the emergence of AI among FDA-regulated products. The empirical section of this chapter explores characteristics of AI-supported/driven medical devices (“AI devices”) in the United States. It presents data on their origins (by firm type and country), their safety profiles (as measured by associated adverse events and recalls), and concludes with a discussion of the implications of regulation for innovation incentives in medical AI.
    JEL: I11 I18 K2 K32 O31 O32 O33
    Date: 2022–12
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:30639&r=big
  12. By: Lorenzo Lucchese; Mikko Pakkanen; Almut Veraart
    Abstract: In this paper, we conduct a systematic large-scale analysis of order book-driven predictability in high-frequency returns by leveraging deep learning techniques. First, we introduce a new and robust representation of the order book, the volume representation. Next, we conduct an extensive empirical experiment to address various questions regarding predictability. We investigate if and how far ahead there is predictability, the importance of a robust data representation, the advantages of multi-horizon modeling, and the presence of universal trading patterns. We use model confidence sets, which provide a formalized statistical inference framework particularly well suited to answer these questions. Our findings show that at high frequencies predictability in mid-price returns is not just present, but ubiquitous. The performance of the deep learning models is strongly dependent on the choice of order book representation, and in this respect, the volume representation appears to have multiple practical advantages.
    Date: 2022–11
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2211.13777&r=big
  13. By: Olayiwola Oladiran; Muhammad Abbas
    Abstract: This paper examines the relationship between student housing attributes and the pricing of student accommodation. The paper further explores the asymmetries in pricing for Purpose-built Student Accommodation (PBSA) and Private Student Accommodation Providers (PSAP). We utilise a web scraping procedure to access online-listed property information and prices from 25 major student destination cities in Europe on student.com and Study Abroad Apartments. Using machine learning methodology, we analyse some key tangible and non-tangible features of the properties and explore their relationships with the listed price. We also examine the potential effects of economies of scale through variations in the pricing mechanism for PBSAs and PSAPs. The results show that the non-tangible property attributes have a stronger relationship with student accommodation prices in comparison to the tangible attributes. We also observe that the influence of these non-tangible property features on student accommodation prices is significantly stronger for PSAP properties in comparison to PBSA properties. The results suggest that through the economies of scale mechanism, institutional investors may be able to provide some facilities in their PBSAs at lower costs than PSAP investors and this may result in lower premiums for these facilities as reflected in the pricing. From a methodological point of view, we show that the use of asset features and historic pricing trends can enable the training of various supervised machine learning algorithms which in turn can improve asset pricing, taking account of national and non-institutional investment types.
    JEL: R3
    Date: 2022–01–01
    URL: http://d.repec.org/n?u=RePEc:afr:wpaper:2022-017&r=big
  14. By: Claudio Vitari (AMU - Aix Marseille Université, CERGAM - Centre d'Études et de Recherche en Gestion d'Aix-Marseille - AMU - Aix Marseille Université - UTLN - Université de Toulon); Elisabetta Raguseo (Polito - Politecnico di Torino = Polytechnic of Turin); Federico Pigni (EESC-GEM Grenoble Ecole de Management)
    Abstract: Firms adopt Big data solutions, but a body of evidence suggests that Big data in some cases may create more problems than benefits. We hypothesize that the problem may not be Big data in itself but rather too much of it. These kinds of effects echo the Too-Much-of-a-Good-Thing (TMGT) effect in the field of management. This theory also seems meaningful and applicable in management information systems. We contribute to assessments of the TMGT effect related to Big data by providing an answer to the following question: When does the extension of Big data lead to value erosion? We collected data from a sample of medium and large firms and established a set of regression models to test the relationship between Big data and value creation, considering firm size as a moderator. The data confirm the existence of both an inverted U-shaped curve and firm size moderation. These results extend the applicability of the TMGT effect theory and are useful for firms exploring investments in Big data.
    Keywords: Too-Much-of-a-Good-Thing effect, inverted U-shaped curve, Big data, business value, medium and large firms
    Date: 2022
    URL: http://d.repec.org/n?u=RePEc:hal:gemptp:hal-03876785&r=big
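A common way to check for an inverted U, as in the Too-Much-of-a-Good-Thing hypothesis, is to regress the outcome on a variable and its square and look for a negative squared term whose turning point -b1 / (2 * b2) lies inside the observed range. The sketch below fits such a quadratic by solving the normal equations in pure Python. The data are synthetic, the helper names are hypothetical, and the firm-size moderation studied in the paper is not modelled.

```python
# Quadratic least squares and a simple inverted-U check (illustrative sketch).

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_quadratic(xs, ys):
    """Least-squares coefficients (b0, b1, b2) of y = b0 + b1*x + b2*x**2."""
    rows = [[1.0, x, x * x] for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve(xtx, xty)

def inverted_u(coefs, xs):
    """(True, peak) when the squared term is negative and the peak lies strictly
    inside the observed range of xs; otherwise False with the peak (or None)."""
    b0, b1, b2 = coefs
    if b2 >= 0:
        return False, None
    peak = -b1 / (2 * b2)
    return min(xs) < peak < max(xs), peak
```

For data generated from y = 1 + 4x - x^2 on x = 0, ..., 4, the fitted coefficients are approximately (1, 4, -1) and the turning point is at x = 2, so the check reports an inverted U; for y = x^2 the squared term is positive and it does not.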
  15. By: Sandro Heiniger; Winfried Koeniger; Michael Lechner
    Abstract: We estimate the transmission of the pandemic shock in 2020 to prices in the residential and commercial real estate market by causal machine learning, using new granular data at the municipal level for Germany. We exploit differences in the incidence of Covid infections or short-time work at the municipal level for identification. In contrast to evidence for other countries, we find that the pandemic had only temporary negative effects on rents for some real estate types and increased asset prices of real estate particularly in the top price segment of commercial real estate.
    Keywords: real estate, asset prices, rents, Covid pandemic, short-time work, affordability crisis
    JEL: E21 E22 G12 G51 R21 R31
    Date: 2022
    URL: http://d.repec.org/n?u=RePEc:ces:ceswps:_10083&r=big
  16. By: Wong, Janis; Morgan, Deborah; Straub, Vincent John; Hashem, Youmna; Bright, Jonathan
    Abstract: As artificial intelligence (AI) becomes increasingly embedded in government operations, retaining democratic control over these technologies is becoming ever more crucial for mitigating potential biases or lack of transparency. However, while much has been written about the need to involve citizens in AI deployment in public administration, little is known about how democratic control of these technologies works in practice. This chapter proposes to address this gap through participatory governance, a subset of governance theory that emphasises democratic engagement, in particular through deliberative practices. We begin by introducing the opportunities and challenges that the use of AI in government poses. Next, we outline the dimensions of participatory governance and introduce an exploratory framework that can be adopted in the AI implementation process. Finally, we explore how these considerations can be applied to AI governance in public bureaucracies. We conclude by outlining future directions in the study of the governance of AI systems in government.
    Date: 2022–11–29
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:pdcrm&r=big
  17. By: Dyrland, Kjetil; Lundervold, Alexander Selvikvåg (Western Norway University of Applied Sciences); Porta Mana, PierGianLuca (HVL Western Norway University of Applied Sciences)
    Abstract: In a classification task from a set of features, one would ideally like to have the probability of the class conditional on the features. Such a probability is computationally almost impossible to find in many important cases. The primary idea of the present work is to calculate the probability of a class conditional not on the features, but on a trained classifying algorithm's output. Such a probability is easily calculated and provides an output-to-probability ‘transducer’ that can be applied to the algorithm's future outputs. In conjunction with problem-dependent utilities, the probabilities of the transducer allow one to make the optimal choice among the classes or among a set of more general decisions, by means of expected-utility maximization. The combined procedure is a computationally cheap yet powerful ‘augmentation’ of the original classifier. This idea is demonstrated in a simplified drug-discovery problem with a highly imbalanced dataset. The augmentation leads to improved results, sometimes close to the theoretical maximum, for any set of problem-dependent utilities. The calculation of the transducer also automatically provides: (i) a quantification of the uncertainty about the transducer itself; (ii) the expected utility of the augmented algorithm (including its uncertainty), which can be used for algorithm selection; (iii) the possibility of using the algorithm in a ‘generative mode’, useful if the training dataset is biased. It is argued that the optimality, flexibility, and uncertainty assessment provided by the transducer and augmentation are much needed for classification problems in fields such as medicine and drug discovery.
    Date: 2022–06–01
    URL: http://d.repec.org/n?u=RePEc:osf:osfxxx:vct9y&r=big
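The transducer idea can be sketched as follows: on held-out data, estimate the probability of each class given the classifier's binned output score, then choose the decision with the highest expected utility, where `utility[d][c]` is the utility of decision d when the true class is c. Binning scores in [0, 1] and using a Laplace-smoothed frequency are simplifying assumptions standing in for the paper's own probability calculation; the function names and data are hypothetical.

```python
# Output-to-probability "transducer" plus expected-utility decisions
# (simplified sketch for a binary classifier with scores in [0, 1]).

def bin_of(score, n_bins):
    """Index of the equal-width bin of [0, 1] that contains `score`."""
    return min(int(score * n_bins), n_bins - 1)

def fit_transducer(scores, labels, n_bins=4):
    """Per-bin estimate of P(class 1 | classifier output), using a
    Laplace-smoothed frequency (a simplifying assumption)."""
    counts = [[0, 0] for _ in range(n_bins)]
    for s, c in zip(scores, labels):
        counts[bin_of(s, n_bins)][c] += 1
    return [(n1 + 1) / (n0 + n1 + 2) for n0, n1 in counts]

def decide(score, transducer, utility):
    """Return the decision d maximising sum_c P(c | output) * utility[d][c]."""
    p1 = transducer[bin_of(score, len(transducer))]
    probs = [1 - p1, p1]
    expected = [sum(p * u for p, u in zip(probs, row)) for row in utility]
    return max(range(len(utility)), key=lambda d: expected[d])
```

Suppose ten held-out items score 0.1 (nine of class 0, one of class 1) and ten score 0.9 (nine of class 1, one of class 0). With a symmetric utility matrix `[[1, 0], [0, 1]]`, a score of 0.1 leads to decision 0. If missing a positive case costs 10 units (`[[1, -10], [0, 1]]`), the same score leads to decision 1, because even a roughly 17% probability of the positive class then dominates the expected utility.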
  18. By: Malevanov Yuriy (Russian Presidential Academy of National Economy and Public Administration); Dozhdikov Anton (Russian Presidential Academy of National Economy and Public Administration); Ivanov Alexandr (Russian Presidential Academy of National Economy and Public Administration)
    Abstract: The purpose of the research is to develop mechanisms for optimizing the quality management system of education at various levels in the digital environment, based on methods of working with big data.
    Keywords: big data, education management, peer review, data lakes, education control and supervision, education quality assessment, digital educational environment
    Date: 2021–01
    URL: http://d.repec.org/n?u=RePEc:rnp:wpaper:s21046&r=big
  19. By: Edward Kerby; Alexander Moradi; Hanjo Odendaal
    Abstract: In this paper we study 500 years of African economic history using traveller accounts. We systematically collected 2,464 unique documents, of which 855 pass language and rigorous data quality requirements. Our final corpus of texts contains more than 230,000 pages. Analysing such a corpus is an insurmountable task for traditional historians and would probably take a lifetime’s work. Applying modern-day computational linguistic techniques such as a structural topic model (STM) approach in combination with domain knowledge of African economic history, we analyse how first-hand accounts (topics) evolve across space, time and traveller occupations. Apart from obvious accounts of climate, geography and zoology, we find topics around imperialism, diplomacy, conflict, trade/commerce, health/medicine, evangelization and many more topics of interest to scholarship. We find that some topics follow notable epochs defined by underlying relevance and that travellers’ occupational backgrounds influence the narratives in their writing. Many topics exhibit good temporal and spatial coverage, and a large variation in occupational backgrounds, which adds different perspectives to a topic. This makes the large body of written accounts a promising source to systematically shed new light on some of Africa’s precolonial past.
    URL: http://d.repec.org/n?u=RePEc:oxf:esohwp:_201&r=big
  20. By: Taiga Saito (Faculty of Economics, The University of Tokyo); Shivam Gupta (Faculty of Economics, The University of Tokyo)
    Abstract: This study presents big data applications with quantitative theoretical models in financial management and investigates the possible incorporation of social media factors into these models. Specifically, we examine three models (a revenue management model, an interest rate model with market sentiment, and a high-frequency trading equity market model) and consider possible extensions of these models to include social media. Since social media plays a substantial role in promoting products and services, engaging with customers, and sharing sentiment among market participants, it is important to include social media factors in stochastic optimization models for financial management. Moreover, we compare the three models from qualitative and quantitative points of view and provide managerial implications on how these models can be used together with social media in financial management, using the concrete case of a hotel REIT. The contribution of this research is that we investigate the possible incorporation of social media factors into the three models, whose objectives are revenue management and debt and equity financing, both essential areas in financial management. This helps to estimate the effect and impact of social media quantitatively if the internal data necessary for parameter estimation are available. We also provide managerial implications for the combined use of the three models from a higher viewpoint. The numerical experiment, along with the proposition, indicates that the model can be used in the revenue management of hotels and that, by improving the social media factor, a hotel can work towards maximizing its sales.
    Date: 2022–12
    URL: http://d.repec.org/n?u=RePEc:tky:fseres:2022cf1205&r=big
  21. By: David M. Arseneau; Alejandro Drexler; Mitsuhiro Osada
    Abstract: This paper applies natural language processing to a large corpus of central bank speeches to identify those related to climate change. We analyze these speeches to better understand how central banks communicate about climate change. By all accounts, communication about climate change has accelerated sharply in recent years. The breadth of topics covered is wide, ranging from the impact of climate change on the economy to financial innovation, sustainable finance, monetary policy, and the central bank mandate. Financial stability concerns are touched upon, but macroprudential policy is rarely mentioned. Direct central bank action largely revolves around identifying and monitoring potential risks to the financial system. Finally, we find that central banks tend to use speculative language more frequently when talking about climate change relative to other topics.
    Keywords: Financial stability; Transparency; Central bank mandate; Green finance; Natural language processing; Central bank speeches
    JEL: E58 E61 Q54
    Date: 2022–05–27
    URL: http://d.repec.org/n?u=RePEc:fip:fedgfe:2022-31&r=big
  22. By: Rowe, Francisco (University of Liverpool); Calafiore, Alessia (University of Liverpool); Arribas-Bel, Dani; Samardzhiev, Krasen; Fleischmann, Martin
    Abstract: Existing empirical work has focused on assessing the effectiveness of non-pharmaceutical interventions on human mobility to contain the spread of COVID-19. Less is known about the ways in which the COVID-19 pandemic has reshaped the spatial patterns of population movement within countries. Anecdotal evidence of an urban exodus from large cities to rural areas emerged during early phases of the pandemic across western societies. Yet, these claims have not been empirically assessed. Traditional data sources, such as censuses, offer too coarse a temporal frequency to analyse population movement over short time intervals. Drawing on a data set of 21 million observations from Facebook users, we aim to analyse the extent and evolution of changes in the spatial patterns of population movement across the rural-urban continuum in Britain over an 18-month period from March 2020 to August 2021. Our findings show an overall and sustained decline in population movement during periods of high stringency measures, with the most densely populated areas reporting the largest reductions. During these periods, we also find evidence of higher-than-average mobility from high-density areas to low-density areas, lending some support to claims of large-scale population movements from large cities. Yet, we show that these trends were temporary. Overall mobility levels trended back to pre-coronavirus levels after the easing of non-pharmaceutical interventions. Following this easing, we also find a reduction in movement to low-density areas and a rise in mobility to high-density agglomerations. Overall, these findings reveal that while COVID-19 generated shock waves leading to temporary changes in the patterns of population movement in Britain, the resulting vibrations have not significantly reshaped the prevalent structures in the national pattern of population movement.
    Date: 2022–06–03
    URL: http://d.repec.org/n?u=RePEc:osf:osfxxx:6hjv3&r=big

This nep-big issue is ©2023 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.