nep-big New Economics Papers
on Big Data
Issue of 2023‒04‒03
23 papers chosen by
Tom Coupé
University of Canterbury

  1. Forecasting Bilateral Refugee Flows with High-dimensional Data and Machine Learning Techniques By Konstantin Boss; Andre Groeger; Tobias Heidland; Finja Krueger; Conghan Zheng
  2. Feature Selection for Forecasting By Hakan Pabuccu; Adrian Barbu
  3. Machine Learning algorithms, perspectives, and real-world application: Empirical evidence from United States trade data By Aggarwal, Sakshi
  4. Addressing Sample Selection Bias for Machine Learning Methods By Dylan Brewer; Alyssa Carlson
  5. Why Big Data Can Make Creative Destruction More Creative – But Less Destructive By Norbäck, Pehr-Johan; Persson, Lars
  6. Forecasting fiscal crises in emerging markets and low-income countries with machine learning models By Raffaele De Marchi; Alessandro Moro
  7. Artificial Intelligence (AI) and Policy in Developing Countries By Muhammad Hamza Amjad
  8. EnsembleIV: Creating Instrumental Variables from Ensemble Learners for Robust Statistical Inference By Gordon Burtch; Edward McFowland III; Mochen Yang; Gediminas Adomavicius
  9. Surrogate Data Models: Interpreting Large-scale Machine Learning Crisis Prediction Models By Maksym Ivanyna; Ritong Qu; Ruofei Hu; Cheng Zhong; Mr. Jorge A Chan-Lau
  10. Generative Ornstein-Uhlenbeck Markets via Geometric Deep Learning By Anastasis Kratsios; Cody Hyndman
  11. Predicting individual-level longevity with statistical and machine learning methods By Luca Badolato; Ari Gabriel Decter-Frain; Nicolas Irons; Maria Laura Miranda; Erin Walk; Elnura Zhalieva; Monica Alexander; Ugofilippo Basellini; Emilio Zagheni
  12. Censored Quantile Regression with Many Controls By Seoyun Hong
  13. AI and Macroeconomic Modeling: Deep Reinforcement Learning in an RBC model By Tohid Atashbar; Rui Aruhan Shi
  14. Translating Intersectionality to Fair Machine Learning in Health Sciences By Lett, Elle; La Cava, William
  15. Financial Distress Prediction For Small And Medium Enterprises Using Machine Learning Techniques By Yuan Gao; Biao Jiang; Jietong Zhou
  16. A neural network based model for multi-dimensional nonlinear Hawkes processes By Sobin Joseph; Shashi Jain
  17. Policy gradient learning methods for stochastic control with exit time and applications to share repurchase pricing By Mohamed Hamdouche; Pierre Henry-Labordere; Huyen Pham
  18. Exploring Human and Artificial Intelligence Collaboration and Its Impact on Organizational Performance: A Multi-Level Analysis By Sturm, Timo
  19. Creating Disasters: Recession Forecasting with GAN-Generated Synthetic Time Series Data By Sam Dannels
  20. Combining search strategies to improve performance in the calibration of economic ABMs By Aldo Glielmo; Marco Favorito; Debmallya Chanda; Domenico Delli Gatti
  21. Building Floorspace in China: A Dataset and Learning Pipeline By Peter Egger; Susie Xi Rao; Sebastiano Papini
  22. Walk the green talk? A textual analysis of pension funds’ disclosures of sustainable investing By Rob Bauer; Dirk Broeders; Annick van Ool
  23. Both eyes open: Vigilant Incentives help Regulatory Markets improve AI Safety By Paolo Bova; Alessandro Di Stefano; The Anh Han

  1. By: Konstantin Boss; Andre Groeger; Tobias Heidland; Finja Krueger; Conghan Zheng
    Abstract: We develop monthly refugee flow forecasting models for 150 origin countries to the EU27, using machine learning and high-dimensional data, including digital trace data from Google Trends. Comparing different models and forecasting horizons and validating them out-of-sample, we find that an ensemble forecast combining Random Forest and Extreme Gradient Boosting algorithms consistently outperforms the alternatives for forecast horizons between 3 and 12 months. For large refugee flow corridors, this holds in a parsimonious model based exclusively on Google Trends variables, which has the advantage of close-to-real-time availability. We provide practical recommendations on how our approach can enable ahead-of-period refugee forecasting applications.
    Keywords: forecasting, refugee flows, asylum seekers, European Union, machine learning, Google Trends
    JEL: C53 C55 F22
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:bge:wpaper:1387&r=big
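A minimal sketch of the ensemble-combination idea behind this paper: two base forecasts (stand-ins for the Random Forest and Extreme Gradient Boosting models; the toy numbers and the equal-weight averaging are illustrative assumptions, not the paper's scheme) are averaged, and out-of-sample mean squared error is compared against each base model.

```python
# Minimal sketch: equal-weight ensemble of two base forecasts.
# The base forecasts stand in for Random Forest and XGBoost outputs;
# equal weighting and the numbers are assumptions for illustration.

def mse(pred, actual):
    """Mean squared error of a forecast against realized values."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)

def ensemble(f1, f2):
    """Equal-weight average of two forecast series."""
    return [(a + b) / 2 for a, b in zip(f1, f2)]

# Toy out-of-sample refugee-flow forecasts (hypothetical numbers).
actual = [100, 120, 90, 110, 130]
rf_forecast = [95, 130, 85, 100, 140]    # errs in one direction
xgb_forecast = [110, 115, 100, 120, 125] # errs in the opposite direction

combined = ensemble(rf_forecast, xgb_forecast)
print(mse(rf_forecast, actual), mse(xgb_forecast, actual), mse(combined, actual))
```

When the base models' errors partially offset each other, as here, the equal-weight average has far lower error than either base forecast alone.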
  2. By: Hakan Pabuccu; Adrian Barbu
    Abstract: This work investigates the importance of feature selection for improving the forecasting performance of machine learning algorithms for financial data. Artificial neural networks (ANN), convolutional neural networks (CNN), and long short-term memory (LSTM) networks, as well as linear models, were applied for forecasting purposes. The Feature Selection with Annealing (FSA) algorithm was used to select the features from about 1000 possible predictors obtained from 26 technical indicators with specific periods and their lags. In addition, the Boruta feature selection algorithm was applied as a baseline feature selection method. The dependent variables consisted of daily logarithmic returns and daily trends of ten financial data sets, including cryptocurrency and different stocks. Experiments indicate that the FSA algorithm increased the performance of ML models regardless of the problem type. The FSA hybrid machine learning models showed better performance in 10 out of 10 data sets for regression and 8 out of 10 data sets for classification. None of the hybrid Boruta models outperformed the hybrid FSA models. However, the BOR-CNN model's performance was comparable to the best model's for 4 out of 10 data sets for regression estimates. The BOR-LR and BOR-CNN models showed performance comparable to the best hybrid FSA models in 2 out of 10 data sets for classification. FSA was observed to improve model performance both through better performance metrics and through decreased computation time, by providing a lower-dimensional input feature space.
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2303.02223&r=big
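The annealing idea in FSA can be sketched as follows: alternate gradient steps on a linear model with pruning of the smallest-magnitude coefficients, shrinking the kept set toward the target size. The linear pruning schedule and the synthetic data below are simplifying assumptions; the actual FSA schedule and the paper's 1000-predictor setting are not reproduced.

```python
import random

def fsa_linear(X, y, k, epochs=200, lr=0.01):
    """Simplified Feature Selection with Annealing for linear regression:
    alternate gradient steps with pruning of the smallest-|w| features,
    shrinking the kept set linearly down to k. (True FSA uses a specific
    annealing schedule; this linear schedule is a simplification.)"""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    keep = set(range(p))
    for e in range(epochs):
        # gradient step on mean squared loss, restricted to surviving features
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            r = sum(w[j] * xi[j] for j in keep) - yi
            for j in keep:
                grad[j] += 2 * r * xi[j] / n
        for j in keep:
            w[j] -= lr * grad[j]
        # annealing: number of kept features shrinks from p toward k
        m = max(k, p - int((p - k) * (e + 1) / epochs))
        ranked = sorted(keep, key=lambda j: -abs(w[j]))[:m]
        for j in keep - set(ranked):
            w[j] = 0.0  # pruned features are zeroed and never revisited
        keep = set(ranked)
    return w, keep

random.seed(0)
# Synthetic data: only features 0 and 1 matter among 10 candidates.
X = [[random.gauss(0, 1) for _ in range(10)] for _ in range(200)]
y = [3 * xi[0] - 2 * xi[1] + random.gauss(0, 0.1) for xi in X]
w, selected = fsa_linear(X, y, k=2)
print(sorted(selected))
```

On this toy problem the surviving set recovers the two informative features, with coefficient estimates close to the true values 3 and -2.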
  3. By: Aggarwal, Sakshi
    Abstract: Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without being explicitly programmed. It is one of today’s most rapidly growing technical fields, lying at the crossroads of computer science and statistics, and at the core of artificial intelligence (AI) and data science. Various types of machine learning algorithms exist in this area, such as supervised, unsupervised, semi-supervised, and reinforcement learning. Recent progress in ML has been driven both by the development of new learning algorithms and theory, and by the ongoing explosion in the availability of vast amounts of data (commonly known as “big data”) and low-cost computation. The adoption of data-intensive ML-based methods can be found throughout science, technology, and commerce, leading to more evidence-based decision-making across many walks of life, including finance, manufacturing, international trade, economics, education, healthcare, marketing, policymaking, and data governance. The present paper provides a comprehensive view of these machine learning algorithms, which can be applied to enhance the intelligence and capabilities of an application. Moreover, the paper attempts to identify accurate clusters of similar industries in the United States that collectively account for more than 85 percent of the economy’s aggregate export and import flows over the period 2002-2021, using a clustering algorithm (unsupervised learning). Four cluster labels are used, namely low investment (LL), category 1 medium investment (HL), category 2 medium investment (LH), and high investment (HH). The empirical results indicate that machinery and electrical equipment is classified as a high investment sector due to its efficient production mechanism. The analysis further underlines the need for upstream value chain integration through skill augmentation and innovation, especially in low investment industries. 
Overall, this paper aims to explain the trends of ML approaches and their applicability in various real-world domains, as well as serve as a reference point for academia, industry professionals and policymakers particularly from a technical, ethical, and regulatory point of view.
    Keywords: Machine learning, Artificial intelligence, Clustering, K-means, international trade
    JEL: F14
    Date: 2023–03–03
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:116579&r=big
  4. By: Dylan Brewer (Georgia Institute of Technology); Alyssa Carlson (Department of Economics, University of Missouri)
    Abstract: We study approaches for adjusting machine learning methods when the training sample differs from the prediction sample on unobserved dimensions. The machine learning literature predominantly assumes selection only on observed dimensions. Common approaches are to weight or to include variables that influence selection as solutions to selection on observables. Simulation results show that selection on unobservables increases mean squared prediction error for popular machine-learning algorithms. Common machine learning practices, such as weighting or including variables that influence selection into the training or testing sample, often worsen sample selection bias. We propose two control-function approaches that remove the effects of selection bias before training and find that they reduce mean squared prediction error in simulations. We apply these approaches to predicting the vote share of the incumbent in gubernatorial elections using previously observed re-election bids. We find that ignoring selection on unobservables leads to substantially higher predicted vote shares for the incumbent than when the control function approach is used.
    Keywords: sample selection, machine learning, control function, inverse probability weighting
    JEL: C13 C31 C55 D72
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:umc:wpaper:2302&r=big
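The control-function logic can be illustrated with a classic Heckman-style sketch: when selection into the sample depends on an unobservable correlated with the outcome error, naive training on the selected sample biases the slope, while adding an inverse-Mills-ratio control removes the bias. For brevity the sketch plugs in the true selection index instead of estimating a first-stage probit; the paper's two estimators are not reproduced here.

```python
import math, random

def phi(t):  # standard normal pdf
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def Phi(t):  # standard normal cdf
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

def ols(rows, y):
    """OLS via normal equations; rows include a leading 1 for the intercept."""
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    for i in range(k):  # Gaussian elimination with partial pivoting
        p = max(range(i, k), key=lambda q: abs(A[q][i]))
        A[i], A[p], b[i], b[p] = A[p], A[i], b[p], b[i]
        for q in range(i + 1, k):
            f = A[q][i] / A[i][i]
            A[q] = [a - f * c for a, c in zip(A[q], A[i])]
            b[q] -= f * b[i]
    x = [0.0] * k
    for i in reversed(range(k)):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, k))) / A[i][i]
    return x

random.seed(1)
sel_rows, cf_rows, sel_y = [], [], []
for _ in range(20000):
    x, z, v = (random.gauss(0, 1) for _ in range(3))
    u = 0.9 * v + 0.3 * random.gauss(0, 1)   # outcome error correlated with v
    if x + z + v > 0:                        # observed only if selected
        y = 2.0 * x + u                      # true slope is 2
        lam = phi(x + z) / Phi(x + z)        # inverse Mills ratio control
        sel_rows.append([1.0, x])
        cf_rows.append([1.0, x, lam])
        sel_y.append(y)

naive_slope = ols(sel_rows, sel_y)[1]        # biased by selection on v
cf_slope = ols(cf_rows, sel_y)[1]            # control function corrects it
print(round(naive_slope, 3), round(cf_slope, 3))
```

The naive slope is noticeably attenuated because, within the selected sample, x is negatively correlated with the unobservable v; the control-function slope is close to the true value 2.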
  5. By: Norbäck, Pehr-Johan (Research Institute of Industrial Economics (IFN)); Persson, Lars (Research Institute of Industrial Economics (IFN))
    Abstract: The application of machine learning (ML) to big data has become increasingly important. We propose a model where firms have access to the same ML, but incumbents have access to historical data. We show that big data raises entrepreneurial barriers, making the creative destruction process less destructive (less business-stealing) if the entrepreneur has weak access to the incumbent’s data. It is also shown that this induces entrepreneurs to take on more risk and be more creative. Policies making data generally available may therefore be suboptimal. Supporting entrepreneurs’ access to ML might be preferable since it stimulates creative entrepreneurship.
    Keywords: Machine Learning; Big Data; Creative Destruction; Entrepreneurship; Operational Data
    JEL: L10 L20 M13 O30
    Date: 2023–02–22
    URL: http://d.repec.org/n?u=RePEc:hhs:iuiwop:1454&r=big
  6. By: Raffaele De Marchi (Bank of Italy); Alessandro Moro (Bank of Italy)
    Abstract: Pre-existing public debt vulnerabilities have been exacerbated by the effects of the pandemic, raising the risk of fiscal crises in emerging markets and low-income countries. This underscores the importance of models designed to capture the main determinants of fiscal distress episodes and forecast sovereign debt crises. In this regard, our paper shows that machine learning techniques outperform standard econometric approaches, such as the probit model. Our analysis also identifies the variables that are the most relevant predictors of fiscal crises and assesses their impact on the probability of a crisis episode. Finally, the forecasts generated by the machine learning algorithms are used to derive aggregate fiscal distress indices that can signal effectively the build-up of debt-related vulnerabilities in emerging and low-income countries.
    Keywords: fiscal crises, debt sustainability, emerging and low-income countries, machine learning techniques
    JEL: C18 C52 F34 H63 H68
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:bdi:wptemi:td_1405_23&r=big
  7. By: Muhammad Hamza Amjad (Pakistan Institute of Development Economics)
    Abstract: First, we need to understand what the term Artificial Intelligence means. “Artificial Intelligence is an area of study in the field of computer science. Artificial intelligence is concerned with the development of computers able to engage in human-like thought processes such as learning, reasoning, and self-correction.”
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:pid:wbrief:2023:115&r=big
  8. By: Gordon Burtch; Edward McFowland III; Mochen Yang; Gediminas Adomavicius
    Abstract: Despite increasing popularity in empirical studies, the integration of machine learning generated variables into regression models for statistical inference suffers from the measurement error problem, which can bias estimation and threaten the validity of inferences. In this paper, we develop a novel approach to alleviate associated estimation biases. Our proposed approach, EnsembleIV, creates valid and strong instrumental variables from weak learners in an ensemble model, and uses them to obtain consistent estimates that are robust against the measurement error problem. Our empirical evaluations, using both synthetic and real-world datasets, show that EnsembleIV can effectively reduce estimation biases across several common regression specifications, and can be combined with modern deep learning techniques when dealing with unstructured data.
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2303.02820&r=big
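The measurement-error problem EnsembleIV targets can be shown in miniature: regressing on a noisily predicted variable attenuates the coefficient, and an instrument that is correlated with the true variable but has independent noise restores consistency. The two independent "learner" predictions below are an illustrative stand-in for instruments built from individual weak learners, not the paper's actual construction.

```python
import random

random.seed(2)
n = 50000
x_true = [random.gauss(0, 1) for _ in range(n)]
y = [1.5 * x + random.gauss(0, 1) for x in x_true]  # true coefficient 1.5
# Two "learners" predict x with independent errors (stand-ins for
# predictions from disjoint members of an ensemble).
x_hat1 = [x + random.gauss(0, 0.8) for x in x_true]
x_hat2 = [x + random.gauss(0, 0.8) for x in x_true]

def slope(u, w):
    """Simple-regression slope of w on u (covariance over variance)."""
    mu, mw = sum(u) / n, sum(w) / n
    cov = sum((a - mu) * (b - mw) for a, b in zip(u, w))
    var = sum((a - mu) ** 2 for a in u)
    return cov / var

ols = slope(x_hat1, y)                      # attenuated toward zero
# IV: instrument x_hat1 with x_hat2 (their noises are independent)
mz, m1, my = sum(x_hat2) / n, sum(x_hat1) / n, sum(y) / n
cov_zy = sum((z - mz) * (b - my) for z, b in zip(x_hat2, y))
cov_z1 = sum((z - mz) * (a - m1) for z, a in zip(x_hat2, x_hat1))
iv = cov_zy / cov_z1                        # consistent for 1.5
print(round(ols, 3), round(iv, 3))
```

With noise variance 0.64 on a unit-variance regressor, OLS shrinks toward 1.5/1.64, while the IV estimate recovers the true coefficient.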
  9. By: Maksym Ivanyna; Ritong Qu; Ruofei Hu; Cheng Zhong; Mr. Jorge A Chan-Lau
    Abstract: Machine learning models are becoming increasingly important in the prediction of economic crises. The models, however, use datasets comprising a large number of predictors (features) which impairs model interpretability and their ability to provide adequate guidance in the design of crisis prevention and mitigation policies. This paper introduces surrogate data models as dimensionality reduction tools in large-scale crisis prediction models. The appropriateness of this approach is assessed by their application to large-scale crisis prediction models developed at the IMF. The results are consistent with economic intuition and validate the use of surrogates as interpretability tools.
    Keywords: Crisis prediction; machine learning; surrogates; explainable models; surrogate data models; model interpretability; Early warning systems; Yield curve; Deposit rates; Global
    Date: 2023–02–24
    URL: http://d.repec.org/n?u=RePEc:imf:imfwpa:2023/041&r=big
  10. By: Anastasis Kratsios; Cody Hyndman
    Abstract: We consider the problem of simultaneously approximating the conditional distribution of market prices and their log returns with a single machine learning model. We show that an instance of the GDN model of Kratsios and Papon (2022) solves this problem without having prior assumptions on the market's "clipped" log returns, other than that they follow a generalized Ornstein-Uhlenbeck process with a priori unknown dynamics. We provide universal approximation guarantees for these conditional distributions and contingent claims with a Lipschitz payoff function.
    Date: 2023–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2302.09176&r=big
  11. By: Luca Badolato (Max Planck Institute for Demographic Research, Rostock, Germany); Ari Gabriel Decter-Frain (Max Planck Institute for Demographic Research, Rostock, Germany); Nicolas Irons (Max Planck Institute for Demographic Research, Rostock, Germany); Maria Laura Miranda (Max Planck Institute for Demographic Research, Rostock, Germany); Erin Walk; Elnura Zhalieva; Monica Alexander (Max Planck Institute for Demographic Research, Rostock, Germany); Ugofilippo Basellini (Max Planck Institute for Demographic Research, Rostock, Germany); Emilio Zagheni (Max Planck Institute for Demographic Research, Rostock, Germany)
    Abstract: Individual-level mortality prediction is a fundamental challenge with implications for people and societies. Accurate longevity predictions improve life planning, targeting of high-risk individuals, and organization of social interventions, policies, and public spending. Demographers and actuaries have been primarily concerned with mortality modeling and prediction at a macro level, leveraging strong regularities in mortality rates over age, sex, space, and time. Outside clinical settings, individual-level mortality predictions have been largely overlooked and have remained a challenging task. We model and predict individual-level lifespan using data from the US Health and Retirement Study, a nationally representative longitudinal survey of people over 50 years of age. We estimate 12 statistical and machine learning survival analysis models using over 150 predictors measuring behavioral, biological, demographic, health, and social indicators. Extending previous research on inequalities in mortality and morbidity, we investigate inequalities in individual mortality prediction by gender, race and ethnicity, and education. Machine learning and traditional models achieve comparable accuracy and relatively high discriminative performance, particularly when including time-varying information (best mean Area Under the Curve = 0.87). However, the models and predictors used fail to account for a majority of lifespan heterogeneity at the individual level. We observe consistent inequalities in mortality predictability and risk discrimination, with lower prediction accuracy for men, non-Hispanic Blacks, and low-educated individuals. In addition, people in these groups show lower accuracy in their subjective predictions of their own lifespan. Finally, we see minimal variation in the top features across groups, with variables related to habits, health history, and finances being relevant predictors. 
Our results assess how well mortality can be predicted from representative surveys, providing baselines and guidance for future research across countries.
    Keywords: USA, forecasts, inequality, longevity
    JEL: J1 Z0
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:dem:wpaper:wp-2023-008&r=big
  12. By: Seoyun Hong
    Abstract: This paper develops estimation and inference methods for censored quantile regression models with high-dimensional controls. The methods are based on the application of double/debiased machine learning (DML) framework to the censored quantile regression estimator of Buchinsky and Hahn (1998). I provide valid inference for low-dimensional parameters of interest in the presence of high-dimensional nuisance parameters when implementing machine learning estimators. The proposed estimator is shown to be consistent and asymptotically normal. The performance of the estimator with high-dimensional controls is illustrated with numerical simulation and an empirical application that examines the effect of 401(k) eligibility on savings.
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2303.02784&r=big
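The double/debiased machine learning (DML) recipe the paper builds on can be sketched in the simpler partially linear model: cross-fit the nuisance regressions on one fold, predict on the other, and run a residual-on-residual regression. The piecewise-constant "learner" and the data-generating process are illustrative assumptions; the censored quantile step itself is not reproduced.

```python
import math, random

random.seed(3)
n = 4000
def g(x): return math.sin(2 * x)      # nuisance in the outcome
def m(x): return 0.5 * math.cos(x)    # nuisance in the treatment

data = []
for _ in range(n):
    x = random.uniform(-3, 3)
    d = m(x) + random.gauss(0, 1)
    y = 1.0 * d + g(x) + random.gauss(0, 1)  # target parameter theta = 1
    data.append((x, d, y))

def bin_fit(train, val, target, bins=30):
    """Piecewise-constant nuisance fit: mean of the target within x-bins,
    learned on train, predicted on val (a stand-in for a flexible learner)."""
    sums, cnts = [0.0] * bins, [0] * bins
    def b(x): return min(bins - 1, int((x + 3) / 6 * bins))
    for (x, d, y) in train:
        t = d if target == "d" else y
        sums[b(x)] += t
        cnts[b(x)] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, cnts)]
    return [means[b(x)] for (x, d, y) in val]

# Two-fold cross-fitting: nuisances for each fold come from the other fold.
random.shuffle(data)
folds = [data[: n // 2], data[n // 2:]]
num = den = 0.0
for k in (0, 1):
    tr, va = folds[1 - k], folds[k]
    dhat = bin_fit(tr, va, "d")
    yhat = bin_fit(tr, va, "y")
    for (x, d, y), dh, yh in zip(va, dhat, yhat):
        num += (d - dh) * (y - yh)  # residual-on-residual
        den += (d - dh) ** 2
theta = num / den
print(round(theta, 3))
```

Cross-fitting keeps the nuisance estimation errors (approximately) orthogonal to the final step, so the estimate of theta stays close to the true value 1 despite the crude nuisance learner.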
  13. By: Tohid Atashbar; Rui Aruhan Shi
    Abstract: This study seeks to construct a basic reinforcement learning-based AI-macroeconomic simulator. We use a deep RL (DRL) approach (DDPG) in an RBC macroeconomic model. We set up two learning scenarios, one of which is deterministic without the technological shock and the other is stochastic. The objective of the deterministic environment is to compare the learning agent's behavior to a deterministic steady-state scenario. We demonstrate that in both deterministic and stochastic scenarios, the agent's choices are close to their optimal value. We also present cases of unstable learning behaviours. This AI-macro model may be enhanced in future research by adding additional variables or sectors to the model or by incorporating different DRL algorithms.
    Keywords: Reinforcement learning; Deep reinforcement learning; Artificial intelligence; RL; DRL; Learning algorithms; Macro modeling; RBC; Real business cycles; DDPG; Deep deterministic policy gradient; Actor-critic algorithms
    Date: 2023–02–24
    URL: http://d.repec.org/n?u=RePEc:imf:imfwpa:2023/040&r=big
  14. By: Lett, Elle; La Cava, William
    Abstract: Machine learning (ML)-derived tools are rapidly being deployed as an additional input in the clinical decision-making process to optimize health interventions. However, ML models also risk propagating societal discrimination and exacerbating existing health inequities. The field of ML fairness has focused on developing approaches to mitigate bias in ML models. To date, the focus has been on the model fitting process, simplifying the processes of structural discrimination to definitions of model bias based on performance metrics. Here, we reframe the ML task through the lens of intersectionality, a Black feminist theoretical framework that contextualizes individuals in interacting systems of power and oppression, linking inquiry into measuring fairness to the pursuit of health justice. In doing so, we present intersectional ML fairness as a paradigm shift that moves from an emphasis on model metrics to an approach for ML that is centered around achieving more equitable health outcomes.
    Date: 2023–02–27
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:gu7yh&r=big
  15. By: Yuan Gao; Biao Jiang; Jietong Zhou
    Abstract: Financial distress prediction plays a crucial role in the economy by accurately forecasting the number and probability of failing structures, providing insight into the growth and stability of a country's economy. However, predicting financial distress for Small and Medium Enterprises is challenging due to their inherent ambiguity, leading to increased funding costs and decreased chances of receiving funds. While several strategies have been developed for effective financial crisis prediction (FCP), their implementation, accuracy, and data security fall short of practical applications. Additionally, many of these strategies perform well for a portion of the dataset but are not adaptable to various datasets. As a result, there is a need to develop a productive prediction model for better order execution and adaptability to different datasets. In this review, we propose a feature selection algorithm for FCP based on element credits and data source collection. Current financial distress prediction models rely mainly on financial statements and disregard the timeliness of organization tests. Therefore, we propose a corporate FCP model that better aligns with industry practice and incorporates the gathering of thin-head component analysis of financial data, corporate governance qualities, and market exchange data with a Relevance Vector Machine. Experimental results demonstrate that this strategy can improve the forecast efficiency of financial distress with fewer characteristic factors.
    Date: 2023–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2302.12118&r=big
  16. By: Sobin Joseph; Shashi Jain
    Abstract: This paper introduces the Neural Network for Nonlinear Hawkes processes (NNNH), a non-parametric method based on neural networks to fit nonlinear Hawkes processes. Our method is suitable for analyzing large datasets in which events exhibit both mutually-exciting and inhibitive patterns. The NNNH approach models the individual kernels and the base intensity of the nonlinear Hawkes process using feed forward neural networks and jointly calibrates the parameters of the networks by maximizing the log-likelihood function. We utilize Stochastic Gradient Descent to search for the optimal parameters and propose an unbiased estimator for the gradient, as well as an efficient computation method. We demonstrate the flexibility and accuracy of our method through numerical experiments on both simulated and real-world data, and compare it with state-of-the-art methods. Our results highlight the effectiveness of the NNNH method in accurately capturing the complexities of nonlinear Hawkes processes.
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2303.03073&r=big
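The object the NNNH parameterizes with neural networks has a classic parametric special case: a univariate Hawkes process with base intensity mu and exponential kernel alpha*exp(-beta*t), whose log-likelihood is the sum of log-intensities at the events minus the integrated intensity (compensator). The sketch below evaluates that log-likelihood with the standard recursion; the event times and parameter values are illustrative, and NNNH replaces mu and the kernel with feed-forward networks.

```python
import math

def hawkes_loglik(events, T, mu, alpha, beta):
    """Exact log-likelihood of a univariate exponential-kernel Hawkes
    process on [0, T]: sum of log-intensities at events minus the
    compensator. Uses the recursion A_i = exp(-beta*dt) * (1 + A_{i-1})."""
    ll, A, prev = 0.0, 0.0, None
    for t in events:
        if prev is not None:
            A = math.exp(-beta * (t - prev)) * (1 + A)
        ll += math.log(mu + alpha * A)  # intensity just before event t
        prev = t
    # compensator: integral of the intensity over [0, T]
    comp = mu * T + (alpha / beta) * sum(
        1 - math.exp(-beta * (T - t)) for t in events)
    return ll - comp

events = [0.5, 1.2, 1.3, 3.0, 3.1, 3.15]  # illustrative event times
print(hawkes_loglik(events, T=5.0, mu=0.4, alpha=0.8, beta=1.5))
```

Setting alpha to zero recovers the homogeneous Poisson log-likelihood, which is a convenient sanity check on the recursion.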
  17. By: Mohamed Hamdouche; Pierre Henry-Labordere; Huyen Pham
    Abstract: We develop policy gradient methods for stochastic control with exit time in a model-free setting. We propose two types of algorithms for learning either directly the optimal policy, or alternately the value function (critic) and the optimal control (actor). The use of randomized policies is crucial for overcoming, notably, the issue related to the exit time in the gradient computation. We demonstrate the effectiveness of our approach by implementing our numerical schemes in the application to the problem of share repurchase pricing. Our results show that the proposed policy gradient methods outperform PDE and other neural network techniques in a model-based setting. Furthermore, our algorithms are flexible enough to incorporate realistic market conditions such as price impact or transaction costs.
    Date: 2023–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2302.07320&r=big
  18. By: Sturm, Timo
    Abstract: To achieve great performance and ensure their long-term survival, organizations must successfully act in and adapt to the reality that surrounds them, which requires organizations to learn effectively. For decades, organizations have relied exclusively on human learning for this purpose. With today’s rise of machine learning (ML) systems as a modern form of artificial intelligence (AI) and their ability to autonomously learn and act, ML systems can now also contribute to this vital process, offering organizations an alternative way to learn. Although organizations are increasingly adopting ML systems within a wide range of processes, we still know surprisingly little about how the learning of humans and ML systems affects each other and how their mutual learning affects organizational performance. Although a significant amount of research has addressed ML, existing research leaves it largely unclear whether and when humans and ML systems act as beneficial complements or as mutual impediments within the context of learning. This is problematic, as the (mis)use of ML systems may corrupt an organization’s central process of learning and thus impair the organizational adaptation that is crucial for organizational survival. To help organizations facilitate useful synergies of humans and ML systems, this dissertation explores humans’ and ML systems’ idiosyncrasies and their bilateral interplay. As research on organizational learning has demonstrated, the key to managing such dynamics is the effective coordination of the ones who learn. The studies that were conducted for this dissertation therefore aim to uncover virtuous and vicious dynamics between humans and ML systems and how these dynamics can be managed to increase organizational performance. To take a holistic perspective, this dissertation explores three central levels of analysis. The first level of analysis deals with performance impacts on the individual level. 
Here, the analysis focuses on two essential issues. First, the availability of ML systems as an alternative to humans requires organizations to rethink their problem delegation strategies. Organizations can benefit the most from the relative strengths of humans and ML systems if they are able to delegate problems to those whose expertise and capabilities best fit the problem. This requires organizations to develop an understanding of the problem characteristics that point to problems that are better (or less) suited to being solved by ML systems than by humans. Using a qualitative interview approach, the first study identifies central criteria and procedural artifacts and synthesizes these into a framework for identifying and evaluating problems in ML contexts. The framework provides a theoretical basis to help inform research about delegation decisions between humans and ML systems by unpacking problem nuances that decisively render problems suitable for ML systems. Building on these insights, a subsequent qualitative analysis explores how the dependency between a human and an ML system with respect to the delegated problem affects performance outcomes. The theoretical model that is proposed explains individual performance gains that result from ML systems’ use as a function of the fit between task, data, and technology characteristics. The model highlights how idiosyncrasies of an ML system can affect a human expert’s task execution performance when the expert bases her/his task execution on the ML system’s contributions. This study provides first empirical evidence on controllable levers for managing involved dependencies to increase individual performance. The second level of analysis focuses on performance impacts on the group level. 
In contrast to traditional (non-ML) information systems, ML systems’ unique learning ability enables them to contribute independently to team endeavors, joining groups as active members that can affect group dynamics through their own contributions. Thus, in a third study, a digital trace analysis is conducted to explore the dynamics of a real-world case in which a group of human traders and a productively trading reinforcement ML system collaborate during trading. The studied case reveals that bilateral learning between multiple humans and an ML system can increase trading performance, which appears to be the result of an emerging virtuous cycle between the humans and the ML system. The findings demonstrate that the interactions between the humans and the ML system can lead to group performance that outperforms the individual trading of either the humans or the ML system. However, in order to achieve this, organizations must effectively coordinate the knowledge transfer and the roles of the involved humans and the ML system. The third level of analysis focuses on performance impacts on the organization level. As ML systems increasingly contribute to organizational processes in all areas of the organization, changes in the organization’s fundamental concepts are likely to occur, and these may affect the organization’s overall performance. In a fourth study, a series of agent-based simulations are therefore used to explore the dynamics of organization-wide interactions between humans and ML systems. The results imply that ML systems can help stimulate the pursuit of innovative directions, liberating humans from exploring unorthodox ideas. The results also show that the alignment of human learning and ML is largely beneficial but can, under certain conditions, become detrimental to organizations. 
The findings emphasize that effective coordination of humans and ML systems that takes environmental conditions into account can determine the positive and negative impacts of ML systems on organization-level performance. The analyses included in this dissertation highlight that it is precisely the unique differences between humans and ML systems that often seem to make them better complements than substitutes for one another. The secret to unleashing the true potential of ML systems may therefore lie in effectively coordinating the differences between humans and ML systems within their bilateral relationship to produce virtuous cycles of mutual improvement. This dissertation is a first step toward developing theory and guidance on coordinating the dynamics between humans and ML systems, with the aim of helping to rethink collaboration theory in the era of AI.
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:dar:wpaper:137083&r=big
  19. By: Sam Dannels
    Abstract: A common problem when forecasting rare events, such as recessions, is limited data availability. Recent advancements in deep learning and generative adversarial networks (GANs) make it possible to produce high-fidelity synthetic data in large quantities. This paper uses a model called DoppelGANger, a GAN tailored to producing synthetic time series data, to generate synthetic Treasury yield time series and associated recession indicators. It is then shown that short-range forecasting performance for Treasury yields is improved for models trained on synthetic data relative to models trained only on real data. Finally, synthetic recession conditions are produced and used to train classification models to predict the probability of a future recession. It is shown that training models on synthetic recessions can improve a model's ability to predict future recessions over a model trained only on real data.
    Date: 2023–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2302.10490&r=big
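The augmentation idea in the abstract above can be sketched in a few lines. The snippet below is a hypothetical illustration, not the paper's method: it stands in for the DoppelGANger generator with a simple AR(1) model fitted to a toy "real" yield series, samples synthetic series from it, and compares one-step-ahead forecasters trained on real data alone versus real plus synthetic data. All names and parameter values are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "real" Treasury-yield series: an AR(1) process around a mean level.
def ar1_series(n, mu=3.0, phi=0.95, sigma=0.1):
    y = np.empty(n)
    y[0] = mu
    for t in range(1, n):
        y[t] = mu + phi * (y[t - 1] - mu) + sigma * rng.standard_normal()
    return y

real = ar1_series(200)

# Stand-in for a GAN generator: fit AR(1) parameters to the real series,
# then sample fresh synthetic series from the fitted model.
def fit_ar1(y):
    mu = y.mean()
    x = y[:-1] - mu
    phi = (x @ (y[1:] - mu)) / (x @ x)
    resid = (y[1:] - mu) - phi * x
    return mu, phi, resid.std()

mu, phi, sigma = fit_ar1(real)
synthetic = [ar1_series(200, mu, phi, sigma) for _ in range(10)]

# One-step-ahead forecaster: least squares on (y_t -> y_{t+1}) pairs.
def make_pairs(series_list):
    X = np.concatenate([s[:-1] for s in series_list])
    Y = np.concatenate([s[1:] for s in series_list])
    return X, Y

def fit_forecaster(X, Y):
    A = np.column_stack([X, np.ones_like(X)])
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return coef  # slope, intercept

# Held-out evaluation series drawn from the same "real" process.
test_series = ar1_series(200)
Xt, Yt = make_pairs([test_series])

def mse(coef):
    pred = coef[0] * Xt + coef[1]
    return float(np.mean((pred - Yt) ** 2))

mse_real = mse(fit_forecaster(*make_pairs([real])))          # real data only
mse_aug = mse(fit_forecaster(*make_pairs([real] + synthetic)))  # real + synthetic
```

In this toy setting the synthetic series mainly add sample size; the paper's point is that a GAN can do the same for far richer joint dynamics (yield curves plus recession indicators) than an AR(1) can capture.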
  20. By: Aldo Glielmo; Marco Favorito; Debmallya Chanda; Domenico Delli Gatti
    Abstract: Calibrating agent-based models (ABMs) in economics and finance typically involves a derivative-free search in a very large parameter space. In this work, we benchmark a number of search methods in the calibration of a well-known macroeconomic ABM on real data, and further assess the performance of "mixed strategies" made by combining different methods. We find that methods based on random-forest surrogates are particularly efficient, and that combining search methods generally increases performance, since the biases of any single method are mitigated. Building on these observations, we propose a reinforcement learning (RL) scheme to automatically select and combine search methods on the fly during a calibration run. The RL agent keeps exploiting a specific method only as long as it performs well, but explores new strategies when that method reaches a performance plateau. The resulting RL search scheme outperforms any other method or method combination tested, and does not rely on any prior information or trial-and-error procedure.
    Date: 2023–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2302.11835&r=big
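The on-the-fly method selection described above can be illustrated with a minimal bandit-style sketch. This is not the authors' RL scheme: the two "search methods" below are trivial stand-ins (uniform random search and a local Gaussian perturbation), the objective is a hypothetical one-dimensional calibration target, and the selector is plain epsilon-greedy with improvement-based rewards, so a method is exploited only while it keeps improving the best-known fit and is abandoned when it plateaus.

```python
import random

# Hypothetical calibration target: find x close to a hidden optimum.
HIDDEN_OPTIMUM = 0.7

def random_search(best):
    # Global exploration: ignore the incumbent, draw uniformly.
    return random.uniform(0, 1)

def local_search(best):
    # Local refinement: small Gaussian step around the incumbent.
    return min(1.0, max(0.0, best + random.gauss(0, 0.05)))

METHODS = [random_search, local_search]

def calibrate(n_steps=500, eps=0.1, seed=0):
    random.seed(seed)
    rewards = [0.0] * len(METHODS)  # running mean reward per method
    counts = [0] * len(METHODS)
    best_x = 0.5
    best_score = -abs(best_x - HIDDEN_OPTIMUM)
    for _ in range(n_steps):
        # Epsilon-greedy: usually exploit the best-performing method,
        # occasionally explore another one.
        if random.random() < eps:
            i = random.randrange(len(METHODS))
        else:
            i = max(range(len(METHODS)), key=lambda j: rewards[j])
        x = METHODS[i](best_x)
        score = -abs(x - HIDDEN_OPTIMUM)
        reward = 1.0 if score > best_score else 0.0  # reward = improvement
        if score > best_score:
            best_x, best_score = x, score
        counts[i] += 1
        rewards[i] += (reward - rewards[i]) / counts[i]
    return best_x, counts

best_x, counts = calibrate()
```

Because the reward is "did this proposal improve the best fit", a method's running mean reward decays once it stops improving, which is what pushes the selector toward other methods at a plateau, echoing the behaviour described in the abstract.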
  21. By: Peter Egger; Susie Xi Rao; Sebastiano Papini
    Abstract: This paper provides the first milestone in measuring the floor space of buildings (that is, building footprint and height) and its evolution over time for China. Doing so requires building on imagery of medium-fine granularity, as long cross-sections and time series across many cities are only available in such format. We use a multi-class object segmenter to gauge the floor space of buildings in a single framework: first, we determine whether a surface area is covered by buildings (the square footage of occupied land); second, we determine the height of buildings from their imagery. We use Sentinel-1 and -2 satellite images as our main data source. The benefits of these data are their large cross-sectional and longitudinal scope plus their unrestricted accessibility. We provide a detailed description of the algorithms used to generate the data and of the results. We analyze the preprocessing steps applied to the reference data (if not ground-truth data) and their consequences for measuring building floor space. We also discuss future steps in building a time series on urban development based on our preliminary experimental results.
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2303.02230&r=big
  22. By: Rob Bauer; Dirk Broeders; Annick van Ool
    Abstract: In this paper, we analyze the disclosures of sustainable investing by Dutch pension funds in their annual reports from 2016 to 2021. We introduce a novel textual analysis approach using state-of-the-art natural language processing (NLP) techniques to measure the awareness and implementation of sustainable investing, where we define awareness as the amount of attention paid to sustainable investing in the annual report. We exploit a proprietary dataset to analyze the relation between pension fund characteristics and sustainable investing. We find that a pension fund’s size increases both the awareness and the implementation of sustainable investing. Moreover, we analyze the role of signing the International Responsible Business Conduct (IRBC) initiative. Large pension funds, pension funds with more female trustees, or pension funds with a positive belief about the risk-return relation of sustainable investing are more likely to sign the IRBC initiative. Although signing this initiative increases the specificity of pension fund statements about sustainable investing, we do not find an effect on the implementation of sustainable investing.
    Keywords: ESG; sustainability; SI initiatives; pension funds; textual analysis; natural language processing.
    JEL: C8 G11 J32 M14 Q54
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:dnb:dnbwpp:770&r=big
  23. By: Paolo Bova; Alessandro Di Stefano; The Anh Han
    Abstract: In the context of rapid discoveries by leaders in AI, governments must consider how to design regulation that keeps pace with new AI capabilities. Regulatory Markets for AI is a proposal designed with adaptability in mind. It involves governments setting outcome-based targets for AI companies, which the companies can demonstrate they meet by purchasing services from a market of private regulators. We use an evolutionary game theory model to explore the role governments can play in building a Regulatory Market for AI systems that deters reckless behaviour. We warn that it is alarmingly easy to stumble on incentives that would prevent Regulatory Markets from achieving this goal. One such pitfall is 'Bounty Incentives', which reward private regulators only for catching unsafe behaviour. We argue that AI companies will likely learn to tailor their behaviour to how much effort regulators invest, discouraging regulators from innovating. Instead, we recommend that governments always reward regulators, except when they find that those regulators failed to detect unsafe behaviour they should have caught. These 'Vigilant Incentives' could encourage private regulators to find innovative ways of evaluating cutting-edge AI systems.
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2303.03174&r=big
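The kind of evolutionary dynamics the authors study can be sketched with textbook two-population replicator dynamics. The payoff numbers below are hypothetical placeholders, not the paper's parameterization: companies choose Safe or Unsafe, regulators choose Vigilant or Lax, and the two incentive schemes differ only in how regulators are paid.

```python
def replicator(payoff_co, payoff_reg, x=0.5, y=0.5, steps=2000, dt=0.01):
    """Euler-discretized two-population replicator dynamics.
    x: share of companies playing Safe; y: share of regulators playing Vigilant.
    payoff_co[c][r]: company payoff, c in {0: Safe, 1: Unsafe},
                     r in {0: Vigilant, 1: Lax}.
    payoff_reg[r][c]: regulator payoff, same index convention."""
    for _ in range(steps):
        f_safe = y * payoff_co[0][0] + (1 - y) * payoff_co[0][1]
        f_unsafe = y * payoff_co[1][0] + (1 - y) * payoff_co[1][1]
        g_vig = x * payoff_reg[0][0] + (1 - x) * payoff_reg[0][1]
        g_lax = x * payoff_reg[1][0] + (1 - x) * payoff_reg[1][1]
        # Each strategy's share grows in proportion to its payoff advantage.
        x += dt * x * (1 - x) * (f_safe - f_unsafe)
        y += dt * y * (1 - y) * (g_vig - g_lax)
    return x, y

# Companies: Safe always pays 1; Unsafe pays 2 against a Lax regulator
# but -1 (caught and fined) against a Vigilant one.
company = [[1.0, 1.0], [-1.0, 2.0]]

# Bounty incentives: regulators are paid only for catching Unsafe behaviour;
# vigilance itself costs 0.2, Lax regulators earn nothing.
bounty = [[-0.2, 1.0], [0.0, 0.0]]

# Vigilant incentives: regulators are paid 1 by default, minus the 0.2
# vigilance cost, and penalized (-1) for missing Unsafe behaviour while Lax.
vigilant = [[0.8, 0.8], [1.0, -1.0]]

xb, yb = replicator(company, bounty)
xv, yv = replicator(company, vigilant)
```

Under the bounty scheme, vigilance stops paying as soon as most companies are safe, which is the self-undermining feedback the abstract warns about; under the vigilant scheme, being Lax is what becomes risky. The placeholder payoffs here merely make that structure visible, they do not reproduce the paper's results.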

This nep-big issue is ©2023 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.