|
on Big Data |
By: | Kendra Walker; Ben Moscona; Kelsey Jack; Seema Jayachandran; Namrata Kala; Rohini Pande; Jiani Xue; Marshall Burke |
Abstract: | Crop residue burning is a major source of air pollution in many parts of the world, notably South Asia. Policymakers, practitioners and researchers have invested in both measuring impacts and developing interventions to reduce burning. However, measuring the impacts of burning or the effectiveness of interventions to reduce burning requires data on where burning occurred. These data are challenging to collect in the field, both in terms of cost and feasibility. We take advantage of data from ground-based monitoring of crop residue burning in Punjab, India to explore whether burning can be detected more effectively using accessible satellite imagery. Specifically, we used 3m PlanetScope data with high temporal resolution (up to daily) as well as publicly available Sentinel-2 data with weekly temporal resolution but greater depth of spectral information. Following an analysis of the ability of different spectral bands and burn indices to separate burned and unburned plots individually, we built a Random Forest model with those determined to provide the greatest separability and evaluated model performance with ground-verified data. Our overall model accuracy of 82 percent is favorable given the challenges presented by the measurement. Based on insights from this process, we discuss technical challenges of detecting crop residue burning from satellite imagery as well as challenges to measuring impacts, both of burning and of policy interventions. |
Date: | 2022–09 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2209.10148&r= |
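The abstract above mentions burn indices computed from spectral bands without naming them. As a minimal sketch of what such an index looks like, the following computes the widely used Normalized Burn Ratio (NBR) and its pre/post-event difference. Treating NBR as one of the indices evaluated, the function names, and the reflectance values are all assumptions made for illustration, not details taken from the paper.

```python
def nbr(nir, swir):
    """Normalized Burn Ratio: (NIR - SWIR) / (NIR + SWIR)."""
    total = nir + swir
    if total == 0:
        raise ValueError("NIR + SWIR must be non-zero")
    return (nir - swir) / total


def dnbr(pre, post):
    """Pre-event NBR minus post-event NBR; larger values suggest burning."""
    return nbr(*pre) - nbr(*post)


# Hypothetical (NIR, SWIR) surface reflectances for a single plot.
pre_burn = (0.40, 0.20)
post_burn = (0.20, 0.30)
change = dnbr(pre_burn, post_burn)
print(f"dNBR = {change:.3f}")
```

A per-plot value like this could serve as one input feature to a classifier such as a Random Forest, alongside raw band values.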
By: | Dangxing Chen; Weicheng Ye |
Abstract: | For many years, machine learning methods have been used in a wide range of fields, including computer vision and natural language processing. While machine learning methods have significantly improved model performance over traditional methods, their black-box structure makes it difficult for researchers to interpret results. For highly regulated financial industries, transparency, explainability, and fairness are equally, if not more, important than accuracy. Without meeting regulated requirements, even highly accurate machine learning methods are unlikely to be accepted. We address this issue by introducing a novel class of transparent and interpretable machine learning algorithms known as generalized gloves of neural additive models. The generalized gloves of neural additive models separate features into three categories: linear features, individual nonlinear features, and interacted nonlinear features. Additionally, interactions in the last category are only local. The linear and nonlinear components are distinguished by a stepwise selection algorithm, and interacted groups are carefully verified by applying additive separation criteria. Empirical results demonstrate that generalized gloves of neural additive models provide optimal accuracy with the simplest architecture, allowing for a highly accurate, transparent, and explainable approach to machine learning. |
Date: | 2022–09 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2209.10082&r= |
By: | Dangxing Chen; Weicheng Ye |
Abstract: | The forecasting of credit default risk has been an active research field for several decades. Historically, logistic regression has been used as a major tool due to its compliance with regulatory requirements: transparency, explainability, and fairness. In recent years, researchers have increasingly used complex and advanced machine learning methods to improve prediction accuracy. Even though a machine learning method could potentially improve the model accuracy, it complicates simple logistic regression, deteriorates explainability, and often violates fairness. In the absence of compliance with regulatory requirements, even highly accurate machine learning methods are unlikely to be accepted by companies for credit scoring. In this paper, we introduce a novel class of monotonic neural additive models, which meet regulatory requirements by simplifying neural network architecture and enforcing monotonicity. By utilizing the special architectural features of the neural additive model, the monotonic neural additive model penalizes monotonicity violations effectively. Consequently, the computational cost of training a monotonic neural additive model is similar to that of training a neural additive model, as a free lunch. We demonstrate through empirical results that our new model is as accurate as black-box fully-connected neural networks, providing a highly accurate and regulated machine learning method. |
Date: | 2022–09 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2209.10070&r= |
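The abstract above says that the additive architecture makes monotonicity violations cheap to penalize. One way to see why: in an additive model each feature has its own one-dimensional shape function, so monotonicity can be checked along a one-dimensional grid per feature. The sketch below is an assumption about what such a penalty might look like (squared negative finite differences on a grid); the paper's actual penalty is not given in the abstract, and `monotonicity_penalty` is a hypothetical name.

```python
import math


def monotonicity_penalty(shape_fn, grid, increasing=True):
    """Sum of squared finite-difference violations of monotonicity on a grid.

    shape_fn is a one-dimensional shape function of a single feature.
    The penalty is zero when shape_fn is monotone on the grid points.
    """
    sign = 1.0 if increasing else -1.0
    penalty = 0.0
    for left, right in zip(grid, grid[1:]):
        step = sign * (shape_fn(right) - shape_fn(left))
        if step < 0:  # the function moved in the forbidden direction
            penalty += step * step
    return penalty


grid = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0

# tanh is increasing, so it incurs no penalty.
monotone_penalty = monotonicity_penalty(math.tanh, grid)

# sin(6x) rises and falls on [0, 1], so it is penalized.
bumpy_penalty = monotonicity_penalty(lambda x: math.sin(6 * x), grid)

print(monotone_penalty, bumpy_penalty)
```

In training, a term of this kind would be added to the prediction loss with a weight, once per feature that is required to be monotone.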
By: | Milien Dhorne; Claire Nicolas; Christopher Arderne; Juliette Besnard |
Keywords: | Energy - Electric Power Energy - Energy Conservation & Efficiency |
Date: | 2021–04 |
URL: | http://d.repec.org/n?u=RePEc:wbk:wboper:35473&r= |
By: | Dangxing Chen; Weicheng Ye; Jiahui Ye |
Abstract: | The forecasting of the credit default risk has been an important research field for several decades. Traditionally, logistic regression has been widely recognized as a solution due to its accuracy and interpretability. As a recent trend, researchers tend to use more complex and advanced machine learning methods to improve the accuracy of the prediction. Although certain non-linear machine learning methods have better predictive power, they are often considered to lack interpretability by financial regulators. Thus, they have not been widely applied in credit risk assessment. We introduce a neural network with the selective option to increase interpretability by distinguishing whether the datasets can be explained by the linear models or not. We find that, for most of the datasets, logistic regression will be sufficient, with reasonable accuracy; meanwhile, for some specific data portions, a shallow neural network model leads to much better accuracy without significantly sacrificing the interpretability. |
Date: | 2022–09 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2209.10127&r= |
By: | Yicong Liu; Kaili Wang; Patrick Loa; Khandker Nurul Habib |
Abstract: | The COVID-19 pandemic dramatically catalyzed the proliferation of e-shopping. The dramatic growth of e-shopping will undoubtedly cause significant impacts on travel demand. As a result, transportation modellers' ability to model e-shopping demand is becoming increasingly important. This study developed models to predict households' weekly home delivery frequencies. We used both classical econometric and machine learning techniques to obtain the best model. It is found that socioeconomic factors such as having an online grocery membership, household members' average age, the percentage of male household members, the number of workers in the household and various land use factors influence home delivery demand. This study also compared the interpretations and performances of the machine learning models and the classical econometric model. Agreement is found in the variables' effects identified through the machine learning and econometric models. However, with similar recall accuracy, the ordered probit model, a classical econometric model, can accurately predict the aggregate distribution of household delivery demand. In contrast, both machine learning models failed to match the observed distribution. |
Date: | 2022–09 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2209.10664&r= |
By: | Thanos Papadopoulos (University of Kent [Canterbury]); Uthayasankar Sivarajah (School of Management [Bradford] - University of Bradford); Konstantina Spanaki (Audencia Business School); Stella Despoudi (Aston Business School - Aston University [Birmingham]); Angappa Gunasekaran (CSUB - California State University [Bakersfield]) |
Date: | 2022 |
URL: | http://d.repec.org/n?u=RePEc:hal:journl:hal-03766170&r= |
By: | Soohan Kim; Seok-Bae Yun; Hyeong-Ohk Bae; Muhyun Lee; Youngjoon Hong |
Abstract: | Predicting volatility is important for asset predicting, option pricing and hedging strategies because it cannot be directly observed in the financial market. The Black-Scholes option pricing model is one of the most widely used models by market participants. Notwithstanding, the Black-Scholes model is based on heavily criticized theoretical premises, one of which is the constant volatility assumption. The dynamics of the volatility surface are difficult to estimate. In this paper, we establish a novel architecture based on physics-informed neural networks and convolutional transformers. The performance of the new architecture is directly compared to other well-known deep-learning architectures, such as standard physics-informed neural networks, convolutional long short-term memory (ConvLSTM), and self-attention ConvLSTM. Numerical evidence indicates that the proposed physics-informed convolutional transformer network achieves superior performance to the other methods. |
Date: | 2022–09 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2209.10771&r= |
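To make the constant volatility assumption mentioned in the abstract above concrete, here is a minimal sketch of the standard Black-Scholes price of a European call and of the implied volatility recovered from a price by bisection. Under the model a single volatility would fit every strike and maturity; the volatility surface is what one obtains when implied volatility is computed per option from market prices instead. The parameter values are hypothetical, and this is not the architecture proposed in the paper.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot, strike, rate, vol, maturity):
    """Black-Scholes price of a European call with constant volatility."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def implied_vol(price, spot, strike, rate, maturity, lo=1e-4, hi=5.0, tol=1e-8):
    """Invert bs_call in vol by bisection (the call price increases with vol)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(spot, strike, rate, mid, maturity) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical at-the-money option: price it at 20% vol, then recover the vol.
price = bs_call(100.0, 100.0, 0.01, 0.2, 1.0)
recovered = implied_vol(price, 100.0, 100.0, 0.01, 1.0)
print(f"price = {price:.4f}, implied vol = {recovered:.6f}")
```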
By: | L. Ingber |
Date: | 2022 |
URL: | http://d.repec.org/n?u=RePEc:lei:ingber:22rn&r= |
By: | Christophe Geissler; Nicolas Morizet; Matteo Rizzato; Julien Wallart |
Abstract: | The finance industry is producing an increasing number of datasets that investment professionals can consider to be influential on the price of financial assets. These datasets were initially mainly limited to exchange data, namely price, capitalization and volume. Their coverage has now considerably expanded to include, for example, macroeconomic data, supply and demand of commodities, balance sheet data and more recently extra-financial data such as ESG scores. This broadening of the factors retained as influential constitutes a serious challenge for statistical modeling. Indeed, the instability of the correlations between these factors makes it practically impossible to identify the joint laws needed to construct scenarios. Fortunately, spectacular advances in the Deep Learning field in recent years have given rise to GANs. GANs are a type of generative machine learning model that produces new data samples with the same characteristics as a training data distribution in an unsupervised way, avoiding data assumptions and human-induced biases. In this work, we explore the use of GANs for synthetic financial scenario generation. This pilot study is the result of a collaboration between Fujitsu and Advestis, and it will be followed by a thorough exploration of the use cases that can benefit from the proposed solution. We propose a GAN-based algorithm that allows the replication of multivariate data representing several properties (including, but not limited to, price, market capitalization, ESG score, controversy score, ...) of a set of stocks. This approach differs from examples in the financial literature, which are mainly focused on the reproduction of temporal asset price scenarios. We also propose several metrics to evaluate the quality of the data generated by the GANs. This approach is well suited to the generation of scenarios, the time direction simply arising as a subsequent (possibly conditioned) generation of data points drawn from the learned distribution. Our method will allow the simulation of high-dimensional scenarios (compared to the $\lesssim 10$ features currently employed in most recent use cases) where network complexity is reduced thanks to carefully performed feature engineering and selection. Complete results will be presented in a forthcoming study. |
Date: | 2022–07 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2209.03935&r= |
By: | Oleksandr Talavera (University of Birmingham); Shuxing Yin (University of Sheffield); Mao Zhang (University of St Andrews) |
Abstract: | Using machine learning techniques, we extract vocal emotions from audio files of earnings conference calls and examine how managers communicate with analysts in question-and-answer (Q&A) sessions. Focusing on these conversations, we find that the vocal emotion of managers (answering questions) is affected by how each question is asked and who asks the question. Managers who converse with an analyst expressing positive emotion or with a female analyst exhibit a more positive vocal response. Our data also provide evidence of distinctive manager-specific vocal communication styles. Female managers and younger managers are more likely to display negative vocal emotions than their male and older colleagues. Stock prices respond to managers' vocal emotions in a timely manner. Analysts also incorporate such vocal emotions into near-term earnings forecasts. |
Keywords: | conference calls; vocal emotion; manager-analyst conversation; gender; market reaction |
JEL: | G10 G14 G41 G30 J16 M14 |
Date: | 2022–10 |
URL: | http://d.repec.org/n?u=RePEc:bir:birmec:22-11&r= |
By: | Peter Christensen; Paul Francisco; Erica Myers; Hansen Shao; Mateus Souza |
Abstract: | Building energy efficiency has been a cornerstone of greenhouse gas mitigation strategies for decades. However, impact evaluations have revealed that energy savings typically fall short of engineering model forecasts that currently guide funding decisions. This creates a resource allocation problem that impedes progress on climate change. Using data from the largest U.S. energy efficiency program, we demonstrate that a data-driven approach to predicting retrofit impacts based on previously realized outcomes is more accurate than the status quo engineering models. Targeting high-return interventions based on these predictions dramatically increases net social benefits, from $0.93 to $1.23 per dollar invested. |
JEL: | H50 Q4 |
Date: | 2022–09 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:30467&r= |
By: | Xiaohua Bao; Hailiang Huang; Larry D Qiu; Xiaozhuo Wang |
Abstract: | The notion that the exchange rate affects exports is well understood. However, whether exporters respond to the expectations of the exchange rate is unknown. Hence, in this study, we construct a measure of exchange rate expectations based on news articles from the Factiva database. We use machine learning to identify and classify news articles about the appreciation of the renminbi (RMB, Chinese currency). Our empirical estimation shows that from 2000 to 2006, Chinese firms reduced their exports in response to a higher expectation of RMB appreciation. They switched their sales from export to domestic markets. The responses are larger in low-productivity firms, state-owned enterprises, processing trade, and final goods trade. |
Keywords: | Exchange rate expectation; Exports; RMB appreciation |
Date: | 2022 |
URL: | http://d.repec.org/n?u=RePEc:not:notgep:2022-07&r= |
By: | Alena Pavlova (Institute of Economic Studies, Faculty of Social Sciences, Charles University, Prague, Czech Republic) |
Abstract: | This article explores the relationship between labor costs and price inflation under two conditions: first, under a linearity assumption with classical techniques; second, without assuming linearity, using a novel non-parametric machine learning method, namely gradient boosting. With quarterly data from 1996 to 2022 for V4 countries, we find both linear and non-linear dependency between labor cost and price inflation. However, the magnitude of the connection is country-specific and changes over time. Our findings indicate that a significant linear relationship between the considered variables does not lead to higher predictive power of labor cost in a non-parametric model that predicts inflation. On the contrary, the Czech Republic, the country with the highest correlation between unit labor cost (ULC) and the deflator, shows better prediction when the ULC is not in the set of independent variables. This fact highlights the importance of non-linearity for the inflation model. |
Keywords: | inflation, labor cost, non-linear model, V4 countries |
JEL: | E24 E31 E37 |
Date: | 2022–10 |
URL: | http://d.repec.org/n?u=RePEc:fau:wpaper:wp2022_25&r= |
By: | Stéphane Crépey (LPSM, UPCité); Lehdili Noureddine (LPSM, UPCité); Nisrine Madhar (LPSM, UPCité); Maud Thomas (LPSM, SU) |
Abstract: | We consider time series representing a wide variety of risk factors in the context of financial risk management. A major issue of these data is the presence of anomalies that induce a miscalibration of the models used to quantify and manage risk, whence potentially erroneous risk measures on their basis. Therefore, the detection of anomalies is of utmost importance in financial risk management. We propose an approach that aims at improving anomaly detection on financial time series, overcoming most of the inherent difficulties. One first concern is to extract from the time series valuable features that ease the anomaly detection task. This step is ensured through a compression and reconstruction of the data with the application of principal component analysis. We define an anomaly score using a feed-forward neural network. A time series is deemed contaminated when its anomaly score exceeds a given cutoff. This cutoff value is not a hand-set parameter, instead it is calibrated as a parameter of the neural network throughout the minimisation of a customized loss function. The efficiency of the proposed model with respect to several well-known anomaly detection algorithms is numerically demonstrated. We show on a practical case of value-at-risk estimation, that the estimation errors are reduced when the proposed anomaly detection model is used, together with a naive imputation approach to correct the anomaly. |
Date: | 2022–09 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2209.11686&r= |
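The compression and reconstruction step described in the abstract above can be sketched as follows: project each observation onto the leading principal direction, reconstruct it, and look at what is lost. This sketch makes several simplifying assumptions that are not from the paper: it keeps a single component, finds it by power iteration, and uses the raw reconstruction error as the anomaly score, whereas the authors feed PCA-derived features to a feed-forward network with a learned cutoff. The data and function names are hypothetical.

```python
import math


def center(rows):
    """Subtract the column means from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def first_component(centered, iterations=200):
    """Leading principal direction via power iteration on X^T X."""
    d = len(centered[0])
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        # w = X^T (X v), computed without forming the covariance matrix.
        scores = [sum(x * y for x, y in zip(row, v)) for row in centered]
        w = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def reconstruction_errors(rows):
    """Distance between each centered row and its rank-one reconstruction."""
    centered = center(rows)
    v = first_component(centered)
    errors = []
    for row in centered:
        score = sum(x * y for x, y in zip(row, v))
        residual = [x - score * y for x, y in zip(row, v)]
        errors.append(math.sqrt(sum(r * r for r in residual)))
    return errors


# Hypothetical data: nine points on a line, with a spike injected into row 4.
rows = [[float(t), 2.0 * t, 3.0 * t] for t in range(1, 10)]
rows[4][2] = 25.0
errors = reconstruction_errors(rows)
suspect = errors.index(max(errors))
print(f"most anomalous row: {suspect}")
```

The row with the injected spike is the one the low-rank reconstruction explains worst, which is the intuition behind using reconstruction-based features for anomaly detection.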
By: | Gutiérrez, Antonio |
Abstract: | Urban mobility patterns are changing as a response to new behaviours in cities. With more journeys, increased demand for motorised vehicles and longer distances to travel, studying urban mobility is necessary to guide society towards a more sustainable horizon. Big Data and the digital footprint of people and vehicles have created a new source of appropriate information for urban mobility studies. Therefore, this article presents the different tools that offer high-frequency and spatial-temporal resolution data along with a review of the literature that uses these datasets in urban mobility research. |
Keywords: | urban mobility; social network; big Data |
JEL: | C80 R40 |
Date: | 2022–10–03 |
URL: | http://d.repec.org/n?u=RePEc:pra:mprapa:114854&r= |
By: | Stéphane Crépey (LPSM (UMR_8001) - Laboratoire de Probabilités, Statistique et Modélisation - SU - Sorbonne Université - CNRS - Centre National de la Recherche Scientifique - UPCité - Université Paris Cité, UPCité - Université Paris Cité); Lehdili Noureddine (Natixis); Nisrine Madhar (LPSM (UMR_8001) - Laboratoire de Probabilités, Statistique et Modélisation - SU - Sorbonne Université - CNRS - Centre National de la Recherche Scientifique - UPCité - Université Paris Cité, UPCité - Université Paris Cité, Natixis); Maud Thomas (LPSM (UMR_8001) - Laboratoire de Probabilités, Statistique et Modélisation - SU - Sorbonne Université - CNRS - Centre National de la Recherche Scientifique - UPCité - Université Paris Cité, SU - Sorbonne Université) |
Abstract: | We consider time series representing a wide variety of risk factors in the context of financial risk management. A major issue of these data is the presence of anomalies that induce a miscalibration of the models used to quantify and manage risk, whence potentially erroneous risk measures on their basis. Therefore, the detection of anomalies is of utmost importance in financial risk management. We propose an approach that aims at improving anomaly detection on financial time series, overcoming most of the inherent difficulties. One first concern is to extract from the time series valuable features that ease the anomaly detection task. This step is ensured through a compression and reconstruction of the data with the application of principal component analysis. We define an anomaly score using a feed-forward neural network. A time series is deemed contaminated when its anomaly score exceeds a given cutoff. This cutoff value is not a hand-set parameter, instead it is calibrated as a parameter of the neural network throughout the minimisation of a customized loss function. The efficiency of the proposed model with respect to several well-known anomaly detection algorithms is numerically demonstrated. We show on a practical case of value-at-risk estimation, that the estimation errors are reduced when the proposed anomaly detection model is used, together with a naive imputation approach to correct the anomaly. |
Keywords: | anomaly detection,financial time series,principal component analysis,neural network,density estimation,missing data,market risk,value at risk |
Date: | 2022–09–15 |
URL: | http://d.repec.org/n?u=RePEc:hal:wpaper:hal-03777995&r= |
By: | World Bank |
Keywords: | Information and Communication Technologies - Digital Divide Information and Communication Technologies - ICT Applications Information and Communication Technologies - ICT Economics Information and Communication Technologies - ICT Policy and Strategies Information and Communication Technologies - Information Technology |
Date: | 2021–05 |
URL: | http://d.repec.org/n?u=RePEc:wbk:wboper:35619&r= |
By: | Adebayo Oshingbesan; Eniola Ajiboye; Peruth Kamashazi; Timothy Mbaka |
Abstract: | Asset allocation (or portfolio management) is the task of determining how to optimally allocate funds of a finite budget into a range of financial instruments/assets such as stocks. This study investigated the performance of reinforcement learning (RL) when applied to portfolio management using model-free deep RL agents. We trained several RL agents on real-world stock prices to learn how to perform asset allocation. We compared the performance of these RL agents against some baseline agents. We also compared the RL agents among themselves to understand which classes of agents performed better. From our analysis, RL agents can perform the task of portfolio management since they significantly outperformed two of the baseline agents (random allocation and uniform allocation). Four RL agents (A2C, SAC, PPO, and TRPO) outperformed the best baseline, MPT, overall. This shows the abilities of RL agents to uncover more profitable trading strategies. Furthermore, there were no significant performance differences between value-based and policy-based RL agents. Actor-critic agents performed better than other types of agents. Also, on-policy agents performed better than off-policy agents because they are better at policy evaluation and sample efficiency is not a significant problem in portfolio management. This study shows that RL agents can substantially improve asset allocation since they outperform strong baselines. On-policy, actor-critic RL agents showed the most promise based on our analysis. |
Date: | 2022–09 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2209.10458&r= |
By: | Elin Halvorsen; Hans Holter; Serdar Ozkan; Kjetil Storesletten |
Abstract: | This paper examines whether nonlinear and non-Gaussian features of earnings dynamics are caused by hours or hourly wages. Our findings from the Norwegian administrative and survey data are as follows: (i) Nonlinear mean reversion in earnings is driven by the dynamics of hours worked rather than wages since wage dynamics are close to linear, while hours dynamics are nonlinear—negative changes to hours are transitory, while positive changes are persistent. (ii) Large earnings changes are driven equally by hours and wages, whereas small changes are associated mainly with wage shocks. (iii) Both wages and hours contribute to negative skewness and high kurtosis for earnings changes, although hour-wage interactions are quantitatively more important. (iv) When considering household earnings and disposable household income, the deviations from normality are mitigated relative to individual labor earnings: changes in disposable household income are approximately symmetric and less leptokurtic. |
Keywords: | earnings dynamics; income shocks; insurance; wages; hours; higher-order earnings risk; skewness; kurtosis; machine learning |
JEL: | E24 H24 J24 J31 |
Date: | 2022–09–16 |
URL: | http://d.repec.org/n?u=RePEc:fip:fedlwp:94799&r= |
By: | Fariha Kamal; Jessica McCloskey; Wei Ouyang |
Abstract: | This paper describes the construction of two confidential crosswalk files enabling a comprehensive identification of multinational firms in the U.S. economy. The effort combines firm-level surveys on direct investment conducted by the U.S. Bureau of Economic Analysis (BEA) and the U.S. Census Bureau's Business Register (BR) spanning the universe of employer businesses from 1997 to 2017. First, the parent crosswalk links BEA firm-level surveys on U.S. direct investment abroad and the BR. Second, the affiliate crosswalk links BEA firm-level surveys on foreign direct investment in the United States and the BR. Using these newly available links, we distinguish between U.S.- and foreign-owned multinational firms and describe their prevalence and economic activities in the national economy, by sector, and by geography. |
Keywords: | multinational firms, records matching, machine learning |
JEL: | F10 F14 F23 |
Date: | 2022–09 |
URL: | http://d.repec.org/n?u=RePEc:cen:wpaper:22-39&r= |
By: | D Barrera (UNIANDES - Universidad de los Andes [Bogota]); S Crépey (LPSM (UMR_8001) - Laboratoire de Probabilités, Statistique et Modélisation - SU - Sorbonne Université - CNRS - Centre National de la Recherche Scientifique - UPCité - Université Paris Cité, UPCité - Université Paris Cité); E Gobet (CMAP - Centre de Mathématiques Appliquées - Ecole Polytechnique - X - École polytechnique - CNRS - Centre National de la Recherche Scientifique, X - École polytechnique, Université Paris-Saclay); Hoang-Dung Nguyen (LPSM (UMR_8001) - Laboratoire de Probabilités, Statistique et Modélisation - SU - Sorbonne Université - CNRS - Centre National de la Recherche Scientifique - UPCité - Université Paris Cité, UPCité - Université Paris Cité, Natixis); B Saadeddine (UPS - Université d'Évry Paris-Saclay, Crédit Agricole) |
Abstract: | We propose a non-asymptotic convergence analysis of a two-step approach to learn a conditional value-at-risk (VaR) and expected shortfall (ES) in a nonparametric setting using Rademacher and Vapnik-Chervonenkis bounds. Our approach for the VaR is extended to the problem of learning at once multiple VaRs corresponding to different quantile levels. This results in efficient learning schemes based on neural network quantile and least-squares regressions. An a posteriori Monte Carlo (non-nested) procedure is introduced to estimate distances to the ground-truth VaR and ES without access to the latter. This is illustrated using numerical experiments in a Gaussian toy-model and a financial case-study where the objective is to learn a dynamic initial margin. |
Keywords: | value-at-risk,expected shortfall,quantile regression,quantile crossings,neural networks,62L20,62M45,91G60,91G70,62G32 |
Date: | 2022–09–13 |
URL: | http://d.repec.org/n?u=RePEc:hal:wpaper:hal-03775901&r= |
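The two-step idea in the abstract above (a quantile regression for the VaR, then a least-squares regression for the ES) can be illustrated in the simplest possible setting. The sketch below is unconditional and uses no neural network: the VaR is the constant minimising the pinball loss, and the ES is the constant least-squares fit (the mean) of the target VaR + (X - VaR)^+ / (1 - alpha). The choice of that target, the function names, and the sample are assumptions made for illustration; in the paper's conditional setting the constants would be replaced by learned functions of the conditioning variables.

```python
def pinball_loss(q, losses, alpha):
    """Average quantile (pinball) loss of candidate quantile q at level alpha."""
    total = 0.0
    for x in losses:
        diff = x - q
        total += alpha * diff if diff >= 0 else (alpha - 1.0) * diff
    return total / len(losses)


def fit_var(losses, alpha):
    """Step 1: VaR as the sample point that minimises the pinball loss."""
    return min(sorted(losses), key=lambda q: pinball_loss(q, losses, alpha))


def fit_es(losses, alpha, var):
    """Step 2: ES as the least-squares constant fit to var + (x - var)^+ / (1 - alpha)."""
    targets = [var + max(x - var, 0.0) / (1.0 - alpha) for x in losses]
    return sum(targets) / len(targets)  # the mean minimises squared error


# Hypothetical sample of 25 losses and a 90% confidence level.
losses = [float(x) for x in range(1, 26)]
alpha = 0.9
var = fit_var(losses, alpha)
es = fit_es(losses, alpha, var)
print(f"VaR = {var}, ES = {es:.2f}")
```

The ES estimate depends on the VaR from the first step, which is why errors in the quantile fit propagate into the second regression.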
By: | Pierre Dubois (TSE - Toulouse School of Economics - UT1 - Université Toulouse 1 Capitole - Université Fédérale Toulouse Midi-Pyrénées - EHESS - École des hautes études en sciences sociales - CNRS - Centre National de la Recherche Scientifique - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement, UT1 - Université Toulouse 1 Capitole - Université Fédérale Toulouse Midi-Pyrénées); Rachel Griffith (Unknown); Martin O'Connell (Unknown) |
Abstract: | The adoption of barcode scanning technology in the 1970s gave rise to a new form of data: scanner data. Soon afterwards researchers began using this new resource, and since then a large number of papers have exploited scanner data. The data provide detailed price, quantity and product characteristic information for completely disaggregated products at high frequency and typically track a panel of stores and/or consumers. Their availability has led to advances, inter alia, in the study of consumer demand, the measurement of market power, firms' strategic interactions and decision-making, the evaluation of policy reforms, and the measurement of price dispersion and inflation. In this article we highlight some of the pros and cons of this data source, and discuss some of the ways its availability to researchers has transformed the economics literature. |
Keywords: | scanner data,demand estimation,market power,policy counterfactual,inflation |
Date: | 2022–08 |
URL: | http://d.repec.org/n?u=RePEc:hal:journl:hal-03770614&r= |