nep-big New Economics Papers
on Big Data
Issue of 2024‒05‒27
sixteen papers chosen by
Tom Coupé, University of Canterbury


  1. Unveiling the Impact of Macroeconomic Policies: A Double Machine Learning Approach to Analyzing Interest Rate Effects on Financial Markets By Anoop Kumar; Suresh Dodda; Navin Kamuni; Rajeev Kumar Arora
  2. Macroeconomic Forecasting Using Machine Learning: A Case of Slovakia By Ádám Csápai
  3. Deep Joint Learning valuation of Bermudan Swaptions By Francisco Gómez Casanova; Álvaro Leitao; Fernando de Lope Contreras; Carlos Vázquez
  4. Machine Learning Based Linkage of Company Data for Economic Research: Application to the EBDC Business Panels By Valentin Reich
  5. An economically-consistent discrete choice model with flexible utility specification based on artificial neural networks By Jose Ignacio Hernandez; Niek Mouter; Sander van Cranenburgh
  6. Synthetic controls with machine learning: application on the effect of labour deregulation on worker productivity in Brazil By Douglas Kiarelly Godoy de Araujo
  7. The crisis effect in TPB as a moderator for post-pandemic entrepreneurial intentions among higher education students: PLS-SEM and ANN Approach By Chahal, Jyoti; Dagar, Vishal; Dagher, Leila; Rao, Amar; Ntom Udemba, Edmund
  8. Enhancing path-integral approximation for non-linear diffusion with neural network By Anna Knezevic
  9. An End-to-End Structure with Novel Position Mechanism and Improved EMD for Stock Forecasting By Chufeng Li; Jianyong Chen
  10. On the testability of common trends in panel data without placebo periods By Martin Huber
  11. Patterns in Reported Adaptation Constraints: Insights from Peer-Reviewed Literature on Flood and Sea-Level Rise By Gil-Clavel, Sofia; Wagenblast, Thorid; Akkerman, Joos; Filatova, Tatiana
  12. Using Post-Regularization Distribution Regression to Measure the Effects of a Minimum Wage on Hourly Wages, Hours Worked and Monthly Earnings By Biewen, Martin; Erhardt, Pascal
  13. Automated Social Science: Language Models as Scientist and Subjects By Benjamin S. Manning; Kehang Zhu; John J. Horton
  14. Internet sentiment exacerbates intraday overtrading, evidence from A-Share market By Peng Yifeng
  15. War Causes Religiosity: Gravestone Evidence from the Vietnam Draft Lottery By Mill, Wladislaw; Ebert, Tobias; Berkessel, Jana; Jonsson, Thorsteinn; Lehmann, Sune; Gebauer, Jochen
  16. Long-term forecasts of statewide travel demand patterns using large-scale mobile phone GPS data: A case study of Indiana By Rajat Verma; Eunhan Ka; Satish V. Ukkusuri

  1. By: Anoop Kumar; Suresh Dodda; Navin Kamuni; Rajeev Kumar Arora
    Abstract: This study examines the effects of macroeconomic policies on financial markets using a novel approach that combines Machine Learning (ML) techniques and causal inference. It focuses on the effect of interest rate changes made by the US Federal Reserve System (FRS) on the returns of fixed income and equity funds between January 1986 and December 2021. The analysis makes a distinction between actively and passively managed funds, hypothesizing that the latter are less susceptible to changes in interest rates. The study contrasts gradient boosting and linear regression models using the Double Machine Learning (DML) framework, which supports a variety of statistical learning techniques. Results indicate that gradient boosting is a useful tool for predicting fund returns; for example, a 1% increase in interest rates causes an actively managed fund's return to decrease by 11.97%. This understanding of the relationship between interest rates and fund performance provides opportunities for additional research and insightful, data-driven advice for fund managers and investors.
    Date: 2024–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2404.07225&r=big
  2. By: Ádám Csápai (University of Economics in Bratislava)
    Abstract: We assess the forecasting performance of the selected machine learning methods. According to previous research, they can enhance short-term forecasting performance. We forecast industrial production, inflation and unemployment in Slovakia. We compare the forecasting performance of the models using the mean absolute error and root-mean-squared error. We forecast the variables using ensemble machine learning techniques, such as random forest, bagging and boosting. Additionally, we explore regularized least squares models, such as ridge regression, lasso regression, and elastic net models. Finally, we examine the forecasting performance of neural networks and compare the mean and trimmed mean of model forecasts with individual model performance. Our findings affirm that these methods can enhance forecast accuracy of short-term forecasts, particularly when a relatively large dataset is available. Individual machine learning models prove themselves to be even more accurate than the averages of model forecasts.
    Keywords: Economic forecasting, Slovakia, Ensemble machine learning, Regularized least squares, Neural networks
    JEL: C53 E37 E27
    URL: http://d.repec.org/n?u=RePEc:sek:iefpro:14115967&r=big
  3. By: Francisco Gómez Casanova; Álvaro Leitao; Fernando de Lope Contreras; Carlos Vázquez
    Abstract: This paper addresses the problem of pricing involved financial derivatives by means of advanced deep learning techniques. More precisely, we smartly combine several sophisticated neural network-based concepts like differential machine learning, Monte Carlo simulation-like training samples and joint learning to come up with an efficient numerical solution. The application of the latter development represents a novelty in the context of computational finance. We also propose a novel design of interdependent neural networks to price early-exercise products, in this case, Bermudan swaptions. The improvements in efficiency and accuracy provided by the proposed approach are widely illustrated throughout a range of numerical experiments. Moreover, this novel methodology can be extended to the pricing of other financial derivatives.
    Date: 2024–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2404.11257&r=big
  4. By: Valentin Reich
    Abstract: This article presents a comprehensive approach to probabilistic linkage of German company data using Machine Learning and Natural Language Processing techniques. Here, the long-running ifo Institute surveys are linked to financial information in the Orbis database by addressing the unique challenges of company data linkage, such as corporate structures and linguistic nuances in company names. Compared to a previous linkage, the approach achieves improved match rates and is able to re-evaluate existing matches. This article contributes best practice advice for company data linkage and serves as a documentation for the resulting research dataset. ifo Working Papers
    Keywords: record linkage, company data, orbis, survey data
    JEL: C81 C88
    Date: 2024
    URL: http://d.repec.org/n?u=RePEc:ces:ifowps:_409&r=big
  5. By: Jose Ignacio Hernandez; Niek Mouter; Sander van Cranenburgh
    Abstract: Random utility maximisation (RUM) models are one of the cornerstones of discrete choice modelling. However, specifying the utility function of RUM models is not straightforward and has a considerable impact on the resulting interpretable outcomes and welfare measures. In this paper, we propose a new discrete choice model based on artificial neural networks (ANNs) named "Alternative-Specific and Shared weights Neural Network (ASS-NN)", which provides a further balance between flexible utility approximation from the data and consistency with two assumptions: RUM theory and fungibility of money (i.e., "one euro is one euro"). Therefore, the ASS-NN can derive economically-consistent outcomes, such as marginal utilities or willingness to pay, without explicitly specifying the utility functional form. Using a Monte Carlo experiment and empirical data from the Swissmetro dataset, we show that ASS-NN outperforms (in terms of goodness of fit) conventional multinomial logit (MNL) models under different utility specifications. Furthermore, we show how the ASS-NN is used to derive marginal utilities and willingness to pay measures.
    Date: 2024–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2404.13198&r=big
  6. By: Douglas Kiarelly Godoy de Araujo
    Abstract: Synthetic control methods are a data-driven way to calculate counterfactuals from control individuals for the estimation of treatment effects in many settings of empirical importance. In canonical implementations, this weighting is linear and the key methodological steps of donor pool selection and covariate comparison between the treated entity and its synthetic control depend on some degree of subjective judgment. Thus current methods may not perform best in settings with large datasets or when the best synthetic control is obtained by a nonlinear combination of donor pool individuals. This paper proposes "machine controls", synthetic controls based on automated donor pool selection through clustering algorithms, supervised learning for flexible non-linear weighting of control entities and manifold learning to confirm numerically whether the synthetic control indeed resembles the target unit. The machine controls method is demonstrated with the effect of the 2017 labour deregulation on worker productivity in Brazil. Contrary to policymaker expectations at the time of enactment of the reform, there is no discernible effect on worker productivity. This result points to the deep challenges in increasing the level of productivity, and with it, economic welfare.
    Keywords: causal inference, synthetic controls, machine learning, labour reforms, productivity
    JEL: B41 C32 C54 E24 J50 J83 O47
    Date: 2024–04
    URL: http://d.repec.org/n?u=RePEc:bis:biswps:1181&r=big
  7. By: Chahal, Jyoti; Dagar, Vishal; Dagher, Leila; Rao, Amar; Ntom Udemba, Edmund
    Abstract: This research examines college students' entrepreneurial inclinations using TPB, self-efficacy, and the crisis effect. It also examines the crisis effect's moderating influence post-pandemic. A unique analytical technique using Structural Equation Modeling (SEM) and Artificial Neural Network (ANN) was used to evaluate the model's resilience. 310 Indian university students were surveyed online. Self-efficacy is a crucial predictor of entrepreneurial tendencies among higher education students. ANN analysis confirms SEM findings that self-efficacy and perceived behavior control shape entrepreneurial desires. Despite its negative impact, the crisis effect doesn't appear to affect entrepreneurs' objectives. The crisis impact moderates all exogenous and endogenous factors except subjective norms and entrepreneurial goals, the research finds. The research also shows that students' education and geography affect their entrepreneurial inclinations. Gender, however, has little control. Policymakers and higher education administrators could boost entrepreneurial ambitions by fostering students' self-efficacy and perceived behavior control. Understanding these elements allows higher education stakeholders to create targeted interventions and support systems to foster college student entrepreneurship.
    Keywords: Entrepreneurial Intentions; Crisis-Effect; Self-efficacy; Artificial Neural Network (ANN); PLS-SEM; Post-Pandemic
    JEL: A22
    Date: 2024
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:120706&r=big
  8. By: Anna Knezevic
    Abstract: This paper enhances the existing solution for pricing fixed income instruments within the Black-Karasinski model structure by adding a neural network at various parameterisation points, and demonstrates that the method is able to achieve superior outcomes for multiple calibrations across extended projection horizons.
    Date: 2024–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2404.08903&r=big
  9. By: Chufeng Li; Jianyong Chen
    Abstract: As a branch of time series forecasting, stock movement forecasting is one of the challenging problems for investors and researchers. Since Transformer was introduced to analyze financial data, many researchers have dedicated themselves to forecasting stock movement using Transformer or attention mechanisms. However, existing research mostly focuses on individual stock information but ignores stock market information and high noise in stock data. In this paper, we propose a novel method using the attention mechanism in which both stock market information and individual stock information are considered. Meanwhile, we propose a novel EMD-based algorithm for reducing short-term noise in stock data. Two randomly selected exchange-traded funds (ETFs) spanning over ten years from US stock markets are used to demonstrate the superior performance of the proposed attention-based method. The experimental analysis demonstrates that the proposed attention-based method significantly outperforms other state-of-the-art baselines. Code is available at https://github.com/DurandalLee/ACEFormer .
    Date: 2024–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2404.07969&r=big
  10. By: Martin Huber
    Abstract: We demonstrate and discuss the testability of the common trend assumption imposed in Difference-in-Differences (DiD) estimation in panel data when not relying on multiple pre-treatment periods for running placebo tests. Our testing approach involves two steps: (i) constructing a control group of non-treated units whose pre-treatment outcome distribution matches that of treated units, and (ii) verifying if this control group and the original non-treated group share the same time trend in average outcomes. Testing is motivated by the fact that in several (but not all) panel data models, a common trend violation across treatment groups implies and is implied by a common trend violation across pre-treatment outcomes. For this reason, the test verifies a sufficient, but (depending on the model) not necessary condition for DiD-based identification. We investigate the finite sample performance of a testing procedure that is based on double machine learning, which permits controlling for covariates in a data-driven manner, in a simulation study and also apply it to labor market data from the National Supported Work Demonstration.
    Date: 2024–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2404.16961&r=big
  11. By: Gil-Clavel, Sofia (Max Planck Institute for Demographic Research); Wagenblast, Thorid; Akkerman, Joos (Delft University of Technology); Filatova, Tatiana
    Abstract: Understanding which climate change adaptation constraints manifest for different actors – governments, communities, individuals and households – is essential, as adaptation is turning into a matter of survival. Though rich qualitative research reveals constraints for diverse cases, methods to consolidate knowledge and elicit patterns in adaptation constraints for various actors and hazards are scarce. We fill this gap by analyzing associations between different adaptations and actors’ constraints in adaptation to climate-induced floods and sea-level rise. Our novel approach derives textual data from peer-reviewed articles (published before February 2024) by using natural language processing, supervised learning, thematic coding books, and network analysis. Results show that social capital, economic factors, and government support are constraints shared among all actors. With respect to adaptation types, communities are frequently associated with maladaptation, while individuals and households are frequently associated with transformational adaptation.
    Date: 2024–04–26
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:3cqvn&r=big
  12. By: Biewen, Martin (University of Tübingen); Erhardt, Pascal (University of Tübingen)
    Abstract: We evaluate the distributional effects of a minimum wage introduction based on a data set with a moderate sample size but a large number of potential covariates. Therefore, the selection of relevant control variables at each distributional threshold is crucial to test hypotheses about the impact of the treatment. To this end, we use the post-double selection logistic distribution regression approach proposed by Belloni et al. (2018a), which allows for uniformly valid inference about the target coefficients of our low-dimensional treatment variables across the entire outcome distribution. Our empirical results show that the minimum wage crowded out hourly wages below the minimum threshold, benefitted monthly wages in the lower middle but not the lowest part of the distribution, and did not significantly affect the distribution of hours worked.
    Keywords: wage structure, automatic specification search, double machine learning
    JEL: J31 C3
    Date: 2024–03
    URL: http://d.repec.org/n?u=RePEc:iza:izadps:dp16894&r=big
  13. By: Benjamin S. Manning; Kehang Zhu; John J. Horton
    Abstract: We present an approach for automatically generating and testing, in silico, social scientific hypotheses. This automation is made possible by recent advances in large language models (LLM), but the key feature of the approach is the use of structural causal models. Structural causal models provide a language to state hypotheses, a blueprint for constructing LLM-based agents, an experimental design, and a plan for data analysis. The fitted structural causal model becomes an object available for prediction or the planning of follow-on experiments. We demonstrate the approach with several scenarios: a negotiation, a bail hearing, a job interview, and an auction. In each case, causal relationships are both proposed and tested by the system, finding evidence for some and not others. We provide evidence that the insights from these simulations of social interactions are not available to the LLM purely through direct elicitation. When given its proposed structural causal model for each scenario, the LLM is good at predicting the signs of estimated effects, but it cannot reliably predict the magnitudes of those estimates. In the auction experiment, the in silico simulation results closely match the predictions of auction theory, but elicited predictions of the clearing prices from the LLM are inaccurate. However, the LLM's predictions are dramatically improved if the model can condition on the fitted structural causal model. In short, the LLM knows more than it can (immediately) tell.
    JEL: D0 D9
    Date: 2024–04
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:32381&r=big
  14. By: Peng Yifeng
    Abstract: Market fluctuations caused by overtrading are important components of systemic market risk. This study examines the effect of investor sentiment on intraday overtrading activities in the Chinese A-share market. Employing high-frequency sentiment indices inferred from social media posts on the Eastmoney forum Guba, the research focuses on constituents of the CSI 300 and CSI 500 indices over the period from 01/01/2018 to 12/30/2022. The empirical analysis indicates that investor sentiment exerts a significantly positive impact on intraday overtrading, with the influence being more pronounced among institutional investors relative to individual traders. Moreover, sentiment-driven overtrading is found to be more prevalent during bull markets as opposed to bear markets. Additionally, the effect of sentiment on overtrading is observed to be more pronounced among individual investors in large-cap stocks compared to small- and mid-cap stocks.
    Date: 2024–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2404.12001&r=big
  15. By: Mill, Wladislaw; Ebert, Tobias; Berkessel, Jana; Jonsson, Thorsteinn; Lehmann, Sune; Gebauer, Jochen
    Abstract: Does war make people more religious? Answers to this classic question are dominated by the lack of causality. We exploit the Vietnam Draft Lottery -- a natural experiment that drafted male U.S. citizens into military service during the Vietnam War -- to conclusively show that war increases religiosity. We measure religiosity via religious imagery on web-scraped photographs of hundreds of thousands of gravestones of deceased U.S. Americans using a tailor-made convolutional neural network. Our analysis provides compelling and robust evidence that war indeed increases religiosity: people who were randomly drafted into war are at least 20 % more likely to have religious gravestones. This effect sets in almost immediately, persists even after 50 years, and generalizes across space and societal strata.
    Date: 2024–04–17
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:9se4r&r=big
  16. By: Rajat Verma; Eunhan Ka; Satish V. Ukkusuri
    Abstract: The growth in availability of large-scale GPS mobility data from mobile devices has the potential to aid traditional travel demand models (TDMs) such as the four-step planning model, but those processing methods are not commonly used in practice. In this study, we show the application of trip generation and trip distribution modeling using GPS data from smartphones in the state of Indiana. This involves extracting trip segments from the data and inferring the phone users' home locations, adjusting for data representativeness, and using a data-driven travel time-based cost function for the trip distribution model. The trip generation and interchange patterns in the state are modeled for 2025, 2035, and 2045. Employment sectors like industry and retail are observed to influence trip making behavior more than other sectors. The travel growth is predicted to be mostly concentrated in the suburban regions, with a small decline in the urban cores. Further, although the majority of the growth in trip flows over the years is expected to come from the corridors between the major urban centers of the state, relative interzonal trip flow growth will likely be uniformly spread throughout the state. We also validate our results with the forecasts of two travel demand models, finding a difference of 5-15% in overall trip counts. Our GPS data-based demand model will contribute towards augmenting the conventional statewide travel demand model developed by the state and regional planning agencies.
    Date: 2024–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2404.13211&r=big
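
Editorial illustration for paper 2: the abstract compares individual model forecasts with the mean and trimmed mean of those forecasts, scored by mean absolute error and root-mean-squared error. The sketch below shows only that evaluation step. It is not the author's code; the function names, the model labels and all numbers are made up for illustration, and the trimming rule (drop the lowest and highest forecast in each period) is an assumption, since the abstract does not specify one.

```python
from math import sqrt


def mae(actual, forecast):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root-mean-squared error between two equally long sequences."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def combine(forecasts, trim=0):
    """Average the model forecasts period by period.

    `trim` is the number of lowest and highest forecasts dropped in each
    period before averaging (an assumed trimming rule); trim=0 is the plain mean.
    """
    combined = []
    for values in zip(*forecasts.values()):
        ordered = sorted(values)
        kept = ordered[trim:len(ordered) - trim] if trim else ordered
        combined.append(sum(kept) / len(kept))
    return combined


# Hypothetical data: four periods of realised values and three model forecasts.
actual = [1.0, 2.0, 3.0, 4.0]
forecasts = {
    "ridge": [1.1, 2.1, 2.9, 4.2],
    "random_forest": [0.8, 2.4, 3.3, 3.7],
    "neural_net": [1.5, 1.2, 3.9, 4.9],
}

# Score each individual model alongside the two forecast combinations.
candidates = dict(forecasts)
candidates["mean"] = combine(forecasts)
candidates["trimmed_mean"] = combine(forecasts, trim=1)

for name, forecast in candidates.items():
    print(f"{name:14s} MAE={mae(actual, forecast):.3f} RMSE={rmse(actual, forecast):.3f}")
```

With three models and trim=1 the trimmed mean reduces to the period-by-period median. Whether a combination or an individual model scores best depends entirely on the data, which is the empirical question the paper addresses.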

This nep-big issue is ©2024 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.