nep-big New Economics Papers
on Big Data
Issue of 2022‒07‒18
25 papers chosen by
Tom Coupé
University of Canterbury

  1. Identifying Politically Connected Firms: A Machine Learning Approach By Deni Mazrekaj; Vitezslav Titl; Fritz Schiltz
  2. Impossibility of Collective Intelligence By Krikamol Muandet
  3. Debiased Machine Learning without Sample-Splitting for Stable Estimators By Qizhao Chen; Vasilis Syrgkanis; Morgane Austern
  4. Deep Learning the Efficient Frontier of Convex Vector Optimization Problems By Zachary Feinstein; Birgit Rudloff
  5. Urban economics in a historical perspective: Recovering data with machine learning By Pierre-Philippe Combes; Laurent Gobillon; Yanos Zylberberg
  6. The Fairness of Machine Learning in Insurance: New Rags for an Old Man? By Laurence Barry; Arthur Charpentier
  7. Human Wellbeing and Machine Learning By Ekaterina Oparina; Caspar Kaiser; Niccolò Gentile; Alexandre Tkatchenko; Andrew E. Clark; Jan-Emmanuel De Neve; Conchita D'Ambrosio
  8. Debiased Semiparametric U-Statistics: Machine Learning Inference on Inequality of Opportunity By Juan Carlos Escanciano; Joël Robert Terschuur
  9. Targeting Impact versus Deprivation By Johannes Haushofer; Paul Niehaus; Carlos Paramo; Edward Miguel; Michael W. Walker
  10. Artificial Intelligence and Firm-level Productivity By Dirk Czarnitzki; Gastón P Fernández; Christian Rammer
  11. Using Text Data to Improve Industrial Statistics in the UK By Alex Bishop; Juan Mateos-Garcia; George Richardson
  12. RMT-Net: Reject-aware Multi-Task Network for Modeling Missing-not-at-random Data in Financial Credit Scoring By Qiang Liu; Yingtao Luo; Shu Wu; Zhen Zhang; Xiangnan Yue; Hong Jin; Liang Wang
  13. Spline-rule ensemble classifiers with structured sparsity regularization for interpretable customer churn modeling By Koen W. de Bock; Arno de Caigny
  14. Cheap Talk in Corporate Climate Commitments: The effectiveness of climate initiatives By Julia Anna Bingler; Mathias Kraus; Markus Leippold; Nicolas Webersinke
  15. Public sentiment towards economic sanctions in the Russia-Ukraine war By Vu M. Ngo; Toan L.D. Huynh; Phuc V. Nguyen; Huan H. Nguyen
  16. "Density forecasts of inflation using Gaussian process regression models". By Petar Soric; Enric Monte; Salvador Torra; Oscar Claveria
  17. Stock Trading Optimization through Model-based Reinforcement Learning with Resistance Support Relative Strength By Huifang Huang; Ting Gao; Yi Gui; Jin Guo; Peng Zhang
  18. Payday loans -- blessing or growth suppressor? Machine Learning Analysis By Rohith Mahadevan; Sam Richard; Kishore Harshan Kumar; Jeevitha Murugan; Santhosh Kannan; Saaisri; Tarun; Raja CSP Raman
  19. Predicting Day-Ahead Stock Returns using Search Engine Query Volumes: An Application of Gradient Boosted Decision Trees to the S&P 100 By Christopher Bockel-Rickermann
  20. Quality Innovation, Cost Innovation, Export, and Firm Productivity Evolution: Evidence from the Chinese Electronics Industry By Liu, Mengxiao; Wang, Luhang; Yi, Yimin
  21. Climate Change and Measures of Economic Growth: Solving the Spatial Mismatch Problem By Devina Lakhtakia; Ross McKitrick
  22. Convergence across Subnational Regions of Bangladesh – What the Night Lights Data Say? By Syed Abul, Basher; Jobaida, Behtarin; Salim, Rashid
  23. Roads illuminate development: Using nightlight luminosity to assess the impact of transport infrastructure By Bolivar, Osmar
  24. Biased reporting by the German media? By Löw, Franziska
  25. Ethical Decision Making for Artificial Intelligence: a Social Choice Approach By Federico Fioravanti; Iyad Rahwan; Fernando Abel Tohmé

  1. By: Deni Mazrekaj; Vitezslav Titl; Fritz Schiltz
    Abstract: This article introduces machine learning techniques to identify politically connected firms. By assembling information from publicly available sources and the Orbis company database, we constructed a novel firm population dataset from Czechia in which various forms of political connections can be determined. The data about firms’ connections are unique and comprehensive. They include political donations by the firm, having members of managerial boards who donated to a political party, and having members of boards who ran for political office. The results indicate that over 85% of firms with political connections can be accurately identified by the proposed algorithms. The model obtains this high accuracy by using only firm-level financial and industry indicators that are widely available in most countries. We propose that machine learning algorithms be used by public institutions to identify politically connected firms with potentially large conflicts of interest, and we provide easy-to-implement R code to replicate our results.
    Keywords: Political Connections, Corruption, Prediction, Machine Learning
    Date: 2021
    URL: http://d.repec.org/n?u=RePEc:use:tkiwps:2110&r=
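    Code sketch: the paper ships replication code in R; purely as an illustration of the classification setup, here is a minimal Python sketch that predicts a political-connection label from firm-level financial and industry indicators. The file and column names are hypothetical placeholders, not the paper's variables.
      import pandas as pd
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import accuracy_score

      firms = pd.read_csv("firms.csv")  # hypothetical file
      X = firms[["revenue", "assets", "employees", "industry_code"]]  # assumed numeric
      y = firms["politically_connected"]  # hypothetical 0/1 label

      X_train, X_test, y_train, y_test = train_test_split(
          X, y, test_size=0.3, random_state=0, stratify=y)
      clf = RandomForestClassifier(n_estimators=500, random_state=0)
      clf.fit(X_train, y_train)
      print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))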
  2. By: Krikamol Muandet
    Abstract: Democratization of AI involves training and deploying machine learning models across heterogeneous and potentially massive environments. Diversity of data opens up a number of possibilities to advance AI systems, but also introduces pressing concerns such as privacy, security, and equity that require special attention. This work shows that it is theoretically impossible to design a rational learning algorithm that can successfully learn across heterogeneous environments, a capability we call collective intelligence (CI). By representing learning algorithms as choice correspondences over a hypothesis space, we are able to axiomatize them with essential properties. Unfortunately, the only feasible algorithm compatible with all of the axioms is standard empirical risk minimization (ERM), which learns arbitrarily from a single environment. Our impossibility result reveals informational incomparability between environments as one of the foremost obstacles for researchers designing novel algorithms that learn from multiple environments, and sheds light on prerequisites for success in critical areas of machine learning such as out-of-distribution generalization, federated learning, algorithmic fairness, and multi-modal learning.
    Date: 2022–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2206.02786&r=
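    Code sketch: a toy illustration of empirical risk minimization over a finite hypothesis space, the single-environment behavior the axioms single out. The threshold classifiers and data are invented for illustration.
      import numpy as np

      def erm(hypotheses, X, y):
          """Return the hypothesis with smallest empirical 0-1 risk on one sample."""
          risks = [np.mean(h(X) != y) for h in hypotheses]
          return hypotheses[int(np.argmin(risks))]

      # hypothetical threshold classifiers over a 1-d feature
      hypotheses = [lambda X, t=t: (X > t).astype(int) for t in np.linspace(0, 1, 11)]
      rng = np.random.default_rng(0)
      X = rng.random(200)
      y = (X > 0.42).astype(int)
      best = erm(hypotheses, X, y)
      print(best(np.array([0.3, 0.6])))  # classify two new points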
  3. By: Qizhao Chen; Vasilis Syrgkanis; Morgane Austern
    Abstract: Estimation and inference on causal parameters is typically reduced to a generalized method of moments problem, which involves auxiliary functions that correspond to solutions to a regression or classification problem. A recent line of work on debiased machine learning shows how one can use generic machine learning estimators for these auxiliary problems while maintaining asymptotic normality and root-$n$ consistency of the target parameter of interest, requiring only mean-squared-error guarantees from the auxiliary estimation algorithms. The literature typically requires that these auxiliary problems be fitted on a separate sample or in a cross-fitting manner. We show that when these auxiliary estimation algorithms satisfy natural leave-one-out stability properties, sample splitting is not required. This allows for sample re-use, which can be beneficial in moderately sized sample regimes. For instance, we show that the stability properties we propose are satisfied by ensemble bagged estimators built via sub-sampling without replacement, a popular technique in machine learning practice.
    Date: 2022–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2206.01825&r=
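    Code sketch: a minimal illustration of the kind of stable estimator highlighted in the abstract, an ensemble built by sub-sampling without replacement (subagging) over regression trees. The details are illustrative, not the paper's exact construction.
      import numpy as np
      from sklearn.tree import DecisionTreeRegressor

      def subagged_predict(X_train, y_train, X_test, n_estimators=100,
                           subsample=0.5, seed=0):
          rng = np.random.default_rng(seed)
          n = len(X_train)
          m = int(subsample * n)
          preds = []
          for _ in range(n_estimators):
              idx = rng.choice(n, size=m, replace=False)  # without replacement
              tree = DecisionTreeRegressor().fit(X_train[idx], y_train[idx])
              preds.append(tree.predict(X_test))
          return np.mean(preds, axis=0)

      X = np.random.default_rng(1).normal(size=(200, 3))
      y = X[:, 0] + np.random.default_rng(2).normal(size=200)
      print(subagged_predict(X, y, X[:5]))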
  4. By: Zachary Feinstein; Birgit Rudloff
    Abstract: In this paper, we design a neural network architecture to approximate the weakly efficient frontier of convex vector optimization problems satisfying Slater's condition. The proposed machine learning methodology provides both an inner and outer approximation of the weakly efficient frontier, as well as an upper bound on the error at each approximated efficient point. In numerical case studies we demonstrate that the proposed algorithm effectively approximates the true weakly efficient frontier of convex vector optimization problems. This remains true even for large problems (i.e., many objectives, variables, and constraints), thus overcoming the curse of dimensionality.
    Date: 2022–05
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2205.07077&r=
  5. By: Pierre-Philippe Combes (Institut d'Études Politiques [IEP] - Paris, CNRS - Centre National de la Recherche Scientifique); Laurent Gobillon (PSE - Paris School of Economics - ENPC - École des Ponts ParisTech - ENS-PSL - École normale supérieure - Paris - PSL - Université Paris sciences et lettres - UP1 - Université Paris 1 Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique - EHESS - École des hautes études en sciences sociales - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement, PJSE - Paris Jourdan Sciences Economiques - UP1 - Université Paris 1 Panthéon-Sorbonne - ENS-PSL - École normale supérieure - Paris - PSL - Université Paris sciences et lettres - EHESS - École des hautes études en sciences sociales - ENPC - École des Ponts ParisTech - CNRS - Centre National de la Recherche Scientifique - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement); Yanos Zylberberg (University of Bristol [Bristol])
    Abstract: A recent literature has used a historical perspective to better understand fundamental questions of urban economics. However, a wide range of historical documents of exceptional quality remain underutilised: their use has been hampered by their original format or by the massive amount of information to be recovered. In this paper, we describe how and when the flexibility and predictive power of machine learning can help researchers exploit the potential of these historical documents. We first discuss how important questions of urban economics rely on the analysis of historical data sources and the challenges associated with transcription and harmonisation of such data. We then explain how machine learning approaches may address some of these challenges and we discuss possible applications.
    Keywords: Machine learning,History,Urban economics
    Date: 2021–05
    URL: http://d.repec.org/n?u=RePEc:hal:wpspec:halshs-03231786&r=
  6. By: Laurence Barry; Arthur Charpentier
    Abstract: Since the beginning of their history, insurers have been known to use data to classify and price risks. As such, they were confronted early on with the problems of fairness and discrimination associated with data. This issue is becoming increasingly important with access to more granular and behavioural data, and is evolving to reflect current technologies and societal concerns. By looking into earlier debates on discrimination, we show that some algorithmic biases are a renewed version of older ones, while others show a reversal of the previous order. Paradoxically, although insurance practice has not deeply changed and most of these biases are not new, the machine learning era still deeply shakes the conception of insurance fairness.
    Date: 2022–05
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2205.08112&r=
  7. By: Ekaterina Oparina; Caspar Kaiser; Niccolò Gentile; Alexandre Tkatchenko; Andrew E. Clark; Jan-Emmanuel De Neve; Conchita D'Ambrosio
    Abstract: There is a vast literature on the determinants of subjective wellbeing. International organisations and statistical offices are now collecting such survey data at scale. However, standard regression models explain surprisingly little of the variation in wellbeing, limiting our ability to predict it. In response, we here assess the potential of Machine Learning (ML) to help us better understand wellbeing. We analyse wellbeing data on over a million respondents from Germany, the UK, and the United States. In terms of predictive power, our ML approaches do perform better than traditional models. Although the size of the improvement is small in absolute terms, it turns out to be substantial when compared to that of key variables like health. We moreover find that drastically expanding the set of explanatory variables doubles the predictive power of both OLS and the ML approaches on unseen data. The variables identified as important by our ML algorithms - i.e. material conditions, health, and meaningful social relations - are similar to those that have already been identified in the literature. In that sense, our data-driven ML results validate the findings from conventional approaches.
    Date: 2022–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2206.00574&r=
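    Code sketch: a hedged illustration of the headline comparison, out-of-sample predictive power of OLS versus a gradient-boosting model. Synthetic data stands in for the wellbeing surveys.
      from sklearn.linear_model import LinearRegression
      from sklearn.ensemble import GradientBoostingRegressor
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import r2_score
      from sklearn.datasets import make_regression

      X, y = make_regression(n_samples=5000, n_features=20, noise=10.0,
                             random_state=0)  # stand-in for survey data
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
      for model in (LinearRegression(), GradientBoostingRegressor(random_state=0)):
          model.fit(X_tr, y_tr)
          print(type(model).__name__, r2_score(y_te, model.predict(X_te)))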
  8. By: Juan Carlos Escanciano; Joël Robert Terschuur
    Abstract: We construct locally robust/orthogonal moments in a semiparametric U-statistics setting. These are quadratic moments in the distribution of the data with a zero derivative with respect to first steps at their limit, which reduces model selection bias with machine learning first steps. We use orthogonal moments to propose new debiased estimators and valid inferences in a variety of applications ranging from Inequality of Opportunity (IOp) to distributional treatment effects. U-statistics with machine learning first steps arise naturally in these and many other applications. A leading example in IOp is the Gini coefficient of machine learning fitted values. We introduce a novel U-moment representation of the First Step Influence Function (U-FSIF) to take into account the effect of the first step estimation on an identifying quadratic moment. Adding the U-FSIF to the identifying quadratic moment gives rise to an orthogonal quadratic moment. Our leading and motivational application is to measuring IOp, for which we propose a simple debiased estimator and the first available inferential methods. We give general and simple regularity conditions for asymptotic theory, and demonstrate an improved finite sample performance in simulations for our debiased measures of IOp. In an empirical application, we find that standard measures of IOp are about six times more sensitive to first step machine learners than our debiased measures, and that between 42% and 46% of income inequality in Spain is explained by circumstances outside the control of the individual.
    Date: 2022–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2206.05235&r=
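    Code sketch: a minimal plug-in version of the leading IOp example, the Gini coefficient of machine-learning fitted values (income predicted from circumstances). The paper's debiased estimator and inference are not reproduced here; the data are simulated.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      def gini(x):
          x = np.sort(np.asarray(x, dtype=float))
          n = len(x)
          cum = np.cumsum(x)
          return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

      rng = np.random.default_rng(0)
      circumstances = rng.normal(size=(1000, 5))  # e.g., parental background
      income = np.exp(circumstances @ rng.random(5) + rng.normal(size=1000))
      fitted = RandomForestRegressor(random_state=0).fit(
          circumstances, income).predict(circumstances)
      print("plug-in IOp (Gini of fitted values):", gini(fitted))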
  9. By: Johannes Haushofer; Paul Niehaus; Carlos Paramo; Edward Miguel; Michael W. Walker
    Abstract: Targeting is a core element of anti-poverty program design, with benefits typically targeted to those most “deprived” in some sense (e.g., consumption, wealth). A large literature in economics examines how to best identify these households feasibly at scale, usually via proxy means tests (PMTs). We ask a different question, namely, whether targeting the most deprived has the greatest social welfare benefit: in particular, are the most deprived those with the largest treatment effects or do the “poorest of the poor” sometimes lack the circumstances and complementary inputs or skills to take full advantage of assistance? We explore this potential trade-off in the context of an NGO cash transfer program in Kenya, utilizing recent advances in machine learning (ML) methods (specifically, generalized random forests) to learn PMTs that target both a) deprivation and b) high conditional average treatment effects across several policy-relevant outcomes. We find that targeting solely on the basis of deprivation is generally not attractive in a social welfare sense, even when the social planner's preferences are highly redistributive. We show that a planner using simpler prediction models, based on OLS or less sophisticated ML approaches, could reach divergent conclusions. We discuss implications for the design of real-world anti-poverty programs at scale.
    JEL: C49 H31 O11
    Date: 2022–06
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:30138&r=
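    Code sketch: a toy illustration of the targeting trade-off. The paper uses generalized random forests; this sketch substitutes a simple T-learner for the conditional average treatment effects and simulated data for the Kenyan experiment.
      import numpy as np
      from sklearn.ensemble import GradientBoostingRegressor

      rng = np.random.default_rng(0)
      n = 4000
      X = rng.normal(size=(n, 3))          # household characteristics
      T = rng.integers(0, 2, size=n)       # randomized cash transfer
      consumption = X[:, 0]                # deprivation proxy (lower = poorer)
      tau = 1 + X[:, 1]                    # true heterogeneous effect
      y = consumption + tau * T + rng.normal(size=n)

      mu1 = GradientBoostingRegressor().fit(X[T == 1], y[T == 1])
      mu0 = GradientBoostingRegressor().fit(X[T == 0], y[T == 0])
      cate = mu1.predict(X) - mu0.predict(X)

      k = n // 10  # budget: treat 10% of households
      by_effect = np.argsort(-cate)[:k]             # target largest predicted gains
      by_deprivation = np.argsort(consumption)[:k]  # target the poorest
      print("avg true effect, CATE targeting:", tau[by_effect].mean())
      print("avg true effect, deprivation targeting:", tau[by_deprivation].mean())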
  10. By: Dirk Czarnitzki; Gastón P Fernández; Christian Rammer
    Abstract: Artificial Intelligence (AI) is often regarded as the next general-purpose technology, with rapid, penetrating, and far-reaching use across a broad number of industrial sectors. A main feature of a new general-purpose technology is that it enables new ways of production that may increase productivity. So far, however, only a few studies have investigated the likely productivity effects of AI at the firm level, presumably because of a lack of data. We exploit unique survey data on firms’ adoption of AI technology and estimate its productivity effects with a sample of German firms. We employ both a cross-sectional dataset and a panel database. To address the potential endogeneity of AI adoption, we also implement IV estimators. We find positive and significant effects of the use of AI on firm productivity. This finding holds for different measures of AI usage, i.e., an indicator variable of AI adoption and the intensity with which firms use AI methods in their business processes.
    Keywords: Artificial Intelligence, Productivity, CIS data
    Date: 2022–02–17
    URL: http://d.repec.org/n?u=RePEc:ete:msiper:690486&r=
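    Code sketch: a minimal two-stage least squares illustration of the IV idea, with simulated data and a hypothetical instrument in place of the authors' survey variables.
      import numpy as np

      rng = np.random.default_rng(0)
      n = 2000
      z = rng.normal(size=n)                     # hypothetical instrument for AI adoption
      u = rng.normal(size=n)                     # unobserved confounder
      ai = (0.8 * z + u + rng.normal(size=n) > 0).astype(float)
      log_productivity = 0.3 * ai + u + rng.normal(size=n)

      # first stage: regress adoption on the instrument, keep fitted values
      Z = np.column_stack([np.ones(n), z])
      ai_hat = Z @ np.linalg.lstsq(Z, ai, rcond=None)[0]
      # second stage: regress outcome on fitted adoption
      X2 = np.column_stack([np.ones(n), ai_hat])
      beta = np.linalg.lstsq(X2, log_productivity, rcond=None)[0]
      print("2SLS estimate of AI effect:", beta[1])  # close to the true 0.3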
  11. By: Alex Bishop; Juan Mateos-Garcia; George Richardson
    Abstract: We use business website data to explore the limitations of the Standard Industrial Classification taxonomy and develop a prototype for a bottom-up industrial taxonomy based on semantic similarities between company descriptions. This prototype makes it possible to decompose uninformative SIC codes into granular industries, build user-driven industry groups which might be of interest to policymakers (e.g. 'green economy'), and build indices of local economic composition that are more strongly associated with local economic performance than those based on the SIC taxonomy. We consider potential avenues to combine official and bottom-up taxonomies in order to improve our understanding of the economy and inform economic policy.
    Keywords: emerging industries, industrial policy, industrial taxonomy, machine learning, web data
    JEL: C81 L52 R12
    Date: 2022–01
    URL: http://d.repec.org/n?u=RePEc:nsr:escoed:escoe-dp-2022-01&r=
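    Code sketch: a hedged illustration of the bottom-up idea, embedding company descriptions, measuring semantic similarity, and clustering them into candidate industries. The actual pipeline is richer; the descriptions here are invented.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity
      from sklearn.cluster import KMeans

      descriptions = [
          "solar panel installation for commercial buildings",
          "wind turbine maintenance and green energy consulting",
          "artisanal bakery and coffee shop",
          "wholesale bakery supplying supermarkets",
      ]
      X = TfidfVectorizer().fit_transform(descriptions)
      print(cosine_similarity(X).round(2))  # pairwise semantic similarities
      labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
      print(labels)  # green-economy firms vs bakeries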
  12. By: Qiang Liu; Yingtao Luo; Shu Wu; Zhen Zhang; Xiangnan Yue; Hong Jin; Liang Wang
    Abstract: In financial credit scoring, loan applications may be approved or rejected. We can only observe default/non-default labels for approved samples but have no observations for rejected samples, which leads to missing-not-at-random selection bias. Machine learning models trained on such biased data are inevitably unreliable. In this work, we find that the default/non-default classification task and the rejection/approval classification task are highly correlated, according to both a real-world data study and theoretical analysis. Consequently, the learning of default/non-default can benefit from rejection/approval. Accordingly, we propose, for the first time, to model the biased credit scoring data with Multi-Task Learning (MTL). Specifically, we propose a novel Reject-aware Multi-Task Network (RMT-Net), which learns the task weights that control the information sharing from the rejection/approval task to the default/non-default task by a gating network based on rejection probabilities. RMT-Net leverages the relation between the two tasks: the larger the rejection probability, the more the default/non-default task needs to learn from the rejection/approval task. Furthermore, we extend RMT-Net to RMT-Net++ for modeling scenarios with multiple rejection/approval strategies. Extensive experiments on several datasets strongly verify the effectiveness of RMT-Net on both approved and rejected samples. In addition, RMT-Net++ further improves RMT-Net's performance.
    Date: 2022–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2206.00568&r=
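    Code sketch: a minimal PyTorch sketch of the gating idea, not the authors' RMT-Net. A shared encoder feeds a rejection/approval head and a default/non-default head, and the predicted rejection probability gates how much the default head borrows from the rejection-task representation.
      import torch
      import torch.nn as nn

      class RejectAwareMTL(nn.Module):
          def __init__(self, n_features, hidden=32):
              super().__init__()
              self.shared = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
              self.reject_head = nn.Linear(hidden, 1)        # rejection/approval task
              self.reject_repr = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
              self.default_head = nn.Linear(2 * hidden, 1)   # default/non-default task

          def forward(self, x):
              h = self.shared(x)
              p_reject = torch.sigmoid(self.reject_head(h))
              r = self.reject_repr(h)
              # gate: the larger the rejection probability, the more the
              # default task draws on the rejection-task representation
              gated = torch.cat([h, p_reject * r], dim=1)
              p_default = torch.sigmoid(self.default_head(gated))
              return p_reject, p_default

      model = RejectAwareMTL(n_features=10)
      p_rej, p_def = model(torch.randn(4, 10))  # two probabilities per applicant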
  13. By: Koen W. de Bock (Audencia Business School); Arno de Caigny (LEM - Lille économie management - UMR 9221 - UA - Université d'Artois - UCL - Université catholique de Lille - Université de Lille - CNRS - Centre National de la Recherche Scientifique)
    Abstract: An important business domain that relies heavily on advanced statistical and machine learning algorithms to support operational decision-making is customer retention management. Customer churn prediction is a crucial tool to support customer retention. It allows an early identification of customers who are at risk of abandoning the company and provides the ability to gain insights into why customers are at risk. Hence, customer churn prediction models should complement predictive performance with model insights. Inspired by their ability to reconcile strong predictive performance and interpretability, this study introduces rule ensembles and their extension, spline-rule ensembles, as a promising family of classification algorithms for the customer churn prediction domain. Spline-rule ensembles combine the flexibility of a tree-based ensemble classifier with the simplicity of regression analysis. They do, however, neglect the relatedness between potentially conflicting model components, which can introduce unnecessary complexity into the models and compromise model interpretability. To tackle this issue, a novel algorithmic extension, spline-rule ensembles with sparse group lasso regularization (SRE-SGL), is proposed to enhance interpretability through structured regularization. Experiments on fourteen real-world customer churn data sets in different industries (i) demonstrate the superior predictive performance of spline-rule ensembles with sparse group lasso over a set of well-established yet powerful benchmark methods in terms of AUC and top decile lift; (ii) show that spline-rule ensembles with sparse group lasso regularization significantly outperform conventional rule ensembles whilst performing at least as well as conventional spline-rule ensembles; and (iii) illustrate the interpretable nature of a spline-rule ensemble model and the advantage of structured regularization in SRE-SGL by means of a case study on customer churn prediction for a telecommunications company.
    Keywords: Customer churn prediction,Predictive analytics,Spline-rule ensemble,Interpretable data science,Sparse group lasso,Regularized regression
    Date: 2021–11
    URL: http://d.repec.org/n?u=RePEc:hal:journl:hal-03391564&r=
  14. By: Julia Anna Bingler (Council on Economic Policies (CEP); ETH Zürich - CER-ETH - Center of Economic Research at ETH Zurich); Mathias Kraus (Friedrich-Alexander-Universität Erlangen-Nürnberg); Markus Leippold (University of Zurich; Swiss Finance Institute); Nicolas Webersinke (Friedrich-Alexander-Universität Erlangen-Nürnberg)
    Abstract: Corporate climate disclosures are considered an essential prerequisite to managing climate-related financial risks. At the same time, current disclosures are imprecise, inaccurate, and greenwashing-prone. We introduce a deep learning approach to enable comprehensive climate disclosure analyses by fine-tuning the ClimateBert model. From 14,584 annual reports of the MSCI World index firms from 2010 to 2020, we extract the amount of cheap talk, defined as the share of precise versus imprecise climate commitments. We then test various hypotheses on the drivers of cheap talk. In particular, we ask whether climate initiatives discipline companies in the way they define and disclose actionable climate commitments in their annual reports.
    Keywords: Corporate climate disclosures, voluntary reporting, commitments, TCFD recommendations, textual analysis, natural language processing
    JEL: G2 G38 C8 M48
    Date: 2022–06
    URL: http://d.repec.org/n?u=RePEc:chf:rpseri:rp2254&r=
  15. By: Vu M. Ngo; Toan L.D. Huynh; Phuc V. Nguyen; Huan H. Nguyen
    Abstract: This paper uses machine learning to introduce novel data on public sentiment towards economic sanctions, based on nearly one million social media posts in 109 countries during the Russia-Ukraine war. We show the geographical heterogeneity between government stances and public sentiment. Finally, political regimes, trading relationships, and political instability could predict how people perceived this inhumane war.
    Keywords: democracy,public sentiment,Russia-Ukraine
    JEL: F51 H77
    Date: 2022
    URL: http://d.repec.org/n?u=RePEc:zbw:glodps:1108&r=
  16. By: Petar Soric (Faculty of Economics & Business University of Zagreb.); Enric Monte (Department of Signal Theory and Communications, Polytechnic University of Catalunya (UPC).); Salvador Torra (Riskcenter–IREA, University of Barcelona (UB).); Oscar Claveria (AQR–IREA, University of Barcelona (UB).)
    Abstract: The present study uses Gaussian Process regression models to generate density forecasts of inflation within the New Keynesian Phillips curve (NKPC) framework. The NKPC is a structural model of inflation dynamics in which we include the output gap, inflation expectations, world fuel prices and money market interest rates as predictors. We estimate country-specific time series models for the 19 Euro Area (EA) countries. Unlike many other machine learning models, Gaussian Process regression allows the estimation of confidence intervals for the predictions. The performance of the proposed model is assessed in a one-step-ahead forecasting exercise. The results point to recent inflationary pressures and show the potential of Gaussian Process regression for forecasting purposes.
    Keywords: Machine learning, Gaussian process regression, Time-series analysis, Economic forecasting, Inflation, New Keynesian Phillips curve
    JEL: C45 C51 C53 E31
    Date: 2022–07
    URL: http://d.repec.org/n?u=RePEc:ira:wpaper:202210&r=
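    Code sketch: a minimal illustration of the core tool, Gaussian Process regression, whose posterior standard deviation yields the prediction intervals behind the density forecasts. The four predictors are simulated placeholders for the NKPC variables.
      import numpy as np
      from sklearn.gaussian_process import GaussianProcessRegressor
      from sklearn.gaussian_process.kernels import RBF, WhiteKernel

      rng = np.random.default_rng(0)
      X = rng.normal(size=(120, 4))  # stand-in for output gap, expectations, etc.
      inflation = X @ np.array([0.5, 0.3, 0.1, -0.2]) + rng.normal(scale=0.3, size=120)

      gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
      gp.fit(X[:-1], inflation[:-1])
      mean, std = gp.predict(X[-1:], return_std=True)  # one-step-ahead density
      print(f"forecast: {mean[0]:.2f} +/- {1.96 * std[0]:.2f} (95% interval)")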
  17. By: Huifang Huang; Ting Gao; Yi Gui; Jin Guo; Peng Zhang
    Abstract: Reinforcement learning (RL) is gaining attention from more and more researchers in quantitative finance, as the agent-environment interaction framework is aligned with the decision-making process in many business problems. Most current financial applications of RL algorithms are based on model-free methods, which still face stability and adaptivity challenges. As many cutting-edge model-based reinforcement learning (MBRL) algorithms mature in applications such as video games or robotics, we design a new approach that leverages resistance and support (RS) levels as regularization terms for actions in MBRL, to improve the algorithm's efficiency and stability. The experimental results show that the RS level, as a market-timing technique, enhances the performance of pure MBRL models on various measurements and obtains better profit with less risk. Moreover, our proposed method even withstands the big drop (smaller maximum drawdown) during the COVID-19 pandemic, when the financial market faced an unpredictable crisis. We also investigate, through numerical experiments on the loss of the actor-critic network and the prediction error of the transition dynamics model, why control of resistance and support levels can boost MBRL. The results show that RS indicators indeed help the MBRL algorithms converge faster at an early stage and obtain a smaller critic loss as training episodes increase.
    Date: 2022–05
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2205.15056&r=
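    Code sketch: one common way to compute resistance and support levels, rolling highs and lows of the price series, which can then serve as regularization signals for the agent's actions. The authors' exact definition may differ; this is the simplest variant on synthetic prices.
      import pandas as pd
      import numpy as np

      rng = np.random.default_rng(0)
      close = pd.Series(100 + rng.normal(size=250).cumsum())  # synthetic prices

      window = 20
      support = close.rolling(window).min()      # recent low = support level
      resistance = close.rolling(window).max()   # recent high = resistance level
      # relative position of price within the RS band, usable as a feature
      rs_strength = (close - support) / (resistance - support)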
  18. By: Rohith Mahadevan; Sam Richard; Kishore Harshan Kumar; Jeevitha Murugan; Santhosh Kannan; Saaisri; Tarun; Raja CSP Raman
    Abstract: The upsurge of real estate involves a variety of factors influenced by many domains. One under-recognized sector that can affect the economy, and for which regulatory proposals are being drafted, is payday loans. This paper examines the impact of payday loans on the real estate market. It draws on first-hand experience in constructing an index for the concentration of real estate in a reference area based on payday loans, for Toronto, Ontario in particular, and sets out an approach to create, evaluate and demonstrate this scenario through research analysis. The indexing via payday loans rests on the basic debt-to-income ratio: as the income of a person paying interest on payday loans increases, their debt declines marginally, suggesting that the person invests in fixed assets such as real estate, which drives its growth.
    Date: 2022–05
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2205.15320&r=
  19. By: Christopher Bockel-Rickermann
    Abstract: The internet has changed the way we live, work and make decisions. As it is the major modern resource for research, detailed data on internet usage exhibits vast amounts of behavioral information. This paper aims to answer the question of whether this information can be used to predict future returns of stocks on financial capital markets. In an empirical analysis it implements gradient boosted decision trees to learn relationships between abnormal returns of stocks within the S&P 100 index and lagged predictors derived from historical financial data, as well as search term query volumes on the internet search engine Google. Models predict the occurrence of day-ahead stock returns in excess of the index median. On a time frame from 2005 to 2017, all disparate datasets exhibit valuable information. Evaluated models have average areas under the receiver operating characteristic curve between 54.2% and 56.7%, clearly indicating classification better than random guessing. Implementing a simple statistical arbitrage strategy, the models are used to create daily trading portfolios of ten stocks that achieve annual performances of more than 57% before transaction costs. With ensembles of different data sets topping the performance ranking, the results further question the weak-form and semi-strong-form efficiency of modern financial capital markets. Even though transaction costs are not included, the approach adds to the existing literature. It gives guidance on how to use and transform data on internet usage behavior for financial and economic modeling and forecasting.
    Date: 2022–05
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2205.15853&r=
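    Code sketch: a minimal version of the modeling setup, gradient boosted trees classifying whether the next-day abnormal return exceeds the index median from lagged financial and search-volume features. Simulated data stands in for S&P 100 returns and Google query volumes.
      import numpy as np
      from sklearn.ensemble import GradientBoostingClassifier
      from sklearn.metrics import roc_auc_score

      rng = np.random.default_rng(0)
      n = 3000
      lagged_returns = rng.normal(size=(n, 5))
      search_volume = rng.normal(size=(n, 3))
      X = np.hstack([lagged_returns, search_volume])
      y = (X @ rng.normal(size=8) * 0.1 + rng.normal(size=n) > 0).astype(int)

      split = int(0.8 * n)  # chronological split: train on past, test on future
      model = GradientBoostingClassifier(random_state=0).fit(X[:split], y[:split])
      auc = roc_auc_score(y[split:], model.predict_proba(X[split:])[:, 1])
      print(f"out-of-sample AUC: {auc:.3f}")  # paper reports 54.2% to 56.7%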
  20. By: Liu, Mengxiao; Wang, Luhang; Yi, Yimin
    Abstract: This paper classifies innovation as quality-improving or cost-reducing and estimates a dynamic model incorporating firm export, quality innovation, and cost innovation decisions. Estimation results show that export, quality innovation, and cost innovation increase next-period firm productivity by 1.39%, 1.23%, and 1.27%, respectively. Additionally, quality innovation raises next-period export demand by 47%. Counterfactual analyses suggest that (1) foreign market growth has a larger impact on firm export and innovation decisions than domestic market growth, but neither market significantly affects firm productivity; (2) subsidizing continuing quality innovators generates the highest financial return, and subsidizing continuing cost innovators brings the most productivity gain.
    Keywords: export; quality innovation; cost innovation; firm productivity; dynamic estimation; neural network; machine learning; trade liberalization; innovation policy
    JEL: C45 F14 L1 L10 L25
    Date: 2022–07
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:113270&r=
  21. By: Devina Lakhtakia (Department of Economics and Finance, University of Guelph, Guelph ON Canada); Ross McKitrick (Department of Economics and Finance, University of Guelph, Guelph ON Canada)
    Abstract: Many studies have been undertaken to quantify the economic costs of climate change. However, while climate data are measured at the grid cell level, economic data are measured at the national level. In order to form correct damage estimates researchers must reconcile the two. Aggregating climate data up to the national level has been the more common approach, but results are sensitive to how the averaging is done and the averaging process itself can bias the results. An alternative approach has been to project economic data down to the grid cell level. Nordhaus (2006) developed the G-Econ database to do this, but while it provides considerable spatial detail it provides only four quinquennial observations per cell from 1990 to 2005. We develop herein a model to predict within-grid cell economic activity using national, regional and local economic activity. The latter is measured using a unique dataset showing annual flight volumes at hundreds of urban and rural airports worldwide from 1976 to 2010. We show that the model has a high level of explanatory power and can be used in an iterative algorithm to infill and extrapolate the G-Econ database to provide annual observations for grid cells in approximately 150 countries over the 1976 to 2010 interval. We supplement this with satellite nightlight data, which provide even more spatial detail but over a shorter time frame.
    Keywords: Gross cell product, climate change, economic growth
    JEL: Q54 Q56 R11
    Date: 2022
    URL: http://d.repec.org/n?u=RePEc:gue:guelph:2022-03&r=
  22. By: Syed Abul, Basher; Jobaida, Behtarin; Salim, Rashid
    Abstract: We examine economic convergence among subnational regions of Bangladesh over the period 1992-2013. Given the unavailability of traditional gross domestic product (GDP) measures for subnational areas, and building on findings of the recent luminosity literature, we use night lights intensity as a proxy for local economic activity to test the convergence hypothesis. Our results show the existence of both absolute and conditional convergence in night lights intensity, but with a very long half-life of convergence. Moreover, the results also indicate sigma divergence. Together, these findings suggest that regional disparity is persistent and wide across Bangladesh's 544 upazilas (subdistricts). There is evidence that lagging upazilas are catching up with the better-off ones, but many are also converging with their neighbors or peers (a phenomenon known as “club convergence”). Overall, consistent with the evidence from studies on regional inequality in Bangladesh, our results also indicate that there is an “east-west” divide in luminosity across the subnational units in Bangladesh.
    Keywords: Convergence, Regional disparity, Bangladesh, Night lights
    JEL: O47 R11
    Date: 2022–01–15
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:111963&r=
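    Code sketch: a minimal absolute beta-convergence test of the kind behind the abstract, regressing average annual growth in night-light intensity on its initial level and backing out the half-life from the implied convergence speed. Upazila data are simulated.
      import numpy as np

      rng = np.random.default_rng(0)
      T = 21  # 1992-2013
      log_light_0 = rng.normal(size=544)   # initial log luminosity, 544 upazilas
      growth = -0.01 * log_light_0 + rng.normal(scale=0.02, size=544)

      X = np.column_stack([np.ones(544), log_light_0])
      beta = np.linalg.lstsq(X, growth, rcond=None)[0][1]
      speed = -np.log(1 + T * beta) / T    # annual convergence speed
      half_life = np.log(2) / speed
      print(f"beta = {beta:.4f}, half-life ~ {half_life:.0f} years")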
  23. By: Bolivar, Osmar
    Abstract: The research aims to evaluate the impact of paved major roads on economic growth at the municipal level in Bolivia, Paraguay and Ecuador. Due to the absence of municipal information, publicly available satellite data are used to construct a municipal panel dataset on a yearly basis from 2000 to 2013; particularly, nightlight luminosity is adopted as a proxy for economic activity. Methodologically, empirical evidence is obtained regarding the effect of having access to a paved major road on luminosity, as well as the elasticity between GDP and nightlight luminosity; both estimates are then linked to approximate economic growth in benefited municipalities. The findings suggest that, on average, economic activity was 0.5% to 0.6% higher in municipalities that benefited from paved major roads than in municipalities that did not. The effects vary over time and are dependent on whether the benefited areas are located closer to the road or are part of a population center.
    Keywords: Urban development, Economics, Impact evaluation, Infrastructure, Socioeconomic research, Urban mobility, Road safety, Roads
    Date: 2022
    URL: http://d.repec.org/n?u=RePEc:dbl:dblwop:1910&r=
  24. By: Löw, Franziska (Helmut Schmidt University, Hamburg)
    Abstract: The dynamics of online news and political outcomes have attracted great interest across research fields in recent years. This paper provides a new method to estimate media bias, using a structural topic model and cosine similarity to test for slant toward different political actors. For the empirical analysis, the content of German online newspapers and press releases of German parties during the campaign before the 2017 federal election is analyzed. Under the assumptions that a) potential media bias is demand-driven and b) election results can be used as a proxy for reader beliefs, the results show that news articles of most newspapers slant towards AfD topics. Furthermore, we find evidence for the hypothesis that election day brings changes in news coverage, since newspapers can then observe the true beliefs of readers.
    Keywords: Media; Bias; Structural topic model; Text analysis
    JEL: C20 D72 L82
    Date: 2022–06–14
    URL: http://d.repec.org/n?u=RePEc:ris:vhsuwp:2022_193&r=
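    Code sketch: a hedged illustration of the slant measure's final step, cosine similarity between an article's topic distribution and each party's press-release topic distribution. Random Dirichlet vectors stand in for structural-topic-model output.
      import numpy as np

      def cosine(a, b):
          return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

      rng = np.random.default_rng(0)
      n_topics = 30
      article_topics = rng.dirichlet(np.ones(n_topics))  # would come from the STM
      party_topics = {p: rng.dirichlet(np.ones(n_topics))
                      for p in ["CDU", "SPD", "AfD", "FDP"]}
      slant = {p: cosine(article_topics, v) for p, v in party_topics.items()}
      print(max(slant, key=slant.get))  # party whose topics the article leans toward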
  25. By: Federico Fioravanti; Iyad Rahwan; Fernando Abel Tohmé
    Abstract: We present an axiomatic study of a method to automate ethical AI decision making. We consider two different but very intuitive notions of when an alternative is preferred over another, namely pairwise majority and position dominance. Voter preferences are learned through a permutation process and then aggregation rules are applied to obtain results that are socially considered to be ethically correct. In this setting we find many voting rules that satisfy desirable properties for an autonomous system.
    Date: 2022–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2206.05160&r=
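    Code sketch: a minimal implementation of one of the two dominance notions, pairwise majority: alternative a dominates b if a strict majority of voters rank a above b. The driving-dilemma alternatives are invented for illustration.
      def pairwise_majority(preferences, a, b):
          """True if a strict majority ranks alternative a above b.
          Preferences are lists ordered from most to least preferred."""
          wins = sum(p.index(a) < p.index(b) for p in preferences)
          return wins > len(preferences) / 2

      voters = [["swerve", "brake", "continue"],
                ["brake", "swerve", "continue"],
                ["continue", "brake", "swerve"]]
      print(pairwise_majority(voters, "brake", "continue"))  # True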

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.