nep-cmp New Economics Papers
on Computational Economics
Issue of 2020‒03‒02
twenty-six papers chosen by



  1. Discretization and Machine Learning Approximation of BSDEs with a Constraint on the Gains-Process By Idris Kharroubi; Thomas Lim; Xavier Warin
  2. Time-inconsistent Markovian control problems under model uncertainty with application to the mean-variance portfolio selection By Tomasz R. Bielecki; Tao Chen; Igor Cialenco
  3. Inter-Depot Moves and Dynamic-Radius Search for Multi-Depot Vehicle Routing Problems By Jean Bertrand Gauthier; Stefan Irnich
  4. Misleading Estimation of Backwardness through NITI Aayog SDG index: A study to find loopholes and construction of alternative index with the help of Artificial Intelligence By Sen, Sugata; Sengupta, Soumya
  5. RUSMOD -- A Tool for Distributional Analysis in the Russian Federation By Matytsin,Mikhail; Popova,Daria; Freije-Rodriguez,Samuel
  6. Efficient Policy Learning from Surrogate-Loss Classification Reductions By Andrew Bennett; Nathan Kallus
  7. How Magic a Bullet Is Machine Learning for Credit Analysis? An Exploration with FinTech Lending Data By J. Christina Wang; Charles B. Perkins
  8. Economy-wide benefits and costs of local-level energy transition in Austrian Climate and Energy Model Regions By Thomas Schinko; Birgit Bednar-Friedl; Barbara Truger; Rafael Bramreiter; Nadejda Komendantova; Michael Hartner
  9. Deep Learning for Financial Applications : A Survey By Ahmet Murat Ozbayoglu; Mehmet Ugur Gudelek; Omer Berat Sezer
  10. Diverging roads: Theory-based vs. machine learning-implied stock risk premia By Grammig, Joachim; Hanenberg, Constantin; Schlag, Christian; Sönksen, Jantje
  11. Sugar taxes: An economy-wide assessment: The case of Guatemala By Piñeiro, Valeria; Diaz-Bonilla, Eugenio; Paz, Flor; Allen, Summer L.
  12. Predicting Bank Loan Default with Extreme Gradient Boosting By Rising Odegua
  13. On fintech and financial inclusion By Thomas Philippon
  14. Priority to Unemployed Immigrants? A Causal Machine Learning Evaluation of Training in Belgium By Cockx, Bart; Lechner, Michael; Bollens, Joost
  15. Simulating fire sales in a system of banks and asset managers By Calimani, Susanna; Hałaj, Grzegorz; Żochowski, Dawid
  16. Heterogeneous Impacts of Climate Change – The Ricardian Approach Using Vietnam Micro-Level Panel Data By Nguyen Chau, Trinh; Scrimgeour, Frank
  17. The network of firms implied by the news By Zheng, Hannan; Schwenkler, Gustavo
  18. Simplifying and Improving the Performance of Risk Adjustment Systems By Thomas G. McGuire; Anna L. Zink; Sherri Rose
  19. Reducing conservatism in robust optimization By Roos, Ernst; den Hertog, Dick
  20. Terrorist Attacks, Cultural Incidents and the Vote for Radical Parties: Analyzing Text from Twitter By Francesco Giavazzi; Felix Iglhaut; Giacomo Lemoli; Gaia Rubera
  21. A Hierarchy of Limitations in Machine Learning By Momin M. Malik
  22. Text-based crude oil price forecasting By Yun Bai; Xixi Li; Hao Yu; Suling Jia
  23. Optimal Land Use Switching Policy By Yadipur , Mahdi; Daglish, Toby; Saglam, Yigit
  24. Corruption red flags in public procurement: new evidence from Italian calls for tenders By Francesco Decarolis; Cristina Giorgiantonio
  25. The gender pay gap revisited: Does machine learning offer new insights? By Brieland, Stephanie; Töpfer, Marina
  26. Long-term effects of not completing a university degree on individual labour market outcomes and overall life satisfaction (Langfristige Wirkungen eines nicht abgeschlossenen Studiums auf individuelle Arbeitsmarktergebnisse und die allgemeine Lebenszufriedenheit) By Heigle, Julia; Pfeiffer, Friedhelm

  1. By: Idris Kharroubi (LPSM UMR 8001 - Laboratoire de Probabilités, Statistique et Modélisation - UPMC - Université Pierre et Marie Curie - Paris 6 - UPD7 - Université Paris Diderot - Paris 7 - CNRS - Centre National de la Recherche Scientifique); Thomas Lim (LaMME - Laboratoire de Mathématiques et Modélisation d'Evry - INRA - Institut National de la Recherche Agronomique - UEVE - Université d'Évry-Val-d'Essonne - ENSIIE - CNRS - Centre National de la Recherche Scientifique, ENSIIE - Ecole Nationale Supérieure d'Informatique pour l'Industrie et l'Entreprise); Xavier Warin (EDF - EDF)
    Abstract: We study the approximation of backward stochastic differential equations (BSDEs for short) with a constraint on the gains process. We first discretize the constraint by applying a so-called facelift operator at the times of a grid. We show that this discretely constrained BSDE converges to the continuously constrained one as the mesh of the grid goes to zero. We then focus on the approximation of the discretely constrained BSDE. For that we adopt a machine learning approach. We show that the facelift can be approximated by an optimization problem over a class of neural networks under constraints on the neural network and its derivative. We then derive an algorithm converging to the discretely constrained BSDE as the number of neurons goes to infinity. We conclude with numerical experiments. Mathematics Subject Classification (2010): 65C30, 65M75, 60H35, 93E20, 49L25.
    Keywords: Constrained BSDEs, discrete-time approximation, neural network approximation, facelift transformation
    Date: 2020–02–05
    URL: http://d.repec.org/n?u=RePEc:hal:wpaper:hal-02468354&r=all
  2. By: Tomasz R. Bielecki; Tao Chen; Igor Cialenco
    Abstract: In this paper we study a class of time-inconsistent terminal Markovian control problems in discrete time subject to model uncertainty. We combine the concept of sub-game perfect strategies with adaptive robust stochastic control to tackle the theoretical aspects of the considered stochastic control problem. Consequently, as an important application of the theoretical results, we numerically solve the mean-variance portfolio selection problem under model uncertainty by applying a machine learning algorithm.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.02604&r=all
  3. By: Jean Bertrand Gauthier (Johannes Gutenberg University Mainz); Stefan Irnich (Johannes Gutenberg University Mainz)
    Abstract: Radius search is an effective neighborhood exploration technique for standard edge-exchange neighborhoods such as 2-opt, 2-opt*, swap, relocation, Or-opt, string exchange, etc. Up to now, it has only been used for vehicle routing problems with a homogeneous fleet and in the single-depot context. In this work, we extend dynamic-radius search to the multi-depot vehicle routing problem, in which 2-opt and 2-opt* moves may involve routes from different depots. To this end, we equip dynamic-radius search with a modified pruning criterion that still guarantees identifying a best-improving move, either intra-depot or inter-depot, with little additional computational effort. We experimentally confirm substantial speedups, by factors of 100 and more, compared to a likewise optimized implementation of lexicographic search, another effective neighborhood exploration technique using a feasibility-based pruning criterion. Moreover, the computational results show that depot swapping substantially improves heuristic solution quality, especially for multi-depot configurations in which depots are not located close to each other.
    Keywords: Vehicle routing, Local search, Sequential search, Dynamic-radius search, Inter-depot
    Date: 2020–02–25
    URL: http://d.repec.org/n?u=RePEc:jgu:wpaper:2004&r=all
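    As a rough illustration of neighborhood pruning (not the authors' dynamic-radius search, and without its best-improvement guarantee), the following Python sketch runs 2-opt on a single route while restricting candidate moves to each node's nearest neighbours; the instance is randomly generated:
      # Illustrative 2-opt on one open route (depot fixed at both ends), with
      # candidate moves pruned to each node's k nearest neighbours.
      import numpy as np

      def route_length(route, dist):
          return sum(dist[route[i], route[i + 1]] for i in range(len(route) - 1))

      def pruned_two_opt(route, dist, k=5):
          n = len(route)
          neighbours = np.argsort(dist, axis=1)[:, 1:k + 1]  # k nearest per node
          improved = True
          while improved:
              improved = False
              pos = {node: idx for idx, node in enumerate(route)}
              for i in range(1, n - 2):
                  a, b = route[i - 1], route[i]
                  for c in neighbours[a]:
                      j = pos.get(int(c), -1)
                      if j <= i or j >= n - 1:
                          continue
                      d = route[j + 1]
                      # gain from replacing edges (a,b),(c,d) by (a,c),(b,d)
                      gain = dist[a, b] + dist[c, d] - dist[a, c] - dist[b, d]
                      if gain > 1e-9:
                          route[i:j + 1] = route[i:j + 1][::-1]
                          improved = True
                          break
                  if improved:
                      break
          return route

      rng = np.random.default_rng(0)
      points = rng.random((12, 2))
      dist = np.linalg.norm(points[:, None] - points[None, :], axis=2)
      tour = [0] + list(range(1, 12)) + [0]
      print(route_length(pruned_two_opt(tour, dist), dist))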
  4. By: Sen, Sugata; Sengupta, Soumya
    Abstract: The UNDP Rio+20 summit in 2012 evolved a set of indicators to realise the targets of the SDGs within a deadline. Measurement of performance under these goals has followed the methodology developed by UNDP, which is simply the unweighted average of indicator performances across the different domains. This work concludes that this methodology for measuring goal-wise as well as composite performance suffers from major shortcomings and proposes an alternative using ideas from artificial intelligence. It is argued that the indicators under different goals are inter-related, so constructing an index through a simple average is misleading. Moreover, the methodologies underlying the existing indices fail to assign weights to the different indicators. The work is based on secondary data, and the goal-wise indices are determined through normalised sigmoid functions. These goal-wise indices are plotted on a radar chart, and the area of the radar polygon is treated as the composite SDG performance measure. The whole framework is presented as an artificial neural network. The goal-wise index developed and tested here shows that the UNDP as well as the NITI Aayog index delivers exaggerated values of goal-wise and composite performance.
    Keywords: SDG Index, Sigmoidal Activation Function, Artificial Neural Network
    JEL: C63 O15
    Date: 2020–02–06
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:98534&r=all
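    The construction can be illustrated with a short Python sketch: each goal-wise indicator is normalised with a sigmoid and the area of the resulting radar polygon serves as the composite score. The indicator values, midpoint, and scale below are hypothetical, and this is not the authors' code:
      # Sketch: sigmoid-normalised goal scores and the radar-polygon area as a
      # composite SDG index (illustrative only; values are hypothetical).
      import numpy as np

      def sigmoid_index(raw, midpoint, scale):
          """Map a raw goal-wise indicator onto (0, 1) with a logistic curve."""
          return 1.0 / (1.0 + np.exp(-(raw - midpoint) / scale))

      def radar_area(scores):
          """Area of the polygon traced by the scores on equally spaced spokes."""
          scores = np.asarray(scores, dtype=float)
          theta = 2 * np.pi / len(scores)
          return 0.5 * np.sin(theta) * np.sum(scores * np.roll(scores, -1))

      raw_goal_values = np.array([55.0, 62.0, 48.0, 71.0, 66.0])  # hypothetical
      goal_scores = sigmoid_index(raw_goal_values, midpoint=50.0, scale=10.0)
      print("goal-wise indices:", np.round(goal_scores, 3))
      print("composite score (radar area):", round(radar_area(goal_scores), 4))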
  5. By: Matytsin,Mikhail; Popova,Daria; Freije-Rodriguez,Samuel
    Abstract: The purpose of this paper is to introduce applications of RUSMOD -- a microsimulation model for fiscal incidence analysis in the Russian Federation. RUSMOD combines household survey micro-data and fiscal policy rules to simulate the Russian tax-benefit system: the size and distribution of taxes collected and benefits paid, and the impact of the system on different population groups. Microsimulation models, such as RUSMOD, are habitually used in developed countries, and can be versatile budgetary policy tools. Using this model, the current tax-benefit system in Russia is examined. The impact of the system is measured across the income distribution, age groups, family types, localities, as well as across time. One of the applications of RUSMOD this paper aims to assess is the role of the tax-benefit system in explaining the incidence of informal employment in Russia. The paper investigates whether the existing system creates disincentives for formalization in terms of reducing disposable incomes and increasing poverty and inequality, and whether a hypothetical tax reform would be able to reduce the opportunity costs of formalization for informal workers, improve distributional outcomes, and increase fiscal revenues.
    Date: 2019–09–03
    URL: http://d.repec.org/n?u=RePEc:wbk:wbrwps:8994&r=all
  6. By: Andrew Bennett; Nathan Kallus
    Abstract: Recent work on policy learning from observational data has highlighted the importance of efficient policy evaluation and has proposed reductions to weighted (cost-sensitive) classification. But, efficient policy evaluation need not yield efficient estimation of policy parameters. We consider the estimation problem given by a weighted surrogate-loss classification reduction of policy learning with any score function, either direct, inverse-propensity weighted, or doubly robust. We show that, under a correct specification assumption, the weighted classification formulation need not be efficient for policy parameters. We draw a contrast to actual (possibly weighted) binary classification, where correct specification implies a parametric model, while for policy learning it only implies a semiparametric model. In light of this, we instead propose an estimation approach based on generalized method of moments, which is efficient for the policy parameters. We propose a particular method based on recent developments on solving moment problems using neural networks and demonstrate the efficiency and regret benefits of this method empirically.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.05153&r=all
  7. By: J. Christina Wang; Charles B. Perkins
    Abstract: FinTech online lending to consumers has grown rapidly in the post-crisis era. As argued by its advocates, one key advantage of FinTech lending is that lenders can predict loan outcomes more accurately by employing complex analytical tools, such as machine learning (ML) methods. This study applies ML methods, in particular random forests and stochastic gradient boosting, to loan-level data from the largest FinTech lender of personal loans to assess the extent to which those methods can produce more accurate out-of-sample predictions of default on future loans relative to standard regression models. To explain loan outcomes, this analysis accounts for the economic conditions faced by a borrower after origination, which are typically absent from other ML studies of default. For the given data, the ML methods indeed improve prediction accuracy, but more so over the near horizon than beyond a year. This study then shows that having more data up to, but not beyond, a certain quantity enhances the predictive accuracy of the ML methods relative to that of parametric models. The likely explanation is that there has been data or model drift over time, so that methods that fit more complex models with more data can in fact suffer greater out-of-sample misses. Prediction accuracy rises, but only marginally, with additional standard credit variables beyond the core set, suggesting that unconventional data need to be sufficiently informative as a whole to help consumers with little or no credit history. This study further explores whether the greater functional flexibility of ML methods yields unequal benefit to consumers with different attributes or who reside in locales with varying economic conditions. It finds that the ML methods produce more favorable ratings for different groups of consumers, although those already deemed less risky seem to benefit more on balance.
    Keywords: FinTech/marketplace lending; supervised machine learning; default prediction
    JEL: C52 C53 C55 G23
    Date: 2019–10–14
    URL: http://d.repec.org/n?u=RePEc:fip:fedbwp:87410&r=all
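    A minimal Python sketch of the exercise, assuming hypothetical loan features (including a post-origination local economic condition) and an out-of-time split by origination year; it compares a gradient-boosting model with a logistic benchmark and is not the study's code or data:
      # Sketch: out-of-time default prediction, gradient boosting vs. logistic
      # benchmark. Column names and data are simulated placeholders.
      import numpy as np
      import pandas as pd
      from sklearn.ensemble import GradientBoostingClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import roc_auc_score

      rng = np.random.default_rng(1)
      n = 5000
      loans = pd.DataFrame({
          "orig_year": rng.integers(2014, 2019, n),
          "fico": rng.normal(690, 40, n),
          "dti": rng.normal(18, 7, n),
          "local_unemp": rng.normal(5, 1.5, n),   # post-origination condition
      })
      logit = (-4 + 0.03 * (700 - loans["fico"]) + 0.05 * loans["dti"]
               + 0.3 * loans["local_unemp"])
      loans["default"] = rng.random(n) < 1 / (1 + np.exp(-logit))

      train = loans[loans["orig_year"] <= 2016]   # earlier vintages
      test = loans[loans["orig_year"] > 2016]     # later vintages
      features = ["fico", "dti", "local_unemp"]

      for name, model in [("gradient boosting", GradientBoostingClassifier(random_state=0)),
                          ("logistic benchmark", LogisticRegression(max_iter=1000))]:
          model.fit(train[features], train["default"])
          auc = roc_auc_score(test["default"], model.predict_proba(test[features])[:, 1])
          print(f"{name}: out-of-time AUC = {auc:.3f}")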
  8. By: Thomas Schinko (International Institute for Applied Systems Analysis (IIASA), Austria); Birgit Bednar-Friedl (University of Graz, Austria); Barbara Truger (University of Graz, Austria); Rafael Bramreiter (University of Graz, Austria); Nadejda Komendantova (University of Graz, Austria); Michael Hartner (Energy Economics Group, TU Wien, Austria)
    Abstract: To achieve a low-carbon transition in the electricity sector, countries combine national-scale policies with regional renewable electricity (RES-E) initiatives. Taking Austria as an example, we investigate the economy-wide effects of implementing national-level feed-in tariffs alongside local-level ‘climate and energy model (CEM) regions’, taking account of policy externalities across the two governance levels. We distinguish three types of CEM regions by means of a cluster analysis and apply a sub-national Computable General Equilibrium (CGE) model to investigate two RES-E scenarios. We find that whether the net economic effects are positive or negative depends on three factors: (i) RES-E potentials, differentiated by technology and cluster region; (ii) economic competitiveness of RES-E technologies relative to each other and to the current generation mix; and (iii) support schemes in place which translate into policy costs. We conclude that the focus should mainly be on economically competitive technologies, such as PV and wind, to avoid unintended macroeconomic side-effects. To achieve that, national support policies for RES-E have to be aligned with regional energy initiatives.
    Keywords: energy transition; computable general equilibrium (CGE); national support policies; regional energy initiatives; policy externality
    JEL: Q42 R13 C68
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:grz:wpaper:2020-05&r=all
  9. By: Ahmet Murat Ozbayoglu; Mehmet Ugur Gudelek; Omer Berat Sezer
    Abstract: Computational intelligence in finance has been a very popular topic for both academia and the financial industry in the last few decades. Numerous studies have been published, resulting in various models. Meanwhile, within the Machine Learning (ML) field, Deep Learning (DL) has recently attracted a lot of attention, mostly due to its outperformance over classical models. Many different implementations of DL exist today, and broad interest is continuing. Finance is one particular area where DL models have started gaining traction; however, the playing field is wide open, and many research opportunities remain. In this paper, we provide a state-of-the-art snapshot of the DL models developed for financial applications. We not only categorize the works according to their intended subfield in finance but also analyze them based on their DL models. In addition, we identify possible future implementations and highlight the pathway for ongoing research within the field.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.05786&r=all
  10. By: Grammig, Joachim; Hanenberg, Constantin; Schlag, Christian; Sönksen, Jantje
    Abstract: We assess financial theory-based and machine learning-implied measurements of stock risk premia by comparing the quality of their return forecasts. In the low signal-to-noise environment of a one-month horizon, we find that it is preferable to rely on a theory-based approach instead of engaging in the computer-intensive hyper-parameter tuning of statistical models. The theory-based approach also delivers a solid performance at the one-year horizon, at which only one machine learning methodology (random forest) performs substantially better. We also consider ways to combine the opposing modeling philosophies, and identify the use of random forests to account for the approximation residuals of the theory-based approach as a promising hybrid strategy. It combines the advantages of the two diverging paths in the finance world.
    Keywords: stock risk premia, return forecasts, machine learning, theory-based return prediction
    JEL: C53 C58 G12 G17
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:zbw:tuewef:130&r=all
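    The hybrid strategy can be sketched as follows: fit a random forest to the residuals of a theory-based premium forecast and add the two components out of sample. The data and the stand-in "theory" forecast below are simulated placeholders, not the paper's implementation:
      # Sketch of the hybrid idea: random forest on the residuals of a
      # theory-based risk-premium forecast, then combine the two parts.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      rng = np.random.default_rng(2)
      n, p = 2000, 6
      X = rng.normal(size=(n, p))                  # conditioning variables
      theory_forecast = 0.4 * X[:, 0]              # stand-in theory-based premium
      true_premium = theory_forecast + 0.3 * np.tanh(X[:, 1] * X[:, 2])
      realized = true_premium + rng.normal(scale=1.0, size=n)

      split = n // 2
      resid_model = RandomForestRegressor(n_estimators=300, random_state=0)
      resid_model.fit(X[:split], realized[:split] - theory_forecast[:split])

      hybrid = theory_forecast[split:] + resid_model.predict(X[split:])
      for name, pred in [("theory only", theory_forecast[split:]), ("hybrid", hybrid)]:
          mse = np.mean((realized[split:] - pred) ** 2)
          print(f"{name}: out-of-sample MSE = {mse:.3f}")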
  11. By: Piñeiro, Valeria; Diaz-Bonilla, Eugenio; Paz, Flor; Allen, Summer L.
    Abstract: This study presents an economy-wide analysis for Guatemala, considering several tax options on sugar and sugar-sweetened beverages (SSBs) and then tracing their differentiated general economic effects. We focus on Guatemala because of the increasing health burden imposed by obesity and the fact that the country is also an important sugar producer and exporter. In the next section we describe the general conditions for sugar production and consumption in Guatemala. We then describe the economy-wide model utilized, the modeled scenarios, and finally the results of the simulations before concluding.
    Keywords: GUATEMALA, LATIN AMERICA, CENTRAL AMERICA, sugar, taxes, assessment, exports, food consumption, agricultural production, trade, health, nutrition, food systems, sugar production, consumption indicators, sugar exports,
    Date: 2019
    URL: http://d.repec.org/n?u=RePEc:fpr:lacwps:3&r=all
  12. By: Rising Odegua
    Abstract: Loan default prediction is one of the most important and critical problems faced by banks and other financial institutions, as it has a huge effect on profit. Although many traditional methods exist for mining information about a loan application, most of these methods seem to be under-performing, as there have been reported increases in the number of bad loans. In this paper, we use an Extreme Gradient Boosting algorithm called XGBoost for loan default prediction. The prediction is based on loan data from a leading bank, taking into consideration data sets from both the loan application and the demographics of the applicant. We also report important evaluation metrics such as accuracy, recall, precision, F1-score and ROC area for the analysis. This paper provides an effective basis for loan credit approval in order to identify risky customers from a large number of loan applications using predictive modeling.
    Date: 2020–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.02011&r=all
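    A minimal Python sketch of the workflow, using the xgboost package and the evaluation metrics listed above; the features and data are simulated placeholders, not the bank's loan data:
      # Sketch: XGBoost default classifier with standard evaluation metrics.
      import numpy as np
      from xgboost import XGBClassifier
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                                   f1_score, roc_auc_score)

      rng = np.random.default_rng(3)
      n = 4000
      X = np.column_stack([
          rng.normal(35, 10, n),       # applicant age
          rng.lognormal(10, 0.5, n),   # loan amount
          rng.normal(0.3, 0.1, n),     # debt-to-income ratio
      ])
      y = (rng.random(n) < 1 / (1 + np.exp(-(-1 + 10 * (X[:, 2] - 0.3))))).astype(int)

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
      clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
      clf.fit(X_tr, y_tr)

      pred = clf.predict(X_te)
      proba = clf.predict_proba(X_te)[:, 1]
      print("accuracy :", round(accuracy_score(y_te, pred), 3))
      print("recall   :", round(recall_score(y_te, pred), 3))
      print("precision:", round(precision_score(y_te, pred), 3))
      print("F1-score :", round(f1_score(y_te, pred), 3))
      print("ROC AUC  :", round(roc_auc_score(y_te, proba), 3))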
  13. By: Thomas Philippon
    Abstract: The cost of financial intermediation has declined in recent years thanks to technology and increased competition in some parts of the finance industry. I document this fact and I analyze two features of new financial technologies that have stirred controversy: returns to scale and the use of big data and machine learning. I argue that the nature of fixed versus variable costs in robo-advising is likely to democratize access to financial services. Big data is likely to reduce the impact of negative prejudice in the credit market but it could reduce the effectiveness of existing policies aimed at protecting minorities.
    Keywords: fintech, discrimination, robo advising, credit scoring, big data, machine learning
    JEL: E2 G2 N2
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:bis:biswps:841&r=all
  14. By: Cockx, Bart (Ghent University); Lechner, Michael (University of St. Gallen); Bollens, Joost (VDAB, Belgium)
    Abstract: We investigate heterogeneous employment effects of Flemish training programmes. Based on administrative individual data, we analyse programme effects at various aggregation levels using Modified Causal Forests (MCF), a causal machine learning estimator for multiple programmes. While all programmes have positive effects after the lock-in period, we find substantial heterogeneity across programmes and types of unemployed. Simulations show that assigning the unemployed to programmes that maximise individual gains, as identified in our estimation, can considerably improve effectiveness. Simplified rules, such as one giving priority to unemployed people with low employability, mostly recent migrants, achieve about half of the gains obtained by more sophisticated rules.
    Keywords: policy evaluation, active labour market policy, causal machine learning, modified causal forest, conditional average treatment effects
    JEL: J68
    Date: 2019–12
    URL: http://d.repec.org/n?u=RePEc:iza:izadps:dp12875&r=all
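    A generic illustration of the assignment exercise, using separate random-forest outcome models per programme (a simple T-learner) instead of the paper's Modified Causal Forest; the programmes, covariates, and data are hypothetical:
      # Sketch: programme-specific outcome models, individualised gains versus
      # no programme, and assignment of each person to the best programme.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      rng = np.random.default_rng(4)
      n = 6000
      X = np.column_stack([rng.integers(0, 2, n),       # recent migrant (0/1)
                           rng.normal(0, 1, n)])        # employability score
      programme = rng.integers(0, 3, n)                 # 0 = none, 1/2 = training types
      base = 0.3 + 0.1 * X[:, 1]
      effect = np.where(programme == 1, 0.10 - 0.05 * X[:, 1],
               np.where(programme == 2, 0.05 + 0.08 * X[:, 0], 0.0))
      employed = (rng.random(n) < np.clip(base + effect, 0, 1)).astype(float)

      models = {}
      for d in (0, 1, 2):
          m = RandomForestRegressor(n_estimators=200, random_state=0)
          m.fit(X[programme == d], employed[programme == d])
          models[d] = m

      gains = np.column_stack([models[d].predict(X) - models[0].predict(X) for d in (1, 2)])
      best = np.argmax(np.column_stack([np.zeros(n), gains]), axis=1)
      print("share assigned to none / type 1 / type 2:", np.bincount(best, minlength=3) / n)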
  15. By: Calimani, Susanna; Hałaj, Grzegorz; Żochowski, Dawid
    Abstract: We develop an agent-based model of traditional banks and asset managers to investigate the contagion risk related to fire sales and balance sheet interactions. We take a structural approach to price formation in fire sales, as in Bluhm et al. (2014), and introduce a market clearing mechanism with endogenous formation of asset prices. We find that, first, banks which are active in both the interbank and securities markets may channel financial distress between the two markets. Second, while higher bank capital requirements decrease default risk and funding costs, they also make it more profitable to invest in less-liquid assets financed by interbank borrowing. Third, asset managers absorb small liquidity shocks, but they exacerbate contagion when their voluntary liquid buffers are fully utilised. Fourth, a system with larger and more interconnected agents is more prone to contagion risk stemming from funding shocks. JEL Classification: C6, G21, G23
    Keywords: agent-based model, asset managers, contagion, fire sales, systemic risk
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:ecb:ecbwps:20202373&r=all
  16. By: Nguyen Chau, Trinh; Scrimgeour, Frank
    Abstract: This analysis investigates the economic impacts of climate change on Vietnamese agriculture. The Ricardian approach is applied to ten-year panel data using the Hsiao two-step method. Estimates of the Ricardian model suggest heterogeneous impacts of climate change. Rising temperature is especially harmful to the Northern Central and Southern regions. A shortage of rainfall in spring causes losses only to the Central Highlands and the Northern region. Rising summer precipitation is extremely harmful. Increases in precipitation help to harness the benefit of rising autumn temperature. The simulation indicates net agricultural surpluses in the long run, with the Central Highlands being an exception.
    Keywords: Agribusiness, Environmental Economics and Policy
    Date: 2019–08–29
    URL: http://d.repec.org/n?u=RePEc:ags:nzar19:302100&r=all
  17. By: Zheng, Hannan; Schwenkler, Gustavo
    Abstract: We show that the news is a rich source of data on distressed firm links that drive firm-level and aggregate risks. The news tends to report about links in which a less popular firm is distressed and may contaminate a more popular firm. This constitutes a contagion channel that yields predictable returns and downgrades. Shocks to the degree of news-implied firm connectivity predict increases in aggregate volatilities, credit spreads, and default rates, and declines in output. To obtain our results, we propose a machine learning methodology that takes text data as input and outputs a data-implied firm network. JEL Classification: E32, E44, L11, G10, C82
    Keywords: contagion, machine learning, natural language processing, networks, predictability, risk measurement
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:srk:srkwps:2020108&r=all
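    A toy Python sketch of turning headlines into a firm network: firms mentioned in the same headline are linked, and edge weights count co-mentions. This simple co-mention rule stands in for the paper's machine learning pipeline, and the firm names and headlines are invented:
      # Sketch: firm co-mention network from news headlines (illustrative only).
      import itertools
      import networkx as nx

      firms = {"Acme Corp", "Beta Air", "Gamma Bank", "Delta Retail"}
      headlines = [
          "Acme Corp warns suppliers after Beta Air bankruptcy filing",
          "Gamma Bank cuts credit line to Beta Air",
          "Delta Retail unaffected by Acme Corp recall",
      ]

      G = nx.Graph()
      G.add_nodes_from(firms)
      for h in headlines:
          mentioned = [f for f in firms if f in h]
          for u, v in itertools.combinations(mentioned, 2):
              w = G.get_edge_data(u, v, default={"weight": 0})["weight"]
              G.add_edge(u, v, weight=w + 1)   # accumulate co-mention counts

      print(sorted(G.edges(data="weight")))
      print("degree centrality:", nx.degree_centrality(G))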
  18. By: Thomas G. McGuire; Anna L. Zink; Sherri Rose
    Abstract: Risk-adjustment systems used to pay health plans in individual health insurance markets have evolved towards better “fit” of payments to plan spending, at the individual and group levels, generally achieved by adding variables used for risk adjustment. Adding variables demands further plan and provider-supplied data. Some data called for in the more complex systems may be easily manipulated by providers, leading to unintended “upcoding” or to unnecessary service utilization. While these drawbacks are recognized, they are hard to quantify and are difficult to balance against the concrete, measurable improvements in fit that may be attained by adding variables to the formula. This paper takes a different approach to improving the performance of health plan payment systems. Using the HHS-HHC V0519 model of plan payment in the Marketplaces as a starting point, we constrain fit at the individual and group level to be as good or better than the current payment model while reducing the number of variables called for in the model. Opportunities for simplification are created by the introduction of three elements in design of plan payment: reinsurance (based on high spending or plan losses), constrained regressions, and powerful machine learning methods for variable selection. We first drop all variables relying on drug claims. Further major reductions in the number of diagnostic-based risk adjustors are possible using machine learning integrated with our constrained regressions. The fit performance of our simpler alternatives is as good or better than the current HHS-HHC V0519 formula.
    JEL: I11 I13 I18
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:26736&r=all
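    A simplified Python sketch of the variable-selection step: a LASSO selects a subset of candidate diagnostic flags and a plain regression is refit on them. The paper's reinsurance features and fit constraints are not reproduced, and the data are simulated rather than plan claims data:
      # Sketch: LASSO-based selection of diagnostic indicators, then an OLS
      # refit on the kept variables (illustrative simplification).
      import numpy as np
      from sklearn.linear_model import LassoCV, LinearRegression

      rng = np.random.default_rng(5)
      n, p = 3000, 60                      # 60 candidate diagnostic flags
      X = (rng.random((n, p)) < 0.1).astype(float)
      beta = np.zeros(p)
      beta[:8] = rng.uniform(2000.0, 12000.0, 8)   # only 8 flags really matter
      spending = 2500.0 + X @ beta + rng.normal(0.0, 4000.0, n)

      lasso = LassoCV(cv=5, random_state=0).fit(X, spending)
      selected = np.flatnonzero(np.abs(lasso.coef_) > 1e-8)
      refit = LinearRegression().fit(X[:, selected], spending)

      print("variables kept:", len(selected), "of", p)
      print("R^2 of the simplified formula:", round(refit.score(X[:, selected], spending), 3))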
  19. By: Roos, Ernst (Tilburg University, School of Economics and Management); den Hertog, Dick (Tilburg University, School of Economics and Management)
    Abstract: Although Robust Optimization is a powerful technique for dealing with uncertainty in optimization, its solutions can be too conservative, leading to an objective value much worse than the nominal solution or even to infeasibility of the robust problem. In practice, this can lead to robust solutions being disregarded in favor of the nominal solution. This conservatism is caused both by the constraint-wise approach of Robust Optimization and by its core assumption that all constraints are hard for all scenarios in the uncertainty set. This paper seeks to alleviate this conservatism by proposing an alternative robust formulation that condenses all uncertainty into a single constraint, bounding the worst-case expected violation of the original constraints from above. Using recent results in distributionally robust optimization, the proposed formulation is shown to be tractable for both right- and left-hand side uncertainty. A computational study is performed with problems from the NETLIB library. For some problems, the standard robust solution increases the objective value relative to the nominal solution by four times the percentage of uncertainty, whereas we find solutions that safeguard against over half of the violation while reflecting only a tenth of the uncertainty in the objective value. For problems with an infeasible standard robust counterpart, the suggested approach is still applicable and finds solutions that safeguard against most of the uncertainty at a low price in terms of objective value.
    Date: 2019
    URL: http://d.repec.org/n?u=RePEc:tiu:tiutis:ad0238cd-de7a-4366-b487-be577584669e&r=all
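    A toy, scenario-based version of the central idea: the expected violation across a finite scenario set is bounded by a single budget constraint, which simplifies the paper's worst-case expected violation over an ambiguity set; the data are made up:
      # Sketch: instead of making every uncertain constraint hard, one linear
      # constraint bounds the expected violation across scenarios.
      import numpy as np
      from scipy.optimize import linprog

      rng = np.random.default_rng(6)
      c = np.array([-3.0, -2.0])                  # maximise 3*x1 + 2*x2
      a_nominal = np.array([1.0, 1.0])
      b_nominal = 10.0
      scenarios = a_nominal + 0.3 * rng.standard_normal((20, 2))  # uncertain rows
      probs = np.full(20, 1.0 / 20)
      budget = 0.25                               # cap on expected violation

      S, nvar = scenarios.shape[0], c.size
      # decision vector z = [x (nvar), v (S violation variables)]
      A_ub = np.zeros((S + 1, nvar + S))
      b_ub = np.zeros(S + 1)
      A_ub[:S, :nvar] = scenarios                 # a_s' x - v_s <= b
      A_ub[:S, nvar:] = -np.eye(S)
      b_ub[:S] = b_nominal
      A_ub[S, nvar:] = probs                      # sum_s p_s * v_s <= budget
      b_ub[S] = budget

      res = linprog(np.concatenate([c, np.zeros(S)]), A_ub=A_ub, b_ub=b_ub,
                    bounds=[(0, None)] * (nvar + S))
      print("x* =", np.round(res.x[:nvar], 3), " objective =", round(-res.fun, 3))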
  20. By: Francesco Giavazzi; Felix Iglhaut; Giacomo Lemoli; Gaia Rubera
    Abstract: We study the role of perceived threats from cultural diversity induced by terrorist attacks and a salient criminal event on public discourse and voters’ support for far-right parties. We first develop a rule which allocates Twitter users in Germany to electoral districts and then use a machine learning method to compute measures of textual similarity between the tweets they produce and tweets by accounts of the main German parties. Using the dates of the aforementioned exogenous events we estimate constituency-level shifts in similarity to party language. We find that following these events Twitter text becomes on average more similar to that of the main far-right party, AfD, while the opposite happens for some of the other parties. Regressing estimated shifts in similarity on changes in vote shares between federal elections we find a significant association. Our results point to the role of perceived threats on the success of nationalist parties.
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:igi:igierp:659&r=all
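    The text-similarity measure can be illustrated with TF-IDF vectors and cosine similarity; the party and user texts below are invented placeholders, and the paper's mapping of users to electoral districts is not reproduced:
      # Sketch: similarity of user text to each party's language via TF-IDF.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      party_tweets = {
          "party_A": "secure borders protect our culture stop immigration now",
          "party_B": "invest in schools renewable energy and social housing",
      }
      user_texts = [
          "we must protect our culture and secure the borders",
          "more funding for schools and renewable energy please",
      ]

      vec = TfidfVectorizer()
      M = vec.fit_transform(list(party_tweets.values()) + user_texts)
      party_vecs, user_vecs = M[: len(party_tweets)], M[len(party_tweets):]

      sims = cosine_similarity(user_vecs, party_vecs)
      for text, row in zip(user_texts, sims):
          scores = {p: round(s, 3) for p, s in zip(party_tweets, row)}
          print(scores, "<-", text)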
  21. By: Momin M. Malik
    Abstract: "All models are wrong, but some are useful", wrote George E. P. Box (1979). Machine learning has focused on the usefulness of probability models for prediction in social systems, but is only now coming to grips with the ways in which these models are wrong---and the consequences of those shortcomings. This paper attempts a comprehensive, structured overview of the specific conceptual, procedural, and statistical limitations of models in machine learning when applied to society. Machine learning modelers themselves can use the described hierarchy to identify possible failure points and think through how to address them, and consumers of machine learning models can know what to question when confronted with the decision about if, where, and how to apply machine learning. The limitations go from commitments inherent in quantification itself, through to showing how unmodeled dependencies can lead to cross-validation being overly optimistic as a way of assessing model performance.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.05193&r=all
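    One concrete limitation from this hierarchy, optimistic cross-validation under unmodeled dependence, can be demonstrated with simulated clustered data; the sketch compares naive k-fold with group-aware cross-validation:
      # Sketch: with unmodeled within-cluster dependence, naive k-fold CV
      # overstates performance relative to grouped CV. Simulated data only.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.model_selection import KFold, GroupKFold, cross_val_score

      rng = np.random.default_rng(7)
      n_groups, per_group = 40, 25
      groups = np.repeat(np.arange(n_groups), per_group)
      cluster_X = rng.normal(size=(n_groups, 5))          # cluster-level features
      cluster_noise = rng.normal(0.0, 1.0, n_groups)      # unmodeled cluster shock
      X = cluster_X[groups] + rng.normal(0.0, 0.05, (n_groups * per_group, 5))
      y = cluster_X[groups, 0] + cluster_noise[groups] + rng.normal(0.0, 0.3, len(groups))

      model = RandomForestRegressor(n_estimators=200, random_state=0)
      naive = cross_val_score(model, X, y, scoring="r2",
                              cv=KFold(n_splits=5, shuffle=True, random_state=0))
      grouped = cross_val_score(model, X, y, scoring="r2",
                                cv=GroupKFold(n_splits=5), groups=groups)
      print("naive k-fold R^2  :", round(naive.mean(), 3))
      print("grouped k-fold R^2:", round(grouped.mean(), 3))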
  22. By: Yun Bai; Xixi Li; Hao Yu; Suling Jia
    Abstract: Crude oil price forecasting has attracted substantial attention in the field of forecasting. Recently, research on text-based crude oil price forecasting has advanced. To improve accuracy, some studies have added as many covariates as possible, both textual and nontextual, to their models, leading to unnecessary human intervention and computational costs. Moreover, some methods are designed only for crude oil forecasting and cannot be easily transferred to forecasting other, similar futures commodities. In contrast, this article proposes a text-based forecasting framework for futures commodities that uses only futures news headlines obtained from Investing.com to forecast crude oil prices. Two market indexes, a sentiment index and a topic intensity index, are extracted from these news headlines. Considering that public sentiment changes over time, the time factor is innovatively applied to the construction of the sentiment index. Taking the short length of the news headlines into consideration, a short-text topic model called SeaNMF is used to calculate the topic intensity of the futures market more accurately. Two methods, VAR and RFE, are used for lag-order selection and feature selection, respectively, at the model construction stage. The experimental results show that the Ada-text model outperforms the Adaboost.RT baseline model and the other benchmarks.
    Date: 2020–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.02010&r=all
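    A sketch of the feature-selection step using scikit-learn's recursive feature elimination (RFE) wrapped around an AdaBoost regressor; scikit-learn implements AdaBoost.R2 rather than Adaboost.RT, so this is only a stand-in, and the features are simulated proxies for the sentiment and topic-intensity indexes:
      # Sketch: RFE around an AdaBoost regressor to keep the most useful
      # text/market features (illustrative; not the paper's pipeline).
      import numpy as np
      from sklearn.ensemble import AdaBoostRegressor
      from sklearn.feature_selection import RFE

      rng = np.random.default_rng(8)
      n, p = 600, 12                           # 12 candidate text/market features
      X = rng.normal(size=(n, p))
      price_change = 0.8 * X[:, 0] - 0.5 * X[:, 3] + rng.normal(0, 0.5, n)

      selector = RFE(AdaBoostRegressor(n_estimators=100, random_state=0),
                     n_features_to_select=4)
      selector.fit(X, price_change)
      print("selected feature indices:", np.flatnonzero(selector.support_))
      print("feature ranking:", selector.ranking_)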
  23. By: Yadipur , Mahdi; Daglish, Toby; Saglam, Yigit
    Abstract: In this paper, we examine real-option land-use switching problems. We evaluate a situation where farmers can switch between different crops in response to crop price changes. We contribute to the literature in several respects. First, in conventional models farmers can always choose to convert between different land uses at a cost. We extend the existing literature by adding a time cost as well as a cash cost. While pure cash-cost models may be appropriate when farmers switch between farming sheep and cows, in some activities, such as planting, this is not necessarily true, since there is a waiting period before new plantings yield output. Farmers in our example face not only a cash cost but also a prolonged period of no production until the new crops reach maturity. Second, given the long conversion periods, our model allows farmers to reverse their switches. Given the complexity of the decision, and thus of the model, we employ computational methods to investigate how various factors enter into the decision-making process.
    Keywords: Land Economics/Use
    Date: 2019–08–29
    URL: http://d.repec.org/n?u=RePEc:ags:nzar19:302101&r=all
  24. By: Francesco Decarolis (Bocconi University); Cristina Giorgiantonio (Bank of Italy)
    Abstract: This paper contributes to the analysis of quantitative indicators (i.e., red flags or screens) to detect corruption in public procurement. Expanding the set of commonly discussed indicators in the literature to new ones derived from the operating practices of police forces and the judiciary, this paper verifies the presence of these red flags in a sample of Italian awarding procedures for roadwork contracts in the period 2009-2015. Then, it validates the efficacy of the indicators through measures of direct corruption risks (judiciary cases and police investigations for corruption-related crimes) and indirect corruption risks (delays and cost overruns). From a policy perspective, our analysis shows that the most effective red flags in detecting corruption risks are those related to discretionary mechanisms for selecting private contractors (such as the most economically advantageous offer or negotiated procedures), compliance with the minimum time limit for the submission of tenders and subcontracting. Moreover, our analysis suggests that greater standardization in the call for tender documents can contribute to reducing corruption risks. From a methodological point of view, the paper highlights the relevance of prediction approaches based on machine learning methods (especially the random forests algorithm) for validating a large set of indicators.
    Keywords: public procurement, corruption, red flags
    JEL: D44 D47 H57 R42
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:bdi:opques:qef_544_20&r=all
  25. By: Brieland, Stephanie; Töpfer, Marina
    Abstract: This paper analyses gender differences in pay at the mean as well as along the wage distribution. Using data from the German Socio-Economic Panel, we estimate the adjusted gender pay gap applying a machine learning method (the post-double-LASSO procedure). Comparing results from this method to conventional models in the literature, we find that the size of the adjusted pay gap differs substantially depending on the approach used. The main reason is that the machine learning approach selects numerous interactions and second-order polynomials as well as different sets of covariates at various points of the wage distribution. This insight suggests that more flexible specifications are needed to estimate gender differences in pay more appropriately. We further show that the estimates of all models are robust to remaining selection on unobservables.
    Keywords: Gender pay gap, machine learning, selection on unobservables
    JEL: J7 J16 J31
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:zbw:faulre:111&r=all
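    The post-double-LASSO idea can be sketched in a few lines: select controls that predict the wage, select controls that predict the gender indicator, then regress the wage on gender plus the union of the selected controls. The data below are simulated, not the German Socio-Economic Panel:
      # Sketch of post-double-selection LASSO for an adjusted gender pay gap.
      import numpy as np
      from sklearn.linear_model import LassoCV, LinearRegression

      rng = np.random.default_rng(9)
      n, p = 3000, 80
      W = rng.normal(size=(n, p))              # controls, incl. interactions etc.
      female = (rng.random(n) < 1 / (1 + np.exp(-(W[:, 0] - W[:, 1])))).astype(float)
      log_wage = (2.5 - 0.15 * female + 0.3 * W[:, 0] + 0.2 * W[:, 2]
                  + rng.normal(0, 0.4, n))

      lasso_y = LassoCV(cv=5, random_state=0).fit(W, log_wage)   # wage equation
      lasso_d = LassoCV(cv=5, random_state=0).fit(W, female)     # gender equation
      controls = np.union1d(np.flatnonzero(np.abs(lasso_y.coef_) > 1e-8),
                            np.flatnonzero(np.abs(lasso_d.coef_) > 1e-8))

      Z = np.column_stack([female, W[:, controls]])
      fit = LinearRegression().fit(Z, log_wage)
      print("adjusted gender pay gap (log points):", round(fit.coef_[0], 3))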
  26. By: Heigle, Julia; Pfeiffer, Friedhelm
    Abstract: To the best of our knowledge, this is the first study for Germany to assess the long-term impacts of studying without graduating on three labour market outcomes (working hours, wages, and occupational prestige) and on overall life satisfaction, on the basis of a sample of employed individuals from the Socio-Economic Panel (SOEP) who possess a university entrance qualification. The impact is analyzed relative to individuals who have never been enrolled in university study (the baseline group) and to individuals who have attained a university degree. The impacts are assessed by means of a double machine learning procedure that accounts for selection into the three educational paths and generates the counterfactual outcomes for the different paths. The findings indicate an average impact of studying without graduating of plus 5 percentage points on occupational prestige and minus 2.8 percentage points on life satisfaction relative to the baseline group. The estimates for wages and working hours are not significant. The effects of graduating on all outcomes are positive and substantial relative to studying without graduating or not studying at all.
    Keywords: Labour market, human capital research, study success, studying without a degree (Arbeitsmarkt, Humankapitalforschung, Studienerfolg, Studium ohne Abschluss)
    JEL: I21 I28 J31
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:zbw:zewdip:20004&r=all

General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.