nep-big New Economics Papers
on Big Data
Issue of 2022‒03‒07
33 papers chosen by
Tom Coupé
University of Canterbury

  1. Using satellites and artificial intelligence to measure health and material-living standards in India By Daoud, Adel; Jordan, Felipe; Sharma, Makkunda; Johansson, Fredrik; Dubhashi, Devdatt; Paul, Sourabh; Banerjee, Subhashis
  2. Predicting The Stock Trend Using News Sentiment Analysis and Technical Indicators in Spark By Taylan Kabbani; Fatih Enes Usta
  3. Estimation of the Farm-Level Yield-Weather-Relation Using Machine Learning By Schmidt, Lorenz; Odening, Martin; Schlanstein, Johann; Ritter, Matthias
  4. Machine-Learning the Skill of Mutual Fund Managers By Ron Kaniel; Zihan Lin; Markus Pelger; Stijn Van Nieuwerburgh
  5. The Rise of Machine Learning in the Academic Social Sciences By Rahal, Charles; Verhagen, Mark D.; Kirk, David
  6. Deep Learning of Potential Outcomes By Koch, Bernard; Sainburg, Tim; Geraldo, Pablo; JIANG, SONG; Sun, Yizhou; Foster, Jacob G.
  7. Estimation of Conditional Random Coefficient Models using Machine Learning Techniques By Stephan Martin
  8. Building a predictive machine learning model of gentrification in Sydney By Thackway, William; Ng, Matthew Kok Ming; Lee, Chyi Lin; Pettit, Christopher
  9. Determinants of Regional Raw Milk Prices in Russia By Kresova, Svetlana; Hess, Sebastian
  10. Identifying and Improving Functional Form Complexity: A Machine Learning Framework By Verhagen, Mark D.
  11. Do not rug on me: Zero-dimensional Scam Detection By Bruno Mazorra; Victor Adan; Vanesa Daza
  12. LEARNING IN RANDOM UTILITY MODELS VIA ONLINE DECISION PROBLEMS By Emerson Melo
  13. Human-centered mechanism design with Democratic AI By Raphael Koster; Jan Balaguer; Andrea Tacchetti; Ari Weinstein; Tina Zhu; Oliver Hauser; Duncan Williams; Lucy Campbell-Gillingham; Phoebe Thacker; Matthew Botvinick; Christopher Summerfield
  14. Third-Degree Price Discrimination in the Age of Big Data By Charlson, G.
  15. Economists in the 2008 Financial Crisis: Slow to See, Fast to Act By Daniel Levy; Tamir Mayer; Alon Raviv
  16. Predictive Algorithms in the Delivery of Public Employment Services By Körtner, John; Bonoli, Giuliano
  17. Micro-level Reserving for General Insurance Claims using a Long Short-Term Memory Network By Ihsan Chaoubi; Camille Besse; Hélène Cossette; Marie-Pier Côté
  18. Digital discretion and public administration in Africa: Implications for the use of artificial intelligence By Plantinga, Paul
  19. OECD Framework for the Classification of AI systems By OECD
  20. Deep Learning Macroeconomics By Rafael R. S. Guimaraes
  21. Sharing Behavior in Ride-hailing Trips: A Machine Learning Inference Approach By Morteza Taiebat; Elham Amini; Ming Xu
  22. Meta-Learners for Estimation of Causal Effects: Finite Sample Cross-Fit Performance By Gabriel Okasa
  23. RiskNet: Neural Risk Assessment in Networks of Unreliable Resources By Krzysztof Rusek; Piotr Boryło; Piotr Jaglarz; Fabien Geyer; Albert Cabellos; Piotr Chołda
  24. Price Revelation from Insider Trading: Evidence from Hacked Earnings News By Akey, Pat; Grégoire, Vincent; Martineau, Charles
  25. Developing urban biking typologies: quantifying the complex interactions of bicycle ridership, bicycle network and built environment characteristics By Beck, Ben; Winters, Meghan; Nelson, Trisalyn; Pettit, Christopher; Saberi, Meead; Thompson, Jason; Seneviratne, Sachith; Nice, Kerry A; Zarpelon-Leao, Simone; Stevenson, Mark
  26. A hybrid deep learning approach for purchasing strategy of carbon emission rights -- Based on Shanghai pilot market By Jiayue Xu
  27. Environmental News Emotion and Air Pollution in China By Sébastien Marchand; Damien Cubizol; Elda Nasho Ah-Pine; Huanxiu Guo
  28. Marginal Effects for Non-Linear Prediction Functions By Christian A. Scholbeck; Giuseppe Casalicchio; Christoph Molnar; Bernd Bischl; Christian Heumann
  29. A Stock Trading System for a Medium Volatile Asset using Multi Layer Perceptron By Ivan Letteri; Giuseppe Della Penna; Giovanni De Gasperis; Abeer Dyoub
  30. Toward a More Populous Online Platform: The Economic Impacts of Compensated Reviews By Peng Li; Arim Park; Soohyun Cho; Yao Zhao
  31. Simulating Using Deep Learning The World Trade Forecasting of Export-Import Exchange Rate Convergence Factor During COVID-19 By Effat Ara Easmin Lucky; Md. Mahadi Hasan Sany; Mumenunnesa Keya; Md. Moshiur Rahaman; Umme Habiba Happy; Sharun Akter Khushbu; Md. Arid Hasan
  32. Metric Hypertransformers are Universal Adapted Maps By Beatrice Acciaio; Anastasis Kratsios; Gudmund Pammer
  33. DeepScalper: A Risk-Aware Deep Reinforcement Learning Framework for Intraday Trading with Micro-level Market Embedding By Shuo Sun; Rundong Wang; Xu He; Junlei Zhu; Jian Li; Bo An

  1. By: Daoud, Adel; Jordan, Felipe; Sharma, Makkunda; Johansson, Fredrik; Dubhashi, Devdatt; Paul, Sourabh; Banerjee, Subhashis
    Abstract: Applying deep learning methods to survey human development in remote areas with satellite imagery at high temporal frequency can significantly enhance our understanding of spatial and temporal patterns in human development. Current applications have focused on predicting a narrow set of asset-based measures of human well-being within a limited group of African countries. Here, we leverage georeferenced village-level census data from across 30 percent of the landmass of India to train a deep neural network that predicts 16 variables representing material conditions from annual composites of Landsat 7 imagery. The census-based model is used as a feature extractor to train another network that predicts an even larger set of developmental variables (over 90 variables) included in two rounds of the National Family Health Survey (NFHS). The census-based model outperforms the current standard in the literature, night-time-luminosity-based models, as a feature extractor for several variables in this larger set. To extend the temporal scope of the models, we suggest a distribution-transformation procedure to estimate outcomes over time and space in India. Our procedure achieves R-squared values of 0.92 to 0.60 for 21 development outcomes, 0.59 to 0.30 for 25 outcomes, and 0.29 to 0.00 for 28 outcomes; 19 outcomes had a negative R-squared. Overall, the results show that combining satellite data with Indian Census data unlocks rich information for training deep learning models that track human development at unprecedented geographical and temporal resolution.
    Date: 2021–12–14
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:vf28g&r=
  2. By: Taylan Kabbani (Ozyegin University; Huawei Turkey R&D Center); Fatih Enes Usta (Marmara University)
    Abstract: Predicting the stock market trend has always been challenging since its movement is affected by many factors. Here, we approach the future trend prediction problem as a machine learning classification problem, creating a tomorrow_trend feature as the label to be predicted. Various features are supplied to help the model predict the label of a given day, whether it is an uptrend or a downtrend; these features are technical indicators generated from the stock's price history. In addition, since financial news plays a vital role in changing investor behavior, an overall sentiment score for a given day is created from all news released on that day and added to the model as another feature. Three machine learning models are tested in Spark (a big-data computing platform): Logistic Regression, Random Forest, and Gradient Boosting Machine. Random Forest was the best-performing model, with a 63.58% test accuracy.
    Date: 2022–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2201.12283&r=
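    A minimal sketch of such a pipeline in Spark ML (not the authors' code; the toy rows and the indicator names rsi, macd, sma_ratio and news_sentiment are placeholders standing in for the real feature table):

      from pyspark.sql import SparkSession
      from pyspark.ml import Pipeline
      from pyspark.ml.feature import VectorAssembler
      from pyspark.ml.classification import RandomForestClassifier
      from pyspark.ml.evaluation import MulticlassClassificationEvaluator

      spark = SparkSession.builder.appName("trend-prediction").getOrCreate()
      # Toy stand-in: one row per trading day with technical indicators,
      # a daily news-sentiment score, and the tomorrow_trend label.
      rows = [(0.3, 1.2, 0.9, 0.1, 1.0), (0.7, -0.4, 1.1, -0.2, 0.0)] * 50
      cols = ["rsi", "macd", "sma_ratio", "news_sentiment", "tomorrow_trend"]
      df = spark.createDataFrame(rows, cols)

      assembler = VectorAssembler(inputCols=cols[:-1], outputCol="features")
      rf = RandomForestClassifier(labelCol="tomorrow_trend", numTrees=200, seed=42)
      train, test = df.randomSplit([0.8, 0.2], seed=42)
      model = Pipeline(stages=[assembler, rf]).fit(train)

      evaluator = MulticlassClassificationEvaluator(labelCol="tomorrow_trend",
                                                    metricName="accuracy")
      print("test accuracy:", evaluator.evaluate(model.transform(test)))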
  3. By: Schmidt, Lorenz; Odening, Martin; Schlanstein, Johann; Ritter, Matthias
    Abstract: Weather is a pivotal factor for crop production as it is highly volatile and can hardly be controlled by farm management practices. Since there is a tendency towards increased weather extremes in the future, understanding the weather-related yield factors becomes increasingly important, not only for yield prediction but also for the design of insurance products that mitigate financial losses for farmers. In this study, an artificial neural network (ANN) is set up and calibrated to a rich set of farm-level wheat yield data in Germany covering the period from 2003 to 2018. A nonlinear regression model, which uses rainfall, temperature, and soil moisture as explanatory variables for yield deviations, serves as a benchmark. The empirical application reveals that the gain in estimation precision from machine learning techniques compared with traditional estimation approaches is quite substantial, and that the use of regionalized models and high-resolution weather data improves the performance of the ANN.
    Keywords: Production Economics, Research Methods / Statistical Methods
    Date: 2021–11–18
    URL: http://d.repec.org/n?u=RePEc:ags:gewi21:317075&r=
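    An illustrative sketch of this model-versus-benchmark comparison, on synthetic weather-yield data (the paper's ANN and its nonlinear benchmark are more elaborate; a linear model stands in for the parametric benchmark here):

      import numpy as np
      from sklearn.model_selection import train_test_split
      from sklearn.preprocessing import StandardScaler
      from sklearn.pipeline import make_pipeline
      from sklearn.neural_network import MLPRegressor
      from sklearn.linear_model import LinearRegression
      from sklearn.metrics import mean_squared_error

      rng = np.random.default_rng(0)
      # Columns stand in for rainfall, temperature, soil moisture.
      X = rng.normal(size=(1000, 3))
      y = 0.5 * X[:, 0] - 0.3 * X[:, 1] ** 2 + rng.normal(scale=0.2, size=1000)

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
      ann = make_pipeline(StandardScaler(),
                          MLPRegressor(hidden_layer_sizes=(32, 16),
                                       max_iter=2000, random_state=0)).fit(X_tr, y_tr)
      bench = LinearRegression().fit(X_tr, y_tr)
      for name, m in [("ANN", ann), ("benchmark", bench)]:
          print(name, "test MSE:", mean_squared_error(y_te, m.predict(X_te)))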
  4. By: Ron Kaniel; Zihan Lin; Markus Pelger; Stijn Van Nieuwerburgh
    Abstract: We show, using machine learning, that fund characteristics can consistently differentiate high- from low-performing mutual funds, as well as identify funds with net-of-fees abnormal returns. Fund momentum and fund flow are the most important predictors of future risk-adjusted fund performance, while characteristics of the stocks that funds hold are not predictive. Returns of predictive long-short portfolios are higher following a period of high sentiment or a good state of the macro-economy. Our estimation with neural networks enables us to uncover novel and substantial interaction effects between sentiment and both fund flow and fund momentum.
    JEL: G0 G11 G23 G5
    Date: 2022–02
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:29723&r=
  5. By: Rahal, Charles; Verhagen, Mark D.; Kirk, David (University of Oxford)
    Abstract: This short perspectives-style article explains recent trends and outlines three reasons to be even more optimistic about the future of Machine Learning in the academic Social Sciences.
    Date: 2021–10–01
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:gydve&r=
  6. By: Koch, Bernard; Sainburg, Tim; Geraldo, Pablo; JIANG, SONG; Sun, Yizhou; Foster, Jacob G.
    Abstract: This review systematizes the emerging literature on causal inference using deep neural networks under the potential outcomes framework. It provides an intuitive introduction to how deep learning can be used to estimate/predict heterogeneous treatment effects and to extend causal inference to settings where confounding is non-linear, time-varying, or encoded in text, networks, and images. To maximize accessibility, we also introduce prerequisite concepts from causal inference and deep learning. The survey differs from other treatments of deep learning and causal inference in its sharp focus on observational causal estimation, its extended exposition of key algorithms, and its detailed tutorials for implementing, training, and selecting among deep estimators in TensorFlow 2, available at github.com/kochbj/Deep-Learning-for-Causal-Inference.
    Date: 2021–10–10
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:aeszf&r=
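    A toy example of one architecture family commonly discussed in this literature (a TARNet-style network with a shared representation and separate heads for the treated and control potential outcomes); this is an illustrative sketch on synthetic data, not the review's own tutorial code:

      import numpy as np
      import tensorflow as tf

      n, d = 2000, 10
      rng = np.random.default_rng(0)
      X = rng.normal(size=(n, d)).astype("float32")
      T = rng.binomial(1, 0.5, size=n).astype("float32")
      Y = (X[:, 0] + T * (1.0 + X[:, 1])               # true ATE is 1.0
           + rng.normal(scale=0.1, size=n)).astype("float32")

      x_in = tf.keras.Input(shape=(d,))
      phi = tf.keras.layers.Dense(32, activation="relu")(x_in)  # shared representation
      mu0 = tf.keras.layers.Dense(1, name="mu0")(phi)           # control head
      mu1 = tf.keras.layers.Dense(1, name="mu1")(phi)           # treated head
      model = tf.keras.Model(x_in, [mu0, mu1])

      opt = tf.keras.optimizers.Adam(0.01)
      for _ in range(300):
          with tf.GradientTape() as tape:
              m0, m1 = model(X, training=True)
              pred = tf.where(T[:, None] > 0, m1, m0)  # train on factual outcomes only
              loss = tf.reduce_mean((Y[:, None] - pred) ** 2)
          opt.apply_gradients(zip(tape.gradient(loss, model.trainable_variables),
                                  model.trainable_variables))

      m0, m1 = model(X)
      print("estimated ATE:", float(tf.reduce_mean(m1 - m0)))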
  7. By: Stephan Martin
    Abstract: Nonparametric random coefficient (RC) density estimation has mostly been considered in the marginal density case, under strict independence of RCs and covariates. This paper deals with the estimation of RC densities conditional on a (large-dimensional) set of control variables using machine learning techniques. The conditional RC density makes it possible to disentangle observable from unobservable heterogeneity in partial effects of continuous treatments, adding to a growing literature on heterogeneous effect estimation using machine learning. This paper proposes a two-stage sieve estimation procedure. First, a closed-form sieve approximation of the conditional RC density is derived, where each sieve coefficient can be expressed as a conditional expectation function varying with the controls. Second, the sieve coefficients are estimated with generic machine learning procedures under appropriate sample-splitting rules. The $L_2$-convergence rate of the conditional RC density estimator is derived. The rate is slower by a factor than the typical rates of mean-regression machine learning estimators, which is due to the ill-posedness of the RC density estimation problem. The performance and applicability of the estimator are illustrated using random forest algorithms over a range of Monte Carlo simulations and with real data from the SOEP-IS, where behavioral heterogeneity in an economic experiment on portfolio choice is studied. The method reveals two types of behavior in the population, one complying with economic theory and one not. The assignment to types appears largely based on unobservables not available in the data.
    Date: 2022–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2201.08366&r=
  8. By: Thackway, William; Ng, Matthew Kok Ming (University of New South Wales); Lee, Chyi Lin; Pettit, Christopher
    Abstract: In an era of rapid urbanisation and increasing wealth, gentrification is an urban phenomenon impacting many cities around the world. The ability of policymakers and planners to better understand and address gentrification-induced displacement hinges upon proactive intervention strategies. It is in this context that we build a tree-based machine learning (ML) model to predict neighbourhood change in Sydney. Change, in this context, is proxied by the Socioeconomic Index for Advantage and Disadvantage, in addition to census and other ancillary predictors. Our models predict gentrification from 2011 to 2016 with a balanced accuracy of 74.7%. Additionally, the use of an additive explanation tool enables individual prediction explanations and advanced feature-contribution analysis. Using the ML model, we predict future gentrification in Sydney up to 2021. The predictions confirm that gentrification is expanding outwards from the city centre. A spill-over effect is predicted to the south, west and north-west of former gentrifying hotspots. The findings are expected to provide policymakers with a tool to better forecast where gentrification is likely to occur. This foresight can then inform suitable policy interventions and responses in planning for more equitable city outcomes, specifically for vulnerable communities impacted by gentrification and neighbourhood change.
    Date: 2021–12–16
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:hkc96&r=
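    A compact sketch of this kind of workflow, a tree-based classifier scored by balanced accuracy plus an additive (SHAP) explanation step; the synthetic features below merely stand in for the census predictors:

      import numpy as np
      import shap  # additive-explanation tool for per-prediction contributions
      from sklearn.ensemble import GradientBoostingClassifier
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import balanced_accuracy_score

      rng = np.random.default_rng(1)
      X = rng.normal(size=(3000, 6))   # placeholders for census/ancillary predictors
      y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=3000) > 1).astype(int)

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)
      clf = GradientBoostingClassifier(random_state=1).fit(X_tr, y_tr)
      print("balanced accuracy:", balanced_accuracy_score(y_te, clf.predict(X_te)))

      explainer = shap.TreeExplainer(clf)
      shap_values = explainer.shap_values(X_te[:5])
      print(shap_values)  # one additive contribution per feature and observation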
  9. By: Kresova, Svetlana; Hess, Sebastian
    Abstract: Drivers of regional milk price differences across Russian regions are difficult to determine due to limited data availability and restrictions on data collection. In this study, official data from Russian regions for the period from 2013 to 2018 were analysed based on 18 predictor variables in order to explain the regional raw milk price. Because various data-related restrictions limited the use of conventional panel regression models, the analysis was performed with a Random Forest (RF) machine learning algorithm. Model training and hyperparameter optimization were performed on the training data set with time-fold cross-validation. The findings of the study showed that the RF algorithm has good predictive performance on the test data set even with the default RF values. Finally, the RF variable importance showed that income, gross regional product, livestock density, and milk yield are the four most important variables for explaining the variation in regional milk prices.
    Keywords: Agribusiness, International Development, Livestock Production/Industries
    Date: 2021–11–18
    URL: http://d.repec.org/n?u=RePEc:ags:gewi21:317051&r=
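    A sketch of this workflow under assumed variable names (a synthetic panel stands in for the Russian regional data; the real study uses 18 predictors, four of which are named here):

      import numpy as np
      import pandas as pd
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.model_selection import TimeSeriesSplit, cross_val_score

      rng = np.random.default_rng(0)
      n = 600  # synthetic region-year observations, assumed sorted by time
      df = pd.DataFrame({
          "income": rng.normal(30, 5, n),
          "grp": rng.normal(100, 20, n),
          "livestock_density": rng.normal(10, 2, n),
          "milk_yield": rng.normal(5, 1, n),
      })
      df["raw_milk_price"] = (0.3 * df["income"] + 0.1 * df["milk_yield"] ** 2
                              + rng.normal(0, 2, n))

      X, y = df.drop(columns="raw_milk_price"), df["raw_milk_price"]
      rf = RandomForestRegressor(n_estimators=500, random_state=0)
      scores = cross_val_score(rf, X, y, cv=TimeSeriesSplit(n_splits=5), scoring="r2")
      print("time-fold CV R^2:", scores.mean())

      rf.fit(X, y)  # impurity-based variable importance, largest first
      for name, imp in sorted(zip(X.columns, rf.feature_importances_),
                              key=lambda t: -t[1]):
          print(f"{name}: {imp:.3f}")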
  10. By: Verhagen, Mark D.
    Abstract: 'All models are wrong, but some are useful' is an often-used mantra, particularly when a model's ability to capture the full complexities of social life is questioned. However, an appropriate functional form is key to valid statistical inference, and under-estimating complexity can lead to biased results. Unfortunately, it is unclear a priori what the appropriate complexity of a functional form should be. I propose to use methods from machine learning to identify the appropriate complexity of the functional form by i) generating an estimate of the fit potential of the outcome given a set of explanatory variables, ii) comparing this potential with the fit from the functional form originally hypothesized by the researcher, and iii) in case a lack of fit is identified, using recent advances in the field of explainable AI to generate understanding of the missing complexity. I illustrate the approach with a range of simulation and real-world examples.
    Date: 2021–12–01
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:bka76&r=
  11. By: Bruno Mazorra; Victor Adan; Vanesa Daza
    Abstract: Uniswap, like other DEXs, has gained much attention this year because it is a non-custodial and publicly verifiable exchange that allows users to trade digital assets without trusted third parties. However, its simplicity and lack of regulation also make it easy to execute initial coin offering scams by listing non-valuable tokens. This method of performing scams is known as a rug pull, a phenomenon that already existed in traditional finance but has become more relevant in DeFi. Various projects such as [34,37] have contributed to detecting rug pulls in EVM-compatible chains. However, the first longitudinal and academic step toward detecting and characterizing scam tokens on Uniswap was made in [44]. The authors collected all the transactions related to the Uniswap V2 exchange and proposed a machine learning algorithm to label tokens as scams. However, the algorithm is only valuable for detecting scams accurately after they have been executed. This paper increases their data set by 20K tokens and proposes a new methodology to label tokens as scams. After manually analyzing the data, we devised a theoretical classification of different malicious maneuvers in the Uniswap protocol. We propose various machine-learning-based algorithms with new relevant features related to token propagation and smart contract heuristics to detect potential rug pulls before they occur. In general, the models proposed achieved similar results. The best model obtained an accuracy of 0.9936, recall of 0.9540, and precision of 0.9838 in distinguishing non-malicious tokens from scams prior to the malicious maneuver.
    Date: 2022–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2201.07220&r=
  12. By: Emerson Melo (Indiana University, Bloomington)
    Abstract: This paper studies the Random Utility Model (RUM) in environments where the decision maker is imperfectly informed about the payoffs associated with each of the alternatives he faces. By embedding the RUM into an online decision problem, we make four contributions. First, we propose a gradient-based learning algorithm and show that a large class of RUMs are Hannan consistent (Hannan [1957]); that is, the average difference between the expected payoffs generated by a RUM and that of the best fixed policy in hindsight goes to zero as the number of periods increases. Second, we show that the class of Generalized Extreme Value (GEV) models can be implemented with our learning algorithm. Examples in the GEV class include the Nested Logit, Ordered, and Product Differentiation models, among many others. Third, we show that our gradient-based algorithm is the dual, in a convex analysis sense, of the Follow the Regularized Leader (FTRL) algorithm, which is widely used in the Machine Learning literature. Finally, we discuss how our approach can incorporate recency bias and be used to implement prediction markets in general environments.
    Keywords: Random utility models, Multinomial Logit Model, Generalized Nested Logit models, GEV class, Online optimization, Online learning, Hannan consistency, no-regret learning
    Date: 2021–08
    URL: http://d.repec.org/n?u=RePEc:inu:caeprp:2022003&r=
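    The logit special case of this duality is easy to simulate: FTRL with an entropic regularizer chooses alternatives with multinomial-logit probabilities, and its average regret against the best fixed alternative vanishes. A hedged toy sketch with synthetic payoffs:

      import numpy as np

      rng = np.random.default_rng(0)
      K, T, eta = 5, 5000, 0.05          # alternatives, periods, learning rate
      cum_payoff = np.zeros(K)           # cumulative payoff of each alternative
      realized = 0.0                     # learner's cumulative expected payoff

      for t in range(T):
          payoffs = rng.normal(loc=np.arange(K) / 10.0, scale=1.0)
          probs = np.exp(eta * cum_payoff)
          probs /= probs.sum()           # entropic FTRL = multinomial-logit choice
          realized += probs @ payoffs    # expected payoff of the mixed action
          cum_payoff += payoffs          # full-information payoff feedback

      # Hannan (external) regret against the best fixed alternative in hindsight
      print("average regret:", (cum_payoff.max() - realized) / T)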
  13. By: Raphael Koster; Jan Balaguer; Andrea Tacchetti; Ari Weinstein; Tina Zhu; Oliver Hauser; Duncan Williams; Lucy Campbell-Gillingham; Phoebe Thacker; Matthew Botvinick; Christopher Summerfield
    Abstract: Building artificial intelligence (AI) that aligns with human values is an unsolved problem. Here, we developed a human-in-the-loop research pipeline called Democratic AI, in which reinforcement learning is used to design a social mechanism that humans prefer by majority. A large group of humans played an online investment game that involved deciding whether to keep a monetary endowment or to share it with others for collective benefit. Shared revenue was returned to players under two different redistribution mechanisms, one designed by the AI and the other by humans. The AI discovered a mechanism that redressed initial wealth imbalance, sanctioned free riders, and successfully won the majority vote. By optimizing for human preferences, Democratic AI may be a promising method for value-aligned policy innovation.
    Date: 2022–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2201.11441&r=
  14. By: Charlson, G.
    Abstract: A platform holds information on the demographics of its users and wants to maximise total surplus. The data generates a probability over which of two products a buyer prefers, with different data segmentations being more or less informative. The platform reveals segmentations of the data to two firms, one popular and one niche, preferring to reveal no information rather than completely revealing the consumer's type for certain. The platform can improve profits by revealing to both firms a segmentation in which the niche firm is relatively popular, but still less popular than the other firm, potentially doing even better by revealing information asymmetrically. The platform has an incentive to provide more granular data in markets in which the niche firm is particularly unpopular or in which broad demographic categories are not particularly revelatory of type, suggesting that the profit associated with big-data techniques differs depending on market characteristics.
    Keywords: Strategic interaction, network games, interventions, industrial organisation, platforms, hypergraphs
    JEL: D40 L10 L40
    Date: 2021–08–19
    URL: http://d.repec.org/n?u=RePEc:cam:camjip:2104&r=
  15. By: Daniel Levy (Department of Economics, Bar-Ilan University, Israel; Department of Economics, Emory University, US; ICEA, Wilfrid Laurier University, Canada; Rimini Centre for Economic Analysis; ISET, TSU, Georgia); Tamir Mayer (Graduate School of Business Administration, Bar-Ilan University, Israel); Alon Raviv (Graduate School of Business Administration, Bar-Ilan University, Israel)
    Abstract: We study economics and finance scholars' reaction to the 2008 financial crisis using the machine learning language-analysis methods of Latent Dirichlet Allocation (LDA) and dynamic topic modelling to analyze the texts of 14,270 NBER working papers covering the 1999–2016 period. We find that academic scholars as a group were insufficiently engaged in crisis studies before 2008. As the crisis unfolded, however, they switched their focus to studying the crisis, its causes, and its consequences. Thus, the scholars were “slow-to-see,” but they were “fast-to-act.” Their initial response to the ongoing Covid-19 crisis is consistent with these conclusions.
    Keywords: Financial crisis, Economic Crisis, Great recession, NBER working papers, LDA textual analysis, Topic modeling, Dynamic Topic Modeling, Machine learning
    JEL: E32 E44 E50 F30 G01 G20
    Date: 2022–02
    URL: http://d.repec.org/n?u=RePEc:rim:rimwps:22-04&r=
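    A minimal sketch of the LDA step, applied to a handful of placeholder abstracts rather than the 14,270 NBER working papers:

      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.decomposition import LatentDirichletAllocation

      docs = [  # placeholder texts standing in for working-paper abstracts
          "bank leverage and systemic risk in the financial crisis",
          "monetary policy response to the great recession",
          "labor supply and household consumption over the business cycle",
      ]

      vec = CountVectorizer(stop_words="english")
      dtm = vec.fit_transform(docs)                 # document-term matrix
      lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)

      terms = vec.get_feature_names_out()
      for k, topic in enumerate(lda.components_):   # top words per topic
          top = [terms[i] for i in topic.argsort()[-5:][::-1]]
          print(f"topic {k}: {', '.join(top)}")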
  16. By: Körtner, John (University of Lausanne); Bonoli, Giuliano
    Abstract: With the growing availability of digital administrative data and the recent advances in machine learning, the use of predictive algorithms in the delivery of labour market policy is becoming more prevalent. In public employment services (PES), predictive algorithms are used to support the classification of jobseekers based on their risk of long-term unemployment (profiling), the selection of beneficial active labour market programs (targeting), and the matching of jobseekers to suitable job opportunities (matching). In this chapter, we offer a conceptual introduction to the applications of predictive algorithms for the different functions PES have to fulfil and review the history of their use up to the current state of the practice. In addition, we discuss two issues that are inherent to the use of predictive algorithms: algorithmic fairness concerns and the importance of considering how caseworkers will interact with algorithmic systems and make decisions based on their predictions.
    Date: 2021–12–16
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:j7r8y&r=
  17. By: Ihsan Chaoubi; Camille Besse; Hélène Cossette; Marie-Pier Côté
    Abstract: Detailed information about individual claims is completely ignored when insurance claims data are aggregated and structured in development triangles for loss reserving. In the hope of extracting predictive power from the individual claims characteristics, researchers have recently proposed to move away from these macro-level methods in favor of micro-level loss reserving approaches. We introduce a discrete-time individual reserving framework incorporating granular information in a deep learning approach based on a Long Short-Term Memory (LSTM) neural network. At each time period, the network has two tasks: first, classifying whether there is a payment or a recovery, and second, predicting the corresponding non-zero amount, if any. We illustrate the estimation procedure on a simulated and a real general insurance dataset. We compare our approach with the chain-ladder aggregate method using the predictive outstanding loss estimates and their actual values. Based on a generalized Pareto model for excess payments over a threshold, we adjust the LSTM reserve prediction to account for extreme payments.
    Date: 2022–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2201.13267&r=
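    A hedged sketch of the two-task recurrent architecture described above: at each development period the network classifies payment/no payment and predicts the amount. The shapes and feature count are assumptions, not the paper's specification:

      import tensorflow as tf

      n_periods, n_features = 12, 8     # assumed claim-history length and covariates
      inp = tf.keras.Input(shape=(n_periods, n_features))
      h = tf.keras.layers.LSTM(64, return_sequences=True)(inp)
      has_payment = tf.keras.layers.TimeDistributed(
          tf.keras.layers.Dense(1, activation="sigmoid"), name="indicator")(h)
      amount = tf.keras.layers.TimeDistributed(
          tf.keras.layers.Dense(1), name="amount")(h)

      model = tf.keras.Model(inp, [has_payment, amount])
      # Task 1: payment/recovery classification; task 2: amount regression.
      model.compile(optimizer="adam",
                    loss={"indicator": "binary_crossentropy", "amount": "mse"})
      model.summary()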
  18. By: Plantinga, Paul
    Abstract: The digitalisation of public services is implicated in fundamental changes to how civil servants make decisions and exercise discretion. Most significant has been a shift in responsibility away from ‘street-level bureaucrats’ to ‘system-level bureaucrats’; a technology-savvy community of officials, consultants and private enterprises involved in the design of information technology systems and associated rules. The relatively recent inclusion of artificial intelligence (AI) and data-driven algorithms raises new questions about the conflation of policy formulation and system development activities, but also intensifies concerns about the epistemic dependence and policy alienation of public officials. African public administrations are in an especially vulnerable position with respect to the adoption of AI, and so this chapter seeks to synthesise lessons from previous digital implementations on the continent, and considers the implications for AI use. Four broad considerations emerge from the review of literature: Integrity of recommendations provided by decision-support systems, including how they are influenced by local organisational practices and the reliability of underlying infrastructures; Inclusive decision-making that balances the (assumed) objectivity of data-driven algorithms and the influence of different stakeholder groups; Exception and accountability in how digital and AI platforms are funded, developed, implemented and used; and a Complete understanding of people and events through the integration of traditionally dispersed data sources and systems, and how policy actors seek to mitigate the risks associated with this aspiration.
    Date: 2022–01–10
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:2r98w&r=
  19. By: OECD
    Abstract: As artificial intelligence (AI) is integrated into all sectors at a rapid pace, different AI systems bring different benefits and risks. In comparing virtual assistants, self-driving vehicles and video recommendations for children, it is easy to see that the benefits and risks of each are very different. Their specificities will require different approaches to policy making and governance. To help policy makers, regulators, legislators and others characterise AI systems deployed in specific contexts, the OECD has developed a user-friendly tool to evaluate AI systems from a policy perspective. It can be applied to the widest range of AI systems across the following dimensions: People & Planet; Economic Context; Data & Input; AI Model; and Task & Output. Each of the framework's dimensions has a subset of properties and attributes to define and assess policy implications and to guide an innovative and trustworthy approach to AI as outlined in the OECD AI Principles.
    Date: 2022–02–22
    URL: http://d.repec.org/n?u=RePEc:oec:stiaab:323-en&r=
  20. By: Rafael R. S. Guimaraes
    Abstract: Limited datasets and complex nonlinear relationships are among the challenges that may emerge when applying econometrics to macroeconomic problems. This research proposes deep learning as an approach to transfer learning in the former case and to mapping relationships between variables in the latter case. Macroeconomists already apply transfer learning when they assume a given a priori distribution in a Bayesian context, estimate a structural VAR with sign restrictions, or calibrate parameters based on results observed in other models, to name a few examples; the innovation we introduce is a more systematic transfer learning strategy in applied macroeconomics. We explore the proposed strategy empirically, showing that data from different but related domains, a type of transfer learning, help identify business cycle phases where no business cycle dating committee exists and quickly estimate an economics-based output gap. Next, since deep learning methods learn representations formed by the composition of multiple non-linear transformations that yield more abstract representations, we apply deep learning to map low-frequency variables from high-frequency variables. The results obtained show the suitability of deep learning models applied to macroeconomic problems. First, the models learned to classify United States business cycles correctly. Then, applying transfer learning, they were able to identify the business cycles of out-of-sample Brazilian and European data. Along the same lines, the models learned to estimate the output gap based on U.S. data and performed well when faced with Brazilian data. Additionally, deep learning proved adequate for mapping low-frequency variables from high-frequency data in order to interpolate, distribute, and extrapolate time series using related series.
    Date: 2022–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2201.13380&r=
  21. By: Morteza Taiebat; Elham Amini; Ming Xu
    Abstract: Ride-hailing is rapidly changing urban and personal transportation. Ride sharing or pooling is important to mitigate negative externalities of ride-hailing such as increased congestion and environmental impacts. However, empirical evidence on what affects trip-level sharing behavior in ride-hailing is lacking. Using a novel dataset covering all ride-hailing trips in Chicago in 2019, we show that riders' willingness to request a shared ride monotonically decreased from 27.0% to 12.8% over the year, while trip volume and mileage remained statistically unchanged. We find that the decline in sharing preference is due to increased per-mile costs of shared trips and a shift of shorter trips to solo rides. Using ensemble machine learning models, we find that the travel impedance variables (trip cost, distance, and duration) collectively contribute 95% and 91% of the predictive power in determining whether a trip is requested to be shared and whether it is successfully shared, respectively. Spatial and temporal attributes, sociodemographic, built-environment, and transit-supply variables carry no predictive power at the trip level in the presence of these travel impedance variables. This implies that pricing signals are the most effective way to encourage riders to share their rides. Our findings shed light on sharing behavior in ride-hailing trips and can help devise strategies that increase shared ride-hailing, especially as demand recovers from the pandemic.
    Date: 2022–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2201.12696&r=
  22. By: Gabriel Okasa
    Abstract: Estimation of causal effects using machine learning methods has become an active research field in econometrics. In this paper, we study the finite-sample performance of meta-learners for estimation of heterogeneous treatment effects when sample-splitting and cross-fitting are used to reduce overfitting bias. In both synthetic and semi-synthetic simulations we find that the performance of the meta-learners in finite samples greatly depends on the estimation procedure. The results imply that sample-splitting and cross-fitting are beneficial in large samples for bias reduction and efficiency of the meta-learners, respectively, whereas full-sample estimation is preferable in small samples. Furthermore, we derive practical recommendations for the application of specific meta-learners in empirical studies depending on particular data characteristics such as treatment shares and sample size.
    Date: 2022–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2201.12692&r=
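    A sketch of one meta-learner from this comparison, a T-learner for the conditional average treatment effect with two-fold cross-fitting, on synthetic data with a known effect (the paper studies several meta-learners and base learners):

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.model_selection import KFold

      rng = np.random.default_rng(0)
      n = 4000
      X = rng.normal(size=(n, 5))
      T = rng.binomial(1, 0.5, size=n)
      tau = 1.0 + X[:, 0]                     # true heterogeneous effect
      Y = X[:, 1] + T * tau + rng.normal(size=n)

      cate = np.zeros(n)
      for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
          # Fit separate outcome models on one fold, predict on the other.
          m1 = RandomForestRegressor(random_state=0).fit(
              X[train][T[train] == 1], Y[train][T[train] == 1])
          m0 = RandomForestRegressor(random_state=0).fit(
              X[train][T[train] == 0], Y[train][T[train] == 0])
          cate[test] = m1.predict(X[test]) - m0.predict(X[test])

      print("mean |CATE error|:", np.abs(cate - tau).mean())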
  23. By: Krzysztof Rusek; Piotr Boryło; Piotr Jaglarz; Fabien Geyer; Albert Cabellos; Piotr Chołda
    Abstract: We propose a graph neural network (GNN)-based method to predict the distribution of penalties induced by outages in communication networks, where connections are protected by resources shared between working and backup paths. The GNN-based algorithm is trained only on random graphs generated with the Barabási-Albert model. Even so, the test results show that we can precisely model the penalties in a wide range of existing topologies. GNNs eliminate the need to simulate complex outage scenarios for the network topologies under study. In practice, the whole design operation takes no more than 4 ms on modern hardware, a speed-up of more than 12,000 times.
    Date: 2022–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2201.12263&r=
  24. By: Akey, Pat; Grégoire, Vincent (HEC Montréal); Martineau, Charles (University of Toronto)
    Abstract: From 2010 to 2015, a group of traders illegally accessed earnings information before their public release by hacking several newswire services. We use this scheme as a natural experiment to investigate how informed investors select among private signals and how efficiently financial markets incorporate private information contained in trades into prices. We construct a measure of qualitative information using machine learning and find that the hackers traded on both qualitative and quantitative signals. The hackers’ trading caused 15% more of the earnings news to be incorporated in prices before their public release. Liquidity providers responded to the hackers’ trades by widening spreads.
    Date: 2021–12–01
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:qe6tu&r=
  25. By: Beck, Ben; Winters, Meghan; Nelson, Trisalyn; Pettit, Christopher; Saberi, Meead; Thompson, Jason; Seneviratne, Sachith; Nice, Kerry A; Zarpelon-Leao, Simone; Stevenson, Mark
    Abstract: Background: Extensive research has explored associations between built environment characteristics and biking. However, these approaches have often lacked the ability to understand the interactions of the built environment, population and bicycle ridership. To overcome these limitations, this study aimed to develop novel urban biking typologies using unsupervised machine learning methods. Methods: We conducted a retrospective analysis of travel surveys, bicycle infrastructure and population and land use characteristics in the Greater Melbourne region, Australia. To develop the urban biking typology, we used a k-medoids clustering method. Results: Analyses revealed 5 clusters. We highlight areas with high bicycle network density and a high proportion of trips made by bike (Cluster 1, reflecting 12% of the population of Greater Melbourne but 57% of all bike trips) and areas with high off-road and on-road bicycle network length but a low proportion of trips made by bike (Cluster 4, reflecting 23% of the population of Greater Melbourne and 13% of all bike trips). Conclusion: Our novel approach to developing an urban biking typology enabled the exploration of the interaction of bicycle ridership, bicycle network, population and land use characteristics. Such approaches are important in advancing our understanding of bicycling behaviour, but further research is required to understand the generalisability of these findings to other settings.
    Date: 2021–11–25
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:8w7bg&r=
  26. By: Jiayue Xu
    Abstract: The price of carbon emission rights plays a crucial role in carbon trading markets, so accurate price prediction is critical. Taking the Shanghai pilot market as an example, this paper designs a carbon emission purchasing strategy for enterprises and establishes a carbon emission price prediction model to help them reduce purchasing costs. To make predictions more precise, we built a hybrid deep learning model by embedding a Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model into a Gated Recurrent Unit (GRU) model, and compared its performance with that of other models. Then, based on the Iceberg Order Theory and the predicted price, we propose a purchasing strategy for carbon emission rights. The prediction errors of the GARCH-GRU model with a 5-day sliding time window were the lowest of all six models, and in the simulation the purchasing strategy based on the GARCH-GRU model was also executed at the least cost. The carbon emission purchasing strategy constructed by the hybrid deep learning method can accurately send out timing signals and help enterprises reduce the purchasing cost of carbon emission permits.
    Date: 2022–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2201.13235&r=
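    A hedged sketch of the hybrid idea: fit a GARCH(1,1) on returns, then feed the conditional volatility alongside returns into a GRU with a 5-day window. The random-walk price path and the exact way the two models are combined are placeholders, not the paper's Shanghai series or architecture:

      import numpy as np
      import torch
      import torch.nn as nn
      from arch import arch_model

      rng = np.random.default_rng(0)
      prices = 50.0 * np.exp(np.cumsum(rng.normal(scale=0.01, size=500)))
      returns = 100 * np.diff(np.log(prices))

      # Stage 1: GARCH(1,1) conditional volatility as an engineered feature.
      garch = arch_model(returns, vol="GARCH", p=1, q=1).fit(disp="off")
      vol = np.asarray(garch.conditional_volatility)

      # Stage 2: 5-day sliding windows of (return, volatility) into a GRU.
      window = 5
      feats = np.stack([returns, vol], axis=1).astype("float32")
      X = np.stack([feats[i:i + window] for i in range(len(feats) - window)])
      y = returns[window:].astype("float32")[:, None]

      class GarchGRU(nn.Module):
          def __init__(self):
              super().__init__()
              self.gru = nn.GRU(input_size=2, hidden_size=16, batch_first=True)
              self.head = nn.Linear(16, 1)
          def forward(self, x):
              out, _ = self.gru(x)
              return self.head(out[:, -1])   # last hidden state -> next return

      model = GarchGRU()
      opt = torch.optim.Adam(model.parameters(), lr=1e-3)
      Xt, yt = torch.from_numpy(X), torch.from_numpy(y)
      for _ in range(200):
          opt.zero_grad()
          loss = nn.functional.mse_loss(model(Xt), yt)
          loss.backward()
          opt.step()
      print("in-sample MSE:", float(loss))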
  27. By: Sébastien Marchand (CERDI - Centre d'Études et de Recherches sur le Développement International - CNRS - Centre National de la Recherche Scientifique - UCA - Université Clermont Auvergne); Damien Cubizol (CERDI - Centre d'Études et de Recherches sur le Développement International - CNRS - Centre National de la Recherche Scientifique - UCA - Université Clermont Auvergne); Elda Nasho Ah-Pine (CleRMa - Clermont Recherche Management - ESC Clermont-Ferrand - École Supérieure de Commerce (ESC) - Clermont-Ferrand - UCA - Université Clermont Auvergne); Huanxiu Guo (The Institute of Economics and Finance - Nanjing Audit University)
    Abstract: In 2013, the Chinese central government launched a war on air pollution. As a new and major source of information, the Internet plays an important role in diffusing environmental news emotion and shaping people's perceptions and emotions regarding the pollution. How could the government make use of the environmental news emotion as an informal regulation of pollution? The paper investigates the causal relationship between web news emotion (defined by the emotional tone of web news) and air pollution (SO2, NO2, PM2.5 and PM10) by exploiting the central government's war on air pollution. We combine daily monitoring data of air pollution at different levels (cities and counties, respectively the second and third administrative levels in China) with the GDELT database that allows us to have information on Chinese web news media (e.g. emotional tone of web news on air pollution). We find that a decrease of the emotional tone in web news (i.e. more negative emotions in the articles) can help to reduce air pollution at both city and county level. We attribute this effect to the context of China's war on air pollution in which the government makes use of the environmental news emotion as an informal regulation of pollution.
    Keywords: News emotion, Air pollution, Mass media, The Internet, Government, China
    Date: 2021–11
    URL: http://d.repec.org/n?u=RePEc:hal:cdiwps:hal-03448375&r=
  28. By: Christian A. Scholbeck; Giuseppe Casalicchio; Christoph Molnar; Bernd Bischl; Christian Heumann
    Abstract: Beta coefficients for linear regression models represent the ideal form of an interpretable feature effect. However, for non-linear models and especially generalized linear models, the estimated coefficients cannot be interpreted as a direct feature effect on the predicted outcome. Hence, marginal effects are typically used as approximations for feature effects, either as derivatives of the prediction function or as forward differences in prediction due to a change in a feature value. While marginal effects are commonly used in many scientific fields, they have not yet been adopted as a model-agnostic interpretation method for machine learning models. This may stem from their inflexibility as a univariate feature effect and their inability to deal with the non-linearities found in black-box models. We introduce a new class of marginal effects termed forward marginal effects. We argue for abandoning derivatives in favor of more interpretable forward differences. Furthermore, we generalize marginal effects based on forward differences to multivariate changes in feature values. To account for the non-linearity of prediction functions, we introduce a non-linearity measure for marginal effects. We argue against summarizing feature effects of a non-linear prediction function in a single metric such as the average marginal effect. Instead, we propose to partition the feature space to compute conditional average marginal effects on feature subspaces, which serve as conditional feature effect estimates.
    Date: 2022–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2201.08837&r=
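    A compact sketch of a forward marginal effect for a black-box model: the change in prediction from a finite step h in one feature, averaged here within two subspaces rather than over the whole feature space (the partitioning and step size are illustrative choices):

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      rng = np.random.default_rng(0)
      X = rng.uniform(-2, 2, size=(2000, 3))
      y = np.sin(X[:, 0]) + X[:, 1] + rng.normal(scale=0.1, size=2000)
      model = RandomForestRegressor(random_state=0).fit(X, y)

      def forward_me(model, X, feature, h):
          """Forward difference f(x + h*e_j) - f(x), one value per observation."""
          X_step = X.copy()
          X_step[:, feature] += h
          return model.predict(X_step) - model.predict(X)

      fme = forward_me(model, X, feature=0, h=0.5)
      left = X[:, 0] < 0   # conditional average marginal effects on two subspaces
      print("cAME (x0 < 0): ", fme[left].mean())
      print("cAME (x0 >= 0):", fme[~left].mean())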
  29. By: Ivan Letteri; Giuseppe Della Penna; Giovanni De Gasperis; Abeer Dyoub
    Abstract: Stock market forecasting is a lucrative field of interest with promising profits, but it is not without difficulties, and for some it can even end in failure. Financial markets are by nature complex, non-linear and chaotic, which makes accurately predicting the prices of the assets traded in them very complicated. In this paper we propose a stock trading system built around feed-forward deep neural networks (DNNs) to predict the price, over the next 30 days of open market, of the shares issued by Abercrombie & Fitch Co. (ANF) on the New York Stock Exchange (NYSE). The system calculates the most effective technical indicator and applies it to the predictions computed by the DNNs to generate trades. The results showed an Expectancy Ratio of 2.112% of profitable trades, with Sharpe, Sortino, and Calmar Ratios of 2.194, 3.340, and 12.403, respectively. As verification, we adopted a backtesting simulation module in our system, which maps trades onto actual test data consisting of the last 30 days of open market on the ANF asset. Overall, the results were promising, bringing a total profit factor of 3.2% in just one month from a very modest budget of $100. This was possible because the system reduced the number of trades by choosing the most effective and efficient ones, saving on commissions and slippage costs.
    Date: 2022–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2201.12286&r=
  30. By: Peng Li; Arim Park; Soohyun Cho; Yao Zhao
    Abstract: Many companies nowadays offer compensation for online reviews (called compensated reviews), expecting to increase the volume of their non-compensated reviews and their overall rating. Does this strategy work? On what subjects or topics does it work best? These questions have not yet been answered in the literature but draw substantial interest from industry. In this paper, we study the effect of compensated reviews on non-compensated reviews by utilizing online reviews of 1,240 auto shipping companies over a ten-year period from a transportation website. Because some online reviews have missing information on their compensation status, we first develop a classification algorithm to differentiate compensated from non-compensated reviews via a machine learning-based identification process, drawing on the unique features of compensated reviews. From the classification results, we empirically investigate the effects of compensated reviews on non-compensated ones. Our results indicate that the number of compensated reviews does indeed increase the number of non-compensated reviews. In addition, the ratings of compensated reviews positively affect the ratings of non-compensated reviews. Moreover, if the compensated reviews feature the topic or subject of a car shipping function, the positive effect of compensated reviews on non-compensated ones is the strongest. Besides methodological contributions in text classification and empirical modeling, our study provides empirical evidence of the effectiveness of compensated online reviews in improving a platform's overall online reviews and ratings. It also suggests a guideline for utilizing compensated reviews to their full strength, namely featuring certain topics or subjects in these reviews to achieve the best outcome.
    Date: 2022–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2201.11051&r=
  31. By: Effat Ara Easmin Lucky; Md. Mahadi Hasan Sany; Mumenunnesa Keya; Md. Moshiur Rahaman; Umme Habiba Happy; Sharun Akter Khushbu; Md. Arid Hasan
    Abstract: By trade we usually mean the exchange of goods between states and countries. International trade acts as a barometer of economic prosperity, and every country depends heavily on resources, so international trade is essential. Trade is also significant to the global health crisis, saving lives and livelihoods. Using the dataset "Effects of COVID19 on trade" from the official website of NZ Tatauranga Aotearoa, we have developed a prediction process for the effects of COVID-19 on world trade using a deep learning model. In this research, we give a 180-day trade forecast in which the ups and downs of daily imports and exports during the Covid-19 period are predicted. For this prediction, we took data from 1 January 2015 to 30 May 2021 for all countries, all commodities, and all transport systems, and forecast what the world trade situation will be over the next 180 days of the Covid-19 period. Deep learning methods have received attention from both investors and researchers. This study predicts global trade using a Long Short-Term Memory (LSTM) network. Time series analysis can be useful to see how a given asset, security, or economy changes over time, and it plays an important role in analyzing the past to make predictions about the future, since some factors affect a particular variable from period to period. Through time series it is possible to observe how various economic changes or trade effects evolve over time; by reviewing these changes, a country can be more careful about the steps to be taken regarding imports and exports. From our time series analysis, the LSTM model gives a credible picture of the future world import and export situation in terms of trade.
    Date: 2022–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2201.12291&r=
  32. By: Beatrice Acciaio; Anastasis Kratsios; Gudmund Pammer
    Abstract: We introduce a universal class of geometric deep learning models, called metric hypertransformers (MHTs), capable of approximating any adapted map $F:\mathscr{X}^{\mathbb{Z}}\rightarrow \mathscr{Y}^{\mathbb{Z}}$ with approximable complexity, where $\mathscr{X}\subseteq \mathbb{R}^d$ and $\mathscr{Y}$ is any suitable metric space, and $\mathscr{X}^{\mathbb{Z}}$ (resp. $\mathscr{Y}^{\mathbb{Z}}$) capture all discrete-time paths on $\mathscr{X}$ (resp. $\mathscr{Y}$). Suitable spaces $\mathscr{Y}$ include various (adapted) Wasserstein spaces, all Fréchet spaces admitting a Schauder basis, and a variety of Riemannian manifolds arising from information geometry. Even in the static case, where $f:\mathscr{X}\rightarrow \mathscr{Y}$ is a Hölder map, our results provide the first (quantitative) universal approximation theorem compatible with any such $\mathscr{X}$ and $\mathscr{Y}$. Our universal approximation theorems are quantitative, and they depend on the regularity of $F$, the choice of activation function, the metric entropy and diameter of $\mathscr{X}$, and on the regularity of the compact set of paths whereon the approximation is performed. Our guiding examples originate from mathematical finance. Notably, the MHT models introduced here are able to approximate a broad range of stochastic processes' kernels, including solutions to SDEs, many processes with arbitrarily long memory, and functions mapping sequential data to sequences of forward rate curves.
    Date: 2022–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2201.13094&r=
  33. By: Shuo Sun; Rundong Wang; Xu He; Junlei Zhu; Jian Li; Bo An
    Abstract: Reinforcement learning (RL) techniques have shown great success in quantitative investment tasks, such as portfolio management and algorithmic trading. Intraday trading in particular is one of the most profitable and risky tasks because of the intraday behaviors of the financial market, which reflect billions of rapidly fluctuating values. However, it is hard to apply existing RL methods to intraday trading due to the following three limitations: 1) overlooking micro-level market information (e.g., the limit order book); 2) only focusing on local price fluctuations and failing to capture the overall trend of the whole trading day; 3) neglecting the impact of market risk. To tackle these limitations, we propose DeepScalper, a deep reinforcement learning framework for intraday trading. Specifically, we adopt an encoder-decoder architecture to learn a robust market embedding incorporating both macro-level and micro-level market information. Moreover, a novel hindsight reward function is designed to provide the agent with a long-term horizon for capturing the overall price trend. In addition, we propose a risk-aware auxiliary task of predicting future volatility, which helps the agent take market risk into consideration while maximizing profit. Finally, extensive experiments on two stock index futures and four treasury bond futures demonstrate that DeepScalper achieves significant improvement over many state-of-the-art approaches.
    Date: 2021–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2201.09058&r=

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.