on Big Data |
By: | Kollár, Aladár |
Abstract: | In today's world, sports generate a great deal of data about each athlete, team, event, and season. Many people, from spectators to bettors, find it fascinating to predict the outcomes of sporting events. With this data available, the sports betting industry is turning to Artificial Intelligence, since betting worldwide requires working with large amounts of data and information. Artificial intelligence and machine learning assist in predicting sporting trends, and the true influence of the technology is felt when it delivers these insights in real time, where they can affect the factors that matter for betting. An artificial neural network (ANN) consists of many small, interconnected processors called neurons, which resemble the biological neurons of the brain. Within the ANN framework, the multilayer perceptron (MLP), the most widely applied neural network algorithm, is generally selected as the best model for predicting the outcomes of football matches. This review also discusses another common modern intelligent technique, the Support Vector Machine (SVM). Lastly, we discuss the Markov chain for predicting the result of a sporting event: a Markov chain is a sequence of states in which the next sample from the state space depends only on the current state. |
Keywords: | Artificial Intelligence; ANN; Betting; sports; SVM; Markov chain |
JEL: | C5 C55 C6 |
Date: | 2021–03–21 |
URL: | http://d.repec.org/n?u=RePEc:pra:mprapa:106821&r=all |
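The abstract above singles out the multilayer perceptron as the preferred model for football match outcomes. A minimal sketch of that idea follows, using scikit-learn; the feature names, the matches.csv file, and the outcome encoding are hypothetical placeholders rather than the paper's data or specification.

```python
# Minimal sketch (assumed features and data): an MLP classifier for match outcomes.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

matches = pd.read_csv("matches.csv")          # hypothetical historical match data
features = ["home_form", "away_form", "home_goals_avg", "away_goals_avg"]
X = matches[features].values
y = matches["outcome"].values                 # e.g. "home_win", "draw", "away_win"

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)

mlp = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0)
mlp.fit(scaler.transform(X_train), y_train)
print("test accuracy:", accuracy_score(y_test, mlp.predict(scaler.transform(X_test))))
```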
By: | Hanjo Odendaal (Department of Economics, Stellenbosch University) |
Abstract: | This paper aims to offer an alternative to the manual, labour-intensive process of constructing a domain-specific lexicon or dictionary through the operationalization of subjective information processing. This paper builds on the current empirical literature by (a) constructing a domain-specific dictionary for various economic confidence indices, (b) introducing a novel weighting schema for text tokens that accounts for time dependence; and (c) operationalising subjective information processing of text data using machine learning. The results show that sentiment indices constructed from machine-generated dictionaries have a better fit with multiple indicators of economic activity than the manually constructed dictionary of Loughran and McDonald (2011). Analysis shows a lower RMSE for the domain-specific dictionaries in a five-year holdout sample period from 2012 to 2017. The results also justify the time-series weighting design used to overcome the p >> n problem commonly found when working with economic time series and text data. |
Keywords: | Sentometrics, Machine learning, Domain-specific dictionaries |
JEL: | C32 C45 C53 C55 |
Date: | 2021 |
URL: | http://d.repec.org/n?u=RePEc:sza:wpaper:wpapers366&r=all |
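The abstract above describes scoring text against a machine-generated dictionary with a token-weighting schema that accounts for time dependence. The sketch below shows one simple version of that pipeline; the dictionary entries, the exponential half-life, and the toy articles are illustrative assumptions, not the paper's weighting design.

```python
# Minimal sketch (assumptions, not the paper's implementation): dictionary-based
# sentiment scores with an exponential time decay, aggregated into a monthly index.
import numpy as np
import pandas as pd

positive = {"growth", "improve", "strong"}        # hypothetical dictionary entries
negative = {"decline", "weak", "uncertain"}

def doc_sentiment(text):
    tokens = text.lower().split()
    pos = sum(t in positive for t in tokens)
    neg = sum(t in negative for t in tokens)
    return (pos - neg) / max(len(tokens), 1)

articles = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-05", "2016-01-20", "2016-02-10"]),
    "text": ["strong growth expected", "outlook weak and uncertain", "conditions improve"],
})
articles["score"] = articles["text"].apply(doc_sentiment)

# exponential decay in days relative to the most recent article (illustrative half-life)
age = (articles["date"].max() - articles["date"]).dt.days
articles["weight"] = np.exp(-age / 30.0)

index = (articles.assign(w_score=articles["score"] * articles["weight"])
                 .groupby(articles["date"].dt.to_period("M"))[["w_score", "weight"]].sum())
print(index["w_score"] / index["weight"])         # weighted monthly sentiment index
```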
By: | Mukul Jaggi; Priyanka Mandal; Shreya Narang; Usman Naseem; Matloob Khushi |
Abstract: | Stock price prediction can be made more efficient by considering the price fluctuations and understanding the sentiments of people. A limited number of models understand financial jargon or have labelled datasets concerning stock price change. To overcome this challenge, we introduced FinALBERT, an ALBERT-based model trained to handle financial domain text classification tasks by labelling Stocktwits text data based on stock price change. We collected Stocktwits data for over ten years for 25 different companies, including the five major FAANG companies (Facebook, Amazon, Apple, Netflix, Google). These datasets were labelled with three labelling techniques based on stock price changes. Our proposed model FinALBERT is fine-tuned with these labels to achieve optimal results. We experimented with the labelled dataset by training it on traditional machine learning, BERT, and FinBERT models, which helped us understand how these labels behaved with different model architectures. The competitive advantage of our labelling method is that it can help analyse historical data effectively, and the mathematical function can be easily customised to predict stock movement. |
Date: | 2021–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2103.16388&r=all |
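The abstract above labels Stocktwits messages according to the associated stock price change. A minimal sketch of one such labelling rule follows; the paper uses three labelling techniques, while this shows only a single fixed-threshold rule with hypothetical prices and a made-up threshold.

```python
# Minimal sketch (hypothetical thresholds): labelling text by next-day price change.
import pandas as pd

def label_by_price_change(close_today, close_next, threshold=0.01):
    """Return 'up', 'down', or 'neutral' based on the relative price change."""
    change = (close_next - close_today) / close_today
    if change > threshold:
        return "up"
    if change < -threshold:
        return "down"
    return "neutral"

tweets = pd.DataFrame({
    "text": ["bullish on $AAPL", "selling everything", "sideways market"],
    "close_today": [100.0, 50.0, 20.0],
    "close_next": [102.0, 48.5, 20.1],
})
tweets["label"] = [
    label_by_price_change(t, n) for t, n in zip(tweets["close_today"], tweets["close_next"])
]
print(tweets[["text", "label"]])
```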
By: | Hannes Mueller; Christopher Rauh |
Abstract: | There is growing interest in prevention in several policy areas, and this provides a strong motivation for better integrating machine-learning-based forecasting into models of decision making. In this article we propose a framework for tackling conflict prevention. A key problem of conflict forecasting for prevention is that predicting the start of conflict in previously peaceful countries must overcome a low baseline risk. To make progress on this hard problem, the project combines a newspaper-text corpus of more than 4 million articles with unsupervised and supervised machine learning. The output of the forecast model is then integrated into a simple static framework in which a decision maker chooses the optimal number of interventions to minimize the total cost of conflict and intervention. This exercise highlights the potential cost savings of prevention, for which reliable forecasts are a prerequisite. |
Keywords: | armed conflict, forecasting, machine learning, newspaper text, random forest, topic models |
JEL: | O11 O43 |
Date: | 2021–03 |
URL: | http://d.repec.org/n?u=RePEc:bge:wpaper:1244&r=all |
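The abstract above describes a static framework in which a decision maker chooses the number of interventions that minimizes the total cost of conflict and intervention, given model-based risks. The stylized sketch below illustrates that calculation; the risk vector, cost parameters, and intervention effectiveness are invented numbers, and the paper's framework is richer than this.

```python
# Minimal sketch (stylized, made-up numbers): optimal number of preventive interventions.
import numpy as np

risk = np.array([0.40, 0.25, 0.10, 0.05, 0.02])   # predicted probability of conflict onset
cost_conflict = 100.0                              # cost if conflict breaks out
cost_intervention = 8.0                            # cost of one preventive intervention
effectiveness = 0.5                                # share of risk removed by intervening

order = np.argsort(-risk)                          # intervene in the riskiest countries first
total_costs = []
for k in range(len(risk) + 1):
    treated = np.zeros(len(risk), dtype=bool)
    treated[order[:k]] = True
    residual_risk = np.where(treated, risk * (1 - effectiveness), risk)
    total_costs.append(residual_risk.sum() * cost_conflict + k * cost_intervention)

best_k = int(np.argmin(total_costs))
print("optimal number of interventions:", best_k, "total expected cost:", total_costs[best_k])
```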
By: | Shan Huang; Michael Allan Ribers; Hannes Ullrich |
Abstract: | Large-scale data show promise to provide efficiency gains through individualized risk predictions in many business and policy settings. Yet, assessments of the degree of data-enabled efficiency improvements remain scarce. We quantify the value of the availability of a variety of data combinations for tackling the policy problem of curbing antibiotic resistance, where the reduction of inefficient antibiotic use requires improved diagnostic prediction. Focusing on antibiotic prescribing for suspected urinary tract infections in primary care in Denmark, we link individual-level administrative data with microbiological laboratory test outcomes to train a machine learning algorithm predicting bacterial test results. For various data combinations, we assess out-of-sample prediction quality and the efficiency improvements due to prediction-based prescription policies. The largest gains in prediction quality can be achieved using simple characteristics such as patient age and gender or patients’ health care data. However, additional patient background data lead to further incremental policy improvements even though the gains in prediction quality are small. Our findings suggest that evaluating prediction quality against the ground truth alone may not be sufficient to quantify the potential for policy improvements. |
Keywords: | Prediction policy; data combination; machine learning; antibiotic prescribing |
JEL: | C10 C55 I11 I18 Q28 |
Date: | 2021 |
URL: | http://d.repec.org/n?u=RePEc:diw:diwwpp:dp1939&r=all |
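The abstract above evaluates prescription policies that act on a classifier's predicted probability of a positive bacterial test. The sketch below shows the bare mechanics of such a policy on synthetic data; the features, the data-generating process, and the 0.5 cut-off are placeholder assumptions, not the Danish registry data or the paper's policy rules.

```python
# Minimal sketch (assumed setup): predict test results, prescribe only above a cut-off.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([rng.integers(18, 90, n),        # age
                     rng.integers(0, 2, n),          # gender
                     rng.poisson(2, n)])             # prior antibiotic use
logit = -2.0 + 0.02 * X[:, 0] + 0.5 * X[:, 1] + 0.2 * X[:, 2]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))        # bacterial test positive?

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
p = clf.predict_proba(X_te)[:, 1]

cutoff = 0.5                                         # prescribe only above this risk
prescribe = p >= cutoff
print("prescription rate under policy:", prescribe.mean())
print("share of positives treated:", prescribe[y_te == 1].mean())
```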
By: | Aleksy Klimowicz (Faculty of Economic Sciences, University of Warsaw); Krzysztof Spirzewski (Faculty of Economic Sciences, University of Warsaw) |
Abstract: | Numerous applications of AI are found in the banking sector, starting in the front office with enhanced customer recognition and personalized services, continuing in the middle office with automated fraud-detection systems, and ending in the back office with the automation of internal processes. In this paper we provide comprehensive information on the phenomenon of peer-to-peer lending within the modern view of alternative finance and crowdfunding, examined from several perspectives. The aim of this research is to explore the peer-to-peer lending market model. We apply credit scorecards to marketplace lending, check their suitability and effectiveness, and determine the appropriate cut-off point. We conducted this research by exploring recent studies and open-source data on marketplace lending. The scorecard development is based on an open P2P loan dataset that contains repayment records along with both hard and soft features of each loan. The quantitative part consists of applying a machine learning algorithm, namely logistic regression, to build the credit scorecard. |
Keywords: | artificial intelligence, peer-to-peer lending, credit risk assessment, credit scorecards, logistic regression, machine learning |
JEL: | G21 C25 |
Date: | 2021 |
URL: | http://d.repec.org/n?u=RePEc:war:wpaper:2021-04&r=all |
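The abstract above builds a logistic-regression credit scorecard and determines a cut-off point. The sketch below shows that sequence on synthetic loan features; the real scorecard in the paper works from an open P2P dataset (typically with binned/weight-of-evidence features), and the Youden's J criterion here is just one simple way to pick a cut-off.

```python
# Minimal sketch (illustrative only): logistic-regression scorecard plus ROC-based cut-off.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve

rng = np.random.default_rng(1)
n = 4000
X = np.column_stack([rng.normal(0.2, 0.1, n),        # debt-to-income
                     rng.normal(650, 60, n),         # credit score proxy
                     rng.integers(1, 6, n)])         # loan grade
logit = 3.0 + 8.0 * X[:, 0] - 0.01 * X[:, 1] + 0.3 * X[:, 2]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))        # 1 = default

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
p = model.predict_proba(X_te)[:, 1]

# pick the cut-off maximizing TPR - FPR (Youden's J) as one simple criterion
fpr, tpr, thresholds = roc_curve(y_te, p)
cutoff = thresholds[np.argmax(tpr - fpr)]
print("selected cut-off:", round(float(cutoff), 3))
print("rejection rate at cut-off:", float((p >= cutoff).mean()))
```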
By: | Martin Beraja; David Y. Yang; Noam Yuchtman |
Abstract: | Artificial intelligence (AI) innovation is data-intensive. States have historically collected large amounts of data, which is now being used by AI firms. Gathering comprehensive information on firms and government procurement contracts in China's facial recognition AI industry, we first study how government data shapes AI innovation. We find evidence of a precise mechanism: because data is sharable across uses, economies of scope arise. Firms awarded public security AI contracts providing access to more government data produce more software for both government and commercial purposes. In a directed technical change model incorporating this mechanism, we then study the trade-offs presented by states' AI procurement and data provision policies. Surveillance states' demand for AI may incidentally promote growth, but distort innovation, crowd out resources, and infringe on civil liberties. Government data provision may be justified when economies of scope are strong and citizens' privacy concerns are limited. |
Keywords: | data, innovation, artificial intelligence, China, economies of scope, directed technical change, industrial policy, privacy, surveillance |
JEL: | O30 P00 E00 L5 L63 O25 O40 |
Date: | 2021–03 |
URL: | http://d.repec.org/n?u=RePEc:cep:cepdps:dp1755&r=all |
By: | Yiyan Huang; Cheuk Hang Leung; Xing Yan; Qi Wu |
Abstract: | Most existing studies on the double/debiased machine learning method concentrate on estimating the causal parameter from the first-order orthogonal score function. In this paper, we construct the $k^{\mathrm{th}}$-order orthogonal score function for estimating the average treatment effect (ATE) and present an algorithm that enables us to obtain the debiased estimator recovered from the score function. Such a higher-order orthogonal estimator is more robust to misspecification of the propensity score than the first-order one is. Besides, it has the merit of being applicable with many machine learning methodologies such as Lasso, Random Forests, Neural Nets, etc. We also conduct comprehensive experiments to test the power of the estimator constructed from the score function, using both simulated and real datasets. |
Date: | 2021–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2103.11869&r=all |
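As a point of reference for the higher-order scores the abstract above constructs, the sketch below implements the standard first-order orthogonal (AIPW) score for the ATE with cross-fitting, which is the baseline the paper improves on. The nuisance learners, fold count, and simulated data-generating process are illustrative choices, not the paper's construction.

```python
# Minimal sketch: first-order orthogonal (AIPW) ATE score with 5-fold cross-fitting.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.model_selection import KFold

rng = np.random.default_rng(2)
n = 2000
X = rng.normal(size=(n, 5))
propensity = 1 / (1 + np.exp(-X[:, 0]))
D = rng.binomial(1, propensity)
Y = 2.0 * D + X[:, 0] + rng.normal(size=n)            # true ATE = 2

scores = np.zeros(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=2).split(X):
    m = RandomForestClassifier(n_estimators=200, random_state=2).fit(X[train], D[train])
    g1 = RandomForestRegressor(n_estimators=200, random_state=2).fit(
        X[train][D[train] == 1], Y[train][D[train] == 1])
    g0 = RandomForestRegressor(n_estimators=200, random_state=2).fit(
        X[train][D[train] == 0], Y[train][D[train] == 0])
    e = np.clip(m.predict_proba(X[test])[:, 1], 0.01, 0.99)
    mu1, mu0 = g1.predict(X[test]), g0.predict(X[test])
    scores[test] = (mu1 - mu0
                    + D[test] * (Y[test] - mu1) / e
                    - (1 - D[test]) * (Y[test] - mu0) / (1 - e))

print("cross-fitted AIPW estimate of the ATE:", scores.mean())
```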
By: | Jon Ellingsen; Vegard H. Larsen; Leif Anders Thorsrud |
Abstract: | Using a unique dataset of 22.5 million news articles from the Dow Jones Newswires Archive, we perform an in-depth, real-time, out-of-sample forecasting comparison study with one of the most widely used data sets in the newer forecasting literature, namely the FRED-MD dataset. Focusing on U.S. GDP, consumption and investment growth, our results suggest that the news data contains information not captured by the hard economic indicators, and that the news-based data are particularly informative for forecasting consumption developments. |
Keywords: | forecasting, real-time, machine learning, news, text data |
JEL: | C53 C55 E27 E37 |
Date: | 2020–10–08 |
URL: | http://d.repec.org/n?u=RePEc:bno:worpap:2020_14&r=all |
By: | Victor DeMiguel; Javier Gil-Bazo; Francisco J. Nogales; André A. P. Santos |
Abstract: | Identifying outperforming mutual funds ex-ante is a notoriously difficult task. We use machine learning methods to exploit the predictive ability of a large set of mutual fund characteristics that are readily available to investors. Using data on US equity funds in the 1980-2018 period, the methods allow us to construct portfolios of funds that earn positive and significant out-of-sample risk-adjusted after-fee returns as high as 4.2% per year. We further show that such outstanding performance is the joint outcome of both exploiting the information contained in multiple fund characteristics and allowing for flexibility in the relationship between predictors and fund performance. Our results confirm that even retail investors can benefit from investing in actively managed funds. However, we also find that the performance of all our portfolios has declined over time, consistent with increased competition in the asset market and diseconomies of scale at the industry level. |
Keywords: | mutual fund performance, performance predictability, active management, machine learning, elastic net, random forests, gradient boosting |
Date: | 2021–03 |
URL: | http://d.repec.org/n?u=RePEc:bge:wpaper:1245&r=all |
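The abstract above predicts fund performance from fund characteristics with methods including the elastic net and then forms portfolios of the best-predicted funds. The sketch below shows that two-step idea on a synthetic cross-section; the characteristics, hyperparameters, and the in-sample prediction shortcut are placeholder assumptions, whereas the paper uses out-of-sample predictions on US equity fund data.

```python
# Minimal sketch (synthetic data): elastic-net performance prediction + top-decile portfolio.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(3)
n_funds = 1000
X = np.column_stack([rng.normal(size=n_funds),        # past alpha
                     rng.normal(size=n_funds),        # expense ratio (standardized)
                     rng.normal(size=n_funds)])       # fund flows
future_ret = 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=2.0, size=n_funds)

model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, future_ret)
pred = model.predict(X)                                # real setting: predict on new, unseen funds

top_decile = pred >= np.quantile(pred, 0.9)            # equal-weighted portfolio of best funds
print("funds in portfolio:", int(top_decile.sum()))
print("mean realized return of selected funds:", future_ret[top_decile].mean())
```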
By: | Victor DeMiguel; Javier Gil-Bazo; Francisco J. Nogales; André A. P. Santos |
Abstract: | Identifying outperforming mutual funds ex-ante is a notoriously difficult task. We use machine learning methods to exploit the predictive ability of a large set of mutual fund characteristics that are readily available to investors. Using data on US equity funds in the 1980-2018 period, the methods allow us to construct portfolios of funds that earn positive and significant out-of-sample risk-adjusted after-fee returns as high as 4.2% per year. We further show that such outstanding performance is the joint outcome of both exploiting the information contained in multiple fund characteristics and allowing for flexibility in the relationship between predictors and fund performance. Our results confirm that even retail investors can benefit from investing in actively managed funds. However, we also find that the performance of all our portfolios has declined over time, consistent with increased competition in the asset market and diseconomies of scale at the industry level. |
Keywords: | Mutual fund performance, performance predictability, active management, machine learning, elastic net, random forests, gradient boosting |
Date: | 2021–03 |
URL: | http://d.repec.org/n?u=RePEc:upf:upfgen:1772&r=all |
By: | Artur Sokolovsky; Luca Arnaboldi; Jaume Bacardit; Thomas Gross |
Abstract: | Financial markets are a source of non-stationary, multidimensional time series that has been drawing attention for decades. Each financial instrument has its own properties that change over time, making its analysis a complex task. Better understanding of, and better methods for, financial time series analysis are essential for successful operation in financial markets. In this study we propose a volume-based data pre-processing method that makes financial time series more suitable for machine learning pipelines. We use a statistical approach to assess the performance of the method: we formally state the hypotheses, set up associated classification tasks, compute effect sizes with confidence intervals, and run statistical tests to validate the hypotheses. We additionally assess the trading performance of the proposed method on historical data and compare it to a previously published approach. Our analysis shows that the proposed volume-based method allows successful classification of financial time series patterns and leads to better classification performance than a price-action-based method, excelling specifically on more liquid financial instruments. Finally, we propose an approach for obtaining feature interactions directly from tree-based models, using the CatBoost estimator as an example, and formally assess the relatedness of the proposed approach to SHAP feature interactions, with a positive outcome. |
Date: | 2021–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2103.12419&r=all |
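The abstract above centres on volume-based pre-processing of financial time series. One common volume-based representation, not necessarily the paper's exact method, is to aggregate the trade stream into "volume bars" that close whenever a fixed amount of volume has traded; the sketch below shows that construction on a toy trade list.

```python
# Minimal sketch (one common volume-based representation): building volume bars from trades.
import pandas as pd

def volume_bars(trades, bar_volume):
    """trades: DataFrame with 'price' and 'volume' columns; returns OHLC-style bars."""
    bars, bucket = [], []
    cum = 0.0
    for _, row in trades.iterrows():
        bucket.append(row)
        cum += row["volume"]
        if cum >= bar_volume:                 # close the bar once enough volume has traded
            prices = [r["price"] for r in bucket]
            bars.append({"open": prices[0], "high": max(prices),
                         "low": min(prices), "close": prices[-1], "volume": cum})
            bucket, cum = [], 0.0
    return pd.DataFrame(bars)

trades = pd.DataFrame({"price": [10.0, 10.1, 10.05, 10.2, 10.15, 10.3],
                       "volume": [300, 500, 400, 600, 200, 700]})
print(volume_bars(trades, bar_volume=1000))
```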
By: | Ariel Neufeld; Julian Sester |
Abstract: | We introduce a novel and highly tractable supervised learning approach based on neural networks that can be applied to the computation of model-free price bounds of potentially high-dimensional financial derivatives and to the determination of optimal hedging strategies attaining these bounds. In particular, our methodology allows us to train a single neural network offline and then to use it online for the fast determination of model-free price bounds for a whole class of financial derivatives with current market data. We show the applicability of this approach and highlight its accuracy in several examples involving real market data. Further, we show how a neural network can be trained to solve martingale optimal transport problems involving fixed marginal distributions instead of financial market data. |
Date: | 2021–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2103.11435&r=all |
By: | Hans Genberg (Asia School of Business); Özer Karagedikli (South East Asian Central Banks (SEACEN) Research and Training Centre and Centre for Applied Macroeconomic Analysis (CAMA)) |
Abstract: | In this article we review what machine learning might have to offer central banks as an analytical approach to support monetary policy decisions. After describing the central bank’s “problem” and providing a brief introduction to machine learning, we propose to use the gradual adoption of Vector Auto Regression (VAR) methods in central banks to speculate about how machine learning models must (will?) evolve to become influential analytical tools supporting central banks’ monetary policy decisions. We argue that VAR methods achieved that status only after they incorporated elements that allowed users to interpret them in terms of structural economic theories. We believe that the same has to be the case for machine learning models. |
Date: | 2021–03 |
URL: | http://d.repec.org/n?u=RePEc:sea:wpaper:wp43&r=all |
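The article above leans on the gradual adoption of VAR methods as its analogy. For readers unfamiliar with the tool, the sketch below estimates a small reduced-form VAR with statsmodels; the two simulated macro series and the lag order are placeholder assumptions, not central bank data.

```python
# Minimal sketch: a two-variable VAR(2) estimated by OLS on simulated macro series.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(4)
T = 200
inflation = np.zeros(T)
output_gap = np.zeros(T)
for t in range(1, T):
    inflation[t] = 0.6 * inflation[t - 1] + 0.1 * output_gap[t - 1] + rng.normal(scale=0.3)
    output_gap[t] = 0.5 * output_gap[t - 1] - 0.2 * inflation[t - 1] + rng.normal(scale=0.5)

data = pd.DataFrame({"inflation": inflation, "output_gap": output_gap})
results = VAR(data).fit(2)                            # reduced-form VAR with two lags
print(results.summary())
print(results.forecast(data.values[-2:], steps=4))    # four-step-ahead forecast
```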
By: | Karush Suri; Xiao Qi Shi; Konstantinos Plataniotis; Yuri Lawryshyn |
Abstract: | Advances in Reinforcement Learning (RL) span a wide variety of applications which motivate development in this area. While application tasks serve as suitable benchmarks for real-world problems, RL is seldom used in practical scenarios consisting of abrupt dynamics. This allows one to rethink the problem setup in light of practical challenges. We present Trade Execution using Reinforcement Learning (TradeR), which aims to address two such practical challenges, catastrophe and surprise minimization, by formulating trading as a real-world hierarchical RL problem. Through this lens, TradeR makes use of hierarchical RL to execute trade bids on high-frequency real market experiences comprising abrupt price variations during the 2019 fiscal year COVID-19 stock market crash. The framework utilizes an energy-based scheme in conjunction with a surprise value function for estimating and minimizing surprise. In a large-scale study of 35 stock symbols from the S&P500 index, TradeR demonstrates robustness to abrupt price changes and catastrophic losses while maintaining profitable outcomes. We hope that our work serves as a motivating example for the application of RL to practical problems. |
Date: | 2021–02 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2104.00620&r=all |
By: | Kässi, Otto; Lehdonvirta, Vili; Stephany, Fabian |
Abstract: | An unknown number of people around the world are earning income by working through online labour platforms such as Upwork and Amazon Mechanical Turk. We combine data collected from various sources to build a data-driven assessment of the number of such online workers (also known as online freelancers) globally. Our headline estimate is that there are 163 million freelancer profiles registered on online labour platforms globally. Approximately 19 million of them have obtained work through the platform at least once, and 5 million have completed at least 10 projects or earned at least $1000. These numbers suggest substantial growth since 2015 in registered worker accounts, but much less growth in the amount of work completed by workers. Our results indicate that online freelancing represents a non-trivial segment of labour today, but one that is spread thinly across countries and sectors. |
Date: | 2021–03–24 |
URL: | http://d.repec.org/n?u=RePEc:osf:socarx:78nge&r=all |
By: | Otto Kässi; Vili Lehdonvirta; Fabian Stephany |
Abstract: | An unknown number of people around the world are earning income by working through online labour platforms such as Upwork and Amazon Mechanical Turk. We combine data collected from various sources to build a data-driven assessment of the number of such online workers (also known as online freelancers) globally. Our headline estimate is that there are 163 million freelancer profiles registered on online labour platforms globally. Approximately 19 million of them have obtained work through the platform at least once, and 5 million have completed at least 10 projects or earned at least $1000. These numbers suggest substantial growth since 2015 in registered worker accounts, but much less growth in the amount of work completed by workers. Our results indicate that online freelancing represents a non-trivial segment of labour today, but one that is spread thinly across countries and sectors. |
Date: | 2021–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2103.12648&r=all |
By: | Elif Semra Ceylan (Ernst & Young); Semih Tumen (TED University) |
Abstract: | The goal of this paper is to estimate the economic cost of conflict in selected Arab countries by using satellite images and geographical information systems (GIS) methods. Specifically, we employ image-processing techniques to generate data proxying the intensity of economic activity at the country and sub-region levels. The focus is on four countries: Iraq, Libya, Syria, and Yemen. These are the countries that have been most severely affected, in various ways, by the widespread wave of civil conflict that occurred in the MENA region in the aftermath of the Arab Spring. Back-of-the-envelope calculations suggest that GDP and the main factors of production have been nearly halved in those countries. We use data provided by the National Geophysical Data Center of the United States to compare night-light intensities before and after the conflict in those four countries. The night-light data serve as a proxy for regional economic activity and are widely used to generate credible economic data, mainly in circumstances where official data either do not exist or are not reliable. We construct indices combining the contrast and dispersion of night-lights within fine-grained geographical regions and then report the time-series evolution of those indices at both the country and sub-region levels. The estimates suggest that the scale and intensity of economic destruction in the region have been unprecedented in recent history and that the extent of destruction is largest in Syria and Yemen among those four conflict-afflicted countries. We also provide additional insights at the sub-region level. |
Date: | 2021–02–20 |
URL: | http://d.repec.org/n?u=RePEc:erg:wpaper:1459&r=all |
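The abstract above builds indices from the contrast and dispersion of night-lights within geographical regions. The sketch below shows simple versions of such indices on a small synthetic grid; the formulas and the radiance values are illustrative assumptions, and the paper's processing of the NGDC data is more involved.

```python
# Minimal sketch (illustrative formulas): contrast and dispersion of regional night-lights.
import numpy as np

lights = np.array([[0.0, 1.2, 5.4],
                   [0.3, 8.1, 6.7],
                   [0.0, 0.5, 2.2]])            # hypothetical radiance grid for one region

lit = lights[lights > 0]
contrast = (lights.max() - lights.min()) / (lights.max() + lights.min() + 1e-9)
dispersion = lit.std() / lit.mean()             # coefficient of variation of lit pixels

print("mean intensity:", lights.mean())
print("contrast index:", round(float(contrast), 3))
print("dispersion index:", round(float(dispersion), 3))
```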
By: | Q. Wang; Y. Zhou; J. Shen |
Abstract: | This article proposes an intraday trading strategy under the T+1 rule, using Markowitz optimization and a Multilayer Perceptron (MLP), with published stock data obtained from the Shenzhen Stock Exchange and the Shanghai Stock Exchange. The empirical results reveal the profitability of Markowitz portfolio optimization and validate the intraday stock price prediction using the MLP. The findings further combine Markowitz optimization and the MLP with the trading strategy to demonstrate the strategy's feasibility. |
Date: | 2021–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2103.13507&r=all |
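The abstract above pairs an MLP price forecast with Markowitz portfolio optimization. The sketch below shows the optimization step only, computing closed-form global minimum-variance weights under a full-investment constraint; the simulated return matrix is a placeholder, and the paper works with Shenzhen and Shanghai exchange data and a mean-variance objective rather than this simplified special case.

```python
# Minimal sketch: global minimum-variance (Markowitz-type) weights for four assets.
import numpy as np

rng = np.random.default_rng(5)
returns = rng.normal(0.0005, 0.02, size=(250, 4))      # daily returns of 4 hypothetical stocks

cov = np.cov(returns, rowvar=False)
ones = np.ones(cov.shape[0])
inv = np.linalg.inv(cov)
weights = inv @ ones / (ones @ inv @ ones)              # weights sum to one, minimize variance

print("weights:", np.round(weights, 3), "sum:", round(float(weights.sum()), 3))
```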