nep-big New Economics Papers
on Big Data
Issue of 2019‒08‒12
sixteen papers chosen by
Tom Coupé
University of Canterbury

  1. Artificial Intelligence, Data, Ethics: An Holistic Approach for Risks and Regulation By Alexis Bogroff; Dominique Guegan
  2. The Impact of Increasing Returns on Knowledge and Big Data: From Adam Smith and Allyn Young to the Age of Machine Learning and Digital Platforms By Yao-Su Hu
  3. An Early Warning System for banking crises: From regression-based analysis to machine learning techniques By Elizabeth Jane Casabianca; Michele Catalano; Lorenzo Forni; Elena Giarda; Simone Passeri
  4. Integrating Ethical Values and Economic Value to Steer Progress in Artificial Intelligence By Anton Korinek
  5. Using Machine Learning to Detect and Predict Corporate Accounting Fraud (Japanese) By USUKI Teppei; KONDO Satoshi; SHIRAKI Kengo; SUGA Miki; MIYAKAWA Daisuke
  6. Evaluating the Effectiveness of Common Technical Trading Models By Joseph Attia
  7. How Polarized are Citizens? Measuring Ideology from the Ground-Up By Draca, Mirko; Schwarz, Carlo
  8. Accelerated Share Repurchase and other buyback programs: what neural networks can bring By Olivier Guéant; Iuliia Manziuk; Jiang Pu
  9. Predicting criminal behavior with Levy flights using real data from Bogota By Mateo Dulce Rubio
  10. Trading via Image Classification By Naftali Cohen; Tucker Balch; Manuela Veloso
  11. Deep Learning-Based Least Square Forward-Backward Stochastic Differential Equation Solver for High-Dimensional Derivative Pricing By Jian Liang; Zhe Xu; Peter Li
  12. Asset mispricing in loan secondary market By Mustafa Caglayan; Tho Pham; Oleksandr Talavera; Xiong Xiong
  13. Forecasting High-Risk Composite CAMELS Ratings By Lewis Gaul; Jonathan Jones; Pinar Uysal
  14. Artificial intelligence, ideas by statistical mechanics, and affective modulation of information processing By L. Ingber
  15. How green is sugarcane ethanol? By Sant'Anna, Marcelo Castello Branco
  16. Do short-term rental platforms affect housing markets? Evidence from Airbnb in Barcelona By Miquel-Àngel Garcia-López; Jordi Jofre-Monseny; Rodrigo Martínez Mazza; Mariona Segú

  1. By: Alexis Bogroff (UP1 - Université Panthéon-Sorbonne); Dominique Guegan (UP1 - Université Panthéon-Sorbonne, CES - Centre d'économie de la Sorbonne - UP1 - Université Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique, Labex ReFi - UP1 - Université Panthéon-Sorbonne, University of Ca’ Foscari [Venice, Italy])
    Abstract: An extensive list of risks relating to big data frameworks and their use through artificial intelligence models is provided, along with measurements and implementable solutions. Bias, interpretability and ethics are studied in depth, with several interpretations from the point of view of developers, companies and regulators. Reflections suggest that fragmented frameworks increase the risks of model misspecification, opacity and bias in the results. Domain experts and statisticians need to be involved in the whole process, as the business objective must drive each decision from the data extraction step to the final activatable prediction. We propose a holistic and original approach to take into account the risks encountered throughout the implementation of systems using artificial intelligence, from the choice of the data and the selection of the algorithm to the decision making.
    Keywords: Artificial Intelligence,Bias,Big Data,Ethics,Governance,Interpretability,Regulation,Risk
    Date: 2019–06
  2. By: Yao-Su Hu (Hong Kong Shue Yan University, HK; SPRU, University of Sussex, UK)
    Abstract: Allyn Young’s concept of increasing returns, not to be confounded with static, equilibrium constructs of economies of scale and increasing returns to scale, is applied in this article to analyze how and why increasing returns arise in the production (generation) and use (application) of knowledge and of big data, thereby driving economic growth and progress. Knowledge is chosen as our focus because it is ‘our most powerful engine of production’ and big data is included to make the analysis more complete and up-to-date. We analyze four mechanisms or sources of increasing returns in the production of knowledge, and four in the use of knowledge. Turning to big data, increasing returns in the use thereof are examined in two spheres: the dominance resulting from the self-reinforcing functioning of digital platforms and machine learning through gigantic amounts of training data. Concluding remarks concern some key differences between knowledge and big data, some policy implications, and some of the negative social impacts from the ways in which big data is being used.
    Date: 2019–06
  3. By: Elizabeth Jane Casabianca (Prometeia Associazione per le Previsioni Econometriche, and DiSeS, Polytechnic University of Marche); Michele Catalano (Prometeia Associazione per le Previsioni Econometriche); Lorenzo Forni (Prometeia Associazione per le Previsioni Econometriche, and DSEA, University of Padua); Elena Giarda (Prometeia Associazione per le Previsioni Econometriche, and Cefin, University of Modena and Reggio Emilia); Simone Passeri (Prometeia Associazione per le Previsioni Econometriche)
    Abstract: Ten years after the outbreak of the 2007-2008 crisis, renewed attention is directed to money and credit fluctuations, financial crises and policy responses. By using an integrated dataset that includes 100 countries (advanced and emerging) spanning from 1970 to 2017, we propose an Early Warning System (EWS) to predict the build-up of systemic banking crises. The paper aims at (i) identifying the macroeconomic drivers of banking crises, (ii) going beyond the use of traditional discrete choice models by applying supervised machine learning (ML) and (iii) assessing the degree of countries’ exposure to systemic risks by means of predicted probabilities. Our results show that ML algorithms can have a better predictive performance than the logit models. All models deliver increasing predicted probabilities in the last years of the sample for the advanced countries, warning against the possible build-up of pre-crisis macroeconomic imbalances.
    Keywords: banking crises, EWS, machine learning, decision trees, AdaBoost
    JEL: C40 G01 C25 E44 G21
    Date: 2019–08
  4. By: Anton Korinek
    Abstract: Economics and ethics both offer important perspectives on our society, but they do so from two different viewpoints – the central focus of economics is how the price system in our economy values resources; the central focus of ethics is the moral evaluation of actions in our society. The rise of Artificial Intelligence (AI) forces humanity to confront new areas in which ethical values and economic value conflict, raising the question of what direction of technological progress is ultimately desirable for society. One crucial area concerns the effects of AI and related forms of automation on labor markets, which may lead to substantial increases in inequality unless mitigating policy actions are taken or progress is actively steered in a direction that complements human labor. Additional areas of conflict arise when AI systems optimize narrow market value but disregard broader ethical values and thus impose externalities on society, for example when AI systems engage in bias and discrimination, hack the human brain, and increasingly reduce human autonomy. Market incentives to create ever more intelligent systems lead to the ultimate ethical question: whether we should aim to create AI systems that surpass humans in general intelligence, and how to ensure that humanity is not left behind.
    JEL: E25 J23 J38 O33
    Date: 2019–08
  5. By: USUKI Teppei; KONDO Satoshi; SHIRAKI Kengo; SUGA Miki; MIYAKAWA Daisuke
    Abstract: In this paper, we examine to what extent the employment of machine learning techniques contributes to better detection and prediction of corporate (i.e., firm-level) accounting fraud. The obtained results show, first, that the capacity to detect accounting fraud increases substantially when using the machine learning-based model. Second, a similar improvement in predictive power is also confirmed. Such higher performance is due to both the employment of the machine learning technique and the higher dimensions of predictors. Third, we also confirm that a larger variety of data, such as corporate governance-related variables, which have not necessarily been used as main predictors in the extant studies, contributes to better detection and prediction to some extent. These results jointly suggest the existence of various unexploited information sources which are potentially useful for the detection and prediction of corporate accounting fraud.
    Date: 2019–07
  6. By: Joseph Attia
    Abstract: How effective are the most common trading models? The answer may help investors realize the upsides of using each model, act as a segue for investors into more complex financial analysis and machine learning, and increase financial literacy amongst students. By creating original versions of popular models, such as linear regression, K-Nearest Neighbor, and moving average crossovers, we can test how each model performs on the most popular stocks and largest indexes. With the results for each, we can compare the models and understand which model reliably increases performance. The trials showed that while all three models reduced losses on stocks with strong overall downward trends, the two machine learning models did not work as well to increase profits. Moving average crossovers outperformed a continuous investment every time, although they also resulted in a more volatile investment. Furthermore, once the program implementing the moving average crossover is complete, what are the optimal periods to use? A massive test consisting of 169,880 trials showed the best periods to use to increase investment performance (5,10) and to decrease volatility (33,44). In addition, the data showed numerous trends, such as a smaller short SMA period being accompanied by higher performance. Plotting volatility against performance shows that the high risk, high reward saying holds true: for investments, as volatility increases, so does performance.
    Date: 2019–07
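
As an illustration of the moving average crossover rule described in the abstract above, the following is a minimal sketch, not the author's program. The function names, the toy price series and the convention of trading on the previous day's signal are assumptions made for this example; only the (5,10) period pair is taken from the abstract.

```python
def sma(prices, window):
    """Simple moving average; None until `window` prices are available."""
    out = []
    total = 0.0
    for i, p in enumerate(prices):
        total += p
        if i >= window:
            total -= prices[i - window]  # drop the price leaving the window
        out.append(total / window if i >= window - 1 else None)
    return out


def crossover_positions(prices, short=5, long=10):
    """Return 1 (hold the asset) when the short SMA is above the long SMA, else 0."""
    s, l = sma(prices, short), sma(prices, long)
    return [
        1 if (a is not None and b is not None and a > b) else 0
        for a, b in zip(s, l)
    ]


def strategy_return(prices, positions):
    """Total return when yesterday's signal decides today's exposure.

    Using the previous day's position is an assumption of this sketch,
    chosen to avoid look-ahead bias.
    """
    wealth = 1.0
    for t in range(1, len(prices)):
        if positions[t - 1] == 1:
            wealth *= prices[t] / prices[t - 1]
    return wealth - 1.0


if __name__ == "__main__":
    # Hypothetical price path, used only to exercise the functions.
    toy_prices = [100, 101, 99, 98, 97, 99, 102, 104, 107, 110, 112, 111, 115, 118]
    pos = crossover_positions(toy_prices, short=5, long=10)
    print("positions:", pos)
    print("crossover return:", strategy_return(toy_prices, pos))
    print("buy-and-hold return:", toy_prices[-1] / toy_prices[0] - 1.0)
```

Comparing the crossover return with the buy-and-hold return over many (short, long) pairs is one way a grid of trials like the one mentioned in the abstract could be organized.
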
  7. By: Draca, Mirko (University of Warwick and Centre for Economic Performance, LSE); Schwarz, Carlo (University of Warwick and Centre for Competitive Advantage in the Global Economy (CAGE))
    Abstract: Strong evidence has been emerging that major democracies have become more politically polarized, at least according to measures based on the ideological positions of political elites. We ask: have the general public (‘citizens’) followed the same pattern? Our approach is based on unsupervised machine learning models as applied to issue-position survey data. This approach firstly indicates that coherent, latent ideologies are strongly apparent in the data, with a number of major, stable types that we label as: Liberal Centrist, Conservative Centrist, Left Anarchist and Right Anarchist. Using this framework, and a resulting measure of ‘citizen slant’, we are then able to decompose the shift in ideological positions across the population over time. Specifically, we find evidence of a ‘disappearing center’ in a range of countries with citizens shifting away from centrist ideologies into anti-establishment ‘anarchist’ ideologies over time. This trend is especially pronounced for the US.
    Keywords: Polarization ; Ideology ; Unsupervised Learning
    JEL: D72 C81
    Date: 2019
  8. By: Olivier Guéant; Iuliia Manziuk; Jiang Pu
    Abstract: When firms want to buy back their own shares, they have a choice between several alternatives. While they often carry out open market repurchases, they also increasingly rely on banks through complex buyback contracts involving option components, e.g. accelerated share repurchase contracts, VWAP-minus profit-sharing contracts, etc. The entanglement between the execution problem and the option hedging problem makes the management of these contracts a difficult task that should not boil down to simple Greek-based risk hedging, contrary to what happens with classical books of options. In this paper, we propose a machine learning method to optimally manage several types of buyback contracts. In particular, we recover strategies similar to those obtained in the literature with partial differential equation and recombinant tree methods and show that our new method, which does not suffer from the curse of dimensionality, makes it possible to address types of contracts that could not be addressed with grid or tree methods.
    Date: 2019–07
  9. By: Mateo Dulce Rubio
    Abstract: I use residential burglary data from Bogota, Colombia, to fit an agent-based model following truncated Lévy flights (Pan et al., 2018) elucidating criminal rational behavior and validating repeat/near-repeat victimization and broken windows effects. The estimated parameters suggest that if an average house or its neighbors have never been attacked, and it is suddenly burglarized, the probability of a new attack the next day increases, due to the crime event, by 79 percentage points. Moreover, the following day its neighbors will also face an increment in the probability of crime of 79 percentage points. This effect persists for a long time span. The model presents an area under the Cumulative Accuracy Profile (CAP) curve of 0.8, performing similarly or better than state-of-the-art crime prediction models. Public policies seeking to reduce criminal activity and its negative consequences must take into account these mechanisms and the self-exciting nature of crime to effectively make criminal hotspots safer.
    Keywords: Criminal behavior, Crime prediction model, Machine learning, Agent-based model
    JEL: K42 H39 C53 C63
    Date: 2019–04–30
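
For readers unfamiliar with the term used in the abstract above, a truncated Lévy flight is a random walk whose step lengths follow a power law bounded between a minimum and a maximum. The sketch below is a generic illustration only and is not the paper's agent-based model: the exponent `alpha`, the bounds `r_min` and `r_max`, and the function names are hypothetical, and the uniformly random direction is an assumption of this example.

```python
import math
import random


def truncated_power_law_step(rng, alpha=1.5, r_min=1.0, r_max=100.0):
    """Draw a step length with density proportional to r**-(1 + alpha)
    on [r_min, r_max], using inverse-transform sampling."""
    u = rng.random()
    a = r_min ** (-alpha)
    b = r_max ** (-alpha)
    return (a - u * (a - b)) ** (-1.0 / alpha)


def levy_flight(n_steps, seed=0, **step_kwargs):
    """Simulate a 2-D walk: power-law step length, uniformly random direction."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        r = truncated_power_law_step(rng, **step_kwargs)
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x += r * math.cos(theta)
        y += r * math.sin(theta)
        path.append((x, y))
    return path


if __name__ == "__main__":
    walk = levy_flight(1000, seed=42)
    steps = [
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(walk, walk[1:])
    ]
    # Most steps are short, with occasional long jumps up to r_max.
    print("median step:", sorted(steps)[len(steps) // 2])
    print("longest step:", max(steps))
```

The heavy tail of the step distribution is what distinguishes this walk from ordinary Brownian motion: many short local moves are punctuated by rare long relocations, capped here by `r_max`.
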
  10. By: Naftali Cohen; Tucker Balch; Manuela Veloso
    Abstract: The art of systematic financial trading evolved with an array of approaches, ranging from simple strategies to complex algorithms, all relying primarily on aspects of time-series analysis. Recently, after visiting the trading floor of a leading financial institution, we noticed that traders always execute their trade orders while observing images of financial time-series on their screens. In this work, we build upon the success in image recognition and examine the value in transforming the traditional time-series analysis to that of image classification. We create a large sample of financial time-series images encoded as candlestick (Box and Whisker) charts and label the samples following three algebraically-defined binary trade strategies. Using the images, we train over a dozen machine-learning classification models and find that the algorithms are very efficient in recovering the complicated, multiscale label-generating rules when the data is represented visually. We suggest that the transformation of a continuous numeric time-series classification problem to a vision problem is useful for recovering signals typical of technical analysis.
    Date: 2019–07
  11. By: Jian Liang; Zhe Xu; Peter Li
    Abstract: We propose a new forward-backward stochastic differential equation solver for high-dimensional derivatives pricing problems by combining a deep learning solver with the least square regression technique widely used in the least square Monte Carlo method for the valuation of American options. Our numerical experiments demonstrate the efficiency and accuracy of our least square backward deep neural network solver and its capability to provide accurate prices for complex early exercise derivatives such as callable yield notes. Our method can serve as a generic numerical solver for pricing derivatives across various asset groups, in particular, as an efficient means for pricing high-dimensional derivatives with early exercise features.
    Date: 2019–07
  12. By: Mustafa Caglayan (Heriot-Watt University); Tho Pham (University of Reading); Oleksandr Talavera (University of Birmingham); Xiong Xiong (Tianjin University)
    Abstract: This study examines the presence of mispricing in Bondora, a leading European peer-to-peer lending platform, over the 2016-2019 period. Implementing machine-learning methods, we calculate the likelihood of success for loan resale in the Bondora secondary market and compare it with ex-post outcomes. We find evidence of mispricing mainly driven by the differences in market participants’ perceptions about asset values: low-quality assets are successfully sold while high-quality assets are not. Once sellers discover buyers’ beliefs about asset prices, they revalue their assets according to buyers’ perception to exploit this mismatch in subsequent listings. Our results are robust to various statistical and machine learning methods.
    Keywords: mispricing, online secondary market, peer-to-peer lending, belief dispersion
    JEL: G12 G20
    Date: 2019–07
  13. By: Lewis Gaul; Jonathan Jones; Pinar Uysal
    Keywords: Bank supervision and regulation, early warning models, CAMELS ratings, machine learning
    JEL: G21 G28 C53
    Date: 2019–07–23
  14. By: L. Ingber
    Date: 2019
  15. By: Sant'Anna, Marcelo Castello Branco
    Abstract: Biofuels offer one approach for reducing carbon emissions in transportation. However, the agricultural expansion needed to produce biofuels may endanger tropical forests. I use a dynamic model of land use to disentangle the roles played by agricultural expansion and yield increases in the supply of sugarcane ethanol in Brazil. The model is estimated using remote sensing (satellite) information of sugarcane activities. Estimates imply that, at the margin, 92% of new ethanol comes from increases in area and only 8% from increases in yield. Direct deforestation accounts for 12% of area expansion. I further assess carbon emissions and deforestation implications from ethanol policies.
    Date: 2019–07–25
  16. By: Miquel-Àngel Garcia-López (Universitat Autònoma de Barcelona, Institut d’Economia de Barcelona (IEB)); Jordi Jofre-Monseny (Universitat de Barcelona, Institut d’Economia de Barcelona (IEB)); Rodrigo Martínez Mazza (Universitat de Barcelona, Institut d’Economia de Barcelona (IEB)); Mariona Segú (RITM, Université Paris Sud, Paris Saclay)
    Abstract: In this paper, we assess the impact of the arrival and expansion of Airbnb on housing rents and prices in the city of Barcelona. Examining highly detailed data on rents and both transaction and posted prices, we use several econometric approaches that exploit the exact timing and geography of Airbnb activity in the city. These include i) panel fixed-effects models with neighborhood-specific time trends, ii) an instrumental variable shift-share approach in which tourist amenities predict where Airbnb listings will locate and Google searches predict when listings appear, and iii) event-study designs. For the average neighborhood in terms of Airbnb activity, our preferred results imply that rents have increased by 1.9%, while transaction (posted) prices have increased by 5.3% (3.7%). The estimated impact in neighborhoods with high Airbnb activity is substantial. For neighborhoods in the top decile of the Airbnb activity distribution, rents are estimated to have increased by 7%, while increases in transaction (posted) prices are estimated at 19% (14%).
    Keywords: Housing markets, short-term rentals, Airbnb
    JEL: R10 R20 R31
    Date: 2019

This nep-big issue is ©2019 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.