nep-big New Economics Papers
on Big Data
Issue of 2022‒09‒12
fifteen papers chosen by
Tom Coupé
University of Canterbury

  1. k-Means Clusterization and Machine Learning Prediction of European Most Cited Scientific Publications By Leogrande, Angelo; Costantiello, Alberto; Laureti, Lucio
  2. Transformer-Based Deep Learning Model for Stock Price Prediction: A Case Study on Bangladesh Stock Market By Tashreef Muhammad; Anika Bintee Aftab; Md. Mainul Ahsan; Maishameem Meherin Muhu; Muhammad Ibrahim; Shahidul Islam Khan; Mohammad Shafiul Alam
  3. Can a Machine Correct Option Pricing Models? By Caio Almeida; Jianqing Fan; Gustavo Freire; Francesca Tang
  4. The Export of Medium and High-Tech Products Manufactured in Europe By Leogrande, Angelo; Costantiello, Alberto; Laureti, Lucio
  5. The Determinants of Lifelong Learning in Europe By Costantiello, Alberto; Laureti, Lucio; Leogrande, Angelo
  6. GAM(L)A: An econometric model for interpretable machine learning By Sullivan Hué
  7. The Community Explorer How to Inform Effectively Policy on U.S. Diversity with County Level Data By Lopez, Claude; Roh, Hyeongyul; Switek, Maggie
  8. Estimating Inequality with Missing Incomes By Paolo Brunori; Pedro Salas-Rojo; Paolo Verme
  9. Are big data a radical innovation trigger or a problem-solving patch? The case of AI implementation by automotive incumbents By Quentin Plantec; Marie-Alix Deval; Sophie Hooge; Benoît Weil
  10. Role of Artificial Intelligence in Intra-Sectoral Wage Inequality in an Open Economy: A Finite Change Approach By Shreya Roy; Sugata Marjit; Bibek Ray Chaudhuri
  11. Machine ethics and African identities: Perspectives of artificial intelligence in Africa By Kohnert, Dirk
  12. A systematic literature review on the disruptions of artificial intelligence within the business world: in terms of the evolution of competences By Shengxing Yang
  13. AI Watch. National strategies on Artificial Intelligence: A European perspective. 2022 edition By JORGE RICART Raquel; VAN ROY Vincent; ROSSETTI Fiammetta; TANGI Luca
  14. Child Care Provider Survival Analysis By Phillip Sherlock; Herman T. Knopf; Robert Chapman; Maya Schreiber; Courtney K. Blackwell
  15. Does the European Central Bank speak differently when in parliament? By Fraccaroli, Nicolò; Giovannini, Alessandro; Jamet, Jean-Francois; Persson, Eric

  1. By: Leogrande, Angelo; Costantiello, Alberto; Laureti, Lucio
    Abstract: In this article we investigate the determinants of the European “Most Cited Publications”. We use data from the European Innovation Scoreboard-EIS of the European Commission for the period 2010-2019. Data are analyzed with Panel Data with Fixed Effects, Panel Data with Random Effects, WLS, and Pooled OLS. Results show that the level of “Most Cited Publications” is positively associated, among others, with “Innovation Index” and “Enterprise Birth” and negatively associated, among others, with “Government Procurement of Advanced Technology Products” and “Human Resources”. Furthermore, we perform a cluster analysis with the k-Means algorithm using both the Silhouette Coefficient and the Elbow Method. We find that the Elbow Method shows better results than the Silhouette Coefficient with a number of clusters equal to 3. In addition, we perform a network analysis with the Manhattan distance, and we find the presence of 4 complex and 2 simplified network structures. Finally, we present a comparison among 10 machine learning algorithms to predict the level of “Most Cited Publications” with both Original Data-OD and Augmented Data-AD. Results show that the best machine learning algorithm to predict the level of “Most Cited Publications” with Original Data-OD is SGD, while Linear Regression is the best machine learning algorithm for the prediction of “Most Cited Publications” with Augmented Data-AD.
    Keywords: Innovation, and Invention: Processes and Incentives; Management of Technological Innovation and R&D; Diffusion Processes; Open Innovation.
    JEL: O3 O30 O31 O32 O33
    Date: 2022–08–20
  2. By: Tashreef Muhammad; Anika Bintee Aftab; Md. Mainul Ahsan; Maishameem Meherin Muhu; Muhammad Ibrahim; Shahidul Islam Khan; Mohammad Shafiul Alam
    Abstract: In modern capital markets, the price of a stock is often considered to be highly volatile and unpredictable because of various social, financial, political and other dynamic factors. With calculated and thoughtful investment, the stock market can ensure a handsome profit with minimal capital investment, while incorrect prediction can easily bring catastrophic financial loss to the investors. This paper introduces the application of a recently introduced machine learning model, the Transformer model, to predict the future price of stocks of the Dhaka Stock Exchange (DSE), the leading stock exchange in Bangladesh. The transformer model has been widely leveraged for natural language processing and computer vision tasks, but, to the best of our knowledge, has never been used for the stock price prediction task at DSE. Recently, the introduction of time2vec encoding to represent time series features has made it possible to employ the transformer model for stock price prediction. This paper concentrates on the application of a transformer-based model to predict the price movement of eight specific stocks listed in DSE based on their historical daily and weekly data. Our experiments demonstrate promising results and acceptable root mean squared error on most of the stocks.
    Date: 2022–08
  3. By: Caio Almeida (Princeton University); Jianqing Fan (Princeton University); Gustavo Freire (Erasmus School of Economics); Francesca Tang (Princeton University)
    Abstract: We introduce a novel two-step approach to predict implied volatility surfaces. Given any fitted parametric option pricing model, we train a feedforward neural network on the model-implied pricing errors to correct for mispricing and boost performance. Using a large dataset of S&P 500 options, we test our nonparametric correction on several parametric models ranging from ad-hoc Black-Scholes to structural stochastic volatility models and demonstrate the boosted performance for each model. Out-of-sample prediction exercises in the cross-section and in the option panel show that machine-corrected models always outperform their respective original ones, often by a large extent. Our method is relatively indiscriminate, bringing pricing errors down to a similar magnitude regardless of the misspecification of the original parametric model. Even so, correcting models that are less misspecified usually leads to additional improvements in performance and also outperforms a neural network fitted directly to the implied volatility surface.
    Keywords: Deep Learning, Boosting, Implied Volatility, Stochastic Volatility, Model Correction
    JEL: C45 C58 G13
    Date: 2022–07
  4. By: Leogrande, Angelo; Costantiello, Alberto; Laureti, Lucio
    Abstract: In this article we analyze the determinants and the export trend of medium and high technology products of European countries. The data were analyzed using various econometric models, namely WLS, Pooled OLS, Dynamic Panel, Panel Data with Fixed Effects, Panel Data with Random Effects. The results show that exports of medium and high-tech products are positively associated, among other variables, with the value of “Average Annual GDP Growth”, “Total Entrepreneurial Activity” and “Sales Impacts”, and negatively associated with, among other variables, “Human Resources”, “Government and Procurement of Advanced Technology Products” and “Buyer Sophistication”. A cluster analysis was carried out with the k-Means algorithm optimized with the Silhouette coefficient. The result showed the presence of only two clusters. Since this result was considered poorly representative of the industrial complexity of the European Union countries, a further analysis was carried out with the Elbow method. The result showed the presence of 6 clusters with the dominance of Germany and the economies connected to the German economy. In addition, a network analysis was carried out using the Manhattan distance. Four complex network structures and two simplified network structures were detected. A comparison was then made between 10 machine learning algorithms for predicting the value of exports of medium and high-tech products. The result shows that the best performing algorithm is SGD. An analysis with Augmented Data-AD was implemented with a comparison between 10 machine learning algorithms for prediction, and the result shows that the Linear Regression algorithm is the best predictor. The prediction with the Augmented Data-AD reduces the MAE by about 0.0022131 compared to the prediction with the Original Data-OD.
    Keywords: Innovation, and Invention: Processes and Incentives; Management of Technological Innovation and R&D; Diffusion Processes; Open Innovation
    JEL: O30 O31 O32 O33 O34
    Date: 2022–08–16
  5. By: Costantiello, Alberto; Laureti, Lucio; Leogrande, Angelo
    Abstract: The article addresses the question of lifelong learning in Europe using data from the European Innovation Scoreboard-EIS in the period 2010-2019 for 36 countries. The econometric analysis is carried out using WLS, Dynamic Panel, Pooled OLS, Panel Data with Fixed Effects and Random Effects. The results show that lifelong learning is, among other variables, positively associated with “Human Resources” and “Government procurement of advanced technology products” and is negatively associated, among others, with “Average annual GDP growth” and “Innovation Index”. A clustering is carried out using the k-Means algorithm with a comparison between the Elbow Method and the Silhouette Coefficient. Subsequently, a Network Analysis was applied with the Manhattan distance. The results show the presence of 4 complex and 2 simplified network structures. Finally, a comparison was made among eight machine learning algorithms for the prediction of the value of lifelong learning. The results show that linear regression is the best predictor algorithm and that the level of lifelong learning is expected to grow on average by 1.12%.
    Keywords: Innovation, and Invention: Processes and Incentives; Management of Technological Innovation and R&D; Diffusion Processes; Open Innovation.
    JEL: O30 O31 O32 O33 O34
    Date: 2022–08–07
  6. By: Sullivan Hué (Aix-Marseille Université, AMSE)
    Abstract: Despite their high predictive performance, random forest and gradient boosting are often considered as black boxes or uninterpretable models, which has raised concerns from practitioners and regulators. As an alternative, I propose to use partial linear models that are inherently interpretable. Specifically, this presentation introduces GAM-lasso (GAMLA) and GAM-autometrics (GAMA), denoted as GAM(L)A in short. GAM(L)A combines parametric and non-parametric functions to accurately capture linearities and nonlinearities prevailing between dependent and explanatory variables and a variable-selection procedure to control for overfitting issues. Estimation relies on a two-step procedure building upon the double residual method. I illustrate the predictive performance and interpretability of GAM(L)A on a regression and a classification problem. The results show that GAM(L)A outperforms parametric models augmented by quadratic, cubic, and interaction effects. Moreover, the results also suggest that the performance of GAM(L)A is not significantly different from that of random forest and gradient boosting.
    Date: 2022–08–01
  7. By: Lopez, Claude; Roh, Hyeongyul; Switek, Maggie
    Abstract: The Community Explorer provides novel insight into the different characteristics of the U.S. population that can be used in policy design and impact assessment. More broadly, it increases the understanding of socio-economic gaps and potential markets in the U.S. More specifically, it synthesizes the information of 751 variables across 3142 counties from the Census Bureau’s American Community Survey, using machine learning methods, into 17 communities. Each one of these communities has a distinctive profile that combines demographic, economic, and many other behavior determinants while not being geographically bounded.
    Keywords: US diversity, equity, machine learning, clusters, census, county level data, data viz, interactive map
    JEL: C38 R0 R1 Y1
    Date: 2022–08
  8. By: Paolo Brunori; Pedro Salas-Rojo; Paolo Verme
    Abstract: The measurement of income inequality is affected by missing observations, especially if they are concentrated on the tails of an income distribution. This paper conducts an experiment to test how the different correction methods proposed by the statistical, econometric and machine learning literature address measurement biases of inequality due to item non-response. We take a baseline survey and artificially corrupt the data employing several alternative non-linear functions that simulate patterns of income non-response, and show how biased inequality statistics can be when item non-responses are ignored. The comparative assessment of correction methods indicates that most methods are able to partially correct for missing data biases. Sample reweighting based on probabilities of non-response produces inequality estimates quite close to true values in most simulated missing data patterns. Matching and Pareto corrections can also be effective to correct for selected missing data patterns. Other methods, such as Single and Multiple imputations and Machine Learning methods, are less effective. A final discussion provides some elements that help explain these findings.
    Keywords: Income Inequality; Item non-response; Income Distributions; Inequality Predictions; Imputations.
    JEL: D31 D63 E64 O15
    Date: 2022
  9. By: Quentin Plantec (TSM - Toulouse School of Management Research - UT1 - Université Toulouse 1 Capitole - Université Fédérale Toulouse Midi-Pyrénées - CNRS - Centre National de la Recherche Scientifique - TSM - Toulouse School of Management - UT1 - Université Toulouse 1 Capitole - Université Fédérale Toulouse Midi-Pyrénées); Marie-Alix Deval; Sophie Hooge; Benoît Weil
    Abstract: Big data, supported by AI technologies, is mainly viewed as a trigger for radical innovation. The automotive industry appears as a key example: the most critical innovative challenges (e.g., autonomous driving, connected cars) imply drawing more extensively on big data. But the degree of innovativeness of the industrial purpose of incumbents, who are already embedding such technologies in their end-products, is worth investigating. To answer this research question, we relied on a mixed-method approach and used knowledge search as a theoretical framework. First, we conducted a quantitative analysis on 46,145 patents from the top-19 automotive incumbents. By comparing AI and non-AI patents, we showed that incumbents mainly rely on knowledge exploitation for data-driven innovation, leading to incremental innovations. But, surprisingly, such an innovation path fosters more technologically original inventions with AI, which is not the case for non-AI patents. Second, we conducted a qualitative study to better understand this phenomenon. We showed that big data and AI technologies are integrated in the industrialization phase of the new vehicle development process, following creative problem-solving logics. We also identified technical and organizational challenges limiting data-driven innovation. Those findings are discussed with regard to the knowledge search and the new product development literature in the context of the automotive industry.
    Keywords: Big data,AI technologies,automotive industry,digital transformation
    Date: 2022–06
  10. By: Shreya Roy; Sugata Marjit; Bibek Ray Chaudhuri
    Abstract: Artificial Intelligence (AI) has the potential to significantly impact the income of individuals. Cross-country data shows that the introduction of AI is inequality enhancing in developing and less developed countries. In this paper, we attempt to understand the reason for the increase in wage inequality across labourers due to the introduction of AI, in a finite change General Equilibrium (GE) set up which allows for the emergence of a new activity. An AI-induced technological shock is introduced in the non-traded sector of an open economy with heterogeneous skills. We show how the advent of AI (which was initially non-existent) in the non-traded sector separates the skills of the once homogenous workers, thus creating an intra-sectoral wage gap. What proportion of the low-skilled workers can move to the higher wage paying sector depends on an adaptability factor that acts as an eligibility criterion in fragmenting the erstwhile homogenous labourers and also works towards a rising intra-group wage gap.
    Keywords: artificial intelligence, finite change, sectoral wage gap
    JEL: O33 J31 D50
    Date: 2022
  11. By: Kohnert, Dirk
    Abstract: Artificial Intelligence (AI) has been embraced enthusiastically by Africans as a new resource for African development. AI could improve well-being by enabling innovation in business, education, health, ecology, urban planning, industry, etc. However, the high expectations could be little more than pious wishes. There are still too many open questions regarding the transfer required, and the selection of appropriate technology and its mastery. Given that the 'technology transfer' concept of modernization theories of the 1960s utterly failed because it had not been adapted to local needs, some scholars have called for an endogenous concept of African AI. However, this caused a lot of controversies. Africa became a battlefield of 'digital empires' of global powers due to its virtually non-existent digital infrastructure. Still, African solutions to African problems would be needed. Additionally, the dominant narratives and default settings of AI-related technologies have been denounced as male, gendered, white, heteronormative, powerful, and western. The previous focus on the formal sector is also questionable. Innovators from the informal sector and civil society, embedded in the local sociocultural environment but closely linked to transnational social spaces, often outperform government development efforts. UNESCO also warned that the effective use of AI in Africa requires the appropriate skills, legal framework and infrastructure. As in the past, calls by African politicians for a pooling of resources, a pan-African strategy, were probably in vain. AI may develop fastest in the already established African technology hubs of South Africa, Nigeria and Kenya. But promising AI-focused activities have also been identified in Ethiopia and Uganda. Gender equality, cultural and linguistic diversity, and changes in labour markets would also be required for AI to enhance rather than undermine socioeconomic inclusion. In addition, ethical questions related to a specific African identity have been raised. The extent to which African ideas of humanity and humanitarianism should be taken into account when developing an African AI remains an open question. In short, calling for the rapid deployment of AI in Africa could be a double-edged sword.
    Keywords: Artificial Intelligence; Innovation; Machine learning; Big Data Analytics; moral values; AI ethics; African ethics; African philosophy; Africa; Sub-Saharan Africa; economic development; human development; informal sector; poverty; famine; international trade; global power; fragile state; South Africa; Nigeria; Kenya; Uganda; Ethiopia; Postcolonialism; African Studies
    JEL: E24 E26 F15 F22 F35 F54 F63 I3 J4 J46 L26 M1 M13 N77 O32 O33 O35 P46 Q14 Z13
    Date: 2022–07–17
  12. By: Shengxing Yang (Université Paris-Saclay)
    Abstract: The advancement of artificial intelligence has brought both opportunities and challenges to the business world, and its potentially disruptive impact has attracted the research interest of management scholars. This exploratory research applied a systematic literature review approach to explore the nexus between AI and competences to help both firms and individuals better address the disruptions from AI. After reviewing relevant publications from the Business Source Complete database for the past decade (2011-2021), we selected 65 articl debates and issues on AI and perspectives linked with competences. Furthermore, we synthesize two frameworks (RBV framework for firm-level; Key and STEM competences for individual-level) and an overview to gain a holistic understanding of the nexus between AI and competences. We found relatively little empirical evidence in the literature, the implementation of AI was still in its preliminary stages, and the frameworks we aggregated industry and yield richer insights.
    Keywords: Artificial Intelligence,Competences,Firm,Individual,Systematic literature review
    Date: 2022–06–07
  13. By: JORGE RICART Raquel; VAN ROY Vincent (European Commission - JRC); ROSSETTI Fiammetta (European Commission - JRC); TANGI Luca (European Commission - JRC)
    Abstract: This report provides an in-depth comparative analysis of the national strategies structured along the categories and priorities agreed between the EC and Member States in the Coordinated Plan on AI review 2021. The aim of this report is to assess how national strategies contribute to the achievement of the goals of the reviewed Coordinated Plan.
    Keywords: Artificial Intelligence
    Date: 2022–05
  14. By: Phillip Sherlock; Herman T. Knopf; Robert Chapman; Maya Schreiber; Courtney K. Blackwell
    Abstract: The aggregate ability of child care providers to meet local demand for child care is linked to employment rates in many sectors of the economy. Amid growing concern regarding child care provider sustainability due to the COVID-19 pandemic, state and local governments have received large amounts of new funding to better support provider stability. In response to this new funding aimed at bolstering the child care market in Florida, this study was devised as an exploratory investigation into features of child care providers that lead to business longevity. In this study we used optimal survival trees, a machine learning technique designed to better understand which providers are expected to remain operational for longer periods of time, supporting stabilization of the child care market. This tree-based survival analysis detects and describes complex interactions between provider characteristics that lead to differences in expected business survival rates. Results show that small providers who are religiously affiliated, and all providers who are serving children in Florida's universal Prekindergarten program and/or children using child care subsidy, are likely to have the longest expected survival rates.
    Date: 2022–08
  15. By: Fraccaroli, Nicolò; Giovannini, Alessandro; Jamet, Jean-Francois; Persson, Eric
    Abstract: Parliamentary hearings are a fundamental tool to hold independent central banks accountable. However, it is not clear what type of information central banks provide when they communicate with parliaments compared to other existing information channels. In this article, we address this question by comparing the communication of the European Central Bank (ECB) in parliamentary hearings to its communication in the regular press conferences that follow monetary policy decisions. Using text analysis on the ECB President’s introductory statements in parliamentary hearings and press conferences from 1998 to 2021, we show that the ECB uses parliamentary hearings to discuss topics that are less covered in press conferences. We also find that the ECB’s policy stance in the hearings tends to reflect the stance in press conferences, and that the degree of language complexity is similar in the two fora. These findings support the view that the ECB mainly uses parliamentary hearings to further explain policy decisions first presented at press conferences but also to put them in a broader context.
    Keywords: Central Bank accountability, Central Bank communication
    JEL: E02 E52 E58
    Date: 2022–08
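
Illustrative note on item 8: the abstract reports that sample reweighting based on probabilities of non-response brings inequality estimates close to their true values. The sketch below is not the authors' procedure. It is a minimal, hypothetical example that assumes the response probabilities are known exactly (in practice they would have to be estimated) and uses made-up incomes. The function name `weighted_gini` and all data are assumptions introduced here for illustration only.

```python
def weighted_gini(incomes, weights):
    """Gini coefficient computed from the weighted mean absolute difference."""
    total_weight = sum(weights)
    mean_income = sum(w * x for x, w in zip(incomes, weights)) / total_weight
    # Sum of weighted absolute income differences over all ordered pairs.
    abs_diff = sum(
        wi * wj * abs(xi - xj)
        for xi, wi in zip(incomes, weights)
        for xj, wj in zip(incomes, weights)
    )
    return abs_diff / (2 * total_weight ** 2 * mean_income)


# Hypothetical full population: four low incomes and four high incomes.
population = [10.0] * 4 + [100.0] * 4
true_gini = weighted_gini(population, [1.0] * len(population))

# Assumed non-response pattern: every low-income unit answers,
# but only half of the high-income units do.
respondents = [10.0] * 4 + [100.0] * 2
response_prob = [1.0] * 4 + [0.5] * 2

# Ignoring non-response: every respondent counts once.
naive_gini = weighted_gini(respondents, [1.0] * len(respondents))

# Reweighting: each respondent is weighted by 1 / P(response).
ipw_weights = [1.0 / p for p in response_prob]
reweighted_gini = weighted_gini(respondents, ipw_weights)

print(f"true={true_gini:.4f} naive={naive_gini:.4f} reweighted={reweighted_gini:.4f}")
```

In this constructed case the reweighted estimate coincides with the full-population value, while the unweighted estimate does not. With estimated probabilities and real survey data the correction would generally be only approximate.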

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.