nep-big New Economics Papers
on Big Data
Issue of 2020‒01‒27
23 papers chosen by
Tom Coupé
University of Canterbury

  1. Big Data, artificial intelligence and the geography of entrepreneurship in the United States By Obschonka, Martin; Lee, Neil; Rodríguez-Pose, Andrés; Eichstaedt, Johannes Christopher; Ebert, Tobias
  2. Comparing Deep Neural Network and Econometric Approaches to Predicting the Impact of Climate Change on Agricultural Yield By Timothy Neal; Michael Keane
  3. Neural Network Associative Forecasting of Demand for Goods By Osipov, Vasiliy; Zhukova, Nataly; Miloserdov, Dmitriy
  4. Proyección de la Inflación en Chile con Métodos de Machine Learning By Felipe Leal; Carlos Molina; Eduardo Zilberman
  5. ResLogit: A residual neural network logit model By Melvin Wong; Bilal Farooq
  6. On the Political Economy of the European Union By Julia M. Puaschunder; Martin Gelter
  7. "The Squawk Bot": Joint Learning of Time Series and Text Data Modalities for Automated Financial Information Filtering By Xuan-Hong Dang; Syed Yousaf Shah; Petros Zerfos
  9. DP-LSTM: Differential Privacy-inspired LSTM for Stock Prediction Using Financial News By Xinyi Li; Yinchuan Li; Hongyang Yang; Liuqing Yang; Xiao-Yang Liu
  10. Improving metadata infrastructure for complex surveys: Insights from the Fragile Families Challenge By Kindel, Alexander; Bansal, Vineet; Catena, Kristin; Hartshorne, Thomas; Jaeger, Kate; Koffman, Dawn; McLanahan, Sara; Phillips, Maya; Rouhani, Shiva; Vinh, Ryan
  11. The long-run information effect of central bank communication By Hansen, Stephen; McMahon, Michael; Tong, Matthew
  12. Estimating Saudi Arabia’s Regional GDP Using Satellite Nighttime Light Images. By Hector Lopez-Ruiz; Jorge Blazquez; Fakhri Hasanov
  14. Clustering to Reduce Spatial Data Set Size By Boeing, Geoff
  15. Searching for Interpretable Demographic Patterns By Muratova, Anna; Islam, Robiul; Mitrofanova, Ekaterina S.; Ignatov, Dmitry I.
  16. The Evolution of Inequality of Opportunity in Germany: A Machine Learning Approach By Paolo Brunori; Guido Neidhoefer
  17. Innovations in the Wind Energy Sector By Dali T. Laxton
  18. Evolving ab initio trading strategies in heterogeneous environments By David Rushing Dewhurst; Yi Li; Alexander Bogdan; Jasmine Geng
  19. The Impact of Typhoons on Economic Activity in the Philippines: Evidence from Nightlight Intensity By Strobl, Eric
  20. Forecasting own brand sales: Does incorporating competition help? By Li, W.; Fok, D.; Franses, Ph.H.B.F.
  21. Planarity and Street Network Representation in Urban Form Analysis By Boeing, Geoff
  22. Urban Spatial Order: Street Network Orientation, Configuration, and Entropy By Boeing, Geoff
  23. A Multi-Scale Analysis of 27,000 Urban Street Networks: Every US City, Town, Urbanized Area, and Zillow Neighborhood By Boeing, Geoff

  1. By: Obschonka, Martin; Lee, Neil; Rodríguez-Pose, Andrés; Eichstaedt, Johannes Christopher; Ebert, Tobias
    Abstract: There is increasing interest in the potential of artificial intelligence and Big Data (e.g., generated via social media) to help understand economic outcomes and processes. But can artificial intelligence models, based solely on publicly available Big Data (e.g., language patterns left on social media), reliably identify geographical differences in entrepreneurial personality/culture that are associated with entrepreneurial activity? Using a machine learning model processing 1.5 billion tweets by 5.25 million users, we estimate the Big Five personality traits and an entrepreneurial personality profile for 1,772 U.S. counties. We find that these Twitter-based personality estimates show substantial relationships to county-level entrepreneurial activity, accounting for 20% (entrepreneurial personality profile) and 32% (all Big Five traits as separate predictors in one model) of the variance in local entrepreneurship, and are robust to the inclusion of conventional economic factors that affect entrepreneurship. We conclude that artificial intelligence methods, analysing publicly available social media data, are indeed able to detect entrepreneurial patterns by measuring territorial differences in entrepreneurial personality/culture that are valid markers of actual entrepreneurial behaviour. More importantly, such social media datasets and artificial intelligence methods are able to deliver similar (or even better) results than studies based on millions of personality tests (self-report studies). Our findings have a wide range of implications for research and practice concerned with entrepreneurial regions and ecosystems, and with regional economic outcomes interacting with local culture.
    Date: 2018–05–24
  2. By: Timothy Neal (UNSW School of Economics); Michael Keane (UNSW School of Economics)
    Abstract: Predicting the impact of climate change on crop yield is difficult, in part because the production function mapping weather to yield is high dimensional and nonlinear. We compare three approaches to predicting yields: (i) deep neural networks (DNNs), (ii) traditional panel-data models, and (iii) a new panel-data model that allows for unit and time fixed-effects in both intercepts and slopes in the agricultural production function - made feasible by a new estimator developed by Keane and Neal (2020) called MO-OLS. Using U.S. county-level corn yield data from 1950-2015, we show that both DNNs and MO-OLS models outperform traditional panel data models for predicting yield, both in-sample and in a Monte Carlo cross-validation exercise. However, the MO-OLS model substantially outperforms both DNNs and traditional panel-data models in forecasting yield in a 2006-15 holdout sample. We compare predictions of all these models for climate change impacts on yields from 2016 to 2100.
    Keywords: Climate Change, Crop Yield, Panel Data, Machine Learning, Neural Net
    Date: 2020–01
  3. By: Osipov, Vasiliy; Zhukova, Nataly; Miloserdov, Dmitriy
    Abstract: This article discusses the applicability of recurrent neural networks with controlled elements to the problem of forecasting market demand for goods over a four-month horizon. Two variants of forecasting are considered. In the first variant, time series are used to train the neural network, including the real demand values as well as pre-order values for 1, 2 and 3 months ahead. The second variant is an iterative forecasting method: it predicts the demand for the next month at each step, and the training set is supplemented with the values predicted for the previous months. It is shown that the proposed methods can achieve sufficiently accurate results. At the same time, the second approach demonstrates greater potential.
    Keywords: Recurrent Neural Network; Machine Learning; Data Mining; Demand Forecasting
    JEL: C45 L10
    Date: 2019–09–23
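The iterative scheme described in this abstract can be sketched in a few lines: at each step the next month's demand is predicted, and that prediction is appended to the history before forecasting the following month. In this illustrative stand-in, a simple least-squares AR(1) replaces the paper's recurrent neural network; the function names and the toy demand series are assumptions, not the authors' code.

```python
def fit_ar1(series):
    """Fit y[t] = a + b*y[t-1] by ordinary least squares."""
    x = series[:-1]
    y = series[1:]
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    b = cov / var
    a = my - b * mx
    return a, b

def iterative_forecast(history, horizon=4):
    """Forecast `horizon` months ahead, feeding each prediction back in."""
    extended = list(history)
    forecasts = []
    for _ in range(horizon):
        a, b = fit_ar1(extended)          # re-train on history + predictions
        next_val = a + b * extended[-1]   # one-step-ahead prediction
        forecasts.append(next_val)
        extended.append(next_val)         # supplement the training set
    return forecasts

demand = [100, 104, 108, 112, 116, 120, 124, 128]  # toy monthly demand
print(iterative_forecast(demand, horizon=4))
```

The feedback loop is the point: forecast errors can compound across the horizon, which is why the abstract's finding that this variant still performs well is notable.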
  4. By: Felipe Leal; Carlos Molina; Eduardo Zilberman
    Abstract: In this paper, in line with Medeiros et al. (2019) for the US, we apply Machine Learning (ML) methods with Big Data to forecast total and underlying CPI inflation in Chile. We show that, for the Chilean case, the ML methods do not consistently outperform simple univariate linear competitors such as the AR model and the mean and median of past inflation, which have proven to be highly competitive; in fact, these simple methods win in many cases. A second contribution of this work is the construction of a large dataset of macroeconomic variables related to the Chilean economy, similar to McCracken and Ng (2016), who built (and maintain) a comparable dataset for the United States.
    Date: 2020–01
  5. By: Melvin Wong; Bilal Farooq
    Abstract: We present a Residual Logit (ResLogit) model for seamlessly integrating a data-driven Deep Neural Network (DNN) architecture in the random utility maximization paradigm. DNN models such as the Multi-layer Perceptron (MLP) have shown remarkable success in modelling complex data accurately, but recent studies have consistently demonstrated that their black-box properties are incompatible with discrete choice analysis for the purpose of interpreting decision making behaviour. Our proposed machine learning choice model is a departure from the conventional feed-forward MLP framework by using a dynamic residual neural network learning based approach. Our proposed method can be formulated as a Generalized Extreme Value (GEV) random utility maximization model for greater flexibility in capturing unobserved heterogeneity. It can generate choice model structures where the covariance between random utilities is estimated and incorporated into the random error terms, allowing for a richer set of higher-order substitution patterns than a standard logit might be able to achieve. We describe the process of our model estimation and examine the relative empirical performance and econometric implications on two mode choice experiments. We analyzed the behavioural and theoretical properties of our methodology. We showed how model interpretability is possible, while also capturing the underlying complex and unobserved behavioural heterogeneity effects in the residual covariance matrices.
    Date: 2019–12
  6. By: Julia M. Puaschunder (The New School, Department of Economics, USA); Martin Gelter (Fordham University School of Law and Center on European Union Law)
    Abstract: Political economy concerns the historical, legal and heterodox economic analysis of complex systems. This article attempts to analyze the current state of the European Union from historical, legal and interdisciplinary economics perspectives. Historically, the ancient Athenian democracy, the Holy Roman Empire and the early formation of the United States serve as examples of early innovative legal constructs of their times that were sui generis and share key features with the current European Union. Regarding legal developments, this paper discusses the bicameral parliamentary structure, electoral processes and populist pressures. The future of the European Union economy is likely to see an AI (r)evolution shaping markets and rising big data revenues. This development necessitates the creation of a fifth fundamental freedom of data transfer within the European Union, as well as taxation of growth generated by big data. Heterodox economic growth theories will increasingly have to account for this growth.
    Keywords: Ancient Athenian democracy, Artificial Intelligence, Bicameral parliament, Big data, Electoral system, European Union, Holy Roman empire, market disruption, political economy, Populist pressures, Taxation, United States
    Date: 2019–11
  7. By: Xuan-Hong Dang; Syed Yousaf Shah; Petros Zerfos
    Abstract: Multimodal analysis that uses numerical time series and textual corpora as input data sources is becoming a promising approach, especially in the financial industry. However, the main focus of such analysis has been on achieving high prediction accuracy, while little effort has been spent on the important task of understanding the association between the two data modalities. Performance on the time series hence receives little explanation, even though human-understandable textual information is available. In this work, we address the following problem: given a numerical time series and a general corpus of textual stories collected over the same period, discover in a timely manner a succinct set of textual stories associated with that time series. Towards this goal, we propose a novel multi-modal neural model called MSIN that jointly learns both the numerical time series and the categorical text articles in order to unearth the association between them. Through multiple steps of data interrelation between the two data modalities, MSIN learns to focus on a small subset of text articles that best align with the performance of the time series. This succinct set is discovered in a timely manner and presented as recommended documents, acting as automated information filtering for the given time series. We empirically evaluate the performance of our model on discovering relevant news articles for two stock time series, from Apple and Google, along with daily news articles collected from Thomson Reuters over a period of seven consecutive years. The experimental results demonstrate that MSIN achieves up to 84.9% and 87.2% recall of the ground-truth articles for the two examined time series, far superior to state-of-the-art algorithms that rely on the conventional attention mechanism in deep learning.
    Date: 2019–12
  8. By: Ntale, Moses Kizito Njagi; Mathenge, Fr. Paul; Gikonyo, Barnabas
    Abstract: Predictive analytics is used to analyze the vast amounts of information generated through internal and external sources such as live public transit data, train schedules, and bus feeds. The collection, storage, and mining of big data are increasing as more automated platforms come online, an issue that is gaining attention in all business environments and raising privacy concerns. However, the volume and variety of data involved in big data analytics can cause data management issues in the areas of data quality, consistency, and governance, as the different platforms and data stores in a big data architecture create data silos. Furthermore, integrating big data tools into a cohesive architecture that meets an organization's analytics needs is a challenging proposition for analytics experts, who have to identify the right mix of technologies and then put the pieces together. This study therefore investigated the influence of social media as a source of predictive analytics on the customer satisfaction of Standard Gauge Railway (SGR) users. The research followed a cross-sectional survey design. The study targeted the customers and employees of the SGR Nairobi Terminus, from which a sample of 68 respondents was selected using a convenience sampling technique. A questionnaire was used to collect primary data. Field data were analyzed using both qualitative and quantitative methods: qualitative data, gathered through observation, were examined via content analysis, while quantitative data were analyzed through inferential techniques, namely correlation and multiple regression. The findings indicated that the use of social media for information, mode of payment, SGR travel classes, and rates/fares had a significant effect on customer satisfaction. It was recommended that the management of SGR should place regular offers of discounts, freebies, and giveaways on its sites; frequently update its social media sites with interesting information and product updates; and make the sites more interactive so that online members can invite non-members to sign up for SGR services.
    Date: 2019–09–30
  9. By: Xinyi Li; Yinchuan Li; Hongyang Yang; Liuqing Yang; Xiao-Yang Liu
    Abstract: Stock price prediction is important for value investment in the stock market. In particular, short-term prediction that exploits financial news articles has shown promise in recent years. In this paper, we propose a novel deep neural network, DP-LSTM, for stock price prediction, which incorporates news articles as hidden information and integrates different news sources through a differential privacy mechanism. First, based on the autoregressive moving average (ARMA) model, a sentiment-ARMA model is formulated by incorporating the information of financial news articles. Then, an LSTM-based deep neural network is designed, which consists of three components: LSTM, the VADER model, and a differential privacy (DP) mechanism. The proposed DP-LSTM scheme can reduce prediction errors and increase robustness. Extensive experiments on S&P 500 stocks show that (i) the proposed DP-LSTM achieves a 0.32% improvement in mean MPA of the prediction results, and (ii) for the prediction of the S&P 500 market index, we achieve up to a 65.79% improvement in MSE.
    Date: 2019–12
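The differential-privacy-inspired ingredient of DP-LSTM can be illustrated in miniature: perturb each news source's daily sentiment score with Laplace noise before averaging across sources, so no single (possibly unreliable) source dominates the signal fed to the predictor. The source names, noise scale, and sampling routine below are illustrative assumptions, not the paper's exact specification.

```python
import math
import random

def laplace_noise(rng, scale):
    """Draw Laplace(0, scale) noise via inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_average_sentiment(sources, scale=0.05, seed=42):
    """Perturb each source's daily sentiment, then average across sources."""
    rng = random.Random(seed)
    n_days = len(sources[0])
    averaged = []
    for day in range(n_days):
        noisy = [s[day] + laplace_noise(rng, scale) for s in sources]
        averaged.append(sum(noisy) / len(noisy))
    return averaged

# Daily sentiment in [-1, 1] from three hypothetical news sources.
source_a = [0.2, 0.1, -0.3, 0.4, 0.0]
source_b = [0.3, 0.0, -0.2, 0.5, 0.1]
source_c = [0.1, 0.2, -0.4, 0.3, -0.1]
print(dp_average_sentiment([source_a, source_b, source_c]))
```

Averaging the noisy copies preserves the common sentiment signal while damping source-specific quirks, which is the robustness intuition the abstract appeals to.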
  10. By: Kindel, Alexander (Princeton University); Bansal, Vineet; Catena, Kristin; Hartshorne, Thomas; Jaeger, Kate; Koffman, Dawn; McLanahan, Sara; Phillips, Maya; Rouhani, Shiva; Vinh, Ryan
    Abstract: Researchers rely on metadata systems to prepare data for analysis. As the complexity of datasets increases and the breadth of data analysis practices grow, existing metadata systems can limit the efficiency and quality of data preparation. This article describes the redesign of a metadata system supporting the Fragile Families and Child Wellbeing Study based on the experiences of participants in the Fragile Families Challenge. We demonstrate how treating metadata as data—that is, releasing comprehensive information about variables in a format amenable to both automated and manual processing—can make the task of data preparation less arduous and less error-prone for all types of data analysis. We hope that our work will facilitate new applications of machine learning methods to longitudinal surveys and inspire research on data preparation in the social sciences. We have open-sourced the tools we created so that others can use and improve them.
    Date: 2018–09–21
  11. By: Hansen, Stephen; McMahon, Michael; Tong, Matthew
    Abstract: Why do long-run interest rates respond to central bank communication? Whereas existing explanations imply a common set of signals drives short and long-run yields, we show that news on economic uncertainty can have increasingly large effects along the yield curve. To evaluate this channel, we use the publication of the Bank of England’s Inflation Report, from which we measure a set of high-dimensional signals. The signals that drive long-run interest rates do not affect short-run rates and operate primarily through the term premium. This suggests communication plays an important role in shaping perceptions of long-run uncertainty.
    JEL: E52 E58 C55
    Keywords: communication, machine learning, monetary policy
    Date: 2020–01
  12. By: Hector Lopez-Ruiz; Jorge Blazquez; Fakhri Hasanov (King Abdullah Petroleum Studies and Research Center)
    Abstract: The increasing availability of data from technologies such as mobile phones, satellites and connected devices means that there are many new possible sources of economic data. This study analyzes the potential use of nighttime light images from satellites to provide a regional distribution of Saudi Arabia’s gross domestic product (GDP).
    Keywords: Cointegration, Economic activity, GDP, Nighttime Satellite data, Production Function
    Date: 2019–12–23
  13. By: Islah, Khikmatul
    Abstract: In the context of Good Governance, one of the efforts undertaken is to develop the New Public Service paradigm. Implementing this paradigm can deliver services without discrimination, because all service activities carried out by the government are oriented toward providing excellent service and realizing the principles of public service set out in Law No. 25 of 2009 on Public Services, namely: public interest, legal certainty, equality of rights, balance of rights and obligations, professionalism, participation, equal and non-discriminatory treatment, openness, accountability, facilities and special treatment for vulnerable groups, timeliness, speed, ease, and affordability. Improving public services must be a primary concern of government, because public services are basic social rights of the community (social rights or fundamental rights). The juridical basis for public services as basic social rights is set out in Article 18A paragraph (2) and Article 34 paragraph (3) of the 1945 Constitution. The Constitution thus explicitly regulates public service as a manifestation of basic social rights (the right to receive); refusing or deviating from public service is contrary to the 1945 Constitution. Accordingly, the Government must make greater efforts to improve service quality, among others through service innovation that takes advantage of technological progress. One technology currently developing is Big Data, which presents both an opportunity and a challenge for the Government to exploit in improving service quality.
    Date: 2018–07–05
  14. By: Boeing, Geoff (Northeastern University)
    Abstract: Traditionally, researchers often lacked access to enough spatial data to answer pressing research questions or build compelling visualizations. Today, however, the problem is often that we have too much data. Spatially redundant or approximately redundant points may represent a single feature (plus noise) rather than many distinct spatial features. We use a machine learning approach with density-based clustering to compress such spatial data into a set of representative features.
    Date: 2018–03–22
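The compression idea in this abstract can be sketched simply: collapse spatially redundant points into one representative feature per cluster. Here a greedy distance threshold stands in for the density-based clustering (e.g. DBSCAN) the study uses; `eps`, the coordinates, and the function name are illustrative assumptions.

```python
def compress_points(points, eps=0.01):
    """Merge points within `eps` of an existing cluster centroid."""
    clusters = []  # each cluster: [sum_x, sum_y, count]
    for x, y in points:
        for c in clusters:
            cx, cy = c[0] / c[2], c[1] / c[2]
            if (x - cx) ** 2 + (y - cy) ** 2 <= eps ** 2:
                c[0] += x; c[1] += y; c[2] += 1   # absorb redundant point
                break
        else:
            clusters.append([x, y, 1])            # start a new feature
    return [(c[0] / c[2], c[1] / c[2]) for c in clusters]

# Three noisy observations of one feature, plus one distinct feature.
pts = [(10.000, 20.000), (10.001, 20.001), (9.999, 19.999), (10.5, 20.5)]
print(compress_points(pts, eps=0.01))
```

Four input points collapse to two representative centroids: the noisy triplet becomes a single feature while the distant point survives on its own.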
  15. By: Muratova, Anna; Islam, Robiul; Mitrofanova, Ekaterina S.; Ignatov, Dmitry I.
    Abstract: Nowadays there is a large amount of demographic data that should be analyzed and interpreted. More useful information can be extracted from accumulated demographic data by applying modern data mining methods. Two kinds of experiments are considered in this work: 1) generation of additional secondary features from events and evaluation of their influence on accuracy; 2) exploration of the influence of features on classification results using SHAP (SHapley Additive exPlanations). An algorithm for creating secondary features is proposed and applied to the dataset. The classifications were made with two methods, SVM and neural networks, and the results were evaluated. The impact of events and features on the classification results was evaluated using SHAP, and it was demonstrated how to tune the model to improve accuracy based on the obtained values. Applying a convolutional neural network to sequences of events improved classification accuracy and surpassed the previous best result on the studied demographic dataset.
    Keywords: data mining; demographics; neural networks; classification; SHAP; interpretation
    JEL: C02 C15 I00 J13
    Date: 2019–09–23
  16. By: Paolo Brunori (Università degli Studi di Firenze); Guido Neidhoefer (ZEW - Leibniz Centre for European Economic Research)
    Abstract: We show that measures of inequality of opportunity fully consistent with Roemer (1998)’s inequality of opportunity theory can be straightforwardly estimated by adopting a machine learning approach. Following Roemer, inequality of opportunity is generally defined as inequality between individuals exerting the same degree of effort but characterized by different exogenous circumstances. Due to the difficulty of measuring effort, most empirical contributions so far have identified groups of individuals sharing the same circumstances and then measured inequality of opportunity as between-group inequality, without considering the effort exerted. Our approach uses regression trees to identify groups of individuals characterized by identical circumstances, and a polynomial approximation to estimate the degree of effort exerted. To apply our method, we take advantage of information contained in 25 waves of the German Socio-Economic Panel. We show that in Germany inequality of opportunity declined immediately after reunification, surged in the first decade of the century, and slightly declined again after 2010. The level of estimated unequal opportunity today is just above the level recorded in 1992.
    Keywords: Inequality, Opportunity, SOEP, Germany
    JEL: D63 D30 D31
    Date: 2020–01
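The regression-tree step in this approach can be illustrated with a one-split stump: partition individuals into "types" by a circumstance variable, then read inequality of opportunity off as the between-type share of total income variance. The variables, values, and single-split simplification below are illustrative assumptions, not the authors' estimator.

```python
def _sse(ys):
    """Sum of squared deviations from the group mean."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(circumstance, income):
    """Find the circumstance threshold minimizing within-group variance."""
    pairs = sorted(zip(circumstance, income))
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        sse = _sse(left) + _sse(right)
        if sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best[1]

def between_type_share(circumstance, income):
    """Share of income variance explained by the two circumstance types."""
    thr = best_split(circumstance, income)
    groups = [[y for c, y in zip(circumstance, income) if c <= thr],
              [y for c, y in zip(circumstance, income) if c > thr]]
    total = _sse(income)
    within = sum(_sse(g) for g in groups)
    return (total - within) / total

parent_schooling = [8, 9, 10, 16, 17, 18]    # years, hypothetical circumstance
income           = [20, 22, 21, 40, 43, 41]  # thousands, hypothetical outcome
print(between_type_share(parent_schooling, income))
```

A full implementation would grow deeper trees over many circumstance variables and add the effort adjustment; the stump already shows how the tree endogenously chooses which circumstance partition defines the types.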
  17. By: Dali T. Laxton
    Abstract: When technological innovations are implemented in the wind energy sector, we should observe reductions in the production cost of electricity. However, the accuracy of inferring the rate of innovation from production cost reductions is open to challenge when those costs change due to factors not attributable to technological innovation. This study applies an engineering model to generate time-series of wind energy production cost data as the measure of innovation. This approach enables us to exclude factors which are not attributable to technological innovation. In order to illustrate the importance of our measure of innovation, we conduct a learning curve analysis which measures the correlation between deployment of wind energy technology and cost reductions in electricity production. Our data delivers an improved fit of the learning curve in wind energy technology relative to alternative measures of innovation from the literature.
    Keywords: innovation; levelized engineering cost of energy; wind turbine vintages; learning curve;
    JEL: O31 O32 Q28 D83
    Date: 2019–12
  18. By: David Rushing Dewhurst; Yi Li; Alexander Bogdan; Jasmine Geng
    Abstract: Securities markets are quintessential complex adaptive systems in which heterogeneous agents compete in an attempt to maximize returns. Species of trading agents are also subject to evolutionary pressure as entire classes of strategies become obsolete and new classes emerge. Using an agent-based model of interacting heterogeneous agents as a flexible environment that can endogenously model many diverse market conditions, we subject deep neural networks to evolutionary pressure to create dominant trading agents. After analyzing the performance of these agents and noting the emergence of anomalous superdiffusion through the evolutionary process, we construct a method to turn high-fitness agents into trading algorithms. We backtest these trading algorithms on real high-frequency foreign exchange data, demonstrating that elite trading algorithms are consistently profitable in a variety of market conditions, even though these algorithms had never before been exposed to real financial data. These results provide evidence to suggest that developing ab initio trading strategies by repeated simulation and evolution in a mechanistic market model may be a practical alternative to explicitly training models with past observed market data.
    Date: 2019–12
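The evolutionary loop described above can be sketched in miniature: a population of trading rules is scored in a synthetic market, and high-fitness rules survive and mutate. Here a simple moving-average dip-buying threshold stands in for the paper's deep networks, and the toy cyclical market, population sizes, and mutation scheme are all illustrative assumptions.

```python
import random

def backtest(threshold, prices):
    """Buy when price dips `threshold` below a 3-period average, sell on recovery."""
    cash, units = 100.0, 0.0
    for t in range(3, len(prices)):
        avg = sum(prices[t - 3:t]) / 3
        if units == 0 and prices[t] < avg * (1 - threshold):
            units, cash = cash / prices[t], 0.0      # buy the dip
        elif units > 0 and prices[t] > avg:
            cash, units = units * prices[t], 0.0     # sell on recovery
    return cash + units * prices[-1]                 # final wealth

def evolve(prices, pop_size=20, generations=15, seed=7):
    """Selection plus Gaussian mutation on the dip threshold."""
    rng = random.Random(seed)
    pop = [rng.uniform(0.0, 0.2) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda th: backtest(th, prices), reverse=True)
        elite = pop[: pop_size // 4]                 # evolutionary pressure
        pop = elite + [max(0.0, rng.choice(elite) + rng.gauss(0, 0.01))
                       for _ in range(pop_size - len(elite))]
    return pop[0]

prices = [100 + 5 * ((i % 6) - 3) for i in range(60)]  # toy cyclical market
best = evolve(prices)
print(best, backtest(best, prices))
```

The paper's point survives even in this toy: the strategy is never shown real data during evolution; fitness in the simulated environment alone shapes it.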
  19. By: Strobl, Eric (University of Bern)
    Abstract: We quantify the economic impact of typhoons in the Philippines. To this end we construct a panel data set of local economic activity derived from nightlight intensity satellite images and a cell-level measure of typhoon damage constructed from storm track data, a wind field model, and a stylized damage function. Our econometric results reveal that there is a statistically and potentially economically significant, albeit short-lived, impact of typhoon destruction on local economic activity. Constructing risk profiles from a 60-year historical set of storms suggests that (near) future losses in economic activity for frequent (5-year return period) and rare (50-year return period) events are likely to range between 1.0% and 2.5%.
    Keywords: economic impact; nightlights; Philippines; typhoons; wind field model
    JEL: O17 O44 Q54
    Date: 2019–07–30
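The "stylized damage function" in this kind of wind-field study is typically a sigmoid in wind speed: zero damage below a threshold, saturating toward total loss for extreme winds. The cubic form below follows Emanuel (2011), a common choice in this literature; whether this is the exact function and calibration used in the paper is an assumption here.

```python
def damage_fraction(v, v_thresh=25.7, v_half=74.7):
    """Fraction of exposed property damaged at wind speed v (m/s).

    Zero below v_thresh; exactly one half at v_half; approaches 1 as v grows.
    """
    vn = max(v - v_thresh, 0.0) / (v_half - v_thresh)  # normalized excess wind
    return vn ** 3 / (1.0 + vn ** 3)

for v in (20, 40, 75, 120):
    print(v, round(damage_fraction(v), 3))
```

Combining such a function with modeled wind speeds at each grid cell yields the cell-level damage measure that the nightlight regressions then use.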
  20. By: Li, W.; Fok, D.; Franses, Ph.H.B.F.
    Abstract: This study aims to investigate how much value is added to traditional sales forecasting models in marketing by using modern techniques like factor models, Lasso, elastic net, random forest and boosting methods. A benchmark model uses only the focal brand's own information, while the other models include competitive sales and marketing activities in various ways. An Average Competitor Model (ACM) summarises all competitive information by averages. Factor-augmented models incorporate all or some competitive information by means of common factors. Lasso and elastic net models shrink the coefficient estimates of specific competing brands towards zero by adding a shrinkage penalty to the sum of squared residuals. Random forest averages many tree models obtained from bootstrapped samples. Boosting grows many small trees sequentially and then averages over all the tree models to deliver forecasts. We use these methods to forecast sales of packaged goods one week ahead and compare their predictive performance. Our empirical results for 169 brands across 31 product categories show that the Lasso and elastic net are the safest methods to employ, as they outperform the benchmark for most of the brands. The random forest method yields larger improvements for some brands.
    Keywords: Sales forecasting, high-dimensional data, principal components, factor model, Lasso, Elastic Net, random forest, boosting, data mining
    Date: 2019–10–10
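The Lasso shrinkage mechanism this abstract relies on can be shown in a few lines: the penalty drives the coefficients of weakly informative competitor variables exactly to zero via soft-thresholding. Below is plain coordinate descent on centered data; the toy "own-brand" and "competitor" variables are illustrative assumptions, not the paper's dataset.

```python
def soft_threshold(rho, lam):
    """Soft-thresholding operator: sign(rho) * max(|rho| - lam, 0)."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Lasso via coordinate descent. X: list of centered columns, y: list."""
    p = len(X)
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove every feature's effect except j's.
            resid = [y[i] - sum(b[k] * X[k][i] for k in range(p) if k != j)
                     for i in range(len(y))]
            rho = sum(X[j][i] * resid[i] for i in range(len(y)))
            z = sum(v * v for v in X[j])
            b[j] = soft_threshold(rho, lam) / z
    return b

own_marketing = [-2, -1, 0, 1, 2]    # strong own-brand predictor
competitor_ad = [-1, 2, 0, -2, 1]    # weakly related competitor signal
sales         = [-4.05, -1.9, 0.0, 1.9, 4.05]
print(lasso_cd([own_marketing, competitor_ad], sales, lam=1.0))
```

With this penalty the weak competitor coefficient lands exactly at zero while the own-brand coefficient is only mildly shrunk, which is why the method safely includes many competing brands without overfitting.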
  21. By: Boeing, Geoff (Northeastern University)
    Abstract: Models of street networks underlie research in urban travel behavior, accessibility, design patterns, and morphology. These models are commonly defined as planar, meaning they can be represented in two dimensions without any underpasses or overpasses. However, real-world urban street networks exist in three-dimensional space and frequently feature grade separation such as bridges and tunnels: planar simplifications can be useful but they also impact the results of real-world street network analysis. This study measures the nonplanarity of drivable and walkable street networks in the centers of 50 cities worldwide, then examines the variation of nonplanarity across a single city. It develops two new indicators - the Spatial Planarity Ratio and the Edge Length Ratio - to measure planarity and describe infrastructure and urbanization. While some street networks are approximately planar, we empirically quantify how planar models can inconsistently but drastically misrepresent intersection density, street lengths, routing, and connectivity.
    Date: 2018–08–05
  22. By: Boeing, Geoff (Northeastern University)
    Abstract: Street networks may be planned according to clear organizing principles or they may evolve organically through accretion, but their configurations and orientations help define a city’s spatial logic and order. Measures of entropy reveal a city’s streets’ order and disorder. Past studies have explored individual cases of orientation and entropy, but little is known about broader patterns and trends worldwide. This study examines street network orientation, configuration, and entropy in 100 cities around the world using OpenStreetMap data and OSMnx. It measures the entropy of street bearings in weighted and unweighted network models, along with each city’s typical street segment length, average circuity, average node degree, and the network’s proportions of four-way intersections and dead-ends. It also develops a new indicator of orientation-order that quantifies how a city’s street network follows the geometric ordering logic of a single grid. A cluster analysis is performed to explore similarities and differences among these study sites in multiple dimensions. Significant statistical relationships exist between city orientation-order and other indicators of spatial order, including street circuity and measures of connectedness. On average, US/Canadian study sites are far more grid-like than those elsewhere, exhibiting less entropy and circuity. These indicators, taken in concert, help reveal the extent and nuance of the grid. These methods demonstrate automatic, scalable, reproducible tools to empirically measure and visualize city spatial order, illustrating complex urban transportation system patterns and configurations around the world.
    Date: 2018–08–02
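The orientation-entropy measure described above is straightforward to sketch: bin street-segment compass bearings and compute the Shannon entropy of the bin proportions. A perfect grid concentrates bearings in a few bins (low entropy), while an organic network spreads them around the circle (high entropy). The 36-bin choice mirrors common practice but is an assumption here, as are the toy bearing lists.

```python
import math

def orientation_entropy(bearings_deg, n_bins=36):
    """Shannon entropy (nats) of bearing counts over equal-width bins."""
    counts = [0] * n_bins
    width = 360.0 / n_bins
    for b in bearings_deg:
        counts[int((b % 360.0) / width) % n_bins] += 1
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs)

# A grid-like city: bearings cluster on the four cardinal directions.
grid = [0, 90, 180, 270] * 25
# An "organic" city: bearings spread evenly over the circle.
organic = [i * 3.6 for i in range(100)]
print(orientation_entropy(grid), orientation_entropy(organic))
```

The grid's entropy equals ln(4), the minimum for four occupied bins, while the organic network approaches the ln(36) maximum; the paper's orientation-order indicator builds on exactly this contrast.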
  23. By: Boeing, Geoff (Northeastern University)
    Abstract: OpenStreetMap offers a valuable source of worldwide geospatial data useful to urban researchers. This study uses the OSMnx software to automatically download and analyze 27,000 US street networks from OpenStreetMap at metropolitan, municipal, and neighborhood scales - namely, every US city and town, census urbanized area, and Zillow-defined neighborhood. It presents empirical findings on US urban form and street network characteristics, emphasizing measures relevant to graph theory, transportation, urban design, and morphology such as structure, connectedness, density, centrality, and resilience. In the past, street network data acquisition and processing have been challenging and ad hoc. This study illustrates the use of OSMnx and OpenStreetMap to consistently conduct street network analysis with extremely large sample sizes, with clearly defined network definitions and extents for reproducibility, and using nonplanar, directed graphs. These street networks and measures data have been shared in a public repository for other researchers to use.
    Date: 2018–08–02

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.