Title: | Datasets for Spatial Analysis |
---|---|
Description: | Diverse spatial datasets for demonstrating, benchmarking and teaching spatial data analysis. It includes R data of class sf (defined by the package 'sf'), Spatial ('sp'), and nb ('spdep'). Unlike other spatial data packages such as 'rnaturalearth' and 'maps', it also contains data stored in a range of file formats including GeoJSON and GeoPackage, but from version 2.3.4, no longer ESRI Shapefile - use GeoPackage instead. Some of the datasets are designed to illustrate specific analysis techniques. cycle_hire() and cycle_hire_osm(), for example, are designed to illustrate point pattern analysis techniques. |
Authors: | Roger Bivand [aut] , Jakub Nowosad [aut, cre] , Robin Lovelace [aut] , Angelos Mimis [ctb], Mark Monmonier [ctb] (author of the state.vbm dataset), Greg Snow [ctb] (author of the state.vbm dataset) |
Maintainer: | Jakub Nowosad <[email protected]> |
License: | CC0 |
Version: | 2.3.4 |
Built: | 2025-01-07 18:41:28 UTC |
Source: | https://github.com/nowosad/spData |
The afcon data frame has 42 rows and 5 columns, for 42 African countries, excluding then South West Africa and Spanish Equatorial Africa and Spanish Sahara. The dataset is used in Anselin (1995), and downloaded from before adaptation. The neighbour list object africa.rook.nb is the SpaceStat ‘rook.GAL’, but is not the list used in Anselin (1995) - paper.nb reconstructs the list used in the paper, with inserted links between Mauritania and Morocco, South Africa and Angola and Zambia, Tanzania and Zaire, and Botswana and Zambia. afxy is the coordinate matrix for the centroids of the countries.
afcon
This data frame contains the following columns:
x: an easting in decimal degrees (taken as centroid of shapefile polygon)
y: a northing in decimal degrees (taken as centroid of shapefile polygon)
totcon: index of total conflict 1966-78
name: country name
id: country id number as in paper
All source data files prepared by Luc Anselin, Spatial Analysis Laboratory, Department of Agricultural and Consumer Economics, University of Illinois, Urbana-Champaign.
Anselin, L. and John O'Loughlin. 1992. Geography of international conflict and cooperation: spatial dependence and regional context in Africa. In The New Geopolitics, ed. M. Ward, pp. 39-75. Philadelphia, PA: Gordon and Breach. also: Anselin, L. 1995. Local indicators of spatial association, Geographical Analysis, 27, Table 1, p. 103.
data(afcon)
if (requireNamespace("spdep", quietly = TRUE)) {
  library(spdep)
  plot(africa.rook.nb, afxy)
  plot(diffnb(paper.nb, africa.rook.nb), afxy, col="red", add=TRUE)
  text(afxy, labels=attr(africa.rook.nb, "region.id"), pos=4, offset=0.4)
  moran.test(afcon$totcon, nb2listw(africa.rook.nb))
  moran.test(afcon$totcon, nb2listw(paper.nb))
  geary.test(afcon$totcon, nb2listw(paper.nb))
}
The object loaded is an sf object containing the state of Alaska from the US Census Bureau with a few variables from the American Community Survey (ACS).
alaska
Formal class 'sf' [package "sf"]; the data contains a data.frame with 1 obs. of 7 variables:
GEOID: character vector of geographic identifiers
NAME: character vector of state names
REGION: character vector of region names
AREA: area in square kilometers (class units)
total_pop_10: numerical vector of total population in 2010
total_pop_15: numerical vector of total population in 2015
geometry: sfc_MULTIPOLYGON
The object is in projected coordinates using Alaska Albers (EPSG:3467).
https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html
See the tigris package: https://cran.r-project.org/package=tigris
if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  data(alaska)
  plot(alaska["total_pop_15"])
}
(Use example(auckland) to load the data from the GeoPackage file and generate the neighbour list on the fly.) The auckland data frame has 167 rows (census area units, CAU) and 4 columns. The dataset also includes the "nb" object auckland.nb of neighbour relations based on contiguity, and the "polylist" object auckpolys of polygon boundaries for the CAU.
auckland
This data frame contains the following columns:
Easting: a numeric vector of x coordinates in an unknown spatial reference system
Northing: a numeric vector of y coordinates in an unknown spatial reference system
M77_85: a numeric vector of counts of infant (under 5 years of age) deaths in Auckland, 1977-1985
Und5_81: a numeric vector of population under 5 years of age at the 1981 Census
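As a rough illustration of how the two count columns relate, the sketch below computes a crude annual under-5 death rate per 1,000 children. It uses made-up counts rather than the packaged data, and it assumes that the 1981 census population can stand in for the population at risk in each of the nine years 1977-1985. The values and the helper name crude_rate are hypothetical.

```r
# Hypothetical counts standing in for auckland$M77_85 and auckland$Und5_81
deaths_77_85 <- c(2, 0, 5)
pop_under5_81 <- c(300, 150, 820)

# Crude annual rate per 1,000, assuming the 1981 population applies
# to each of the nine years 1977-1985
crude_rate <- function(deaths, pop, years = 9) {
  1000 * deaths / (years * pop)
}

rates <- crude_rate(deaths_77_85, pop_under5_81)
round(rates, 3)
```

Raw rates like these are unstable for small populations, which is the motivation for the Empirical Bayes estimators discussed in Marshall (1991).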
The contiguous neighbours object does not completely replicate results in the sources, and was reconstructed from auckpolys; examination of figures in the sources suggests that there are differences in detail, although probably not in substance.
Marshall R M (1991) Mapping disease and mortality rates using Empirical Bayes Estimators, Applied Statistics, 40, 283–294; Bailey T, Gatrell A (1995) Interactive Spatial Data Analysis, Harlow: Longman — INFOMAP data set used with permission.
if (requireNamespace("sf", quietly = TRUE)) {
  auckland <- sf::st_read(system.file("shapes/auckland.gpkg", package="spData")[1])
  plot(sf::st_geometry(auckland))
  if (requireNamespace("spdep", quietly = TRUE)) {
    auckland.nb <- spdep::poly2nb(auckland)
  }
}
House sales price and characteristics for a spatial hedonic regression, Baltimore, MD 1978. X,Y on Maryland grid, projection type unknown.
baltimore
A data frame with 211 observations on the following 17 variables.
STATION: a numeric vector
PRICE: a numeric vector
NROOM: a numeric vector
DWELL: a numeric vector
NBATH: a numeric vector
PATIO: a numeric vector
FIREPL: a numeric vector
AC: a numeric vector
BMENT: a numeric vector
NSTOR: a numeric vector
GAR: a numeric vector
AGE: a numeric vector
CITCOU: a numeric vector
LOTSZ: a numeric vector
SQFT: a numeric vector
X: a numeric vector
Y: a numeric vector
Prepared by Luc Anselin. Original data made available by Robin Dubin, Weatherhead School of Management, Case Western Reserve University, Cleveland, OH. http://sal.agecon.uiuc.edu/datasets/baltimore.zip
Dubin, Robin A. (1992). Spatial autocorrelation and neighborhood quality. Regional Science and Urban Economics 22(3), 433-452.
data(baltimore)
str(baltimore)
if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  # called directly: sf does not export the %>% pipe
  baltimore_sf <- st_as_sf(baltimore, coords = c("X","Y"))
  plot(baltimore_sf["PRICE"])
}
The boston.c data frame has 506 rows and 20 columns. It contains the Harrison and Rubinfeld (1978) data corrected for a few minor errors and augmented with the latitude and longitude of the observations. Gilley and Pace also point out that MEDV is censored, in that median values at or over USD 50,000 are set to USD 50,000. The original data set without the corrections is also included in package mlbench as BostonHousing. In addition, a matrix of tract point coordinates projected to UTM zone 19 is included as boston.utm, and a sphere of influence neighbours list as boston.soi.
This data frame contains the following columns:
TOWN: a factor with levels given by town names
TOWNNO: a numeric vector corresponding to TOWN
TRACT: a numeric vector of tract ID numbers
LON: a numeric vector of tract point longitudes in decimal degrees
LAT: a numeric vector of tract point latitudes in decimal degrees
MEDV: a numeric vector of median values of owner-occupied housing in USD 1000
CMEDV: a numeric vector of corrected median values of owner-occupied housing in USD 1000
CRIM: a numeric vector of per capita crime
ZN: a numeric vector of proportions of residential land zoned for lots over 25000 sq. ft per town (constant for all Boston tracts)
INDUS: a numeric vector of proportions of non-retail business acres per town (constant for all Boston tracts)
CHAS: a factor with levels 1 if tract borders Charles River; 0 otherwise
NOX: a numeric vector of nitric oxides concentration (parts per 10 million) per town
RM: a numeric vector of average numbers of rooms per dwelling
AGE: a numeric vector of proportions of owner-occupied units built prior to 1940
DIS: a numeric vector of weighted distances to five Boston employment centres
RAD: a numeric vector of an index of accessibility to radial highways per town (constant for all Boston tracts)
TAX: a numeric vector of full-value property-tax rates per USD 10,000 per town (constant for all Boston tracts)
PTRATIO: a numeric vector of pupil-teacher ratios per town (constant for all Boston tracts)
B: a numeric vector of 1000*(Bk - 0.63)^2 where Bk is the proportion of blacks
LSTAT: a numeric vector of percentage values of lower status population
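The censoring of MEDV and the definition of B can be sketched in base R. The values below are invented for illustration and are not taken from boston.c; the names is_censored and b_from_bk are hypothetical helpers, not part of the package.

```r
# Toy values in USD 1000, standing in for boston.c$MEDV (not the packaged data)
medv <- c(24.0, 50.0, 36.2, 50.0)

# Gilley and Pace note that medians at or above USD 50,000 were recorded as 50
is_censored <- medv >= 50

# B as defined in the column list: 1000 * (Bk - 0.63)^2, with Bk a proportion
b_from_bk <- function(bk) 1000 * (bk - 0.63)^2
b_values <- b_from_bk(c(0, 0.63, 1))
```

Because the transform is quadratic around 0.63, two different proportions can map to the same B, so Bk cannot be recovered from B alone.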
Details of the creation of the tract GPKG file: tract boundaries for 1990 (formerly at: http://www.census.gov/geo/cob/bdy/tr/tr90shp/tr25_d90_shp.zip, counties in the BOSTON SMSA http://www.census.gov/population/metro/files/lists/historical/63mfips.txt); tract conversion table 1980/1970 (formerly at: https://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/7913?q=07913&permit[0]=AVAILABLE, http://www.icpsr.umich.edu/cgi-bin/bob/zipcart2?path=ICPSR&study=7913&bundle=all&ds=1&dups=yes). The file contains corrections and extra variables (tract 3592 is corrected to 3593); the extra columns are:
units: number of single family houses
cu5k: count of units under USD 5,000
c5_7_5: counts USD 5,000 to 7,500
C*_*: interval counts
co50k: count of units over USD 50,000
median: recomputed median values
BB: recomputed black population proportion
censored: whether censored or not
NOXID: NOX model zone ID
POP: tract population
Previously available from http://lib.stat.cmu.edu/datasets/boston_corrected.txt
Harrison, David, and Daniel L. Rubinfeld, Hedonic Housing Prices and the Demand for Clean Air, Journal of Environmental Economics and Management, Volume 5, (1978), 81-102. Original data.
Gilley, O.W., and R. Kelley Pace, On the Harrison and Rubinfeld Data, Journal of Environmental Economics and Management, 31 (1996),403-405. Provided corrections and examined censoring.
Pace, R. Kelley, and O.W. Gilley, Using the Spatial Configuration of the Data to Improve Estimation, Journal of the Real Estate Finance and Economics, 14 (1997), 333-340.
Bivand, Roger. Revisiting the Boston data set - Changing the units of observation affects estimated willingness to pay for clean air. REGION, v. 4, n. 1, p. 109-127, 2017. https://openjournals.wu.ac.at/ojs/index.php/region/article/view/107.
if (requireNamespace("spdep", quietly = TRUE)) {
  data(boston)
  hr0 <- lm(log(MEDV) ~ CRIM + ZN + INDUS + CHAS + I(NOX^2) + I(RM^2) + AGE + log(DIS) + log(RAD) + TAX + PTRATIO + B + log(LSTAT), data = boston.c)
  summary(hr0)
  logLik(hr0)
  gp0 <- lm(log(CMEDV) ~ CRIM + ZN + INDUS + CHAS + I(NOX^2) + I(RM^2) + AGE + log(DIS) + log(RAD) + TAX + PTRATIO + B + log(LSTAT), data = boston.c)
  summary(gp0)
  logLik(gp0)
  spdep::lm.morantest(hr0, spdep::nb2listw(boston.soi))
}
if (requireNamespace("sf", quietly = TRUE)) {
  boston.tr <- sf::st_read(system.file("shapes/boston_tracts.gpkg", package="spData")[1])
  if (requireNamespace("spdep", quietly = TRUE)) {
    boston_nb <- spdep::poly2nb(boston.tr)
  }
}
A tiny dataset containing estimates of global coffee production, in thousands of 60 kg bags, by country. Purpose: teaching **not** research.
coffee_data
A data frame (tibble) with 58 observations on the following 12 variables:
name_long: name of country or coffee variety
coffee_production_2016: production in 2016
coffee_production_2017: production in 2017
The examples section shows how this can be joined with spatial data to create a simple map.
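The join itself can be sketched without any spatial packages. In the minimal example below, countries is a toy stand-in for the attribute table of a spatial layer and coffee is a toy stand-in for coffee_data; all values are invented, and base merge() is used here in place of dplyr's left_join().

```r
# Toy stand-ins: 'countries' mimics the attribute table of a spatial layer,
# 'coffee' mimics coffee_data. All values are invented for illustration.
countries <- data.frame(name_long = c("Brazil", "Norway", "Viet Nam"))
coffee <- data.frame(
  name_long = c("Brazil", "Viet Nam"),
  coffee_production_2017 = c(100L, 50L)
)

# Left join: keep every country, fill NA where no coffee figure exists
joined <- merge(countries, coffee, by = "name_long", all.x = TRUE)
joined
```

Countries without a matching name_long keep an NA production value, which a map would typically show as "No data".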
The International Coffee Organization (ICO). See http://www.ico.org/ and http://www.ico.org/prices/m1-exports.pdf
head(coffee_data)
## Not run: 
if (requireNamespace("dplyr")) {
  library(dplyr)
  library(sf)
  # found by searching for "global coffee data"
  u = "http://www.ico.org/prices/m1-exports.pdf"
  download.file(u, "data.pdf", mode = "wb")
  if (requireNamespace("pdftables")) { # requires api key
    pdftables::convert_pdf(input_file = "data.pdf", output_file = "coffee-data-messy.csv")
    # readr and stringr are called by namespace because they are not attached above
    d = readr::read_csv("coffee-data-messy.csv")
    file.remove("coffee-data-messy.csv")
    file.remove("data.pdf")
    coffee_data = slice(d, -c(1:9)) %>%
      select(name_long = 1, coffee_production_2016 = 2, coffee_production_2017 = 3) %>%
      filter(!is.na(coffee_production_2016)) %>%
      mutate_at(2:3, stringr::str_replace, " ", "") %>%
      mutate_at(2:3, as.integer)
    world_coffee = left_join(world, coffee_data)
    plot(world_coffee[c("coffee_production_2016", "coffee_production_2017")])
    b = c(0, 500, 1000, 2000, 3000)
    library(tmap)
    tm_shape(world_coffee) +
      tm_fill("coffee_production_2017", title = "Thousand 60kg bags", breaks = b, textNA = "No data", colorNA = NULL)
    tmap_mode("view") # for an interactive version
  }
}
## End(Not run)
The columbus data frame has 49 rows and 22 columns. Unit of analysis: 49 neighbourhoods in Columbus, OH, 1980 data. In addition the data set includes a polylist object polys with the boundaries of the neighbourhoods, a matrix of polygon centroids coords, and col.gal.nb, the neighbours list from an original GAL-format file. The matrix bbs is DEPRECATED, but retained for other packages using this data set.
columbus
This data frame contains the following columns:
AREA: computed by ArcView
PERIMETER: computed by ArcView
COLUMBUS_: internal polygon ID (ignore)
COLUMBUS_I: another internal polygon ID (ignore)
POLYID: yet another polygon ID
NEIG: neighborhood id value (1-49); conforms to id value used in Spatial Econometrics book.
HOVAL: housing value (in 1,000 USD)
INC: household income (in 1,000 USD)
CRIME: residential burglaries and vehicle thefts per thousand households in the neighborhood
OPEN: open space in neighborhood
PLUMB: percentage housing units without plumbing
DISCBD: distance to CBD
X: x coordinate (in arbitrary digitizing units, not polygon coordinates)
Y: y coordinate (in arbitrary digitizing units, not polygon coordinates)
NSA: north-south dummy (North=1)
NSB: north-south dummy (North=1)
EW: east-west dummy (East=1)
CP: core-periphery dummy (Core=1)
THOUS: constant=1,000
NEIGNO: NEIG+1,000, alternative neighborhood id value
The row names of columbus and the region.id attribute of polys are set to columbus$NEIGNO.
All source data files prepared by Luc Anselin, Spatial Analysis Laboratory, Department of Agricultural and Consumer Economics, University of Illinois, Urbana-Champaign, http://sal.agecon.uiuc.edu/datasets/columbus.zip.
Anselin, Luc. 1988. Spatial econometrics: methods and models. Dordrecht: Kluwer Academic, Table 12.1 p. 189.
if (requireNamespace("sf", quietly = TRUE)) {
  columbus <- sf::st_read(system.file("shapes/columbus.gpkg", package="spData")[1])
  plot(sf::st_geometry(columbus))
}
if (requireNamespace("spdep", quietly = TRUE)) {
  library(spdep)
  col.gal.nb <- read.gal(system.file("weights/columbus.gal", package="spData")[1])
}
Sample of old (incongruent) and new (congruent) administrative zones from UK statistical agencies
congruent
Simple feature geographic data in a projected CRS (OSGB) with random values assigned for teaching purposes.
https://en.wikipedia.org/wiki/ONS_coding_system
if(requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  plot(aggregating_zones$geometry, lwd = 5)
  plot(congruent$geometry, add = TRUE, border = "green", lwd = 2)
  plot(incongruent$geometry, add = TRUE, border = "blue", col = NA)
  rbind(congruent, incongruent)
}
# Code used to download the data:
## Not run: 
#devtools::install_github("robinlovelace/ukboundaries")
library(sf)
library(tmap)
library(dplyr)
#library(ukboundaries)
sel = grepl("003|004", msoa2011_lds$geo_label)
aggregating_zones = st_transform(msoa2011_lds[sel, ], 27700)
# find lsoas in the aggregating_zones
lsoa_touching = st_transform(lsoa2011_lds, 27700)[aggregating_zones, ]
lsoa_cents = st_centroid(lsoa_touching)
lsoa_cents = lsoa_cents[aggregating_zones, ]
sel = lsoa_touching$geo_code %in% lsoa_cents$geo_code
# same for ed zones
ed_touching = st_transform(ed1981, 27700)[aggregating_zones, ]
ed_cents = st_centroid(ed_touching)
ed_cents = ed_cents[aggregating_zones, ]
incongruent_agg_ed = ed_touching[ed_cents, ]
set.seed(2017)
incongruent_agg_ed$value = rnorm(nrow(incongruent_agg_ed), mean = 5)
congruent = aggregate(incongruent_agg_ed["value"], lsoa_touching[sel, ], mean)
congruent$level = "Congruent"
congruent = congruent[c("level", "value")]
incongruent_cents = st_centroid(incongruent_agg_ed)
aggregating_value = st_join(incongruent_cents, congruent)$value.y
incongruent_agg = aggregate(incongruent_agg_ed["value"], list(aggregating_value), FUN = mean)
incongruent_agg$level = "Incongruent"
incongruent = incongruent_agg[c("level", "value")]
summary(st_geometry_type(congruent))
summary(st_geometry_type(incongruent))
incongruent = st_cast(incongruent, "MULTIPOLYGON")
summary(st_geometry_type(incongruent))
summary(st_geometry_type(aggregating_zones))
# use_data() now lives in usethis (it was formerly devtools::use_data())
usethis::use_data(congruent, overwrite = TRUE)
usethis::use_data(incongruent, overwrite = TRUE)
usethis::use_data(aggregating_zones, overwrite = TRUE)
## End(Not run)
Points representing cycle hire points across London.
cycle_hire
FORMAT:
id: Id of the hire point
name: Name of the point
area: Area they are in
nbikes: The number of bikes currently parked there
nempty: The number of empty places
geometry: sfc_POINT
if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  data(cycle_hire)
  # or
  cycle_hire <- st_read(system.file("shapes/cycle_hire.geojson", package="spData"))
  plot(cycle_hire)
}
## Not run: 
# Download the data
c_names = c("id", "name", "area", "lat", "lon", "nbikes", "nempty")
cycle_hire = readr::read_csv("http://cyclehireapp.com/cyclehirelive/cyclehire.csv", col_names = FALSE, skip = 1)
# assumes the first seven columns of the CSV follow the order in c_names
cycle_hire = cycle_hire[seq_along(c_names)]
names(cycle_hire) = c_names
# assumes lon/lat are WGS84 coordinates (EPSG:4326)
cycle_hire = st_as_sf(cycle_hire, coords = c("lon", "lat"), crs = 4326)
## End(Not run)
Dataset downloaded using the osmdata package representing cycle hire points across London.
cycle_hire_osm
osm_id: The OSM ID
name: The name of the cycle point
capacity: How many bikes it can take
cyclestreets_id: The ID linked to cyclestreets' photomap
description: Additional description of points
geometry: sfc_POINT
See the osmdata package: https://cran.r-project.org/package=osmdata
if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  data(cycle_hire_osm)
  # or
  cycle_hire_osm <- st_read(system.file("shapes/cycle_hire_osm.geojson", package="spData"))
  plot(cycle_hire_osm)
}
# Code used to download the data:
## Not run: 
library(osmdata)
library(dplyr)
library(sf)
q = add_osm_feature(opq = opq("London"), key = "network", value = "tfl_cycle_hire")
lnd_cycle_hire = osmdata_sf(q)
cycle_hire_osm = lnd_cycle_hire$osm_points
nrow(cycle_hire_osm)
plot(cycle_hire_osm)
cycle_hire_osm = dplyr::select(cycle_hire_osm, osm_id, name, capacity, cyclestreets_id, description) %>%
  mutate(capacity = as.numeric(capacity))
names(cycle_hire_osm)
nrow(cycle_hire_osm)
## End(Not run)
The geographic boundaries of departments (sf) of the municipality of Athens. This is accompanied by various characteristics in these areas.
depmunic
An sf object of 7 polygons with the following 7 variables.
num_dep: A unique identifier for each municipality department.
airbnb: The number of Airbnb properties in 2017.
museums: The number of museums.
population: The population recorded in the 2011 census.
pop_rest: The number of residents originating from a non-European country.
greensp: The area of green spaces (unit: square meters).
area: The area of the polygon (unit: square kilometers).
properties
if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  data(depmunic)
  depmunic$foreigners <- 100*depmunic$pop_rest/depmunic$population
  plot(depmunic["foreigners"], key.pos=1)
}
The Eire data set has been converted to shapefile format and placed in the etc/shapes directory. The initial data objects are now stored as a SpatialPolygonsDataFrame object, from which the contiguity neighbour list is recreated. For purposes of record, the original data set is retained.
The eire.df data frame has 26 rows and 9 columns. In addition, polygons of the 26 counties are provided as a multipart polylist in eire.polys.utm (coordinates in km, projection UTM zone 30). Their centroids are in eire.coords.utm. The original Cliff and Ord binary contiguities are in eire.nb.
This data frame contains the following columns:
A: Percentage of sample with blood group A
towns: Towns/unit area
pale: Beyond the Pale 0, within the Pale 1
size: number of blood type samples
ROADACC: arterial road network accessibility in 1961
OWNCONS: percentage in value terms of gross agricultural output of each county consumed by itself
POPCHG: 1961 population as percentage of 1926
RETSALE: value of retail sales (£000)
INCOME: total personal income (£000)
names: County names
Upton and Fingleton (1985) and Bailey and Gatrell (1995), ch. 1, for blood group data; Cliff and Ord (1973), p. 107, for the remaining variables (also after O'Sullivan, 1968). Polygon borders and Irish data sourced from Michael Tiefelsdorf's SPSS Saddlepoint bundle, originally hosted at: http://geog-www.sbs.ohio-state.edu/faculty/tiefelsdorf/GeoStat.htm.
library(spdep)
eire <- sf::st_read(system.file("shapes/eire.gpkg", package="spData")[1])
eire.nb <- poly2nb(eire)

# Eire physical anthropology blood group data
summary(eire$A)
brks <- round(fivenum(eire$A), digits=2)
cols <- rev(heat.colors(4))
plot(eire, col=cols[findInterval(eire$A, brks, all.inside=TRUE)])
title(main="Percentage with blood group A in Eire")
legend(x=c(-50, 70), y=c(6120, 6050), c("under 27.91", "27.91 - 29.26", "29.26 - 31.02", "over 31.02"), fill=cols, bty="n")
plot(st_geometry(eire))
plot(eire.nb, st_geometry(eire), add=TRUE)
lA <- lag.listw(nb2listw(eire.nb), eire$A)
summary(lA)
moran.test(eire$A, nb2listw(eire.nb))
geary.test(eire$A, nb2listw(eire.nb))
cor(lA, eire$A)
moran.plot(eire$A, nb2listw(eire.nb), labels=eire$names)
A.lm <- lm(A ~ towns + pale, data=eire)
summary(A.lm)
res <- residuals(A.lm)
brks <- c(min(res),-2,-1,0,1,2,max(res))
cols <- rev(cm.colors(6))
plot(eire, col=cols[findInterval(res, brks, all.inside=TRUE)])
title(main="Regression residuals")
legend(x=c(-50, 70), y=c(6120, 6050), legend=c("under -2", "-2 - -1", "-1 - 0", "0 - 1", "1 - 2", "over 2"), fill=cols, bty="n")
lm.morantest(A.lm, nb2listw(eire.nb))
lm.morantest.sad(A.lm, nb2listw(eire.nb))
lm.LMtests(A.lm, nb2listw(eire.nb), test="LMerr")

# Eire agricultural data
brks <- round(fivenum(eire$OWNCONS), digits=2)
cols <- grey(4:1/5)
plot(eire, col=cols[findInterval(eire$OWNCONS, brks, all.inside=TRUE)])
title(main="Percentage own consumption of agricultural produce")
legend(x=c(-50, 70), y=c(6120, 6050), legend=c("under 9", "9 - 12.2", "12.2 - 19", "over 19"), fill=cols, bty="n")
moran.plot(eire$OWNCONS, nb2listw(eire.nb))
moran.test(eire$OWNCONS, nb2listw(eire.nb))
e.lm <- lm(OWNCONS ~ ROADACC, data=eire)
res <- residuals(e.lm)
brks <- c(min(res),-2,-1,0,1,2,max(res))
cols <- rev(cm.colors(6))
plot(eire, col=cols[findInterval(res, brks, all.inside=TRUE)])
title(main="Regression residuals")
legend(x=c(-50, 70), y=c(6120, 6050), legend=c("under -2", "-2 - -1", "-1 - 0", "0 - 1", "1 - 2", "over 2"), fill=cm.colors(6), bty="n")
lm.morantest(e.lm, nb2listw(eire.nb))
lm.morantest.sad(e.lm, nb2listw(eire.nb))
lm.LMtests(e.lm, nb2listw(eire.nb), test="LMerr")
print(localmoran.sad(e.lm, eire.nb, select=seq(along=eire.nb)))
A data set for 1980 Presidential election results covering 3,107 US counties using geographical coordinates. In addition, the data set provides three spatial neighbour objects: k4 (not using Great Circle distances), dll (using Great Circle distances), and e80_queen (Queen contiguities for equivalent county polygons taken from file co1980p020.tar.gz on the USGS National Atlas site). It also provides a spatial weights object imported from elect.ford, a 4-nearest-neighbour, non-GC, row-standardised object, but with coercion to symmetry.
elect80
A SpatialPointsDataFrame with 3107 observations on the following 7 variables.
FIPS: a factor of county FIPS codes
long: a numeric vector of longitude values
lat: a numeric vector of latitude values
pc_turnout: Votes cast as proportion of population over age 19 eligible to vote
pc_college: Population with college degrees as proportion of population over age 19 eligible to vote
pc_homeownership: Homeownership as proportion of population over age 19 eligible to vote
pc_income: Income per capita of population over age 19 eligible to vote
Pace, R. Kelley and Ronald Barry. 1997. "Quick Computation of Spatial Autoregressive Estimators", in Geographical Analysis; sourced from the data folder in the Spatial Econometrics Toolbox for Matlab, formerly available from http://www.spatial-econometrics.com/html/jplv7.zip, files elect.dat and elect.ford (with the final line dropped).
if (requireNamespace("sp", quietly = TRUE)) {
  library(sp)
  data(elect80)
  summary(elect80)
  plot(elect80)
}
The raster data represents elevation in meters and uses WGS84 as a coordinate reference system.
system.file("raster/elev.tif", package = "spData")
The go_xyz data frame has 256 rows and 3 columns. Vectors go_x and go_y are of length 16 and give the centres of the grid rows and columns, 30m apart. The data start from the bottom left, whereas Getis and Ord start from the top left, so their 136th grid cell is our 120th.
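The correspondence between the two cell orderings can be checked with a little arithmetic. The sketch below assumes that both orderings run row by row, left to right, over the 16 by 16 grid and differ only in whether the first row is the top or the bottom one. The helper name go_to_ours is hypothetical and not part of spData.

```r
# Hypothetical helper (not part of spData): convert a cell index counted
# row by row from the top left into the index counted row by row from
# the bottom left, for an n-by-n grid.
go_to_ours <- function(k, n = 16L) {
  row_from_top <- (k - 1L) %/% n + 1L      # row of cell k, counting from the top
  col <- (k - 1L) %% n + 1L                # column of cell k, counting from the left
  row_from_bottom <- n - row_from_top + 1L # same row, counting from the bottom
  (row_from_bottom - 1L) * n + col
}

go_to_ours(136L)  # 120, the correspondence stated in the description
```

Under these assumptions, cell 136 sits in row 9 from the top and column 8, which is row 8 from the bottom, giving (8 - 1) * 16 + 8 = 120.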
This data frame contains the following columns:
x: grid eastings
y: grid northings
val: remote sensing values
Getis, A. and Ord, J. K. 1996 Local spatial statistics: an overview. In P. Longley and M. Batty (eds) Spatial analysis: modelling in a GIS environment (Cambridge: Geoinformation International), 266.
data(getisord)
image(go_x, go_y, t(matrix(go_xyz$val, nrow = 16, ncol = 16, byrow = TRUE)), asp = 1)
text(go_xyz$x, go_xyz$y, go_xyz$val, cex = 0.7)
polygon(c(195, 225, 225, 195), c(195, 195, 225, 225), lwd = 2)
title(main = "Getis-Ord 1996 remote sensing data")
The ratified raster dataset represents grain sizes with the three classes clay, silt and sand, and WGS84 as a coordinate reference system.
system.file("raster/grain.tif", package = "spData")
The object loaded is an sf object containing the state of Hawaii from the US Census Bureau with a few variables from the American Community Survey (ACS).
hawaii
Formal class 'sf' [package "sf"]; the data contains a data.frame with 1 obs. of 7 variables:
GEOID: character vector of geographic identifiers
NAME: character vector of state names
REGION: character vector of region names
AREA: area in square kilometers of units class
total_pop_10: numerical vector of total population in 2010
total_pop_15: numerical vector of total population in 2015
geometry: sfc_MULTIPOLYGON
The object is in projected coordinates using Hawaii Albers Equal Area Conic (ESRI:102007).
https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html
See the tigris package: https://cran.r-project.org/package=tigris
if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  data(hawaii)
  plot(hawaii["total_pop_15"])
}
A 20m square is divided into 40 by 40 0.5m quadrats. Observations are in tens of grams of herb remains, 0 being from 0g to less than 10g, and so on. Analysis was mostly conducted using the interior 32 by 32 grid.
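The coding and the interior sub-grid described above can be sketched in base R. The weights below are synthetic values made up for illustration, not the Hopkins data. The choice of rows and columns 5 to 36 for the interior grid mirrors the indices used in the example for this data set.

```r
# Synthetic weights in grams (made up for illustration, not the Hopkins data)
set.seed(1)
grams <- matrix(runif(40 * 40, min = 0, max = 60), nrow = 40, ncol = 40)

# Code each quadrat in tens of grams: 0 for 0g to under 10g, 1 for 10g to under 20g, ...
coded <- grams %/% 10

# Keep the interior 32 by 32 grid by dropping four quadrats on every side
interior <- coded[5:36, 5:36]
dim(interior)
```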
hopkins
num [1:40, 1:40] 0 0 0 0 0 0 0 0 0 1 ...
Upton, G., Fingleton, B. 1985 Spatial data analysis by example: point pattern and quantitative data, Wiley, pp. 38–39.
Hopkins, B., 1965 Observations on savanna burning in the Olokemeji Forest Reserve, Nigeria. Journal of Applied Ecology, 2, 367–381.
data(hopkins)
image(1:32, 1:32, hopkins[5:36, 36:5], breaks = c(-0.5, 3.5, 20),
      col = c("white", "black"))
Data on 25,357 single family homes sold in Lucas County, Ohio, 1993-1998 from the county auditor, together with an nb neighbour object constructed as a sphere of influence graph from projected coordinates.
house
Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots. The data slot is a data frame with 25357 observations on the following 24 variables.
price: a numeric vector
yrbuilt: a numeric vector
stories: a factor with levels one, bilevel, multilvl, one+half, two, two+half, three
TLA: a numeric vector
wall: a factor with levels stucdrvt, ccbtile, metlvnyl, brick, stone, wood, partbrk
beds: a numeric vector
baths: a numeric vector
halfbaths: a numeric vector
frontage: a numeric vector
depth: a numeric vector
garage: a factor with levels no garage, basement, attached, detached, carport
garagesqft: a numeric vector
rooms: a numeric vector
lotsize: a numeric vector
sdate: a numeric vector
avalue: a numeric vector
s1993: a numeric vector
s1994: a numeric vector
s1995: a numeric vector
s1996: a numeric vector
s1997: a numeric vector
s1998: a numeric vector
syear: a factor with levels 1993, 1994, 1995, 1996, 1997, 1998
age: a numeric vector
Its projection is CRS(+init=epsg:2834), the Ohio North State Plane.
Dataset included in the Spatial Econometrics toolbox for Matlab, formerly available from http://www.spatial-econometrics.com/html/jplv7.zip.
if (requireNamespace("sp", quietly = TRUE)) {
  library(sp)
  data(house)
  str(house)
  plot(house)
}
Prevalence of respiratory symptoms in 71 school catchment areas in Huddersfield, Northern England
huddersfield
A data frame with 71 observations on the following 2 variables.
cases: Prevalence of at least mild conditions
total: Number of questionnaires returned
Martuzzi M, Elliott P (1996) Empirical Bayes estimation of small area prevalence of non-rare conditions, Statistics in Medicine 15, 1867–1873, pp. 1870–1871.
data(huddersfield)
str(huddersfield)
Classic data set for the choice of class intervals for choropleth maps.
jenks71
A data frame with 102 observations on the following 2 variables.
jenks71: a numeric vector: Per acre value of gross farm products in dollars by county for Illinois in 1959
area: a numeric vector: county area in square miles
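Because this data set is used to discuss the choice of class intervals, a small base R comparison of two common schemes may help. This is only a sketch on made-up values, with an arbitrary number of classes. It does not implement the methods of Jenks and Caspall, and it does not use the jenks71 data.

```r
# Made-up skewed values standing in for a per-county variable
x <- c(1, 2, 2, 3, 3, 4, 5, 8, 13, 40)
k <- 4  # number of classes (an arbitrary choice for illustration)

# Equal-interval breaks split the range into k equally wide classes
equal_breaks <- seq(min(x), max(x), length.out = k + 1)

# Quantile breaks aim to put a similar number of observations in each class
quantile_breaks <- quantile(x, probs = seq(0, 1, length.out = k + 1), names = FALSE)

# Count observations per class under each scheme
equal_counts <- table(cut(x, equal_breaks, include.lowest = TRUE))
quantile_counts <- table(cut(x, unique(quantile_breaks), include.lowest = TRUE))
equal_counts
quantile_counts
```

With skewed values such as these, equal-interval classes tend to crowd most observations into the lowest class, while quantile classes spread them more evenly. That contrast is the kind of trade-off a class-interval choice has to weigh.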
Jenks, G. F., Caspall, F. C., 1971. "Error on choroplethic maps: definition, measurement, reduction". Annals, Association of American Geographers, 61 (2), 217–244
data(jenks71)
jenks71
Polygons representing large administrative zones in London
lnd
NAME: Borough name
GSS_CODE: Official code
HECTARES: How many hectares
NONLD_AREA: Area outside London
ONS_INNER: Office for National Statistics code
SUB_2009: Empty column
SUB_2006: Empty column
geometry: sfc_MULTIPOLYGON
https://github.com/Robinlovelace/Creating-maps-in-R
if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  data(lnd)
  summary(lnd)
  plot(st_geometry(lnd))
}
(Use example(nc.sids) to read the data set from the GeoPackage file, together with import of two different lists of neighbours.)
The nc.sids data frame has 100 rows and 21 columns. It contains data given in Cressie (1991, pp. 386-9), Cressie and Read (1985) and Cressie and Chan (1989) on sudden infant deaths in North Carolina for 1974-78 and 1979-84. The data set also contains the neighbour list given by Cressie and Chan (1989) omitting self-neighbours (ncCC89.nb), and the neighbour list given by Cressie and Read (1985) for contiguities (ncCR85.nb). The data are ordered by county ID number, not alphabetically as in the source tables. sidspolys is a "polylist" object of polygon boundaries, and sidscents is a matrix of their centroids.
nc.sids
This data frame contains the following columns:
SP_ID: SpatialPolygons ID
CNTY_ID: county ID
east: eastings, county seat, miles, local projection
north: northings, county seat, miles, local projection
L_id: Cressie and Read (1985) L index
M_id: Cressie and Read (1985) M index
names: County names
AREA: County polygon areas in degree units
PERIMETER: County polygon perimeters in degree units
CNTY_: Internal county ID
NAME: County names
FIPS: County ID
FIPSNO: County ID
CRESS_ID: Cressie papers ID
BIR74: births, 1974-78
SID74: SID deaths, 1974-78
NWBIR74: non-white births, 1974-78
BIR79: births, 1979-84
SID79: SID deaths, 1979-84
NWBIR79: non-white births, 1979-84
Cressie, N (1991), Statistics for spatial data. New York: Wiley, pp. 386–389; Cressie, N, Chan NH (1989) Spatial modelling of regional variables. Journal of the American Statistical Association, 84, 393–401; Cressie, N, Read, TRC (1985) Do sudden infant deaths come in clusters? Statistics and Decisions Supplement Issue 2, 333–349; http://sal.agecon.uiuc.edu/datasets/sids.zip.
if (requireNamespace("sf", quietly = TRUE)) {
  if (requireNamespace("spdep", quietly = TRUE)) {
    library(spdep)
    nc.sids <- sf::st_read(system.file("shapes/sids.gpkg", package="spData")[1])
    row.names(nc.sids) <- as.character(nc.sids$FIPS)
    rn <- row.names(nc.sids)
    ncCC89_nb <- read.gal(system.file("weights/ncCC89.gal", package="spData")[1],
                          region.id=rn)
    ncCR85_nb <- read.gal(system.file("weights/ncCR85.gal", package="spData")[1],
                          region.id=rn)
    plot(sf::st_geometry(nc.sids), border="grey")
    plot(ncCR85_nb, sf::st_geometry(nc.sids), add=TRUE, col="blue")
    plot(sf::st_geometry(nc.sids), border="grey")
    plot(ncCC89_nb, sf::st_geometry(nc.sids), add=TRUE, col="blue")
  }
}
New York leukemia data taken from the data sets supporting Waller and Gotway 2004 (the data should be loaded by running example(NY_data) to demonstrate spatial data import techniques).
nydata
A data frame with 281 observations on the following 12 variables, and the binary coded spatial weights used in the source.
AREANAME: name of census tract
AREAKEY: unique FIPS code for each tract
X: x-coordinate of tract centroid (in km)
Y: y-coordinate of tract centroid (in km)
POP8: population size (1980 U.S. Census)
TRACTCAS: number of cases 1978-1982
PROPCAS: proportion of cases per tract
PCTOWNHOME: percentage of people in each tract owning their own home
PCTAGE65P: percentage of people in each tract aged 65 or more
Z: transformed proportions
AVGIDIST: average distance between centroid and TCE sites
PEXPOSURE: "exposure potential": inverse distance between each census tract centroid and the nearest TCE site, IDIST, transformed via log(100*IDIST)
Cases: as TRACTCAS with more digits
Xm: X in metres
Ym: Y in metres
Xshift: feature offset
Yshift: feature offset
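The PEXPOSURE transformation listed above can be written out directly. The distances below are made-up values with unspecified units, and the sketch assumes that IDIST is the plain reciprocal of the distance to the nearest TCE site and that log denotes the natural logarithm, as R's log() does by default.

```r
# Made-up distances to the nearest TCE site (units are an assumption)
dist_nearest <- c(0.5, 2, 10)

idist <- 1 / dist_nearest      # inverse distance (IDIST)
pexposure <- log(100 * idist)  # natural log, following log(100*IDIST)

round(pexposure, 3)
```

Under these assumptions, tracts closer to a site receive a larger exposure potential, and the logarithm compresses the very large inverse distances of the nearest tracts.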
The examples section shows how the DBF files from the book website for Chapter 9 were converted into the nydata data frame and the listw_NY spatial weights list. The shapes directory includes the original version of the UTM18 census tract boundaries imported from BNA format (http://sedac.ciesin.columbia.edu/ftpsite/pub/census/usa/tiger/ny/bna_st/t8_36.zip) before the OGR/GDAL BNA driver was available. The NY8_utm18 shapefile was constructed using a bna2mif converter and converted to shapefile format after adding data using writeOGR. The new file NY8_bna_utm18.gpkg has been constructed from the original BNA file, but read using the OGR BNA driver with GEOS support. The NY8 shapefile and GeoPackage NY8_utm18.gpkg include invalid polygons, but because the OGR BNA driver may have GEOS support (used here), the tract polygon objects in NY8_bna_utm18.gpkg are valid.
http://www.sph.emory.edu/~lwaller/ch9index.htm
Waller, L. and C. Gotway (2004) Applied Spatial Statistics for Public Health Data. New York: John Wiley and Sons.
## NY leukemia
if (requireNamespace("sf", quietly = TRUE)) {
  library(foreign)
  nydata <- read.dbf(system.file("misc/nydata.dbf", package="spData")[1])
  nydata <- sf::st_as_sf(nydata, coords=c("X", "Y"), remove=FALSE)
  plot(sf::st_geometry(nydata))
  nyadjmat <- as.matrix(read.dbf(system.file("misc/nyadjwts.dbf",
                                             package="spData")[1])[-1])
  ID <- as.character(names(read.dbf(system.file("misc/nyadjwts.dbf",
                                                package="spData")[1]))[-1])
  identical(substring(ID, 2, 10), substring(as.character(nydata$AREAKEY), 2, 10))
  # spdep provides mat2listw(), so check for spdep before loading it
  if (requireNamespace("spdep", quietly = TRUE)) {
    library(spdep)
    listw_NY <- mat2listw(nyadjmat, as.character(nydata$AREAKEY), style="B")
  }
}
Polygons representing the 16 regions of New Zealand (2018). See https://en.wikipedia.org/wiki/Regions_of_New_Zealand for a description of these regions and https://www.stats.govt.nz for information on the data source.
nz
FORMAT:
Name: Name
Island: Island
Land_area: Land area
Population: Population
Median_income: Median income (NZD)
Sex_ratio: Sex ratio (male/female)
geom: sfc_MULTIPOLYGON
https://en.wikipedia.org/wiki/Regions_of_New_Zealand
See the nzcensus package: https://github.com/ellisp/nzelect
if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  summary(nz)
  plot(nz)
}

## Not run: 
# Find "Regional Council 2018 Clipped (generalised)"
# select the GeoPackage option in the "Vectors/tables" dropdown
# at https://datafinder.stats.govt.nz/data/ (requires registration)
# Save the result as:
unzip("statsnzregional-council-2018-clipped-generalised-GPKG.zip")
library(sf)
library(tidyverse)
nz_full = st_read("regional-council-2018-clipped-generalised.gpkg")
print(object.size(nz_full), units = "Kb") # 14407.2 Kb
nz = rmapshaper::ms_simplify(nz_full, keep = 0.001, sys = TRUE)
print(object.size(nz), units = "Kb") # 39.9 Kb
names(nz)
nz$REGC2018_V1_00_NAME
nz = filter(nz, REGC2018_V1_00_NAME != "Area Outside Region") %>%
  select(Name = REGC2018_V1_00_NAME, `Land_area` = LAND_AREA_SQ_KM)

# regions basic info
# devtools::install_github("hadley/rvest")
library(rvest)
doc = read_html("https://en.wikipedia.org/wiki/Regions_of_New_Zealand") %>%
  html_nodes("div table")
tab = doc[[3]] %>% html_table()
tab = tab %>% select(Name = Region, Population = `Population[20]`, Island)
tab = tab %>%
  mutate(Population = str_replace_all(Population, ",", "")) %>%
  mutate(Population = as.numeric(Population)) %>%
  mutate(Name = str_remove_all(Name, " \\([1-9]\\)?.+"))
nz$Name = as.character(nz$Name)
nz$Name = str_remove(nz$Name, " Region")
nz$Name %in% tab$Name

# regions additional info
library(nzcensus)
nz_add_data = REGC2013 %>%
  select(Name = REGC2013_N, Median_income = MedianIncome2013,
         PropFemale2013, PropMale2013) %>%
  mutate(Sex_ratio = PropMale2013 / PropFemale2013) %>%
  mutate(Name = gsub(" Region", "", Name)) %>%
  select(Name, Median_income, Sex_ratio)

# data join
nz = left_join(nz, tab, by = "Name") %>%
  left_join(nz_add_data, by = "Name") %>%
  select(Name, Island, Land_area, Population, Median_income, Sex_ratio)
## End(Not run)
Top 101 highest points in New Zealand (2017). See https://data.linz.govt.nz/layer/50284-nz-height-points-topo-150k/ for details.
nz_height
FORMAT:
t50_fid: ID
elevation: Height above sea level in m
geometry: sfc_POINT
if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  summary(nz_height)
  plot(nz$geom)
  plot(nz_height$geom, add = TRUE)
}

## Not run: 
library(dplyr)
# After downloading data
unzip("lds-nz-height-points-topo-150k-SHP.zip")
nz_height = st_read("nz-height-points-topo-150k.shp") %>%
  top_n(n = 100, wt = elevation)
library(tmap)
tmap_mode("view")
qtm(nz) + qtm(nz_height)
f = list.files(pattern = "*nz-height*")
file.remove(f)
## End(Not run)
A dataset of apartments in the municipality of Athens for 2017. Point location of the properties is given together with their main characteristics and the distance to the closest metro/train station.
properties
An sf object of 1000 points with the following 6 variables.
id: A unique identifier for each property.
size: The size of the property (unit: square meters).
price: The asking price (unit: euros).
prpsqm: The asking price per square meter (unit: euros/square meter).
age: Age of property in 2017 (unit: years).
dist_metro: The distance to closest train/metro station (unit: meters).
depmunic
if (requireNamespace("sf", quietly = TRUE)) {
  if (requireNamespace("spdep", quietly = TRUE)) {
    library(sf)
    library(spdep)
    data(properties)
    summary(properties$prpsqm)
    pr.nb.800 <- dnearneigh(properties, 0, 800)
    pr.listw <- nb2listw(pr.nb.800)
    moran.test(properties$prpsqm, pr.listw)
    moran.plot(properties$prpsqm, pr.listw, xlab = "Price/m^2", ylab = "Lagged")
  }
}
Lines representing the Seine, Marne and Yonne rivers.
seine
FORMAT:
name: name
geometry: sfc_MULTILINESTRING
The object is in the RGF93 / Lambert-93 CRS.
https://www.naturalearthdata.com/
See the rnaturalearth package: https://cran.r-project.org/package=rnaturalearth
if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  seine
  plot(seine)
}

## Not run: 
library(sf)
library(rnaturalearth)
library(tidyverse)
seine = ne_download(scale = 10, type = "rivers_lake_centerlines",
                    category = "physical", returnclass = "sf") %>%
  filter(name %in% c("Yonne", "Seine", "Marne")) %>%
  select(name = name_en) %>%
  st_transform(2154)
## End(Not run)
Data for Splash Dams in western Oregon
SplashDams
Formal class 'SpatialPointsDataFrame' with 232 obs. of 6 variables:
streamName
locationCode
height
lastDate
owner
datesUsed
R. R. Miller (2010) Is the Past Present? Historical Splash-dam Mapping and Stream Disturbance Detection in the Oregon Coastal Province. MSc. thesis, Oregon State University; packaged by Jonathan Callahan
if (requireNamespace("sp", quietly = TRUE)) {
  library(sp)
  data(SplashDams)
  plot(SplashDams, axes=TRUE)
}
A SpatialPolygonsDataFrame object to plot a Visibility Based Map.
state.vbm
An object of class SpatialPolygonsDataFrame with 50 rows and 2 columns.
A SpatialPolygonsDataFrame object to plot a map of the US states where the sizes of the states have been adjusted to be more equal. This map can be useful for plotting state data using color patterns without the larger states dominating and the smallest states being lost. The original map is copyrighted by Mark Monmonier. Official publications based on this map should acknowledge him. Commercial publications of maps based on this probably need his permission.
Greg Snow (of this compilation)
The data was converted from the maps library for S-PLUS. S-PLUS uses the map with permission from the author. This version of the data has not received permission from the author (no attempt made, not that it was refused), most of my uses I feel fall under fair use and do not violate copyright, but you will need to decide for yourself and your applications.
http://www.markmonmonier.com/index.htm, http://euclid.psych.yorku.ca/SCS/Gallery/bright-ideas.html
if (requireNamespace("sp", quietly = TRUE)) {
  library(sp)
  data(state.vbm)
  plot(state.vbm)
  tmp <- state.x77[, 'HS Grad']
  tmp2 <- cut(tmp, seq(min(tmp), max(tmp), length.out=11), include.lowest=TRUE)
  plot(state.vbm, col=cm.colors(10)[tmp2])
}
Dataset in a 'long' form from the United Nations population division with projections up to 2050. Includes only the 30 largest areas by population at 5 year intervals.
urban_agglomerations
Selected variables:
year: Year of population estimate
country_code: Code of country
urban_agglomeration: Name of the urban agglomeration
population_millions: Estimated human population
geometry: sfc_POINT
if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  plot(urban_agglomerations)
}

# Code used to download the data:
## Not run: 
f = "WUP2018-F11b-30_Largest_Cities_in_2018_by_time.xls"
download.file(
  destfile = f,
  url = paste0("https://population.un.org/wup/Download/Files/", f)
)
library(dplyr)
library(sf)
urban_agglomerations = readxl::read_excel(f, skip = 16) %>%
  st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326)
names(urban_agglomerations)
names(urban_agglomerations) <- gsub(" |\\n", "_", tolower(names(urban_agglomerations))) %>%
  gsub("\\(|\\)", "", .)
names(urban_agglomerations)
urban_agglomerations
usethis::use_data(urban_agglomerations, overwrite = TRUE)
file.remove("WUP2018-F11b-30_Largest_Cities_in_2018_by_time.xls")
## End(Not run)
The object loaded is an sf object containing the contiguous United States data from the US Census Bureau with a few variables from the American Community Survey (ACS).
us_states
Formal class 'sf' [package "sf"]; the data contains a data.frame with 49 obs. of 7 variables:
GEOID: character vector of geographic identifiers
NAME: character vector of state names
REGION: character vector of region names
AREA: area in square kilometers of units class
total_pop_10: numerical vector of total population in 2010
total_pop_15: numerical vector of total population in 2015
geometry: sfc_MULTIPOLYGON
The object is in geographical coordinates using the NAD83 datum.
https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html
See the tigris package: https://cran.r-project.org/package=tigris
if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  data(us_states)
  plot(us_states["REGION"])
}
The object loaded is a data.frame object containing the US states data from the American Community Survey (ACS).
us_states_df
Formal class 'data.frame'; the data contains a data.frame with 51 obs. of 5 variables:
state: character vector of state names
median_income_10: numerical vector of median income in 2010
median_income_15: numerical vector of median income in 2015
poverty_level_10: numerical vector of number of people with income below poverty level in 2010
poverty_level_15: numerical vector of number of people with income below poverty level in 2015
https://www.census.gov/programs-surveys/acs/
See the tidycensus package: https://cran.r-project.org/package=tidycensus
data(us_states_df)
summary(us_states_df)
The used.cars data frame has 48 rows and 2 columns. The data set includes a neighbours list for the 48 states excluding DC from poly2nb().
used.cars
This data frame contains the following columns:
tax.charges: taxes and delivery charges for 1955-9 new cars
price.1960: 1960 used car prices by state
Hanna, F. A. 1966 Effects of regional differences in taxes and transport charges on automobile consumption, in Ostry, S., Rhymes, J. K. (eds) Papers on regional statistical studies, Toronto: Toronto University Press, pp. 199-223.
Hepple, L. W. 1976 A maximum likelihood model for econometric estimation with spatial series, in Masser, I (ed) Theory and practice in regional science, London: Pion, pp. 90-104.
if (requireNamespace("spdep", quietly = TRUE)) {
  library(spdep)
  data(used.cars)
  moran.test(used.cars$price.1960, nb2listw(usa48.nb))
  moran.plot(used.cars$price.1960, nb2listw(usa48.nb),
             labels=rownames(used.cars))
  uc.lm <- lm(price.1960 ~ tax.charges, data=used.cars)
  summary(uc.lm)
  lm.morantest(uc.lm, nb2listw(usa48.nb))
  lm.morantest.sad(uc.lm, nb2listw(usa48.nb))
  lm.LMtests(uc.lm, nb2listw(usa48.nb))
  if (requireNamespace("spatialreg", quietly = TRUE)) {
    library(spatialreg)
    uc.err <- errorsarlm(price.1960 ~ tax.charges, data=used.cars,
                         nb2listw(usa48.nb), tol.solve=1.0e-13,
                         control=list(tol.opt=.Machine$double.eps^0.3))
    summary(uc.err)
    uc.lag <- lagsarlm(price.1960 ~ tax.charges, data=used.cars,
                       nb2listw(usa48.nb), tol.solve=1.0e-13,
                       control=list(tol.opt=.Machine$double.eps^0.3))
    summary(uc.lag)
    uc.lag1 <- lagsarlm(price.1960 ~ 1, data=used.cars,
                        nb2listw(usa48.nb), tol.solve=1.0e-13,
                        control=list(tol.opt=.Machine$double.eps^0.3))
    summary(uc.lag1)
    uc.err1 <- errorsarlm(price.1960 ~ 1, data=used.cars,
                          nb2listw(usa48.nb), tol.solve=1.0e-13,
                          control=list(tol.opt=.Machine$double.eps^0.3),
                          Durbin=FALSE)
    summary(uc.err1)
  }
}
Mercer and Hall wheat yield data, based on version in Cressie (1993), p. 455.
wheat
The format of the object generated by running data(wheat) is a three column data frame made available by Hongfei Li. The example section shows how to convert this to the object used in demonstrating the aple function, which is a formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots; the data slot is a data frame with 500 observations on the following 6 variables.
lat: local coordinates northings ordered north to south
yield: Mercer and Hall wheat yield data
r: rows south to north; levels in distance units of plot centres
c: columns west to east; levels in distance units of plot centres
lon: local coordinates eastings
lat1: local coordinates northings ordered south to north
The value of 4.03 was changed to 4.33 (wheat[71,]) 13 January 2014; thanks to Sandy Burden; cross-checked with http://www.itc.nl/personal/rossiter/teach/R/mhw.csv, which agrees.
Cressie, N. A. C. (1993) Statistics for Spatial Data. Wiley, New York, p. 455.
Mercer, W. B. and Hall, A. D. (1911) The experimental error of field trials. Journal of Agricultural Science 4, 107-132.
if (requireNamespace("sp", quietly = TRUE)) {
  library(sp)
  data(wheat)
  wheat$lat1 <- 69 - wheat$lat
  wheat$r <- factor(wheat$lat1)
  wheat$c <- factor(wheat$lon)
  wheat_sp <- wheat
  coordinates(wheat_sp) <- c("lon", "lat1")
  wheat_spg <- wheat_sp
  gridded(wheat_spg) <- TRUE
  wheat_spl <- as(wheat_spg, "SpatialPolygons")
  df <- as(wheat_spg, "data.frame")
  row.names(df) <- sapply(slot(wheat_spl, "polygons"),
    function(x) slot(x, "ID"))
  wheat <- SpatialPolygonsDataFrame(wheat_spl, data=df)
}
if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  wheat <- st_read(system.file("shapes/wheat.gpkg", package="spData"))
  plot(wheat)
}
The object loaded is an sf object containing world map data from Natural Earth, with a few variables from the World Bank.
world
Formal class 'sf' [package "sf"]; the data contains a data.frame with 177 obs. of 11 variables:
iso_a2: character vector of ISO 2 character country codes
name_long: character vector of country names
continent: character vector of continent names
region_un: character vector of region names
subregion: character vector of subregion names
type: character vector of type names
area_km2: integer vector of area values
pop: integer vector of population in 2014
lifeExp: integer vector of life expectancy at birth in 2014
gdpPercap: integer vector of per-capita GDP in 2014
geom: sfc_MULTIPOLYGON
The object is in geographical coordinates using the WGS84 datum.
https://www.naturalearthdata.com/
See the rnaturalearth package: https://cran.r-project.org/package=rnaturalearth
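Because the coordinates are geographical and the attribute table carries both pop and area_km2, a quick sanity check and a derived attribute can be sketched as follows. This is an illustrative sketch rather than part of the package examples: the names is_lonlat and pop_density are hypothetical, and the block is skipped when 'sf' or 'spData' is not installed.

```r
# Guarded so that nothing runs when the required packages are missing
if (requireNamespace("sf", quietly = TRUE) &&
    requireNamespace("spData", quietly = TRUE)) {
  library(sf)
  data("world", package = "spData")

  # TRUE when the coordinate reference system is longitude/latitude
  is_lonlat <- st_is_longlat(world)
  print(is_lonlat)

  # Hypothetical derived column: people per square kilometre
  world$pop_density <- world$pop / world$area_km2
  summary(world$pop_density)
}
```

Plotting a single attribute, for example plot(world["pop_density"]), is usually easier to read than plotting every column at once.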
if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  data(world)
  # or
  world <- st_read(system.file("shapes/world.gpkg", package="spData"))
  plot(world)
}
The object loaded is a data.frame object containing data from the World Bank.
worldbank_df
Formal class 'data.frame'; the data contains a data.frame with 177 obs. of 7 variables:
name: character vector of country names
iso_a2: character vector of ISO 2 character country codes
HDI: human development index (HDI)
urban_pop: urban population
unemployment: unemployment, total (% of total labor force)
pop_growth: population growth (annual %)
literacy: adult literacy rate, population 15+ years, both sexes (%)
See the wbstats package: https://cran.r-project.org/web/packages/wbstats
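Since worldbank_df shares the iso_a2 column with the world object, its indicators can be attached to countries with a key-based join. The sketch below uses base R merge() on toy data frames whose names and values are hypothetical; with the real data, it is worth checking for missing iso_a2 codes before joining.

```r
# Toy stand-ins (hypothetical values) that mirror the shared key column iso_a2
countries <- data.frame(
  iso_a2    = c("AA", "BB", "CC"),
  name_long = c("Country A", "Country B", "Country C"),
  stringsAsFactors = FALSE
)
indicators <- data.frame(
  iso_a2 = c("AA", "CC"),
  HDI    = c(0.70, 0.85),
  stringsAsFactors = FALSE
)

# Left join: keep every row of `countries`; unmatched indicators become NA
joined <- merge(countries, indicators, by = "iso_a2", all.x = TRUE)
print(joined)
```

Setting all.x = TRUE keeps countries without a matching indicator row, which makes gaps in the indicator table visible instead of silently dropping them.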
data(worldbank_df)
# or
worldbank_df <- read.csv(system.file("misc/worldbank_df.csv", package="spData"))

summary(worldbank_df)