Package 'spData'

Title: Datasets for Spatial Analysis
Description: Diverse spatial datasets for demonstrating, benchmarking and teaching spatial data analysis. It includes R data of class sf (defined by the package 'sf'), Spatial ('sp'), and nb ('spdep'). Unlike other spatial data packages such as 'rnaturalearth' and 'maps', it also contains data stored in a range of file formats including GeoJSON and GeoPackage, but from version 2.3.4, no longer ESRI Shapefile - use GeoPackage instead. Some of the datasets are designed to illustrate specific analysis techniques. cycle_hire and cycle_hire_osm, for example, are designed to illustrate point pattern analysis techniques.
Authors: Roger Bivand [aut], Jakub Nowosad [aut, cre], Robin Lovelace [aut], Angelos Mimis [ctb], Mark Monmonier [ctb] (author of the state.vbm dataset), Greg Snow [ctb] (author of the state.vbm dataset)
Maintainer: Jakub Nowosad <[email protected]>
License: CC0
Version: 2.3.4
Built: 2025-01-07 18:41:28 UTC
Source: https://github.com/nowosad/spData


Spatial patterns of conflict in Africa 1966-78

Description

The afcon data frame has 42 rows and 5 columns, for 42 African countries, excluding what were then South West Africa, Spanish Equatorial Africa and Spanish Sahara. The dataset is used in Anselin (1995), and was downloaded and then adapted. The neighbour list object africa.rook.nb is the SpaceStat ‘rook.GAL’, but is not the list used in Anselin (1995); paper.nb reconstructs the list used in the paper, with inserted links between Mauritania and Morocco, South Africa and Angola and Zambia, Tanzania and Zaire, and Botswana and Zambia. afxy is the coordinate matrix for the centroids of the countries.

Usage

afcon

Format

This data frame contains the following columns:

  • x: an easting in decimal degrees (taken as centroid of shapefile polygon)

  • y: a northing in decimal degrees (taken as centroid of shapefile polygon)

  • totcon: index of total conflict 1966-78

  • name: country name

  • id: country id number as in paper

Note

All source data files prepared by Luc Anselin, Spatial Analysis Laboratory, Department of Agricultural and Consumer Economics, University of Illinois, Urbana-Champaign.

Source

Anselin, L. and John O'Loughlin. 1992. Geography of international conflict and cooperation: spatial dependence and regional context in Africa. In The New Geopolitics, ed. M. Ward, pp. 39-75. Philadelphia, PA: Gordon and Breach. also: Anselin, L. 1995. Local indicators of spatial association, Geographical Analysis, 27, Table 1, p. 103.

Examples

data(afcon)
if (requireNamespace("spdep", quietly = TRUE)) {
  library(spdep)
  plot(africa.rook.nb, afxy)
  plot(diffnb(paper.nb, africa.rook.nb), afxy, col="red", add=TRUE)
  text(afxy, labels=attr(africa.rook.nb, "region.id"), pos=4, offset=0.4)
  moran.test(afcon$totcon, nb2listw(africa.rook.nb))
  moran.test(afcon$totcon, nb2listw(paper.nb))
  geary.test(afcon$totcon, nb2listw(paper.nb))
}
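
The moran.test() calls above report Moran's I for totcon under two neighbour lists. As a rough illustration of what that statistic measures, the following base R sketch computes Moran's I by hand with row-standardised weights. The four-unit chain, the values in x, and the helper name moran_i are made up for illustration and are not part of spData or spdep.

```r
# Hypothetical helper: Moran's I for values x and a weights matrix w.
moran_i <- function(x, w) {
  z <- x - mean(x)                 # deviations from the mean
  s0 <- sum(w)                     # sum of all weights
  (length(x) / s0) * sum(w * outer(z, z)) / sum(z^2)
}

# Toy example: four units on a line, each linked to its adjacent units.
adj <- matrix(0, 4, 4)
adj[cbind(1:3, 2:4)] <- 1
adj <- adj + t(adj)
w <- adj / rowSums(adj)            # row-standardise the weights

x <- c(1, 2, 3, 4)
moran_i(x, w)                      # 0.4: neighbouring values are similar
```

A positive value indicates that similar values tend to sit next to each other; the significance test itself is left to moran.test().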

Alaska multipolygon

Description

The object loaded is an sf object containing the state of Alaska from the US Census Bureau with a few variables from the American Community Survey (ACS).

Usage

alaska

Format

Formal class 'sf' [package "sf"]; the data contains a data.frame with 1 obs. of 7 variables:

  • GEOID: character vector of geographic identifiers

  • NAME: character vector of state names

  • REGION: character vector of region names

  • AREA: area in square kilometers of units class

  • total_pop_10: numerical vector of total population in 2010

  • total_pop_15: numerical vector of total population in 2015

  • geometry: sfc_MULTIPOLYGON

The object is in projected coordinates using Alaska Albers (EPSG:3467).

Source

https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html

See Also

See the tigris package: https://cran.r-project.org/package=tigris

Examples

if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  data(alaska)

  plot(alaska["total_pop_15"])
}

Marshall's infant mortality in Auckland dataset

Description

(Use example(auckland) to load the data from the GeoPackage file and generate the neighbour list on the fly.) The auckland data frame has 167 rows (census area units, CAU) and 4 columns. The dataset also includes the "nb" object auckland.nb of neighbour relations based on contiguity, and the "polylist" object auckpolys of polygon boundaries for the CAU.

Usage

auckland

Format

This data frame contains the following columns:

  • Easting: a numeric vector of x coordinates in an unknown spatial reference system

  • Northing: a numeric vector of y coordinates in an unknown spatial reference system

  • M77_85: a numeric vector of counts of infant (under 5 years of age) deaths in Auckland, 1977-1985

  • Und5_81: a numeric vector of population under 5 years of age at the 1981 Census

Details

The contiguous neighbours object does not completely replicate results in the sources, and was reconstructed from auckpolys; examination of figures in the sources suggests that there are differences in detail, although probably not in substance.

Source

Marshall R M (1991) Mapping disease and mortality rates using Empirical Bayes Estimators, Applied Statistics, 40, 283–294; Bailey T, Gatrell A (1995) Interactive Spatial Data Analysis, Harlow: Longman — INFOMAP data set used with permission.

Examples

if (requireNamespace("sf", quietly = TRUE)) {
  auckland <- sf::st_read(system.file("shapes/auckland.gpkg", package="spData")[1])
  plot(sf::st_geometry(auckland))
  if (requireNamespace("spdep", quietly = TRUE)) {
    auckland.nb <- spdep::poly2nb(auckland)
  }
}
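
With M77_85 counting deaths over the nine years 1977-1985 and Und5_81 giving the under-5 population in 1981, a crude annual rate can be formed by dividing deaths by nine times the population. The sketch below shows the arithmetic in base R on made-up counts; the vectors deaths and pop are hypothetical stand-ins for the two columns, and using the 1981 population for all nine years is an assumption.

```r
# Made-up counts standing in for auckland$M77_85 and auckland$Und5_81.
deaths <- c(2, 0, 9)
pop    <- c(100, 50, 300)

years <- 9                                  # 1977 to 1985 inclusive
rate_per_1000 <- 1000 * deaths / (years * pop)
round(rate_per_1000, 2)                     # 2.22 0.00 3.33
```

Rates from small populations are unstable, which is the motivation for the Empirical Bayes estimators discussed in the sources.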

House sales prices, Baltimore, MD 1978

Description

House sales price and characteristics for a spatial hedonic regression, Baltimore, MD 1978. X,Y on Maryland grid, projection type unknown.

Usage

baltimore

Format

A data frame with 211 observations on the following 17 variables.

  • STATION: a numeric vector

  • PRICE: a numeric vector

  • NROOM: a numeric vector

  • DWELL: a numeric vector

  • NBATH: a numeric vector

  • PATIO: a numeric vector

  • FIREPL: a numeric vector

  • AC: a numeric vector

  • BMENT: a numeric vector

  • NSTOR: a numeric vector

  • GAR: a numeric vector

  • AGE: a numeric vector

  • CITCOU: a numeric vector

  • LOTSZ: a numeric vector

  • SQFT: a numeric vector

  • X: a numeric vector

  • Y: a numeric vector

Source

Prepared by Luc Anselin. Original data made available by Robin Dubin, Weatherhead School of Management, Case Western Reserve University, Cleveland, OH. http://sal.agecon.uiuc.edu/datasets/baltimore.zip

References

Dubin, Robin A. (1992). Spatial autocorrelation and neighborhood quality. Regional Science and Urban Economics 22(3), 433-452.

Examples

data(baltimore)
str(baltimore)

if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  # call st_as_sf() directly: the pipe operator is not provided by sf itself
  baltimore_sf <- st_as_sf(baltimore, coords = c("X", "Y"))
  plot(baltimore_sf["PRICE"])
}

Corrected Boston Housing Data

Description

The boston.c data frame has 506 rows and 20 columns. It contains the Harrison and Rubinfeld (1978) data corrected for a few minor errors and augmented with the latitude and longitude of the observations. Gilley and Pace also point out that MEDV is censored, in that median values at or over USD 50,000 are set to USD 50,000. The original data set without the corrections is also included in package mlbench as BostonHousing. In addition, a matrix of tract point coordinates projected to UTM zone 19 is included as boston.utm, and a sphere of influence neighbours list as boston.soi.

Format

This data frame contains the following columns:

  • TOWN: a factor with levels given by town names

  • TOWNNO: a numeric vector corresponding to TOWN

  • TRACT: a numeric vector of tract ID numbers

  • LON: a numeric vector of tract point longitudes in decimal degrees

  • LAT: a numeric vector of tract point latitudes in decimal degrees

  • MEDV: a numeric vector of median values of owner-occupied housing in USD 1000

  • CMEDV: a numeric vector of corrected median values of owner-occupied housing in USD 1000

  • CRIM: a numeric vector of per capita crime

  • ZN: a numeric vector of proportions of residential land zoned for lots over 25000 sq. ft per town (constant for all Boston tracts)

  • INDUS: a numeric vector of proportions of non-retail business acres per town (constant for all Boston tracts)

  • CHAS: a factor with levels 1 if tract borders Charles River; 0 otherwise

  • NOX: a numeric vector of nitric oxides concentration (parts per 10 million) per town

  • RM: a numeric vector of average numbers of rooms per dwelling

  • AGE: a numeric vector of proportions of owner-occupied units built prior to 1940

  • DIS: a numeric vector of weighted distances to five Boston employment centres

  • RAD: a numeric vector of an index of accessibility to radial highways per town (constant for all Boston tracts)

  • TAX: a numeric vector of full-value property-tax rate per USD 10,000 per town (constant for all Boston tracts)

  • PTRATIO: a numeric vector of pupil-teacher ratios per town (constant for all Boston tracts)

  • B: a numeric vector of 1000*(Bk - 0.63)^2 where Bk is the proportion of blacks

  • LSTAT: a numeric vector of percentage values of lower status population

Note

Details of the creation of the tract GPKG file: tract boundaries for 1990 (formerly at: http://www.census.gov/geo/cob/bdy/tr/tr90shp/tr25_d90_shp.zip, counties in the BOSTON SMSA http://www.census.gov/population/metro/files/lists/historical/63mfips.txt); tract conversion table 1980/1970 (formerly at: https://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/7913?q=07913&permit[0]=AVAILABLE, http://www.icpsr.umich.edu/cgi-bin/bob/zipcart2?path=ICPSR&study=7913&bundle=all&ds=1&dups=yes). The file contains corrections and extra variables (tract 3592 is corrected to 3593); the extra columns are:

  • units: number of single family houses

  • cu5k: count of units under USD 5,000

  • c5_7_5: counts USD 5,000 to 7,500

  • C*_*: interval counts

  • co50k: count of units over USD 50,000

  • median: recomputed median values

  • BB: recomputed black population proportion

  • censored: whether censored or not

  • NOXID: NOX model zone ID

  • POP: tract population

Source

Previously available from http://lib.stat.cmu.edu/datasets/boston_corrected.txt

References

Harrison, David, and Daniel L. Rubinfeld, Hedonic Housing Prices and the Demand for Clean Air, Journal of Environmental Economics and Management, Volume 5, (1978), 81-102. Original data.

Gilley, O.W., and R. Kelley Pace, On the Harrison and Rubinfeld Data, Journal of Environmental Economics and Management, 31 (1996),403-405. Provided corrections and examined censoring.

Pace, R. Kelley, and O.W. Gilley, Using the Spatial Configuration of the Data to Improve Estimation, Journal of the Real Estate Finance and Economics, 14 (1997), 333-340.

Bivand, Roger. Revisiting the Boston data set - Changing the units of observation affects estimated willingness to pay for clean air. REGION, v. 4, n. 1, p. 109-127, 2017. https://openjournals.wu.ac.at/ojs/index.php/region/article/view/107.

Examples

if (requireNamespace("spdep", quietly = TRUE)) {
  data(boston)
  hr0 <- lm(log(MEDV) ~ CRIM + ZN + INDUS + CHAS + I(NOX^2) + I(RM^2) +
                    AGE + log(DIS) + log(RAD) + TAX + PTRATIO + B + log(LSTAT), data = boston.c)
  summary(hr0)
  logLik(hr0)
  gp0 <- lm(log(CMEDV) ~ CRIM + ZN + INDUS + CHAS + I(NOX^2) + I(RM^2) +
                    AGE + log(DIS) + log(RAD) + TAX + PTRATIO + B + log(LSTAT), data = boston.c)
  summary(gp0)
  logLik(gp0)
  spdep::lm.morantest(hr0, spdep::nb2listw(boston.soi))
}
if (requireNamespace("sf", quietly = TRUE)) {
  boston.tr <- sf::st_read(system.file("shapes/boston_tracts.gpkg",
                                       package="spData")[1])
  if (requireNamespace("spdep", quietly = TRUE)) {
    boston_nb <- spdep::poly2nb(boston.tr)
  }
}
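
Two details of the Format section are easy to check numerically: MEDV is censored at USD 50,000 (a value of 50 in the data's units of USD 1000), and B is a quadratic transform of the proportion Bk. The following base R sketch uses made-up values rather than boston.c itself; the names medv, b_index and bk are hypothetical.

```r
# Made-up median values in USD 1000; 50 marks the censoring point.
medv <- c(24.0, 50.0, 36.2, 50.0)
censored <- medv >= 50
sum(censored)                      # 2 of the 4 toy values are censored

# B as defined in the Format section: 1000 * (Bk - 0.63)^2.
b_index <- function(bk) 1000 * (bk - 0.63)^2
b_index(0)                         # 396.9
b_index(c(0.53, 0.73))             # both about 10: B does not identify Bk uniquely
```

Because the transform is symmetric around 0.63, two different proportions can map to the same value of B.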

World coffee production data

Description

A tiny dataset containing estimates of coffee production by country, in thousands of 60 kg bags. Purpose: teaching, not research.

Usage

coffee_data

Format

A data frame (tibble) with 58 rows and the following variables:

  • name_long: name of country or coffee variety

  • coffee_production_2016: production in 2016

  • coffee_production_2017: production in 2017

Details

The examples section shows how this can be joined with spatial data to create a simple map.

Source

The International Coffee Organization (ICO). See http://www.ico.org/ and http://www.ico.org/prices/m1-exports.pdf

Examples

head(coffee_data)
## Not run: 
if (requireNamespace("dplyr")) {
library(dplyr)
library(sf)
# found by searching for "global coffee data"
u = "http://www.ico.org/prices/m1-exports.pdf"
download.file(u, "data.pdf", mode = "wb")
if (requireNamespace("pdftables")) { # requires api key
pdftables::convert_pdf(input_file = "data.pdf", output_file = "coffee-data-messy.csv")
d = readr::read_csv("coffee-data-messy.csv")
file.remove("coffee-data-messy.csv")
file.remove("data.pdf")
coffee_data = slice(d, -c(1:9)) %>%
        select(name_long = 1, coffee_production_2016 = 2, coffee_production_2017 = 3) %>%
        filter(!is.na(coffee_production_2016)) %>%
        mutate_at(2:3, stringr::str_replace, " ", "") %>%
        mutate_at(2:3, as.integer)
world_coffee = left_join(spData::world, coffee_data)
plot(world_coffee[c("coffee_production_2016", "coffee_production_2017")])
b = c(0, 500, 1000, 2000, 3000)
library(tmap)
tm_shape(world_coffee) +
  tm_fill("coffee_production_2017", title = "Thousand 60kg bags", breaks = b,
          textNA = "No data", colorNA = NULL)
tmap_mode("view") # for an interactive version
}}

## End(Not run)
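
The join described in Details keeps every country in the spatial layer and fills production with NA where coffee_data has no match. A minimal base R sketch of that left-join behaviour follows, using two small made-up tables instead of world and coffee_data (the production figures are invented):

```r
# Made-up stand-ins for the attribute table of world and for coffee_data.
world_toy <- data.frame(name_long = c("Brazil", "Chile", "Vietnam"),
                        continent = c("South America", "South America", "Asia"))
coffee_toy <- data.frame(name_long = c("Brazil", "Vietnam"),
                         coffee_production_2017 = c(3000, 2500))

# all.x = TRUE keeps unmatched rows of the left table (a left join).
joined <- merge(world_toy, coffee_toy, by = "name_long", all.x = TRUE)
joined
```

The row for Chile is retained with NA production, which is what a map would show as "No data".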

Columbus OH spatial analysis data set

Description

The columbus data frame has 49 rows and 22 columns. Unit of analysis: 49 neighbourhoods in Columbus, OH, 1980 data. In addition the data set includes a polylist object polys with the boundaries of the neighbourhoods, a matrix of polygon centroids coords, and col.gal.nb, the neighbours list from an original GAL-format file. The matrix bbs is DEPRECATED, but retained for other packages using this data set.

Usage

columbus

Format

This data frame contains the following columns:

  • AREA: computed by ArcView

  • PERIMETER: computed by ArcView

  • COLUMBUS_: internal polygon ID (ignore)

  • COLUMBUS_I: another internal polygon ID (ignore)

  • POLYID: yet another polygon ID

  • NEIG: neighborhood id value (1-49); conforms to id value used in Spatial Econometrics book.

  • HOVAL: housing value (in 1,000 USD)

  • INC: household income (in 1,000 USD)

  • CRIME: residential burglaries and vehicle thefts per thousand households in the neighborhood

  • OPEN: open space in neighborhood

  • PLUMB: percentage housing units without plumbing

  • DISCBD: distance to CBD

  • X: x coordinate (in arbitrary digitizing units, not polygon coordinates)

  • Y: y coordinate (in arbitrary digitizing units, not polygon coordinates)

  • NSA: north-south dummy (North=1)

  • NSB: north-south dummy (North=1)

  • EW: east-west dummy (East=1)

  • CP: core-periphery dummy (Core=1)

  • THOUS: constant=1,000

  • NEIGNO: NEIG+1,000, alternative neighborhood id value

Details

The row names of columbus and the region.id attribute of polys are set to columbus$NEIGNO.

Note

All source data files prepared by Luc Anselin, Spatial Analysis Laboratory, Department of Agricultural and Consumer Economics, University of Illinois, Urbana-Champaign, http://sal.agecon.uiuc.edu/datasets/columbus.zip.

Source

Anselin, Luc. 1988. Spatial econometrics: methods and models. Dordrecht: Kluwer Academic, Table 12.1 p. 189.

Examples

if (requireNamespace("sf", quietly = TRUE)) {
  columbus <- sf::st_read(system.file("shapes/columbus.gpkg", package="spData")[1])
  plot(sf::st_geometry(columbus))
}

if (requireNamespace("spdep", quietly = TRUE)) {
  library(spdep)
  col.gal.nb <- read.gal(system.file("weights/columbus.gal", package="spData")[1])
}

Datasets to illustrate the concept of spatial congruence

Description

Sample of old (incongruent) and new (congruent) administrative zones from UK statistical agencies.

Usage

congruent

Format

Simple feature geographic data in a projected CRS (OSGB) with random values assigned for teaching purposes.

Source

https://en.wikipedia.org/wiki/ONS_coding_system

Examples

if(requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  plot(aggregating_zones$geometry, lwd = 5)
  plot(congruent$geometry, add = TRUE, border = "green", lwd = 2)
  plot(incongruent$geometry, add = TRUE, border = "blue", col = NA)
  rbind(congruent, incongruent)
}
# Code used to download the data:
## Not run: 
#devtools::install_github("robinlovelace/ukboundaries")
library(sf)
library(tmap)
library(dplyr)
#library(ukboundaries)
sel = grepl("003|004", msoa2011_lds$geo_label)
aggregating_zones = st_transform(msoa2011_lds[sel, ], 27700)
# find lsoas in the aggregating_zones
lsoa_touching = st_transform(lsoa2011_lds, 27700)[aggregating_zones, ]
lsoa_cents = st_centroid(lsoa_touching)
lsoa_cents = lsoa_cents[aggregating_zones, ]
sel = lsoa_touching$geo_code %in% lsoa_cents$geo_code
# same for ed zones
ed_touching = st_transform(ed1981, 27700)[aggregating_zones, ]
ed_cents = st_centroid(ed_touching)
ed_cents = ed_cents[aggregating_zones, ]
incongruent_agg_ed = ed_touching[ed_cents, ]
set.seed(2017)
incongruent_agg_ed$value = rnorm(nrow(incongruent_agg_ed), mean = 5)
congruent = aggregate(incongruent_agg_ed["value"], lsoa_touching[sel, ], mean)
congruent$level = "Congruent"
congruent = congruent[c("level", "value")]
incongruent_cents = st_centroid(incongruent_agg_ed)
aggregating_value = st_join(incongruent_cents, congruent)$value.y
incongruent_agg = aggregate(incongruent_agg_ed["value"], list(aggregating_value), FUN = mean)
incongruent_agg$level = "Incongruent"
incongruent = incongruent_agg[c("level", "value")]
summary(st_geometry_type(congruent))
summary(st_geometry_type(incongruent))
incongruent = st_cast(incongruent, "MULTIPOLYGON")
summary(st_geometry_type(incongruent))
summary(st_geometry_type(aggregating_zones))
# use_data() now lives in the usethis package (formerly devtools::use_data)
usethis::use_data(congruent, overwrite = TRUE)
usethis::use_data(incongruent, overwrite = TRUE)
usethis::use_data(aggregating_zones, overwrite = TRUE)

## End(Not run)

Cycle hire points in London

Description

Points representing cycle hire points across London.

Usage

cycle_hire

Format

  • id: Id of the hire point

  • name: Name of the point

  • area: Area they are in

  • nbikes: The number of bikes currently parked there

  • nempty: The number of empty places

  • geometry: sfc_POINT

Source

https://www.data.gov.uk/

Examples

if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  data(cycle_hire)
  # or
  cycle_hire <- st_read(system.file("shapes/cycle_hire.geojson", package="spData"))
  
  plot(cycle_hire)
}

## Not run: 
# Download the data
c_names = c("id", "name", "area", "lat", "lon", "nbikes", "nempty")
cycle_hire = readr::read_csv("http://cyclehireapp.com/cyclehirelive/cyclehire.csv", 
  col_names = FALSE, skip = 1)
names(cycle_hire) = c_names
# build point geometries from the lon/lat columns (WGS84 is assumed here)
cycle_hire = sf::st_as_sf(cycle_hire, coords = c("lon", "lat"), crs = 4326)

## End(Not run)

Cycle hire points in London from OSM

Description

Dataset downloaded using the osmdata package representing cycle hire points across London.

Usage

cycle_hire_osm

Format

  • osm_id: The OSM ID

  • name: The name of the cycle point

  • capacity: How many bikes it can take

  • cyclestreets_id: The ID linked to cyclestreets' photomap

  • description: Additional description of points

  • geometry: sfc_POINT

Source

https://www.openstreetmap.org

See Also

See the osmdata package: https://cran.r-project.org/package=osmdata

Examples

if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  data(cycle_hire_osm)
  # or
  cycle_hire_osm <- st_read(system.file("shapes/cycle_hire_osm.geojson", package="spData"))
  
  plot(cycle_hire_osm)
}

# Code used to download the data:
## Not run: 
library(osmdata)
library(dplyr)
library(sf)
q = add_osm_feature(opq = opq("London"), key = "network", value = "tfl_cycle_hire")
lnd_cycle_hire = osmdata_sf(q)
cycle_hire_osm = lnd_cycle_hire$osm_points
nrow(cycle_hire_osm)
plot(cycle_hire_osm)
cycle_hire_osm = dplyr::select(cycle_hire_osm, osm_id, name, capacity, 
                               cyclestreets_id, description) %>%
  mutate(capacity = as.numeric(capacity))
names(cycle_hire_osm)
nrow(cycle_hire_osm)

## End(Not run)

Municipality departments of Athens (sf)

Description

The geographic boundaries of departments (sf) of the municipality of Athens. This is accompanied by various characteristics in these areas.

Usage

depmunic

Format

An sf object of 7 polygons with the following 7 variables.

  • num_dep: A unique identifier for each municipality department.

  • airbnb: The number of Airbnb properties in 2017

  • museums: The number of museums

  • population: The population recorded in the 2011 census.

  • pop_rest: The number of citizens whose country of origin is outside Europe.

  • greensp: The area of green spaces (unit: square meters).

  • area: The area of the polygon (unit: square kilometers).

See Also

properties

Examples

if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  data(depmunic)
  
  depmunic$foreigners <- 100*depmunic$pop_rest/depmunic$population
  plot(depmunic["foreigners"], key.pos=1)
}

Eire data sets

Description

The Eire data set has been converted to shapefile format and placed in the etc/shapes directory. The initial data objects are now stored as a SpatialPolygonsDataFrame object, from which the contiguity neighbour list is recreated. For purposes of record, the original data set is retained. The eire.df data frame has 26 rows and 9 columns. In addition, polygons of the 26 counties are provided as a multipart polylist in eire.polys.utm (coordinates in km, projection UTM zone 30). Their centroids are in eire.coords.utm. The original Cliff and Ord binary contiguities are in eire.nb.

Format

This data frame contains the following columns:

  • A: Percentage of sample with blood group A

  • towns: Towns/unit area

  • pale: Beyond the Pale 0, within the Pale 1

  • size: number of blood type samples

  • ROADACC: arterial road network accessibility in 1961

  • OWNCONS: percentage in value terms of gross agricultural output of each county consumed by itself

  • POPCHG: 1961 population as percentage of 1926

  • RETSALE: value of retail sales, in thousands of British pounds

  • INCOME: total personal income, in thousands of British pounds

  • names: County names

Source

Upton and Fingleton 1985; Bailey and Gatrell 1995, ch. 1 for blood group data; Cliff and Ord (1973), p. 107 for remaining variables (also after O'Sullivan, 1968). Polygon borders and Irish data sourced from Michael Tiefelsdorf's SPSS Saddlepoint bundle, originally hosted at: http://geog-www.sbs.ohio-state.edu/faculty/tiefelsdorf/GeoStat.htm.

Examples

library(spdep)
eire <- sf::st_read(system.file("shapes/eire.gpkg", package="spData")[1])
eire.nb <- poly2nb(eire)

# Eire physical anthropology blood group data
summary(eire$A)
brks <- round(fivenum(eire$A), digits=2)
cols <- rev(heat.colors(4))
plot(eire, col=cols[findInterval(eire$A, brks, all.inside=TRUE)])
title(main="Percentage with blood group A in Eire")
legend(x=c(-50, 70), y=c(6120, 6050), 
  c("under 27.91", "27.91 - 29.26", "29.26 - 31.02", "over 31.02"),
  fill=cols, bty="n")

plot(st_geometry(eire))
plot(eire.nb, st_geometry(eire), add=TRUE)

lA <- lag.listw(nb2listw(eire.nb), eire$A)
summary(lA)
moran.test(eire$A, nb2listw(eire.nb))
geary.test(eire$A, nb2listw(eire.nb))
cor(lA, eire$A)
moran.plot(eire$A, nb2listw(eire.nb), labels=eire$names)
A.lm <- lm(A ~ towns + pale, data=eire)
summary(A.lm)
res <- residuals(A.lm)
brks <- c(min(res),-2,-1,0,1,2,max(res))
cols <- rev(cm.colors(6))

plot(eire, col=cols[findInterval(res, brks, all.inside=TRUE)])
title(main="Regression residuals")
legend(x=c(-50, 70), y=c(6120, 6050),
  legend=c("under -2", "-2 - -1", "-1 - 0", "0 - 1", "1 - 2", "over 2"),
  fill=cols, bty="n")

lm.morantest(A.lm, nb2listw(eire.nb))
lm.morantest.sad(A.lm, nb2listw(eire.nb))
lm.LMtests(A.lm, nb2listw(eire.nb), test="LMerr")

# Eire agricultural data
brks <- round(fivenum(eire$OWNCONS), digits=2)
cols <- grey(4:1/5)
plot(eire, col=cols[findInterval(eire$OWNCONS, brks, all.inside=TRUE)])
title(main="Percentage own consumption of agricultural produce")
legend(x=c(-50, 70), y=c(6120, 6050),
  legend=c("under 9", "9 - 12.2", "12.2 - 19", "over 19"), fill=cols, bty="n")

moran.plot(eire$OWNCONS, nb2listw(eire.nb))
moran.test(eire$OWNCONS, nb2listw(eire.nb))
e.lm <- lm(OWNCONS ~ ROADACC, data=eire)
res <- residuals(e.lm)
brks <- c(min(res),-2,-1,0,1,2,max(res))
cols <- rev(cm.colors(6))
plot(eire, col=cols[findInterval(res, brks, all.inside=TRUE)])
title(main="Regression residuals")
legend(x=c(-50, 70), y=c(6120, 6050),
  legend=c("under -2", "-2 - -1", "-1 - 0", "0 - 1", "1 - 2", "over 2"),
  fill=cols, bty="n")

lm.morantest(e.lm, nb2listw(eire.nb))
lm.morantest.sad(e.lm, nb2listw(eire.nb))
lm.LMtests(e.lm, nb2listw(eire.nb), test="LMerr")
print(localmoran.sad(e.lm, eire.nb, select=seq(along=eire.nb)))
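
The maps above assign a colour to each county with findInterval(), which returns the index of the class interval each value falls in. A small base R sketch with made-up values and breaks (not the Eire data) shows how all.inside = TRUE keeps the maximum in the top class:

```r
# Made-up break points defining four classes, and values to classify.
brks <- c(20, 27, 29, 31, 40)
vals <- c(25, 28, 30, 33, 40)

# all.inside = TRUE maps values at or beyond the last break into the top class.
cls <- findInterval(vals, brks, all.inside = TRUE)
cls                                # 1 2 3 4 4

cols <- rev(heat.colors(4))
cols[cls]                          # one colour per value, as passed to plot(..., col = )
```

Without all.inside = TRUE the value equal to the last break would get index 5, for which there is no colour.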

1980 Presidential election results

Description

A data set for 1980 Presidential election results covering 3,107 US counties using geographical coordinates. In addition, three spatial neighbour objects are provided: k4, not using Great Circle distances; dll, using Great Circle distances; and e80_queen, of Queen contiguities for equivalent County polygons taken from file co1980p020.tar.gz on the USGS National Atlas site. A spatial weights object imported from elect.ford is also included; it is a 4-nearest neighbour non-GC row-standardised object, but with coercion to symmetry.

Usage

elect80

Format

A SpatialPointsDataFrame with 3107 observations on the following 7 variables.

  • FIPS: a factor of county FIPS codes

  • long: a numeric vector of longitude values

  • lat: a numeric vector of latitude values

  • pc_turnout: Votes cast as proportion of population over age 19 eligible to vote

  • pc_college: Population with college degrees as proportion of population over age 19 eligible to vote

  • pc_homeownership: Homeownership as proportion of population over age 19 eligible to vote

  • pc_income: Income per capita of population over age 19 eligible to vote

Source

Pace, R. Kelley and Ronald Barry. 1997. "Quick Computation of Spatial Autoregressive Estimators", in Geographical Analysis; sourced from the data folder in the Spatial Econometrics Toolbox for Matlab, formerly available from http://www.spatial-econometrics.com/html/jplv7.zip, files elect.dat and elect.ford (with the final line dropped).

Examples

if (requireNamespace("sp", quietly = TRUE)) {
  library(sp)
  data(elect80)
  summary(elect80)
  plot(elect80)
}
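
The neighbour objects differ in whether Great Circle distances are used. The sketch below, in base R, computes a haversine Great Circle distance and shows why it differs from a planar distance on raw degrees; the function great_circle_km and the spherical Earth radius of 6371 km are assumptions made for illustration, and this is not necessarily how the objects in the package were built.

```r
# Hypothetical helper: haversine distance in km between lon/lat points in degrees.
great_circle_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

d_lat <- great_circle_km(0, 0, 0, 1)     # one degree of latitude at the equator
d_lon <- great_circle_km(0, 60, 1, 60)   # one degree of longitude at 60 N
c(d_lat, d_lon)                          # about 111.2 and 55.6 km
```

Both pairs are exactly one unit apart when degrees are treated as planar coordinates, so nearest-neighbour rankings can change once Great Circle distances are used.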

Artificial elevation raster data set

Description

The raster data represents elevation in meters and uses WGS84 as a coordinate reference system.

Examples

system.file("raster/elev.tif", package = "spData")

Getis-Ord remote sensing example data

Description

The go_xyz data frame has 256 rows and 3 columns. Vectors go_x and go_y are of length 16 and give the centres of the grid rows and columns, 30m apart. The data start from the bottom left, whereas Getis and Ord start from the top left, so their 136th grid cell is our 120th.

Format

This data frame contains the following columns:

  • x: grid eastings

  • y: grid northings

  • val: remote sensing values

Source

Getis, A. and Ord, J. K. 1996 Local spatial statistics: an overview. In P. Longley and M. Batty (eds) Spatial analysis: modelling in a GIS environment (Cambridge: Geoinformation International), 266.

Examples

data(getisord)
image(go_x, go_y, t(matrix(go_xyz$val, nrow = 16, ncol=16, byrow = TRUE)), asp = 1)
text(go_xyz$x, go_xyz$y, go_xyz$val, cex = 0.7)
polygon(c(195, 225, 225, 195), c(195, 195, 225, 225), lwd = 2)
title(main = "Getis-Ord 1996 remote sensing data")
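
The index arithmetic behind "their 136th grid cell is our 120th" can be checked directly. This base R sketch assumes both orderings run row by row across the 16 by 16 grid and differ only in whether the first row is the top or the bottom one; the function name is hypothetical.

```r
# Convert a cell index counted from the top-left row to one counted
# from the bottom-left row (row-major order assumed in both cases).
top_left_to_bottom_left <- function(i, nrow = 16, ncol = 16) {
  row_from_top <- (i - 1) %/% ncol + 1
  col <- (i - 1) %% ncol + 1
  row_from_bottom <- nrow - row_from_top + 1
  (row_from_bottom - 1) * ncol + col
}

top_left_to_bottom_left(136)       # 120
```

The mapping is its own inverse, so the same function converts indices in either direction.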

Artificial raster dataset representing grain sizes

Description

The ratified raster dataset represents grain sizes with the three classes clay, silt and sand, and WGS84 as a coordinate reference system.

Examples

system.file("raster/grain.tif", package = "spData")

Hawaii multipolygon

Description

The object loaded is an sf object containing the state of Hawaii from the US Census Bureau with a few variables from the American Community Survey (ACS).

Usage

hawaii

Format

Formal class 'sf' [package "sf"]; the data contains a data.frame with 1 obs. of 7 variables:

  • GEOID: character vector of geographic identifiers

  • NAME: character vector of state names

  • REGION: character vector of region names

  • AREA: area in square kilometers of units class

  • total_pop_10: numerical vector of total population in 2010

  • total_pop_15: numerical vector of total population in 2015

  • geometry: sfc_MULTIPOLYGON

The object is in projected coordinates using Hawaii Albers Equal Area Conic (ESRI:102007).

Source

https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html

See Also

See the tigris package: https://cran.r-project.org/package=tigris

Examples

if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  data(hawaii)

  plot(hawaii["total_pop_15"])
}

Hopkins burnt savanna herb remains

Description

A 20m square is divided into 40 by 40 0.5m quadrats. Observations are in tens of grams of herb remains, 0 being from 0g to less than 10g, and so on. Analysis was mostly conducted using the interior 32 by 32 grid.

Usage

hopkins

Format

num [1:40, 1:40] 0 0 0 0 0 0 0 0 0 1 ...

Source

Upton, G., Fingleton, B. 1985 Spatial data analysis by example: point pattern and quantitative data, Wiley, pp. 38–39.

References

Hopkins, B., 1965 Observations on savanna burning in the Olokemeji Forest Reserve, Nigeria. Journal of Applied Ecology, 2, 367–381.

Examples

data(hopkins)
image(1:32, 1:32, hopkins[5:36,36:5], breaks=c(-0.5, 3.5, 20),
      col=c("white", "black"))
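
The coding and the interior subset described above can be sketched in base R. The weights in grams are made up, the matrix m is a placeholder with the same 40 by 40 shape as hopkins, and taking rows and columns 5 to 36 as the interior follows the indices used in the example above.

```r
# Coding: tens of grams, so 0 covers 0 g up to (but not including) 10 g.
grams <- c(0, 9.9, 10, 37, 125)
code <- floor(grams / 10)
code                               # 0 0 1 3 12

# Interior 32 by 32 block of a 40 by 40 grid (drops a 4-quadrat border).
m <- matrix(seq_len(40 * 40), nrow = 40, ncol = 40)
interior <- m[5:36, 5:36]
dim(interior)                      # 32 32
```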

Lucas county OH housing

Description

Data on 25,357 single family homes sold in Lucas County, Ohio, 1993-1998 from the county auditor, together with an nb neighbour object constructed as a sphere of influence graph from projected coordinates.

Usage

house

Format

Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots. The data slot is a data frame with 25357 observations on the following 24 variables.

  • price: a numeric vector

  • yrbuilt: a numeric vector

  • stories: a factor with levels one bilevel multilvl one+half two two+half three

  • TLA: a numeric vector

  • wall: a factor with levels stucdrvt ccbtile metlvnyl brick stone wood partbrk

  • beds: a numeric vector

  • baths: a numeric vector

  • halfbaths: a numeric vector

  • frontage: a numeric vector

  • depth: a numeric vector

  • garage: a factor with levels no garage basement attached detached carport

  • garagesqft: a numeric vector

  • rooms: a numeric vector

  • lotsize: a numeric vector

  • sdate: a numeric vector

  • avalue: a numeric vector

  • s1993: a numeric vector

  • s1994: a numeric vector

  • s1995: a numeric vector

  • s1996: a numeric vector

  • s1997: a numeric vector

  • s1998: a numeric vector

  • syear: a factor with levels 1993 1994 1995 1996 1997 1998

  • age: a numeric vector

Its projection is CRS(+init=epsg:2834), the Ohio North State Plane.

Source

Dataset included in the Spatial Econometrics toolbox for Matlab, formerly available from http://www.spatial-econometrics.com/html/jplv7.zip.

Examples

if (requireNamespace("sp", quietly = TRUE)) {
  library(sp)
  data(house)
  str(house)
  plot(house)
}

Prevalence of respiratory symptoms

Description

Prevalence of respiratory symptoms in 71 school catchment areas in Huddersfield, Northern England

Usage

huddersfield

Format

A data frame with 71 observations on the following 2 variables.

  • cases: Prevalence of at least mild conditions

  • total: Number of questionnaires returned

Source

Martuzzi M, Elliott P (1996) Empirical Bayes estimation of small area prevalence of non-rare conditions, Statistics in Medicine 15, 1867–1873, pp. 1870–1871.

Examples

data(huddersfield)
str(huddersfield)
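
The two columns lend themselves to a crude prevalence calculation (cases divided by questionnaires returned). The sketch below assumes that cases holds counts and uses invented numbers rather than the packaged values; crude_prevalence() is a hypothetical helper, not part of spData.

```r
# Crude prevalence: reported cases as a share of questionnaires returned.
# crude_prevalence() is a hypothetical helper, not part of spData.
crude_prevalence <- function(cases, total) {
  stopifnot(length(cases) == length(total), all(total > 0))
  cases / total
}

# Illustrative counts only (not taken from the huddersfield data)
toy <- data.frame(cases = c(12, 30, 7), total = c(100, 150, 70))
toy$prevalence <- crude_prevalence(toy$cases, toy$total)
print(toy)

# With spData installed, the same helper could be applied to the packaged data:
# data(huddersfield, package = "spData")
# summary(crude_prevalence(huddersfield$cases, huddersfield$total))
```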

Illinois 1959 county gross farm product value per acre

Description

Classic data set for the choice of class intervals for choropleth maps.

Usage

jenks71

Format

A data frame with 102 observations on the following 2 variables.

  • jenks71: a numeric vector: per-acre value of gross farm products in dollars, by county, for Illinois in 1959

  • area: a numeric vector: county area in square miles

Source

Jenks, G. F., Caspall, F. C., 1971. "Error on choroplethic maps: definition, measurement, reduction". Annals, Association of American Geographers, 61 (2), 217–244

Examples

data(jenks71)
jenks71
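
Because this data set is a classic test case for choosing class intervals, it may help to see how two simple schemes differ. The sketch below compares equal-interval and quantile breaks in base R on synthetic, right-skewed values (not the jenks71 data); it does not implement the optimal classification discussed by Jenks and Caspall.

```r
# Two simple ways to choose class intervals for a choropleth map.
# The values are synthetic and right-skewed; they are not the jenks71 data.
set.seed(1)
values <- round(rexp(102, rate = 1 / 40), 2)
k <- 5  # number of classes

# Equal-interval breaks: split the range into k equally wide classes
equal_breaks <- seq(min(values), max(values), length.out = k + 1)

# Quantile breaks: put roughly the same number of observations in each class
quantile_breaks <- quantile(values, probs = seq(0, 1, length.out = k + 1),
                            names = FALSE)

equal_counts <- table(cut(values, equal_breaks, include.lowest = TRUE))
quantile_counts <- table(cut(values, unique(quantile_breaks),
                             include.lowest = TRUE))
print(equal_counts)
print(quantile_counts)
```

With skewed values, equal-interval classes tend to crowd most observations into the lowest class, while quantile classes stay balanced in size.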

The boroughs of London

Description

Polygons representing large administrative zones in London

Usage

lnd

Format

  • NAME: Borough name

  • GSS_CODE: Official code

  • HECTARES: How many hectares

  • NONLD_AREA: Area outside London

  • ONS_INNER: Office for National Statistics code

  • SUB_2009: Empty column

  • SUB_2006: Empty column

  • geometry: sfc_MULTIPOLYGON

Source

https://github.com/Robinlovelace/Creating-maps-in-R

Examples

if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  data(lnd)
  summary(lnd)
  plot(st_geometry(lnd))
}

North Carolina SIDS data

Description

(Use example(nc.sids) to read the data set from the GeoPackage file, together with the import of two different lists of neighbours.) The nc.sids data frame has 100 rows and 21 columns. It contains data given in Cressie (1991, pp. 386-9), Cressie and Read (1985) and Cressie and Chan (1989) on sudden infant deaths in North Carolina for 1974-78 and 1979-84. The data set also contains the neighbour list given by Cressie and Chan (1989) omitting self-neighbours (ncCC89.nb), and the neighbour list given by Cressie and Read (1985) for contiguities (ncCR85.nb). The data are ordered by county ID number, not alphabetically as in the source tables. sidspolys is a "polylist" object of polygon boundaries, and sidscents is a matrix of their centroids.

Usage

nc.sids

Format

This data frame contains the following columns:

  • SP_ID: SpatialPolygons ID

  • CNTY_ID: county ID

  • east: eastings, county seat, miles, local projection

  • north: northings, county seat, miles, local projection

  • L_id: Cressie and Read (1985) L index

  • M_id: Cressie and Read (1985) M index

  • names: County names

  • AREA: County polygon areas in degree units

  • PERIMETER: County polygon perimeters in degree units

  • CNTY_: Internal county ID

  • NAME: County names

  • FIPS: County ID

  • FIPSNO: County ID

  • CRESS_ID: Cressie papers ID

  • BIR74: births, 1974-78

  • SID74: SID deaths, 1974-78

  • NWBIR74: non-white births, 1974-78

  • BIR79: births, 1979-84

  • SID79: SID deaths, 1979-84

  • NWBIR79: non-white births, 1979-84

Source

Cressie, N (1991), Statistics for spatial data. New York: Wiley, pp. 386–389; Cressie, N, Chan NH (1989) Spatial modelling of regional variables. Journal of the American Statistical Association, 84, 393–401; Cressie, N, Read, TRC (1985) Do sudden infant deaths come in clusters? Statistics and Decisions Supplement Issue 2, 333–349; http://sal.agecon.uiuc.edu/datasets/sids.zip.

Examples

if (requireNamespace("sf", quietly = TRUE)) {
  if (requireNamespace("spdep", quietly = TRUE)) {
    library(spdep)
    nc.sids <- sf::st_read(system.file("shapes/sids.gpkg", package="spData")[1])
    row.names(nc.sids) <- as.character(nc.sids$FIPS)
    rn <- row.names(nc.sids)
    ncCC89_nb <- read.gal(system.file("weights/ncCC89.gal", package="spData")[1],
                          region.id=rn)
    ncCR85_nb <- read.gal(system.file("weights/ncCR85.gal", package="spData")[1],
                          region.id=rn)
                          
    plot(sf::st_geometry(nc.sids), border="grey")
    plot(ncCR85_nb, sf::st_geometry(nc.sids), add=TRUE, col="blue")
    plot(sf::st_geometry(nc.sids), border="grey")
    plot(ncCC89_nb, sf::st_geometry(nc.sids), add=TRUE, col="blue")
  }
}

New York leukemia data

Description

New York leukemia data taken from the data sets supporting Waller and Gotway 2004 (the data should be loaded by running example(NY_data) to demonstrate spatial data import techniques)

Usage

nydata

Format

A data frame with 281 observations on the following 12 variables, and the binary coded spatial weights used in the source.

  • AREANAME: name of census tract

  • AREAKEY: unique FIPS code for each tract

  • X: x-coordinate of tract centroid (in km)

  • Y: y-coordinate of tract centroid (in km)

  • POP8: population size (1980 U.S. Census)

  • TRACTCAS: number of cases 1978-1982

  • PROPCAS: proportion of cases per tract

  • PCTOWNHOME: percentage of people in each tract owning their own home

  • PCTAGE65P: percentage of people in each tract aged 65 or more

  • Z: transformed proportions

  • AVGIDIST: average distance between centroid and TCE sites

  • PEXPOSURE: "exposure potential": inverse distance between each census tract centroid and the nearest TCE site, IDIST, transformed via log(100*IDIST)

  • Cases: as TRACTCAS with more digits

  • Xm: X in metres

  • Ym: Y in metres

  • Xshift: feature offset

  • Yshift: feature offset
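
The PEXPOSURE definition in the list above can be sketched in a few lines. The distances are invented and their unit is an assumption; exposure_potential() is a hypothetical helper name and is not how the packaged column was necessarily produced.

```r
# Exposure potential as described for PEXPOSURE: take the inverse of the
# distance to the nearest TCE site (IDIST), then apply log(100 * IDIST).
# exposure_potential() is a hypothetical helper; the distances are invented.
exposure_potential <- function(dist_to_nearest_site) {
  stopifnot(all(dist_to_nearest_site > 0))
  idist <- 1 / dist_to_nearest_site
  log(100 * idist)
}

d <- c(0.5, 1, 10, 100)
pexposure <- exposure_potential(d)
print(data.frame(distance = d, PEXPOSURE = pexposure))
```

The transformed value falls as distance grows, so tracts closer to a site receive a higher exposure potential.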

Details

The examples section shows how the DBF files from the book website for Chapter 9 were converted into the nydata data frame and the listw_NY spatial weights list. The shapes directory includes the original version of the UTM18 census tract boundaries imported from BNA format (http://sedac.ciesin.columbia.edu/ftpsite/pub/census/usa/tiger/ny/bna_st/t8_36.zip) before the OGR/GDAL BNA driver was available. The NY8_utm18 shapefile was constructed using a bna2mif converter and converted to shapefile format after adding data using writeOGR. The new file NY8_bna_utm18.gpkg has been constructed from the original BNA file, but read using the OGR BNA driver with GEOS support. The NY8 shapefile and GeoPackage NY8_utm18.gpkg include invalid polygons, but because the OGR BNA driver may have GEOS support (used here), the tract polygon objects in NY8_bna_utm18.gpkg are valid.

Source

http://www.sph.emory.edu/~lwaller/ch9index.htm

References

Waller, L. and C. Gotway (2004) Applied Spatial Statistics for Public Health Data. New York: John Wiley and Sons.

Examples

## NY leukemia

if (requireNamespace("sf", quietly = TRUE)) {
library(foreign)
nydata <- read.dbf(system.file("misc/nydata.dbf", package="spData")[1])
nydata <- sf::st_as_sf(nydata, coords=c("X", "Y"), remove=FALSE)
plot(sf::st_geometry(nydata))

nyadjmat <- as.matrix(read.dbf(system.file("misc/nyadjwts.dbf",
                                           package="spData")[1])[-1])
ID <- as.character(names(read.dbf(system.file("misc/nyadjwts.dbf",
                                              package="spData")[1]))[-1])
identical(substring(ID, 2, 10), substring(as.character(nydata$AREAKEY), 2, 10))

if (requireNamespace("spdep", quietly = TRUE)) {
library(spdep)
listw_NY <- mat2listw(nyadjmat, as.character(nydata$AREAKEY), style="B")
}
}

Regions in New Zealand

Description

Polygons representing the 16 regions of New Zealand (2018). See https://en.wikipedia.org/wiki/Regions_of_New_Zealand for a description of these regions and https://www.stats.govt.nz for information on the data source

Usage

nz

Format

An sf object with the following columns:

  • Name: Name

  • Island: Island

  • Land_area: Land area

  • Population: Population

  • Median_income: Median income (NZD)

  • Sex_ratio: Sex ratio (male/female)

  • geom: sfc_MULTIPOLYGON

Source

https://www.stats.govt.nz

https://en.wikipedia.org/wiki/Regions_of_New_Zealand

See Also

See the nzcensus package: https://github.com/ellisp/nzelect

Examples

if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  summary(nz)
  plot(nz)
}
## Not run: 
# Find "Regional Council 2018 Clipped (generalised)"
# select the GeoPackage option in the "Vectors/tables" dropdown
# at https://datafinder.stats.govt.nz/data/ (requires registration)
# Save the result as:
unzip("statsnzregional-council-2018-clipped-generalised-GPKG.zip")
library(sf)
library(tidyverse)
nz_full = st_read("regional-council-2018-clipped-generalised.gpkg")
print(object.size(nz_full), units = "Kb") # 14407.2 Kb
nz = rmapshaper::ms_simplify(nz_full, keep = 0.001, sys = TRUE)
print(object.size(nz), units = "Kb") # 39.9 Kb
names(nz)
nz$REGC2018_V1_00_NAME
nz = filter(nz, REGC2018_V1_00_NAME != "Area Outside Region") %>%
        select(Name = REGC2018_V1_00_NAME, `Land_area` = LAND_AREA_SQ_KM)
# regions basic info
# devtools::install_github("hadley/rvest")
library(rvest)
doc = read_html("https://en.wikipedia.org/wiki/Regions_of_New_Zealand") %>%
        html_nodes("div table")
tab = doc[[3]] %>% html_table()
tab = tab %>% select(Name = Region, Population = `Population[20]`, Island)
tab = tab %>% mutate(Population = str_replace_all(Population, ",", "")) %>%
        mutate(Population = as.numeric(Population)) %>%
        mutate(Name = str_remove_all(Name, " \\([1-9]\\)?.+"))
nz$Name = as.character(nz$Name)
nz$Name = str_remove(nz$Name, " Region")
nz$Name %in% tab$Name
# regions additional info
library(nzcensus)
nz_add_data = REGC2013 %>% 
        select(Name = REGC2013_N, Median_income = MedianIncome2013, 
               PropFemale2013, PropMale2013) %>% 
        mutate(Sex_ratio = PropMale2013 / PropFemale2013) %>% 
        mutate(Name = gsub(" Region", "", Name)) %>% 
        select(Name, Median_income, Sex_ratio)
# data join
nz = left_join(nz, tab, by = "Name") %>% 
        left_join(nz_add_data, by = "Name") %>% 
        select(Name, Island, Land_area, Population, Median_income, Sex_ratio)

## End(Not run)

High points in New Zealand

Description

The 101 highest points in New Zealand (2017). See https://data.linz.govt.nz/layer/50284-nz-height-points-topo-150k/ for details.

Usage

nz_height

Format

An sf object with the following columns:

  • t50_fid: ID

  • elevation: Height above sea level in m

  • geometry: sfc_POINT

Source

https://data.linz.govt.nz

Examples

if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  summary(nz_height)
  plot(st_geometry(nz))
  plot(st_geometry(nz_height), add = TRUE)
}
## Not run: 
library(dplyr)
# After downloading data
unzip("lds-nz-height-points-topo-150k-SHP.zip")
nz_height = st_read("nz-height-points-topo-150k.shp") %>% 
  top_n(n = 100, wt = elevation)
library(tmap)
tmap_mode("view")
qtm(nz) +
  qtm(nz_height)
f = list.files(pattern = "*nz-height*")
file.remove(f)

## End(Not run)
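
The preparation code asks top_n() for 100 rows, yet the dataset is described as having 101 points. dplyr's top_n() keeps tied rows, so it can return more rows than requested; whether ties explain the extra row here is an assumption. The base R sketch below mimics that tie-keeping behaviour on invented elevations; top_n_with_ties() is a hypothetical helper.

```r
# Selecting the "top n" values while keeping ties, in base R.
# Elevations are invented; this only mimics the tie-keeping behaviour of
# dplyr::top_n() and is not the code used to build nz_height.
top_n_with_ties <- function(x, n) {
  # rank 1 = largest value; tied values share the smallest rank of the tie
  x[rank(-x, ties.method = "min") <= n]
}

elevation <- c(3724, 3497, 3151, 3151, 3040, 2900)
top3 <- top_n_with_ties(elevation, 3)
print(top3)  # four values: the tie at 3151 is kept
```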

Dataset of properties in the municipality of Athens (sf)

Description

A dataset of apartments in the municipality of Athens for 2017. Point location of the properties is given together with their main characteristics and the distance to the closest metro/train station.

Usage

properties

Format

An sf object of 1000 points with the following 6 variables.

  • id: A unique identifier for each property.

  • size: The size of the property (unit: square meters).

  • price: The asking price (unit: euros).

  • prpsqm: The asking price per square meter (unit: euros/square meter).

  • age: Age of property in 2017 (unit: years).

  • dist_metro: The distance to closest train/metro station (unit: meters).

See Also

depmunic

Examples

if (requireNamespace("sf", quietly = TRUE)) {
  if (requireNamespace("spdep", quietly = TRUE)) {
    library(sf)
    library(spdep)
    
    data(properties)
    
    summary(properties$prpsqm)
    
    pr.nb.800 <- dnearneigh(properties, 0, 800)
    pr.listw <- nb2listw(pr.nb.800)
    
    moran.test(properties$prpsqm, pr.listw)
    moran.plot(properties$prpsqm, pr.listw, xlab = "Price/m^2", ylab = "Lagged")
  }
}
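
The call dnearneigh(properties, 0, 800) in the example links points that lie within a distance band of each other. As a rough illustration of that idea, the base R sketch below builds distance-band neighbours for four invented points. It assumes projected coordinates in metres and Euclidean distance, and the treatment of the band limits (greater than 0, at most 800) is an assumption rather than a statement of spdep's exact rule.

```r
# Distance-band neighbours in base R, mimicking the idea behind
# spdep::dnearneigh(x, 0, 800). Coordinates are invented and assumed to be
# projected, in metres; handling of the band limits is an assumption here.
xy <- cbind(x = c(0, 300, 900, 5000),
            y = c(0, 400, 0, 5000))
d <- as.matrix(dist(xy))          # pairwise Euclidean distances
in_band <- d > 0 & d <= 800       # excludes each point itself

# List of neighbour indices per point (integer(0) means no neighbours)
neighbours <- lapply(seq_len(nrow(xy)),
                     function(i) unname(which(in_band[i, ])))
print(neighbours)
```

The fourth point lies far from the others and ends up with no neighbours, a situation worth checking for before building spatial weights.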

Small river network in France

Description

Lines representing the Seine, Marne and Yonne rivers.

Usage

seine

Format

An sf object with the following columns:

  • name: name

  • geometry: sfc_MULTILINESTRING

The object is in the RGF93 / Lambert-93 CRS.

Source

https://www.naturalearthdata.com/

See Also

See the rnaturalearth package: https://cran.r-project.org/package=rnaturalearth

Examples

if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  seine
  plot(seine)
}
## Not run: 
library(sf)
library(rnaturalearth)
library(tidyverse)

seine = ne_download(scale = 10, type = "rivers_lake_centerlines", 
                    category = "physical", returnclass = "sf") %>% 
        filter(name %in% c("Yonne", "Seine", "Marne")) %>% 
        select(name = name_en) %>% 
        st_transform(2154)

## End(Not run)

Data for Splash Dams in western Oregon

Description

Data for Splash Dams in western Oregon

Usage

SplashDams

Format

Formal class 'SpatialPointsDataFrame' with 232 obs. of 6 variables:

  • streamName

  • locationCode

  • height

  • lastDate

  • owner

  • datesUsed

Source

R. R. Miller (2010) Is the Past Present? Historical Splash-dam Mapping and Stream Disturbance Detection in the Oregon Coastal Province. MSc. thesis, Oregon State University; packaged by Jonathan Callahan

Examples

if (requireNamespace("sp", quietly = TRUE)) {
  library(sp)
  data(SplashDams)
  plot(SplashDams, axes=TRUE)
}

US State Visibility Based Map

Description

A SpatialPolygonsDataFrame object to plot a Visibility Based Map.

Usage

state.vbm

Format

An object of class SpatialPolygonsDataFrame with 50 rows and 2 columns.

Details

A SpatialPolygonsDataFrame object to plot a map of the US states where the sizes of the states have been adjusted to be more equal. This map can be useful for plotting state data using color patterns without the larger states dominating and the smallest states being lost. The original map is copyrighted by Mark Monmonier. Official publications based on this map should acknowledge him. Commercial publications of maps based on this probably need permission from him to use.

Author(s)

Greg Snow (of this compilation)

Source

The data was converted from the maps library for S-PLUS. S-PLUS uses the map with permission from the author. This version of the data has not received permission from the author (no attempt was made; it was not refused). I feel that most of my uses fall under fair use and do not violate copyright, but you will need to decide for yourself and your applications.

References

http://www.markmonmonier.com/index.htm, http://euclid.psych.yorku.ca/SCS/Gallery/bright-ideas.html

Examples

if (requireNamespace("sp", quietly = TRUE)) {
  library(sp)
  data(state.vbm)
  plot(state.vbm)

  tmp <- state.x77[, 'HS Grad']
  tmp2 <- cut(tmp, seq(min(tmp), max(tmp), length.out=11),
            include.lowest=TRUE)
  plot(state.vbm, col=cm.colors(10)[tmp2])
}

Major urban areas worldwide

Description

Dataset in a 'long' form from the United Nations population division with projections up to 2050. Includes only the 30 largest areas by population at 5-year intervals.

Usage

urban_agglomerations

Format

Selected variables:

  • year: Year of population estimate

  • country_code: Code of country

  • urban_agglomeration: Name of the urban agglomeration

  • population_millions: Estimated human population

  • geometry: sfc_POINT

Examples

if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  plot(urban_agglomerations)
}
# Code used to download the data:
## Not run: 
f = "WUP2018-F11b-30_Largest_Cities_in_2018_by_time.xls"
download.file(
  destfile = f,
  url = paste0("https://population.un.org/wup/Download/Files/", f)
 )
library(dplyr)
library(sf)
urban_agglomerations = readxl::read_excel(f, skip = 16) %>%
    st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326)
names(urban_agglomerations)
names(urban_agglomerations) <- gsub(" |\\n", "_", tolower(names(urban_agglomerations)) ) %>% 
        gsub("\\(|\\)", "", .)
names(urban_agglomerations)
urban_agglomerations
usethis::use_data(urban_agglomerations, overwrite = TRUE)
file.remove("WUP2018-F11b-30_Largest_Cities_in_2018_by_time.xls")

## End(Not run)

US states polygons

Description

The object loaded is an sf object containing data for the contiguous United States from the US Census Bureau, with a few variables from the American Community Survey (ACS).

Usage

us_states

Format

Formal class 'sf' [package "sf"]; the data contains a data.frame with 49 obs. of 7 variables:

  • GEOID: character vector of geographic identifiers

  • NAME: character vector of state names

  • REGION: character vector of region names

  • AREA: area in square kilometers of units class

  • total_pop_10: numerical vector of total population in 2010

  • total_pop_15: numerical vector of total population in 2015

  • geometry: sfc_MULTIPOLYGON

The object is in geographical coordinates using the NAD83 datum.

Source

https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html

See Also

See the tigris package: https://cran.r-project.org/package=tigris

Examples

if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  data(us_states)

  plot(us_states["REGION"])
}

The American Community Survey (ACS) data

Description

The object loaded is a data.frame object containing the US states data from the American Community Survey (ACS)

Usage

us_states_df

Format

Formal class 'data.frame'; the data contains a data.frame with 51 obs. of 5 variables:

  • state: character vector of state names

  • median_income_10: numerical vector of median income in 2010

  • median_income_15: numerical vector of median income in 2015

  • poverty_level_10: numerical vector of number of people with income below poverty level in 2010

  • poverty_level_15: numerical vector of number of people with income below poverty level in 2015

Source

https://www.census.gov/programs-surveys/acs/

See Also

See the tidycensus package: https://cran.r-project.org/package=tidycensus

Examples

data(us_states_df)

summary(us_states_df)
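
The documented row counts differ (49 in us_states, 51 in us_states_df), so joining the two by state name will leave some rows without a match. The sketch below shows the pattern with base R merge() on toy data frames that only borrow the column names NAME and state; all values are invented.

```r
# Joining attribute data to state records by name with base R merge().
# Both data frames are toy stand-ins that only borrow the column names
# NAME and state; the values are invented.
states <- data.frame(NAME = c("Ohio", "Texas", "Utah"),
                     total_pop_15 = c(11.6, 27.0, 2.9))
acs <- data.frame(state = c("Ohio", "Texas", "Alaska"),
                  median_income_15 = c(27000, 28000, 33000))

# all.x = TRUE keeps every row of `states`; unmatched rows get NA
joined <- merge(states, acs, by.x = "NAME", by.y = "state", all.x = TRUE)
print(joined)
```

Rows present only in the attribute table (Alaska in this toy case) are dropped, while rows present only in the state table are kept with missing values.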

US 1960 used car prices

Description

The used.cars data frame has 48 rows and 2 columns. The data set includes a neighbours list for the 48 states excluding DC from poly2nb().

Usage

used.cars

Format

This data frame contains the following columns:

  • tax.charges: taxes and delivery charges for 1955-9 new cars

  • price.1960: 1960 used car prices by state

Source

Hanna, F. A. 1966 Effects of regional differences in taxes and transport charges on automobile consumption, in Ostry, S., Rhymes, J. K. (eds) Papers on regional statistical studies, Toronto: Toronto University Press, pp. 199-223.

References

Hepple, L. W. 1976 A maximum likelihood model for econometric estimation with spatial series, in Masser, I (ed) Theory and practice in regional science, London: Pion, pp. 90-104.

Examples

if (requireNamespace("spdep", quietly = TRUE)) {
  library(spdep)
  data(used.cars)
  moran.test(used.cars$price.1960, nb2listw(usa48.nb))
  moran.plot(used.cars$price.1960, nb2listw(usa48.nb),
           labels=rownames(used.cars))
  uc.lm <- lm(price.1960 ~ tax.charges, data=used.cars)
  summary(uc.lm)

  lm.morantest(uc.lm, nb2listw(usa48.nb))
  lm.morantest.sad(uc.lm, nb2listw(usa48.nb))
  lm.LMtests(uc.lm, nb2listw(usa48.nb))

  if (requireNamespace("spatialreg", quietly = TRUE)) {
    library(spatialreg)
    uc.err <- errorsarlm(price.1960 ~ tax.charges, data=used.cars,
                       nb2listw(usa48.nb), tol.solve=1.0e-13, 
                       control=list(tol.opt=.Machine$double.eps^0.3))
    summary(uc.err)
    uc.lag <- lagsarlm(price.1960 ~ tax.charges, data=used.cars,
                     nb2listw(usa48.nb), tol.solve=1.0e-13, 
                     control=list(tol.opt=.Machine$double.eps^0.3))
    summary(uc.lag)
    uc.lag1 <- lagsarlm(price.1960 ~ 1, data=used.cars,
                      nb2listw(usa48.nb), tol.solve=1.0e-13, 
                      control=list(tol.opt=.Machine$double.eps^0.3))
    summary(uc.lag1)
    uc.err1 <- errorsarlm(price.1960 ~ 1, data=used.cars,
                        nb2listw(usa48.nb), tol.solve=1.0e-13, 
                        control=list(tol.opt=.Machine$double.eps^0.3),
                        Durbin=FALSE)
    summary(uc.err1)
  }

}
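
To make the statistic reported by moran.test() more concrete, the sketch below computes Moran's I from first principles on toy data: four areas on a line, each neighbouring the next, with row-standardised weights (the default "W" style of nb2listw()). morans_i() is a hypothetical helper and covers only the statistic itself, not the expectation, variance or p-value that spdep reports.

```r
# Moran's I from first principles with row-standardised weights, the default
# style ("W") of spdep::nb2listw(). Toy data: four areas on a line, each
# neighbouring the next. morans_i() is a hypothetical helper, not spdep code.
morans_i <- function(x, w) {
  z <- x - mean(x)
  n <- length(x)
  s0 <- sum(w)                       # sum of all weights
  (n / s0) * sum(w * outer(z, z)) / sum(z^2)
}

adj <- matrix(0, 4, 4)
adj[cbind(1:3, 2:4)] <- 1            # link i -> i + 1
adj <- adj + t(adj)                  # make the links symmetric
w <- adj / rowSums(adj)              # row-standardise

x <- c(1, 2, 3, 4)
print(morans_i(x, w))  # 0.4
```

A steadily increasing series along the line gives a positive value, reflecting that neighbouring areas hold similar values.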

Mercer and Hall wheat yield data

Description

Mercer and Hall wheat yield data, based on version in Cressie (1993), p. 455.

Usage

wheat

Format

The object generated by running data(wheat) is a three-column data frame made available by Hongfei Li. The Examples section shows how to convert it to the object used in demonstrating the aple function: a formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots, whose data slot is a data frame with 500 observations on the following 6 variables.

  • lat: local coordinates northings ordered north to south

  • yield: Mercer and Hall wheat yield data

  • r: rows south to north; levels in distance units of plot centres

  • c: columns west to east; levels in distance units of plot centres

  • lon: local coordinates eastings

  • lat1: local coordinates northings ordered south to north

Note

The value of 4.03 was changed to 4.33 (wheat[71,]) 13 January 2014; thanks to Sandy Burden; cross-checked with http://www.itc.nl/personal/rossiter/teach/R/mhw.csv, which agrees.

Source

Cressie, N. A. C. (1993) Statistics for Spatial Data. Wiley, New York, p. 455.

References

Mercer, W. B. and Hall, A. D. (1911) The experimental error of field trials. Journal of Agricultural Science 4, 107-132.

Examples

if (requireNamespace("sp", quietly = TRUE)) {
library(sp)
data(wheat)
wheat$lat1 <- 69 - wheat$lat
wheat$r <- factor(wheat$lat1)
wheat$c <- factor(wheat$lon)
wheat_sp <- wheat
coordinates(wheat_sp) <- c("lon", "lat1")
wheat_spg <- wheat_sp

gridded(wheat_spg) <- TRUE
wheat_spl <- as(wheat_spg, "SpatialPolygons")
df <- as(wheat_spg, "data.frame")
row.names(df) <- sapply(slot(wheat_spl, "polygons"),
                        function(x) slot(x, "ID"))
wheat <- SpatialPolygonsDataFrame(wheat_spl, data=df)
}


if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  wheat <- st_read(system.file("shapes/wheat.gpkg", package="spData"))
  plot(wheat)
}

World country polygons

Description

The object loaded is an sf object containing world map data from Natural Earth with a few variables from the World Bank.

Usage

world

Format

Formal class 'sf' [package "sf"]; the data contains a data.frame with 177 obs. of 11 variables:

  • iso_a2: character vector of ISO 2 character country codes

  • name_long: character vector of country names

  • continent: character vector of continent names

  • region_un: character vector of region names

  • subregion: character vector of subregion names

  • type: character vector of type names

  • area_km2: integer vector of area values

  • pop: integer vector of population in 2014

  • lifeExp: integer vector of life expectancy at birth in 2014

  • gdpPercap: integer vector of per-capita GDP in 2014

  • geom: sfc_MULTIPOLYGON

The object is in geographical coordinates using the WGS84 datum.

Source

https://www.naturalearthdata.com/

https://data.worldbank.org/

See Also

See the rnaturalearth package: https://cran.r-project.org/package=rnaturalearth

Examples

if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  data(world)
  # or
  world <- st_read(system.file("shapes/world.gpkg", package="spData"))

  plot(world)
}

World Bank data

Description

The object loaded is a data.frame object containing data from the World Bank.

Usage

worldbank_df

Format

Formal class 'data.frame'; the data contains a data.frame with 177 obs. of 7 variables:

  • name: character vector of country names

  • iso_a2: character vector of ISO 2 character country codes

  • HDI: human development index (HDI)

  • urban_pop: urban population

  • unemployment: unemployment, total (% of total labor force)

  • pop_growth: population growth (annual %)

  • literacy: adult literacy rate, population 15+ years, both sexes (%)

Source

https://data.worldbank.org/

See Also

See the wbstats package: https://cran.r-project.org/web/packages/wbstats

Examples

data(worldbank_df)
# or
worldbank_df <- read.csv(system.file("misc/worldbank_df.csv", package="spData"))

summary(worldbank_df)