--- title: "Introduction to the motif package" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{intro-to-motif} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, dev = "png" ) ``` The **motif** package implements ideas of the pattern-based spatial analysis in R. Its role is to describe spatial patterns of categorical raster data for any defined regular and irregular areas. Patterns are represented quantitatively using built-in signatures based on co-occurrence matrices but also allows for any user-defined functions. It also enables spatial analysis such as search, change detection, and clustering to be performed on spatial patterns. The **motif** package works on raster data represented by `stars` objects. It has several example datasets, including `"raster/landcover2015.tif"`. This file contains a land cover data for New Guinea, with seven possible categories: (1) agriculture, (2) forest, (3) grassland, (5) settlement, (6) shrubland, (7) sparse vegetation, and (9) water. ```{r} library(motif) library(stars) library(sf) landcover = read_stars(system.file("raster/landcover2015.tif", package = "motif")) ``` ```{r, echo=FALSE} landcover = droplevels(landcover) plot(landcover, key.pos = 4, key.width = lcm(5), main = NULL) ``` ## Signatures ### Whole area We can see that most of the island is covered by forest, with some agriculture and smaller areas of the other classes. It is also reasonably easy to describe these proportions (so-called composition) numerically - we just need to count cells of each category for the whole data. We can use the `lsp_signature()` function for this. It requires a `stars` object as the first input and the type of a signature to calculate - `"composition"` in this case. There are also several additional arguments, including `threshold` - a share (between 0 and 1) of NA cells to allow signature calculation and `normalization` - decision if the output vector should be normalized. ```{r} landcover_comp = lsp_signature(landcover, type = "composition", threshold = 1, normalization = "none") landcover_comp ``` The output of `lsp_signature()`, `landcover_comp`, has a new class `lsp`. It is a tibble with three columns: - `id` - an id of each window (area) - `na_prop` - share (0-1) of `NA` cells for each window - `signature` - a list-column containing with calculated signatures We can take a look at the last column: ```{r} landcover_comp$signature ``` It is a list of signatures. In this case, we have just one signature describing the whole area in the form of a numeric vector. It contains how many cells belong to each land cover category. For example, there are 8122776 cells of forest, but only 2677 cells of shrubland. ### Regular local landscapes Another approach would be to divide this large area into many regular rectangles (we refer to them as local landscapes) and to calculate a signature in each of them. The previously used signature, `"composition"` has one important flaw though. It only describes how many cells of each category we have. However, it does not distinguish an area with the left half of forest and right half of agriculture from an area with forest mixed with agriculture (think of a green-yellow checkerboard). Gladly, several more types of signatures do exist. It includes a co-occurrence matrix (`type = "coma"`). `"coma"` goes to each cell, looks at its value, looks at the values of its neighbors and counts how many neighbors of each class our central cell has. ```{r, echo=FALSE} stars200 = lsp_add_sf(landcover, window = 200) landcover = droplevels(landcover) plot(landcover, key.pos = NULL, reset = FALSE, main = NULL) plot(st_geometry(stars200), add = TRUE, reset = FALSE) # st_bbox(stars200) # st_bbox(landcover) ``` This time, we set the `window` argument to `200`, which means that each local landscape will consist of 200 by 200 cells. In this example, each cell has a resolution of 300 by 300 meters, therefore a local landscape will be 60000 by 60000 meters (60 by 60 kilometers). ```{r} landcover_coma = lsp_signature(landcover, type = "coma", window = 200) landcover_coma ``` Now, we have one row per local landscape, where each local landscape is described by an id (`id`), the proportion of cells with `NA`s (`na_prop`), and a signature (`signature`). For example, the first signature looks like this: ```{r} landcover_coma$signature[[1]] ``` It is a matrix where each row and column represent subsequent land cover classes. For example, 141250 times forest cell is next to anther forest cell, and 226 times water cell is next to forest cell. You can learn more about this signature at https://jakubnowosad.com/comat/articles/coma.html. Additional signatures are described in the [Spatial patterns' signatures](https://jakubnowosad.com/motif/articles/v2_signatures.html) vignette. ### Irregular local landscapes The **motif** package also allows using irregular regions based on the user-provided polygons. It has an example spatial vector dataset, `ecoregions.gpkg`, which contains terrestrial ecoregions for New Guinea from https://ecoregions2017.appspot.com/. ```{r} ecoregions = read_sf(system.file("vector/ecoregions.gpkg", package = "motif")) ``` This dataset has 22 rows, where each row relates to one ecoregion. Each ecoregion is also related to a unique value in the `id` column. ```{r, echo=FALSE} # https://medialab.github.io/iwanthue/ my_pal = c("#d34359", "#62b93c", "#b75fcf", "#53c069", "#d44295", "#acb939", "#626edd", "#dc9e36", "#8156a8", "#4b8734", "#d98dc7", "#58c096", "#cc542a", "#48bbd2", "#bf814d", "#6686c8", "#968c30", "#a34d78", "#36815b", "#c26963", "#a2b36b", "#6b6829") plot(ecoregions["id"], main = NULL, col = my_pal, key.pos = 4, key.width = lcm(5)) ``` Now, we need to provide this dataset and its identity column to the `window` argument of `lsp_signature()`. ```{r} landcover_coma_e = lsp_signature(landcover, type = "coma", window = ecoregions["id"]) landcover_coma_e ``` The output, `landcover_coma_e`, is also of the `lsp` class and contains three columns. The first column, `id`, has the same values as provided in the `window` argument above. The third column, `signature`, has a spatial signature for each polygon in the `ecoregions` dataset. ```{r} landcover_coma_e$signature[[1]] ``` ## Large data support The **motif** package also supports large raster datasets, including rasters that do not fit into the RAM. It just requres reading the input data as a `stars.proxy` by adding `proxy = TRUE` to the `read_stars()` function: ```{r} landcover_proxy = read_stars(system.file("raster/landcover2015.tif", package = "motif"), proxy = TRUE) ``` The rest of the calculations are the same as above. ## Applications Importantly, spatial patterns' signatures can be used not only to describe landscapes, but also to search for other similar landscapes, compare landscapes, and cluster them. You can find examples of each of the above applications in the vignettes: 1. [Spatial patterns' search](https://jakubnowosad.com/motif/articles/v3_search.html) 2. [Spatial patterns' comparision](https://jakubnowosad.com/motif/articles/v4_compare.html) 3. [Spatial patterns' clustering](https://jakubnowosad.com/motif/articles/v5_cluster.html)