---
title: "Programming with a `move2` object"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Programming with a `move2` object}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
library(move2)
library(assertthat)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Structure of the `move2` object

- It is based on `sf` objects and is compatible with a lot of `dplyr`/`tidyverse` based functionality.
- Non-location data (from other sensors, e.g. acceleration or magnetometer) are associated with empty locations.
- *Track attributes* and *event attributes* are distinguished. *Event attributes* are attributes associated with each recorded event (location or non-location); these will at least have a time and a track id associated with them. *Track attributes* are attributes associated with each track (e.g. individual, species, sex); these will at least contain the track id, and can be retrieved with the function `mt_track_data()`.

## Explanation

To be able to expand and use the object in `move2` it is important to understand how the object is structured. Here we explain some of the choices and the requirements.

A move object in `move2` uses the `S3` class system, which is less rigorous than the `S4` system that was used in the original `move` package. The objects are based on the `sf` objects from the `sf` package. This change is inspired by several factors: first, by building on `sf` we are able to profit from the speed and improvements that went into that package; second, it makes the objects directly compatible with a lot of `dplyr`/`tidyverse` based functionality. To ensure information specific to movement is retained we use attributes. This is in a fairly similar style to `sf`.

To facilitate working with the associated sensor data we store other records with an empty point. This means, for example, that acceleration and activity measurements can be part of the same `tbl`/`data.frame`.

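
The structure described above can be inspected on a small simulated object. This is only a minimal sketch: it assumes that `mt_sim_brownian_motion()` (also used further down in this vignette) returns a regular `move2` object, and the object name `x` is chosen purely for illustration.

```{r}
library(move2)

# Simulate a small example data set with three timestamps per track
x <- mt_sim_brownian_motion(1:3)

# The object carries the "move2" class on top of the "sf" class
class(x)
inherits(x, "sf")

# Movement-specific information is kept in attributes and can be
# retrieved with accessor functions
mt_time_column(x)
mt_track_id_column(x)
mt_track_data(x)
```

Because the object is still an `sf` object, functions from `sf` and `dplyr` can be applied to it directly.
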
The `sf` package and `sf` in general allow coordinates to be stored as three-dimensional records. As the altitude recorded by tracking devices is typically much less accurate, and few functions actually support three-dimensional coordinates, we do not use this at this time.

In the `move` package we implemented separate objects for single individuals (`Move`) and multiple individuals (`MoveStack`). Here we choose not to do this, which reduces complexity. If functions require single individuals to work, it is easy enough to split these off.

### Event data

Tracking data generally consists of a time series of observations from a range of "sensors". Each of these observations, or events, has at least a time and a sensor associated with it. Some have a location recorded by, for example, a GPS sensor; others have non-location data like acceleration or gyroscope measurements. All events are combined in one large dataset, which facilitates combined analysis between them (e.g. interpolation to the position of an acceleration measurement). However, some analyses need specific sensors or data types; therefore filtering functions are available that subset the data to, for example, all location data.

### Separating track attributes

To facilitate working with the trajectories we distinguish between track attributes and event attributes. Track level data could be individual and species names, sex and age. Storing these separately can furthermore greatly reduce object sizes, as track level data is not duplicated for every event. Keeping track attributes separate also contributes to data integrity, as it ensures track level attributes are consistent within a track.

## Attributes

In this section we go through the attributes that `move2` uses.

### `time_column`

This attribute should contain a string of length `1`. This string indicates which column contains the timestamp information of the events. The string should thus be the name of an existing column. The time column in most cases will contain timestamps in the `POSIXct` format.
In some cases timestamps will not refer to an exact point in time, for example when simulating movement data or analyzing data from a video. In these cases times can also be stored as `integer` or `numeric` values.

### `track_id_column`

This attribute should contain a string of length `1`. A column with this name should be contained both in the `track_data` attribute and in the main dataset. This column also functions as the link between the `track_data` and the main data, linking the individual attributes to the individual data.

### `track_data`

This dataset contains the track level data. Properties of the individual followed (e.g. sex, age and name) can be stored here. Additionally, other deployment level information can be contained. As the `move2` package does not separate individuals, tags and deployments, all information from these three entities in Movebank is combined here.

## Special columns

### `time_column`

This column can be identified using the `time_column` attribute; for quick retrieval there is the `mt_time` function. Values should be either timestamps (e.g. `POSIXct`, `Date`) or `numeric`. Numeric values are supported as they can be useful for simulations, videos and laboratory experiments where an absolute time reference is not available or relevant.

### `track_id_column`

This column is identified by the `track_id_column` attribute. Values can be `character`, `factor` or `integer`-like values. For retrieval there is the `mt_track_id` function.

## General considerations

### Quality checking

In `move` relatively stringent quality checking was done on the object. This enforced certain properties for a trajectory that are sensible but in practice are not always adhered to. Some of these properties are:

- Every record had a valid location (except for `unUsedRecords`, but those were rarely used)
- Records were time ordered within individual
- All individuals were ordered
- Timestamps could not be duplicated


Even though these are useful properties for subsequent work, not all data adheres to these standards when it is read. To solve this, `move` had options to remove duplicated records, but these simply took the first record.

Here we take a more permissive approach where less stringent checking is done on the input side. This means functions working with `move2` need to ensure input data adheres to their expectations. To facilitate that, several assertion functions are provided that can quickly check data. Taking this approach gives users more flexibility in resolving inconsistencies within R. We provide several functions to make this work quick. For specific use cases more informed functions can be developed.

If you are writing functions based on the `move2` package and your function assumes a specific data structure, this can best be checked with `assert_that` in combination with one of the assertion functions. This construct results in informative error messages:

```{r, error = TRUE}
# Reorder the rows of a simulated data set so that time is no longer ordered within tracks
data <- mt_sim_brownian_motion(1:3)[c(1, 3, 2, 6, 4, 5), ]
assert_that(mt_is_time_ordered(data))
```

### Function naming schemes

To make functions easier to find and recognize we use a prefix. For functions relating to movement trajectories we use `mt_`, similar to how the `sf` package uses `st_` for spatial type. This prefix has the advantage of being short compared to `move_`. Functions for accessing data from [movebank](https://www.movebank.org) use the prefix `movebank_`. Furthermore, all assertion functions start with either `mt_is_` or `mt_has_`.

### Return type segment wise properties

When analyzing trajectories, metrics are frequently calculated that are properties of the time period in between two observations. Prime examples are the distance and speed between locations. This means that for each track with a length of $n$ locations there are $n-1$ measurements. To facilitate storing and processing this data we pad each track with a `NA` value at the end.
This ensures that functions like `mt_distance`, `mt_speed` and `mt_azimuth` return vectors with the same length as the number of rows in the `move2` object. If the return values from these kinds of functions are assigned to the `move2` object, the value stored in the first row reflects the interval between the first and second row.

Some metrics are calculated as a function of the segment before and the segment after a location (e.g. turn angles). In these cases the return vectors still have the same length, but they are padded by a `NA` value at the beginning and end of each track so that the metric is stored with the location it is representative for.

### Data size

Data sets have been growing considerably over the past decade since `move` was written. The ambition with `move2` is to accommodate this trend. It should work smoothly with trajectories of more than a million records. We have successfully loaded up to 30 million events into R; however, at some stage memory limitations of the host computer start being a concern. This can to some extent be alleviated by omitting unnecessary columns from the data set, either at download or when reading the data.

An alternative approach would be to facilitate working with trajectories on disk or within a database (like `dbplyr`). However, since many functions and packages we rely on do not support this, we opt not to do this. Therefore, if reducing the data loaded does not solve the problem, it can be advisable to use a computer with more memory or, when possible, to split up the analysis per track.

# Function overview

Here we first give a quick overview of the most important functions.


## Extracting information from a `move2` object

- `sf::st_coordinates()`: returns the coordinates of the events in the track(s)
- `sf::st_crs()`: returns the projection of the track(s)
- `sf::st_bbox()`: returns the bounding box of the track(s)
- `mt_time()`: returns the timestamps for each event in the track
- `mt_track_data()`: returns the table containing the information associated with the tracks
- `mt_track_id()`: returns a vector of the track id associated with each event
- `unique(mt_track_id())`: returns the names of the tracks
- `mt_n_tracks()`: returns the number of tracks
- `nrow()`: returns the total number of events
- `table(mt_track_id())`: returns the number of events per track
- `mt_time_column()`: returns the name of the column containing the timestamps used by the `move2` object
- `mt_track_id_column()`: returns the name of the column containing the track ids used by the `move2` object

## Transforming other classes to a `move2` object

- `mt_as_move2()`: creates a `move2` object from objects of class `sf`, `data.frame`, `telemetry`/`telemetry list` from *ctmm*, `track_xyt` from *amt* or `Move`/`MoveStack` from *move*.

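
As an illustration of `mt_as_move2()`, the following minimal sketch converts a plain `data.frame` into a `move2` object. The data, the column names (`lon`, `lat`, `timestamp`, `animal`) and the object names are made up for this example, and it is assumed here that the `coords` and `crs` arguments are handed on to `sf::st_as_sf()` when the input is a `data.frame`.

```{r}
library(move2)

# Hypothetical example data: two tracks with three events each
df <- data.frame(
  lon = c(8.10, 8.11, 8.12, 9.20, 9.21, 9.22),
  lat = c(47.50, 47.51, 47.52, 48.10, 48.11, 48.12),
  timestamp = as.POSIXct("2024-01-01 00:00:00", tz = "UTC") + rep(c(0, 3600, 7200), 2),
  animal = rep(c("a", "b"), each = 3)
)

# Indicate which columns hold the coordinates, the timestamps and the track ids
m <- mt_as_move2(
  df,
  coords = c("lon", "lat"),
  crs = 4326,
  time_column = "timestamp",
  track_id_column = "animal"
)

mt_n_tracks(m)
table(mt_track_id(m))
```
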

## Transforming a `move2` object into other classes

- `to_move()`: converts to an object of class `Move`/`MoveStack`
- `x2 <- x; class(x2) <- class(x) %>% setdiff("move2")`: removes the `move2` class from the object, so that it will be recognized as an object of class `sf`
- to transform into a flat table without losing information:
    * move all track associated attributes to the event table: `x <- mt_as_event_attribute(x, names(mt_track_data(x)))`
    * put coordinates in 2 columns: `x <- dplyr::mutate(x, coords_x=sf::st_coordinates(x)[,1], coords_y=sf::st_coordinates(x)[,2])`
    * remove the sf geometry column from the table: `x <- sf::st_drop_geometry(x)`

## Useful functions

- `mt_read()`: read in data downloaded from movebank, by just stating the path to the file
- `mt_read(mt_example())`: example dataset
- `dplyr::filter(x, !sf::st_is_empty(x))`: exclude all empty locations
- `filter_track_data(x, .track_id = c("nameTrack1", "nameTrack3"))`: subset to one or more tracks
- `split(x, mt_track_id(x))`: split a `move2` object into a list of single objects per track. Alternatively see `dplyr::mutate()`, `dplyr::group_by()`, `group_by_track_data()` to apply calculations to tracks separately
- `mt_stack()`: combine multiple `move2` objects into one
- `mt_as_track_attribute()`/`mt_as_event_attribute()`: move columns from event to track attributes (and vice versa)
- `mt_set_track_id()`: replace track ids with new values, set a new column to define tracks or rename the track id column
- `mutate_track_data()`: add or modify attributes in the track data
- `sf::st_transform()`: reproject the `move2` object into a different projection
- `mt_aeqd_crs()`: create an AEQD coordinate reference system
- `mt_track_lines()`: convert a trajectory into lines for plotting with e.g. `ggplot`
- use the argument `max.plot = 1` to display a single plot of the track. The attribute that should be used to color the tracks can be specified, e.g. `plot(x["individual_local_identifier"], max.plot = 1)`.
  [Here](https://r-spatial.github.io/sf/articles/sf5.html) is more information on how to do simple plots.

All functions of the `move2` package are described [here](https://bartk.gitlab.io/move2/reference/index.html).
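
The three steps for transforming a `move2` object into a flat table, listed under "Transforming a `move2` object into other classes", can be combined as in the following sketch. It uses the example dataset from `mt_example()`; the object names `x` and `flat` are only illustrative, and reading the example file is assumed to work in the local installation.

```{r}
library(move2)

# Read the example dataset shipped with the package
x <- mt_read(mt_example())

# 1. Move all track associated attributes to the event table
x <- mt_as_event_attribute(x, names(mt_track_data(x)))

# 2. Put the coordinates in two regular columns
x <- dplyr::mutate(
  x,
  coords_x = sf::st_coordinates(x)[, 1],
  coords_y = sf::st_coordinates(x)[, 2]
)

# 3. Remove the sf geometry column from the table
flat <- sf::st_drop_geometry(x)

head(flat)
```

The resulting table no longer contains a geometry column, so it can be written to, for example, a csv file with standard tools.
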