---
title: "Filtering trajectories"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Filtering trajectories}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(sf_max_print = 5L)
if (Sys.info()["user"] != "bart") {
  if (Sys.getenv("MBPWD") != "") {
    options(keyring_backend = "env")
    move2::movebank_store_credentials("move2_user", Sys.getenv("MBPWD"))
  } else {
    knitr::opts_chunk$set(eval = FALSE)
  }
}
```

```{r setup, message=FALSE}
library(move2)
library(dplyr)
library(units)
library(sf)
```

Download example data and select columns to reduce printing.

```{r}
galapagos_albatrosses <- movebank_download_study(2911040,
  attributes = c(
    "ground_speed", "heading", "height_above_ellipsoid",
    "eobs_temperature", "individual_local_identifier"
  )
) %>%
  select_track_data(study_site, weight, animal_life_stage)
```

# Filtering locations

## Omit empty locations

```{r}
galapagos_albatrosses %>%
  filter(!st_is_empty(.))
```

## Temporal filtering

First location in each 6-hour window

```{r}
galapagos_albatrosses %>%
  filter(!st_is_empty(.)) %>%
  mt_filter_per_interval(unit = "6 hours")
```

A random location for each day

```{r}
galapagos_albatrosses %>%
  filter(!st_is_empty(.)) %>%
  mt_filter_per_interval(criterion = "random", unit = "days")
```

# Finding and filtering duplicated records

Duplicated records occur frequently when dealing with trajectories. They can appear for many reasons, ranging from the way in which data is recorded to duplicated data transmissions and uploads. These data are often stored, but for analysis they need to be removed. A simple definition of a duplicate record would be an observation of the same individual at exactly the same time. However, many tracking devices record additional information such as acceleration. These records frequently have the same timestamp as location records, meaning that not all records with duplicated timestamps can simply be deleted.
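
To make this distinction concrete, the following is a minimal base R sketch on a made-up event table. The data frame `events` and its columns `track`, `time` and `has_location` are hypothetical and not part of `move2`; the sketch only illustrates why a duplicated timestamp alone is not enough to mark a record for deletion.

```{r}
# Hypothetical toy data: track "a" has a location record and a
# non-location record (e.g. acceleration) at the same timestamp
events <- data.frame(
  track = c("a", "a", "a", "b"),
  time = as.POSIXct("2020-01-01 00:00:00", tz = "UTC") + 3600 * c(0, 0, 1, 0),
  has_location = c(TRUE, FALSE, TRUE, TRUE)
)

# One key per combination of track and timestamp
key <- paste(events$track, as.numeric(events$time))

# Records that share track and timestamp with at least one other record
shares_timestamp <- key %in% key[duplicated(key)]
shares_timestamp

# Restricted to records that carry a location, no key is repeated
loc_key <- key[events$has_location]
any(duplicated(loc_key))
```

In this toy table the first two rows share a track and a timestamp, but only one of them carries a location, so removing every record with a duplicated timestamp would discard a valid location.
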
Duplicated records can be found in the following way:

```{r}
galapagos_albatrosses %>%
  group_by(mt_time(), mt_track_id()) %>%
  filter(n() != 1) %>%
  arrange(mt_time())
```

If you are only interested in finding duplicated records where there is a location, this can be done as follows (in this case there are none):

```{r}
galapagos_albatrosses %>%
  filter(!st_is_empty(.)) %>%
  group_by(mt_time(), mt_track_id()) %>%
  filter(n() != 1) %>%
  arrange(mt_time())
```

The package also has some built-in functions for filtering unique records, with several strategies for omitting duplicated records. First, it is possible to omit all records that are a subset of other records, i.e. records that got added later with more information are retained. This happens with some tracking devices if data gets directly downloaded from the tag. As no information is lost, this is the default strategy.

```{r}
simulated_data <- mt_sim_brownian_motion(1:2)[rep(1:4, 2), ]
simulated_data$temperature <- c(1:3, NA, 1:2, 7:8)
simulated_data
simulated_data %>% mt_filter_unique()
```

This strategy, however, does not guarantee that no duplicates are left, as two records might not be subsets of each other. An alternative is to take a random record from each set of duplicates. This is not advised for formal analysis but might help for a quick inspection of the data. It is also a lot quicker than inspecting subsets. However, care needs to be taken: the example below results in empty points being retained at the cost of informative locations.

```{r}
galapagos_albatrosses %>% mt_filter_unique("sample")
```

# Filtering tracks

## Tracks with at least `n` locations

```{r download}
galapagos_albatrosses %>%
  group_by(mt_track_id()) %>%
  filter(n() > 500)
```

## Tracks having a minimal duration

```{r}
galapagos_albatrosses %>%
  group_by(mt_track_id()) %>%
  filter(as_units(diff(range(mt_time()))) > set_units(1, "week"))
```

## Tracks that visit foraging area at least once

```{r, fig.width=7, fig.height=4.2}
foraging_area <- st_as_sfc(st_bbox(c(
  xmin = -82, xmax = -77,
  ymax = -0.5, ymin = -13
), crs = 4326))
library(ggplot2, quietly = TRUE)
ggplot() +
  geom_sf(data = rnaturalearth::ne_coastline(returnclass = "sf", 50)) +
  theme_linedraw() +
  geom_sf(data = foraging_area, fill = "red", alpha = 0.3, color = "red") +
  geom_sf(
    data = galapagos_albatrosses %>% filter(!st_is_empty(.)),
    aes(color = `individual_local_identifier`)
  ) +
  coord_sf(
    crs = sf::st_crs("+proj=aeqd +lon_0=-83 +lat_0=-6 +units=km"),
    xlim = c(-1000, 600), ylim = c(-800, 700)
  )
# Filter to tracks making it at least once to the foraging area
galapagos_albatrosses %>%
  group_by(mt_track_id()) %>%
  filter(any(st_intersects(geometry, foraging_area, sparse = FALSE)))
```

## Filter by track attribute

To use track attributes for filtering there is the `filter_track_data` function. This function works in the same way as `filter` from `dplyr` except that it operates on the track data. As soon as individuals are omitted from the track data, the associated event data is also omitted.

```{r}
galapagos_albatrosses %>%
  filter_track_data(study_site == "Punta Suarez")
```

# Reorganizing trajectories

## Split on time gaps

```{r}
galapagos_albatrosses %>%
  filter(!st_is_empty(.)) %>%
  mutate(
    next_new_track = mt_time_lags(.) > set_units(4, "h") |
      is.na(mt_time_lags(.)),
    track_index = cumsum(lag(next_new_track, default = FALSE))
  ) %>%
  mt_set_track_id("track_index")
```

## Monthly tracks

```{r}
library(lubridate, quietly = TRUE)
galapagos_albatrosses %>%
  mt_set_track_id(paste(mt_track_id(.),
    sep = "_",
    month.name[month(mt_time(.))]
  ))
```
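
The way the monthly identifiers are composed can be traced on a small made-up example. This is only a base R sketch: the vectors `track` and `time` are hypothetical stand-ins for what `mt_track_id()` and `mt_time()` return, and `as.POSIXlt()` is used here in place of `lubridate::month()`.

```{r}
# Hypothetical track identifiers and timestamps
track <- c("a", "a", "b")
time <- as.POSIXct(
  c("2020-01-15 12:00:00", "2020-02-01 12:00:00", "2020-01-20 12:00:00"),
  tz = "UTC"
)

# as.POSIXlt()$mon counts months from 0, hence the + 1
monthly_id <- paste(track, month.name[as.POSIXlt(time)$mon + 1], sep = "_")
monthly_id

# Possible variant: include the year in the identifier
yearly_monthly_id <- paste(track, format(time, "%Y-%m"), sep = "_")
yearly_monthly_id
```

Because a month name carries no year, records from the same month in different years end up with the same identifier. If that is not intended, a key that includes the year, as in the second variant, might be preferable.
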