Filtering trajectories

library(move2)
library(dplyr)
library(units)
library(sf)

Download example data and keep only a few track attributes to shorten the printed output.

galapagos_albatrosses <- movebank_download_study(2911040,
  attributes = c(
    "ground_speed",
    "heading",
    "height_above_ellipsoid",
    "eobs_temperature",
    "individual_local_identifier"
  )
) %>%
  select_track_data(study_site, weight, animal_life_stage)

Filtering locations

Omit empty locations

galapagos_albatrosses %>%
  filter(!st_is_empty(.))

Temporal filtering

First location in each 6-hour window

galapagos_albatrosses %>%
  filter(!st_is_empty(.)) %>%
  mt_filter_per_interval(unit = "6 hours")

Random location each day

galapagos_albatrosses %>%
  filter(!st_is_empty(.)) %>%
  mt_filter_per_interval(criterion = "random", unit = "days")

Finding and filtering duplicated records

Duplicated records occur frequently when working with trajectories. They can appear for many reasons, ranging from the way in which data is recorded to duplicated data transmissions and uploads. These data are often stored, but for analysis they need to be removed. A simple definition of a duplicate record would be an observation of the same individual at exactly the same time. However, many tracking devices record additional information such as acceleration. These records frequently have the same timestamp as location records, meaning that not all records with duplicated timestamps can simply be deleted.

Duplicated records can be found in the following way:

galapagos_albatrosses %>%
  group_by(mt_time(), mt_track_id()) %>%
  filter(n() != 1) %>%
  arrange(mt_time())

If you are only interested in finding duplicated records that contain a location, this can be done as follows (in this case there are none):

galapagos_albatrosses %>%
  filter(!st_is_empty(.)) %>%
  group_by(mt_time(), mt_track_id()) %>%
  filter(n() != 1) %>%
  arrange(mt_time())

The package also has built-in functions for filtering unique records, offering several strategies for omitting duplicates.

First, it is possible to omit all records that are a subset of other records, i.e. records that were added later with more information are retained. This happens with some tracking devices when data is downloaded directly from the tag. As no information is lost, this is the default strategy.

simulated_data <- mt_sim_brownian_motion(1:2)[rep(1:4, 2), ]
simulated_data$temperature <- c(1:3, NA, 1:2, 7:8)
simulated_data
simulated_data %>% mt_filter_unique()

This strategy, however, does not guarantee that no duplicates are left, as two records might not be subsets of each other.
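The simulated data above shows this situation: two copies of one record carry different temperature values (3 and 7), so neither is a subset of the other. A minimal sketch for checking whether duplicates remain, reusing the grouping approach shown earlier; the object name `remaining` is introduced here only for illustration:

```r
library(move2)
library(dplyr)

# Rebuild the simulated data: every record appears twice
simulated_data <- mt_sim_brownian_motion(1:2)[rep(1:4, 2), ]
simulated_data$temperature <- c(1:3, NA, 1:2, 7:8)

# Apply the default strategy, then look for track/time combinations
# that still occur more than once
remaining <- simulated_data %>%
  mt_filter_unique() %>%
  group_by(mt_time(), mt_track_id()) %>%
  filter(n() != 1)

# A non-zero row count means duplicates are still present
nrow(remaining)
```

Any rows returned this way still need to be resolved, either by hand or with one of the other strategies.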

An alternative is to take a random record from each set of duplicates. This is not advised for formal analysis but might help for a quick inspection of the data. It is also a lot quicker than inspecting subsets. However, care needs to be taken: the example below results in empty points being retained at the cost of informative locations.

galapagos_albatrosses %>% mt_filter_unique("sample")

Filtering tracks

Tracks with more than n locations

galapagos_albatrosses %>%
  group_by(mt_track_id()) %>%
  filter(n() > 500)

Tracks with a minimum duration

galapagos_albatrosses %>%
  group_by(mt_track_id()) %>%
  filter(as_units(diff(range(mt_time()))) > set_units(1, "week"))

Tracks that visit the foraging area at least once

foraging_area <- st_as_sfc(st_bbox(c(
  xmin = -82, xmax = -77,
  ymax = -0.5, ymin = -13
), crs = 4326))
library(ggplot2, quietly = TRUE)
ggplot() +
  geom_sf(data = rnaturalearth::ne_coastline(returnclass = "sf", 50)) +
  theme_linedraw() +
  geom_sf(data = foraging_area, fill = "red", alpha = 0.3, color = "red") +
  geom_sf(
    data = galapagos_albatrosses %>% filter(!st_is_empty(.)),
    aes(color = `individual_local_identifier`)
  ) +
  coord_sf(
    crs = sf::st_crs("+proj=aeqd +lon_0=-83 +lat_0=-6 +units=km"),
    xlim = c(-1000, 600), ylim = c(-800, 700)
  )
# Filter to tracks making it at least once to the foraging area
galapagos_albatrosses %>%
  group_by(mt_track_id()) %>%
  filter(any(st_intersects(geometry, foraging_area, sparse = FALSE)))

Filter by track attribute

To filter on track attributes, use the filter_track_data function. It works in the same way as filter from dplyr, except that it operates on the track data. As soon as individuals are omitted from the track data, the associated event data is omitted as well.

galapagos_albatrosses %>%
  filter_track_data(study_site == "Punta Suarez")
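As a small self-contained illustration of this behaviour, the sketch below uses simulated tracks instead of the downloaded study. The `sex` attribute and its values are made up for the example, and the sketch assumes that `mutate_track_data()` can be used to add a track-level column:

```r
library(move2)
library(dplyr)

# Two simulated tracks with three locations each
tracks <- mt_sim_brownian_motion(1:3) %>%
  # Hypothetical track attribute, one value per track
  mutate_track_data(sex = c("f", "m"))

# Keep only the tracks whose attribute matches
females <- tracks %>%
  filter_track_data(sex == "f")

# Number of tracks and number of events left after filtering
mt_n_tracks(females)
nrow(females)
```

Comparing the two counts with those of `tracks` shows that the events of the omitted track are dropped along with its track data.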

Reorganizing trajectories

Split on time gaps

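galapagos_albatrosses %>%
  filter(!st_is_empty(.)) %>%
  mutate(
    # Flag locations followed by a gap of more than 4 hours, or by no location
    next_new_track = mt_time_lags(.) > set_units(4, "h") |
      is.na(mt_time_lags(.)),
    # Increase the index at the first location after each flagged one
    track_index = cumsum(lag(next_new_track, default = FALSE))
  ) %>%
  mt_set_track_id("track_index")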
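The index arithmetic in this pipeline can be followed on a plain vector of timestamps. The sketch below uses base R only, with made-up times, and mimics what `mt_time_lags()` is assumed to return here: the time to the next location, with `NA` for the last one. Base R shifting stands in for `lag()` from dplyr.

```r
# Made-up timestamps for a single track
times <- as.POSIXct(
  c("2020-01-01 00:00", "2020-01-01 01:00", "2020-01-01 08:00",
    "2020-01-01 09:00", "2020-01-02 00:00"),
  tz = "UTC"
)

# Hours until the next location; NA for the last location
lag_hours <- c(as.numeric(diff(times), units = "hours"), NA)

# TRUE where a location is followed by a gap of more than 4 hours
# (or is the last one)
next_new_track <- lag_hours > 4 | is.na(lag_hours)

# Shift by one so the counter increases at the first location after a gap
track_index <- cumsum(c(FALSE, head(next_new_track, -1)))
track_index
# 0 0 1 1 2
```

The first two locations form one segment, the next two a second, and the final location, arriving 15 hours later, a third.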

Monthly tracks

library(lubridate, quietly = TRUE)
galapagos_albatrosses %>%
  mt_set_track_id(paste(mt_track_id(.),
    sep = "_", month.name[month(mt_time(.))]
  ))