Download the example data and select a subset of attributes and track columns to reduce printing:
library(move2)
library(dplyr)
library(sf)

galapagos_albatrosses <- movebank_download_study(2911040,
  attributes = c(
    "ground_speed",
    "heading",
    "height_above_ellipsoid",
    "eobs_temperature",
    "individual_local_identifier"
  )
) %>%
  select_track_data(study_site, weight, animal_life_stage)
When dealing with trajectories, duplicated records frequently occur. They can appear for many reasons, ranging from the way the data is recorded to duplicated data transmissions and uploads. These duplicates are often stored, but for analysis they need to be removed. A simple definition of a duplicated record would be an observation of the same individual at exactly the same time. However, many tracking devices record additional information, such as acceleration, and these records frequently share a timestamp with location records, so not all records with duplicated timestamps can simply be deleted.
Duplicated records can be found in the following way:
galapagos_albatrosses %>%
  group_by(mt_time(), mt_track_id()) %>%
  filter(n() != 1) %>%
  arrange(mt_time())
If you are only interested in finding duplicated records that have a location, this can be done as follows (in this case there are none):
galapagos_albatrosses %>%
  filter(!st_is_empty(.)) %>%
  group_by(mt_time(), mt_track_id()) %>%
  filter(n() != 1) %>%
  arrange(mt_time())
The package also has built-in functions for filtering to unique records, with several strategies for omitting duplicates.
First, it is possible to omit all records that are a subset of other records; records that were added later with more information are retained. This happens with some tracking devices when data is downloaded directly from the tag. As no information is lost, this is the default strategy.
# Simulate two tracks with two timestamps each, then duplicate every record
simulated_data <- mt_sim_brownian_motion(1:2)[rep(1:4, 2), ]
# Give the duplicates partially overlapping attribute information
simulated_data$temperature <- c(1:3, NA, 1:2, 7:8)
simulated_data
simulated_data %>% mt_filter_unique()
This strategy, however, does not guarantee that no duplicates are left, as two records might not be subsets of each other.
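Whether such duplicates remain can be verified, for example with mt_has_unique_location_time_records(); a brief check on the simulated data above:

# Records with temperatures 3 and 7 share a track and timestamp but
# neither is a subset of the other, so both are retained and the
# check returns FALSE
simulated_data %>%
  mt_filter_unique() %>%
  mt_has_unique_location_time_records()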
An alternative is to take a random record from each set of duplicates. This is not advised for formal analysis but might help for a quick inspection of the data; it is also a lot quicker than comparing subsets. However, care needs to be taken, as the example below results in empty points being retained at the cost of informative locations.
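A minimal sketch of this strategy, assuming the "sample" criterion of mt_filter_unique() selects one random record per duplicate set:

# Randomly retain one record per duplicated track id & timestamp; this
# can keep an empty geometry while dropping the actual location
galapagos_albatrosses %>%
  mt_filter_unique(criterion = "sample")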
Trajectories can also be filtered spatially. Here we define a foraging area along the South American coast and plot it together with the tracks:
foraging_area <- st_as_sfc(st_bbox(c(
  xmin = -82, xmax = -77,
  ymax = -0.5, ymin = -13
), crs = 4326))
library(ggplot2, quietly = TRUE)
ggplot() +
  geom_sf(data = rnaturalearth::ne_coastline(returnclass = "sf", 50)) +
  theme_linedraw() +
  geom_sf(data = foraging_area, fill = "red", alpha = 0.3, color = "red") +
  geom_sf(
    data = galapagos_albatrosses %>% filter(!st_is_empty(.)),
    aes(color = `individual_local_identifier`)
  ) +
  coord_sf(
    crs = sf::st_crs("+proj=aeqd +lon_0=-83 +lat_0=-6 +units=km"),
    xlim = c(-1000, 600), ylim = c(-800, 700)
  )
# Filter to tracks that reach the foraging area at least once
galapagos_albatrosses %>%
  group_by(mt_track_id()) %>%
  filter(any(st_intersects(geometry, foraging_area, sparse = FALSE)))
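The effect of such a filter can be checked by comparing the number of tracks before and after; a sketch using mt_n_tracks(), where reaching_foraging_area is an illustrative variable name:

# Count tracks before and after the spatial filter
mt_n_tracks(galapagos_albatrosses)
reaching_foraging_area <- galapagos_albatrosses %>%
  group_by(mt_track_id()) %>%
  filter(any(st_intersects(geometry, foraging_area, sparse = FALSE)))
mt_n_tracks(reaching_foraging_area)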
To use track attributes for filtering there is the filter_track_data function. This function works in the same way as filter from dplyr, except that it operates on the track data. As soon as individuals are omitted from the track data, the associated event data is omitted as well.
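A minimal sketch of its use; the life-stage value filtered on is an assumption for illustration:

# Keep only tracks (and their events) whose track data matches;
# "adult" is a hypothetical value of animal_life_stage
galapagos_albatrosses %>%
  filter_track_data(animal_life_stage == "adult")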