---
title: "Downloading data from Movebank"
output:
  rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Downloading data from Movebank}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
knitr::knit_hooks$set(time_it = local({
  now <- NULL
  function(before, options) {
    if (before) {
      # record the current time before each chunk
      now <<- Sys.time()
    } else {
      # calculate the time difference after a chunk
      res <- units::as_units(difftime(Sys.time(), now))
      # return a character string to show the time as a separate paragraph
      paste(
        "\n\nThis code took:",
        format(res, digits = 3, nsmall = 1), "\n\n"
      )
    }
  }
}))
krba <- getOption("keyring_backend")
options("keyring_backend" = "env")
```
```{r setup}
library(move2)
```
# User credentials
Movebank credentials are stored with the `keyring` package. The command below
adds a user to the keyring. It only has to be run once: afterwards, every time
you load `move2` and call one of its Movebank download functions, your
credentials are retrieved from the keyring.
```{r, eval=FALSE}
movebank_store_credentials("myUserName", "myPassword")
```
```{r, echo=FALSE}
movebank_store_credentials("asf", "adsf", force = TRUE)
```
```{r}
movebank_remove_credentials()
```
The `keyring` package can store credentials with several mechanisms, called
backends. Some backends are specific to an operating system, others are more
general. Some of the operating-system-specific backends have the advantage that
they do not ask for a keyring password when a new R session is started.
The `move2` package uses the default backend as returned by
`keyring::default_backend()`, so this function shows which backend `move2` is
using. To change the default, set the `keyring_backend` option; for more details
see the `backends` documentation of the keyring package (`?keyring::backends`).
On **macOS** and **Windows** no extra keyring password is usually required.
On **Linux** the default is often the `file` backend, which can be confusing: it
stores the credentials in an encrypted file that has to be unlocked with a
separate keyring password. This password has to be entered in every new R
session before the Movebank password can be accessed. To avoid this, the Secret
Service API can be used by installing the `libsecret` library (Debian/Ubuntu:
`libsecret-1-dev`; recent RedHat, Fedora and CentOS systems: `libsecret-devel`).
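
A minimal sketch of inspecting and changing the backend, assuming the `keyring`
package is installed. `"secret_service"` is the name keyring uses for the Secret
Service backend; setting it is only useful on Linux once `libsecret` is
available, and the choice only lasts for the current R session.

```{r, eval=FALSE}
# Show the backend keyring (and therefore move2) would use by default
if (requireNamespace("keyring", quietly = TRUE)) {
  print(keyring::default_backend())
}
# Use the Secret Service backend for the rest of this R session
# (assumption: Linux with libsecret installed)
options(keyring_backend = "secret_service")
getOption("keyring_backend")
```

To make this the default for every session, the `options()` line can be placed
in your `.Rprofile`.
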
## Handling multiple Movebank accounts: use `key_name`
If you have multiple Movebank user accounts, the easiest approach is to give
each of them a key name with the argument `key_name`. The account you use most
can be stored without a key name, as the default. `movebank_store_credentials()`
only has to be executed once for each account; after that the credentials are
retrieved from the keyring.
```{r, eval=FALSE}
## store credentials for the most used account
movebank_store_credentials("myUserName", "myPassword")
## store credentials for another Movebank account
movebank_store_credentials("myUserName_2", "myPassword_2",
  key_name = "myOtherAccount"
)
```
```{r credentials, eval=TRUE, echo=FALSE}
movebank_store_credentials("myUserName", "myPassword", force = TRUE)
movebank_store_credentials("myUserName_2", "myPassword_2",
  key_name = "myOtherAccount", force = TRUE
)
```
When you download from Movebank with your default account, nothing has to be
specified before calling the download functions. To download with another
account, first execute the line below, specifying the key name of that account:
```{r option_setting}
options("move2_movebank_key_name" = "myOtherAccount")
```
If you use several accounts in the same script or R session, switch back to the
credentials of the default account with:
```{r options_setting_2}
options("move2_movebank_key_name" = "movebank")
```
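
When one script mixes accounts, it is easy to forget to switch back. A minimal
sketch, assuming only that `move2` reads the `move2_movebank_key_name` option at
download time as described above: the hypothetical helper
`with_movebank_account()` (not part of `move2`) sets the option for a single call
and restores the previous value afterwards, also when the download fails.

```{r, eval=FALSE}
# Hypothetical helper (not part of move2): evaluate `expr` with the
# credentials stored under `key_name`, then restore the previous option
with_movebank_account <- function(key_name, expr) {
  old <- options(move2_movebank_key_name = key_name)
  # on.exit() also runs if evaluating `expr` raises an error
  on.exit(options(old), add = TRUE)
  # `expr` is a promise, so it is only evaluated here, after the switch
  expr
}

# Usage:
# with_movebank_account(
#   "myOtherAccount",
#   movebank_download_study_info(i_am_owner = TRUE)
# )
```
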
To check which accounts are stored in keyring:
```{r, eval=FALSE}
keyring::key_list()
#          service     username
# 1       movebank   myUserName
# 2 myOtherAccount myUserName_2
```
The `service` column corresponds to the names provided in `key_name`. The
account stored without a key name (the default) is listed as `movebank`. Key
names have to be unique: if several usernames are stored under the same key name
(service), an error will occur.
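
A quick check for duplicated key names can save a confusing error later. A
minimal sketch with a hypothetical helper `duplicated_key_names()` (not part of
`move2`) that works on the data frame returned by `keyring::key_list()`, using
its `service` column. It looks at every service in the keyring, so entries
created by other programs may be reported too.

```{r, eval=FALSE}
# Hypothetical helper (not part of move2): return the key names (services)
# that are stored for more than one username, with a warning if any exist
duplicated_key_names <- function(keys = keyring::key_list()) {
  services <- as.character(keys$service)
  dup <- unique(services[duplicated(services)])
  if (length(dup) > 0) {
    warning("Key names used for more than one account: ",
      paste(dup, collapse = ", "),
      call. = FALSE
    )
  }
  dup
}

# Usage:
# duplicated_key_names()
```
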
## Removing user credentials from keyring
To delete credentials from the keyring:
```{r remove_credentials}
## for the default account
movebank_remove_credentials()
## for an account with a key name
movebank_remove_credentials(key_name = "myOtherAccount")
```
Next, check that the keys were removed:
```{r, eval=FALSE}
keyring::key_list()
```
The `movebank` and `myOtherAccount` services should no longer appear in the list.
# Downloading data
```{r, echo=FALSE}
options("keyring_backend" = krba)
if (Sys.info()["user"] != "bart") {
  if (Sys.getenv("MBPWD") != "") {
    options(keyring_backend = "env")
    move2::movebank_store_credentials("move2_user", Sys.getenv("MBPWD"))
  } else {
    knitr::opts_chunk$set(eval = FALSE)
  }
}
```
```{r, message=FALSE}
library(dplyr)
```
## Study information
The function `movebank_download_study_info()` downloads information for all
studies, for all studies with a certain property, or for a single study. Any
column of the table can be used as an argument to restrict the result to the
studies with the selected value.
This table contains all the information shown on the "Study page" of the
Movebank website, plus additional information about download rights and
ownership.
NOTE: due to incorrect timestamps in some Movebank studies,
`movebank_download_study_info()` sometimes returns a *Warning* message such as
the one in the example below. You can ignore this (see issue
[#17](https://gitlab.com/bartk/move2/-/issues/17)).
- For all studies
```{r study_info, time_it=TRUE}
movebank_download_study_info()
```
- All studies for which you have download access
```{r study_info_2, eval=FALSE}
movebank_download_study_info(i_have_download_access = TRUE)
```
- All studies that you own
```{r study_info_3, eval=FALSE}
movebank_download_study_info(i_am_owner = TRUE)
```
- All studies with a Creative Commons Zero (CC0) license. These are good candidates for exploration and testing
```{r study_info_4, eval=FALSE}
movebank_download_study_info(license_type = "CC_0")
```
- For a specific study
```{r study_info_5, eval=FALSE}
movebank_download_study_info(id = 2911040)
```
## Individual, tag and deployment information
The function `movebank_download_deployment()` downloads a table with the
information associated with individuals, tags and deployments. This table
resembles the "Reference Data" table that can be downloaded from the Movebank
website.
```{r galapagos_deployment, time_it=TRUE}
movebank_download_deployment("Galapagos Albatrosses")
```
## Location & non-location data (Event data)
With the function `movebank_download_study()` a complete study can be downloaded
from Movebank, and there are many options to download only a subset of it. The
`study_id` can be specified either as an `integer` (the id of the study) or as a
`character` (the name of the study).
To get the study ID of a Movebank study, use `movebank_get_study_id()`:
```{r get_studyid, time_it=TRUE}
movebank_get_study_id(study_id = "Galapagos Albatrosses")
```
- Download an entire study (all data of all sensors). The first line shows which sensor types the study contains, so that they can all be requested in the download.
```{r download_allsensors, time_it=TRUE}
movebank_download_study_info(study_id = 2911040)$sensor_type_ids
movebank_download_study(
  study_id = 2911040,
  sensor_type_id = c("gps", "acceleration")
)
```
- Download GPS data of one individual
```{r download_oneindv, time_it=TRUE}
movebank_download_study(
  study_id = "Galapagos Albatrosses",
  sensor_type_id = "gps",
  individual_local_identifier = "unbanded-160"
)
```
- Download GPS data of multiple individuals
```{r download_multiindv, time_it=TRUE}
movebank_download_study(
  study_id = 2911040,
  sensor_type_id = "gps",
  individual_local_identifier = c("1094-1094", "1103-1103")
)
```
```{r download_multiindv_2, eval=FALSE}
## the numerical individual identifiers can also be used
movebank_download_study(
  study_id = 2911040,
  sensor_type_id = "gps",
  individual_id = c(2911086, 2911065)
)
```
- Download acceleration data of one or several individuals
```{r download_acc, time_it=TRUE}
movebank_download_study(2911040,
  sensor_type_id = "acceleration",
  individual_local_identifier = "1094-1094"
)
```
Note that `sensor_type_id` can be specified either as an `integer` (the 'id' of
the sensor) or as a `character` (its 'external_id'). The correspondence table of
sensor names and ids can be retrieved with:
```{r retrieve_sensors}
movebank_retrieve(entity_type = "tag_type")
```
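
When a script needs numeric ids, sensor names can be translated with this table.
A minimal sketch, assuming the table has the `id` and `external_id` columns
described above; `sensor_type_ids()` is a hypothetical helper, not part of
`move2`.

```{r, eval=FALSE}
# Hypothetical helper (not part of move2): translate sensor names
# (`external_id`) into numeric sensor type ids using the tag_type table
sensor_type_ids <- function(names, tag_types) {
  ids <- tag_types$id[match(names, tag_types$external_id)]
  if (anyNA(ids)) {
    stop("Unknown sensor type: ", paste(names[is.na(ids)], collapse = ", "),
      call. = FALSE
    )
  }
  ids
}

# Usage:
# tag_types <- movebank_retrieve(entity_type = "tag_type")
# sensor_type_ids(c("gps", "acceleration"), tag_types)
```
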
- Download data of a specific time window and sensor. The `timestamp_*`
arguments can be given as a `POSIXct` timestamp, a `Date` or a character string
in the format `yyyyMMddHHmmssSSS` (e.g. `"20080604133046000"`). `timestamp_start`
and `timestamp_end` can also be used on their own, to set only a lower or only an
upper bound.
```{r download_time_win, time_it=TRUE}
movebank_download_study(2911040,
  sensor_type_id = "gps",
  # Movebank timestamps are in UTC, so define the window in UTC
  timestamp_start = as.POSIXct("2008-08-01 00:00:00", tz = "UTC"),
  timestamp_end = as.POSIXct("2008-08-02 00:00:00", tz = "UTC")
)
```
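
If you prefer to pass the character form, it can be generated from a `POSIXct`
or `Date` value. A minimal sketch with a hypothetical helper
`movebank_time_string()` (not part of `move2`); it assumes the times should be
interpreted in UTC and only handles whole seconds, so the milliseconds are always
written as `000`.

```{r, eval=FALSE}
# Hypothetical helper (not part of move2): format a time as a
# "yyyyMMddHHmmssSSS" string in UTC. Assumes whole seconds: any fractional
# seconds are truncated by %S and "000" is appended for the milliseconds.
movebank_time_string <- function(x) {
  x <- as.POSIXct(x, tz = "UTC")
  paste0(format(x, "%Y%m%d%H%M%S", tz = "UTC"), "000")
}

movebank_time_string(as.POSIXct("2008-06-04 13:30:46", tz = "UTC"))
#> [1] "20080604133046000"
movebank_time_string(as.Date("2008-08-01"))
#> [1] "20080801000000000"
```
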
- Reduce the downloaded columns to a minimal set (only for location data). By
default all attributes are downloaded; to speed up the download, the argument
`attributes = NULL` reduces the columns to the bare minimum. All individual
attributes are still downloaded, as this does not take much time. Use this
option only when downloading location data (by specifying the sensor), because
only the timestamps, locations and track ids are downloaded.
```{r quick_download, time_it=TRUE}
movebank_download_study(1259686571, sensor_type_id = 653, attributes = NULL)
```
- Download only specific attributes. To download only specific attributes, list
them in the argument `attributes`. The available attributes vary between studies
and sensors; the list of available attributes for a specific sensor in a given
study can be retrieved as shown below. Note that only one sensor can be
specified at a time.
```{r study_attrs, time_it=TRUE}
## get all attributes available for a specific study and sensor
movebank_retrieve(
  entity_type = "study_attribute",
  study_id = 2911040,
  sensor_type_id = "gps"
)$short_name
movebank_download_study(
  study_id = 2911040,
  sensor_type_id = "gps",
  attributes = c("height_above_ellipsoid", "eobs_temperature")
)
```
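
Requesting an attribute that the study does not have is an easy mistake. A
minimal sketch of a hypothetical helper `check_attributes()` (not part of
`move2`) that compares the requested names with the `short_name` values retrieved
above before the download is started.

```{r, eval=FALSE}
# Hypothetical helper (not part of move2): stop early if any requested
# attribute is not available for the chosen study and sensor
check_attributes <- function(requested, available) {
  absent <- setdiff(requested, available)
  if (length(absent) > 0) {
    stop("Not available for this study/sensor: ",
      paste(absent, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(requested)
}

# Usage:
# available <- movebank_retrieve(
#   entity_type = "study_attribute",
#   study_id = 2911040,
#   sensor_type_id = "gps"
# )$short_name
# movebank_download_study(
#   study_id = 2911040,
#   sensor_type_id = "gps",
#   attributes = check_attributes(
#     c("height_above_ellipsoid", "eobs_temperature"), available
#   )
# )
```
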
# Advanced usage
For specific requests it can be useful to retrieve information directly from
the Movebank API. The `movebank_retrieve()` function provides this functionality.
The first argument is the entity type you would like to retrieve information for
(e.g. `tag` or `event`). For these entity types a study id is required, and
further arguments can be used to select a subset of the records. For more
details on how to use the API, see the
[documentation](https://github.com/movebank/movebank-api-doc/blob/master/movebank-api.md).
## Downloading undeployed data
One common reason to use this function is to retrieve undeployed locations. In
some cases locations are collected before the tag is attached to the animal, for
quality control or to measure location error. The example below shows how all
records of a specific tag can be retrieved. Filtering for records where
`deployment_id` is `NA` returns the locations that were collected while the tag
was not deployed. The `timestamp_start` and `timestamp_end` arguments can be used
to narrow down the data further in the call to `movebank_retrieve()`. By omitting
the argument `tag_local_identifier` the entire study can be downloaded, and with
the argument `sensor_type_id` the sensors can be specified.
```{r advance, time_it=TRUE}
movebank_retrieve("event",
  study_id = 1259686571,
  tag_local_identifier = "193967",
  attributes = "all"
) %>%
  filter(is.na(deployment_id))
```
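
To illustrate what the filter keeps, the sketch below applies the same condition
in base R to a small mock table. The values are invented and only the two
columns used here are included; with real data, the time span of the undeployed
records can help choose `timestamp_start` and `timestamp_end` for a narrower
request.

```{r, eval=FALSE}
# Mock event records (invented values), not real Movebank data
events <- data.frame(
  timestamp = as.POSIXct(c(
    "2015-03-01 10:00:00", "2015-03-01 11:00:00",
    "2015-03-05 08:00:00", "2015-03-05 09:00:00"
  ), tz = "UTC"),
  deployment_id = c(NA, NA, 1, 1)
)
# Records without a deployment were collected while the tag was not deployed
undeployed <- events[is.na(events$deployment_id), ]
nrow(undeployed)
#> [1] 2
# Time span covered by the undeployed records
range(undeployed$timestamp)
```
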