---
title: "Downloading data from movebank"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Downloading data from movebank}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
knitr::knit_hooks$set(time_it = local({
  now <- NULL
  function(before, options) {
    if (before) {
      # record the current time before each chunk
      now <<- Sys.time()
    } else {
      # calculate the time difference after a chunk
      res <- units::as_units(difftime(Sys.time(), now))
      # return a character string to show the time
      paste(
        "\nThis code took:", format(res, digits = 3, nsmall = 1), "\n"
      )
    }
  }
}))
krba <- getOption("keyring_backend")
options("keyring_backend" = "env")
```

```{r setup}
library(move2)
```

# User credentials

The credentials of the user are stored using the `keyring` package. The following command adds a user to the keyring. Run this line once to store your credentials. After that, every time you load `move2` and call a function that downloads from Movebank, the function retrieves your credentials from the keyring.

```{r, eval=F}
movebank_store_credentials("myUserName", "myPassword")
```

```{r, echo=F}
movebank_store_credentials("asf", "adsf", force = TRUE)
```

```{r}
movebank_remove_credentials()
```

The `keyring` package can use several mechanisms to store credentials; these are called backends. Some backends are operating system dependent, others are more general. Some of the operating system dependent backends have the advantage that they do not require providing credentials when opening a new R session. The `move2` package uses the default backend returned by `keyring::default_backend()`, so calling this function shows which backend `move2` is using. To change the default, set the `keyring_backend` option; for more details see the `backends` documentation in the keyring package (`?keyring::backends`).

**macOS** and **Windows** generally do not require entering an extra password for keyring. On **Linux** the default is often the `file` backend, which can be confusing because it creates an encrypted file of credentials that itself needs a password to unlock. In this case a separate password for the keyring file has to be entered in each new R session before the Movebank password can be accessed. To avoid entering a keyring password each time, the Secret Service API can be used by installing the `libsecret` library.
(Debian/Ubuntu: `libsecret-1-dev`; recent RedHat, Fedora and CentOS systems: `libsecret-devel`)

### Handling multiple Movebank accounts - use `key_name`

If you have multiple user accounts on Movebank, the easiest approach is to give each of them a key name with the argument `key_name`. The most used account can be stored without a key name, which makes it the default. `movebank_store_credentials()` only has to be executed once for each account. After that the credentials are retrieved from the keyring.

```{r, eval=F}
## store credentials for the most used account.
movebank_store_credentials("myUserName", "myPassword")

## store credentials for another movebank account
movebank_store_credentials("myUserName_2", "myPassword_2",
  key_name = "myOtherAccount"
)
```

```{r credentials, eval=TRUE, echo=FALSE}
movebank_store_credentials("myUserName", "myPassword", force = TRUE)
movebank_store_credentials("myUserName_2", "myPassword_2",
  key_name = "myOtherAccount", force = TRUE
)
```

When you want to download from Movebank using your default account, nothing has to be specified before calling the download functions. To download with another account, execute the line below, specifying the key name of the account to use, before the download functions are executed.

```{r option_setting}
options("move2_movebank_key_name" = "myOtherAccount")
```

If you use several accounts in one script or R session, execute the line below to switch back to the credentials of the default account:

```{r options_setting_2}
options("move2_movebank_key_name" = "movebank")
```

To check which accounts are stored in keyring:

```{r, eval=FALSE}
keyring::key_list()
#          service     username
# 1       movebank   myUserName
# 2 myOtherAccount myUserName_2
```

The `service` column corresponds to the names provided in `key_name`. The account entered without a key name (the default) is called `movebank`.
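
Switching accounts through `options()` changes the setting for the rest of the session. As a minimal sketch, the option can instead be set only for the duration of one expression. The helper `with_movebank_account()` is hypothetical and not part of `move2`; it relies only on the `move2_movebank_key_name` option shown above and on base R.

```{r}
# Hypothetical helper (not part of move2): evaluate `code` with the given key
# name selected, then restore whatever option value was set before.
with_movebank_account <- function(key_name, code) {
  old <- options(move2_movebank_key_name = key_name)
  on.exit(options(old), add = TRUE)
  # `code` is a lazily evaluated argument, so it runs after the option is set
  force(code)
}

# The option is only changed while the expression is evaluated.
with_movebank_account("myOtherAccount", getOption("move2_movebank_key_name"))
```

In practice the second argument would be a download call, for example `movebank_download_study_info(i_am_owner = TRUE)`.
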
Note that key names have to be unique: if several usernames are stored under the same key name (service), an error occurs.

### Removing user credentials from keyring

To delete credentials from keyring:

```{r remove_credentials}
## for the default account
movebank_remove_credentials()

## for an account with a key name
movebank_remove_credentials(key_name = "myOtherAccount")
```

Next we can check whether the keys were successfully removed:

```{r, eval=FALSE}
keyring::key_list()
```

The output should no longer list the `movebank` service.

# Downloading data

```{r, echo=FALSE}
options("keyring_backend" = krba)
if (Sys.info()["user"] != "bart") {
  if (Sys.getenv("MBPWD") != "") {
    options(keyring_backend = "env")
    move2::movebank_store_credentials("move2_user", Sys.getenv("MBPWD"))
  } else {
    knitr::opts_chunk$set(eval = FALSE)
  }
}
```

```{r, message=FALSE}
library(dplyr)
```

## Study information

With the function `movebank_download_study_info()` it is possible to download information for all studies, for all studies that have a certain property, or for a single study. Any column of the table can be used as a filter, so that only the studies matching the selected property are returned. The table contains all the information shown on the "Study page" of the Movebank website, plus additional information about download rights and ownership.

NOTE: due to incorrect timestamps in some Movebank studies, `movebank_download_study_info()` sometimes returns a *Warning* message like the one in the example below. You can ignore this (see issue [#17](https://gitlab.com/bartk/move2/-/issues/17)).

- For all studies

```{r study_info, time_it=TRUE}
movebank_download_study_info()
```

- All studies where you have access to download the data

```{r study_info_2, eval=F }
movebank_download_study_info(i_have_download_access = TRUE)
```

- All studies where you are owner of the data

```{r study_info_3, eval=F }
movebank_download_study_info(i_am_owner = TRUE)
```

- All studies with a Creative Commons Zero license. These are good candidates for exploration and testing

```{r study_info_4, eval=F }
movebank_download_study_info(license_type = "CC_0")
```

- For a specific study

```{r study_info_5, eval=F }
movebank_download_study_info(id = 2911040)
```

## Individual, tag and deployment information

The function `movebank_download_deployment` downloads a table with the information associated with individuals, tags and deployments. This table resembles the "Reference Data" table that can be downloaded from the Movebank website.

```{r galapagos_deployment, time_it=T}
movebank_download_deployment("Galapagos Albatrosses")
```

## Location & non-location data (Event data)

With the function `movebank_download_study` a complete study can be downloaded from Movebank. There are many options to download only a subset of the study. The `study_id` can be specified either as an `integer` or as a `character`, giving respectively the id or the name of the study.
To get the study ID of a Movebank study use `movebank_get_study_id`:

```{r get_studyid, time_it=T}
movebank_get_study_id(study_id = "Galapagos Albatrosses")
```

- Download an entire study (all data of all sensors)

```{r download_allsensors, time_it=T}
movebank_download_study_info(study_id = 2911040)$sensor_type_ids
movebank_download_study(
  study_id = 2911040,
  sensor_type_id = c("gps", "acceleration")
)
```

- Download gps data of one individual

```{r download_oneindv, time_it=T}
movebank_download_study(
  study_id = "Galapagos Albatrosses",
  sensor_type_id = "gps",
  individual_local_identifier = "unbanded-160"
)
```

- Download gps data for multiple individuals

```{r download_multiindv, time_it=T}
movebank_download_study(
  study_id = 2911040,
  sensor_type_id = "gps",
  individual_local_identifier = c("1094-1094", "1103-1103")
)
```

```{r download_multiindv_2, eval=F}
## it is also possible to use the numerical identifiers
movebank_download_study(
  study_id = 2911040,
  sensor_type_id = "gps",
  individual_id = c(2911086, 2911065)
)
```

- Download acceleration data of one or several individuals

```{r download_acc, time_it=T}
movebank_download_study(2911040,
  sensor_type_id = "acceleration",
  individual_local_identifier = "1094-1094"
)
```

Note that the `sensor_type_id` can be specified either as an `integer` or as a `character`, giving respectively the 'id' or the 'external_id' of the sensor. The correspondence table of sensor names and ids can be retrieved as follows:

```{r retrieve_sensors}
movebank_retrieve(entity_type = "tag_type")
```

- Download data of a specific time window and sensor. The `timestamp_*` arguments can be given as a `POSIXct` timestamp, a `Date` or a character string (e.g. `"20080604133046000"`, format yyyyMMddHHmmssSSS). The two `timestamp_*` arguments can also be used separately.
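
The character format can be produced from a `POSIXct` value with base R. The sketch below is only an illustration: `to_movebank_timestamp()` is a hypothetical helper, not part of `move2`, and it assumes the string is meant to be in UTC and that sub-second precision can be dropped (the milliseconds are always written as `000`).

```{r}
# Hypothetical helper (not part of move2): format a POSIXct value as
# yyyyMMddHHmmssSSS in UTC, with the milliseconds fixed at "000".
to_movebank_timestamp <- function(x) {
  stopifnot(inherits(x, "POSIXct"))
  paste0(format(x, "%Y%m%d%H%M%S", tz = "UTC"), "000")
}

start <- as.POSIXct("2008-06-04 13:30:46", tz = "UTC")
to_movebank_timestamp(start)
```

For this input the helper returns `"20080604133046000"`, the string used as the example above.
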

```{r download_time_win, time_it=T}
movebank_download_study(2911040,
  sensor_type_id = "gps",
  timestamp_start = as.POSIXct("2008-08-01 00:00:00"),
  timestamp_end = as.POSIXct("2008-08-02 00:00:00")
)
```

- Reduce the downloaded columns to a minimal set (only for location data). By default all attributes are downloaded. To speed up the download, use `attributes = NULL`, which reduces the event columns to the bare minimum. All individual attributes are still downloaded, as this does not take much time. Note that this option should only be used when downloading location data (by specifying the sensor), as only timestamps, locations and track ids are downloaded.

```{r quick_download, time_it=TRUE}
movebank_download_study(1259686571, sensor_type_id = 653, attributes = NULL)
```

- Download only specific attributes. To download only specific attributes, list them in the argument `attributes`. The available attributes vary between studies and sensors. You can retrieve the list of attributes available for a specific sensor in a given study. Note that only one sensor at a time can be specified.

```{r study_attrs, time_it=T}
## get all attributes available for a specific study and sensor
movebank_retrieve(
  entity_type = "study_attribute",
  study_id = 2911040,
  sensor_type_id = "gps"
)$short_name

movebank_download_study(
  study_id = 2911040,
  sensor_type_id = "gps",
  attributes = c("height_above_ellipsoid", "eobs_temperature")
)
```

# Advanced usage

For specific requests it can be useful to retrieve information directly from the Movebank API. The `movebank_retrieve` function provides this functionality. The first argument is the entity type you would like to retrieve information for (e.g. `tag` or `event`). A study id is required for most entity types, and other arguments make it possible to select a subset of the records. For more details on how to use the API see the [documentation](https://github.com/movebank/movebank-api-doc/blob/master/movebank-api.md).

## Downloading undeployed data

One common reason to use this option is to retrieve undeployed locations. In some cases a set of locations is collected before the tag is attached to the animal, for quality control or error measurements. The example below shows how all records for a specific tag can be retrieved. Filtering for records where `deployment_id` is `NA` returns those locations that were collected while the tag was not deployed. The arguments `timestamp_start` and `timestamp_end` can be used to narrow down the data even further in the call to `movebank_retrieve`. By omitting the argument `tag_local_identifier` the entire study can be downloaded. The sensors can be specified with the argument `sensor_type_id`.

```{r advance, time_it=TRUE}
movebank_retrieve("event",
  study_id = 1259686571,
  tag_local_identifier = "193967",
  attributes = "all"
) %>%
  filter(is.na(deployment_id))
```
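
To make the `NA` filter concrete without contacting Movebank, the sketch below applies the same condition to a small made-up table in base R. All values are invented for illustration; only the column name `deployment_id` is taken from the example above.

```{r}
# Toy event table; all values are invented for illustration only.
events <- data.frame(
  timestamp = as.POSIXct(
    c("2020-01-01 10:00:00", "2020-01-01 11:00:00", "2020-01-02 10:00:00"),
    tz = "UTC"
  ),
  deployment_id = c(NA, NA, 101)
)

# Records collected while the tag was not deployed
# (base R equivalent of `filter(is.na(deployment_id))`).
undeployed <- events[is.na(events$deployment_id), ]

# Records that belong to a deployment.
deployed <- events[!is.na(events$deployment_id), ]

nrow(undeployed)
nrow(deployed)
```

Keeping both subsets can be convenient when the undeployed records are used for error estimates and the deployed records for the actual analysis.
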