vignettes/compress_data.Rmd
The heart of PITcleanr's utility is taking all the various detections from a multitude of antennas and compressing them into a more manageable chunk of data. It does this with the `compress()` function by mapping each detection onto a user-defined node, using a user-supplied (or PTAGIS default) configuration file, and then combining detections on the same node into a single row of data.
The complete tag history query output, either from PTAGIS or from other non-PTAGIS data (e.g., `cth_file`), will provide a record for every detection of each tag code in the tag list. This may include multiple detections on the same antenna, or at the same site within a short period of time, leading to an unwieldy and perhaps messy dataset.
The `compress()` function will work on a capture history that has been read into R using `readCTH()`. Alternatively, the user can provide only the file name and path to where that data is stored (e.g., `cth_file`), and `compress()` will call `readCTH()` internally. In this example, we use the `compress()` function on the `cth_file` object containing the file path to our PTAGIS query results, and write the output to an object `comp_obs` containing the compressed observations.
# view path to example file; you can also set cth_file to your own PTAGIS query results
cth_file
#> [1] "/home/runner/work/_temp/Library/PITcleanr/extdata/TUM_chnk_cth_2018.csv"
# run compress() function on it
comp_obs = compress(cth_file)
# look at first parts of resulting object
head(comp_obs, 10)
#> # A tibble: 10 × 9
#> tag_code node slot event_type_name n_dets min_det
#> <chr> <chr> <int> <chr> <int> <dttm>
#> 1 384.3B239AC47B NASONC 1 Mark 1 2016-03-04 17:00:00
#> 2 384.3B239AC47B BO3 2 Observation 18 2018-05-18 09:36:59
#> 3 384.3B239AC47B BO4 3 Observation 7 2018-05-18 13:03:11
#> 4 384.3B239AC47B TD1 4 Observation 4 2018-05-20 17:20:39
#> 5 384.3B239AC47B JO1 5 Observation 2 2018-05-21 17:49:21
#> 6 384.3B239AC47B MC2 6 Observation 12 2018-05-24 09:04:38
#> 7 384.3B239AC47B PRA 7 Observation 4 2018-05-28 13:58:51
#> 8 384.3B239AC47B RIA 8 Observation 5 2018-05-30 15:56:55
#> 9 384.3B239AC47B TUF 9 Observation 10 2018-06-14 20:35:10
#> 10 384.3B239AC47B TUM 10 Recapture 1 2018-06-16 15:26:31
#> # ℹ 3 more variables: max_det <dttm>, duration <drtn>, travel_time <drtn>
# in another format (kable() is from knitr, kable_styling() is from kableExtra)
head(comp_obs, 10) |>
  kable() |>
  kable_styling()
tag_code | node | slot | event_type_name | n_dets | min_det | max_det | duration | travel_time |
---|---|---|---|---|---|---|---|---|
384.3B239AC47B | NASONC | 1 | Mark | 1 | 2016-03-04 17:00:00 | 2016-03-04 17:00:00 | 0.0000000 mins | NA mins |
384.3B239AC47B | BO3 | 2 | Observation | 18 | 2018-05-18 09:36:59 | 2018-05-18 12:00:05 | 143.1000000 mins | 1158756.9833 mins |
384.3B239AC47B | BO4 | 3 | Observation | 7 | 2018-05-18 13:03:11 | 2018-05-18 13:09:29 | 6.3000000 mins | 63.1000 mins |
384.3B239AC47B | TD1 | 4 | Observation | 4 | 2018-05-20 17:20:39 | 2018-05-20 17:20:44 | 0.0833333 mins | 3131.1667 mins |
384.3B239AC47B | JO1 | 5 | Observation | 2 | 2018-05-21 17:49:21 | 2018-05-21 17:51:20 | 1.9833333 mins | 1468.6167 mins |
384.3B239AC47B | MC2 | 6 | Observation | 12 | 2018-05-24 09:04:38 | 2018-05-24 09:54:53 | 50.2500000 mins | 3793.3000 mins |
384.3B239AC47B | PRA | 7 | Observation | 4 | 2018-05-28 13:58:51 | 2018-05-28 14:01:50 | 2.9833333 mins | 6003.9667 mins |
384.3B239AC47B | RIA | 8 | Observation | 5 | 2018-05-30 15:56:55 | 2018-05-30 16:38:55 | 42.0000000 mins | 2995.0833 mins |
384.3B239AC47B | TUF | 9 | Observation | 10 | 2018-06-14 20:35:10 | 2018-06-16 12:22:30 | 2387.3333333 mins | 21836.2500 mins |
384.3B239AC47B | TUM | 10 | Recapture | 1 | 2018-06-16 15:26:31 | 2018-06-16 15:26:31 | 0.0000000 mins | 184.0167 mins |
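Judging from the values in the table above, `duration` appears to be `max_det` minus `min_det` within a slot, and `travel_time` appears to be the `min_det` of a slot minus the `max_det` of the previous slot. The following base R sketch checks that reading against the BO3 and BO4 rows. The variable names are illustrative, and the time zone is set to UTC only so the arithmetic is reproducible.

```r
# Detection times copied from the BO3 and BO4 rows of the table above.
# tz = "UTC" is an assumption made only to keep the arithmetic reproducible.
bo3_min <- as.POSIXct("2018-05-18 09:36:59", tz = "UTC")
bo3_max <- as.POSIXct("2018-05-18 12:00:05", tz = "UTC")
bo4_min <- as.POSIXct("2018-05-18 13:03:11", tz = "UTC")

# duration: last detection minus first detection within the BO3 slot
bo3_duration <- difftime(bo3_max, bo3_min, units = "mins")

# travel_time: first detection in the BO4 slot minus last detection in the BO3 slot
bo4_travel_time <- difftime(bo4_min, bo3_max, units = "mins")

bo3_duration
bo4_travel_time
```

Both results agree with the tabulated values for those rows (143.1 and 63.1 minutes).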
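The output consists of a tibble containing columns for:

- `tag_code`: the PIT tag code.
- `node`: the node the detections were assigned to.
- `slot`: the slot number for that tag, in chronological order.
- `event_type_name`: the type of event (e.g., Mark, Observation, Recapture).
- `n_dets`: the number of detections within the slot.
- `min_det`: the time of the first detection in the slot.
- `max_det`: the time of the last detection in the slot.
- `duration`: the time elapsed between `min_det` and `max_det`.
- `travel_time`: the time elapsed between the previous slot's `max_det` and this slot's `min_det`.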
A note on “nodes”: A node is the spatial scale of interest for the user. If a configuration file is not supplied, then by default the `compress()` function treats the site code as the node. However, a node could be defined as the individual PIT antenna a detection was made on, the array that antenna is part of, a group of arrays, a site, a group of sites, or something larger still (e.g., any detection in a particular tributary), depending on the spatial scale desired. The user may decide to define some arrays at particular sites as their own nodes, while simultaneously lumping all the sites in a particular watershed into a single node. To use this kind of grouping, a configuration file or table must be supplied to the `configuration` argument of the `compress()` function. For more information about configuration files, see the vignette on configuration files.
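To make the idea of a node concrete, here is a minimal base R sketch of a lookup from site and antenna to node. It is not PITcleanr's implementation: the data, the column names (`site_code`, `antenna_id`, `node`) and the node labels are all hypothetical, chosen to show one site split into two nodes by antenna and two sites lumped into a single node.

```r
# Hypothetical detections: one row per detection, identified by site and antenna.
detections <- data.frame(
  tag_code = "TAG1",
  site_code = c("S1", "S1", "S2", "S3"),
  antenna_id = c("01", "02", "01", "01"),
  stringsAsFactors = FALSE
)

# Hypothetical configuration table: each site/antenna pair is assigned a node.
# Site S1 is split into two nodes by antenna; sites S2 and S3 share one node.
configuration <- data.frame(
  site_code = c("S1", "S1", "S2", "S3"),
  antenna_id = c("01", "02", "01", "01"),
  node = c("S1_D", "S1_U", "TRIB", "TRIB"),
  stringsAsFactors = FALSE
)

# Look up the node for each detection, preserving the order of the detections.
key_det <- paste(detections$site_code, detections$antenna_id)
key_cfg <- paste(configuration$site_code, configuration$antenna_id)
detections$node <- configuration$node[match(key_det, key_cfg)]

detections
```

Changing only the `node` column of the lookup table changes the spatial scale of the result, without touching the detections themselves.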
Each slot in the output is defined as all detections on a particular node before the tag is detected on a different node. As an example, if a tag moves from node A to B and back to A, there will be three slots in the compressed data. The user can define a maximum number of minutes between detections before a new slot should be defined by supplying a value to the `max_minutes` argument of `compress()`. The units of the `duration` and `travel_time` columns can also be set with the `units` argument; the default is minutes (`mins`). The user can convert those columns to numeric values by running `as.numeric()` on them later.
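The slot rule can be sketched in a few lines of base R. This is one plausible reading of the description above, not PITcleanr's actual code: `assign_slots()` is a hypothetical helper, the detections are made up, and the handling of `max_minutes` (start a new slot when the gap since the previous detection exceeds it) is an assumption.

```r
# Hypothetical helper, not part of PITcleanr: number the slots for one tag.
# `node` and `time` must already be sorted chronologically.
assign_slots <- function(node, time, max_minutes = NA) {
  n <- length(node)
  # A new slot starts at the first detection and whenever the node changes.
  new_slot <- c(TRUE, node[-1] != node[-n])
  if (!is.na(max_minutes)) {
    # Assumed rule: a gap longer than max_minutes also starts a new slot.
    gap_mins <- c(0, as.numeric(diff(time), units = "mins"))
    new_slot <- new_slot | gap_mins > max_minutes
  }
  cumsum(new_slot)
}

# Made-up detections for a tag moving from node A to B and back to A.
node <- c("A", "A", "B", "A", "A")
time <- as.POSIXct(c("2018-05-01 08:00:00", "2018-05-01 08:05:00",
                     "2018-05-01 09:00:00", "2018-05-01 10:00:00",
                     "2018-05-01 10:02:00"), tz = "UTC")

slots_default <- assign_slots(node, time)                # three slots: A, B, A
slots_split <- assign_slots(node, time, max_minutes = 3) # 5-minute gap on A splits a slot

# A difftime carries its units; as.numeric() returns the bare number.
travel <- as.difftime(63.1, units = "mins")
travel_mins <- as.numeric(travel)
travel_hours <- as.numeric(travel, units = "hours")
```

With no `max_minutes`, the five detections collapse into three slots. With `max_minutes = 3`, the two early detections on node A, which are five minutes apart, fall into separate slots, giving four.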
The help page for `compress()`, or for any other function, can be accessed using:
?PITcleanr::compress
Now, re-run the `compress()` function, this time supplying a `configuration`:
library(readr)
# read in the example configuration file shipped with PITcleanr
my_configuration <- system.file("extdata",
                                "TUM_configuration.csv",
                                package = "PITcleanr",
                                mustWork = TRUE) |>
  read_csv(show_col_types = FALSE)
# re-run compress(), providing configuration
comp_obs2 = compress(cth_file,
                     configuration = my_configuration)
# look at first part of comp_obs2
head(comp_obs2, 10)
#> # A tibble: 10 × 9
#> tag_code node slot event_type_name n_dets min_det
#> <chr> <chr> <int> <chr> <int> <dttm>
#> 1 384.3B239AC47B NAL_U 1 Mark 1 2016-03-04 17:00:00
#> 2 384.3B239AC47B BO3 2 Observation 18 2018-05-18 09:36:59
#> 3 384.3B239AC47B BO4 3 Observation 7 2018-05-18 13:03:11
#> 4 384.3B239AC47B TD1 4 Observation 4 2018-05-20 17:20:39
#> 5 384.3B239AC47B JO1 5 Observation 2 2018-05-21 17:49:21
#> 6 384.3B239AC47B MC2 6 Observation 12 2018-05-24 09:04:38
#> 7 384.3B239AC47B PRA 7 Observation 4 2018-05-28 13:58:51
#> 8 384.3B239AC47B RIA_U 8 Observation 5 2018-05-30 15:56:55
#> 9 384.3B239AC47B TUM 9 Observation 10 2018-06-14 20:35:10
#> 10 384.3B239AC47B TUM 10 Recapture 1 2018-06-16 15:26:31
#> # ℹ 3 more variables: max_det <dttm>, duration <drtn>, travel_time <drtn>