Compressing PIT Tag Data

The heart of PITcleanr’s utility is compressing the many detections recorded across a multitude of antennas into a more manageable dataset. The compress() function does this by mapping each detection onto a user-defined node, using a user-supplied (or PTAGIS default) configuration file, and then combining consecutive detections on the same node into a single row of data.

The complete tag history output (e.g., cth_file), whether queried from PTAGIS or assembled from non-PTAGIS data, contains a record for every detection of each tag code in the tag list. Again, this may include multiple detections on the same antenna, or at the same site, within a short period of time, leading to an unwieldy and perhaps messy dataset.

The compress() function works on a capture history that’s been read into R using readCTH(). Alternatively, the user can provide only the path to the file where that data is stored (e.g., cth_file), and compress() will call readCTH() internally. In this example, we run compress() on the cth_file object, which contains the file path to our PTAGIS query results, and assign the compressed observations to an object called comp_obs.

# view the path to the example file; you can also set cth_file to the path of your own PTAGIS query results
cth_file
#> [1] "/home/runner/work/_temp/Library/PITcleanr/extdata/TUM_chnk_cth_2018.csv"

# run compress() function on it
comp_obs = compress(cth_file)

# look at first parts of resulting object
head(comp_obs, 10)
#> # A tibble: 10 × 9
#>    tag_code       node    slot event_type_name n_dets min_det            
#>    <chr>          <chr>  <int> <chr>            <int> <dttm>             
#>  1 384.3B239AC47B NASONC     1 Mark                 1 2016-03-04 17:00:00
#>  2 384.3B239AC47B BO3        2 Observation         18 2018-05-18 09:36:59
#>  3 384.3B239AC47B BO4        3 Observation          7 2018-05-18 13:03:11
#>  4 384.3B239AC47B TD1        4 Observation          4 2018-05-20 17:20:39
#>  5 384.3B239AC47B JO1        5 Observation          2 2018-05-21 17:49:21
#>  6 384.3B239AC47B MC2        6 Observation         12 2018-05-24 09:04:38
#>  7 384.3B239AC47B PRA        7 Observation          4 2018-05-28 13:58:51
#>  8 384.3B239AC47B RIA        8 Observation          5 2018-05-30 15:56:55
#>  9 384.3B239AC47B TUF        9 Observation         10 2018-06-14 20:35:10
#> 10 384.3B239AC47B TUM       10 Recapture            1 2018-06-16 15:26:31
#> # ℹ 3 more variables: max_det <dttm>, duration <drtn>, travel_time <drtn>

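# the same rows as a formatted table, which shows all nine columns
library(knitr)       # provides kable()
library(kableExtra)  # provides kable_styling()
head(comp_obs, 10) |> 
  kable() |> 
  kable_styling()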

tag_code	node	slot	event_type_name	n_dets	min_det	max_det	duration	travel_time
384.3B239AC47B	NASONC	1	Mark	1	2016-03-04 17:00:00	2016-03-04 17:00:00	0.0000000 mins	NA mins
384.3B239AC47B	BO3	2	Observation	18	2018-05-18 09:36:59	2018-05-18 12:00:05	143.1000000 mins	1158756.9833 mins
384.3B239AC47B	BO4	3	Observation	7	2018-05-18 13:03:11	2018-05-18 13:09:29	6.3000000 mins	63.1000 mins
384.3B239AC47B	TD1	4	Observation	4	2018-05-20 17:20:39	2018-05-20 17:20:44	0.0833333 mins	3131.1667 mins
384.3B239AC47B	JO1	5	Observation	2	2018-05-21 17:49:21	2018-05-21 17:51:20	1.9833333 mins	1468.6167 mins
384.3B239AC47B	MC2	6	Observation	12	2018-05-24 09:04:38	2018-05-24 09:54:53	50.2500000 mins	3793.3000 mins
384.3B239AC47B	PRA	7	Observation	4	2018-05-28 13:58:51	2018-05-28 14:01:50	2.9833333 mins	6003.9667 mins
384.3B239AC47B	RIA	8	Observation	5	2018-05-30 15:56:55	2018-05-30 16:38:55	42.0000000 mins	2995.0833 mins
384.3B239AC47B	TUF	9	Observation	10	2018-06-14 20:35:10	2018-06-16 12:22:30	2387.3333333 mins	21836.2500 mins
384.3B239AC47B	TUM	10	Recapture	1	2018-06-16 15:26:31	2018-06-16 15:26:31	0.0000000 mins	184.0167 mins

The output consists of a tibble containing columns for:

tag_code: The unique PIT tag ID.
node: By default, each site code from PTAGIS is considered a node. More on this below…
slot: A detection “slot” for each tag, numbered in chronological order. Also more on this below…
event_type_name: The type of “event”. Typically, mark, observation, recapture, or recovery.
n_dets: The number of detections that occurred within that slot.
min_det: The time of the first (min) detection in the slot.
max_det: The time of the last (max) detection in the slot.
duration: The duration of that slot (maximum - minimum detection time).
travel_time: The time elapsed between the last detection in the previous slot and the first detection in this slot.
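
To make the last two columns concrete, the following base R sketch recomputes duration and travel_time for the BO3 and BO4 rows shown above. The variable names are illustrative, and the UTC time zone is an assumption made only so the example runs anywhere. This is not necessarily how compress() computes the columns internally; it is just the arithmetic implied by the output.

```r
# Detection times copied from the BO3 and BO4 rows of the output above
# (time zone is an assumption; it does not affect the differences)
bo3_min <- as.POSIXct("2018-05-18 09:36:59", tz = "UTC")
bo3_max <- as.POSIXct("2018-05-18 12:00:05", tz = "UTC")
bo4_min <- as.POSIXct("2018-05-18 13:03:11", tz = "UTC")

# duration of the BO3 slot: last detection minus first detection
bo3_duration <- difftime(bo3_max, bo3_min, units = "mins")

# travel time into the BO4 slot: first BO4 detection minus last BO3 detection
bo4_travel <- difftime(bo4_min, bo3_max, units = "mins")

# difftime objects convert to plain numbers with as.numeric()
as.numeric(bo3_duration)  # 143.1, matching the duration column for BO3
as.numeric(bo4_travel)    # 63.1, matching the travel_time column for BO4

# as.numeric() can also convert to other units on the way out
as.numeric(bo4_travel, units = "hours")
```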

A note on “nodes”: A node is the spatial scale of interest for the user. If a configuration file is not supplied, compress() treats each site code as a node by default. However, a node could be defined as the individual PIT antenna a detection was made on, the array that antenna is part of, a group of arrays, a site, a group of sites, or something larger still (e.g., any detection in a particular tributary), depending on the spatial scale desired. The user may decide to define some arrays at particular sites as their own nodes, while simultaneously lumping all the sites in a particular watershed into a single node. To use this kind of grouping, a configuration file or table must be supplied to the configuration argument of compress(). For more information about configuration files, see this vignette.
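
The idea of mapping detections onto nodes can be illustrated with a small base R lookup. This is a simplified sketch, not the real configuration format: the lookup vector site_to_node is hypothetical, and its three entries are simply read off the two compress() outputs on this page for a single tag. An actual configuration table carries more columns and can assign nodes at the level of individual antennas.

```r
# Hypothetical lookup: site code -> node (illustration only, not the
# structure of a real PITcleanr configuration table)
site_to_node <- c(NASONC = "NAL_U", RIA = "RIA_U", TUF = "TUM")

# A few site codes from the example tag's detection history
site_code <- c("NASONC", "BO3", "RIA", "TUF", "TUM")

# Use the mapped node where one exists; otherwise keep the site code,
# mirroring the default of treating each site code as its own node
mapped <- unname(site_to_node[site_code])
node <- ifelse(is.na(mapped), site_code, mapped)

data.frame(site_code, node)
```

In this toy mapping, TUF and TUM end up on the same node, which is the kind of lumping a configuration makes possible.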

Each slot in the output is defined as all consecutive detections on a particular node before the tag is detected on a different node. As an example, if a tag moves from node A to B and back to A, there will be three slots in the compressed data. The user can also set a maximum number of minutes allowed between detections before a new slot is started, by supplying a value to the max_minutes argument of compress(). The units of the duration and travel_time columns can be set with the units argument; the default is minutes (mins). The user can convert those columns to numeric values by running as.numeric() on them later.
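
The slot logic described here can be sketched in base R on made-up data. The helper toy_slots() below is hypothetical and is not part of PITcleanr; it assumes the detections belong to one tag and are already sorted by time, and it treats max_minutes as described in the text (a gap longer than that value starts a new slot). The package's own implementation may differ in its details.

```r
# Toy detections for one tag, already sorted by time (made-up values)
dets <- data.frame(
  node = c("A", "A", "B", "A"),
  time = as.POSIXct(c("2018-05-18 09:00:00", "2018-05-18 09:05:00",
                      "2018-05-18 10:00:00", "2018-05-18 12:00:00"),
                    tz = "UTC")
)

# Hypothetical helper, not part of PITcleanr
toy_slots <- function(dets, max_minutes = NA) {
  n <- nrow(dets)
  # TRUE where a detection is on a different node than the one before it
  node_change <- c(TRUE, dets$node[-1] != dets$node[-n])
  # minutes elapsed since the previous detection (0 for the first row)
  gap_mins <- c(0, as.numeric(diff(dets$time), units = "mins"))
  # optionally, a long gap on the same node also starts a new slot
  long_gap <- if (is.na(max_minutes)) rep(FALSE, n) else gap_mins > max_minutes
  dets$slot <- cumsum(node_change | long_gap)
  # collapse each slot to a single summary row
  rows <- lapply(split(dets, dets$slot), function(d) {
    data.frame(node = d$node[1], slot = d$slot[1], n_dets = nrow(d),
               min_det = min(d$time), max_det = max(d$time))
  })
  do.call(rbind, rows)
}

toy_slots(dets)                   # nodes A, B, A: three slots
toy_slots(dets, max_minutes = 3)  # the 5-minute gap on A now splits it: four slots
```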

The help page for compress(), or for any other function, can be accessed using:

?PITcleanr::compress
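
As a rough sketch of how the arguments discussed above might be combined, the call below reads the example file with readCTH() first and then compresses it with a gap limit and different units. The argument values are arbitrary, and "hours" is assumed to be an accepted value for units; the help page is the place to confirm the allowed choices. The requireNamespace() guard is only there so the snippet does nothing if PITcleanr is not installed.

```r
# Sketch only: argument values are arbitrary, and "hours" is an assumed
# valid choice for units (confirm with ?PITcleanr::compress)
comp_hours <- NULL
if (requireNamespace("PITcleanr", quietly = TRUE)) {
  # example file shipped with the package
  cth_file <- system.file("extdata", "TUM_chnk_cth_2018.csv",
                          package = "PITcleanr", mustWork = TRUE)
  # read the tag history first, then compress the resulting data frame
  cth_df <- PITcleanr::readCTH(cth_file)
  comp_hours <- PITcleanr::compress(cth_df,
                                    max_minutes = 60,
                                    units = "hours")
  # strip the units to get plain numbers
  head(as.numeric(comp_hours$travel_time))
}
```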

Now, re-run compress(), this time supplying a configuration:

library(readr)
my_configuration <- system.file("extdata", 
                                "TUM_configuration.csv", 
                                package = "PITcleanr",
                                mustWork = TRUE) |> 
  read_csv(show_col_types = FALSE)

# re-run compress(), providing configuration
comp_obs2 = compress(cth_file,
                     configuration = my_configuration)

# look at first part of comp_obs2
head(comp_obs2, 10)
#> # A tibble: 10 × 9
#>    tag_code       node   slot event_type_name n_dets min_det            
#>    <chr>          <chr> <int> <chr>            <int> <dttm>             
#>  1 384.3B239AC47B NAL_U     1 Mark                 1 2016-03-04 17:00:00
#>  2 384.3B239AC47B BO3       2 Observation         18 2018-05-18 09:36:59
#>  3 384.3B239AC47B BO4       3 Observation          7 2018-05-18 13:03:11
#>  4 384.3B239AC47B TD1       4 Observation          4 2018-05-20 17:20:39
#>  5 384.3B239AC47B JO1       5 Observation          2 2018-05-21 17:49:21
#>  6 384.3B239AC47B MC2       6 Observation         12 2018-05-24 09:04:38
#>  7 384.3B239AC47B PRA       7 Observation          4 2018-05-28 13:58:51
#>  8 384.3B239AC47B RIA_U     8 Observation          5 2018-05-30 15:56:55
#>  9 384.3B239AC47B TUM       9 Observation         10 2018-06-14 20:35:10
#> 10 384.3B239AC47B TUM      10 Recapture            1 2018-06-16 15:26:31
#> # ℹ 3 more variables: max_det <dttm>, duration <drtn>, travel_time <drtn>

Kevin See

18 June, 2025