DABOM Data Prep 'Cheatsheet' • PITcleanr

Introduction

The core of PITcleanr’s functionality relies on detections from PTAGIS, a configuration file that maps those detections onto user-defined “nodes”, and a parent-child table showing how those nodes are related to each other on a stream network (i.e. which nodes are connected). PITcleanr contains several functions to help create the configuration file and the parent-child table, but the user can also create those files by hand, or use files from a previous analysis (e.g. last year). This vignette shows only the essential steps needed to make use of PITcleanr in preparing data for the Dam Adult Branch Occupancy Model (DABOM).

Minimal R Script

If the user has a file containing the complete tag histories for their tags of interest, plus the configuration file (config_file) and the parent-child file (parent_child_file), the only R script needed (which also saves the output as an Excel workbook) is:

# load the necessary packages
library(PITcleanr)
library(readr)
library(dplyr)

# read in configuration file, config_file = path to that file
my_config = read_csv(config_file)

# read in parent_child file, parent_child_file = path to that file
parent_child = read_csv(parent_child_file) %>%
  addParentChildNodes(configuration = my_config)

# read in PTAGIS complete tag histories (ptagis_cth = path to that file),
# then prepare them using the site configuration and parent-child table
prepped_df = prepWrapper(cth_file = ptagis_cth,
                         file_type = "PTAGIS",
                         configuration = my_config,
                         parent_child = parent_child,
                         # filter out detections after a particular date
                         max_obs_date = "20180930",
                         # save results to file
                         save_file = TRUE,
                         file_name = "PITcleanr_output.xlsx")

For additional details on site configuration, parent-child tables, and using the prepWrapper() function, continue reading.

Site Configuration

The user may want to start with a configuration file queried from PTAGIS, or generate their own. If they choose to generate their own, it must contain the following columns (at a minimum):

site_code: this is the site code associated with a detection site in PTAGIS
config_id: this is the configuration sequence from PTAGIS associated with a specific antenna arrangement for that site
antenna_id: this is the antenna code or ID from PTAGIS
node: this is the user-defined node that detections are mapped to, i.e., how the user would like to group detections on a specific antenna, in a specific configuration, at a specific site

The user may want to include other columns such as site name, latitude/longitude, river kilometer (RKM), etc. but none of those are strictly necessary.
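Before going further, it can help to confirm that a hand-built configuration contains the four required columns. The following is a minimal sketch in base R. The helper check_config_cols() is hypothetical (it is not part of PITcleanr), and the toy rows are made up for illustration.

```r
# columns the configuration file must contain (from the list above)
required_cols <- c("site_code", "config_id", "antenna_id", "node")

# hypothetical helper, not part of PITcleanr:
# returns the required columns that are missing from a configuration
check_config_cols <- function(config, required = required_cols) {
  setdiff(required, names(config))
}

# toy configuration with made-up values; the node column is absent
toy_config <- data.frame(site_code = c("AAA", "AAA"),
                         config_id = c(100, 100),
                         antenna_id = c("01", "02"),
                         stringsAsFactors = FALSE)

# which required columns still need to be added?
missing_cols <- check_config_cols(toy_config)
missing_cols
```

An empty result means all required columns are present; any extra columns (site name, latitude/longitude, etc.) are ignored by this check.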

The user can start with a template by downloading all the metadata associated with each detection site in PTAGIS using the following:

config_template = queryPtagisMeta() %>%
  select(site_code, 
         # site_name,
         config_id = configuration_sequence,
         # antenna_group = antenna_group_name,
         antenna_id) %>%
  # set each node to NA to start
  mutate(node = NA_character_)

For a typical DABOM model, where nodes are generally defined as arrays, the user may wish to start with a configuration file that has nodes assigned to each antenna, based on the array the antenna is part of, and merely edit some rows.

config_template = buildConfig() %>%
  select(site_code, 
         site_name,
         config_id, 
         antenna_group,
         antenna_id, 
         node)

The user may save this template to a .csv file, and then edit it by hand, deleting sites they are not interested in, and filling in the node information for all other rows they are interested in.

# tell R where to save the csv file
config_file = "configuration.csv"

# save the template as a csv file
write_csv(config_template,
          file = config_file)

Once the user has finished editing it, they may read it back into R:

# read in configuration file
my_configuration = read_csv(config_file,
                            show_col_types = FALSE)
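After hand-editing, a quick sanity check can catch rows that were overlooked. The sketch below, in base R, flags rows with no node assigned and counts repeated site/configuration/antenna combinations, which would most likely be unintended. The toy values, including the node names, are made up; in practice the same checks would be run on my_configuration.

```r
# toy edited configuration (made-up values, for illustration only)
toy_config <- data.frame(site_code = c("AAA", "AAA", "BBB"),
                         config_id = c(100, 100, 110),
                         antenna_id = c("01", "02", "01"),
                         node = c("AAA_D", "AAA_U", NA),
                         stringsAsFactors = FALSE)

# rows that still lack a node assignment (NA or empty string)
unassigned <- toy_config[is.na(toy_config$node) | toy_config$node == "", ]

# number of rows repeating an earlier site / configuration / antenna combination
key_cols <- c("site_code", "config_id", "antenna_id")
n_dups <- sum(duplicated(toy_config[, key_cols]))

unassigned
n_dups
```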

“Parent-Child” Relationships

The parent-child table contains information about how sites or nodes are connected on the stream network. Each site/node child has a single parent, defined as the site/node most directly downstream of the child. Each parent may have multiple children, as the stream network may branch upstream of the parent. For additional information, see that section in this vignette.

If the user wishes to create a parent-child table by hand, the file must contain at a minimum the columns parent and child. This file can then be read into R.

# read in parent_child file
parent_child = read_csv(parent_child_file,
                        show_col_types = FALSE)
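Because each child should have exactly one parent, a hand-built table can be checked for children that appear more than once. The following base R sketch uses a made-up table in which one child is deliberately listed under two parents; it also lists sites that never appear as a child, which are candidates for the most downstream (root) site. The same checks could be applied to the parent_child object read in above.

```r
# toy parent-child table (made-up site codes); DDD is listed under two parents
toy_pc <- data.frame(parent = c("AAA", "AAA", "BBB", "CCC"),
                     child  = c("BBB", "CCC", "DDD", "DDD"),
                     stringsAsFactors = FALSE)

# children that appear with more than one parent
multi_parent <- unique(toy_pc$child[duplicated(toy_pc$child)])

# sites that are parents but never children: candidate root site(s)
roots <- setdiff(toy_pc$parent, toy_pc$child)

multi_parent
roots
```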

The path to an example parent-child file included with PITcleanr can be found by running the following. The user could copy/paste that file to use as a template. Again, only the parent and child columns are necessary, although other columns may be useful.

# path to example parent-child table
system.file("extdata", 
            "TUM_parent_child.csv", 
            package = "PITcleanr",
            mustWork = TRUE)

The user may choose to include only site codes in the parent and child columns, and then use their configuration file to build out the parent-child table to include nodes. To add nodes, use the addParentChildNodes() function:

parent_child_nodes = addParentChildNodes(parent_child = parent_child,
                                         configuration = my_configuration)

Preparing for DABOM

Finally, the user must download complete tag histories from PTAGIS for the tags of interest, ensuring that the following attributes are selected to be included in the output:

Tag
Event Site Code
Event Date Time
Antenna
Antenna Group Configuration

This next group of attributes are not required, but are highly recommended:

Mark Species
Mark Rear Type
Event Type
Event Site Type
Event Release Site Code
Event Release Date Time

For further information about querying data from PTAGIS (or other sources), please read the vignette about reading in data.

Now the user can assign those detections to a “node”, compress those data, assign directionality, and determine which observations should be retained for DABOM, all using the prepWrapper() function. The min_obs_date argument filters out observations prior to that date, while the max_obs_date argument ensures that detections after that date are not retained.

# read in PTAGIS observations and compress them,
# then prep data for choice of which detections to keep
prepped_df = prepWrapper(cth_file = ptagis_file,
                         configuration = my_configuration,
                         parent_child = parent_child_nodes,
                         min_obs_date = "20180301",
                         max_obs_date = "20180930")
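To make the date window concrete, the sketch below applies the same two "YYYYMMDD" strings to a made-up set of observations in base R. It is only a conceptual illustration of the filtering described above, not the code prepWrapper() runs internally; the column names tag_code and event_date are assumptions, and treating both boundary dates as inclusive is also an assumption.

```r
# toy observations (made-up tags and dates)
toy_obs <- data.frame(tag_code = c("T1", "T1", "T2", "T2"),
                      event_date = as.Date(c("2018-02-15", "2018-04-01",
                                             "2018-09-30", "2018-10-05")),
                      stringsAsFactors = FALSE)

# same string format as the min_obs_date / max_obs_date arguments
min_obs_date <- "20180301"
max_obs_date <- "20180930"

# convert the YYYYMMDD strings to dates
min_d <- as.Date(min_obs_date, format = "%Y%m%d")
max_d <- as.Date(max_obs_date, format = "%Y%m%d")

# keep observations on or between the two dates (inclusive, by assumption)
in_window <- toy_obs$event_date >= min_d & toy_obs$event_date <= max_d
toy_obs[in_window, ]
```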

The user can even save that output directly to an Excel workbook or .csv file by setting save_file = TRUE and defining a file path and name in the file_name argument. Within this output (either in R, or in the saved Excel file), all the rows where the field user_keep_obs is NA or blank must be filled in with either TRUE or FALSE. The field auto_keep_obs provides a suggestion, but the ultimate choice is up to the user.

prepped_df = prepWrapper(cth_file = ptagis_file,
                         file_type = "PTAGIS",
                         # configuration read in earlier in this section
                         configuration = my_configuration,
                         parent_child = parent_child_nodes,
                         min_obs_date = "20180301",
                         max_obs_date = "20180930",
                         # save results to an Excel workbook
                         save_file = TRUE,
                         file_name = "PITcleanr_output.xlsx")
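For users who prefer to resolve the user_keep_obs field in R rather than in Excel, the following base R sketch shows one possible approach on a made-up data frame. Only the column names auto_keep_obs and user_keep_obs come from the text above; tag_code, node, and all values are assumptions. Accepting every suggestion wholesale is just one option, since the final choice rests with the user.

```r
# toy stand-in for prepWrapper() output (made-up values)
toy_prepped <- data.frame(tag_code = c("T1", "T1", "T1", "T2"),
                          node = c("AAA", "BBB", "AAA", "AAA"),
                          auto_keep_obs = c(TRUE, TRUE, FALSE, TRUE),
                          user_keep_obs = c(TRUE, NA, NA, TRUE),
                          stringsAsFactors = FALSE)

# rows the user still has to decide on
needs_review <- is.na(toy_prepped$user_keep_obs)
n_review <- sum(needs_review)

# one option: accept the automatic suggestion for those rows
# (individual rows could instead be set to TRUE or FALSE by hand)
toy_prepped$user_keep_obs[needs_review] <- toy_prepped$auto_keep_obs[needs_review]

# confirm nothing is left undecided, then keep the retained observations
stopifnot(!anyNA(toy_prepped$user_keep_obs))
final_obs <- toy_prepped[toy_prepped$user_keep_obs, ]
final_obs
```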

The prepWrapper() function also contains an option to add a column showing a character string of all the nodes a tag was detected at, in chronological order. Some users may find this display useful in determining which detections to retain. This option can be invoked by setting the argument add_tag_detects = TRUE within the prepWrapper() function.
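To give a sense of what such a string looks like, the base R sketch below builds one per tag from made-up detections. It is an illustration of the idea only: the column names (tag_code, node, min_det) and the space separator are assumptions, and the exact format produced by prepWrapper() may differ.

```r
# toy detections (made-up tags, nodes and times), deliberately out of order
toy_obs <- data.frame(tag_code = c("T1", "T2", "T1", "T1", "T2"),
                      node = c("BBB", "AAA", "AAA", "CCC", "DDD"),
                      min_det = as.POSIXct(c("2018-05-02 08:00:00",
                                             "2018-05-01 09:00:00",
                                             "2018-05-01 07:00:00",
                                             "2018-05-03 10:00:00",
                                             "2018-05-04 11:00:00"),
                                           tz = "UTC"),
                      stringsAsFactors = FALSE)

# sort chronologically within each tag
toy_obs <- toy_obs[order(toy_obs$tag_code, toy_obs$min_det), ]

# collapse each tag's nodes into a single string, in order of detection
tag_paths <- vapply(split(toy_obs$node, toy_obs$tag_code),
                    paste, character(1), collapse = " ")
tag_paths
```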

DABOM Data Prep ‘Cheatsheet’

Kevin See

18 June, 2025
