PITcleanr Workshop

Introduction

PIT-tag data is often difficult to summarize into meaningful information due to the large amount of tag observations. For instance, PTAGIS serves as a data repository containing a single record for every observation of a tag code; including the initial detection, or “mark”, when the tag is implanted, and all subsequent detections on each antenna the tag passes, or at recapture (e.g., at weirs), and recovery (e.g., carcass surveys) sites. The number of detections that exist for each individual tag code can easily be in excess of 1,000s of observations. Therefore, querying PTAGIS for all detections of a large number of tags often leads to a wealth of data that is unwieldy to process. PITcleanr aims to “compress” or minimize the data to a more manageable size, without losing any of the information contained in the full dataset, and to provide useful tools to help organize the data into fish pathways and stream networks.

This document serves as a tutorial to the most common functions contained in PITcleanr and provides some simple data summaries that may help you get started using the package, and performing your own PIT-tag data analyses. The workshop contains four sections:

querying data,
compressing tag observations,
developing fish pathways, and finally,
an example case study.

Additional information on PITcleanr can be found on its GitHub page or website. Materials from this workshop can be found here.

Installation

PITcleanr can be installed from GitHub using the R remotes package. Additional information on package installation can be found on the PITcleanr website README.

# install PITcleanr, if necessary
install.packages("remotes")
remotes::install_github("KevinSee/PITcleanr",
                        build_vignettes = TRUE)

Load Packages

Once PITcleanr is successfully installed it can be loaded into the R session using the library() function. In this workshop, we will also use functions from the following packages; tidyverse, sf, and kableExtra. Use the install.packages() function for packages that are not already installed.

# E.g., for packages not installed:
# install.packages(c("tidyverse", "sf"))

# load necessary packages
library(PITcleanr)
library(tidyverse)
library(sf)
library(kableExtra)

Query Site Data

Interrogation Sites

PITcleanr can be used to query and download metadata for PTAGIS interrogation sites using the function queryInterrogationMeta(). The site_code argument is used to specify sites; alternatively, set site_code = NULL to retrieve metadata for all sites.

# interrogation site metadata
int_meta_KRS = queryInterrogationMeta(site_code = "KRS") # South Fork Salmon River, Krassel
int_meta = queryInterrogationMeta(site_code = NULL)

As a simple example, after downloading the interrogation metadata you can quickly summarize the number of active instream remote detections systems in PTAGIS by organization using tidyverse functions.

# count active instream remote sites by organization
int_meta %>%
  filter(siteType == "Instream Remote Detection System",
         active) %>%
  count(operationsOrganizationCode) %>%
  ggplot(aes(x = operationsOrganizationCode, y = n)) +
    geom_col() + 
    coord_flip()

Or, we can use the int_meta object to map our interrogation sites using the sf package. State polygons are available from the maps package.

# plot location of interrogation sites
# install.packages('maps')

# get state boundaries
pnw <-
  st_as_sf(maps::map("state", 
                     region = c('ID', 'WA', 'OR'), 
                     plot = FALSE, 
                     fill = TRUE))

# create spatial feature object of IPTDS
int_sf <- 
  int_meta %>% 
  filter(siteType == "Instream Remote Detection System",
         !is.na(longitude),
         !is.na(latitude)) %>%
  st_as_sf(coords = c('longitude', 'latitude'), 
           crs = 4326) # WGS84

ggplot() + 
  geom_sf(data = pnw) + 
  geom_sf(data = int_sf, aes(color = active))

PITcleanr also includes the queryInterrogationConfig() function to query and download configuration metadata for PTAGIS interrogation sites. The configuration metadata contains information about individual antennas, transceivers, and their arrangement across time. Similar to above, site_code can be set to NULL to get configuration metadata for all PTAGIS interrogation sites or you can designate the site of interest.

# configuration of an interrogation site
int_config <- queryInterrogationConfig(site_code = 'ZEN') # Secesh River, Zena Creek Ranch

int_config %>%
  kable() %>%
  kable_styling(full_width = TRUE,
                bootstrap_options = 'striped')

siteCode	siteName	siteType	configurationSequence	startDate	endDate	isCurrent	antennaGroupName	antennaGroupSortValue	transceiverID	transceiverType	antennaID
ZEN	Secesh River at Zena Cr. Ranch	Instream	110	2012-04-12	NA	TRUE	UPPER IN-STREAM ARRAY	A	A0	FS1001M	A1
ZEN	Secesh River at Zena Cr. Ranch	Instream	110	2012-04-12	NA	TRUE	UPPER IN-STREAM ARRAY	A	A0	FS1001M	A2
ZEN	Secesh River at Zena Cr. Ranch	Instream	110	2012-04-12	NA	TRUE	UPPER IN-STREAM ARRAY	A	A0	FS1001M	A3
ZEN	Secesh River at Zena Cr. Ranch	Instream	110	2012-04-12	NA	TRUE	UPPER IN-STREAM ARRAY	A	A0	FS1001M	A4
ZEN	Secesh River at Zena Cr. Ranch	Instream	110	2012-04-12	NA	TRUE	LOWER IN-STREAM ARRAY	B	B0	Unknown	B1
ZEN	Secesh River at Zena Cr. Ranch	Instream	110	2012-04-12	NA	TRUE	LOWER IN-STREAM ARRAY	B	B0	Unknown	B2
ZEN	Secesh River at Zena Cr. Ranch	Instream	110	2012-04-12	NA	TRUE	LOWER IN-STREAM ARRAY	B	B0	Unknown	B3
ZEN	Secesh River at Zena Cr. Ranch	Instream	110	2012-04-12	NA	TRUE	LOWER IN-STREAM ARRAY	B	B0	Unknown	B4
ZEN	Secesh River at Zena Cr. Ranch	Instream	110	2012-04-12	NA	TRUE	LOWER IN-STREAM ARRAY	B	B0	Unknown	B5
ZEN	Secesh River at Zena Cr. Ranch	Instream	110	2012-04-12	NA	TRUE	LOWER IN-STREAM ARRAY	B	B0	Unknown	B6

Mark/Recapture/Recovery Sites

A similar function exists to query and download metadata for PTAGIS mark-recapture-recovery (MRR) sites; however, configuration data doesn’t exist for MRR sites so there is no need for a queryMRRConfig() function.

# all MRR sites
mrr_meta <- queryMRRMeta(site = NULL)

head(mrr_meta, 3) %>%
  kable() %>%
  kable_styling(full_width = TRUE,
                bootstrap_options = "striped")

siteCode	name	rkm	type	huC8Code	latitude	longitude	coordinateType	legacyType	maxKm	markSiteOnly	pointLocation	shapeType	streamLocationId	spatialLineId	spatialPolyId	spatialFacId
15MILC	Fifteen Mile Creek, near The Dalles, Oregon	309	River	17070105	45.48169	-121.0756	Midpoint	R	86	FALSE	FALSE	line	1211264456185	119	NA	NA
1890SC	1890s Side Channel Methow River	843.067	River	17020008	48.37819	-120.1331	Midpoint	R	3	FALSE	FALSE	line	1201196483692	586	NA	NA
18MILC	Eighteenmile Creek, Lemhi River Basin	522.303.416.092	River	17060204	44.52558	-113.2369	Midpoint	R	43	FALSE	FALSE	line	1133545446822	145	NA	NA

All PTAGIS Sites

PITcleanr includes additional wrapper functions that can query interrogation and MRR sites together. The queryPtagisMeta() function downloads all INT and MRR sites at once. The buildConfig() function goes a step further and downloads all INT and MRR site metadata, plus INT site configuration information, and then applies some formatting to combine all metadata into a single data frame. Additionally, buildConfig() creates a node field which can be used to assign and group detections to a user-specified spatial scale.

# wrapper to download all site meta
# ptagis_meta <- queryPtagisMeta()

# wrapper to download site metadata and INT configuration data at once, and apply some formatting
config <- buildConfig(node_assign = "array",
                      array_suffix = "UD")

# number of unique sites in PTAGIS
n_distinct(config$site_code)
#> [1] 1953

# number of unique nodes in PTAGIS
n_distinct(config$node)
#> [1] 2224

# the number of detection locations/antennas in PTAGIS
nrow(config)
#> [1] 7736

head(config, 9) %>%
  select(-site_description) %>% # remove site_description column for formatting
  kable() %>%
  kable_styling(full_width = TRUE,
                bootstrap_options = "striped")

site_code	config_id	antenna_id	node	start_date	end_date	site_type	site_name	antenna_group	site_type_name	rkm	rkm_total	latitude	longitude
0HR	100	01	0HR	2022-07-26	NA	INT	Henrys Instream Array	Main Array	Instream Remote Detection System	522.303.416.047	1288	44.89690	-113.6247
0HR	100	02	0HR	2022-07-26	NA	INT	Henrys Instream Array	Main Array	Instream Remote Detection System	522.303.416.047	1288	44.89690	-113.6247
0HR	100	03	0HR	2022-07-26	NA	INT	Henrys Instream Array	Main Array	Instream Remote Detection System	522.303.416.047	1288	44.89690	-113.6247
158	100	D1	158_U	2011-11-29	NA	INT	Fifteenmile Ck at Eightmile Ck	Upstream Eightmile Ck	Instream Remote Detection System	309.004	313	45.60643	-121.0863
158	100	D2	158_U	2011-11-29	NA	INT	Fifteenmile Ck at Eightmile Ck	Middle Eightmile Ck	Instream Remote Detection System	309.004	313	45.60643	-121.0863
158	100	D3	158_D	2011-11-29	NA	INT	Fifteenmile Ck at Eightmile Ck	Downstream Eightmile Ck	Instream Remote Detection System	309.004	313	45.60643	-121.0863
158	100	D4	158_U	2011-11-29	NA	INT	Fifteenmile Ck at Eightmile Ck	Upstream Fifteenmile Ck	Instream Remote Detection System	309.004	313	45.60643	-121.0863
158	100	D5	158_U	2011-11-29	NA	INT	Fifteenmile Ck at Eightmile Ck	Middle Fifteenmile Ck	Instream Remote Detection System	309.004	313	45.60643	-121.0863
158	100	D6	158_D	2011-11-29	NA	INT	Fifteenmile Ck at Eightmile Ck	Downstream Fifteenmile Ck	Instream Remote Detection System	309.004	313	45.60643	-121.0863

The buildConfig() column includes the arguments node_assign and array_suffix. The node_assign argument includes the options c("array", "site", "antenna") which allows the user to assign PIT tag arrays, entire sites, or individual antennas, respectively, as nodes, which can then be used as a grouping variable for subsequent data prep and analysis. In the case node_assign = "array", the array_suffix can be used to assign suffixes to each PIT-tag array. By default, observations are assigned to individual arrays, and upstream and downstream arrays are assigned the suffixes “_U” and “_D”, respectively. In the case of 3-span arrays, observations at the middle array are assigned to the upstream array. Play with the different settings and review the output to learn the differences.

tmp_config <- buildConfig(node_assign = "site") # c("array", "site", "antenna)

# The option to use "UMD" is still in development.
tmp_config <- buildConfig(node_assign = "array",
                          array_suffix = "UMD") # c("UD", "UMD", "AOBO")

Test Tags

To help assess a site’s operational condition PITcleanr includes the function queryTimerTagSite(). The function queries the timer test tag history from PTAGIS for a single interrogation site during a calendar year. The test tag information is useful to quickly examine the site for operation conditions and to help understand potential assumption violations with future analyses.

# check for timer tag to evaluate site operation
test_tag <-
  queryTimerTagSite(site_code = "ZEN", 
                    year = 2023, 
                    api_key = my_api_key) # requires an API key

test_tag %>%
  ggplot(aes(x = time_stamp, y = 1)) +
  geom_point() +
  theme(axis.text.x = element_text(angle = -45, vjust = 0.5),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank()) +
  facet_grid(antenna_id ~ transceiver_id)

Compressing Tag Observations

PTAGIS Complete Tag Histories

In a typical PIT tag analysis workflow a user starts with a list of tags containing all the codes of interest, and then searches a data repository for all the observations of the listed tags.

A tag list is typically a .txt file with one row per tag code. For convenience, we’ve included one such file with PITcleanr, which contains tag IDs for Chinook salmon adults implanted with tags at Tumwater Dam in 2018. The following can be used to find the path to this example file. Alternatively, the user can provide the file path to their own tag list.

# generate file path to example tag list in PITcleanr
tag_list = system.file("extdata", 
                       "TUM_chnk_tags_2018.txt", 
                       package = "PITcleanr",
                       mustWork = TRUE)

read_delim(tag_list, delim='\t') %>%
  head(5) %>%
  kable(col.names = NULL) %>%
  kable_styling(full_width = TRUE,
                bootstrap_options = "striped")


3DD.00777C5E34
3DD.00777C7728
3DD.00777C8493
3DD.00777C9185
3DD.00777C91C3

# or simple example to your own file on desktop
#tag_list = "C:/Users/username/Desktop/my_tag_list.txt"

Read PTAGIS CTH

Once the user has created their own tag list, or located the above example, load the list into PTAGIS to query the complete tag history for each of the listed tags. Detailed instructions for querying complete tag histories in PTAGIS for use in PITcleanr can be found here. The following fields or attributes are required in the PTAGIS complete tag history query:

Tag
Event Site Code
Event Date Time
Antenna
Antenna Group Configuration

Other available PTAGIS fields of interest may also be included. We highly recommended the following fields to help with future summaries:

Mark Species
Mark Rear Type
Event Type
Event Site Type
Event Release Site Code
Event Release Date Time

For convenience, we’ve included an example complete tag history in PITcleanr from the above tag list, or you can provide the file path to your own complete tag history.

# file path to the example CTH in PITcleanr
ptagis_file <- system.file("extdata", 
                          "TUM_chnk_cth_2018.csv",
                          package = "PITcleanr",
                          mustWork = TRUE)

The PITcleanr function readCTH() is used to read in your complete tag history. Once the data is read by readCTH() it then re-formats the column names to be consistent with subsequent PITcleanr functions. In the following example we are only interested in tag observations after the start of the adult run year; so we use the filter() function to remove the excess observations.

raw_ptagis = readCTH(ptagis_file,
                     file_type = "PTAGIS") %>%
  # filter for only detections after start of run year
  filter(event_date_time_value >= lubridate::ymd(20180301))

# number of detections
nrow(raw_ptagis)
#> [1] 13294

# number of unique tags
dplyr::n_distinct(raw_ptagis$tag_code)
#> [1] 1406

We can compare the downloaded complete tag history file with the output of the readCTH function by looking at the first couple rows of data.

head(raw_ptagis, 4) %>%
  kable() %>%
  kable_styling(full_width = TRUE,
              bootstrap_options = "striped")

tag_code	event_site_code_value	event_date_time_value	antenna_id	antenna_group_configuration_value	mark_species_name	mark_rear_type_name	event_type_name	event_site_type_description	event_release_site_code_code	event_release_date_time_value	cth_count
384.3B239AC47B	NAL	2018-07-01 00:06:16	63	100	Chinook	Wild Fish or Natural Production	Observation	Instream Remote Detection System	NA	NA	1
384.3B239AC47B	UWE	2018-06-23 22:03:13	B2	110	Chinook	Wild Fish or Natural Production	Observation	Instream Remote Detection System	NA	NA	1
384.3B239AC47B	BO3	2018-05-18 09:36:59	18	110	Chinook	Wild Fish or Natural Production	Observation	Adult Fishway	NA	NA	1
384.3B239AC47B	BO3	2018-05-18 09:37:11	16	110	Chinook	Wild Fish or Natural Production	Observation	Adult Fishway	NA	NA	1


head(read_csv(ptagis_file), 4) %>%
  kable() %>%
  kable_styling(full_width = TRUE,
                bootstrap_options = "striped")

Tag Code	Mark Species Name	Mark Rear Type Name	Event Type Name	Event Site Type Description	Event Site Code Value	Event Date Time Value	Antenna ID	Antenna Group Configuration Value	Event Release Site Code Code	Event Release Date Time Value	CTH Count
384.3B239AC47B	Chinook	Wild Fish or Natural Production	Mark	River	NASONC	3/1/2016 10:34:00 AM	NA	0	NASONC	3/4/2016 5:00:00 PM	1
384.3B239AC47B	Chinook	Wild Fish or Natural Production	Observation	Instream Remote Detection System	NAL	7/1/2018 12:06:16 AM	63	100	NA	NA	1
384.3B239AC47B	Chinook	Wild Fish or Natural Production	Observation	Instream Remote Detection System	UWE	6/23/2018 10:03:13 PM	B2	110	NA	NA	1
384.3B239AC47B	Chinook	Wild Fish or Natural Production	Observation	Adult Fishway	BO3	5/18/2018 9:36:59 AM	18	110	NA	NA	1

Note that readCTH() includes the argument file_type which can be used to import data from Biomark’s BioLogic (“Biologic_csv”) database or data downloaded directly from a PIT-tag reader, in either a .log or .xlsx format (“raw”). These formats may be useful for smaller studies or studies outside the Columbia River Basin whose data may not be uploaded to PTAGIS.

Quality Control

The qcTagHistory() function can be used to perform basic quality control on PTAGIS detections. Note it can be run on either the file path to the complete tag histories or the data frame returned by readCTH(). qcTagHistory() provides information on disowned and orphan tags and information on the various release groups included in the detections. For more information, see the PTAGIS FAQ website.

# using the complete tag history file path
qc_detections = qcTagHistory(ptagis_file)

# or the complete tag history tibble from readCTH()
qc_detections = qcTagHistory(raw_ptagis)

# view release batch information
qc_detections$rel_time_batches %>%
  kable() %>%
  kable_styling(full_width = TRUE,
                bootstrap_options = "striped")

mark_species_name	year	event_site_type_description	event_type_name	event_site_code_value	n_tags	n_events	n_release	rel_greq_event	event_rel_ratio
Chinook	2018	Dam	Mark	TUM	1350	1350	1339	1350	1.008215
Chinook	2018	Dam	Recapture	TUM	136	143	142	143	1.007042
Chinook	2018	Intra-Dam Release Site	Mark	BONAFF	18	12	12	18	1.000000
Chinook	2018	Intra-Dam Release Site	Mark	WELLD2	2	2	2	2	1.000000
Chinook	2018	Trap or Weir	Recapture	CHIW	66	5	5	66	1.000000
Chinook	2018	Trap or Weir	Recapture	CHIWAT	11	8	8	12	1.000000

Single Tag Codes and Marking Files

Often you may be interested in observations for a single tag code. To help with this, PITcleanr contains the queryCapHist() function to query observations for a single tag from DART’s mirrored data repository of PTAGIS. This function can be helpful if you are interested in a small number of tag codes and want to programmatically develop summaries. The function requires a tag code input and will return observation data post marking.

# query PTAGIS complete tag history for a single tag code; some examples
tagID <- "3D6.1D594D4AFA" # IR3, GOJ, BO1 -> IR5
tagID <- "3DD.003D494091" # GRA, IR1 -> IML

tag_cth = queryCapHist(ptagis_tag_code = tagID)

# example to summarise CTH
tag_cth %>%
  group_by(event_type_name, 
           event_site_code_value) %>%
  summarise(n_dets = n(),
            min_det = min(event_date_time_value),
            max_det = max(event_date_time_value)) %>%
  mutate(duration = difftime(max_det, min_det, units = "hours")) %>%
  kable() %>%
  kable_styling(full_width = TRUE,
            bootstrap_options = 'striped')

event_type_name	event_site_code_value	n_dets	min_det	max_det	duration
Observation	GRA	11	2023-06-08 14:30:48	2023-06-08 14:35:48	0.0833333 hours
Observation	IML	3	2023-07-02 15:42:18	2023-07-03 08:16:29	16.5697222 hours
Observation	IR1	1	2023-06-13 15:52:29	2023-06-13 15:52:29	0.0000000 hours
Observation	IR2	1	2023-06-14 16:00:55	2023-06-14 16:00:55	0.0000000 hours
Observation	IR3	4	2023-06-24 11:24:12	2023-06-24 11:27:44	0.0588889 hours
Observation	IR4	8	2023-07-01 19:50:12	2023-07-01 20:49:50	0.9938889 hours
Recapture	IMNAHW	1	2023-07-03 10:00:00	2023-07-03 10:00:00	0.0000000 hours

DART no longer provides mark event information with a tag code’s observation data. If the marking information is necessary you can change the argument include_mark to TRUE to also query the PTAGIS marking record. Note: including the mark record requires a PTAGIS “API Key” which can be requested from the PTAGIS API website.

cth_node <- queryCapHist(ptagis_tag_code = tagID,
                         include_mark = TRUE,
                         api_key = my_api_key)

Additionally, the queryMRRDataFile() function can be used to query a raw MRR file from PTAGIS. NOTE: only the latest corrected version of the MRR file is returned.

# MRR tag file summaries
#mrr_file <- "NBD15065.TUM"
mrr_file <- "JSW-2022-175-001.xml"

mrr_data <- queryMRRDataFile(mrr_file)

# summary of injuries
mrr_data %>%
  group_by(species_run_rear_type, 
           text_comments) %>%
  filter(!is.na(text_comments)) %>% # remove NA comments
  count() %>%
  ggplot(aes(x = text_comments, 
             y = n, 
             fill = species_run_rear_type)) +
  geom_col() +
  coord_flip() +
  labs(title = paste0(unique(mrr_data$release_site), ' : ', mrr_file))

If more than one mark file is desired the user could iterate over queryMRRDataFile() using purrr::map_df() to query multiple files and combine them into a single data frame or list.

julian = str_pad(1:10, 3, pad = 0) # julian 001 - 010
yr = 2024                          # tagging year

# Imnaha smolt trap, first 10 days of 2024
mrr_files = paste0("IMN-", yr, "-", julian, "-NT1.xml")

# iterate over files
mrr_data <- map_df(mrr_files,  
             .f = queryMRRDataFile)

# view mrr data
head(mrr_data, 5)
#> # A tibble: 5 × 24
#>   capture_method event_date          event_site event_type life_stage
#>   <chr>          <dttm>              <chr>      <chr>      <chr>     
#> 1 SCREWT         2024-01-01 12:00:00 IMNTRP     Tally      Juvenile  
#> 2 SCREWT         2024-01-02 12:00:00 IMNTRP     Tally      Juvenile  
#> 3 SCREWT         2024-01-03 08:59:12 IMNTRP     Mark       Juvenile  
#> 4 SCREWT         2024-01-04 08:01:40 IMNTRP     Tally      Juvenile  
#> 5 SCREWT         2024-01-04 08:04:06 IMNTRP     Mark       Juvenile  
#> # ℹ 19 more variables: mark_method <chr>, migration_year <chr>,
#> #   organization <chr>, pit_tag <chr>, release_date <dttm>, release_site <chr>,
#> #   sequence_number <dbl>, species_run_rear_type <chr>, tagger <chr>,
#> #   n_fish <chr>, brood_year <chr>, length <dbl>, location_source <chr>,
#> #   location_latitude <chr>, location_longitude <chr>, mark_temperature <chr>,
#> #   release_temperature <chr>, weight <chr>, text_comments <chr>

# summarise new marks by day
mrr_data %>% 
  filter(event_type == 'Mark') %>%
  mutate(release_date = date(release_date)) %>%
  group_by(release_date, 
           species_run_rear_type) %>%
  count() %>%
  kable() %>%
  kable_styling(full_width = TRUE,
                bootstrap_options = "striped")

release_date	species_run_rear_type	n
2024-01-03	12W	1
2024-01-04	12W	2
2024-01-05	12W	2
2024-01-08	12W	5
2024-01-10	12W	2

The mark data from queryMRRDataFile() and the observations data from queryCapHist() could be combined to form a complete tag history. However, this process is not recommended for large number of tag codes.

Compress Detections

After downloading the complete tag history data from PTAGIS and reading the file into the R environment with readCTH() we are now ready to start summarizing the information. The first step is to compress() the tag histories into a more manageable format based on the “array” configuration from above using the function buildConfg() and the arguments node_assign == "array" and array_suffix == "UD".

The compress() function assigns detections on individual antennas to “nodes” (if no configuration file is provided, nodes are considered site codes), summarizes the first and last detection time on each node, as well as the number of detections that occurred during that “slot”. More information on the compress() function can be found here or using ?PITcleanr::compress.

# compress observations
comp_obs <-
  compress(cth_file = raw_ptagis,
           configuration = config)

# view compressed observations
head(comp_obs)
#> # A tibble: 6 × 9
#>   tag_code       node   slot event_type_name n_dets min_det            
#>   <chr>          <chr> <int> <chr>            <int> <dttm>             
#> 1 384.3B239AC47B BO3       1 Observation         18 2018-05-18 09:36:59
#> 2 384.3B239AC47B BO4       2 Observation          7 2018-05-18 13:03:11
#> 3 384.3B239AC47B TD1       3 Observation          4 2018-05-20 17:20:39
#> 4 384.3B239AC47B JO1       4 Observation          2 2018-05-21 17:49:21
#> 5 384.3B239AC47B MC2       5 Observation         12 2018-05-24 09:04:38
#> 6 384.3B239AC47B PRA       6 Observation          4 2018-05-28 13:58:51
#> # ℹ 3 more variables: max_det <dttm>, duration <drtn>, travel_time <drtn>

# compare number of detections
nrow(raw_ptagis)
#> [1] 13294
nrow(comp_obs)
#> [1] 6755

The result of compress() can summarized in a variety of ways, including the number of unique tags at each node.

n_distinct(comp_obs$tag_code)
#> [1] 1406

# number of unique tags per node
comp_obs %>%
  group_by(node) %>%
  summarise(n = n_distinct(tag_code)) %>%
  ggplot(aes(x = fct_reorder(node,n), y = n)) +
  geom_col() +
  coord_flip()

Compressing individual antenna detections to a more useful spatial scale and a more manageable size and format is a useful initial step. Further, summaries of compressed detections can answer some questions. However, more meaningful inference from PIT-tag data often requires understanding the relationship of your sites or nodes to one another, to better understand fish movements.

Fish Pathways

Parent-Child Tables

A next typical step in a PIT-tag analysis within a stream or river network is to order the nodes, or observation sites, along the network using their location. Ordering observations is completed using a “parent-child” table. To create a parent-child table, we first need to extract the sites of interest and place them on the stream using the lat/long data stored in PTAGIS. The PITcleanr function extractSites() can be used to extract detection sites from your complete tag history file. Setting as_sf == T will return your data frame as a simple features object using the sf package.

# extract sites from complete tag histories
sites_sf <-
  extractSites(cth_file = raw_ptagis,
               as_sf = T,
               min_date = '20180301',
               configuration = config)

After reviewing the data you may find that some sites should be removed or renamed for your particular analysis. This step can easily be accomplished with tidyverse commands. For our example, we are only interested in interrogation sites upstream of Tumwater Dam, and we want to combine both Tumwater detection sites; TUM and TUF.

sites_sf <-
  sites_sf %>%
  # all sites in Wenatchee have an rkm greater than or equal to 754
  filter(str_detect(rkm, '^754'),
         type != "MRR",            # ignore MRR sites
         site_code != 'LWE') %>%   # ignore a site downstream of Tumwater
  mutate(across(site_code,
                ~ recode(.,
                         "TUF" = "TUM"))) # combine TUM and TUF

The next step is to download the stream spatial layer from the USGS and the National Hydrography Dataset. The stream layer is downloaded using the queryFlowlines() function and is structured as a list that needs to be converted into an sf object.

# download subset of NHDPlus flowlines
nhd_list <-
  queryFlowlines(sites_sf = sites_sf,
                 root_site_code = "TUM",
                 min_strm_order = 2,
                 combine_up_down = TRUE)

# extract flowlines flowlines nhd_list
flowlines <- nhd_list$flowlines

The sites_sf and flowlines objects can now be used to create a simple map of your site locations which can be used to understand how nodes or sites relate to each other across the stream network.

# load ggplot
library(ggplot2)

tum_map <- ggplot() +
  geom_sf(data = flowlines,
          aes(color = as.factor(streamorde),
              size = streamorde)) +
  scale_color_viridis_d(direction = -1,
                        option = "D",
                        name = "Stream\nOrder",
                        end = 0.8) +
  scale_size_continuous(range = c(0.2, 1.2),
                        guide = 'none') +
  geom_sf(data = nhd_list$basin,
          fill = NA,
          lwd = 2) +
  theme_bw() +
  theme(axis.title = element_blank())

tum_map +
  geom_sf(data = sites_sf,
          size = 4,
          color = "black") +
  ggrepel::geom_label_repel(
    data = sites_sf,
    aes(label = site_code, 
        geometry = geometry),
    size = 2,
    stat = "sf_coordinates",
    min.segment.length = 0,
    max.overlaps = 50
  )

These same objects can also be used to construct the parent-child table using buildParentChild(). The parent-child table is necessary to understand whether fish movements are upstream or downstream, and is used frequently in further processing of your PIT-tag observations.

# build parent-child table
parent_child = buildParentChild(sites_sf,
                                flowlines)

After building the parent-child table from the complete tag history data and its extracted sites, we recommend plotting the nodes to double-check all the locations of interest are included. The relationship of nodes to each other can be easily plotted with the plotNodes() function.

# plot parent-child table
plotNodes(parent_child)

Sometimes errors occur in the parent-child table (e.g., due to errors in lat/lon data, points being “snapped” poorly to flowlines). In this case, the parent-child table can be edited using the editParentChild() function. Additional details regarding these functions can be found here.

# edit parent-child table
parent_child <-
  editParentChild(parent_child,
                  fix_list = list(c(NA, "LNF", "ICL"),
                                  c(NA, "ICL", "TUM"),
                                  c(NA, "ICM", "ICL")),
                  switch_parent_child = list(c("ICL", 'TUM'))) %>%
  filter(!is.na(parent))

# plot new parent-child table
plotNodes(parent_child)

Add Nodes (Arrays)

If the configuration file contains multiple nodes for some sites (e.g., a node for each array at a site), then the parent-child table can be expanded to accommodate these nodes using the addParentChildNodes() function.

# add nodes for arrays
parent_child_nodes <-
  addParentChildNodes(parent_child = parent_child,
                      configuration = config)

# plot parent-child w/ nodes
plotNodes(parent_child_nodes)

Movement Paths

PITcleanr provides two functions to determine the detection locations a tag would need to pass between a starting point (e.g., a tagging or release location) and an ending point (e.g., each subsequent upstream or downstream detection point), based on the parent-child table. We refer to these detection locations as a movement path. They are equivalent to following one of the paths in the figures above. The buildPaths() function creates the path, and buildNodeOrder() goes one additional step and assigns the node’s order in the pathway. Each of these functions includes the argument direction which can be used to provide paths and node orders based on upstream (e.g., adults; the default) or downstream (e.g., juvenile) movements. For downstream movement, each node may appear in the resulting tibble multiple times, as there may be multiple ways to reach that node, from different starting points.

# build paths and add node order
node_order <-
  buildNodeOrder(parent_child = parent_child_nodes,
                 direction = "u")

# view node paths and orders
node_order %>%
  arrange(node) %>%
  kable() %>%
  kable_styling(full_width = TRUE,
                bootstrap_options = "striped")

node	node_order	path
CHL_D	2	TUM CHL_D
CHL_U	3	TUM CHL_D CHL_U
CHU_D	4	TUM CHL_D CHL_U CHU_D
CHU_U	5	TUM CHL_D CHL_U CHU_D CHU_U
CHW_D	2	TUM CHW_D
CHW_U	3	TUM CHW_D CHW_U
ICL_D	2	TUM ICL_D
ICL_U	3	TUM ICL_D ICL_U
ICM_D	4	TUM ICL_D ICL_U ICM_D
ICM_U	5	TUM ICL_D ICL_U ICM_D ICM_U
LNF_D	4	TUM ICL_D ICL_U LNF_D
LNF_U	5	TUM ICL_D ICL_U LNF_D LNF_U
LWN_D	3	TUM UWE LWN_D
LWN_U	4	TUM UWE LWN_D LWN_U
NAL_D	3	TUM UWE NAL_D
NAL_U	4	TUM UWE NAL_D NAL_U
NAU_D	5	TUM UWE NAL_D NAL_U NAU_D
NAU_U	6	TUM UWE NAL_D NAL_U NAU_D NAU_U
TUM	1	TUM
UWE	2	TUM UWE
WTL_D	3	TUM UWE WTL_D
WTL_U	4	TUM UWE WTL_D WTL_U

Travel Direction

With the parent-child relationships established, PITcleanr can assign a movement direction between each node where a given tag was detected using the function addDirection().

# add direction based on parent-child table
comp_dir <- 
  addDirection(compress_obs = comp_obs, 
               parent_child = parent_child_nodes, 
               direction = "u")

head(comp_dir, 6) %>%
  kable() %>%
  kable_styling(full_width = TRUE,
                bootstrap_options = "striped")

tag_code	node	slot	event_type_name	n_dets	min_det	max_det	travel_time	node_order	path	direction
384.3B239AC47B	TUM	9	Recapture	1	2018-06-16 15:26:31	2018-06-16 15:26:31	184.0167 mins	1	TUM	start
384.3B239AC47B	UWE	10	Observation	1	2018-06-23 22:03:13	2018-06-23 22:03:13	10476.7000 mins	2	TUM UWE	forward
384.3B239AC47B	NAL_U	11	Observation	1	2018-07-01 00:06:16	2018-07-01 00:06:16	10203.0500 mins	4	TUM UWE NAL_D NAL_U	forward
3DD.003B9DD720	TUM	9	Recapture	1	2018-06-17 09:39:26	2018-06-17 09:39:26	216.6500 mins	1	TUM	start
3DD.003B9DD725	TUM	10	Recapture	1	2018-06-19 09:33:08	2018-06-19 09:33:08	722.4167 mins	1	TUM	start
3DD.003B9DD725	CHU_U	11	Observation	1	2018-06-28 21:34:58	2018-06-28 21:34:58	13681.8333 mins	5	TUM CHL_D CHL_U CHU_D CHU_U	forward

Filter Detections

For some types of analyses, such as Cormack-Jolly-Seber (CJS) models, one of the assumptions is that a tag/individual undergoes only one-way travel (i.e., travel is either all upstream or all downstream). To meet this assumption, individual detections sometimes need to be discarded. For example, an adult salmon undergoing an upstream spawning migration may move up into the Icicle Creek branch of our node network, and then later move back downstream and up another tributary to their final spawning location. In this case, any detections that occurred in Icicle Creek would need to be discarded.

PITcleanr contains a function, filterDetections(), to help determine which tags/individuals fail to meet the one-way travel assumption and need to be examined further. filterDetections() first runs addDirection() from above, and then adds two columns, “auto_keep_obs” and “user_keep_obs”. These are meant to suggest whether each row should be kept (i.e. marked TRUE) or deleted (i.e. marked FALSE). For tags that do meet the one-way travel assumption, both “auto_keep_obs” and “user_keep_obs” columns will be filled. Further, if a fish moves back and forth within the same path, PITcleanr will indicate that only the last detection at each node should be kept.

If a tag fails to meet that assumption (e.g., at least one direction is unknown), the “user_keep_obs” column will be NA for all observations of that tag. In this case, the “auto_keep_obs” column is PITcleanr’s best guess as to which detections should be kept. This guess is based on the assumption that the last and furthest forward-moving detection should be kept, and all detections along that movement path should be kept, which dropping others.

# add direction, and filter detections
comp_filter <-
  filterDetections(compress_obs = comp_obs, 
                   parent_child = parent_child_nodes,
                   # remove any detections after spawning season
                   max_obs_date = "20180930")

# view filtered observations
comp_filter %>%
  select(tag_code:min_det, 
         direction,
         ends_with("keep_obs")) %>%
  # view a strange movement path
  filter(is.na(user_keep_obs)) %>%
  filter(tag_code == tag_code[1]) %>%
  kable() %>%
  kable_styling(full_width = T,
                bootstrap_options = "striped")

tag_code	node	slot	event_type_name	n_dets	min_det	direction	auto_keep_obs	user_keep_obs
3DD.003BC80D81	TUM	8	Recapture	1	2018-06-21 16:14:26	start	TRUE	NA
3DD.003BC80D81	UWE	9	Observation	1	2018-06-29 15:29:19	forward	FALSE	NA
3DD.003BC80D81	NAL_D	10	Observation	1	2018-06-30 01:35:27	forward	FALSE	NA
3DD.003BC80D81	NAL_U	11	Observation	1	2018-06-30 01:35:48	forward	FALSE	NA
3DD.003BC80D81	LWN_U	12	Observation	1	2018-07-05 23:05:02	unknown	FALSE	NA
3DD.003BC80D81	NAL_D	13	Observation	1	2018-07-08 03:06:42	unknown	TRUE	NA
3DD.003BC80D81	UWE	14	Observation	1	2018-07-17 21:52:42	backward	TRUE	NA
3DD.003BC80D81	NAL_U	15	Observation	1	2018-07-20 21:50:13	forward	TRUE	NA
3DD.003BC80D81	NAU_D	16	Observation	1	2018-07-30 22:40:06	forward	TRUE	NA
3DD.003BC80D81	NAU_U	17	Observation	1	2018-07-30 22:41:59	forward	TRUE	NA

The user can save the output from filterDetections(), and fill in all the blanks in the “user_keep_obs” column. The “auto_keep_obs” column is merely a suggestion from PITcleanr, the user should use their own knowledge of the landscape, where the detection sites are, the run timing and general spatial patterns of that population to make final decisions. Once all the blank “user_keep_obs” cells have been filled in, that output can be read back into R and filtered for when user_keep_obs == TRUE, and the analysis can proceed from there.

Below, we move forward trusting PITcleanr’s algorithm.

# after double-checking PITcleanr calls, filter out the final dataset for further analysis
comp_final <-
  comp_filter %>%
  filter(auto_keep_obs = TRUE)

# number of unique tags per node
node_tags <-
  comp_final %>%
  group_by(node) %>%
  summarise(n_tags = n_distinct(tag_code))

# view tags per node
node_tags
#> # A tibble: 22 × 2
#>    node  n_tags
#>    <chr>  <int>
#>  1 CHL_D      1
#>  2 CHL_U    441
#>  3 CHU_D    363
#>  4 CHU_U    297
#>  5 CHW_D     29
#>  6 CHW_U     29
#>  7 ICL_D      6
#>  8 ICL_U      4
#>  9 ICM_D     12
#> 10 ICM_U     12
#> # ℹ 12 more rows

Detection Efficiencies

Oftentimes, we want to use the number of tags detected at a site as a surrogate to all fish passing the site; however, the detection probablity or efficiencies among INT sites is sometimes very different. To understand detection efficiencies across sites, and to quickly expand observed tags into estimated tags passing the site, PITcleanr offers a quick solution. The function estNodeEff uses the corrected observations and node pathways to estimate detecion efficiencies using your dataset.

# estimate detection efficiencies for nodes
node_eff <-
  estNodeEff(capHist_proc = comp_final,
             node_order = node_order)
#> If no node is supplied, calculated for all nodes

# view node efficiencies
node_eff %>%
  kable() %>%
  kable_styling(full_width = T,
                bootstrap_options = "striped")

node	tags_at_node	tags_above_node	tags_resighted	est_tags_at_node	est_tags_se	eff_est	eff_se
TUM	1406	1039	1039	1406	0.0	1.0000000	0.0000000
CHL_D	1	730	1	730	0.0	0.0013699	0.0000000
CHL_U	441	455	166	1206	58.5	0.3656716	0.0177378
CHU_D	363	297	206	523	13.2	0.6940727	0.0175177
CHU_U	297	363	206	523	13.2	0.5678776	0.0143327
CHW_D	29	29	29	29	0.0	1.0000000	0.0000000
CHW_U	29	29	29	29	0.0	1.0000000	0.0000000
ICL_D	6	23	4	33	6.5	0.1818182	0.0358127
ICL_U	4	25	4	25	0.0	0.1600000	0.0000000
ICM_D	12	12	11	13	0.3	0.9230769	0.0213018
ICM_U	12	12	11	13	0.3	0.9230769	0.0213018
LNF_D	7	7	7	7	0.0	1.0000000	0.0000000
LNF_U	7	7	7	7	0.0	1.0000000	0.0000000
UWE	361	256	175	528	16.0	0.6837121	0.0207185
LWN_D	17	14	10	24	2.3	0.7083333	0.0678819
LWN_U	14	17	10	24	2.3	0.5833333	0.0559028
NAL_D	126	195	103	238	6.8	0.5294118	0.0151261
NAL_U	104	196	82	248	9.5	0.4193548	0.0160640
NAU_D	132	130	129	133	0.2	0.9924812	0.0014925
NAU_U	130	132	129	133	0.2	0.9774436	0.0014698
WTL_D	20	27	19	28	0.7	0.7142857	0.0178571
WTL_U	27	20	19	28	0.7	0.9642857	0.0241071

Capture Histories

Finally, various mark-recapture models require a capture history of 1’s and 0’s as inputs. PITcleanr contains two functions that can help convert tag observations into capture history matrices. buildCapHist() uses the compressed observations (whether they’ve been through filterDections() or not) and converts them into capture history matrix that can be used in various R survival packages (e.g., marked). One key will be to ensure the nodes or sites are put in correct order for the user. Note: the drop_nodes argument can be set to FALSE to retain columns for individual nodes.

# build capture histories
cap_hist <-
  buildCapHist(filter_ch = comp_filter,
               parent_child = parent_child,
               configuration = config,
               drop_nodes = T)

# view capture histories
cap_hist
#> # A tibble: 1,406 × 2
#>    tag_code       cap_hist              
#>    <chr>          <chr>                 
#>  1 384.3B239AC47B 1000000000000100010000
#>  2 3DD.003B9DD720 1000000000000000000000
#>  3 3DD.003B9DD725 1000100000000000000000
#>  4 3DD.003B9DDAB0 1000000000000000000000
#>  5 3DD.003B9F123C 1000000000000100000000
#>  6 3DD.003BC7A654 1000000000000100101100
#>  7 3DD.003BC7F0E3 1000000000000100101100
#>  8 3DD.003BC80A3C 1000000000000100010000
#>  9 3DD.003BC80D81 1000000000000100111100
#> 10 3DD.003BC80F20 1000000000000000100000
#> # ℹ 1,396 more rows

The defineCapHistCols() function can be used to identify the site or node associated with the position of each 1 and 0 in the capture history matrix.

col_nodes <- 
  defineCapHistCols(parent_child = parent_child,
                    configuration = config,
                    use_rkm = T)

col_nodes
#>  [1] "TUM"   "ICL_D" "ICL_U" "LNF_D" "LNF_U" "ICM_D" "ICM_U" "CHW_D" "CHW_U"
#> [10] "CHL_D" "CHL_U" "CHU_D" "CHU_U" "UWE"   "NAL_D" "NAL_U" "NAU_D" "NAU_U"
#> [19] "WTL_D" "WTL_U" "LWN_D" "LWN_U"

# clear environment, if desired
rm(list = ls())

Example: Estimate Lemhi Survival

In this example, we’ll estimate the survival of Lemhi River, Idaho juvenile Chinook salmon to and through the Snake/Columbia River hydrosystem. We start by reading in the complete tag histories that we’ve queried from PTAGIS.

# read in PTAGIS detections
ptagis_file <- 
  system.file("extdata",
              "LEMTRP_chnk_cth_2021.csv",
              package = "PITcleanr",
              mustWork = TRUE)

ptagis_cth <- 
  readCTH(ptagis_file) %>%
  arrange(tag_code,
          event_date_time_value)

# qcTagHistory(ptagis_cth,
#              ignore_event_vs_release = T)

The configuration file we will use starts with the standard PTAGIS configuration information. For this example we will not concern ourselves with various arrays or antennas at individual sites, but instead we define nodes by their site codes. The first of two exceptions is any sites downstream of Bonneville Dam (river kilometer 234). Because our CJS model will only extend downstream to Bonneville, we will combine all detections below Bonneville with detections at Bonneville, using site B2J. The other exception is to combine the spillway arrays at Lower Granite Dam (site code “GRS”) with other juvenile bypass antennas (“GRJ”) there, because we are not interested in how fish pass Lower Granite dam, only if they do and are detected somewhere while doing so.

configuration <-
  buildConfig(node_assign = "site") %>% 
  mutate(across(node,
                ~ case_when(as.numeric(str_sub(rkm, 1, 3)) <= 234 ~ "B2J",
                            site_code == "GRS" ~ "GRJ",
                            .default = .))) %>%
  filter(!is.na(node))

Compress

With the capture histories and a configuration file that shows what node each detection is mapped onto, we can compress those capture histories into a more manageable and meaningful object. For more detail about compressing, see the Compressing data vignette.

# compress detections
comp_obs <-
  compress(ptagis_cth,
           configuration = configuration,
           units = "days") %>%
  # drop a couple of duplicate mark records
  filter(event_type_name != "Mark Duplicate")

View the compressed records using DT::datatable().

Show entries

Search:

	tag_code	node	slot	event_type_name	n_dets	min_det	max_det	duration	travel_time
			1 17		1 681	2020-09-24T11:56:00.000Z 2023-08-05T20:53:00.000Z	2020-09-24T11:56:00.000Z 2023-08-05T20:53:00.000Z	0.000 308.242	0.003 423.634
1	384.1B79730415	LEMTRP	1	Mark	1	2022-04-26T09:54:00Z	2022-04-26T09:54:00Z	0
2	384.1B79730415	LRW	2	Observation	2	2022-04-26T17:15:00Z	2022-04-26T17:15:00Z	0	0.306
3	384.1B79730415	LEMTRP	3	Recapture	1	2022-04-27T10:09:00Z	2022-04-27T10:09:00Z	0	0.704
4	384.1B79730415	HYC	4	Observation	2	2022-07-13T23:24:00Z	2022-07-13T23:25:00Z	0.001	77.552
5	384.1B7973117F	LEMTRP	1	Mark	1	2022-04-27T09:59:00Z	2022-04-27T09:59:00Z	0
6	384.1B7973117F	LRW	2	Observation	1	2022-04-27T20:15:00Z	2022-04-27T20:15:00Z	0	0.428
7	384.1B7973117F	0HR	3	Observation	1	2023-06-20T09:25:00Z	2023-06-20T09:25:00Z	0	418.549
8	3D9.1C2DEF3A60	HAYDNC	1	Mark	1	2020-09-24T11:56:00Z	2020-09-24T11:56:00Z	0
9	3D9.1C2DEF3A60	HYC	2	Observation	1	2021-08-20T22:11:00Z	2021-08-20T22:11:00Z	0	330.427
10	3D9.1C2DEF3A60	LEMTRP	3	Recapture	1	2021-08-21T06:43:00Z	2021-08-21T06:43:00Z	0	0.356

Showing 1 to 10 of 12,255 entries

Previous1 2 3 4 5…1,226Next

Build Parent-Child Table

Based on the complete tag histories from PTAGIS, and our slightly modified configuration file, we will determine which sites to include in our CJS model. The function extractSites() in the PITcleanr package will pull out all the nodes where any of our tags were detected and has the ability to turn that into a spatial object (i.e. an sf object) using the latitude and longitude information in the configuration file.

sites_sf <-
  extractSites(ptagis_cth,
               as_sf = T,
               configuration = configuration,
               max_date = "20220630") %>%
  arrange(desc(rkm))

sites_sf
#> Simple feature collection with 26 features and 7 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -2000140 ymin: 2523078 xmax: -1366605 ymax: 2823989
#> Projected CRS: NAD83 / Conus Albers
#> # A tibble: 26 × 8
#>    site_code site_name          site_type type  rkm   site_description node_site
#>    <chr>     <chr>              <chr>     <chr> <chr> <chr>            <chr>    
#>  1 BTL       Lower Big Timber,… Instream… INT   522.… In-stream detec… BTL      
#>  2 BTIMBC    Big Timber Creek,… NA        MRR   522.… River            BTIMBC   
#>  3 LLS       Lemhi Little Spri… Instream… INT   522.… In-stream detec… LLS      
#>  4 LLSPRC    Lemhi Little Spri… NA        MRR   522.… River            LLSPRC   
#>  5 LRW       Lemhi River Weir   Instream… INT   522.… This is an in-s… LRW      
#>  6 HYC       Hayden Creek In-s… Instream… INT   522.… This is an inst… HYC      
#>  7 HYDTRP    Hayden Creek Rota… NA        MRR   522.… TraporWeir       HYDTRP   
#>  8 LEMTRP    Upper Lemhi River… NA        MRR   522.… TraporWeir       LEMTRP   
#>  9 HAYDNC    Hayden Creek, Lem… NA        MRR   522.… River            HAYDNC   
#> 10 KEN       Kenney Creek In-s… Instream… INT   522.… In-stream detec… KEN      
#> # ℹ 16 more rows
#> # ℹ 1 more variable: geometry <POINT [m]>

There are a few screwtraps on the mainstem Salmon River or in a tributary to the Salmon that we are not interested in using. We can filter those by only keeping sites from Lower Granite and downstream, or sites within the Lemhi River basin. Some of these sites are on tributaries of the Lemhi River, and we are not interested in keeping those sites (since we don’t anticipate every fish necessarily moving past those sites).

sites_sf <-
  sites_sf %>% 
  left_join(configuration %>% 
              select(site_code, 
                     rkm_total) %>% 
              distinct()) %>% 
  filter(nchar(rkm) <= 7 |
           (str_detect(rkm, "522.303.416") &
              rkm_total <= rkm_total[site_code == "LEMTRP"] &
              nchar(rkm) == 15),
         !site_code %in% c("HAYDNC",
                           "S3A"))

From here, we could build a parent-child table by hand (see the Parent-Child Tables vignette). To build it using some of the functionality of PITcleanr, continue below.

Once the sites have been finalized, we query the NHDPlus v2 flowlines from USGS, using the queryFlowlines() function in PITcleanr.

nhd_list = queryFlowlines(sites_sf,
                          root_site_code = "LLRTP",
                          min_strm_order = 2)

# pull out the flowlines
flowlines <- nhd_list$flowlines

This figure plots the NHDplus flowlines and our sites.

ggplot() +
  geom_sf(data = flowlines,
          aes(color = as.factor(streamorde))) +
  scale_color_viridis_d(direction = -1,
                        option = "D",
                        name = "Stream\nOrder",
                        end = 0.8) +
  geom_sf(data = sites_sf,
          size = 3,
          color = "black") +
  ggrepel::geom_label_repel(
    data = sites_sf,
    aes(label = site_code,
        geometry = geometry),
    size = 1.5,
    stat = "sf_coordinates",
    min.segment.length = 0,
    max.overlaps = 100
  ) +
  theme_bw() +
  theme(axis.title = element_blank(),
        legend.position = "bottom")

PITcleanr can make use of some of the covariates associated with each reach in the NHDPlus_v2 layer to determine which sites are downstream or upstream of one another. This is how we construct the parent-child table, using the buildParentChild() function.

# construct parent-child table
parent_child <-
  sites_sf %>%
  buildParentChild(flowlines,
                   rm_na_parent = T,
                   add_rkm = F) %>% 
  editParentChild(fix_list = list(c("LMJ", "GRJ", "GOJ"))) %>%
  select(parent,
         child)
#> These child locations: B2J,
#>  had no parent location and were removed from the table.

By default, PITcleanr assumes the parent is downstream of the child. We can switch this direction with some clever coding and renaming.

# flip direction of parent/child relationships
parent_child <-
  parent_child %>%
  select(p = parent,
         c = child) %>%
  mutate(parent = c,
         child = p) %>%
  select(parent,
         child)

plotNodes(parent_child)

Filter Strange Capture Histories

Once the detections have been compressed, and we can describe the relationship between nodes / sites with a parent-child table, we can assign direction of a tag between two nodes. For straightforward CJS models, one of the assumptions is that fish (or whatever creature is being studied) strictly moves forward through the detection points. When a CJS model describes survival through time, that is an easy assumption to make (i.e. animals are always only moving forward in time), but when performing a space-for-time type CJS as we are discussing here, that assumption could be violated.

One way to deal with non-straightforward movements is to filter the detections to only include those that appear to move in a straightforward manner. PITcleanr can help the user identify which tags have non-straightforward movements, and suggest what detections to keep, and which ones to filter out. PITcleanr contains a function, filterDetections(), to help determine which tags/individuals fail to meet the one-way travel assumption and need to be examined further. filterDetections() first runs addDirection(), and then adds two columns, “auto_keep_obs” and “user_keep_obs”. These are meant to indicate whether each row should be kept (i.e. marked TRUE) or deleted (i.e. marked FALSE). For tags that do meet the one-way travel assumption, both “auto_keep_obs” and “user_keep_obs” columns will be filled. If a fish moves back and forth along the same path, PITcleanr will indicate that only the last detection at each node should be kept.

In this analysis, we are also only concerned with tag movements after the fish has passed the Upper Lemhi rotary screw trap (site code LEMTRP). As mentioned above, some fish may have been tagged further upstream, prior to reaching LEMTRP, but we are not interested in their journey prior to LEMTRP, so we would like to filter observations of each tag prior to their detection at LEMTRP.

PITcleanr can perform multiple steps under a single wrapper function, prepWrapper(). prepWrapper can:

start with a compressed detection file (compress_obs), OR
start with the raw complete tag history (cth_file) and then compress them.
filter out detections prior to a user-defined minimum date (min_obs_date) OR
filter out detections prior to a user-defined starting node (start_node)
add direction of movement to each detection (requires parent_child)
note which tags have detections that follow one-way movements, and which tags do not
- and suggest which detections to keep and which to filter out if needed,
- including detections that occur after a user-defined maximum date (max_obs_date)
add a column showing all the nodes where a tag was detected if that would be helpful to the user (add_tag_detects; uses the extractTagObs() function from PITcleanr)
save the output as an .xlsx or .csv file for the user to peruse with Excel or a similar program (save_file = TRUE and file_name can be set).

For our purposes, we will use the compressed detections and the parent-child table we built previously, filter for detections prior to LEMTRP and add all of the tag detections.

prepped_df <- prepWrapper(compress_obs = comp_obs,
                          parent_child = parent_child,
                          start_node = "LEMTRP",
                          add_tag_detects = T,
                          save_file = F)

In our example, we are content deleting the observations that PITcleanr has flagged with auto_keep_obs = FALSE.

prepped_df <- 
  prepped_df %>%
  mutate(
    across(user_keep_obs,
           ~ if_else(is.na(.),
                     auto_keep_obs,
                     .))) %>% 
  filter(user_keep_obs)

Form Capture Histories

Often, the inputs to a CJS model include a capture history matrix, with one row per tag, and columns of 0s and 1s describing if each tag was detected at each time period or node. We can easily construct such capture history matrices with the buildCapHist() function, using all the detections we determined to keep.

# translate PIT tag observations into capture histories, one per tag
cap_hist <- 
  buildCapHist(prepped_df,
               parent_child = parent_child,
               configuration = configuration)

# show an example
cap_hist
#> # A tibble: 3,622 × 2
#>    tag_code       cap_hist    
#>    <chr>          <chr>       
#>  1 384.1B79730415 100000000000
#>  2 384.1B7973117F 100000000000
#>  3 3D9.1C2DEF3A60 111010000000
#>  4 3DD.003D57F3DD 101001000000
#>  5 3DD.003D57F3DE 100000000000
#>  6 3DD.003D57F3DF 100000000000
#>  7 3DD.003D57F3E0 110000000000
#>  8 3DD.003D57F3E1 110000000000
#>  9 3DD.003D57F3E2 101011000000
#> 10 3DD.003D57F3E3 101000000000
#> # ℹ 3,612 more rows

To determine which position in the cap_hist string corresponds to which node, the user can run the defineCapHistCols() function (which is called internally within buildCapHist()).

# to find out the node associated with each column
col_nodes <- 
  defineCapHistCols(parent_child = parent_child,
                    configuration = configuration)
col_nodes
#>  [1] "LEMTRP" "EVU"    "EVL"    "LLRTP"  "LLR"    "GRJ"    "GOJ"    "LMJ"   
#>  [9] "ICH"    "MCJ"    "JDJ"    "B2J"

Users have the option to keep each detection node as a separate column when creating the capture histories, by setting drop_nodes = F.

cap_hist2 <- 
  buildCapHist(prepped_df,
               parent_child = parent_child,
               configuration = configuration,
               drop_nodes = F)
cap_hist2
#> # A tibble: 3,622 × 14
#>    tag_code      cap_hist LEMTRP   EVU   EVL LLRTP   LLR   GRJ   GOJ   LMJ   ICH
#>    <chr>         <chr>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 384.1B797304… 1000000…      1     0     0     0     0     0     0     0     0
#>  2 384.1B797311… 1000000…      1     0     0     0     0     0     0     0     0
#>  3 3D9.1C2DEF3A… 1110100…      1     1     1     0     1     0     0     0     0
#>  4 3DD.003D57F3… 1010010…      1     0     1     0     0     1     0     0     0
#>  5 3DD.003D57F3… 1000000…      1     0     0     0     0     0     0     0     0
#>  6 3DD.003D57F3… 1000000…      1     0     0     0     0     0     0     0     0
#>  7 3DD.003D57F3… 1100000…      1     1     0     0     0     0     0     0     0
#>  8 3DD.003D57F3… 1100000…      1     1     0     0     0     0     0     0     0
#>  9 3DD.003D57F3… 1010110…      1     0     1     0     1     1     0     0     0
#> 10 3DD.003D57F3… 1010000…      1     0     1     0     0     0     0     0     0
#> # ℹ 3,612 more rows
#> # ℹ 3 more variables: MCJ <dbl>, JDJ <dbl>, B2J <dbl>

If the user wanted to find out how many tags were detected at each node, something like the following could be run:

cap_hist2 %>% 
  select(-c(tag_code,
            cap_hist)) %>% 
  colSums()
#> LEMTRP    EVU    EVL  LLRTP    LLR    GRJ    GOJ    LMJ    ICH    MCJ    JDJ 
#>   3622   1335   1948    257   1273    716     89     65      7     20     24 
#>    B2J 
#>     87

or directly from the prepared detection data:

prepped_df %>% 
  group_by(node) %>% 
  summarize(n_tags = n_distinct(tag_code),
            .groups = "drop") %>% 
  mutate(across(node,
                ~ factor(.,
                         levels = col_nodes))) %>% 
  arrange(node)
#> # A tibble: 12 × 2
#>    node   n_tags
#>    <fct>   <int>
#>  1 LEMTRP   3622
#>  2 EVU      1335
#>  3 EVL      1948
#>  4 LLRTP     257
#>  5 LLR      1273
#>  6 GRJ       716
#>  7 GOJ        89
#>  8 LMJ        65
#>  9 ICH         7
#> 10 MCJ        20
#> 11 JDJ        24
#> 12 B2J        87

Fit a Cormack-Jolly-Seber Model

Now that we’ve wrangled all our detections into capture histories, we’re finally ready to fit a Cormack-Jolly-Seber (CJS) model. There are many options for doing this, but in this example, we’ll use the R package marked. Note that fitting a model like this (or nearly any kind of model) is outside the scope of the PITcleanr package. PITcleanr will help the user wrangle their data into a format that can be utilized in further analyses.

Please note that this example (which is being developed into a vignette) does not cover any of the details or assumptions behind a CJS model. Before attempting any analysis, the user should be aware of what’s involved and how to interpret the results.

The marked package requires the capture histories to be processed and design data created. Setting the formula for Phi (survival parameters) and p (detection parameters) to be ~ time ensures that a separate survival and detection parameter will be estimated between and for each site. For further information about using the marked package, please see the package’s CRAN site, the package vignette, or the paper describing it (Laake, Johnson and Conn (2013)).

# load needed package
library(marked)

# process capture history into necessary format
cjs_proc <-
  cap_hist %>%
  select(tag_code,
         ch = cap_hist) %>% 
  as.data.frame() %>%
  process.data(model = "CJS")

# create design data
cjs_ddl <-
  make.design.data(cjs_proc)

# set model construction
Phi.time <- list(formula = ~ time)
p.time <- list(formula = ~ time)

# fit model
mod1 <- crm(data = cjs_proc,
            ddl = cjs_ddl,
            model.parameters = list(Phi = Phi.time,
                                    p = p.time),
            hessian = T)

Survival and detection parameter estimates can be extracted, and the user can label them by the reaches and sites they refer to.

# pull out parameter estimates
est_preds <- predict(mod1) %>%
  map(.f = as_tibble)

est_preds$Phi <-
  est_preds$Phi %>%
  left_join(parent_child %>%
              left_join(buildNodeOrder(parent_child),
                        by = join_by(child == node)) %>% 
              arrange(node_order) %>%
              mutate(occ = node_order - 1) %>% 
              select(occ, parent, child) %>%
              unite(col = "reach",
                    parent,
                    child,
                    sep = "_"),
            by = join_by(occ)) %>% 
  relocate(reach,
           .after = occ)

est_preds$p <-
  est_preds$p %>%
  mutate(site = rev(parent_child$child)) %>% 
  relocate(site,
           .after = occ)

Now, we can plot estimate survival (Phi) and examine the estimates in a table.

est_preds$Phi %>%
  ggplot(aes(x = fct_reorder(reach, est_preds$Phi$occ), y = estimate)) +
  geom_point() +
  geom_errorbar(aes(ymin = lcl, ymax = ucl)) +
  labs(x = "Location",
       y = "Survival")


# examine the estimates
# Survival
est_preds$Phi %>% 
  mutate(across(where(is.numeric),
         ~ round(., digits = 3))) %>%
  kable() %>%
  kable_styling(full_width = TRUE,
                bootstrap_options = "striped")

time	occ	reach	estimate	se	lcl	ucl
1	1	LEMTRP_EVU	0.824	0.010	0.803	0.843
2	2	EVU_EVL	0.900	0.013	0.872	0.922
3	3	EVL_LLRTP	1.000	0.047	0.000	1.000
4	4	LLRTP_LLR	0.901	0.048	0.759	0.963
5	5	LLR_GRJ	0.661	0.044	0.570	0.741
6	6	GRJ_GOJ	0.999	0.029	0.000	1.000
7	7	GOJ_LMJ	0.999	0.021	0.000	1.000
8	8	LMJ_ICH	0.648	0.324	0.102	0.968
9	9	ICH_MCJ	0.984	0.191	0.000	1.000
10	10	MCJ_JDJ	0.683	0.440	0.039	0.991
11	11	JDJ_B2J	0.598	0.945	0.001	1.000

We can also look at the estimates of detection probabilities.

est_preds$p %>%
  ggplot(aes(x = fct_reorder(site, est_preds$p$occ), y = estimate)) +
  geom_point() +
  geom_errorbar(aes(ymin = lcl, ymax = ucl)) +
  labs(x = "Location",
       y = "Detection")


# Detection
est_preds$p %>% 
  mutate(across(where(is.numeric),
         ~ round(., digits = 3))) %>%
  kable() %>%
  kable_styling(full_width = TRUE,
                bootstrap_options = "striped")

time	occ	site	estimate	se	lcl	ucl
2	2	GOJ	0.447	0.010	0.428	0.467
3	3	EVU	0.725	0.011	0.704	0.745
4	4	EVL	0.096	0.007	0.082	0.111
5	5	LLRTP	0.526	0.017	0.493	0.559
6	6	LLR	0.448	0.029	0.391	0.506
7	7	GRJ	0.056	0.007	0.044	0.071
8	8	LMJ	0.041	0.006	0.031	0.053
9	9	ICH	0.007	0.004	0.002	0.023
10	10	MCJ	0.020	0.010	0.007	0.055
11	11	JDJ	0.035	0.018	0.012	0.096
12	12	B2J	0.209	0.343	0.005	0.939

In this example, a user might be interested in the cumulative survival of fish from the Upper Lemhi screw trap to certain dams like Lower Granite, McNary, and John Day. These can be obtained by multiplying survival estimates up to the location of interest. Because the final survival and detection probabilities are confounded in this type of CJS model, we can’t estimate cummulative survival to Bonneville dam (B2J) unless we added detection sites either within the estuary or of adults returning to Bonneville.

# cumulative survival to Lower Granite, McNary and John Day dams
est_preds$Phi %>% 
  # group_by(life_stage) %>% 
  summarize(lower_granite = prod(estimate[occ <= 6]),
            mcnary = prod(estimate[occ <= 10]),
            john_day = prod(estimate[occ <= 11]))
#> # A tibble: 1 × 3
#>   lower_granite mcnary john_day
#>           <dbl>  <dbl>    <dbl>
#> 1         0.441  0.191    0.114

Warnings

Estimating fish survival is complex and requires multiple assumptions. Be careful following this script and reproducing results with your own data. We do not guarantee the steps listed above will yield perfect results, and they will require adjustments to fit your needs. The steps were only provided to get you started with PITcleanr. Please reach out to the authors if you have questions or need assistance using the package.

Ryan N. Kinzer

Mike Ackerman

Kevin See

Feb 1, 2024

Introduction

Installation

Load Packages

Query Site Data

Interrogation Sites

Mark/Recapture/Recovery Sites

All PTAGIS Sites

Test Tags

Compressing Tag Observations

PTAGIS Complete Tag Histories

Read PTAGIS CTH

Quality Control

Single Tag Codes and Marking Files

Compress Detections

Fish Pathways

Parent-Child Tables

Add Nodes (Arrays)

Movement Paths

Travel Direction

Filter Detections

Detection Efficiencies

Capture Histories

Example: Estimate Lemhi Survival

Compress

Build Parent-Child Table

Filter Strange Capture Histories

Form Capture Histories

Fit a Cormack-Jolly-Seber Model

Warnings

PITcleanr Workshop

Ryan N. Kinzer

Mike Ackerman

Kevin See

Feb 1, 2024

Introduction

Installation

Load Packages

Query Site Data

Interrogation Sites

Mark/Recapture/Recovery Sites

All PTAGIS Sites

Test Tags

Compressing Tag Observations

PTAGIS Complete Tag Histories

Tag List

Read PTAGIS CTH

Quality Control

Single Tag Codes and Marking Files

Compress Detections

Fish Pathways

Parent-Child Tables

Add Nodes (Arrays)

Movement Paths

Travel Direction

Filter Detections

Detection Efficiencies

Capture Histories

Example: Estimate Lemhi Survival

Compress

Build Parent-Child Table

Filter Strange Capture Histories

Form Capture Histories

Fit a Cormack-Jolly-Seber Model

Warnings