Introduction

Raw PIT tag detections contain a record of every detection of a tag code on an individual antenna, with the associate time-stamp. This data may be queried in primarily two different ways: 1) all of the tag codes detected at a site or antenna, or 2) all the detections of a particular tag code at a variety of sites or antennas. Regardless of what type of query is used, the user is generally dealing with a row of data for every single detection, identified by tag code, site, antenna and date/time (and detections may contain additional data as well).

This data may be stored in a variety of formats or databases. Often these can be broken out into data stored in PTAGIS (The Columbia Basin PIT Tag Information System), and data stored in non-PTAGIS databases (e.g. BioLogic) or local formats (e.g. .xlsx, .txt, .log). One of the first steps in analyzing this data is to read it into R and standardize the format. PITcleanr contains a workhorse function, readCTH() (for “read complete tag history”), that can accomplish this across many data formats.

PTAGIS Data

The Columbia Basin PIT Tag Information System (PTAGIS) is the centralized regional database for PIT-tag detections within the Columbia River Basin. It contains a record of each detection of every PIT tag, including the initial detection, or “mark”, when the tag is implanted in the fish, detections on PIT-tag antennas, recaptures (e.g. at weirs) and recoveries (e.g. carcass surveys). It contains a record of every individual detection, which means potentially multiple records of a tag being detected on the same antenna over and over e.g., in the case that it is not moving. Therefore, querying PTAGIS for all of these detections leads to a wealth of data, which can be unwieldy for the user. PITcleanr aims to compress that data to a more manageable size, without losing any of the information contained in that dataset.

Complete Capture History

PITcleanr starts with a complete capture history query from PTAGIS for a select group of tags of interest. The user will need to compile this list of tags themselves, ideally in a .txt file with one row per tag number, to make it easy to upload to a PTAGIS query.

For convenience, we’ve included one such file with PITcleanr, which is saved to the user’s computer when PITcleanr is installed. The file, “TUM_chnk_tags_2018.txt”, contains tag IDs for Chinook salmon adults implanted with PIT tags at Tumwater Dam in 2018. The following code can be used to find the path to this example file. The user can use this as a template for creating their own tag list as well.

system.file("extdata", 
            "TUM_chnk_tags_2018.txt", 
            package = "PITcleanr",
            mustWork = TRUE)

The example file of tag codes is very simple:

#> # A tibble: 1,406 × 1
#>    X1            
#>    <chr>         
#>  1 3DD.00777C5CEC
#>  2 3DD.00777C5E34
#>  3 3DD.00777C7728
#>  4 3DD.00777C8493
#>  5 3DD.00777C9185
#>  6 3DD.00777C91C3
#>  7 3DD.00777CE6B8
#>  8 3DD.00777CEB31
#>  9 3DD.00777CEF93
#> 10 3DD.00777CEFC1
#> # ℹ 1,396 more rows

Once the user has created their own tag list, or located this example one, they can go to the PTAGIS homepage to query the complete tag histories for those tags. The complete tag history query is available under Advanced Reporting, which requires a free account from PTAGIS. From the homepage, click on “Login/Register”, and either login to an existing account, or click “Create a new account” to create one. Once logged in, scroll down the dashboard page to the Advanced Reporting Links section. PTAGIS allows users to save reports/queries to be run again. For users who plan to utilize PITcleanr more than once, it saves a lot of time to build the initial query and then save it into the user’s PTAGIS account. It is then available through the “My Reports” link. To create a new query, click on “Query Builder”, or “Advanced Reporting Home Page” and then ““Create Query Builder2 Report”. From here, choose “Complete Tag History” from the list of possible reports.

There are several query indices on the left side of the query builder, but for the purposes of PITcleanr only a few are needed. First, under “1 Select Attributes” the following attributes are required to work with PITcleanr:

  • Tag
  • Event Site Code
  • Event Date Time
  • Antenna
  • Antenna Group Configuration

This next group of attributes are not required, but are highly recommended:

  • Mark Species
  • Mark Rear Type
  • Event Type
  • Event Site Type
  • Event Release Site Code
  • Event Release Date Time

Simply move these attributes over from the “Available” column to the “Selected:” column on the page by selecting them and clicking on the right arrow between the “Available” and “Selected” boxes. Other fields of interest to the user may be included as well (e.g. Event Length), and will be included as extra columns in the query output.

The only other required index is “2 Select Metrics”, but that can remain as the default, “CTH Count”, which provides one record for each event recorded per tag.

Set up a filter for specific tags by next navigating to the “27 Tag Code - List or Text File” on the left. After selecting “Tag” under “Attributes:”, click on “Import file…”. Simply upload the .txt file containing your PIT tag codes of interest, or alternatively, feel free to use the “TUM_chnk_tags_2018.txt” file provided with PITcleanr. After choosing the file, click on “Import” and the tag list will be loaded (delimited by semi-colons). Click “OK”.

Under “Report Message Name:” near the bottom, name the query something appropriate, such as “TUM_chnk_cth_2018”, and select “Run Report”. Once the query has successfully completed, the output can be exported as a .csv file (e.g. “TUM_chnk_cth_2018.csv”). Simply click on the “Export” icon near the top, which will open a new page, and select the default settings:

  • Export: Whole report
  • CSV file format
  • Export Report Title: unchecked
  • Export filter details: unchecked
  • Remove extra column: Yes

And click “Export”, again.

PITcleanr includes several example files to help users understand the appropriate format of certain files, and to provide demonstrations of various functions. The system.file function locates the file path to the subdirectory and file contained with a certain package. One such example is PTAGIS output for the Tumwater Chinook tags from 2018. Using similar code (system.file), the user can set the file path to this file, and store it as a new object ptagis_file.

ptagis_file = system.file("extdata", 
                          "TUM_chnk_cth_2018.csv",
                          package = "PITcleanr",
                          mustWork = TRUE)

Alternatively, if the user has run a query from PTAGIS as described above, they could set ptagis_file to the path and file name of the .csv they downloaded.

# As an example, set path to PTAGIS query output
ptagis_file = "C:/Users/USER_NAME_HERE/Downloads/TUM_chnk_cth_2018.csv"

Note that in our example file, there are 13501 detections (rows) for 1406 unique tags, matching the number of tags in our example tag list “TUM_chnk_tags_2018.txt”. For a handful of those tags, in our case 221, there is only a “Mark” detection i.e., that tag was never detected again after the fish was tagged and released. For the remaining tags, many of them were often detected at the same site and sometimes on the same antenna. Data like this, while full of information, can be difficult to analyze efficiently. To illustrate, here is an example of some of the raw data for a single tag:

Tag Code Event Site Code Value Event Date Time Value Antenna Id Antenna Group Configuration Value Mark Species Name Mark Rear Type Name Event Type Name Event Site Type Description Event Release Site Code Code Event Release Date Time Value Cth Count
3DD.0077767AC6 TUM 2018-06-22 06:40:12 NA 0 Chinook Hatchery Reared Mark Dam TUMFBY 2018-06-22 06:40:12 1
3DD.0077767AC6 NAU 2018-07-06 22:04:04 44 100 Chinook Hatchery Reared Observation Instream Remote Detection System NA NA 1
3DD.0077767AC6 NAU 2018-07-06 22:04:28 42 100 Chinook Hatchery Reared Observation Instream Remote Detection System NA NA 1
3DD.0077767AC6 NAU 2018-07-08 22:26:21 45 100 Chinook Hatchery Reared Observation Instream Remote Detection System NA NA 1
3DD.0077767AC6 NAL 2018-07-13 21:52:09 66 100 Chinook Hatchery Reared Observation Instream Remote Detection System NA NA 1
3DD.0077767AC6 NAL 2018-07-16 00:13:19 66 100 Chinook Hatchery Reared Observation Instream Remote Detection System NA NA 1
3DD.0077767AC6 NAU 2018-08-26 00:01:36 46 100 Chinook Hatchery Reared Observation Instream Remote Detection System NA NA 1
3DD.0077767AC6 NAU 2018-08-26 00:02:03 41 100 Chinook Hatchery Reared Observation Instream Remote Detection System NA NA 1
3DD.0077767AC6 NAU 2018-08-26 10:23:12 41 100 Chinook Hatchery Reared Observation Instream Remote Detection System NA NA 1
3DD.0077767AC6 NAU 2018-08-26 10:23:19 44 100 Chinook Hatchery Reared Observation Instream Remote Detection System NA NA 1
3DD.0077767AC6 NAU 2018-08-26 15:00:36 45 100 Chinook Hatchery Reared Observation Instream Remote Detection System NA NA 1
3DD.0077767AC6 NAU 2018-08-26 15:00:48 43 100 Chinook Hatchery Reared Observation Instream Remote Detection System NA NA 1
3DD.0077767AC6 NAU 2018-08-26 15:01:12 46 100 Chinook Hatchery Reared Observation Instream Remote Detection System NA NA 1
3DD.0077767AC6 NAU 2018-08-26 15:01:32 42 100 Chinook Hatchery Reared Observation Instream Remote Detection System NA NA 1
3DD.0077767AC6 NAU 2018-08-26 17:38:46 44 100 Chinook Hatchery Reared Observation Instream Remote Detection System NA NA 1
3DD.0077767AC6 NAU 2018-08-26 19:00:05 46 100 Chinook Hatchery Reared Observation Instream Remote Detection System NA NA 1
3DD.0077767AC6 NAU 2018-08-26 19:00:38 42 100 Chinook Hatchery Reared Observation Instream Remote Detection System NA NA 1
3DD.0077767AC6 NAU 2018-08-27 16:09:31 43 100 Chinook Hatchery Reared Observation Instream Remote Detection System NA NA 1
3DD.0077767AC6 NAU 2018-08-27 16:09:39 46 100 Chinook Hatchery Reared Observation Instream Remote Detection System NA NA 1
3DD.0077767AC6 NAU 2018-08-27 16:51:01 44 100 Chinook Hatchery Reared Observation Instream Remote Detection System NA NA 1

PITcleanr provides a function to read in this kind of complete capture history, called readCTH(). This function ensures the column names are consistent for subsequent PITcleanr functions, and provides one function to read in PTAGIS and non-PTAGIS data and return similarly formatted output.

ptagis_cth <- readCTH(cth_file = ptagis_file,
                      file_type = "PTAGIS")

Mark Data File

PITcleanr also allows the user to query PTAGIS for an MRR data file. Many projects are set up to record all the tagging information for an entire season, or part of a season from a single site in one file, which is uploaded to PTAGIS. This file can be used to determine the list of tag codes a user may be interested in. The queryMRRDataFile will pull this information from PTAGIS, using either the XML information contained in P4 files, or the older file structure (text file with various possible file extensions). The only requirement is the file name. For example, to pull this data for tagging at Tumwater in 2018, use the following code:

tum_2018_mrr <- queryMRRDataFile("NBD-2018-079-001.xml")

Depending on how comprehensive that MRR data file is, a user might filter this data.frame for Spring Chinook by focusing on the species run rear type of “11”, and tags that were not collected for broodstock, or otherwise killed. An example of some of the data contained in MRR files like this is shown below.

tum_2018_mrr |> 
    # filter for Spring Chinook tags
  filter(str_detect(species_run_rear_type, 
                    "^11"),
         # filter out fish removed for broodstock collection
         str_detect(conditional_comments,
                    "BR",
                    negate = T),
         # filter out fish with other mortality
         str_detect(conditional_comments,
                    "[:space:]M[:space:]",
                    negate = T),
         str_detect(conditional_comments,
                    "[:space:]M$",
                    negate = T)) |> 
  slice(1:10)
capture_method conditional_comments event_date event_site event_type life_stage mark_method mark_temperature migration_year organization pit_tag release_date release_site release_temperature sequence_number species_run_rear_type tagger text_comments length weight location_rkm_ext second_pit_tag
LADDER AD CW MA MT RF 2018-06-04 12:25:20 TUM Mark Adult HAND 15.0 2018 WDFW 3DD.007791C27F 2018-06-04 12:25:20 TUMFBY 15.0 40 11H HUGHES M DNA 1 840 67.0 NA NA
LADDER MA MT RF 2018-06-04 12:48:45 TUM Mark Adult HAND 15.0 2018 WDFW 3DD.007791D334 2018-06-04 12:48:45 TUMFBY 15.0 41 11W HUGHES M DNA 2 810 54.0 NA NA
LADDER AD CW MA MT RF 2018-06-05 10:02:49 TUM Mark Adult HAND 15.0 2018 WDFW 3DD.00779071E6 2018-06-05 10:02:49 TUMFBY 15.0 42 11H HUGHES M DNA 3 NA NA NA NA
LADDER AD CW FE MT RF 2018-06-05 10:13:09 TUM Mark Adult HAND 15.0 2018 WDFW 3DD.0077923D3A 2018-06-05 10:13:09 TUMFBY 15.0 43 11H HUGHES M DNA 4 NA NA NA NA
LADDER AD CW FE MT RF 2018-06-06 10:36:29 TUM Mark Adult HAND 15.0 2018 WDFW 3DD.0077925482 2018-06-06 10:36:29 TUMFBY 15.0 46 11H HUGHES M DNA 5 NA NA NA NA
LADDER FE MT RF 2018-06-06 11:02:32 TUM Mark Adult HAND 15.0 2018 WDFW 3DD.007791714B 2018-06-06 11:02:32 TUMFBY 15.0 48 11W HUGHES M DNA 7 710 NA NA NA
LADDER AD CW FE MT PC RF 2018-06-06 11:09:53 TUM Mark Adult HAND 15.0 2018 WDFW 3DD.00779103AD 2018-06-06 11:09:53 TUMFBY 15.0 49 11H HUGHES M DNA 8 NA NA NA NA
LADDER AI CW FE MT RF 2018-06-06 11:15:46 TUM Mark Adult HAND 15.0 2018 WDFW 3DD.007791522F 2018-06-06 11:15:46 TUMFBY 15.0 50 11H HUGHES M DNA 9 NA NA NA NA
LADDER AD CW MA MT RF 2018-06-06 11:26:49 TUM Mark Adult HAND 15.0 2018 WDFW 3DD.007791FCA9 2018-06-06 11:26:49 TUMFBY 15.0 52 11H HUGHES M DNA 11 NA NA NA NA
LADDER AI CW FE MT RF 2018-06-06 11:31:34 TUM Mark Adult HAND 15.0 2018 WDFW 3DD.007790FA60 2018-06-06 11:31:34 TUMFBY 15.0 53 11H HUGHES M DNA 12 NA NA NA NA

Non-PTAGIS Data

Not all PIT tag data is in PTAGIS. This section will show the user how to read in PIT tag data from a variety of other sources, including BioLogic csv files as well as raw files (e.g. .xlsx, .log and .txt file extensions) downloaded directly from PIT tag readers. The readCTH() function is able to read in all of these formats, but the user must indicate what format each file is in.

There are many ways a user might store their various files, and many ways to script how to read them into R. In this vignette, we will suggest one way to do this, but alternatives certainly exist.

Storing Data Files

In this example, we have accumulated a number of different PIT tag detection files, in a variety of formats. We have saved them all in the same folder, and used some naming convention to indicate what format they are. Files from PTAGIS have “PTAGIS” in the file name somewhere, Biologic csv files have “BIOLOGIC” in the file name somewhere, and we assume the other files are raw files (with a mixture of .xlsx, .log and .txt file extensions).

For raw detection files, which usually come from a single reader, the file does not contain information about what site code those observations come from. The user can add that themselves within R, or the readCTH() function will assume the site code is the first part of the file name before the first underscore, _. Therefore, if the user adopts a naming convention that includes the site code at the beginning of every raw detection file, PITcleanr will assign the correct site code. Otherwise, the user can overwrite the site codes manually after running readCTH().

We set the name and path of the folder with all the detection data (detection_folder), and used the following script to compile a data.frame of various file names and file types. A user can set the detection_folder path to point to where they have stored all their detection files.

detection_folder = system.file("extdata", 
                                "non_ptagis",
                                package = "PITcleanr",
                                mustWork = TRUE) |> 
  paste0("/")

file_df <- tibble(file_nm = list.files(detection_folder)) |> 
  filter(str_detect(file_nm, "\\.")) |> 
  mutate(file_type = if_else(str_detect(file_nm, "BIOLOGIC"),
                             "Biologic_csv",
                             if_else(str_detect(file_nm, "PTAGIS"),
                                     "PTAGIS",
                                     "raw")))

file_df
#> # A tibble: 5 × 2
#>   file_nm                                file_type   
#>   <chr>                                  <chr>       
#> 1 0LL_tagobs_BIOLOGIC_07302022.csv       Biologic_csv
#> 2 NODENAME_01_00439.log                  raw         
#> 3 PTAGIS_lemhi_remainingsites_012023.csv PTAGIS      
#> 4 SUB2_10.20.22.xlsx                     raw         
#> 5 TESTSITE_test.txt                      raw

Reading in Detections

The following script uses the path of the detection_folder, the various file names inside that folder, the file type associated with each file, and the readCTH() function to read all the detections into R and consolidate them into a single data.frame, all_obs. It contains a quick check (try and "try-error") to ensure that if any particular file has trouble being read, the others are still included.

# read them all in
all_obs <-
  file_df |> 
  mutate(obs_df = map2(file_nm,
                       file_type,
                       .f = function(x, y) {
                         try(readCTH(cth_file = paste0(detection_folder, x), 
                                     file_type = y))
                       })) |> 
  mutate(cls_obs = map_chr(obs_df, .f = function(x) class(x)[1])) |> 
  filter(cls_obs != "try-error") |> 
  select(-cls_obs) |> 
  unnest(obs_df) |> 
  distinct()

Alternatively, the user can run readCTH() on each detection file, saving each R object separately, and then use bind_rows() to merge all the detections together. One reason to do this might be to manipulate or filter certain files before merging them.

The readCTH() also contains an argument to filter out test tags, so their detection is not included in the results. This option involves setting the test_tag_prefix argument to the initial alphanumeric characters in the test tags. By default, this is set to “3E7”. If the user wishes to keep all test tag detections, set test_tag_prefix = NA. If the user wishes to pull out only test tag detections, they can use the function readTestTag() instead of readCTH(), and set the test_tag_prefix to the appropriate code (e.g. “3E7”).