Reading PIT Tag Data Into R

Introduction

Raw PIT tag detection data contains a record of every detection of a tag code on an individual antenna, with the associated time stamp. This data is typically queried in one of two ways: 1) all of the tag codes detected at a site or antenna, or 2) all of the detections of a particular tag code across a variety of sites or antennas. Whichever type of query is used, the user is generally dealing with one row of data for every single detection, identified by tag code, site, antenna and date/time (detections may contain additional data as well).

This data may be stored in a variety of formats or databases. Often these can be broken out into data stored in PTAGIS (The Columbia Basin PIT Tag Information System), and data stored in non-PTAGIS databases (e.g. BioLogic) or local formats (e.g. .xlsx, .txt, .log). One of the first steps in analyzing this data is to read it into R and standardize the format. PITcleanr contains a workhorse function, readCTH() (for “read complete tag history”), that can accomplish this across many data formats.

PTAGIS Data

The Columbia Basin PIT Tag Information System (PTAGIS) is the centralized regional database for PIT-tag detections within the Columbia River Basin. It contains a record of each detection of every PIT tag, including the initial detection, or “mark”, when the tag is implanted in the fish, detections on PIT-tag antennas, recaptures (e.g. at weirs) and recoveries (e.g. carcass surveys). Because every individual detection is recorded, a tag may appear many times on the same antenna (e.g. when the fish is not moving). Querying PTAGIS for all of these detections therefore yields a wealth of data, which can be unwieldy for the user. PITcleanr aims to compress that data to a more manageable size, without losing any of the information contained in that dataset.

Complete Capture History

PITcleanr starts with a complete capture history query from PTAGIS for a select group of tags of interest. The user will need to compile this list of tags themselves, ideally in a .txt file with one row per tag number, to make it easy to upload to a PTAGIS query.
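As a minimal sketch of compiling such a file, assume the tag codes are already available in R as a character vector. The object name my_tags and the output file name are hypothetical; the three codes shown are taken from the example tag list shipped with PITcleanr.

```r
# hypothetical vector of tag codes of interest
my_tags <- c("3DD.00777C5CEC",
             "3DD.00777C5E34",
             "3DD.00777C7728")

# write one tag code per row, with no header
# (a temporary folder is used here; point this at any folder you like)
tag_file <- file.path(tempdir(), "my_tag_list.txt")
writeLines(unique(my_tags), tag_file)

# read the file back to confirm its contents
readLines(tag_file)
```

Calling unique() before writing guards against uploading the same tag code twice.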

For convenience, we’ve included one such file with PITcleanr, which is saved to the user’s computer when PITcleanr is installed. The file, “TUM_chnk_tags_2018.txt”, contains tag IDs for Chinook salmon adults implanted with PIT tags at Tumwater Dam in 2018. The following code can be used to find the path to this example file. The user can use this as a template for creating their own tag list as well.

system.file("extdata", 
            "TUM_chnk_tags_2018.txt", 
            package = "PITcleanr",
            mustWork = TRUE)

The example file of tag codes is very simple:

#> # A tibble: 1,406 × 1
#>    X1            
#>    <chr>         
#>  1 3DD.00777C5CEC
#>  2 3DD.00777C5E34
#>  3 3DD.00777C7728
#>  4 3DD.00777C8493
#>  5 3DD.00777C9185
#>  6 3DD.00777C91C3
#>  7 3DD.00777CE6B8
#>  8 3DD.00777CEB31
#>  9 3DD.00777CEF93
#> 10 3DD.00777CEFC1
#> # ℹ 1,396 more rows

Once the user has created their own tag list, or located this example one, they can go to the PTAGIS homepage to query the complete tag histories for those tags. The complete tag history query is available under Advanced Reporting, which requires a free account from PTAGIS. From the homepage, click on “Login/Register”, and either log in to an existing account, or click “Create a new account” to create one. Once logged in, scroll down the dashboard page to the Advanced Reporting Links section. PTAGIS allows users to save reports/queries to be run again. For users who plan to use PITcleanr more than once, it saves a lot of time to build the initial query and then save it into the user’s PTAGIS account. It is then available through the “My Reports” link. To create a new query, click on “Query Builder”, or on “Advanced Reporting Home Page” and then “Create Query Builder2 Report”. From here, choose “Complete Tag History” from the list of possible reports.

There are several query indices on the left side of the query builder, but for the purposes of PITcleanr only a few are needed. First, under “1 Select Attributes” the following attributes are required to work with PITcleanr:

Tag
Event Site Code
Event Date Time
Antenna
Antenna Group Configuration

The next group of attributes is not required, but is highly recommended:

Mark Species
Mark Rear Type
Event Type
Event Site Type
Event Release Site Code
Event Release Date Time

Simply move these attributes over from the “Available” column to the “Selected:” column on the page by selecting them and clicking on the right arrow between the “Available” and “Selected” boxes. Other fields of interest to the user may be included as well (e.g. Event Length), and will be included as extra columns in the query output.

The only other required index is “2 Select Metrics”, but that can remain as the default, “CTH Count”, which provides one record for each event recorded per tag.

Set up a filter for specific tags by next navigating to the “27 Tag Code - List or Text File” on the left. After selecting “Tag” under “Attributes:”, click on “Import file…”. Simply upload the .txt file containing your PIT tag codes of interest, or alternatively, feel free to use the “TUM_chnk_tags_2018.txt” file provided with PITcleanr. After choosing the file, click on “Import” and the tag list will be loaded (delimited by semi-colons). Click “OK”.

Under “Report Message Name:” near the bottom, name the query something appropriate, such as “TUM_chnk_cth_2018”, and select “Run Report”. Once the query has successfully completed, the output can be exported as a .csv file (e.g. “TUM_chnk_cth_2018.csv”). Simply click on the “Export” icon near the top, which will open a new page, and select the default settings:

Export: Whole report
CSV file format
Export Report Title: unchecked
Export filter details: unchecked
Remove extra column: Yes

Then click “Export” again.

PITcleanr includes several example files to help users understand the appropriate format of certain files, and to provide demonstrations of various functions. The system.file() function returns the full path to a subdirectory or file contained within a given package. One such example file is the PTAGIS output for the Tumwater Chinook tags from 2018. Using code similar to that shown earlier for the tag list, the user can set the file path to this file, and store it as a new object, ptagis_file.

ptagis_file = system.file("extdata", 
                          "TUM_chnk_cth_2018.csv",
                          package = "PITcleanr",
                          mustWork = TRUE)

Alternatively, if the user has run a query from PTAGIS as described above, they could set ptagis_file to the path and file name of the .csv they downloaded.

# As an example, set path to PTAGIS query output
ptagis_file = "C:/Users/USER_NAME_HERE/Downloads/TUM_chnk_cth_2018.csv"

Note that in our example file, there are 13501 detections (rows) for 1406 unique tags, matching the number of tags in our example tag list “TUM_chnk_tags_2018.txt”. For some of those tags, in our case 221, there is only a “Mark” detection (i.e. that tag was never detected again after the fish was tagged and released). Many of the remaining tags were detected repeatedly at the same site, and sometimes on the same antenna. Data like this, while full of information, can be difficult to analyze efficiently. To illustrate, here is an example of some of the raw data for a single tag:

Tag Code	Event Site Code Value	Event Date Time Value	Antenna Id	Antenna Group Configuration Value	Mark Species Name	Mark Rear Type Name	Event Type Name	Event Site Type Description	Event Release Site Code Code	Event Release Date Time Value	Cth Count
3DD.0077767AC6	TUM	2018-06-22 06:40:12	NA	0	Chinook	Hatchery Reared	Mark	Dam	TUMFBY	2018-06-22 06:40:12	1
3DD.0077767AC6	NAU	2018-07-06 22:04:04	44	100	Chinook	Hatchery Reared	Observation	Instream Remote Detection System	NA	NA	1
3DD.0077767AC6	NAU	2018-07-06 22:04:28	42	100	Chinook	Hatchery Reared	Observation	Instream Remote Detection System	NA	NA	1
3DD.0077767AC6	NAU	2018-07-08 22:26:21	45	100	Chinook	Hatchery Reared	Observation	Instream Remote Detection System	NA	NA	1
3DD.0077767AC6	NAL	2018-07-13 21:52:09	66	100	Chinook	Hatchery Reared	Observation	Instream Remote Detection System	NA	NA	1
3DD.0077767AC6	NAL	2018-07-16 00:13:19	66	100	Chinook	Hatchery Reared	Observation	Instream Remote Detection System	NA	NA	1
3DD.0077767AC6	NAU	2018-08-26 00:01:36	46	100	Chinook	Hatchery Reared	Observation	Instream Remote Detection System	NA	NA	1
3DD.0077767AC6	NAU	2018-08-26 00:02:03	41	100	Chinook	Hatchery Reared	Observation	Instream Remote Detection System	NA	NA	1
3DD.0077767AC6	NAU	2018-08-26 10:23:12	41	100	Chinook	Hatchery Reared	Observation	Instream Remote Detection System	NA	NA	1
3DD.0077767AC6	NAU	2018-08-26 10:23:19	44	100	Chinook	Hatchery Reared	Observation	Instream Remote Detection System	NA	NA	1
3DD.0077767AC6	NAU	2018-08-26 15:00:36	45	100	Chinook	Hatchery Reared	Observation	Instream Remote Detection System	NA	NA	1
3DD.0077767AC6	NAU	2018-08-26 15:00:48	43	100	Chinook	Hatchery Reared	Observation	Instream Remote Detection System	NA	NA	1
3DD.0077767AC6	NAU	2018-08-26 15:01:12	46	100	Chinook	Hatchery Reared	Observation	Instream Remote Detection System	NA	NA	1
3DD.0077767AC6	NAU	2018-08-26 15:01:32	42	100	Chinook	Hatchery Reared	Observation	Instream Remote Detection System	NA	NA	1
3DD.0077767AC6	NAU	2018-08-26 17:38:46	44	100	Chinook	Hatchery Reared	Observation	Instream Remote Detection System	NA	NA	1
3DD.0077767AC6	NAU	2018-08-26 19:00:05	46	100	Chinook	Hatchery Reared	Observation	Instream Remote Detection System	NA	NA	1
3DD.0077767AC6	NAU	2018-08-26 19:00:38	42	100	Chinook	Hatchery Reared	Observation	Instream Remote Detection System	NA	NA	1
3DD.0077767AC6	NAU	2018-08-27 16:09:31	43	100	Chinook	Hatchery Reared	Observation	Instream Remote Detection System	NA	NA	1
3DD.0077767AC6	NAU	2018-08-27 16:09:39	46	100	Chinook	Hatchery Reared	Observation	Instream Remote Detection System	NA	NA	1
3DD.0077767AC6	NAU	2018-08-27 16:51:01	44	100	Chinook	Hatchery Reared	Observation	Instream Remote Detection System	NA	NA	1
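Summary counts like those quoted above (total detections, unique tags, and tags with only a “Mark” record) can be computed in a few lines. The sketch below uses a small invented data.frame rather than the real file, and the column names tag_code and event_type_name are assumptions for illustration; substitute whatever column names your data uses.

```r
# toy stand-in for a complete tag history; rows and column names are invented
toy_cth <- data.frame(
  tag_code = c("A", "A", "A", "B", "C", "C"),
  event_type_name = c("Mark", "Observation", "Observation",
                      "Mark",
                      "Mark", "Recapture"),
  stringsAsFactors = FALSE
)

# total number of detections (rows)
n_detections <- nrow(toy_cth)

# number of unique tags
n_tags <- length(unique(toy_cth$tag_code))

# tags whose every record is a "Mark" event, i.e. never detected again
only_mark <- tapply(toy_cth$event_type_name,
                    toy_cth$tag_code,
                    function(x) all(x == "Mark"))
n_mark_only <- sum(only_mark)

c(detections = n_detections, tags = n_tags, mark_only = n_mark_only)
```

In this toy example only tag "B" has a mark record and nothing else.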

PITcleanr provides a function to read in this kind of complete capture history, called readCTH(). This function ensures the column names are consistent for subsequent PITcleanr functions, and provides one function to read in PTAGIS and non-PTAGIS data and return similarly formatted output.

ptagis_cth <- readCTH(cth_file = ptagis_file,
                      file_type = "PTAGIS")

Mark Data File

PITcleanr also allows the user to query PTAGIS for an MRR (mark, recapture and recovery) data file. Many projects are set up to record all the tagging information for an entire season, or part of a season, from a single site in one file, which is uploaded to PTAGIS. This file can be used to determine the list of tag codes a user may be interested in. The queryMRRDataFile() function pulls this information from PTAGIS, using either the XML information contained in P4 files, or the older file structure (a text file with various possible file extensions). The only requirement is the file name. For example, to pull this data for tagging at Tumwater in 2018, use the following code:

tum_2018_mrr <- queryMRRDataFile("NBD-2018-079-001.xml")

Depending on how comprehensive that MRR data file is, a user might filter this data.frame for Spring Chinook by focusing on the species run rear type of “11”, and tags that were not collected for broodstock, or otherwise killed. An example of some of the data contained in MRR files like this is shown below.

# filter() comes from dplyr and str_detect() from stringr, so both must be loaded
tum_2018_mrr |> 
  # filter for Spring Chinook tags
  filter(str_detect(species_run_rear_type, 
                    "^11"),
         # filter out fish removed for broodstock collection
         str_detect(conditional_comments,
                    "BR",
                    negate = TRUE),
         # filter out fish with other mortality ("M" as its own code)
         str_detect(conditional_comments,
                    "[:space:]M[:space:]",
                    negate = TRUE),
         str_detect(conditional_comments,
                    "[:space:]M$",
                    negate = TRUE)) |> 
  slice(1:10)

capture_method	conditional_comments	event_date	event_site	event_type	life_stage	mark_method	mark_temperature	migration_year	organization	pit_tag	release_date	release_site	release_temperature	sequence_number	species_run_rear_type	tagger	text_comments	length	weight	location_rkm_ext	second_pit_tag
LADDER	AD CW MA MT RF	2018-06-04 12:25:20	TUM	Mark	Adult	HAND	15.0	2018	WDFW	3DD.007791C27F	2018-06-04 12:25:20	TUMFBY	15.0	40	11H	HUGHES M	DNA 1	840	67.0	NA	NA
LADDER	MA MT RF	2018-06-04 12:48:45	TUM	Mark	Adult	HAND	15.0	2018	WDFW	3DD.007791D334	2018-06-04 12:48:45	TUMFBY	15.0	41	11W	HUGHES M	DNA 2	810	54.0	NA	NA
LADDER	AD CW MA MT RF	2018-06-05 10:02:49	TUM	Mark	Adult	HAND	15.0	2018	WDFW	3DD.00779071E6	2018-06-05 10:02:49	TUMFBY	15.0	42	11H	HUGHES M	DNA 3	NA	NA	NA	NA
LADDER	AD CW FE MT RF	2018-06-05 10:13:09	TUM	Mark	Adult	HAND	15.0	2018	WDFW	3DD.0077923D3A	2018-06-05 10:13:09	TUMFBY	15.0	43	11H	HUGHES M	DNA 4	NA	NA	NA	NA
LADDER	AD CW FE MT RF	2018-06-06 10:36:29	TUM	Mark	Adult	HAND	15.0	2018	WDFW	3DD.0077925482	2018-06-06 10:36:29	TUMFBY	15.0	46	11H	HUGHES M	DNA 5	NA	NA	NA	NA
LADDER	FE MT RF	2018-06-06 11:02:32	TUM	Mark	Adult	HAND	15.0	2018	WDFW	3DD.007791714B	2018-06-06 11:02:32	TUMFBY	15.0	48	11W	HUGHES M	DNA 7	710	NA	NA	NA
LADDER	AD CW FE MT PC RF	2018-06-06 11:09:53	TUM	Mark	Adult	HAND	15.0	2018	WDFW	3DD.00779103AD	2018-06-06 11:09:53	TUMFBY	15.0	49	11H	HUGHES M	DNA 8	NA	NA	NA	NA
LADDER	AI CW FE MT RF	2018-06-06 11:15:46	TUM	Mark	Adult	HAND	15.0	2018	WDFW	3DD.007791522F	2018-06-06 11:15:46	TUMFBY	15.0	50	11H	HUGHES M	DNA 9	NA	NA	NA	NA
LADDER	AD CW MA MT RF	2018-06-06 11:26:49	TUM	Mark	Adult	HAND	15.0	2018	WDFW	3DD.007791FCA9	2018-06-06 11:26:49	TUMFBY	15.0	52	11H	HUGHES M	DNA 11	NA	NA	NA	NA
LADDER	AI CW FE MT RF	2018-06-06 11:31:34	TUM	Mark	Adult	HAND	15.0	2018	WDFW	3DD.007790FA60	2018-06-06 11:31:34	TUMFBY	15.0	53	11H	HUGHES M	DNA 12	NA	NA	NA	NA
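Once MRR data has been filtered, the remaining tag codes can be written out as a tag list for a PTAGIS query. The following base R sketch assumes a data.frame with the pit_tag, species_run_rear_type and conditional_comments columns shown above; the rows, the object names and the output file name are invented for illustration.

```r
# toy stand-in for an MRR data file; the rows are invented
toy_mrr <- data.frame(
  pit_tag = c("TAG.001", "TAG.002", "TAG.003", "TAG.004"),
  species_run_rear_type = c("11H", "11W", "32W", "11H"),
  conditional_comments = c("AD CW MA", "MA MT RF", "FE MT", "AD BR FE"),
  stringsAsFactors = FALSE
)

# keep Spring Chinook (codes starting with "11") not taken for broodstock
keep <- grepl("^11", toy_mrr$species_run_rear_type) &
  !grepl("BR", toy_mrr$conditional_comments)

tag_list <- toy_mrr$pit_tag[keep]

# save one tag per row, ready to upload to a PTAGIS query
tag_file <- file.path(tempdir(), "toy_tag_list.txt")
writeLines(tag_list, tag_file)
```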

Non-PTAGIS Data

Not all PIT tag data is in PTAGIS. This section will show the user how to read in PIT tag data from a variety of other sources, including BioLogic csv files as well as raw files (e.g. .xlsx, .log and .txt file extensions) downloaded directly from PIT tag readers. The readCTH() function is able to read in all of these formats, but the user must indicate what format each file is in.

There are many ways a user might store their various files, and many ways to script how to read them into R. In this vignette, we will suggest one way to do this, but alternatives certainly exist.

Storing Data Files

In this example, we have accumulated a number of different PIT tag detection files, in a variety of formats. We have saved them all in the same folder, and used a naming convention to indicate what format each one is in. Files from PTAGIS have “PTAGIS” somewhere in the file name, BioLogic csv files have “BIOLOGIC” somewhere in the file name, and we assume the other files are raw files (with a mixture of .xlsx, .log and .txt file extensions).

For raw detection files, which usually come from a single reader, the file does not contain information about what site code those observations come from. The user can add that themselves within R, or the readCTH() function will assume the site code is the first part of the file name before the first underscore, _. Therefore, if the user adopts a naming convention that includes the site code at the beginning of every raw detection file, PITcleanr will assign the correct site code. Otherwise, the user can overwrite the site codes manually after running readCTH().
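To preview which site code a given file name would yield under this convention, the text before the first underscore can be extracted with base R. This is only a sketch of the convention described above, not necessarily how readCTH() implements it internally; the file names are those of the raw example files included with PITcleanr.

```r
# raw detection file names from the PITcleanr example folder
raw_files <- c("NODENAME_01_00439.log",
               "SUB2_10.20.22.xlsx",
               "TESTSITE_test.txt")

# drop the first underscore and everything after it
site_codes <- sub("_.*$", "", basename(raw_files))
site_codes
```

A file name with no underscore would be returned unchanged by this sketch, which is one reason to check the resulting site codes before relying on them.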

We set the name and path of the folder with all the detection data (detection_folder), and used the following script to compile a data.frame of various file names and file types. A user can set the detection_folder path to point to where they have stored all their detection files.

detection_folder = system.file("extdata", 
                                "non_ptagis",
                                package = "PITcleanr",
                                mustWork = TRUE) |> 
  paste0("/")

file_df <- tibble(file_nm = list.files(detection_folder)) |> 
  filter(str_detect(file_nm, "\\.")) |> 
  mutate(file_type = if_else(str_detect(file_nm, "BIOLOGIC"),
                             "Biologic_csv",
                             if_else(str_detect(file_nm, "PTAGIS"),
                                     "PTAGIS",
                                     "raw")))

file_df
#> # A tibble: 5 × 2
#>   file_nm                                file_type   
#>   <chr>                                  <chr>       
#> 1 0LL_tagobs_BIOLOGIC_07302022.csv       Biologic_csv
#> 2 NODENAME_01_00439.log                  raw         
#> 3 PTAGIS_lemhi_remainingsites_012023.csv PTAGIS      
#> 4 SUB2_10.20.22.xlsx                     raw         
#> 5 TESTSITE_test.txt                      raw

Reading in Detections

The following script uses the path of the detection_folder, the various file names inside that folder, the file type associated with each file, and the readCTH() function to read all the detections into R and consolidate them into a single data.frame, all_obs. It contains a quick check (try and "try-error") to ensure that if any particular file has trouble being read, the others are still included.

# read them all in
all_obs <-
  file_df |> 
  mutate(obs_df = map2(file_nm,
                       file_type,
                       .f = function(x, y) {
                         try(readCTH(cth_file = paste0(detection_folder, x), 
                                     file_type = y))
                       })) |> 
  mutate(cls_obs = map_chr(obs_df, .f = function(x) class(x)[1])) |> 
  filter(cls_obs != "try-error") |> 
  select(-cls_obs) |> 
  unnest(obs_df) |> 
  distinct()
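The try() and "try-error" check can be easier to follow in isolation. The sketch below uses a hypothetical reader function, toy_reader(), that deliberately fails for one file; it stands in for readCTH() only to show how failed reads are dropped while the rest are kept.

```r
# hypothetical reader that fails for one file, standing in for readCTH()
toy_reader <- function(file_nm) {
  if (grepl("bad", file_nm)) stop("could not read ", file_nm)
  data.frame(file_nm = file_nm, n_obs = 1L, stringsAsFactors = FALSE)
}

files <- c("good_1.txt", "bad_2.txt", "good_3.txt")

# wrap each call in try() so one failure does not halt the others
res <- lapply(files, function(f) try(toy_reader(f), silent = TRUE))

# keep only the results that are not of class "try-error"
ok <- !vapply(res, inherits, logical(1), what = "try-error")
all_res <- do.call(rbind, res[ok])
all_res
```

Inspecting files[!ok] afterwards shows which files failed and may need a different file type or a closer look.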

Alternatively, the user can run readCTH() on each detection file, saving each R object separately, and then use bind_rows() to merge all the detections together. One reason to do this might be to manipulate or filter certain files before merging them.

The readCTH() function also contains an argument to filter out test tags, so that their detections are not included in the results. To use it, set the test_tag_prefix argument to the initial alphanumeric characters of the test tag codes. By default, this is set to “3E7”. To keep all test tag detections, set test_tag_prefix = NA. To pull out only the test tag detections, use the function readTestTag() instead of readCTH(), and set test_tag_prefix to the appropriate code (e.g. “3E7”).
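Conceptually, filtering on a prefix amounts to checking how each tag code starts. The base R sketch below illustrates that idea on invented data; it is not the internal code of readCTH() or readTestTag(), and the column name tag_code and the "3E7" tag code shown are made up for illustration.

```r
# toy detections; the code starting with "3E7" plays the role of a test tag
toy_obs <- data.frame(
  tag_code = c("3DD.0077767AC6", "3E7.0000001D2B", "3DD.00777C5CEC"),
  stringsAsFactors = FALSE
)
test_tag_prefix <- "3E7"

# TRUE for tag codes that begin with the test tag prefix
is_test <- startsWith(toy_obs$tag_code, test_tag_prefix)

# detections with test tags removed
fish_obs <- toy_obs[!is_test, , drop = FALSE]

# test tag detections only
test_obs <- toy_obs[is_test, , drop = FALSE]
```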

Kevin See