vignettes/read_in_data.Rmd
read_in_data.Rmd
Raw PIT tag detections contain a record of every detection of a tag code on an individual antenna, with the associate time-stamp. This data may be queried in primarily two different ways: 1) all of the tag codes detected at a site or antenna, or 2) all the detections of a particular tag code at a variety of sites or antennas. Regardless of what type of query is used, the user is generally dealing with a row of data for every single detection, identified by tag code, site, antenna and date/time (and detections may contain additional data as well).
This data may be stored in a variety of formats or databases. Often
these can be broken out into data stored in PTAGIS (The Columbia Basin PIT Tag
Information System), and data stored in non-PTAGIS databases
(e.g. BioLogic) or local formats (e.g. .xlsx, .txt, .log). One of the
first steps in analyzing this data is to read it into R and standardize
the format. PITcleanr
contains a workhorse function,
readCTH()
(for “read complete tag history”), that can
accomplish this across many data formats.
The Columbia Basin PIT Tag Information System (PTAGIS) is the centralized regional
database for PIT-tag detections within the Columbia River Basin. It
contains a record of each detection of every PIT tag, including the
initial detection, or “mark”, when the tag is implanted in the fish,
detections on PIT-tag antennas, recaptures (e.g. at weirs) and
recoveries (e.g. carcass surveys). It contains a record of every
individual detection, which means potentially multiple records of a tag
being detected on the same antenna over and over e.g., in the case that
it is not moving. Therefore, querying PTAGIS for all of these detections
leads to a wealth of data, which can be unwieldy for the user.
PITcleanr
aims to compress that data to a more manageable
size, without losing any of the information contained in that
dataset.
PITcleanr
starts with a complete capture history query
from PTAGIS for a select group of tags
of interest. The user will need to compile this list of tags themselves,
ideally in a .txt file with one row per tag number, to make it easy to
upload to a PTAGIS query.
For convenience, we’ve included one such file with
PITcleanr
, which is saved to the user’s computer when
PITcleanr
is installed. The file, “TUM_chnk_tags_2018.txt”,
contains tag IDs for Chinook salmon adults implanted with PIT tags at
Tumwater Dam in 2018. The following code can be used to find the path to
this example file. The user can use this as a template for creating
their own tag list as well.
system.file("extdata",
"TUM_chnk_tags_2018.txt",
package = "PITcleanr",
mustWork = TRUE)
The example file of tag codes is very simple:
#> # A tibble: 1,406 × 1
#> X1
#> <chr>
#> 1 3DD.00777C5CEC
#> 2 3DD.00777C5E34
#> 3 3DD.00777C7728
#> 4 3DD.00777C8493
#> 5 3DD.00777C9185
#> 6 3DD.00777C91C3
#> 7 3DD.00777CE6B8
#> 8 3DD.00777CEB31
#> 9 3DD.00777CEF93
#> 10 3DD.00777CEFC1
#> # ℹ 1,396 more rows
Once the user has created their own tag list, or located this example
one, they can go to the PTAGIS
homepage to query the complete tag histories for those tags. The
complete tag history query is available under Advanced
Reporting, which requires a free account from PTAGIS. From the homepage, click on “Login/Register”,
and either login to an existing account, or click “Create a new account”
to create one. Once logged in, scroll down the dashboard page to the
Advanced Reporting Links section. PTAGIS allows users to save
reports/queries to be run again. For users who plan to utilize
PITcleanr
more than once, it saves a lot of time to build
the initial query and then save it into the user’s PTAGIS account. It is
then available through the “My Reports” link. To create a new query,
click on “Query Builder”, or “Advanced Reporting Home Page” and then
““Create Query Builder2 Report”. From here, choose “Complete Tag
History” from the list of possible reports.
There are several query indices on the left side of the query
builder, but for the purposes of PITcleanr
only a few are
needed. First, under “1 Select Attributes” the following attributes are
required to work with PITcleanr
:
This next group of attributes are not required, but are highly recommended:
Simply move these attributes over from the “Available” column to the “Selected:” column on the page by selecting them and clicking on the right arrow between the “Available” and “Selected” boxes. Other fields of interest to the user may be included as well (e.g. Event Length), and will be included as extra columns in the query output.
The only other required index is “2 Select Metrics”, but that can remain as the default, “CTH Count”, which provides one record for each event recorded per tag.
Set up a filter for specific tags by next navigating to the “27 Tag
Code - List or Text File” on the left. After selecting “Tag” under
“Attributes:”, click on “Import file…”. Simply upload the .txt file
containing your PIT tag codes of interest, or alternatively, feel free
to use the “TUM_chnk_tags_2018.txt” file provided with
PITcleanr
. After choosing the file, click on “Import” and
the tag list will be loaded (delimited by semi-colons). Click “OK”.
Under “Report Message Name:” near the bottom, name the query something appropriate, such as “TUM_chnk_cth_2018”, and select “Run Report”. Once the query has successfully completed, the output can be exported as a .csv file (e.g. “TUM_chnk_cth_2018.csv”). Simply click on the “Export” icon near the top, which will open a new page, and select the default settings:
And click “Export”, again.
PITcleanr
includes several example files to help users
understand the appropriate format of certain files, and to provide
demonstrations of various functions. The system.file
function locates the file path to the subdirectory and file contained
with a certain package. One such example is PTAGIS output for the
Tumwater Chinook tags from 2018. Using similar code
(system.file
), the user can set the file path to this file,
and store it as a new object ptagis_file
.
ptagis_file = system.file("extdata",
"TUM_chnk_cth_2018.csv",
package = "PITcleanr",
mustWork = TRUE)
Alternatively, if the user has run a query from PTAGIS as described
above, they could set ptagis_file
to the path and file name
of the .csv they downloaded.
# As an example, set path to PTAGIS query output
ptagis_file = "C:/Users/USER_NAME_HERE/Downloads/TUM_chnk_cth_2018.csv"
Note that in our example file, there are 13501 detections (rows) for 1406 unique tags, matching the number of tags in our example tag list “TUM_chnk_tags_2018.txt”. For a handful of those tags, in our case 221, there is only a “Mark” detection i.e., that tag was never detected again after the fish was tagged and released. For the remaining tags, many of them were often detected at the same site and sometimes on the same antenna. Data like this, while full of information, can be difficult to analyze efficiently. To illustrate, here is an example of some of the raw data for a single tag:
Tag Code | Event Site Code Value | Event Date Time Value | Antenna Id | Antenna Group Configuration Value | Mark Species Name | Mark Rear Type Name | Event Type Name | Event Site Type Description | Event Release Site Code Code | Event Release Date Time Value | Cth Count |
---|---|---|---|---|---|---|---|---|---|---|---|
3DD.0077767AC6 | TUM | 2018-06-22 06:40:12 | NA | 0 | Chinook | Hatchery Reared | Mark | Dam | TUMFBY | 2018-06-22 06:40:12 | 1 |
3DD.0077767AC6 | NAU | 2018-07-06 22:04:04 | 44 | 100 | Chinook | Hatchery Reared | Observation | Instream Remote Detection System | NA | NA | 1 |
3DD.0077767AC6 | NAU | 2018-07-06 22:04:28 | 42 | 100 | Chinook | Hatchery Reared | Observation | Instream Remote Detection System | NA | NA | 1 |
3DD.0077767AC6 | NAU | 2018-07-08 22:26:21 | 45 | 100 | Chinook | Hatchery Reared | Observation | Instream Remote Detection System | NA | NA | 1 |
3DD.0077767AC6 | NAL | 2018-07-13 21:52:09 | 66 | 100 | Chinook | Hatchery Reared | Observation | Instream Remote Detection System | NA | NA | 1 |
3DD.0077767AC6 | NAL | 2018-07-16 00:13:19 | 66 | 100 | Chinook | Hatchery Reared | Observation | Instream Remote Detection System | NA | NA | 1 |
3DD.0077767AC6 | NAU | 2018-08-26 00:01:36 | 46 | 100 | Chinook | Hatchery Reared | Observation | Instream Remote Detection System | NA | NA | 1 |
3DD.0077767AC6 | NAU | 2018-08-26 00:02:03 | 41 | 100 | Chinook | Hatchery Reared | Observation | Instream Remote Detection System | NA | NA | 1 |
3DD.0077767AC6 | NAU | 2018-08-26 10:23:12 | 41 | 100 | Chinook | Hatchery Reared | Observation | Instream Remote Detection System | NA | NA | 1 |
3DD.0077767AC6 | NAU | 2018-08-26 10:23:19 | 44 | 100 | Chinook | Hatchery Reared | Observation | Instream Remote Detection System | NA | NA | 1 |
3DD.0077767AC6 | NAU | 2018-08-26 15:00:36 | 45 | 100 | Chinook | Hatchery Reared | Observation | Instream Remote Detection System | NA | NA | 1 |
3DD.0077767AC6 | NAU | 2018-08-26 15:00:48 | 43 | 100 | Chinook | Hatchery Reared | Observation | Instream Remote Detection System | NA | NA | 1 |
3DD.0077767AC6 | NAU | 2018-08-26 15:01:12 | 46 | 100 | Chinook | Hatchery Reared | Observation | Instream Remote Detection System | NA | NA | 1 |
3DD.0077767AC6 | NAU | 2018-08-26 15:01:32 | 42 | 100 | Chinook | Hatchery Reared | Observation | Instream Remote Detection System | NA | NA | 1 |
3DD.0077767AC6 | NAU | 2018-08-26 17:38:46 | 44 | 100 | Chinook | Hatchery Reared | Observation | Instream Remote Detection System | NA | NA | 1 |
3DD.0077767AC6 | NAU | 2018-08-26 19:00:05 | 46 | 100 | Chinook | Hatchery Reared | Observation | Instream Remote Detection System | NA | NA | 1 |
3DD.0077767AC6 | NAU | 2018-08-26 19:00:38 | 42 | 100 | Chinook | Hatchery Reared | Observation | Instream Remote Detection System | NA | NA | 1 |
3DD.0077767AC6 | NAU | 2018-08-27 16:09:31 | 43 | 100 | Chinook | Hatchery Reared | Observation | Instream Remote Detection System | NA | NA | 1 |
3DD.0077767AC6 | NAU | 2018-08-27 16:09:39 | 46 | 100 | Chinook | Hatchery Reared | Observation | Instream Remote Detection System | NA | NA | 1 |
3DD.0077767AC6 | NAU | 2018-08-27 16:51:01 | 44 | 100 | Chinook | Hatchery Reared | Observation | Instream Remote Detection System | NA | NA | 1 |
PITcleanr
provides a function to read in this kind of
complete capture history, called readCTH()
. This function
ensures the column names are consistent for subsequent
PITcleanr
functions, and provides one function to read in
PTAGIS and non-PTAGIS data and return similarly formatted output.
ptagis_cth <- readCTH(cth_file = ptagis_file,
file_type = "PTAGIS")
PITcleanr
also allows the user to query PTAGIS for an
MRR data file. Many projects are set up to record all the tagging
information for an entire season, or part of a season from a single site
in one file, which is uploaded to PTAGIS. This file can be used to
determine the list of tag codes a user may be interested in. The
queryMRRDataFile
will pull this information from PTAGIS,
using either the XML information contained in P4 files, or the older
file structure (text file with various possible file extensions). The
only requirement is the file name. For example, to pull this data for
tagging at Tumwater in 2018, use the following code:
tum_2018_mrr <- queryMRRDataFile("NBD-2018-079-001.xml")
Depending on how comprehensive that MRR data file is, a user might filter this data.frame for Spring Chinook by focusing on the species run rear type of “11”, and tags that were not collected for broodstock, or otherwise killed. An example of some of the data contained in MRR files like this is shown below.
tum_2018_mrr |>
# filter for Spring Chinook tags
filter(str_detect(species_run_rear_type,
"^11"),
# filter out fish removed for broodstock collection
str_detect(conditional_comments,
"BR",
negate = T),
# filter out fish with other mortality
str_detect(conditional_comments,
"[:space:]M[:space:]",
negate = T),
str_detect(conditional_comments,
"[:space:]M$",
negate = T)) |>
slice(1:10)
capture_method | conditional_comments | event_date | event_site | event_type | life_stage | mark_method | mark_temperature | migration_year | organization | pit_tag | release_date | release_site | release_temperature | sequence_number | species_run_rear_type | tagger | text_comments | length | weight | location_rkm_ext | second_pit_tag |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
LADDER | AD CW MA MT RF | 2018-06-04 12:25:20 | TUM | Mark | Adult | HAND | 15.0 | 2018 | WDFW | 3DD.007791C27F | 2018-06-04 12:25:20 | TUMFBY | 15.0 | 40 | 11H | HUGHES M | DNA 1 | 840 | 67.0 | NA | NA |
LADDER | MA MT RF | 2018-06-04 12:48:45 | TUM | Mark | Adult | HAND | 15.0 | 2018 | WDFW | 3DD.007791D334 | 2018-06-04 12:48:45 | TUMFBY | 15.0 | 41 | 11W | HUGHES M | DNA 2 | 810 | 54.0 | NA | NA |
LADDER | AD CW MA MT RF | 2018-06-05 10:02:49 | TUM | Mark | Adult | HAND | 15.0 | 2018 | WDFW | 3DD.00779071E6 | 2018-06-05 10:02:49 | TUMFBY | 15.0 | 42 | 11H | HUGHES M | DNA 3 | NA | NA | NA | NA |
LADDER | AD CW FE MT RF | 2018-06-05 10:13:09 | TUM | Mark | Adult | HAND | 15.0 | 2018 | WDFW | 3DD.0077923D3A | 2018-06-05 10:13:09 | TUMFBY | 15.0 | 43 | 11H | HUGHES M | DNA 4 | NA | NA | NA | NA |
LADDER | AD CW FE MT RF | 2018-06-06 10:36:29 | TUM | Mark | Adult | HAND | 15.0 | 2018 | WDFW | 3DD.0077925482 | 2018-06-06 10:36:29 | TUMFBY | 15.0 | 46 | 11H | HUGHES M | DNA 5 | NA | NA | NA | NA |
LADDER | FE MT RF | 2018-06-06 11:02:32 | TUM | Mark | Adult | HAND | 15.0 | 2018 | WDFW | 3DD.007791714B | 2018-06-06 11:02:32 | TUMFBY | 15.0 | 48 | 11W | HUGHES M | DNA 7 | 710 | NA | NA | NA |
LADDER | AD CW FE MT PC RF | 2018-06-06 11:09:53 | TUM | Mark | Adult | HAND | 15.0 | 2018 | WDFW | 3DD.00779103AD | 2018-06-06 11:09:53 | TUMFBY | 15.0 | 49 | 11H | HUGHES M | DNA 8 | NA | NA | NA | NA |
LADDER | AI CW FE MT RF | 2018-06-06 11:15:46 | TUM | Mark | Adult | HAND | 15.0 | 2018 | WDFW | 3DD.007791522F | 2018-06-06 11:15:46 | TUMFBY | 15.0 | 50 | 11H | HUGHES M | DNA 9 | NA | NA | NA | NA |
LADDER | AD CW MA MT RF | 2018-06-06 11:26:49 | TUM | Mark | Adult | HAND | 15.0 | 2018 | WDFW | 3DD.007791FCA9 | 2018-06-06 11:26:49 | TUMFBY | 15.0 | 52 | 11H | HUGHES M | DNA 11 | NA | NA | NA | NA |
LADDER | AI CW FE MT RF | 2018-06-06 11:31:34 | TUM | Mark | Adult | HAND | 15.0 | 2018 | WDFW | 3DD.007790FA60 | 2018-06-06 11:31:34 | TUMFBY | 15.0 | 53 | 11H | HUGHES M | DNA 12 | NA | NA | NA | NA |
Not all PIT tag data is in PTAGIS. This section will show the user
how to read in PIT tag data from a variety of other sources, including
BioLogic csv files as well as raw files (e.g. .xlsx, .log and .txt file
extensions) downloaded directly from PIT tag readers. The
readCTH()
function is able to read in all of these formats,
but the user must indicate what format each file is in.
There are many ways a user might store their various files, and many ways to script how to read them into R. In this vignette, we will suggest one way to do this, but alternatives certainly exist.
In this example, we have accumulated a number of different PIT tag detection files, in a variety of formats. We have saved them all in the same folder, and used some naming convention to indicate what format they are. Files from PTAGIS have “PTAGIS” in the file name somewhere, Biologic csv files have “BIOLOGIC” in the file name somewhere, and we assume the other files are raw files (with a mixture of .xlsx, .log and .txt file extensions).
For raw detection files, which usually come from a single reader, the
file does not contain information about what site code those
observations come from. The user can add that themselves within R, or
the readCTH()
function will assume the site code is the
first part of the file name before the first underscore, _
.
Therefore, if the user adopts a naming convention that includes the site
code at the beginning of every raw detection file,
PITcleanr
will assign the correct site code. Otherwise, the
user can overwrite the site codes manually after running
readCTH()
.
We set the name and path of the folder with all the detection data
(detection_folder
), and used the following script to
compile a data.frame of various file names and file types. A user can
set the detection_folder
path to point to where they have
stored all their detection files.
detection_folder = system.file("extdata",
"non_ptagis",
package = "PITcleanr",
mustWork = TRUE) |>
paste0("/")
file_df <- tibble(file_nm = list.files(detection_folder)) |>
filter(str_detect(file_nm, "\\.")) |>
mutate(file_type = if_else(str_detect(file_nm, "BIOLOGIC"),
"Biologic_csv",
if_else(str_detect(file_nm, "PTAGIS"),
"PTAGIS",
"raw")))
file_df
#> # A tibble: 5 × 2
#> file_nm file_type
#> <chr> <chr>
#> 1 0LL_tagobs_BIOLOGIC_07302022.csv Biologic_csv
#> 2 NODENAME_01_00439.log raw
#> 3 PTAGIS_lemhi_remainingsites_012023.csv PTAGIS
#> 4 SUB2_10.20.22.xlsx raw
#> 5 TESTSITE_test.txt raw
The following script uses the path of the
detection_folder
, the various file names inside that
folder, the file type associated with each file, and the
readCTH()
function to read all the detections into R and
consolidate them into a single data.frame, all_obs
. It
contains a quick check (try
and "try-error"
)
to ensure that if any particular file has trouble being read, the others
are still included.
# read them all in
all_obs <-
file_df |>
mutate(obs_df = map2(file_nm,
file_type,
.f = function(x, y) {
try(readCTH(cth_file = paste0(detection_folder, x),
file_type = y))
})) |>
mutate(cls_obs = map_chr(obs_df, .f = function(x) class(x)[1])) |>
filter(cls_obs != "try-error") |>
select(-cls_obs) |>
unnest(obs_df) |>
distinct()
Alternatively, the user can run readCTH()
on each
detection file, saving each R object separately, and then use
bind_rows()
to merge all the detections together. One
reason to do this might be to manipulate or filter certain files before
merging them.
The readCTH()
also contains an argument to filter out
test tags, so their detection is not included in the results. This
option involves setting the test_tag_prefix
argument to the
initial alphanumeric characters in the test tags. By default, this is
set to “3E7”. If the user wishes to keep all test tag detections, set
test_tag_prefix = NA
. If the user wishes to pull out
only test tag detections, they can use the function
readTestTag()
instead of readCTH()
, and set
the test_tag_prefix
to the appropriate code
(e.g. “3E7”).