When dealing with detections of individual tags, the user is often
interested in which locations are connected to which other locations
along the stream network. One way to capture this information is through
the construction of a parent-child table describing the “relationships”
among locations. In a parent-child table, each row consists of a parent
location, and a child location that is connected directly to that parent
location. By default, PITcleanr
assigns parent-child
relationships as moving in an upstream direction, so a single parent may
have multiple child locations upstream, if the stream network branches
upstream of it. However, each child should only have a single parent, as
we are assuming a lack of looped connections in our stream network. If
the user is interested in a downstream parent-child relationship, the
parent
and child
designations in the table can
be manually switched. As an example, assuming only upstream movement, a
weir may be considered a parent and each of its next upstream arrays
considered children. A location with no detection sites further upstream
has no children, but is presumably the child of a downstream location.
All of the parent-child relationships among locations in a watershed can
describe the potential movements by an individual tag (moving from
parent to child, to the next child, etc.).
For the Wenatchee River example, the parent-child table looks like this:
parent_child
#> # A tibble: 14 × 2
#> parent child
#> <chr> <chr>
#> 1 TUM CHL
#> 2 CHL CHU
#> 3 TUM CHW
#> 4 TUM ICL
#> 5 ICL ICM
#> 6 ICM ICU
#> 7 ICL LNF
#> 8 TUM PES
#> 9 PES PEU
#> 10 TUM UWE
#> 11 UWE LWN
#> 12 UWE NAL
#> 13 NAL NAU
#> 14 UWE WTL
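Each child has a single parent, so the potential movement path to any site can be recovered by walking from that site back to the root one parent at a time. The base-R sketch below shows this using the 14 rows printed above. The helper `trace_path()` is hypothetical and is not part of PITcleanr. It assumes the table contains no loops, as the text does.

```r
# Parent-child pairs copied from the Wenatchee table printed above
parent_child_df <- data.frame(
  parent = c("TUM", "CHL", "TUM", "TUM", "ICL", "ICM", "ICL",
             "TUM", "PES", "TUM", "UWE", "UWE", "NAL", "UWE"),
  child  = c("CHL", "CHU", "CHW", "ICL", "ICM", "ICU", "LNF",
             "PES", "PEU", "UWE", "LWN", "NAL", "NAU", "WTL"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: walk from a site back to the root by repeatedly
# looking up its single parent, prepending each parent to the path
trace_path <- function(site, pc) {
  path <- site
  repeat {
    parent <- pc$parent[pc$child == path[1]]
    if (length(parent) == 0) break  # no parent: the root has been reached
    path <- c(parent[1], path)
  }
  path
}

trace_path("NAU", parent_child_df)
#> [1] "TUM" "UWE" "NAL" "NAU"

# Sites with no children are the ends of the possible paths
leaves <- setdiff(parent_child_df$child, parent_child_df$parent)
vapply(leaves,
       function(s) paste(trace_path(s, parent_child_df), collapse = " -> "),
       character(1))
```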
PITcleanr
can plot these relationships graphically,
showing the relationships between parent and child sites and which ones
are connected along a single “path”. This is done using the
plotNodes()
function.
A user can construct a parent-child table by hand, using a .csv file
with column names parent and child. Each line in the figure above is
represented by one row in the parent-child table listing the parent
site and child site. If the user is interested in upstream movement,
the parent will be the downstream site, and every child will have a
single parent (although a parent may have multiple child sites). If the
interest is in downstream movement, then the parent will be the
upstream site.
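The hand-built route can be sketched in base R as follows. A few Wenatchee pairs are entered by hand, written to a temporary .csv with columns parent and child, read back, and checked against the single-parent rule. The columns are then swapped for a downstream view. The file location and object names are illustrative assumptions.

```r
# A few upstream parent-child pairs from the Wenatchee example, entered by hand
pc_upstream <- data.frame(
  parent = c("TUM", "CHL", "TUM", "ICL"),
  child  = c("CHL", "CHU", "ICL", "ICM"),
  stringsAsFactors = FALSE
)

# Save to, and read back from, a .csv with columns "parent" and "child"
pc_file <- tempfile(fileext = ".csv")
write.csv(pc_upstream, pc_file, row.names = FALSE)
pc_read <- read.csv(pc_file, stringsAsFactors = FALSE)

# For upstream movement, every child should appear exactly once
stopifnot(!any(duplicated(pc_read$child)))

# For downstream movement, swap the columns so the upstream site is the parent;
# in a network without loops, each parent then has exactly one child
pc_downstream <- data.frame(parent = pc_read$child,
                            child  = pc_read$parent,
                            stringsAsFactors = FALSE)
stopifnot(!any(duplicated(pc_downstream$parent)))
```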
When dealing with a large number of sites and many possible
connections, it can be useful to take advantage of some of
PITcleanr's functions to construct a parent-child table.
These functions include:
extractSites(): based on the complete tag history (either a file path
and name, or the result of readCTH()), pulls out which sites had
detections. If sites are not in PTAGIS, a configuration file with
latitudes and longitudes should be supplied.
queryFlowlines(): using an sf point object of sites, queries the
NHDPlusv2 stream layers that connect those sites.
buildParentChild(): based on the output from extractSites() and
queryFlowlines(), constructs the parent-child table using information
in the NHDPlusv2 layer about which hydrosequences are downstream of one
another.
PITcleanr constructs the parent-child relationship by joining a spatial
(sf) point object of sites with the flowlines queried via
queryFlowlines(). The NHDPlus layer that is returned contains a unique
identifier, or hydrosequence, for every segment, as well as the
identifiers of the hydrosequences immediately upstream and downstream.
Using this information, PITcleanr can identify the next downstream site
from every known location (using the findDwnstrmSite() function), and
thus construct the parent-child table through the buildParentChild()
function. By default, buildParentChild() returns a tibble identifying
every parent-child pair, as well as the hydrosequence joined to the
parent and child locations. If the argument add_rkm is set to TRUE,
PITcleanr will query the PTAGIS metadata again and attach the river
kilometer (or rkm) for each parent and child location. If the sites are
not in PTAGIS, the user can join any attributes they wish using their
own configuration file.
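The idea of finding the next downstream site by following hydrosequences can be illustrated with a toy example. The sketch below is not PITcleanr's findDwnstrmSite() code. The column names `hydroseq` and `dn_hydroseq`, the numeric values, and the site codes A to D are all hypothetical. A downstream value of 0 is assumed to mean "no downstream segment". Starting one segment below each site, the sketch steps downstream until it reaches a segment that holds another site. Two sites snapped to the same segment, as happens with ICL and LNF in this example, are not distinguished by this approach and need manual editing.

```r
# Toy flowline network (hypothetical values): each segment records the
# hydrosequence of the segment immediately downstream (0 = none)
flowlines_toy <- data.frame(
  hydroseq    = c(10, 20, 30, 40, 50),
  dn_hydroseq = c( 0, 10, 20, 20, 40)
)
# Toy sites snapped to flowline segments (hypothetical)
sites_toy <- data.frame(
  site_code = c("A", "B", "C", "D"),
  hydroseq  = c(10, 30, 40, 50),
  stringsAsFactors = FALSE
)

# Hypothetical helper: step downstream until a segment holding a site is found
find_downstream_site <- function(site, sites, flowlines) {
  current <- sites$hydroseq[sites$site_code == site]
  repeat {
    current <- flowlines$dn_hydroseq[flowlines$hydroseq == current]
    if (length(current) == 0 || current == 0) return(NA_character_)
    hit <- sites$site_code[sites$hydroseq == current]
    if (length(hit) > 0) return(hit[1])
  }
}

# Each site's parent (for upstream movement) is its nearest downstream site
pc_toy <- data.frame(
  parent = vapply(sites_toy$site_code, find_downstream_site, character(1),
                  sites = sites_toy, flowlines = flowlines_toy,
                  USE.NAMES = FALSE),
  child  = sites_toy$site_code,
  stringsAsFactors = FALSE
)
pc_toy
```

Site A has no downstream site, so it gets an NA parent, much like the root site TUM in the tables in this section.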
cth_file = system.file("extdata",
"TUM_chnk_cth_2018.csv",
package = "PITcleanr",
mustWork = TRUE)
cth_df <- readCTH(cth_file)
The queryPtagisMeta() and buildConfig() functions in PITcleanr return
information from all INT and MRR sites in PTAGIS. However, the user may
only be interested in detections from site codes found within their
complete tag history output, e.g., cth_file. The extractSites()
function does just that: it extracts the site codes found in the
complete tag history. In addition, the detections can be filtered by a
minimum and/or maximum detection date, and the results are returned as
either a tibble or a simple features (sf) object. Setting the min_date
argument could be useful if the user is not interested in detections
at sites prior to their study period, e.g., detections that occur
before fish arrive at the tagging or release location.
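Conceptually, filtering by a minimum date and keeping each detected site once looks like the base-R sketch below. The detection records, tag codes, and column names are hypothetical and much simpler than a real complete tag history. This sketch treats the cutoff as inclusive; how extractSites() handles the boundary day is not stated here.

```r
# Hypothetical detection records (real complete tag histories have more columns)
detections <- data.frame(
  tag_code  = c("TAG1", "TAG1", "TAG2", "TAG2", "TAG3"),
  site_code = c("LWE", "TUM", "TUM", "UWE", "CHL"),
  obs_date  = c("20180415", "20180603", "20180610", "20180702", "20180720"),
  stringsAsFactors = FALSE
)

# Dates written as "YYYYMMDD", the same form used for min_date in this section
min_date <- as.Date("20180501", format = "%Y%m%d")
detections$obs_date <- as.Date(detections$obs_date, format = "%Y%m%d")

# Keep detections on or after min_date, then list each site once
sites_after <- unique(detections$site_code[detections$obs_date >= min_date])
sites_after
#> [1] "TUM" "UWE" "CHL"
```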
In this example, we create a new object, sites_sf, return it as an sf
object (by setting as_sf = TRUE), and only return sites from detections
that occurred after May 1, 2018. We also keep only sites within the
Wenatchee subbasin, drop MRR sites and the LWE site (which we perhaps
don't care about), and recode TUF as TUM. More information on simple
features (sf objects) can be found here.
library(dplyr)
library(stringr)

sites_sf <- extractSites(cth_file,
                         as_sf = TRUE,
                         min_date = "20180501")

# focus on sites within Wenatchee subbasin, and drop a few sites we don't care about
sites_sf <- sites_sf |>
  # all sites in the Wenatchee have a river kilometer that starts with 754
  filter(str_detect(rkm, "^754."),
         type != "MRR",
         site_code != "LWE") |>
  # treat site TUF as TUM
  mutate(site_code = recode(site_code, "TUF" = "TUM"))
The user could create their own sf object of detection sites, either by
hand in R or by using GIS software to create a shapefile or geopackage
that can then be read into R using the st_read() function in the sf
package. If the user chooses this path, the file must contain at least
a column called site_code, whose values should match the site codes
found in the configuration file. Other columns are optional. The
extractSites() function returns an sf object with the following columns
(gleaned from the configuration file):
site_code
site_name
site_type
type
rkm
site_description
extractSites()
will also accept a configuration file as
an argument, if the user wants to pass one of their own in. Such a file
should contain all the columns listed above.
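Before passing in a user-supplied configuration, it may help to confirm that it has every column listed above. The helper `missing_config_cols()` below is hypothetical and is not part of PITcleanr. It simply reports which of those column names are absent from a data frame. The small `my_config` table is illustrative only.

```r
# Columns that extractSites() returns, as listed above
required_cols <- c("site_code", "site_name", "site_type", "type", "rkm",
                   "site_description")

# Hypothetical helper: which required columns does a configuration lack?
missing_config_cols <- function(config, required = required_cols) {
  setdiff(required, names(config))
}

# Illustrative configuration with only two of the required columns
my_config <- data.frame(site_code = c("TUM", "UWE"),
                        type      = c("INT", "INT"),
                        stringsAsFactors = FALSE)
missing_config_cols(my_config)
#> [1] "site_name"        "site_type"        "rkm"              "site_description"
```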
The user may also be interested in getting the flowlines (i.e., the
stream or river network) for their sites of interest. PITcleanr
provides the function queryFlowlines() to accomplish that.
queryFlowlines() downloads an NHDPlus v2 stream layer from USGS using
the suggested nhdplusTools R package. It requires the spatial locations
of sites as an sf object (such as the output of extractSites()), and a
site code identified as the “root” site. The root site might correspond
with your tagging or release location and is provided to the
root_site_code argument.
The function starts from the root_site_code site and downloads all
flowlines upstream from there, with a minimum stream order set by
min_strm_order. If there are sites downstream of the root_site_code
site in the user's site list, downstream flowlines will also be
downloaded. By default, the upstream and downstream flowlines will be
combined into a single sf object. However, if the user would like to
keep them separated, they can set the argument combine_up_down to
FALSE, and the downstream flowlines will be returned as a separate
element.
The queryFlowlines() function returns a list consisting of:
flowlines: the flowlines upstream of the root_site_code (and possibly
the downstream ones as well)
basin: the polygon containing the upstream flowlines
The default option (combine_up_down = TRUE) combines the flowlines
upstream and downstream of the root site. If the user sets
combine_up_down = FALSE, the function will return a third element in
the list:
dwn_flowlines: the flowlines downstream of the root_site_code.
Depending on the spatial extent of your flowlines, the queryFlowlines()
function may take a while. More information on the nhdplusTools R
package can be found here.
# query the flowlines
nhd_list = queryFlowlines(sites_sf = sites_sf,
root_site_code = "TUM",
min_strm_order = 2)
# extract the flowlines (upstream and downstream are combined by default)
flowlines = nhd_list$flowlines
To visualize the sites and stream, the user can make a plot, such as the one in the figure below.
library(ggplot2)
ggplot() +
geom_sf(data = flowlines,
aes(color = as.factor(streamorde),
size = streamorde)) +
scale_color_viridis_d(direction = -1,
option = "D",
name = "Stream\nOrder",
end = 0.8) +
scale_size_continuous(range = c(0.2, 1.2),
guide = 'none') +
geom_sf(data = nhd_list$basin,
fill = NA,
lwd = 2) +
geom_sf(data = sites_sf,
size = 4,
color = "black") +
ggrepel::geom_label_repel(
data = sites_sf,
aes(label = site_code,
geometry = geometry),
size = 2,
stat = "sf_coordinates",
min.segment.length = 0,
max.overlaps = 50
) +
theme_bw() +
theme(axis.title = element_blank())
Once the user has an sf object with the spatial locations of their
sites and an NHDPlusv2 layer of flowlines, they can use the
buildParentChild() function to construct a parent-child table.
parent_child = buildParentChild(sites_sf,
flowlines)
#> Error in dplyr::pull(., Hydroseq) :
#> Caused by error:
#> ! object 'Hydroseq' not found
parent_child
#> # A tibble: 12 × 4
#> parent child parent_hydro child_hydro
#> <chr> <chr> <dbl> <dbl>
#> 1 NA TUM NA 50013200
#> 2 NA LNF NA 50016572
#> 3 NA ICL NA 50016572
#> 4 NA ICM NA 50017117
#> 5 NA UWE NA 50018330
#> 6 NA LWN NA 50021266
#> 7 NA CHL NA 50022268
#> 8 NA CHU NA 50023868
#> 9 NA WTL NA 50026363
#> 10 NA NAL NA 50028949
#> 11 NA NAU NA 50034395
#> 12 NA CHW NA 50045832
After initially building a parent-child table, there is usually some editing that needs to happen. This is necessary for a variety of reasons we’ve observed:
For these reasons (or any others), PITcleanr provides a function to
edit the parent-child table, editParentChild(). It requires a list
(fix_list) with one element for each row to be fixed. Each element of
this list is a vector of length 3, where the first two elements contain
the parent and child locations to be edited, and the third element is
the new (correct) parent location. Because each child has a single
parent in the table, this is enough information to uniquely target
individual rows of the parent-child table.
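To show why a (parent, child, new parent) triple is enough to target one row, the base-R sketch below applies a fix_list to a small data frame. The helper `apply_fix_list()` is hypothetical and is not editParentChild()'s implementation. The three rows are the ones implied by the fix_list used in the editParentChild() example in this section. In this sketch, an NA in the first position matches a row with a missing parent.

```r
# Hypothetical helper illustrating the fix_list idea (not PITcleanr's code):
# each element is c(current_parent, child, new_parent)
apply_fix_list <- function(pc, fix_list) {
  for (fix in fix_list) {
    old_parent <- fix[1]
    same_parent <- if (is.na(old_parent)) is.na(pc$parent) else
      !is.na(pc$parent) & pc$parent == old_parent
    row <- which(same_parent & pc$child == fix[2])
    # a single parent per child means a valid fix matches exactly one row
    if (length(row) != 1) stop("fix does not match exactly one row: ",
                               paste(fix, collapse = ", "))
    pc$parent[row] <- fix[3]
  }
  pc
}

pc <- data.frame(parent = c(NA, NA, "PES"),
                 child  = c("PES", "LNF", "ICL"),
                 stringsAsFactors = FALSE)
fixed <- apply_fix_list(pc, list(c(NA, "PES", "TUM"),
                                 c(NA, "LNF", "ICL"),
                                 c("PES", "ICL", "TUM")))
fixed$parent
#> [1] "TUM" "ICL" "TUM"
```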
The user can also switch parent-child pairs, making the parent the
child and vice versa, using the switch_parent_child argument. This is
primarily intended to fix relationships between a root site and the
initial downstream sites. Because the parent-child table is built
assuming upstream movement by default, this argument is useful if the
user would like to incorporate downstream movement from the root site
to a downstream location. However, it will not “fix” associated
parent-child relationships involving the locations in the
switch_parent_child list; those must be fixed through the fix_list
argument.
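Switching a pair only swaps the two columns of the named row. Any other rows involving those sites are left alone, which is why they still need the fix_list. The helper `switch_pairs()` below is hypothetical and is not PITcleanr's code. It uses the ("ICL", "TUM") pair from the editParentChild() example in this section.

```r
# Hypothetical helper illustrating the switch idea (not PITcleanr's code):
# each element is c(parent, child) naming a row whose columns are swapped
switch_pairs <- function(pc, switch_list) {
  for (pair in switch_list) {
    row <- which(pc$parent == pair[1] & pc$child == pair[2])
    old_parent <- pc$parent[row]
    pc$parent[row] <- pc$child[row]
    pc$child[row] <- old_parent
  }
  pc
}

pc <- data.frame(parent = c("ICL", "ICL"),
                 child  = c("TUM", "LNF"),
                 stringsAsFactors = FALSE)
sw <- switch_pairs(pc, list(c("ICL", "TUM")))
# only the ICL -> TUM row is reversed; the ICL -> LNF row is unchanged
sw
```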
A good place to start is often to examine the current parent-child
relationships using the plotNodes() function. This can help visually
identify pathways that need editing.
plotNodes(parent_child)
From the figure above, the original parent-child table has some
problems with two sites (ICL and PES) downstream of the root site, TUM.
In addition, either the flowlines are inaccurate near the LNF site or
the spatial location of that site is incorrect. We would like to make
TUM the parent of both ICL and PES, and ICL the parent of LNF. All of
these corrections are implemented below using the editParentChild()
function.
parent_child = editParentChild(parent_child,
                               fix_list = list(c(NA, "PES", "TUM"),
                                               c(NA, "LNF", "ICL"),
                                               c("PES", "ICL", "TUM")),
                               switch_parent_child = list(c("ICL", "TUM")))
# view corrected parent_child table
parent_child
#> # A tibble: 12 × 4
#> parent child parent_hydro child_hydro
#> <chr> <chr> <dbl> <dbl>
#> 1 ICL LNF 50016572 50016572
#> 2 NA TUM NA 50013200
#> 3 NA ICL NA 50016572
#> 4 NA ICM NA 50017117
#> 5 NA UWE NA 50018330
#> 6 NA LWN NA 50021266
#> 7 NA CHL NA 50022268
#> 8 NA CHU NA 50023868
#> 9 NA WTL NA 50026363
#> 10 NA NAL NA 50028949
#> 11 NA NAU NA 50034395
#> 12 NA CHW NA 50045832
If the configuration file contains multiple nodes for some sites
(e.g., a node for each array at a site), then the parent-child table can
be expanded to accommodate these nodes using the addParentChildNodes()
function. The function essentially “expands” the existing parent-child
table by adding rows for those additional nodes. Note: the
addParentChildNodes() function assumes that the parent-child table is
arranged so that children are upstream of parents, and that nodes
designated _U are upstream of those designated _D.
Currently, the function can only handle up to two nodes at each site.
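A minimal base-R sketch of the expansion idea follows. It assumes upstream ordering and at most two nodes per site, named `<site>_D` and `<site>_U`. An edge into a two-node site is routed to its _D node, an edge out of it leaves from its _U node, and an internal _D to _U row is added. This matches the ICL_D/ICL_U/LNF rows shown in the output of this section, but the helper `expand_nodes()` is hypothetical and is not addParentChildNodes()'s implementation.

```r
# Hypothetical helper illustrating node expansion (not PITcleanr's code).
# Assumes children are upstream of parents and at most two nodes per site.
expand_nodes <- function(pc, two_node_sites) {
  is_two <- function(x) !is.na(x) & x %in% two_node_sites
  # an edge into a two-node site arrives at its downstream (_D) node ...
  pc$child  <- ifelse(is_two(pc$child),  paste0(pc$child, "_D"), pc$child)
  # ... and an edge out of it leaves from its upstream (_U) node
  pc$parent <- ifelse(is_two(pc$parent), paste0(pc$parent, "_U"), pc$parent)
  # add the internal _D -> _U link for each two-node site
  internal <- data.frame(parent = paste0(two_node_sites, "_D"),
                         child  = paste0(two_node_sites, "_U"),
                         stringsAsFactors = FALSE)
  rbind(internal, pc)
}

pc <- data.frame(parent = c("TUM", "ICL"),
                 child  = c("ICL", "LNF"),
                 stringsAsFactors = FALSE)
expand_nodes(pc, two_node_sites = "ICL")
```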
Here, we use the addParentChildNodes() function on our existing
parent_child table, and provide our configuration tibble to the
configuration argument to expand the table. The results are saved to a
new object, parent_child_nodes.
# read in configuration file (read_csv() comes from the readr package)
library(readr)
my_configuration <- system.file("extdata",
                                "TUM_configuration.csv",
                                package = "PITcleanr",
                                mustWork = TRUE) |>
  read_csv(show_col_types = FALSE)
# expand the parent-child table to include nodes
parent_child_nodes = addParentChildNodes(parent_child,
configuration = my_configuration)
# view expanded parent-child table
parent_child_nodes
#> # A tibble: 21 × 4
#> parent child parent_hydro child_hydro
#> <chr> <chr> <dbl> <dbl>
#> 1 ICL_D ICL_U 50016572 50016572
#> 2 ICL_U LNF 50016572 50016572
#> 3 ICM_D ICM_U 50017117 50017117
#> 4 LWN_D LWN_U 50021266 50021266
#> 5 CHL_D CHL_U 50022268 50022268
#> 6 CHU_D CHU_U 50023868 50023868
#> 7 WTL_D WTL_U 50026363 50026363
#> 8 NAL_D NAL_U 50028949 50028949
#> 9 NAU_D NAU_U 50034395 50034395
#> 10 CHW_D CHW_U 50045832 50045832
#> # ℹ 11 more rows