hierarchicalClones - Hierarchical clustering-based method for partitioning Ig sequences into clones.
The hierarchicalClones function provides a computational pipeline for assigning Ig
sequences into clonal groups sharing the same V gene, J gene, and junction length, based on
junction sequence similarity.
hierarchicalClones(
  db,
  threshold,
  method = c("nt", "aa"),
  linkage = c("single", "average", "complete"),
  normalize = c("len", "none"),
  junction = "junction",
  v_call = "v_call",
  j_call = "j_call",
  clone = "clone_id",
  first = FALSE,
  cdr3 = FALSE,
  mod3 = FALSE,
  max_n = 0,
  nproc = 1,
  verbose = FALSE,
  log = NULL,
  summarize_clones = TRUE
)
- db: data.frame containing sequence data.
- threshold: a numeric scalar where the tree should be cut (the distance threshold for clonal grouping).
- method: one of "nt" for nucleotide-based clustering or "aa" for amino acid-based clustering.
- linkage: available linkage methods are "single", "average", and "complete".
- normalize: method of normalization. The default is "len", which divides the distance by the length of the sequence group. If "none", then no normalization is performed.
- junction: character name of the column containing junction sequences. Also used to determine sequence length for grouping.
- v_call: character name of the column containing the V-segment allele calls.
- j_call: character name of the column containing the J-segment allele calls.
- clone: the output column name containing the clonal cluster identifiers.
- first: specifies how to handle multiple V(D)J assignments for initial grouping. If TRUE, only the first call of the gene assignments is used. If FALSE, the union of ambiguous gene assignments is used to group all sequences with any overlapping gene calls.
- cdr3: if TRUE, removes 3 nucleotides from both ends of "junction" prior to clustering (converts IMGT junction to CDR3 region). If TRUE, this will also remove records with a junction length less than 7 nucleotides.
- mod3: if TRUE, removes records with a junction length that is not divisible by 3 in nucleotide space.
- max_n: the maximum number of N characters to permit in the junction sequence before excluding the record from clonal assignment. Note that with linkage="single", non-informative positions can create artifactual links between unrelated sequences. Use with caution. The default is 0. Set to NULL for no action.
- nproc: number of cores to distribute the function over.
- verbose: if TRUE, prints out a summary of each step of the cloning process. If FALSE (default), cloning proceeds silently.
- log: output path and filename to save the verbose log. The input file directory is used if path is not specified. The default is NULL for no action.
- summarize_clones: if TRUE, performs a series of analyses to assess the clonal landscape and returns a ScoperClones object. If FALSE, then a modified input db is returned.
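The record filters controlled by cdr3, mod3, and max_n can be illustrated with a small base R sketch. The helper prefilter_junctions below is hypothetical and not part of scoper; the order in which the filters are applied is an assumption made for illustration, not a description of the package internals.

```r
# Hypothetical helper (not part of scoper): mimics the documented effect of
# the cdr3, mod3, and max_n arguments on a character vector of junctions.
prefilter_junctions <- function(junction, cdr3 = FALSE, mod3 = FALSE, max_n = 0) {
    keep <- rep(TRUE, length(junction))
    # max_n: drop records with more than max_n "N" characters (NULL skips the check)
    if (!is.null(max_n)) {
        n_count <- nchar(junction) - nchar(gsub("N", "", junction, fixed = TRUE))
        keep <- keep & n_count <= max_n
    }
    # mod3: drop junctions whose length is not divisible by 3
    if (mod3) {
        keep <- keep & nchar(junction) %% 3 == 0
    }
    # cdr3: drop junctions shorter than 7 nucleotides
    if (cdr3) {
        keep <- keep & nchar(junction) >= 7
    }
    out <- junction[keep]
    # cdr3: trim 3 nucleotides from both ends of the remaining junctions
    if (cdr3) {
        out <- substr(out, 4, nchar(out) - 3)
    }
    out
}

# Toy junctions: one valid, one containing an N, one with length 11, one with length 6
junctions <- c("TGTGCGAGAGGGTACTGG", "TGTGCNAGATGG", "TGTGCGAGATG", "TGTTGG")
filtered <- prefilter_junctions(junctions, cdr3 = TRUE, mod3 = TRUE, max_n = 0)
print(filtered)
```

With all three filters enabled, only the first toy junction survives, and it is returned with its first and last codons removed.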
If summarize_clones=TRUE (default), a ScoperClones object is returned that includes the
clonal assignment summary information and a modified input db in the db slot that
contains clonal identifiers in the specified clone column. If summarize_clones=FALSE,
a modified data.frame is returned with clone identifiers in the specified clone column.
hierarchicalClones provides a computational platform to explore the B cell clonal
relationships in high-throughput Adaptive Immune Receptor Repertoire sequencing (AIRR-seq)
data sets. This function performs hierarchical clustering among sequences of B cell receptors
(BCRs, immunoglobulins, Ig) that share the same V gene, J gene, and junction length
based on the junction sequence similarity:
# Find clonal groups
results <- hierarchicalClones(ExampleDb, threshold=0.15)

# Retrieve modified input data with clonal clustering identifiers
df <- as.data.frame(results)

# Plot clonal summaries
plot(results, binwidth=0.02)
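To make the clustering step more concrete, the following base R sketch clusters three toy junctions that are assumed to already share a V gene, J gene, and junction length. It uses a length-normalized Hamming distance, single linkage, and a cut at 0.15. The toy sequences and the hamming_norm helper are illustrative assumptions; this is not scoper's implementation, only an instance of the general technique using stats::hclust and stats::cutree.

```r
# Toy junctions of equal length (illustrative data, not from ExampleDb)
junctions <- c(s1 = "TGTGCGAGAGGGTACTGG",
               s2 = "TGTGCGAGAGGCTACTGG",
               s3 = "TGTGCAAAATTTCCCTGG")

# Hypothetical helper: Hamming distance between two equal-length strings,
# divided by the sequence length (analogous to normalize = "len")
hamming_norm <- function(a, b) {
    x <- strsplit(a, "")[[1]]
    y <- strsplit(b, "")[[1]]
    sum(x != y) / length(x)
}

# Build the pairwise distance matrix
n <- length(junctions)
d <- matrix(0, n, n, dimnames = list(names(junctions), names(junctions)))
for (i in seq_len(n)) {
    for (j in seq_len(n)) {
        d[i, j] <- hamming_norm(junctions[i], junctions[j])
    }
}

# Single-linkage hierarchical clustering, then cut the tree at the threshold
tree <- hclust(as.dist(d), method = "single")
clone_id <- cutree(tree, h = 0.15)
print(clone_id)
```

Here s1 and s2 differ at one position out of 18 and fall below the cut, so they receive the same cluster label, while s3 is placed in its own cluster.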
See plotCloneSummary for plotting summary results.