spectralClones - Spectral clustering-based method for partitioning Ig sequences into clones.
The spectralClones function provides an unsupervised computational pipeline for
assigning Ig sequences to clonal groups that share the same V gene, J gene, and junction
length, based on junction sequence similarity and shared mutations in the V and J segments.
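The initial partitioning by V gene, J gene, and junction length can be pictured with a small base R sketch. It assumes gene-level grouping (allele suffixes such as "*02" are dropped) and keeps only the first of several comma-separated calls, roughly what first = TRUE describes. The helpers first_gene and group_key are hypothetical illustrations, not scoper functions.

```r
# Hypothetical illustration only: not scoper's internal code.
# Reduce a call such as "IGHV1-2*04,IGHV1-3*01" to its first gene, "IGHV1-2".
first_gene <- function(calls) {
  first_call <- sub(",.*$", "", calls)   # keep only the first comma-separated call
  sub("\\*.*$", "", first_call)          # drop the allele suffix, e.g. "*02"
}

# Grouping key built from V gene, J gene and junction length.
group_key <- function(db) {
  paste(first_gene(db$v_call),
        first_gene(db$j_call),
        nchar(db$junction),
        sep = "|")
}

db <- data.frame(
  v_call   = c("IGHV1-2*02", "IGHV1-2*04,IGHV1-3*01", "IGHV3-23*01"),
  j_call   = c("IGHJ4*02", "IGHJ4*02", "IGHJ4*02"),
  junction = c("TGTGCGAGAGATTGG", "TGTGCGAGAGCTTGG", "TGTGCGAAAGATTGG"),
  stringsAsFactors = FALSE
)

# Row indices of the sequences that fall into each V/J/length group.
groups <- split(seq_len(nrow(db)), group_key(db))
print(groups)
```

Clustering would then run separately inside each group. Handling first = FALSE (the union of overlapping ambiguous calls) would need a connected-components step that this sketch leaves out.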
spectralClones( db, method = c("novj", "vj"), germline = "germline_alignment", sequence = "sequence_alignment", junction = "junction", v_call = "v_call", j_call = "j_call", clone = "clone_id", targeting_model = NULL, len_limit = NULL, first = FALSE, cdr3 = FALSE, mod3 = FALSE, max_n = 0, threshold = NULL, base_sim = 0.95, iter_max = 1000, nstart = 1000, nproc = 1, verbose = FALSE, log = NULL, summarize_clones = TRUE )
- db: data.frame containing sequence data.
- method: one of "novj" or "vj". See Details for description.
- germline: character name of the column containing the germline or reference sequence.
- sequence: character name of the column containing input sequences.
- junction: character name of the column containing junction sequences. Also used to determine sequence length for grouping.
- v_call: character name of the column containing the V-segment allele calls.
- j_call: character name of the column containing the J-segment allele calls.
- clone: name of the output column that will contain the clone ids.
- targeting_model: TargetingModel object. Only applicable if method = "vj". See Details for description.
- len_limit: IMGT_V object defining the regions and boundaries of the Ig
sequences. If NULL, mutations are counted for the entire sequence. Only applicable if method = "vj".
- first: specifies how to handle multiple V(D)J assignments for initial grouping.
If TRUE, only the first call of the gene assignments is used. If
FALSE, the union of ambiguous gene assignments is used to group all sequences with any overlapping gene calls.
- cdr3: if TRUE, removes 3 nucleotides from both ends of
"junction" prior to clustering (converting the IMGT junction to the CDR3 region). If
TRUE, this also removes records with a junction length of less than 7 nucleotides.
- mod3: if TRUE, removes records with a
junction length that is not divisible by 3 in nucleotide space.
- max_n: the maximum number of N's to permit in the junction sequence before excluding the
record from clonal assignment. The default is zero. Set it to
NULL for no action.
- threshold: the supervising cut-off to enforce an upper-limit distance for clonal grouping. A numeric value in the interval (0,1).
- base_sim: required similarity cut-off for sequences at equal distances from each other.
- iter_max: the maximum number of iterations allowed for the k-means clustering step.
- nstart: the number of random sets chosen for k-means clustering initialization.
- nproc: number of cores to distribute the function over.
- verbose: if TRUE, prints a summary of each step of the cloning process. If
FALSE (default), cloning runs silently.
- log: output path and filename to save the
verbose log. The input file directory is used if a path is not specified. The default is
NULL for no action.
- summarize_clones: if TRUE, performs a series of analyses to assess the clonal landscape and returns a ScoperClones object. If
FALSE, a modified input db is returned.
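The documented effects of the cdr3, mod3, and max_n options on a junction column can be approximated with a small base R helper. This is a hypothetical sketch (filter_junctions is not a scoper function). It assumes N's are counted in the untrimmed, upper-case junction, and it simply returns the junctions that pass the filters rather than modifying db.

```r
# Hypothetical helper mimicking the documented cdr3, mod3 and max_n options.
# Not scoper's implementation.
filter_junctions <- function(junction, cdr3 = FALSE, mod3 = FALSE, max_n = 0) {
  keep <- rep(TRUE, length(junction))
  if (!is.null(max_n)) {
    # Count upper-case N characters in each junction.
    n_count <- nchar(junction) - nchar(gsub("N", "", junction, fixed = TRUE))
    keep <- keep & n_count <= max_n
  }
  if (mod3) {
    # Drop junctions whose length is not a multiple of 3.
    keep <- keep & nchar(junction) %% 3 == 0
  }
  if (cdr3) {
    # Drop junctions shorter than 7 nt before trimming.
    keep <- keep & nchar(junction) >= 7
  }
  out <- junction[keep]
  if (cdr3) {
    # Strip 3 nt from each end: IMGT junction -> CDR3.
    out <- substr(out, 4, nchar(out) - 3)
  }
  out
}

junctions <- c("TGTGCGAGAGATTGG",  # 15 nt, no N
               "TGTGCNAGAGATTGG",  # contains one N
               "TGTGCGAGAGATTG",   # 14 nt, not divisible by 3
               "TGTGCG")           # 6 nt, too short when cdr3 = TRUE
res <- filter_junctions(junctions, cdr3 = TRUE, mod3 = TRUE, max_n = 0)
print(res)
```

Only the first junction survives all three filters, and it is returned as its 9 nt CDR3. Passing max_n = NULL with the other options left at FALSE keeps every input.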
If summarize_clones=TRUE (default), a ScoperClones object is returned that includes the
clonal assignment summary information and a modified input db in the db slot that
contains clonal identifiers in the specified clone column. If summarize_clones=FALSE, a modified
data.frame is returned with clone identifiers in the specified clone column.
spectralClones provides a computational platform to explore B cell clonal
relationships in high-throughput Adaptive Immune Receptor Repertoire sequencing (AIRR-seq)
data sets. Two methods are included to perform clustering among sequences of B cell receptors
(BCRs, immunoglobulins, Ig) that share the same V gene, J gene and junction length:
- "novj": clonal relationships are inferred using an adaptive threshold that indicates the level of similarity among junction sequences in a local neighborhood.
- "vj": clonal relationships are inferred not only from junction region homology but also from the mutation profiles in the V and J segments. Mutation counts are determined by comparing the input sequences (in the column specified by
sequence) to the effective germline sequence (IUPAC representation of sequences in the column specified by
germline).
  + Optionally, if an SHM targeting model is provided through the targeting_model argument (see createTargetingModel for more technical details), the influence of SHM hot- and cold-spot biases is taken into account in the clonal inference process.
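For the "vj" method, a simplified picture of the mutation comparison is to list the positions where an aligned input sequence differs from its germline, then count the mutations two sequences share. The sketch below is hypothetical (mutation_positions and shared_mutations are illustrative names). It treats only N as ambiguous rather than the full IUPAC alphabet, and it ignores targeting models and region boundaries.

```r
# Hypothetical sketch: not scoper's implementation.
# Named vector of mutated bases relative to the germline, named by position.
# Gaps (".", "-") and N in either sequence are skipped.
mutation_positions <- function(sequence, germline, skip = c(".", "-", "N")) {
  s <- strsplit(sequence, "")[[1]]
  g <- strsplit(germline, "")[[1]]
  n <- min(length(s), length(g))
  s <- s[seq_len(n)]
  g <- g[seq_len(n)]
  informative <- !(s %in% skip) & !(g %in% skip)
  pos <- which(informative & s != g)
  setNames(s[pos], pos)
}

# Number of mutations (same position, same base) shared by two sequences.
shared_mutations <- function(seq_a, seq_b, germline) {
  a <- mutation_positions(seq_a, germline)
  b <- mutation_positions(seq_b, germline)
  common <- intersect(names(a), names(b))
  sum(a[common] == b[common])
}

germ <- "ACGTACGTAC"
seq1 <- "ACGAACGTTC"   # mutations at positions 4 and 9
seq2 <- "ACGAACCTAC"   # mutations at positions 4 and 7
print(mutation_positions(seq1, germ))
print(shared_mutations(seq1, seq2, germ))
```

The two sequences each carry two mutations but share only the T-to-A change at position 4. A shared mutation like this is the kind of evidence that can pull sequences with otherwise similar junctions into the same clone.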
- Optionally, an upper-limit cut-off for clonal grouping can be provided through the threshold argument to
prevent sequences with dissimilarity above the threshold from being grouped together. With this argument,
any sequence whose distances from the other sequences exceed the
threshold value becomes a singleton.
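The clustering step can be illustrated with a generic spectral clustering sketch on junction distances. This is not scoper's implementation, and every modelling choice here is an assumption:

- normalized Hamming distances between junctions;
- a self-tuning local scale (distance to the knn-th nearest neighbor) as a stand-in for the adaptive neighborhood threshold;
- an eigengap heuristic to pick the number of clusters;
- stats::kmeans with iter.max and nstart, analogous in spirit to iter_max and nstart.

The threshold rule, which splits off any sequence whose distances to all other members of its cluster exceed the cut-off, is one possible reading of the behavior described above. hamming_matrix and spectral_sketch are hypothetical names.

```r
# Hypothetical spectral clustering sketch; not scoper's implementation.

# Normalized Hamming distance matrix for equal-length strings.
hamming_matrix <- function(x) {
  chars <- do.call(rbind, strsplit(x, ""))
  n <- length(x)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- mean(chars[i, ] != chars[j, ])
    }
  }
  d
}

# Assumes at least 3 sequences.
spectral_sketch <- function(d, k_max = 5, knn = 2, threshold = NULL,
                            iter_max = 1000, nstart = 25) {
  n <- nrow(d)
  # Local scale: distance to the knn-th nearest neighbor (self-tuning).
  sigma <- apply(d, 1, function(r) sort(r)[min(knn + 1, n)])
  sigma <- pmax(sigma, 1e-6)                  # avoid division by zero
  w <- exp(-d^2 / outer(sigma, sigma))        # affinity matrix
  diag(w) <- 0
  deg <- pmax(rowSums(w), .Machine$double.eps)
  m <- w / sqrt(outer(deg, deg))              # D^-1/2 W D^-1/2
  ev <- eigen(m, symmetric = TRUE)            # eigenvalues in decreasing order
  # Eigengap heuristic: k sits at the largest drop between eigenvalues.
  vals <- ev$values[seq_len(min(k_max, n))]
  k <- which.max(-diff(vals))
  if (k == 1) {
    cl <- rep(1L, n)
  } else {
    u <- ev$vectors[, seq_len(k), drop = FALSE]
    u <- u / sqrt(rowSums(u^2))               # row-normalize the embedding
    cl <- kmeans(u, centers = k, iter.max = iter_max, nstart = nstart)$cluster
  }
  # Optional upper limit: split off sequences farther than threshold
  # from every other member of their cluster.
  if (!is.null(threshold)) {
    for (i in seq_len(n)) {
      mates <- setdiff(which(cl == cl[i]), i)
      if (length(mates) > 0 && all(d[i, mates] > threshold)) {
        cl[i] <- max(cl) + 1L
      }
    }
  }
  cl
}

set.seed(1)
junctions <- c("TGTGCGAGAGATTGG", "TGTGCGAGAGATTGA", "TGTGCGAGAGCTTGG",
               "TGTAAAAAATATTTT", "TGTAAAAAATATTTA", "TGTAAAACATATTTT")
d <- hamming_matrix(junctions)
cl <- spectral_sketch(d)
cl_strict <- spectral_sketch(d, threshold = 0.05)
print(cl)
print(cl_strict)
```

With these junctions, the two families of three near-identical sequences separate into two clusters. Lowering threshold below the within-family distances (1/15 and 2/15) turns every sequence into a singleton.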
```r
library(scoper)

# Subset example data
db <- subset(ExampleDb, sample_id == "-1h")

# Find clonal groups
results <- spectralClones(db, method = "novj", germline = "germline_alignment_d_mask")

# Retrieve modified input data with clonal clustering identifiers
df <- as.data.frame(results)

# Plot clonal summaries
plot(results, binwidth = 0.02)
```
See plotCloneSummary for plotting summary results.