Estimate stability of factors as a function of the number of components
Source: R/factors.R
Estimates the stability of factors over a range of component numbers to
aid in the identification of the optimal factor number. Based on the
Most Stable Transcriptome Dimension (MSTD) approach (see Details).
Usage
estimateStability(
    X,
    min_components = 10,
    max_components = 60,
    by = 2,
    n_runs = 30,
    resample = FALSE,
    mean_stability_threshold = NULL,
    center_X = TRUE,
    scale_X = FALSE,
    assay_name = "normal",
    BPPARAM = BiocParallel::SerialParam(),
    verbose = TRUE,
    ...
)
Arguments
- X
Either a SummarizedExperiment object or a matrix containing data to be subject to ICA. X should have rows as features and columns as samples.
- min_components
The minimum number of components to estimate the stability for.
- max_components
The maximum number of components to estimate the stability for.
- by
The number by which to increment the numbers of components tested.
- n_runs
The number of times to run ICA to estimate factors and quantify stability. Ignored if use_stability is FALSE.
- resample
If TRUE, a bootstrap approach is used to estimate factors and quantify stability. Otherwise, random initialisation of ICA is employed. Ignored if use_stability is FALSE.
- mean_stability_threshold
A threshold for the mean stability of factors.
- center_X
If TRUE, X is centered (i.e., features / rows are transformed to have a mean of 0) prior to ICA. Generally recommended.
- scale_X
If TRUE, X is scaled (i.e., features / rows are transformed to have a standard deviation of 1) before ICA.
- assay_name
If X is a SummarizedExperiment, then this should be the name of the assay to be subject to ICA.
- BPPARAM
A class containing parameters for parallel evaluation. Uses SerialParam by default, running only a single ICA computation at a time. Ignored if use_stability is FALSE.
- verbose
If TRUE, shows a progress bar that updates for each number of components tested. Note that progress may not be linear, because the time taken to run ICA generally increases with the number of components.
- ...
Additional arguments to be passed to runICA.
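To make the center_X and scale_X descriptions concrete, the following base R sketch applies the same row-wise transformations to a small matrix. It illustrates the arithmetic only and is not the package's internal code; the matrix `X_demo` and its dimension names are made up for the example.

```r
# Hypothetical toy matrix: 4 features (rows) x 5 samples (columns)
set.seed(1)
X_demo <- matrix(
    rnorm(20, mean = 5, sd = 2), nrow = 4,
    dimnames = list(paste0("feature_", 1:4), paste0("obs_", 1:5))
)

# Centering as described for center_X: each feature (row) gets a mean of 0.
# Subtracting a length-nrow vector recycles down columns, i.e. per row.
X_centered <- X_demo - rowMeans(X_demo)

# Scaling as described for scale_X: each feature (row) gets a
# standard deviation of 1
X_scaled <- X_centered / apply(X_centered, 1, sd)

# Check the row means and row standard deviations
print(rowMeans(X_centered))
print(apply(X_scaled, 1, sd))
```

Because features are rows, the operations act on rows rather than columns, which is the opposite orientation to base R's scale().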
Value
Returns a list containing:
- stability
A data.frame indicating factor stabilities as a function of the number of components.
- selected_nc
A naive estimate for the optimal number of components based on the mean_stability_threshold.
Details
Runs the stability-based ICA algorithm (see runICA) for a range of component numbers. Estimates stability for each, allowing for selection of the optimal number of components to be used for ICA. The results of this function can be plotted by plotStability.
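As a rough guide to computational cost, the sketch below enumerates the component numbers implied by the default arguments. It assumes that the tested values form a regular sequence from min_components to max_components in steps of by, and that ICA is run n_runs times for each tested value; neither detail is confirmed on this page.

```r
# Default arguments from the Usage section
min_components <- 10
max_components <- 60
by <- 2
n_runs <- 30

# Assumption: the tested component numbers form a regular sequence
nc_grid <- seq(from = min_components, to = max_components, by = by)

# Number of component numbers tested
print(length(nc_grid))

# Total ICA runs, assuming n_runs runs per tested value
print(length(nc_grid) * n_runs)
```

Under these assumptions the defaults cover 26 component numbers and 780 ICA runs in total, so reducing n_runs or increasing by may be worthwhile for a first exploratory pass.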
This algorithm is based on the Most Stable Transcriptome Dimension (MSTD) approach (https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-017-4112-9).
The function automatically selects a number of components based on
mean_stability_threshold. However, the final choice should be made after
visualising the stabilities as a function of the number of components,
which may be done using plotStability. The
aforementioned MSTD paper provides additional context and advice for choosing
the number of components based on these results.
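The exact rule behind selected_nc is not described here. As an illustration only, one simple threshold rule is to take the largest number of components whose mean stability is still at or above the threshold. The data frame, its column names (`nc`, `mean_stability`), its values, and the threshold below are all hypothetical and do not reflect the actual structure of the stability output.

```r
# Hypothetical summary table: mean factor stability per number of components
stab_summary <- data.frame(
    nc = c(10, 12, 14, 16, 18, 20),
    mean_stability = c(0.95, 0.93, 0.90, 0.86, 0.79, 0.70)
)

# Illustrative threshold, not a recommended value
threshold <- 0.85

# Largest nc whose mean stability meets the threshold (NA if none does)
passing <- stab_summary$nc[stab_summary$mean_stability >= threshold]
naive_nc <- if (length(passing) > 0) max(passing) else NA_real_
print(naive_nc)
```

A rule like this ignores the shape of the stability curve, which is one reason to inspect the plot before settling on a value.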
Examples
# Get a random matrix with rnorm, with 200 rows (features)
# and 100 columns (observations)
X <- ReducedExperiment:::.makeRandomData(200, 100, "feature", "obs")
# Estimate stability across 10 to 30 components
# Note: We could have provided a SummarizedExperiment object instead of a matrix
stab_res_1 <- estimateStability(
    X,
    min_components = 10,
    max_components = 30,
    n_runs = 5,
    verbose = FALSE
)
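A follow-up sketch showing how the mean_stability_threshold argument and the two documented return elements fit together. The threshold of 0.85 is an arbitrary illustrative value, a plain rnorm matrix with made-up dimension names stands in for the package's internal helper, and the code is skipped unless ReducedExperiment is installed.

```r
# Only run if the package is available
if (requireNamespace("ReducedExperiment", quietly = TRUE)) {
    set.seed(2)

    # Random matrix: 200 features (rows) x 100 observations (columns)
    X2 <- matrix(
        rnorm(200 * 100), nrow = 200,
        dimnames = list(paste0("feature_", 1:200), paste0("obs_", 1:100))
    )

    stab_res_2 <- ReducedExperiment::estimateStability(
        X2,
        min_components = 10,
        max_components = 30,
        n_runs = 5,
        mean_stability_threshold = 0.85,  # arbitrary illustrative value
        verbose = FALSE
    )

    # Stabilities as a function of the number of components
    print(head(stab_res_2$stability))

    # Naive estimate of the number of components based on the threshold
    print(stab_res_2$selected_nc)
}
```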