Estimates the stability of factors over a range of component numbers to help identify the optimal number of factors. Based on the Most Stable Transcriptome Dimension (MSTD) approach (see details).

Usage

estimateStability(
  X,
  min_components = 10,
  max_components = 60,
  by = 2,
  n_runs = 30,
  resample = FALSE,
  mean_stability_threshold = NULL,
  center_X = TRUE,
  scale_X = FALSE,
  assay_name = "normal",
  BPPARAM = BiocParallel::SerialParam(),
  verbose = TRUE,
  ...
)

Arguments

X

Either a SummarizedExperiment object or a matrix containing data to be subject to ICA. X should have rows as features and columns as samples.

min_components

The minimum number of components to estimate the stability for.

max_components

The maximum number of components to estimate the stability for.

by

The number by which to increment the numbers of components tested.
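
As a rough illustration of how min_components, max_components and by combine, the sketch below builds the grid of component numbers with base R. It assumes the function tests a regular sequence equivalent to seq(); that is an assumption about the internals, not something this page confirms.

```r
# Illustrative values matching the defaults shown in Usage
min_components <- 10
max_components <- 60
by <- 2

# Assumed grid of component numbers to be tested (not confirmed by this page)
components_tested <- seq(from = min_components, to = max_components, by = by)

# Number of distinct component numbers, and the smallest / largest tested
length(components_tested)
range(components_tested)
```

Under this assumption, if max_components - min_components is not a multiple of by, the sequence stops at the last value below max_components, so the maximum itself would not be tested.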

n_runs

The number of times to run ICA to estimate factors and quantify stability. Ignored if use_stability is FALSE.

resample

If TRUE, a bootstrap approach is used to estimate factors and quantify stability. Otherwise, random initialisation of ICA is employed. Ignored if use_stability is FALSE.
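
The difference between the two modes can be sketched in base R. The example below assumes that the bootstrap draws samples (columns of X) with replacement for each run. This page does not state how the resampling is performed, so treat this as one plausible reading rather than the package's implementation.

```r
set.seed(42)

# Toy data: 200 features (rows) by 100 samples (columns)
X <- matrix(rnorm(200 * 100), nrow = 200, ncol = 100)
n_runs <- 5

# Assumption: each bootstrap run resamples the columns (samples) with replacement
bootstrap_inputs <- lapply(seq_len(n_runs), function(i) {
    X[, sample(ncol(X), replace = TRUE), drop = FALSE]
})

# Without resampling, every run would receive the same X and the runs
# would differ only in the random starting point used by ICA
dim(bootstrap_inputs[[1]])
```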

mean_stability_threshold

A threshold for the mean stability of factors.

center_X

If TRUE, X is centered (i.e., features / rows are transformed to have a mean of 0) prior to ICA. Generally recommended.

scale_X

If TRUE, X is scaled (i.e., features / rows are transformed to have a standard deviation of 1) before ICA.
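
To make the effect of center_X and scale_X concrete, the following base R sketch applies row-wise centering and scaling to a small random matrix. It uses scale() on the transposed matrix purely for illustration; the package's own implementation is not shown on this page and may differ.

```r
set.seed(1)

# Toy data: 20 features (rows) by 5 samples (columns)
X <- matrix(rnorm(20 * 5, mean = 3, sd = 2), nrow = 20, ncol = 5)

# scale() operates on columns, so transpose to treat features as columns,
# then transpose back to restore the features-by-samples layout
X_centered <- t(scale(t(X), center = TRUE, scale = FALSE))
X_scaled <- t(scale(t(X), center = TRUE, scale = TRUE))

# Row means are (numerically) zero after centering
max(abs(rowMeans(X_centered)))

# Row standard deviations are one after centering and scaling
range(apply(X_scaled, 1, sd))
```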

assay_name

If X is a SummarizedExperiment, then this should be the name of the assay to be subject to ICA.

BPPARAM

A class containing parameters for parallel evaluation. Uses SerialParam by default, running only a single ICA computation at a time. Ignored if use_stability is FALSE.

verbose

If TRUE, shows a progress bar that updates for each number of components tested. Note that the time taken may not be linear, because the time taken to run ICA generally increases with the number of components.

...

Additional arguments to be passed to runICA.

Value

Returns a list containing:

stability

A data.frame indicating factor stabilities as a function of the number of components.

selected_nc

A naive estimate for the optimal number of components based on the mean_stability_threshold.
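
One way such a naive estimate could be derived is sketched below. The column names nc and stability, the toy values, and the selection rule (the largest number of components whose mean stability meets the threshold) are all assumptions made for illustration; this page does not specify the layout of the stability data.frame or the exact rule used.

```r
# Hypothetical stability table: one row per factor per number of components
stability <- data.frame(
    nc = rep(c(10, 12, 14), times = c(10, 12, 14)),
    stability = c(rep(0.95, 10), rep(0.90, 12), rep(0.70, 14))
)
mean_stability_threshold <- 0.85

# Mean factor stability for each number of components tested
mean_by_nc <- tapply(stability$stability, stability$nc, mean)

# Assumed rule: keep the largest nc whose mean stability meets the threshold
passing <- as.numeric(names(mean_by_nc)[mean_by_nc >= mean_stability_threshold])
selected_nc <- if (length(passing) > 0) max(passing) else NULL

selected_nc
```

In this toy table, 10 and 12 components meet the threshold and 14 does not, so the assumed rule selects 12.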

Details

Runs the stability-based ICA algorithm (see runICA) for a range of component numbers and estimates stability for each, allowing the optimal number of components for ICA to be selected. The results of this function can be plotted with plotStability.

This algorithm is based on the Most Stable Transcriptome Dimension (MSTD) approach (https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-017-4112-9).

The function automatically selects a number of components based on mean_stability_threshold. However, this choice should be made after visualising the stabilities as a function of the number of components, which may be done using plotStability. The aforementioned MSTD paper provides additional context and advice for choosing the number of components based on these results.

Author

Jack Gisby

Examples

# Get a random matrix with rnorm, with 200 rows (features)
# and 100 columns (observations)
X <- ReducedExperiment:::.makeRandomData(200, 100, "feature", "obs")

# Estimate stability across 10 to 30 components
# Note: We could have provided a SummarizedExperiment object instead of a matrix
stab_res_1 <- estimateStability(
    X,
    min_components = 10,
    max_components = 30,
    n_runs = 5,
    verbose = FALSE
)