Run standard or stabilised Independent Component Analysis

Runs ICA via the ica function. If use_stability is FALSE, X is passed directly to ica and a standard ICA analysis is performed. If use_stability is TRUE, the stabilised ICA procedure is carried out (see Details).

Usage

runICA(
  X,
  nc,
  use_stability = FALSE,
  resample = FALSE,
  method = "fast",
  stability_threshold = NULL,
  center_X = TRUE,
  scale_X = FALSE,
  reorient_skewed = TRUE,
  scale_components = TRUE,
  scale_reduced = TRUE,
  n_runs = 30,
  BPPARAM = BiocParallel::SerialParam(),
  ...
)

Arguments

X: Either a SummarizedExperiment object or a matrix containing the data to be subjected to ICA. X should have features as rows and samples as columns.
nc: The number of components to be identified. See estimateStability for a method to estimate the optimal number of components.
use_stability: Whether to use a stability-based approach to estimate factors. See Details for further information.
resample: If TRUE, a bootstrap approach is used to estimate factors and quantify stability. Otherwise, random initialisation of ICA is employed. Ignored if use_stability is FALSE.
method: The ICA method to use. Passed to ica, the options are "fast", "imax" or "jade".
stability_threshold: A stability threshold for pruning factors. Factors with a stability below this threshold will be removed. If used, the threshold may result in fewer factors being returned than specified by nc.
center_X: If TRUE, X is centered (i.e., features / rows are transformed to have a mean of 0) prior to ICA. Generally recommended.
scale_X: If TRUE, X is scaled (i.e., features / rows are transformed to have a standard deviation of 1) before ICA.
reorient_skewed: If TRUE, factors are reorientated to ensure that the loadings of each factor (i.e., the source signal matrix) have positive skew. This helps ensure that the most influential features for each factor are positively associated with it.
scale_components: If TRUE, the loadings are standardised (to have a mean of 0 and standard deviation of 1).
scale_reduced: If TRUE, the reduced data (mixture matrix) are standardised (to have a mean of 0 and standard deviation of 1).
n_runs: The number of times to run ICA to estimate factors and quantify stability. Ignored if use_stability is FALSE.
BPPARAM: A BiocParallel parameter object controlling parallel evaluation. Uses SerialParam by default, running only a single ICA computation at a time. Ignored if use_stability is FALSE.
...: Additional arguments to be passed to ica.
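The reorient_skewed, scale_components and scale_reduced options describe post-processing of the ICA output. The following is a minimal base R sketch of what those descriptions imply, assuming S has features as rows and M has samples as rows (as in Value). The helper names postprocess_ica and skewness are hypothetical and are not part of ReducedExperiment; the actual implementation may differ in detail.

```r
# Hypothetical helper: third standardised moment (sample skewness)
skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

# Hypothetical helper, not part of ReducedExperiment
postprocess_ica <- function(S, M, reorient_skewed = TRUE,
                            scale_components = TRUE, scale_reduced = TRUE) {
  if (reorient_skewed) {
    # Flip factors whose loadings are negatively skewed. The same sign
    # change is applied to M, so the reconstruction M %*% t(S) is unchanged.
    signs <- ifelse(apply(S, 2, skewness) < 0, -1, 1)
    S <- sweep(S, 2, signs, `*`)
    M <- sweep(M, 2, signs, `*`)
  }
  # Standardise each column to a mean of 0 and standard deviation of 1
  if (scale_components) S <- scale(S)
  if (scale_reduced) M <- scale(M)
  list(M = M, S = S)
}

set.seed(1)
S <- cbind(rexp(100), -rexp(100), rexp(100))  # second factor is negatively skewed
M <- matrix(rnorm(60), nrow = 20)             # 20 samples x 3 factors
res <- postprocess_ica(S, M)
apply(res$S, 2, skewness)  # all positive after reorientation
```

Because ICA only identifies factors up to sign, flipping a factor in both S and M leaves the model fit unchanged; only the interpretation of the direction changes.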

Value

A list containing the following:

M: The mixture matrix (reduced data) with samples as rows and columns as factors.
S: The source signal matrix (loadings) with rows as features and columns as factors.
stab: Only returned if use_stability is TRUE. A vector giving the relative stability of each factor, as described in Details.

Details

This function performs ICA on a data matrix. If use_stability is TRUE, ICA is performed multiple times with either: i) random initialisation (default); or ii) bootstrap resampling of the data (if resample is TRUE).

Note that a seed must be set if reproducibility is needed. Specifically, use set.seed prior to running standard ICA (use_stability = FALSE), or set the RNGseed argument of BPPARAM when running stabilised ICA (use_stability = TRUE).

The stability-based ICA algorithm is similar to the ICASSO approach (https://www.cs.helsinki.fi/u/ahyvarin/papers/Himberg03.pd) that is implemented in the stabilized-ica Python package (https://github.com/ncaptier/stabilized-ica/tree/master).

In short, the stability-based algorithm consists of:

Running ICA multiple times with either random initialisation or bootstrap resampling of the input data.
Clustering the resulting factors across all runs based on the signature matrix.
Calculating the average intra-cluster (aics) and extra-cluster (aecs) similarity of each cluster, and defining the final cluster stability as aics - aecs.
Selecting each cluster's centrotype, the factor with the highest intra-cluster similarity, as its representative factor.
Optionally removing factors below a specified stability threshold (stability_threshold).
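The clustering and stability steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not ReducedExperiment's code: real ICA runs are replaced by simulated runs (true loadings with random permutation, random sign and added noise), similarity is assumed to be the absolute Pearson correlation between loading vectors, clustering is assumed to use average-linkage hclust, and the stability score follows the ICASSO quality index (mean within-cluster similarity, diagonal included, minus mean similarity to factors outside the cluster). The threshold value of 0.5 is arbitrary.

```r
set.seed(42)
n_features <- 200; nc <- 3; n_runs <- 10

# Ground-truth loadings, used only to simulate run-to-run variation
S_true <- matrix(rexp(n_features * nc), nrow = n_features)

# Stand-in for repeated ICA runs: factors come back in random order, with
# random signs and noise (ICA is only identified up to permutation and sign)
runs <- lapply(seq_len(n_runs), function(r) {
  perm <- sample(nc)
  signs <- sample(c(-1, 1), nc, replace = TRUE)
  noise <- matrix(rnorm(n_features * nc, sd = 0.3), nrow = n_features)
  sweep(S_true[, perm] + noise, 2, signs, `*`)
})
all_S <- do.call(cbind, runs)  # features x (nc * n_runs)

# Absolute correlation as similarity, so sign flips do not matter
sim <- abs(cor(all_S))

# Average-linkage hierarchical clustering into nc clusters
hc <- hclust(as.dist(1 - sim), method = "average")
cl <- cutree(hc, k = nc)

stability <- numeric(nc)
centrotype <- matrix(NA_real_, nrow = n_features, ncol = nc)
for (k in seq_len(nc)) {
  idx <- which(cl == k)
  out <- which(cl != k)
  aics <- mean(sim[idx, idx])                          # within-cluster similarity
  aecs <- if (length(out)) mean(sim[idx, out]) else 0  # similarity to other clusters
  stability[k] <- aics - aecs
  # Centrotype: the member with the largest summed similarity to its cluster
  best <- idx[which.max(rowSums(sim[idx, idx, drop = FALSE]))]
  centrotype[, k] <- all_S[, best]  # sign may still need reorienting
}

# Optional pruning, analogous to stability_threshold
stability_threshold <- 0.5
keep <- stability >= stability_threshold
S_final <- centrotype[, keep, drop = FALSE]
round(stability, 2)
```

Using the centrotype rather than averaging cluster members avoids having to align the signs of all members before combining them.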

Results from this function should be broadly similar to those generated by other implementations of stabilised ICA, although they will not be identical. Notable differences include:

ICA algorithm: Differences in the underlying implementation of ICA.
Stability threshold: The stability_threshold argument, if specified, removes unstable components. Such a threshold is not used by stabilized-ica.
Mixture matrix recovery: ICA is generally formulated as X = MS, where X is the input data (samples by features), M is the mixture matrix (reduced data; samples by factors) and S is the source signal matrix (feature loadings; factors by features, i.e. the transpose of the S returned by this function). The stabilised ICA approach first calculates a source signal matrix and then recovers the mixture matrix. To do this, other implementations, including that of the stabilized-ica package, multiply X by the pseudo-inverse of S, an operation implemented in the ginv function of the MASS R package. In the development of ReducedExperiment, we noticed that calculating the pseudo-inverse of S often failed, particularly when factors were correlated. For this reason, we instead calculate the mixture matrix as M = XS^T, projecting X directly onto the loadings. After standardisation of M, both approaches return near-identical results, provided that the pseudo-inverse was successfully calculated.
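The two recovery strategies can be compared with a small simulation. This sketch assumes the orientation used by runICA (X with features as rows, returned S with features as rows), so the pseudo-inverse route becomes t(X) %*% MASS::ginv(t(S)) and the direct projection becomes t(X) %*% S. The loadings are deliberately constructed with orthogonal columns; in that case the two estimates differ only by a positive scale per factor and agree exactly after standardisation. With non-orthogonal loadings the agreement is only approximate.

```r
set.seed(7)
n_features <- 100; n_samples <- 20; nc <- 4

# Loadings (features x factors) with orthogonal, non-unit-length columns
Q <- qr.Q(qr(matrix(rnorm(n_features * nc), nrow = n_features)))
S <- Q %*% diag(c(3, 2, 1, 0.5))
M_true <- matrix(rnorm(n_samples * nc), nrow = n_samples)  # samples x factors

# Data arranged as runICA expects: features as rows, samples as columns
X <- t(M_true %*% t(S))

# Recovery via the pseudo-inverse of the loadings (MASS::ginv)
M_pinv <- t(X) %*% MASS::ginv(t(S))
# Recovery by direct projection onto the loadings
M_proj <- t(X) %*% S

# Difference after standardising columns: floating-point error only
max(abs(scale(M_pinv) - scale(M_proj)))
```

The projection avoids any matrix inversion, which is why it remains usable when the pseudo-inverse is numerically problematic.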

Author

Jack Gisby

Examples

# Get a random matrix with rnorm, with 100 rows (features)
# and 20 columns (observations)
X <- ReducedExperiment:::.makeRandomData(100, 20, "feature", "obs")

# Run standard ICA on the data with 5 components
set.seed(1)
ica_res <- runICA(X, nc = 5, use_stability = FALSE)

# Run stabilised ICA on the data with 5 components
# (a low n_runs keeps the example fast; use more runs in practice)
ica_res_stab <- runICA(X, nc = 5, use_stability = TRUE, n_runs = 5,
                        BPPARAM = BiocParallel::SerialParam(RNGseed = 1))