Runs ICA through ica. If use_stability is FALSE, then X is passed directly to ica and a standard ICA analysis is performed. If use_stability is TRUE, then the stabilised ICA procedure is carried out (see details).
Usage
runICA(
  X,
  nc,
  use_stability = FALSE,
  resample = FALSE,
  method = "fast",
  stability_threshold = NULL,
  center_X = TRUE,
  scale_X = FALSE,
  reorient_skewed = TRUE,
  scale_components = TRUE,
  scale_reduced = TRUE,
  n_runs = 30,
  BPPARAM = BiocParallel::SerialParam(),
  ...
)
Arguments
- X
Either a SummarizedExperiment object or a matrix containing data to be subject to ICA. X should have rows as features and columns as samples.
- nc
The number of components to be identified. See estimateStability for a method to estimate the optimal number of components.
- use_stability
Whether to use a stability-based approach to estimate factors. See details for further information.
- resample
If TRUE, a bootstrap approach is used to estimate factors and quantify stability. Otherwise, random initialisation of ICA is employed. Ignored if use_stability is FALSE.
- method
The ICA method to use. Passed to ica; the options are "fast", "imax" or "jade".
- stability_threshold
A stability threshold for pruning factors. Factors with a stability below this threshold will be removed. If used, the threshold can lead to fewer factors being returned than the number specified by nc.
- center_X
If TRUE, X is centered (i.e., features / rows are transformed to have a mean of 0) prior to ICA. Generally recommended.
- scale_X
If TRUE, X is scaled (i.e., features / rows are transformed to have a standard deviation of 1) before ICA.
- reorient_skewed
If TRUE, factors are reoriented to ensure that the loadings of each factor (i.e., the source signal matrix) have positive skew. Helps ensure that the most influential features for each factor are positively associated with it.
- scale_components
If TRUE, the loadings are standardised (to have a mean of 0 and standard deviation of 1).
- scale_reduced
If TRUE, the reduced data (mixture matrix) are standardised (to have a mean of 0 and standard deviation of 1).
- n_runs
The number of times to run ICA to estimate factors and quantify stability. Ignored if use_stability is FALSE.
- BPPARAM
A class containing parameters for parallel evaluation. Uses SerialParam by default, running only a single ICA computation at a time. Ignored if use_stability is FALSE.
- ...
Additional arguments to be passed to ica.
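To make the pre- and post-processing flags concrete, the following base R sketch shows the kind of transformation each flag describes. It is an illustration only, not the package's internal code: the simulated matrices and the skewness helper are hypothetical, and the exact order and implementation used by runICA may differ.

```r
set.seed(1)

# Hypothetical input: 100 features (rows) x 20 samples (columns)
X <- matrix(rnorm(100 * 20, mean = 5, sd = 2), nrow = 100, ncol = 20)

# center_X: give every feature (row) a mean of 0
X_centered <- X - rowMeans(X)

# scale_X: give every feature (row) a standard deviation of 1
X_scaled <- X_centered / apply(X, 1, sd)

# reorient_skewed: flip the sign of any factor whose loadings have negative skew
# (skewness() is a helper written for this sketch, not a package function)
skewness <- function(x) mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5
S <- cbind(rexp(100), -rexp(100))  # hypothetical loadings: one factor per column
flip <- ifelse(apply(S, 2, skewness) < 0, -1, 1)
S_reoriented <- sweep(S, 2, flip, `*`)

# scale_components: standardise each factor's loadings to mean 0 and SD 1
S_std <- scale(S_reoriented)
```

If the sign of a factor's loadings is flipped, the matching column of the mixture matrix would presumably need the same flip so that the two matrices stay consistent with each other.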
Value
A list containing the following:
- M
The mixture matrix (reduced data) with samples as rows and columns as factors.
- S
The source signal matrix (loadings) with rows as features and columns as factors.
- stab
If use_stability is TRUE, "stab" will be a component of the list. It is a vector indicating the relative stability of each factor, as described in the details.
Details
Function performs ICA for a data matrix. If use_stability is TRUE, then ICA is performed multiple times with either: i) random initialisation (default); or ii) bootstrap resampling of the data (if resample is TRUE).
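As a rough illustration of what bootstrap resampling of the data could look like, the base R sketch below draws samples (columns) with replacement for each run. Resampling columns rather than rows is an assumption made for this example, and the object names are hypothetical; the package may resample differently.

```r
set.seed(1)

# Hypothetical input: 100 features (rows) x 20 samples (columns)
X <- matrix(
  rnorm(100 * 20), nrow = 100, ncol = 20,
  dimnames = list(paste0("feature_", 1:100), paste0("obs_", 1:20))
)
n_runs <- 5

# One bootstrap replicate per run: columns are drawn with replacement, so some
# samples appear more than once and others are left out of a given run
boot_inputs <- lapply(seq_len(n_runs), function(i) {
  X[, sample(ncol(X), replace = TRUE), drop = FALSE]
})

# Each replicate keeps the dimensions of X and would be passed to a single ICA run
dim(boot_inputs[[1]])
```

With random initialisation, by contrast, every run would see the same X and only the starting point of the ICA algorithm would change.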
Note that the seed must be set if reproducibility is needed. Specifically, one can use set.seed prior to running standard ICA (use_stability = FALSE) or set the RNGseed argument of BPPARAM when running stabilised ICA (use_stability = TRUE).
The stability-based ICA algorithm is similar to the ICASSO approach (https://www.cs.helsinki.fi/u/ahyvarin/papers/Himberg03.pd), which is implemented in the stabilized-ica Python package (https://github.com/ncaptier/stabilized-ica/tree/master).
In short, the stability-based algorithm consists of:
- Running ICA multiple times with either random initialisation or bootstrap resampling of the input data.
- Clustering the resulting factors across all runs based on the signature matrix.
- Calculating intra- (aics) and extra- (aecs) cluster stability, and defining the final cluster stability as aics - aecs.
- Calculating the cluster centrotype as the factor with the highest intra-cluster stability.
- Optionally removing factors below a specified stability threshold (stability_threshold).
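The clustering and stability steps can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's implementation: the factors from each "run" are simulated as noisy, randomly sign-flipped copies of hypothetical sources instead of being produced by ICA, similarity is taken to be the absolute Pearson correlation, and average-linkage hierarchical clustering is used.

```r
set.seed(1)
n_features <- 200
nc <- 3
n_runs <- 5

# Hypothetical "true" sources; each simulated run returns a noisy copy of them
# with arbitrary signs, standing in for the output of one ICA run
true_S <- matrix(rexp(n_features * nc) - 1, n_features, nc)
runs <- lapply(seq_len(n_runs), function(i) {
  signs <- sample(c(-1, 1), nc, replace = TRUE)
  noisy <- true_S + matrix(rnorm(n_features * nc, sd = 0.3), n_features, nc)
  sweep(noisy, 2, signs, `*`)
})
all_S <- do.call(cbind, runs)  # features x (nc * n_runs) factors

# Similarity between every pair of factors (assumed: absolute correlation)
sim <- abs(cor(all_S))

# Cluster factors across runs into nc groups
clusters <- cutree(hclust(as.dist(1 - sim), method = "average"), k = nc)

# Stability of a cluster: mean similarity within it (aics) minus mean
# similarity between its members and all other factors (aecs)
cluster_stability <- function(k) {
  inside <- clusters == k
  aics <- mean(sim[inside, inside, drop = FALSE])
  aecs <- mean(sim[inside, !inside, drop = FALSE])
  aics - aecs
}
stab <- vapply(seq_len(nc), cluster_stability, numeric(1))

# Centrotype: the member most similar, in total, to the rest of its cluster
centrotype <- function(k) {
  members <- which(clusters == k)
  members[which.max(rowSums(sim[members, members, drop = FALSE]))]
}
centro_idx <- vapply(seq_len(nc), centrotype, integer(1))

# Optional pruning with a hypothetical threshold of 0.5
keep <- stab >= 0.5
```

Here the within-cluster mean includes each factor's similarity with itself; whether a given implementation excludes the diagonal is a detail this sketch does not settle.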
Results from this function should be broadly similar to those generated by other implementations of stabilised ICA, although they will not be identical. Notable differences include:
- ICA algorithm
Differences in the underlying implementation of ICA.
- Stability threshold
The stability_threshold argument, if specified, removes unstable components. Such a threshold is not used by stabilized-ica.
- Mixture matrix recovery
ICA is generally formulated as X = MS, where X is the input data, M is the mixture matrix (reduced data) and S is the source signal matrix (feature loadings). The stabilised ICA approach first calculates a source signal matrix before recovering the mixture matrix. To do this, other implementations, including that of the stabilized-ica package, multiply X by the pseudo-inverse of S. Such an operation is implemented in the ginv function of the MASS R package. In the development of ReducedExperiment, we noticed that taking the inverse of S often failed, particularly when there were correlated factors. For this reason, we instead formulate the mixture matrix as M = XS. After standardisation of M, both approaches return near-identical results, provided that the matrix inverse was successfully calculated.
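The two ways of recovering the mixture matrix can be compared on simulated data. The sketch below is illustrative only and assumes the orientations given under Value (S as features x factors, M as samples x factors), which is why transposes appear around the products; the simulated data and object names are hypothetical. It requires the MASS package for ginv.

```r
set.seed(1)
n_features <- 500
n_samples <- 30
nc <- 4

# Hypothetical standardised loadings (features x factors) and mixture matrix
S <- scale(matrix(rexp(n_features * nc), n_features, nc))
M_true <- matrix(rnorm(n_samples * nc), n_samples, nc)

# Simulated data (features x samples): a mixture of the sources plus noise
noise <- matrix(rnorm(n_features * n_samples, sd = 0.1), n_features, n_samples)
X <- S %*% t(M_true) + noise

# Approach 1: multiply the data by the pseudo-inverse of S
M_pinv <- t(MASS::ginv(S) %*% X)  # samples x factors

# Approach 2: project the data onto S directly
M_proj <- t(X) %*% S              # samples x factors

# After standardising the columns, compare the two results factor by factor
agreement <- diag(cor(scale(M_pinv), scale(M_proj)))
round(agreement, 3)
```

In this simulation the factors are close to uncorrelated, which is the situation in which the two approaches would be expected to agree most closely.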
Examples
# Get a random matrix with rnorm, with 100 rows (features)
# and 20 columns (observations)
X <- ReducedExperiment:::.makeRandomData(100, 20, "feature", "obs")
# Run standard ICA on the data with 5 components
set.seed(1)
ica_res <- runICA(X, nc = 5, use_stability = FALSE)
# Run stabilised ICA on the data with 5 components (low runs for example)
ica_res_stab <- runICA(X, nc = 5, use_stability = TRUE, n_runs = 5,
                       BPPARAM = BiocParallel::SerialParam(RNGseed = 1))