Runs either standard linear or linear mixed models, with reduced components (e.g., factors or modules) as the outcomes and sample-level information (e.g., treatment, disease status) as predictors.
Usage
associateComponents(
re,
formula,
method = "lm",
scale_reduced = TRUE,
center_reduced = TRUE,
type = "II",
adj_method = "BH",
...
)
Arguments
- re
An object inheriting from ReducedExperiment.
- formula
The model formula to apply. Only the right-hand side of the model need be specified (e.g., "~ x + y"); the left-hand side (outcome) represents the components themselves. The variables in this formula should be present in the colData of re.
- method
If "lm", the lm function is used to run linear models (in tandem with Anova for running ANOVAs on the model terms). If "lmer", linear mixed models are run through lmer.
- scale_reduced
If TRUE, the reduced data are scaled (to have a standard deviation of 1) before modelling.
- center_reduced
If TRUE, the reduced data are centered (to have a mean of 0) before modelling.
- type
The type of ANOVA to be applied to the terms of the linear model.
- adj_method
The method for adjusting for multiple testing. Passed to the method parameter of p.adjust.
- ...
Additional arguments passed to lmer, given that method is set to "lmer".
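For example, optimiser settings can be forwarded to lmer through "..." (a sketch, not a definitive call; it assumes an object fe whose colData contains a "treated" variable and a "patient_id" grouping variable):

res <- associateComponents(
    fe,
    formula = "~ treated + (1 | patient_id)",
    method = "lmer",
    # lme4::lmerControl() is forwarded to lmer via "..."; it is ignored
    # (and would be an error) when method = "lm"
    control = lme4::lmerControl(optimizer = "bobyqa")
)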
Value
Returns a list with three entries: "models", a list of the fitted model objects; "anovas", the output of ANOVA-based testing; and "summaries", the results of running summary on the models.
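A minimal sketch of accessing these entries (entry names as described above; fe is the example object from the Examples section):

res <- associateComponents(fe, formula = "~ treated", method = "lm")
names(res) # "models", "anovas", "summaries"
res$summaries # a data frame with one row per model term per component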
Details
Multiple testing adjustment is performed separately for each term in the model across all factors. In other words, p-values are adjusted for the number of factors, but not the number of model terms. If you are testing a large number of terms, you could consider applying a custom adjustment method or using penalised regression.
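This per-term adjustment can be reproduced manually with p.adjust, grouping p-values by model term. A sketch using the summaries data frame produced in the Examples below (the "pvalue" and "term" column names are taken from that output; "manual_adj" is an illustrative name):

summaries <- lm_res$summaries
# For each model term (intercept, treatedtreatment, age), adjust the
# p-values across all factors with Benjamini-Hochberg
summaries$manual_adj <- stats::ave(
    summaries$pvalue,
    summaries$term,
    FUN = function(p) p.adjust(p, method = "BH")
)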
Examples
# Create FactorisedExperiment with random data (100 features, 50 samples,
# 10 factors)
set.seed(1)
fe <- ReducedExperiment:::.createRandomisedFactorisedExperiment(100, 50, 10)
fe
#> class: FactorisedExperiment
#> dim: 100 50 10
#> metadata(0):
#> assays(1): normal
#> rownames(100): gene_1 gene_2 ... gene_99 gene_100
#> rowData names(0):
#> colnames(50): sample_1 sample_2 ... sample_49 sample_50
#> colData names(0):
#> 10 components
# Create a sample-level variable describing some sort of treatment
colData(fe)$treated <- c(rep("control", 25), rep("treatment", 25))
colData(fe)$treated <- factor(colData(fe)$treated, c("control", "treatment"))
# Increase the value of factor 1 for the treated samples, simulating some
# kind of treatment-related effect picked up by factor analysis
reduced(fe)[, 1][colData(fe)$treated == "treatment"] <-
reduced(fe)[, 1][colData(fe)$treated == "treatment"] +
rnorm(25, mean = 1.5, sd = 0.1)
# Create a sample-level variable describing a covariate we want to adjust for
# We will make the treated patients slightly older on average
colData(fe)$age <- 0
colData(fe)$age[colData(fe)$treated == "control"] <- rnorm(25, mean = 35, sd = 8)
colData(fe)$age[colData(fe)$treated == "treatment"] <- rnorm(25, mean = 40, sd = 8)
# Associate the factors with sample-level variable in the colData
lm_res <- associateComponents(
fe,
formula = "~ treated + age", # Our model formula
method = "lm", # Use a linear model
adj_method = "BH" # Adjust our p-values with Benjamini-Hochberg
)
# We see that treatment is significantly associated with factor 1 (adjusted
# p-value < 0.05) and is higher in the treated patients. Age is not
# significantly associated with factor 1, but there is a slight positive
# relationship
print(head(lm_res$summaries[
,
c("term", "component", "estimate", "stderr", "pvalue", "adj_pvalue")
]))
#> term component estimate stderr
#> factor_1_(Intercept) (Intercept) factor_1 -0.689268582 0.54583812
#> factor_1_treatedtreatment treatedtreatment factor_1 0.944893904 0.27301472
#> factor_1_age age factor_1 0.005964950 0.01571588
#> factor_2_(Intercept) (Intercept) factor_2 -0.168478517 0.62950406
#> factor_2_treatedtreatment treatedtreatment factor_2 0.070402916 0.31486235
#> factor_2_age age factor_2 0.003666567 0.01812481
#> pvalue adj_pvalue
#> factor_1_(Intercept) 0.21290073 0.5408151
#> factor_1_treatedtreatment 0.00115608 0.0115608
#> factor_1_age 0.70598966 0.9037811
#> factor_2_(Intercept) 0.79014986 0.9712076
#> factor_2_treatedtreatment 0.82403839 0.9223567
#> factor_2_age 0.84055886 0.9037811
# But what if these aren't 50 independent patients, but rather 25 patients
# sampled before and after treatment? We can account for this using a
# linear mixed model, which can account for repeated measures and paired
# designs
# First we add in this information
colData(fe)$patient_id <- c(paste0("patient_", 1:25), paste0("patient_", 1:25))
# Then we run the linear mixed model with a random intercept for patient
lmm_res <- associateComponents(
fe,
formula = "~ treated + age + (1 | patient_id)", # Add a random intercept
method = "lmer", # Use a linear mixed model
adj_method = "BH"
)
# We used a different method, but can obtain a similar summary output
print(head(lmm_res$summaries[
,
c("term", "component", "estimate", "stderr", "pvalue", "adj_pvalue")
]))
#> term component estimate stderr
#> factor_1_(Intercept) (Intercept) factor_1 -0.784284004 0.51247020
#> factor_1_treatedtreatment treatedtreatment factor_1 0.924712439 0.23378306
#> factor_1_age age factor_1 0.008856511 0.01464046
#> factor_2_(Intercept) (Intercept) factor_2 -0.168478517 0.62950406
#> factor_2_treatedtreatment treatedtreatment factor_2 0.070402916 0.31486235
#> factor_2_age age factor_2 0.003666567 0.01812481
#> pvalue adj_pvalue
#> factor_1_(Intercept) 0.1335307576 0.616495271
#> factor_1_treatedtreatment 0.0005283248 0.005283248
#> factor_1_age 0.5487369545 0.902588792
#> factor_2_(Intercept) 0.7901498553 0.970086916
#> factor_2_treatedtreatment 0.8240383854 0.896969277
#> factor_2_age 0.8405588589 0.902588792
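# Individual fitted models can be pulled out of the results for further
# diagnostics. A sketch (it assumes the "models" entry is a list named by
# component):
fit <- lmm_res$models$factor_1
summary(fit) # standard lmer summary for the factor 1 model
lme4::ranef(fit) # estimated random intercepts, one per patient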