Runs either standard linear or linear mixed models, with reduced components (e.g., factors or modules) as the outcomes and sample-level information (e.g., treatment, disease status) as predictors.
Usage
associateComponents(
re,
formula,
method = "lm",
scale_reduced = TRUE,
center_reduced = TRUE,
type = "II",
adj_method = "BH",
...
)
Arguments
- re
An object inheriting from ReducedExperiment.
- formula
The model formula to apply. Only the right-hand side of the model need be specified (e.g., "~ x + y"); the left-hand side (outcome) represents the components themselves. The variables in this formula should be present in the colData of re.
- method
If "lm", the lm function is used to run linear models (in tandem with Anova for running ANOVAs on the model terms). If "lmer", linear mixed models are run through lmer.
- scale_reduced
If TRUE, the reduced data are scaled (to have a standard deviation of 1) before modelling.
- center_reduced
If TRUE, the reduced data are centered (to have a mean of 0) before modelling.
- type
The type of ANOVA to be applied to the terms of the linear model.
- adj_method
The method for adjusting for multiple testing. Passed to the method parameter of p.adjust.
- ...
Additional arguments passed to lmer, given that method is set to "lmer".
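For example, optimiser settings can be forwarded to lmer through "..." (a sketch, not a definitive call; it assumes an object fe whose colData contains a "treated" variable and a "patient_id" grouping variable):

res <- associateComponents(
    fe,
    formula = "~ treated + (1 | patient_id)",
    method = "lmer",
    # lme4::lmerControl() is forwarded to lmer via "..."; it is ignored
    # (and would be an error) when method = "lm"
    control = lme4::lmerControl(optimizer = "bobyqa")
)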
Value
Returns a list with three entries: "models", a list of the fitted model objects; "anovas", the output of ANOVA-based testing; and "summaries", the results of running summary on the models.
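A minimal sketch of accessing these entries (entry names as described above; fe is the example object from the Examples section):

res <- associateComponents(fe, formula = "~ treated", method = "lm")
names(res) # "models", "anovas", "summaries"
res$summaries # a data frame with one row per model term per component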
Details
Multiple testing adjustment is performed separately for each term in the model across all factors. In other words, p-values are adjusted for the number of factors, but not the number of model terms. If you are testing a large number of terms, you could consider applying a custom adjustment method or using penalised regression.
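This per-term adjustment can be reproduced manually with p.adjust, grouping p-values by model term. A sketch using the summaries data frame produced in the Examples below (the "pvalue" and "term" column names are taken from that output; "manual_adj" is an illustrative name):

summaries <- lm_res$summaries
# For each model term (intercept, treatedtreatment, age), adjust the
# p-values across all factors with Benjamini-Hochberg
summaries$manual_adj <- stats::ave(
    summaries$pvalue,
    summaries$term,
    FUN = function(p) p.adjust(p, method = "BH")
)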
Examples
# Create FactorisedExperiment with random data (100 features, 50 samples,
# 10 factors)
set.seed(1)
fe <- ReducedExperiment:::.createRandomisedFactorisedExperiment(100, 50, 10)
fe
#> class: FactorisedExperiment
#> dim: 100 50 10
#> metadata(0):
#> assays(1): normal
#> rownames(100): gene_1 gene_2 ... gene_99 gene_100
#> rowData names(0):
#> colnames(50): sample_1 sample_2 ... sample_49 sample_50
#> colData names(0):
#> 10 components
# Create a sample-level variable describing some sort of treatment
colData(fe)$treated <- c(rep("control", 25), rep("treatment", 25))
colData(fe)$treated <- factor(colData(fe)$treated, c("control", "treatment"))
# Increase the value of factor 1 for the treated samples, simulating some
# kind of treatment-related effect picked up by factor analysis
reduced(fe)[, 1][colData(fe)$treated == "treatment"] <-
reduced(fe)[, 1][colData(fe)$treated == "treatment"] +
rnorm(25, mean = 1.5, sd = 0.1)
# Create a sample-level variable describing a covariate we want to adjust for
# We will make the treated patients slightly older on average
colData(fe)$age <- 0
colData(fe)$age[colData(fe)$treated == "control"] <- rnorm(25, mean = 35, sd = 8)
colData(fe)$age[colData(fe)$treated == "treatment"] <- rnorm(25, mean = 40, sd = 8)
# Associate the factors with sample-level variable in the colData
lm_res <- associateComponents(
fe,
formula = "~ treated + age", # Our model formula
method = "lm", # Use a linear model
adj_method = "BH" # Adjust our p-values with Benjamini-Hochberg
)
# We see that treatment is significantly associated with factor 1 (adjusted
# p-value < 0.05) and is higher in the treated patients. Age is not
# significantly associated with factor 1, but there is a slight positive
# relationship
print(head(lm_res$summaries[
,
c("term", "component", "estimate", "stderr", "pvalue", "adj_pvalue")
]))
#> term component estimate stderr
#> factor_1_(Intercept) (Intercept) factor_1 -0.689268582 0.54583812
#> factor_1_treatedtreatment treatedtreatment factor_1 0.944893904 0.27301472
#> factor_1_age age factor_1 0.005964950 0.01571588
#> factor_2_(Intercept) (Intercept) factor_2 -0.168478517 0.62950406
#> factor_2_treatedtreatment treatedtreatment factor_2 0.070402916 0.31486235
#> factor_2_age age factor_2 0.003666567 0.01812481
#> pvalue adj_pvalue
#> factor_1_(Intercept) 0.21290073 0.5408151
#> factor_1_treatedtreatment 0.00115608 0.0115608
#> factor_1_age 0.70598966 0.9037811
#> factor_2_(Intercept) 0.79014986 0.9712076
#> factor_2_treatedtreatment 0.82403839 0.9223567
#> factor_2_age 0.84055886 0.9037811
# But what if these aren't 50 independent patients, but rather 25 patients
# sampled before and after treatment? We can account for this using a
# linear mixed model, which can account for repeated measures and paired
# designs
# First we add in this information
colData(fe)$patient_id <- c(paste0("patient_", 1:25), paste0("patient_", 1:25))
# Then we run the linear mixed model with a random intercept for patient
lmm_res <- associateComponents(
fe,
formula = "~ treated + age + (1 | patient_id)", # Add a random intercept
method = "lmer", # Use a linear mixed model
adj_method = "BH"
)
# We used a different method, but can obtain a similar summary output
print(head(lmm_res$summaries[
,
c("term", "component", "estimate", "stderr", "pvalue", "adj_pvalue")
]))
#> term component estimate stderr
#> factor_1_(Intercept) (Intercept) factor_1 -0.784284004 0.51247020
#> factor_1_treatedtreatment treatedtreatment factor_1 0.924712439 0.23378306
#> factor_1_age age factor_1 0.008856511 0.01464046
#> factor_2_(Intercept) (Intercept) factor_2 -0.168478517 0.62950406
#> factor_2_treatedtreatment treatedtreatment factor_2 0.070402916 0.31486235
#> factor_2_age age factor_2 0.003666567 0.01812481
#> pvalue adj_pvalue
#> factor_1_(Intercept) 0.1335307576 0.616495271
#> factor_1_treatedtreatment 0.0005283248 0.005283248
#> factor_1_age 0.5487369545 0.902588792
#> factor_2_(Intercept) 0.7901498553 0.970086916
#> factor_2_treatedtreatment 0.8240383854 0.896969277
#> factor_2_age 0.8405588589 0.902588792
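# Individual fitted models can be pulled out of the results for further
# diagnostics. A sketch (it assumes the "models" entry is a list named by
# component):
fit <- lmm_res$models$factor_1
summary(fit) # standard lmer summary for the factor 1 model
lme4::ranef(fit) # estimated random intercepts, one per patient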