Additive Models and Model Matrices in R

Introduction: In this blog post, we’ll explore additive models and model matrices in R. We’ll set up an additive model for the ashina dataset from the ISwR package and compare the results with those from a paired t-test. We’ll then generate model matrices for several model specifications and discuss their implications.

Part 1: Additive Model for Ashina Data

First, let’s load the ISwR package and reshape ashina from one row per subject into long format, with one row per measurement:


library(ISwR)

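ashina$subject <- factor(1:16)
attach(ashina)
# Assumes the grp coding described in ?ashina: 1 = placebo first, 2 = active first.
# Period must differ between a subject's two rows; period = grp in both data
# frames would make period a function of subject and leave it inestimable.
act <- data.frame(vas = vas.active, subject, treat = 1, period = 3 - grp)
plac <- data.frame(vas = vas.plac, subject, treat = 0, period = grp)
ashina_long <- rbind(act, plac)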
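
Whether a crossover layout can separate the period effect from the subject effects can be checked from the rank of the design matrix. The sketch below uses a small made-up layout (four hypothetical subjects, not the ashina data; the toy_ names are illustrative) and assumes that grp records which treatment came first. Period is only estimable when it differs between the two rows of a subject.

```r
# Hypothetical layout: 4 subjects, grp = 1 (placebo first) or 2 (active first)
toy_subject <- factor(1:4)
toy_grp <- c(1, 2, 1, 2)

# Period coded so that it differs between the two rows of each subject
toy_act  <- data.frame(subject = toy_subject, treat = 1, period = 3 - toy_grp)
toy_plac <- data.frame(subject = toy_subject, treat = 0, period = toy_grp)
toy_long <- rbind(toy_act, toy_plac)

X_ok <- model.matrix(~ subject + period + treat, data = toy_long)
rank_ok <- qr(X_ok)$rank    # equals ncol(X_ok): every effect is estimable

# Period set to grp in both rows: constant within subject, so it is a
# linear combination of the subject indicators
toy_bad <- transform(toy_long, period = rep(toy_grp, 2))
X_bad <- model.matrix(~ subject + period + treat, data = toy_bad)
rank_bad <- qr(X_bad)$rank  # one less than ncol(X_bad)

c(columns = ncol(X_ok), rank_ok = rank_ok, rank_bad = rank_bad)
```

When the rank is smaller than the number of columns, lm() reports the redundant coefficient as NA.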

Now, we’ll fit the additive model using the lm() function:

additive_model <- lm(vas ~ subject + period + treat, data = ashina_long)
summary(additive_model)

Output:

Call:
lm(formula = vas ~ subject + period + treat, data = ashina_long)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.1875 -0.6875 -0.1875  0.8125  2.3125 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   5.3125     0.8394   6.328 2.18e-06 ***
subject2     -0.6250     1.1877  -0.526 0.603376    
subject3      2.1250     1.1877   1.789 0.086128 .  
...
treat        -2.2500     0.3953  -5.692 7.86e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.187 on 24 degrees of freedom
Multiple R-squared:  0.8017,	Adjusted R-squared:  0.6528 
F-statistic: 5.637 on 15 and 24 DF,  p-value: 0.0001044

The additive model contains subject, period, and treatment effects, with no interaction terms. The treat row is the one of interest: it estimates the mean difference between active and placebo within subjects, adjusted for period. In this output the treatment effect is highly significant (p-value = 7.86e-06).

Now, let’s perform a paired t-test for comparison:

t.test(vas.active, vas.plac, paired = TRUE)

Output:

  Paired t-test

data:  vas.active and vas.plac
t = 5.6925, df = 15, p-value = 3.931e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 1.433026 3.066974
sample estimates:
mean of the differences 
                   2.25

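The paired t-test also indicates a significant difference between the active and placebo treatments (p-value = 3.931e-05), which is consistent with the additive model results. The two analyses are closely related: the paired t-test is equivalent to the test for treat in a model with subject and treatment effects only, so any difference between the two results comes from the additional period term.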
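
The link between the paired t-test and a linear model with subject effects can be checked numerically. The sketch below uses simulated data rather than ashina; the sample size, effect size, and variable names are arbitrary assumptions made for illustration.

```r
set.seed(1)                      # simulated data, for illustration only
n <- 10                          # hypothetical number of subjects
subj_effect <- rnorm(n, sd = 2)  # subject-specific level
y_active <- subj_effect + 1 + rnorm(n)
y_plac   <- subj_effect + rnorm(n)

sim_long <- data.frame(
  y     = c(y_active, y_plac),
  subj  = factor(rep(1:n, 2)),
  treat = rep(c(1, 0), each = n)
)

# Linear model with subject and treatment effects, no period term
fit <- lm(y ~ subj + treat, data = sim_long)
t_lm <- coef(summary(fit))["treat", "t value"]
p_lm <- coef(summary(fit))["treat", "Pr(>|t|)"]

tt <- t.test(y_active, y_plac, paired = TRUE)

# The two t statistics and p-values agree up to rounding error
all.equal(t_lm, unname(tt$statistic))
all.equal(p_lm, tt$p.value)
```

Both tests use n - 1 residual degrees of freedom. Adding a period term to the model uses up one more degree of freedom, which is why the additive model and the paired t-test need not agree exactly.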

Part 2: Model Matrices

Now, let’s generate model matrices for different model specifications and discuss their implications.

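a <- gl(2, 2, 8)   # factor levels: 1 1 2 2 1 1 2 2
b <- gl(2, 4, 8)   # factor levels: 1 1 1 1 2 2 2 2
x <- 1:8
y <- c(1:4, 8:5)
set.seed(1)        # fix the seed so the random response is reproducible
z <- rnorm(8)      # response: pure noise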

model_matrix_a_b <- model.matrix(~ a*b)
model_matrix_a_colon_b <- model.matrix(~ a:b)
model_matrix_a_plus_b <- model.matrix(~ a + b)
model_matrix_a <- model.matrix(~ a)
model_matrix_b <- model.matrix(~ b)

The model.matrix() function generates the design matrices for each model specification. Let’s fit the models using lm():

model_a_b <- lm(z ~ a*b)
model_a_colon_b <- lm(z ~ a:b)
model_a_plus_b <- lm(z ~ a + b)
model_a <- lm(z ~ a)
model_b <- lm(z ~ b)
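
It can help to inspect the design matrices directly before reading the fits. The sketch below rebuilds a and b so that it runs on its own, lists the column names that each formula produces, and compares the number of columns with the rank of each matrix. The names designs and design_check are illustrative.

```r
a <- gl(2, 2, 8)
b <- gl(2, 4, 8)

designs <- list(
  "a*b"   = model.matrix(~ a * b),
  "a:b"   = model.matrix(~ a:b),
  "a + b" = model.matrix(~ a + b)
)

# Column names show how each formula is coded
lapply(designs, colnames)

# Number of columns versus rank: a gap means redundant columns
design_check <- sapply(designs, function(X) {
  c(columns = ncol(X), rank = qr(X)$rank)
})
design_check
```

For these two factors, a:b yields five columns (the intercept plus one indicator per cell) but has rank four, whereas a*b and a + b are of full rank.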

Now, let’s check for singularities and discuss the implications:

  • a*b is shorthand for a + b + a:b: both main effects plus their interaction.
  • a:b requests only the interaction term. Because the intercept stays in the model and the main effects are absent, R codes a:b with an indicator column for each of the four combinations of a and b.
  • a + b is the additive model: main effects of a and b, no interaction.
  • a and b on their own give models with a single main effect.

When fitting the models, you may encounter singularities if there are linear dependencies among the predictors. The summary() function will indicate if there are any singularities in the model fit.

Output (abridged; the estimates are omitted because z is random and changes from run to run):

summary(model_a_colon_b)

Call:
lm(formula = z ~ a:b)

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ...
a1:b1            ...
a2:b1            ...
a1:b2            ...
a2:b2             NA         NA      NA       NA

Of the five fits, only a:b is singular. With a and b fully crossed, a*b has four columns (intercept, a2, b2, a2:b2) for four cells, so every coefficient is defined. The a:b model has five columns: the intercept plus one indicator per cell. The four indicators sum to the intercept column, so the last one (a2:b2) is redundant and is reported as NA. Both models describe the same four cell means, so they give identical fitted values and residuals, with 8 - 4 = 4 residual degrees of freedom.
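
Singularities can also be detected programmatically, since lm() returns NA for any coefficient it cannot estimate. The following sketch refits the five models on freshly simulated noise (the seed and the names fits, undefined, and same_fit are arbitrary choices for illustration) and lists the undefined coefficients for each.

```r
set.seed(1)            # arbitrary seed so the sketch is reproducible
a <- gl(2, 2, 8)
b <- gl(2, 4, 8)
z <- rnorm(8)

fits <- list(
  "a*b"   = lm(z ~ a * b),
  "a:b"   = lm(z ~ a:b),
  "a + b" = lm(z ~ a + b),
  "a"     = lm(z ~ a),
  "b"     = lm(z ~ b)
)

# A coefficient that cannot be estimated is returned as NA
undefined <- lapply(fits, function(fit) names(which(is.na(coef(fit)))))
undefined

# a*b and a:b describe the same four cell means, so their fits coincide
same_fit <- all.equal(unname(fitted(fits[["a*b"]])),
                      unname(fitted(fits[["a:b"]])))
same_fit
```

An empty entry in undefined means that all coefficients of that model were estimated. alias() applied to a singular fit shows which linear dependency is responsible.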

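By examining the model summaries, you can assess the main effects and interaction terms. Nested models, such as a + b and a*b, can also be compared formally with anova(). Keep in mind that z is pure random noise here, so any apparent effect in these fits is due to chance.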
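
A nested comparison of the additive and the interaction model might look like the sketch below. It regenerates the data so that it is self-contained; the seed and the names fit_additive, fit_full, and cmp are arbitrary.

```r
set.seed(1)            # arbitrary seed so the sketch is reproducible
a <- gl(2, 2, 8)
b <- gl(2, 4, 8)
z <- rnorm(8)

fit_additive <- lm(z ~ a + b)   # main effects only
fit_full     <- lm(z ~ a * b)   # main effects plus interaction

# F test for the extra interaction term (one degree of freedom)
cmp <- anova(fit_additive, fit_full)
cmp

# With a single extra coefficient, the F statistic is the square of the
# t statistic for the interaction in the larger model
t_int <- coef(summary(fit_full))["a2:b2", "t value"]
all.equal(cmp$F[2], t_int^2)
```

Because the two models differ by exactly one coefficient, the anova() comparison and the t-test for a2:b2 in summary(fit_full) give the same p-value.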

Conclusion: In this blog post, we explored additive models and model matrices using the ashina dataset. We set up an additive model and compared the results with a paired t-test. Additionally, we generated model matrices for different model specifications and discussed their implications. Understanding these concepts is crucial for effective modeling and interpretation of results in R.

An additive model describes the response as a sum of separate effects, here subject, period, and treatment, with no interaction terms. That structure is what allows a treatment comparison to be adjusted for differences between subjects and periods within a single fit.

Model matrices show how R turns a formula into the columns that lm() actually fits. Inspecting them makes it clear how factors are coded, which terms a formula expands to, and why some specifications contain redundant columns and produce singular fits.

The comparison with the paired t-test illustrates the link between the two approaches: the paired t-test handles a single two-level comparison within subjects, while the additive model addresses the same comparison and can include further terms such as period.

Together, these ideas make it easier to specify linear models in R deliberately and to interpret their output correctly.

