This is a question at the border between statistics and R.

When I fit a glm with many candidate predictors and select a model using stepAIC, many independent variables are retained even when there is no relationship between the dependent variable and the predictors (all are random numbers).

Does anyone have a solution to prevent this effect? Is it related to the Bonferroni correction?

Is there a ratio of the number of independent variables to the number of observations that is safe for stepAIC?

Thanks

Marc

Example code: when 2 independent variables are included, none is selected; when 11 are included, 7 to 8 are selected.

library(MASS)  # provides stepAIC()

x <- rnorm(15, 15, 2)
A <- rnorm(15, 20, 5)
B <- rnorm(15, 20, 5)
C <- rnorm(15, 20, 5)
D <- rnorm(15, 20, 5)
E <- rnorm(15, 20, 5)
F <- rnorm(15, 20, 5)
G <- rnorm(15, 20, 5)
H <- rnorm(15, 20, 5)
I <- rnorm(15, 20, 5)
J <- rnorm(15, 20, 5)
K <- rnorm(15, 20, 5)

df <- data.frame(x=x, A=A, B=B, C=C, D=D,
                 E=E, F=F, G=G, H=H, I=I, J=J,
                 K=K)

G1 <- glm(formula = x ~ A + B,
         data=df, family = gaussian(link = "identity"))

g1 <- stepAIC(G1)

summary(g1)

G2 <- glm(formula = x ~ A + B + C + D + E + F + G + H + I + J + K,
         data=df, family = gaussian(link = "identity"))

g2 <- stepAIC(G2)

summary(g2)
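
Since a single random draw can be misleading, the same experiment can be repeated many times to see how often pure-noise predictors survive selection. This is only a minimal sketch: count_selected() is a hypothetical helper written for this illustration (it is not part of MASS), and the seed and the number of replicates (200) are arbitrary choices.

```r
library(MASS)  # provides stepAIC()

# Hypothetical helper: simulate pure-noise data with n observations and
# p candidate predictors, run stepAIC, and return how many predictors remain.
count_selected <- function(n = 15, p = 11) {
  # Predictors V1..Vp are independent noise, unrelated to the response x
  dat <- as.data.frame(matrix(rnorm(n * p, 20, 5), nrow = n))
  dat$x <- rnorm(n, 15, 2)
  full <- glm(x ~ ., data = dat, family = gaussian(link = "identity"))
  sel <- stepAIC(full, trace = FALSE)  # trace = FALSE silences the step log
  length(attr(terms(sel), "term.labels"))
}

set.seed(1)  # arbitrary seed, only so the run can be repeated
n_sel_2  <- replicate(200, count_selected(p = 2))
n_sel_11 <- replicate(200, count_selected(p = 11))

# Distribution of the number of retained noise predictors in each setting
table(n_sel_2)
table(n_sel_11)
```

Comparing the two tables shows how the number of spuriously retained variables changes as the number of candidates approaches the number of observations.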

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
