I am analyzing some data that came from demographic health surveys.  The
data contain information for individuals within households, that are
located within clusters, that are located within survey years, that are
located within countries.  We are trying to find the best model from a
subset of predictors, and all models must contain the random variable of
household within cluster within year within country.

We are running models on a server with 64GB memory and 6 CPU cores

Data:  Please email me if you are willing to look at this for us and I will
send you the data via file transfer. It's a big file and I don't want to
post it online just now.

First, we tested a simple linear mixed model using the lmer package:

########## Install Packages ##########

library(lme4)
 library(glmulti)

########## Clear all memory/objects ##########
 rm(list=ls())
 ########## Read in Data ##########

mydata = read.csv("kr.and.GIS.cleaned.Residents.only.csv")

###################################################################
########## Try lmer model NO Interactions ##########
###################################################################
ptm <- proc.time()

lmer.model = lmer(stunt.dhs ~
       dis_ed_des+
tc_pa+
avg_clu_tc+
 dist_road+
pden_lscan+
URBAN_RURA+
time.water+
 wealth.index +
        (1|country.code.short/year/cluster/household),
        mydata,
REML = F)
print(lmer.model)

proc.time() - ptm


This works fine, runs in 12 minutes, and gives the following output:


Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: stunt.dhs ~ dis_ed_des + tc_pa + avg_clu_tc + dist_road +
pden_lscan +      URBAN_RURA + time.water + wealth.index + (1 |
country.code.short/year/cluster/household)
   Data: mydata
      AIC       BIC    logLik  deviance
 218267.5  218420.4 -109116.7  218233.5
Random effects:
 Groups                                        Name        Std.Dev.
 household:(cluster:(year:country.code.short)) (Intercept) 0.5584
 cluster:(year:country.code.short)             (Intercept) 0.3727
 year:country.code.short                       (Intercept) 0.0000
 country.code.short                            (Intercept) 0.3028
 Residual                                                  1.3819
Number of obs: 59405, groups:
household:(cluster:(year:country.code.short)), 40703;
cluster:(year:country.code.short), 7436; year:country.code.short, 27;
country.code.short, 21
Fixed Effects:
        (Intercept)           dis_ed_des                tc_pa
 avg_clu_tc
         -1.324e+00            1.348e-07           -7.806e-04
 -5.180e-04
          dist_road           pden_lscan          URBAN_RURAU
 time.water
         -1.435e-06            9.003e-06            6.637e-02
  6.097e-05
 wealth.indexpoorer  wealth.indexpoorest   wealth.indexricher
wealth.indexrichest
         -7.814e-02           -1.794e-01            1.469e-01
  4.390e-01  > > proc.time() - ptm   user  system elapsed
727.842  13.781 741.053


Next, we tried to do model selection using the genetic algorithm of
glmulti.  We eventually want to include more predictors than in this
current model AND we want to include interactions.  This could lead to over
one billion candidate models.  For this reason, we want to eventually use
the genetic algorithm and NOT all possible subsets.  However, as a first
step, we will use this smaller list of predictors and level = 1 so that no
interactions are considered.

First, we create the lmer wrapper and then just run the model to see how
many candidate models we would have with this situation

###################################################################
########## Create lmer wrapper to paste random variable to all models ######
###################################################################

lmer.glmulti <- function (formula, data, random = "", ...) {
lmer(paste(deparse(formula), random), data = data, REML=F, ...)
}


##################################################################
########## Find canditate number of models NO interactions ##########
###################################################################


candidate <- glmulti(stunt.dhs ~
 dis_ed_des*
        tc_pa*
         avg_clu_tc*
dist_road*
pden_lscan*
 URBAN_RURA*
time.water*
wealth.index,
data=mydata,
 level = 1,
fitfunc = lmer.glmulti,
        random = "+(1|country.code.short/year/cluster/household)",
        method = "d")


With this model, the same that we used for the lmer function, we get the
following error.  We also get this error if we try to run the same code but
with method = "g", and confsetsize=5)



Initialization... Error in
factor(household:(cluster:(year:country.code.short))) :    negative
length vectors are not allowed



If, however, we delete the household part of the random factor, the
function works:


candidate <- glmulti(stunt.dhs ~
 dis_ed_des*
        tc_pa*
         avg_clu_tc*
dist_road*
pden_lscan*
 URBAN_RURA*
time.water*
wealth.index,
data=mydata,
 level = 1,
fitfunc = lmer.glmulti,
        random = "+(1|country.code.short/year/cluster)",

        method = "d")


Gives:


Initialization...
TASK: Diagnostic of candidate set.
Sample size: 59405
2 factor(s).
6 covariate(s).
0 f exclusion(s).
0 c exclusion(s).
0 f:f exclusion(s).
0 c:c exclusion(s).
0 f:c exclusion(s).
Size constraints: min =  0 max = -1
Complexity constraints: min =  0 max = -1
Your candidate set contains 256 models.


And running the following code seems to be working (slowly):

###################################################################
########## Run models with NO Interactions using genetic algorithm #####
###################################################################

ptm <- proc.time()

 model <- glmulti(stunt.dhs ~
dis_ed_des*
        tc_pa*
        avg_clu_tc*
        dist_road*
pden_lscan*
 URBAN_RURA*
time.water*
wealth.index,
data=mydata,
 level = 1,
fitfunc = lmer.glmulti,
        random = "+(1|country.code.short/year/cluster)",
        crit = bic,
        imm = 0.5,
        confsetsize = 5)

CAN ANYONE EXPLAIN WHY WE GET THE NEGATIVE LENGTH VECTORS ERROR IF WE
INCLUDE HOUSEHOLD NESTED WITHIN THE OTHER RANDOM FACTORS FOR GLMULTI BUT
NOT FOR LMER?  ANY SUGGESTIONS FOR WHAT WE CAN DO TO DIAGNOSE OR FIX THE
PROBLEM?

Thanks!

-- 
Alicia Ellis
Postdoc
Gund Institute for Ecological Economics
University of Vermont
617 Main Street
Burlington, VT  05405
(802) 656-1046
http://www.wcs-heal.org
http://www.uvm.edu/~aellis5
<http://entomology.ucdavis.edu/faculty/scott/aellis/>



-- 
Alicia Ellis
Postdoc
Gund Institute for Ecological Economics
University of Vermont
617 Main Street
Burlington, VT  05405
(802) 656-1046
http://www.wcs-heal.org
http://www.uvm.edu/~aellis5/

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to