Hi R-help folks,
I have been doing some single SNP association work using snpMatrix. This works 
well, but produces a lot of false positives, because of population structure in 
my data. I would like to correct the p-values (which snpMatrix gives me) for 
population structure, possibly using principle component analysis (PCA). 

My data is complicated, so here's a simple example of what I'd like to do:
    
# 3x8 matrix of example snp data, 1 = allele A, -1 = allele B, 0 = hetero       
                            
snp.data = matrix(                               
        c(0,1,0,-1,-1,1,1,-1,0,1,1,0,0,1,0,-1,-1,NA,0,-1,0,0,1,0),  
        nrow=3,  
        dimnames = list(   
                 c("bob", "frita", "trudy"),     
                 c("snp1", "snp2", "snp3", "snp4", "snp5","snp6", "snp7", 
"snp8") 
                 )   
         )

# phenotype data - resistant or susceptible to zombie infection
phenotype.data = matrix( 
        c("bob", "frita", "trudy", 
        "resistant", "susceptible", "resistant"),
        nrow=3,   
        dimnames = list(                                           
                 c("bob", "frita", "trudy"),   
                 c("rowNames", "cc")   
                 )      
         )

library("snpMatrix") 
# add one in the following line so genotypes are 0,1 or 2, so 
single.snp.tests() 
method doesn't complain
snp.matrix <- as(snp.data + 1,'snp.matrix')   
single.snp.assoc <- single.snp.tests(cc, data=as.data.frame(phenotype.data), 
snp.data=snp.matrix )

Okay, so now I have p-values for the association between SNPs and resistance to 
zombie infection.


I do this for PCA:
snp.data = replace( snp.data, is.na(snp.data), 0) # workaround, b/c I can't get 
prcomp to ignore NAs no matter what I do
pca = prcomp(snp.data)

So, the question is, can I use the PCA data to correct my p-values (in 
single.snp.association) for population structure? In my real data, population 
structure is causing a lot of type I errors (false positive SNPs). 

I have read of some standalone software that does this sort of thing, for 
example EIGENSTRAT:
http://www.biostat.jhsph.edu/~iruczins/teaching/misc/gwas/papers/price2006.pdf
But I'd like to stick to R if possible. 

Any advice/comments welcome.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to