Dear R Geoer,

This is more a request than a question, as I have already succeeded in doing what I 
wanted to do. However, I'm open to suggestions for doing it better, and I really 
think it would be a nice feature to add to the as.data.frame() function in the 
raster package.

I've been using the as.data.frame() function recently to extract data from a 
RasterBrick to be incorporated into different models (knn, lm, kmeans, etc.).  
My raster is usually full of NAs (the border and exclusion zones inside), which I 
have to remove before fitting my models.  The problem is that my raster is so big 
that I often cannot just do a simple:

tab <- as.data.frame(r)
tab <- tab[!is.na(tab[,1]), , drop=FALSE]

as it breaks at the first line for lack of memory.  Therefore, I think it would 
be useful to have a new argument to as.data.frame(), like "exclude.na=T", so that 
it ignores the NAs while creating the data.frame.  In some situations, it 
may allow us to work with slightly bigger rasters.

As I said, I did manage to do that (see code below).  However, my code is 
probably not as efficient as what a professional coder would have written, and I 
still think it would be a great addition to the as.data.frame() function:

###
library(raster)

# preparing the raster
r <- raster(nrow=1000, ncol=1000, xmn=0, xmx=1000, ymn=0, ymx=1000, crs=NA)
dat <- rep(NA, 1e+06)
dat[sample(1:1e+06, 2000)] <- runif(2000,0,1)
values(r) <- dat
plot(r)

# the traditional function
tab1 <- as.data.frame(r)
object.size(tab1)

# my modified as.data.frame function
# I have to tile the territory because if I don't, I have the same problem as 
# the as.data.frame() function
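as.df.no.na <- function(ras, nb.tuile){
  # empty data.frame with one column per layer, to be filled tile by tile
  tabfin <- as.data.frame(matrix(NA, 0, nlayers(ras), 
                                 dimnames=list(NULL, names(ras))))
  # row boundaries of the tiles (seq_len() keeps this valid when nb.tuile is 1)
  coupe <- c(0, round(nrow(ras)/nb.tuile)*seq_len(nb.tuile-1), nrow(ras))
  for(i in seq_len(nb.tuile)){
    # read the rows of tile i; the row names are the cell numbers
    mat <- getValues(ras, coupe[i]+1, coupe[i+1]-coupe[i])
    tab <- data.frame(mat, row.names=(coupe[i]*ncol(ras)+1):(coupe[i+1]*ncol(ras)))
    # keep only the cells whose first layer is not NA
    tabfin <- rbind(tabfin, tab[!is.na(tab[,1]), , drop=FALSE])
  }
  tabfin
}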

tab2 <- as.df.no.na(ras=r, nb.tuile=10)
object.size(tab2)
###
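
A possible variant of the same tiling idea, as a sketch only and not a tested replacement: instead of choosing the number of tiles by hand, it lets blockSize() from the raster package pick the row blocks, and it collects the pieces in a list so that rbind() is called only once. The name as_df_no_na is hypothetical, and, like the function above, it assumes that a cell can be dropped when its first layer is NA.

```r
library(raster)

# Hypothetical helper: read the raster block by block and keep only the
# cells whose first layer is not NA. Row names are the cell numbers.
as_df_no_na <- function(ras) {
  bs <- blockSize(ras)                 # row blocks suggested by raster
  nc <- ncol(ras)
  pieces <- vector("list", bs$n)
  for (i in seq_len(bs$n)) {
    vals <- getValues(ras, row = bs$row[i], nrows = bs$nrows[i])
    # force a matrix so that single-layer and multi-layer rasters behave alike
    vals <- matrix(vals, ncol = nlayers(ras),
                   dimnames = list(NULL, names(ras)))
    first_cell <- (bs$row[i] - 1) * nc + 1
    cells <- seq(first_cell, length.out = nrow(vals))
    keep <- !is.na(vals[, 1])
    piece <- as.data.frame(vals[keep, , drop = FALSE])
    rownames(piece) <- cells[keep]
    pieces[[i]] <- piece
  }
  do.call(rbind, pieces)               # a single rbind at the end
}

# same kind of test raster as above: 1e6 cells, 2000 of them not NA
r <- raster(nrow = 1000, ncol = 1000, xmn = 0, xmx = 1000,
            ymn = 0, ymx = 1000, crs = NA)
dat <- rep(NA, 1e+06)
dat[sample(1:1e+06, 2000)] <- runif(2000, 0, 1)
values(r) <- dat

tab3 <- as_df_no_na(r)
object.size(tab3)
```

Building the list first avoids copying the growing data.frame at every iteration, which is where I would expect most of the time to go when there are many tiles.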

There!  Does anybody else think it would be a great addition, or know a 
better way to do it?  Could it be implemented in the raster package? Robert?  
Thanks!

Bastien Ferland-Raymond, M.Sc. Stat., M.Sc. Biol.
Division des orientations et projets spéciaux
Direction des inventaires forestiers
Ministère des Ressources naturelles

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
