Hello, I have a large dataset containing many NaN values, and I want to compress it so I don't run into memory issues later.
Using the Matrix package, its sparse-matrix classes, and some fiddling around, I have successfully reduced the size of my data (as measured by object.size()). However, NaN values are found all over my data, while zeros are meaningful but occur very infrequently. So I turn NaNs into zeros and zeros into very small numbers. I don't like changing the zeros into small numbers, because that is not the truth. I know this is a judgement call on my part, based on the impact the non-zero "zeros" will have on my analysis.

My question is: do I have any other option? Is there a better solution for this issue?

Here is a small example:

library(Matrix)

# make sample data
M  <- Matrix(10 + 1:28, 4, 7)
M2 <- cBind(-1, M)
M2[, c(2, 4:6)] <- 0
M2[1:2, 2] <- M2[c(3, 4), ] <- M2[, c(3, 4, 5)] <- NaN
M3 <- M2

# my 'fiddling' to make the sparse representation save space
M3[M3 == 0]    <- 1e-08   # turn zeros into small values
M3[is.nan(M3)] <- 0       # turn NaNs into zeros

# saving space
sM <- as(M3, "sparseMatrix")

Note that this is just a sample of what I am doing. It reduces object.size() when you have a lot more data; in this small example it actually increases object.size(), because the data are so small.

What I know about Matrix: http://cran.r-project.org/web/packages/Matrix/vignettes/Intro2Matrix.pdf

Thanks,
Ben

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
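P.S. For scale, here is a sketch of the size claim above on a larger matrix. The dimensions and fill fractions are my own illustration, not from my real data; the point is only that with mostly-NaN data, the same zero/NaN swap does shrink the object:

```r
library(Matrix)
set.seed(1)

# a 1000 x 1000 matrix that is almost entirely NaN,
# with a sprinkling of real values and a few true zeros
big <- matrix(NaN, 1000, 1000)
big[sample(length(big), 5000)] <- rnorm(5000)  # ~0.5% real values
big[sample(length(big), 50)]   <- 0            # a handful of true zeros

# the same swap as above: zeros -> tiny values, NaN -> structural zeros
big2 <- big
big2[big2 == 0]    <- 1e-08   # NA positions in the logical index are left untouched
big2[is.nan(big2)] <- 0
sBig <- as(big2, "sparseMatrix")

object.size(big)   # dense: 1e6 doubles, roughly 8 MB
object.size(sBig)  # sparse: only the ~5000 non-zero entries are stored
```

(Note that indexing with big2 == 0 works here even though the comparison yields NA at the NaN positions: in a subscripted assignment with a length-one replacement value, NA index positions are simply skipped.)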