Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-10 Thread Prof Brian Ripley
I don't understand why one would run a 64-bit version of R on a 2GB server, especially if one were worried about object size. You can run 32-bit versions of R on x86_64 Linux (see the R-admin manual for a comprehensive discussion), and most other 64-bit OSes default to 32-bit executables. …

Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-10 Thread Michael Cassin
Thanks for all the comments. The artificial dataset is as representative of my 440MB file as I could design. I did my best to reduce the complexity of my problem to minimal reproducible code, as suggested in the posting guidelines. Having searched the archives, I was happy to find that the topic …

Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Gabor Grothendieck
The examples were just artificially created data. We don't know what the real case is, but if each entry is distinct then factors won't help; however, if they are not distinct then there is a huge potential saving. Also, if they are really numeric, as in your example, then storing them as numeric …
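A minimal sketch of the distinction being drawn here (the vectors are illustrative, and exact sizes depend on the R build):

    x.chr <- as.character(1:1e6)         # a million distinct strings
    x.num <- as.numeric(1:1e6)           # the same values stored as doubles
    x.rep <- rep(letters, length = 1e6)  # heavily repeated strings

    object.size(x.num)             # roughly 8 bytes per value
    object.size(x.chr)             # much larger: every distinct string is kept
    object.size(as.factor(x.chr))  # no saving: one level per distinct value
    object.size(as.factor(x.rep))  # large saving: 26 levels plus integer codes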

Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Prof Brian Ripley
On Thu, 9 Aug 2007, Charles C. Berry wrote: > On Thu, 9 Aug 2007, Michael Cassin wrote: > >> I really appreciate the advice and this database solution will be useful to >> me for other problems, but in this case I need to address the specific >> problem of scan and read.* using so much memory. …

Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Charles C. Berry
I do not see how this helps Mike's case: > res <- (as.character(1:1e6)) > object.size(res) [1] 3624 > object.size(as.factor(res)) [1] 4224 Anyway, my point was that if two character vectors for which all.equal() yields TRUE can differ by almost an order of magnitude in object.size(), …

Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Gabor Grothendieck
Try it as a factor: > big2 <- rep(letters,length=1e6) > object.size(big2)/1e6 [1] 4.000856 > object.size(as.factor(big2))/1e6 [1] 4.001184 > big3 <- paste(big2,big2,sep='') > object.size(big3)/1e6 [1] 36.2 > object.size(as.factor(big3))/1e6 [1] 4.001184 …

Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Charles C. Berry
On Thu, 9 Aug 2007, Michael Cassin wrote: > I really appreciate the advice and this database solution will be useful to > me for other problems, but in this case I need to address the specific > problem of scan and read.* using so much memory. > > Is this expected behaviour? Can the memory usage …

Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Gabor Grothendieck
One other idea. Don't use byrow = TRUE. Matrices are stored in column order, so that might be more efficient. You can always transpose it later. Haven't tested it to see if it helps. On 8/9/07, Michael Cassin <[EMAIL PROTECTED]> wrote: > > I really appreciate the advice and this database solution …
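A sketch of that idea (untested, as the poster says; v stands for the vector returned by scan() and nc for the column count, both illustrative names):

    m <- matrix(v, ncol = nc)  # fill column by column (the default, byrow = FALSE),
                               # matching R's column-major storage
    m <- t(m)                  # transpose afterwards if a row-wise layout is needed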

Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Michael Cassin
I really appreciate the advice and this database solution will be useful to me for other problems, but in this case I need to address the specific problem of scan and read.* using so much memory. Is this expected behaviour? Can the memory usage be explained, and can it be made more efficient? …
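One way to quantify the behaviour being asked about is gc()'s peak-memory bookkeeping; a generic sketch, with an illustrative file name:

    gc(reset = TRUE)                        # zero the "max used" counters
    x <- read.csv("big.csv", as.is = TRUE)  # or the scan() call in question
    gc()                                    # "max used" now shows the peak during the read
    object.size(x)                          # size of the object that finally remains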

Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Gabor Grothendieck
Just one other thing. The command in my prior post reads the data into an in-memory database. If you find that is a problem then you can read it into a disk-based database by adding the dbname argument to the sqldf call, naming the database. The database need not exist. It will be created by sqldf …
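The sqldf interface has changed since the development version discussed here; a rough sketch using the argument names the package later settled on (read.csv.sql, with dbname naming an on-disk SQLite file; file names are illustrative):

    library(sqldf)
    # dbname points SQLite at a file on disk rather than an in-memory database,
    # so the CSV is loaded into that file and only the query result enters R.
    dat <- read.csv.sql("big.csv", sql = "select * from file", dbname = "big.db")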

Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Gabor Grothendieck
Another thing you could try would be reading it into a database and then from there into R. The devel version of sqldf has this capability. That is, it will use RSQLite to read the file directly into the database without going through R at all, and then read it from there into R, so it's a complete …

Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Michael Cassin
Thanks for looking, but my file has quotes. It's also 400MB, and I don't mind waiting, but I don't have 6x the memory to read it in. On 8/9/07, Gabor Grothendieck <[EMAIL PROTECTED]> wrote: > > If we add quote = FALSE to the write.csv statement it's twice as fast > reading it in. …

Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Gabor Grothendieck
If we add quote = FALSE to the write.csv statement it's twice as fast reading it in. On 8/9/07, Michael Cassin <[EMAIL PROTECTED]> wrote: > Hi, > > I've been having similar experiences and haven't been able to > substantially improve the efficiency using the guidance in the I/O > Manual. …
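A sketch of the comparison being made (the file name is illustrative, and dat is any data frame of character columns):

    write.csv(dat, "big.csv", quote = FALSE, row.names = FALSE)
    # With no quotes in the file, the reader can be told not to look for any:
    system.time(x <- read.csv("big.csv", quote = "", as.is = TRUE))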

Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-08-09 Thread Michael Cassin
Hi, I've been having similar experiences and haven't been able to substantially improve the efficiency using the guidance in the I/O Manual. Could anyone advise on how to improve the following scan()? It is not based on my real file; please assume that I do need to read in characters, and can't …
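The scan() call itself is cut off above; a generic sketch of the kind of call under discussion, reading every field of a comma-separated file as character (names are illustrative):

    x <- scan("big.csv", what = "character", sep = ",", quote = "\"",
              skip = 1)  # skip the header row
    gc()                 # check memory use after the read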

Re: [R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-06-26 Thread Prof Brian Ripley
The R Data Import/Export Manual points out several ways in which you can use read.csv more efficiently. On Tue, 26 Jun 2007, ivo welch wrote: > dear R experts: > > I am of course no R expert, but use it regularly. I thought I would > share some experimentation with memory use. I run a linux machine …
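The manual's main suggestions can be sketched in a single call (the column classes and row count are illustrative):

    # Declaring column types, an upper bound on the row count, and no comment
    # character lets read.csv skip its type-guessing and reallocation passes.
    dat <- read.csv("big.csv",
                    colClasses = c("character", "numeric", "numeric"),
                    nrows = 1e6,
                    comment.char = "")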

[R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory

2007-06-26 Thread ivo welch
dear R experts: I am of course no R expert, but use it regularly. I thought I would share some experimentation with memory use. I run a linux machine with about 4GB of memory, and R 2.5.0. Upon startup, gc() reports, for Ncells: 268755 used (14.4 Mb) …
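A self-contained sketch of this kind of experiment (the generated data are illustrative, not the original poster's):

    n   <- 1e5
    dat <- data.frame(key = as.character(seq_len(n) + n),
                      val = as.character(runif(n)))
    write.csv(dat, "test.csv", row.names = FALSE)

    x <- read.csv("test.csv", colClasses = "character")
    object.size(x)              # size of the data frame in memory
    file.info("test.csv")$size  # size of the file on disk; the ratio of the two
                                # is the "rule of thumb" in the subject line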