I don't understand why one would run a 64-bit version of R on a 2GB
server, especially if one were worried about object size. You can run
32-bit versions of R on x86_64 Linux (see the R-admin manual for a
comprehensive discussion), and most other 64-bit OSes default to 32-bit
executables.
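For reference, a one-liner (not from the original post) to check which
build is actually in use:

.Machine$sizeof.pointer   # 4 on a 32-bit build, 8 on a 64-bit build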
Sin
Thanks for all the comments,
The artificial dataset is as representative of my 440MB file as I could make it.
I did my best to reduce the complexity of my problem to minimal
reproducible code as suggested in the posting guidelines. Having
searched the archives, I was happy to find that the topic
The examples were just artificially created data. We don't know what the
real case is, but if each entry is distinct then factors won't help;
however, if they are not distinct then there are huge potential savings.
Also, if they are really numeric, as in your example, then storing them as numeric
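For illustration (a made-up sketch; the sizes in the comments are
approximate and version-dependent):

x <- as.character(sample(1e6))  # a million distinct numeric-looking strings
object.size(x)                  # tens of MB: a pointer plus a CHARSXP per element
object.size(as.factor(x))       # no saving here: every entry is its own level
object.size(as.numeric(x))      # ~8 MB: one 8-byte double per element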
On Thu, 9 Aug 2007, Charles C. Berry wrote:
> On Thu, 9 Aug 2007, Michael Cassin wrote:
>
>> I really appreciate the advice and this database solution will be useful to
>> me for other problems, but in this case I need to address the specific
>> problem of scan and read.* using so much memory.
>>
I do not see how this helps Mike's case:
> res <- (as.character(1:1e6))
> object.size(res)
[1] 3624
> object.size(as.factor(res))
[1] 4224
Anyway, my point was that if two character vectors for which all.equal()
yields TRUE can differ by almost an order of magnitude in object.size(),
Try it as a factor:
> big2 <- rep(letters,length=1e6)
> object.size(big2)/1e6
[1] 4.000856
> object.size(as.factor(big2))/1e6
[1] 4.001184
> big3 <- paste(big2,big2,sep='')
> object.size(big3)/1e6
[1] 36.2
> object.size(as.factor(big3))/1e6
[1] 4.001184
On 8/9/07, Charles C. Berry <[EMAIL PROTECTED]> wrote:
On Thu, 9 Aug 2007, Michael Cassin wrote:
> I really appreciate the advice and this database solution will be useful to
> me for other problems, but in this case I need to address the specific
> problem of scan and read.* using so much memory.
>
> Is this expected behaviour? Can the memory usage
One other idea: don't use byrow = TRUE. Matrices are stored in
column-major order, so filling by column might be more efficient; you can
always transpose it later. I haven't tested it to see if it helps.
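Something like this (untested sketch; assumes a made-up file "big.csv"
with three comma-separated numeric fields per record):

x <- scan("big.csv", sep = ",")
# matrix() fills column-wise by default, matching R's storage order,
# so each record lands in a column; one transpose at the end restores
# records as rows.
m <- t(matrix(x, nrow = 3))   # instead of matrix(x, ncol = 3, byrow = TRUE)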
On 8/9/07, Michael Cassin <[EMAIL PROTECTED]> wrote:
>
> I really appreciate the advice and this database solutio
I really appreciate the advice and this database solution will be useful to
me for other problems, but in this case I need to address the specific
problem of scan and read.* using so much memory.
Is this expected behaviour? Can the memory usage be explained, and can it be
made more efficient? Fo
Just one other thing.
The command in my prior post reads the data into an in-memory database.
If you find that is a problem, then you can read it into a disk-based
database by adding the dbname argument to the sqldf call,
naming the database. The database need not exist; it will
be created by sqldf.
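Roughly like this (a sketch with made-up file and database names):

library(sqldf)
f <- file("big.csv")
# With dbname given, sqldf creates and uses an SQLite database file
# on disk ("big.db") instead of an in-memory one.
big <- sqldf("select * from f", dbname = "big.db",
             file.format = list(header = TRUE, row.names = FALSE))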
Another thing you could try would be reading it into a database and then
from there into R.
The devel version of sqldf has this capability. That is, it will use RSQLite
to read the file directly into the database without going through R at all,
and then read it from there into R, so it's a complete
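In outline it looks like this (a sketch; "myvar" is a made-up column
name, and the file name is invented):

library(sqldf)
f <- file("big.csv")
# The file connection is bulk-loaded into SQLite by RSQLite, bypassing
# scan/read.* entirely; only the rows the query returns are ever built
# as R objects.
big <- sqldf("select * from f where myvar > 0",
             file.format = list(header = TRUE, row.names = FALSE))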
Thanks for looking, but my file has quotes. It's also 400MB, and I don't
mind waiting, but don't have 6x the memory to read it in.
On 8/9/07, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
>
> If we add quote = FALSE to the write.csv statement, it's twice as fast
> reading it in.
>
> On 8/9/07, Mich
If we add quote = FALSE to the write.csv statement, it's twice as fast
reading it in.
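That is (a sketch; `df` and the file name are made up):

write.csv(df, "big.csv", quote = FALSE, row.names = FALSE)
# If the file is known to contain no quotes, saying so on input
# skips quote processing as well:
x <- read.csv("big.csv", quote = "")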
On 8/9/07, Michael Cassin <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I've been having similar experiences and haven't been able to
> substantially improve the efficiency using the guidance in the I/O
> Manual.
>
> Could
Hi,
I've been having similar experiences and haven't been able to
substantially improve the efficiency using the guidance in the I/O
Manual.
Could anyone advise on how to improve the following scan()? It is not
based on my real file; please assume that I do need to read in
characters, and can't
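(The scan() call itself is cut off in the archive; a made-up stand-in
for its general shape:)

x <- scan("big.csv", what = list("", "", ""),  # three character fields
          sep = ",", nmax = 1e6,               # pre-allocate the records
          quiet = TRUE)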
The R Data Import/Export Manual points out several ways in which you can
use read.csv more efficiently.
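For example (a sketch of that advice; the column count and types are
made up):

x <- read.csv("big.csv",
              colClasses = rep("character", 3),  # no per-column type guessing
              nrows = 1e6)                       # allocate once; a mild overestimate is fine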
On Tue, 26 Jun 2007, ivo welch wrote:
> dear R experts:
>
> I am of course no R expert, but I use it regularly. I thought I would
> share some experimentation with memory use. I run a Linux m
Dear R experts:
I am of course no R expert, but I use it regularly. I thought I would
share some experimentation with memory use. I run a Linux machine
with about 4GB of memory, and R 2.5.0.
Upon startup, gc() reports
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  268755 14.4    4075