Hi Matthew and Jim,
Thanks for all the suggestions as always. Matthew's post was very
informative in showing how things can be done much more efficiently
with data.table. I haven't had a chance to finish the reshaping
because my group was in a rush,
and someone else decided to do it in Perl.
Juliet,
I've been corrected off list. I did not read properly that you are on 64bit.
The calculation should be:
53860858 * 4 * 8 / 1024^3 = 1.6 GB
since pointers are 8 bytes on 64-bit.
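The arithmetic above can be checked directly in R (a minimal sketch; the constants are the row and column counts from this thread):

```r
# 4 character columns x 53,860,858 rows; on 64-bit R each cell of a
# character vector holds an 8-byte pointer (before the strings themselves).
rows  <- 53860858
cols  <- 4
bytes <- rows * cols * 8
round(bytes / 1024^3, 1)  # ~1.6 GB of pointer storage alone
```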
Also, data.table is an add-on package, so I should have included:
install.packages("data.table")
What is the configuration you are running on (OS, memory, etc.)? What
does your object consist of? Is it numeric, factors, etc.? Provide a
'str' of it. If it is numeric, then the size of the object is
probably about 1.8GB. Doing the long-to-wide reshape, you will probably
need at least that much memory again.
Hi Jim,
Thanks for responding. Here is the info I should have included before.
I should be able to access 4 GB.
str(myData)
'data.frame': 53860857 obs. of 4 variables:
 $ V1: chr "23" "26" "200047" "200050" ...
 $ V2: chr "cv0001" "cv0001" "cv0001" "cv0001" ...
 $ V3: chr "A" "A" "A" "B" ...
 $ V4: chr ...
You might want to do 'object.size' on myData to see how big it is and
then if you do try to run reshape again take a look and see if there
is any paging happening on your system which may be an indication that
you don't have enough memory. Also, with 53M observations, it may take
a lot of time to complete.
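A minimal sketch of the check suggested above, using a small stand-in data.frame (substitute the real 53M-row myData):

```r
# Stand-in for the real myData; same column names as in the thread.
myData <- data.frame(V1 = as.character(1:1000),
                     V2 = rep("cv0001", 1000),
                     V3 = sample(c("A", "B"), 1000, replace = TRUE),
                     stringsAsFactors = FALSE)

# In-memory footprint of the object, before attempting the reshape.
print(object.size(myData), units = "auto")

# gc() reports R's current memory use; compare it before and after the
# reshape, and watch for OS-level paging alongside it.
gc()
```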
Hi Juliet,
Thanks for the info.
It is very slow because of the == in testData[testData$V2==one_ind,].
Why? Imagine someone looks for 10 people in a phone directory. Would
they search the entire phone directory for the first person's phone
number, starting on page 1 and looking at every single name?
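The point above can be sketched with data.table's keyed lookup (toy data with made-up values; column names follow the thread; requires the data.table package):

```r
library(data.table)

# Toy version of testData: == on V2 would scan all rows for every lookup.
testData <- data.table(V1 = as.character(1:10000),
                       V2 = rep(sprintf("cv%04d", 1:100), each = 100),
                       V3 = "A")

# setkey sorts the table once; subsequent lookups use binary search
# instead of a full vector scan.
setkey(testData, V2)

one_ind <- "cv0042"
testData[J(one_ind)]  # keyed join on V2
```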
I have a data set that has 4 columns and 53860858 rows. I was able to
read this into R with:
cc <- rep("character", 4)
myData <-
read.table("myData.csv", header=FALSE, skip=1, colClasses=cc, nrow=53860858, sep=",")
I need to reshape this data from long to wide. On a small data set the
following lines work.
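The lines themselves did not survive the archive; a hedged sketch of a base-R long-to-wide reshape on toy data, guessing from the str above that V1 is the id, V2 the variable name, and V3 the value:

```r
# Toy long-format data with the column roles assumed above.
long <- data.frame(V1 = rep(c("23", "26"), each = 2),
                   V2 = rep(c("cv0001", "cv0002"), times = 2),
                   V3 = c("A", "B", "A", "B"),
                   stringsAsFactors = FALSE)

# One row per V1, one V3.* column per distinct V2.
wide <- reshape(long, idvar = "V1", timevar = "V2", direction = "wide")
wide
```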