Re: [R] long to wide on larger data set

2010-07-14 Thread Juliet Hannah
Hi Matthew and Jim, Thanks for all the suggestions as always. Matthew's post was very informative in showing how things can be done much more efficiently with data.table. I haven't had a chance to finish the reshaping because my group was in a rush, and someone else decided to do it in Perl.

Re: [R] long to wide on larger data set

2010-07-12 Thread Matthew Dowle
Juliet, I've been corrected off list. I did not read properly that you are on 64bit. The calculation should be: 53860858 * 4 * 8 / 1024^3 = 1.6GB, since pointers are 8 bytes on 64bit. Also, data.table is an add-on package, so I should have included: install.packages("data.table")
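
The arithmetic in the correction above can be checked directly in R. This is only a sketch of the pointer-storage estimate quoted in the message (rows x columns x bytes per pointer); the variable names are made up for illustration, and the figure ignores the strings themselves and any other overhead.

```r
# Rough estimate of pointer storage for a 4-column character table.
# Names below are illustrative, not from the original post.
n_rows <- 53860858
n_cols <- 4

# On the machine running this, .Machine$sizeof.pointer reports the pointer
# size in bytes (8 on a 64-bit build of R, 4 on a 32-bit build).
gb_64 <- n_rows * n_cols * 8 / 1024^3   # 64-bit: 8 bytes per pointer
gb_32 <- n_rows * n_cols * 4 / 1024^3   # 32-bit: 4 bytes per pointer

round(c(gb_64 = gb_64, gb_32 = gb_32), 1)
```

The 64-bit figure rounds to the 1.6GB quoted above; the 32-bit figure is half of that.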

Re: [R] long to wide on larger data set

2010-07-12 Thread jim holtman
What is the configuration you are running on (OS, memory, etc.)? What does your object consist of? Is it numeric, factors, etc.? Provide a 'str' of it. If it is numeric, then the size of the object is probably about 1.8GB. Doing the long to wide you will probably need at least that much

Re: [R] long to wide on larger data set

2010-07-12 Thread Juliet Hannah
Hi Jim, Thanks for responding. Here is the info I should have included before. I should be able to access 4 GB.
str(myData)
'data.frame': 53860857 obs. of 4 variables:
 $ V1: chr "23" "26" "200047" "200050" ...
 $ V2: chr "cv0001" "cv0001" "cv0001" "cv0001" ...
 $ V3: chr "A" "A" "A" "B" ...
 $ V4: chr

Re: [R] long to wide on larger data set

2010-07-12 Thread jim holtman
You might want to run 'object.size' on myData to see how big it is. Then, if you try to run reshape again, check whether any paging is happening on your system, which may be an indication that you don't have enough memory. Also with 53M observations, it may take a lot of time to
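
For readers unfamiliar with object.size, a minimal sketch of the suggestion follows. The small data frame is invented for illustration and only mimics the character columns described in this thread; on the real myData the same call would report a far larger number.

```r
# Toy stand-in for myData (hypothetical values, character columns only).
toy <- data.frame(
  V1 = as.character(1:1000),
  V2 = rep("cv0001", 1000),
  stringsAsFactors = FALSE
)

sz <- object.size(toy)   # estimated memory used by the object, in bytes
print(sz, units = "Kb")  # human-readable; "Mb" and "Gb" are also accepted
```

object.size only estimates the object itself; a reshape will need additional working memory on top of that, which is why watching for paging is suggested as well.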

Re: [R] long to wide on larger data set

2010-07-12 Thread Matthew Dowle
Hi Juliet, Thanks for the info. It is very slow because of the == in testData[testData$V2==one_ind,]. Why? Imagine someone looks for 10 people in the phone directory. Would they search the entire phone directory for the first person's phone number, starting on page 1, looking at every single
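
A small sketch of the contrast being drawn here, assuming the data.table package is installed. The toy data and the name keyed_rows are invented for illustration, and this is not necessarily the exact code from the full message, which is cut off above. The == subset compares every row on every lookup, while a keyed data.table is sorted once and then searched.

```r
library(data.table)  # add-on package: install.packages("data.table")

# Toy stand-in for testData (hypothetical values).
testData <- data.frame(
  V1 = rep(c("23", "26", "200047"), times = 3),
  V2 = rep(c("cv0001", "cv0002", "cv0003"), each = 3),
  V3 = c("A", "A", "A", "B", "A", "B", "A", "B", "B"),
  stringsAsFactors = FALSE
)
one_ind <- "cv0002"

# Vector scan: V2 is compared against one_ind for every row, every time.
scan_rows <- testData[testData$V2 == one_ind, ]

# Keyed lookup: sort by V2 once, then find matching rows by binary search.
DT <- as.data.table(testData)
setkey(DT, V2)
keyed_rows <- DT[J(one_ind)]
```

Both approaches return the same rows; the difference only matters when the lookup is repeated many times over a large table, as in a loop over individuals.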

[R] long to wide on larger data set

2010-07-11 Thread Juliet Hannah
I have a data set that has 4 columns and 53860858 rows. I was able to read this into R with:
cc <- rep("character",4)
myData <- read.table("myData.csv",header=FALSE,skip=1,colClasses=cc,nrow=53860858,sep=",")
I need to reshape this data from long to wide. On a small data set the following lines work.
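
The lines referred to are cut off in this archive snippet. As a rough illustration of what a long-to-wide reshape looks like on a small data set, here is a sketch using base R's reshape. The toy values, and the choice of V1 as the row identifier and V2 as the column that spreads into wide columns, are assumptions and not taken from the original post.

```r
# Toy long-format data (hypothetical values; same four-column layout).
small <- data.frame(
  V1 = c("23", "26", "23", "26"),
  V2 = c("cv0001", "cv0001", "cv0002", "cv0002"),
  V3 = c("A", "A", "A", "B"),
  V4 = c("B", "A", "A", "B"),
  stringsAsFactors = FALSE
)

# One row per V1; V3 and V4 are repeated once per distinct value of V2.
# Assumption: V1 identifies rows and V2 defines the wide columns.
wide <- reshape(small, idvar = "V1", timevar = "V2", direction = "wide")
names(wide)
```

On a table with tens of millions of rows this approach is likely to be slow and memory-hungry, which is the problem the rest of the thread discusses.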