Re: [R] How to more efficently read in a big matrix

2007-11-11 Thread Gabor Grothendieck
On Nov 11, 2007 2:28 PM, affy snp <[EMAIL PROTECTED]> wrote: > Hi Gabor, > > I replaced multiple spaces with a single one and tried > the code you suggested. I got: > > > library(sqldf) > Loading required package: RSQLite > Loading required package: DBI > Loading required package: gsubfn > Loading

Re: [R] How to more efficently read in a big matrix

2007-11-11 Thread affy snp
Hi Gabor, I replaced multiple spaces with a single one and tried the code you suggested. I got: > library(sqldf) Loading required package: RSQLite Loading required package: DBI Loading required package: gsubfn Loading required package: proto > source("http://sqldf.googlecode.com/svn/trunk/R/sqldf

Re: [R] How to more efficently read in a big matrix

2007-11-10 Thread affy snp
Thanks all for the help and suggestions. By specifying colClasses in read.table() and running it on a server with 8GB of memory, I could read the data in 2 mins. I will just skip the sqldf method for now and get back in a moment. Best, Allen On Nov 10, 2007 2:42 AM, Prof Brian Ripley <[EMA

Re: [R] How to more efficently read in a big matrix

2007-11-09 Thread Prof Brian Ripley
Did you read the Note on the help page for read.table, or the 'R Data Import/Export Manual'? There are several hints there, some of which will be crucial to doing this reasonably fast. How big is your computer? That is 116 million items (you haven't told us what type they are), so you will ne
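
The Note on the read.table help page recommends, among other things, specifying colClasses, giving nrows (a mild over-estimate is fine), and setting comment.char = "" when the file has no comments. Below is a minimal sketch of those hints. It assumes a tiny hypothetical stand-in for the real file: the sample rows and the n_pairs variable are invented for illustration, and the column layout follows the header shown elsewhere in the thread.

```r
# Hypothetical sample data: probe-set name, then alternating
# signal (numeric) and call (character) columns
path <- tempfile(fileext = ".txt")
writeLines(c(
  "probe_set WM_806_Signal_A WM_806_call WM_1716_Signal_A WM_1716_call",
  "p1 7.25 P 8.10 A",
  "p2 6.90 M 7.45 P",
  "p3 5.10 A 9.00 P"
), path)

n_pairs <- 2  # hypothetical; 243 signal/call pairs in the real file

A <- read.table(
  path,
  header = TRUE,
  row.names = 1,                 # probe_set becomes the row names
  colClasses = c("character", rep(c("numeric", "character"), n_pairs)),
  nrows = 3,                     # known row count (238305 in the real file)
  comment.char = ""              # no comment scanning
)
str(A)
```

For the real file, nrows would be 238305 and n_pairs 243, giving a colClasses vector of length 487.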

Re: [R] How to more efficently read in a big matrix

2007-11-09 Thread Gabor Grothendieck
You left out the second-to-last line. Also, did you replace multiple spaces in the input file with one space? On Nov 10, 2007 1:26 AM, affy snp <[EMAIL PROTECTED]> wrote: > Thanks Gabor. > > I made the column names look like this: > > probeset > WM806SignalA > WM806call > WM1716SignalA > WM1716call >

Re: [R] How to more efficently read in a big matrix

2007-11-09 Thread affy snp
Thanks Gabor. I made the column names look like this: probeset WM806SignalA WM806call WM1716SignalA WM1716call Then I tried what you mentioned and got: > library(sqldf) Loading required package: gsubfn Loading required package: proto > source("http://sqldf.googlecode.com/svn/trunk/R/sql

Re: [R] How to more efficently read in a big matrix

2007-11-09 Thread Gabor Grothendieck
On Nov 10, 2007 12:25 AM, affy snp <[EMAIL PROTECTED]> wrote: > Hi Gabor, > > Thanks a lot! > > The header of the big file looks as follows: > > probe_set > WM_806_Signal_A > WM_806_call > WM_1716_Signal_A > WM_1716_call > > > I only need the columns whose headers are like _Signal_A >

Re: [R] How to more efficently read in a big matrix

2007-11-09 Thread jim holtman
If you want to read only the alternate columns that contain numerics, then you can probably use: scan('yourfile', what=c(list(NULL), rep(list(0, NULL), 243)), flush=TRUE, fill=TRUE, skip=1) On Nov 10, 2007 12:25 AM, affy snp <[EMAIL PROTECTED]> wrote: > Hi Gabor, > > Thanks a lot! > > The header of the
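
A small, runnable illustration of the idea of reading only the numeric columns with scan(): the `what` list has one entry per field, where NULL makes scan() skip that field and 0 reads it as numeric. The sample file and n_pairs are hypothetical, and gathering the kept columns into a matrix is an added step, not part of the message.

```r
# Hypothetical sample data with the thread's column layout
path <- tempfile(fileext = ".txt")
writeLines(c(
  "probe_set WM_806_Signal_A WM_806_call WM_1716_Signal_A WM_1716_call",
  "p1 7.25 P 8.10 A",
  "p2 6.90 M 7.45 P"
), path)

n_pairs <- 2  # hypothetical; 243 in the real file

# Field 1 (probe set) skipped, then numeric/skipped pairs
what <- c(list(NULL), rep(list(0, NULL), n_pairs))

# skip = 1 skips the header line
x <- scan(path, what = what, skip = 1, flush = TRUE, fill = TRUE, quiet = TRUE)

# Skipped fields come back as NULL components; bind the numeric ones
signals <- do.call(cbind, x[!vapply(x, is.null, logical(1))])
print(signals)
```

Because the character fields are never stored, this keeps memory use close to that of the numeric values alone.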

Re: [R] How to more efficently read in a big matrix

2007-11-09 Thread affy snp
BTW, would something like: A<-read.table(file="243_47mel_withnormal_expression_log2.txt", +header=TRUE,row.names=1,colClasses=c('factor', rep('factor',486))) do any good? Allen On Nov 10, 2007 12:41 AM, affy snp <[EMAIL PROTECTED]> wrote: > Yes, I am showing the first 5 columns as an example. Tha
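
Declaring every column as 'factor' gives up the numeric signal columns. One option, rather than typing a 487-entry colClasses vector by hand, is to derive it from the header line. This sketch assumes the naming shown elsewhere in the thread (`_Signal_A` columns numeric, everything else text), whitespace-separated fields, and a hypothetical sample file.

```r
# Hypothetical sample data
path <- tempfile(fileext = ".txt")
writeLines(c(
  "probe_set WM_806_Signal_A WM_806_call WM_1716_Signal_A WM_1716_call",
  "p1 7.25 P 8.10 A"
), path)

# Read just the header line and split it into column names
header <- strsplit(readLines(path, n = 1), "[[:space:]]+")[[1]]

# Assumed convention: *_Signal_A columns are numeric, all others character
classes <- ifelse(grepl("_Signal_A$", header), "numeric", "character")

A <- read.table(path, header = TRUE, row.names = 1,
                colClasses = classes, comment.char = "")
str(A)
```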

Re: [R] How to more efficently read in a big matrix

2007-11-09 Thread affy snp
Yes, I am showing the first 5 columns as an example. Thank you very much for your suggestion. Let me check it out. Allen On Nov 10, 2007 12:39 AM, jim holtman <[EMAIL PROTECTED]> wrote: > Your data is mixed: numeric and characters/factors. You can use > skip=1 to skip the header line, but it loo

Re: [R] How to more efficently read in a big matrix

2007-11-09 Thread jim holtman
It sounds like the data is not all numeric; you have a 'factor' in your read statement. It also sounds like some of your lines have fewer columns than expected, since you are trying to read in a "B" as a numeric. So if you have a character, then one way of doing it is: x <- scan('yo

Re: [R] How to more efficently read in a big matrix

2007-11-09 Thread affy snp
Hi Jim, I tried scan() first and got > x <- scan(file="243_47mel_withnormal_expression_log2.txt", what=0) Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : scan() expected 'a real', got 'probe_set' So I guess it requires the file to be numeric. But I do have row names

Re: [R] How to more efficently read in a big matrix

2007-11-09 Thread affy snp
Hi Jim, Actually, besides the first column, which is character, half of the 486 columns are character as well. Thanks! Allen On Nov 10, 2007 12:29 AM, affy snp <[EMAIL PROTECTED]> wrote: > Hi Jim, > > I tried scan() first and got > > > x <- scan(file="243_47mel_withnormal_expression_log2.txt", wh

Re: [R] How to more efficently read in a big matrix

2007-11-09 Thread Gabor Grothendieck
On Nov 10, 2007 12:19 AM, affy snp <[EMAIL PROTECTED]> wrote: > Thanks Jim. > > I tried: > > A<-read.table(file="243_47mel_withnormal_expression_log2.txt", > +header=TRUE,row.names=1,colClasses=c('factor', rep('numeric',486))) > > by specifying colClasses but it did not work. > > The error message I

Re: [R] How to more efficently read in a big matrix

2007-11-09 Thread jim holtman
Your data is mixed: numeric and characters/factors. You can use skip=1 to skip the header line, but it looks like the rest is mixed. In your example there are only 5 columns; are you just showing the first 5 columns? If there is the pattern that you show, then you would have a scan like: scan('yo
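
For a mixed file with the pattern shown, scan() can take a `what` list whose entries give each field's type: "" for character and 0 for numeric. With skip = 1 the header line is never parsed as data, which avoids the "expected 'a real', got 'probe_set'" error. This is a minimal sketch on a hypothetical sample file; splitting the result into a numeric matrix and a character matrix is an added illustration.

```r
# Hypothetical sample data
path <- tempfile(fileext = ".txt")
writeLines(c(
  "probe_set WM_806_Signal_A WM_806_call WM_1716_Signal_A WM_1716_call",
  "p1 7.25 P 8.10 A",
  "p2 6.90 M 7.45 P"
), path)

n_pairs <- 2  # hypothetical; 243 in the real file

# One entry per field: probe set as character, then numeric/character pairs
what <- c(list(""), rep(list(0, ""), n_pairs))
x <- scan(path, what = what, skip = 1, quiet = TRUE)

# Even positions hold the numeric signals, odd positions (after the first) the calls
signal <- do.call(cbind, x[seq(2, length(x), by = 2)])
calls  <- do.call(cbind, x[seq(3, length(x), by = 2)])
rownames(signal) <- x[[1]]
print(signal)
```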

Re: [R] How to more efficently read in a big matrix

2007-11-09 Thread affy snp
Hi Gabor, Thanks a lot! The header of the big file looks as follows: probe_set WM_806_Signal_A WM_806_call WM_1716_Signal_A WM_1716_call I only need the columns whose headers are like _Signal_A. Can you suggest how to use sqldf? Thanks! Allen On Nov 9, 2007 11:47 PM, Gabor Groth

Re: [R] How to more efficently read in a big matrix

2007-11-09 Thread affy snp
Thanks Jim. I tried: A<-read.table(file="243_47mel_withnormal_expression_log2.txt", +header=TRUE,row.names=1,colClasses=c('factor', rep('numeric',486))) by specifying colClasses but it did not work. The error message I got is: > A<-read.table(file="243_47mel_withnormal_expression_log2.txt",heade

Re: [R] How to more efficently read in a big matrix

2007-11-09 Thread jim holtman
Here is an example of reading in a file of 3M numbers (11MB of text file) on my laptop: > system.time(x <- scan('/tempyy', what=0)) Read 300 items user system elapsed 6.22 0.16 6.53 > str(x) num [1:300] 1 2 3 4 5 6 7 8 9 10 ... > gc() used (Mb) gc trigger (Mb) max use
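
A similar measurement can be repeated on one's own machine with system.time(). This sketch writes a smaller hypothetical file of sequential numbers to a temporary path, so it runs quickly. Timings will differ by machine and R version, so no output is shown here.

```r
# Hypothetical test file: one number per line, smaller than the 3M in the message
path <- tempfile()
n <- 1e5
writeLines(as.character(seq_len(n)), path)

# system.time() evaluates the assignment in this environment, so x is kept
timing <- system.time(x <- scan(path, what = 0, quiet = TRUE))
print(timing)
str(x)
```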

Re: [R] How to more efficently read in a big matrix

2007-11-09 Thread jim holtman
If they are all numeric, then read it in with: x <- scan('yourfile', what=0) # assuming blank separators This will create a single vector of the values. Now this comes in in row order if that is what your data file has, so you could just add dimensions of dim(x) <- c(487, 238305) rows and col
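
The dim() step works because R fills matrices column by column: scan() returns the values in file (row) order, so giving the vector dimensions c(columns, rows) places each file row in one column, and t() turns that back into rows by columns. This sketch checks that on a small hypothetical all-numeric file. For the real data that would be dim(x) <- c(487, 238305) followed by t(x); note that t() makes a copy, which temporarily doubles memory use.

```r
# Hypothetical 3 x 4 all-numeric matrix written as whitespace-separated text
m <- matrix(as.numeric(1:12), nrow = 3, byrow = TRUE)
path <- tempfile()
write.table(m, path, row.names = FALSE, col.names = FALSE)

x <- scan(path, what = 0, quiet = TRUE)  # one long vector, in row order
dim(x) <- c(ncol(m), nrow(m))            # each file row fills one column
x <- t(x)                                # back to rows x columns
print(x)
```

matrix(x, ncol = 487, byrow = TRUE) is an equivalent one-step alternative; it also allocates a new matrix.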

Re: [R] How to more efficently read in a big matrix

2007-11-09 Thread affy snp
Hi Jim, Thanks a lot! I am currently running it on my laptop but without any success. I could upload it to a server with 8GB of memory, and it might be better to go from there. Actually, I could split the whole file into two parts, one with the 2nd to the 95th columns, the other one with

Re: [R] How to more efficently read in a big matrix

2007-11-09 Thread Gabor Grothendieck
1. You might be able to speed it up somewhat by specifying colClasses=. 2. Another possibility is that the devel version of the sqldf package provides an interface which simplifies reading a data file into sqlite and from there into R. This is particularly useful if you don't want to read it all
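
Allen only needs the `_Signal_A` columns. When reading with colClasses, read.table also accepts the class "NULL", which skips a column entirely. This is an alternative to the sqldf route, not a sketch of it. The sample file and n_pairs are hypothetical.

```r
# Hypothetical sample data
path <- tempfile(fileext = ".txt")
writeLines(c(
  "probe_set WM_806_Signal_A WM_806_call WM_1716_Signal_A WM_1716_call",
  "p1 7.25 P 8.10 A",
  "p2 6.90 M 7.45 P"
), path)

n_pairs <- 2  # hypothetical; 243 in the real file

# "NULL" drops the *_call columns while reading, so they are never stored
classes <- c("character", rep(c("numeric", "NULL"), n_pairs))
A <- read.table(path, header = TRUE, row.names = 1,
                colClasses = classes, comment.char = "")

A <- as.matrix(A)  # only numeric columns remain, so this is a numeric matrix
print(A)
```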

Re: [R] How to more efficently read in a big matrix

2007-11-09 Thread jim holtman
If they are all numeric, you can use 'scan' to read them in. With that amount of data, you will need almost 1GB to contain the single object. If you want to do any processing, you will probably need a machine with at least 3-4GB of physical memory, preferably a 64-bit version of R. What type of
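
The "almost 1GB" figure follows from the sizes in the original question: 238,305 rows times 487 columns is about 116 million values, and each numeric (double) value takes 8 bytes. A quick check of that arithmetic, with an empirical object.size() measurement on a smaller matrix of the same width. This assumes every value is stored as a double; character or factor columns would change the total.

```r
n_rows  <- 238305
n_cols  <- 487
n_items <- n_rows * n_cols  # 116,054,535 values
bytes   <- n_items * 8      # 8 bytes per double
gib     <- bytes / 2^30     # roughly 0.86 GiB (about 0.93 GB in decimal units)
print(gib)

# Empirical check of the per-value cost on a 1000-row slice of the same width
small <- matrix(0, nrow = 1000, ncol = n_cols)
print(object.size(small), units = "MB")
```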

[R] How to more efficently read in a big matrix

2007-11-09 Thread affy snp
Dear list, I need to read in a big table with 487 columns and 238,305 rows (row names and column names are supplied). Is there a way to read in the table quickly? I tried read.table(), but it seems to take forever :( Thanks a lot! Best, Allen