On Nov 11, 2007 2:28 PM, affy snp <[EMAIL PROTECTED]> wrote:
> Hi Gabor,
>
> I replaced multiple spaces with a single one and tried
> the code you suggested. I got:
>
> > library(sqldf)
> Loading required package: RSQLite
> Loading required package: DBI
> Loading required package: gsubfn
> Loading
Hi Gabor,
I replaced multiple spaces with a single one and tried
the code you suggested. I got:
> library(sqldf)
Loading required package: RSQLite
Loading required package: DBI
Loading required package: gsubfn
Loading required package: proto
> source("http://sqldf.googlecode.com/svn/trunk/R/sqldf
Thanks all for the help and suggestions. By specifying colClasses in
read.table() and running it on a server with 8 GB of memory, I could
read the data in 2 minutes.
I will skip the sqldf method for now and come back to it later.
Best,
Allen
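
For readers following along, here is a minimal sketch of the colClasses approach mentioned above. It is not the exact call used in the thread: the toy file, the `n_samples` variable, and the choice to drop the call columns with "NULL" are assumptions for illustration. The column pattern (probe id, then signal/call pairs) follows the header shown elsewhere in this thread.

```r
# Toy stand-in for the real file: same column pattern, 2 samples instead of 243.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "probe_set WM_806_Signal_A WM_806_call WM_1716_Signal_A WM_1716_call",
  "p1 7.1 P 6.3 A",
  "p2 8.4 A 5.9 P"
), tmp)

n_samples <- 2  # assumption: would be 243 for the file discussed here

# One class per column: the probe id, then (numeric signal, dropped call) pairs.
# The string "NULL" tells read.table() to skip that column entirely.
classes <- c("character", rep(c("numeric", "NULL"), n_samples))

A <- read.table(tmp, header = TRUE, row.names = 1,
                colClasses = classes, comment.char = "")
str(A)
```

Declaring the classes up front spares read.table() from guessing each column's type, which is one of the hints in the Note on its help page; supplying nrows as well may help further.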
On Nov 10, 2007 2:42 AM, Prof Brian Ripley <[EMA
Did you read the Note on the help page for read.table, or the 'R Data
Import/Export Manual'? There are several hints there, some of which will
be crucial to doing this reasonably fast.
How big is your computer? That is 116 million items (you haven't told us
what type they are), so you will ne
You left out the 2nd last line. Also did you replace multiple spaces
in the input file with one space?
On Nov 10, 2007 1:26 AM, affy snp <[EMAIL PROTECTED]> wrote:
> Thanks Gabor.
>
> I made the column names look like this:
>
> probeset
> WM806SignalA
> WM806call
> WM1716SignalA
> WM1716call
>
Thanks Gabor.
I made the column names look like this:
probeset
WM806SignalA
WM806call
WM1716SignalA
WM1716call
And I then tried what you mentioned and got:
> library(sqldf)
Loading required package: gsubfn
Loading required package: proto
> source("http://sqldf.googlecode.com/svn/trunk/R/sql
On Nov 10, 2007 12:25 AM, affy snp <[EMAIL PROTECTED]> wrote:
> Hi Gabor,
>
> Thanks a lot!
>
> The header of the big file looks as follows:
>
> probe_set
> WM_806_Signal_A
> WM_806_call
> WM_1716_Signal_A
> WM_1716_call
>
>
> I only need the columns whose headers contain _Signal_A
>
If you want to read only the alternate columns that contain numerics,
then you can probably use:
scan('yourfile', what=c(list(NULL), rep(list(0, NULL), 243)), flush=TRUE,
fill=TRUE, skip=1)
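
A small sketch of how a scan() call of this kind behaves, assuming the column layout from the header quoted in this thread (one id column, then alternating numeric and character columns). The toy file and the names `n_samples`, `what`, `cols` and `signal` are made up for illustration.

```r
# Toy file with the same layout: id, then (signal, call) pairs for 2 samples.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "probe_set WM_806_Signal_A WM_806_call WM_1716_Signal_A WM_1716_call",
  "p1 7.1 P 6.3 A",
  "p2 8.4 A 5.9 P"
), tmp)

n_samples <- 2  # assumption: 243 in the real file

# In a 'what' list, NULL means "skip this field" and 0 means "read a numeric".
what <- c(list(NULL), rep(list(0, NULL), n_samples))

cols <- scan(tmp, what = what, skip = 1, flush = TRUE, fill = TRUE,
             quiet = TRUE)

# scan() returns one list element per field; skipped fields come back as NULL.
signal <- do.call(cbind, cols[!sapply(cols, is.null)])
signal
```

Note that this drops the probe ids as well; to keep them, the first element of the list could be "" (read as character) instead of NULL.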
On Nov 10, 2007 12:25 AM, affy snp <[EMAIL PROTECTED]> wrote:
> Hi Gabor,
>
> Thanks a lot!
>
> The header of the
BTW, would something like:
A <- read.table(file="243_47mel_withnormal_expression_log2.txt",
    header=TRUE, row.names=1, colClasses=c('factor', rep('factor',486)))
do any good?
Allen
On Nov 10, 2007 12:41 AM, affy snp <[EMAIL PROTECTED]> wrote:
> Yes, I am showing the first 5 columns as an example. Tha
Yes, I am showing the first 5 columns as an example. Thank you very much
for your suggestion. Let me check it out.
Allen
On Nov 10, 2007 12:39 AM, jim holtman <[EMAIL PROTECTED]> wrote:
> Your data is mixed: numeric and characters/factors. You can use
> skip=1 to skip the header line, but it loo
It sounds like the data is not all numeric; you have a 'factor' in
your read statement. It also sounds like some of your lines are
incomplete in the number of columns, since you are trying to read
in a "B" as a numeric. So if you have a character, then one way of
doing it is:
x <- scan('yo
Hi Jim,
I tried scan() first and got
> x <- scan(file="243_47mel_withnormal_expression_log2.txt", what=0)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
scan() expected 'a real', got 'probe_set'
So I guess it requires the file be numeric. But I do have row names
Hi Jim,
Actually besides the first column which is character, half of the 486 columns
are character as well.
Thanks!
Allen
On Nov 10, 2007 12:29 AM, affy snp <[EMAIL PROTECTED]> wrote:
> Hi Jim,
>
> I tried scan() first and got
>
> > x <- scan(file="243_47mel_withnormal_expression_log2.txt", wh
On Nov 10, 2007 12:19 AM, affy snp <[EMAIL PROTECTED]> wrote:
> Thanks Jim.
>
> I tried:
>
> A <- read.table(file="243_47mel_withnormal_expression_log2.txt",
>     header=TRUE, row.names=1, colClasses=c('factor', rep('numeric',486)))
>
> by specifying colClasses but it did not work.
>
> The error message I
Your data is mixed: numeric and characters/factors. You can use
skip=1 to skip the header line, but it looks like the rest is mixed.
In your example there are only 5 columns; are you just showing the
first 5 columns? If the pattern you show holds throughout, then you
would have a scan like:
scan('yo
Hi Gabor,
Thanks a lot!
The header of the big file looks as follows:
probe_set
WM_806_Signal_A
WM_806_call
WM_1716_Signal_A
WM_1716_call
I only need the columns whose headers contain _Signal_A
Can you suggest how to use sqldf?
Thanks!
Allen
On Nov 9, 2007 11:47 PM, Gabor Groth
Thanks Jim.
I tried:
A <- read.table(file="243_47mel_withnormal_expression_log2.txt",
    header=TRUE, row.names=1, colClasses=c('factor', rep('numeric',486)))
by specifying colClasses but it did not work.
The error message I got is:
> A<-read.table(file="243_47mel_withnormal_expression_log2.txt",heade
Here is an example of reading in file of 3M numbers (11MB of text
file) on my laptop:
> system.time(x <- scan('/tempyy', what=0))
Read 300 items
   user  system elapsed
   6.22    0.16    6.53
> str(x)
num [1:300] 1 2 3 4 5 6 7 8 9 10 ...
> gc()
used (Mb) gc trigger (Mb) max use
If they are all numeric, then read it in with:
x <- scan('yourfile', what=0) # assuming blank separators
This will create a single vector of the values. Now this comes in in
row order if that is what your data file has, so you could just add
dimensions of
dim(x) <- c(487, 238305)
rows and col
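
To illustrate the reshaping step described above, here is a small sketch on a toy 2 x 3 file (the file contents and variable names are invented). Because scan() returns the values in file (row) order while R fills matrices column by column, the dimensions are set as c(number of columns, number of rows) and the result is then transposed.

```r
# Toy all-numeric file: 2 rows, 3 columns, blank-separated.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 2 3",
             "4 5 6"), tmp)

x <- scan(tmp, what = 0, quiet = TRUE)  # values arrive in row order

# R fills matrices column-wise, so give the file's column count first ...
dim(x) <- c(3, 2)
# ... and transpose to recover the file's layout (2 rows x 3 columns).
m <- t(x)
m
```

An equivalent alternative is matrix(x, ncol = 3, byrow = TRUE). Either way, a copy of the data is made, which matters at the sizes discussed in this thread.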
Hi Jim,
Thanks a lot! I am currently running it on my laptop but without any
success. I could upload it to a server with 8 GB of memory
and it might be better to go from there.
Actually, I could have the whole file split into two parts,
one with 2nd column to 95th column, the other one with
1. You might be able to speed it up somewhat by specifying
colClasses=.
2. Another possibility is that the devel version of
the sqldf package provides an interface which simplifies reading a data file
into sqlite and from there into R. This is particularly useful if you
don't want to read it all
If they are all numeric, you can use 'scan' to read them in. With
that amount of data, you will need almost 1GB to contain the single
object. If you want to do any processing, you will probably need a
machine with at least 3-4GB of physical memory, preferably a 64-bit
version of R. What type of
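
The "almost 1GB" figure can be checked with a little arithmetic, sketched below under the assumption that every one of the 487 x 238,305 values is stored as a double (8 bytes each). The variable names are only for illustration.

```r
n_rows <- 238305
n_cols <- 487

n_items <- n_rows * n_cols   # about 116 million values
bytes   <- n_items * 8       # a double takes 8 bytes

n_items                      # 116054535
bytes / 1e9                  # roughly 0.93 GB for the single object
```

Working copies made during processing can easily double or triple that, which is consistent with the 3-4GB recommendation above.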
Dear list,
I need to read in a big table with 487 columns and 238,305 rows (row names
and column names are supplied). Is there a fast way to read in the table?
I tried read.table() but it seems to take forever :(
Thanks a lot!
Best,
Allen