Re: [R] ff package: reading selected columns from csv

2012-07-26 Thread Jan van der Laan
You probably have a character column (which is converted to factor) or a factor column with a large number of distinct values. All the levels of a factor are stored in memory in ff.
Jan
threshold wrote: *..plus I get the following message after reading the whole set (all 7 columns):* read.c
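To illustrate the point about factor levels, here is a minimal base R sketch (not ff itself, and not taken from the thread). The data and the names `ids`, `f`, `levels_bytes` and `codes_bytes` are made up for illustration; the assumption is only that a column with one distinct string per row behaves like the problem column described above.

```r
# Hypothetical column: 1e5 rows, every value distinct (e.g. an ID string).
n <- 1e5
ids <- sprintf("id%06d", seq_len(n))

# Converting to factor creates one level per distinct value.
f <- factor(ids)
n_levels <- nlevels(f)

# Compare the size of the level strings with the size of the integer codes.
levels_bytes <- as.numeric(object.size(levels(f)))
codes_bytes  <- as.numeric(object.size(as.integer(f)))

cat("levels:", n_levels, " level bytes:", levels_bytes,
    " code bytes:", codes_bytes, "\n")
```

With many distinct values the level strings take far more room than the integer codes, so keeping the codes on disk does not help much if the levels themselves have to stay in memory.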

Re: [R] ff package: reading selected columns from csv

2012-07-26 Thread Jan van der Laan
Looking at the source code for read.table.ffdf, what seems to happen is that when the first block of data is read by read.table (1000 lines by default), the specified colClasses are used. In subsequent calls, the types of the columns of the ffdf object are used as colClasses. In your case the
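The two-stage pattern described here can be sketched in base R. This is only an imitation of the described behaviour using read.table on an open connection, not the actual read.table.ffdf code; the temporary file, the block sizes and the names `first`, `next_classes` and `rest` are assumptions made for the example.

```r
# Hypothetical 3-column csv file without a header.
tmp <- tempfile(fileext = ".csv")
write.table(data.frame(a = 1:6, b = letters[1:6], c = seq(0.5, 3, by = 0.5)),
            tmp, sep = ",", row.names = FALSE, col.names = FALSE)

con <- file(tmp, open = "r")

# First block: the colClasses given by the caller are used.
first <- read.table(con, sep = ",", nrows = 2,
                    colClasses = c("integer", "character", "numeric"))

# Later blocks: the classes are taken from the columns already read.
next_classes <- unname(vapply(first, function(x) class(x)[1], character(1)))
rest <- read.table(con, sep = ",", nrows = 4, colClasses = next_classes)

close(con)
result <- rbind(first, rest)
```

If the classes derived from the first block differ from what the caller originally passed, the later blocks are read with different colClasses than the first one.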

Re: [R] ff package: reading selected columns from csv

2012-07-26 Thread threshold
*..plus I get the following message after reading the whole set (all 7 columns):*
> read.csv.ffdf(file=csvfile, header=FALSE, skip=100, first.rows=1000, next.rows=1e7, VERBOSE=TRUE)
read.table.ffdf 1..1000 (1000) csv-read=0.02sec ffdf-write=0.08sec
read.table.ffdf 1001..10001000 (1000) cs

Re: [R] ff package: reading selected columns from csv

2012-07-26 Thread threshold
Dear Jan, thank you for your answer. I am basically following the code I've been using with read.table, where x.class <- c('NULL', 'numeric','NULL','NULL','NULL', 'NULL', 'NULL') has been working fine. Reading all columns works for me but takes much longer than my time constraints allow (460 such
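For reference, the read.table approach mentioned here can be reproduced on a small file. This is a sketch in base R only; the temporary file, its contents and the skip value of 2 are made up for the example, while x.class is the vector quoted above.

```r
# Hypothetical 7-column csv file with 5 rows and no header.
tmp <- tempfile(fileext = ".csv")
dat <- as.data.frame(matrix(seq_len(35), nrow = 5, ncol = 7))
write.table(dat, tmp, sep = ",", row.names = FALSE, col.names = FALSE)

# 'NULL' entries tell read.table to skip that column entirely.
x.class <- c('NULL', 'numeric', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL')

# Skip the first 2 rows and keep only the second column.
second <- read.table(tmp, sep = ",", header = FALSE, skip = 2,
                     colClasses = x.class)
```

The result is a data frame with a single numeric column holding the second column of the rows that were not skipped.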

Re: [R] ff package: reading selected columns from csv

2012-07-26 Thread Jan van der Laan
Having had a quick look at the source code for read.table.ffdf, I suspect that using 'NULL' in the colClasses argument is not allowed. Could you check whether read.table.ffdf works when you specify the colClasses for all columns (thereby reading in all columns in the file)? If that w

[R] ff package: reading selected columns from csv

2012-07-25 Thread threshold
*Dear R users, I've just started using the ff package. I have a csv file (~4 GB) with 7 columns and 6e+7 rows. I want to read only one column from the file, skipping the first 100 rows. Below I've provided different outcomes, which should clarify my problem.*
> sessionInfo()
R version 2.14.2 (2012-02-29)