You probably have a character (which is converted to factor) or factor
column with a large number of distinct values. All the levels of a
factor are stored in memory in ff.
Jan
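One way to check whether a column falls into this category is to read a small block of rows as plain text and count the distinct values per column. The sketch below uses base R only; the sample file and its contents are hypothetical stand-ins for the real CSV.

```r
# Hypothetical sample file standing in for the real CSV.
csvfile <- tempfile(fileext = ".csv")
writeLines(c("a1,1.5,x", "a2,2.5,x", "a3,3.5,y", "a4,4.5,y"), csvfile)

# Read a small block of rows, keeping every column as character.
block <- read.csv(csvfile, header = FALSE, nrows = 1000,
                  colClasses = "character")

# Number of distinct values per column. A count close to the number of
# rows read suggests the column would become a factor with very many levels.
n_distinct <- vapply(block, function(col) length(unique(col)), integer(1))
print(n_distinct)
```

Columns that turn out to be numeric can then be given an explicit numeric class so they are never converted to factor.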
threshold wrote:
*..plus I get the following message after reading the whole set (all 7
columns):*
read.c
Looking at the source code for read.table.ffdf what seems to happen is
that when reading the first block of data by read.table (standard 1000
lines) the specified colClasses are used. In subsequent calls the
types of the columns of the ffdf object are used as colClasses. In
your case the
*..plus I get the following message after reading the whole set (all 7
columns):*
> read.csv.ffdf(file=csvfile, header=FALSE, skip=100, first.rows=1000,
> next.rows=1e7, VERBOSE=TRUE)
read.table.ffdf 1..1000 (1000) csv-read=0.02sec ffdf-write=0.08sec
read.table.ffdf 1001..10001000 (1000) cs
Dear Jan, thank you for your answer.
I am basically following the code I've been using with read.table, where
x.class <- c('NULL', 'numeric','NULL','NULL','NULL', 'NULL', 'NULL')
has been working fine.
Reading all columns works for me but takes much longer than my time
constraints allow.. (460 such
Having had a quick look at the source code for read.table.ffdf, I
suspect that using 'NULL' in the colClasses argument is not allowed.
Could you try to see if you can use read.table.ffdf with specifying
the colClasses for all columns (thereby reading in all columns in the
file)? If that w
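
A minimal sketch of that test, assuming the ff package is installed and that all seven columns are numeric (the real column types are not given in the thread). The sample file is a hypothetical stand-in, and the chunk sizes are kept tiny for illustration.

```r
library(ff)

# Hypothetical stand-in for the real file: 7 numeric columns, no header.
csvfile <- tempfile(fileext = ".csv")
write.table(matrix(runif(70), ncol = 7), csvfile, sep = ",",
            row.names = FALSE, col.names = FALSE)

# Assumed column types: one explicit class per column, no 'NULL' entries.
x.class <- rep("numeric", 7)

# Read every column in chunks (sizes are illustrative only).
dat <- read.csv.ffdf(file = csvfile, header = FALSE,
                     colClasses = x.class,
                     first.rows = 5, next.rows = 5)

# Keep only the column of interest (the second one in the original post).
col2 <- dat[[2]]
length(col2)
```

Whether this resolves the problem for the real file is not confirmed here; it only separates the effect of 'NULL' in colClasses from other causes.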
*Dear R users, I've just started using the ff package.
There is a csv file (~4Gb) with 7 columns and 6e+7 rows. I want to read only
one column from the file, skipping the first 100 rows.
Below I've provided different outcomes, which will clarify my problem
*
> sessionInfo()
R version 2.14.2 (2012-02-29)