On 25/09/12 01:29, mcelis wrote:
> I am working with some large text files (up to 16 GBytes). I am interested
> in extracting the words and counting each time each word appears in the
> text. I have written a very simple R program by following
.txt,sep="\t")
cat("Word\tFREQ",words.txt,file="frequencies",sep="\n")
})
#Read 4 items
# user system elapsed
# 0.148 0.000 0.150
The speed has improved, and the output looks similar. This code could
probably be improved further.
A.K.
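One possible direction for files too large to read in one go is to process them in chunks and merge the per-chunk counts. The following is only a minimal sketch under stated assumptions: `count_words_chunked` and `chunk_lines` are hypothetical names, not something from the thread, and it keeps the same simple whitespace split used in the code above.

```r
# Count whitespace-separated words without loading the whole file at once.
# `count_words_chunked` and `chunk_lines` are hypothetical, for illustration.
count_words_chunked <- function(path, chunk_lines = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  total <- numeric(0)  # named vector: word -> running count
  repeat {
    lines <- readLines(con, n = chunk_lines, warn = FALSE)
    if (length(lines) == 0L) break            # end of file
    words <- unlist(strsplit(tolower(lines), "\\s+"))
    words <- words[nzchar(words)]             # drop empty strings
    if (length(words) == 0L) next
    chunk_tab <- table(words)
    chunk_counts <- setNames(as.numeric(chunk_tab), names(chunk_tab))
    # Merge this chunk's counts into the running totals by word.
    merged <- c(total, chunk_counts)
    total <- vapply(split(merged, names(merged)), sum, numeric(1))
  }
  sort(total, decreasing = TRUE)
}
```

Memory use then depends on the chunk size and on the number of distinct words rather than on the file size. Counts are stored as doubles to avoid integer overflow on very large inputs.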
On Monday 24 September 2012 at 16:29 -0700, mcelis wrote:
> I am working with some large text files (up to 16 GBytes). I am interested
> in extracting the words and counting each time each word appears in the
> text. I have written a very simple R program by following some suggestions
> and examp
elapsed
> # 0.036 0.008 0.043
> A.K.
Well, dear A.K., your definition of "word" is really different from,
and in my view clearly much too simplistic compared to, what the
OP (= original poster) asked for.
E.g., from the above paragraph, your method will get words such as
&qu
ower(txt1),"\\s"))),decreasing=TRUE)
words.txt<-paste(names(words.txt),words.txt,sep="\t")
cat("Word\tFREQ",words.txt,file="frequencies",sep="\n")
})
#Read 4 items
#user system elapsed
# 0.036 0.008 0.043
A.K.
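To illustrate the concern about what counts as a "word": splitting on whitespace alone, as `strsplit(tolower(txt1),"\\s")` does, leaves punctuation attached to the tokens. The sketch below compares that with one possible alternative, splitting on runs of characters that are neither letters nor apostrophes. The sample text and the alternative pattern are assumptions made for illustration, not the definition the original poster had in mind.

```r
# Hypothetical sample text; not taken from the thread's data.
txt <- 'The "quick" fox, the quick dog. The end!'

# Whitespace split: punctuation stays attached ('"quick"', 'fox,', 'dog.').
ws_words <- unlist(strsplit(tolower(txt), "\\s+"))
ws_freq <- sort(table(ws_words), decreasing = TRUE)

# Alternative: split on runs of anything that is not a letter or apostrophe.
alpha_words <- unlist(strsplit(tolower(txt), "[^[:alpha:]']+"))
alpha_words <- alpha_words[nzchar(alpha_words)]  # drop empty leading token, if any
alpha_freq <- sort(table(alpha_words), decreasing = TRUE)

print(ws_freq)
print(alpha_freq)
```

With the whitespace split, `"quick"` and `quick` are counted as two different words; with the second pattern they are merged into one.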
- Original M
;)),decreasing=TRUE)
words.txt<-paste(names(words.txt),words.txt,sep="\t")
cat("Word\tFREQ",words.txt,file="frequencies",sep="\n")
})
# user system elapsed
# 0.016 0.000 0.014
A.K.
- Original Message -
From: mcelis
To: r-hel
I am working with some large text files (up to 16 GBytes). I am interested
in extracting the words and counting each time each word appears in the
text. I have written a very simple R program by following some suggestions
and examples I found online.
If my input file is 1 GByte, I see that R us