On Sunday, August 18, 2013 at 09:19 -0700, Ajinkya Kale wrote:
I did exactly what you mentioned: I tried a subset of these documents and
found out there were some junk non-txt files that were causing this issue.
Everything worked fine with DirSource once I deleted them from the directory.
But I feel these functions should also tell which file they are failing
at. I
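tm may not report which file triggered the error, but one way to find out is to apply a reader to each file separately and record the error message per file. The sketch below makes no assumptions about tm's internals; `report_failures()` and its `reader` argument are hypothetical names, and by default it just calls base R's `readLines()`. You can pass a stricter reader to reproduce whatever check is failing for you.

```r
# Hypothetical helper (not part of tm): call `reader` on each file and
# return a data frame of the files that raised an error, with the message.
report_failures <- function(files, reader = readLines) {
  msgs <- vapply(files, function(f) {
    tryCatch({
      reader(f)
      NA_character_                      # this file was read without error
    }, error = function(e) conditionMessage(e))
  }, character(1))
  failed <- !is.na(msgs)
  data.frame(file = basename(files)[failed],
             error = unname(msgs[failed]),
             stringsAsFactors = FALSE)
}
```

For example, `report_failures(list.files(sourceDir, full.names = TRUE))` lists each file that `readLines()` cannot open. To skip the junk files instead of hunting them down, `DirSource()` also takes a `pattern` argument (a regular expression on file names), e.g. `DirSource(sourceDir, pattern = "\\.txt$")`.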
On Saturday, August 17, 2013 at 11:16 -0700, Ajinkya Kale wrote:
> It contains only text files, which were converted from doc, docx, ppt,
> etc. using LibreOffice.
> Some of them are non-English text documents.
>
> Sorry, I cannot share the corpus, but if someone can shed light on
> what might cause
I think I know why it works faster: VectorSource in the code above only
takes the file names as the corpus, not the contents of the files :D duh!
Any suggestions for creating a vector source out of the contents of the txt files?
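One way to do this, sketched in base R: read each `.txt` file into a single string and hand the resulting character vector to `VectorSource()`, so each element is a document's text rather than its file name. `read_txt_contents()` is a hypothetical helper, and `encoding = "UTF-8"` assumes the LibreOffice exports are UTF-8; adjust it if they are not.

```r
# Hypothetical helper: return one string per .txt file in `dir`,
# named by file name, with the file's lines joined by newlines.
read_txt_contents <- function(dir) {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE,
                      ignore.case = TRUE)
  contents <- vapply(files, function(f) {
    # encoding = "UTF-8" only marks the strings as UTF-8 (assumption
    # about the input); it does not convert from another encoding.
    paste(readLines(f, warn = FALSE, encoding = "UTF-8"), collapse = "\n")
  }, character(1))
  names(contents) <- basename(files)
  contents
}
```

The corpus could then be built with `Corpus(VectorSource(read_txt_contents(sourceDir)), readerControl = list(language = "lat"))`.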
On Sat, Aug 17, 2013 at 1:59 PM, Ajinkya Kale wrote:
Funny, it works fine if I use VectorSource:
ovid <- Corpus(VectorSource(list.files(sourceDir)[1:1253]),
               readerControl = list(language = "lat"))
So I tried executing only DirSource(sourceDir), and that fails with the
error I mentioned earlier. So it's not a problem with Corpus(), as I
initially thought
It contains only text files, which were converted from doc, docx, ppt, etc.
using LibreOffice.
Some of them are non-English text documents.
Sorry, I cannot share the corpus, but if someone can shed light on what
might cause this error, I can try to eliminate those documents if some
specific docs
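If the troublesome documents are binary leftovers rather than real text, a rough heuristic is to look for NUL bytes near the start of each file, since plain text in UTF-8 or Latin-1 normally contains none. `looks_binary()` is a hypothetical helper, not a tm function; note that UTF-16 text files also contain NUL bytes and would be flagged too.

```r
# Hypothetical helper: return the files whose first `n_bytes` bytes contain
# a NUL (0x00) byte, a rough sign of binary rather than plain-text content.
looks_binary <- function(files, n_bytes = 8000) {
  has_nul <- vapply(files, function(f) {
    bytes <- readBin(f, what = "raw", n = n_bytes)  # reads at most n_bytes
    any(bytes == as.raw(0))
  }, logical(1), USE.NAMES = FALSE)
  files[has_nul]
}
```

Assuming the corpus directory has no subdirectories, `looks_binary(list.files(sourceDir, full.names = TRUE))` would list the candidates to inspect or remove.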
On Friday, August 16, 2013 at 19:35 -0700, Ajinkya Kale wrote:
> I am trying to use the text mining package ... I keep getting this error:
>
> rm(list=ls())
> library(tm)
> sourceDir <- "Z:\\projectk_viz\\docs_to_index"
> ovid <- Corpus(DirSource(sourceDir),readerControl = list(language = "lat"))