On Saturday 17 August 2013 at 11:16 -0700, Ajinkya Kale wrote:
It contains text files which were converted from doc, docx, ppt,
etc. using LibreOffice.
Some of them are non-English text documents.
Sorry, I cannot share the corpus, but if someone can shed light on
what might cause this
I did exactly what you mentioned: I tried a subset of these documents and
found that some junk non-txt files were causing this issue.
Everything worked fine with DirSource once I deleted them from the directory.
But I feel these functions should also report which file they are failing
at
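A rough way to spot such files before handing the directory to DirSource is to scan it yourself. The sketch below uses only base R. `find_unreadable_files` is a hypothetical helper name, not a tm function, and the NUL-byte check is only a heuristic that I am assuming is enough to catch leftover binary files; it will not catch every kind of junk.

```r
# Hypothetical helper (not part of tm): list the files in `dir` that cannot be
# read or that contain a NUL byte, which plain-text files should not have.
find_unreadable_files <- function(dir) {
  files <- list.files(dir, full.names = TRUE)
  files <- files[!file.info(files)$isdir]   # skip subdirectories
  bad <- character(0)
  for (f in files) {
    ok <- tryCatch({
      raw_bytes <- readBin(f, what = "raw", n = file.info(f)$size)
      !any(raw_bytes == as.raw(0))          # FALSE if a NUL byte is present
    }, error = function(e) FALSE)           # unreadable files count as bad
    if (!ok) bad <- c(bad, f)
  }
  bad
}

# Small demonstration in a throwaway directory (file names are made up).
demo_dir <- tempfile("corpus_demo")
dir.create(demo_dir)
writeLines("plain text document", file.path(demo_dir, "good.txt"))
writeBin(as.raw(c(0x50, 0x4b, 0x00, 0x04)), file.path(demo_dir, "junk.bin"))

bad_files <- find_unreadable_files(demo_dir)
print(basename(bad_files))
```

Anything the helper reports can be moved out of the directory, or inspected by hand, before building the corpus.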
On Sunday 18 August 2013 at 09:19 -0700, Ajinkya Kale wrote:
I did exactly what you mentioned: I tried a subset of these documents
and found that some junk non-txt files were causing
this issue. Everything worked fine with DirSource once I deleted them
from the directory.
But I feel
On Friday 16 August 2013 at 19:35 -0700, Ajinkya Kale wrote:
I am trying to use the text mining package ... I keep getting this error:
rm(list = ls())
library(tm)
sourceDir <- "Z:\\projectk_viz\\docs_to_index"
ovid <- Corpus(DirSource(sourceDir), readerControl = list(language = "lat"))
Error
It contains text files which were converted from doc, docx, ppt, etc.
using LibreOffice.
Some of them are non-English text documents.
Sorry, I cannot share the corpus, but if someone can shed light on what
might cause this error then I can try to eliminate those documents if some
specific docs
Funny, it works fine if I use VectorSource
ovid <- Corpus(VectorSource(list.files(sourceDir)[1:1253]),
               readerControl = list(language = "lat"))
So I tried executing only DirSource(sourceDir), and that fails with the
error I mentioned earlier. So it's not a problem with Corpus(), which I
thought
I think I know why it works faster: VectorSource in the above code only
takes the file names as the corpus, not the contents of the files :D duh!
Any suggestions for creating a vector source out of the contents of the txt files?
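One possible approach, sketched here under a few assumptions, is to read each file into a single string and pass the resulting character vector to VectorSource. `read_texts` is a hypothetical helper, and the `\\.txt$` pattern assumes the text files carry a .txt extension. The sketch calls VCorpus explicitly instead of Corpus, and only when tm is installed.

```r
# Hypothetical helper: read every file matching `pattern` in `dir` and return
# a named character vector with one element (the whole text) per file.
read_texts <- function(dir, pattern = "\\.txt$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  texts <- vapply(files, function(f) {
    paste(readLines(f, warn = FALSE), collapse = "\n")
  }, character(1))
  names(texts) <- basename(files)
  texts
}

# Demonstration with made-up files in a throwaway directory.
demo_dir <- tempfile("vector_source_demo")
dir.create(demo_dir)
writeLines(c("first line", "second line"), file.path(demo_dir, "a.txt"))
writeLines("another document", file.path(demo_dir, "b.txt"))
writeBin(as.raw(c(0x00, 0x01)), file.path(demo_dir, "junk.bin"))  # ignored

texts <- read_texts(demo_dir)
print(names(texts))

# Build the corpus from file contents rather than file names.
if (requireNamespace("tm", quietly = TRUE)) {
  ovid <- tm::VCorpus(tm::VectorSource(texts),
                      readerControl = list(language = "lat"))
  print(length(ovid))
}
```

Filtering by extension also keeps stray non-text files out of the corpus, at the cost of silently skipping text files that are named differently.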
On Sat, Aug 17, 2013 at 1:59 PM, Ajinkya Kale kaleajin...@gmail.com
I am trying to use the text mining package ... I keep getting this error:
rm(list = ls())
library(tm)
sourceDir <- "Z:\\projectk_viz\\docs_to_index"
ovid <- Corpus(DirSource(sourceDir), readerControl = list(language = "lat"))
Error in if (vectorized && (length <= 0)) stop("vectorized sources must have