Re: [R] Error in Corpus() in tm package

2013-08-18 Thread Milan Bouchet-Valat
On Saturday 17 August 2013 at 11:16 -0700, Ajinkya Kale wrote: It contains all text files which were converted from doc, docx, ppt etc. using LibreOffice. Some of them are non-English text documents. Sorry, I cannot share the corpus, but if someone can shed light on what might cause this

Re: [R] Error in Corpus() in tm package

2013-08-18 Thread Ajinkya Kale
I did exactly what you mentioned: I tried a subset of these documents and found out there were some junk non-txt files which were causing this issue. Everything worked fine with DirSource once I deleted them from the directory. But I feel these functions should also tell what file they are failing at
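The manual search described above (try subsets, find the junk files, delete them) can be roughly automated in base R. This is only a sketch: `find_suspect_files` is a hypothetical helper, not part of tm, and "junk" is assumed here to mean a file that lacks a `.txt` extension or contains NUL bytes. That heuristic would also flag UTF-16 text, so treat the result as a list to inspect rather than a list to delete.

```r
# Hypothetical helper (not part of tm): list files in a directory that are
# unlikely to be plain text, so they can be inspected before building a corpus.
find_suspect_files <- function(dir_path) {
  files <- list.files(dir_path, full.names = TRUE)
  files <- files[!dir.exists(files)]             # keep regular files only
  suspect <- vapply(files, function(f) {
    not_txt <- !grepl("\\.txt$", f, ignore.case = TRUE)
    bytes <- readBin(f, what = "raw", n = file.size(f))
    has_nul <- any(bytes == as.raw(0))           # NUL bytes suggest binary content
    not_txt || has_nul
  }, logical(1))
  basename(files[suspect])
}

# Small demonstration in a temporary directory (the file names are made up).
demo_dir <- tempfile("corpus_demo")
dir.create(demo_dir)
writeLines("plain text", file.path(demo_dir, "a.txt"))
writeBin(as.raw(c(0x50, 0x4b, 0x00, 0x01)), file.path(demo_dir, "b.txt"))
writeLines("notes", file.path(demo_dir, "c.doc"))
print(find_suspect_files(demo_dir))
```

Running this against the real source directory before calling DirSource() would name the files to look at, which is the information the poster wished the tm functions reported.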

Re: [R] Error in Corpus() in tm package

2013-08-18 Thread Milan Bouchet-Valat
On Sunday 18 August 2013 at 09:19 -0700, Ajinkya Kale wrote: I did exactly what you mentioned: I tried a subset of these documents and found out there were some junk non-txt files which were causing this issue. Everything worked fine with DirSource once I deleted them from the directory. But I feel

Re: [R] Error in Corpus() in tm package

2013-08-17 Thread Milan Bouchet-Valat
On Friday 16 August 2013 at 19:35 -0700, Ajinkya Kale wrote: I am trying to use the text mining package ... I keep getting this error:
    rm(list=ls())
    library(tm)
    sourceDir <- "Z:\\projectk_viz\\docs_to_index"
    ovid <- Corpus(DirSource(sourceDir), readerControl = list(language = "lat"))
    Error

Re: [R] Error in Corpus() in tm package

2013-08-17 Thread Ajinkya Kale
It contains all text files which were converted from doc, docx, ppt etc. using LibreOffice. Some of them are non-English text documents. Sorry, I cannot share the corpus, but if someone can shed light on what might cause this error then I can try to eliminate those documents if some specific docs

Re: [R] Error in Corpus() in tm package

2013-08-17 Thread Ajinkya Kale
Funny, it works fine if I use VectorSource:
    ovid <- Corpus(VectorSource(list.files(sourceDir)[1:1253]), readerControl = list(language = "lat"))
So I tried executing only DirSource(sourceDir), and that fails with the error I mentioned earlier. So it's not a problem with Corpus() which I thought

Re: [R] Error in Corpus() in tm package

2013-08-17 Thread Ajinkya Kale
I think I know why it works faster: VectorSource in the above code only takes the file names as a corpus and not the contents of the files :D duh! Any suggestions for creating a vector source out of the contents of the txt files? On Sat, Aug 17, 2013 at 1:59 PM, Ajinkya Kale kaleajin...@gmail.com
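One possible answer to the question above, as a sketch rather than a recommendation from the thread: read each file into a single string with base R and pass the resulting character vector to VectorSource(). `read_docs` is a hypothetical helper name, and the `.txt` filter and the temporary demo files are assumptions made for illustration. The tm call is guarded so the rest runs even where tm is not installed.

```r
# Hypothetical helper: read every .txt file in a directory into one string
# per file, named by file name.
read_docs <- function(dir_path) {
  files <- list.files(dir_path, pattern = "\\.txt$", full.names = TRUE)
  texts <- vapply(files, function(f) {
    paste(readLines(f, warn = FALSE), collapse = "\n")
  }, character(1))
  names(texts) <- basename(files)
  texts
}

# Demonstration with two made-up files in a temporary directory.
demo_dir <- tempfile("vector_demo")
dir.create(demo_dir)
writeLines(c("first line", "second line"), file.path(demo_dir, "a.txt"))
writeLines("another document", file.path(demo_dir, "b.txt"))
texts <- read_docs(demo_dir)

# Build the corpus from file contents instead of file names (needs tm).
if (requireNamespace("tm", quietly = TRUE)) {
  corpus <- tm::Corpus(tm::VectorSource(texts),
                       readerControl = list(language = "lat"))
  print(length(corpus))
}
```

Unlike `VectorSource(list.files(sourceDir))`, which turns the file names themselves into documents, each element of `texts` here holds the full text of one file.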

[R] Error in Corpus() in tm package

2013-08-16 Thread Ajinkya Kale
I am trying to use the text mining package ... I keep getting this error:
    rm(list=ls())
    library(tm)
    sourceDir <- "Z:\\projectk_viz\\docs_to_index"
    ovid <- Corpus(DirSource(sourceDir), readerControl = list(language = "lat"))
    Error in if (vectorized && (length <= 0)) stop("vectorized sources must have
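Since the error message refers to the length of a vectorized source, one cheap thing to rule out before calling DirSource() is that the directory does not exist or yields no files. The sketch below uses only base R. `check_source_dir` is a hypothetical helper, and an empty listing is an assumed cause rather than one confirmed in this thread.

```r
# Hypothetical helper: fail early, with the path in the message, when a
# directory is missing or contains no regular files.
check_source_dir <- function(dir_path) {
  if (!dir.exists(dir_path)) {
    stop("directory does not exist: ", dir_path)
  }
  entries <- list.files(dir_path, full.names = TRUE)
  files <- entries[!dir.exists(entries)]   # drop subdirectories
  if (length(files) == 0) {
    stop("no files found in: ", dir_path)
  }
  files
}

# Demonstration on an empty temporary directory.
demo_dir <- tempfile("empty_dir")
dir.create(demo_dir)
result <- tryCatch(check_source_dir(demo_dir),
                   error = function(e) conditionMessage(e))
print(result)
```

Calling `check_source_dir(sourceDir)` first would at least separate path problems from problems with individual files.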