Re: [R] Error in Corpus() in tm package

2013-08-18 Thread Milan Bouchet-Valat
On Sunday, 18 August 2013 at 09:19 -0700, Ajinkya Kale wrote:
> I did exactly what you mentioned... tried subset of these documents
> and found out there were some junk non-txt files which were causing
> this issue. Everything worked fine with dirsource once I deleted them
> from the dir.
> But I

Re: [R] Error in Corpus() in tm package

2013-08-18 Thread Ajinkya Kale
I did exactly what you mentioned... tried a subset of these documents and found that some junk non-txt files were causing this issue. Everything worked fine with DirSource once I deleted them from the directory. But I feel these functions should also report which file they are failing on I
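The kind of check described above (finding which file in a directory breaks the import) can be sketched in base R. This is only an illustration: `find_failing_files` and `strict_reader` are hypothetical helpers, not part of tm, and the NUL-byte test is an assumed heuristic for "junk non-txt file", since the thread does not say what the offending files contained.

```r
# Hypothetical helper: apply `reader` to every file in `dir` and
# collect the error message for each file the reader cannot handle.
find_failing_files <- function(dir,
                               reader = function(path) readLines(path, warn = FALSE)) {
  paths <- list.files(dir, full.names = TRUE)
  failures <- character(0)
  for (path in paths) {
    msg <- tryCatch({
      reader(path)
      NA_character_                      # NA means the file was read without error
    }, error = function(e) conditionMessage(e))
    if (!is.na(msg)) failures[basename(path)] <- msg
  }
  failures                               # named vector: file name -> error message
}

# Assumed heuristic: treat a file containing NUL bytes as non-text.
strict_reader <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  if (any(bytes == as.raw(0))) stop("file contains NUL bytes, probably not text")
  rawToChar(bytes)
}

# Demo on a throwaway directory with one text file and one binary file.
demo_dir <- tempfile("docs_")
dir.create(demo_dir)
writeLines("plain text", file.path(demo_dir, "good.txt"))
writeBin(as.raw(c(0xff, 0xfe, 0x00, 0x01)), file.path(demo_dir, "junk.bin"))

bad <- find_failing_files(demo_dir, strict_reader)
print(bad)
```

Any other function that raises an error on a bad file could be passed as `reader` in the same way.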

Re: [R] Error in Corpus() in tm package

2013-08-18 Thread Milan Bouchet-Valat
On Saturday, 17 August 2013 at 11:16 -0700, Ajinkya Kale wrote:
> It contains all text files which were converted from doc, docx, ppt
> etc. using libreoffice.
> Some of them are non-english text documents.
>
> Sorry I cannot share the corpus.. but if someone can shed light on
> what might cause

Re: [R] Error in Corpus() in tm package

2013-08-17 Thread Ajinkya Kale
I think I know why it works faster: VectorSource in the above code only takes the file names as the corpus, not the contents of the files :D duh! Any suggestions for creating a vector source out of the contents of the txt files?
On Sat, Aug 17, 2013 at 1:59 PM, Ajinkya Kale wrote:
> Funny, it wo
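One possible way to build a vector source from file contents rather than file names is to read each file into a single string first. This is a minimal sketch, not a reply from the thread: `read_texts` is a hypothetical helper, the `\\.txt$` pattern is an assumption about how the files are named, and the tm step only runs if the package is installed.

```r
# Hypothetical helper: read every matching file in `dir` into one string
# per file, returning a character vector named by file name.
read_texts <- function(dir, pattern = "\\.txt$") {
  paths <- list.files(dir, pattern = pattern, full.names = TRUE)
  texts <- vapply(
    paths,
    function(p) paste(readLines(p, warn = FALSE), collapse = "\n"),
    character(1)
  )
  names(texts) <- basename(paths)
  texts
}

# Demo on a throwaway directory with two small text files.
demo_dir <- tempfile("txt_")
dir.create(demo_dir)
writeLines(c("first line", "second line"), file.path(demo_dir, "a.txt"))
writeLines("another document", file.path(demo_dir, "b.txt"))

texts <- read_texts(demo_dir)

# The contents, not the file names, are handed to VectorSource.
if (requireNamespace("tm", quietly = TRUE)) {
  corpus <- tm::Corpus(tm::VectorSource(texts))
  print(length(corpus))
}
```

Reading files one at a time like this also makes it easy to wrap the read in `tryCatch()` and see which file is the problematic one.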

Re: [R] Error in Corpus() in tm package

2013-08-17 Thread Ajinkya Kale
Funny, it works fine if I use VectorSource:
    ovid <- Corpus(VectorSource(list.files(sourceDir)[1:1253]), readerControl = list(language = "lat"))
So I tried executing only
    > DirSource(sourceDir)
and that fails with the error I mentioned earlier. So it's not a problem with Corpus(), which I thought initi

Re: [R] Error in Corpus() in tm package

2013-08-17 Thread Ajinkya Kale
It contains only text files, which were converted from doc, docx, ppt etc. using LibreOffice. Some of them are non-English text documents. Sorry, I cannot share the corpus, but if someone can shed light on what might cause this error then I can try to eliminate those documents if some specific docs

Re: [R] Error in Corpus() in tm package

2013-08-17 Thread Milan Bouchet-Valat
On Friday, 16 August 2013 at 19:35 -0700, Ajinkya Kale wrote:
> I am trying to use the text mining package ... I keep getting this error :
>
> rm(list=ls())
> library(tm)
> sourceDir <- "Z:\\projectk_viz\\docs_to_index"
> ovid <- Corpus(DirSource(sourceDir),readerControl = list(language = "lat"))