Re: [tesseract-ocr] produce delimited output using hOCR or by preserving original document spacing

2014-10-13 Thread Maureen Kole

Sven, I apologize for my delayed response; I only just saw your post. Thank you for replying. As I said in my post to Andrew, I am still working on this issue. Before posting my question here on the forum, I looked into the PSM (page segmentation mode) settings and found this website useful for describing the PSM

[tesseract-ocr] Re: produce delimited output using hOCR or by preserving original document spacing

2014-10-13 Thread Maureen Kole

ion = TRUE, stopwords = TRUE))
> dtm<-removeSparseTerms(dtm,0.1) #or 0.2
>
> Also you can import text without a package like so:
>
> LoadMe<-readLines("out2.txt")
>
> #split document by spaces
> wordList<-strsplit(LoadMe, "\\W+", perl=TRU