Tess3.01 hocr output not working with pdfbeads

2012-05-21 Thread Galt
I should begin by saying that I am grateful and happy to have a very nice searchable pdf of an old book thanks to Tess. I found this on the web: https://github.com/steelThread/mimeograph/commit/b29af3338e8f15b22392b4e313c8688d9950e13b pdfbeads currently doesn't work with hOCR output generated

Re: List of Config Parameters

2012-05-21 Thread zdenko podobny
IMO the best way is to search for '_MEMBER' (or ' _VAR_H ' in "*.h" ???) as suggested (indirectly :-) ) in Visual Studio 2008 Developer Notes for Tesseract-OCR -> Handy free tools [1]. [1] http://tesseract-ocr.googlecode.com/svn/trunk/vs2008/doc/tools.html#id2 -- Zdenko On Mon, May 21, 2012 at
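The search suggested above can be sketched in Python. This is a minimal sketch, not part of Tesseract: the function name `find_param_declarations` is hypothetical, and the only assumption about the sources is the one stated in the post, namely that parameters are declared in header files through macros whose names end in `_MEMBER` or `_VAR_H`.

```python
import re
from pathlib import Path

# Matches macro invocations such as INT_MEMBER(name, ...) or BOOL_VAR_H(name, ...).
# The two suffixes come from the post; the exact pattern is an assumption.
# Note that it also matches the "#define ..." lines that define the macros themselves.
PARAM_MACRO = re.compile(r"\b(\w+(?:_MEMBER|_VAR_H))\s*\(\s*(\w+)")


def find_param_declarations(source_dir):
    """Return sorted (parameter name, macro, file name) tuples found in *.h files."""
    found = []
    for header in Path(source_dir).rglob("*.h"):
        text = header.read_text(errors="ignore")
        for macro, name in PARAM_MACRO.findall(text):
            found.append((name, macro, header.name))
    return sorted(found)
```

Calling `find_param_declarations` on a checkout of the Tesseract sources should give a rough list of candidate parameter names; the list still has to be checked by hand against the declarations.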

Re: Using Cube Engine And Right to Left Language

2012-05-21 Thread Stane
I don't know if or how you can use cube for training. But if you want to use cube programmatically, have a look at api/baseapi.h. For the wrong-order problem, I would suggest unpacking the ara.traineddata file with combine_tessdata -u tessdata/ara.traineddata /outputfolder/ara. There should be an ara.c
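The unpacking step mentioned above could be scripted along these lines. This is only a sketch: the helper names `build_unpack_command` and `unpack_traineddata` are hypothetical, the `-u` option and the example paths are taken from the post, and it assumes `combine_tessdata` is installed and on the PATH.

```python
import subprocess


def build_unpack_command(traineddata_path, output_prefix):
    """Build the argument list for unpacking a traineddata file."""
    # "-u" (unpack) is the option quoted in the post above.
    return ["combine_tessdata", "-u", traineddata_path, output_prefix]


def unpack_traineddata(traineddata_path, output_prefix):
    """Run combine_tessdata; raises CalledProcessError if the tool fails."""
    subprocess.run(build_unpack_command(traineddata_path, output_prefix), check=True)
```

For the example in the post, the call would be `unpack_traineddata("tessdata/ara.traineddata", "/outputfolder/ara.")`, with the components written out under the given prefix.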

Re: List of Config Parameters

2012-05-21 Thread Martin Roth
I found a partial list in tesseractclass.h. On Friday, 17 June 2011 11:54:05 UTC+2, Derek wrote: > > Check here: > http://code.google.com/p/

Re: Using Cube Engine And Right to Left Language

2012-05-21 Thread Not4 Any1
Stane, thank you for the reply. Can you explain how to use cube in training, by command line or code? It would help a lot. When I train on my own TIFF files and use the resulting trained data, it gives me the words backwards, like ((word)) = ((drow)), with Arabic words. Thanks again. -- You received this message b

Word White and Blacklisting with RegEx

2012-05-21 Thread Martin Roth
I'm interested in implementing a word white (or black) list described by a regular expression. In my application I generally only need to detect single words with a predefined structure. Character whitelists definitely help, but I can't help but wonder if a word whitelist would be even better.
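One way to approximate the idea outside the engine is to filter recognized words after OCR. The sketch below is an assumption-laden illustration, not a Tesseract feature: the function name `filter_words` and the example pattern are hypothetical, and it only post-processes candidate strings rather than guiding recognition as the post asks.

```python
import re


def filter_words(candidates, pattern, mode="whitelist"):
    """Filter candidate words by a regular expression.

    whitelist: keep only words that match the whole pattern.
    blacklist: keep only words that do not match it.
    """
    if mode not in ("whitelist", "blacklist"):
        raise ValueError("mode must be 'whitelist' or 'blacklist'")
    regex = re.compile(pattern)
    keep_matches = mode == "whitelist"
    return [
        word for word in candidates
        if (regex.fullmatch(word) is not None) == keep_matches
    ]
```

For example, with the hypothetical structure "two capital letters followed by four digits", `filter_words(words, r"[A-Z]{2}\d{4}")` keeps only the candidates that fit that structure.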

Re: Using Cube Engine And Right to Left Language

2012-05-21 Thread Stane
As far as I know, there is no tool to train anything for the cube engine. You can only use the ara.cube.* files in tessdata to use the cube engine for the Arabic language. -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To post to this group,

Re: what are the protos (related to mftraining); how to set the max number

2012-05-21 Thread Stane
Maybe you can change MAX_NUM_INT_FEATURES in baseapi.h; currently it's set to 512. Or, as you suggested, you can train two separate traineddata files and run tesseract 3.02 with them. -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To post to

Re: Tess3.01 not recognizing my curly double quotes.

2012-05-21 Thread Nick White
Hi Galt, I've been suffering a very similar problem with some of the text I'm training, which has several diacritics above and below glyphs. It isn't uncommon to find quite a few lines of garbage where some of the diacritics are taken as a line of their own, which then causes the following and preceding lines