Re: Tesseract in Subtitle Edit

2012-05-23 Thread Taha Alasli
Hallur Guðjónsson, do you want the compiled Tesseract3.02.exe? If so, I'll send it to you. On 24 May 2012 00:19, Hallur Guðjónsson wrote: > Yes please post it here somewhere and I will try to compile it myself. > > Thank you > > Sincerely > > Hallur Orn > > > On Wednesday, May 23, 2012 8:17:

Re: Latin (Roman antiquity!) alphabet training

2012-05-23 Thread zdenko podobny
On Wed, May 23, 2012 at 11:10 PM, Falke wrote: > From what I see, there is no traineddata for the Roman latin > alphabet. Essentially, the current eng.traineddata's shortcoming is > its lack of the macron diacritic. > > Is it possible to add the macron glyphs to the already-existing > eng.traine

Re: Tesseract vs. Commercial OCR

2012-05-23 Thread nikolaykhl
I agree that Abbyy will do the job more accurately out of the box and is easier to get started with. You may also want to have a look at this article: http://www.splitbrain.org/blog/2010-06/15-linux_ocr_software_comparison On Wednesday, May 23, 2012 9:03:31 PM UTC+4, Scott Oom wrote: > > We are wo

Re: Tesseract in Subtitle Edit

2012-05-23 Thread Hallur Guðjónsson
Yes please post it here somewhere and I will try to compile it myself. Thank you Sincerely Hallur Orn On Wednesday, May 23, 2012 8:17:26 PM UTC, zdpo wrote: > > Officially 3.02 is not released, so there is no official (Windows) binary > version (you would have to compile it yourself)... > Anyway

Latin (Roman antiquity!) alphabet training

2012-05-23 Thread Falke
From what I see, there is no traineddata for the Roman latin alphabet. Essentially, the current eng.traineddata's shortcoming is its lack of the macron diacritic. Is it possible to add the macron glyphs to the already-existing eng.traineddata? (the Ā, ā, Ē, ē, Ō, ō, Ū, ū) ---
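
For reference, the eight macron letters listed above can be checked against a character set with a short script. This is only a minimal Python sketch, not part of the thread: it assumes a unicharset-style text in which the first line is the entry count and every later line begins with the character followed by space-separated properties. The function names `missing_from_unicharset` and `describe` are hypothetical helpers invented for illustration.

```python
import unicodedata

# The eight macron vowels named in the message: Ā ā Ē ē Ō ō Ū ū
MACRON_LETTERS = "\u0100\u0101\u0112\u0113\u014c\u014d\u016a\u016b"


def missing_from_unicharset(unicharset_text, wanted=MACRON_LETTERS):
    """Return the wanted characters that have no entry in the given text.

    Assumption: line 1 is the entry count, and each later line starts
    with the character, followed by a space and its properties.
    """
    lines = unicharset_text.splitlines()[1:]  # skip the count line
    present = set()
    for line in lines:
        if line:
            first_field = line.split(" ", 1)[0]
            # Normalise so decomposed and precomposed forms compare equal.
            present.add(unicodedata.normalize("NFC", first_field))
    return [ch for ch in wanted if ch not in present]


def describe(chars):
    """Pair each character with its code point and Unicode name."""
    return [(ch, "U+%04X" % ord(ch), unicodedata.name(ch)) for ch in chars]
```

Calling `missing_from_unicharset` on the contents of a unicharset extracted from a traineddata file would list which of the eight letters still need training samples, under the format assumption stated above.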

Re: Tesseract in Subtitle Edit

2012-05-23 Thread TP
On Wed, May 23, 2012 at 7:19 AM, Hallur Guðjónsson wrote: > Yes I read it carefully but I understood wrong at first, is there some place > to get the 3.02 windows version of tesseract? do I have to compile it myself > (because I'm a dumbass and don't know how to do that) Now that I have written s

Re: Tesseract in Subtitle Edit

2012-05-23 Thread zdenko podobny
On Wed, May 23, 2012 at 10:20 PM, Sven Pedersen wrote: > Hei Hallur, > You can get the isl.traineddata file from subversion (SVN): > http://code.google.com/p/tesseract-ocr/source/browse/trunk/tessdata/?r=656 > > You can perhaps use that language file with the 3.01 version. no, he can not. this is

Re: Tesseract in Subtitle Edit

2012-05-23 Thread Sven Pedersen
Hei Hallur, You can get the isl.traineddata file from subversion (SVN): http://code.google.com/p/tesseract-ocr/source/browse/trunk/tessdata/?r=656 You can perhaps use that language file with the 3.01 version. You can get Microsoft's free compiler and follow the recipe on the Wiki, though it might

Re: Tesseract in Subtitle Edit

2012-05-23 Thread zdenko podobny
Officially 3.02 is not released, so there is no official (Windows) binary version (you would have to compile it yourself)... Anyway, I can post a current svn build somewhere if needed (no support or installer will be provided for this :-) ). -- Zdenko On Wed, May 23, 2012 at 4:19 PM, Hallur Guðjónsso

Re: Tesseract vs. Commercial OCR

2012-05-23 Thread Sven Pedersen
It is clear that, out of the box, Abbyy Fine Reader is more accurate. It may well be still more accurate with training, maybe due to post-processing. Many people who produce effective solutions on this list use pre- and post-processing scripts to deal with various common issues. With all that, Tess

unicharset matching upper and lower case letters

2012-05-23 Thread Nick White
Hi again, I recently added a wordlist to my training, and was disappointed to find that it didn't seem to substantially improve the results. I suspect this is in significant part due to the unicharset not recognising equivalent upper and lower case letters (and hence not matching dictionary words

Re: Tess3.01 not recognizing my curly double quotes.

2012-05-23 Thread Nick White
On Tue, May 22, 2012 at 05:21:23AM -0700, Galt wrote: > On May 21, 2:04 am, Nick White wrote: > > I've been suffering a very similar problem with some of the text I'm > > training, which has several diacritics above and below glyphs. It > > isn't infrequent to find quite a few lines of garbage whi

Tesseract vs. Commercial OCR

2012-05-23 Thread Scott Oom
We are working on automated testing tools for applications and games. We want to be able to verify various text in the UIs in different languages and have been experimenting with Tesseract OCR and having a lot of fun with it. In 2007, Ray Smith mentioned that "Tesseract is now behind the leading

Re: Tesseract in Subtitle Edit

2012-05-23 Thread Hallur Guðjónsson
Yes I read it carefully but I understood wrong at first, is there some place to get the 3.02 windows version of tesseract? do I have to compile it myself (because I'm a dumbass and don't know how to do that) Sincerely Hallur Örn On Wednesday, May 23, 2012 11:51:36 AM UTC, zdpo wrote: > > Did y

Re: Tesseract in Subtitle Edit

2012-05-23 Thread zdenko podobny
Did you read my reply carefully? See also FAQ [1] (IMO line number is not important in this case). [1] http://code.google.com/p/tesseract-ocr/wiki/FAQ#actual_tessdata_num_entries_<=_TESSDATA_NUM_ENTRIES:Error:Ass -- Zdenko On Wed, May 23, 2012 at 1:19 PM, Hallur Guðjónsson wrote: > Yeah I trie

Re: Tesseract in Subtitle Edit

2012-05-23 Thread Hallur Guðjónsson
Yeah I tried to run it through CMD to see what the error was, and it gives me this: actual_tessdata_num_entries_ <= TESSDATA_NUM_ENTRIES:Error:Assert failed:in file ..\ccutil\tessdatamanager.cpp, line 48 The author of Subtitle Edit pointed to this website for acquiring new language packs, but I d

Re: Tess3.01 hocr output not working with pdfbeads

2012-05-23 Thread Galt
Thanks, Zdenko! I found most of those same links too. FYI here is Tess3.01 output: Dul fé na Gréine . . . . 3 In a nutshell, Tess 3.01 outputs this pattern for each word: Dul And judging by pdfbeads code, tess 3.00 did something like this for each word: Dul

Memory usage of Tesseract

2012-05-23 Thread Stane
Hi, I want to run Tesseract on a mobile device, and therefore it's important for me to use as little memory as possible. When I run Tesseract 3.01 with eng, it uses about 8MB on initialisation; eng.traineddata has a size of about 3MB. When I run it with Japanese, it uses around 55MB with a jpn.trainedd

Re: Using Cube Engine And Right to Left Language

2012-05-23 Thread Taha Alasli
Thanks Stane, you're the best. On 22 May 2012 20:14, Stane wrote: > Well, of course you need to give the right path as parameter, and the > output path must exist. > > I extracted it for you; since I am not sure which tesseract > version you are working with, here are both: > http://dl.dropbox.com/u/1028