Re: problem with LED-fonts recognition ;(

2012-12-04 Thread Speedy
6t5Ih1IM/discussion > > -- > Zdenko > > On Tue, Dec 4, 2012 at 2:42 PM, Speedy > > wrote: > >> Why is a black background a problem? One of the advertised features of >> tesseract is that it works equally well for black-on-white and >> white-on-black te

Re: problem with LED-fonts recognition ;(

2012-12-04 Thread Speedy
Why is a black background a problem? One of the advertised features of tesseract is that it works equally well for black-on-white and white-on-black text. Marcus On Tuesday, December 4, 2012 11:11:36 AM UTC+1, zdenop wrote: > > Search forum. I remember discussion about similar topic. > AFAIR: t

Training one letter affects all others?

2012-12-04 Thread Speedy
Hi there, I have trained a new font containing upper case letters and digits. In the evaluation I found that the most frequent errors were 0->O confusions (not the other way around). A total of 38 zeros were recognized as O. Looking through the training images I found a few O that were actually
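A tally like the 0->O count described above can be produced with a few lines of Python. This is only a minimal sketch: the function name `count_confusions` and the sample strings are made up for illustration, and it assumes the ground truth and the recognized text are already aligned character by character, which real OCR output often is not.

```python
from collections import Counter


def count_confusions(truth, recognized):
    """Count (expected, got) pairs at positions where the characters differ.

    Assumption: both strings are aligned, i.e. position i in `truth`
    corresponds to position i in `recognized`.
    """
    if len(truth) != len(recognized):
        raise ValueError("strings must be aligned to the same length")
    return Counter((t, r) for t, r in zip(truth, recognized) if t != r)


# Hypothetical sample data: two zeros are misread as the letter O.
confusions = count_confusions("A0B0C0O", "AOBOC0O")
for (expected, got), count in confusions.most_common():
    print(f"{expected}->{got}: {count}")
```

Because the pairs are ordered, a 0->O confusion is counted separately from an O->0 confusion, which matches the one-directional error described in the post.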

Re: Does tesseract 3.02 require new training?

2012-10-01 Thread Speedy
k/doc/shapeclustering.1.html > [3] http://code.google.com/p/tesseract-ocr/issues/detail?id=770 > [4] http://code.google.com/p/tesseract-ocr/issues/detail?id=754 > > -- > Zdenko > > On Mon, Oct 1, 2012 at 11:10 AM, Speedy > > wrote: > >> Hi, >> >> I'

Re: Does tesseract 3.02 require new training?

2012-10-01 Thread Speedy
Hi, I'll try another shot: When I move from tesseract 3.01 to tesseract 3.02 should I retrain my fonts with the 3.02 training tools or does this not matter? Best regards, Marcus On Thursday, September 20, 2012 4:31:50 PM UTC+2, Speedy wrote: > Hi there, > > we are c

Effect of font_properties

2012-10-01 Thread Speedy
Hello, I am trying to figure out exactly what effect the font_properties file has. I have already performed a number of trainings with great success. However, there are a few letter confusions that dominate the error rate and which I would like to reduce. Here is the setup: There really is

Re: Problems installing leptonica 1.69 with Tesseract 3.01 on Ubuntu 10.04 LTS

2012-09-22 Thread Speedy
I had not realized that the tesseract-ocr 3.02 package has made it to the Ubuntu Precise repositories. That is great news! I have recently updated my speedy-ocr package to work in Precise. I will need to do some extra testing to ensure the bash script, originally written for tesseract 2.04 in

Re: Improving the 'AddOns' wiki page

2012-09-22 Thread Speedy
R for the blind, called speedy-ocr. It is part of the Vinux DVD distributions (vinuxproject.org). It is in our Vinux repositories for Ubuntu Lucid, Maverick, Natty, Oneiric, and is currently in testing for Precise. The interface runs in GNOME using just zenity dialogs, since the blind can

Does tesseract 3.02 require new training?

2012-09-20 Thread Speedy
Hi there, we are currently using tesseract 3.01 as our OCR engine and have trained a number of fonts with it. Things work quite well, but we would like to move to version 3.02 for two reasons: - It is possible to combine fonts - The character recognition is supposed to be significantly impr

Re: multiple languages?

2012-08-16 Thread Speedy
Is it possible to get the language that matched from the result? In other words, is it possible to use tesseract to recognize the font? Is this per character, per word or per page? How much slower is recognition when multiple languages are combined? On Thursday, August 9, 2012 9:35:00 AM UTC+2,

Re: Is there a way to combine languages?

2012-03-16 Thread Speedy
traineddata files you choose - I > shall test and feed back to you. > cheers. > > > > On Mon, Mar 12, 2012 at 5:25 PM, Speedy wrote: > > Can you provide any information on how this works? > > At what level can languages mingle? For example, could each word be of >

Re: Is there a way to combine languages?

2012-03-12 Thread Speedy
Can you provide any information on how this works? At what level can languages mingle? For example, could each word be of a different language? Or is it on a sentence level or on a paragraph level? Is there a way to influence this? For example, if I know that a document is of only a single language,
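For reference, tesseract 3.02 accepts several languages in one run by joining their names with `+` in the `-l` option. The sketch below only assembles such a command line; the helper name `build_tesseract_command`, the file names, and the language codes are placeholders, and it does not answer at what level the languages are mixed.

```python
def build_tesseract_command(image_path, output_base, languages):
    """Build the argument list for a multi-language tesseract run.

    The languages are joined with '+', e.g. ["eng", "deu"] -> "eng+deu".
    """
    if not languages:
        raise ValueError("at least one language is required")
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


# Placeholder file names and language codes, for illustration only.
command = build_tesseract_command("page.png", "page", ["eng", "deu"])
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where tesseract and the named traineddata files are installed.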

Re: Version 3.02 in alpha

2012-02-03 Thread Speedy
Getting packages into Ubuntu Precise would be awesome! As someone involved in putting together Vinux, a distribution of Ubuntu for the blind and visually impaired, I consider OCR essential. We have several utilities people have built to simplify these tasks. Is tesseract version 3.02 backward compatibl

Re: Version 3.02 in alpha

2012-02-03 Thread Speedy
Another feature that sounds very promising is the bigrams. Is this a feature that works on a word level? Does this include a probability for the first word? I.e., is position 0 a valid context for a bigram? So for example, if I wanted to recognize license plates and I know that the first one or tw

Re: Version 3.02 in alpha

2012-02-03 Thread Speedy
I'd be very interested in this as well. How does it work? I mean, if I have a font in one language and another in the other language, does it make sure that no characters from different languages are intermingled in the same word? How about in the same line? Is there a way to influence this? Does

Letters and digits and dictionaries

2012-02-02 Thread Speedy
Hello, we are trying to recognize sequences of letters and digits with only a weak syntax. Well, we do know that the sequences start with certain typical letter pairs but after that they can come in basically any order. Here are our questions: 1. What does tesseract do when there is no dictionar

Trigger happy?

2012-02-02 Thread Speedy
Hello, we are trying to use Tesseract to recognize text in real-world images. We have a good text finder and a good binarization and feed Tesseract the already binarized image, but it still happens that the binarized image contains some dirt. It seems that Tesseract is quite "trigger happy" in such
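One common way to deal with such dirt before the image reaches the OCR engine is to drop very small connected components from the binarized image. The following is a sketch of that generic preprocessing step, not something Tesseract provides or that the post describes: the function name `remove_small_components`, the 0/1 list-of-lists image format, and the size threshold are all assumptions chosen for illustration.

```python
from collections import deque


def remove_small_components(image, min_pixels):
    """Return a copy of a binary image (1 = ink, 0 = background) without
    4-connected ink components that have fewer than `min_pixels` pixels."""
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Components below the threshold are treated as dirt and erased.
            if len(component) < min_pixels:
                for cy, cx in component:
                    result[cy][cx] = 0
    return result


# Tiny made-up example: one 2x2 blob and two isolated specks.
noisy = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
cleaned = remove_small_components(noisy, 3)
for row in cleaned:
    print(row)
```

The threshold has to be chosen with care: set too high, it also removes legitimate small marks such as periods or the dots on i and j.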

Combining fonts

2012-02-02 Thread Speedy
Hello, I would like to be able to use tesseract with only a specific set of fonts and I would like to know which font actually matched. Basically, there is only ever one font in the image but it could in principle be one of many different fonts. However, we can typically limit it to only a subset.

Re: Experimental Languages section at CustomOCR.com - Kannada

2011-10-06 Thread Speedy
ns" and are not "improvements", but a separate software product which uses Tesseract in its original form, almost without any corrections. So I don't see much chance that it will go open source or become publicly available. Warm regards, Dmitri Silaev www.CustomOCR.com On Mon, Sep 26,

Re: Experimental Languages section at CustomOCR.com - Kannada

2011-09-26 Thread Speedy
As a general question, would any of the code improvements from these paid customizations be incorporated back into the freely available, open-source Tesseract software? Many in our blind community have unfulfilled needs, like book scanning, including math textbooks. *Don Marang* Vinux Softw