>
> --
> Zdenko
>
> On Tue, Dec 4, 2012 at 2:42 PM, Speedy
> > wrote:
>
>> Why is a black background a problem? One of the advertised features of
>> tesseract is that it works equally well for black-on-white and
>> white-on-black te
Why is a black background a problem? One of the advertised features of
tesseract is that it works equally well for black-on-white and
white-on-black text.
Marcus
On Tuesday, December 4, 2012 11:11:36 AM UTC+1, zdenop wrote:
>
> Search forum. I remember discussion about similar topic.
> AFAIR: t
Hi there,
I have trained a new font containing upper case letters and digits. In the
evaluation I found that the most frequent errors were 0->O confusions (not
the other way around). A total of 38 zeros were recognized as O. Looking
through the training images I found a few O that were actually
k/doc/shapeclustering.1.html
> [3] http://code.google.com/p/tesseract-ocr/issues/detail?id=770
> [4] http://code.google.com/p/tesseract-ocr/issues/detail?id=754
>
> --
> Zdenko
>
> On Mon, Oct 1, 2012 at 11:10 AM, Speedy
> > wrote:
>
>> Hi,
>>
>> I
Hi,
I'll try another shot: When I move from tesseract 3.01 to tesseract 3.02
should I retrain my fonts with the 3.02 training tools or does this not
matter?
Best regards,
Marcus
On Thursday, September 20, 2012 4:31:50 PM UTC+2, Speedy wrote:
> Hi there,
>
> we are c
Hello,
I am trying to figure out exactly what effect the font_properties file has.
I have already performed a number of trainings with great success. However,
there are a few letter confusions that dominate the error rate and which I
would like to reduce.
Here is the setup: There really is
I had not realized that tesseract-ocr 3.02 package has made it to the
Ubuntu Precise repositories. That is great news!
I have recently updated my speedy-ocr package to work in Precise. I
will need to do some extra testing to ensure the bash script, originally
written for tesseract 2.04 in
R for the blind, called speedy-ocr. It is part of the DVD
distributions of Vinux (vinuxproject.org). It is in our Vinux
repositories for Ubuntu Lucid, Maverick, Natty, Oneiric, and currently
in testing for Precise. The interface runs in GNOME using just zenity
dialogs, since the blind can
Hi there,
we are currently using tesseract 3.01 as OCR engine and have trained a
number of fonts with it. Things work quite well, but we would like to move
to version 3.02 for two reasons:
- It is possible to combine fonts
- The character recognition is supposed to be significantly impr
Is it possible to get the language that matched from the result? In other
words, is it possible to use tesseract to recognize the font? Is this per
character, per word or per page? How much slower is recognition when
multiple languages are combined?
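For reference, combining languages in 3.02 is done by joining the language names with '+' in the -l argument. Below is a minimal Python sketch that only builds such a command line; the helper name build_tesseract_command and the file names are made up for illustration, and it does not answer which language actually matched.

```python
def build_tesseract_command(image_path, output_base, languages):
    """Return an argv list that runs tesseract with one or more languages.

    Several languages are joined with '+', e.g. ["eng", "deu"] -> "eng+deu".
    The function name and its parameters are hypothetical, for illustration.
    """
    if not languages:
        raise ValueError("at least one language is required")
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


if __name__ == "__main__":
    # Print the command instead of running it, so no tesseract install is needed.
    print(" ".join(build_tesseract_command("page.tif", "page", ["eng", "deu"])))
```

The resulting list can be passed to subprocess.run once the named traineddata files are installed.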
On Thursday, August 9, 2012 9:35:00 AM UTC+2,
traineddata files you choose - I
> shall test and feedback to you.
> cheers.
>
>
>
> On Mon, Mar 12, 2012 at 5:25 PM, Speedy wrote:
> > Can you provide any information on how this works?
> > At what level can languages mingle? For example, could each word be of
>
Can you provide any information on how this works?
At what level can languages mingle? For example, could each word be of
a different language? Or is it on a sentence level or on a paragraph
level? Is there a way to influence this? For example, if I know that a
document is of only a single language,
Getting packages into Ubuntu precise would be awesome! As someone
involved in putting together Vinux, a distribution of Ubuntu for the
blind and visually impaired, OCR is essential. We have several
utilities people have built to simplify these tasks.
Is tesseract version 3.02 backward compatibl
Another feature that sounds very promising is the bigrams. Is this a
feature that works on a word level? Does this include a probability
for the first word? I.e., is position 0 a valid context for a bigram?
So for example, if I wanted to recognize license plates and I know
that the first one or tw
I'd be very interested in this as well. How does it work?
I mean, if I have a font in one language and another in the other
language, does it make sure that no characters from different
languages are intermingled in the same word? How about in the same
line? Is there a way to influence this? Does
Hello,
we are trying to recognize sequences of letters and digits with only a
weak syntax. Well, we do know that the sequences start with certain
typical letter pairs but after that they can come in basically any
order.
Here are our questions:
1. What does tesseract do when there is no dictionar
Hello,
we are trying to use Tesseract to recognize text in real world images.
We have a good text finder and a good binarization and feed Tesseract
the already binarized image, but it still happens that the binarized
image contains some dirt.
It seems that Tesseract is quite "trigger happy" in such
Hello,
I would like to be able to use tesseract with only a specific set of
fonts and I would like to know which font actually matched. Basically,
there is only ever one font in the image but it could in principle be
one of many different fonts. However, we can typically limit it to
only a subset.
ns" and are not "improvements", but a
separate software product which uses Tesseract in its original form,
almost without any corrections. So I don't see many chances it will go
open source or publicly available.
Warm regards,
Dmitri Silaev
www.CustomOCR.com
On Mon, Sep 26,
As a general question, would any of the improvements in code be
incorporated back into the freely available tesseract Open Source
software from these paid customizations? Many in our blind community
have unfulfilled needs, like book scanning, including math textbooks.
*Don Marang*
Vinux Softw