Hi Luke,
On Mon, Aug 06, 2018 at 02:12:38PM -0700, Luke Brandl wrote:
> I've been working to understand Tesseract and looking through the C and Python
> API code and documentation. It looks like some of the code and documentation
> are up to date, while the rest refers to 3.0.2 at least in the com
Hi Devon,
On Mon, Feb 22, 2016 at 10:43:33AM -0800, Devon Yoo wrote:
> I have test set that only has "uppercase English alphabets" and "numbers". But
> the provided eng.traineddata returns symbols and lower case alphabets
> sometimes. Is there a way to modify the existing traineddata file so that
So this email prompted me to try something a little crazy, but it
worked; I just built a statically linked tesseract binary :)
A long time ago I wrote some plain makefiles which didn't rely on
any automake / cmake stuff. The main devs weren't interested,
understandably, but it was useful and fu
Hi Łukasz,
> Is it possible to run tesseract without setting up
> LD_LIBRARY_PATH?
Why don't you want to just use LD_LIBRARY_PATH? I suspect, to be
honest, that it would be difficult to compile the leptonica library
into the tesseract executable. It would be fun and interesting (to
me) to tr
Hi Yizhen,
On Tue, Nov 24, 2015 at 07:08:24PM -0800, Yizhen Hai wrote:
> I am working on a volunteer project to digitize the Sutra and all related
> materials, most of them in Tibetan.
Sounds like a great project :)
> Therefore, I wonder how I can get help to use Tesseract for Tibetan. (I am new
On Sun, Nov 15, 2015 at 07:00:52PM +0100, Marco Atzeri wrote:
> On cygwin I already packaged the training utilities for 3.04.00.
> and some training data.
Ah cool, thanks Marco, sorry, I haven't kept up with everything here
and had missed your messages earlier. I'm glad you've packaged it
all up
On Sun, Nov 15, 2015 at 09:16:29PM +0530, Sriranga(83yrsold) wrote:
> Dear nick,
> kindly clarify whether "make" file will work on windows "vista" since binaries
> for windows are not available for download at present? If so how to do?
No, it won't work on Windows, and I have no plans to make it d
On Tue, Nov 10, 2015 at 08:59:19AM -0800, Ryan Baumann wrote:
> Thanks for this, Nick. I'm just getting around to looking into moving my Latin
> training into the tesstrain.sh system and this is very helpful.
Great, I was planning to do that myself with your Latin training -
let me know if you ne
> Dear Nick,
> Awaiting your valuable guidance.Kindly treat my request as SOS due to my
> overaged factor of 83+yrs old. I want to enjoy the program.
> With warmest reagards,
> sriranga(83yrs)
>
> On Tue, Nov 3, 2015 at 12:19 AM, Nick White wrote:
>
>
Hi Sriranga,
> I find there three files of '.sh - viz.
> 1) language-specific.sh. (My lang is "kan")
> 2)tesstrain.sh
> 3)tesstrain_utils.sh.
> Request for the valuable guidance how to use above .sh files ( step by step
I plan to write up some proper documentation on how to use these
scripts
Just a note, all the .git URLs listed below are git repositories,
and there isn't a web interface to them on my server, so just clone
them directly like this:
git clone http://ancientgreekocr.org/mignetools.git
Nick
On Thu, Oct 29, 2015 at 06:23:21PM +0000, Nick White wrote:
>
Hi all,
I recently finally got around to organising and releasing some
(well, a lot of) ground truth files for the language I have been
training for ages now, Ancient Greek. By "ground truth" I mean real
page scans with the corresponding (hand-typed) correct text, which
is essential to be able
Hi Alfred,
On Fri, Oct 23, 2015 at 01:11:55AM -0700, Alfred Puca wrote:
> I sent an attachment with the error using program from command line with psm
> option-4
Thanks for that. The first thing I notice is that you're using an
old version of tesseract (3.02). Can you update to the latest
vers
Hi Alfred,
On Wed, Oct 21, 2015 at 01:16:22AM -0700, Alfred Puca wrote:
> I'm having problems with psm option 4 (Assume a single column of text of
> variable sizes).
> It seems as a bug in the application.
> How is it possible to use this option?
What problems are you having? Can you give an ex
Hi Avinash,
On Wed, Oct 21, 2015 at 01:40:35PM -0700, Avinash Mishra wrote:
> I dont have VPS can anybody tell me how to install it on shared hosting
The instructions for installing without root should be what you need:
https://github.com/tesseract-ocr/tesseract/wiki/Compiling#install-elsewhere-
On Fri, Sep 11, 2015 at 12:13:02AM -0700, fsbo.cons...@gmail.com wrote:
> To anyone else who may run across this, it is because of the way C++ uses
> scope
> to optimize the code when it compiles. Things that are within the scope of the
> for loop will run faster than things that have larger scope
On Wed, Sep 16, 2015 at 10:16:40PM +0530, ShreeDevi Kumar wrote:
> If you are having trouble using it with Java, Quan maybe able to suggest a
> solution.
I agree, this sounds more like a Java issue to me. I don't know Java
at all, but if it's treating anything that sends output to stderr as
fail
On Fri, Aug 21, 2015 at 02:13:17PM +0100, Allistair wrote:
> This, I think, just illustrates there is no one-size-fits-all approach. All
> methods should be enumerated for installing Tesseract for Mac.
I disagree. Mac OS X is a homogenous enough system that we ought to
be able to do it right, onc
On Thu, Aug 20, 2015 at 03:46:32PM +0100, Allistair wrote:
> I had issues installed with Homebrew - it didn't install the dependencies very
> well like Leptonica etc. but could just have been an issue I was having.
> Conversely MacPorts worked out of the box.
Interesting. Do you remember what exac
Hi all,
I was looking at the Tesseract wiki, and it states that "The easiest
way to install Tesseract is with MacPorts." I don't think that is
true any more. MacPorts requires XCode to be manually installed
before it can be installed, which doesn't look like it's very simple
for a non-expert.
Hi Simon,
Yes, ideally it would be good if git compiled versions printed their
commit id when asked for 'tesseract -v'. I'm sure nobody would
object to a patch making that so ;-)
Nick
On Thu, Jul 23, 2015 at 06:33:41PM +0200, Simon Eigeldinger wrote:
> Hi all,
>
> at the moment when you compi
Hi all, long time since I last posted here.
This is just a little update about some training related tools I
wrote a while ago, the 'tesstrainingtools' collection. It has
largely been superseded by the training stuff that's included in
Tesseract now, but maybe someone will still find it useful.
>
> Any opinions on whether it's worth training for the phonetic alphabet or is it
> going to just be too difficult to recognize even with specific training?
>
> Tom
>
> On Wednesday, January 22, 2014 at 11:55:28 AM UTC-5, Nick White wrote:
>
> Hi Ep
On Fri, Aug 22, 2014 at 12:42:21PM -0700, Thomas Bruno wrote:
> Is this common when training from text2image output?
>
>
> APPLY_BOXES: boxfile line 5364/748 ((1488,893),(1532,6)): FAILURE! Couldn't
> find a matching blob
>
> FAIL!
Yes, there will be some of these. Check the proportion of faili
On Thu, Aug 21, 2014 at 11:29:09AM -0700, shree wrote:
> zdenko,
>
> the current problem also seems related to strtok_r
>
> please see
>
> http://stackoverflow.com/questions/12973750/
> fatal-error-strtok-r-h-no-such-file-or-directory-while-compiling-tesseract-oc
>
> http://sourceforge.net/p/mi
>
> same problem happen.
>
>
>
>
> On Thu, Aug 21, 2014 at 4:03 PM, Nick White wrote:
>
> Hi Dovhani,
>
> Does this happen with all images when using your training, or just
> one?
>
> Nick
>
> On Thu, Aug 21, 2014 at 03:03:47AM -07
Hi Dovhani,
Does this happen with all images when using your training, or just
one?
Nick
On Thu, Aug 21, 2014 at 03:03:47AM -0700, Dovhani Foneworx wrote:
> Hi guys, I have a problem, I have succesfully trained tesseract 3.03 in Ubunt
> 14.04 but when i run tesseract it is giving errors on an i
On Thu, Aug 21, 2014 at 01:41:23PM +0530, Shree Devi Kumar wrote:
> Hi Zdenko,
>
> ./ confusing for me :-)
:-) ./ is a common idiom for unix. '.' means 'current directory', so
./ means 'in the current directory'. You have to do it to run
programs in the current directory (or just do something
On Wed, Aug 20, 2014 at 07:39:50PM -0700, SHEN Fei wrote:
> hi Nick,
>
> I'm trying to use tesseract in my mobile phone so the tessdata size is
> critical.
> Since I only care about very few fonts, it would be convenient if I could add/
> remove a special font.
>
> Maybe removing some dictionary
Hi Chris,
On Wed, Aug 20, 2014 at 11:12:50AM -0700, Chris Smeal wrote:
> I've been doing some research on using Tesseract for both document scans and
> text in scenery, and I was wondering what image processors are best? Given I
> have a lot of images, I cannot process each batch by hand, so I wi
Hi Thomas,
On Mon, Aug 18, 2014 at 02:17:19PM -0700, Thomas Bruno wrote:
> Where can I find the box/tif combo for the eng.traineddata that Tessearct 3.02
> provides for download?
The tif/box files used to create the eng.traineddata for 3.02 are
not available, and are very unlikely to be made so
Hi Dovhani,
On Tue, Aug 19, 2014 at 04:06:26AM -0700, Dovhani Foneworx wrote:
> Hi I have a problem that when I run:
>
> set_unicharset_properties -U input_unicharset -O output_unicharset
> --script_dir
> =/home/foneworx/DM/Tesseracting/tesseract-3.03/training/langdata
>
>
>
> I get the follo
Hi Shen,
On Wed, Aug 20, 2014 at 01:10:30AM -0700, SHEN Fei wrote:
> Can I remove some fonts from an existing traineddata file?
> For example, if I only need 2 or 3 comon fonts of default eng.traineddata, is
> there a way to extract them out of the original file?
No, I'm afraid not, not at the m
On Wed, Aug 13, 2014 at 08:39:06AM -0700, Oliver Nicolini wrote:
> A little up, I can't find any doc for this topic. If anyone can help that
> would
> be fantastic.
Did you read Paul's reply? Tesseract only does binarisation. If you
don't want it to do that, binarise your image before passing it
Hi David,
You're right, that would be useful. Tesseract has a basic version of
that, called "patterns"; search the manpage for a bit of information
on them.
However at present they can't be assigned per region, only as
possible patterns for the whole OCR job. Also they aren't
restrictive, but
On Thu, Jul 24, 2014 at 05:53:56AM -0700, Victoria A. wrote:
> From my experience, seeing that Tesseract's English training data can
> recognize
> words that are NOT contained in the dictionary, I suppose Tesseract only uses
> the custom dictionary for "hints" instead of only knowing the words in
Dear Wikisourcerers,
It's good to hear from you. Wikisource is awesome, as far as I am
concerned.
> One
> of the most serious issues was raised by the Belarusian community which uses 2
> different scripts with no commercial OCR support. This means that the
> volunteers have to type each word man
Zdenko's right. To help you out more, you seem to have skipped over
this part of the instructions in the Compiling wiki page:
sudo apt-get install libicu-dev # (if you plan to make the training tools)
Nick
On Mon, Aug 11, 2014 at 07:10:47PM +0200, zdenko podobny wrote:
> It looks like you
On Tue, Aug 12, 2014 at 12:58:23PM +0530, Shree Devi Kumar wrote:
> On Tue, Aug 12, 2014 at 4:31 AM, testing1234
> wrote:
>
> Note.. Step 5 above the last command should be
>
> "sudo make install-langs"
>
> Nick, it maybe helpful to add/update instructions in wiki.
Cory just meant in
On Wed, Aug 06, 2014 at 08:50:27PM +0530, Shree Devi Kumar wrote:
> My current plan for documentation is as follows:
>
> - Rewrite and simplify TrainingTesseract3 on the wiki
> - Write manpages for each tool in training/
> - Document how each training file is used,
Hi Richard,
On Sun, Jul 20, 2014 at 01:51:32PM -0700, Richard Arnold wrote:
> Stroke Width Transform looks very interesting. However, I have some questions
> regarding its use in what I'm doing.
> I'm writing a Desktop application and OpenOCR appears to use a web service
> call??
Stroke Width Tr
Hi Fajar,
Looks like you should try binarising the image yourself prior to
handing it over to Tesseract.
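
As a rough illustration of what binarising an image yourself could look like, here is a minimal sketch in plain Python. It uses Otsu's method, which is an assumption made for this example and not something named in the message, and it treats the image as a list of rows of 8-bit grey values. The function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Pick the grey level that maximises between-class variance (Otsu's method)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of grey values of those pixels
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarise(rows):
    """Return a copy of the image with every pixel set to 0 or 255."""
    flat = [p for row in rows for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 255 for p in row] for row in rows]


if __name__ == "__main__":
    image = [[10, 12, 200], [11, 210, 205]]
    for row in binarise(image):
        print(row)
```

A global threshold like this is only a starting point; unevenly lit scans may need a local (adaptive) threshold instead.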
Nick
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to
Hi Albrecht,
Sorry for not replying sooner, I've been away.
> Nevertheless I read a post from Ray where he says that he receives
> millions of
> emails and the last thing he likes to do is writing long texts (email
> responses
> or documentations). I think this is a fatal situation, because if
Hi Rara,
On Thu, Jul 31, 2014 at 08:29:51AM -0700, Rara wrote:
> I'm searching of a detailed guide for developpement with Tesseract and a tuto
> explained how to use and test this platform with windows OS.
> Looking forward to your answer !
There is an example program using the C API here:
http
Hi Prashant,
On Wed, Aug 06, 2014 at 01:32:54AM -0700, Prashant Mahskey wrote:
>I am using tesseract for my android app with arabic language. I've
> copied all the files required from the language files download page. I've
> tried
> with gray scaling and cropping extra blank part from th
On Tue, Jul 22, 2014 at 11:48:21PM +0200, zdenko podobny wrote:
> If you want to have several version of tesseract (e.g. you want to compare OCR
> result) I would suggest you to compile them from source (e.g. in /usr/src) and
> not installed them. If you want to test particular version you can run
On Thu, Jul 17, 2014 at 12:14:43AM -0700, Jing JC wrote:
> The Ray's tutorial said the bounding box overlaps.
> so when I modify the box inside JTessbox,
> do I keep the overlapping boxes,
> or
> make the boxes non touching.
That's interesting, actually; I didn't realise Tesseract did
outlin
On Wed, Jul 16, 2014 at 11:17:00PM -0700, Jing JC wrote:
> I am going through Ray Smith's tutorial, and don't get it?
He means that as the co-ordinate system uses bottom left as the
origin, you will never get a minus number co-ordinate (as you could
if the origin was elsewhere).
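
To make the bottom-left origin concrete, here is a small sketch that converts a box measured from the bottom left into the top-left convention most image libraries use. The field order (left, bottom, right, top) and the function name are assumptions made for this example.

```python
def box_to_top_left(left, bottom, right, top, image_height):
    """Convert a box with a bottom-left origin to (x0, y0, x1, y1) with a top-left origin.

    x values are unchanged; y values are flipped against the image height.
    """
    return (left, image_height - top, right, image_height - bottom)


if __name__ == "__main__":
    # A box 20 pixels above the bottom edge of a 100 pixel tall image.
    print(box_to_top_left(10, 20, 30, 50, 100))
```

Because every y value is measured upwards from the bottom edge, all co-ordinates inside the image stay at or above zero in both conventions.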
On Mon, Jul 14, 2014 at 11:36:46AM -0700, Paul wrote:
> Am Montag, 14. Juli 2014 10:07:59 UTC+2 schrieb sibi kanagaraj:
> But , I feel that Tamil Training is not sufficient and it
> could be
> streamlined . Hence I went to see if there are sufficient training
> documents for Tamil
Hi,
On Tue, Jul 15, 2014 at 10:04:24AM -0700, Jing JC wrote:
> yep yep.
>
> Thanks a lot Nick.
>
> I tried to cancel mu post last night.
> but seems I can not get access to it after posted but before approved.
>
> I tried to match the V2's example to V3's format.
>
> I figured it out late
Hi Mustak,
On Tue, Jul 15, 2014 at 03:14:35AM -0700, Mustak M wrote:
> I am new to tesseract. I am using tesseract 3.2. I am able to retrieve the
> text
> from an image. And able to get the co-ordinates for each word with "tesseract
> source.jpg output hocr" command. Is there any command to reti
On Mon, Jul 14, 2014 at 03:13:43PM -0700, Albrecht Hilker wrote:
> I know that this can be done with a few lines of code, but such a usefull
> class is missing in the OpenCV project. My first trials showed that this is
> not as trivial as it seems on the first look because a lot of conversio
Sorry for the noise. I've looked into this more, and discovered more
:)
On Tue, Jul 15, 2014 at 10:54:06AM -0400, Nick White wrote:
> On Mon, Jul 14, 2014 at 01:10:07PM -0700, Albrecht Hilker wrote:
> > When I download the traineddata files and extract the unicharset file from
Hi again,
On Mon, Jul 14, 2014 at 09:38:26AM -0700, Albrecht Hilker wrote:
> After some days I came back here and I'm very surprised about your lots of
> posts.
> Thanks for answering and taking the time.
As you may have noticed, there aren't too many people around here
who are comfortable look
Hi Albrecht,
On Mon, Jul 14, 2014 at 01:10:07PM -0700, Albrecht Hilker wrote:
> When I download the traineddata files and extract the unicharset file from
> them
> I notice that some are extremely different from the ones on SVN in the folder
> training/langdata.
>
> For example:
> Bengali, Hebr
Hi,
The part you aren't reading closely enough from the manual page is:
properties
An integer mask of character properties, one per bit. From least
to most significant bit, these are: isalpha, islower, isupper,
isdigit, ispunctuation.
So ; has ispunctuation set, but none of the others,
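
The bit order quoted from the manual page can be decoded mechanically. This is a minimal sketch; the function name is made up for illustration, and the mask is passed in as an integer (how the value is written in the unicharset file itself is not covered here).

```python
# Property names in bit order, least significant bit first,
# exactly as listed in the manual page excerpt.
PROPERTY_BITS = ["isalpha", "islower", "isupper", "isdigit", "ispunctuation"]


def decode_properties(mask):
    """Return the names of the properties whose bit is set in mask."""
    return [name for bit, name in enumerate(PROPERTY_BITS) if mask & (1 << bit)]


if __name__ == "__main__":
    print(decode_properties(0b10000))  # only the fifth bit set
    print(decode_properties(0b00011))  # the two lowest bits set
```

A character with only ispunctuation set therefore has just the fifth bit on, which is the integer 16.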
On Mon, Jul 14, 2014 at 07:38:19AM -0700, Christopher Smeenk wrote:
> I found the source for v3.03 here: http://packages.ubuntu.com/trusty/
> tesseract-ocr
The version called "3.03" in Ubuntu is an -rc - there is no official
3.03 release yet. As I understand it Ray & Jeff called it 3.03 so
that
On Sun, Jul 13, 2014 at 06:38:11PM +0430, universal reseller wrote:
> is google drive use tesseract 3.03 ?
It's -rc1, meaning release candidate 1. So it isn't an official
release, but rather a "testing preview" release, which should be close
to what the final 3.03 will be.
> i checked one english pd
> I build the tesseract svn source code in win8, I used the
> VS2013/Cygwin/MinGW to build this, all failed.
Hi, you need to give us more clues as to why it failed. What error
messages did you get?
> what version of leptonica the newest svn use? 1.70 or 1.71?
Tesseract should work fine with e
On Fri, Jul 11, 2014 at 04:22:41PM -0700, Jing JC wrote:
> google's tesseract download page listed up 3.02 only.
>
> I need to compile tesseract on CentOs5.6
> where is the download link for tesseract 3.03
>
> or not available yet.
It isn't available yet. There is a -rc1 version that is availa
On Fri, Jul 11, 2014 at 03:06:29PM -0700, Alex Ryan wrote:
> I wrote some simple code to preprocess the image because I realized I will be
> doing basically the same image every time so its foolish to try and use
> Tesseracts binaziration technique which was designed for a different and more
> gen
0
@
p
a
r
m
F
u
s
B
»
f
d
c
h
C
t
L
?
T
M
y
R
l
~
<
®
N
b
k
[
«
1
,
.
”
g
H
$
(
+
D
w
V
£
4
9
Q
&
A
P
¢
]
3
2
©
8
/
>
X
é
j
;
7
€
O
¥
U
x
}
E
§
=
!
’
G
)
Z
q
{
“
—
Y
K
*
W
"
\
°
fi
‘
_
fl
/*
* Copyright 2014 Nick White
*
* Licensed under the Apache License, Version 2.0 (the
On Tue, Jul 08, 2014 at 10:36:50PM -0700, Alex Ryan wrote:
> In one of the links tho I saw something about -psm setting. When I run the OCR
> with -psm 6 all of a sudden it worked perfect!!! Im really not sure what that
> setting does, ive tried doing some searches, but im still unclear. Can you
I have more thoughts to the unicharset metrics discussion.
> So this example says that
> the character "1" has a min_bottom value of 59 and
> the character "9" has a min_bottom value of 18.
>
> Weird ? ? ?
> Both numbers are aligned to the baseline!
I am guessing now (I'll take a look at the cod
On Sat, Jul 05, 2014 at 03:34:05PM -0700, Albrecht Hilker wrote:
> Hello zdenop
>
> It is clear that you are not the right person to answer this question.
> If YOU would ever have looked into the source code you have seen that these
> values ARE in use (in version 3.03).
You're being pretty unfai
I'm just going to go through your numbered points here.
On Fri, Jul 04, 2014 at 10:02:43AM -0700, Albrecht Hilker wrote:
> 1.)
> The column "other_case" should contain the ID of the other-case letter.
> For the lowercase letters they point correctly to the uppercase letters.
> But the uppercase le
Hi,
I haven't tried it, but quickly grepping around the source code
suggests setting the config variable "crunch_include_numerals" to
true might do the job.
Please let us know if that works.
Nick
On Wed, Jul 09, 2014 at 11:15:10PM -0700, Damien D wrote:
> Hi everyone,
>
> tesseract seems to
Hi Alex,
One quick thought, if you're still using .uzn, it's only loaded with
certain psm levels (it is with -psm 6, but not -psm 3, the default).
And it's loaded from .uzn. So if you
have any .uzn files lying around, they will be being applied with
psm 6, but not if you don't explicitly stat
Hi Albrecht,
On Thu, Jul 03, 2014 at 09:40:51PM -0700, Albrecht Hilker wrote:
> Generally it is very sad that there is no detailed documentation about
> Tesseract.
I agree. I do work on the documentation, but there is an awful lot
missing. I appreciate you taking the time to ask questions here
On Tue, Jul 08, 2014 at 10:49:49PM -0700, shree wrote:
> My information IS dated - I haven't followed the recent changes. Please see
> this thread - almost a year old which talked of the upcoming changes for
> training
>
> https://groups.google.com/forum/#!searchin/tesseract-dev/fonts/tesse
On Wed, Jul 09, 2014 at 09:50:01AM -0700, Rani Yaroshinski wrote:
> In order to improve the accuracy of the OCR results ?
Yes, it is, if you know more details about the images you'll be
using, so can do better than Tesseract's guesses.
See https://code.google.com/p/tesseract-ocr/wiki/ImproveQual
On Wed, Jul 09, 2014 at 09:48:20AM -0700, Rani Yaroshinski wrote:
> From the point of view of the performance measures of the OCR ?
I don't think anybody has figures on this. You could do some tests
yourself, and let us know the results.
I would guess that file size would be a bigger slowdown th
On Wed, Jul 09, 2014 at 03:16:08AM -0700, Paul wrote:
> How about using ImageJ (can be automated with macros) to create a better
> binary
> result of the image.
Thanks for mentioning this; I hadn't heard of it and it sounds very
useful. I added a link to the ImproveQuality wiki page.
Nick
Hi Alex,
If you're up for some programming, you could recognise the squares
yourself, and pass each one separately to tesseract with the
PSM_SINGLE_CHAR segmentation type. That should help if Tesseract is
not segmenting each whole square separately.
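
As a sketch of the "recognise the squares yourself" step, the following computes one rectangle per square. It assumes the board fills the whole image and is a regular grid, which may not hold for your images. The function name is hypothetical, and cropping each rectangle and passing it to Tesseract with PSM_SINGLE_CHAR is left out.

```python
def grid_cells(width, height, rows, cols):
    """Return (left, top, right, bottom) rectangles for each cell, row by row.

    Integer division keeps the cells adjacent with no gaps, even when the
    image size is not an exact multiple of the grid size.
    """
    cells = []
    for r in range(rows):
        for c in range(cols):
            left = c * width // cols
            top = r * height // rows
            right = (c + 1) * width // cols
            bottom = (r + 1) * height // rows
            cells.append((left, top, right, bottom))
    return cells


if __name__ == "__main__":
    for cell in grid_cells(90, 90, 3, 3):
        print(cell)
```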
If the board is always the same size, you co
On Fri, Jul 04, 2014 at 02:15:52AM -0700, Iskander Sharipov wrote:
> I need to create new tessdata language, which is very similar to russian in
> charset.
> Every time I try to do so by training tesseract on a box containing needed
> letters I get new traineddata,
> which actually can recognize ne
On Fri, Jul 04, 2014 at 02:08:46AM -0700, Meenal Goyal wrote:
> If you're sure that all the words you will encounter will be in the
> dictionary this should help somewhat:
> https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_to_
> increase_the_trust_in/strength_of_the_dictionary?
Hi Elena,
Just a guess, but maybe this line:
> api -> SetSourceResolution(600);
is the source of your troubles? Tesseract from the command line
would have just been guessing it, and perhaps its guess, coupled
with its ideas about different sizes of fonts, were better than
yours?
Nick
Hi Artur,
On Wed, Jul 02, 2014 at 10:18:55PM -0300, Artur Augusto wrote:
> As many people ask about how to use tesseract to read 7 segments display, I
> decided to publish an open source sample project.
>
> If someone wanna check it: https://github.com/arturaugusto/di
On Wed, Jul 02, 2014 at 10:26:16PM -0700, Meenal Goyal wrote:
> The post about "question about training tesseract" only suggests some
> pre-processing steps which include binarisation and I have already tried
> them.
> I wanted to know if anything can be done to improve output at later stage,
>
n such
> cases? Any feedback mechanism which can help improve?
>
> On Tuesday, July 1, 2014 8:52:35 PM UTC+5:30, Nick White wrote:
>
> Hi Meenal,
>
> On Tue, Jul 01, 2014 at 02:04:36AM -0700, Meenal Goyal wrote:
> > When I try to ocr an image, it also
Hi Meenal,
On Tue, Jul 01, 2014 at 02:04:36AM -0700, Meenal Goyal wrote:
> When I try to ocr an image, it also produces some noise apart from the
> meaningful words. An example output for an image is:
>
> All women become
>
> like their’ mqthers. _ ' 1"’ '
>
> - —T at-{rs their tragedy. ” "R"-‘
Hi,
On Mon, Jun 30, 2014 at 09:25:23PM -0700, 韩煦深 wrote:
> I'm a Chinese student and I want to use the tesseract-ocr in our linux system.
> I have Ubuntu OS and I install tesseract in my ubuntu system.
> But I don't know how to use C++ API in linux system because all the examples
> are based on V
On Mon, Jun 30, 2014 at 10:42:41PM -0700, nirali kanani wrote:
> is there Tesseract - ocr v 3.03 exe available anywhere ?
Tesseract v3.03 hasn't been released yet (except as a pre-release
version in the latest ubuntu). The code is unlikely to change a lot
from what's currently in SVN, so you co
Hi Scott,
On Fri, Jun 27, 2014 at 09:39:21PM -0700, scott.ha...@gmail.com wrote:
> Hi all. Firstly let me say I am totally blown away by Tesseract, it vastly
> exceeded my expectations for an open source OCR project. I have an
> application
> (http://hackaday.io/project/1569-NSA-Away) that invo
Hi Meenal,
On Mon, Jun 30, 2014 at 01:40:10AM -0700, Meenal Goyal wrote:
> When i run tesseract on my image, it produces some words not present in the
> dictionary. Is there some way to directly get the list of these words and
> prevent tesseract from showing them in the output.
> Example of such
On Fri, Jun 27, 2014 at 04:57:30PM -0400, Nick White wrote:
> On Mon, Jun 23, 2014 at 10:11:28AM -0700, Paulo Basilio wrote:
> > Good day, I am trying to develop a mobile app that can read cursive
> > handwriting
> > (doctor's handwriting to be exact). My question
Hi Raghavan,
On Tue, Jun 24, 2014 at 06:58:56AM -0700, Raghavan P wrote:
> When i try to make use of tesseract classes like BLOCK_IT and BLOCK_LINE_IT, I
> am getting the error "it was not declared in this scope".
> May i know what header should i bring in or what am i missing here?
Are you using
Hi Paulo,
On Mon, Jun 23, 2014 at 10:11:28AM -0700, Paulo Basilio wrote:
> Good day, I am trying to develop a mobile app that can read cursive
> handwriting
> (doctor's handwriting to be exact). My question is, can tesseract-ocr read
> cursive handwriting? If not, can someone give me suggestion f
Hi Sheeyam, sorry for not replying to your emails sooner.
On Sun, Jun 22, 2014 at 04:43:27AM -0700, sheeyam shellvacumar wrote:
> Does Tesseract support sinhala. How do u guys train them ??? Actually i am
> confused help me
It looks like some people have trained Tesseract for Sinhala; see
http:
Hi Mori,
On Fri, Jun 27, 2014 at 01:51:01AM -0700, morteza neishaboori wrote:
> I want to use OCR to detect small words in images containing indoor signs and
> etc
> you can find some sample images in the link below to get the idea
> https://drive.google.com/folderview?id=0B3dLM0w0EeD-RFZVc1NjaGN
On Fri, Jun 27, 2014 at 01:48:52AM -0700, thinker wrote:
> reading image with multiple language (arabic and english) by using -l
> ara+eng option gives garbage output.
There are currently a couple of bugs with combining Arabic and
English together, so it isn't working. I'd recommend you add an
On Mon, Jun 23, 2014 at 08:32:52AM -0700, Traun Leyden wrote:
> One more thing that document should have is a mention of Stroke Width
> Transform
> to improve OCR recognition on images that have a lot of non-text content.
Oh cool, that looks great! I definitely will add that to the wiki
page so
Hi Jack,
I replied privately, but the gist is that VietOCR is a graphical
program that makes Tesseract easier to use on a Mac (as well as
Linux & Windows).
Nick
On Thu, Jun 26, 2014 at 08:55:19AM -0700, Jack Kershaw wrote:
> I am an ancient greek student currently studying A levels. I have bee
Hi Eddie,
I'd suggest contacting the author of the PHP wrapper, that isn't
something provided by the core Tesseract project, and it doesn't
look like any issue with Tesseract proper, just with the caller.
Nick
On Wed, Jun 25, 2014 at 12:36:59AM -0700, Eddie G wrote:
> I'm using the PHP wrapper
Hi Amar,
If you can wait for the release of Tesseract 3.03 (or compile the
latest version from SVN), that has PDF output built in.
Nick
On Mon, Jun 23, 2014 at 12:19:52AM -0700, Amar wrote:
> Hello dear friends, Is HOCR2PDF command line tool limited only to non-windows
> platforms? I could not
Hi Traun,
> Any tips on doing pre-processing on the images to improve the
> recognition?
The place to start would be here:
https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality
Nick
On Wed, Jun 18, 2014 at 07:30:03AM -0700, Paul wrote:
> That upper bound actually might be the root of your problem. If you've already
> compiled Tesseract on your own,
> try to use a greater number for kMaxUserDawgEdges. If you have not, you could
> either reduce the number of
> words in your dic
Hi Ketut,
On Tue, Jun 10, 2014 at 11:30:39PM -0700, ketut ariasa wrote:
> I have a very limited OCR application using tesseract, where I want to
> recognize only 8 letters and numbers begin with the letter 'D'.
> Is there a way to restrict tesseract to attempt to
> recognize only 8 digits letter
/* Copyright 2014 Nick White
*
* Licensed under the Apache Li
On Thu, Jun 05, 2014 at 01:51:24PM +0200, zdenko podobny wrote:
> On Thu, Jun 5, 2014 at 12:10 PM, 'thakobyan' via tesseract-ocr
> <tesseract-ocr@googlegroups.com> wrote:
>>
>> Trying to OCR the portion of the image. For some reason if I
>> cut only one word (see Fail.png and Fail2.png att