Re: Tesseract Reading Issue

2010-07-21 Thread Jimmy O'Regan
On 21 July 2010 09:50, rthomas wrote:
>
>
> On Jul 21, 10:23 am, rthomas wrote:
>> > Nobody has mentioned any plans to write a .net wrapper for Tesseract
>> > 3, and the developer of tessnet2 has mentioned that he would rather
>> > pay for someone to reimplement Tesseract than touch it again, so

Re: Tesseract Reading Issue

2010-07-21 Thread Jimmy O'Regan
On 21 July 2010 09:23, rthomas wrote:
>>
>> Nobody has mentioned any plans to write a .net wrapper for Tesseract
>> 3, and the developer of tessnet2 has mentioned that he would rather
>> pay for someone to reimplement Tesseract than touch it again, so I
>> wouldn't hold my breath, if I were you.
>

Re: Tesseract Reading Issue

2010-07-21 Thread rthomas
On Jul 21, 10:23 am, rthomas wrote:
> > Nobody has mentioned any plans to write a .net wrapper for Tesseract
> > 3, and the developer of tessnet2 has mentioned that he would rather
> > pay for someone to reimplement Tesseract than touch it again, so I
> > wouldn't hold my breath, if I were you.

Re: Tesseract Reading Issue

2010-07-21 Thread rthomas
> > Nobody has mentioned any plans to write a .net wrapper for Tesseract > 3, and the developer of tessnet2 has mentioned that he would rather > pay for someone to reimplement Tesseract than touch it again, so I > wouldn't hold my breath, if I were you. > Yes, but the main reason is because I had

Re: Tesseract Reading Issue

2010-07-20 Thread patrickq
As I said, we just need Jimmy to find 4-5 hours of his free time to knock this one out :-)!
On Jul 20, 11:01 am, Taxman wrote:
> "This bad problem is just about fixing Tesseract to accept the reality
> that not all text have the same height for all letters because not
> everything is a book."
>
>

Re: Tesseract Reading Issue

2010-07-20 Thread Taxman
"This bad problem is just about fixing Tesseract to accept the reality that not all text have the same height for all letters because not everything is a book."
Only some books have uniform text sizes. Textbooks have a large degree of variability in text size within the same page and probably caus

Re: Tesseract Reading Issue

2010-07-20 Thread Jimmy O'Regan
On 20 July 2010 02:52, Austin Henderson wrote:
> As a developer I am cautious to estimate the amount of time a code change
> will take.
:D I like you a lot right now.
> I am thrilled to have the code and look forward to enhancements
> as they are ported to .net environments.
Nobody has mentione

Re: Tesseract Reading Issue

2010-07-20 Thread Austin Henderson
As a developer I am cautious to estimate the amount of time a code change will take. I am thrilled to have the code and look forward to enhancements as they are ported to .net environments. For now I am cleaning up the image in preprocessing steps to remove blobs that are inconsistent with others
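A rough sketch of the kind of preprocessing filter described above, for illustration only. It is not Austin's code, and it does not use tessnet2 or Tesseract. It assumes the blobs have already been found by some connected-component step and are available as hypothetical `(x, y, width, height)` bounding boxes; the function name and the `tolerance` parameter are made up for this example. Blobs whose height is far from the median height are dropped:

```python
from statistics import median


def filter_blobs_by_height(boxes, tolerance=0.5):
    """Keep boxes whose height is close to the median blob height.

    boxes     -- list of (x, y, width, height) tuples (assumed input format)
    tolerance -- allowed deviation as a fraction of the median height
                 (0.5 means within +/- 50% of the median; an arbitrary default)
    """
    if not boxes:
        return []
    median_height = median(box[3] for box in boxes)
    allowed = tolerance * median_height
    # Drop anything much taller (e.g. border lines) or much shorter (specks).
    return [box for box in boxes if abs(box[3] - median_height) <= allowed]


if __name__ == "__main__":
    blobs = [
        (0, 0, 10, 20),
        (12, 0, 10, 22),
        (24, 0, 10, 21),
        (40, 0, 3, 60),   # tall vertical line
        (50, 0, 2, 2),    # speck of noise
    ]
    for box in filter_blobs_by_height(blobs):
        print(box)
```

Note that a single global median would also throw away legitimately smaller or larger text on a page with mixed font sizes, which is the very situation discussed in this thread, so in practice the filter would probably need to be applied per line or per region.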

Re: Tesseract Reading Issue

2010-07-19 Thread Jimmy O'Regan
On 19 July 2010 19:01, patrickq wrote:
> Wrong ... option 2 won't really work unless you want to cut out
> individual words. This image where everything is on one line still
> fails with the same insane forcing of the letters in "John" to be
> interpreted as tall letters:
> http://www.scanbizcards

Re: Tesseract Reading Issue

2010-07-19 Thread patrickq
-
> From: Jimmy O'Regan
> Sent: Monday, July 19, 2010 9:56 AM
> To: tesseract-ocr@googlegroups.com
> Subject: Re: Tesseract Reading Issue
>
> On 19 July 2010 15:34, Austin Henderson wrote:
> > Thank you for your feedback.
> > I am working with some automated im

Re: Tesseract Reading Issue

2010-07-19 Thread Austin Henderson
o be higher).
I am not really sure I understand the significance of the values passed for this option though.
Thanks
Austin
-----Original Message-----
From: patrickq
Sent: Monday, July 19, 2010 9:00 AM
To: tesseract-ocr
Subject: Re: Tesseract Reading Issue
Setting the segmentation mode to P

Re: Tesseract Reading Issue

2010-07-19 Thread Jimmy O'Regan
ds to be lower), or if you get spaces between letters (needs to be higher).
> I am not really sure I understand the significance of the values passed for
> this option though.
>
> Thanks
> Austin
>
>
> -----Original Message----- From: patrickq
> Sent: Monday, July 19,

Re: Tesseract Reading Issue

2010-07-19 Thread patrickq
as different objects?
> Do you think tosp_table_xht_sp_ratio could have any impact on this if I
> tweak it?
> I am not really sure I understand the significance of the values passed for
> this option though.
>
> Thanks
> Austin
>
> -----Original Message-----
>

Re: Tesseract Reading Issue

2010-07-19 Thread Austin Henderson
nks
Austin
-----Original Message-----
From: patrickq
Sent: Monday, July 19, 2010 9:00 AM
To: tesseract-ocr
Subject: Re: Tesseract Reading Issue
Setting the segmentation mode to PSM_SINGLE_LINE doesn't help (I checked). Here is an even more striking example: "John Doe" and

Re: Tesseract Reading Issue

2010-07-19 Thread patrickq
Setting the segmentation mode to PSM_SINGLE_LINE doesn't help (I checked). Here is an even more striking example: "John Doe" and "j...@widgets.com": http://www.scanbizcards.com/johndoe.jpg Just because the email address uses a smaller font, Tesseract 3.0 stubbornly insists on interpreting all the

Re: Tesseract Reading Issue

2010-07-19 Thread Jimmy O'Regan
On 19 July 2010 13:30, Jimmy O'Regan wrote:
> On 19 July 2010 13:20, patrickq wrote:
>> This is a great example of a serious problem with Tesseract when
>> analyzing any image with fonts of variable sizes such as a street
>> sign, flyer, business card etc. What happens is that Tesseract's
>> adap

Re: Tesseract Reading Issue

2010-07-19 Thread Jimmy O'Regan
On 19 July 2010 13:20, patrickq wrote:
> This is a great example of a serious problem with Tesseract when
> analyzing any image with fonts of variable sizes such as a street
> sign, flyer, business card etc. What happens is that Tesseract's
> adaptive classifier makes assumptions about letter heig

Re: Tesseract Reading Issue

2010-07-19 Thread patrickq
This is a great example of a serious problem with Tesseract when analyzing any image with fonts of variable sizes such as a street sign, flyer, business card etc. What happens is that Tesseract's adaptive classifier makes assumptions about letter heights and uses that knowledge when recognizing the

Tesseract Reading Issue

2010-07-19 Thread KAH
I have two files: http://dl.dropbox.com/u/1531272/pg1-CROP.jpg and http://dl.dropbox.com/u/1531272/pg1-CROP-Lines.jpg. Note that in the "Lines" file there are dark lines on the left and right sides of the image. I am trying to understand why the tessnet dll would render such different readings for t