[tesseract-ocr] Re: How to improve accuracy for OCR?

2017-06-01 Thread Jasnan Tp
Wednesday, 31 May 2017 21:23:56 UTC+5:30, Wilko Meijer wrote: > > I had the same problem. Eventually I created a traineddata file from the > OCR-B ttf, which works great for me. See attached. > > On Monday, 15 May 2017 12:31:52 UTC+2, Jasnan Tp wrote: >> >> hi, >> >>

[tesseract-ocr] Re: How to improve accuracy for OCR?

2017-05-15 Thread Jasnan Tp
hi, When I use mrz.traineddata, I get the following error tesseract test.png result.txt -l mrz Tesseract Open Source OCR Engine v3.03 with Leptonica index >= 0 && index < size_used_:Error:Assert failed:in file ../ccutil/genericvector.h, line 589 [1]15786 segmentation fault (core dumped) tes

Re: [tesseract-ocr] [Clarification question] Are there initiatives to make Tesseract's 3.03+ new "pdf" OCR option *multi-page* capable ?

2014-08-05 Thread TP
On Tue, Aug 5, 2014 at 12:40 AM, Tom wrote: > My current investigation showed that Leptonica cannot convert an input > multi-page PDF to TIFF multi-page. Writing a PDF is orders of magnitude easier than being able to read an arbitrary PDF. -- You received this message because you are subscrib

Re: ask for help to build tesseract on windows 7 using visual studio 2010

2013-09-17 Thread TP
On Mon, Sep 16, 2013 at 1:57 PM, Qiang Li wrote: > > I am new to Tesseract and try to build on windows 7 using visual studio > 2010 > > My reference: > > 1. http://code.google.com/p/tesseract-ocr/wiki/Compiling > 2. > http://tesseract-ocr.googlecode.com/svn/trunk/vs2008/doc/setup.html#using-the-l

Re: Finding the first block of text?

2013-07-10 Thread TP
On Mon, Jul 8, 2013 at 3:54 PM, Kurt Marek wrote: > I'm new to tesseract, so please excuse my naiveté. I'm trying to scan some > newspaper headlines, but I don't need the text in the body of the articles. > Obviously, the headline is a much larger type and a different font. Running > tesseract in

Re: Cube documentation, training source files, and openness

2013-05-30 Thread TP
On Thu, May 30, 2013 at 8:32 AM, Dmitri Silaev wrote: > The community contribution now is constrained by enhancing > release packages and fixing trivial bugs. Without a proper > documentation or at least clues on how all this (not only Cube) works, > developers keep community contribution nominal.

Re: c++vs20088 using tess, release error, debug works well

2013-05-27 Thread TP
On Sun, May 26, 2013 at 8:16 AM, wrote: > Hi. i'v get a problem - my app don't works in release version, debug > version works good. > what can it be? > in release version it crashes at point > > char *abc = api.GetUTF8Text(); > > with message "System.AccessViolationException" > > it will work w

Re: Tesseract API example

2013-05-21 Thread TP
On Tue, May 21, 2013 at 2:59 AM, Arthur Ozga wrote: > I am looking to use Tesseract as a backend for a web-based OCR app using > visual c++ and the .Net framework, written using Visual Studio 2010/2012. > Unfortunately, I don't really understand how to integrate Tesseract in the > system using th

Re: VS2008 Express Edition - how to use this to see debug values?

2013-05-08 Thread TP
On Wed, May 8, 2013 at 6:17 AM, Shree Devi Kumar wrote: > Any idea when the 'significant changes in the works' will be ready for > release. That was mentioned in the tesseract-dev mailing list [1] over a year ago when we were discussing whether it was worthwhile to spend time linking the traini

Re: VS2008 Express Edition - how to use this to see debug values?

2013-05-08 Thread TP
On Tue, May 7, 2013 at 7:43 PM, Shree Devi Kumar wrote: > Anyway, That was the reason for wanting to follow the program in VS2008. > If you know of some instructions/tutorial to do that and can point me to > it, that will be great. Visual Studio is generally agreed to have the state of the art d

Re: VS2008 Express Edition - how to use this to see debug values?

2013-05-07 Thread TP
On Tue, May 7, 2013 at 6:11 AM, sdk wrote: > My question is, can that setup be used to trace the program flow or see > how the processing is being done. Yes, but why do you ask? Are you having problems? You might also have to compile leptonica if you want to step into its functions. The mai

Re: Building tesseract 3.02.02 with leptonica 1.69

2013-04-29 Thread TP
On Mon, Apr 29, 2013 at 4:10 AM, Steven McArdle wrote: > What do you mean by "it doesn't support straight PDF" ? > > Leptonica only supports PDF for relatively simple *output*. See "I/O libraries Leptonica is dependent on" [1] and "Image I/O" [2]. If you don't believe that, see src\environ.h [3] f

Re: Include Tesseract in C++ code

2013-04-29 Thread TP
On Sun, Apr 28, 2013 at 2:16 PM, TedJ wrote: > But if anyone knows of another angle/translation/scale image correction > approach (or code), I'd love to hear about it. I.e. Image stabilization. > I would just use leptonica's pixRead() to read in an image, deskew with pixFindSkewAndDeskew [1] wh

Re: Help me!! Absolute beginner with Tesseract !!using windows+Python

2013-04-25 Thread TP
On Thu, Apr 25, 2013 at 2:23 AM, Xander Cage wrote: > I am trying to use tesseract with Python in a win7 environment. I've never done this (I have used leptonica from Python), but here are some suggestions. 1) Grow your own up-to-date solution: Basically, you use the Python ctypes module ---
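
As a rough illustration of the ctypes route mentioned in this reply, here is a minimal sketch that asks a Leptonica shared library for its version string. It assumes the library can be found under the name "lept" through ctypes.util.find_library, which depends on the platform and on how Leptonica was installed (Windows builds use differently named DLLs). The helper name leptonica_version is made up for this example, and it simply returns None when nothing usable is found.

```python
import ctypes
import ctypes.util


def leptonica_version():
    """Return Leptonica's version string, or None if the library is unavailable."""
    # Assumption: the shared library is discoverable as "lept" (e.g. liblept.so).
    path = ctypes.util.find_library("lept")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        func = lib.getLeptonicaVersion
    except (OSError, AttributeError):
        # Library could not be loaded, or it does not export the symbol.
        return None
    # getLeptonicaVersion() takes no arguments and returns a C string.
    func.argtypes = []
    func.restype = ctypes.c_char_p
    raw = func()
    # The buffer is allocated by the library and is not freed here
    # (a tiny leak, tolerable for a one-off sketch).
    return raw.decode("ascii", "replace") if raw else None


if __name__ == "__main__":
    print(leptonica_version() or "Leptonica not found")
```

The same pattern (load the library, declare argtypes and restype, call) would apply to other exported functions, but each signature has to be declared by hand, which is the main cost of this approach.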

Re: Include Tesseract in C++ code

2013-04-22 Thread TP
On Sun, Apr 21, 2013 at 1:15 PM, TedJ wrote: > *The following error has occurred during XML parsing:* > * > * > *File: > I:\Android\Tesseract\tesseract-3.02.02\tesseract-ocr-3.02-API-Example-vs2008\APIExample\baseapitester\baseapitester.vcproj > * > *Line: 27* > *Column: 4* > *Error Message:* > *

Re: Include Tesseract in C++ code

2013-04-21 Thread TP
On Sat, Apr 20, 2013 at 2:06 PM, TedJ wrote: > >Have you looked at my "Using the latest Tesseract-OCR sources" page [1] > that explains how to use TortoiseSVN to get the latest sources? > > I tried installing Tortroise too. Couldn't install it either. > Why couldn't you install TortoiseSVN? I f

Re: Include Tesseract in C++ code

2013-04-20 Thread TP
On Fri, Apr 19, 2013 at 1:44 PM, TedJ wrote: > Thanks for your reply. I was hoping that I'd be able to find a regular > browser page with the download ZIP file option. Could you post all the > required login info, please? Ugh, I never have good luck using svn > clients. I just tried to downlo

Re: error in Tesseract 3.02

2013-02-23 Thread TP
On Fri, Feb 22, 2013 at 11:14 PM, Nayra Ahmed wrote: > Hi, > I'm trying to run this code > > > #define __MSW32__ > #include "stdafx.h" > #include > #include "windows.h" > #include > #include > #include > #include > #include > using namespace std; > Pix *pix; > > int main() > { >tesserac

Re: Definition for BLOCK_RES_IT

2013-02-07 Thread TP
On Thu, Feb 7, 2013 at 3:01 AM, Sreenath Kambala wrote: > I'm trying to understand tesseract flow. > In line no : 52 , Fixspace.ccp ::: i found "BLOCK_RES_IT block_res_it; " > this statement. > But i'm not able figure where the definition for this class is defined. In general, BLAH_IT is generat

Re: Success story using tesseract

2013-02-05 Thread TP
On Fri, Feb 1, 2013 at 8:34 AM, Jakub Jaroš wrote: > in our project, we would like to decide about using Tesseract for it or not. > I would like to ask somebody who is successfully using Tesseract (or Ocropus > combination) in any project. I was recently playing around with Wolfram Research's Mat

Re: Tiff support for tesseract 3.02 on Ubuntu 12.04

2013-02-03 Thread TP
On Sun, Feb 3, 2013 at 1:08 PM, Michael Lissner wrote: > I have Ubuntu 12.04, which has tesseract 3.02 and leptonica version 1.69. > > I've installed these, and also installed libtiff4 using apt-get. libtiff4 is also known as "bigtiff". [1] lists "important backward incompatible changes in the pu

Re: Error in code tesseract in VS 2010

2013-01-31 Thread TP
On Wed, Jan 30, 2013 at 1:07 PM, Nada Feteha wrote: > Hi, > I try to make code work but in this line printf("Leptonica version: > %s\n",getLeptonicaVersion()); > I got this error Error 1 error LNK2019: unresolved external symbol > _getLeptonicaVersion referenced in function _main > C:\Build

Re: pixScale concern

2013-01-14 Thread TP
On Sun, Jan 13, 2013 at 11:01 PM, newbie wrote: > Oh, Seems Leptonica does automatically with resolution. Thanks TP again. OK. > I think pixSetResolution should work for me. Just concern that if with > higher DPI, then it could be slower even return the same result as 300 DPI. > Do

Re: Training Tesseract for single digit

2013-01-13 Thread TP
On Thu, Jan 10, 2013 at 3:09 PM, sunitha raghurajan wrote: > Yes, this is NH license plate. The first image is with out pre processing > and the second one is after processing through opencv. Looking at your tiff it is only 72dpi? Try to make a 300dpi tiff instead for better results. Also avoid t

Re: pixScale concern

2013-01-11 Thread TP
On Fri, Jan 11, 2013 at 4:19 PM, TP wrote: > On Thu, Jan 10, 2013 at 10:31 PM, newbie wrote: >> I am trying to scale the original image larger by pixScale function of >> Leptonica for the input of Tesseract OCR, however, I see that the resolution >> of dest image is i

Re: pixScale concern

2013-01-11 Thread TP
On Thu, Jan 10, 2013 at 10:31 PM, newbie wrote: > I am trying to scale the original image larger by pixScale function of > Leptonica for the input of Tesseract OCR, however, I see that the resolution > of dest image is increased as well. For example, I have pix with 300DPI, > but the if I scale t
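
The behaviour described in this thread (scaling the pixels also scales the stored resolution) follows from keeping the physical size of the page constant. The sketch below only shows that arithmetic; it does not call Leptonica, and the function names are invented for the example.

```python
def scale_for_target_dpi(current_dpi, target_dpi):
    """Scale factor needed to take an image from current_dpi to target_dpi."""
    if current_dpi <= 0 or target_dpi <= 0:
        raise ValueError("resolutions must be positive")
    return target_dpi / current_dpi


def scaled_geometry(width_px, height_px, dpi, scale):
    """Return (new_width, new_height, new_dpi) after scaling pixel data by `scale`.

    The physical size (pixels / dpi) is unchanged, so the resolution
    grows by the same factor as the pixel dimensions.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    new_width = max(1, round(width_px * scale))
    new_height = max(1, round(height_px * scale))
    return new_width, new_height, dpi * scale


if __name__ == "__main__":
    # A 600x300 image at 300 DPI, scaled 2x, is 1200x600 at 600 DPI.
    print(scaled_geometry(600, 300, 300, 2.0))
    # Going from 72 DPI to 300 DPI needs a factor of about 4.17.
    print(scale_for_target_dpi(72, 300))
```

If the larger image should still be reported as 300 DPI, the resolution field has to be reset explicitly after scaling, which is what the pixSetResolution suggestion in this thread amounts to.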

Re: Create Pix from HBITMAP

2013-01-09 Thread TP
On Tue, Jan 8, 2013 at 7:58 PM, newbie wrote: > BTW, I just have one more question regarding the pixCreate. Is it default > set the bitmap resolution to 300 DPI even though I create the pix* from the > bmp with 96 DPI? > I saved the tiff file to disk and see it has 300 DPI. But if I manually set >

Re: Create Pix from HBITMAP

2013-01-09 Thread TP
That might work but sounds dangerous. I just took a more careful look at what pixCreate() does, and by default it allocates memory for you. The implication being that you are supposed to copy your image data into that location. In any case, you should definitely check your routine for memory leaks

Re: Create Pix from HBITMAP

2013-01-08 Thread TP
On Mon, Jan 7, 2013 at 8:47 PM, newbie wrote: > I have created the Pix using pixCreate,pixSetData, pixEndianByteSwap. > However, when I call pixDestroy, it crashed. Do you know what I am doing > wrong? Is there any restriction for pixCreate? Noted that Only the pix was > created by pixCreate crash

Re: Create Pix from HBITMAP

2013-01-02 Thread TP
On Wed, Jan 2, 2013 at 1:37 AM, TP wrote: > On Wed, Jan 2, 2013 at 12:15 AM, newbie wrote: >> Hello experts, >> >> I would like to create a Pix of Leptonica from my HBITMAP which was captured >> from the screen shot.Could you please advise me how to create it? > &

Re: Create Pix from HBITMAP

2013-01-02 Thread TP
On Wed, Jan 2, 2013 at 12:15 AM, newbie wrote: > Hello experts, > > I would like to create a Pix of Leptonica from my HBITMAP which was captured > from the screen shot.Could you please advise me how to create it? See pixGetWindowsHBITMAP() in leptonica/src/leptwin.c [1] [1] http://tpgit.github.c

Re: Leptonica error after pixRead this and only this .gif

2012-12-20 Thread TP
On Wed, Dec 19, 2012 at 6:33 AM, occorled wrote: > Do you think rebuilding the giflib from scratch with VS2010 would make any > difference? I ran into huge headaches when first testing leptonica with VS2010, which led me to discover all sorts of things about linking with incompatible runtime libra

Re: Use tesseract to retrieve image skew & orientation

2012-12-20 Thread TP
On Thu, Dec 20, 2012 at 12:29 AM, José Luis Rey wrote: > Very thanks for the response, > > I want to automatic rotate the pages after scan on ChronoScan, I'm using > leptonica, but it only work with medium/high text density, it does not > detect orientation in document with few text (like cheks or

Re: Leptonica error after pixRead this and only this .gif

2012-12-18 Thread TP
On Tue, Dec 18, 2012 at 6:58 AM, occorled wrote: > What version of leptonica is your tesseract using? tesseract.exe -v tesseract 3.02 leptonica-1.68 (Feb 21 2012, 05:25:30) [MSC v.1500 DLL Release 32 bit] libgif 4.1.6 : libjpeg 8c : libpng 1.4.3 : libtiff 3.9.4 : zlib 1.2.5 BTW, built using

Re: Use tesseract to retrieve image skew & orientation

2012-12-18 Thread TP
On Tue, Dec 18, 2012 at 2:15 AM, José Luis Rey wrote: > It is posible to retrieve image skew and orientation using tesseract and or > leptonica? See [1][2] and [3][4], probably somewhat specific to western languages? P.S. How do I quickly determine such answers? View [5] and enter in "orient" o

Re: Vector image input?

2012-12-18 Thread TP
On Fri, Dec 14, 2012 at 11:28 AM, Zdenko Podobný wrote: > tesseract-ocr use leptonica for image IO. List of supported input type also > depends on leptonica configuration e.g. if you did not compile jpeg support > for leptonica, jpeg will be not supported in tesseract-ocr. So creating list > of su

Re: Leptonica error after pixRead this and only this .gif

2012-12-18 Thread TP
On Mon, Dec 17, 2012 at 3:29 PM, occorled wrote: > I know this is not the leptonica page, but I didn't see a forum link for it > at (http://code.google.com/p/leptonica/) and thought maybe someone who uses > it might know the answer to this question. > > I am using 3.02 tesseract, src built from s

Re: Bank Card Embossing Characters Recongnition

2012-11-20 Thread TP
On Mon, Nov 19, 2012 at 9:07 AM, Neo Song wrote: > Dear All, > I am now needing to OCR the embossing characters on the bank card. > These characters are in two kind of font. The first one is Farrington 7B, > which is used to present the account number, and another font is > unknown(maybe bank

Re: Language downloads on the website

2012-10-21 Thread TP
On Sun, Oct 21, 2012 at 1:07 PM, zdenko podobny wrote: > Personally I would prefer if tesseract/combine_tessdata can handle > (un)compression. tesseract already depends on leptonica which (at least for tiff and png support) also depends on zlib. So I suppose it should be relatively easy to do thi

Re: Thoughts on having the training process take font files directly

2012-10-16 Thread TP
On Mon, Oct 15, 2012 at 2:27 PM, Nick White wrote: >> As an added step, you could might consider: rendering to grayscale, >> slightly blurring (optional), adding a bit of noise, and then >> re-converting to b&w to simulate what physical scanners do? Maybe do >> this at 1200dpi and also downsample
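
The quoted suggestion (render to grayscale, blur slightly, add noise, convert back to b&w) can be sketched in plain Python on a small image stored as a list of rows with values 0 to 255. This is only an illustration of the idea: the 3x3 box blur, the Gaussian noise model, the fixed threshold of 128 and all function names are assumptions made for the example, not anything taken from the training tools.

```python
import random


def box_blur(img):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    height, width = len(img), len(img[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = count = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        total += img[yy][xx]
                        count += 1
            out[y][x] = total // count
    return out


def add_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise and clamp the result to 0..255."""
    return [
        [min(255, max(0, int(round(p + rng.gauss(0.0, sigma))))) for p in row]
        for row in img
    ]


def to_black_and_white(img, threshold=128):
    """Values below the threshold become black (0), the rest white (255)."""
    return [[0 if p < threshold else 255 for p in row] for row in img]


def simulate_scan(bw_img, sigma=20.0, seed=0):
    """Blur, add noise, then re-binarize a clean black-and-white rendering."""
    rng = random.Random(seed)  # seeded so the degradation is repeatable
    return to_black_and_white(add_noise(box_blur(bw_img), sigma, rng))


if __name__ == "__main__":
    clean = [[255, 255, 255], [255, 0, 255], [255, 255, 255]]
    for row in simulate_scan(clean):
        print(row)
```

Seeding the random generator keeps a given degraded page reproducible, which helps when comparing training runs.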

Re: Thoughts on having the training process take font files directly

2012-10-15 Thread TP
On Mon, Oct 15, 2012 at 3:48 AM, Nick White wrote: > On Fri, Oct 12, 2012 at 10:28:15AM -0700, Tom Morris wrote: >> Sorry, let me clarify. I wasn't suggesting using scans, I was suggesting >> using >> images created by taking representative texts, representative fonts, and >> rendering page imag

Re: use tesseract api in visual c++ 2010

2012-10-09 Thread TP
On Tue, Oct 9, 2012 at 10:19 AM, zdenko podobny wrote: > This is page in svn repository (e.g. it waits for 3.02 release). 3.02 > version was not released yet, but you can get it from svn[1]. > > [1] https://code.google.com/p/tesseract-ocr/wiki/TesseractSvnInstallation > Windows users should see [

Re: error 0xc0150002 in Tesseract

2012-09-27 Thread TP
On Wed, Sep 26, 2012 at 9:57 AM, Nada Feteha wrote: > 'try.exe': Loaded 'C:\BuildFolder\lib\liblept168d.dll', Cannot find or open > the PDB file The binary releases of leptonica do not supply the PDB files needed to debug DLL versions. You will need to build the DLL Debug configuration of Leptoni

Re: use tesseract api in visual c++ 2010

2012-09-18 Thread TP
On Tue, Sep 18, 2012 at 4:58 PM, Kage.Sabaku.No.Gaara wrote: > the post is not how to get it to work with minor changes; it is how to get > it to work right, fully internally debuggable No. That post is how to do much more work than necessary using the OLD, OBSOLETE non-3.02 VS solution. I can

Re: use tesseract api in visual c++ 2010

2012-09-17 Thread TP
On Mon, Sep 17, 2012 at 7:08 PM, Kage.Sabaku.No.Gaara wrote: > I have a solution for your problem. I posted the answer on this website: > http://stackoverflow.com/questions/5079635/how-can-i-use-tesseract-ocror-any-other-free-ocr-in-small-c-project The information at the supplied link is obsolete

Re: Integrating tesseract into Qt (C++ project)

2012-08-24 Thread TP
On Fri, Aug 24, 2012 at 2:34 AM, zdenko podobny wrote: > First of all you need tesseract dependencies (leptonica + its dependencies). > You can compile it by yourself, but in this case you need to install > mingw+msys environment... than you need to compile tesseract library. > > In past I was su

Re: Problem building libtesseract3.02 with MS VC++ 2008

2012-08-24 Thread TP
On Thu, Aug 23, 2012 at 12:24 PM, Davor Pleskina wrote: > I tried to use 1.69. But it happens 1.68 has the same error (wrong > definition) in allheaders.h. Never mentioned leptprotos.h. Leptonica 1.69 is *NOT* officially supported on Windows. Dan told me it was a Linux-only quick and dirty relea

Re: Problem building libtesseract3.02 with MS VC++ 2008

2012-08-23 Thread TP
On Thu, Aug 23, 2012 at 11:41 AM, zdenko podobny wrote: > BTW: I did not need to change leptprotos.h - I have there > LEPT_DLL extern void setPixMemoryManager ( void *((*allocator)(size_t)), > void ((*deallocator)(void *)) ); The OP apparently wasn't following the recommendation for newbies -- or

Re: Problem building libtesseract3.02 with MS VC++ 2008

2012-08-23 Thread TP
On Wed, Aug 22, 2012 at 11:58 PM, Davor Pleskina wrote: > I still think the problem is template definition and not postion of files > (error is allheaders.h file - just one line; when commented out source > builds but probably leaves space for unexpected behavior...). You need to tell us which fi

Re: Help to link libs on VC++

2012-07-30 Thread TP
On Sat, Jul 28, 2012 at 8:54 PM, Angélica Mascaro wrote: > On project properties -> Linker -> Additional Dependencies i`ve put all the > libs that tesseract generated (ccmain.lib ccstruct.lib ccutil.lib > classify.lib cube.lib cutil.lib dict.lib image.lib libtesseract_tessopt.lib > libtesseract_tr

Re: Error in building Tesseract-OCR

2012-07-27 Thread TP
On Thu, Jul 26, 2012 at 8:04 PM, Nada Feteha wrote: > I do that but now I have this error (0x0150002) , I tried this solution > > http://stackoverflow.com/questions/5126105/c-unable-to-start-correctly-0xc0150002 > > and also this > http://answers.yahoo.com/question/index?qid=20081223052629AAH0xaM

Re: Perspective correction

2012-05-30 Thread TP
On Wed, May 30, 2012 at 10:14 AM, hiran.suvrat wrote: > If I narrow down my problem to this. Suppose I have a image of a text > at an angle ( Image is not taken from the top of the text) such that > it forms a trapezoid of the text. What would you suggest to use then? That would require a perspec

Re: Perspective correction

2012-05-29 Thread TP
On Tue, May 29, 2012 at 9:52 AM, hiran.suvrat wrote: > Is their a function to automatically detect and correct perspective > for text images? Depending on what you mean, Leptonica has the ability to dewarp text page images [1] (although it currently doesn't seem to work very well with pages that

Re: Tesseract in Subtitle Edit

2012-05-23 Thread TP
On Wed, May 23, 2012 at 7:19 AM, Hallur Guðjónsson wrote: > Yes I read it carefully but I understood wrong at first, is there some place > to get the 3.02 windows version of tesseract? do I have to compile it myself > (because I'm a dumbass and don't know how to do that) Now that I have written s

Re: use tesseract api in visual c++ 2010

2012-05-17 Thread TP
On Thu, May 17, 2012 at 9:02 AM, Saeed Torabzadeh wrote: > Hi All > I'm beginner to tesseract, I'm looking for use of tesseract in visual > c++(simple question), I goggled a lot but I couldn't see > any straight forward way to use "baseapi.h".I mean I don't know what should > i add in additional d

Re: The costs are same for every chars which get from TesseractExtractResult method

2012-04-26 Thread TP
On Wed, Apr 25, 2012 at 6:48 PM, Binhua Liu wrote: > But I found every costs – cost[0],cost[1],cost[2] …. They are same. Can > anyone tell me where I am wrong? Try following Dmitri's advice [1] and call: api.SetVariable("save_blob_choices", "T"); before calling SetImage(). [1] http://group

Re: Tess4J 1.0 Beta Release

2012-04-23 Thread TP
On Mon, Apr 23, 2012 at 7:07 PM, Quan Nguyen wrote: > all of the provided image processing functions are geared for Pix type, not > raw image. Why not just create a Pix from the raw image data? Leptonica has pixCreateHeader(), pixSetResolution(), pixSetWpl(), pixSetData(), etc [1] and various hel

Re: How to get the word quality information

2012-04-23 Thread TP
On Mon, Apr 23, 2012 at 5:20 AM, Binhua Liu wrote: > Hi all, > > I have try to use > api.SetVariable("tessedit_reject_bad_qual_wds","TRUE"); to set "reject > bad quality word", but still get many bad match words, my question is > > 1, how can I set the bottom line of word quality, then reject all

Re: Trouble recognizing digits - On tesseract-for-android

2012-04-17 Thread TP
On Tue, Apr 17, 2012 at 2:42 AM, Mayur Mudigonda wrote: > Given that you are continuing the vague emails, here's my best solution. > > I am not convinced your binarization is happening at the level that > tesseract requires. I would suggest looking at > > a) a good conversaion to gray scale > b) f

Re: Different output for almost identical images

2012-04-06 Thread TP
2012/4/6 Zdenko Podobný : > Dňa 06.04.2012 17:35, Rufus wrote / napísal(a): >> Thanks for the reply. >> >> I've tried another image(bad2.tiff), which is still a bit different from >> good.tiff, and is of the same order regarding the compression ratio. >> However, tesseract still doesn't output anyt

Re: Include Tesseract in C++ code

2012-03-30 Thread TP
On Thu, Mar 29, 2012 at 11:23 PM, zdenko podobny wrote: > > > On Thu, Mar 29, 2012 at 4:52 PM, Gustavo Souto wrote: > >> Hi everyone, I need you help... >> >> I want to create a program in C++ with Tesseract, but when I try to >> compile the source code some errors appear. I don't know well ho

Re: producing a box file upon actual recognition?

2012-03-28 Thread TP
On Tue, Mar 27, 2012 at 9:19 PM, Falke wrote: > Anyone? > > In case the length of my posts scared off a few readers, here's a more > condensed version: > > having box coordinates for every recognized character in the final > result, would allow one to extend the recognition process by either re- >

Re: Hi,

2012-03-26 Thread TP
On Sun, Mar 25, 2012 at 3:23 PM, João Real wrote: > Hi to all, > > I'm having the same problem. Which is the "correct path to the language > data"? > If I have my project in this directory the "C:\WindowsFormsApplication1\", > which path should I use? >-> > tessocr.Init(@"C:\WindowsFormsApplic

Re: BaseAPI: geting number of items(lines, words, symbols)

2012-03-26 Thread TP
On Mon, Mar 26, 2012 at 5:13 AM, Max Pole wrote: > Hi crews, > > I'm developing a java wrapper for the new tesseract. Recognition resullts > will be retrieved using BaseAPI and returned as Java arrays. In order to do > so, I need to know the number of particular items, for example words. I > wasn'

Re: Tesseract 3 and paragraph separation

2012-03-23 Thread TP
On Thu, Mar 22, 2012 at 12:59 PM, Demian Katz wrote: > I'm using Tesseract 3 as a simple command-line tool to generate OCR. > It's doing a fairly good job, but I have one unmet need -- I need to > be able to separate paragraphs with blank lines. Hmmm, I just tried this on a sample image (somethin

Re: Tesseract 3 and paragraph separation

2012-03-23 Thread TP
On Fri, Mar 23, 2012 at 1:19 AM, TP wrote: > If you aren't afraid of doing some programming, look at the code for > TessBaseAPI::GetHOCRText. It uses > res_it->IsAtBeginningOf(RIL_PARA) to figure out where each paragraph > begins. I took a look at TessBaseAPI::GetUTF8Text(
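
The output side of that idea is simple once each line is known to start a paragraph or not. The sketch below covers only that step: the list of (text, starts_paragraph) pairs is a hypothetical stand-in for what one would collect while walking the result iterator and checking IsAtBeginningOf(RIL_PARA), and the function name is invented for the example.

```python
def join_with_paragraph_breaks(lines):
    """Join recognized lines, inserting a blank line before each new paragraph.

    `lines` is a list of (text, starts_paragraph) pairs; this structure is an
    assumption of the sketch, not something Tesseract returns directly.
    """
    out = []
    for index, (text, starts_paragraph) in enumerate(lines):
        # No blank line before the very first paragraph.
        if starts_paragraph and index > 0:
            out.append("")
        out.append(text.rstrip("\n"))
    return "\n".join(out) + ("\n" if out else "")


if __name__ == "__main__":
    sample = [
        ("First paragraph, line one", True),
        ("first paragraph, line two", False),
        ("Second paragraph", True),
    ]
    print(join_with_paragraph_breaks(sample), end="")
```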

Re: Tesseract 3 and paragraph separation

2012-03-23 Thread TP
On Thu, Mar 22, 2012 at 12:59 PM, Demian Katz wrote: > Hello, > > I'm using Tesseract 3 as a simple command-line tool to generate OCR. > It's doing a fairly good job, but I have one unmet need -- I need to > be able to separate paragraphs with blank lines. It would be great if > Tesseract could d

Re: extract word-list failed

2012-03-10 Thread TP
Hmmm, my last post had URLs split in unfortunate places. Here's the list of screen capture URLs again but starting at the beginning of each line. 00 dawg2wordlist Debugging Pane Settings http://www.screencast.com/t/4eTQi8lZEa 01 Start Debugging dawg2wordlist http://www.screencast.com/t/wNw7ziQoQ5

Re: extract word-list failed

2012-03-10 Thread TP
Sriranga(78yrs), Here are some instructions and pictures of how to use Visual Studio 2008 to see where dawg2wordlist is crashing on Windows. Assuming that I have the following folder hierarchy: BuildFolder\ tesseract-3.02\ tessdata\ kan.traineddata testing\

Re: Replacing the tesseract 3.02 alpha vs2008 directory

2012-03-09 Thread TP
On Fri, Mar 9, 2012 at 2:38 AM, Mike wrote: > Thanks Tom, works perfectly for me on Win7, even after converting it > automatically to VS2010, no issues there in Release Mode Static. > Only thing, I was not able to get a proper debug build. I have to use > the static release libraries for leptonlib

Re: OCR Per Page Basis

2012-03-08 Thread TP
On Thu, Mar 8, 2012 at 12:06 PM, TP wrote: > you can use its "ffind" command to display all (most? > some?) configuration parameters defined in the tesseract-ocr source > files Addendum: The following TCC/LE ffind command gives a more "complete" listing of p

Re: OCR Per Page Basis

2012-03-08 Thread TP
On Thu, Mar 8, 2012 at 11:11 AM, Dmitri Silaev wrote: > As for existence and effects of specific parameters, currently I don't > any other way to find it out but digging in Tesseract's code. If you are on Windows, I wrote this section on TCC/LE [1] that talks about how you can use its "ffind" co

Re: extract word-list failed

2012-03-08 Thread TP
On Thu, Mar 8, 2012 at 1:55 AM, Sriranga(78yrs) wrote: > Tom, > followed your guidance. in this connection attached screenshots for your > perusal. CMD is still blinking for "error for loading unicharset from > kan.unicharset. It appears crtexe.c does not exist - i dont know whether in > operating

Re: extract word-list failed

2012-03-08 Thread TP
On Thu, Mar 8, 2012 at 12:31 AM, Sriranga(78yrs) wrote: > TP, > tried again and successfully displayed  as follow: > Extract of output of VS2008 is reproduced below: > 'dawg2wordlistd.exe': Loaded > 'M:\r700\BuildFolder\tesseract-ocr\vs2008\LIB_Debug\dawg2wo

Re: OCR Per Page Basis

2012-03-08 Thread TP
On Wed, Mar 7, 2012 at 9:33 AM, Dmitri Silaev wrote: > No, at this time it is not possible to do via command line. As a matter of fact with the SVN version of tesseract at least (and probably earlier versions), it is possible to tell tesseract to OCR a particular page in a multipage tiff file via

Re: Error during python-tesseract installation

2012-03-07 Thread TP
On Wed, Mar 7, 2012 at 11:27 AM, Ivan Mushketik wrote: > What am I doing wrong? I have no idea. First of all, provide links so people don't have to google just to help you. For example, the location of Python-tesseract [1]. Provide details on your current system (presumably Ubuntu but what version?

Re: extract word-list failed

2012-03-07 Thread TP
On Wed, Mar 7, 2012 at 7:55 PM, Sriranga(78yrs) wrote: > David, > Thank you for the valuable guidance. I followed your steps still problem of > window's exe encounter -  vide screenshot is attached. WinXP(sp3)  tesseract > -r-700 > With warmest regards, > -sriranga(79yrs) > > > On Thu, Mar 8, 2012

Re: Replacing the tesseract 3.02 alpha vs2008 directory

2012-03-01 Thread TP
On Thu, Mar 1, 2012 at 12:38 AM, Wout Bittremieux wrote: > Op woensdag 29 februari 2012 20:56:58 UTC+1 schreef TP het volgende: >> >> Are you saying that even after installing VC++2008 Express Edition, >> you still didn't have access to the MSVCR90D.DLL? I would have tho

Re: Replacing the tesseract 3.02 alpha vs2008 directory

2012-02-29 Thread TP
On Wed, Feb 29, 2012 at 3:17 AM, Wout Bittremieux wrote: > I was able to successfully build Tesseract revision 684 using the procedure > you outlined in your documentation. > > If you recall correctly, last week I had some problems using Tesseract and > Leptonica in Visual Studio 2010. I built Tes

Re: Replacing the tesseract 3.02 alpha vs2008 directory

2012-02-28 Thread TP
On Tue, Feb 28, 2012 at 8:30 AM, Sriranga(78yrs) wrote: > I followed the instructions[1] > http://tesseract-ocr.googlecode.com/svn/trunk/vs2008/doc/setup.html#using-the-latest-tesseractocr-sources > kindly read as r-678 as r-684 which is typo error. > sorry to intimate you that  again failed to ge

Re: Replacing the tesseract 3.02 alpha vs2008 directory

2012-02-27 Thread TP
On Mon, Feb 27, 2012 at 7:39 PM, 79yrsold wrote: > It may kindly be clarified whether exe(debug or release version) files > will also be uploaded along with  proposed some alpha version for > testing and feedback by the users(non-programmer/developer) since the > new procedure is bit little confus

Re: Using Tesseract in Visual Studio 2010

2012-02-23 Thread TP
On Wed, Feb 22, 2012 at 11:59 PM, Wout Bittremieux wrote: > I have been trying to include Tesseract in my Visual Studio 2010 > project in order to be able to use the API calls. Because my knowledge > about building and using libraries in Visual Studio is rather > superficial, this has presented me

Re: Example how to use the thresholder api to tweak the threshold

2012-02-22 Thread TP
On Wed, Feb 22, 2012 at 10:18 AM, avasilev wrote: > Can anyone give me an example how to set the threshold above which > pixels are considered "white" and below - "black" Why not just use Leptonica to threshold your image *before* you pass the PIX to api.SetImage()? You can do something simple:
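
To make the idea of thresholding before OCR concrete, here is a small library-free sketch of a fixed global threshold on a grayscale image stored as rows of 0..255 values. It is not Leptonica's API; the function names are made up, and the choice that a pixel equal to the threshold counts as white is a convention of this example only.

```python
def binarize(gray, threshold):
    """Pixels at or above `threshold` become white (255), the rest black (0)."""
    return [[255 if p >= threshold else 0 for p in row] for row in gray]


def black_fraction(binary):
    """Fraction of black pixels, a quick number to watch while tuning."""
    total = sum(len(row) for row in binary)
    if total == 0:
        return 0.0
    black = sum(1 for row in binary for p in row if p == 0)
    return black / total


if __name__ == "__main__":
    gray = [[10, 200], [128, 127]]
    for candidate in (64, 128, 192):
        result = binarize(gray, candidate)
        print(candidate, result, black_fraction(result))
```

Printing the black fraction for a few candidate values is a cheap way to see how sensitive a given image is to the threshold before handing the binarized result to the OCR engine.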

A tip for those of you new to using Leptonica to do image processing

2012-02-20 Thread TP
It's been around since last summer, but you may be unaware of my "Unofficial Leptonica v1.68 Documentation" website (http://tpgit.github.com/UnOfficialLeptDocs/index.html). See that page for its list of improvements over the official Leptonica website. In a previous post to this group (http://groups.

Re: Tesseract vs Commercial Products

2012-02-18 Thread TP
On Sat, Feb 18, 2012 at 4:58 PM, Jason Funk wrote: > My specific examples are screen captures of powerpoint slides. For > example, what would need to be done to this image? > > http://jasonfunk.net/example2.jpeg Remember, it's *always* a bad idea to save an image in jpeg format if it will later be

Re: Tesseract vs Commercial Products

2012-02-18 Thread TP
On Sat, Feb 18, 2012 at 9:32 PM, Dmitri Silaev wrote: > If you have many > such images you can use ImageMagick to automate the above image > processing operations and then feed resulting images to Tesseract, all > in a single script. Or, since tesseract-ocr already links with the Leptonica C Imag

Re: VS2010 and use of ResultIterator

2012-02-10 Thread TP
On Thu, Feb 9, 2012 at 1:32 PM, TyDam' wrote: > Thanks for your answer. > I already read and have a look on this project. > Without modification this project compil and run but when I modify it > by adding: > > tesseract::ResultIterator *ri; > ri=api.GetIterator(); > char * out2 = ri->GetUTF8Text(

Re: Tesseract Dll For Visual Basic Express 2008

2012-01-04 Thread TP
On Tue, Jan 3, 2012 at 8:02 AM, Lahiru Himash Madusanka wrote: > I have downloaded [1] and try to compile. I followed all the > directions. But it gives me errors. > I have added Build Log with this E-mail. Can you tell me what is the > wrong with that This error has already been discussed in the

Re: Could n't able to stepinto InitWithLanguage Method

2011-11-22 Thread TP
On Tue, Nov 22, 2011 at 2:09 AM, Dileep.M wrote: > I started using tesseract-ocr 2.0 .NET  Wrapper.I'm able to do OCR. > But when I'm tring to debug I couldn't able to  Step into  the > folowing Method. > int result = m_myTessBaseAPIInstance->InitWithLanguage(...) > > Same case with DoOCR step int

Re: TessBaseAPI inking error in VSE2008

2011-11-18 Thread TP
On Fri, Nov 18, 2011 at 10:34 AM, Jenny Folkesson wrote: > Thanks for your reply, TP. > > I've tried adding the definitions you suggested: > #define __MSW32__ > #define USE_STD_NAMESPACE I never tried adding them directly to a header file. I define them in the Project

Re: TessBaseAPI inking error in VSE2008

2011-11-17 Thread TP
On Thu, Nov 17, 2011 at 2:20 PM, Jenny F wrote: > I've added all the directories that include headers or .lib files to > the Project properties. When I try to compile I get the following 3 > errors: > > 1>main.obj : error LNK2019: unresolved external symbol > "__declspec(dllimport) public: virtual

Re: tips for improving Tesseract accuracy and speed...

2011-03-31 Thread TP
sis.html) which refers to http://tpgit.github.com/Leptonica/livre__adapt_8c_source.html and http://tpgit.github.com/Leptonica/livre__tophat_8c_source.html. -- TP

Re: tesseract.exe has stopped working on win2008 r2

2011-03-27 Thread TP
On Sun, Mar 27, 2011 at 2:27 AM, zdenko podobny wrote: > On Sun, Mar 27, 2011 at 12:45 AM, TP wrote: >> On Sat, Mar 26, 2011 at 7:42 AM, zdenko podobny wrote: >> >> Can somebody explain why a tif size (2480x3508 @ 8BPP) is not >> >> processed? >> >>

Re: tesseract.exe has stopped working on win2008 r2

2011-03-26 Thread TP
btiff with "LZW_SUPPORT= 1" in my nmake.opt file. You can see the actual problem by looking at http://tpgit.github.com/Leptonica/tiffio_8c_source.html#l00274, where Leptonica gets the TIFFTAG_SAMPLESPERPIXEL. It allows 1, 3, or 4, but not 2, which is what this image contains. -- TP
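
The restriction described here can be expressed as a small pre-flight check to run before handing a TIFF to the OCR step. The sketch below only encodes the 1, 3 or 4 rule stated in this post; the names are hypothetical and it does not read TIFF files itself, so the samples-per-pixel value is assumed to come from whatever TIFF reader is already in use.

```python
# Samples-per-pixel values accepted according to the post above.
SUPPORTED_SAMPLES_PER_PIXEL = (1, 3, 4)


def is_supported_spp(samples_per_pixel):
    """True if the samples-per-pixel count is one the reader accepts."""
    return samples_per_pixel in SUPPORTED_SAMPLES_PER_PIXEL


def describe_spp(samples_per_pixel):
    """Short human-readable verdict for logging before running OCR."""
    if is_supported_spp(samples_per_pixel):
        return "ok: %d sample(s) per pixel" % samples_per_pixel
    return "unsupported: %d samples per pixel, convert the image first" % samples_per_pixel


if __name__ == "__main__":
    for spp in (1, 2, 3, 4):
        print(describe_spp(spp))
```

A file with 2 samples per pixel is commonly grayscale plus an alpha channel, so dropping the alpha channel when re-saving is one likely way to get an accepted image.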

Re: how to get the character in an image file which is in table format.

2011-03-12 Thread TP

Re: Image pre-processing for good OCR results

2011-02-23 Thread TP
pure" Image Processing routines. I also find Leptonica's source code fairly easy to read because one of the purposes of the library is to try to teach image processing concepts. In any case, if you're planning on using tesseract-ocr 3.x, then you already must have liblept,