Re: VietOCR v2.0/3.1 & VietOCR.NET v2.0 Releases

2011-02-22 Thread KHEM Sochenda
Thank you, Don, for the comments. On Tue, Feb 8, 2011 at 4:06 PM, SpeedyChair wrote: > Another way to prepare a PDF document for tesseract is to use the 'convert' > command from the ImageMagick package to split an image-only PDF file into a > series of grayscale TIFF images, one for each page.  Th
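The quoted message is cut off before it shows the actual command, so the following is only a minimal sketch of how such an ImageMagick 'convert' call could be assembled from Python. The helper name build_convert_command, the 300 dpi density, and the output file pattern are assumptions made for illustration; they are not taken from the original post.

```python
import subprocess


def build_convert_command(pdf_path, out_pattern="page_%03d.tif", density=300):
    """Build an ImageMagick 'convert' argument list (hypothetical helper).

    -density sets the rasterisation resolution for the PDF pages,
    -colorspace Gray converts each page to grayscale, and the %03d in
    the output pattern makes ImageMagick write one numbered file per page.
    The default values here are assumptions, not the poster's settings.
    """
    return [
        "convert",
        "-density", str(density),
        pdf_path,
        "-colorspace", "Gray",
        out_pattern,
    ]


def split_pdf(pdf_path):
    """Run the command; requires ImageMagick to be installed."""
    subprocess.run(build_convert_command(pdf_path), check=True)
```

Calling split_pdf("scan.pdf") would then leave page_000.tif, page_001.tif, and so on in the current directory, each of which can be passed to tesseract separately.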

RE: How to extract the images of each word from the whole image page?

2011-02-22 Thread Cong Nguyen
If you used tesseract 2.04, you should have a look at tessdll\tessdll.cpp. From tesseract 3.x, ocrshell.h/.cpp have been removed, so you need to work backward. I am trying to do the same in my project and hope to release it soon. Cong. From: tesseract-ocr@googlegroups.com [mailto:tesseract-ocr@

RE: Image pre-processing for good OCR results

2011-02-22 Thread Cong Nguyen
Dear Jon, To begin the analysis, I also tried to detect lines and corners, but the results are not good, I think because the images are low contrast. Please try analyzing some line profile data: ROI-left-profile: https://picasaweb.google.com/congnguyenba/TesseractBasedOCR#5576706091073985 3

RE: Image pre-processing for good OCR results

2011-02-22 Thread Cong Nguyen
Dear Andres, The recognition results I showed were obtained using my simple .NET wrapper for the tesseract 3.01 engine (link here: http://code.google.com/p/tesseractdotnet/). ROI detection was done by cropping the ROI manually; after that I used my company's software to filter. About filteri

Re: Tessnet2 exe exiting without prompting any message

2011-02-22 Thread Quan Nguyen
VietOCR.NET is a VS2008-based C# project that uses tessnet2. Its source can be found at: http://sf.net/projects/vietocr Good luck. On Feb 10, 11:54 am, Seena Anup wrote: > Hi, > > I downloaded Tessnet2 binary from this > page http://www.pixel-technology.com/freeware/tessnet2/ > > I tried execut

Re: Tessnet 2

2011-02-22 Thread Quan Nguyen
VietOCR.NET is a VS2008-based C# project that uses tessnet2. Its source can be found at: http://sf.net/projects/vietocr Good luck. On Feb 15, 12:49 am, noobsaibot wrote: > Hi guys, I am a C# developer and working tirelessly on some projects > which require tessnet integration, but unfortunate

Re: Image pre-processing for good OCR results

2011-02-22 Thread Jon Andersen
Vicky, I may be able to convert your local-minima code to OpenCV code; can you send me the result files as well as the filter? I wrote some Python code that uses OpenCV to crop the headstone images to show just the stone. It's not perfect, but it works OK. The Hough algorithm and the other corne

problem in the mftraining part of the tesseract training

2011-02-22 Thread Open sourced nick
I've now successfully created a box file with tesseract and run unicharset_extractor, which created a unicharset file that looks like: ... n 3 NULL -1 s 3 NULL 23 t 3 NULL 43 ... I then continued with this command: mftraining -U unicharset -O testlang.u

Re: Image pre-processing for good OCR results

2011-02-22 Thread Andres
Hello, A few comments from my side; sorry for the lack of order, but I do not have much time right now. In OpenCV you can use thresholding with the Otsu algorithm; it is not mentioned in the documentation of the threshold function, but the parameter is CV_THRESH_OTSU. Otsu thresholding involves the
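For readers unfamiliar with the technique named in this message, here is a minimal pure-Python sketch of Otsu's method: it picks the gray level that maximises the between-class variance of the two pixel groups it separates. This is an illustration under stated assumptions, not OpenCV's implementation and not Andres' code; the function names otsu_threshold and binarize are hypothetical, and the input is assumed to be a flat sequence of 8-bit gray values.

```python
def otsu_threshold(pixels):
    """Return a threshold t in 0..255 chosen by Otsu's method.

    Pixels with value <= t form the dark class, the rest the bright class.
    t maximises w0 * w1 * (m0 - m1)**2, the (unnormalised) between-class
    variance, where w0/w1 are class sizes and m0/m1 are class means.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of gray values in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break             # bright class empty, nothing left to split
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels, t):
    """Map each pixel to 0 (<= t) or 255 (> t)."""
    return [255 if p > t else 0 for p in pixels]
```

For a clearly bimodal input such as fifty pixels of value 10 followed by fifty of value 200, any threshold between the two modes separates them, and binarize then yields fifty 0s followed by fifty 255s.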

Re: Image pre-processing for good OCR results

2011-02-22 Thread Tom Morris
On Feb 20, 9:02 pm, Jon Andersen wrote: > My project at http://RecordAGrave.com is about recording headstones from > graves and posting the text and images on the Net so that people can > research their family history.  I would appreciate some advice on how to > pre-process these headstone images t

Re: Wrappers for tessearct3.01?

2011-02-22 Thread SpeedyChair
All three messages were received by the list. All Google Groups mailing lists prevent the sender from receiving his own post back. They figure that they never fail and that your own posts returning to you are not necessary. It confuses many people. Don Marang Vinux Software Development Coordinator (vin

Re: VietOCR v2.0/3.1 & VietOCR.NET v2.0 Releases

2011-02-22 Thread SpeedyChair
I do not have my own page built just for speedy-ocr at the moment. The Ubuntu 10.04 Lucid package is hosted on Launchpad in our Vinux Lucid PPA. To add the Vinux Lucid repository to your system, type: sudo add-apt-repository ppa:vinux/vinux-lucid Then install speedy-ocr with the following

Re: [Tesseract 3] English training text

2011-02-22 Thread zdenko podobny
Dmitry, unfortunately I do not have enough time for tests :-(. I still hope Ray will release more info before the final 3.01. At the moment I am focusing on the box editor. BR, Zdenko On Tue, Feb 22, 2011 at 9:27 AM, Dmitry Silaev wrote: > Interesting. I was wondering about Cube since its traces began to >

Re: problem in single word recognition

2011-02-22 Thread Dmitry Silaev
I might not have understood you fully, but this is an obvious excerpt from "baseapi.h": "Each SetRectangle clears the recogntion results so multiple rectangles can be recognized with the same image" Indeed, SetRectangle() calls ClearResults() which "deletes the pageres and clears the block list ready

Re: [Tesseract 3] English training text

2011-02-22 Thread Dmitry Silaev
Interesting. I was wondering about Cube since its traces began to appear in the source code, but I did not have enough time to investigate it thoroughly. Zdenko, would you please kindly share your other findings on Cube? Regards, Dmitry On Tue, Feb 22, 2011 at 11:13 AM, zdenko podobny wrote: > I doubt tha

Re: [Tesseract 3] English training text

2011-02-22 Thread zdenko podobny
I doubt that Google will release their (full) training set :-( Have a look in svn at the file eng.cube.size [1]. There you can see the names of the fonts that were used to train English in 3.01. As far as I understand, there is an (unpublished/not released) possibility to train language data directly on font files.