[tesseract-ocr] training scripts in 5.0.0

2021-12-04 Thread Marco Atzeri
Hi, I am updating the cygwin package from 4.1.1 to 5.0.0 and I noticed that 3 scripts (language-specific.sh, tesstrain.sh, tesstrain_utils.sh) are no longer in the source code. Have they been replaced by something else that I should package instead? Regards Marco -- You received this mes

Re: [tesseract-ocr] Simplest way to automate pdf to tif?

2020-01-14 Thread Marco Atzeri
On 14.01.2020 at 19:13, teksts wrote: Hi all, I have a very large number of PDFs to convert to .tif files to be processed by Tesseract. While I've been getting acquainted with Tesseract, I've just been converting them manually through Adobe Acrobat, but I'd like to automate the process. Any ad
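
One possible way to automate such a conversion, offered as an illustration and not as the advice given in this thread, is to loop over the files and call Ghostscript. The sketch assumes Ghostscript's `gs` is installed; the `tiffg4` device (bilevel, multi-page output) and 300 dpi are illustrative choices, and `convert_pdfs` and the `GS` override are hypothetical names introduced for this example.

```shell
#!/bin/sh
# Convert every PDF in a directory to a TIFF next to it, using Ghostscript.
# Assumptions: "gs" is on PATH; tiffg4 and 300 dpi are example settings only.
# GS is a hypothetical override: GS=echo prints the arguments instead of running gs.
convert_pdfs() {
    dir="$1"
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue      # no PDFs: the glob stayed literal
        tif="${pdf%.pdf}.tif"          # same name, .tif extension
        ${GS:-gs} -q -dNOPAUSE -dBATCH -sDEVICE=tiffg4 -r300 \
            -sOutputFile="$tif" "$pdf"
    done
}
```

Usage: `convert_pdfs /path/to/pdfs`, then run tesseract on each resulting .tif file.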

[tesseract-ocr] Updated: tesseract-ocr-4.1.0-1

2019-07-11 Thread Marco Atzeri
read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Improved extensively by Google. It is released under the Apache License 2.0. HOMEPAGE https://github.com/tesseract-ocr/ Marco Atzeri If you have

Re: [tesseract-ocr] training tesseract 4.0.. issue with 'Make leptonica' giving error

2019-07-06 Thread Marco Atzeri
On 17.04.2019 at 03:55, yoganand wrote: I'm trying to train my tesseract 4. I started by installing cygwin and got as far as setup, but the steps you have given for OCRD-train give errors while trying to compile leptonica and tesseract. tesseract and leptonica are already available for

Re: [tesseract-ocr] No recognition

2018-12-18 Thread Marco Atzeri
On 18.12.2018 at 19:00, Bostjan Laba wrote: I cannot OCR the red number down below from the attached image. Turning image to grey, 600dpi, b/w or anything else just won't produce the proper result. I get some garbage chars or not even that. How do I get to scan an image like that and OCR th

Re: [tesseract-ocr] tesseract version mismatch

2018-11-02 Thread Marco Atzeri
On 02.11.2018 at 20:05, Nikhil Kumar wrote: Hello, I have upgraded tesseract to the latest version, but I am getting this error. Can anyone help me with this? Thank you. The latest version is 4.0.0. How did you upgrade, and what system are you on?

[tesseract-ocr] Updated: tesseract-ocr-4.0.0-1

2018-10-29 Thread Marco Atzeri
. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Improved extensively by Google. It is released under the Apache License 2.0. HOMEPAGE https://github.com/tesseract-ocr/ Marco Atzeri If you have questions or comments, please send them to the cygwin mailing list at: cygwin (at

[tesseract-ocr] patch for #426

2018-10-20 Thread Marco Atzeri
The attached patch, tested on 4.0.0 RC3 for Cygwin, should solve https://github.com/tesseract-ocr/tesseract/issues/426 and keep the current cygwin build working. It is similar to how other software handles the same need to have "-no-undefined" passed to libtool. I doubt that the config

Re: [tesseract-ocr] Recognizing large Tiff Images

2018-09-18 Thread Marco Atzeri
On 18.09.2018 at 23:15, rasaiza...@gmail.com wrote: Hi, based on your advice I am now using ImageMagick.net. I have used both the resize and thumbnail functions, but I get a much larger image, 100X!!! I am confused about how I must use ImageMagick to get a small image suitable for loadin

Re: [tesseract-ocr] Recognizing large Tiff Images

2018-09-18 Thread Marco Atzeri
On 18.09.2018 at 09:02, rasaiza...@gmail.com wrote: Hi, I have a Tiff image that, when using the tesseract 4.0 api, reports an object reference error on TessBaseApi ResultIterator. When I open this tif image using paint.net and paint, it reports an out of memory Exception... I tried to rescale

Re: [tesseract-ocr] tesseract-4.0.0-beta.3 - testing problem

2018-08-06 Thread Marco Atzeri
On 28.07.2018 at 10:08, Shree Devi Kumar wrote: Test related info has been moved to a new repo under tesseract-ocr https://github.com/tesseract-ocr/test You need to update that submodule (similar to googletest) for all files to be available. It's possible that the wiki has not been updated

[tesseract-ocr] tesseract-4.0.0-beta.3 - testing problem

2018-07-28 Thread Marco Atzeri
With cygwin 64bit 1) I see an excess of "ln" during testing make[2]: 'libgmock_main.la' is up to date. mkdir -p ../test/testing ln -s /cygdrive/d/cyg_pub/devel/tesseract/prove2/tesseract-ocr-4.0.0-0.3.x86_64/src/tesseract-4.0.0-beta.3/test/testing/phototest.tif ../test/testing/phototest.tif m

Re: [tesseract-ocr] Leptoncia vs libleptonica-dev

2018-06-13 Thread Marco Atzeri
On 6/13/2018 8:16 AM, Ning Zhao wrote: Hi all, The question in my mind now is whether leptonica and libleptonica-dev are the same thing as leptonica doesn't provide an executable. How can I check I have installed them/it successfully? leptonica is a library. As any library is usually provi
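
As a sketch of how such a check could look: leptonica ships a pkg-config file named `lept`, so where the development files are installed, `pkg-config` can report the library version. `lib_version` is a hypothetical helper name used only for this example, and it assumes `pkg-config` itself is available.

```shell
#!/bin/sh
# Print the version of a library known to pkg-config.
# Returns non-zero (and prints nothing) if the library or pkg-config is missing.
lib_version() {
    pkg-config --modversion "$1" 2>/dev/null
}

# Example call:
#   lib_version lept || echo "leptonica development files not found"
```

`tesseract -v` also lists the leptonica version that tesseract was built against, which gives a second way to confirm the library is present.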

Re: [tesseract-ocr] Re: @shree / Fianlly I made the customzied (fine tuned) traineddata

2018-03-08 Thread Marco Atzeri
On 08/03/2018 09:58, 이경준 wrote: This is my finely tuned traineddata (3 types). My os environment is Ubuntu 16.04.03 LTS. Please avoid putting large attachments on any mailing list. If you need to share, upload it somewhere else and share the link.

Re: [tesseract-ocr] Image format

2018-03-05 Thread Marco Atzeri
On 05/03/2018 16:09, Dusayanta Prasad wrote: Is it necessary for Tesseract that the input should always be in .tif format ? No. Other formats will also work

Re: [tesseract-ocr] Tesseract recognition accuracy is low

2018-02-07 Thread Marco Atzeri
On 07/02/2018 11:31, Niti Rohilla wrote: Hi All, I am using tesseract for OCR but I am not getting high accuracy. For some characters the results are completely wrong; for example, it recognizes G as B and 6 as 5, etc. Can somebody please tell me the next steps or how can I improve the accuracy

[tesseract-ocr] Updated: tesseract-ocr-3.05.01-1

2017-06-09 Thread Marco Atzeri
Library it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Improved extensively by Google. It is released under the Apache License 2.0. HOMEPAGE https://github.com/tesseract-ocr/ Marco Atzeri If

Re: [tesseract-ocr] Object Name Conflicts in Archive

2016-12-14 Thread Marco Atzeri
On 12/12/2016 22:40, Will Brackenbury wrote: I'm experiencing issues installing Tesseract on a 64-bit Windows 8 system using cygwin. The error I receive is "Object Name Conflicts in Archive" based on the libtesseract_api.a folders. I used the instructions to compile from here (https://vorba.ch/20

Re: [tesseract-ocr] tessdata on github

2016-08-18 Thread Marco Atzeri
On 19/08/2016 00:22, ShreeDevi Kumar wrote: I am wondering whether it would be possible to download only the needed traineddata files from tessdata repo (optional) into the designated tessdata-dir (which has the required tessdata files). I found the following options but haven't been able to try

Re: [tesseract-ocr] add Arabic language URGENT

2016-07-21 Thread Marco Atzeri
On 21/07/2016 00:09, Anis Ch wrote: HELLO, I need to add Arabic language to my tesseract ocr software (version 5,41 march 2015). I tried using the official site but I failed. Would you help me :) The latest files for Arabic language for Tesseract are here: https://github.

Re: [tesseract-ocr] Das tutorial 2016

2016-07-13 Thread Marco Atzeri
On 13/07/2016 19:05, ShreeDevi Kumar wrote: https://github.com/tesseract-ocr/docs/tree/master/das_tutorial2016 I tried to download the files from above link but getting an error. Wondering whether problem is at my end or the files need to be re-uploaded... Works for me when selecting the raw

Re: [tesseract-ocr] Tesseract version 3.04.1 is giving error when trying to load .png image

2016-06-02 Thread Marco Atzeri
On 02/06/2016 12:58, Supriya Das wrote: Hello Everybody, When we are trying to load a png image it is throwing the error "function not present", "pixReadMemPng". How to solve this one, please suggest. I have used leptonica version 1.73. Thanks & Regards, Supriya Das. Maybe some more hints

Re: [tesseract-ocr] Italian - Missing special-words

2016-05-28 Thread Marco Atzeri
On 10/01/2016 16:46, bácsi Kazi wrote: Hi, Finally I could build my portable 3.05dev install with CygWin (without training, because I got errors while building - ideas welcome). I'm now using the Italian language files from GitHub, but I keep on getting the error "failed to load .../special-wo

Re: [tesseract-ocr] do i get a performance boost when i compile tesseract as a 64 bit program?

2016-05-15 Thread Marco Atzeri
On 15/05/2016 12:33, Simon Eigeldinger wrote: Hi all, i am thinking of switching back to cygwin. msys2/mingw-w64 seems not to give me much luck at the moment. seems i have issues with leptonica. it can find leptonica but not its pdf functions it seems. at the moment i guess i have to decide if i

Re: [tesseract-ocr] Re: Update of cygwin package for training

2016-05-10 Thread Marco Atzeri
On 10/05/2016 14:39, Mikael Egibyan wrote: Hi Marco, Can you please link a tutorial how to generate/create all the specific language files? Thanks! Mikayel Hi Mikayel, Your request is not clear. Are you asking about training files? On cygwin it works as on other systems https://github

Re: [tesseract-ocr] Proper Use of Text2Image?

2016-04-17 Thread Marco Atzeri
On 17/04/2016 14:45, John Timuty wrote: Hi there! ^_^ I didn't know how to compile so I had to download Cygwin because only there i got a compiled text2image.exe.. But now i have it and tried to use it. This is what i get when i execute command. John@John-PC /cygdrive/c/cygwin/training $ *text2i

Re: [tesseract-ocr] Will somebody send me text2image.cpp compiled?

2016-04-12 Thread Marco Atzeri
On 12/04/2016 08:02, John Timuty wrote: Hello Guys.. :-) I want to train tesseract for a new font, and for that I need the text2image utility. If you know how to compile it, please help me by doing it and sending it to me. Thank you! ^_^ As we have no clue of your system, I can only give you the add

Re: [tesseract-ocr] It does not work

2016-04-11 Thread Marco Atzeri
On 11/04/2016 08:23, Aleksey wrote: Hi, I compiled the latest sources on Windows 8 and Ubuntu 15.10, did everything according to recommendations, created png file with the texts printed with basic fonts, run tesseract on both systems, it returns rubbish except for several words. I attach

Re: [tesseract-ocr] How to train tesseract in Windows?

2016-04-04 Thread Marco Atzeri
On 04/04/2016 18:17, ShreeDevi Kumar wrote: Install cygwin and download tesseract packages including training utils. >>On cygwin Marco Atzeri has packaged Tesseract as well as the training utilities for 3.04.00 along with some training data. Instruction for cygwin installation is here

Re: [tesseract-ocr] building training tools on cygwin

2016-04-03 Thread Marco Atzeri
On 03/04/2016 09:09, shree wrote: Marco, Thanks for the patches. I wasn't able to build with dev. Please provide the patches as a pull request for the project, when you build it next time. Thanks. hi Shree, strange. I just built from dev with the two patches, that applied fine. tesseract-tr

Re: [tesseract-ocr] building training tools on cygwin

2016-03-29 Thread Marco Atzeri
On 29/03/2016 19:49, ShreeDevi Kumar wrote: Hi, I have been able to build latest source of tesseract on cygwin. ra@Shree ~/tesseract-ocr/tesseract $ tesseract -v tesseract 3.05.00dev-296-g60176fc leptonica-1.73 libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.6.20 : libtiff

[tesseract-ocr] Updated: tesseract-ocr-3.04.01-1

2016-03-05 Thread Marco Atzeri
a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Improved extensively by Google. It is released under the Apache License 2.0. HOMEPAGE https://github.com/tesseract-ocr/ Marco Atzeri If you have

Re: [tesseract-ocr] German mutated vowel/umlaut ü/Ü

2016-02-11 Thread Marco Atzeri
On 12/02/2016 04:33, Stefan Greiner wrote: Tesseract 3.04 (953523b) Since using 3.04 with current German language file "deu.traineddata" the small ü are always recognised as big Ü. Are there any parameters to fix this? the other characters are recognised properly. example source screenshot add

Re: [tesseract-ocr] Update of cygwin package for training

2015-12-20 Thread Marco Atzeri
Same procedure used for install. Setup will propose the installation of any updated packages. Don't forget to select the font package lohit-kannada-fonts On 20/12/2015 13:42, Sriranga(83yrsold) wrote: Marco, awaiting response for updated package for download. w/b sriranga On Mon, Dec 14, 20

Re: [tesseract-ocr] Update of cygwin package for training

2015-12-14 Thread Marco Atzeri
On 14/12/2015 14:37, Sriranga(83yrsold) wrote: It would be nice to build packages for 3.05.00dev (released on 22 July) also. It works for me in the ubuntu 15.10. from where the said packages have download for cygwin. As 3.05.00dev seems a development tag and not yet a stable release I am

[tesseract-ocr] Update of cygwin package for training

2015-12-14 Thread Marco Atzeri
Hi, I updated both arch (x86 and x86_64) packages to 3.04.00-3. The tesseract-training-util package now contains the scripts taken from the development repository and should work correctly. Mini HOWTO, using the Kan language files provided by Sriranga as an example: 1) package to be installed tess

Re: [tesseract-ocr] how to use tesstrain .sh etc in ubuntu 15.10

2015-12-01 Thread Marco Atzeri
On 01/12/2015 13:47, Sriranga(83yrsold) wrote: Marco, what is the latest position of your research? Please send me the command line you used so that I can test on my machine, since I could not understand "Workaround linking font_properties -> /usr/share/tessdata/font_properties" OR how to compile from s

Re: [tesseract-ocr] how to use tesstrain .sh etc in ubuntu 15.10

2015-11-30 Thread Marco Atzeri
On 29/11/2015 12:18, Marco Atzeri wrote: On 27/11/2015 16:28, Sriranga(83yrsold) wrote: In continuation of my previous post, I would like to inform you that I also succeeded in generating the kan.traineddata file in tesseract-3.05.0Dev using tesstrain.sh. I am thankful to all concerned who helped me to solve

Re: [tesseract-ocr] how to use tesstrain .sh etc in ubuntu 15.10

2015-11-29 Thread Marco Atzeri
On 27/11/2015 16:28, Sriranga(83yrsold) wrote: In continuation of my previous post, I would like to inform you that I also succeeded in generating the kan.traineddata file in tesseract-3.05.0Dev using tesstrain.sh. I am thankful to all concerned who helped me to solve the problem. Good Luck. On Fri, Nov 27, 2

Re: [tesseract-ocr] Tesseract command works on OSX but not Windows

2015-11-18 Thread Marco Atzeri
On 18/11/2015 05:32, Jonathan Warrick wrote: I have written a script that utilizes Tesseract to extract simple text from a .tif file, which works perfectly as expected when using OSX, but does not seem to work at all when I try and run the command on a Windows machine (Windows 7; OS of machine

Re: [tesseract-ocr] how to use tesstrain .sh etc in ubuntu 15.10

2015-11-17 Thread Marco Atzeri
On 17/11/2015 11:59, Sriranga(83yrsold) wrote: I have already installed cygwin (as standalone) in Vista OS, since I could not succeed in running Tesstrain.sh in ubuntu 15.10; it failed in spite of a number of attempts. I shall be ever thankful to you if you kindly guide me how to run tess

Re: [tesseract-ocr] how to use tesstrain .sh etc in ubuntu 15.10

2015-11-16 Thread Marco Atzeri
On 16/11/2015 11:43, Sriranga(83yrsold) wrote: It would be nice to indicate step by step procedure to be followed for the generating On cygwin the procedure for "lang.traineddata" should be the same as on Linux. Instructions for cygwin installation are here: https://cygwin.com/cygwin-ug-net/se

Re: [tesseract-ocr] how to use tesstrain .sh etc in ubuntu 15.10

2015-11-15 Thread Marco Atzeri
On 15/11/2015 18:45, Nick White wrote: On Sun, Nov 15, 2015 at 09:16:29PM +0530, Sriranga(83yrsold) wrote: Dear nick, kindly clarify whether "make" file will work on windows "vista" since binaries for windows are not available for download at present? If so how to do? No, it won't work on Wind

Re: [tesseract-ocr] building on cygwin with training data

2015-08-02 Thread Marco Atzeri
On 8/2/2015 10:31 AM, ShreeDevi Kumar wrote: + tesseract-dev google group Thank you, Marco. I will download the training tools packages and give it a try. In future updates to the tesseract package, may I suggest packaging of more languages from 'tessdata' - https://github.com/tesseract-ocr

[tesseract-ocr] Updated: tesseract-ocr-3.04.00-2

2015-08-02 Thread Marco Atzeri
. Improved extensively by Google. It is released under the Apache License 2.0. HOMEPAGE https://github.com/tesseract-ocr/ Marco Atzeri If you have questions or comments, please send them to the cygwin mailing list at: cygwin (at) cygwin (dot) com .

Re: [tesseract-ocr] building on cygwin with training data

2015-08-02 Thread Marco Atzeri
On 7/29/2015 11:40 AM, ShreeDevi Kumar wrote: Marco, Thanks for building the training tools for cygwin. Till now just the additional binaries have been shipped as part of the tesseract package. With Tesseract 3.04.00 there are additional scripts provided to help with training. Google has also

[tesseract-ocr] building on cygwin with training data

2015-07-28 Thread Marco Atzeri
Hi, I just completed the build of tesseract-ocr-3.04.00 including the training portion. Attached is the patch I used, together with configure LIBS="$(pkg-config --libs icu-i18n)" to correctly include the icu dependency. From what I see, the additional steps make training make training-install
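
The build sequence mentioned in this message can be sketched roughly as follows, to be run from the top of the tesseract source tree. The plain `make` step before the training targets is an assumption, as are the helper name `build_with_training` and the `RUN` hook (set `RUN=echo` to print the commands instead of executing them); the patch referred to in the message is not reproduced here.

```shell
#!/bin/sh
# Sketch: configure with the ICU link flags, then build the training tools.
# Assumptions: run inside the tesseract source tree; "make" precedes the
# training targets; RUN is a hypothetical dry-run hook (RUN=echo).
build_with_training() {
    # Ask pkg-config for the ICU libraries; fall back to empty if unavailable.
    icu_libs=$(pkg-config --libs icu-i18n 2>/dev/null) || icu_libs=""
    ${RUN:-} ./configure LIBS="$icu_libs" &&
    ${RUN:-} make &&
    ${RUN:-} make training &&
    ${RUN:-} make training-install
}
```

The install step may need elevated privileges depending on the install prefix.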

Re: [tesseract-ocr] tesseract on cygwin

2015-07-27 Thread Marco Atzeri
https://github.com/tesseract-ocr/tesseract/issues/61 Closed: building tesseract under cygwin: training tools don't build #61 - sent from my phone. excuse the brevity and typos. On 27 Jul 2015 11:50, "Marco Atzeri" <marco.atz...@gmail.com> wrote: On 7/27/20

Re: [tesseract-ocr] tesseract on cygwin

2015-07-26 Thread Marco Atzeri
On 7/27/2015 4:54 AM, ShreeDevi Kumar wrote: Thank you, Marco. 1. Is there a way to download just the tesseract package and dependencies (like Simon had setup) for testing purposes for those who do not have a cygwin install? possible: The package is available on mirrors: http://mirrors.kern

[tesseract-ocr] Updated: tesseract-ocr-3.04.00-1

2015-07-26 Thread Marco Atzeri
://github.com/tesseract-ocr/ Marco Atzeri If you have questions or comments, please send them to the cygwin mailing list at: cygwin (at) cygwin (dot) com .

Re: [tesseract-ocr] tesseract on cygwin

2015-07-26 Thread marco . atzeri
On Friday, 24 July 2015 09:48:53 UTC+2, Simon Eigeldinger wrote: > hi, > sorry missed the point. > just reproduced it: > $ tesseract testing\eurotext.tif testing\eurotext -l eng+deu pdf > Tesseract Open Source OCR Engine v3.05.00dev with Leptonica > Page 1 > Error in f