[tesseract-ocr] Re: english-arabic dictionary - transliteration text

2024-03-29 Thread Tom Morris
Rather than using random web resources, I'd suggest using the official documentation. The most relevant section is probably this: https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html#fine-tuning-for--a-few-characters I would suggest starting with script/Latin for your base mo

[tesseract-ocr] Re: english-arabic dictionary - transliteration text

2024-03-28 Thread aum hren
h 28, 2024 at 2:45:39 PM UTC aum hren wrote: > olo company > > i am trying to ocr an old (1963) morocco arabic - english dictionary > > i have tried jTessBoxEditor for ocr, somehow managed to follow the info on > net, > but at the very end tesseract failed to make final _traind

[tesseract-ocr] english-arabic dictionary - transliteration text

2024-03-28 Thread aum hren
olo company, I am trying to OCR an old (1963) Moroccan Arabic-English dictionary. I have tried jTessBoxEditor for OCR and somehow managed to follow the info on the net, but at the very end Tesseract failed to make the final _traineddata_ files. My problem is that the book (dictionary) is basically in English

Re: [tesseract-ocr] Dictionary?

2023-11-19 Thread Des Bw
That is very interesting. I was expecting the dictionary to have a significant impact on the output, but I am getting no impact at all. Yes, my images are pretty good: a regular scanned (300 dpi) book, and I'm on Tesseract 5. Sure, I will dig into this forum, and also with the experimentation

Re: [tesseract-ocr] Dictionary?

2023-11-19 Thread Zdenko Podobny
AFAIR, there were tests with the legacy engine in which the improvement in result quality from dictionaries was measured at 10-15% for common text. However, adding a word to a dictionary has never ensured that Tesseract recognizes that word accurately. For non-word inputs (e.g. serial nu

[tesseract-ocr] Dictionary?

2023-11-19 Thread Des Bw
Does Tesseract actually use the dictionary (wordlist) included in the model (traineddata file)? I am not getting any difference/impact from including a dictionary (word list) in the file. Has anybody experimented with a dictionary setup? -- You received this message because you are

[tesseract-ocr] Re: Problem with deactivating dictionary in tesseract using Python

2023-03-01 Thread Vuh doo
and passing it (and an absolute path) to > pytesseract functions doesn't work!!! > > > On Friday, October 11, 2019, 10:32:12 (UTC+2), Sandra M. wrote: >> >> I'm trying to deactivate the tesseract dictionary, but I don't get it. >> I'm

[tesseract-ocr] Seems like the dictionary isn't used

2022-09-15 Thread צביקה הרמתי
Hi. 1. I have an image that's written in a "Science Fiction" style font, where 'E' is written similarly to '='. Therefore, the attached image is recognized as "AR= YOU SURE YOU WANT TO QuIT >". However, since Tesseract is using an English d

[tesseract-ocr] Custom dictionary

2021-10-12 Thread bambitous ttous
Hello, could you please tell me how I can use a custom dictionary composed of a few words? Thanks in advance. Best regards, Bambitous

[tesseract-ocr] How do I know my custom dictionary is being used?

2021-06-06 Thread Ricardo Moura
I'm trying to use a custom dictionary as follows: text = pytesseract.image_to_string(img, config='--psm 12 bazaar') where "bazaar" is a .txt file containing: load_system_dawg F, load_freq_dawg F, user_words_suffix user-words. The file eng.user-words is the new dictionary I've cre
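A minimal sketch of writing such a config file from Python, assuming Tesseract's plain-text config format (one variable per line, name and value separated by whitespace). The variable values are the ones from the post; the helper name write_tesseract_config is hypothetical. As far as I know, Tesseract looks for a named config file in tessdata/configs and also accepts a relative or absolute path, and with user_words_suffix set to user-words it expects the word list next to the traineddata as <lang>.user-words.

```python
from pathlib import Path
import tempfile


def write_tesseract_config(path, variables):
    """Write variables as 'name value' lines, one per line (hypothetical helper)."""
    path = Path(path)
    lines = [f"{name} {value}" for name, value in variables.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


# Values taken from the post: turn off the built-in word lists and
# load <lang>.user-words instead.
settings = {
    "load_system_dawg": "F",
    "load_freq_dawg": "F",
    "user_words_suffix": "user-words",
}

with tempfile.TemporaryDirectory() as tmp:
    cfg = write_tesseract_config(Path(tmp) / "bazaar", settings)
    print(cfg.read_text(encoding="utf-8"))
    # With pytesseract (not executed here), config file names go last:
    # pytesseract.image_to_string(img, config=f"--psm 12 {cfg}")
```

Whether the custom words then change the result depends on the engine; with the LSTM models, several threads in this archive report little or no visible effect.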

[tesseract-ocr] Extend the standard dictionary for a language with own words

2020-03-19 Thread Dayton
to extend the standard dictionary with my own words to improve the accuracy of the OCR. Thanks in advance for your help!

[tesseract-ocr] Re: Problem with deactivating dictionary in tesseract using Python

2020-01-08 Thread alexander sanchez diaz
te the tesseract dictionary, but I don't get it. I'm > using tesseract 5.0.0 and use the Python code below. I read about the > parameters load_system_dawg and load_freq_dawg to change them in the > config, but I don't know how to do this exactly. Can someone give me more

[tesseract-ocr] Problem with deactivating dictionary in tesseract using Python

2019-10-11 Thread 'Sandra M.' via tesseract-ocr
I'm trying to deactivate the Tesseract dictionary, but I can't get it to work. I'm using Tesseract 5.0.0 with the Python code below. I read about the parameters load_system_dawg and load_freq_dawg and changing them in the config, but I don't know exactly how to do this. Can
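A minimal sketch for the pytesseract case, assuming (as pytesseract does) that the config string is split and appended to the tesseract command line. The CLI's -c VAR=VALUE option sets a variable before initialization, so init-only variables such as load_system_dawg and load_freq_dawg should take effect this way without depending on where a config file is looked up. The helper name dictionary_off_config is hypothetical.

```python
import shlex


def dictionary_off_config(psm=None, extra_vars=None):
    """Build a tesseract config string that disables the built-in word lists."""
    variables = {"load_system_dawg": "0", "load_freq_dawg": "0"}
    variables.update(extra_vars or {})  # caller may override or add variables
    parts = []
    if psm is not None:
        parts += ["--psm", str(psm)]
    for name, value in variables.items():
        parts += ["-c", f"{name}={value}"]  # one -c flag per variable
    return shlex.join(parts)


cfg = dictionary_off_config(psm=6)
print(cfg)  # --psm 6 -c load_system_dawg=0 -c load_freq_dawg=0
# text = pytesseract.image_to_string(img, config=cfg)  # not executed here
```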

[tesseract-ocr] Re: How to add dictionary to training?

2019-08-04 Thread ElGato ElMago
I guess you don't train with a dictionary. You only use it when you read an image. On Saturday, August 3, 2019, 1:48:08 UTC+9, Mox Betex wrote: > > I want to do fine tuning and I want to add my dictionary of words. > How to do that, what file to create? > Do I need to add dictionary for

[tesseract-ocr] How to add dictionary to training?

2019-08-02 Thread Mox Betex
I want to do fine-tuning and I want to add my dictionary of words. How do I do that, and what file should I create? Do I need to add the dictionary for training or after?

[tesseract-ocr] How to turn off dictionary during OCR?

2019-07-31 Thread Mox Betex
Can I turn off the use of the dictionary during OCR, and if so, how?

[tesseract-ocr] Tesseract user dictionary

2019-07-15 Thread Kooshan Hashemifard
Dear Tesseract OCR team, I am trying to use a user dictionary for a language other than English and followed the instructions in the Tesseract manual. Although I tested it with an *English* user word list myself and it worked properly, the same procedure doesn't work for the FAS language given word list a

[tesseract-ocr] Tesseract Dictionary

2019-06-26 Thread raghad mosto
Hi. I am trying to correct Tesseract's results using its dictionary, but I don't know how I can access and use the dictionary of the language I used.

[tesseract-ocr] What's the size of the Dictionary for ara.traineddata?

2018-12-24 Thread Nouran S. Ahmad
Hi, I'm using tesseract-ocr 4 for Arabic, and I read the published research papers. I have a question about the dictionary used by the word recognizer: how many words are in the dictionary for Arabic? I am aware that there are fast and best traineddata files, I

[tesseract-ocr] Does tesseract dictionary support prefix/suffix words?

2018-07-29 Thread Ramast Magdy
For example, let's say a certain language writes "my book" as "mybook" without spaces. Then there are "hisbook", "herbook", "theirbook", and so on. It doesn't make sense to add all these words to the dictionary for each noun, right? Can I use asterisks or
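One brute-force workaround, sketched here under the assumption that the prefixes and nouns are known in advance: generate every combination programmatically and write the result as an ordinary word list, instead of typing each form by hand. The function name and sample words are hypothetical, and this is not a Tesseract affix feature.

```python
from itertools import product


def compound_words(prefixes, nouns):
    """Return every prefix+noun combination, sorted and de-duplicated."""
    return sorted({p + n for p, n in product(prefixes, nouns)})


prefixes = ["my", "his", "her", "their"]
nouns = ["book", "pen"]
words = compound_words(prefixes, nouns)
print("\n".join(words))  # one word per line, the usual word-list layout
```

The list grows as prefixes × nouns, so this stays practical only for modest vocabularies.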

[tesseract-ocr] How to train for certain dictionary?

2018-01-15 Thread Yonghan Ruan
Hi everyone! I've read the FAQ How Do I Provide My Own Dictionary <https://github.com/tesseract-ocr/tesseract/wiki/FAQ#how-do-i-provide-my-own-dictionary> but it is not for 4.0. All given pictures will be in the same font (but I don't know which one) and contain 30 words (or fewer, up

[tesseract-ocr] Re: How to improve the recognition of receipt (text not in words dictionary)

2017-07-13 Thread srnsp92
nce on > receipts, lots of text are not dictionary words. I disabled the > dictionaries, it increased the recognition rate, but it’s still low, I’d > like to create my own dictionary with the product catalog. > > Is there someone who can give the tutorial to do it ? > > Many th

[tesseract-ocr] Re: How to improve the recognition of receipt (text not in words dictionary)

2017-06-29 Thread sfo
Hello Laura! Could you please tell me how you disabled the dictionaries? On Tuesday, June 20, 2017 08:35:25 UTC+2, Laura wrote: > > Hi, I’m new on tesseract. I’m trying to recognize receipts. Since on > receipts, lots of text are not dictionary words. I disabled the > dicti

Re: [tesseract-ocr] How to improve the recognition of receipt (text not in words dictionary)

2017-06-20 Thread ShreeDevi Kumar
Laura wrote: > Hi, I’m new on tesseract. I’m trying to recognize receipts. Since on > receipts, lots of text are not dictionary words. I disabled the > dictionaries, it increased the recognition rate, but it’s still low, I’d > like to create my own dictionary with the product cata

[tesseract-ocr] How to improve the recognition of receipt (text not in words dictionary)

2017-06-19 Thread Laura
Hi, I'm new to Tesseract. I'm trying to recognize receipts. On receipts, a lot of the text is not dictionary words. I disabled the dictionaries, which increased the recognition rate, but it's still low. I'd like to create my own dictionary from the product catalog. Is there someone who can

[tesseract-ocr] Add new words to dictionary to tesseract LSTM - ara ?

2017-05-21 Thread Ahmad Moawad
Hello all, I want to add new words to the dictionary of Tesseract LSTM for Arabic. I will describe my steps; correct me if I am wrong: go to langdata/ara/ara.wordlist directly. Is this right?
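Editing langdata/ara/ara.wordlist is only half of the job: the text list then has to be compiled back into the model (for example with the training tools' combine_lang_model, or wordlist2dawg for a single dawg) before Tesseract sees it. The merge step itself can be sketched as below; the helper name merge_wordlist is hypothetical, and it assumes a UTF-8 file with one word per line.

```python
from pathlib import Path
import tempfile


def merge_wordlist(path, new_words):
    """Append words not already present; return the ones actually added."""
    path = Path(path)
    existing = path.read_text(encoding="utf-8").split() if path.exists() else []
    seen = set(existing)
    added = []
    for word in new_words:
        word = word.strip()
        if word and word not in seen:  # skip blanks and duplicates
            seen.add(word)
            added.append(word)
    path.write_text("\n".join(existing + added) + "\n", encoding="utf-8")
    return added


with tempfile.TemporaryDirectory() as tmp:
    wl = Path(tmp) / "ara.wordlist"
    wl.write_text("كتاب\nقلم\n", encoding="utf-8")
    added = merge_wordlist(wl, ["قلم", "مدرسة"])
    print(f"added {len(added)} new word(s)")  # added 1 new word(s)
```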

[tesseract-ocr] how to increase dictionary weight with v3.02 Windows library

2016-12-03 Thread Marie Sun
Hi, I am using the Tesseract v3.02 Windows libraries to create a VC++ console app (I couldn't find Windows libraries for a later version; if you do, please kindly tell me). I want to increase the strength of dictionary words, but setting tesseract::TessBaseAPI api; if (api.Init("

[tesseract-ocr] my own dictionary

2016-10-21 Thread Amaia Espinosa
Hi!!! I want to use Tesseract to read some words (about 300 words) in a project. These words are combinations of numbers and capital letters, but they are specific words. So I would like to know if I could use my own dictionary, instead of the English or Spanish one, to define the words I need
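For short codes made of capital letters and digits, a user-patterns file (the user_patterns_suffix variable, or --user-patterns on newer command lines) may fit better than a word list. The tesseract manual page describes escapes such as \A for an upper-case letter and \d for a digit; the sketch below, with hypothetical helper names, derives one pattern per distinct code shape from sample words. It keeps other characters literally and does not handle a literal backslash, and whether the patterns help for a given engine and font has to be tested.

```python
def code_pattern(word):
    """Map each character of a code to a user-pattern escape."""
    out = []
    for ch in word:
        if ch.isdigit():
            out.append(r"\d")  # any digit
        elif ch.isupper():
            out.append(r"\A")  # any upper-case letter
        else:
            out.append(ch)     # other characters kept literally
    return "".join(out)


def patterns_for(words):
    """One pattern per distinct code shape, in first-seen order."""
    return list(dict.fromkeys(code_pattern(w) for w in words))


samples = ["AB12C", "XY34Z", "Q7-88"]
for pattern in patterns_for(samples):
    print(pattern)  # one pattern per line, as in a <lang>.user-patterns file
```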

[tesseract-ocr] URGENT HELP NEEDED: False recognition due to Dictionary usage in Sanskrit

2016-06-24 Thread rohit saluja
Hi, I generated images in the Sanskrit 2003 font using the default text2image configs. I trained Tesseract using my own box files and compared results with and without the dictionary dawg. Using the dictionary dawg interestingly increases the word-level accuracy, but in certain

[tesseract-ocr] how to check tesseract static classifier output (not biased with dictionary) or best 10 matches (again unbiased) for a blob with polygonal approximations?

2016-06-11 Thread rohit saluja
Hi, how can I check the Tesseract static classifier output (not biased by the dictionary) or the best 10 matches (again unbiased) for a blob with polygonal approximations? Thanks in advance, Rohit

[tesseract-ocr] Re: Pretty bad result for non dictionary words

2016-05-14 Thread Tom Morris
On Thursday, May 12, 2016 at 5:39:23 PM UTC-4, Christian Koch wrote: > > > Are smaller texts a problem in general? > Yes. https://github.com/tesseract-ocr/tesseract/wiki/FAQ#is-there-a-minimum-text-size-it-wont-read-screen-text Tom

[tesseract-ocr] Re: Pretty bad result for non dictionary words

2016-05-12 Thread Christian Koch
Hi Rolf, thank you for your response. Is this the "right" way? I read that I should use proper settings in Tesseract rather than doing manual processing. Are smaller texts a problem in general? On Friday, May 6, 2016 06:41:06 UTC+2, Rolf Mertig wrote: > > If you resize with convert from Image

[tesseract-ocr] Re: Pretty bad result for non dictionary words

2016-05-05 Thread Rolf Mertig
If you resize with convert from ImageMagick (or any other tool): convert ocr.jpg -resize 150% ocr2.jpg and then tesseract ocr2.jpg ocr2 ; cat ocr2.txt gives ABC-DEF. On Thursday, May 5, 2016 14:23:13 UTC+2, Christian Koch wrote: > > I try to recoginze product codes written in images. > The result

[tesseract-ocr] Pretty bad result for non dictionary words

2016-05-05 Thread Christian Koch
I'm trying to recognize product codes in images. The results with Tesseract 3.04.00 are pretty bad. Even a primitive example (see attachment) doesn't work: instead of "ABC-DEF" I get "AECVDEF". The example works *flawlessly* in gocr, but I guess I'm just using wrong settings or something s

[tesseract-ocr] How to Add My Custom Dictionary

2016-02-18 Thread hypostases
Hello, is there a location/dir/folder where I can place my custom dictionary of English words that seem not to be in the tesseract-ocr data? Or what is the procedure for adding such words to Tesseract? Thank you. Hypo

[tesseract-ocr] How tesseract use dictionary

2016-02-11 Thread Virtuaklem
Hi all, I have a question about how Tesseract uses the dictionary. I have a user-words dictionary with one word: PROJECT. I have trained Tesseract on my own handwriting. When I test it, Tesseract chooses PRODECT. I have tried multiple parameters, such as penalties, but with no effect

[tesseract-ocr] Can I use Tesseract dictionary to fix non-dictionary word?

2015-08-21 Thread Jakub Dolecki
ob overall, but fails to determine that "reiiability" should be "reliability" (among a few other words, but I'm curious about this case in particular). Can you please explain to me why Tesseract fails to find the dictionary word? Assuming I cannot fix this discrepa
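If the output has to be snapped to known words regardless, a post-processing step outside Tesseract can do it. The sketch below uses plain Levenshtein edit distance (a standard technique, not a Tesseract feature) and shows why this case is easy to catch: "reiiability" differs from "reliability" by a single substitution (i for l). The function names, the sample vocabulary and the max_distance of 2 are assumptions.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def nearest_word(token, vocabulary, max_distance=2):
    """Closest word in a non-empty vocabulary, or the token if none is close."""
    best = min(vocabulary, key=lambda w: levenshtein(token, w))
    return best if levenshtein(token, best) <= max_distance else token


vocab = ["reliability", "readability", "liability"]
print(levenshtein("reiiability", "reliability"))  # 1
print(nearest_word("reiiability", vocab))         # reliability
```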

[tesseract-ocr] Re: Adding a custom dictionary

2015-06-02 Thread noha radwan
Hello, I tried following the approach from this post: stackoverflow.com/questions/9568165/custom-dictionary-for-tesseract However, it doesn't seem to make any difference. Please correct me if I am wrong, but the way I understand it is as follows: when following that approach, I basically

Re: [tesseract-ocr] Unable to locate dictionary files

2015-02-02 Thread Sashank gondala
Thanks a ton! On Monday, 2 February 2015 21:53:12 UTC+5:30, shree wrote: > > > https://code.google.com/p/tesseract-ocr/source/browse/?repo=langdata#git%2Feng > > https://code.google.com/p/tesseract-ocr/source/browse?repo=tessdata#git > > > http://tesseract-ocr.googlecode.com/svn-history/trunk/doc/

Re: [tesseract-ocr] Unable to locate dictionary files

2015-02-02 Thread ShreeDevi Kumar
seract manually and also downloaded english > training data and put it in the corresponding directory. But I am unable to > locate several dictionary files like freq_dawg, and other similar files. > Where are they located? > > Also, Which c++ files in the source code access these

[tesseract-ocr] Unable to locate dictionary files

2015-02-02 Thread Sashank gondala
Hello, I have installed Tesseract manually and also downloaded the English training data and put it in the corresponding directory. But I am unable to locate several dictionary files like freq_dawg and other similar files. Where are they located? Also, which C++ files in the source

[tesseract-ocr] Re: How do I make a dictionary from set of char images and characters corresponding to them?

2015-01-14 Thread Quan Nguyen
f black and white images... > > a.png > b.png > c.png > > etc... > > How do I teach tesseract those characters into new dictionary? > > Best regards, > FlashT

[tesseract-ocr] Re: How do I make a dictionary from set of char images and characters corresponding to them?

2015-01-12 Thread Flash Thunder
PS. I tried to see what jTessBoxEditor does, but the output traineddata doesn't seem to be correct... When I use it, the application crashes with the error: tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in file ..\..\classify\adaptmatch.cpp, line 555 Generated files attached. It is ca

[tesseract-ocr] How do I make a dictionary from set of char images and characters corresponding to them?

2015-01-12 Thread Flash Thunder
I have a set of black-and-white images... a.png b.png c.png etc. How do I teach Tesseract these characters into a new dictionary? Best regards, FlashT

Re: [tesseract-ocr] How can i disable the main dictionary (Visual Basic)

2015-01-10 Thread Allistair
the main dictionary (Visual Basic). > I know that i have to set the *init only* parameter load_system_dawg to 0. > > > I know how to set *non init only* parameters like tessedit_char_whitelist. > > tess = New Tesseract() > tess.Init("tessdata", "

[tesseract-ocr] How can i disable the main dictionary (Visual Basic)

2015-01-10 Thread Markus H.
Hi, I want to disable the main dictionary (Visual Basic). I know that I have to set the *init only* parameter load_system_dawg to 0. I know how to set *non init only* parameters like tessedit_char_whitelist. tess = New Tesseract() tess.Init("tessdata"

[tesseract-ocr] Re: Enable Dictionary

2014-10-19 Thread bulkinvk
"tessedit_enable_dict_correction": has anybody worked with this parameter?

[tesseract-ocr] Re: Enable Dictionary

2014-10-16 Thread bulkinvk
Does anybody know why the Tesseract dictionary doesn't work? How can I enable it?

[tesseract-ocr] Enable Dictionary

2014-10-13 Thread bulkinvk
I use the Tesseract wrapper for C# (charlesw <https://github.com/charlesw>). How can I enable the dictionary for better results?

[tesseract-ocr] Could I add my own Chinese dictionary into the Tesseract?

2014-10-13 Thread yx wang
Hi all, I am developing an app using Tesseract to recognize some Chinese characters, but I find the results often include some impossible words, even though the candidate characters may be correct. So I am trying to add my own dictionary to Tesseract. I am using version 3.01. Is it possible? What

Re: [tesseract-ocr] retrieve words not matching the dictionary

2014-07-07 Thread Meenal Goyal
Goyal wrote: > > If you're sure that all the words you will encounter will be in the > > dictionary this should help somewhat: > > https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_to_increase_the_trust_in/strength_of_the_dictionary? >

Re: [tesseract-ocr] retrieve words not matching the dictionary

2014-07-04 Thread Nick White
On Fri, Jul 04, 2014 at 02:08:46AM -0700, Meenal Goyal wrote: > If you're sure that all the words you will encounter will be in the > dictionary this should help somewhat: > https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_to_increase_the_trust_in/strength_

Re: [tesseract-ocr] retrieve words not matching the dictionary

2014-07-04 Thread Meenal Goyal
I have already > tried them. > > I wanted to know if anything can be done to improve output at later > stage, > > something like adding the words to the dictionary used by tesseract. > > OK, I see. The reason I recommended binarisation is that I suspect > you'll

Re: [tesseract-ocr] retrieve words not matching the dictionary

2014-07-03 Thread Nick White
output at later stage, > something like adding the words to the dictionary used by tesseract. OK, I see. The reason I recommended binarisation is that I suspect you'll have a lot more luck with that than anything else, for your problems. > I have tried listing words in eng.user-wor

Re: [tesseract-ocr] retrieve words not matching the dictionary

2014-07-03 Thread Meenal Goyal
Hi Nick, the post "question about training tesseract" only suggests some preprocessing steps, including binarisation, and I have already tried them. I wanted to know if anything can be done to improve the output at a later stage, something like adding the words to the dictiona

Re: [tesseract-ocr] retrieve words not matching the dictionary

2014-07-02 Thread Nick White
That's a tough thing to preprocess. Take a look at this recent thread on this list: "question about training tesseract". Nick On Tue, Jul 01, 2014 at 11:48:07PM -0700, Meenal Goyal wrote: > Hi Nick, > > I have read that post earlier and also tried to preprocess the image. This is > the input im

Re: [tesseract-ocr] retrieve words not matching the dictionary

2014-07-01 Thread Meenal Goyal
Hi Nick, I read that post earlier and also tried to preprocess the image. This is the input image: http://imgur.com/yCxOvQS,GD38rCa and after preprocessing it gives this: http://imgur.com/JzrDkug . I wanted to know if there is some way to improve the results in the post-processing phase. Right now I am using

Re: [tesseract-ocr] retrieve words not matching the dictionary

2014-07-01 Thread Nick White
Hi Meena, On Tue, Jul 01, 2014 at 02:04:36AM -0700, Meenal Goyal wrote: > When I try to ocr an image, it also produces some noise apart from the > meaningful words. An example output for an image is: > > All women become > > like their’ mqthers. _ ' 1"’ ' > > - —T at-{rs their tragedy. ” "R"-‘

Re: [tesseract-ocr] retrieve words not matching the dictionary

2014-07-01 Thread Meenal Goyal
oyal wrote: > > When i run tesseract on my image, it produces some words not present in > the > > dictionary. Is there some way to directly get the list of these words > and > > prevent tesseract from showing them in the output. > > Example of such words are:

Re: [tesseract-ocr] retrieve words not matching the dictionary

2014-06-30 Thread Nick White
Hi Meenal, On Mon, Jun 30, 2014 at 01:40:10AM -0700, Meenal Goyal wrote: > When i run tesseract on my image, it produces some words not present in the > dictionary. Is there some way to directly get the list of these words and > prevent tesseract from showing them in the output. >

[tesseract-ocr] retrieve words not matching the dictionary

2014-06-30 Thread Meenal Goyal
Hi, when I run Tesseract on my image, it produces some words that are not present in the dictionary. Is there some way to directly get the list of these words and prevent Tesseract from showing them in the output? Examples of such words are: fiJfifilnlflfiflhu-«fifllfllfilfi, neefls», oscxmwxufis, etc.
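One way to get that list is a post-processing pass over the OCR text, outside Tesseract. The sketch below assumes a plain set of known lower-case words (for example loaded from a word list file) and splits on whitespace, stripping surrounding punctuation before the lookup; tokens that are pure punctuation or digits are kept. The function name is hypothetical, and the tokenization is a simplifying assumption that will not suit every language.

```python
import string

STRIP_CHARS = string.punctuation + "«»“”‘’"


def split_by_dictionary(text, known_words):
    """Return (text with unknown tokens removed, list of unknown tokens)."""
    kept, unknown = [], []
    for token in text.split():
        core = token.strip(STRIP_CHARS).lower()
        if not core or core.isdigit() or core in known_words:
            kept.append(token)
        else:
            unknown.append(token)
    return " ".join(kept), unknown


known = {"all", "women", "become", "like", "their"}
kept, unknown = split_by_dictionary("All women become like their mqthers. oscxmwxufis", known)
print(kept)     # All women become like their
print(unknown)  # ['mqthers.', 'oscxmwxufis']
```

Note that a misrecognized real word ("mqthers" for "mothers") is dropped just like noise, so this trades recall for cleaner output.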

Re: Working with dictionary

2014-03-25 Thread Nick White
dn't get the > accurate > results, So I added dictionary in my training data file. I created lang. > word-dawg and lang.freq-word-dawg and combined them to training file. > In testing with new trained data files I got similar results. I can see no > change in recognition of t

Working with dictionary

2014-03-25 Thread temp name
Hello, I trained Tesseract for a new language. In my testing I didn't get accurate results, so I added a dictionary to my training data file. I created lang.word-dawg and lang.freq-word-dawg and combined them into the training file. In testing with the new traineddata files I got similar resul

Increase trust in dictionary with SetVariable

2013-11-08 Thread Cvetomir Todorov
Hello, I have created my own dictionary and I'd like to increase the trust in it to try to improve results (I didn't really notice any improvements with parameters' default values). According to the FAQ: "For tesseract-ocr >= 3.01 try i
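The variables usually cited for this are, to my knowledge, language_model_penalty_non_freq_dict_word and language_model_penalty_non_dict_word. They belong to the legacy (non-LSTM) language model, so they are only expected to matter with --oem 0 and a traineddata file that still contains the legacy components. The sketch below only assembles a tesseract command line; the helper name, the image/output names and the penalty values are illustrative assumptions, not recommended settings.

```python
import shlex


def dict_trust_command(image, outbase, penalties, oem=0):
    """Assemble a tesseract command line that sets dictionary penalties."""
    cmd = ["tesseract", image, outbase, "--oem", str(oem)]
    for name, value in penalties.items():
        cmd += ["-c", f"{name}={value}"]
    return cmd


# Illustrative values only: higher penalties make non-dictionary words costlier.
penalties = {
    "language_model_penalty_non_freq_dict_word": 0.5,
    "language_model_penalty_non_dict_word": 0.8,
}
cmd = dict_trust_command("page.tif", "page", penalties)
print(shlex.join(cmd))
# subprocess.run(cmd, check=True)  # only with a legacy-capable traineddata
```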

Re: Worse results when using user dictionary and pattern files.

2013-08-10 Thread urfan . alimov
Hi, I have the same type of problem. Did you manage to get accurate results with user-words and user-patterns files? Basically, I have some constant text in my documents, and I want to detect this constant text more accurately. Thank you,

Re: Force tesseract to recognize Words from Dictionary

2013-07-09 Thread JT Booth
See the documentation at http://tesseract-ocr.googlecode.com/svn-history/r725/trunk/doc/tesseract.1.html . You'll want to do just what the example does: suppress the default dictionary and supply your own. Check the tesseract FAQ for how to increase the confidence in the dicti

Re: Force tesseract to recognize Words from Dictionary

2013-04-17 Thread Timo Tischler
No one got an answer?? :(

Re: Performing OCR on a Phonetic Dictionary

2013-01-18 Thread 50295
won't work. > > > > Now if only I could get my hands on the Abbyy Fine Reader project file > ... I'd > > represent each phoneme by a unique character for a start and go from > there. > > > > On Wednesday, January 16, 2013 3:20:04 PM UTC,

Re: Performing OCR on a Phonetic Dictionary

2013-01-17 Thread Nick White
er project file ... I'd > represent each phoneme by a unique character for a start and go from there. > > On Wednesday, January 16, 2013 3:20:04 PM UTC, sventech wrote: > > That particular dictionary has already been OCRed with Abbyy Fine Reader: > http://archive.o

Re: Performing OCR on a Phonetic Dictionary

2013-01-16 Thread 50295
dnesday, January 16, 2013 3:20:04 PM UTC, sventech wrote: > > That particular dictionary has already been OCRed with Abbyy Fine Reader: > > http://archive.org/stream/everymansenglish00jone/everymansenglish00jone_djvu.txt > > Although not perfect, a little cleanup would render that text

Re: Performing OCR on a Phonetic Dictionary

2013-01-16 Thread Sven Pedersen
That particular dictionary has already been OCRed with Abbyy Fine Reader: http://archive.org/stream/everymansenglish00jone/everymansenglish00jone_djvu.txt Although not perfect, a little cleanup would render that text quite usable. --Sven On Wed, Jan 16, 2013 at 8:44 AM, Sven Pedersen wrote

Re: Performing OCR on a Phonetic Dictionary

2013-01-16 Thread Sven Pedersen
You would need to train tesseract to recognize those symbols. The web page outlines how to do that. --Sven On Tue, Jan 15, 2013 at 6:43 PM, <50...@web.de> wrote: > Is Tesseract-OCR capable of recognizing phonetic symbols? I would like to > extract the phonetic transcriptions of the following (ou

Performing OCR on a Phonetic Dictionary

2013-01-15 Thread 50295
Is Tesseract-OCR capable of recognizing phonetic symbols? I would like to extract the phonetic transcriptions of the following (out of copyright) document http://archive.org/stream/everymansenglish00jone#page/2/mode/2up Regards, - Olumide

Re: Tesseract training problems and dictionary problems

2012-12-27 Thread Andy
lled word at :Bounding box=(2307,959)->(2345,972) >>Found 28583 good blobs and 1026 unlabelled blobs in 0 words. >>74 remaining unlabelled words deleted. >> TRAINING ... Font name = arial >> Generated training data for 5943 words >> >> >> On Tue, Aug 23,

Force tesseract to recognize Words from Dictionary

2012-12-12 Thread Timo Tischler
Hi, I want to use tesseract-ocr to recognize nutrition facts from food. Tesseract doesn't recognize the data I want very well, so my question is whether it is possible to force Tesseract to pick a word from a (custom) dictionary. I want Tesseract to only recognize a custom s
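Since it is not clear that Tesseract can be forced to output only dictionary words, a common workaround is to let it recognize freely and then snap each token to the closest entry of the small custom list in post-processing. This sketch uses Python's standard difflib rather than anything in Tesseract; the function name, the cutoff of 0.75 and the sample vocabulary are assumptions.

```python
import difflib


def snap_to_vocabulary(tokens, vocabulary, cutoff=0.75):
    """Replace each token with its closest vocabulary entry, or None if nothing is close."""
    snapped = []
    for token in tokens:
        matches = difflib.get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
        snapped.append(matches[0] if matches else None)
    return snapped


vocab = ["calories", "protein", "carbohydrate", "sodium", "sugars"]
print(snap_to_vocabulary(["Calorles", "Prote1n", "xq"], vocab))
# ['calories', 'protein', None]
```

Returning None instead of forcing a match lets the caller decide what to do with tokens that resemble nothing in the list.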

Re: Can I configure Tesseract to *always* match a dictionary word?

2012-11-15 Thread Zdenko Podobný
Regarding "user_patterns_suffix", have a look at the tesseract manual page [1]. I am not sure whether it is possible to force Tesseract to choose OCR output from the dictionary (I never tried it ;-)), but you can increase dictionary strength with the variables language_model_penalty_non_freq_dic

Re: Training tesseract 3.01 with new font, for reading non dictionary strings - ideal training text?

2012-10-25 Thread Gaara Sabaku
For your purposes, a simple approach will yield the best results. The reason repeating letters is recommended is that Tesseract does not train or read well with small samples, due to its approximation/heuristic methods. As Tesseract processes the image it improves upon itself and then takes a s

Re: Training tesseract 3.01 with new font, for reading non dictionary strings - ideal training text?

2012-10-21 Thread Adam Chapam
@Andres, I am afraid I do not know the answer to your question, having only looked into the internals of Tesseract since last week. My follow-up email was purely based on an afternoon of unscientific trial and error, but I am interested enough to do further research and will post anything useful

Re: Training tesseract 3.01 with new font, for reading non dictionary strings - ideal training text?

2012-10-21 Thread Nick White
Hi Adam, thanks for writing with so much detail. It was interesting to read. On Fri, Oct 19, 2012 at 02:22:44AM -0700, Adam Chapam wrote: > I can follow the training wiki and produce working traineddata files, and have > written a .net app to automate creating tif/box pairs from a font file, (i > k

Re: Training tesseract 3.01 with new font, for reading non dictionary strings - ideal training text?

2012-10-19 Thread Andres
I thought that "abcdefghijklmn..." was not a good idea because of the segmentation problem (e.g. r followed by n interpreted as m: rn -> m). So, since in my project I do the character segmentation myself, I was always using "abcdefghijklmn..." for training. It would be very interesting to know

Re: Training tesseract 3.01 with new font, for reading non dictionary strings - ideal training text?

2012-10-19 Thread Adam Chapam
Just a quick follow-up. I have spent the day running tests. I tried using the data linked above, pages from books, and simple (not recommended) ABCDEFG etc., but found I get the best results by randomly generating strings with a simple algorithm that outputs characters in strings ranging from 1 to
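The exact algorithm and upper length are cut off in this archive, so the following is only a guess at the idea: random strings of random length drawn from the characters to be trained, one string per line. The character set, the maximum length of 8, the seed and the function name are all assumptions.

```python
import random
import string


def random_training_lines(charset, n_lines, max_len=8, seed=None):
    """Generate n_lines random strings, each 1..max_len characters from charset."""
    rng = random.Random(seed)  # seeded generator for reproducible training text
    lines = []
    for _ in range(n_lines):
        length = rng.randint(1, max_len)  # inclusive on both ends
        lines.append("".join(rng.choice(charset) for _ in range(length)))
    return lines


charset = string.ascii_uppercase + string.digits
for line in random_training_lines(charset, 5, seed=42):
    print(line)
```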

Training tesseract 3.01 with new font, for reading non dictionary strings - ideal training text?

2012-10-19 Thread Adam Chapam
-processing I do. I read up on unicharambigs, but as either letter may be present and there will be no dictionary words for it to take a hint from, that option seems unavailable to me. I tried segmenting myself and processing one char at a time, but it still confused the same chars. The other thing

Re: Can I configure Tesseract to *always* match a dictionary word?

2012-09-03 Thread ms
Aidano, did you manage to solve this problem? We have the exact same question and would really be interested in any solutions, thanks. On Thursday, March 22, 2012 8:37:44 AM UTC+8, aidano wrote: > > I'd like to configure tesseract with a small dictionary (~200 words) and > tell it to

Re: Feature request: ranking of dictionary word frequency

2012-08-23 Thread Zdenko Podobný
On 23.08.2012 13:08, Nick White wrote: > A great addition to training would be if one dictionary file was > used, combining freq-words and all-words, and a relative frequency > probability score was given to each word. This would allow more > fine-grained scoring based on

Re: Feature request: ranking of dictionary word frequency

2012-08-23 Thread Sven Pedersen
> A great addition to training would be if one dictionary file was > used, combining freq-words and all-words, and a relative frequency > probability score was given to each word. This would allow more > fine-grained scoring based on exactly how likely the word is to > appear, which

Feature request: ranking of dictionary word frequency

2012-08-23 Thread Nick White
A great addition to training would be if one dictionary file was used, combining freq-words and all-words, and a relative frequency probability score was given to each word. This would allow more fine-grained scoring based on exactly how likely the word is to appear, which would be a win

Re: Dictionary

2012-07-11 Thread Nick White
On Tue, Jul 10, 2012 at 02:11:11AM -0700, Umair Anjum wrote: > Actually I am using Tesseract for the Urdu language, and Urdu does not require > dictionary files, so I want to exclude all the dictionary-related functions, > which are not being used but are called each time and increase th

Re: Dictionary

2012-07-10 Thread Umair Anjum
Hello, actually I am using Tesseract for the Urdu language, and Urdu does not require dictionary files, so I want to exclude all the dictionary-related functions, which are not being used but are called each time and increase the execution time. That's why I want to exclude all dictionary

Re: Dictionary

2012-07-09 Thread Nick White
On Mon, Jul 09, 2012 at 02:26:37AM -0700, Umair Anjum wrote: > Actually, I want to disable all calls to dictionary classes > because I want to improve the system's recognition time. Yes, that is what Zdenko's advice will do. If you want more fine-grained control over which functions

Re: Dictionary

2012-07-09 Thread Umair Anjum
Hello, actually I want to disable all calls to dictionary classes because I want to improve the system's recognition time. Thanks in advance. On Saturday, 7 July 2012 14:42:37 UTC+5, zdpo wrote: > > try to set these variables (found in 3.02) to false: > load_system_dawg > lo

Re: Dictionary

2012-07-07 Thread zdenko podobny
dictionaries than in 3.01. See http://www.sk-spell.sk.cx/first-notes-for-tesseract-ocr-302-traning -- Zdenko On Sat, Jul 7, 2012 at 10:15 AM, Umair Anjum wrote: > Hello, > > I am using Tesseract 3.01 and I want to disable the dictionary. > Is there any way to do it? > > Thanks in advance >

Dictionary

2012-07-07 Thread Umair Anjum
Hello, I am using Tesseract 3.01 and I want to disable the dictionary. Is there any way to do it? Thanks in advance

Re: Specifying different dictionary files [was: Getting usable source files from traineddata files]

2012-04-18 Thread Nick White
nd an older version of the language), it would be useful to only > > use one set of dictionary files (rather than presumably the union of > > grc & ell, in the above example). > > > > I wonder if there's any good way of integrating this functionality > > in to tess

Re: Specifying different dictionary files [was: Getting usable source files from traineddata files]

2012-04-18 Thread Nick White
nd an older version of the language), it would be useful to only > > use one set of dictionary files (rather than presumably the union of > > grc & ell, in the above example). > > The main difficult thing for you will be any characters that are not > already trained. There'

Re: Specifying different dictionary files [was: Getting usable source files from traineddata files]

2012-04-17 Thread David Eger
output -l grc+ell > > Ah, that's a very good idea, and will indeed be useful. However for > my usecase (a script which is mostly the same, but with additions, > and an older version of the language), it would be useful to only > use one set of dictionary files (rather than presumably th

Re: Specifying different dictionary files [was: Getting usable source files from traineddata files]

2012-04-17 Thread zdenko podobny
tesseract image output -l grc+ell > > Ah, that's a very good idea, and will indeed be useful. However for > my usecase (a script which is mostly the same, but with additions, > and an older version of the language), it would be useful to only > use one set of dictionary files (r

Specifying different dictionary files [was: Getting usable source files from traineddata files]

2012-04-17 Thread Nick White
indeed be useful. However for my usecase (a script which is mostly the same, but with additions, and an older version of the language), it would be useful to only use one set of dictionary files (rather than presumably the union of grc & ell, in the above example). I wonder if there's any go

Can I configure Tesseract to *always* match a dictionary word?

2012-03-21 Thread aidano
I'd like to configure tesseract with a small dictionary (~200 words) and tell it to always choose the best match in the dictionary. Is that possible? Also, when inspecting the source code I saw a variable in dict.h called "user_patterns_suffix". Is there any documentation around

Decrease strength of Dictionary in Tesseract 3

2012-02-02 Thread Will
How do I decrease the strength of the dictionary in Tesseract 3? In the FAQ it says I need to change the value of "NON_WERD" and "GARBAGE_STRING", but they do not exist in Tesseract 3. So, how is it done in Tesseract 3? Thanks in advance

How does one increase the strength of/trust in the dictionary in Tesseract 3.01?

2011-12-28 Thread Richard Warfield
This question is asked in the FAQ, but the answer seems to be out of date: "Try upping NON_WERD and GARBAGE_STRING in dict/permute.cpp to maybe 3 or even 5. You could also try lowering ClassPrunerThreshold in classify/intmatcher.cpp to about 200 from 229." As best I can tell, none of these vari
