Hello. I am experimenting with even smaller values for max_num_edges and reserved_edges in the wordlist2dawg.cpp file. Values that are too small don't seem to work at all (they result in a crash and a core dump).
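
To make the memory argument from the quoted thread concrete, here is a rough sketch of how the size of the edge array scales when the edge count is reduced by a factor of 10. All constants are hypothetical placeholders: the thread does not quote the real values at lines 39-40 of training/wordlist2dawg.cpp, and the per-edge size is an assumption. The sketch does not explain why very small values crash.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the two constants named in the thread.
// The real values in training/wordlist2dawg.cpp are NOT quoted here.
constexpr std::int64_t kAssumedMaxNumEdges = 10000000;
constexpr std::int64_t kAssumedReservedEdges = 1000000;

// Assumed size of one edge record in bytes (illustrative only).
constexpr std::int64_t kAssumedBytesPerEdge = 8;

// Scale an edge count down by an integer factor ("reduce by a factor of 10").
constexpr std::int64_t ScaleDown(std::int64_t edges, std::int64_t factor) {
  return edges / factor;
}

// Bytes needed for an array holding `edges` records of the given size.
constexpr std::int64_t ArrayBytes(std::int64_t edges,
                                  std::int64_t bytes_per_edge) {
  return edges * bytes_per_edge;
}

// True if such an array fits into `physical_bytes` of memory, i.e. the
// program would not have to page the array in and out from disk.
constexpr bool FitsInMemory(std::int64_t edges, std::int64_t bytes_per_edge,
                            std::int64_t physical_bytes) {
  return ArrayBytes(edges, bytes_per_edge) <= physical_bytes;
}
```

With these placeholder numbers, ArrayBytes(kAssumedMaxNumEdges, kAssumedBytesPerEdge) is ten times larger than ArrayBytes(ScaleDown(kAssumedMaxNumEdges, 10), kAssumedBytesPerEdge), so the reduced array may fit in memory where the original does not.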

Separately, for those of us without dictionaries (for example, when training for a script that has no dictionary, only numbers), could someone provide the "empty" DAWG files, e.g. ones generated from an empty dictionary?

Thank you.
-Kristian K

On 1 Oct, 10:34, Ngu Soon Hui <[EMAIL PROTECTED]> wrote:
> Thanks, Ray
>
> On Oct 1, 10:00 am, "Ray Smith" <[EMAIL PROTECTED]> wrote:
> > That's what I meant by grinds to a halt and runs very slowly. It is
> > initializing an array bigger than your physical memory, which means it has
> > to page in and out from disk, which is very inefficient. Someone else had
> > the same problem, and the fix I suggested worked.
> > Ray.
> >
> > On Tue, Sep 30, 2008 at 4:04 AM, Ngu Soon Hui <[EMAIL PROTECTED]> wrote:
> > > Thanks Ray,
> > >
> > > The situation I encountered is this: I left the freq_words_list and
> > > words_list blank, and ran the command
> > >
> > > wordlist2dawg frequent_words_list freq-dawg
> > >
> > > What happened was the program didn't finish running even after a day.
> > > There must be something wrong with this
> > >
> > > On Sep 30, 10:06 am, "Ray Smith" <[EMAIL PROTECTED]> wrote:
> > > > There is a memory problem with the currently released wordlist2dawg.
> > > > If you don't have something more than 1GB of memory, then your system
> > > > grinds to a halt and it runs very slowly. Reduce both max_num_edges
> > > > and reserved_edges by a factor of 10 at line 39-40 of
> > > > training/wordlist2dawg.cpp and rebuild.
> > > > Ray.
> > > >
> > > > On Mon, Sep 29, 2008 at 2:00 AM, Ngu Soon Hui <[EMAIL PROTECTED]> wrote:
> > > > > Hi, 74yrs old.
> > > > >
> > > > > Thanks for this. I am going to investigate this further.
> > > > >
> > > > > On Sep 28, 11:53 pm, "74yrs old" <[EMAIL PROTECTED]> wrote:
> > > > > > Soon,
> > > > > > It is clarified that it is mandatory to generate datafiles using
> > > > > > wordlist2dawg frequent_words_list.txt freq-dawg by running
> > > > > > wordlist2dawg words_list.txt word-dawg irrespective of fact
> > > > > > whether frequent_words_list.txt as well as words_list.txt are
> > > > > > empty or not.
> > > > > > After generating datafiles of freq-dawg or words-dawg, relevant
> > > > > > datafiles will contain some encrypted words even though
> > > > > > words_list.txt were empty.
> > > > > >
> > > > > > > you can leave
> > > > > > > frequent_words_list.txt and word_list.txt blank, but NOT
> > > > > > > datafiles freq-dawg and word-dawg
> > > > > >
> > > > > > I hope your doubts are cleared?
> > > > > > -sriranga
> > > > > >
> > > > > > On Sat, Sep 27, 2008 at 4:27 PM, Ngu Soon Hui <[EMAIL PROTECTED]> wrote:
> > > > > > > Referring to the dictionary data section in
> > > > > > > http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract.
> > > > > > > May I know how to make DAWG dictionary files? The command line is
> > > > > > >
> > > > > > > wordlist2dawg frequent_words_list freq-dawg
> > > > > > > wordlist2dawg words_list word-dawg
> > > > > > >
> > > > > > > So how do I get frequent_words_list and words_list? Can they be
> > > > > > > left blank or do we need to provide such word lists? I tried to
> > > > > > > leave frequent_words_list and word_list blank, but the whole
> > > > > > > program seems to hang; there was no response for over an hour.

