Hello.
I have been experimenting with even smaller values in the wordlist2dawg.cpp
file. Values that are too small don't seem to work at all (they result in
a crash and a core dump).

But for those of us without dictionaries (one reason is training for a
script that doesn't have a dictionary, for example digits only), could
someone provide the "empty" dawg files, e.g. generated from an
empty dictionary?
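
For anyone who wants to try this locally, the empty-dictionary run discussed
in the quoted thread below can be sketched roughly as follows. The file names
are the ones used in the thread; the PATH check is only an assumption added
here so the script does not fail when wordlist2dawg is not installed. Per the
replies below, an unpatched build may run very slowly on a machine with
little memory.

```shell
#!/bin/sh
# Create (or truncate to) empty word lists, using the names from the thread.
: > frequent_words_list
: > words_list

# Only run the tool if it is actually available on this machine.
if command -v wordlist2dawg >/dev/null 2>&1; then
    # The two invocations quoted in the thread.
    wordlist2dawg frequent_words_list freq-dawg
    wordlist2dawg words_list word-dawg
else
    echo "wordlist2dawg not found on PATH; skipping dawg generation" >&2
fi
```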

Thank you.
-Kristian K

On 1 Oct, 10:34, Ngu Soon Hui <[EMAIL PROTECTED]> wrote:
> Thanks, Ray
>
> On Oct 1, 10:00 am, "Ray Smith" <[EMAIL PROTECTED]> wrote:
>
> > That's what I meant by grinds to a halt and runs very slowly. It is
> > initializing an array bigger than your physical memory, which means it has
> > to page in and out from disk, which is very inefficient. Someone else had
> > the same problem, and the fix I suggested worked.
> > Ray.
>
> > On Tue, Sep 30, 2008 at 4:04 AM, Ngu Soon Hui <[EMAIL PROTECTED]> wrote:
>
> > > Thanks Ray,
>
> > > The situation I encountered is this: I left the freq_words_list and
> > > words_list blank, and ran the command
>
> > > wordlist2dawg frequent_words_list freq-dawg
>
> > > What happened was the program didn't finish running even after a day.
> > > There must be something wrong with this.
>
> > > On Sep 30, 10:06 am, "Ray Smith" <[EMAIL PROTECTED]> wrote:
> > > > > There is a memory problem with the currently released wordlist2dawg. If you
> > > > > don't have something more than 1GB of memory, then your system grinds to a
> > > > > halt and it runs very slowly. Reduce both max_num_edges and reserved_edges by
> > > > > a factor of 10 at line 39-40 of training/wordlist2dawg.cpp and rebuild.
> > > > > Ray.
>
> > > > > On Mon, Sep 29, 2008 at 2:00 AM, Ngu Soon Hui <[EMAIL PROTECTED]> wrote:
>
> > > > > Hi, 74yrs old.
>
> > > > > Thanks for this. I am going to investigate this further.
>
> > > > > On Sep 28, 11:53 pm, "74yrs old" <[EMAIL PROTECTED]> wrote:
> > > > > > Soon,
> > > > > > > To clarify: it is mandatory to generate the data files by running
> > > > > > > wordlist2dawg frequent_words_list.txt freq-dawg and
> > > > > > > wordlist2dawg words_list.txt word-dawg, irrespective of whether
> > > > > > > frequent_words_list.txt and words_list.txt are empty or not.
> > > > > > > After generating freq-dawg and word-dawg, the resulting data files
> > > > > > > will contain some encrypted-looking data even though words_list.txt
> > > > > > > was empty.
>
> > > > > > > You can leave frequent_words_list.txt and words_list.txt blank,
> > > > > > > but NOT the data files freq-dawg and word-dawg.
> > > > > > > I hope that clears up your doubts.
> > > > > > -sriranga
>
> > > > > > > On Sat, Sep 27, 2008 at 4:27 PM, Ngu Soon Hui <[EMAIL PROTECTED]> wrote:
>
> > > > > > > > Referring to the dictionary data section in
> > > > > > > > http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract. May I
> > > > > > > > know how to make DAWG dictionary files? The command line is
>
> > > > > > > wordlist2dawg frequent_words_list freq-dawg
> > > > > > > wordlist2dawg words_list word-dawg
>
> > > > > > > > So how do I get frequent_words_list and words_list? Can they be left
> > > > > > > > blank, or do we need to provide such word lists? I tried to leave
> > > > > > > > frequent_words_list and words_list blank, but the whole program seems
> > > > > > > > to hang; there was no response for over an hour.