On Wed, Mar 7, 2012 at 7:55 PM, Sriranga(78yrs)
<withblessing.sriranga.1...@gmail.com> wrote:
> David,
> Thank you for the valuable guidance. I followed your steps still problem of
> window's exe encounter -  vide screenshot is attached. WinXP(sp3)  tesseract
> -r-700
> With warmest regards,
> -sriranga(79yrs)
>
>
> On Thu, Mar 8, 2012 at 12:42 AM, David Eger <david.e...@gmail.com> wrote:
>>
>> $ combine_tessdata -u ./third_party/tesseract/tessdata/kan.traineddata ./kan.
>> Extracting tessdata components from ./third_party/tesseract/tessdata/kan.traineddata
>> Wrote ./kan.unicharset
>> Wrote ./kan.inttemp
>> Wrote ./kan.pffmtable
>> Wrote ./kan.normproto
>> Wrote ./kan.punc-dawg
>> Wrote ./kan.word-dawg
>> Wrote ./kan.number-dawg
>> Wrote ./kan.freq-dawg
>>
>> $ ls kan.*
>> kan.freq-dawg  kan.inttemp  kan.normproto  kan.number-dawg
>>  kan.pffmtable  kan.punc-dawg  kan.unicharset  kan.word-dawg
>>
>> $ dawg2wordlist kan.unicharset kan.word-dawg word.wordlist
>> Loading word list from kan.word-dawg
>> Reading squished dawg
>> Word list loaded.
>>
>> $ wc -l word.wordlist
>> 18720 word.wordlist
>>
>> Looks like there are 18,720 words in the Kannada word dawg, safely
>> uncompressed...
>>
>>
>>
>> On Mar 7, 8:43 am, "Sriranga(78yrs)" <withblessing.sriranga.1...@gmail.com> wrote:
>> > David,
>> > just now I checked with kan.punc-dawg(1KB) and kan.number-dawg(1KB)
>> > also.
>> > it works fine In both cases the output were not empty. Only
>> > word-dawg(181KB) and freq-dawg(2KB) does not work but with M$ windows's
>> > exe
>> > encounter message were displayed.
>> > this is brought to your kind notice. Even attached files of
>> > kan.word-dawg
>> > and kan.freq.dawg - for your investigation and valuable guidance.
>> > With warmest regards,
>> > -sriranga(79yrs)
>> >
>> > On Wed, Mar 7, 2012 at 9:44 AM, Sriranga(78yrs)
>> > <withblessing.sriranga.1...@gmail.com> wrote:
>> > > David,
>> > > Thanks for the valuable guidance.
>> > > Copied dawg2wordlist.exe pasted in the folder n:\Newfolder\ wherein
>> > > extracted files  Kan.unicharset, kan.word-dawg, kan.freq-dawg are
>> > > located.
>> >
>> > > extract of cmd is reproduced below - with encounter.exe windows
>> > > messages
>> > > displayed for word-dawg and freq-dawg.
>> > > M:\New Folder>dawg2wordlist.exe -h
>> > > Print all the words in a given dawg.
>> > > Usage: dawg2wordlist.exe <unicharset> <dawgfile> <wordlistfile>
>> >
>> > > M:\New Folder>dawg2wordlist.exe kan.unicharset kan.word-dawg
>> > > testwordlist
>> > > Loading word list from kan.word-dawg
>> > > Reading squished dawg
>> >
>> > > M:\New Folder>dawg2wordlist.exe kan.unicharset kan.freq-dawg
>> > > testwordlist
>> > > Loading word list from kan.freq-dawg
>> > > Reading squished dawg
>> > > Word list loaded.
>> > > M:\New Folder>
>> >
>> > >    [Note: testwordlist contains 0(zero)kb for kan.freq-dawg which
>> > > contains
>> > > 2KB -
>> > >      whereas testwordlist did not generate for kan.word-dawg which
>> > > contains 181KB]
>> > > Awaiting further valuable guidance.
>> > > With regards,
>> > > -sriranga(79yrs)
>> >
>> > > Still i could not understand where I made mistake?
>> > > With regards,
>> > > -sriranga(79yrs)
>> >
>> > > On Wed, Mar 7, 2012 at 2:41 AM, David Eger <david.e...@gmail.com>
>> > > wrote:
>> >
>> > >> Where you put wordlist2dawg.exe, try putting the name of the output
>> > >> list
>> > >> instead.
>> >
>> > >> On Friday, March 2, 2012 2:39:33 AM UTC-8, sriranga(79yrsold) wrote:
>> >
>> > >>> I had extracted kan.word-dawg from the Kan.traineddata. I am trying
>> > >>> to
>> > >>> convert dawg to wordlist using commandline in cmd as follows:
>> >
>> > >>> ***M:\r684\BuildFolder\tesseract-ocr>dawg2wordlist "m:\New
>> > >>> Folder\kan.unicharset" "
>> > >>> m:\New Folder\kan.word-dawg" wordlist2dawg.exe
>> > >>> Loading word list from m:\New Folder\kan.word-dawg
>> > >>> Reading squished dawg
>> >
>> > >>> M:\r684\BuildFolder\tesseract-ocr>
>> > >>> *
>> > >>> Unfortunately windows encounter exe displayed. Where I made a
>> > >>> mistake?
>> > >>> Awaiting solution?
>> >
>> >
>> >  kan.word-dawg (243K)
>> >  kan.freq-dawg (2K)
>> >  kan.punc-dawg (< 1K)
>> >  kan.number-dawg (< 1K)

Just looking at the screenshot you supplied, it starts with an ERROR
message about TESSDATA_PREFIX not correctly pointing to the parent
folder of the tessdata folder.

Have you fixed this by setting TESSDATA_PREFIX? This is prominently
mentioned in the README [1]. It should now probably point at your SVN
working directory (and make sure it ends with a / character).
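If it helps, here is a rough Python sketch for sanity-checking the
variable before running any of the tools. It only tests the two things
mentioned above (a trailing separator, and a tessdata folder directly
underneath). The function name check_tessdata_prefix is made up for
this example and is not part of tesseract.

```python
import os


def check_tessdata_prefix(prefix):
    """Return a list of problems found with a TESSDATA_PREFIX value.

    Hypothetical helper for illustration; not part of tesseract.
    """
    if not prefix:
        return ["TESSDATA_PREFIX is not set"]
    problems = []
    # The value is expected to end with a path separator.
    if not prefix.endswith(("/", "\\")):
        problems.append("value does not end with a / character")
    # It should name the parent of the tessdata folder, not tessdata itself.
    if not os.path.isdir(os.path.join(prefix, "tessdata")):
        problems.append("no tessdata folder found directly under this path")
    return problems


if __name__ == "__main__":
    value = os.environ.get("TESSDATA_PREFIX", "")
    for problem in check_tessdata_prefix(value):
        print("Problem:", problem)
```

An empty list means both checks passed; it does not prove that the
language files inside tessdata are themselves usable.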

And sorry to say, if you keep running into problems like this, you
might want to think about learning to use the Visual Studio 2008
Debugger :) It's pretty easy, and very handy for figuring out exactly
where a program crashes.

1) You already know how to build tesseract with VS, so just set your
   build configuration to LIB_Debug (when debugging the training apps).

2) Make the training app project you are trying to debug (in this
   case dawg2wordlist) the default Startup project (by right-clicking
   it and choosing Set as Startup Project).

3) Open up the training app project's properties (by again
   right-clicking it and choosing Properties).

4) Make sure the Configuration: drop-down at the top is set to LIB_Debug.

5) In the Configuration Properties | Debugging Category, set the
   following fields:

   Command Arguments: (whatever you specified on the command line) so set it to:

      kan.unicharset kan.word-dawg word.wordlist

   Working Directory should be your working directory so:

      M:\New Folder\New Folder

   (a terrible name for folders BTW :P )

6) Now for the exciting part, right-click the dawg2wordlist project and
   choose Debug -> Start new instance from the popup menu.

A new command window will show up (possibly hidden by Visual Studio),
displaying all of dawg2wordlist's output.

When the program crashes, you should see a window in the debugger that
shows exactly where the program was when it crashed and what the error
reason is. From that either you (hopefully) or we can better figure
out what is going wrong.
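Short of the debugger, a small script can at least record what happened
on each run. The sketch below is only an illustration in Python: it
runs a command, then reports the exit code and whether the output file
exists and how big it is. The name run_and_report is hypothetical, and
I am assuming a crash shows up as a nonzero exit code, which is usual
but not guaranteed.

```python
import os
import subprocess


def run_and_report(command, output_file, cwd=None):
    """Run a command, then report its exit code and the output file size.

    Hypothetical helper for illustration. Delete any output file left
    over from an earlier run first, or the size reported may be stale.
    """
    result = subprocess.run(command, cwd=cwd, capture_output=True, text=True)
    # Resolve the output file relative to the working directory, if given.
    path = os.path.join(cwd, output_file) if cwd else output_file
    size = os.path.getsize(path) if os.path.exists(path) else None
    return {
        "exit_code": result.returncode,  # nonzero usually means failure
        "stdout": result.stdout,
        "stderr": result.stderr,
        "output_size": size,  # None if the file was never written
    }
```

For the case in this thread you would pass the same arguments as on the
command line, for example
run_and_report(["dawg2wordlist", "kan.unicharset", "kan.word-dawg",
"word.wordlist"], "word.wordlist", cwd=your_working_directory), and
then look at exit_code and output_size in the result.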

[1] http://code.google.com/p/tesseract-ocr/wiki/ReadMe

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesseract-ocr@googlegroups.com
To unsubscribe from this group, send email to
tesseract-ocr+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
