$ combine_tessdata -u ./third_party/tesseract/tessdata/kan.traineddata ./kan.
Extracting tessdata components from ./third_party/tesseract/tessdata/kan.traineddata
Wrote ./kan.unicharset
Wrote ./kan.inttemp
Wrote ./kan.pffmtable
Wrote ./kan.normproto
Wrote ./kan.punc-dawg
Wrote ./kan.word-dawg
Wrote ./kan.number-dawg
Wrote ./kan.freq-dawg
$ ls kan.*
kan.freq-dawg  kan.inttemp  kan.normproto  kan.number-dawg  kan.pffmtable  kan.punc-dawg  kan.unicharset  kan.word-dawg
$ dawg2wordlist kan.unicharset kan.word-dawg word.wordlist
Loading word list from kan.word-dawg
Reading squished dawg
Word list loaded.
$ wc -l word.wordlist
18720 word.wordlist

Looks like there are 18,720 words in the Kannada word dawg, safely uncompressed...

On Mar 7, 8:43 am, "Sriranga(78yrs)" <withblessing.sriranga.1...@gmail.com> wrote:
> David,
> Just now I also checked kan.punc-dawg (1 KB) and kan.number-dawg (1 KB).
> They work fine: in both cases the output was not empty. Only
> word-dawg (181 KB) and freq-dawg (2 KB) do not work; for those, the Windows
> message saying the exe had encountered a problem was displayed.
> I bring this to your kind notice. I have also attached kan.word-dawg
> and kan.freq-dawg for your investigation and valuable guidance.
> With warmest regards,
> -sriranga(79yrs)
>
> On Wed, Mar 7, 2012 at 9:44 AM, Sriranga(78yrs) <withblessing.sriranga.1...@gmail.com> wrote:
>
> > David,
> > Thanks for the valuable guidance.
> > I copied dawg2wordlist.exe and pasted it into the folder n:\Newfolder\, where
> > the extracted files Kan.unicharset, kan.word-dawg and kan.freq-dawg are located.
> >
> > An extract of the cmd session is reproduced below. The Windows message saying
> > the exe had encountered a problem was displayed for word-dawg and freq-dawg.
> >
> > M:\New Folder>dawg2wordlist.exe -h
> > Print all the words in a given dawg.
> > Usage: dawg2wordlist.exe <unicharset> <dawgfile> <wordlistfile>
> >
> > M:\New Folder>dawg2wordlist.exe kan.unicharset kan.word-dawg testwordlist
> > Loading word list from kan.word-dawg
> > Reading squished dawg
> >
> > M:\New Folder>dawg2wordlist.exe kan.unicharset kan.freq-dawg testwordlist
> > Loading word list from kan.freq-dawg
> > Reading squished dawg
> > Word list loaded.
> >
> > M:\New Folder>
> >
> > [Note: testwordlist is 0 KB (empty) for kan.freq-dawg, which is 2 KB,
> > whereas no testwordlist was generated at all for kan.word-dawg, which
> > is 181 KB.]
> > Awaiting further valuable guidance.
> > With regards,
> > -sriranga(79yrs)
> >
> > I still do not understand where I made a mistake.
> > With regards,
> > -sriranga(79yrs)
> >
> > On Wed, Mar 7, 2012 at 2:41 AM, David Eger <david.e...@gmail.com> wrote:
> >
> >> Where you put wordlist2dawg.exe, try putting the name of the output list
> >> instead.
> >>
> >> On Friday, March 2, 2012 2:39:33 AM UTC-8, sriranga(79yrsold) wrote:
> >>
> >>> I have extracted kan.word-dawg from Kan.traineddata. I am trying to
> >>> convert the dawg to a wordlist using the command line in cmd as follows:
> >>>
> >>> M:\r684\BuildFolder\tesseract-ocr>dawg2wordlist "m:\New Folder\kan.unicharset" "m:\New Folder\kan.word-dawg" wordlist2dawg.exe
> >>> Loading word list from m:\New Folder\kan.word-dawg
> >>> Reading squished dawg
> >>>
> >>> M:\r684\BuildFolder\tesseract-ocr>
> >>>
> >>> Unfortunately, the Windows message saying the exe had encountered a
> >>> problem was displayed. Where did I make a mistake?
> >>> Awaiting a solution.
> >>
> >> --
> >> You received this message because you are subscribed to the Google
> >> Groups "tesseract-ocr" group.
> >> To post to this group, send email to tesseract-ocr@googlegroups.com
> >> To unsubscribe from this group, send email to
> >> tesseract-ocr+unsubscr...@googlegroups.com
> >> For more options, visit this group at
> >> http://groups.google.com/group/tesseract-ocr?hl=en
>
> [Attachments: kan.word-dawg (243K), kan.freq-dawg (2K), kan.punc-dawg (< 1K), kan.number-dawg (< 1K)]
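The combine_tessdata / dawg2wordlist session at the top of this message can be wrapped in a small shell function. This is a minimal sketch, assuming the Tesseract 3.x command-line tools are on the PATH and using only the argument orders shown in this thread (`combine_tessdata -u <traineddata> <prefix>` and `dawg2wordlist <unicharset> <dawgfile> <wordlistfile>`). The function name `dump_dawgs` and the `<lang>.<dawg>.wordlist` output names are hypothetical choices, not part of Tesseract. It also spells out David's point: the third argument to dawg2wordlist is the name of the wordlist file to write, not `wordlist2dawg.exe`.

```shell
#!/bin/sh
# Sketch only (assumption): dump every dawg in a Tesseract 3.x .traineddata
# file to a plain-text wordlist, mirroring the session quoted in this thread.
# Argument orders used (as shown in the thread):
#   combine_tessdata -u <traineddata> <output-prefix>
#   dawg2wordlist <unicharset> <dawgfile> <wordlistfile>
# "dump_dawgs" and the "<lang>.<dawg>.wordlist" names are hypothetical.

dump_dawgs() {
  traineddata=$1
  # e.g. ./third_party/tesseract/tessdata/kan.traineddata -> kan
  lang=$(basename "$traineddata" .traineddata)

  # Unpack all components into the current directory:
  # kan.unicharset, kan.word-dawg, kan.freq-dawg, ...
  combine_tessdata -u "$traineddata" "./$lang." || return 1

  for dawg in word-dawg freq-dawg punc-dawg number-dawg; do
    # Skip dawgs that this traineddata file does not contain.
    [ -f "$lang.$dawg" ] || continue
    out="$lang.$dawg.wordlist"

    # Third argument is the OUTPUT file name, never a program name
    # such as wordlist2dawg.exe.
    if ! dawg2wordlist "$lang.unicharset" "$lang.$dawg" "$out"; then
      # A crash (like the Windows error above) shows up as a non-zero exit.
      echo "$dawg: dawg2wordlist exited with an error" >&2
      continue
    fi

    # One word per line, so the line count is the word count (as with wc -l).
    echo "$dawg: $(wc -l < "$out" | tr -d ' ') words"
  done
}
```

For example, running `dump_dawgs ./third_party/tesseract/tessdata/kan.traineddata` in an empty directory should leave `kan.word-dawg.wordlist` (and one file per other dawg) next to the extracted components and print one word count per dawg. Quoting every variable keeps paths containing spaces, such as `M:\New Folder`, intact as single arguments.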