Here are a few more details from an email exchange for those that are interested:
Command line is just what is between quotes here: “combine tessdata/eng.”

I forgot to mention that the source code for combine, which I have not modified, expects the “eng.” to be part of each input file name. So people like you and me have to copy the files into tessdata and rename them, for example:

copy normproto tessdata\eng.normproto

Note that the name "combine" might not be permanent. Ray Smith might produce something else in the official distribution when he wraps up the training implementation.

On Sep 4, 8:37 am, 74yrs old <[email protected]> wrote:
> Dear Steve,
> I am extremely thankful to you for your valuable clarification. But still I
> could not understand/confusion re:
> "command line is just what is between quotes here: “combine tessdata/eng"
>
> It is presumed that *all datafiles-without prefix like eng*. should be
> generated as usual - as done in tesseract 2.04 and then *run single line
> commandline* as follows: " *combine tessdata/eng* " for tesseract 3.0.
>
> Kindly excuse me for giving you trouble, since I am not programmer nor
> developer.
>
> With Warmest Regards,
> -sriranga(76yrsold)
>
> On Fri, Sep 4, 2009 at 8:22 PM, Pohorsky, Steve <[email protected]> wrote:
>
> > >>see below. Will also post parts to tesseract group.
> >
> > *Steve Pohorsky*
> > Tel +1 818 493 3432
> > Fax +1 818 362 5851
> > *[email protected] <[email protected]>*
> > ------------------------------
> >
> > *From:* 74yrs old [mailto:[email protected]]
> > *Sent:* Friday, September 04, 2009 1:34 AM
> > *To:* [email protected]
> > *Cc:* Pohorsky, Steve
> > *Subject:* Re: A vcproj file for building the traineddata files for 3.0
> >
> > SteveP,
> > Appreciated for the detailed instructions how to generate combine.exe.
> > Thanks for the same.
> > I followed your guidance
> > > "right-click on "Solution 'tesseract'" in the Solution Explorer pane,
> > > select Add,
> > > select Existing Project...
> > >
> > > In the dialog box that comes up, navigate to the folder that has
> > > combine.vcproj in it,
> > > select this vcproj file and click on Open.
> > > If this worked, you should see "combine" as a new project in the
> > > solution."
> > As a result of the re-compilation (Build batch -select all -Clean -rebuild
> > all) in VC++2008
> >
> > cntraining - 0 error(s), 11 warning(s)
> > ========== Rebuild All: 35 succeeded, 1 failed, 0 skipped ==========
> > Note: I could not understand "1 failed" - which one failed?
> >
> > >>Click in the Output pane and do a Find for “error”.
> >
> > In the bin.dbg = 7exe files generated including combine.exe appeared.
> > In the Main folder = 6exe(release) generated. Copied combine.exe from
> > bin.dbg and
> > pasted under Main folder. Thus total 7 exe files[6 exe release +one exe
> > dbg] existed.
> >
> > Tested tesseract photest.tif phtest logfile = phtest.txt reproduced
> > correctly from tif file.
> >
> > Regarding generating combine.exe: As per your guidance
> > ">To run this exe, it needs to run with the working
> > > directory set to the folder that has the tessdata folder in it. The
> > > easiest way to do this is to copy the exe to that folder"
> > Whether copy "combine.exe" found in bin.dbg can be pasted into folder
> > "tessdata" ?
> >
> > >> Not into tessdata, but into the folder above it, the one that contains
> > the tessdata folder.
> >
> > Because I don't know which are files of DLLs to copied into bin.dbg.
> > It is presumed that six files of DLLs are of Lepton like Jpeg62.dll,
> > libimage.dll, librle3.dll, leptonlib.dll,
> > libpng13.dll, libtiff3.dll. plus "tessdata" folder have to be copied into
> > bin.dbg.
> >
> > >> I was referring to what Ray S wrote in the README on the wiki site, “all
> > DLLs except tessdll”.
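Steve's answer (the folder above tessdata, not tessdata itself) follows from combine opening its inputs through relative paths such as tessdata/eng.unicharset, which the C runtime resolves against the current working directory. A minimal sketch of a pre-flight check for that condition is below. `has_subdir` and `check_working_dir` are hypothetical helpers, not part of combine or tesseract, and the sketch assumes the POSIX-style `stat()` call; the MSVC runtime also offers `stat()` in `<sys/stat.h>` but may lack the `S_ISDIR` macro, hence the fallback definition.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Some MSVC runtimes lack S_ISDIR; derive it from the file-type bits. */
#ifndef S_ISDIR
#define S_ISDIR(m) (((m) & S_IFMT) == S_IFDIR)
#endif

/* Hypothetical helper: returns 1 if `name` is a directory relative to
 * the current working directory, 0 if it is missing or not a directory. */
int has_subdir(const char *name) {
    struct stat st;
    if (stat(name, &st) != 0) {
        return 0;
    }
    return S_ISDIR(st.st_mode) ? 1 : 0;
}

/* Hypothetical pre-flight check before running combine: returns 0 if a
 * "tessdata" folder is visible from here, otherwise prints a hint and
 * returns 1. Started from inside tessdata, "tessdata/eng.unicharset"
 * would resolve to tessdata/tessdata/eng.unicharset, which is why the
 * exe belongs in the folder above tessdata. */
int check_working_dir(void) {
    if (has_subdir("tessdata")) {
        return 0;
    }
    fprintf(stderr, "No tessdata folder in the working directory; run "
                    "combine from the folder that contains tessdata.\n");
    return 1;
}
```

Calling `check_working_dir()` first in a small wrapper program would catch the "copied into tessdata" mistake before combine reports missing files.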
> >
> > Further, It is presumed that to run combine.exe - the command line( example
> > for English datafiles)
> > should be as follows:
> > " combine tessdata/eng.freq-dawg, tessdata/eng.user-words,
> > tessdata/eng.word-dawg,
> > tessdata/eng.inttemp, tessdata/eng.normproto, tessdata/eng.pffmtable,
> > tessdata/eng.unicharset,
> > tessdata/eng.DangAmbigs (output)eng.traineddata "
> >
> > >> No, the command line is just what is between quotes here: “combine
> > tessdata/eng.”
> >
> > >> All of the suffixes are in the source code; that is why they are not
> > specified on the command line.
> >
> > >> Note that “DangAmbigs” is the old name. For 3.0, the tesseract source code for
> > combine (I did not write it) uses “unicharambigs”.
> >
> > Kindly confirm above my presumptions.
> >
> > With Regards,
> > -sriranga(76yrs old)
> >
> > On Fri, Sep 4, 2009 at 6:09 AM, SteveP <[email protected]> wrote:
> > > This vcproj does not build in Release, so only try building in Debug.
> > > On Sep 3, 2:26 pm, SteveP <[email protected]> wrote:
> > > I was asked for more details. I will give what I can for those
> > > interested, but I think there may still be some other information we
> > > need from Ray Smith.
> > >
> > > First, copy the file combine.vcproj to the same folder that has
> > > tesseract.sln in it.
> > > (If you like, you may make a copy of tesseract.sln as a backup, since
> > > the steps below update that file.)
> > > To update that file, you may add combine.vcproj to the tesseract
> > > solution as follows. With this solution open in Visual Studio 2008,
> > > right-click on "Solution 'tesseract'" in the Solution Explorer pane,
> > > select Add,
> > > select Existing Project...
> > >
> > > In the dialog box that comes up, navigate to the folder that has
> > > combine.vcproj in it,
> > > select this vcproj file and click on Open.
> > > If this worked, you should see "combine" as a new project in the
> > > solution.
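"All of the suffixes are in the source code" means the single argument is only a prefix: combine appends each suffix to it, as a plain strcat-style concatenation. The sketch below illustrates that under stated assumptions: `make_input_path` and `show_paths` are hypothetical names, and the real combine source may differ in detail. It shows why the trailing period matters (without it the unicharset path becomes tessdata/engunicharset) and why a file still named eng.DangAmbigs would not be found, since the appended suffix is unicharambigs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring the strcat-style path building:
 * out = prefix + suffix. Returns 0 on success, or -1 if out_size is
 * too small for the result plus its terminating '\0'. */
int make_input_path(char *out, size_t out_size,
                    const char *prefix, const char *suffix) {
    size_t need = strlen(prefix) + strlen(suffix) + 1;
    if (need > out_size) {
        return -1;
    }
    strcpy(out, prefix);
    strcat(out, suffix);
    return 0;
}

/* Prints the paths produced for a few suffixes from the 3.0 list. */
void show_paths(const char *prefix) {
    static const char *const suffixes[] = {"unicharset", "unicharambigs",
                                           "inttemp", "normproto"};
    char path[256];
    size_t i;
    for (i = 0; i < sizeof suffixes / sizeof suffixes[0]; ++i) {
        if (make_input_path(path, sizeof path, prefix, suffixes[i]) == 0) {
            printf("%s\n", path);
        }
    }
}
```

With this sketch, `show_paths("tessdata/eng.")` prints tessdata/eng.unicharset, tessdata/eng.unicharambigs, tessdata/eng.inttemp and tessdata/eng.normproto, while `show_paths("tessdata/eng")` prints tessdata/engunicharset and so on, none of which match the files copied into tessdata.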
> > >
> > > You may right-click on this new project and select Build to build it
> > > and produce combine.exe. To run this exe, it needs to run with the working
> > > directory set to the folder that has the tessdata folder in it. The
> > > easiest way to do this is to copy the exe to that folder if it is not
> > > already there, but if your exe file is in the bin.dbg folder, you can
> > > alternatively follow Ray Smith's suggestion to copy the tessdata
> > > folder and the DLLs to bin.dbg.
> > >
> > > To run combine.exe after preparing the necessary files (see below),
> > > follow the Usage in the source code:
> > > "Usage: %s language_data_path_prefix (e.g. tessdata/eng.)",
> > > which means the following command for English:
> > > combine tessdata/eng.
> > > Here the final period is part of the command since the input files for
> > > combine include that period. It is part of language_data_path_prefix.
> > >
> > > So what are the necessary files, and which files are optional?
> > > To quote the source code, the file paths are a concatenation (as in
> > > strcat) of language_data_path_prefix and the suffixes shown below.
> > > Note that some of the names are different from 2.04. Except for the
> > > unicharset file, all of the files are optional as far as combine is
> > > concerned. Thus for English, tessdata/eng.unicharset is a required
> > > file, and files such as tessdata/eng.inttemp would also come from
> > > training just as before tesseract 3.0.
> > >
> > > As of 9-3-2009, there still appear to be missing details or
> > > instructions on some of the new files, such as punc-dawg. (FYI, Ray
> > > Smith is the expert; I am just summarizing what I see in the source
> > > code.)
> > >
> > > //Suffixes of input files (most optional) used to build traineddata
> > > file.
> > > static const char kLangConfigFileSuffix[] = "config";
> > > static const char kUnicharsetFileSuffix[] = "unicharset";
> > > static const char kAmbigsFileSuffix[] = "unicharambigs";
> > > static const char kBuiltInTemplatesFileSuffix[] = "inttemp";
> > > static const char kBuiltInCutoffsFileSuffix[] = "pffmtable";
> > > static const char kNormProtoFileSuffix[] = "normproto";
> > > static const char kPuncDawgFileSuffix[] = "punc-dawg";
> > > static const char kSystemDawgFileSuffix[] = "word-dawg";
> > > static const char kNumberDawgFileSuffix[] = "number-dawg";
> > > static const char kFreqDawgFileSuffix[] = "freq-dawg";
> > >
> > > On Sep 2, 5:32 pm, SteveP <[email protected]> wrote:
> > > > I uploaded a vcproj file named combine.vcproj in the Files area for
> > > > Windows users for tesseract 3.0. Ray Smith said the code was there to
> > > > build the traineddata files. This builds that code. This vcproj goes
> > > > in the top folder where tesseract.vcproj is.
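Since only the unicharset file is required and the other inputs are optional, it can help to see which of the listed files actually exist for a given prefix before running combine. A minimal sketch follows, assuming the suffix list quoted above is complete for this snapshot of the 3.0 source. `report_inputs` is a hypothetical helper, not something combine provides, and it only checks that each file can be opened, not that its contents are valid. It is written C89-style (declarations at the top of each block) so it should also compile as C under VC++2008.

```c
#include <stdio.h>
#include <string.h>

/* Suffix list as quoted above from the combine source (Sep 2009). */
static const char *const kSuffixes[] = {
    "config", "unicharset", "unicharambigs", "inttemp", "pffmtable",
    "normproto", "punc-dawg", "word-dawg", "number-dawg", "freq-dawg"
};

/* Hypothetical pre-flight check, not part of combine: tries to open
 * prefix + suffix for every suffix and prints what it finds.
 * Returns the number of files found, or -1 if <prefix>unicharset,
 * the one file combine requires, is missing. */
int report_inputs(const char *prefix) {
    char path[512];
    size_t i;
    size_t count = sizeof kSuffixes / sizeof kSuffixes[0];
    int found = 0;
    int have_unicharset = 0;

    for (i = 0; i < count; ++i) {
        FILE *f;
        if (strlen(prefix) + strlen(kSuffixes[i]) + 1 > sizeof path) {
            continue;  /* path too long for the buffer; treated as missing */
        }
        strcpy(path, prefix);
        strcat(path, kSuffixes[i]);  /* plain concatenation, as with strcat */
        f = fopen(path, "rb");
        if (f == NULL) {
            printf("missing  %s\n", path);
            continue;
        }
        fclose(f);
        ++found;
        if (strcmp(kSuffixes[i], "unicharset") == 0) {
            have_unicharset = 1;
        }
        printf("found    %s\n", path);
    }
    return have_unicharset ? found : -1;
}
```

Run from the folder above tessdata, `report_inputs("tessdata/eng.")` would list each expected file as found or missing, and a negative return means combine cannot succeed yet.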
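Steve's note at the top of this post asks for the unprefixed files produced by the usual training steps to be copied into tessdata under their eng.-prefixed names (copy normproto tessdata\eng.normproto). A minimal C sketch of that copy-and-rename step follows. `prefixed_name` and `copy_file` are hypothetical helpers, not tesseract functions. The sketch uses a forward slash, as combine's own usage string does (tessdata/eng.), which the Windows C runtime also accepts.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: out = dir + "/" + lang + "." + name, e.g.
 * ("tessdata", "eng", "normproto") -> "tessdata/eng.normproto".
 * Returns 0 on success, -1 if out_size is too small. */
int prefixed_name(char *out, size_t out_size, const char *dir,
                  const char *lang, const char *name) {
    size_t need = strlen(dir) + 1 + strlen(lang) + 1 + strlen(name) + 1;
    if (need > out_size) {
        return -1;
    }
    strcpy(out, dir);
    strcat(out, "/");
    strcat(out, lang);
    strcat(out, ".");
    strcat(out, name);
    return 0;
}

/* Byte-for-byte copy, the C counterpart of "copy src dst".
 * Returns 0 on success, -1 if either file cannot be opened or an
 * I/O error occurs. */
int copy_file(const char *src, const char *dst) {
    char buf[4096];
    size_t got;
    int rc = 0;
    FILE *in;
    FILE *out;

    in = fopen(src, "rb");
    if (in == NULL) {
        return -1;
    }
    out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    while ((got = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, got, out) != got) {
            rc = -1;
            break;
        }
    }
    if (ferror(in)) {
        rc = -1;
    }
    fclose(in);
    if (fclose(out) != 0) {
        rc = -1;
    }
    return rc;
}
```

For each training output actually produced (for example normproto, inttemp, pffmtable and unicharset), one would call `prefixed_name(dst, sizeof dst, "tessdata", "eng", name)` and then `copy_file(name, dst)`.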

