Thanks for the replies. As I said before, I was reluctant to modify the TESSDATA_PREFIX environment variable, since it could hypothetically interfere with another installation of Tesseract the user might have. After thinking it over, I realized that the simplest solution was just to ignore the whole problem. If the variable existed, Tesseract would try to use the pre-installed language files. Otherwise, it would use the language files in the directory it was run from.
This was still less than ideal, since I wanted to do some custom training, and my custom language data would be bypassed whenever another installation had set the variable. Eventually, I just tweaked the code to ignore TESSDATA_PREFIX altogether and assume the language data was stored in a fixed location relative to the engine. Peachy!

On May 21, 12:43 pm, Dmitri Silaev <[email protected]> wrote:
> Maybe you don't need all these details but at least this can be useful
> for other forum users. The rules are a bit complex.
>
> As Quan said, Tesseract first looks if the TESSDATA_PREFIX environment
> variable exists. If it does, Tess then appends to it the string stored
> in the "m_data_sub_dir" param. The resulting dir name is where Tess
> looks for lang files.
>
> Now the hard part:
> - If the environment variable does not exist, under Windows Tess
> checks if it's run via a DLL or linked statically into an EXE.
> Whichever is in effect, its location is taken as a base directory. The
> name of the DLL being checked is stored in the "tessedit_module_name"
> config param, default is "tessdll.dll".
> - If for some reason Tess cannot obtain executable's file name, as a
> base directory it takes the current working directory (namely "./").
> - By default "m_data_sub_dir" is "tessdata/" but it can be altered via
> a config file (you can specify it in the command line).
> - Both the environment variable and "m_data_sub_dir" should contain
> trailing "/".
> - Windows installer automatically creates TESSDATA_PREFIX and sets it
> to "<ProgramFiles>\Tesseract-OCR\".
>
> So, if you don't want to deal with environment variables, you can stick to
> a config file and set "m_data_sub_dir" to point to any directory you
> like using a *relative* path. Well, almost to any. It should be on the
> same drive.
>
> Warm regards,
> Dmitri Silaev
> www.CustomOCR.com
>
> On Fri, May 20, 2011 at 2:47 PM, Daniel <[email protected]> wrote:
> > I'm attempting to integrate Tesseract 3 with another stand-alone app,
> > but I'm running into a problem: Tesseract always looks for the
> > language files in "\Program Files (x86)\Tesseract-OCR\tessdata"; I
> > need to store the language files in a different location (a subfolder
> > of my app's installation folder.)
> >
> > I'm assuming Tesseract is getting this folder from the registry, so I
> > could just change the installation path listed, but (a) I don't want
> > to break user's possible other installations, and (b) I tried that and
> > it (inexplicably) didn't work.
> >
> > Is there a way to specify the hard path from the command line, or do I
> > have to modify the code?
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "tesseract-ocr" group.
> > To post to this group, send email to [email protected]
> > To unsubscribe from this group, send email to
> > [email protected]
> > For more options, visit this group at
> > http://groups.google.com/group/tesseract-ocr?hl=en
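
For anyone who lands on this thread later, the lookup rules Dmitri describes in the quoted message can be sketched roughly as below. This is only an illustration in Python, not Tesseract's actual code: the function name `resolve_tessdata_dir` and its parameters are made up for the example, and treating an empty TESSDATA_PREFIX as unset and adding a missing trailing "/" are my own assumptions (Dmitri only says the trailing "/" should be there).

```python
import ntpath  # Windows path rules, importable on any platform
from typing import Mapping, Optional


def resolve_tessdata_dir(
    env: Mapping[str, str],
    module_path: Optional[str],
    data_sub_dir: str = "tessdata/",
) -> str:
    """Return the directory in which language files would be looked up.

    Hypothetical helper that mirrors the rules quoted in this thread.
    env          -- environment variables (e.g. os.environ)
    module_path  -- full path of the DLL or EXE, or None if it is unknown
    data_sub_dir -- stands in for the "m_data_sub_dir" param
    """
    prefix = env.get("TESSDATA_PREFIX")
    if prefix:
        # The environment variable wins whenever it exists.
        # Assumption: an empty value is treated the same as a missing one.
        base = prefix
    elif module_path:
        # Otherwise the folder holding the DLL or EXE is the base directory.
        base = ntpath.dirname(module_path) or "./"
    else:
        # Last resort: the current working directory.
        base = "./"

    # Assumption: add the trailing separator if the caller left it out.
    if not base.endswith(("/", "\\")):
        base += "/"
    return base + data_sub_dir
```

Calling it with an empty environment and a module path such as `C:\MyApp\ocr\tessdll.dll` gives a folder next to the engine, which is the behaviour I ended up hard-coding; passing an environment that contains TESSDATA_PREFIX shows how another installation takes over the lookup.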

