Awesome help! I was only getting garbage from the GetXText() methods. The problem was that I was using OpenCV matrices to load my images. I didn't know how to read in a Pix image until I saw the code snippet in this thread. Now my data actually makes sense =)
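For anyone who would rather keep loading through OpenCV: besides the Pix route (Leptonica's pixRead() plus SetImage(Pix*)), TessBaseAPI also has a raw-buffer overload, SetImage(imagedata, width, height, bytes_per_pixel, bytes_per_line). A mismatch between those arguments and the real memory layout is one plausible way to end up with garbage; whether that was the cause here is not confirmed. Below is a minimal sketch, using only the standard library, that copies a possibly row-padded 8-bit grayscale buffer into a tightly packed one. GrayImage and PackGray8 are hypothetical names made up for this example, not part of Tesseract or OpenCV.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical holder for an 8-bit grayscale image laid out the way the
// raw-buffer SetImage overload describes it: no padding between rows.
struct GrayImage {
  std::vector<std::uint8_t> pixels;  // width * height bytes, row-major
  int width = 0;
  int height = 0;
  int bytes_per_pixel() const { return 1; }      // 8-bit grayscale
  int bytes_per_line() const { return width; }   // tightly packed rows
};

// Copies `height` rows of `width` pixels out of a buffer whose rows start
// `stride` bytes apart (stride >= width when rows are padded).
GrayImage PackGray8(const std::uint8_t* data, int width, int height,
                    std::size_t stride) {
  assert(data != nullptr && width > 0 && height > 0);
  assert(stride >= static_cast<std::size_t>(width));

  GrayImage out;
  out.width = width;
  out.height = height;
  out.pixels.resize(static_cast<std::size_t>(width) * height);

  for (int y = 0; y < height; ++y) {
    const std::uint8_t* src = data + static_cast<std::size_t>(y) * stride;
    std::uint8_t* dst = out.pixels.data() + static_cast<std::size_t>(y) * width;
    std::memcpy(dst, src, static_cast<std::size_t>(width));
  }
  return out;
}
```

With a single-channel 8-bit cv::Mat named img, the call would look roughly like PackGray8(img.data, img.cols, img.rows, img.step), followed by SetImage(g.pixels.data(), g.width, g.height, g.bytes_per_pixel(), g.bytes_per_line()). Keep the GrayImage alive until recognition has finished, since Tesseract may keep pointing at the buffer rather than copying it.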
On Thursday, December 15, 2011 5:03:47 PM UTC-5, braza wrote:
>
> Hi,
>
> The world of open source welcomes me with insufficient info/examples/
> documentation but with opened doors to ask ;)
>
> I'm trying just to recognize really clear and simple line of text in
> English like "Tess TEST 123.4 $15"
>
> now I have:
>
> //Tesseract block start
> CTessOCR *tess = CTessOCR::Instance();
> tess->api->SetVariable ("tessedit_char_whitelist", "0123456789");
> tess->api->SetVariable ("classify_bln_numeric_mode", "1");
> tess->api->Init ("./../../tessdata", "eng");
> #ifdef DEBUG_MODE
> tess->api->SetVariable ("tessedit_adaption_debug", "T");
> tess->api->SetVariable ("tessedit_draw_outwords", "T");
> tess->api->SetVariable ("tessedit_dump_choices", "T");
> tess->api->SetVariable ("tessedit_dump_choices", "T");
> tess->api->SetVariable ("interactive_mode", "T");
> tess->api->SetVariable ("tessedit_create_hocr", "T");
> #endif
> tess->api->SetVariable ("tessedit_single_match", "0");
> tess->api->SetVariable ("tessedit_zero_rejection", "T");
> tess->api->SetVariable ("tessedit_minimal_rejection", "F");
> tess->api->SetVariable ("tessedit_write_rep_codes", "F");
> tess->api->SetVariable ("tessedit_resegment_from_boxes", "T");
> tess->api->SetVariable ("tessedit_train_from_boxes", "T");
> tess->api->SetVariable ("textord_fast_pitch_test", "T");
> tess->api->SetVariable ("textord_no_rejects", "T");
> tess->api->SetVariable ("edges_children_fix", "F");
> tess->api->SetVariable ("edges_childarea", "0.65");
> tess->api->SetVariable ("edges_boxarea", "0.9");
> tess->api->SetVariable ("il1_adaption_test", "1");
> tess->api->SetPageSegMode (tesseract::PSM_SINGLE_LINE);
>
> Mat img = imread( "../../tess.jpg", CV_LOAD_IMAGE_GRAYSCALE );
> //err now
> tess->api->SetImage(convert_mat_to_pix(img));
> std::string text = tess->api->GetUTF8Text();
>
> It all fails in
>
> match.exe!OpenBoxFile(const STRING & fname)
> match.exe!tesseract::Tesseract::ApplyBoxes(const STRING & fname, bool find_segmentation, BLOCK_LIST * block_list)
> match.exe!tesseract::TessBaseAPI::Recognize(ETEXT_DESC * monitor)
> match.exe!tesseract::TessBaseAPI::GetUTF8Text()
>
> Obviously it fails because I've never set input file name with boxes.
> But why would I need it? I already have trained data downloaded and
> put in tessdata: eng.traineddata, eng.cube.size etc
>
> --

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to tesseract-ocr@googlegroups.com
To unsubscribe from this group, send email to tesseract-ocr+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
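
On the box-file question in the quoted message: the trace goes through ApplyBoxes(), and as far as I can tell Recognize() only takes that path when tessedit_resegment_from_boxes is switched on. tessedit_train_from_boxes is likewise a training-mode switch, so neither is needed for plain recognition with an existing eng.traineddata. Here is a small sketch of filtering those two out of a settings table before applying it. Setting, IsEnabled, IsBoxFileSetting and WithoutBoxFileSettings are hypothetical names for this example, and the truthiness check is a simplified assumption about how Tesseract parses boolean values.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// (variable name, value) as they would be passed to SetVariable().
using Setting = std::pair<std::string, std::string>;

// Simplified assumption: a value starting with T, t, Y, y or 1 means "on".
bool IsEnabled(const std::string& value) {
  return !value.empty() &&
         std::string("TtYy1").find(value[0]) != std::string::npos;
}

// True for the two training-mode switches from the quoted post when enabled.
bool IsBoxFileSetting(const Setting& s) {
  assert(!s.first.empty());  // a variable name is required
  return IsEnabled(s.second) &&
         (s.first == "tessedit_resegment_from_boxes" ||
          s.first == "tessedit_train_from_boxes");
}

// Returns the settings in their original order, minus the enabled
// box-file switches.
std::vector<Setting> WithoutBoxFileSettings(
    const std::vector<Setting>& settings) {
  std::vector<Setting> kept;
  for (const Setting& s : settings) {
    if (!IsBoxFileSetting(s)) kept.push_back(s);
  }
  return kept;
}
```

Each remaining pair can then be applied with api->SetVariable(s.first.c_str(), s.second.c_str()). Simply deleting the two SetVariable lines from the original code has the same effect; the table form just makes it easier to see which switches are active.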