Please see related discussion in
https://github.com/tesseract-ocr/tesseract/issues/1074
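
As far as I know, the LSTM engine does not fill in font attributes, so under OEM_LSTM_ONLY WordFontAttributes can return a null pointer, and passing a null pointer to "%s" in fprintf is undefined behaviour. Below is a minimal sketch of a guard, assuming that null return. WordInfo and FormatWordLine are hypothetical names used only for illustration; they are not part of the Tesseract API.

```cpp
#include <string>

// Hypothetical container for the values that the quoted code obtains from
// ResultIterator::BoundingBox and ResultIterator::WordFontAttributes.
struct WordInfo {
    std::string text;
    int x1 = 0, y1 = 0, x2 = 0, y2 = 0;
    int pointsize = 0;
    int font_id = -1;
    bool bold = false;
    bool italic = false;
    const char* font_name = nullptr;  // assumed to be null when no font info exists
};

// Build one output line: text, box, point size, bold, italic, font id, font name.
// A placeholder replaces a null font name so the line is always well defined.
std::string FormatWordLine(const WordInfo& w) {
    const std::string name = w.font_name ? w.font_name : "(no font info)";
    return w.text + " " +
           std::to_string(w.x1) + " " + std::to_string(w.y1) + " " +
           std::to_string(w.x2) + " " + std::to_string(w.y2) + " " +
           std::to_string(w.pointsize) + " " +
           std::to_string(static_cast<int>(w.bold)) + " " +
           std::to_string(static_cast<int>(w.italic)) + " " +
           std::to_string(w.font_id) + " " + name;
}
```

In the loop from the original message, the WordInfo fields would be filled from the iterator calls, and the formatted line written with fputs instead of passing font_name straight to fprintf.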



ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

On Fri, Sep 15, 2017 at 1:42 PM, Supriya Das <[email protected]> wrote:

> Thanks. Is there any way to change the code to get font information with
> LSTM?
>
> On Wednesday, 13 September 2017 18:52:22 UTC+5:30, Supriya Das wrote:
>>
>> *Hi All,*
>>
>> *When I try to get word-wise font attributes, I get no result. I chose
>> OEM_LSTM_ONLY as the OCR engine. If I choose OEM_TESSERACT_ONLY, I get the
>> correct result. Please suggest. Thanks in advance.*
>>
>>
>> tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
>> // Initialize tesseract-ocr with English, without specifying tessdata path
>> if(api->Init(NULL, "eng", tesseract::OEM_LSTM_ONLY)) {
>> fprintf(stderr, "Could not initialize tesseract.\n");
>> exit(1);
>> }
>> //api->SetVariable("save_blob_choices", "T");
>> //api->SetVariable("tessedit_char_whitelist","0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ./-");
>> //api->SetVariable("tessedit_ocr_engine_mode","10");
>> api->SetPageSegMode(tesseract::PSM_AUTO_ONLY);
>> api->SetVariable("chop_enable", "1");
>> api->SetImage((uchar*)Result.data,Result.cols,Result.rows,1,Result.cols);
>>
>>
>> time(& Start);
>> char* out = api->GetUTF8Text();
>> time(& End);
>> Result.release();
>> dif = difftime(End,Start);
>> printf("\nTesseract Processing Time  %lf\n",dif);
>> // Tesseract bounding box finding Word Wise
>> time(& Start);
>> tesseract::ResultIterator* ri = api->GetIterator();
>> char* word;
>> const char *font_name;
>> bool bold, italic, underlined, monospace, serif, smallcaps;
>> int pointsize, font_id;
>> if(ri != 0){
>> do{
>>       //tesseract::Orientation orientation;
>>                    //tesseract::WritingDirection writing_direction;
>>                    //tesseract::TextlineOrder textline_order;
>>                    //float deskew_angle;
>>                    //ri->Orientation(&orientation,&writing_direction,&textline_order,&deskew_angle);
>>       word = ri->GetUTF8Text(tesseract::RIL_WORD);
>>
>> if(word != 0 && word[0] != '\0' && word[1] != ' ' && word[0] != ' '){
>> float conf = ri->Confidence(tesseract::RIL_WORD);
>> int x1, y1, x2, y2;
>> if(conf > 0)
>> {
>> font_name = ri->WordFontAttributes(&bold, &italic, &underlined,
>> &monospace, &serif,
>> &smallcaps, &pointsize,
>> &font_id);
>> ri->BoundingBox(tesseract::RIL_WORD, &x1, &y1, &x2, &y2);
>> // font_name may be NULL when no font information is available
>> fprintf(fpout,"%s %d %d %d %d %d %d %d %d %d %d %d %d %s\n",word, x1, y1,
>> x2, y2,pointsize,bold,italic,underlined,monospace,serif,smallcaps,font_id,
>> font_name ? font_name : "");
>> }
>> }
>> delete[] word; // GetUTF8Text returns a string allocated with new[]
>> } while((ri->Next(tesseract::RIL_WORD)));
>> }
>>
>> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/abe274d2-f6b2-499a-bcb2-468a72804970%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
