Re: Table Analysis and Cell Labeling

Neo Song Thu, 28 Jun 2012 02:31:09 -0700

Dear All,

    I am updating this thread again.


    I investigated the source code in depth and found that the 
TableFinder, TableRecognizer and StructuredTable classes are related to table 
detection and table recognition. My question is this: these classes seem to 
be designed for regular tables in which every cell is filled with 
content. For irregular tables (e.g. different rows have different numbers of 
columns, or vice versa) or tables with some cells left empty, this code does 
not work well. Is my understanding correct?
    If so, are there already plans to improve this code? And can someone 
give me some advice on how to overcome the "irregular table recognition and 
cell extraction" problem?
    Thank you all in advance.
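
    To make the request concrete, here is a minimal sketch of the kind of 
cell extraction being asked about. It is not Tesseract code and does not use 
any Tesseract API: the function name cells_from_rulings and the tuple layouts 
are hypothetical. It assumes the ruling lines are already available as 
axis-aligned segments with snapped integer coordinates, that y grows 
downward, that the table has a complete outer border, and that each ruling 
is one unbroken segment. It merges neighbouring grid cells that no ruling 
separates, so spanning cells in an irregular table come out as one box.

```python
from typing import Dict, List, Tuple

HLine = Tuple[int, int, int]      # (y, x_start, x_end), hypothetical layout
VLine = Tuple[int, int, int]      # (x, y_start, y_end), hypothetical layout
Cell = Tuple[int, int, int, int]  # (left, top, right, bottom)


def cells_from_rulings(hlines: List[HLine], vlines: List[VLine]) -> List[Cell]:
    """Return table cells in reading order (top to bottom, left to right)."""
    xs = sorted({x for x, _, _ in vlines})
    ys = sorted({y for y, _, _ in hlines})
    ncols, nrows = len(xs) - 1, len(ys) - 1
    if ncols < 1 or nrows < 1:
        return []

    # Union-find over the elementary grid cells.
    parent = list(range(nrows * ncols))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        parent[find(a)] = find(b)

    def h_covered(y: int, x0: int, x1: int) -> bool:
        # True if a single horizontal ruling at y spans [x0, x1].
        return any(ly == y and a <= x0 and x1 <= b for ly, a, b in hlines)

    def v_covered(x: int, y0: int, y1: int) -> bool:
        # True if a single vertical ruling at x spans [y0, y1].
        return any(lx == x and a <= y0 and y1 <= b for lx, a, b in vlines)

    for r in range(nrows):
        for c in range(ncols):
            idx = r * ncols + c
            # Merge with the right neighbour unless a vertical ruling divides them.
            if c + 1 < ncols and not v_covered(xs[c + 1], ys[r], ys[r + 1]):
                union(idx, idx + 1)
            # Merge with the neighbour below unless a horizontal ruling divides them.
            if r + 1 < nrows and not h_covered(ys[r + 1], xs[c], xs[c + 1]):
                union(idx, idx + ncols)

    # The bounding box of each merged group is one table cell.
    boxes: Dict[int, Cell] = {}
    for r in range(nrows):
        for c in range(ncols):
            root = find(r * ncols + c)
            box = (xs[c], ys[r], xs[c + 1], ys[r + 1])
            if root in boxes:
                old = boxes[root]
                box = (min(old[0], box[0]), min(old[1], box[1]),
                       max(old[2], box[2]), max(old[3], box[3]))
            boxes[root] = box
    return sorted(boxes.values(), key=lambda cell: (cell[1], cell[0]))
```

    For a table whose first row is one wide cell and whose second row has 
two cells, the vertical ruling in the middle only covers the second row, so 
the sketch returns three boxes: the wide cell first, then the two cells 
below it from left to right. Real scans would still need the line 
coordinates to be deskewed and snapped before a step like this.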

On Tuesday, June 19, 2012 at 4:26:33 PM UTC+8, Neo Song wrote:
>
> Dear All,
>
>     Currently I am working on a table text extraction project, and we need 
> to identify the table before any OCR processing. 
>     I investigated the related source code (checked-out version: r729) and 
> found that there is a table finder class inside Tesseract (tablefind.cpp). 
> The problem is that for irregular tables (e.g. different rows have 
> different numbers of columns), even when I have all the ruling lines, I 
> cannot identify the concrete table cells.
>     I have called the function "FindLinesCreateBlockList()", and I can 
> iterate over all the text blocks, horizontal lines and vertical lines in 
> the target image. However, I can do nothing with these horizontal and 
> vertical lines. What I need is something like a CELL_LIST that contains 
> every table cell in reading order, based on the table ruling lines. I 
> believe that the table finder may already contain such an algorithm (I read 
> the code, but it is too complicated for me to follow), but that it is not 
> exposed through the Base API interface. Is that true?
>     Can someone help me with this? How can I obtain the table cells? An 
> example of such an irregular table can be found in the attachment. 
>
