Dear All, I am updating this thread again.
I investigated the source code more deeply and found that the TableFinder, TableRecognizer and StructuredTable classes are the ones related to table detection and table recognition. These classes seem to be designed for regular tables in which every cell is filled with content. For irregular tables (e.g. rows with different numbers of columns, or columns with different numbers of rows) or tables with some cells left empty, this code does not work well.

Is my understanding correct? If so, are there already plans to improve this code? And can someone give me some advice on how to overcome the "irregular table recognition and cell extraction" problem?

Thank you all in advance.

On Tuesday, June 19, 2012 at 4:26:33 PM UTC+8, Neo Song wrote:
>
> Dear All,
>
> I am currently working on a table text extraction project, and we need to
> identify the table before any OCR processing.
> I investigated the related source code (checked-out version: r729) and
> found that there is a table finder class inside tesseract (tablefind.cpp).
> The problem is that for irregular tables (e.g. rows with different numbers
> of columns), even when I have all the ruling lines, I cannot identify the
> individual table cells.
> I have called the function "FindLinesCreateBlockList()" and I can iterate
> over all the text blocks, horizontal lines and vertical lines in the
> target image. However, I cannot do anything useful with these horizontal
> and vertical lines. What I need is something like a CELL_LIST that contains
> every table cell in reading order, based on the table ruling lines. I
> believe the table finder may already contain such an algorithm (I read the
> code, but it is too complicated for me), and that it is simply not exposed
> through the Base API interface. Is that true?
> Can someone help me with this? How can I obtain the table cells? An
> example of such an irregular table can be found in the attachment.
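
To make the CELL_LIST request concrete, here is a minimal Python sketch of one way cells could be derived from ruling lines alone. It is not Tesseract code and does not use any Tesseract API; the function name `cells_from_rulings` and the tuple layouts are hypothetical. It assumes the ruling lines are axis-aligned, already snapped to exact integer coordinates, and that y grows downward. The idea is to build the finest grid implied by all line positions, merge neighbouring grid squares that have no ruling between them, and report each merged region's bounding box in reading order.

```python
from typing import Dict, List, Tuple

HLine = Tuple[int, int, int]      # (y, x_start, x_end)   hypothetical layout
VLine = Tuple[int, int, int]      # (x, y_start, y_end)   hypothetical layout
Cell = Tuple[int, int, int, int]  # (left, top, right, bottom)


def cells_from_rulings(hlines: List[HLine], vlines: List[VLine]) -> List[Cell]:
    """Return table cells in reading order (top to bottom, left to right)."""
    xs = sorted({x for x, _, _ in vlines})
    ys = sorted({y for y, _, _ in hlines})
    cols, rows = len(xs) - 1, len(ys) - 1
    if cols < 1 or rows < 1:
        return []

    # Union-find over the elementary grid squares.
    parent = list(range(rows * cols))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        parent[find(a)] = find(b)

    def has_vline(x: int, y0: int, y1: int) -> bool:
        # True if some vertical ruling at x covers the whole span y0..y1.
        return any(vx == x and a <= y0 and y1 <= b for vx, a, b in vlines)

    def has_hline(y: int, x0: int, x1: int) -> bool:
        # True if some horizontal ruling at y covers the whole span x0..x1.
        return any(hy == y and a <= x0 and x1 <= b for hy, a, b in hlines)

    for r in range(rows):
        for c in range(cols):
            idx = r * cols + c
            # No ruling on the right edge: same cell as the right neighbour.
            if c + 1 < cols and not has_vline(xs[c + 1], ys[r], ys[r + 1]):
                union(idx, idx + 1)
            # No ruling on the bottom edge: same cell as the neighbour below.
            if r + 1 < rows and not has_hline(ys[r + 1], xs[c], xs[c + 1]):
                union(idx, idx + cols)

    # Bounding box of every merged region.
    boxes: Dict[int, Cell] = {}
    for r in range(rows):
        for c in range(cols):
            root = find(r * cols + c)
            box = (xs[c], ys[r], xs[c + 1], ys[r + 1])
            if root in boxes:
                old = boxes[root]
                box = (min(old[0], box[0]), min(old[1], box[1]),
                       max(old[2], box[2]), max(old[3], box[3]))
            boxes[root] = box

    return sorted(boxes.values(), key=lambda cell: (cell[1], cell[0]))


if __name__ == "__main__":
    # One full-width cell in the first row, two cells in the second row.
    h = [(0, 0, 100), (30, 0, 100), (60, 0, 100)]
    v = [(0, 0, 60), (100, 0, 60), (50, 30, 60)]
    print(cells_from_rulings(h, v))
    # [(0, 0, 100, 30), (0, 30, 50, 60), (50, 30, 100, 60)]
```

Lines detected in a real scan would first need their coordinates clustered within a tolerance, and a merged region that is not rectangular is reported only by its bounding box, so this is a starting point rather than a complete solution.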