RE: How to extract autoshape text in Excel 2007+

2013-07-22 Thread Allison, Timothy B.
This looks like an area for a new feature in both Tika and POI. I've only looked very briefly into the POI libraries, and I may have missed how to extract text from autoshapes. I'll open an issue in both projects.
-Original Message- From: Hiroshi Tatsumi

RE: How to extract autoshape text in Excel 2007+

2013-07-22 Thread Allison, Timothy B.
This is one way to access the underlying CTShape that contains the text:

    XSSFWorkbook wb = new XSSFWorkbook(new FileInputStream(f));
    XSSFSheet sheet = wb.getSheetAt(0);
    XSSFDrawing drawing = sheet.createDrawingPatriarch();
    for (XSSFShape shape :
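For anyone who wants to check what text a file actually contains without going through POI, here is a rough sketch that reads the drawing parts straight out of the .xlsx zip. This is not the POI approach quoted above and not how Tika does it. It assumes the autoshape text sits in `a:t` elements of the DrawingML namespace inside `xl/drawings/*.xml`, which is where Excel 2007+ normally writes it. The class and method names (`DrawingTextSketch`, `extract`) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of POI or Tika.
class DrawingTextSketch {
    // DrawingML main namespace; text runs are <a:t> elements in this namespace.
    static final String DRAWINGML_NS =
            "http://schemas.openxmlformats.org/drawingml/2006/main";

    // Returns the text runs found in every xl/drawings/*.xml part, in file order.
    static List<String> extract(InputStream xlsx) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DTDs so an untrusted file cannot pull in external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

        List<String> runs = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                // Skips .vml and .rels files that live under the same folder.
                if (!name.startsWith("xl/drawings/") || !name.endsWith(".xml")) {
                    continue;
                }
                // Copy the entry into memory so the XML parser cannot close the zip stream.
                ByteArrayOutputStream part = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    part.write(buffer, 0, n);
                }
                Document doc = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(part.toByteArray()));
                NodeList textRuns = doc.getElementsByTagNameNS(DRAWINGML_NS, "t");
                for (int i = 0; i < textRuns.getLength(); i++) {
                    runs.add(textRuns.item(i).getTextContent());
                }
            }
        }
        return runs;
    }
}
```

Calling `DrawingTextSketch.extract(new FileInputStream(f))` returns one string per text run, so a shape with mixed formatting comes back as several pieces. It does not say which shape or sheet a run belongs to; it is only meant as a quick check of what an extractor should be finding.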

Re: How to extract autoshape text in Excel 2007+

2013-07-22 Thread Hiroshi Tatsumi
Thank you for your reply. I really appreciate it. This is a high priority for me because we use Solr, and our customer wants to search the text of autoshapes in Excel 2007+ files. I've been investigating the Tika source code and trying to fix it. I understand that I can extract text from