Re: Improving Tika OCR

2017-04-21 Thread Thamme Gowda
Thanks, Kranthi. Keep us informed about how it goes.
Cheers,
TG
On Thu, Apr 20, 2017 at 1:01 PM, Kranthi Kiran G V <kkran...@student.nitw.ac.in> wrote:
> Hello Thamme,
>
> Agreed. Looking at the paper[1], it seems to me that tesseract and VGG
> models can co-exist
> in Tika to serve all kinds

Re: Change Scope of Jai-ImageIO-Core dependency

2017-04-21 Thread Mattmann, Chris A (3010)
Sounds good to me and looking forward to reviewing the update.
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, NSF & Open Source Projects Formulation and Development

Change Scope of Jai-ImageIO-Core dependency

2017-04-21 Thread Luís Filipe Nassif
Hi devs,
It looks like jai-imageio-core from GitHub (https://github.com/jai-imageio/jai-imageio-core), on which we depend with test scope, is Apache compatible. Note that it is a fork of the original JAI project referenced by PDFBox. The GitHub fork has extracted jpeg2000 and other problematic code

[jira] [Commented] (TIKA-2335) Extract path info from Excel 2013 .xlsx and .xlsb

2017-04-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978693#comment-15978693 ]
Tim Allison commented on TIKA-2335:
Added to xlsb in POI:

[jira] [Created] (TIKA-2336) Upgrade to POI 3.17-beta1 when available

2017-04-21 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2336:
Summary: Upgrade to POI 3.17-beta1 when available
Key: TIKA-2336
URL: https://issues.apache.org/jira/browse/TIKA-2336
Project: Tika
Issue Type: Improvement

[jira] [Created] (TIKA-2335) Extract path info from Excel 2013 .xlsx and .xlsb

2017-04-21 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2335:
Summary: Extract path info from Excel 2013 .xlsx and .xlsb
Key: TIKA-2335
URL: https://issues.apache.org/jira/browse/TIKA-2335
Project: Tika
Issue Type:

[jira] [Comment Edited] (TIKA-2024) Extract original filename/path when possible

2017-04-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977378#comment-15977378 ]
Tim Allison edited comment on TIKA-2024 at 4/21/17 12:14 PM:
Found another