[jira] [Commented] (TIKA-1508) Add uniformity to parser parameter configuration

2016-03-29 Thread Thamme Gowda N (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217280#comment-15217280 ] Thamme Gowda N commented on TIKA-1508: -- [~talli...@mitre.org] [~chrismattmann] Starti

[jira] [Commented] (TIKA-1896) Invalid closing script tag not handled gracefully by HtmlParser

2016-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216786#comment-15216786 ] Tim Allison commented on TIKA-1896: --- The script element is defined as {{cdata}} by TagSou

[jira] [Commented] (TIKA-1896) Invalid closing script tag not handled gracefully by HtmlParser

2016-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216767#comment-15216767 ] Tim Allison commented on TIKA-1896: --- This is the stream of start/end elements and calls t

[jira] [Commented] (TIKA-1896) Invalid closing script tag not handled gracefully by HtmlParser

2016-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216732#comment-15216732 ] Tim Allison commented on TIKA-1896: --- Thank you for raising this. I'm not exceedingly fam

[jira] [Commented] (TIKA-1836) Convertion DOC->TXT failed due to POI issue

2016-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216683#comment-15216683 ] Tim Allison commented on TIKA-1836: --- Thank you for monitoring StackOverflow and pointing

[jira] [Comment Edited] (TIKA-1836) Convertion DOC->TXT failed due to POI issue

2016-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216679#comment-15216679 ] Tim Allison edited comment on TIKA-1836 at 3/29/16 7:16 PM: No

[jira] [Comment Edited] (TIKA-1836) Convertion DOC->TXT failed due to POI issue

2016-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216679#comment-15216679 ] Tim Allison edited comment on TIKA-1836 at 3/29/16 7:15 PM: No

[jira] [Commented] (TIKA-1836) Convertion DOC->TXT failed due to POI issue

2016-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216679#comment-15216679 ] Tim Allison commented on TIKA-1836: --- No problem. 1.12 was cut in January. This is fixed

[jira] [Commented] (TIKA-1836) Convertion DOC->TXT failed due to POI issue

2016-03-29 Thread Konstantin Gribov (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216665#comment-15216665 ] Konstantin Gribov commented on TIKA-1836: - Sorry, I thought that it was revision in

[jira] [Updated] (TIKA-1836) Convertion DOC->TXT failed due to POI issue

2016-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1836: -- Fix Version/s: 1.13 > Convertion DOC->TXT failed due to POI issue > -

[jira] [Commented] (TIKA-1836) Convertion DOC->TXT failed due to POI issue

2016-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216657#comment-15216657 ] Tim Allison commented on TIKA-1836: --- The change should have been in POI, no Tika was touc

[jira] [Reopened] (TIKA-1836) Convertion DOC->TXT failed due to POI issue

2016-03-29 Thread Konstantin Gribov (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Gribov reopened TIKA-1836: - > Convertion DOC->TXT failed due to POI issue > --- > >

[jira] [Commented] (TIKA-1836) Convertion DOC->TXT failed due to POI issue

2016-03-29 Thread Konstantin Gribov (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216653#comment-15216653 ] Konstantin Gribov commented on TIKA-1836: - It seems to be a regression: http://sta

Who's going to Apache: Big Data in May?

2016-03-29 Thread Ken Krugler
I'll be giving a talk (Cascading+Flink) at the conference on Monday, May 9th. I'm planning to stay through Wednesday noon-ish. Wondering if any other Tika devs are going to be attending... -- Ken PS - full schedule at http://events.linuxfoundation.org/events/apache-big-data-north-america/progr

Re: GSOC2016 Sentiment Analysis

2016-03-29 Thread Mattmann, Chris A (3980)
Great, that sounds awesome, Anthony. Friday at 10am PT it is. Please add chris.mattm...@gmail.com to your GHangout buddy list. ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet P

Re: GSOC2016 Sentiment Analysis

2016-03-29 Thread Mattmann, Chris A (3980)
I like both of your comments, Mondher and Madhawa. My team at USC has been investigating the use of particular corpora, including Fisher Callhome, to support sentiment analysis. We have been writing Java code outside of both OpenNLP and Tika, but with the goal of integrating it into both. We h

[jira] [Commented] (TIKA-1910) Tika 2.0 - Decouple Tika Parser Office Module from Other Dependencies

2016-03-29 Thread Bob Paulin (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216019#comment-15216019 ] Bob Paulin commented on TIKA-1910: -- bq. This sounds dangerous. Should we set the default L

[jira] [Commented] (TIKA-1910) Tika 2.0 - Decouple Tika Parser Office Module from Other Dependencies

2016-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15215910#comment-15215910 ] Tim Allison commented on TIKA-1910: --- bq. Yes. The goal is if Parser X instantiates Parser

[jira] [Commented] (TIKA-1285) Upgrade to PDFBox 2.0.0 when available

2016-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15215855#comment-15215855 ] Tim Allison commented on TIKA-1285: --- I opened TIKA-1912 to track this issue. > Upgrade t

[jira] [Commented] (TIKA-1912) Figure out how to parse truncated PDFs that were handled by PDFBox 1.8.x but not by 2.0.0

2016-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15215850#comment-15215850 ] Tim Allison commented on TIKA-1912: --- Overall, I see two options: 1. Improve PDFBox 2.0.x

[jira] [Updated] (TIKA-1912) Figure out how to parse truncated PDFs that were handled by PDFBox 1.8.x but not by 2.0.0

2016-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1912: -- Description: While working on TIKA-1285, we found that PDFBox 2.0.0 is not able to handle truncated files