[jira] [Commented] (TIKA-1343) Create a Tika Translator implementation that uses JoshuaDecoder

2016-04-27 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261039#comment-15261039 ] ASF GitHub Bot commented on TIKA-1343: -- GitHub user lewismc opened a pull request:

[GitHub] tika pull request: TIKA-1343 Create a Tika Translator implementati...

2016-04-27 Thread lewismc
GitHub user lewismc opened a pull request: https://github.com/apache/tika/pull/112 TIKA-1343 Create a Tika Translator implementation that uses JoshuaDecoder This issue is this afternoon's first attempt at addressing the long overdue https://issues.apache.org/jira/browse/TIKA-1343

[jira] [Created] (TIKA-1962) Support Topic Modeling in Tika

2016-04-27 Thread Madhawa Gunasekara (JIRA)
Madhawa Gunasekara created TIKA-1962: Summary: Support Topic Modeling in Tika Key: TIKA-1962 URL: https://issues.apache.org/jira/browse/TIKA-1962 Project: Tika Issue Type: New Feature

[jira] [Commented] (TIKA-1938) HtmlParser drops

2016-04-27 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260771#comment-15260771 ] ASF GitHub Bot commented on TIKA-1938: -- GitHub user naegelejd opened a pull request:

[GitHub] tika pull request: fix for TIKA-1938 contributed by naegelejd

2016-04-27 Thread naegelejd
GitHub user naegelejd opened a pull request: https://github.com/apache/tika/pull/111 fix for TIKA-1938 contributed by naegelejd Adds HtmlParser support for

Re: GSoC 2016: OpenNLP Sentiment Analysis

2016-04-27 Thread Mattmann, Chris A (3980)
thanks Anthony ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-527 Email:

[jira] [Commented] (TIKA-1885) Tika MIME updates for *.cdf and *.xar and custom zero length file detector based on TREC-DD-Polar

2016-04-27 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260457#comment-15260457 ] ASF GitHub Bot commented on TIKA-1885: -- Github user adeshgupta closed the pull request at:

[GitHub] tika pull request: TIKA-1885 updates to tika-mimetypes.xml and cus...

2016-04-27 Thread adeshgupta
Github user adeshgupta closed the pull request at: https://github.com/apache/tika/pull/89

[jira] [Commented] (TIKA-1961) OutOfMemory when parsing shapes xml from xlsx files with multi-byte Unicode characters

2016-04-27 Thread Andrei Rebegea (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260407#comment-15260407 ] Andrei Rebegea commented on TIKA-1961: -- Thanks Tim for the advice. That is true, we only fixed part of

[jira] [Commented] (TIKA-1961) OutOfMemory when parsing shapes xml from xlsx files with multi-byte Unicode characters

2016-04-27 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260374#comment-15260374 ] Nick Burch commented on TIKA-1961: -- Alfresco upgrade of Apache Tika ought to be pretty easy, the only

[jira] [Commented] (TIKA-1961) OutOfMemory when parsing shapes xml from xlsx files with multi-byte Unicode characters

2016-04-27 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260372#comment-15260372 ] Tim Allison commented on TIKA-1961: --- Got it. Will add to our unit tests. Thank you. bq. that we

[jira] [Commented] (TIKA-1961) OutOfMemory when parsing shapes xml from xlsx files with multi-byte Unicode characters

2016-04-27 Thread Andrei Rebegea (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260361#comment-15260361 ] Andrei Rebegea commented on TIKA-1961: -- This problem was discovered by a client of Alfresco. We are

[jira] [Comment Edited] (TIKA-1961) OutOfMemory when parsing shapes xml from xlsx files with multi-byte Unicode characters

2016-04-27 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260325#comment-15260325 ] Tim Allison edited comment on TIKA-1961 at 4/27/16 3:35 PM: I _think_

[jira] [Commented] (TIKA-1961) OutOfMemory when parsing shapes xml from xlsx files with multi-byte Unicode characters

2016-04-27 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260325#comment-15260325 ] Tim Allison commented on TIKA-1961: --- I _think_ [~kiwiwings] recently fixed this in POI by swapping out

[jira] [Commented] (TIKA-1960) Put legacy language detection code back into 1.x=trunk

2016-04-27 Thread Konstantin Gribov (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260317#comment-15260317 ] Konstantin Gribov commented on TIKA-1960: - Thanks, [~talli...@apache.org] for handling this. Sorry,

[jira] [Updated] (TIKA-1961) OutOfMemory when parsing shapes xml from xlsx files with multi-byte Unicode characters

2016-04-27 Thread Andrei Rebegea (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrei Rebegea updated TIKA-1961: - Attachment: problem char separation.png > OutOfMemory when parsing shapes xml from xlsx files with

[jira] [Updated] (TIKA-1961) OutOfMemory when parsing shapes xml from xlsx files with multi-byte Unicode characters

2016-04-27 Thread Andrei Rebegea (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrei Rebegea updated TIKA-1961: - Attachment: dmsu1332-reproduced.xlsx > OutOfMemory when parsing shapes xml from xlsx files with

[jira] [Created] (TIKA-1961) OutOfMemory when parsing shapes xml from xlsx files with multi-byte Unicode characters

2016-04-27 Thread Andrei Rebegea (JIRA)
Andrei Rebegea created TIKA-1961: Summary: OutOfMemory when parsing shapes xml from xlsx files with multi-byte Unicode characters Key: TIKA-1961 URL: https://issues.apache.org/jira/browse/TIKA-1961

[jira] [Commented] (TIKA-1844) PooledTimeSeriesParser takes precedence over MP4Parser

2016-04-27 Thread Bob Paulin (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260071#comment-15260071 ] Bob Paulin commented on TIKA-1844: -- +1. Good time to be running into this stuff since we're still

[jira] [Commented] (TIKA-1924) Upgrade com.googlecode.mp4parser's isoparser to 1.1.18

2016-04-27 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260058#comment-15260058 ] Hudson commented on TIKA-1924: -- SUCCESS: Integrated in tika-2.x #90 (See

[jira] [Commented] (TIKA-1959) Upgrade to PDFBox 2.0.1/JempBox 1.8.12

2016-04-27 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260059#comment-15260059 ] Hudson commented on TIKA-1959: -- SUCCESS: Integrated in tika-2.x #90 (See

[jira] [Commented] (TIKA-1844) PooledTimeSeriesParser takes precedence over MP4Parser

2016-04-27 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260057#comment-15260057 ] Hudson commented on TIKA-1844: -- SUCCESS: Integrated in tika-2.x #90 (See

[jira] [Updated] (TIKA-1924) Upgrade com.googlecode.mp4parser's isoparser to 1.1.18

2016-04-27 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1924: -- Fix Version/s: 1.13 2.0 > Upgrade com.googlecode.mp4parser's isoparser to 1.1.18

[jira] [Updated] (TIKA-1959) Upgrade to PDFBox 2.0.1/JempBox 1.8.12

2016-04-27 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1959: -- Fix Version/s: 2.0 > Upgrade to PDFBox 2.0.1/JempBox 1.8.12

[jira] [Commented] (TIKA-1844) PooledTimeSeriesParser takes precedence over MP4Parser

2016-04-27 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259988#comment-15259988 ] Tim Allison commented on TIKA-1844: --- Moved...duh. Thank you. > PooledTimeSeriesParser takes precedence

RE: [jira] [Commented] (TIKA-1960) Put legacy language detection code back into 1.x=trunk

2016-04-27 Thread Allison, Timothy B.
Thank you, Dave! -Original Message- From: Hudson (JIRA) [mailto:j...@apache.org] Sent: Wednesday, April 27, 2016 3:55 AM To: dev@tika.apache.org Subject: [jira] [Commented] (TIKA-1960) Put legacy language detection code back into 1.x=trunk [