[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2015-08-06 Thread David Smiley (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661353#comment-14661353 ] David Smiley commented on TIKA-1607: TIKA isn't my area of expertise, but I think it sh

[jira] [Commented] (TIKA-1704) Update tika documentation for configuring ServiceLoader

2015-08-06 Thread Bob Paulin (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661285#comment-14661285 ] Bob Paulin commented on TIKA-1704: [~gagravarr] Not sure if the format here is completely

[jira] [Updated] (TIKA-1704) Update tika documentation for configuring ServiceLoader

2015-08-06 Thread Bob Paulin (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bob Paulin updated TIKA-1704: Attachment: TIKA-1704-DOCS.patch > Update tika documentation for configuring ServiceLoader

[jira] [Created] (TIKA-1704) Update tika documentation for configuring ServiceLoader

2015-08-06 Thread Bob Paulin (JIRA)
Bob Paulin created TIKA-1704: Summary: Update tika documentation for configuring ServiceLoader Key: TIKA-1704 URL: https://issues.apache.org/jira/browse/TIKA-1704 Project: Tika Issue Type: Improv

Re: 1.10 release missing license headers noted by Daniel Gruno

2015-08-06 Thread Mattmann, Chris A (3980)
Chris, Tyler Palsulich, Lewis John McGibbney, Mike Joyce, and I think a few others :-) I have a postdoc, Ji-Hyun working on it right now too :-) Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Sec

Re: 1.10 release missing license headers noted by Daniel Gruno

2015-08-06 Thread Nick Burch
On Thu, 6 Aug 2015, Tom Barber wrote: It works well as well doesn't it I mean "hey guys we're missing licence headers" normally I'd probably reply with an expletive, now I can just reply "oh DRAT". I'll get my coat. Isn't it Chris saying "oh DRAT"? Apache Creadur finds a problem, an A

Re: 1.10 release missing license headers noted by Daniel Gruno

2015-08-06 Thread Tom Barber
It works well as well doesn't it I mean "hey guys we're missing licence headers" normally I'd probably reply with an expletive, now I can just reply "oh DRAT". I'll get my coat. On 6 Aug 2015 22:44, "Mattmann, Chris A (3980)" < chris.a.mattm...@jpl.nasa.gov> wrote: > -Original Message

Re: 1.10 release missing license headers noted by Daniel Gruno

2015-08-06 Thread Mattmann, Chris A (3980)
-----Original Message----- From: Nick Burch Reply-To: "dev@tika.apache.org" Date: Thursday, August 6, 2015 at 2:25 PM To: "dev@tika.apache.org" Cc: Daniel Gruno Subject: Re: 1.10 release missing license headers noted by Daniel Gruno > >> Not sure. Will check. Also thinking of upgrading to DRA

Re: 1.10 release missing license headers noted by Daniel Gruno

2015-08-06 Thread Nick Burch
On Thu, 6 Aug 2015, Mattmann, Chris A (3980) wrote: I think we may have exclusions here since they are test resources? The tika-parsers/src/test/resources/test-documents/ shouldn't have headers at all, and the txt.Charset ones are taken from Icu4j so have their original license header on them

Re: 1.10 release missing license headers noted by Daniel Gruno

2015-08-06 Thread Mattmann, Chris A (3980)
I think we may have exclusions here since they are test resources? Not sure. Will check. Also thinking of upgrading to DRAT (instead of RAT): http://github.com/chrismattmann/drat/ See all the prezos, etc., there for why. Cheers, Chris

Re: 1.10 release missing license headers noted by Daniel Gruno

2015-08-06 Thread Nick Burch
On Thu, 6 Aug 2015, Mattmann, Chris A (3980) wrote: From Twitter: https://paste.apache.org/1CPH Don’t have to fix now, but would be good to fix for 1.11. Don't we have Apache Creadur (formerly Rat) setup on the build? If so, how did it pass? If not, can someone turn it on ASAP? :) Nick

1.10 release missing license headers noted by Daniel Gruno

2015-08-06 Thread Mattmann, Chris A (3980)
From Twitter: https://paste.apache.org/1CPH Don’t have to fix now, but would be good to fix for 1.11. Cheers, Chris P.S. Thanks for the catch Daniel! Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Sy

[jira] [Commented] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-08-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660623#comment-14660623 ] Tim Allison commented on TIKA-1678: I found vaguely similar numbers against govdocs1+sli

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2015-08-06 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660441#comment-14660441 ] Ray Gauss II commented on TIKA-1607: To clarify, the work mentioned above that uses an

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2015-08-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660304#comment-14660304 ] Tim Allison commented on TIKA-1607: [~chrismattmann], any and all feedback would be grea

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2015-08-06 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660283#comment-14660283 ] Chris A. Mattmann commented on TIKA-1607: I'm confused about Ray's Tika FFMPEG tha

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2015-08-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660145#comment-14660145 ] Tim Allison commented on TIKA-1607: Doh! A related point: binary values. At some point I

[jira] [Updated] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2015-08-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1607: Summary: Introduce new arbitrary object key/values data structure for persistence of Tika Metadata (was:

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persitsence of Tika Metadata

2015-08-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660093#comment-14660093 ] Tim Allison commented on TIKA-1607: Y, I agree that we should push the parsers to do as

[jira] [Closed] (TIKA-1030) Page extraction for Word,Excel Documents

2015-08-06 Thread David vandendriessche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David vandendriessche closed TIKA-1030. See comments for answer. > Page extraction for Word,Excel Documents

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persitsence of Tika Metadata

2015-08-06 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660046#comment-14660046 ] Nick Burch commented on TIKA-1607: My preference is to push the extra thinking onto the p

[jira] [Comment Edited] (TIKA-1607) Introduce new arbitrary object key/values data structure for persitsence of Tika Metadata

2015-08-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659998#comment-14659998 ] Tim Allison edited comment on TIKA-1607 at 8/6/15 1:45 PM: This

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persitsence of Tika Metadata

2015-08-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660009#comment-14660009 ] Tim Allison commented on TIKA-1607: So, the reason I went with putting more on the value

[jira] [Comment Edited] (TIKA-1607) Introduce new arbitrary object key/values data structure for persitsence of Tika Metadata

2015-08-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659998#comment-14659998 ] Tim Allison edited comment on TIKA-1607 at 8/6/15 1:32 PM: This

[jira] [Updated] (TIKA-1607) Introduce new arbitrary object key/values data structure for persitsence of Tika Metadata

2015-08-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1607: Attachment: TIKA-1607v3.patch This patch adds examples for a MultilingualValue and demo/hack examples of