Re: How should video files with audio be handled by parsers?

2014-07-21 Thread Ray Gauss
Yes. Since this approach has the potential to set a precedent for representing structured stuff going forward, I wanted to see what others thought before committing directly. Regards, Ray On July 21, 2014 at 9:45:31 PM, Mattmann, Chris A (3980) (chris.a.mattm...@jpl.nasa.gov) wrote: > Are you

Re: How should video files with audio be handled by parsers?

2014-07-21 Thread Mattmann, Chris A (3980)
Are you able to contribute to Tika? > On Jul 21, 2014, at 6:43 PM, "Ray Gauss" wrote: > > Hi all, > > This is a few months old but I've been looking at this recently and since > we're unlikely to move to a structured metadata store in the short term I've > come up with w

Re: How should video files with audio be handled by parsers?

2014-07-21 Thread Ray Gauss
Hi all, This is a few months old but I've been looking at this recently and since we're unlikely to move to a structured metadata store in the short term I've come up with what I think is an interim solution [1] that essentially allows nesting through XPath-like syntax:     stream[0]/field1=so
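
To illustrate the kind of nesting the message above describes, here is a minimal sketch of how flat keys written in an XPath-like form could be regrouped into a nested structure. It is not Tika's API and not the code from the proposal; the function names (parse_key, nest), the field names (codec, title) and the values are all hypothetical, and the only thing taken from the message is the general key shape name[index]/field.

```python
import re

# One path segment: a name with an optional [index], e.g. "stream[0]" or "codec".
# This grammar is an assumption for illustration, not the syntax from the proposal.
_SEGMENT = re.compile(r"^(?P<name>[^\[\]/]+)(?:\[(?P<index>\d+)\])?$")


def parse_key(key):
    """Split a key such as 'stream[0]/codec' into (name, index) pairs."""
    segments = []
    for part in key.split("/"):
        match = _SEGMENT.match(part)
        if match is None:
            raise ValueError(f"malformed path segment: {part!r}")
        index = match.group("index")
        segments.append((match.group("name"), int(index) if index is not None else None))
    return segments


def nest(flat):
    """Rebuild a nested dict from a flat mapping with path-style keys.

    Indexed segments become dicts keyed by integer index, so sparse
    indices (e.g. only stream[2]) need no padding.
    """
    root = {}
    for key, value in flat.items():
        node = root
        segments = parse_key(key)
        for pos, (name, index) in enumerate(segments):
            last = pos == len(segments) - 1
            if index is None:
                if last:
                    node[name] = value
                else:
                    node = node.setdefault(name, {})
            else:
                items = node.setdefault(name, {})
                if last:
                    items[index] = value
                else:
                    node = items.setdefault(index, {})
    return root


# Hypothetical flat metadata for a video file with one video and one audio stream.
flat = {
    "stream[0]/codec": "h264",
    "stream[1]/codec": "aac",
    "title": "clip",
}
nested = nest(flat)
```

The flat mapping stays compatible with a plain string-keyed metadata store, which is the constraint the message mentions; the nesting is recovered only when a consumer asks for it.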

[jira] [Commented] (TIKA-1358) Add support for newer iWork file formats

2014-07-21 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069469#comment-14069469 ] Nick Burch commented on TIKA-1358: -- Any chance you could attach zips of the test files as

[jira] [Commented] (TIKA-1251) RuntimeException when parsing word (.doc) documents. Works in Tika 1.4 but not 1.5

2014-07-21 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069170#comment-14069170 ] Hudson commented on TIKA-1251: -- SUCCESS: Integrated in tika-trunk-jdk1.6 #104 (See [https://b

[jira] [Commented] (TIKA-411) Generate list of supported and detected types automatically

2014-07-21 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069169#comment-14069169 ] Hudson commented on TIKA-411: - SUCCESS: Integrated in tika-trunk-jdk1.6 #104 (See [https://buil

[jira] [Commented] (TIKA-411) Generate list of supported and detected types automatically

2014-07-21 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069129#comment-14069129 ] Hudson commented on TIKA-411: - SUCCESS: Integrated in tika-trunk-jdk1.7 #105 (See [https://buil

[jira] [Commented] (TIKA-1251) RuntimeException when parsing word (.doc) documents. Works in Tika 1.4 but not 1.5

2014-07-21 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069130#comment-14069130 ] Hudson commented on TIKA-1251: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #105 (See [https://b

[jira] [Resolved] (TIKA-1251) RuntimeException when parsing word (.doc) documents. Works in Tika 1.4 but not 1.5

2014-07-21 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-1251. --- Resolution: Fixed Fix Version/s: 1.6 Assignee: Tyler Palsulich Fixed in r16123

[jira] [Commented] (TIKA-1357) Buffered text in EnviHeaderParser

2014-07-21 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068686#comment-14068686 ] Hudson commented on TIKA-1357: -- SUCCESS: Integrated in tika-trunk-jdk1.6 #103 (See [https://b

[jira] [Commented] (TIKA-1357) Buffered text in EnviHeaderParser

2014-07-21 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068663#comment-14068663 ] Hudson commented on TIKA-1357: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #104 (See [https://b

[jira] [Resolved] (TIKA-1357) Buffered text in EnviHeaderParser

2014-07-21 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-1357. --- Resolution: Fixed Fix Version/s: 1.6 Assignee: Tyler Palsulich > Buffered text

[jira] [Commented] (TIKA-1357) Buffered text in EnviHeaderParser

2014-07-21 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068639#comment-14068639 ] Tyler Palsulich commented on TIKA-1357: --- Fixed in r1612316 with unit test and whitesp

[jira] [Commented] (TIKA-1172) Out Of Memory exception occurring in GUI on 20MB pdf

2014-07-21 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068541#comment-14068541 ] Tyler Palsulich commented on TIKA-1172: --- Hi Erik, Thank you for raising this issue.

[jira] [Closed] (TIKA-1050) Charset detection gives wrong results for GB18030 encoding

2014-07-21 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich closed TIKA-1050. - Resolution: Cannot Reproduce Fix Version/s: 1.6 Assignee: Tyler Palsulich The atta

Re: Miredot License Key for Apache Tika Project

2014-07-21 Thread Tom Barber
Ah right, I do see that comment, very true. Tom On 21/07/14 09:48, Nick Burch wrote: On Sat, 19 Jul 2014, Tom Barber wrote: Jumping on this thread very late so please excuse me if this had been covered. Anyone contemplate Enunciate for REST API documentation? If you look on the l

Re: Miredot License Key for Apache Tika Project

2014-07-21 Thread Nick Burch
On Sat, 19 Jul 2014, Tom Barber wrote: Jumping on this thread very late so please excuse me if this had been covered. Anyone contemplate Enunciate for REST API documentation? If you look on the list in about April, you should see a patch I posted which turned on Enunciate support,