[jira] Commented: (TIKA-447) Container aware mimetype detection

2010-08-02 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894486#action_12894486 ] Jukka Zitting commented on TIKA-447: It would be great if the AutoDetectParser could auto

[jira] Commented: (TIKA-469) The Parser is not correctly outputting Arabic text documents

2010-08-02 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894492#action_12894492 ] Jukka Zitting commented on TIKA-469: Do you have some example documents that you could sh

[jira] Commented: (TIKA-447) Container aware mimetype detection

2010-08-02 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894494#action_12894494 ] Nick Burch commented on TIKA-447: At the moment, the ContainerAwareDetector checks the first

[jira] Resolved: (TIKA-358) Auto-detection of HTML fails with common auto-generated template

2010-08-02 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-358. Assignee: Jukka Zitting (was: Ken Krugler) Fix Version/s: 0.8 Resolution: Fixed I fi

[jira] Commented: (TIKA-447) Container aware mimetype detection

2010-08-02 Thread Alex Ott (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894501#action_12894501 ] Alex Ott commented on TIKA-447: To Nick: will this allow us to implement support for self-extr

[jira] Commented: (TIKA-447) Container aware mimetype detection

2010-08-02 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894502#action_12894502 ] Jukka Zitting commented on TIKA-447: Hmm, I guess you're right, perhaps we won't need suc

Hudson build is back to normal : Tika-trunk #331

2010-08-02 Thread Apache Hudson Server
See

[jira] Commented: (TIKA-447) Container aware mimetype detection

2010-08-02 Thread Alex Ott (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894507#action_12894507 ] Alex Ott commented on TIKA-447: It's better to have some flag that will say "Stop, if this ru

[jira] Commented: (TIKA-447) Container aware mimetype detection

2010-08-02 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894509#action_12894509 ] Nick Burch commented on TIKA-447: Jukka - that might end up being more work though? Also, sh

[jira] Commented: (TIKA-447) Container aware mimetype detection

2010-08-02 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894510#action_12894510 ] Nick Burch commented on TIKA-447: Alex - have a look at the code, I think it already does wh

[jira] Commented: (TIKA-447) Container aware mimetype detection

2010-08-02 Thread Alex Ott (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894511#action_12894511 ] Alex Ott commented on TIKA-447: Ah, sorry Nick - I hadn't looked into the code yet. I thought tha

[jira] Commented: (TIKA-447) Container aware mimetype detection

2010-08-02 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894518#action_12894518 ] Jukka Zitting commented on TIKA-447: It's a bit more work, yes. What I'm trying to achiev

[jira] Commented: (TIKA-447) Container aware mimetype detection

2010-08-02 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894520#action_12894520 ] Nick Burch commented on TIKA-447: Using the container aware detector will give a more accura

[jira] Updated: (TIKA-447) Container aware mimetype detection

2010-08-02 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-447: Attachment: TIKA-447-TikaInputStream.patch BTW, the current new Detector implementations are a bit trou

[jira] Commented: (TIKA-245) Support of CHM Format

2010-08-02 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894574#action_12894574 ] Nick Burch commented on TIKA-245: JCHM seems to be under the CDDL license, so we're fine to

[jira] Commented: (TIKA-424) Avoid ArrayIndexOutOfBoundsException on some mp3 files

2010-08-02 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894577#action_12894577 ] Nick Burch commented on TIKA-424: Your file has broken ID3v2.4 tags in it. It looks to me li

[jira] Commented: (TIKA-424) Avoid ArrayIndexOutOfBoundsException on some mp3 files

2010-08-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894579#action_12894579 ] Chris A. Mattmann commented on TIKA-424: Hey Guys: can we cook up a free MP3 file wit

[jira] Commented: (TIKA-424) Avoid ArrayIndexOutOfBoundsException on some mp3 files

2010-08-02 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894589#action_12894589 ] Nick Burch commented on TIKA-424: Actually, it looks like it's just a bug in the ID3v2.4 spe

[jira] Commented: (TIKA-424) Avoid ArrayIndexOutOfBoundsException on some mp3 files

2010-08-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894591#action_12894591 ] Chris A. Mattmann commented on TIKA-424: +1, great thanks Nick! > Avoid ArrayIndexOu

Post link to Tika in Action book on Tika website?

2010-08-02 Thread Mattmann, Chris A (388J)
Hi Tika community, Jukka Zitting and I are working on the Tika in Action book [1]. How would everyone feel about us posting a link to it on the Tika website [2]? If so, I'll prepare a patch and update the website shortly. Cheers, Chris [1] http://manning.com/mattmann/ [2] http://tika.apache.org

Re: Post link to Tika in Action book on Tika website?

2010-08-02 Thread Julien Nioche
+1 from me On 2 August 2010 18:33, Mattmann, Chris A (388J) < chris.a.mattm...@jpl.nasa.gov> wrote: > Hi Tika community, > > Jukka Zitting and I are working on the Tika in Action book [1]. How would > everyone feel about us posting a link to it on the Tika website [2]? > > If so, I'll prepare a p

Re: Post link to Tika in Action book on Tika website?

2010-08-02 Thread Ken Krugler
Hi Chris, On Aug 2, 2010, at 10:33am, Mattmann, Chris A (388J) wrote: > Hi Tika community, > > Jukka Zitting and I are working on the Tika in Action book [1]. How would everyone feel about us posting a link to it on the Tika website [2]? +1 -- Ken

Re: Post link to Tika in Action book on Tika website?

2010-08-02 Thread Oleg Tikhonov
+1, positively. On Mon, Aug 2, 2010 at 8:33 PM, Mattmann, Chris A (388J) < chris.a.mattm...@jpl.nasa.gov> wrote: > Hi Tika community, > > Jukka Zitting and I are working on the Tika in Action book [1]. How would > everyone feel about us posting a link to it on the Tika website [2]? > > If so, I'

Re: Packages and attributes

2010-08-02 Thread Paul Jakubik
I have added Jukka Zitting's recursive metadata example to the Tika wiki at http://wiki.apache.org/tika/RecursiveMetadata. I also added some notes on what I did so I could get the metadata for a nested document along with the text for that document. Finally, I modified the http://wiki.apache.org/ti

Re: Packages and attributes

2010-08-02 Thread Mattmann, Chris A (388J)
Thanks Paul! On 8/2/10 1:18 PM, "Paul Jakubik" wrote: I have added Jukka Zitting's recursive metadata example to the Tika wiki at http://wiki.apache.org/tika/RecursiveMetadata. I also added some notes on what I did so I could get the metadata for a nested document along with the text for that do

Metadata Discussion Status

2010-08-02 Thread Paul Jakubik
Hi, A while ago I added the http://wiki.apache.org/tika/MetadataDiscussion page to the Tika wiki. Since then, with the help of Jukka Zitting, a solution has been described for using the current Tika library to capture nested document metadata and associate that with the text extracted for each ne