[jira] Resolved: (TIKA-439) DWGParser (and some others) not used by AutoDetectParser

2010-06-16 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-439. Assignee: Jukka Zitting Fix Version/s: 0.8 Resolution: Fixed Thanks! Fixed as suggest

[jira] Commented: (TIKA-442) Image extractors use inconsistent metadata keys and formats for common features

2010-06-16 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879532#action_12879532 ] Nick Burch commented on TIKA-442: OK, I'll work up a patch that uses these keys, hopefully s

[jira] Commented: (TIKA-442) Image extractors use inconsistent metadata keys and formats for common features

2010-06-16 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879515#action_12879515 ] Jukka Zitting commented on TIKA-442: I'd go with XMP as much as possible. XMP leverages E

[jira] Updated: (TIKA-361) Update OutlookExtractor to match new POI API

2010-06-16 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch updated TIKA-361: Attachment: outlook.patch Updated patch which inserts more of the from/to/cc information into the metadata.

[jira] Updated: (TIKA-361) Update OutlookExtractor to match new POI API

2010-06-16 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch updated TIKA-361: Attachment: (was: outlook.patch) > Update OutlookExtractor to match new POI API > ---

[jira] Resolved: (TIKA-440) [Patch] Fetch the composer information in the MP3 Parser

2010-06-16 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-440. Assignee: Jukka Zitting Fix Version/s: 0.8 Resolution: Fixed Thanks! Patch committed

[jira] Commented: (TIKA-441) Sometimes, tika not working (crashed) because of null classloader

2010-06-16 Thread Alex Ott (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879463#action_12879463 ] Alex Ott commented on TIKA-441: Thank you. I understand that my patch isn't perfect - I hadn'

[jira] Resolved: (TIKA-441) Sometimes, tika not working (crashed) because of null classloader

2010-06-16 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-441. Assignee: Jukka Zitting Resolution: Fixed Good point, fixed in revision 955348. I used a sligh

Short developerworks article on Tika

2010-06-16 Thread Mattmann, Chris A (388J)
Hi All, Oleg Tikhonov and I recently published a short IBM developerworks article on Tika. We wrote the article last year, and we've been working with the editor to put it online. You can check it out here: http://www.ibm.com/developerworks/opensource/tutorials/os-apache-tika/ Thanks, Chris

[jira] Created: (TIKA-442) Image extractors use inconsistent metadata keys and formats for common features

2010-06-16 Thread Nick Burch (JIRA)
Image extractors use inconsistent metadata keys and formats for common features --- Key: TIKA-442 URL: https://issues.apache.org/jira/browse/TIKA-442 Project: Tika I

Re: Detecting container formats

2010-06-16 Thread Alex Ott
Re Nick Burch at "Wed, 16 Jun 2010 12:01:48 +0100 (BST)" wrote: NB> On Tue, 15 Jun 2010, Alex Ott wrote: >> Hmmm, WordDocument stream in .doc could only be under the / directory entry, >> but yes - it >> could be anywhere in the list of OLE2 entries... NB> And the list of OLE2 entries can come anywhe

Re: Detecting container formats

2010-06-16 Thread Nick Burch
On Tue, 15 Jun 2010, Ken Krugler wrote: I think this is a reasonable approach, as long as (per Alex's suggestion) it's configurable in various ways. E.g. if you know you don't want to parse OLE2-based files, so you've removed jars for those parsers, then it would be great to have an easy way o

Re: Detecting container formats

2010-06-16 Thread Nick Burch
On Tue, 15 Jun 2010, Alex Ott wrote: Hmmm, WordDocument stream in .doc could only be under the / directory entry, but yes - it could be anywhere in the list of OLE2 entries... And the list of OLE2 entries can come anywhere in the file - the header block contains a pointer to the block holding the entries
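
The point about the header block pointing at the directory entries can be sketched roughly as follows. This is only an illustration, not the detection code used by Tika or POI: the class name Ole2HeaderSketch and the method directoryOffset are hypothetical, and the field offsets (signature at byte 0, sector shift at byte 30, first directory sector at byte 48) are taken from the published compound file layout as an assumption of this sketch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, for illustration only (not part of Tika or POI).
final class Ole2HeaderSketch {
    // The 8 signature bytes D0 CF 11 E0 A1 B1 1A E1, read as a little-endian long.
    static final long OLE2_SIGNATURE = 0xE11AB1A1E011CFD0L;

    /**
     * Returns the byte offset in the file of the first directory sector,
     * or -1 if the buffer does not start with an OLE2 header.
     */
    static long directoryOffset(byte[] header) {
        if (header.length < 512) {
            return -1; // need the whole 512-byte header block
        }
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getLong(0) != OLE2_SIGNATURE) {
            return -1;
        }
        int sectorShift = buf.getShort(30) & 0xFFFF;        // 9 means 512-byte sectors
        long firstDirSector = buf.getInt(48) & 0xFFFFFFFFL; // sector number, unsigned
        long sectorSize = 1L << sectorShift;
        // Sector 0 starts right after the header, hence the +1.
        return (firstDirSector + 1) * sectorSize;
    }
}
```

Because the directory sector number is arbitrary, the entries may sit far from the start of the file, so a detector that only reads the first few kilobytes cannot rely on seeing them.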

Re: Trouble committing to Tika

2010-06-16 Thread Jukka Zitting
Hi, On Wed, Jun 16, 2010 at 7:08 AM, Mattmann, Chris A (388J) wrote: > Looks like you are there, Jukka? Yep, everything seems fine on that front. It looks like there's some problem with the EU mirror of svn.apache.org, as there was a similar problem report from Mahout and I was able to commit t