[jira] [Commented] (TIKA-897) UTF-8 encoded XML is detected as text/plain because of UTF-8 BOM

2012-04-20 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258241#comment-13258241 ] Nick Burch commented on TIKA-897: We had support for detecting XML files that are ASCII,

[jira] [Commented] (TIKA-700) Upgrade to POI 3.8 as available

2012-04-03 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245435#comment-13245435 ] Nick Burch commented on TIKA-700: Upgraded to POI 3.8 Final in r1309005.

[jira] [Commented] (TIKA-792) NoSuchMethodException CTMarkupImpl.init(org.apache.xmlbeans.SchemaType, boolean) processing a OOXML document

2012-04-03 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245436#comment-13245436 ] Nick Burch commented on TIKA-792: Thanks for the feedback Marek. As of r1309005 we're now

[jira] [Commented] (TIKA-887) Tika fails to parse some MP3 tags correctly and produces null characters in value

2012-03-29 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241134#comment-13241134 ] Nick Burch commented on TIKA-887: Is the problem still present in Tika 1.1? Only there were

[jira] [Commented] (TIKA-886) OOXMLExtractorFactory can leave files open

2012-03-28 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240469#comment-13240469 ] Nick Burch commented on TIKA-886: For cases where the OPCPackage is opened in

[jira] [Commented] (TIKA-886) OOXMLExtractorFactory can leave files open

2012-03-28 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240470#comment-13240470 ] Nick Burch commented on TIKA-886: Changed in r1306411, the two cases of

[jira] [Commented] (TIKA-877) Embedded document not extracted (regression)

2012-03-19 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232861#comment-13232861 ] Nick Burch commented on TIKA-877: Hmm, that commit wasn't supposed to break anything, it

[jira] [Commented] (TIKA-876) Signed pdf parsing

2012-03-15 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230299#comment-13230299 ] Nick Burch commented on TIKA-876: Can you upload a small example file? When you try to

[jira] [Commented] (TIKA-853) java.io.IOException with TikaGUI and testMP4.m4a

2012-02-20 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211801#comment-13211801 ] Nick Burch commented on TIKA-853: If we do need to buffer it all into memory, then there

[jira] [Commented] (TIKA-863) MailContentHandler should not create AutoDetectParser on each call

2012-02-17 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210205#comment-13210205 ] Nick Burch commented on TIKA-863: I'm not sure if we should be setting it as

[jira] [Commented] (TIKA-864) Metadata.formatDate should use ThreadLocal

2012-02-17 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210207#comment-13210207 ] Nick Burch commented on TIKA-864: If we did store them on a ThreadLocal, then how would we

[jira] [Commented] (TIKA-865) MimeTypes.forName should avoid method-level synchronization

2012-02-17 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210209#comment-13210209 ] Nick Burch commented on TIKA-865: I've had a go at fixing this in r1245426. It'd be good if

[jira] [Commented] (TIKA-866) Incomplete configuration file causes OutOfMemoryException

2012-02-17 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210219#comment-13210219 ] Nick Burch commented on TIKA-866: If the Tika Config file is missing elements (eg only has

[jira] [Commented] (TIKA-863) MailContentHandler should not create AutoDetectParser on each call

2012-02-17 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210236#comment-13210236 ] Nick Burch commented on TIKA-863: I'm not sure what the best way is to provide an

[jira] [Commented] (TIKA-863) MailContentHandler should not create AutoDetectParser on each call

2012-02-17 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210245#comment-13210245 ] Nick Burch commented on TIKA-863: We could check for the TikaConfig on the ParseContext,

[jira] [Commented] (TIKA-858) Tika add parsing support for ANPA-1312 news wire feeds

2012-02-13 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207345#comment-13207345 ] Nick Burch commented on TIKA-858: Additionally, what reference did you find for the chosen

[jira] [Commented] (TIKA-612) Specify PDFBox options via ParseContext

2012-02-10 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205376#comment-13205376 ] Nick Burch commented on TIKA-612: The conclusion was to expose the options on the PDFParser

[jira] [Commented] (TIKA-818) Allow PDFBox to be used with RandomAccessFile vs RandomAccessBuffer to allow for a memory vs performance tradeoff

2012-02-10 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205473#comment-13205473 ] Nick Burch commented on TIKA-818: Temp files created through TemporaryResources are already

[jira] [Commented] (TIKA-747) Ogg Vorbis and FLAC Parsers

2012-02-09 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204630#comment-13204630 ] Nick Burch commented on TIKA-747: Getting the central sync to work turned out to be much

[jira] [Commented] (TIKA-853) java.io.IOException with TikaGUI and testMP4.m4a

2012-02-07 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202238#comment-13202238 ] Nick Burch commented on TIKA-853: We don't want to have a System.gc call in production

[jira] [Commented] (TIKA-857) Tika TrueTypeParser add metadata from Naming tables

2012-02-07 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202928#comment-13202928 ] Nick Burch commented on TIKA-857: Not sure that this issue should have been resolved, as

[jira] [Commented] (TIKA-857) Tika TrueTypeParser add metadata from Naming tables

2012-02-07 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202962#comment-13202962 ] Nick Burch commented on TIKA-857: Looking at the patch, my only comment is wondering if we

[jira] [Commented] (TIKA-853) java.io.IOException with TikaGUI and testMP4.m4a

2012-02-01 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197782#comment-13197782 ] Nick Burch commented on TIKA-853: It's a Windows thing, because Windows won't let you

[jira] [Commented] (TIKA-842) IPTC Properties Should be Defined Completely and Independently of the Drew Library

2012-02-01 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198047#comment-13198047 ] Nick Burch commented on TIKA-842: One thing to bear in mind is that we try to map the

[jira] [Commented] (TIKA-855) Language Detection not working for Japanese and Chinese.

2012-02-01 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198415#comment-13198415 ] Nick Burch commented on TIKA-855: I believe we're currently missing language profiles for

[jira] [Commented] (TIKA-853) java.io.IOException with TikaGUI and testMP4.m4a

2012-01-31 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196841#comment-13196841 ] Nick Burch commented on TIKA-853: I've looked at the code again, and I can't spot anything

[jira] [Commented] (TIKA-850) Consistent way to supply document passwords to parsers

2012-01-31 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196953#comment-13196953 ] Nick Burch commented on TIKA-850: PasswordProvider added in r1238616, based on the above

[jira] [Commented] (TIKA-853) java.io.IOException with TikaGUI and testMP4.m4a

2012-01-30 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196077#comment-13196077 ] Nick Burch commented on TIKA-853: Ah, we weren't closing the stream in all cases. This is

[jira] [Commented] (TIKA-852) Quicktime / MP4 Metadata Parser

2012-01-28 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195598#comment-13195598 ] Nick Burch commented on TIKA-852: It looks like the Apache Licensed MP4Parser

[jira] [Commented] (TIKA-851) M4V magic detection invalid

2012-01-27 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194790#comment-13194790 ] Nick Burch commented on TIKA-851: I'm not sure if we're going to be able to differentiate

[jira] [Commented] (TIKA-851) M4V and M4A detection invalid

2012-01-27 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194813#comment-13194813 ] Nick Burch commented on TIKA-851: It looks like most files (not sure if it's all of them

[jira] [Commented] (TIKA-851) M4V and M4A detection invalid

2012-01-27 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194854#comment-13194854 ] Nick Burch commented on TIKA-851: From

[jira] [Commented] (TIKA-851) M4V and M4A detection invalid

2012-01-27 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194891#comment-13194891 ] Nick Burch commented on TIKA-851: I've added the audio/x-m4a alias in r1236734.

[jira] [Commented] (TIKA-842) IPTC Properties Should be Defined Completely and Independently of the Drew Library

2012-01-27 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194916#comment-13194916 ] Nick Burch commented on TIKA-842: Following the confirmation from the IPTC that we can use

[jira] [Commented] (TIKA-747) Ogg Vorbis and FLAC Parsers

2012-01-25 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193046#comment-13193046 ] Nick Burch commented on TIKA-747: Following discussions on the list, I've decided to

[jira] [Commented] (TIKA-850) Consistent way to supply document passwords to parsers

2012-01-25 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193124#comment-13193124 ] Nick Burch commented on TIKA-850: Currently, the objects set onto the ParseContext are: *

[jira] [Commented] (TIKA-850) Consistent way to supply document passwords to parsers

2012-01-25 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193131#comment-13193131 ] Nick Burch commented on TIKA-850: Based on this, I think the best option may be to have a

[jira] [Commented] (TIKA-818) Allow PDFBox to be used with RandomAccessFile vs RandomAccessBuffer to allow for a memory vs performance tradeoff

2012-01-24 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192072#comment-13192072 ] Nick Burch commented on TIKA-818: Are you sure the scratchFile should be the real file

[jira] [Commented] (TIKA-849) Identify and parse the Apple iBooks format

2012-01-24 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192079#comment-13192079 ] Nick Burch commented on TIKA-849: We might be able to use the same handler, but it'd need

[jira] [Commented] (TIKA-839) TikaException with testPPT.potm in Tika GUI / CLI

2012-01-24 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192107#comment-13192107 ] Nick Burch commented on TIKA-839: Thanks for this, applied in r1235233.

[jira] [Commented] (TIKA-850) Consistent way to supply document passwords to parsers

2012-01-24 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192184#comment-13192184 ] Nick Burch commented on TIKA-850: Does anyone have a feeling for whether the password should be

[jira] [Commented] (TIKA-760) NPE XHTMLContentHandler in characters Method

2012-01-24 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192189#comment-13192189 ] Nick Burch commented on TIKA-760: NPE check added in r1235284.

[jira] [Commented] (TIKA-675) PackageExtractor should track names of recursively nested resources

2012-01-24 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192200#comment-13192200 ] Nick Burch commented on TIKA-675: We could probably do this with a wrapper parser, which

[jira] [Commented] (TIKA-241) Rar archive support

2012-01-24 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192203#comment-13192203 ] Nick Burch commented on TIKA-241: Has there been any luck getting junrar into Maven Central

[jira] [Commented] (TIKA-770) New ODF metadata keys

2012-01-24 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192242#comment-13192242 ] Nick Burch commented on TIKA-770: I've updated the three remaining ones in r1235321, along

[jira] [Commented] (TIKA-818) Allow PDFBox to be used with RandomAccessFile vs RandomAccessBuffer to allow for a memory vs performance tradeoff

2012-01-23 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190965#comment-13190965 ] Nick Burch commented on TIKA-818: Tika does already handle its own temporary files, via

[jira] [Commented] (TIKA-844) Ability to Define an Internal Text Bag Property

2012-01-23 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191214#comment-13191214 ] Nick Burch commented on TIKA-844: Thanks, patch applied in r1234861.

[jira] [Commented] (TIKA-845) Check for Existing Value in Multi-Value Fields in XML Metadata Handler

2012-01-23 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191228#comment-13191228 ] Nick Burch commented on TIKA-845: I think the current logic isn't quite correct. Rather

[jira] [Commented] (TIKA-849) Identify and parse the Apple iBooks format

2012-01-23 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191242#comment-13191242 ] Nick Burch commented on TIKA-849: Sample file committed in r1234886, along with a unit test

[jira] [Commented] (TIKA-849) Identify and parse the Apple iBooks format

2012-01-23 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191259#comment-13191259 ] Nick Burch commented on TIKA-849: Test and parser change committed in r1234904, thanks. It

[jira] [Commented] (TIKA-848) NullPointerException in SecurityHandler.addDictionaryAndSubDictionary(SecurityHandler.java:185)

2012-01-22 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190834#comment-13190834 ] Nick Burch commented on TIKA-848: We can keep this open until it's fixed in PDFBox, and

[jira] [Commented] (TIKA-792) NoSuchMethodException CTMarkupImpl.init(org.apache.xmlbeans.SchemaType, boolean) processing a OOXML document

2012-01-20 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189839#comment-13189839 ] Nick Burch commented on TIKA-792: Are you able to share one of the files that triggers

[jira] [Commented] (TIKA-507) Parser for font files

2012-01-20 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189861#comment-13189861 ] Nick Burch commented on TIKA-507: Thanks for this patch, sorry it has taken so long to get

[jira] [Commented] (TIKA-843) Support for Date without a Time Component

2012-01-20 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189888#comment-13189888 ] Nick Burch commented on TIKA-843: Do we want to set a timezone on these? For a date with no

[jira] [Commented] (TIKA-841) User supplied parsers should be preferred

2012-01-18 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188469#comment-13188469 ] Nick Burch commented on TIKA-841: Fixed in r1232902, with code similar to the

[jira] [Commented] (TIKA-805) improvements in XSLFPowerPointExtractorDecorator

2012-01-16 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186861#comment-13186861 ] Nick Burch commented on TIKA-805: Thanks, applied in r1231905.

[jira] [Commented] (TIKA-87) MimeTypes should allow modification of MIME types

2012-01-16 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186896#comment-13186896 ] Nick Burch commented on TIKA-87: I believe this is no longer an issue, because of the recent

[jira] [Commented] (TIKA-86) Support magic(5) files

2012-01-16 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186948#comment-13186948 ] Nick Burch commented on TIKA-86: Turning the file magic into a Tika xml match shouldn't be

[jira] [Commented] (TIKA-86) Support magic(5) files

2012-01-16 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187026#comment-13187026 ] Nick Burch commented on TIKA-86: RegEx magic could be interesting, with a bit of care to

[jira] [Commented] (TIKA-842) IPTC Properties Should be Defined Completely and Independently of the Drew Library

2012-01-16 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187153#comment-13187153 ] Nick Burch commented on TIKA-842: Did you manage to confirm that the IPTC Spec license

[jira] [Commented] (TIKA-842) IPTC Properties Should be Defined Completely and Independently of the Drew Library

2012-01-16 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187183#comment-13187183 ] Nick Burch commented on TIKA-842: I think we'll need the OK from Apache Legal for this,

[jira] [Commented] (TIKA-842) IPTC Properties Should be Defined Completely and Independently of the Drew Library

2012-01-16 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187190#comment-13187190 ] Nick Burch commented on TIKA-842: LEGAL-122 created for this IPTC

[jira] [Commented] (TIKA-360) Outstanding Improvements to Number/Date Formatting in ExcelParser

2012-01-13 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185646#comment-13185646 ] Nick Burch commented on TIKA-360: Fractions will be supported when we upgrade to POI 3.8

[jira] [Commented] (TIKA-695) Custom properties on xlsx, docx, pptx

2012-01-12 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185001#comment-13185001 ] Nick Burch commented on TIKA-695: Thanks for the sample files. Based on them, I've added

[jira] [Commented] (TIKA-695) Custom properties on xlsx, docx, pptx

2012-01-04 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180140#comment-13180140 ] Nick Burch commented on TIKA-695: Would it be possible for you to create some sample files

[jira] [Commented] (TIKA-838) EmptyParser Singleton should be final

2012-01-02 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178637#comment-13178637 ] Nick Burch commented on TIKA-838: This breaks the CLIRR check, so I'll have to defer to

[jira] [Commented] (TIKA-837) Make inner classes static for performance reasons

2012-01-02 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178638#comment-13178638 ] Nick Burch commented on TIKA-837: Thanks, patch applied in r1226657.

[jira] [Commented] (TIKA-793) Invalid ASCII character (65533) when retriving MP3 metadata

2011-12-29 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177081#comment-13177081 ] Nick Burch commented on TIKA-793: Comment (COM/COMM) tag handling fixed in r1225480 - it

[jira] [Commented] (TIKA-835) TNEF parsing unstable

2011-12-29 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177574#comment-13177574 ] Nick Burch commented on TIKA-835: winmail.dat is a TNEF file, which POI supports through

[jira] [Commented] (TIKA-830) Tika.parseToString() causes ForkParser to try to serialize itself

2011-12-28 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177021#comment-13177021 ] Nick Burch commented on TIKA-830: I think the ForkParser instanceof check is a good

[jira] [Commented] (TIKA-831) ForkClient doesn't report error due to widening conversion issue

2011-12-26 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176068#comment-13176068 ] Nick Burch commented on TIKA-831: I've enabled the last test in r1224864 - I had to switch

[jira] [Commented] (TIKA-827) ForkServer fails to report issues if an exception is not properly serializable

2011-12-26 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176069#comment-13176069 ] Nick Burch commented on TIKA-827: I'm not a big fan of the temp file idea, so I've had a

[jira] [Commented] (TIKA-793) Invalid ASCII character (65533) when retriving MP3 metadata

2011-12-26 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176070#comment-13176070 ] Nick Burch commented on TIKA-793: I've tracked this to two bugs. Both relate to the

[jira] [Commented] (TIKA-829) Tika lacks preconditions on its input, causing some potential misuse of the API

2011-12-25 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175881#comment-13175881 ] Nick Burch commented on TIKA-829: Thanks, patch applied in r1224675.

[jira] [Commented] (TIKA-830) Tika.parseToString() causes ForkParser to try to serialize itself

2011-12-25 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175882#comment-13175882 ] Nick Burch commented on TIKA-830: If we're not going to support the ForkParser like this,

[jira] [Commented] (TIKA-826) TikaException / OfficeXmlFileException with .xlsb files

2011-12-22 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175240#comment-13175240 ] Nick Burch commented on TIKA-826: POI doesn't support .xlsb files, and nor is it likely to

[jira] [Commented] (TIKA-823) Detect StarOffice files

2011-12-20 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173820#comment-13173820 ] Nick Burch commented on TIKA-823: Note that it looks like the strings are prefixed with a 4

[jira] [Commented] (TIKA-819) Make Option to Exclude Embedded Files' Text for Text Content

2011-12-19 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172825#comment-13172825 ] Nick Burch commented on TIKA-819: You have to explicitly ask for embedded files to be

[jira] [Commented] (TIKA-700) Upgrade to POI 3.8 as available

2011-12-19 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172950#comment-13172950 ] Nick Burch commented on TIKA-700: Upgraded to POI 3.8 beta 5 in r1221109.

[jira] [Commented] (TIKA-805) improvements in XSLFPowerPointExtractorDecorator

2011-12-19 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172951#comment-13172951 ] Nick Burch commented on TIKA-805: The patch doesn't seem to apply cleanly against trunk, is

[jira] [Commented] (TIKA-757) Address TODOs when we upgrade to next POI release (3.8 beta 5)

2011-12-19 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172974#comment-13172974 ] Nick Burch commented on TIKA-757: I believe that as of r1221115 most of these are now

[jira] [Commented] (TIKA-818) Allow PDFBox to be used with RandomAccessFile vs RandomAccessBuffer to allow for a memory vs performance tradeoff

2011-12-19 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173000#comment-13173000 ] Nick Burch commented on TIKA-818: I've just gone to make the change, and discovered that

[jira] [Commented] (TIKA-811) Upgrade metadatExtractor version for OpenJDK 7 support

2011-12-13 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168288#comment-13168288 ] Nick Burch commented on TIKA-811: Do you know if 2.5.0-RC3 is available in Maven Central, or

[jira] [Commented] (TIKA-806) MS Word Detection magics are a bit overzealous

2011-12-13 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13168369#comment-13168369 ] Nick Burch commented on TIKA-806: - You can always get a false positive with mime magic

[jira] [Commented] (TIKA-812) Improve the detection of Works Spreadsheet 7.0 files

2011-12-13 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13168902#comment-13168902 ] Nick Burch commented on TIKA-812: - If we put in a slightly higher priority match for

[jira] [Commented] (TIKA-806) MS Word Detection magics are a bit overzealous

2011-12-12 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13168056#comment-13168056 ] Nick Burch commented on TIKA-806: - If you use DefaultDetector it isn't an issue, as the

[jira] [Commented] (TIKA-803) Outlook parser to mark the message body in some special way

2011-12-12 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13168124#comment-13168124 ] Nick Burch commented on TIKA-803: - As of r1213560, the message body is now wrapped in a div

[jira] [Commented] (TIKA-805) improvements in XSLFPowerPointExtractorDecorator

2011-12-12 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13168154#comment-13168154 ] Nick Burch commented on TIKA-805: - Thanks Yegor! Assuming no objections, I'll apply this

[jira] [Commented] (TIKA-808) Fork Parser doesn't work for PDF files

2011-12-11 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13167314#comment-13167314 ] Nick Burch commented on TIKA-808: - I've added some unit tests in r1213131 for this case.

[jira] [Commented] (TIKA-809) IndexOutOfBoundsException with TikaGUI

2011-12-11 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13167324#comment-13167324 ] Nick Burch commented on TIKA-809: - This should be improved when we move to POI 3.8 beta 5,

[jira] [Commented] (TIKA-804) Parsing outlook format template (.oft )

2011-12-10 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13167037#comment-13167037 ] Nick Burch commented on TIKA-804: - Questions are best asked on the Mailing Lists, rather

[jira] [Commented] (TIKA-804) Parsing outlook format template (.oft )

2011-12-10 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13167038#comment-13167038 ] Nick Burch commented on TIKA-804: - Seems to parse just fine as a regular outlook file

[jira] [Commented] (TIKA-806) MS Word Detection magics are a bit overzealous

2011-12-10 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13167043#comment-13167043 ] Nick Burch commented on TIKA-806: - The file format allows for the directory entries to occur

[jira] [Commented] (TIKA-800) mark/reset not supported from POIFSContainerDetector

2011-12-05 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162730#comment-13162730 ] Nick Burch commented on TIKA-800: - Looks like the issue is that ArchiveInputStream (from

[jira] [Commented] (TIKA-800) mark/reset not supported from POIFSContainerDetector

2011-12-05 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162735#comment-13162735 ] Nick Burch commented on TIKA-800: - In that case, maybe it's best to have the wrapping done

[jira] [Commented] (TIKA-800) mark/reset not supported from POIFSContainerDetector

2011-12-05 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163238#comment-13163238 ] Nick Burch commented on TIKA-800: - Fixed in r1210736 by wrapping the ArchiveInputStream, the

[jira] [Commented] (TIKA-802) NullPointerException when parsing iWork files

2011-12-05 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163304#comment-13163304 ] Nick Burch commented on TIKA-802: - I have just retried with the 1.0 version of tika-app, and

[jira] [Commented] (TIKA-797) MimeType.getExtension for application/vnd.ms-powerpoint returns ppz. I'd expect ppt.

2011-12-02 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13161568#comment-13161568 ] Nick Burch commented on TIKA-797: - Good spot! Thanks, patch applied in r1209438.

[jira] [Commented] (TIKA-791) Fix the detection of protected OOXML files

2011-11-28 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13158444#comment-13158444 ] Nick Burch commented on TIKA-791: - One thing - I'm not sure that we should be returning the

[jira] [Commented] (TIKA-697) Tika reports the content type of AR archives as text/plain

2011-11-27 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13158070#comment-13158070 ] Nick Burch commented on TIKA-697: - Thanks for this I've tweaked the existing mime magic in
