[jira] [Commented] (TIKA-888) NetCDF parser uses Java 6 JAR file and test/compilation fails with Java 1.5, although TIKA is Java 1.5

2012-03-30 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242408#comment-13242408 ] Jukka Zitting commented on TIKA-888: bq. The question is: The parser is still listed in

[jira] [Commented] (TIKA-878) Reuse computed Map inside CompositeParser

2012-03-19 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232975#comment-13232975 ] Jukka Zitting commented on TIKA-878: Do you have a benchmark that shows this to be a not
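A rough illustration of the idea in the TIKA-878 title follows: compute a lookup map once and reuse it instead of rebuilding it on every call. The class `CachedTypeMap` and its string-to-string maps are hypothetical stand-ins, not `CompositeParser`'s real fields or signatures. As the comment above asks, only a benchmark would show whether the caching actually pays off.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: merges several media-type maps once and reuses the result.
final class CachedTypeMap {
    private final List<Map<String, String>> sources;
    // volatile so that a fully built map is safely published to other threads
    private volatile Map<String, String> cached;

    CachedTypeMap(List<Map<String, String>> sources) {
        // Copies the list (not the maps); the cached result goes stale
        // if a source map is modified after the first getMap() call.
        this.sources = new ArrayList<Map<String, String>>(sources);
    }

    Map<String, String> getMap() {
        Map<String, String> map = cached;
        if (map == null) {
            Map<String, String> merged = new HashMap<String, String>();
            for (Map<String, String> source : sources) {
                merged.putAll(source); // later sources override earlier ones
            }
            map = Collections.unmodifiableMap(merged);
            cached = map; // benign race: two threads may both build it, with equal contents
        }
        return map;
    }
}
```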

[jira] [Commented] (TIKA-866) Invalid configuration file causes OutOfMemoryException

2012-02-17 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210429#comment-13210429 ] Jukka Zitting commented on TIKA-866: Actually, scrap the above rationale. The DefaultPar

[jira] [Commented] (TIKA-864) Metadata.formatDate should use ThreadLocal

2012-02-17 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210240#comment-13210240 ] Jukka Zitting commented on TIKA-864: Like in TIKA-865, is this a real measurable perform
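The ThreadLocal technique named in the TIKA-864 title can be sketched as below. This is a hypothetical helper (`ThreadLocalDateFormatter`, with an assumed UTC ISO 8601 pattern), not the actual `Metadata.formatDate` code. Because `SimpleDateFormat` is not thread-safe, each thread gets its own instance and no lock is needed; whether that is measurably faster than synchronizing is the question the comment above raises.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not Tika's Metadata class.
final class ThreadLocalDateFormatter {
    // SimpleDateFormat is not thread-safe, so each thread lazily creates its own instance.
    private static final ThreadLocal<DateFormat> FORMAT = new ThreadLocal<DateFormat>() {
        @Override
        protected DateFormat initialValue() {
            SimpleDateFormat format =
                    new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.US);
            format.setTimeZone(TimeZone.getTimeZone("UTC"));
            return format;
        }
    };

    private ThreadLocalDateFormatter() {
    }

    static String format(Date date) {
        return FORMAT.get().format(date);
    }
}
```

The anonymous `initialValue()` override is used instead of `ThreadLocal.withInitial`, which only arrived in Java 8, so the sketch would also compile for the Java 1.5 target mentioned in TIKA-888.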

[jira] [Commented] (TIKA-865) MimeTypes.forName should avoid method-level synchronization

2012-02-17 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210237#comment-13210237 ] Jukka Zitting commented on TIKA-865: I'd keep the synchronization on "this", as it also
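One way to avoid method-level synchronization while still locking on `this` (which the comment above prefers to keep) is to hold the lock only around the code that touches shared state. The `TypeRegistry` class below is a hypothetical stand-in for `MimeTypes`, using plain strings instead of real media type objects.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a registry like MimeTypes; not Tika's code.
final class TypeRegistry {
    private final Map<String, String> types = new HashMap<String, String>();

    // Not a synchronized method: argument normalization runs without holding the lock.
    String forName(String name) {
        String key = name.trim().toLowerCase(Locale.ENGLISH);
        // Only the lookup-or-register step touches shared state, so only it is locked.
        // The lock is still "this", so other methods synchronized on this object stay exclusive.
        synchronized (this) {
            String type = types.get(key);
            if (type == null) {
                type = key;
                types.put(key, type);
            }
            return type;
        }
    }
}
```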

[jira] [Commented] (TIKA-860) Make ZIP bomb detection configureable

2012-02-10 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205453#comment-13205453 ] Jukka Zitting commented on TIKA-860: bq. Couldnt SecureContentHandler not simply get the

[jira] [Commented] (TIKA-860) Make ZIP bomb detection configureable

2012-02-10 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205440#comment-13205440 ] Jukka Zitting commented on TIKA-860: The mentioned configurability is currently only at
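As a hedged illustration of what configurable ZIP bomb detection could mean, the sketch below counts input bytes and output characters and fails once the output exceeds a threshold and the output/input ratio exceeds a maximum. Both limits are constructor parameters. `CompressionRatioGuard` is a hypothetical class and does not reproduce `SecureContentHandler`'s code.

```java
import java.io.IOException;

// Hypothetical guard; both limits are constructor parameters so callers can configure them.
final class CompressionRatioGuard {
    private final long outputThreshold; // no ratio check until output exceeds this
    private final long maxRatio;        // maximum allowed output/input ratio
    private long inputBytes;
    private long outputChars;

    CompressionRatioGuard(long outputThreshold, long maxRatio) {
        this.outputThreshold = outputThreshold;
        this.maxRatio = maxRatio;
    }

    void addInput(long bytes) {
        inputBytes += bytes;
    }

    void addOutput(long chars) throws IOException {
        outputChars += chars;
        // Treat empty input as one byte to avoid dividing by zero;
        // dividing instead of multiplying avoids long overflow.
        long input = Math.max(inputBytes, 1L);
        if (outputChars > outputThreshold && outputChars / input > maxRatio) {
            throw new IOException("Suspected zip bomb: " + outputChars
                    + " characters of output from " + inputBytes + " bytes of input");
        }
    }
}
```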

[jira] [Commented] (TIKA-853) java.io.IOException with TikaGUI and testMP4.m4a

2012-02-02 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198767#comment-13198767 ] Jukka Zitting commented on TIKA-853: Do you have a virus scanner running? I've seen quit

[jira] [Commented] (TIKA-843) Support for Date without a Time Component

2012-01-20 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189897#comment-13189897 ] Jukka Zitting commented on TIKA-843: FWIW, I've found it most reliable to convert a date

[jira] [Commented] (TIKA-833) POI Daily beta6 as of 12/27 breaks ExcelParserTest.testExcelParserFormatting()

2011-12-29 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177248#comment-13177248 ] Jukka Zitting commented on TIKA-833: Thanks, Jeremy! > POI Daily beta6

[jira] [Commented] (TIKA-833) POI Daily beta6 as of 12/27 breaks ExcelParserTest.testExcelParserFormatting()

2011-12-29 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177183#comment-13177183 ] Jukka Zitting commented on TIKA-833: It's good that we monitor changes in POI and make s

[jira] [Commented] (TIKA-830) Tika.parseToString() causes ForkParser to try to serialize itself

2011-12-29 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177088#comment-13177088 ] Jukka Zitting commented on TIKA-830: Excellent, thanks Nick! > Tika.par

[jira] [Commented] (TIKA-830) Tika.parseToString() causes ForkParser to try to serialize itself

2011-12-28 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176702#comment-13176702 ] Jukka Zitting commented on TIKA-830: The problem here is the basic assumption that the T

[jira] [Commented] (TIKA-830) Tika.parseToString() causes ForkParser to try to serialize itself

2011-12-23 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175622#comment-13175622 ] Jukka Zitting commented on TIKA-830: I'm not sure if we should try to support passing a

[jira] [Commented] (TIKA-832) ForkParser is unfriendly to code that prints things to its output

2011-12-23 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175591#comment-13175591 ] Jukka Zitting commented on TIKA-832: bq. I can write it if you want. That would be grea

[jira] [Commented] (TIKA-810) Upgrade to PDFbox 1.7.0 as available

2011-12-19 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172335#comment-13172335 ] Jukka Zitting commented on TIKA-810: In revision 1220781 I updated the parser code in PD

[jira] [Commented] (TIKA-801) ContentHandlerDecorator outputs invalid element

2011-12-09 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166148#comment-13166148 ] Jukka Zitting commented on TIKA-801: bq. patch attached Looks good, +1.

[jira] [Commented] (TIKA-801) ContentHandlerDecorator outputs invalid element

2011-12-08 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165291#comment-13165291 ] Jukka Zitting commented on TIKA-801: See the org.apache.tika.sax.EmbeddedContentHandler

[jira] [Commented] (TIKA-800) mark/reset not supported from POIFSContainerDetector

2011-12-06 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163701#comment-13163701 ] Jukka Zitting commented on TIKA-800: Note that calling TikaInputStream.get(InputStream)

[jira] [Commented] (TIKA-801) ContentHandlerDecorator outputs invalid element

2011-12-05 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162795#comment-13162795 ] Jukka Zitting commented on TIKA-801: bq. EndDocumentShieldingContentHandler IMHO we sho
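The `org.apache.tika.sax.EmbeddedContentHandler` class referenced above and the proposed `EndDocumentShieldingContentHandler` share one idea: forward a nested document's SAX events to an outer handler while suppressing its document-level events, so that the nested document does not start or end the outer one. A minimal sketch using the JDK's `XMLFilterImpl` follows. `DocumentEventShield` is a hypothetical name, and this is not Tika's implementation.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical decorator: XMLFilterImpl forwards every ContentHandler event
// to the handler set via setContentHandler(); only the two overrides below differ.
final class DocumentEventShield extends XMLFilterImpl {

    DocumentEventShield(ContentHandler target) {
        setContentHandler(target);
    }

    @Override
    public void startDocument() {
        // Swallowed: the outer document has already been started.
    }

    @Override
    public void endDocument() {
        // Swallowed: the outer document is ended by its own producer.
    }
}
```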

[jira] [Commented] (TIKA-800) mark/reset not supported from POIFSContainerDetector

2011-12-05 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162734#comment-13162734 ] Jukka Zitting commented on TIKA-800: bq. If the POIFS detector (now by run by default if
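Detectors that sniff leading bytes, such as the POIFS container detector discussed in TIKA-800, rely on `InputStream.mark`/`reset` to put those bytes back for the parser. A generic JDK-only sketch of that pattern follows, including wrapping a stream in `BufferedInputStream` when `markSupported()` is false. The `MarkResetPeek` helper is hypothetical and does not show what `TikaInputStream.get(InputStream)` does internally.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper illustrating mark/reset-based peeking; not Tika's code.
final class MarkResetPeek {

    // Returns a stream that supports mark/reset, wrapping only when necessary.
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    // Reads up to n leading bytes, then resets so later readers still see them.
    static byte[] peek(InputStream in, int n) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("mark/reset not supported");
        }
        in.mark(n);
        try {
            byte[] buffer = new byte[n];
            int total = 0;
            while (total < n) {
                int read = in.read(buffer, total, n - total);
                if (read == -1) {
                    break; // stream shorter than n bytes
                }
                total += read;
            }
            byte[] head = new byte[total];
            System.arraycopy(buffer, 0, head, 0, total);
            return head;
        } finally {
            in.reset();
        }
    }
}
```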

[jira] [Commented] (TIKA-623) Add support for Outlook PST

2011-12-01 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160919#comment-13160919 ] Jukka Zitting commented on TIKA-623: bq. Is there some way to proceed here without requi

[jira] [Commented] (TIKA-786) Tika CLI --detect returns incorrect content-type for files with altered extensions

2011-11-21 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154154#comment-13154154 ] Jukka Zitting commented on TIKA-786: Cool, looks good. I was simultaneously approaching

[jira] [Commented] (TIKA-786) Tika CLI --detect returns incorrect content-type for files with altered extensions

2011-11-21 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154135#comment-13154135 ] Jukka Zitting commented on TIKA-786: bq. Do we have any control over the ordering though

[jira] [Commented] (TIKA-786) Tika CLI --detect returns incorrect content-type for files with altered extensions

2011-11-21 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154127#comment-13154127 ] Jukka Zitting commented on TIKA-786: Hmm, I didn't think of such a case when doing the D

[jira] [Commented] (TIKA-784) Mimetype entry for DITA

2011-11-18 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152985#comment-13152985 ] Jukka Zitting commented on TIKA-784: bq. For now, I've invented mimetypes for the subtyp

[jira] [Commented] (TIKA-734) Out of memory exception with Xlsx file less than 5 MB

2011-11-17 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152405#comment-13152405 ] Jukka Zitting commented on TIKA-734: Did you see the parse() method [1] that returns a j

[jira] [Commented] (TIKA-778) NullPointerException in tika-app, parsing PDF content

2011-11-15 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150395#comment-13150395 ] Jukka Zitting commented on TIKA-778: Looks like the problem is coming from the HTML seri

[jira] [Commented] (TIKA-773) .NET version of Tika

2011-11-15 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150392#comment-13150392 ] Jukka Zitting commented on TIKA-773: There's now an ikvm profile in the tika-app POM tha

[jira] [Commented] (TIKA-774) ExifTool Parser

2011-11-09 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147342#comment-13147342 ] Jukka Zitting commented on TIKA-774: Some notes: * We already have existing places for

[jira] [Commented] (TIKA-775) Embed Capabilities

2011-11-09 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147339#comment-13147339 ] Jukka Zitting commented on TIKA-775: I'd like to have a concrete use case for introducin

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144862#comment-13144862 ] Jukka Zitting commented on TIKA-772: The metacharacters you mention do sound suspicious.

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144854#comment-13144854 ] Jukka Zitting commented on TIKA-772: The test case you added prints out "text/html" for

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144849#comment-13144849 ] Jukka Zitting commented on TIKA-772: The latter method makes also the .html suffix avail

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144836#comment-13144836 ] Jukka Zitting commented on TIKA-772: I piped the files to tika-app to prevent it from se

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144763#comment-13144763 ] Jukka Zitting commented on TIKA-772: Can you attach an example document that illustrates

[jira] [Commented] (TIKA-763) Update license metadata

2011-10-28 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138545#comment-13138545 ] Jukka Zitting commented on TIKA-763: I updated the embedded LICENSE files in revisions 1

[jira] [Commented] (TIKA-761) Provide version number by CLI argument -V

2011-10-25 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135257#comment-13135257 ] Jukka Zitting commented on TIKA-761: See revision 1188803 for a slightly modified versio

[jira] [Commented] (TIKA-761) Provide version number by CLI argument -V

2011-10-25 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135167#comment-13135167 ] Jukka Zitting commented on TIKA-761: bq. the dots will still be replaced with slashes O

[jira] [Commented] (TIKA-761) Provide version number by CLI argument -V

2011-10-25 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135129#comment-13135129 ] Jukka Zitting commented on TIKA-761: I'd simply hardcode the properties file path as {{

[jira] [Commented] (TIKA-761) Provide version number by CLI argument -V

2011-10-24 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134151#comment-13134151 ] Jukka Zitting commented on TIKA-761: bq. It seems - from what I read - these are only av

[jira] [Commented] (TIKA-761) Provide version number by CLI argument -V

2011-10-24 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134113#comment-13134113 ] Jukka Zitting commented on TIKA-761: +1 Looks good. As a possible improvement, as Nick
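Reading a version string from a properties file on the classpath, as discussed above for the `-V` option, might look like the sketch below. The actual resource path is cut off in the comment above, so the path and key here are caller-supplied assumptions, and `VersionLookup` is a hypothetical helper, not Tika's CLI code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical helper; resourcePath and key are assumptions supplied by the caller.
final class VersionLookup {

    static String version(String resourcePath, String key) {
        InputStream in = VersionLookup.class.getResourceAsStream(resourcePath);
        if (in == null) {
            return "unknown"; // resource not on the classpath
        }
        try {
            Properties properties = new Properties();
            properties.load(in);
            return properties.getProperty(key, "unknown");
        } catch (IOException e) {
            return "unknown";
        } finally {
            try {
                in.close();
            } catch (IOException ignored) {
                // nothing useful to do if close fails
            }
        }
    }
}
```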

[jira] [Commented] (TIKA-755) Add getDetector() method to TikaConfig

2011-10-18 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129971#comment-13129971 ] Jukka Zitting commented on TIKA-755: Hmm, I looked at the interaction between Tika and T

[jira] [Commented] (TIKA-756) XMP output from Tika CLI

2011-10-18 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129957#comment-13129957 ] Jukka Zitting commented on TIKA-756: Rough first version committed in revision 1185805.

[jira] [Commented] (TIKA-754) Automatic line break insertion (BR element) instead of '\n' in XHTMLContentHandler

2011-10-18 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129637#comment-13129637 ] Jukka Zitting commented on TIKA-754: I don't think it's necessarily a good idea to make

[jira] [Commented] (TIKA-657) Email parser gets into trouble on malformed html in enron corpus

2011-10-13 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126938#comment-13126938 ] Jukka Zitting commented on TIKA-657: In revision 1183109 I increased the default line an

[jira] [Commented] (TIKA-513) Support of Deja Vu (DjVu) format

2011-10-07 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122639#comment-13122639 ] Jukka Zitting commented on TIKA-513: Is there a DjVu parser we could use?

[jira] [Commented] (TIKA-272) Expose characters offsets information while parsing text-based inputs.

2011-10-07 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122625#comment-13122625 ] Jukka Zitting commented on TIKA-272: See PDFBOX-577 for some related work in PDFBox.

[jira] [Commented] (TIKA-381) HtmlParser should strip linefeeds out of links

2011-10-07 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122617#comment-13122617 ] Jukka Zitting commented on TIKA-381: The relevant TagSoup scanner state transitions are

[jira] [Commented] (TIKA-636) Taking very high heap space while parsing docx - Resulting in OOM in tha app

2011-10-05 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121157#comment-13121157 ] Jukka Zitting commented on TIKA-636: Do you still see this problem with Tika 0.10? If ye

[jira] [Commented] (TIKA-734) Out of memory exception with Xlsx file less than 5 MB

2011-10-05 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121032#comment-13121032 ] Jukka Zitting commented on TIKA-734: Tika 0.10 is now available. If the problem still oc

[jira] [Commented] (TIKA-735) OpenOffice parser: embedded OLE docs are extracted at the end, as extra ...

2011-10-01 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118861#comment-13118861 ] Jukka Zitting commented on TIKA-735: A parser should always produce valid XHTML output.