[jira] [Created] (TIKA-985) Support for HTML5 elements

2012-08-30 Thread Markus Jelsma (JIRA)
Markus Jelsma created TIKA-985: Summary: Support for HTML5 elements Key: TIKA-985 URL: https://issues.apache.org/jira/browse/TIKA-985 Project: Tika Issue Type: Improvement

[jira] [Updated] (TIKA-985) Support for HTML5 elements

2012-08-30 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated TIKA-985: Attachment: TIKA-985-1.3-1.patch Here's a preliminary patch for 1.3. It adds some HTML5 elements to

[jira] [Updated] (TIKA-985) Support for HTML5 elements

2012-08-30 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated TIKA-985: Attachment: TIKA-985-1.3-2.patch Here's a new patch listing all HTML5 elements that are missing in the

[jira] [Created] (TIKA-986) NullPointerException trying to parse detached .pk7s signature

2012-08-30 Thread Michael McCandless (JIRA)
Michael McCandless created TIKA-986: Summary: NullPointerException trying to parse detached .pk7s signature Key: TIKA-986 URL: https://issues.apache.org/jira/browse/TIKA-986 Project: Tika

[jira] [Updated] (TIKA-986) NullPointerException trying to parse detached .pk7s signature

2012-08-30 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated TIKA-986: Attachment: TIKA-986.patch Patch, I think it's ready. The example I have isn't shareable

[jira] [Updated] (TIKA-986) NullPointerException trying to parse detached .pk7s signature

2012-08-30 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated TIKA-986: Attachment: smime.p7s NullPointerException trying to parse detached .pk7s signature

[jira] [Updated] (TIKA-986) NullPointerException trying to parse detached .pk7s signature

2012-08-30 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated TIKA-986: Attachment: TIKA-986.patch Awesome, thanks Robert: new patch with test case. I think it's

Question about XPath Matcher code MatchingContentHandler

2012-08-30 Thread Ken Krugler
Hi Jukka, I was looking into a failure in a Bixo test, when using BodyContentHandler (wrapped by XHTMLContentHandler). The issue is that BodyContentHandler uses MatchingContentHandler to find only text in nodes under the /html/body hierarchy. And this in turn winds up not matching the html
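For readers unfamiliar with the idea of keeping only the text under the /html/body hierarchy, here is a minimal sketch using nothing but the JDK's SAX classes. It is not Tika's MatchingContentHandler or BodyContentHandler, and it does not reproduce the failure described in the message; the names BodyTextSketch, BodyTextHandler and extractBodyText are hypothetical and exist only for illustration. It assumes well-formed XHTML input and plain (non-namespace-aware) element names.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class BodyTextSketch {

    /** Hypothetical handler: keeps character data only while the element path starts with html/body. */
    static class BodyTextHandler extends DefaultHandler {
        private final Deque<String> path = new ArrayDeque<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            path.addLast(qName); // push the element onto the current path
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            path.removeLast(); // pop it again when the element closes
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (underBody()) {
                text.append(ch, start, length);
            }
        }

        /** True when the first two path entries (root first) are "html" and "body". */
        private boolean underBody() {
            Iterator<String> it = path.iterator();
            return it.hasNext() && it.next().equals("html")
                && it.hasNext() && it.next().equals("body");
        }

        String text() {
            return text.toString();
        }
    }

    /** Parses the given XHTML string and returns only the text found under /html/body. */
    static String extractBodyText(String xhtml) throws Exception {
        BodyTextHandler handler = new BodyTextHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.text();
    }
}
```

With this sketch, text inside head (for example the title) is dropped, while text in any descendant of body is kept. A path-based filter like this depends on the root element actually being reported as html, which is the kind of assumption worth checking when a wrapped handler emits its own document structure.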

Standard practice with @author in comments

2012-08-30 Thread Ken Krugler
Hi all, I'm wondering if we've got any convention for including/excluding @author tags. I remember discussion on other Apache project lists about explicitly not including these, but I see 19 in the trunk Tika source. Asking because some patch code being contributed has @author tags. Thanks,

[jira] [Assigned] (TIKA-980) MicrodataContentHandler for Apache Tika

2012-08-30 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler reassigned TIKA-980: Assignee: Ken Krugler MicrodataContentHandler for Apache Tika

Re: AutoDetectParser is not parsing UTF-16 content types

2012-08-30 Thread Ken Krugler
On Aug 29, 2012, at 9:24am, Jukka Zitting wrote: Hi, On Wed, Aug 29, 2012 at 6:02 PM, chraj007 chraj.k...@gmail.com wrote: http://lucene.472066.n3.nabble.com/file/n4004078/test.html test.html Looks like that file has an incorrect http-equiv declaration: META http-equiv=Content-Type

Re: Standard practice with @author in comments

2012-08-30 Thread Mattmann, Chris A (388J)
Hey Ken, I personally don't care too much about having @author tags, or not having them, but I know there are others more passionate (for example about NOT having them) :) Cheers, Chris On Aug 30, 2012, at 2:03 PM, Ken Krugler wrote: Hi all, I'm wondering if we've got any convention for

[jira] [Commented] (TIKA-980) MicrodataContentHandler for Apache Tika

2012-08-30 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445415#comment-13445415 ] Markus Jelsma commented on TIKA-980: No, the Any23 parser is DOM-based and the