[ https://issues.apache.org/jira/browse/TIKA-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656484#action_12656484 ]
Jukka Zitting commented on TIKA-182:
------------------------------------
After thinking about this a bit more, I find myself reluctant to apply this
patch. Adding such a low-level extension point essentially prevents us from
changing to some other parser library that doesn't generate those low-level SAX
events. For example, I wouldn't rule out the possibility that at some point
we'd want to replace NekoHTML with a higher-level HTML parser that better
reflects how the HTML content is presented to the user.
> Allow clients to listen to the raw SAX events if available
> ----------------------------------------------------------
>
> Key: TIKA-182
> URL: https://issues.apache.org/jira/browse/TIKA-182
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Reporter: Jukka Zitting
> Priority: Minor
>
> As discussed on the mailing list
> (http://markmail.org/message/gojiffbhlcuifnzd) it would be nice to allow
> clients to listen to the raw SAX events of an underlying XML-based (or -like)
> document.
> There's a proposed patch for the HTML parser in
> http://markmail.org/message/l72v6ybf4jjrcp7p