[ 
https://issues.apache.org/jira/browse/SOLR-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670099#action_12670099
 ] 

Shalin Shekhar Mangar commented on SOLR-1003:
---------------------------------------------

No, not really. If HTML is embedded inside an XML document, it needs to be 
encoded properly (replace '<' with &lt; etc.). The example described here does 
not contain HTML; rather, it contains XML nodes inside the "xhtml:p" node, 
mixed with text nodes. This is the same example that led to the discovery of 
the SOLR-999 issue.
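
To illustrate the distinction, here is a minimal sketch in Python using the 
standard library's xml.etree.ElementTree (not Solr code; the sample documents 
and variable names are made up for illustration). Escaped markup reaches the 
parser as one text node, while unescaped tags become child elements mixed 
with text nodes:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents, not taken from the issue.
escaped = "<doc><p>This text is &lt;b&gt;bold&lt;/b&gt;</p></doc>"
mixed = "<doc><p>This text is <b>bold</b></p></doc>"

p_escaped = ET.fromstring(escaped).find("p")
p_mixed = ET.fromstring(mixed).find("p")

# Escaped markup: the parser sees no child elements, only a single text node
# whose value contains the decoded angle brackets.
escaped_children = len(p_escaped)
escaped_text = p_escaped.text

# Mixed content: <p> has a real child element <b>, and p.text only holds the
# text that precedes the first child.
mixed_children = len(p_mixed)
mixed_text = p_mixed.text
```

In the second case, reading only the text of the "p" node loses "bold", which 
is why the text of the child nodes has to be collected as well.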

> XPathEntityProcessor must allow slurping all text from a given XML node and 
> its children
> ----------------------------------------------------------------------------------------
>
>                 Key: SOLR-1003
>                 URL: https://issues.apache.org/jira/browse/SOLR-1003
>             Project: Solr
>          Issue Type: New Feature
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Noble Paul
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-1003.patch
>
>
> Take an example:
> {code:xml}
> <xhtml:p>This text is 
>   <xhtml:b>bold</xhtml:b> and this text is 
>   <xhtml:u>underlined</xhtml:u>!
> </xhtml:p>
> {code}
> It may be useful to get all the text from all the tags in <xhtml:p>, ignoring 
> the tag names.
> The configuration of the field may look like:
> {code:xml}
> <field column="para" xpath="/p" flatten="true"/>
> {code}
>  
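
The requested behaviour can be sketched outside Solr as follows. This is a 
rough Python illustration using xml.etree.ElementTree, not the 
DataImportHandler implementation. The xmlns:xhtml declaration, the function 
names, and the whitespace normalization step are assumptions added so that the 
example is self-contained:

```python
import xml.etree.ElementTree as ET

# The paragraph from the issue, with a namespace declaration added
# (an assumption) so that the fragment parses on its own.
XML = """<xhtml:p xmlns:xhtml="http://www.w3.org/1999/xhtml">This text is
  <xhtml:b>bold</xhtml:b> and this text is
  <xhtml:u>underlined</xhtml:u>!
</xhtml:p>"""


def flatten(node):
    """Concatenate every text node under `node`, ignoring tag names."""
    # itertext() walks the element and its descendants in document order.
    return "".join(node.itertext())


def flatten_normalized(node):
    """Same as flatten(), with runs of whitespace collapsed to one space.

    The normalization is an illustrative extra, not something the issue
    says flatten="true" does.
    """
    return " ".join(flatten(node).split())


root = ET.fromstring(XML)
raw = flatten(root)
text = flatten_normalized(root)
```

Here `text` holds the sentence with the "b" and "u" tags dropped and their 
text kept in place, which is the kind of value the "para" field in the 
configuration above is meant to receive.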

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
