[ https://issues.apache.org/jira/browse/SOLR-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670099#action_12670099 ]
Shalin Shekhar Mangar commented on SOLR-1003:
---------------------------------------------

No, not really. If HTML is embedded inside an XML document it needs to be encoded properly (replace '<' with &lt; etc.). The example described here does not contain HTML, rather it contains XML nodes inside the "xhtml : p" node mixed with Text nodes. This is the same example which led to the discovery of SOLR-999 issue.

> XPathEntityprocessor must allow slurping all text from a given xml node and its children
> ----------------------------------------------------------------------------------------
>
>                 Key: SOLR-1003
>                 URL: https://issues.apache.org/jira/browse/SOLR-1003
>             Project: Solr
>          Issue Type: New Feature
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Noble Paul
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-1003.patch
>
>
> take an example:
> {code:xml}
> <xhtml:p>This text is
>   <xhtml:b>bold</xhtml:b> and this text is
>   <xhtml:u>underlined</xhtml:u>!
> </xhtml:p>
> {code}
> It may be useful to get all the text from all the tags in <xhtml: p> ignoring the tag names.
> the configuration of the field may look like
> {code:xml}
> <field column="para" xpath="/p" flatten="true"/>
> {code}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
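
For readers unfamiliar with the request, the result that flatten="true" is meant to produce for the quoted example can be illustrated outside Solr. The following is only a minimal Python sketch of the idea (collect every text node under an element and ignore the tag names); it is not the DataImportHandler implementation or the attached patch. The namespace declaration, the helper names flatten_text and normalize_whitespace, and the whitespace collapsing are assumptions added for the illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the "xhtml" prefix is bound to the standard XHTML namespace.
# The snippet quoted in the issue omits the declaration, but a parser needs it.
SAMPLE = (
    '<xhtml:p xmlns:xhtml="http://www.w3.org/1999/xhtml">This text is\n'
    '  <xhtml:b>bold</xhtml:b> and this text is\n'
    '  <xhtml:u>underlined</xhtml:u>!\n'
    '</xhtml:p>'
)


def flatten_text(element):
    """Concatenate all text nodes under `element`, in document order.

    Tag names are ignored; only the character data is kept.
    """
    return "".join(element.itertext())


def normalize_whitespace(text):
    """Collapse runs of whitespace (hypothetical post-processing step)."""
    return " ".join(text.split())


root = ET.fromstring(SAMPLE)
flat = normalize_whitespace(flatten_text(root))
print(flat)
```

Whether the actual field value keeps the original line breaks and indentation is not stated in the issue; the normalization step here is only for readability of the output.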