Re: What's the best practices for indexing XML Content with dynamic XML Elements (SOLR 6.1) ?

Erick Erickson Tue, 16 Aug 2016 09:20:02 -0700

You haven't really described the scenario you want
to implement. I get that you have raw XML of an
unknown structure. What do you want to _do_ with that?

1> If all you want to do is index the data (i.e. strip the tags),
try HTMLStripCharFilterFactory.
2> If you want to intelligently take the content of the XML
and ingest it into specific Solr fields, I don't think you'll be
able to do that without writing some custom code to
parse the XML, explore it and "do the right thing" with it.
That will probably involve SolrJ, an XML parser and
some programming.
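
As a rough illustration of option 2>, here is a minimal sketch of the
"parse the XML, explore it" step. It is written in Python with the standard
library parser rather than SolrJ, purely to keep it short. The function name
xml_to_solr_doc, the path-based field naming and the "_txt" suffix are all
assumptions made for this example; the suffix only works if your schema
defines a matching multi-valued dynamic field (something like *_txt).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def xml_to_solr_doc(doc_id, xml_text, suffix="_txt"):
    """Flatten XML of unknown structure into a dict of Solr-style fields.

    Hypothetical helper: every element that carries text becomes a
    multi-valued field named after its path, for example
    <order><item>pen</item></order> -> "order_item_txt".
    """
    root = ET.fromstring(xml_text)
    fields = defaultdict(list)

    def walk(element, path):
        # Drop a namespace prefix such as "{http://example.org}name".
        tag = element.tag.rsplit("}", 1)[-1]
        path = path + [tag]

        # Index the element's own leading text, if it has any.
        text = (element.text or "").strip()
        if text:
            fields["_".join(path) + suffix].append(text)

        # Index attributes too, as "<path>_<attribute>".
        for name, value in element.attrib.items():
            attr = name.rsplit("}", 1)[-1]
            fields["_".join(path + [attr]) + suffix].append(value)

        for child in element:
            walk(child, path)

    walk(root, [])

    doc = {"id": doc_id}
    doc.update(fields)
    return doc


sample = (
    "<order status='open'>"
    "<item>pen</item><item>ink</item><note>rush</note>"
    "</order>"
)
doc = xml_to_solr_doc("42", sample)
print(doc)
```

The resulting dict has one entry per distinct element path, so it could be
serialized as JSON for the update handler or copied into a SolrInputDocument
from SolrJ. This sketch ignores text that follows a child element (mixed
content), and an XML with many distinct paths will create many fields, so
real data would likely need a whitelist or some other "do the right thing"
rules on top.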

Best,
Erick

On Tue, Aug 16, 2016 at 6:15 AM, Stan Lee <sleed...@gmail.com> wrote:
> We currently have a Microsoft SQL table with an XML datatype. We use DIH to
> import the XML content as is, that is, without using the XPathEntityProcessor.
> If the elements of the XML content are known, XPathEntityProcessor makes sense. Could
> someone kindly suggest the right way of handling such a scenario without
> impacting search performance?
> Which tokenizer should we be using?
>
>
> Thanks.
