Hi all,

I'm trying to set up DataImportHandler to index some XML documents that are
served by a web service. The XML includes both content and metadata, so for
the indexable content I want to index all of the text under the body element
(/document/kbml/body). Here is the entity I'm using:

<entity dataSource="kbws" name="kbxml" pk="title"
        url="resturl" processor="XPathEntityProcessor"
        forEach="/document" transformer="HTMLStripTransformer"
        flatten="true">
  <field column="content" name="content" xpath="/document/kbml/body"
         flatten="true" stripHTML="true" />
  <field column="title" name="title" xpath="/document/kbml/kbq" />
</entity>

With this config, the title field is populated and indexed
(/document/kbml/kbq has no child nodes), but content is not indexed at all.
Since /document/kbml/body has many child elements, I expected flatten="true"
to store all of the text under body in the field. Instead, nothing is stored.
I've tried many combinations of transformers and flatten settings, and the
result is the same each time.
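
To make the expectation concrete, here is a made-up document with the same
general shape as the ones I'm importing. Only the /document/kbml/kbq and
/document/kbml/body paths come from my config; the element names under body
and all of the text are placeholders, not real data.

```xml
<document>
  <kbml>
    <!-- kbq holds plain text with no child elements -->
    <kbq>Placeholder question text</kbq>
    <!-- body holds nested markup; the child element names are hypothetical -->
    <body>
      <section>
        <para>First placeholder paragraph.</para>
        <para>Second placeholder paragraph.</para>
      </section>
    </body>
  </kbml>
</document>
```

For a document like this, I expected the content field to end up holding the
text of both paragraphs, since flatten="true" is set on that field.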

Here are the relevant field declarations from the schema (type="text" is
just the one from the example's schema.xml). I have also tried different
combinations of stored and multiValued here, with the same result each time.

<field name="title" type="text" indexed="true" stored="true"
       multiValued="true" />
<field name="content" type="text" indexed="true" stored="true"
       multiValued="true" />

If it would help troubleshooting, I could send along some sample XML. I
don't want to spam the list with an attachment unless it's necessary, though
:)

Thanks in advance for your help,

Adam Foltzer
