Please send a small sample of the XML you are trying to index; that may help.
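
Something of roughly this shape would be enough. This is only a guess at the structure: the /document/kbml/kbq and /document/kbml/body paths are taken from your config, while the child tags under body and all of the text are made up for illustration.

```xml
<!-- Hypothetical sample: only the document/kbml/kbq and document/kbml/body
     paths come from the posted config; everything else is invented. -->
<document>
  <kbml>
    <!-- kbq has no child nodes, matching the description of the title field -->
    <kbq>How do I reset my password?</kbq>
    <!-- body has nested markup, which is the case flatten="true" is meant to handle -->
    <body>
      <p>Open the <b>Account</b> page and choose <i>Reset password</i>.</p>
      <p>A confirmation mail is sent to the registered address.</p>
    </body>
  </kbml>
</document>
```

A real document trimmed down to one short body like this, pasted inline rather than attached, should show whether the xpath matches what the web service actually returns.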

On Tue, Oct 6, 2009 at 9:29 PM, Adam Foltzer <acfolt...@gmail.com> wrote:
> Hi all,
>
> I'm trying to set up DataImportHandler to index some XML documents available
> over web services. The XML includes both content and metadata, so for the
> indexable content, I'm trying to just index everything under the content
> tag:
>
> <entity dataSource="kbws" name="kbxml" pk="title"
>        url="resturl" processor="XPathEntityProcessor"
>        forEach="/document" transformer="HTMLStripTransformer"
> flatten="true">
> <field column="content" name="content" xpath="/document/kbml/body"
> flatten="true" stripHTML="true" />
> <field column="title" name="title" xpath="/document/kbml/kbq" />
> </entity>
>
> The result of this is that the title field gets populated and indexed (there
> are no child nodes of /document/kbml/kbq), but content does not get indexed
> at all. Since /document/kbml/body has many children, I expected that
> flatten="true" would store all of the body text in the field. Instead, it
> stores nothing at all. I've tried this with many combinations of
> transformers and flatten options, and the result is the same each time.
>
> Here are the relevant field declarations from the schema (the type="text" is
> just the one from the example's schema.xml). I have tried combinations here
> as well of stored= and multiValued=, with the same result each time.
>
> <field name="title" type="text" indexed="true" stored="true"
> multiValued="true" />
> <field name="content" type="text" indexed="true" stored="true"
> multiValued="true" />
>
> If it would help troubleshooting, I could send along some sample XML. I
> don't want to spam the list with an attachment unless it's necessary, though
> :)
>
> Thanks in advance for your help,
>
> Adam Foltzer
>



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
