Re: DIH: Limited xpath syntax unable to parse all xml elements

Mark Miller Thu, 02 Jul 2009 10:39:02 -0700

It looks like DIH implements its own subset of the XPath spec. I don't see any tests with multiple matching sub nodes, so perhaps DIH XPath does not properly support that and just selects the last matching node?

Also, I don't think the double / matters. That would just allow more nodes in between, but since there are not any in between in your example document, it's the same as a single /.
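
As a rough check of that claim outside DIH, here is a minimal sketch using Python's standard xml.etree.ElementTree. This is an assumption for illustration only: it is not DIH's XPath code and says nothing about what DIH actually supports. ElementTree paths are relative to the root element, so /book/body/chapter/p is written as body/chapter/p. On the sample document, the single-slash and double-slash forms should select the same three <p> nodes:

```python
import xml.etree.ElementTree as ET

# Sample document from the thread (with the closing </p> tag well-formed).
BOOK_XML = """<book>
   <author>Joe Smith</author>
   <title>World Atlas</title>
   <body>
       <chapter>
           <p>Content I want is here</p>
           <p>More content I want is here.</p>
           <p>Still more content here.</p>
       </chapter>
   </body>
</book>"""

root = ET.fromstring(BOOK_XML)  # root is the <book> element

# Single slash: <p> elements that are direct children of <chapter>.
single = [p.text for p in root.findall("body/chapter/p")]

# Double slash: <p> elements at any depth below <chapter>.
double = [p.text for p in root.findall("body/chapter//p")]

# No elements sit between <chapter> and <p>, so both paths match the same nodes.
print(single == double, len(single))
```

If a standard XPath engine returns all three nodes for both forms, then getting only the last <p> points at how DIH collects matches rather than at the path itself.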


- Mark

Jay Hill wrote:

It is not multivalued. The intention is to get all text under the <body>
element into one "body" field in the index that is not multivalued.
Essentially everything within the <body> element minus the markup.
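
To make the target concrete, here is a minimal sketch of the value I am after, written with Python's standard xml.etree.ElementTree rather than DIH. This is only an illustration of the desired result, not how DIH or its flatten option is implemented, and the whitespace normalization is my own assumption:

```python
import xml.etree.ElementTree as ET

# Sample document from the thread (with the closing </p> tag well-formed).
BOOK_XML = """<book>
   <author>Joe Smith</author>
   <title>World Atlas</title>
   <body>
       <chapter>
           <p>Content I want is here</p>
           <p>More content I want is here.</p>
           <p>Still more content here.</p>
       </chapter>
   </body>
</book>"""

root = ET.fromstring(BOOK_XML)
body = root.find("body")

# itertext() walks every text node under <body> in document order,
# which drops the markup but keeps the indentation whitespace.
raw_text = "".join(body.itertext())

# Collapse runs of whitespace so the result is one single-valued string.
body_text = " ".join(raw_text.split())

print(body_text)
```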

Thanks,
-Jay


On Thu, Jul 2, 2009 at 8:55 AM, Fergus McMenemie <fer...@twig.me.uk> wrote:

Thanks Noble, I gave those examples a try.

If I use <field column="body" xpath="/book/body/chapter/p" />  I only get
the text from the last <p> element, not from all elements.

Hmmmmm, I am sure I have done this. In your schema.xml is the
field "body" multiValued or not?

If I use <field column="body" xpath="/book/body/chapter" flatten="true"/>
or <field column="body" xpath="/book/body/chapter/" flatten="true"/> I don't
get back anything for the body column.

So the first example is close, but it only gets the text for the last <p>
element. If I could get all <p> elements at the same level that would be
what I need. The double-slash (/book/body/chapter//p) doesn't seem to be
supported.

Thanks,
-Jay


2009/7/1 Noble Paul <noble.p...@corp.aol.com>

complete xpath is not supported

/book/body/chapter/p

should work.

if you wish all the text under <chapter> irrespective of nesting or tag
names, use this:
<field column="body" xpath="/book/body/chapter" flatten="true"/>






On Thu, Jul 2, 2009 at 5:31 AM, Jay Hill<jayallenh...@gmail.com> wrote:

I'm using the XPathEntityProcessor to parse an xml structure that looks like
this:

<book>
   <author>Joe Smith</author>
   <title>World Atlas</title>
   <body>
       <chapter>
           <p>Content I want is here</p>
           <p>More content I want is here.</p>
           <p>Still more content here.</p>
       </chapter>
   </body>
</book>

The author and title parse out fine:

<field column="title" xpath="/book/title"/>
<field column="author" xpath="/book/author"/>

But I can't get at the data inside the <p> tags. I want to get all
non-markup text inside the body tag with something like this:

<field column="body" xpath="/book/body/chapter//p"/>

but that is not supported.

Does anyone know of a way that I can get the content within the <p> tags
without the markup?

Thanks,
-Jay


--
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com

--

===============================================================
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================





--
- Mark

http://www.lucidimagination.com
