It looks like DIH implements its own subset of the XPath spec. I don't
see any tests with multiple matching sub-nodes, so perhaps DIH's XPath
does not properly support that and just selects the last matching node?
Also, I don't think the double slash matters. It would just allow other
elements in between, but since there are none between <chapter> and <p>
in your example document, it's the same as a single /.
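In standard XPath terms, the two forms only differ when some element sits between <chapter> and <p>. A hypothetical variant of the example document (the <section> element is invented purely for illustration) makes the difference concrete:

```xml
<!-- Hypothetical variant of the example document, for illustration only -->
<book>
  <body>
    <chapter>
      <!-- matched by both /book/body/chapter/p and /book/body/chapter//p -->
      <p>Direct child of chapter</p>
      <section>
        <!-- matched only by /book/body/chapter//p (descendant, not child) -->
        <p>Nested one level deeper</p>
      </section>
    </chapter>
  </body>
</book>
```

In the original document every <p> is a direct child of <chapter>, so in full XPath both expressions would select the same three nodes.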
- Mark
Jay Hill wrote:
It is not multivalued. The intention is to get all text under the <body>
element into one "body" field in the index that is not multivalued:
essentially everything within the <body> element, minus the markup.
Thanks,
-Jay
On Thu, Jul 2, 2009 at 8:55 AM, Fergus McMenemie <fer...@twig.me.uk> wrote:
Thanks, Noble. I gave those examples a try.
If I use <field column="body" xpath="/book/body/chapter/p" /> I only get
the text from the last <p> element, not from all elements.
Hmmmmm, I am sure I have done this. In your schema.xml, is the
field "body" multiValued or not?
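For reference, the attribute being asked about is set on the field declaration in schema.xml. A multi-valued variant might look roughly like this; the field type name "text" is an assumption taken from the Solr example schema. Presumably each matching <p> would then be stored as a separate value rather than only the last one surviving, though that depends on how DIH maps repeated matches onto the field.

```xml
<!-- schema.xml sketch: type="text" is assumed from the Solr example schema -->
<!-- multiValued="true" lets the field hold several values per document -->
<field name="body" type="text" indexed="true" stored="true" multiValued="true"/>
```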
If I use <field column="body" xpath="/book/body/chapter" flatten="true"/>
or <field column="body" xpath="/book/body/chapter/" flatten="true"/>, I
don't get back anything for the body column.
So the first example is close, but it only gets the text for the last <p>
element. If I could get all <p> elements at the same level, that would be
what I need. The double slash (/book/body/chapter//p) doesn't seem to be
supported.
Thanks,
-Jay
2009/7/1 Noble Paul <noble.p...@corp.aol.com>
Complete XPath is not supported.
/book/body/chapter/p
should work.
If you want all the text under <chapter>, irrespective of nesting or tag
names, use this:
<field column="body" xpath="/book/body/chapter" flatten="true"/>
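To show where that field element sits, here is a minimal data-config.xml sketch combining the title and author fields with the flattened body field. The file path in url is hypothetical, and the FileDataSource setup is one assumed way of feeding XPathEntityProcessor from a local file.

```xml
<dataConfig>
  <!-- Reads the XML from the local filesystem -->
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- url is a hypothetical path; forEach="/book" yields one record per <book> -->
    <entity name="book"
            processor="XPathEntityProcessor"
            url="/path/to/books.xml"
            forEach="/book">
      <field column="title"  xpath="/book/title"/>
      <field column="author" xpath="/book/author"/>
      <!-- flatten="true" gathers all text under <chapter>, dropping the markup -->
      <field column="body"   xpath="/book/body/chapter" flatten="true"/>
    </entity>
  </document>
</dataConfig>
```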
On Thu, Jul 2, 2009 at 5:31 AM, Jay Hill<jayallenh...@gmail.com> wrote:
I'm using the XPathEntityProcessor to parse an XML structure that looks
like this:
<book>
  <author>Joe Smith</author>
  <title>World Atlas</title>
  <body>
    <chapter>
      <p>Content I want is here</p>
      <p>More content I want is here.</p>
      <p>Still more content here.</p>
    </chapter>
  </body>
</book>
The author and title parse out fine:
<field column="title" xpath="/book/title"/>
<field column="author" xpath="/book/author"/>
But I can't get at the data inside the <p> tags. I want to get all
non-markup text inside the body tag with something like this:
<field column="body" xpath="/book/body/chapter//p"/>
but that is not supported.
Does anyone know of a way that I can get the content within the <p> tags
without the markup?
Thanks,
-Jay
--
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
--
===============================================================
Fergus McMenemie
Email: fer...@twig.me.uk
Techmore Ltd Phone:(UK) 07721 376021
Unix/Mac/Intranets Analyst Programmer
===============================================================
--
- Mark
http://www.lucidimagination.com