On Wed, Feb 4, 2009 at 6:13 AM, Chris Hostetter
<hossman_luc...@fucit.org> wrote:
>
> : > The Solr data field is populated properly, so I guess that bit works.
> : > I really wish I could use xpath="//para".
>
> : The limitation comes from streaming the XML instead of creating a DOM.
> : XPathRecordReader is a custom streaming XPath parser implementation and
> : streaming is easy only because we limit the syntax. You can use
> : PlainTextEntityProcessor, which gives the XML as a string to a custom
> : Transformer. This Transformer can create a DOM, run your XPath query and
> : populate the fields. It's more expensive but it is an option.
>
> Maybe it's just me, but I'm noticing that as DIH gets used more, many
> people are finding that the XPath processing in DIH doesn't work the way
> they expect, because it's a custom XPath parser/engine designed for
> streaming.
>
> It seems like it would be helpful to have an alternate processor for
> people who don't need the streaming support (i.e. people dealing with docs
> small enough to load the full DOM tree into memory) that would use the
> default Java XPath engine (and have fewer caveats/surprises). I would
> think it would probably even make sense for this new XPath processor to
> be the one we suggest for new users, and only suggest the existing
> (stream-based) processor if they have really big XML docs to deal with.
>
I think the current XPathEntityProcessor should be able to switch
between the streaming XPath implementation (XPathRecordReader) and the
default Java XPath engine.
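For illustration, a minimal sketch of what the non-streaming path amounts to: parse the whole XML string (e.g. what PlainTextEntityProcessor hands to a custom Transformer) into a DOM and run an arbitrary expression such as //para with the JDK's javax.xml.xpath engine. The class DomXPathSketch and the helper evaluate() are hypothetical; this does not use the DIH Transformer API, and the sample document is made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomXPathSketch {
    // Hypothetical helper: parse the entire XML string into a DOM, then
    // evaluate any XPath expression with the default JDK XPath engine.
    static List<String> evaluate(String xml, String expression) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // The whole document is held in memory here; this is the cost streaming avoids.
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        // Collect the text content of each matched node, in document order.
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><section><para>first</para>"
                + "<sub><para>second</para></sub></section><para>third</para></doc>";
        System.out.println(evaluate(xml, "//para")); // prints [first, second, third]
    }
}
```

Full XPath syntax comes for free this way, at the price of memory proportional to the document size.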

I am just hoping that all of the current syntax and semantics will also
apply to the Java XPath engine. If not, we will need a new
EntityProcessor.

I would also like to explore whether the current XPathRecordReader can
support more XPath syntax while still streaming.
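As a rough illustration that some descendant-axis patterns can be matched while streaming, here is a minimal StAX sketch that collects the text of every element with a given name at any depth (roughly "//para") in a single pass, without building a tree. This is not how XPathRecordReader is implemented; the class StreamingDescendantSketch and the helper collect() are hypothetical. It also shows where limiting the syntax keeps streaming simple: nested matches are folded into the outermost one, whereas real XPath would return the inner element as a separate node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StreamingDescendantSketch {
    // Hypothetical helper: one-pass, tree-free match of elements named
    // `localName` at any depth. Simplification: text of a nested match is
    // appended to the outermost open match instead of being reported separately.
    static List<String> collect(String xml, String localName) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> values = new ArrayList<>();
        StringBuilder text = null; // non-null while inside a matching element
        int depth = 0;             // how many matching elements are currently open
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT
                    && reader.getLocalName().equals(localName)) {
                if (depth++ == 0) {
                    text = new StringBuilder();
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && reader.getLocalName().equals(localName)) {
                if (--depth == 0) {
                    values.add(text.toString());
                    text = null;
                }
            } else if (text != null && (event == XMLStreamConstants.CHARACTERS
                    || event == XMLStreamConstants.CDATA)) {
                // Text may arrive in several chunks, so keep appending.
                text.append(reader.getText());
            }
        }
        reader.close();
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><section><para>first</para>"
                + "<sub><para>second</para></sub></section><para>third</para></doc>";
        System.out.println(collect(xml, "para")); // prints [first, second, third]
    }
}
```

Memory use here depends only on the size of the current match, not the whole document, which is why the streaming approach scales to very large files.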

The Java XPath engine is not at all efficient for large-scale data
processing, because it needs the whole document loaded into memory as a
DOM first.


> (In hindsight XPathEntityProcessor and XPathRecordReader should probably
> have been named StreamingXPathEntityProcessor and
> StreamingXPathRecordReader)

>
> thoughts?
>
>
> -Hoss
>
>



-- 
--Noble Paul