On Wed, Feb 4, 2009 at 6:13 AM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>
> : > The solr data field is populated properly. So I guess that bit works.
> : > I really wish I could use xpath="//para"
>
> : The limitation comes from streaming the XML instead of creating a DOM.
> : XPathRecordReader is a custom streaming XPath parser implementation and
> : streaming is easy only because we limit the syntax. You can use
> : PlainTextEntityProcessor which gives the XML as a string to a custom
> : Transformer. This Transformer can create a DOM, run your XPath query and
> : populate the fields. It's more expensive but it is an option.
>
> Maybe it's just me, but it seems like I'm noticing that as DIH gets used
> more, many people are noting that the XPath processing in DIH doesn't work
> the way they expect because it's a custom XPath parser/engine designed for
> streaming.
>
> It seems like it would be helpful to have an alternate processor for
> people who don't need the streaming support (i.e., are dealing with small
> enough docs that they can load the full DOM tree into memory) that would
> use the default Java XPath engine (and have fewer caveats/surprises) ... I
> would think it would probably even make sense for this new XPath processor
> to be the one we suggest for new users, and only suggest the existing
> (stream based) processor if they have really big XML docs to deal with.
>

I guess the current XPathEntityProcessor must be able to switch between the streaming XPath (XPathRecordReader) and the default Java XPath engine.
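As a rough illustration of the DOM-based alternative discussed above (build a DOM, run the XPath query, collect the values), here is a minimal sketch that uses only the JDK's javax.xml.xpath API. The class and method names (DomXPathSketch, selectText) are hypothetical and not part of DIH; inside DIH this logic would presumably sit in a custom Transformer fed by PlainTextEntityProcessor, and that wiring is not shown here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Solr/DIH: relies on JDK classes only.
final class DomXPathSketch {

    // Parses the whole XML string into a DOM and returns the text content
    // of every node matched by the XPath expression.
    static List<String> selectText(String xml, String expression) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, dom, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            // Wrap parser and XPath failures so callers need no checked exceptions.
            throw new IllegalArgumentException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><sec><para>one</para></sec><para>two</para></doc>";
        System.out.println(selectText(xml, "//para")); // prints [one, two]
    }
}
```

Because the whole document is held in memory, this only suits small documents, which is the trade-off against the streaming XPathRecordReader.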
I am just hoping that all the current syntax and semantics will be applicable to the Java XPath engine. If not, we will need a new EntityProcessor. I would also like to explore whether the current XPathRecordReader can implement more XPath syntax while still streaming. The Java XPath engine is not at all efficient for large-scale data processing.

> (In hindsight XPathEntityProcessor and XPathRecordReader should probably
> have been named StreamingXPathEntityProcessor and
> StreamingXPathRecordReader)
>
> thoughts?
>
>
> -Hoss
>

--
--Noble Paul