Re: DIH using values from solrconfig.xml inside data-config.xml

Chris Hostetter Tue, 03 Feb 2009 16:43:36 -0800

: > The solr data field is populated properly. So I guess that bit works.
: > I really wish I could use xpath="//para"


: The limitation comes from streaming the XML instead of creating a DOM.
: XPathRecordReader is a custom streaming XPath parser implementation and
: streaming is easy only because we limit the syntax. You can use
: PlainTextEntityProcessor which gives the XML as a string to a  custom
: Transformer. This Transformer can create a DOM, run your XPath query and
: populate the fields. It's more expensive but it is an option.

Maybe it's just me, but as DIH gets used more, I'm noticing that many 
people find the XPath processing in DIH doesn't work the way they expect, 
because it's a custom XPath parser/engine designed for streaming.

It seems like it would be helpful to have an alternate processor for 
people who don't need the streaming support (i.e. are dealing with small 
enough docs that they can load the full DOM tree into memory) that would 
use the default Java XPath engine (and have fewer caveats/surprises) ... I 
would think it would probably even make sense for this new XPath processor 
to be the one we suggest for new users, and only suggest the existing 
(stream based) processor if they have really big XML docs to deal with.
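
To make the idea concrete, here is a rough sketch of the DOM-based 
evaluation such a processor might do internally, using only the stock JDK 
XPath engine. The class and method names (DomXPathSketch, evaluate) are 
made up for illustration and are not part of DIH; nothing here is wired 
into an EntityProcessor, and the whole document is held in memory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not a DIH class: parses the whole document into a DOM
// and evaluates an arbitrary XPath expression with the JDK's built-in engine.
class DomXPathSketch {

    // Returns the text content of every node matched by the expression.
    static List<String> evaluate(String xml, String expression) throws Exception {
        // Build the full DOM tree in memory (the cost the streaming parser avoids).
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // Full XPath 1.0 syntax is available here, including "//para".
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);

        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Sample document invented for this sketch.
        String xml = "<doc><sec><para>first</para></sec><para>second</para></doc>";
        System.out.println(evaluate(xml, "//para"));
    }
}
```

The same evaluation could presumably live in a custom Transformer fed by 
PlainTextEntityProcessor, as suggested in the quoted reply, but that wiring 
is left out here.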

(In hindsight XPathEntityProcessor and XPathRecordReader should probably 
have been named StreamingXPathEntityProcessor and 
StreamingXPathRecordReader)

thoughts?


-Hoss
