: > The solr data field is populated properly. So I guess that bit works.
: > I really wish I could use xpath="//para"
: The limitation comes from streaming the XML instead of creating a DOM.
: XPathRecordReader is a custom streaming XPath parser implementation and
: streaming is easy only because we limit the syntax. You can use
: PlainTextEntityProcessor which gives the XML as a string to a custom
: Transformer. This Transformer can create a DOM, run your XPath query and
: populate the fields. It's more expensive but it is an option.

Maybe it's just me, but I seem to be noticing that as DIH gets used more,
many people are finding that the XPath processing in DIH doesn't work the
way they expect, because it's a custom XPath parser/engine designed for
streaming.

It seems like it would be helpful to have an alternate processor for
people who don't need the streaming support (i.e., they are dealing with
docs small enough that they can load the full DOM tree into memory). It
would use the default Java XPath engine, and so have fewer
caveats/surprises. I would think it probably even makes sense for this new
XPath processor to be the one we suggest for new users, and to suggest the
existing (stream based) processor only if they have really big XML docs to
deal with.

(In hindsight, XPathEntityProcessor and XPathRecordReader should probably
have been named StreamingXPathEntityProcessor and
StreamingXPathRecordReader.)

Thoughts?

-Hoss
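
For reference, here is a rough sketch of the DOM-based route described in the quoted reply: parse the whole XML string into a DOM and run a full XPath expression such as //para with the JDK's own javax.xml.xpath engine. The class name DomXPathSketch and the method extract are made up for illustration and are not part of Solr. The wiring into a custom DIH Transformer is not shown; as far as I know, PlainTextEntityProcessor hands the XML over in a row field, and the Transformer would call something like this and put the result back into the row.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not a Solr class: evaluates an arbitrary XPath
// expression against an XML string by building a full in-memory DOM.
final class DomXPathSketch {

    // Returns the text content of every node matched by the expression,
    // in document order.
    static List<String> extract(String xml, String expression) throws Exception {
        // Build the whole DOM tree in memory (no streaming), so this is
        // only suitable for documents that comfortably fit in the heap.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // Use the default JDK XPath engine, which supports the full
        // XPath 1.0 syntax, including the descendant axis ("//").
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes =
            (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);

        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }
}
```

With this, DomXPathSketch.extract(xml, "//para") picks up para elements at any depth, which is exactly the kind of expression the streaming reader cannot handle. The cost is the one already mentioned above: the entire document is held in memory for each row.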