Yes, that worked quite well. I still need the "//tagname" but that is the only DIH incantation I need. This will substantially accelerate things.
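For reference, a setup along these lines might look like the following DIH data-config sketch. The file paths, the entity name, and the output shape of the stylesheet are hypothetical. The stylesheet is assumed to produce a `<docs>` root holding one `<doc>` per English health-topic, with `<url>` and `<title>` child elements. Because the `xsl` attribute is applied before `forEach` and the field xpaths are evaluated, those xpaths target the transformed document, so they can use the "//tagname" form that works for child elements.

```xml
<dataConfig>
  <!-- Reads the XML file from the local filesystem -->
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- Hypothetical paths. The stylesheet named in xsl is applied to the
         whole file first; forEach and the field xpaths below refer to the
         transformed output, assumed here to be
         <docs><doc><url/><title/></doc>...</docs>. -->
    <entity name="topic"
            processor="XPathEntityProcessor"
            url="/path/to/health-topics.xml"
            xsl="/path/to/health-topics.xsl"
            forEach="/docs/doc">
      <!-- "//tagname" form for child elements of each record -->
      <field column="url" xpath="//url"/>
      <field column="title" xpath="//title"/>
    </entity>
  </document>
</dataConfig>
```

If the stylesheet instead emits the standard Solr `<add><doc>` update format, setting `useSolrAddSchema="true"` on the entity lets DIH index the result without per-field xpaths.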
On Mon, Dec 8, 2014 at 5:37 PM, Dan Davis <d...@danizen.net> wrote:
> The problem is that XPathEntityProcessor implements XPath on its own, and
> implements only a subset of XPath. So, if the input document is small
> enough, it makes no sense to fight it. One possibility is to apply an XSLT
> to the file before processing it.
>
> This blog post
> <http://www.andornot.com/blog/post/Sample-Solr-DataImportHandler-for-XML-Files.aspx>
> shows a worked example. The XSL transform takes place before the forEach
> or field specifications, which was the principal question I had about it
> from the documentation. This is also illustrated in the private initQuery()
> method of XPathEntityProcessor: you can see the transformation being
> applied before the forEach. This will not scale to extremely large XML
> documents with millions of rows; that is why there is a stream="true"
> argument, so that you don't preprocess the document. In my case, the
> entire XML file is 29 MB, so I think I can do the XSL transformation and
> then iterate over each document.
>
> This could substantially shorten my timeline for moving to Apache Solr,
> because the common case with our previous indexer is to run XSLT to
> transform the input into the document format the indexer expects.
>
> On Mon, Dec 8, 2014 at 5:10 PM, Alexandre Rafalovitch <arafa...@gmail.com>
> wrote:
>
>> I don't believe there are any alternatives. At least, I could not get
>> anything but the full path to work.
>>
>> Regards,
>> Alex.
>> Personal: http://www.outerthoughts.com/ and @arafalov
>> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
>> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>>
>>
>> On 8 December 2014 at 17:01, Dan Davis <dansm...@gmail.com> wrote:
>> > In experimentation with a much simpler and smaller XML file, it doesn't
>> > look like "//health-topic/@url" will work, nor will "//@url", etc. So
>> > far, only spelling it all out works.
>> >
>> > With child elements, such as <title>, an xpath of "//title" works fine,
>> > but it is beginning to seem dangerous.
>> >
>> > Is there any shorthand for the current node or the match?
>> >
>> > On Mon, Dec 8, 2014 at 4:42 PM, Dan Davis <dansm...@gmail.com> wrote:
>> >
>> >> When I have a forEach attribute like the following:
>> >>
>> >> forEach="/medical-topics/medical-topic/health-topic[@language='English']"
>> >>
>> >> and then need to match an attribute of that node, is there any
>> >> alternative to spelling it all out:
>> >>
>> >> <field column="url"
>> >> xpath="/medical-topics/medical-topic/health-topic[@language='English']/@url"/>
>> >>
>> >> I suppose I could use "//health-topic/@url", since the document should
>> >> then have a single health-topic (as long as I know they don't nest).
>> >>
>> >
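As an illustration of the "apply an XSLT before processing" approach for the document structure in this thread, here is a minimal XSLT 1.0 sketch. It assumes each `health-topic` has a `url` attribute and a `title` child element; the output element names (`docs`, `doc`, `url`, `title`) are hypothetical choices. Copying the `url` attribute into a child element means the DIH field specifications only have to match simple child elements, which the "//title" experiments suggest DIH's XPath subset handles.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Emit one flat <doc> record per English health-topic -->
  <xsl:template match="/">
    <docs>
      <xsl:for-each select="/medical-topics/medical-topic/health-topic[@language='English']">
        <doc>
          <!-- Copy the url attribute into a child element -->
          <url><xsl:value-of select="@url"/></url>
          <!-- In XSLT 1.0, value-of takes the string value of the first title child -->
          <title><xsl:value-of select="title"/></title>
        </doc>
      </xsl:for-each>
    </docs>
  </xsl:template>
</xsl:stylesheet>
```

The stylesheet does the language filtering, so the DIH side no longer needs the `[@language='English']` predicate in its forEach.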