On Thu, Jul 30, 2009 at 1:23 AM, Erik Hatcher <e...@ehatchersolutions.com> wrote:
> I've been troubleshooting an issue where we're trying to load documents
> through DIH's URLDataSource and XPathEntityProcessor, where we want to
> leverage the $hasMore feature to request a new URL.
>
> I've been tinkering with this using a very simple example, two XML files:
>
> solr.xml:
> <add>
>   <doc>
>     <field name="id">SOLR1000</field>
>   </doc>
>   <doc>
>     <field name="id">**HASMORE**</field>
>   </doc>
> </add>
>
> solr2.xml:
> <add>
>   <doc>
>     <field name="id">SOLR2k</field>
>   </doc>
> </add>
>
> My DIH config is:
>
> <?xml version="1.0"?>
> <dataConfig>
>   <dataSource type="URLDataSource"
>     baseUrl="file:///Users/erikhatcher/dev/solr/example/exampledocs/"
>     readTimeout="180000" connectionTimeout="60000"/>
>
>   <script>
>   <![CDATA[
>     function checkForMore(row, context) {
>       print("### checkForMore: " + row);
>       if (row.get('id') == '**HASMORE**') {
>         print("#### hasMore ####");
>         row.put('$hasMore', 'true');
>         row.put('$nextUrl',
>           'file:///Users/erikhatcher/dev/solr/example/exampledocs/solr2.xml');
>         row.put('$skipRow', 'true');
>       } else {
>         row.put('$hasMore', 'false');
>       }
>       return row;
>     }
>   ]]>
>   </script>
>
>   <document name="docs">
>     <entity name="doc"
>       processor="XPathEntityProcessor"
>       url="solr.xml"
>       forEach="/add/doc"
>       stream="true"
>       transformer="DateFormatTransformer,TemplateTransformer,script:checkForMore"
>       onError="abort">
>       <field column="id" xpath="/add/doc/fie...@name='id']"/>
>     </entity>
>   </document>
> </dataConfig>
>
> Without the else clause in checkForMore to set $hasMore to false, an
> infinite loop occurs and solr2.xml is requested repeatedly. This is because
> once $hasMore is set on a row, XPathEntityProcessor#readUsefulVars sets it in
> entity scope and it never gets unset. Is this intentional? Shouldn't
> $hasMore get reset after more is requested?
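A minimal, hypothetical model of the loop described above, written in the same JavaScript as the checkForMore transformer. It is not DIH's code: the `pages` map stands in for the two files, `runImport` and the `scope` object are invented to mimic an entity-scope variable that is never cleared, and the `resetAfterUse` flag models clearing $hasMore/$nextUrl once they have been consumed. For simplicity it checks the flag only after a whole page has been read.

```javascript
// Hypothetical, simplified model of the $hasMore/$nextUrl loop; NOT the
// XPathEntityProcessor source. Page contents mirror solr.xml and solr2.xml.
const pages = {
  "solr.xml": ["SOLR1000", "**HASMORE**"],
  "solr2.xml": ["SOLR2k"],
};

// Stand-in for checkForMore WITHOUT the else clause.
function checkForMore(row) {
  if (row.id === "**HASMORE**") {
    row.$hasMore = "true";
    row.$nextUrl = "solr2.xml";
    row.$skipRow = "true";
  }
  return row;
}

// resetAfterUse: clear the special vars after following $nextUrl (hypothetical fix).
// maxFetches: guard so the sticky case terminates in this model.
function runImport(resetAfterUse, maxFetches) {
  const scope = {}; // long-lived "entity scope", survives across rows and fetches
  const indexed = [];
  let url = "solr.xml";
  let fetches = 0;
  while (url !== null && fetches < maxFetches) {
    fetches++;
    for (const id of pages[url]) {
      const row = checkForMore({ id });
      // Copy special vars into entity scope, as described for readUsefulVars.
      if (row.$hasMore !== undefined) scope.$hasMore = row.$hasMore;
      if (row.$nextUrl !== undefined) scope.$nextUrl = row.$nextUrl;
      if (row.$skipRow !== "true") indexed.push(row.id);
    }
    if (scope.$hasMore === "true") {
      url = scope.$nextUrl;
      if (resetAfterUse) {
        delete scope.$hasMore;
        delete scope.$nextUrl;
      }
    } else {
      url = null;
    }
  }
  return { fetches, indexed };
}
```

With `runImport(false, 5)` the model keeps re-requesting solr2.xml until the `maxFetches` guard stops it after 5 fetches; with `runImport(true, 5)` it makes 2 fetches and indexes SOLR1000 and SOLR2k.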
I would say we should reset it after it has been used once.

>
> On a related note, it would seem useful to allow $hasMore/$skipRow/$nextUrl
> to be controlled from the XML data rather than solely from a transformer.
> But $prefixed fields are ignored by DIH, right?

This is possible using a RegexTransformer (so you may not need to write your own transformer):

<field column="$hasMore" regex="HASMORE" replaceWith="true"/>

>
> I'm still looking for that holy grail of a good example leveraging
> $hasMore/$nextUrl! :)
>
> Thanks,
> Erik
>

--
-----------------------------------------------------
Noble Paul | Principal Engineer | AOL | http://aol.com