I wrote a nested HttpDataSource RSS poller. The outer loop reads an RSS feed which contains N links to other RSS feeds. The nested loop then reads each one of those to create documents. (Yes, this is an obnoxious thing to do.) Let's say the outer RSS feed gives 10 items. Both feeds use the same structure: /rss/channel with a <title> node and then N <item> nodes inside the channel. This should create two separate XML streams with two separate XPath iterators, right?

<entity name="outer" http stuff>
    <field column="name" xpath="/rss/channel/title" />
    <field column="url" xpath="/rss/channel/item/link"/>
    <entity name="inner" http stuff url="${outer.url}" pk="title" >
        <field column="title" xpath="/rss/channel/item/title" />
    </entity>
</entity>

This does indeed walk each url from the outer feed and then fetch the inner rss feed. Bravo!

However, I found two separate problems in xpath iteration. They may be related.

The first problem is that it only stores the first document from each "inner" feed. Each feed has several documents with different title fields but it only grabs the first.

The other is an off-by-one bug. The outer loop iterates through the 10 items and then tries to pull an 11th. It then gives this exception trace:

INFO: Created URL to: [inner url]
Oct 31, 2008 11:21:20 PM org.apache.solr.handler.dataimport.HttpDataSource getData
SEVERE: Exception thrown while getting data
java.net.MalformedURLException: no protocol: null/account.rss
        at java.net.URL.<init>(URL.java:567)
        at java.net.URL.<init>(URL.java:464)
        at java.net.URL.<init>(URL.java:413)
        at org.apache.solr.handler.dataimport.HttpDataSource.getData(HttpDataSource.java:90)
        at org.apache.solr.handler.dataimport.HttpDataSource.getData(HttpDataSource.java:47)
        at org.apache.solr.handler.dataimport.DebugLogger$2.getData(DebugLogger.java:183)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:210)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:180)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
        ...
Oct 31, 2008 11:21:20 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: album document : SolrInputDocumnt[{name=name(1.0)={Groups of stuff}}]
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception in invoking url null Processing Document # 11
        at org.apache.solr.handler.dataimport.HttpDataSource.getData(HttpDataSource.java:115)
        at org.apache.solr.handler.dataimport.HttpDataSource.getData(HttpDataSource.java:47)
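
The "no protocol: null/account.rss" message looks like what java.net.URL reports when the variable part of the URL resolved to nothing and the literal text "null" ended up in front of the path. The sketch below reproduces only that message in plain Java. It is not DataImportHandler code: the class name NullUrlDemo, the resolve helper, and the example.com host are made up for illustration, and it assumes the real inner url is ${outer.url} with /account.rss appended, which is what the trace suggests.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class NullUrlDemo {
    // Hypothetical stand-in for the template substitution step: in Java,
    // concatenating a null String reference yields the text "null".
    public static String resolve(String outerUrl, String suffix) {
        return outerUrl + suffix;
    }

    // Returns "ok" if the spec parses as a URL, else the exception message.
    // Constructing a URL only parses the string; it does not open a connection.
    public static String tryUrl(String spec) {
        try {
            new URL(spec);
            return "ok";
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // An 11th row with no link: the outer url value is missing.
        System.out.println(tryUrl(resolve(null, "/account.rss")));
        // prints: no protocol: null/account.rss

        // A row that does carry a link parses fine.
        System.out.println(tryUrl(resolve("http://example.com", "/account.rss")));
        // prints: ok
    }
}
```

If that reading is right, the exception is a symptom rather than the bug itself: the outer entity emitted one more row than the feed has links, and that row has no url value.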