Currently the initial counter is not set, so the value becomes an empty string: http://subdomain.site.com/boards.rss?page=${blogs.n} becomes http://subdomain.site.com/boards.rss?page=
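To illustrate why the first request ends with an empty `page=`, here is a minimal Java sketch of `${...}` template substitution that falls back to an empty string when a variable has not been set yet. The `TemplateDemo` class and its `resolve` method are hypothetical; this is not DIH's actual resolver code, only a model of the behaviour described above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateDemo {
    // Matches ${name} placeholders such as ${blogs.n}
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Hypothetical resolver: replaces each ${name} with its value,
    // or with an empty string when the variable has not been set yet.
    static String resolve(String template, Map<String, Object> vars) {
        Matcher m = VAR.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Object value = vars.get(m.group(1));
            String replacement = value == null ? "" : value.toString();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String url = "http://subdomain.site.com/boards.rss?page=${blogs.n}";
        Map<String, Object> vars = new HashMap<>();
        // First request: blogs.n is unset, so the page parameter is empty
        System.out.println(resolve(url, vars)); // prints: http://subdomain.site.com/boards.rss?page=
        // Once something sets the counter, the same template resolves normally
        vars.put("blogs.n", 2);
        System.out.println(resolve(url, vars)); // prints: http://subdomain.site.com/boards.rss?page=2
    }
}
```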
We need to fix this. Unfortunately, the transformer is invoked only after the first chunk is fetched. The best bet is to keep the url as http://subdomain.site.com/boards.rss?page=1, create the $nextUrl from the transformer, and return it in the row. The url is then ignored from the second chunk onwards, and the value of $nextUrl is used instead.

On Tue, Feb 3, 2009 at 12:13 AM, Jon Baer <jonb...@gmail.com> wrote:
> See, I think I'm just misunderstanding how this entity is supposed to be set
> up ... for example, using the patch on 1.3 I ended up in a loop where .n is
> never set ...
>
> Feb 2, 2009 1:31:02 PM org.apache.solr.handler.dataimport.HttpDataSource getData
> INFO: Created URL to: http://subdomain.site.com/feed.rss?page=
>
> <entity dataSource="blogs"
> url="http://subdomain.site.com/boards.rss?page=${blogs.n}" chunkSize="50"
> name="docs" pk="link" processor="XPathEntityProcessor"
> forEach="/rss/channel/item" transformer="RegexTransformer,
> com.nhl.solr.DateFormatTransformer, TemplateTransformer,
> com.nhl.solr.EnumeratedEntityTransformer">
>
> I guess what I'm looking for is the snippet which shows how it is set up
> (the initial counter) ...
>
> - Jon
>
> On Mon, Feb 2, 2009 at 12:39 PM, Noble Paul നോബിള് नोब्ळ् <
> noble.p...@gmail.com> wrote:
>
>> On Mon, Feb 2, 2009 at 11:01 PM, Jon Baer <jonb...@gmail.com> wrote:
>> > Yes, I think what Jared mentions in the JIRA is what I was thinking about
>> > when it is recommended to always return true for $hasMore ...
>> >
>> > "The transformer must know somehow when $hasMore should be true. If the
>> > transformer always gives $hasMore a value "true", will there be infinite
>> > requests made or will it stop on the first empty request? Using the
>> > EnumeratedEntityTransformer, a user can specify from the config xml when
>> > $hasMore should be true using the chunkSize attribute. This solves a general
>> > case of "request N rows at a time until no more are available".
>> > I agree, a combination of 'rowsFetchedCount' and a HasMoreUntilEmptyTransformer
>> > would also make this doable from the configuration."
>> Why can't a Transformer put a $hasMore=false?
>> >
>> > This makes sense.
>> >
>> > - Jon
>> >
>> > Jared Flatow - 28/Jan/09 09:16 PM
>> > The transformer must know somehow when $hasMore should be true. If
>> > the transformer always gives $hasMore a value "true", will there be infinite
>> > requests made or will it stop on the first empty request? Using the
>> > EnumeratedEntityTransformer, a user can specify from the config xml when
>> > $hasMore should be true using the chunkSize attribute. This solves a general
>> > case of "request N rows at a time until no more are available". I agree, a
>> > combination of 'rowsFetchedCount' and a HasMoreUntilEmptyTransformer would
>> > also make this doable from the configuration.
>> >
>> > On Mon, Feb 2, 2009 at 11:53 AM, Shalin Shekhar Mangar <
>> > shalinman...@gmail.com> wrote:
>> >
>> >> On Mon, Feb 2, 2009 at 9:20 PM, Jon Baer <jonb...@gmail.com> wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> > Sorry, I know this exists ...
>> >> >
>> >> > "If an API supports chunking (when the dataset is too large) multiple calls
>> >> > need to be made to complete the process. XPathEntityProcessor supports this
>> >> > with a transformer. If the transformer returns a row which contains a field
>> >> > *$hasMore* with the value "true", the Processor makes another request with
>> >> > the same url template (the actual value is recomputed before invoking). A
>> >> > transformer can pass a totally new url too for the next call by returning a
>> >> > row which contains a field *$nextUrl* whose value must be the complete url
>> >> > for the next call."
>> >> >
>> >> > But is there a true example of its use somewhere? I'm trying to figure out
>> >> > how to set this up properly if I know before import that I have 56 "pages"
>> >> > to index. (And how to set it up if the pages need to be determined by
>> >> > something in the feed, etc.)
>> >> >
>> >>
>> >> No, there is no example (yet). You'll put the url with variables for the
>> >> corresponding 'start' and 'count' parameters, and a custom transformer can
>> >> specify if another request needs to be made. I know it's not much to go on.
>> >> I'll try to write some documentation on the wiki.
>> >>
>> >> SOLR-994 might be interesting to you. I haven't been able to look at the
>> >> patch though.
>> >>
>> >> https://issues.apache.org/jira/browse/SOLR-994
>> >> --
>> >> Regards,
>> >> Shalin Shekhar Mangar.
>> >>
>> >
>>
>> --
>> --Noble Paul
>
--
--Noble Paul
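A minimal Java sketch of the approach suggested at the top of this thread: start from a literal page=1 url, have a transformer put `$hasMore` and `$nextUrl` into every row, and stop at the first empty chunk (the "stop on the first empty request" case raised in the JIRA comment). `ChunkedFetchDemo`, `transformRow(row, currentUrl)`, `fetchAll` and the fake feed are all hypothetical; the real DIH `Transformer` API and the XPathEntityProcessor loop differ, so treat this only as a model of the control flow, not as DIH code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChunkedFetchDemo {
    // Finds a numeric page parameter such as "?page=1"
    private static final Pattern PAGE = Pattern.compile("([?&]page=)(\\d+)");

    // Hypothetical transformer: marks the row with $hasMore=true and a $nextUrl
    // for the following page. A url with an empty page= gets neither field,
    // which is why the first url must carry a literal page=1.
    static Map<String, Object> transformRow(Map<String, Object> row, String currentUrl) {
        Matcher m = PAGE.matcher(currentUrl);
        if (m.find()) {
            int next = Integer.parseInt(m.group(2)) + 1;
            row.put("$hasMore", "true");
            row.put("$nextUrl",
                currentUrl.substring(0, m.start(2)) + next + currentUrl.substring(m.end(2)));
        }
        return row;
    }

    // Simplified processor loop: fetch a chunk, transform its rows, and request
    // the $nextUrl from any row that says $hasMore=true. This sketch only follows
    // $nextUrl; an empty chunk leaves nextUrl null and ends the loop.
    static List<Map<String, Object>> fetchAll(
            String firstUrl, Function<String, List<Map<String, Object>>> fetch) {
        List<Map<String, Object>> all = new ArrayList<>();
        String url = firstUrl;
        while (url != null) {
            String nextUrl = null;
            for (Map<String, Object> row : fetch.apply(url)) {
                Map<String, Object> out = transformRow(row, url);
                all.add(out);
                if ("true".equals(out.get("$hasMore"))) {
                    nextUrl = (String) out.get("$nextUrl");
                }
            }
            url = nextUrl;
        }
        return all;
    }

    public static void main(String[] args) {
        // Fake feed: pages 1-3 hold 2, 2 and 1 items; any other page is empty
        Map<Integer, Integer> itemsPerPage = Map.of(1, 2, 2, 2, 3, 1);
        List<String> requested = new ArrayList<>();
        Function<String, List<Map<String, Object>>> fetch = url -> {
            requested.add(url);
            Matcher m = PAGE.matcher(url);
            int page = m.find() ? Integer.parseInt(m.group(2)) : 0;
            List<Map<String, Object>> rows = new ArrayList<>();
            for (int i = 0; i < itemsPerPage.getOrDefault(page, 0); i++) {
                Map<String, Object> row = new HashMap<>();
                row.put("link", "item-" + page + "-" + i);
                rows.add(row);
            }
            return rows;
        };
        List<Map<String, Object>> rows =
            fetchAll("http://subdomain.site.com/boards.rss?page=1", fetch);
        System.out.println(rows.size() + " rows from " + requested.size() + " requests");
        // prints: 5 rows from 4 requests
    }
}
```

This design costs one extra request (the empty page 4 in the fake feed) in exchange for not needing to know the page count up front. If the count is known before import (e.g. 56 pages), the transformer could instead stop emitting `$hasMore` once it reaches the last page.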