Currently the initial counter is not set, so the value becomes an empty string and
http://subdomain.site.com/boards.rss?page=${blogs.n}
becomes
http://subdomain.site.com/boards.rss?page=

We need to fix this. Unfortunately the transformer is invoked only
after the first chunk is fetched, so it cannot supply a value for the
first request.

The best bet is to keep the url as
http://subdomain.site.com/boards.rss?page=1

and create the $nextUrl from the transformer and return it in the row.

That way the url is used only for the first chunk. From the second chunk
onwards it is ignored and the value of $nextUrl is used instead.
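
A rough sketch of what the transformer side could look like, written in
plain Java so it stands on its own. NextPageUrlSketch, pageOf and
addNextUrl are made-up names, not Solr classes; in a real setup the same
logic would sit in the transformRow method of a custom DIH transformer.
It assumes the page number can be read from the url of the current chunk
and that the total number of pages is known up front (56 in the question
further down). How the transformer learns the current url is left open
here. Setting $hasMore together with $nextUrl follows the wiki text
quoted below, so please check it against the version you run.

```java
import java.util.Map;

/**
 * Hypothetical helper, not part of Solr. It only illustrates how a row
 * could carry $nextUrl and $hasMore back to the entity processor.
 */
class NextPageUrlSketch {
    // URL prefix taken from the example in this thread.
    static final String BASE_URL = "http://subdomain.site.com/boards.rss?page=";

    /** Reads the page number from a url that ends in "page=<n>". */
    static int pageOf(String url) {
        int idx = url.lastIndexOf("page=");
        if (idx < 0) {
            throw new IllegalArgumentException("no page parameter in " + url);
        }
        return Integer.parseInt(url.substring(idx + "page=".length()));
    }

    /**
     * Adds $nextUrl and $hasMore=true while pages remain, and
     * $hasMore=false on the last page so that fetching stops.
     * currentUrl is the url the current chunk came from (assumed to be
     * available); lastPage is the known number of pages.
     */
    static Map<String, Object> addNextUrl(Map<String, Object> row,
                                          String currentUrl, int lastPage) {
        int page = pageOf(currentUrl);
        if (page < lastPage) {
            row.put("$nextUrl", BASE_URL + (page + 1));
            row.put("$hasMore", "true");
        } else {
            row.put("$hasMore", "false");
        }
        return row;
    }
}
```

With the url fixed at page=1, a row from the first chunk would come back
with $nextUrl pointing at page=2, and a row from the last page would come
back with $hasMore set to false and no $nextUrl.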




On Tue, Feb 3, 2009 at 12:13 AM, Jon Baer <jonb...@gmail.com> wrote:
> See I think I'm just misunderstanding how this entity is supposed to be set up
> ... for example, using the patch on 1.3 I ended up in a loop where .n is
> never set ...
>
> Feb 2, 2009 1:31:02 PM org.apache.solr.handler.dataimport.HttpDataSource
> getData
> INFO: Created URL to: http://subdomain.site.com/feed.rss?page=
>
> <entity dataSource="blogs"
> url="http://subdomain.site.com/boards.rss?page=${blogs.n}" chunkSize="50"
> name="docs" pk="link" processor="XPathEntityProcessor"
> forEach="/rss/channel/item" transformer="RegexTransformer,
> com.nhl.solr.DateFormatTransformer, TemplateTransformer,
> com.nhl.solr.EnumeratedEntityTransformer">
>
> I guess what I'm looking for is that snippet which shows how it is set up (the
> initial counter) ...
>
> - Jon
>
> On Mon, Feb 2, 2009 at 12:39 PM, Noble Paul നോബിള്‍ नोब्ळ् <
> noble.p...@gmail.com> wrote:
>
>> On Mon, Feb 2, 2009 at 11:01 PM, Jon Baer <jonb...@gmail.com> wrote:
>> > Yes I think what Jared mentions in the JIRA is what I was thinking about
>> > when it is recommended to always return true for $hasMore ...
>> >
>> > "The transformer must know somehow when $hasMore should be true. If the
>> > transformer always give $hasMore a value "true", will there be infinite
>> > requests made or will it stop on the first empty request? Using the
>> > EnumeratedEntityTransformer, a user can specify from the config xml when
>> > $hasMore should be true using the chunkSize attribute. This solves a
>> > general case of "request N rows at a time until no more are available".
>> > I agree, a combination of 'rowsFetchedCount' and a
>> > HasMoreUntilEmptyTransformer would also make this doable from the
>> > configuration"
>> Why can't a Transformer put a $hasMore=false?
>> >
>> > This makes sense.
>> >
>> > - Jon
>> > Jared Flatow
>> > <https://issues.apache.org/jira/secure/ViewProfile.jspa?name=jflatow>,
>> > 28/Jan/09 09:16 PM, on SOLR-994
>> > <https://issues.apache.org/jira/browse/SOLR-994>:
>> > The transformer must know somehow when $hasMore should be true. If the
>> > transformer always give $hasMore a value "true", will there be infinite
>> > requests made or will it stop on the first empty request? Using the
>> > EnumeratedEntityTransformer, a user can specify from the config xml when
>> > $hasMore should be true using the chunkSize attribute. This solves a
>> > general case of "request N rows at a time until no more are available".
>> > I agree, a combination of 'rowsFetchedCount' and a
>> > HasMoreUntilEmptyTransformer would also make this doable from the
>> > configuration.
>> >
>> > On Mon, Feb 2, 2009 at 11:53 AM, Shalin Shekhar Mangar <
>> > shalinman...@gmail.com> wrote:
>> >
>> >> On Mon, Feb 2, 2009 at 9:20 PM, Jon Baer <jonb...@gmail.com> wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> > Sorry I know this exists ...
>> >> >
>> >> > "If an API supports chunking (when the dataset is too large) multiple
>> >> > calls need to be made to complete the process. XPathEntityProcessor
>> >> > supports this with a transformer. If transformer returns a row which
>> >> > contains a field *$hasMore* with the value "true" the Processor makes
>> >> > another request with the same url template (The actual value is
>> >> > recomputed before invoking). A transformer can pass a totally new url
>> >> > too for the next call by returning a row which contains a field
>> >> > *$nextUrl* whose value must be the complete url for the next call."
>> >> >
>> >> > But is there a true example of its use somewhere?  I'm trying to
>> >> > figure out, if I know before import that I have 56 "pages" to index,
>> >> > how to set this up properly.  (And how to set it up if pages need to
>> >> > be determined by something in the feed, etc).
>> >> >
>> >>
>> >> No, there is no example (yet). You'll put the url with variables for the
>> >> corresponding 'start' and 'count' parameters and a custom transformer
>> >> can specify if another request needs to be made. I know it's not much to
>> >> go on. I'll try to write some documentation on the wiki.
>> >>
>> >> SOLR-994 might be interesting to you. I haven't been able to look at the
>> >> patch though.
>> >>
>> >>  https://issues.apache.org/jira/browse/SOLR-994
>> >> --
>> >> Regards,
>> >> Shalin Shekhar Mangar.
>> >>
>> >
>>
>>
>>
>> --
>> --Noble Paul
>>
>



-- 
--Noble Paul
