Re: Streaming Docs, Terms, TermVectors

Walter Underwood Sat, 30 May 2009 11:14:28 -0700

Don't stream; request chunks of 10 or 100 at a time. It works fine and
you don't have to write or test any new code. It also works well with
HTTP caches: if two clients want the same data, the second can get it
from the cache.
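
Here is a minimal sketch of that paged approach in Python, using only the standard library. It makes several assumptions. It targets a Solr instance at the hypothetical URL `http://localhost:8983/solr/select` that returns `wt=json` responses, and it pages with Solr's standard `q`, `start`, and `rows` parameters. `page_url`, `fetch_page`, and `iter_docs` are illustrative helper names, not part of any Solr client.

```python
import json
from typing import Callable, Iterator, List
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical Solr endpoint; substitute your own host and core.
SOLR_SELECT = "http://localhost:8983/solr/select"


def page_url(base: str, query: str, start: int, rows: int) -> str:
    """Build one paged request with Solr's standard q/start/rows/wt params."""
    params = {"q": query, "start": start, "rows": rows, "wt": "json"}
    return base + "?" + urlencode(params)


def fetch_page(url: str) -> List[dict]:
    """GET one page and return its docs, assuming the wt=json layout."""
    with urlopen(url) as resp:
        body = json.load(resp)
    return body["response"]["docs"]


def iter_docs(query: str, rows: int = 100,
              fetch: Callable[[str], List[dict]] = fetch_page,
              base: str = SOLR_SELECT) -> Iterator[dict]:
    """Yield every matching doc while holding only one page in memory."""
    start = 0
    while True:
        docs = fetch(page_url(base, query, start, rows))
        yield from docs
        if len(docs) < rows:  # a short or empty page means no more results
            return
        start += rows
```

Each page is an ordinary GET with a distinct, repeatable URL. That property is what lets an HTTP cache in front of Solr answer the second client.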


We do that at Netflix. Each front-end box does a series of queries
to get all the movie titles, then loads them into a local index for
autocomplete.
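
The message doesn't say how the local autocomplete index is built. One simple possibility is a sorted list of lowercased titles, searched for prefix matches with Python's `bisect` module. This is sketched here as an assumption, not as Netflix's actual design, and the `PrefixIndex` class is hypothetical.

```python
import bisect
from typing import Iterable, List


class PrefixIndex:
    """Hypothetical in-memory autocomplete index: a sorted list plus bisect."""

    def __init__(self, titles: Iterable[str]) -> None:
        # Keep (lowercased key, original title) pairs sorted by key.
        self._entries = sorted((t.lower(), t) for t in titles)
        self._keys = [key for key, _ in self._entries]

    def complete(self, prefix: str, limit: int = 10) -> List[str]:
        """Return up to `limit` titles whose lowercase form starts with prefix."""
        p = prefix.lower()
        i = bisect.bisect_left(self._keys, p)  # first key >= prefix
        out: List[str] = []
        while i < len(self._keys) and self._keys[i].startswith(p) and len(out) < limit:
            out.append(self._entries[i][1])
            i += 1
        return out
```

For example, `PrefixIndex(["The Matrix", "Amelie", "The Mask"]).complete("the ma")` returns `["The Mask", "The Matrix"]`. Matching keys sit next to each other in sorted order, so each lookup is one binary search plus a short scan.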

wunder

On 5/30/09 11:01 AM, "Kaktu Chakarabati" <jimmoe...@gmail.com> wrote:

> For a streaming-like solution, it is in fact possible to keep a working
> buffer in memory that emits chunks over an HTTP connection, which the
> server keeps alive until the full response has been sent.
> This is quite similar, for example, to how video streaming protocols that
> run on top of HTTP work (for a more general discussion, see
> http://ajaxpatterns.org/HTTP_Streaming#In_A_Blink ).
> Another (not mutually exclusive) possibility is to introduce a new binary
> format for transmitting such data (i.e., a new wt=<..> type) over HTTP
> (or any other communication protocol), so that the data can be
> compressed more effectively and made to fit better into memory.
> One such format, which has been widely circulated and already has many
> open source implementations, is Adobe's AMF
> (http://osflash.org/documentation/amf). It is, however, a proprietary
> format, so I'm not sure whether it can be incorporated under Apache
> Software Foundation terms.
> 
> -Chak
> 
> 
> On Sat, May 30, 2009 at 9:58 AM, Dietrich Featherston
> <d...@dfeatherston.com> wrote:
> 
>> I was actually curious about the same thing. Perhaps the request could
>> carry an endpoint reference, such as a JMS topic, to which the documents
>> would be sent asynchronously.
>> 
>> solr/query?q=*:*&epr=/my/topic&eprtype=jms
>> 
>> Then we would need to consider how to break up the response, how to cancel
>> a running query, etc.
>> 
>> Is this along the lines of what you're looking for? I would be interested
>> in seeing how the request/response contract would change and which types
>> of endpoint references would be supported.
>> 
>> Thanks,
>> D
>> 
>> On May 30, 2009, at 12:45 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>> 
>>> Does anyone have any thoughts on what is involved in streaming lots of
>>> results out of Solr?
>>> 
>>> For instance, if I wanted to get something like 1M docs (or more) out of
>>> Solr via a *:* query, how could I do this tractably? Likewise, what if I
>>> wanted to return all the terms in the index, or all the term vectors?
>>> 
>>> Obviously, it is impossible to load all of these things into memory and
>>> then create a response, so I was wondering if anyone had any ideas on how to
>>> stream them.
>>> 
>>> Thanks,
>>> Grant
>>> 
>>
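
Chak's suggestion of emitting chunks over a kept-alive HTTP connection maps onto HTTP/1.1 chunked transfer coding. That coding lets a server start sending before it knows the total length, which Grant's 1M-doc case would need. The following is a minimal standard-library Python sketch under stated assumptions, and it is not Solr's response writer. The `chunked` and `stream_docs` helpers, the `StreamingHandler` class, the newline-delimited JSON output, and the synthetic `{"id": i}` documents are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Iterable, Iterator


def chunked(payloads: Iterable[bytes]) -> Iterator[bytes]:
    """Frame each byte string as one HTTP/1.1 chunk, then emit the last-chunk."""
    for data in payloads:
        if not data:
            continue  # a zero-size chunk would end the body early, so skip it
        yield b"%x\r\n" % len(data) + data + b"\r\n"  # hex size, CRLF, data, CRLF
    yield b"0\r\n\r\n"  # zero-size last-chunk, no trailers


def stream_docs(docs: Iterable[dict]) -> Iterator[bytes]:
    """Serialize docs one per line (newline-delimited JSON, an assumed format)."""
    for doc in docs:
        yield (json.dumps(doc) + "\n").encode("utf-8")


class StreamingHandler(BaseHTTPRequestHandler):
    # Chunked transfer coding is an HTTP/1.1 feature.
    protocol_version = "HTTP/1.1"

    def do_GET(self) -> None:
        self.send_response(200)
        self.send_header("Content-Type", "application/x-ndjson")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        # Hypothetical doc source: a lazy generator, so docs are produced and
        # written one at a time instead of being collected into one response.
        docs = ({"id": i} for i in range(1_000_000))
        for frame in chunked(stream_docs(docs)):
            self.wfile.write(frame)
```

To try it, run `HTTPServer(("127.0.0.1", 8000), StreamingHandler).serve_forever()` (with `HTTPServer` from `http.server`) and fetch it with `curl -N http://127.0.0.1:8000/`. curl decodes the chunks as they arrive. Writing one chunk per doc keeps the sketch simple. Grouping several docs per chunk would cut per-write overhead.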

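Dietrich's proposal has three parts: send results asynchronously to an endpoint such as a JMS topic, break up the response, and allow a running query to be cancelled. It can be modeled in-process. In this Python sketch, a `queue.Queue` stands in for the JMS topic (an assumption; no JMS client or Solr API is involved). Results are published in batches, a `threading.Event` acts as the cancel signal, and a `None` sentinel marks the end of the stream. `publish_results` and `consume` are hypothetical names.

```python
import queue
import threading
from typing import Iterable, Iterator, List

_DONE = None  # sentinel put on the topic when the stream ends


def publish_results(docs: Iterable[dict], topic: queue.Queue,
                    cancel: threading.Event, batch_size: int = 100) -> None:
    """Producer: push docs to `topic` in batches until exhausted or cancelled."""
    batch: List[dict] = []
    for doc in docs:
        if cancel.is_set():
            batch = []  # drop the partial batch of a cancelled query
            break
        batch.append(doc)
        if len(batch) == batch_size:
            topic.put(batch)
            batch = []
    if batch:
        topic.put(batch)  # final short batch
    topic.put(_DONE)  # always signal the end so consumers stop waiting


def consume(topic: queue.Queue) -> Iterator[dict]:
    """Consumer: yield docs batch by batch until the end-of-stream sentinel."""
    while True:
        batch = topic.get()
        if batch is _DONE:
            return
        yield from batch
```

A cancelled producer still emits the sentinel, so a consumer blocked on `topic.get()` is released. With a real message broker, the same end-of-stream and cancel signals would have to become part of the request/response contract Dietrich mentions.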