What are you doing that start=500000 is normal?  --wunder

On Jul 5, 2013, at 1:28 PM, Valery Giner wrote:

> Eric,
> 
> We did not have any RAM problems, but the following documented limitation
> makes sharding too painful for us to use:
> 
> "Makes it more inefficient to use a high "start" parameter. For example, if 
> you request start=500000&rows=25 on an index with 500,000+ docs per shard, 
> this will currently result in 500,000 records getting sent over the network 
> from the shard to the coordinating Solr instance. If you had a single-shard 
> index, in contrast, only 25 records would ever get sent over the network. 
> (Granted, setting start this high is not something many people need to do.) " 
>  http://wiki.apache.org/solr/DistributedSearch
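The cost the wiki passage describes is straightforward to quantify: each shard must ship its top (start + rows) candidates to the coordinating node, which merge-sorts them and discards everything before the requested page. A minimal sketch of that arithmetic (the function name and shard counts are illustrative, not Solr API):

```python
def shipped_per_request(num_shards: int, start: int, rows: int) -> int:
    """Approximate number of result records that cross the network
    to the coordinator for one paged query."""
    if num_shards == 1:
        # Single local index: only the final page is returned.
        return rows
    # Each shard must send its top (start + rows) candidates so the
    # coordinator can merge them and keep just the requested page.
    return num_shards * (start + rows)

print(shipped_per_request(1, 500_000, 25))  # 25
print(shipped_per_request(4, 500_000, 25))  # 2,000,100
```

This is why a deep `start` that is harmless on a single index becomes quadratically painful as shards are added.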
> 
> Reading millions of documents as the result of a query is a "normal" use case 
> for us, not a "design defect".   Subdividing "large" indexes into smaller 
> ones seems too ugly a way to scale up.  This turns Solr from a 
> perfect solution for us into something unacceptable for such cases.
> 
> I wonder whether anyone else has similar use cases/problems with sharding.
> 
> Thanks,
> Val
> 
> On 05/03/2013 12:10 PM, Erick Erickson wrote:
>> My off-the-cuff thought is that there are significant costs to doing
>> this that would be paid by 99.999% of setups out there. Also,
>> usually you'll run into other issues (RAM etc.) long before you come
>> anywhere close to 2^31 docs.
>> 
>> Lucene/Solr often allocates int[maxDoc] for various operations. When
>> maxDoc approaches 2^31, memory goes through the roof. Now
>> consider allocating longs instead...
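The back-of-the-envelope math here: a single int[maxDoc] with maxDoc near 2^31 is roughly 8 GB of heap, and widening to long[maxDoc] doubles it, per structure, per core. A rough illustration (raw array sizes only, ignoring JVM object overhead):

```python
# Approximate heap needed for one per-document array at the int32 doc limit.
MAX_DOC = 2**31 - 1              # Lucene's current maxDoc ceiling

int_array_bytes = MAX_DOC * 4    # int[maxDoc]: 4 bytes per entry
long_array_bytes = MAX_DOC * 8   # long[maxDoc]: 8 bytes per entry

GB = 1024**3
print(f"int[maxDoc]  ~ {int_array_bytes / GB:.1f} GB")   # ~8.0 GB
print(f"long[maxDoc] ~ {long_array_bytes / GB:.1f} GB")  # ~16.0 GB
```

And a real index allocates several such structures at once (caches, field data, doc-value ordinals), so the totals multiply quickly.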
>> 
>> which is a long way of saying that I don't really think anyone is going
>> to be working on this any time soon, especially since SolrCloud removes
>> a LOT of the pain/complexity (from a user perspective, anyway) of
>> going to a sharded setup...
>> 
>> FWIW,
>> Erick
>> 
>> On Thu, May 2, 2013 at 1:17 PM, Valery Giner <valgi...@research.att.com> 
>> wrote:
>>> Otis,
>>> 
>>> The documents themselves are relatively small: tens of fields, only a few of
>>> which can be up to a hundred bytes.
>>> Linux servers with relatively large RAM (256 GB).
>>> Minutes on the searches are fine for our purposes, and adding a few tens of
>>> millions of records in tens of minutes is also fine.
>>> We had to do some simple tricks for keeping indexing up to speed but nothing
>>> too fancy.
>>> Moving to the sharding adds a layer of complexity which we don't really need
>>> because of the above, ... and adding complexity may result in lower
>>> reliability :)
>>> 
>>> Thanks,
>>> Val
>>> 
>>> 
>>> On 05/02/2013 03:41 PM, Otis Gospodnetic wrote:
>>>> Val,
>>>> 
>>>> Haven't seen this mentioned in a while...
>>>> 
>>>> I'm curious...what sort of index, queries, hardware, and latency
>>>> requirements do you have?
>>>> 
>>>> Otis
>>>> Solr & ElasticSearch Support
>>>> http://sematext.com/
>>>> On May 1, 2013 4:36 PM, "Valery Giner" <valgi...@research.att.com> wrote:
>>>> 
>>>>> Dear Solr Developers,
>>>>> 
>>>>> I've been unable to find an answer to the question in the subject line of
>>>>> this e-mail, except for a vague one.
>>>>> 
>>>>> We need to be able to index more than 2 billion documents.   We were doing
>>>>> well without sharding until the number of docs hit that limit.   The
>>>>> performance was satisfactory for queries, updates, and indexing of new
>>>>> documents.
>>>>> 
>>>>> That is, except for the need to get around the int32 limit, we don't
>>>>> really
>>>>> have a need for a distributed Solr setup.
>>>>> 
>>>>> I wonder whether someone on the Solr team could tell us when / in what
>>>>> version
>>>>> of Solr we could expect the limit to be removed.
>>>>> 
>>>>> I hope this question may be of interest to someone else :)
>>>>> 
>>>>> --
>>>>> Thanks,
>>>>> Val
>>>>> 
>>>>> 
> 

--
Walter Underwood
wun...@wunderwood.org


