What are you doing that start=500000 is normal? --wunder

On Jul 5, 2013, at 1:28 PM, Valery Giner wrote:
> Erick,
>
> We did not have any RAM problems, but just the following official limitation
> makes our life too miserable to use the shards:
>
> "Makes it more inefficient to use a high "start" parameter. For example, if
> you request start=500000&rows=25 on an index with 500,000+ docs per shard,
> this will currently result in 500,000 records getting sent over the network
> from the shard to the coordinating Solr instance. If you had a single-shard
> index, in contrast, only 25 records would ever get sent over the network.
> (Granted, setting start this high is not something many people need to do.)"
> http://wiki.apache.org/solr/DistributedSearch
>
> Reading millions of documents as a result of a query is a "normal" use case
> for us, not a "design defect". Subdividing the "large" indexes into smaller
> ones seems too ugly to use as a way to scale up. This turns Solr from a
> perfect solution for us into something unacceptable for such cases.
>
> I wonder whether anyone else has similar use cases/problems with sharding.
>
> Thanks,
> Val
>
> On 05/03/2013 12:10 PM, Erick Erickson wrote:
>> My off-the-cuff thought is that there are significant costs to doing
>> this that would be paid by 99.999% of the setups out there. Also,
>> you'll usually run into other issues (RAM etc.) long before you come
>> anywhere close to 2^31 docs.
>>
>> Lucene/Solr often allocates int[maxDoc] for various operations. When
>> maxDoc approaches 2^31, memory goes through the roof. Now
>> consider allocating longs instead...
>>
>> Which is a long way of saying that I don't really think anyone's going
>> to be working on this any time soon, especially when SolrCloud removes
>> a LOT of the pain/complexity (from a user perspective, anyway) of
>> going to a sharded setup...
>>
>> FWIW,
>> Erick
>>
>> On Thu, May 2, 2013 at 1:17 PM, Valery Giner <valgi...@research.att.com>
>> wrote:
>>> Otis,
>>>
>>> The documents themselves are relatively small: tens of fields, only a few of
>>> which could be up to a hundred bytes.
>>> Linux servers with relatively large RAM (256).
>>> Minutes on the searches are fine for our purposes; adding a few tens of
>>> millions of records in tens of minutes is also fine.
>>> We had to do some simple tricks to keep indexing up to speed, but nothing
>>> too fancy.
>>> Moving to sharding adds a layer of complexity which we don't really need
>>> because of the above, ... and adding complexity may result in lower
>>> reliability :)
>>>
>>> Thanks,
>>> Val
>>>
>>> On 05/02/2013 03:41 PM, Otis Gospodnetic wrote:
>>>> Val,
>>>>
>>>> Haven't seen this mentioned in a while...
>>>>
>>>> I'm curious... what sort of index, queries, hardware, and latency
>>>> requirements do you have?
>>>>
>>>> Otis
>>>> Solr & ElasticSearch Support
>>>> http://sematext.com/
>>>>
>>>> On May 1, 2013 4:36 PM, "Valery Giner" <valgi...@research.att.com> wrote:
>>>>
>>>>> Dear Solr Developers,
>>>>>
>>>>> I've been unable to find an answer to the question in the subject line of
>>>>> this e-mail, except for a vague one.
>>>>>
>>>>> We need to be able to index 2bln+ documents. We were doing well
>>>>> without sharding until the number of docs hit the limit (2bln+). The
>>>>> performance was satisfactory for the queries, updates, and indexing of new
>>>>> documents.
>>>>>
>>>>> That is, except for the need to work around the int32 limit, we don't
>>>>> really have a need to set up distributed Solr.
>>>>>
>>>>> I wonder whether someone on the Solr team could tell us when, or in what
>>>>> version of Solr, we could expect the limit to be removed.
>>>>>
>>>>> I hope this question may be of interest to someone else :)
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Val

--
Walter Underwood
wun...@wunderwood.org
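[The deep-paging cost quoted from the DistributedSearch wiki is simple arithmetic in the coordinator's merge step: to serve start=S, rows=R, it must pull the top S+R entries from every shard, merge them, and discard the first S. A minimal sketch of that merge, in Python rather than Solr's actual Java internals, with illustrative names:]

```python
import heapq

def distributed_query(shards, start, rows):
    """Naive distributed paging: each shard ships its top (start + rows)
    entries to the coordinator, which merges them by descending score and
    keeps only the requested page. Returns (page, entries_transferred)."""
    per_shard = start + rows  # what every shard must send over the network
    # each shard is a list of (score, doc_id) pairs, sorted by descending score
    candidates = [shard[:per_shard] for shard in shards]
    # merge the pre-sorted shard responses (descending score == ascending -score)
    merged = heapq.merge(*candidates, key=lambda entry: -entry[0])
    page = list(merged)[start:start + rows]
    transferred = sum(min(per_shard, len(shard)) for shard in shards)
    return page, transferred
```

[With start=500000 and rows=25, even one remote shard means 500,025 entries over the wire for a 25-document page, which is the wiki's point. Erick's int[maxDoc] remark is the same kind of arithmetic: a 4-byte int per doc at maxDoc near 2^31 is roughly 8 GiB for a single array, and long[] would double that.]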