My off-the-cuff thought is that doing this would carry significant costs, and those costs would be paid by the 99.999% of setups out there that never get near the limit. Also, you'll usually run into other issues (RAM etc.) long before you come anywhere close to 2^31 docs.
Lucene/Solr often allocates int[maxDoc] for various operations. When maxDoc approaches 2^31, well, memory goes through the roof. Now consider allocating longs instead...

Which is a long way of saying that I don't really think anyone's going to be working on this any time soon, especially when SolrCloud removes a LOT of the pain/complexity (from a user perspective anyway) from going to a sharded setup...

FWIW,
Erick

On Thu, May 2, 2013 at 1:17 PM, Valery Giner <valgi...@research.att.com> wrote:
> Otis,
>
> The documents themselves are relatively small, tens of fields, only a few of
> them could be up to a hundred bytes.
> Linux servers with relatively large RAM (256),
> Minutes on the searches are fine for our purposes, adding a few tens of
> millions of records in tens of minutes is also fine.
> We had to do some simple tricks for keeping indexing up to speed but nothing
> too fancy.
> Moving to the sharding adds a layer of complexity which we don't really need
> because of the above, ... and adding complexity may result in lower
> reliability :)
>
> Thanks,
> Val
>
>
> On 05/02/2013 03:41 PM, Otis Gospodnetic wrote:
>>
>> Val,
>>
>> Haven't seen this mentioned in a while...
>>
>> I'm curious...what sort of index, queries, hardware, and latency
>> requirements do you have?
>>
>> Otis
>> Solr & ElasticSearch Support
>> http://sematext.com/
>> On May 1, 2013 4:36 PM, "Valery Giner" <valgi...@research.att.com> wrote:
>>
>>> Dear Solr Developers,
>>>
>>> I've been unable to find an answer to the question in the subject line of
>>> this e-mail, except for a vague one.
>>>
>>> We need to be able to index over 2bln+ documents. We were doing well
>>> without sharding until the number of docs hit the limit (2bln+). The
>>> performance was satisfactory for the queries, updates and indexing of new
>>> documents.
>>>
>>> That is, except for the need to go around the int32 limit, we don't
>>> really have a need for setting up distributed Solr.
>>>
>>> I wonder whether someone on the Solr team could tell us when/what
>>> version of Solr we could expect the limit to be removed.
>>>
>>> I hope this question may be of interest to someone else :)
>>>
>>> --
>>> Thanks,
>>> Val
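
To put rough numbers on the int[maxDoc] point in the reply above, here is a minimal back-of-the-envelope sketch. It is not Lucene or Solr code: the class and method names (DocArrayMemory, arrayBytes, toGiB) are made up for illustration, and it assumes one array entry per document while ignoring the JVM's array header.

```java
public class DocArrayMemory {

    // Bytes needed for one per-document array with the given entry width.
    // Assumption: one entry per document, array header overhead ignored.
    static long arrayBytes(long docCount, int bytesPerEntry) {
        return docCount * bytesPerEntry;
    }

    // Convert a byte count to GiB for readability.
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Largest doc count that fits in a signed 32-bit int: 2^31 - 1.
        long maxDoc = Integer.MAX_VALUE;

        long asInts = arrayBytes(maxDoc, Integer.BYTES); // int[maxDoc]
        long asLongs = arrayBytes(maxDoc, Long.BYTES);   // hypothetical long[maxDoc]

        System.out.printf("int[maxDoc]:  %d bytes (about %.1f GiB)%n", asInts, toGiB(asInts));
        System.out.printf("long[maxDoc]: %d bytes (about %.1f GiB)%n", asLongs, toGiB(asLongs));
    }
}
```

Each such array comes to roughly 8 GiB with ints and roughly 16 GiB with longs, and that is per array, per operation that needs one. Separately, Java arrays are themselves indexed by int, so going past 2^31 docs would mean more than swapping the element type.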