Re: any plans to remove int32 limitation on the number of the documents in the index?

Erick Erickson Fri, 03 May 2013 09:11:07 -0700

My off-the-cuff thought is that there are significant costs to doing
this, and they would be paid by the 99.999% of setups out there that
never need it. Also, usually you'll run into other issues (RAM etc.)
long before you come anywhere close to 2^31 docs.


Lucene/Solr often allocates int[maxDoc] for various operations. When
maxDoc approaches 2^31, well, memory goes through the roof. Now
consider allocating longs instead...
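
To put rough numbers on that, here is a minimal sketch of the
arithmetic. The class and method names are made up for illustration
(they are not Lucene or Solr APIs), and it only counts the raw array
payload, ignoring object headers and everything else on the heap:

```java
// Hypothetical helper, not part of Lucene/Solr: estimates the payload
// size of a single per-document array such as int[maxDoc].
class MaxDocMemory {
    static final long INT_BYTES = Integer.BYTES; // 4 bytes per int slot
    static final long LONG_BYTES = Long.BYTES;   // 8 bytes per long slot

    // One slot per document; array header overhead is ignored.
    static long arrayBytes(long maxDoc, long bytesPerSlot) {
        return maxDoc * bytesPerSlot;
    }

    static double toGiB(long bytes) {
        return bytes / (double) (1L << 30);
    }

    public static void main(String[] args) {
        // 2^31 - 1 is the largest doc count an int can address.
        long maxDoc = Integer.MAX_VALUE;
        System.out.printf("int[maxDoc]:  %.1f GiB%n",
                toGiB(arrayBytes(maxDoc, INT_BYTES)));
        System.out.printf("long[maxDoc]: %.1f GiB%n",
                toGiB(arrayBytes(maxDoc, LONG_BYTES)));
    }
}
```

That works out to roughly 8 GiB for each int array at the limit and
roughly 16 GiB for each long array, and that is per array, per
operation that needs one.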

Which is a long way of saying that I don't really think anyone's going
to be working on this any time soon, especially when SolrCloud removes
a LOT of the pain/complexity (from a user perspective anyway) from
going to a sharded setup...

FWIW,
Erick

On Thu, May 2, 2013 at 1:17 PM, Valery Giner <valgi...@research.att.com> wrote:
> Otis,
>
> The documents themselves are relatively small, tens of fields, only a few of
> them could be up to a hundred bytes.
> Linux servers with relatively large RAM (256).
> Minutes on the searches are fine for our purposes, and adding a few tens of
> millions of records in tens of minutes is also fine.
> We had to do some simple tricks for keeping indexing up to speed but nothing
> too fancy.
> Moving to sharding adds a layer of complexity which we don't really need
> because of the above, ... and adding complexity may result in lower
> reliability :)
>
> Thanks,
> Val
>
>
> On 05/02/2013 03:41 PM, Otis Gospodnetic wrote:
>>
>> Val,
>>
>> Haven't seen this mentioned in a while...
>>
>> I'm curious...what sort of index, queries, hardware, and latency
>> requirements do you have?
>>
>> Otis
>> Solr & ElasticSearch Support
>> http://sematext.com/
>> On May 1, 2013 4:36 PM, "Valery Giner" <valgi...@research.att.com> wrote:
>>
>>> Dear Solr Developers,
>>>
>>> I've been unable to find an answer to the question in the subject line of
>>> this e-mail, except for a vague one.
>>>
>>> We need to be able to index 2bln+ documents. We were doing well
>>> without sharding until the number of docs hit the limit (2bln+). The
>>> performance was satisfactory for the queries, updates and indexing of new
>>> documents.
>>>
>>> That is, except for the need to go around the int32 limit, we don't
>>> really have a need for setting up distributed solr.
>>>
>>> I wonder whether someone on the Solr team could tell us when, or in what
>>> version of Solr, we could expect the limit to be removed.
>>>
>>> I hope this question may be of interest to someone else :)
>>>
>>> --
>>> Thanks,
>>> Val
>>>
>>>
>
