A colleague of mine is using native Lucene plus some home-grown patches/optimizations to index over 13B small documents in a 32-shard environment, which works out to around 406M docs per shard.
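A quick sanity check on that math (numbers from above; Lucene's internal doc ids are Java ints, so IndexReader.maxDoc() can never exceed Integer.MAX_VALUE):

    public class ShardMath {
        public static void main(String[] args) {
            long totalDocs = 13000000000L;          // ~13B documents overall
            int shards = 32;
            long docsPerShard = totalDocs / shards; // ~406M per shard
            System.out.println("docs/shard = " + docsPerShard);
            // The ~2B docid ceiling is per index, i.e. per shard:
            System.out.println("fits in one index? "
                    + (docsPerShard <= Integer.MAX_VALUE));
        }
    }

So each shard sits comfortably under the per-index ceiling even though the whole corpus is far beyond it.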
If there's a 2B doc id limitation in Lucene then I assume he's patched it himself.

On Fri, Apr 2, 2010 at 1:17 PM, <dar...@ontrenet.com> wrote:
> My guess is that you will need to take advantage of Solr 1.5's upcoming
> cloud/cluster renovations and use multiple indexes to comfortably achieve
> those numbers. Hypothetically, in that case, you won't be limited by the
> single-index docid limitations of Lucene.
>
> > We are currently indexing 5 million books in Solr, scaling up over the
> > next few years to 20 million. However, we are using the entire book as
> > a Solr document. We are evaluating the possibility of indexing
> > individual pages, as there are some use cases where users want the most
> > relevant pages regardless of what book they occur in. However, we
> > estimate that we are talking about somewhere between 1 and 6 billion
> > pages and have concerns over whether Solr will scale to this level.
> >
> > Does anyone have experience using Solr with 1-6 billion Solr documents?
> >
> > The Lucene file format document
> > (http://lucene.apache.org/java/3_0_1/fileformats.html#Limitations)
> > mentions a limit of about 2 billion document ids. I assume this is the
> > Lucene internal document id and would therefore be a per-index/per-shard
> > limit. Is this correct?
> >
> > Tom Burton-West.
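In case it helps with the scaling question: even before the 1.5 cloud work, stock Solr 1.4 can federate one query across shards with the shards request parameter, so the ~2B docid ceiling applies per shard rather than to the logical collection. A minimal SolrJ sketch (the host names and field name are made up):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class DistributedSearch {
        public static void main(String[] args) throws Exception {
            // Any one node can act as the coordinator for the request.
            CommonsHttpSolrServer solr =
                    new CommonsHttpSolrServer("http://shard1.example.com:8983/solr");
            SolrQuery q = new SolrQuery("page_text:dulcinea");
            // Fan the query out; each shard holds its own <2B-doc index.
            q.set("shards",
                    "shard1.example.com:8983/solr,shard2.example.com:8983/solr");
            QueryResponse rsp = solr.query(q);
            System.out.println("total hits: " + rsp.getResults().getNumFound());
        }
    }

The usual caveat is that uniqueKey values must not collide across shards, and the fan-out/merge adds coordination cost, so per-page indexing at 1-6B docs would still want careful capacity planning.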