Hi,

Could I interest you in this project?
http://github.com/thkoch2001/lucehbase

The aim is to store the index directly in HBase, a database system modelled
after Google's Bigtable and designed to store data in the range of terabytes
or petabytes.

Best regards, Thomas Koch

Lance Norskog:
> The 2B limitation applies within one shard, due to using a signed 32-bit
> integer. There is no limit in that regard in sharding: Distributed
> Search uses the stored unique document id rather than the internal
> docid.
> 
> On Fri, Apr 2, 2010 at 10:31 AM, Rich Cariens <richcari...@gmail.com> wrote:
> > A colleague of mine is using native Lucene + some home-grown
> > patches/optimizations to index over 13B small documents in a 32-shard
> > environment, which is around 406M docs per shard.
> >
> > If there's a 2B doc id limitation in Lucene then I assume he's patched it
> > himself.
> >
> > On Fri, Apr 2, 2010 at 1:17 PM, <dar...@ontrenet.com> wrote:
> >> My guess is that you will need to take advantage of Solr 1.5's upcoming
> >> cloud/cluster renovations and use multiple indexes to comfortably
> >> achieve those numbers. Hypothetically, in that case, you won't be limited
> >> by single index docid limitations of Lucene.
> >>
> >> > We are currently indexing 5 million books in Solr, scaling up over the
> >> > next few years to 20 million.  However we are using the entire book as
> >> > a Solr document.  We are evaluating the possibility of indexing
> >> > individual pages as there are some use cases where users want the most
> >> > relevant pages
> >> > regardless of what book they occur in.  However, we estimate that we
> >> > are talking about somewhere between 1 and 6 billion pages and have
> >> > concerns over whether Solr will scale to this level.
> >> >
> >> > Does anyone have experience using Solr with 1-6 billion Solr
> >> > documents?
> >> >
> >> > The Lucene file format document
> >> > (http://lucene.apache.org/java/3_0_1/fileformats.html#Limitations)
> >> > mentions a limit of about 2 billion document ids.  I assume this is
> >> > the Lucene internal document id and would therefore be a per-index/per-
> >> > shard limit.  Is this correct?
> >> >
> >> >
> >> > Tom Burton-West.
> 
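
For reference, the shard arithmetic in the thread quoted above can be sketched
as follows. This is only a back-of-the-envelope illustration in Python, not
anything taken from Lucene or Solr: the function names are hypothetical, and it
assumes the per-index ceiling is exactly 2^31 - 1 documents, which is the upper
bound implied by a signed 32-bit docid (the practical limit may be lower).

```python
# Assumption: one index (shard) can address at most 2**31 - 1 documents,
# the largest value of a signed 32-bit integer.
MAX_DOCS_PER_SHARD = 2**31 - 1  # 2,147,483,647


def min_shards(total_docs: int, max_per_shard: int = MAX_DOCS_PER_SHARD) -> int:
    """Smallest number of shards so that no shard exceeds max_per_shard."""
    # Integer ceiling division, avoids floating-point rounding.
    return -(-total_docs // max_per_shard)


def docs_per_shard(total_docs: int, shards: int) -> int:
    """Documents per shard when total_docs is spread evenly (rounded up)."""
    return -(-total_docs // shards)


if __name__ == "__main__":
    # The 1 to 6 billion page estimate from the original question.
    for total in (1_000_000_000, 6_000_000_000):
        print(total, "docs need at least", min_shards(total), "shard(s)")
    # The 13B documents over 32 shards mentioned by Rich Cariens.
    print(docs_per_shard(13_000_000_000, 32), "docs per shard")
```

This only counts docids. How many shards are comfortable in practice depends
on index size, hardware and query load, which the sketch does not model.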

Thomas Koch, http://www.koch.ro
