subject:"Experience with indexing billions of documents\?"

Re: Experience with indexing billions of documents?

2010-04-14 Thread Jason Rutherglen

Tom, Yes, we've (Biz360) indexed 3 billion and upwards... If indexing is the issue (or rather re-indexing) we used SOLR-1301 with Hadoop to re-index efficiently (ie, in a timely manner). For querying we're currently using the out of the box Solr distributed shards query mechanism, which is hard (r

Re: Experience with indexing billions of documents?

2010-04-13 Thread Thomas Koch

Bradford Stephens: > Hey there, > > We've actually been tackling this problem at Drawn to Scale. We'd really > like to get our hands on LuceHBase to see how it scales. Our faceting still > needs to be done in-memory, which is kinda tricky, but it's worth > exploring. Hi Bradford, thank you for yo

Re: Experience with indexing billions of documents?

2010-04-13 Thread Bradford Stephens

Hey there, We've actually been tackling this problem at Drawn to Scale. We'd really like to get our hands on LuceHBase to see how it scales. Our faceting still needs to be done in-memory, which is kinda tricky, but it's worth exploring. On Mon, Apr 12, 2010 at 7:27 AM, Thomas Koch wrote: > Hi,

Re: Experience with indexing billions of documents?

2010-04-12 Thread Thomas Koch

Hi, could I interest you in this project? http://github.com/thkoch2001/lucehbase The aim is to store the index directly in HBase, a database system modelled after Google's Bigtable to store data in the range of terabytes or petabytes. Best regards, Thomas Koch Lance Norskog: > The 2B limitation i

Re: Experience with indexing billions of documents?

2010-04-05 Thread Lance Norskog

The 2B limitation is within one shard, due to using a signed 32-bit integer. There is no limit in that regard in sharding: Distributed Search uses the stored unique document id rather than the internal docid. On Fri, Apr 2, 2010 at 10:31 AM, Rich Cariens wrote: > A colleague of mine is using nati
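
To illustrate the point about unique ids, here is a minimal Python sketch of merging per-shard results by the stored unique document id. It is not Solr's actual merge code: the function name, shard names, ids and scores are all made up for illustration. The only fact taken from the message is that internal docids are per-shard (capped by a signed 32-bit integer) while the unique id is what identifies a document across shards.

```python
import heapq

# Largest value of a signed 32-bit integer: the per-shard docid ceiling
# described in the message above.
MAX_INTERNAL_DOCID = 2**31 - 1


def merge_shard_hits(shard_hits, rows):
    """Merge per-shard hit lists into one global top-`rows` list.

    `shard_hits` maps a (hypothetical) shard name to a list of
    (unique_id, score) pairs. Internal docids never appear here, so the
    same internal docid existing on two shards cannot cause a clash.
    """
    merged = []
    for shard_name, hits in shard_hits.items():
        for unique_id, score in hits:
            merged.append((score, unique_id, shard_name))
    top = heapq.nlargest(rows, merged)
    return [(unique_id, shard_name, score) for score, unique_id, shard_name in top]


# Hypothetical results from two shards.
hits = {
    "shard1": [("book-17", 2.5), ("book-3", 1.0)],
    "shard2": [("book-99", 3.1)],
}
top_two = merge_shard_hits(hits, rows=2)
```

Because the merge only ever sees unique ids, the total document count across shards is not bounded by the 32-bit docid range.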

Re: Experience with indexing billions of documents?

2010-04-02 Thread Rich Cariens

A colleague of mine is using native Lucene + some home-grown patches/optimizations to index over 13B small documents in a 32-shard environment, which is around 406M docs per shard. If there's a 2B doc id limitation in Lucene then I assume he's patched it himself. On Fri, Apr 2, 2010 at 1:17 PM,
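
The figures in this message can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes the per-shard ceiling is exactly the largest signed 32-bit integer (the practical limit may be somewhat lower), and the helper name `min_shards` is made up for illustration.

```python
# Assumed per-shard ceiling: the largest signed 32-bit integer.
MAX_DOCS_PER_SHARD = 2**31 - 1


def min_shards(total_docs, limit=MAX_DOCS_PER_SHARD):
    """Smallest number of shards that keeps every shard at or under `limit`."""
    return -(-total_docs // limit)  # integer ceiling division


total_docs = 13_000_000_000   # "over 13B small documents"
shard_count = 32              # "a 32-shard environment"

docs_per_shard = total_docs // shard_count              # 406,250,000
fits_without_patch = docs_per_shard <= MAX_DOCS_PER_SHARD
fewest_shards = min_shards(total_docs)                   # shards needed at the ceiling
```

At about 406M documents per shard, each shard sits well below the 32-bit ceiling, so an even 32-way split would not by itself require patching the docid limit.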

Re: Experience with indexing billions of documents?

2010-04-02 Thread Peter Sturge

You can do this today with multiple indexes, replication and distributed searching. SolrCloud/clustering will certainly make life easier when it comes to managing these, but with distributed searches over multiple indexes, you're limited only by how much hardware you can throw at it. On Fri, Apr
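
For readers who have not used distributed searching, a request is an ordinary Solr query with a `shards` parameter listing the indexes to search. The Python sketch below only builds such a URL. The host names, the query and the helper name are assumptions for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_distributed_query(coordinator, shards, query, rows=10):
    """Build a Solr select URL that fans out to the given shards.

    `coordinator` and each entry of `shards` use the host:port/path form,
    e.g. "solr1.example.com:8983/solr" (hypothetical hosts).
    """
    params = {"q": query, "rows": rows, "shards": ",".join(shards)}
    return f"http://{coordinator}/select?{urlencode(params)}"


# Hypothetical two-shard setup; any shard can act as the coordinator.
shards = ["solr1.example.com:8983/solr", "solr2.example.com:8983/solr"]
url = build_distributed_query(shards[0], shards, "title:whale")

# Decode the query string again to show what Solr would receive.
decoded = parse_qs(urlsplit(url).query)
```

Adding capacity then means adding another entry to the `shards` list, which is the sense in which the limit becomes a hardware question.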

Re: Experience with indexing billions of documents?

2010-04-02 Thread darren

My guess is that you will need to take advantage of Solr 1.5's upcoming cloud/cluster renovations and use multiple indexes to comfortably achieve those numbers. Hypothetically, in that case, you won't be limited by single index docid limitations of Lucene. > We are currently indexing 5 million book

Experience with indexing billions of documents?

2010-04-02 Thread Burton-West, Tom

We are currently indexing 5 million books in Solr, scaling up over the next few years to 20 million. However we are using the entire book as a Solr document. We are evaluating the possibility of indexing individual pages as there are some use cases where users want the most relevant pages rega

9 matches
