Tom,
Yes, we've (Biz360) indexed 3 billion documents and upwards... If indexing
is the issue (or rather re-indexing), we used SOLR-1301 with
Hadoop to re-index efficiently (i.e., in a timely manner). For
querying we're currently using the out-of-the-box Solr
distributed shards query mechanism, which is hard (r
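The out-of-the-box distributed query mechanism mentioned above works by passing a `shards` parameter listing the shard endpoints. A minimal sketch of building such a request, assuming hypothetical shard hostnames (no real cluster is contacted here, the snippet only constructs the URL):

```python
from urllib.parse import urlencode

# Hypothetical shard endpoints; with Solr's stock distributed search,
# the shards parameter names every core the query should fan out to.
shards = [
    "shard1.example.com:8983/solr",
    "shard2.example.com:8983/solr",
    "shard3.example.com:8983/solr",
]

params = {
    "q": "title:lucene",
    "shards": ",".join(shards),  # comma-separated list of shard endpoints
    "rows": 10,
}

# Any one node can coordinate: it scatters the query to every shard
# listed, then gathers and merges the per-shard results.
url = "http://shard1.example.com:8983/solr/select?" + urlencode(params)
print(url)
```

Note that the coordinating node itself is just one of the shards; no separate broker process is required.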
Bradford Stephens:
> Hey there,
>
> We've actually been tackling this problem at Drawn to Scale. We'd really
> like to get our hands on LuceHBase to see how it scales. Our faceting still
> needs to be done in-memory, which is kinda tricky, but it's worth
> exploring.
Hi Bradford,
thank you for yo
Hey there,
We've actually been tackling this problem at Drawn to Scale. We'd really
like to get our hands on LuceHBase to see how it scales. Our faceting still
needs to be done in-memory, which is kinda tricky, but it's worth
exploring.
On Mon, Apr 12, 2010 at 7:27 AM, Thomas Koch wrote:
Hi,
could I interest you in this project?
http://github.com/thkoch2001/lucehbase
The aim is to store the index directly in HBase, a database system modelled
after Google's Bigtable, built to store data in the region of terabytes or petabytes.
Best regards, Thomas Koch
Lance Norskog:
The 2B limitation is within one shard, due to using a signed 32-bit
integer. There is no such limit in sharding: Distributed
Search uses the stored unique document ID rather than the internal
docid.
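The arithmetic behind that limit can be checked directly. A short sketch, using the 3 billion figure mentioned earlier in the thread as the example total:

```python
import math

# Lucene's internal docid is a signed 32-bit Java int, so one shard
# tops out at 2**31 - 1 documents (about 2.1 billion).
MAX_DOCS_PER_SHARD = 2**31 - 1
print(MAX_DOCS_PER_SHARD)  # 2147483647

# Distributed Search matches results by the stored unique key instead
# of the internal docid, so the total across shards is not bound by
# this limit. For example, 3 billion docs need at least:
total_docs = 3_000_000_000
min_shards = math.ceil(total_docs / MAX_DOCS_PER_SHARD)
print(min_shards)  # 2
```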
On Fri, Apr 2, 2010 at 10:31 AM, Rich Cariens wrote:
A colleague of mine is using native Lucene + some home-grown
patches/optimizations to index over 13B small documents in a 32-shard
environment, which is around 406M docs per shard.
If there's a 2B doc id limitation in Lucene then I assume he's patched it
himself.
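The per-shard figure quoted here is easy to sanity-check: 13B documents over 32 shards sits well under the signed 32-bit docid cap, so no patch to the per-shard limit should be needed for that layout.

```python
# 13B documents spread evenly over 32 shards: each shard holds far
# fewer docs than Lucene's per-shard cap of 2**31 - 1.
total_docs = 13_000_000_000
shards = 32
docs_per_shard = total_docs // shards
print(docs_per_shard)  # 406250000, i.e. ~406M per shard
```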
On Fri, Apr 2, 2010 at 1:17 PM,
You can do this today with multiple indexes, replication and distributed
searching.
SolrCloud/clustering will certainly make life easier when it comes to
managing these,
but with distributed searches over multiple indexes, you're limited only by
how much hardware you can throw at it.
On Fri, Apr
My guess is that you will need to take advantage of Solr 1.5's upcoming
cloud/cluster renovations and use multiple indexes to comfortably achieve
those numbers. Hypothetically, in that case, you won't be limited by the
single-index docid limitations of Lucene.
We are currently indexing 5 million books in Solr, scaling up to 20 million
over the next few years. However, we are currently using the entire book as a
single Solr document. We are evaluating the possibility of indexing individual
pages, as there are some use cases where users want the most relevant pages rega