"500B" - as in 500,000,000,000? Really?

-- Jack Krupansky

-----Original Message-----
From: tomasv
Sent: Friday, July 18, 2014 8:18 PM
To: solr-user@lucene.apache.org
Subject: shards as subset of All Shards

Hello, This is kind of weird, but here goes:

We are setting up a document repository (SOLR4). This will be a large (to
us) repository of approximately 500B documents. The documents are based on
"people".

Once all my documents are uploaded, we will receive new (follow-up)
information on our "people" every month (or so).

Our client-facing application has two modes: "all inclusive data" and "recent
data".
We want the "recent data" mode to query against the data in the follow-up
information only. We want the "all inclusive" mode to query against the
initial load AND the follow-up data.

We currently have 30 shards with 2 replicas of each shard (60 shards total)
in a SOLR cloud setup including a Zookeeper. This is currently hosting our
data in what will become the "all inclusive" query.

What is the best approach to a requirement such as this? (Probably not
clear enough??)
(I'm a newbie so please bear with my questions! :-)  )
1. Should we create two separate collections ("initial" and "followup")? And
then have the front end app query against each collection as needed?
2. Is it possible to index the follow-up records to specific shards and then
query those specific shards when the client is in "follow up" mode? Will an
"all inclusive" query include the followup shards?
3. Is it possible for one collection to be a subset of a larger collection?
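
To make option 1 concrete, here is a rough sketch of how the front end might
pick its target per mode. It assumes two collections named "initial" and
"followup" and a Solr node at localhost:8983; those names and the helper
function are hypothetical. It relies on the SolrCloud "collection" request
parameter, which lets one request search several collections at once. It only
builds the request URL and does not contact Solr.

```python
from urllib.parse import urlencode

# Hypothetical values: adjust to the real cluster.
SOLR_BASE = "http://localhost:8983/solr"
INITIAL = "initial"      # assumed name for the initial-load collection
FOLLOWUP = "followup"    # assumed name for the monthly follow-up collection


def build_select_url(mode, query, base=SOLR_BASE):
    """Return the /select URL for the given client mode.

    "recent" searches only the follow-up collection.
    "all" searches both collections via the `collection` parameter.
    """
    params = {"q": query}
    if mode == "recent":
        target = FOLLOWUP
    elif mode == "all":
        target = INITIAL
        # Comma-separated list of collections to search in one request.
        params["collection"] = f"{INITIAL},{FOLLOWUP}"
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    params["wt"] = "json"
    return f"{base}/{target}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_select_url("recent", "name:smith"))
    print(build_select_url("all", "name:smith"))
```

With this layout the "recent data" mode never touches the initial load, and
the "all inclusive" mode needs no second round trip from the application.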

I realize this is quite "fuzzy", but any insights are appreciated.

-tomas






--
View this message in context: http://lucene.472066.n3.nabble.com/shards-as-subset-of-All-Shards-tp4147998.html
Sent from the Solr - User mailing list archive at Nabble.com.
