Re: Allow Join over two sharded collection

Susheel Kumar Sat, 01 Jul 2017 17:03:04 -0700

Depending on your use case, people also use collection aliasing for
time-series data.  See the link below.


https://blog.cloudera.com/blog/2013/10/collection-aliasing-near-real-time-search-for-really-big-data/
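
For illustration only, here is a rough sketch of how an alias over per-month collections might be set up through the Collections API CREATEALIAS action. The host, the alias name `docs_recent` and the `docs_YYYY_MM` collection names are assumptions made up for this example, not something from your setup. The code only builds the request URL; it does not contact Solr.

```python
from urllib.parse import urlencode

# Assumption: a local Solr node; replace with your own host and port.
SOLR_BASE = "http://localhost:8983/solr"


def create_alias_url(alias, collections, base=SOLR_BASE):
    """Build a Collections API CREATEALIAS request URL.

    `alias` is the name clients query; `collections` is the list of
    real collections the alias should cover.
    """
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    # safe="," keeps the comma-separated collection list readable.
    return f"{base}/admin/collections?{urlencode(params, safe=',')}"


if __name__ == "__main__":
    # Hypothetical monthly collections: point the alias at the newest two.
    print(create_alias_url("docs_recent", ["docs_2017_06", "docs_2017_07"]))
```

Re-issuing CREATEALIAS with the same alias name and a new collection list repoints the alias, so queries against the alias pick up the new month without client changes.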

On Sat, Jul 1, 2017 at 7:13 PM, Susheel Kumar <susheel2...@gmail.com> wrote:

> As Erick said, 1M docs/month isn't a big deal.  I have 45+ million docs in one
> shard, but YMMV depending on other factors.
>
> Also, there is a lot of confusion in the terminology. The default routing is
> compositeId routing.  The implicit routing which Erick mentioned is the
> manual routing.  https://issues.apache.org/jira/browse/SOLR-6630
>
> Which routing are you suggesting to use? Can you clarify again?  Also,
> what's your exact use case?  Do you query older documents, or do you not
> need to, with most or all of your queries going to the shard with the
> newer documents?
>
> Thanks,
> Susheel
>
> On Sat, Jul 1, 2017 at 12:14 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> 1M docs/month shouldn't make Solr break a sweat. If it really worries
>> you and you're indexing in a big batch, index during off hours. At
>> very worst, if you're ingesting them all at once you might have to
>> throttle the indexing a bit.
>>
>> Frankly, most of the time acquiring the documents from the system of
>> record is where the bottleneck is and Solr easily handles the indexing
>> load.
>>
>> The other advantage is that if you use implicit routing rather than a
>> composite ID, you can add shards to your collection one at a time as
>> required. For time-series data that's an elegant way to "age out" old
>> documents.
>>
>> Best,
>> Erick
>>
>> On Sat, Jul 1, 2017 at 8:57 AM, mganeshs <mgane...@live.in> wrote:
>> > Hi Susheel,
>> >
>> > Currently we have around 20M documents already, and from now on we are
>> > expecting 1M documents every month.
>> > The reason we don't want to go for time-based implicit routing is that
>> > all documents will end up in the most recent shard, so indexing will be
>> > heavy for the new shard, whereas older shards will be used just for
>> > queries.
>> > If we use default sharding, then the indexing load is distributed across
>> > all the shards. That's the reason we would like to stick to default
>> > sharding. But Join is the issue over here when default sharding is used
>> > :-(
>> >
>> >
>> >
>> > --
>> > View this message in context: http://lucene.472066.n3.nabble.com/Allow-Join-over-two-sharded-collection-tp4343443p4343803.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
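
Regarding Erick's point quoted above about adding shards one at a time under implicit routing: a minimal sketch of what the Collections API calls might look like, using the CREATESHARD and DELETESHARD actions. The collection name `docs` and the `shard_YYYY_MM` naming scheme are hypothetical, chosen only for this example, and the code just builds the URLs rather than calling Solr.

```python
from urllib.parse import urlencode

# Assumption: a local Solr node; replace with your own host and port.
SOLR_BASE = "http://localhost:8983/solr"


def collections_api_url(action, base=SOLR_BASE, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = {"action": action, **params}
    return f"{base}/admin/collections?{urlencode(query)}"


def add_month_shard_url(collection, month):
    """URL that adds a new shard for `month` (implicit router only)."""
    return collections_api_url(
        "CREATESHARD", collection=collection, shard=f"shard_{month}"
    )


def drop_month_shard_url(collection, month):
    """URL that removes the shard holding an aged-out month."""
    return collections_api_url(
        "DELETESHARD", collection=collection, shard=f"shard_{month}"
    )


if __name__ == "__main__":
    print(add_month_shard_url("docs", "2017_08"))   # new month arrives
    print(drop_month_shard_url("docs", "2016_08"))  # old month ages out
```

With implicit routing the indexing client also has to say which shard each document goes to (for example via the `_route_` parameter), which is the "manual" part mentioned above.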
