Hello Ephraim,

Thank you so much for the great document and scaling concept!

First, I think you really should publish this on the Solr wiki. This approach
is not documented anywhere there and is not really obvious to newbies, and
your document explains it very well!

Please allow me two further questions regarding your document:
1.) Is it correct that by "DB" you mean the original data source of the data
that is fed into the Solr "cloud" for searching?

2.) Solr Aggregator: This term did not yield any Google results, but it is a
very important aspect of your design (and this was the missing piece for me
when thinking about Solr architectures): Is it correct that the
"aggregators" are simply Tomcat instances with the Solr webapp deployed?
These aggregators do not have their own index but only run the Solr webapp,
and I access them via the ?shards= parameter, giving the shards I want to
query? (So in the end they aggregate the data of the shards but do not have
their own data.) This is really an important aspect that is not documented
well enough in the Solr documentation.
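
A minimal sketch of the kind of request described above, assuming the
aggregator exposes the standard /select handler. Solr's distributed-search
parameter is spelled `shards` and takes a comma-separated list of
host:port/path entries without the http:// scheme. The host names, port and
query below are hypothetical and not taken from the diagram.

```python
from urllib.parse import urlencode


def build_distributed_query_url(aggregator_base, shard_addresses, query, rows=10):
    """Build a select URL asking one Solr instance to fan out to the given shards.

    aggregator_base -- base URL of the aggregator (hypothetical, e.g. a Tomcat
                       instance with the Solr webapp and no index of its own)
    shard_addresses -- list of "host:port/path" strings, without a scheme
    """
    params = {
        "q": query,
        "rows": rows,
        # The aggregator queries every listed shard and merges the results.
        "shards": ",".join(shard_addresses),
    }
    return f"{aggregator_base}/select?{urlencode(params)}"


# Hypothetical hosts, for illustration only.
url = build_distributed_query_url(
    "http://aggregator1:8080/solr",
    ["shard1:8080/solr", "shard2:8080/solr"],
    "title:solr",
)
print(url)
```

Any load balancer in front of several such aggregators would simply pick a
different aggregator_base per request; the shard list stays the same.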

Thank you very much!
Jens


2011/4/5 Ephraim Ofir <ephra...@icq.com>

> of course the attachment didn't get to the list, so here it is if you
> want it...
>
> Ephraim Ofir
>
>
> -----Original Message-----
> From: Ephraim Ofir
> Sent: Tuesday, April 05, 2011 10:20 AM
> To: 'solr-user@lucene.apache.org'
> Subject: RE: Very very large scale Solr Deployment = how to do (Expert
> Question)?
>
> I'm not sure about the scale you're aiming for, but you probably want to
> do both sharding and replication.  There's no central server which would
> be the bottleneck. The guidelines should probably be something like:
> 1. Split your index into enough shards so it can keep up with the update
> rate.
> 2. Have enough replicas of each shard master to keep up with the rate
> of queries.
> 3. Have enough aggregators in front of the shard replicas so the
> aggregation doesn't become a bottleneck.
> 4. Make sure you have good load balancing across your system.
>
> Attached is a diagram of the setup we have.  You might want to look into
> SolrCloud as well.
>
> Ephraim Ofir
>
>
> -----Original Message-----
> From: Jens Mueller [mailto:supidupi...@googlemail.com]
> Sent: Tuesday, April 05, 2011 4:25 AM
> To: solr-user@lucene.apache.org
> Subject: Very very large scale Solr Deployment = how to do (Expert
> Question)?
>
> Hello Experts,
>
>
>
> I am a Solr newbie but have read quite a lot of docs. I still do not
> understand what would be the best way to set up very large scale
> deployments:
>
>
>
> Goal (theoretical):
>
>  A) Index size: 1 petabyte (1 document is about 5 KB in size)
>
>  B) Queries: 100000 queries per second
>
>  C) Updates: 100000 updates per second
>
>
>
>
> Solr offers:
>
> 1.)    Replication => Scales Well for B)  BUT  A) and C) are not
> satisfied
>
>
> 2.)    Sharding => Scales well for A) BUT B) and C) are not satisfied
> (=> As I understand the sharding approach, everything goes through a
> central server that dispatches the updates and assembles the query
> results retrieved from the different shards. But this central server
> also has some capacity limits...)
>
>
>
>
> What is the right approach to handle such large deployments? I would be
> thankful for just a rough sketch of the concepts so I can
> experiment/search further...
>
>
> Maybe I am missing something very trivial as I think some of the "Solr
> Users/Use Cases" on the homepage are that kind of large deployments. How
> are they implemented?
>
>
>
> Thank you very much!!!
>
> Jens
>
