On Fri, Dec 16, 2016 at 12:39 PM, GW <thegeofo...@gmail.com> wrote:

> Dorian,
>
> From my reading, my belief is that you just need some beefy machines for
> your zookeeper ensemble so they can think fast.

ZooKeeper only needs to be fast enough to keep up with cluster state changes. So I think
it scales with the number of machines/collections/shards, not with the number of documents.

> After that your issues are
> complicated by drive I/O which I believe is solved by using shards. If you
> have a collection running on top of a single drive array it should not
> compare to writing to a dozen drive arrays. So a whole bunch of light-duty
> machines that have a decent amount of memory and are barely able to process faster
> than their drive I/O will serve you better.
>
My dataset will be smaller than total memory, so I expect no query to hit
disk.
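
As an aside on the sharding point above: spreading a collection over many shards is what distributes the write load across machines. A rough SolrJ sketch of creating such a collection; the ZooKeeper hosts, collection name, configset, and shard/replica counts are placeholders, and the exact Builder/configset names depend on your SolrJ version (this is roughly the SolrJ 7 API):

import java.util.Arrays;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CreateShardedCollection {
  public static void main(String[] args) throws Exception {
    // Placeholder ZooKeeper ensemble; replace with your own hosts.
    try (CloudSolrClient client = new CloudSolrClient.Builder(
        Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"), Optional.empty()).build()) {
      // 12 shards x 2 replicas is only an example; the point is that writes
      // then land on many machines/drive arrays instead of one.
      CollectionAdminRequest
          .createCollection("mycollection", "_default", 12, 2)
          .process(client);
    }
  }
}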

>
> I think the Apache big data mandate was to be horizontally scalable to
> infinity with cheap consumer hardware. In my mind's eye you are not going to
> get crazy input rates without a big horizontal drive system.
>
There is overhead with small machines, and very big machines are pricey,
so something in the middle: either a small cluster of big machines
or a big cluster of small machines.

>
> I'm in the same boat. All the scaling and roll out documentation seems to
> reference the Witch Doctor's secret handbook.
>
> I just started making my applications ZK-aware and am really just
> starting to understand the architecture. After a whole year I still feel
> weak while at the same time I have traveled far. I still feel like an
> amateur.
>
> My plans are to use bridge tools in Linux so all my machines are sitting on
> the switch with layer 2. Then use Conga to monitor which apps need to be
> running. If a server dies, its apps are spun up on one of the other
> servers using the original IP and MAC address through a bridge firewall
> gateway, so there is no hold-up with MAC phreaking like at layer 3. Layer
> 3 does not like to see a route change with a MAC address. My apps will be
> on a SAN ~ Data on as many shards/machines as financially possible.
>
By Conga, do you mean https://sourceware.org/cluster/conga/spec/ ?
Also, a SAN may/will hurt performance, as someone answered in your thread.

>
> I was going to put a bunch of Apache web servers in round robin to talk to
> Solr but discovered that a Solr node can be dead and not report errors.
>
Please explain "dead but not reporting errors" in more detail.
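
For what it's worth, one way I'd try to catch that is to ping each node directly instead of trusting a load balancer. A rough SolrJ sketch, assuming a made-up node URL and core name and the stock /admin/ping handler:

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.SolrPingResponse;

public class NodeHealthCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder node URL and core name; point this at each node in turn.
    try (HttpSolrClient node = new HttpSolrClient.Builder(
        "http://solr-node-1:8983/solr/mycollection_shard1_replica1").build()) {
      // A node that accepts connections but cannot serve the core usually
      // throws here or returns a non-zero status, instead of silently "working".
      SolrPingResponse rsp = node.ping();
      System.out.println("status=" + rsp.getStatus() + " qtime=" + rsp.getQTime() + "ms");
    }
  }
}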

> It's all rough at the moment but it makes total sense to send Solr requests
> based on what ZK says is available versus a round robin.
>
Yes, like I and another commenter wrote on your thread.
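
As far as I understand, that is what SolrJ's CloudSolrClient already does: it reads the cluster state from ZooKeeper and only routes to live replicas. A minimal sketch with placeholder ZooKeeper hosts and collection name (roughly the SolrJ 7 API):

import java.util.Arrays;
import java.util.Optional;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ZkAwareQuery {
  public static void main(String[] args) throws Exception {
    // Placeholder ZooKeeper hosts and collection name.
    try (CloudSolrClient client = new CloudSolrClient.Builder(
        Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"), Optional.empty()).build()) {
      client.setDefaultCollection("mycollection");
      // The client watches live_nodes and the cluster state in ZooKeeper,
      // so requests only go to replicas ZK considers alive, rather than
      // round-robining over a fixed host list that may include dead nodes.
      QueryResponse rsp = client.query(new SolrQuery("*:*"));
      System.out.println("numFound=" + rsp.getResults().getNumFound());
    }
  }
}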

>
> Will keep you posted on my roll out if you like.
>
> Best,
>
> GW
>
>
>
>
>
>
>
> On 16 December 2016 at 03:31, Dorian Hoxha <dorian.ho...@gmail.com> wrote:
>
> > Hello searchers,
> >
> > I'm researching Solr for a project that would require max inserts (10M/s)
> > and some heavy facet+fq on top of that, though at low QPS.
> >
> > And I'm trying to find blogs/slides where people have used some big
> > machines instead of hundreds of small ones.
> >
> > 1. The largest I've found is this
> > <https://sbdevel.wordpress.com/2016/11/30/70tb-16b-docs-4-machines-1-solrcloud/>
> > with 16 cores + 384GB RAM, but they were using 25 (!) Solr 4 instances per
> > server, which seems wasteful to me?
> >
> > I know that one Solr instance can have at most a ~29-30GB heap because GC
> > becomes wasteful after that, and you should leave the rest of the memory to
> > the OS for the file cache.
> > 2. But do you think one instance will be able to fully use a 256GB / 20-core
> > machine?
> >
> > 3. Care to share your findings/links about big-machine clusters?
> >
> > Thank You
> >
>
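
P.S. For context on the "heavy facet+fq" part of my original mail, the shape of query I have in mind is roughly the following; the collection and field names are invented:

import java.util.Arrays;
import java.util.Optional;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HeavyFacetQuery {
  public static void main(String[] args) throws Exception {
    // Placeholder ZooKeeper host; collection and field names are invented.
    try (CloudSolrClient client = new CloudSolrClient.Builder(
        Arrays.asList("zk1:2181"), Optional.empty()).build()) {
      SolrQuery q = new SolrQuery("*:*");
      q.addFilterQuery("timestamp:[NOW-1DAY TO NOW]"); // fq hits the filter cache
      q.setFacet(true);
      q.addFacetField("country", "device_type");
      q.setRows(0); // facet counts only, no stored documents returned
      QueryResponse rsp = client.query("events", q);
      for (FacetField ff : rsp.getFacetFields()) {
        System.out.println(ff.getName() + ": " + ff.getValues().size() + " buckets");
      }
    }
  }
}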
