If people are so gung-ho to go down the "lots of endless pain" rabbit hole
by heavily under-configuring their clusters, I guess that's their choice,
but I would strongly advise against it. Sure, a small band of "the few and
the proud" warhorses can proudly proclaim how they "did it", and a small
number of elite young Turks can probably do it as well, but it's quite the
fool's errand for average developers to try to replicate the "heroic
efforts" of the few.

Rather, "average developers" are well-advised to simply seek "the easy
path" and cease and desist from trying to configure Solr clusters with a
billion documents or more per node, or even 500 million for that matter.
"Just say no" to any demands that you run Solr on so-called "fat nodes".

Go with relatively commodity hardware (e.g., 16-32 GB of RAM per node), even
if that means you need a lot more nodes. Or virtualize fat nodes into a
bunch of skinny nodes if that's all you have to work with.

My bottom line advice: use 100 million documents per node as your baseline
target, and make sure your index fits entirely in memory, with a proof of
concept implementation to validate whether the sweet spot for your
particular data, data model, and application access patterns may be well
above or even below that.
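To make the 100-million-documents-per-node baseline a bit more concrete,
here is a rough back-of-the-envelope sketch in Python. It assumes one shard
replica per node, documents spread evenly across shards, and that the index
is served from the OS page cache, so only the RAM left over after the JVM
heap counts toward "fits entirely in memory". The helper name plan_cluster,
the bytes-per-document figures, and the heap size are all hypothetical,
made up for illustration; a proof of concept on your own data is what
should actually set them.

```python
import math
from dataclasses import dataclass

# Baseline from the advice above; tune it with a proof of concept.
DOCS_PER_NODE_TARGET = 100_000_000
GIB = 1024 ** 3


@dataclass
class ClusterPlan:
    shards: int                # nodes needed to hold one full copy of the index
    nodes: int                 # total nodes, counting every replica
    index_gib_per_node: float  # estimated index size on each node
    fits_in_memory: bool       # does that index fit in RAM left for the page cache?


def plan_cluster(total_docs: int,
                 bytes_per_doc: float,
                 ram_gib_per_node: float,
                 heap_gib: float,
                 replication_factor: int = 1,
                 docs_per_node: int = DOCS_PER_NODE_TARGET) -> ClusterPlan:
    """Hypothetical sizing helper; assumes one shard replica per node."""
    if total_docs <= 0 or docs_per_node <= 0:
        raise ValueError("document counts must be positive")
    shards = math.ceil(total_docs / docs_per_node)
    nodes = shards * replication_factor
    # Assumption: documents spread evenly across shards.
    index_gib = (total_docs / shards) * bytes_per_doc / GIB
    # Assumption: the index is served from the OS page cache, so only the
    # RAM not claimed by the JVM heap counts toward "fits in memory".
    page_cache_gib = ram_gib_per_node - heap_gib
    return ClusterPlan(shards, nodes, index_gib, index_gib <= page_cache_gib)


# Example: 1 billion docs on 32 GB nodes with a hypothetical 8 GB heap,
# two copies of every shard, and two guesses at per-document index size.
for size in (200, 500):
    print(size, plan_cluster(1_000_000_000, size, 32, 8, replication_factor=2))
```

With about 200 bytes per document, ten shards of 100 million documents
each come to roughly 18.6 GiB per node, which fits in the 24 GiB left after
the heap; at about 500 bytes per document it does not, which is exactly the
kind of result a proof of concept should surface before hardware is bought.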

Yes, indeed, sing praises for heroes, but don't kill yourself and drag down
others trying to be one yourself.

</sermon>

-- Jack Krupansky

On Tue, Dec 30, 2014 at 11:03 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> bq: I did at some point try to write a long blog entry on Solr
> hardware and setup for non-small corpuses, but had to give up:
>
> Man, this makes me laugh! Oh the memories!
>
> A common question from sales, quite a reasonable one at that: "can we
> have a checklist that we can use to give clients an idea how much
> hardware to buy?". And do note that sales folks are talking to clients
> with all different types and sizes.
>
> I sat down and tried to do this... three separate times. Pretty soon
> I'd get to the point of realizing that the doc was worthless exactly
> because of all the "if this then that" phrases. I guess I can take
> some comfort from the fact that it only took me about an hour the
> third time to remember that it was hopeless, and after that I
> remembered to not even try.
>
> I think that it would be _extremely_ helpful to have a bunch of "war
> stories" to reference. In my experience, people dealing with large
> numbers of documents really are most concerned with whether what
> they're doing is _possible_, and are mostly looking to see if someone
> else has "been there and done that". Of course they'd like all the
> specificity possible, but there's a lot of comfort in knowing
> something similar has been done before.
>
> Best,
> Erick
>
> On Tue, Dec 30, 2014 at 4:43 AM, Toke Eskildsen <t...@statsbiblioteket.dk>
> wrote:
> > Shawn Heisey [apa...@elyograg.org] wrote:
> >> I believe it would be useful to organize a session at Lucene Revolution,
> >> possibly more interactive than a straight presentation, where users with
> >> very large indexes are encouraged to attend.  The point of this session
> >> would be to exchange war stories, configuration requirements, hardware
> >> requirements, and observations.
> >
> > From the perspective of the conference it might tie up a lot of time: If
> > we were to get down to the configuration level, one session would not be
> > enough. Some sort of pre-conference bar camp might do it? Or maybe even a
> > whole pre-conference day?
> >
> > (side-note to the side-note: Living in Europe, going to Lucene/Solr
> > Revolution means spending more time on travel than at the actual
> > conference; extending the activities to 3 days would increase the odds
> > of me going next year)
> >
> >> Better documentation for extreme scaling is also a possible outcome.
> >
> > I did at some point try to write a long blog entry on Solr hardware and
> > setup for non-small corpuses, but had to give up: There were just too
> > many "but if you need to scale X, you might be better off choosing Y,
> > unless your usage is Z". I think multiple detailed descriptions of setups
> > are a great starting point. If we get enough of them, some pattern will
> > hopefully emerge, although I am afraid that the pattern will be "to get
> > this to work, we had to write custom code".
> >
> >> Another idea, not sure if it would be good as an alternate idea or
> >> supplemental, is a less formal gathering, perhaps over a meal or three.
> >
> > Outside of Lucene/Solr Revolution? How would that work geographically?
> >
> > - Toke Eskildsen
>
