Oops... at 100 qps per node you would need 120 nodes to get to 12K qps and
800 nodes to get to 80K qps, but that is just an extremely rough ballpark
estimate, not a precise and firm number. And that's only if the queries can
be evenly distributed across the cluster and don't require fanout to other
shards, which effectively turns each incoming query into n queries, where n
is the number of shards.
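
For a concrete sketch of that arithmetic (the 100 qps/node figure and the
fanout factor are assumptions for illustration, not measurements):

  import math

  def nodes_needed(target_qps, qps_per_node=100, fanout_shards=1):
      # Fanout turns each incoming query into one sub-query per shard, so the
      # cluster has to absorb target_qps * fanout_shards queries internally.
      return math.ceil(target_qps * fanout_shards / qps_per_node)

  print(nodes_needed(12_000))                    # 120
  print(nodes_needed(80_000))                    # 800
  print(nodes_needed(80_000, fanout_shards=20))  # 16000 -- fanout dominates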

-- Jack Krupansky

On Mon, Feb 8, 2016 at 12:07 PM, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

> So is there any aging or TTL (in database terminology) of older docs?
>
> And do all of your queries need to query all of the older documents all of
> the time or is there a clear hierarchy of querying for aged documents, like
> past 24-hours vs. past week vs. past year vs. older than a year? Sure, you
> can always use a function query to boost by the inverse of document age,
> but Solr would be more efficient with filter queries or separate indexes
> for different time scales.
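>
> As a rough illustration of what that tiering could look like (the
> "timestamp" field name here is hypothetical), the filter queries might be
> along these lines:
>
>   fq=timestamp:[NOW-1DAY TO NOW]     (past 24 hours)
>   fq=timestamp:[NOW-7DAYS TO NOW]    (past week)
>   fq=timestamp:[NOW-1YEAR TO NOW]    (past year)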
>
> Are documents ever updated or are they write-once?
>
> Are documents explicitly deleted?
>
> Technically you probably could meet those specs, but... how many
> organizations have the resources and the energy to do so?
>
> As a back of the envelope calculation, if Solr gave you 100 queries per
> second per node, that would mean you would need 1,200 nodes. It would also
> depend on whether those queries are very narrow so that a single node can
> execute them or if they require fanout to other shards and then aggregation
> of results from those other shards.
>
> -- Jack Krupansky
>
> On Mon, Feb 8, 2016 at 11:24 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> Short form: You really have to prototype. Here's the long form:
>>
>>
>> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>>
>> I've seen between 20M and 200M docs fit on a single piece of hardware,
>> so you'll absolutely have to shard.
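>>
>> As a very rough illustration of what that range implies for shard counts
>> (pure arithmetic, not a recommendation):
>>
>>   import math
>>   corpus_docs = 2_000_000_000
>>   for docs_per_shard in (20_000_000, 200_000_000):
>>       print(docs_per_shard, math.ceil(corpus_docs / docs_per_shard))
>>   # -> roughly 100 shards at 20M docs/shard, 10 shards at 200M docs/shard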
>>
>> And the other thing you haven't told us is whether you plan on
>> _adding_ 2B docs a day or whether that number is the total corpus size
>> and you are re-indexing the 2B docs/day. IOW, if you are adding 2B
>> docs/day, 30 days later do you have 2B docs or 60B docs in your
>> corpus?
>>
>> Best,
>> Erick
>>
>> On Mon, Feb 8, 2016 at 8:09 AM, Susheel Kumar <susheel2...@gmail.com>
>> wrote:
>> > Also, consider whether you expect to index the 2 billion docs as NRT or
>> > whether it will be offline (during off hours, etc.). For more accurate
>> > sizing you may also want to index, say, 10 million documents, which will
>> > give you an idea of your index size; you can then extrapolate from that
>> > to come up with memory requirements.
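>> >
>> > As a very rough sketch of that extrapolation (the sample size and the
>> > measured index size below are placeholders, not real numbers):
>> >
>> >   sample_docs = 10_000_000
>> >   sample_index_gb = 15.0        # whatever you measure for the sample
>> >   target_docs = 2_000_000_000
>> >   est_index_gb = sample_index_gb * target_docs / sample_docs
>> >   print(est_index_gb)           # ~3000 GB in this hypothetical case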
>> >
>> > Thanks,
>> > Susheel
>> >
>> > On Mon, Feb 8, 2016 at 11:00 AM, Emir Arnautovic <
>> > emir.arnauto...@sematext.com> wrote:
>> >
>> >> Hi Mark,
>> >> Can you give us a bit more detail: size of docs, query types, are docs
>> >> grouped somehow, are they time sensitive, will they be updated or is the
>> >> index rebuilt every time, etc.?
>> >>
>> >> Thanks,
>> >> Emir
>> >>
>> >>
>> >> On 08.02.2016 16:56, Mark Robinson wrote:
>> >>
>> >>> Hi,
>> >>> We have a requirement where we would need to index around 2 billion
>> >>> docs in a day.
>> >>> The queries against this indexed data set can be around 80K queries per
>> >>> second during peak time, and around 12K queries per second during
>> >>> non-peak hours.
>> >>>
>> >>> Can Solr handle such huge volumes?
>> >>>
>> >>> If so, assuming we have no budget constraints, what would be a
>> >>> recommended Solr setup (number of shards, number of Solr instances,
>> >>> etc.)?
>> >>>
>> >>> Thanks!
>> >>> Mark
>> >>>
>> >>>
>> >> --
>> >> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>> >> Solr & Elasticsearch Support * http://sematext.com/
>> >>
>> >>
>>
>
>
