Hi,

Slight degradation is expected in some cases. Let me explain how it works.
1) The client sends a request to each node (if query parallelism is > 1, the
number of requests is multiplied by that value).
2) Each node runs the query against its local dataset.
3) Each node responds with 100 entries.
4) The client collects all responses and performs the reduce step.
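
To make the flow concrete, here is a minimal sketch of such a broadcast query
in Java. The cache name "Person", the value type and the LIMIT 100 are just
assumptions for illustration; the point is that this single call fans out to
every node that owns a partition of the data, and the client merges the
per-node results:

import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class BroadcastQueryExample {
    public static void main(String[] args) {
        // Connect as a client node (assumes a running cluster and an existing
        // "Person" cache with SQL indexing enabled -- both are assumptions).
        Ignition.setClientMode(true);

        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Long, Object> cache = ignite.cache("Person");

            // The query is sent to every node holding a partition of the data.
            // Each node runs it locally and returns up to 100 rows; the client
            // then reduces (merges) all per-node results into the final list.
            SqlFieldsQuery qry = new SqlFieldsQuery(
                "SELECT id, name FROM Person ORDER BY name LIMIT 100");

            List<List<?>> rows = cache.query(qry).getAll();

            System.out.println("Rows after client-side reduce: " + rows.size());
        }
    }
}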

So what happens when you add a node? First of all, the dataset is split across
a larger number of nodes. But if the dataset is too small you will not see any
difference in query processing, and the same is true if the newly added node
does not significantly reduce the amount of data on each of the other nodes.
For example, you have 9 nodes and add one more: each node loses no more than
10% of its data. With a small dataset this will not give you any performance
boost.

On the other hand, the client has to send more requests and reduce more data.
For instance, with 9 nodes it receives 900 entries, with 10 nodes - 1K
entries. Again, if the dataset is relatively small, all you get is extra
overhead on the client for the additional requests/responses and data.
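
Just to illustrate that client-side cost (this is not Ignite's internal code,
only a sketch of the idea): for an "ORDER BY ... LIMIT 100" query the client
ends up holding nodeCount * 100 rows and has to merge them down to the final
100, so the reduce work grows with every node you add.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ClientReduceSketch {
    // Conceptual client-side reduce: each node contributes up to 100
    // pre-sorted rows, the client merges them and keeps the top 'limit'.
    static List<String> reduce(List<List<String>> perNodeResults, int limit) {
        List<String> merged = new ArrayList<>();

        // With 9 nodes this holds up to 900 entries, with 10 nodes up to 1000.
        for (List<String> nodeRows : perNodeResults)
            merged.addAll(nodeRows);

        merged.sort(Comparator.naturalOrder());

        return new ArrayList<>(merged.subList(0, Math.min(limit, merged.size())));
    }

    public static void main(String[] args) {
        // 10 nodes, each returning 100 entries -> 1000 entries to merge.
        List<List<String>> perNode = new ArrayList<>();
        for (int n = 0; n < 10; n++) {
            List<String> rows = new ArrayList<>();
            for (int r = 0; r < 100; r++)
                rows.add(String.format("name-%02d-%03d", n, r));
            rows.sort(Comparator.naturalOrder());
            perNode.add(rows);
        }
        System.out.println("Final rows: " + reduce(perNode, 100).size());
    }
}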

Queries by primary key scale best, because in that case the client can send
the request directly to the affinity node instead of broadcasting to all
nodes.
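
For comparison, a key-based lookup (again just a sketch, assuming the same
hypothetical "Person" cache and key) goes straight to the node that owns the
key, so there is no broadcast and no client-side reduce:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterNode;

public class KeyLookupExample {
    public static void main(String[] args) {
        Ignition.setClientMode(true);

        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Long, Object> cache = ignite.cache("Person");

            Long key = 42L; // hypothetical primary key

            // The affinity function tells you which node owns the key;
            // cache.get() routes the request to that node directly.
            ClusterNode owner = ignite.affinity("Person").mapKeyToNode(key);
            System.out.println("Key " + key + " is owned by node " + owner.id());

            Object person = cache.get(key);
            System.out.println("Value: " + person);
        }
    }
}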

So when can you get a scaling benefit for SQL?
1) You have a very large dataset. Each node processes less data, and the nodes
do it in parallel. Here the per-node boost outweighs the additional overhead
on the client.
2) You add more clients that run queries in parallel (see the sketch after
this list). Total throughput increases because the request/response overhead
is divided between a larger number of clients. (Or you can set more
connections per node to better utilize the client machine's resources.)
3) You query by primary key.
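
On point 2, a rough sketch of what "more clients in parallel" means in
practice (the thread count and the query text are only placeholders): several
workers, each with its own query loop, keep all server nodes busy and pay the
per-request round-trip overhead in parallel rather than one request at a time.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class ParallelClientsSketch {
    public static void main(String[] args) throws Exception {
        Ignition.setClientMode(true);

        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Long, Object> cache = ignite.cache("Person");

            // Several "clients" (threads here for simplicity) issue queries
            // concurrently; total throughput goes up because the fixed
            // request/response overhead is incurred in parallel.
            ExecutorService pool = Executors.newFixedThreadPool(4);

            for (int i = 0; i < 4; i++) {
                pool.submit(() -> {
                    SqlFieldsQuery qry = new SqlFieldsQuery(
                        "SELECT id, name FROM Person ORDER BY name LIMIT 100");
                    for (int j = 0; j < 100; j++)
                        cache.query(qry).getAll();
                });
            }

            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.MINUTES);
        }
    }
}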

Please note one more thing: overall latency depends on how fast the slowest
node is, because the client waits for all responses.

Thanks!
-Dmitry


