Re: Deployment

amit sehas Sat, 12 Jan 2019 14:09:43 -0800

 Thanks for your response. My understanding is that the application will 
typically send its queries to coordinator nodes; if a coordinator does not 
respond within a certain time frame, the application will probably resend 
the query to a different coordinator node, on the assumption that the 
primary coordinator is no longer alive.
thanks
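
The client-side failover described above could be sketched roughly as follows. This is a plain-Python illustration, not driver code: `send_query`, `CoordinatorTimeout`, `fake_send` and the node addresses are hypothetical names made up for the example, and in practice the client drivers typically take care of coordinator selection and retries themselves.

```python
class CoordinatorTimeout(Exception):
    """Raised by the hypothetical transport when a coordinator does not answer in time."""


def execute_with_failover(query, coordinators, send_query):
    """Try each coordinator in order and return (node, response) for the first success.

    send_query(node, query) is a hypothetical callable supplied by the caller;
    it is expected to raise CoordinatorTimeout if the node does not respond in time.
    """
    last_error = None
    for node in coordinators:
        try:
            return node, send_query(node, query)
        except CoordinatorTimeout as exc:
            last_error = exc  # assume this coordinator is down, try the next one
    raise RuntimeError("no coordinator responded") from last_error


def fake_send(node, query):
    # Simulated transport: node "10.0.0.1" is down, every other node answers.
    if node == "10.0.0.1":
        raise CoordinatorTimeout(node)
    return f"rows for {query!r}"
```

Note that blindly resending is only safe for idempotent queries, since the first coordinator may have applied a write before the client gave up on it.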
    On Saturday, January 12, 2019, 11:31:40 AM PST, Andy Tolbert 
<andrew.tolb...@datastax.com> wrote:  
 
 Hi Amit,


a) If queries are submitted to co-ordinator nodes (I assume this includes 
writes as well as reads) then:  -- is this the approach also followed for the 
initial data load? 


Writes get sent to all replica nodes, and then the coordinator responds to the 
client as soon as enough replicas have responded to achieve the configured 
consistency level.
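
As a rough illustration of the write path described above, the sketch below computes how many replica acknowledgements the coordinator waits for before answering the client. The function names are made up for this example, only the ONE, QUORUM and ALL levels are modelled, and the quorum size is taken as replication factor // 2 + 1.

```python
def required_acks(consistency_level, replication_factor):
    """Number of replica acknowledgements needed for a given consistency level."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def write_succeeds(acks_received, consistency_level, replication_factor):
    """True once enough replicas have acknowledged the write."""
    return acks_received >= required_acks(consistency_level, replication_factor)
```

With a replication factor of 3, a QUORUM write is acknowledged to the client once 2 replicas have responded, even though the write is sent to all 3.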
 
  -- some select queries may not have restrictions on all the partition key 
columns, and Cassandra would reject such a query, but if we use ALLOW 
FILTERING then the query will execute. Since there is no way to determine which 
node to send the query to, it will be sent to all the nodes that could 
potentially have results. In such a case it would seem that the co-ordinator 
would gather the results from all the nodes and return them to the application.

Correct, if the data is on multiple ranges, the coordinator will make queries 
to as many replicas as needed to cover those ranges and will then gather those 
results.  Using tracing (blog post) is a good way to get insights into which 
replicas are involved in your queries.


The application does not know which nodes may have the data, so it cannot send 
the query directly to the right nodes. Even if the application had that 
information, it might not be able to perform load balancing.

Most client drivers have a nice optimization called token-aware load balancing 
(e.g. DataStax Java Driver's TokenAwarePolicy), where if the driver is able to 
infer which partition is being accessed, it will prioritize coordinators that 
have that data.   This determination will typically work if all parts of your 
partition key are bind parameters in your statement (requirements).
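
To make the idea of token-aware routing more concrete, here is a minimal sketch of how a client could map a partition key onto a token ring and prefer a replica as coordinator. It is a simplified model and not the driver's implementation: MD5 stands in for the partitioner's hash, each node owns a single token, replicas are simply the next nodes clockwise on the ring, and `TokenRing`, `token_for` and `pick_coordinator` are hypothetical names.

```python
import bisect
import hashlib


def token_for(partition_key):
    """Hash a partition key to an integer token (MD5 is only a stand-in hash here)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TokenRing:
    def __init__(self, node_tokens):
        # node_tokens: {node_name: token}; one token per node for simplicity.
        self._ring = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._ring]

    def replicas(self, partition_key, replication_factor):
        """Return the owner of the key's token, then the next nodes clockwise."""
        # The owner is the first node whose token is >= the key's token (wrapping around).
        start = bisect.bisect_left(self._tokens, token_for(partition_key))
        count = min(replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


def pick_coordinator(ring, partition_key, replication_factor, all_nodes):
    """Prefer a replica as coordinator; fall back to any node if the key is unknown."""
    if partition_key is None:
        return all_nodes[0]
    return ring.replicas(partition_key, replication_factor)[0]
```

When the partition key cannot be inferred (modelled here as `None`), the sketch falls back to an arbitrary node, which then has to forward the request to the replicas itself.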


 Does the coordinator perform load balancing? I imagine it would have to ...

The coordinator utilizes a dynamic snitch to determine where to route read 
queries.
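
The general idea of latency-based replica selection can be sketched as below. This is only a toy model and not the actual dynamic snitch algorithm: it keeps a sliding window of observed read latencies per replica and orders replicas by their average, and `LatencyTracker` and its methods are hypothetical names.

```python
from collections import defaultdict, deque


class LatencyTracker:
    """Keeps a sliding window of recent read latencies per replica."""

    def __init__(self, window=100):
        self._samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, replica, latency_ms):
        self._samples[replica].append(latency_ms)

    def score(self, replica):
        """Average recent latency; replicas with no samples score 0.0."""
        samples = self._samples.get(replica)
        return sum(samples) / len(samples) if samples else 0.0

    def order_replicas(self, replicas):
        """Return replicas sorted from fastest to slowest by recent latency."""
        return sorted(replicas, key=self.score)
```

In this sketch a replica with no recorded samples sorts first, so that new or recovered nodes get tried and start accumulating measurements.
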
Thanks,
Andy

On Sat, Jan 12, 2019 at 9:14 AM amit sehas <cu...@yahoo.com.invalid> wrote:

 Thanks for your response, this leads to some further questions:
a) If queries are submitted to co-ordinator nodes (I assume this includes 
writes as well as reads) then:
  -- is this the approach also followed for the initial data load?
  -- some select queries may not have restrictions on all the partition key 
columns, and Cassandra would reject such a query, but if we use ALLOW 
FILTERING then the query will execute. Since there is no way to determine which 
node to send the query to, it will be sent to all the nodes that could 
potentially have results. In such a case it would seem that the co-ordinator 
would gather the results from all the nodes and return them to the application.
b) This looks like a 3 tier architecture: the application sends the query to a 
coordinator, and the coordinator sends it to the right nodes. The application 
does not know which nodes may have the data, so it cannot send the query 
directly to the right nodes. Even if the application had that information, it 
might not be able to perform load balancing. Does the coordinator perform load 
balancing? I imagine it would have to ...
thanks
    On Saturday, January 12, 2019, 3:32:53 AM PST, Rajesh Kishore 
<rajesh10si...@gmail.com> wrote:  
 
 The application sends its request to one of the nodes (called the coordinating 
node), and this coordinating node knows where your result lies (provided you 
have modelled your DB correctly, it should not result in scatter-and-gather 
queries) and delegates the query to the respective node. So it does follow a 
client-server architecture, and your assumption is correct.
As far as I know, the application should generally be unaware of which node 
holds your data and must not be tied to a specific node, because that would 
have bigger implications when things like re-balancing occur. That said, 
keeping the application in the same region as the Cassandra cluster makes 
sense. I can't comment much on cloud deployment.
Thanks,
Rajesh

On Sat, Jan 12, 2019 at 8:54 AM amit sehas <cu...@yahoo.com.invalid> wrote:

I am new to Cassandra, and I am wondering how Cassandra applications are 
deployed in the cloud. Does Cassandra have a client-server architecture, with 
the application deployed as a 3rd tier that sends queries to the clients, 
which then submit them to the Cassandra servers?  Or does the application 
submit the request directly to any of the Cassandra servers, which then decides 
where the query will be routed, gathers the response and returns it to the 
application?
Does the application accessing the data get deployed on the same nodes in the 
cloud as the Cassandra cluster itself? Or on separate nodes?  Are there any 
best practices available in this regard?
thanks
