Thanks for your response. I understand that the application will typically send queries to coordinator nodes, and if a coordinator does not respond within a certain time frame, the application will probably resend the query to a different coordinator node, on the assumption that the first coordinator is no longer alive.

Thanks

On Saturday, January 12, 2019, 11:31:40 AM PST, Andy Tolbert <andrew.tolb...@datastax.com> wrote:

Hi Amit,
> a) If queries are submitted to co-ordinator nodes (I assume this includes writes as well as reads) then:
> -- is this the approach also followed for the initial data load?

Writes are sent to all replica nodes, and the coordinator responds to the client as soon as enough replicas have responded to satisfy the configured consistency level.

> -- some select queries may not have restrictions on all the partition key columns, and Cassandra would reject such a query, but if we use ALLOW FILTERING then the query will execute. Since there is no way to determine which node to send the query to, it will be sent to all the nodes that could potentially have results. In such a case it would seem that the co-ordinator would gather the results from all the nodes and return them to the application.

Correct. If the data is on multiple ranges, the coordinator will query as many replicas as needed to cover those ranges and will then gather the results. Using tracing (blog post) is a good way to get insight into which replicas are involved in your queries.

> The application does not know which nodes may have the data, so it cannot directly send the data to the right nodes. Even if the application had the data, it may not be able to perform load balancing.

Most client drivers have a nice optimization called token-aware load balancing (e.g. the DataStax Java Driver's TokenAwarePolicy): if the driver is able to infer which partition is being accessed, it will prioritize coordinators that hold that data. This determination typically works if all parts of your partition key are bind parameters in your statement (requirements).

> Does the coordinator perform load balancing? I imagine it would have to ...

The coordinator uses the dynamic snitch to determine where to route read queries.
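To make the routing and acknowledgement points above concrete, here is a toy Python simulation. Everything in it is an assumption for illustration: the node names, the one-token-per-node ring, the MD5-based stand-in partitioner (Cassandra's default is Murmur3), and the helpers `replicas_for`, `pick_coordinator` and `coordinate_write`. It is not the driver's TokenAwarePolicy or Cassandra's coordinator code, only a sketch of the ideas: the replicas for a key are found from the ring, a token-aware client prefers one of them as coordinator, and a write sent to all replicas succeeds once the consistency level's acknowledgement count is met.

```python
import bisect
import hashlib
from typing import List, Set

# Hypothetical 4-node ring with one token per node. Real clusters usually use
# vnodes and Cassandra's Murmur3 token range; this is a toy 32-bit space.
RING = {"node-a": 0, "node-b": 2**30, "node-c": 2**31, "node-d": 3 * 2**30}
RING_SIZE = 2**32


def token_for(partition_key: str) -> int:
    """Stand-in partitioner (MD5 truncated to 32 bits), NOT Cassandra's Murmur3."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


def replicas_for_token(token: int, rf: int = 3) -> List[str]:
    """SimpleStrategy-like placement: the first node whose token is >= the key's
    token owns it; the next rf - 1 nodes clockwise hold the other replicas."""
    nodes = sorted(RING, key=RING.get)
    tokens = [RING[n] for n in nodes]
    start = bisect.bisect_left(tokens, token) % len(nodes)  # wrap past the last token
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]


def replicas_for(partition_key: str, rf: int = 3) -> List[str]:
    return replicas_for_token(token_for(partition_key), rf)


def pick_coordinator(partition_key: str, live: Set[str], rf: int = 3) -> str:
    """Simplified token-aware choice: prefer a live replica as coordinator,
    otherwise fall back to any live node (any node can still coordinate)."""
    for node in replicas_for(partition_key, rf):
        if node in live:
            return node
    return sorted(live)[0]


def required_acks(consistency: str, rf: int) -> int:
    """Replica acknowledgements needed for a few common consistency levels."""
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[consistency]


def coordinate_write(partition_key: str, consistency: str,
                     responsive: Set[str], rf: int = 3) -> bool:
    """The coordinator sends the write to every replica, then reports success
    once enough of them acknowledge to satisfy the consistency level."""
    replicas = replicas_for(partition_key, rf)
    acks = sum(1 for r in replicas if r in responsive)
    return acks >= required_acks(consistency, rf)
```

For example, `coordinate_write("user:42", "QUORUM", responsive)` succeeds when two of the key's three replicas are in `responsive`, while the same call at `"ALL"` fails. The simulation ignores timing; in a real cluster the coordinator answers as soon as the required acknowledgements arrive, while the write still goes to the slower replicas.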
Thanks,
Andy

On Sat, Jan 12, 2019 at 9:14 AM amit sehas <cu...@yahoo.com.invalid> wrote:

Thanks for your response. This leads to some further questions:

a) If queries are submitted to co-ordinator nodes (I assume this includes writes as well as reads) then:
-- is this the approach also followed for the initial data load?
-- some select queries may not have restrictions on all the partition key columns, and Cassandra would reject such a query, but if we use ALLOW FILTERING then the query will execute. Since there is no way to determine which node to send the query to, it will be sent to all the nodes that could potentially have results. In such a case it would seem that the co-ordinator would gather the results from all the nodes and return them to the application.

b) This seems to be a 3-tier architecture: the application sends the query to a coordinator, and the coordinator sends it to the right nodes. The application does not know which nodes may have the data, so it cannot directly send the data to the right nodes. Even if the application had the data, it may not be able to perform load balancing. Does the coordinator perform load balancing? I imagine it would have to ...

Thanks

On Saturday, January 12, 2019, 3:32:53 AM PST, Rajesh Kishore <rajesh10si...@gmail.com> wrote:

The application sends a request to one of the nodes (called the coordinating node). The coordinating node knows where your result lies and delegates the query to the respective nodes; if you have modelled your DB correctly, this should not result in scatter-gather kind of work. So it does follow a client-server architecture, and your assumption is correct. As far as I know, the application should generally be unaware of where your result lies and must not be tied to a specific node, because that would have bigger implications when things like rebalancing occur.
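The rebalancing point can be illustrated with a toy ring. This is a hypothetical sketch: the node names, token values and the `owner` helper are made up, and real clusters use vnodes and the Murmur3 token range. When a node joins, it takes over part of an existing node's range, so an application that hard-coded "this key lives on node-c" would be out of date, whereas the coordinator (and a token-aware driver) works from the current ring.

```python
import bisect
from typing import Dict


def owner(ring: Dict[str, int], token: int) -> str:
    """Primary owner of a token: the first node (in token order) whose token is
    >= the key's token, wrapping around to the lowest token."""
    nodes = sorted(ring, key=ring.get)
    tokens = [ring[n] for n in nodes]
    return nodes[bisect.bisect_left(tokens, token) % len(nodes)]


# Hypothetical 3-node ring with one token each (toy token values).
ring = {"node-a": 0, "node-b": 100, "node-c": 200}
before = owner(ring, 150)            # node-c owns (100, 200]
untouched_before = owner(ring, 50)   # node-b owns (0, 100]

# A hypothetical fourth node joins and takes over part of node-c's range.
ring["node-d"] = 175
after = owner(ring, 150)             # node-d now owns (100, 175]
untouched_after = owner(ring, 50)    # still node-b
```

Only keys in the range handed to the new node change owner; everything else stays put, which is why the cluster can rebalance incrementally while clients keep asking any coordinator.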
So your application should be unaware of where your data lies (I mean which node it is on), but keeping the application in the same region as the Cassandra cluster obviously makes sense. I can't comment much on cloud deployment.

Thanks,
Rajesh

On Sat, Jan 12, 2019 at 8:54 AM amit sehas <cu...@yahoo.com.invalid> wrote:

I am new to Cassandra, and I am wondering how Cassandra applications are deployed in the cloud. Does Cassandra have a client-server architecture, where the application is deployed as a third tier that sends queries to the clients, which then submit them to the Cassandra servers? Or does the application submit the request directly to any of the Cassandra servers, which then decides where the query will be routed, gathers the response, and returns it to the application? Does the application accessing the data get deployed on the same nodes in the cloud as the Cassandra cluster itself, or on separate nodes? Are there any best practices available in this regard?

thanks
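The client-side failover summarized in the most recent message at the top of this thread (resend to a different coordinator when one does not answer in time) can be sketched as follows. `execute_with_failover`, `fake_send`, `AllCoordinatorsFailed` and the `ks.users` table are all hypothetical; real drivers handle this through their own retry and load balancing policies rather than a hand-written loop.

```python
from typing import Callable, Dict, List


class AllCoordinatorsFailed(Exception):
    """Hypothetical error raised when no coordinator answered in time."""


def execute_with_failover(query: str, coordinators: List[str],
                          send: Callable[[str, str], str]) -> str:
    """Send the query to each coordinator in turn, moving on after a timeout.

    A timeout does not prove the coordinator is dead, and the request may already
    have been applied, so resending is only safe for idempotent statements.
    """
    failures: Dict[str, Exception] = {}
    for node in coordinators:
        try:
            return send(node, query)
        except TimeoutError as exc:
            failures[node] = exc  # remember why this coordinator was skipped
    raise AllCoordinatorsFailed(failures)


# Hypothetical transport: node-a never answers, every other node does.
def fake_send(node: str, query: str) -> str:
    if node == "node-a":
        raise TimeoutError(f"{node} did not respond in time")
    return f"{node} answered: {query}"


result = execute_with_failover("SELECT * FROM ks.users WHERE id = 1",
                               ["node-a", "node-b"], fake_send)
```

Here the call times out on `node-a` and returns the answer from `node-b`; if every coordinator times out, the caller gets `AllCoordinatorsFailed` with the per-node errors attached.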