Wait, how is that a benefit? Isn't it a bad thing if colocating leads to more read latency and a longer overall execution time?

On Sat, Dec 3, 2016 at 12:34 AM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:

> You get more latency on reads so overall execution time is longer
>
> On Dec 3, 2016 7:39 AM, "kant kodali" <kanth...@gmail.com> wrote:
>
>> I wonder what benefits I really get if I colocate my Spark worker
>> process and Cassandra server process on each node?
>>
>> I understand the concept of moving compute towards the data instead of
>> moving data towards computation, but it sounds more like one is trying to
>> optimize for network latency.
>>
>> The majority of my nodes (m4.xlarge) have 1 Gbps = 125 MB/s (megabytes per
>> second) network throughput,
>>
>> and the disk throughput for m4.xlarge is 93.75 MB/s (link below):
>>
>> http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSOptimized.html
>>
>> So in this case I don't see how colocation can help, even if there is a
>> one-to-one mapping from each Spark worker node to a colocated Cassandra
>> node, where, say, we are doing a table scan of a billion rows?
>>
>> Thanks!
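
The throughput comparison in the quoted message can be checked with a rough back-of-envelope calculation. This is only a sketch: the 1 Gbps and 93.75 MB/s figures and the billion-row scan come from the thread, while `ROW_BYTES` (100 bytes per row) is a made-up value for illustration. It looks at a single node, treats both figures as sustained throughput, and ignores latency, serialization, and CPU contention.

```python
# Back-of-envelope comparison of the two throughput figures from the thread.
# Single node only; ignores latency, serialization and CPU contention.

NETWORK_GBPS = 1.0        # from the thread: m4.xlarge network throughput
DISK_MB_PER_S = 93.75     # from the thread: m4.xlarge EBS throughput
ROWS = 1_000_000_000      # from the thread: table scan of a billion rows
ROW_BYTES = 100           # ASSUMPTION: hypothetical average row size


def gbps_to_mb_per_s(gbps):
    """Convert gigabits per second to megabytes per second (decimal units)."""
    return gbps * 1000 / 8


def scan_seconds(total_bytes, mb_per_s):
    """Time to move total_bytes at a sustained rate of mb_per_s."""
    return total_bytes / (mb_per_s * 1_000_000)


network_mb_per_s = gbps_to_mb_per_s(NETWORK_GBPS)
total_bytes = ROWS * ROW_BYTES

disk_s = scan_seconds(total_bytes, DISK_MB_PER_S)
network_s = scan_seconds(total_bytes, network_mb_per_s)

# The slower of the two stages bounds the scan.
bottleneck = "disk" if disk_s >= network_s else "network"

print(f"network: {network_mb_per_s:.2f} MB/s, {network_s:.0f} s")
print(f"disk:    {DISK_MB_PER_S:.2f} MB/s, {disk_s:.0f} s")
print(f"bottleneck: {bottleneck}")
```

Under these assumptions the disk is the slower stage, which is the point the original question raises: with these particular numbers, the network link is not the limiting throughput for a full scan.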