GCE seems to have better options. Has anyone had any experience with GCE?

On Sat, Dec 3, 2016 at 1:16 AM, Manish Malhotra <manish.malhotra.w...@gmail.com> wrote:
> Thanks for sharing the numbers as well!
>
> Nowadays even the network can have very high throughput and might
> outperform the disk, but as Sean mentioned, data on the network has other
> dependencies such as network hops; for example, traffic across racks can
> have a switch in between.
>
> But yes, people are discussing Mesos + high-performance networks and are
> not worried about colocation for various use cases.
>
> AWS ephemeral storage is not good for a reliable storage file system; EBS
> is the expensive alternative :)
>
> On Sat, Dec 3, 2016 at 1:12 AM, kant kodali <kanth...@gmail.com> wrote:
>
>> Thanks Sean! Just for the record, I am currently seeing 95 MB/s RX
>> (receive throughput) on my Spark worker machine when I run `sudo iftop -B`.
>>
>> The problem with instance stores on AWS is that they are all ephemeral, so
>> placing Cassandra on top doesn't make a lot of sense. So, in short, AWS
>> doesn't seem to be the right place for colocating in theory. I would still
>> give you the benefit of the doubt and colocate :) but the numbers are not
>> reflecting significant margins in terms of performance gains for AWS.
>>
>>
>> On Sat, Dec 3, 2016 at 12:56 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> I'm sure he meant that this is a downside to not colocating.
>>> You are asking the right question. While networking is traditionally
>>> much slower than disk, that changes a bit in the cloud, where attached
>>> storage is remote too.
>>> The disk throughput here is mostly achievable in normal workloads.
>>> However, I think you'll find it's going to be much harder to get 1 Gbps out
>>> of network transfers. That's just the speed of the local interface, and of
>>> course the transfer speed depends on hops across the network beyond that.
>>> Network latency is going to be higher than disk too, though that's not as
>>> much of an issue in this context.
>>>
>>> On Sat, Dec 3, 2016 at 8:42 AM kant kodali <kanth...@gmail.com> wrote:
>>>
>>>> Wait, how is that a benefit? Isn't that a bad thing if you are saying
>>>> colocating leads to more latency and overall execution time is longer?
>>>>
>>>> On Sat, Dec 3, 2016 at 12:34 AM, vincent gromakowski <
>>>> vincent.gromakow...@gmail.com> wrote:
>>>>
>>>> You get more latency on reads so overall execution time is longer
>>>>
>>>> On Dec 3, 2016 7:39 AM, "kant kodali" <kanth...@gmail.com> wrote:
>>>>
>>>>
>>>> I wonder what benefits I really get if I colocate my Spark worker
>>>> process and Cassandra server process on each node?
>>>>
>>>> I understand the concept of moving compute towards the data instead of
>>>> moving data towards computation, but it sounds more like one is trying to
>>>> optimize for network latency.
>>>>
>>>> The majority of my nodes (m4.xlarge) have 1 Gbps = 125 MB/s (megabytes per
>>>> second) network throughput,
>>>>
>>>> and the disk throughput for m4.xlarge is 93.75 MB/s (link below):
>>>>
>>>> http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSOptimized.html
>>>>
>>>> So in this case I don't see how colocation can help, even if there is a
>>>> one-to-one mapping from a Spark worker node to a colocated Cassandra node
>>>> where, say, we are doing a table scan of a billion rows?
>>>>
>>>> Thanks!
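
For anyone following the arithmetic in this thread, here is a rough Python sketch of the unit conversion and the resulting scan times. The three throughput figures (125 MB/s nominal network, 93.75 MB/s disk, 95 MB/s observed RX) come from the messages above. The row size of 100 bytes is purely an assumption for illustration, and the helper names are made up. It only compares raw throughput ceilings and ignores latency, network hops, and contention, which Sean points out matter in practice.

```python
def mbit_to_mbyte_per_s(megabits_per_second: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return megabits_per_second / 8.0


def scan_seconds(data_mb: float, throughput_mb_per_s: float) -> float:
    """Time to move data_mb megabytes at a sustained throughput, in seconds."""
    return data_mb / throughput_mb_per_s


# Figures quoted in the thread.
network_nominal = mbit_to_mbyte_per_s(1000)  # 1 Gbps -> 125.0 MB/s
disk_nominal = mbit_to_mbyte_per_s(750)      # 750 Mbps -> 93.75 MB/s
network_observed = 95.0                      # RX reported by `sudo iftop -B`

# ASSUMPTION: 1 billion rows at 100 bytes per row (not stated in the thread).
rows = 1_000_000_000
bytes_per_row = 100
data_mb = rows * bytes_per_row / 1_000_000   # decimal megabytes

for label, rate in [
    ("network (nominal)", network_nominal),
    ("disk (nominal)", disk_nominal),
    ("network (observed)", network_observed),
]:
    minutes = scan_seconds(data_mb, rate) / 60
    print(f"{label:20s} {rate:7.2f} MB/s  ~{minutes:.1f} min per node")
```

Under these assumptions the three paths land within the same order of magnitude, which is the point kant raises: on this instance type the raw numbers alone do not show a large margin for colocation.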