Hmm, GCE pretty much seems to follow the same model as AWS.

On Sat, Dec 3, 2016 at 1:22 AM, kant kodali <kanth...@gmail.com> wrote:
> GCE seems to have better options. Has anyone had any experience with GCE?
>
> On Sat, Dec 3, 2016 at 1:16 AM, Manish Malhotra <
> manish.malhotra.w...@gmail.com> wrote:
>
>> Thanks for sharing the numbers as well!
>>
>> Nowadays even the network can have very high throughput and might
>> outperform the disk, but as Sean mentioned, data over the network has
>> other dependencies such as network hops: if it's across racks, there
>> can be switches in between.
>>
>> But yes, people are discussing Mesos plus high-performance networks
>> and not worrying about colocation for various use cases.
>>
>> AWS ephemeral storage is not good for a reliable storage file system,
>> and EBS is the expensive alternative :)
>>
>> On Sat, Dec 3, 2016 at 1:12 AM, kant kodali <kanth...@gmail.com> wrote:
>>
>>> Thanks Sean! Just for the record, I am currently seeing 95 MB/s RX
>>> (receive throughput) on my Spark worker machine when I run
>>> `sudo iftop -B`.
>>>
>>> The problem with instance store on AWS is that it is all ephemeral, so
>>> placing Cassandra on top of it doesn't make a lot of sense. So, in
>>> short, AWS doesn't seem to be the right place for colocating, in
>>> theory. I would still give you the benefit of the doubt and colocate :)
>>> but the numbers just don't show significant performance gains on AWS.
>>>
>>>
>>> On Sat, Dec 3, 2016 at 12:56 AM, Sean Owen <so...@cloudera.com> wrote:
>>>
>>>> I'm sure he meant that this is a downside of not colocating.
>>>> You are asking the right question. While networking is traditionally
>>>> much slower than disk, that changes a bit in the cloud, where attached
>>>> storage is remote too.
>>>> The disk throughput here is mostly achievable in normal workloads.
>>>> However, I think you'll find it's going to be much harder to get
>>>> 1 Gbps out of network transfers. That's just the speed of the local
>>>> interface, and of course the transfer speed depends on hops across
>>>> the network beyond that.
>>>> Network latency is going to be higher than disk too, though that's
>>>> not as much of an issue in this context.
>>>>
>>>> On Sat, Dec 3, 2016 at 8:42 AM kant kodali <kanth...@gmail.com> wrote:
>>>>
>>>>> Wait, how is that a benefit? Isn't that a bad thing if you are saying
>>>>> colocating leads to more latency and a longer overall execution time?
>>>>>
>>>>> On Sat, Dec 3, 2016 at 12:34 AM, vincent gromakowski <
>>>>> vincent.gromakow...@gmail.com> wrote:
>>>>>
>>>>> You get more latency on reads, so overall execution time is longer.
>>>>>
>>>>> On 3 Dec 2016 7:39 AM, "kant kodali" <kanth...@gmail.com> wrote:
>>>>>
>>>>>
>>>>> I wonder what benefits I really get if I colocate my Spark worker
>>>>> process and Cassandra server process on each node.
>>>>>
>>>>> I understand the concept of moving compute towards the data instead
>>>>> of moving data towards the computation, but it sounds more like one
>>>>> is trying to optimize for network latency.
>>>>>
>>>>> The majority of my nodes (m4.xlarge) have 1 Gbps = 125 MB/s
>>>>> (megabytes per second) of network throughput,
>>>>>
>>>>> and the (EBS) disk throughput for m4.xlarge is 93.75 MB/s (link below):
>>>>>
>>>>> http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSOptimized.html
>>>>>
>>>>> So in this case I don't see how colocation can help, even if there is
>>>>> a one-to-one mapping from each Spark worker node to a colocated
>>>>> Cassandra node, when, say, we are doing a table scan of a billion rows.
>>>>>
>>>>> Thanks!
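For a rough check of the arithmetic in the original question, here is a minimal Python sketch that treats a scan's sustained throughput as the minimum over the stages the data passes through. It assumes the 93.75 MB/s EBS figure is the only disk limit and ignores CPU, Cassandra read overhead, and latency. The 100-byte row size and the function names are hypothetical, chosen only for illustration.

```python
# Back-of-envelope model: a scan's sustained throughput is capped by the
# slowest stage on the data path. All figures are MB/s (decimal megabytes).

GBPS_TO_MB_S = 1000 / 8  # 1 Gbps = 1000 Mbit/s = 125 MB/s


def scan_throughput_mb_s(disk_mb_s, network_mb_s, colocated):
    """Bottleneck throughput for one Spark worker reading from Cassandra.

    Colocated: data goes EBS -> Cassandra -> local Spark worker, so only the
    disk path limits it (hypothetical simplification). Remote: the data also
    has to cross the instance's network link to reach the worker.
    """
    stages = [disk_mb_s] if colocated else [disk_mb_s, network_mb_s]
    return min(stages)


def scan_seconds(rows, bytes_per_row, throughput_mb_s):
    """Seconds to stream rows * bytes_per_row bytes at the given rate."""
    total_mb = rows * bytes_per_row / 1e6
    return total_mb / throughput_mb_s


DISK_MB_S = 93.75                        # m4.xlarge EBS figure from the thread
NET_LINE_RATE_MB_S = 1 * GBPS_TO_MB_S    # 125 MB/s nominal interface speed
NET_OBSERVED_MB_S = 95.0                 # RX reported via `sudo iftop -B`

ROWS = 1_000_000_000                     # "a billion rows"
BYTES_PER_ROW = 100                      # hypothetical average row size

for label, net in [("line rate", NET_LINE_RATE_MB_S),
                   ("observed", NET_OBSERVED_MB_S)]:
    remote = scan_throughput_mb_s(DISK_MB_S, net, colocated=False)
    local = scan_throughput_mb_s(DISK_MB_S, net, colocated=True)
    print(f"{label}: remote {remote} MB/s, colocated {local} MB/s, "
          f"scan ~{scan_seconds(ROWS, BYTES_PER_ROW, remote):.0f} s")
```

Under these assumptions both placements hit the same disk ceiling, which is the point made in the original question. In this model colocation only changes the bottleneck when the achievable network rate drops below the disk rate, for example when traffic crosses racks and switches as Manish describes.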