Continued performance issues on a small EC2 Spark cluster

2013-11-13 Thread Gary Malouf
Hi, We have an HDFS setup of a namenode and three datanodes, all on EC2 larges. One of our data partitions basically has files that are fed from a few Flume instances rolling *hourly*. This equates to around three 4-8 MB files per hour right now. Our Mesos cluster consists of a Master and the three s
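
To put the hourly roll rate above in perspective, here is a rough back-of-the-envelope sketch of how many small files accumulate over a query window. It uses only the figures quoted in the message (about three files per hour, 4-8 MB each). The function names and the 1/7/21-day windows are illustrative assumptions, not something from the thread. If each small file ends up as at least one input split (commonly the case for Hadoop-style file inputs, but treat it as an assumption here), the file count is also a lower bound on the task count for this one partition.

```python
# Back-of-the-envelope estimate of small-file accumulation.
# Inputs come from the message: ~3 files per hour, each 4-8 MB.
# Function names and the chosen windows are hypothetical.

def files_in_window(files_per_hour: int, days: int) -> int:
    """Number of hourly-rolled files accumulated over `days` days."""
    return files_per_hour * 24 * days


def size_range_mb(n_files: int, min_mb: int, max_mb: int) -> tuple:
    """Lower and upper bound on total data volume in MB."""
    return n_files * min_mb, n_files * max_mb


if __name__ == "__main__":
    for days in (1, 7, 21):
        n = files_in_window(3, days)
        low, high = size_range_mb(n, 4, 8)
        print(f"{days:>2} days: {n} files, roughly {low}-{high} MB in total")
```

The point of the sketch is the ratio: a large number of files for a fairly small amount of data, so per-task scheduling overhead may matter more than raw data volume.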

Re: Continued performance issues on a small EC2 Spark cluster

2013-11-14 Thread Gary Malouf
I bring this up because the performance we are seeing is dreadful. From CPU usage, it appears the issue is the Spark shell's CPU power. We have increased this node from an EC2 medium to an xl; we are seeing slightly better performance, but still not great. My understanding of Spark was that most of

Re: Continued performance issues on a small EC2 Spark cluster

2013-11-15 Thread Gary Malouf
Hi everyone, I want to share what helped us resolve the issue in the short term, and also our longer-term concerns. *Some Background*: Many of our jobs that look at a few weeks of data have task counts around 3500+. We went with a small cluster of 4 EC2 larges for Mesos+Spark in production for now beca

Re: Continued performance issues on a small EC2 Spark cluster

2013-11-15 Thread Michael (Bach) Bui
Hi Gary, What other frameworks are running on your Mesos cluster? If they are all Spark frameworks, another option you may want to consider (in order to improve your cluster utilization) is to let all of them share a single SparkContext. We also experienced degraded performance while running mu