Hi,
We have an HDFS setup of a NameNode and three DataNodes, all on EC2
larges. One of our data partitions basically has files fed from a few
Flume instances rolling *hourly*. This equates to around three 4-8 MB
files per hour right now.
Our Mesos cluster consists of a Master and the three s
I bring this up because the performance we are seeing is dreadful. From
CPU usage, it appears the issue is CPU power on the node running the
Spark shell. We have upgraded this node from an EC2 medium to an xl and
are seeing slightly better performance, but still not great.
My understanding of Spark was that most of
Hi everyone,
I want to share what helped us resolve the issue in the short term, and
also our longer-term concerns.
*Some Background*:
Many of our jobs that look at a few weeks of data have task counts around
3500+. We went with a small cluster of 4 EC2 larges for Mesos+Spark in
production for now beca
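As a back-of-envelope sanity check on those task counts: each 4-8 MB file is well under the default HDFS block size, so Spark typically creates one partition (and thus one task) per file. Assuming roughly three files per hour from the Flume rolling described earlier (an assumption; the exact rate will vary), the file count alone accounts for task counts in this range:

```python
# Back-of-envelope: one Spark task per HDFS file, since each 4-8 MB
# file maps to a single partition (well under the HDFS block size).
files_per_hour = 3           # assumed rate from the hourly Flume rolls
hours_per_week = 24 * 7

def tasks_for(weeks):
    """Estimated task count for a job scanning `weeks` of data."""
    return files_per_hour * hours_per_week * weeks

for weeks in (2, 4, 7):
    print(weeks, "weeks ->", tasks_for(weeks), "tasks")
```

A job reading several weeks of such data lands in the low thousands of tasks, consistent with the 3500+ figure above; consolidating the small files (or coalescing partitions after the read) is the usual way to bring that down.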
Hi Gary,
What are other frameworks running on your Mesos cluster?
If they are all Spark frameworks, another option you may want to consider (in
order to improve your cluster utilization) is to let all of them share a single
SparkContext.
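The pattern being suggested is one expensive shared context with jobs submitted against it from multiple threads, rather than one context per job. A minimal stdlib sketch of that pattern (the `SharedContext` class here is a stand-in, not Spark API; in real Spark the shared object would be the `SparkContext`, typically with `spark.scheduler.mode=FAIR` so concurrent jobs get a fair share of executors):

```python
from concurrent.futures import ThreadPoolExecutor

class SharedContext:
    """Stand-in for a SparkContext: costly to create, so every job
    reuses the single instance instead of spinning up its own."""
    def run_job(self, data):
        # In Spark this would be an action such as
        # sc.parallelize(data).sum() submitted from this thread.
        return sum(data)

ctx = SharedContext()  # one context for the whole application

with ThreadPoolExecutor(max_workers=3) as pool:
    # Several jobs submitted concurrently against the same context.
    futures = [pool.submit(ctx.run_job, range(n)) for n in (10, 100, 1000)]
    results = [f.result() for f in futures]

print(results)
```

The win is utilization: idle gaps in one job's schedule can be filled by another job's tasks, instead of each framework holding resources it isn't using.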
We also experienced degraded performance while running mu