Did you check https://drive.google.com/file/d/0B7tmGAdbfMI2OXl6azYySk5iTGM/edit and http://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation ?

I'm not sure whether the Spark Kafka receiver emits metrics on consumer lag; check this link out: http://community.spiceworks.com/how_to/77610-how-far-behind-is-your-kafka-consumer

You should be able to whip up a script that runs Kafka's ConsumerOffsetChecker periodically and pipes the lag into a metrics backend of your choice. Based on that signal you can then work the dynamic resource allocation magic.
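Something along those lines, for example (an untested sketch -- the ZooKeeper quorum, consumer group, topic, and the Graphite backend are all placeholders, and the column parsing assumes the Kafka 0.8.x ConsumerOffsetChecker output):

    import java.io.PrintWriter
    import java.net.Socket
    import scala.sys.process._

    object KafkaLagReporter {
      // Placeholders -- substitute your own ZK quorum, group, and topic.
      val zk    = "zkhost:2181"
      val group = "my-consumer-group"
      val topic = "my-topic"

      // ConsumerOffsetChecker prints one line per partition:
      //   Group Topic Pid Offset logSize Lag Owner
      def totalLag(): Long = {
        val cmd = Seq("kafka-run-class.sh", "kafka.tools.ConsumerOffsetChecker",
                      "--zookeeper", zk, "--group", group, "--topic", topic)
        cmd.!!.split("\n")
          .filter(_.startsWith(group))          // skip the header line
          .map(_.trim.split("\\s+")(5).toLong)  // 6th column is Lag
          .sum
      }

      // Graphite plaintext protocol: "<path> <value> <epoch-seconds>"
      def push(lag: Long): Unit = {
        val sock = new Socket("graphite-host", 2003)
        val out  = new PrintWriter(sock.getOutputStream, true)
        out.println(s"kafka.$group.$topic.lag $lag ${System.currentTimeMillis / 1000}")
        out.close(); sock.close()
      }

      def main(args: Array[String]): Unit =
        while (true) { push(totalLag()); Thread.sleep(60 * 1000) }
    }

Once the lag is sitting in a metrics backend you can alert on it, or have an external process grow/shrink the cluster while dynamic allocation handles executors within the app itself. And re the custom sink idea in your mail -- that could work too; a rough skeleton follows the quoted message below.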
-----Original Message-----
From: dgoldenberg [mailto:dgoldenberg...@gmail.com]
Sent: Wednesday, May 27, 2015 6:21 PM
To: user@spark.apache.org
Subject: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

Hi,

I'm trying to understand whether there are design patterns for autoscaling a Spark cluster (adding/removing slave machines) based on throughput. Assuming we can throttle the Spark consumers, the Kafka topics we stream from would start growing. What are some ways to generate metrics on the number of new messages and the rate at which they are piling up? This is perhaps more of a Kafka question; I see a pretty sparse javadoc for the Metric interface and not much else...

What are some ways to expand/contract the Spark cluster? Someone has mentioned Mesos...

I see some info on Spark metrics in the Spark monitoring guide. Do we perhaps want to implement a custom sink that would help us autoscale up or down based on the throughput?
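Re the custom sink question above: Spark's metrics Sink trait is private[spark] (at least in the 1.x line), so a custom sink has to be declared inside the org.apache.spark.metrics.sink package, and Spark's MetricsSystem instantiates it reflectively via the three-argument constructor shown here. A bare-bones, untested skeleton -- the class name and the autoscaling logic are made up:

    package org.apache.spark.metrics.sink

    import java.util.Properties
    import java.util.concurrent.{Executors, TimeUnit}
    import com.codahale.metrics.MetricRegistry
    import org.apache.spark.SecurityManager

    class AutoscaleSink(
        val property: Properties,
        val registry: MetricRegistry,
        securityMgr: SecurityManager) extends Sink {

      // "period" comes from the sink's entry in metrics.properties.
      private val period =
        Option(property.getProperty("period")).map(_.toLong).getOrElse(60L)
      private val scheduler = Executors.newSingleThreadScheduledExecutor()

      override def start(): Unit = {
        // Poll the metric registry on a fixed schedule.
        scheduler.scheduleAtFixedRate(new Runnable {
          def run(): Unit = report()
        }, period, period, TimeUnit.SECONDS)
      }

      override def stop(): Unit = scheduler.shutdown()

      override def report(): Unit = {
        // Inspect whatever gauges/meters you care about (the metric names
        // vary by app) and call out to your autoscaler here.
      }
    }

You would wire it up in conf/metrics.properties with something like:

    *.sink.autoscale.class=org.apache.spark.metrics.sink.AutoscaleSink
    *.sink.autoscale.period=60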