subject:"FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?"

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-11 Thread Dmitry Goldenberg

Yes, Tathagata, thank you. For #1, the 'need detection', one idea we're entertaining is timestamping the messages coming into the Kafka topics. The consumers would check the interval between the time they get the message and that message's origination timestamp. As Kafka topics start to fill up more
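The timestamp idea in the snippet above could be sketched roughly as follows. This is only an illustration in plain Python, not code from the thread: the class name LagMonitor, the threshold, and the window size are all hypothetical, and it assumes each message carries an origination timestamp in epoch seconds that the consumer can read.

```python
import time
from collections import deque
from statistics import mean


class LagMonitor:
    """Tracks consumer lag: consume time minus message origination time.

    Hypothetical helper for illustration; not part of Spark or Kafka.
    """

    def __init__(self, threshold_seconds, window_size=100):
        self.threshold_seconds = threshold_seconds
        # Keep only the most recent lags so old spikes age out.
        self.lags = deque(maxlen=window_size)

    def record(self, origin_ts, consumed_ts=None):
        """Record one message; timestamps are epoch seconds."""
        if consumed_ts is None:
            consumed_ts = time.time()
        # Clamp at zero to tolerate small clock skew between hosts.
        lag = max(0.0, consumed_ts - origin_ts)
        self.lags.append(lag)
        return lag

    def average_lag(self):
        return mean(self.lags) if self.lags else 0.0

    def needs_more_resources(self):
        """True once the window is full and its mean lag exceeds the threshold."""
        window_full = len(self.lags) == self.lags.maxlen
        return window_full and self.average_lag() > self.threshold_seconds
```

Averaging over a full window, rather than reacting to a single slow message, is one way to avoid triggering a scale-up on a transient spike; whether that is the right trade-off depends on the workload.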

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-11 Thread Tathagata Das

Let me try to add some clarity to the different thought directions that are going on in this thread. 1. HOW TO DETECT THE NEED FOR MORE CLUSTER RESOURCES? If there are no rate limits set up, the most reliable way to detect whether the current Spark cluster is insufficient to handle the data

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-11 Thread Cody Koeninger

Depends on what you're reusing multiple times (if anything). Read http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence On Wed, Jun 10, 2015 at 12:18 AM, Dmitry Goldenberg < dgoldenberg...@gmail.com> wrote: > At which point would I call cache()? I just want the runtime to s

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-11 Thread Dmitry Goldenberg

o:* Evo Eftimov > *Cc:* Cody Koeninger; Andrew Or; Gerard Maas; spark users > *Subject:* Re: FW: Re: Autoscaling Spark cluster based on topic > sizes/rate of growth in Kafka or Spark's metrics? > > > > Evo, > > > > One of the ideas is to shadow the current clus

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-09 Thread Dmitry Goldenberg

At which point would I call cache()? I just want the runtime to spill to disk when necessary without me having to know when the "necessary" is. On Thu, Jun 4, 2015 at 9:42 AM, Cody Koeninger wrote: > direct stream isn't a receiver, it isn't required to cache data anywhere > unless you want it

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-04 Thread Cody Koeninger

direct stream isn't a receiver, it isn't required to cache data anywhere unless you want it to. If you want it, just call cache. On Thu, Jun 4, 2015 at 8:20 AM, Dmitry Goldenberg wrote: > "set the storage policy for the DStream RDDs to MEMORY AND DISK" - it > appears the storage level can be sp

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-04 Thread Dmitry Goldenberg

"set the storage policy for the DStream RDDs to MEMORY AND DISK" - it appears the storage level can be specified in the createStream methods but not createDirectStream... On Thu, May 28, 2015 at 9:05 AM, Evo Eftimov wrote: > You can also try Dynamic Resource Allocation > > > > > https://spark.a

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-03 Thread Dmitry Goldenberg

June 3, 2015 4:46 PM > *To:* Evo Eftimov > *Cc:* Cody Koeninger; Andrew Or; Gerard Maas; spark users > *Subject:* Re: FW: Re: Autoscaling Spark cluster based on topic > sizes/rate of growth in Kafka or Spark's metrics? > > > > Evo, > > > > One of the ideas i

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-03 Thread Dmitry Goldenberg

vo Eftimov > *Cc:* Cody Koeninger; Andrew Or; Gerard Maas; spark users > *Subject:* Re: FW: Re: Autoscaling Spark cluster based on topic > sizes/rate of growth in Kafka or Spark's metrics? > > > > Evo, > > > > One of the ideas is to shadow the current cluster. This

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-03 Thread Dmitry Goldenberg

ldenberg [mailto:dgoldenberg...@gmail.com] > *Sent:* Wednesday, June 3, 2015 4:46 PM > *To:* Evo Eftimov > *Cc:* Cody Koeninger; Andrew Or; Gerard Maas; spark users > *Subject:* Re: FW: Re: Autoscaling Spark cluster based on topic > sizes/rate of growth in Kafka or Spark's metrics?

RE: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-03 Thread Evo Eftimov

more From: Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com] Sent: Wednesday, June 3, 2015 4:46 PM To: Evo Eftimov Cc: Cody Koeninger; Andrew Or; Gerard Maas; spark users Subject: Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-03 Thread Dmitry Goldenberg

er cluster > > > > *From:* Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com] > *Sent:* Wednesday, June 3, 2015 4:14 PM > *To:* Cody Koeninger > *Cc:* Andrew Or; Evo Eftimov; Gerard Maas; spark users > *Subject:* Re: FW: Re: Autoscaling Spark cluster based on topic > sizes/rate

RE: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-03 Thread Evo Eftimov

Maas; spark users Subject: Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics? Would it be possible to implement Spark autoscaling somewhat along these lines? -- 1. If we sense that a new machine is needed, by watching the data lo

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-03 Thread Dmitry Goldenberg

> Until there is free RAM, spark streaming (spark) will NOT resort to disk – >>> and of course resorting to disk from time to time (ie when there is no free >>> RAM ) and taking a performance hit from that, BUT only until there is no >>> free RAM >>> >>> >>

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-05-28 Thread Dmitry Goldenberg

isk from time to time (ie when there is no free >>> RAM ) and taking a performance hit from that, BUT only until there is no >>> free RAM >>> >>> >>> >>> *From:* Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com] >>> *Sent:* Thursday, May

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-05-28 Thread Cody Koeninger

time (ie when there is no free >> RAM ) and taking a performance hit from that, BUT only until there is no >> free RAM >> >> >> >> *From:* Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com] >> *Sent:* Thursday, May 28, 2015 2:34 PM >> *To:*

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-05-28 Thread Dmitry Goldenberg

free RAM >> >> >> >> *From:* Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com] >> *Sent:* Thursday, May 28, 2015 2:34 PM >> *To:* Evo Eftimov >> *Cc:* Gerard Maas; spark users >> *Subject:* Re: FW: Re: Autoscaling Spark cluster based on topic

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-05-28 Thread Andrew Or

y until there is no > free RAM > > > > *From:* Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com] > *Sent:* Thursday, May 28, 2015 2:34 PM > *To:* Evo Eftimov > *Cc:* Gerard Maas; spark users > *Subject:* Re: FW: Re: Autoscaling Spark cluster based on topic > sizes/r

RE: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-05-28 Thread Evo Eftimov

sizes/rate of growth in Kafka or Spark's metrics? Evo, good points. On the dynamic resource allocation, I'm surmising this only works within a particular cluster setup. So it improves the usage of current cluster resources but it doesn't make the cluster itself elastic. At l

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-05-28 Thread Dmitry Goldenberg

Evo, good points. On the dynamic resource allocation, I'm surmising this only works within a particular cluster setup. So it improves the usage of current cluster resources but it doesn't make the cluster itself elastic. At least, that's my understanding. Memory + disk would be good and hopefull

FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-05-28 Thread Evo Eftimov

You can also try Dynamic Resource Allocation https://spark.apache.org/docs/1.3.1/job-scheduling.html#dynamic-resource-allocation Also re the Feedback Loop for automatic message consumption rate adjustment – there is a “dumb” solution option – simply set the storage policy for the DStrea
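For reference, the Dynamic Resource Allocation page linked above is driven by a handful of Spark configuration properties. The sketch below collects them into spark-submit arguments. The property names are the ones documented by Spark; the helper functions and the executor bounds are hypothetical, and the right values depend on the cluster. Note that this feature resizes an application's executor count within an existing cluster and does not add machines.

```python
def dynamic_allocation_conf(min_executors, max_executors):
    """Build the Spark properties that enable dynamic allocation.

    Hypothetical helper; only the property names come from Spark's docs.
    """
    if min_executors < 0 or max_executors < min_executors:
        raise ValueError("need 0 <= min_executors <= max_executors")
    return {
        "spark.dynamicAllocation.enabled": "true",
        # The external shuffle service must be on so executors can be
        # removed without losing the shuffle files they wrote.
        "spark.shuffle.service.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


def to_submit_args(conf):
    """Render a property dict as repeated `--conf key=value` arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args
```

For example, to_submit_args(dynamic_allocation_conf(2, 20)) yields a list that can be appended to a spark-submit command line.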


21 matches
