Yes, Tathagata, thank you.
For #1, the 'need detection', one idea we're entertaining is timestamping
the messages coming into the Kafka topics. The consumers would check the
interval between the time they receive a message and that message's
origination timestamp. As the Kafka topics start to fill up, that interval
(the consumer lag) grows, which would signal that the cluster is falling
behind.
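A rough sketch of that detection idea (the function names and the threshold are made up for illustration, not from the thread): compare each message's origination timestamp against consumption time and flag the cluster when the average lag crosses a threshold.

```python
import time

# Hypothetical threshold: average origination-to-consumption lag (seconds)
# beyond which we consider the cluster under-provisioned.
LAG_THRESHOLD_SECS = 30.0

def average_lag_secs(origination_timestamps, now=None):
    """Mean delay between when messages entered Kafka and consumption time."""
    now = time.time() if now is None else now
    return sum(now - t for t in origination_timestamps) / len(origination_timestamps)

def needs_more_resources(origination_timestamps, now=None):
    return average_lag_secs(origination_timestamps, now) > LAG_THRESHOLD_SECS

# Simulated batch whose messages originated 45-60 seconds ago:
now = 1_000_000.0
batch = [now - 45, now - 50, now - 60]
print(needs_more_resources(batch, now))  # True: consumers are falling behind
```

The real version would pull the origination timestamp out of each Kafka message payload; the point is only that lag, not topic size, is the directly observable signal.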
Let me try to add some clarity on the different thought directions going on
in this thread.
1. HOW TO DETECT THE NEED FOR MORE CLUSTER RESOURCES?
If there are no rate limits set up, the most reliable way to detect whether
the current Spark cluster is insufficient to handle the data rate is to
watch whether the batch processing times and scheduling delays keep growing.
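One concrete reading of "keep growing" (a sketch with made-up numbers; the batch interval, window size, and hand-fed delays are assumptions — a real version would pull the scheduling delay from Spark's streaming listener metrics):

```python
from collections import deque

BATCH_INTERVAL_MS = 2000   # assumed streaming batch interval
WINDOW = 5                 # how many consecutive batches to look at

class DelayMonitor:
    """Flags under-provisioning when scheduling delay stays above the batch interval."""
    def __init__(self):
        self.delays = deque(maxlen=WINDOW)

    def record(self, scheduling_delay_ms):
        self.delays.append(scheduling_delay_ms)

    def under_provisioned(self):
        # Require a full window of samples, all exceeding the batch interval,
        # so a single slow batch doesn't trigger scaling.
        return (len(self.delays) == WINDOW
                and all(d > BATCH_INTERVAL_MS for d in self.delays))

m = DelayMonitor()
for delay in [500, 2500, 3000, 3500, 4000, 4500]:
    m.record(delay)
print(m.under_provisioned())  # True: the last five batches all lagged
```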
Depends on what you're reusing multiple times (if anything).
Read
http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence
On Wed, Jun 10, 2015 at 12:18 AM, Dmitry Goldenberg <
dgoldenberg...@gmail.com> wrote:
> At which point would I call cache()? I just want the runtime to spill to
> disk when necessary without me having to know when the "necessary" is.
At which point would I call cache()? I just want the runtime to spill to
disk when necessary without me having to know when the "necessary" is.
On Thu, Jun 4, 2015 at 9:42 AM, Cody Koeninger wrote:
> direct stream isn't a receiver, it isn't required to cache data anywhere
> unless you want it to.
> If you want it, just call cache.
direct stream isn't a receiver, it isn't required to cache data anywhere
unless you want it to.
If you want it, just call cache.
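To illustrate Cody's point that cache() only pays off when the data is reused (a toy model in plain Python — not Spark's actual RDD machinery, just the lazy-caching behavior it exhibits):

```python
# Toy "RDD": without cache(), the lineage is recomputed on every action;
# with cache(), the first action materializes the data and later ones reuse it.
class ToyRDD:
    def __init__(self, compute):
        self._compute = compute      # lineage: how to (re)build the data
        self._cached = None
        self._should_cache = False
        self.computations = 0        # how many times the lineage actually ran

    def cache(self):
        self._should_cache = True    # lazy, like Spark: nothing computed yet
        return self

    def collect(self):
        if self._cached is not None:
            return self._cached
        self.computations += 1
        data = self._compute()
        if self._should_cache:
            self._cached = data
        return data

expensive = lambda: [x * x for x in range(5)]

uncached = ToyRDD(expensive)
uncached.collect(); uncached.collect()
print(uncached.computations)  # 2: lineage ran once per action

cached = ToyRDD(expensive).cache()
cached.collect(); cached.collect()
print(cached.computations)    # 1: second action served from the cache
```

If the stream's RDDs are only consumed once per batch, caching just adds memory pressure, which is why the direct stream doesn't do it by default.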
On Thu, Jun 4, 2015 at 8:20 AM, Dmitry Goldenberg
wrote:
> "set the storage policy for the DStream RDDs to MEMORY AND DISK" - it
> appears the storage level can be specified in the createStream methods but
> not createDirectStream...
"set the storage policy for the DStream RDDs to MEMORY AND DISK" - it
appears the storage level can be specified in the createStream methods but
not createDirectStream...
On Thu, May 28, 2015 at 9:05 AM, Evo Eftimov wrote:
> You can also try Dynamic Resource Allocation
>
> https://spark.apache.org/docs/1.3.1/job-scheduling.html#dynamic-resource-allocation
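For reference, the knobs behind that link, as a spark-defaults.conf sketch (the executor counts are illustrative values, not recommendations; note dynamic allocation also requires the external shuffle service):

```
spark.dynamicAllocation.enabled        true
spark.dynamicAllocation.minExecutors   2
spark.dynamicAllocation.maxExecutors   20
spark.shuffle.service.enabled          true
```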
> Evo,
>
> One of the ideas is to shadow the current cluster. This
From: Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com]
Sent: Wednesday, June 3, 2015 4:46 PM
To: Evo Eftimov
Cc: Cody Koeninger; Andrew Or; Gerard Maas; spark users
Subject: Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of
growth in Kafka or Spark's metrics?
> *From:* Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com]
> *Sent:* Wednesday, June 3, 2015 4:14 PM
> *To:* Cody Koeninger
> *Cc:* Andrew Or; Evo Eftimov; Gerard Maas; spark users
> *Subject:* Re: FW: Re: Autoscaling Spark cluster based on topic
> sizes/rate of growth in Kafka or Spark's metrics?
Would it be possible to implement Spark autoscaling somewhat along these lines?
--
1. If we sense that a new machine is needed, by watching the data load,
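A sketch of how that first step might plug into an autoscaling loop (everything here is hypothetical: the provisioning callback, the growth threshold, and the cooldown are illustrative stand-ins, not a real cloud or cluster-manager API):

```python
# Hypothetical autoscaler: watches a backlog metric (e.g. Kafka consumer lag)
# and asks a provisioner for a new node after sustained backlog growth.
class Autoscaler:
    def __init__(self, provision_node, grow_threshold=3):
        self.provision_node = provision_node  # callback that adds a machine
        self.grow_threshold = grow_threshold  # consecutive growth ticks needed
        self._last_backlog = None
        self._growth_ticks = 0

    def observe(self, backlog):
        """Feed one backlog sample; provisions a node on sustained growth."""
        if self._last_backlog is not None and backlog > self._last_backlog:
            self._growth_ticks += 1
        else:
            self._growth_ticks = 0
        self._last_backlog = backlog
        if self._growth_ticks >= self.grow_threshold:
            self.provision_node()
            self._growth_ticks = 0  # cool down after scaling

added = []
scaler = Autoscaler(lambda: added.append("node"))
for backlog in [100, 150, 220, 300, 310]:
    scaler.observe(backlog)
print(len(added))  # 1: one node provisioned after three consecutive growth ticks
```

The requirement for several consecutive growth ticks is what keeps a single spiky batch from provisioning a machine.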
>>> Until there is free RAM, spark streaming (spark) will NOT resort to disk –
>>> and of course resorting to disk from time to time (ie when there is no free
>>> RAM) and taking a performance hit from that, BUT only until there is no
>>> free RAM
> *From:* Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com]
> *Sent:* Thursday, May 28, 2015 2:34 PM
> *To:* Evo Eftimov
> *Cc:* Gerard Maas; spark users
> *Subject:* Re: FW: Re: Autoscaling Spark cluster based on topic
> sizes/rate of growth in Kafka or Spark's metrics?
Evo, good points.
On the dynamic resource allocation, I'm surmising this only works within a
particular cluster setup. So it improves the usage of current cluster
resources but it doesn't make the cluster itself elastic. At least, that's
my understanding.
Memory + disk would be good and hopefully
You can also try Dynamic Resource Allocation
https://spark.apache.org/docs/1.3.1/job-scheduling.html#dynamic-resource-allocation
Also re the Feedback Loop for automatic message consumption rate adjustment –
there is a “dumb” solution option – simply set the storage policy for the
DStream RDDs to MEMORY AND DISK
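The feedback-loop half of Evo's suggestion could take the shape of a simple multiplicative controller (the gains and floor here are made-up illustrations; Spark's later built-in backpressure uses a PID rate estimator, this only sketches the idea):

```python
# Feedback loop for the per-batch ingest rate: back off multiplicatively
# when a batch takes longer than the batch interval, probe upward otherwise.
def next_rate(current_rate, batch_interval_s, processing_time_s,
              down=0.7, up=1.1, floor=100.0):
    """Return the ingest rate (msgs/batch) to use for the next batch."""
    if processing_time_s > batch_interval_s:   # falling behind: cut the rate
        return max(floor, current_rate * down)
    return current_rate * up                   # keeping up: ramp back up

rate = 10_000.0
rate = next_rate(rate, 2.0, 3.5)   # overloaded batch
print(int(rate))  # 7000
rate = next_rate(rate, 2.0, 1.0)   # healthy batch
print(int(rate))  # 7700
```

Rate limiting like this and MEMORY_AND_DISK storage are complementary: the first keeps batches small enough to finish on time, the second keeps a temporary overload from causing out-of-memory failures.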