As much as you can spare. Assuming a dedicated cluster node and a single
topology, we would allocate at minimum half the machine's memory. If there
is an external in-memory cache, we would have to leave the other half free
for that cache.
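For example, on a 32 GB node that would mean pinning the worker heap to
half the machine in storm.yaml. A sketch (adjust the numbers to your
hardware):

    # storm.yaml -- give each worker JVM a fixed 16 GB heap
    worker.childopts: "-Xms16g -Xmx16g"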

Mind you, if you want to run multiple topologies in the same cluster, this
goes out the window. Each topology spawns its own workers, and every worker
gets those memory parameters, so if you have an 8 GB RAM machine with two
workers each specifying -Xms6G, you have a problem :)

IIRC, the worker options can be overridden on a per-topology basis with
Config.TOPOLOGY_WORKER_CHILDOPTS, but I'm not sure at the moment whether
this adds to the WORKER_CHILDOPTS option or replaces it completely.
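Something like this when submitting, as a sketch (the topology name and
heap sizes are made up; the classes live under the backtype.storm package
on current releases):

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;

    TopologyBuilder builder = new TopologyBuilder();
    // ... register spouts and bolts as usual ...

    Config conf = new Config();
    // Per-topology override of the worker JVM options; verify whether it
    // appends to or replaces worker.childopts on your Storm version.
    conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xms4g -Xmx4g");
    StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());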

Regards,
JG
On Oct 5, 2015 9:07 AM, "Abe Oppenheim" <abe.oppenh...@gmail.com> wrote:

> Hi All,
>
> Any tips for determining the heap size for a node's single JVM?
>
> On Oct 5, 2015, at 5:25 AM, anshu shukla <anshushuk...@gmail.com> wrote:
>
> I was also facing the same issue of balancing the latency/throughput
> tradeoff. Nice discussion here.
>
> Just one query: how can we map -
> 1. the number of workers to the number of cores
> 2. the number of slots on one machine to the number of cores on that
> machine (a storm.yaml sketch follows below)
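> To make (2) concrete: the slots on a machine are just the worker ports
> listed in storm.yaml, so "one slot per core" on a 4-core box would be a
> sketch like this (the ports shown are the conventional defaults):
>
>     # storm.yaml -- one worker slot per core
>     supervisor.slots.ports:
>         - 6700
>         - 6701
>         - 6702
>         - 6703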
>
> On Sun, Oct 4, 2015 at 2:00 AM, Kashyap Mhaisekar <kashya...@gmail.com>
> wrote:
>
>> Thanks guys.
>> So when you say one JVM per node, does that mean one port, say 6700, on
>> each machine, with a large heap assigned to that worker?
>> In that case, it translates into 5 workers (5 machines) with at least 4 GB
>> of heap each, and all bolts spread across these 5 workers?
>>
>> Is there a guideline on how I should arrive at the parallelism hints for
>> the bolts themselves? I mean, for the case where the complete latency at
>> the spout is high but the execute latencies at the bolts are very small...
>>
>> Will jump into the links right away.
>>
>> Thanks
>> Kashyap
>> On Oct 3, 2015 12:00 PM, "Michael Vogiatzis" <michaelvogiat...@gmail.com>
>> wrote:
>>
>>> I agree with Javier: one JVM per node should minimize the number of
>>> messages that need to be serialized.
>>>
>>> For tuning Storm topologies you may find the following links useful:
>>>
>>> https://gist.github.com/mrflip/5958028
>>>
>>> https://wassermelonemann.wordpress.com/2014/01/22/tuning-storm-topologies/
>>> Talk:
>>>
>>> http://demo.ooyala.com/player.html?width=640&height=360&embedCode=Q1eXg5NzpKqUUzBm5WTIb6bXuiWHrRMi&videoPcode=9waHc6zKpbJKt9byfS7l4O4sn7Qn
>>>
>>> Cheers,
>>> Michael
>>> @mvogiatzis <https://twitter.com/mvogiatzis>
>>>
>>>
>>> On Sat, 3 Oct 2015 at 14:04 Javier Gonzalez <jagon...@gmail.com> wrote:
>>>
>>>> I would suggest sticking with a single worker per machine. It makes memory
>>>> allocation easier, and it makes inter-component communication much more
>>>> efficient. Configure the executors with your parallelism hints to take
>>>> advantage of all your available CPU cores.
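>>>> As a rough sketch (MySpout/MyBolt and the counts are made up; conf and
>>>> builder are the usual Config and TopologyBuilder):
>>>>
>>>>     conf.setNumWorkers(5);  // one worker per machine, 5 machines
>>>>     builder.setSpout("spout", new MySpout(), 5);
>>>>     builder.setBolt("bolt", new MyBolt(), 40)  // executors fill the cores
>>>>            .shuffleGrouping("spout");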
>>>>
>>>> Regards,
>>>> JG
>>>>
>>>> On Sat, Oct 3, 2015 at 12:10 AM, Kashyap Mhaisekar <kashya...@gmail.com
>>>> > wrote:
>>>>
>>>>> Hi,
>>>>> I was trying to come up with an approach to evaluate the parallelism
>>>>> needed for a topology.
>>>>>
>>>>> Assuming I have 5 machines, each with 8 cores and 32 GB, and my topology
>>>>> has one spout and 5 bolts.
>>>>>
>>>>> 1. Define one worker port per CPU core to start off (= 8 workers per
>>>>> machine, i.e. 40 workers overall).
>>>>> 2. If each worker spawns one executor per component, that translates to 6
>>>>> executors per worker, which is 40 x 6 = 240 executors.
>>>>> 3. Of these, if the bolt logic is CPU intensive, leave the parallelism
>>>>> hint at 40 (total workers); otherwise, increase the parallelism hint
>>>>> beyond 40 until you hit a number beyond which there is no further visible
>>>>> gain (a code sketch follows below).
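>>>>> In code, that would look something like this (component names made up):
>>>>>
>>>>>     conf.setNumWorkers(40);  // step 1: one worker per core
>>>>>     builder.setBolt("my-bolt", new MyBolt(), 40)  // step 3: start at 40
>>>>>            .shuffleGrouping("spout");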
>>>>>
>>>>> Does this look right?
>>>>>
>>>>> Thanks
>>>>> Kashyap
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Javier González Nicolini
>>>>
>>> --
>>> Michael Vogiatzis
>>> Twitter: @mvogiatzis
>>> http://micvog.com/
>>>
>>
>
>
> --
> Thanks & Regards,
> Anshu Shukla
>
>
