Hi Yury.

1. I am using Storm 0.9.5.
2. It is a BaseRichSpout. Yes, acking is enabled and I ack each tuple at
the end of the bolt's "execute" method. I see tuples being acked in the
Storm UI.
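
For reference, the ack happens at the end of execute(), roughly like this
(a simplified sketch, not my exact code; the class and field names are
placeholders):

    import java.util.Map;
    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;

    public class WalkBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map conf, TopologyContext context,
                            OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple tuple) {
            // ... process the tuple, possibly emit new tuples anchored to it ...
            collector.ack(tuple); // always the last statement of execute
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("node"));
        }
    }
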
3. Yes, I observe memory usage increasing (which eventually leads to the
topology hanging) even in my dummy setup, which does not save anything in
memory; it merely reproduces the message-passing of my algorithm. I do not
get OOM errors when I execute the topology on the cluster, but I do get
that most common exception in Storm: *java.lang.RuntimeException:
java.lang.NullPointerException at
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)*,
some tasks die, and the Storm UI statistics get lost/restarted. I have
never profiled a topology that is being executed on the cluster, so I am
not sure this is what you mean. If I understand correctly, you are
suggesting that I take a heap dump using VisualVM on some node while the
topology is running and then analyze that heap dump.
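
(In case attaching VisualVM remotely is a problem, I assume I could also
take the dump directly on the worker node with jmap, something like:

    jmap -dump:live,format=b,file=/tmp/worker-heap.hprof <worker-pid>

and then load the .hprof file into MAT or VisualVM.)
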
4. I haven't seen any GC logs (not sure how to collect GC logs from the
cluster).
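
If I understand correctly, I could enable GC logging by adding the
standard HotSpot flags to worker.childopts in storm.yaml, something along
these lines (a sketch; the flags assume a Java 7 HotSpot JVM and the log
path is just an example):

    worker.childopts: "-Xmx6g -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/storm/gc-worker-%ID%.log -XX:+HeapDumpOnOutOfMemoryError"

I believe Storm replaces %ID% with the worker port, so each worker would
get its own log file. Is that the right way to do it on a cluster?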

Thanks again for your help.

On Fri, Jan 15, 2016 at 9:57 PM, Yury Ruchin <yuri.ruc...@gmail.com> wrote:

> Hi Nick,
>
> Some questions:
>
> 1. Well, what version of Storm are you using? :)
>
> 2. What spout are you using? Is this spout reliable, i.e. does it use
> message ids so that messages can be acked/failed by downstream bolts? Do
> you have ackers enabled for your topology? If the spout is unreliable or
> ackers are disabled, then topology.max.spout.pending has no effect, and
> if your bolts don't keep up with your spout, you will likely end up with
> the overflow buffer growing larger and larger.
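>
> For reference, a reliable spout passes a message id as the second
> argument when emitting, e.g. (just a sketch, the id scheme is
> illustrative):
>
>     // inside nextTuple() of the spout; needs java.util.UUID and
>     // backtype.storm.tuple.Values
>     String msgId = UUID.randomUUID().toString();
>     collector.emit(new Values(payload), msgId);
>
> Without that second argument the tuple tree is not tracked, acks are
> no-ops, and max.spout.pending cannot throttle the spout.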
>
> 3. Not sure if I get it right: after you stopped saving anything in
> memory, do you still experience memory usage increasing? Have you
> observed OutOfMemoryErrors? If yes, you might want to launch your worker
> processes with -XX:+HeapDumpOnOutOfMemoryError. If no, you can take an
> on-demand heap dump using e.g. VisualVM and feed it to a memory analyzer,
> such as MAT, then take a look at what eats up the heap.
>
> 4. Why do you think it's a memory issue? Have you looked at the GC graphs
> shown by e.g. VisualVM? Did you collect any GC logs to see how long
> collections take?
>
> Regards
> Yury
>
> 2016-01-15 20:15 GMT+03:00 Nikolaos Pavlakis <nikolaspavla...@gmail.com>:
>
>> Thanks for all the replies so far. I am profiling the topology in local
>> mode with VisualVM and I do not see this problem. I still run into this
>> problem when the topology is deployed on the cluster, even with
>> max.spout.pending = 1.
>>
>> On Wed, Jan 13, 2016 at 10:38 PM, John Yost <hokiege...@gmail.com> wrote:
>>
>>> +1 for Andrew, I definitely agree profiling with jvisualvm (or a
>>> similar tool) is something to do if you have not done so already
>>>
>>> On Wed, Jan 13, 2016 at 3:30 PM, Andrew Xor <andreas.gramme...@gmail.com> wrote:
>>>
>>>> Hey,
>>>>
>>>>  Care to give the versions of Storm/JVM? Does this happen on cluster
>>>> execution only, or also when running the topology in local mode?
>>>> Unfortunately, probably the best way to find out what's really going on
>>>> is to profile your topology... if you can run the topology locally this
>>>> will make things quite a bit easier, as profiling Storm topologies on a
>>>> live cluster can be quite time consuming.
>>>>
>>>> Regards.
>>>>
>>>> On Wed, Jan 13, 2016 at 10:06 PM, Nikolaos Pavlakis <nikolaspavla...@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am implementing a distributed algorithm for PageRank estimation
>>>>> using Storm. I have been having memory problems, so I decided to
>>>>> create a dummy implementation that does not explicitly save anything
>>>>> in memory, to determine whether the problem lies in my algorithm or in
>>>>> my Storm structure.
>>>>>
>>>>> Indeed, while the only thing the dummy implementation does is
>>>>> message-passing (a lot of it), the memory of each worker process keeps
>>>>> rising until the pipeline is clogged. I do not understand why this
>>>>> might be happening.
>>>>>
>>>>> My cluster has 18 machines (some with 8g, some 16g and some 32g of
>>>>> memory). I have set the worker heap size to 6g (-Xmx6g).
>>>>>
>>>>> My topology is very, very simple:
>>>>> One spout.
>>>>> One bolt (with parallelism).
>>>>>
>>>>> The bolt receives data from the spout (fieldsGrouping) and also from
>>>>> other tasks of itself.
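>>>>>
>>>>> The wiring looks roughly like this (a sketch, not my exact code; the
>>>>> component and field names are placeholders):
>>>>>
>>>>>     TopologyBuilder builder = new TopologyBuilder();
>>>>>     builder.setSpout("spout", new WalkSpout(), 1);
>>>>>     builder.setBolt("walker", new WalkBolt(), parallelism)
>>>>>            .fieldsGrouping("spout", new Fields("node"))
>>>>>            .fieldsGrouping("walker", new Fields("node"));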
>>>>>
>>>>> My message-passing pattern is based on random walks with a certain
>>>>> stopping probability. More specifically (see the sketch below):
>>>>> The spout generates a tuple.
>>>>> One specific task of the bolt receives this tuple.
>>>>> Based on a certain probability, this task generates another tuple and
>>>>> emits it to another task of the same bolt.
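>>>>>
>>>>> The relevant part of the bolt's execute() is essentially the following
>>>>> (a sketch; the 0.85 continuation probability and the nextNode() helper
>>>>> that picks the next hop are illustrative):
>>>>>
>>>>>     private final Random random = new Random();
>>>>>
>>>>>     @Override
>>>>>     public void execute(Tuple tuple) {
>>>>>         // continue the walk with some probability, otherwise it ends here
>>>>>         if (random.nextDouble() < 0.85) {
>>>>>             // anchored emit; routed back to another task of this bolt
>>>>>             collector.emit(tuple, new Values(nextNode(tuple)));
>>>>>         }
>>>>>         collector.ack(tuple);
>>>>>     }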
>>>>>
>>>>>
>>>>> I have been stuck on this problem for quite a while, so it would be
>>>>> very helpful if someone could help.
>>>>>
>>>>> Best Regards,
>>>>> Nick
>>>>>
>>>>
>>>>
>>>
>>
>
