Hi again Yury.

Thanks for the help and the references. I will have a look.

My spout is quite simple. Here is the code:

public void nextTuple() {
  nodeIds[0] = random.nextInt(urlsNum);
  nodeIds[1] = random.nextInt(urlsNum);
  sent++;
  // The third argument of emit() is the messageId for the tuple.
  collector.emit("FirstNodeStream", new Values(nodeIds[0], nodeIds[1]),
      String.valueOf(sent));

  Utils.sleep(1);
}

*nodeIds* is an int[2], *urlsNum* is an int giving the exclusive upper bound
on the ids (nextInt draws values from 0 to urlsNum - 1), and *sent* is an int
counter.
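
For context, the method above sits inside the spout class roughly like the
sketch below. This is only a reconstruction for readability: the class name,
the field names declared in declareOutputFields, and the initialization in
open() are placeholders, not my actual code.

import java.util.Map;
import java.util.Random;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;

public class RandomEdgeSpout extends BaseRichSpout {  // class name is a placeholder
  private SpoutOutputCollector collector;
  private Random random;
  private int[] nodeIds = new int[2];
  private int urlsNum;   // exclusive upper bound on ids, set elsewhere
  private int sent = 0;

  public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
    this.collector = collector;
    this.random = new Random();
  }

  // nextTuple() as shown above goes here.

  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    // Field names are placeholders; only the stream name comes from the snippet above.
    declarer.declareStream("FirstNodeStream", new Fields("srcId", "dstId"));
  }
}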

On Mon, Jan 18, 2016 at 11:54 PM, Yury Ruchin <yuri.ruc...@gmail.com> wrote:

> Yes, I suggest you try to spot the problem by looking at the heap dump of a
> worker that throws the exception. That way you could at least be certain
> about what consumes the workers' memory.
>
> Oracle HotSpot has a number of options controlling GC logging; setting
> them for the worker JVMs may help in troubleshooting. Plumbr's handbook seems
> to be a decent read on that matter: https://plumbr.eu/handbook.
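>
> For example, something along these lines should enable GC logging for the
> worker JVMs (just a sketch: verify that your Storm version supports
> topology.worker.childopts, keep your existing heap settings, and treat the
> log file path as a placeholder):
>
> Config conf = new Config();
> // HotSpot GC logging flags (pre-Java 9 syntax); adjust the log path as needed.
> conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS,
>     "-Xmx6g -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/worker-gc.log");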
>
> Since you are using a custom spout, could you provide its code, at least the
> part that emits tuples?
>
> 2016-01-15 23:57 GMT+03:00 Nikolaos Pavlakis <nikolaspavla...@gmail.com>:
>
>> Hi Yury.
>>
>> 1. I am using Storm 0.9.5
>> 2. It is a BaseRichSpout. Yes, it has acking enabled and I ack each tuple
>> at the end of the "execute" method of the bolt. I see tuples being acked in
>> the Storm UI.
>> 3. Yes, I observe memory usage increasing (which eventually leads to the
>> topology hanging) even in my dummy setup, which does not save anything in
>> memory and merely reproduces the message-passing of my algorithm. I do not
>> get OOM errors when I execute the topology on the cluster, but I do get the
>> most common exception in Storm: *java.lang.RuntimeException:
>> java.lang.NullPointerException at
>> backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)*,
>> some tasks die, and the Storm UI statistics get lost/restarted. I have
>> never profiled a topology while it is running on the cluster, so I am not
>> certain this is what you mean. If I understand correctly, you are
>> suggesting taking a heap dump with VisualVM on some node while the
>> topology is running and analyzing it.
>> 4. I haven't seen any GC logs (not sure how to collect GC logs from the
>> cluster).
>>
>> Thanks again for your help.
>>
>> On Fri, Jan 15, 2016 at 9:57 PM, Yury Ruchin <yuri.ruc...@gmail.com>
>> wrote:
>>
>>> Hi Nick,
>>>
>>> Some questions:
>>>
>>> 1. Well, what version of Storm are you using? :)
>>>
>>> 2. What spout are you using? Is it reliable, i.e. does it emit messages
>>> with message ids so they can be acked/failed by downstream bolts? Do you
>>> have ackers enabled for your topology? If the spout is unreliable or there
>>> are no ackers, then topology.max.spout.pending has no effect, and if your
>>> bolts don't keep up with your spout, you will likely end up with the
>>> overflow buffer growing larger and larger (a minimal config sketch follows
>>> after these questions).
>>>
>>> 3. Not sure if I get it right: after you stopped saving anything in
>>> memory, do you still see memory usage increasing? Have you observed any
>>> OutOfMemoryErrors? If yes, you might want to launch your worker processes
>>> with -XX:+HeapDumpOnOutOfMemoryError. If not, you can take an on-demand
>>> heap dump using e.g. VisualVM, feed it to a memory analyzer such as MAT,
>>> and take a look at what eats up the heap.
>>>
>>> 4. Why do you think it's a memory issue? Have you looked at the GC graphs
>>> shown by e.g. VisualVM? Did you collect any GC logs to see how long GC
>>> takes?
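>>>
>>> To illustrate points 2 and 3, a minimal config sketch (the values are
>>> arbitrary examples, not recommendations, and it assumes your Storm version
>>> supports topology.worker.childopts):
>>>
>>> Config conf = new Config();
>>> // Acking must be enabled for topology.max.spout.pending to throttle the spout.
>>> conf.setNumAckers(1);
>>> // Caps the number of un-acked tuples in flight per spout task.
>>> conf.setMaxSpoutPending(1000);
>>> // Dump the heap when a worker hits an OutOfMemoryError.
>>> conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-XX:+HeapDumpOnOutOfMemoryError");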
>>>
>>> Regards
>>> Yury
>>>
>>> 2016-01-15 20:15 GMT+03:00 Nikolaos Pavlakis <nikolaspavla...@gmail.com>
>>> :
>>>
>>>> Thanks for all the replies so far. I am profiling the topology in local
>>>> mode with VisualVM and I do not see this problem. I still run into the
>>>> problem when the topology is deployed on the cluster, even with
>>>> max.spout.pending = 1.
>>>>
>>>> On Wed, Jan 13, 2016 at 10:38 PM, John Yost <hokiege...@gmail.com>
>>>> wrote:
>>>>
>>>>> +1 for Andrew, I definitely agree that profiling with jvisualvm or a
>>>>> similar tool is something to do if you have not done so already.
>>>>>
>>>>> On Wed, Jan 13, 2016 at 3:30 PM, Andrew Xor <
>>>>> andreas.gramme...@gmail.com> wrote:
>>>>>
>>>>>> Hey,
>>>>>>
>>>>>>  Care to give the versions of Storm and the JVM? Does this happen only
>>>>>> on cluster execution, or also when running the topology in local mode?
>>>>>> Unfortunately, the best way to find out what's really going on is probably
>>>>>> to profile your topology... if you can run the topology locally, this will
>>>>>> make things quite a bit easier, as profiling Storm topologies on a live
>>>>>> cluster can be quite time consuming.
>>>>>>
>>>>>> Regards.
>>>>>>
>>>>>> On Wed, Jan 13, 2016 at 10:06 PM, Nikolaos Pavlakis <
>>>>>> nikolaspavla...@gmail.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am implementing a distributed algorithm for pagerank estimation
>>>>>>> using Storm. I have been having memory problems, so I decided to create
>>>>>>> a dummy implementation that does not explicitly save anything in memory,
>>>>>>> to determine whether the problem lies in my algorithm or my Storm
>>>>>>> structure.
>>>>>>>
>>>>>>> Indeed, while the only thing the dummy implementation does is
>>>>>>> message-passing (a lot of it), the memory of each worker process keeps
>>>>>>> rising until the pipeline is clogged. I do not understand why this
>>>>>>> might be happening.
>>>>>>>
>>>>>>> My cluster has 18 machines (some with 8g, some 16g and some 32g of
>>>>>>> memory). I have set the worker heap size to 6g (-Xmx6g).
>>>>>>>
>>>>>>> My topology is very simple:
>>>>>>> One spout.
>>>>>>> One bolt (with parallelism).
>>>>>>>
>>>>>>> The bolt receives data from the spout (fieldsGrouping) and also from
>>>>>>> other tasks of the same bolt.
>>>>>>>
>>>>>>> My message-passing pattern is based on random walks with a certain
>>>>>>> stopping probability. More specifically:
>>>>>>> The spout generates a tuple.
>>>>>>> One specific task from the bolt receives this tuple.
>>>>>>> Based on a certain probability, this task generates another tuple
>>>>>>> and emits it again to another task of the same bolt.
>>>>>>>
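>>>>>>> Roughly, the bolt's execute() looks like the sketch below (simplified
>>>>>>> and not my actual code; the stream name, the STOP_PROBABILITY constant,
>>>>>>> and the collector/random/urlsNum fields are assumed for illustration):
>>>>>>>
>>>>>>> public void execute(Tuple input) {
>>>>>>>   // With probability (1 - STOP_PROBABILITY), continue the random walk by
>>>>>>>   // emitting a new tuple to another task of this same bolt.
>>>>>>>   if (random.nextDouble() > STOP_PROBABILITY) {
>>>>>>>     collector.emit("WalkStream", new Values(random.nextInt(urlsNum)));
>>>>>>>   }
>>>>>>>   // Each input tuple is acked at the end of execute().
>>>>>>>   collector.ack(input);
>>>>>>> }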
>>>>>>>
>>>>>>> I have been stuck on this problem for quite a while, so it would be
>>>>>>> great if someone could help.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Nick
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
