Re: Storm 2.0.0 Local Cluster worker restart

Stig Rohde Døssing Tue, 27 Aug 2019 09:32:07 -0700

Hi Diogo,

Thanks for your thorough explanation. I think you are right, and this is a
bug. We'd be happy to see a PR to fix this.


I think a decent way to handle this would be to add an extra else clause at
https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/LocallyCachedTopologyBlob.java#L146
that simply creates an empty resources directory in the blob extraction
directory, by calling
extractDirFromJar(resourcesJar, ServerConfigUtils.RESOURCES_SUBDIR, extractionDest).
This is just me spitballing, so please feel free to fix it some other way if
you have a better idea.
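
To make the idea concrete, here is a rough standalone sketch of "make sure an
empty resources directory exists in the extraction directory". It is not the
actual Storm code: it uses plain java.nio (Files.createDirectories) instead of
extractDirFromJar, the class and helper names are made up for illustration,
and the "resources" constant is my assumption about the value of
ServerConfigUtils.RESOURCES_SUBDIR.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class EmptyResourcesDirSketch {
    // Assumption: mirrors the value of ServerConfigUtils.RESOURCES_SUBDIR.
    static final String RESOURCES_SUBDIR = "resources";

    // Hypothetical helper: creates <extractionDest>/resources if it is missing.
    static Path ensureResourcesDir(Path extractionDest) {
        Path resources = extractionDest.resolve(RESOURCES_SUBDIR);
        try {
            // Does nothing if the directory already exists.
            Files.createDirectories(resources);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return resources;
    }

    // Hypothetical stand-in for the blob extraction directory.
    static Path newExtractionDest() {
        try {
            return Files.createTempDirectory("blob-extract");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path resources = ensureResourcesDir(newExtractionDest());
        System.out.println(resources + " is a directory: " + Files.isDirectory(resources));
    }
}
```

With a directory like this always present, the existence check on the
extracted resources directory should pass even for topologies that ship no
resources at all.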

On Tue, 27 Aug 2019 at 14:50, Diogo Monteiro <
[email protected]> wrote:

> Hi all,
>
> My name is Diogo and I am a dev for Paddy Power Betfair in Porto,
> Portugal. We have been running Storm 1.x.x in production for a couple of
> years, and the time has come for us to upgrade to 2.0.0. We use
> *LocalCluster* to run topologies on our local machines for manual testing.
>
> So, getting to the point: I was trying to launch a topology that I'm
> developing (in 2.0.0) and noticed that the worker was getting restarted
> every ~30 seconds.
> I placed a breakpoint in the *kill* method of *LocalContainer* (
> https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/daemon/supervisor/LocalContainer.java#L66)
> to try and understand why the worker was getting restarted.
>
> The call stack was:
>
> kill:66, LocalContainer (org.apache.storm.daemon.supervisor)
> killContainerFor:269, Slot (org.apache.storm.daemon.supervisor)
> handleRunning:724, Slot (org.apache.storm.daemon.supervisor)
> stateMachineStep:218, Slot (org.apache.storm.daemon.supervisor)
> run:931, Slot (org.apache.storm.daemon.supervisor)
>
>
> With this I can understand that the worker is killed because a blob has
> changed (
> https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/daemon/supervisor/Slot.java#L724).
> In fact, there's a changing blob in the *dynamicState* at that point.
>
> I checked the *AsyncLocalizer*, which downloads blobs, caches them locally,
> and notifies the Slot state machine of a changing blob.
>
> I noticed this:
>
>    - https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/AsyncLocalizer.java#L339
>    - https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/AsyncLocalizer.java#L265
>    - https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/LocallyCachedTopologyBlob.java#L142
>    - https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/LocallyCachedTopologyBlob.java#L192
>
>
> These tell me that (correct me if I'm wrong):
>
>    - The Supervisor tries to update blobs every 30 seconds.
>    - The topology jar blob requires extraction of the resources directory
>    (either from a jar or directly from a classpath URL). This is done in
>    *fetchUnzipToTemp*, and its existence is checked in *isFullyDownloaded*.
>    - The Slot is notified of a changing blob if:
>       - the remote version is different from the local version (the code
>       has changed),
>       - OR the blob is not fully downloaded (fully downloaded means that
>       the jar exists and the extracted resources directory exists).
>
> Well, I did not have a resources folder under the root of the classpath,
> and that's why the worker was being restarted every ~30 seconds, as the Slot
> was being notified of a changing blob every time *updateBlobs* ran.
> I created a resources folder (with dummy files) under the root of the
> classpath and the problem is now solved.
>
> However, if I understand correctly, the resources folder is only required
> for *multilang*. Our topologies do not use *multilang*, and this does not
> happen in Storm 1.1.3, for instance.
>
> Am I seeing or doing something wrong, or is this expected behaviour?
> I am happy to contribute if this is in fact worth opening an issue for and
> fixing.
>
> Hope this is the right place for these questions, and thanks in advance
> for taking your time to look at this.
>
> Regards,
> Diogo
>
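
For reference, the notification rule described in the quoted message can be
sketched as a small predicate. This is only an illustration of the rule as
Diogo stated it, not the AsyncLocalizer code: apart from the name
isFullyDownloaded, every class, method, and parameter name below is
hypothetical.

```java
public class BlobChangeSketch {
    // As described above: the jar exists AND the extracted resources directory exists.
    static boolean isFullyDownloaded(boolean jarExists, boolean resourcesDirExists) {
        return jarExists && resourcesDirExists;
    }

    // Hypothetical predicate: the Slot is told the blob is changing when the
    // versions differ OR the local copy is not fully downloaded.
    static boolean notifiesSlot(long localVersion, long remoteVersion,
                                boolean jarExists, boolean resourcesDirExists) {
        return localVersion != remoteVersion
            || !isFullyDownloaded(jarExists, resourcesDirExists);
    }

    public static void main(String[] args) {
        // Same version, jar present, but no resources directory: the reported case.
        System.out.println(notifiesSlot(1L, 1L, true, false));
        // Same version, jar and resources directory both present.
        System.out.println(notifiesSlot(1L, 1L, true, true));
    }
}
```

Under this rule a topology without a resources directory looks permanently
"not fully downloaded", so every update pass reports a change even though the
version never moves, which matches the restart loop described above.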
