Hi Thomas! If you believe it's a bug, please submit it to https://issues.apache.org/jira/browse/STORM, so hopefully someone will be able to have a look.
On Mon, 16 Nov 2020 at 15:16, Thomas L. Redman <tomred...@mchsi.com> wrote:

> I think somebody on the Dev team needs to look into this. The topology I
> wrote is on GitHub and should reproduce the problem consistently. This
> feature seems broken.
>
> On Nov 16, 2020, at 4:21 AM, Michael Giroux <michael_a_gir...@yahoo.com> wrote:
>
> Hello Thomas,
>
> Thank you very much for the response. Disabling the feature allowed me to
> move from 1.2.3 => 2.2.0.
>
> I understand the intent of the feature; however, in my case the end result
> was one node getting all the load and eventual Netty heap space exceptions.
> Perhaps I should look at that... or perhaps I'll just leave the feature
> disabled.
>
> Again - thanks for the reply - VERY helpful.
>
> On Saturday, November 14, 2020, 01:05:03 PM EST, Thomas L. Redman <tomred...@mchsi.com> wrote:
>
> I have seen this same thing. I sent a query on this list and, after some
> time, got a response. The issue is reportedly the result of a new feature.
> I would assume this feature is CLEARLY broken, as I had built a test
> topology that was clearly compute-bound, not IO-bound, and there were
> adjustments to change the ratio of compute to IO. I have not tested this
> fix; I am too close to release, so I just rolled back to version 1.2.3.
>
> From version 2.1.0 forward, no matter how I changed this compute/net IO
> ratio, tuples were not distributed across nodes. Now, I could only
> reproduce this with anchored (acked) tuples. But if you anchored tuples,
> you could never span more than one node, which defeats the purpose of
> using Storm. Following is that email from Kishor Patil identifying the
> issue (and a way to disable that feature!!!):
>
> *From: *Kishor Patil <kishorvpa...@apache.org>
> *Subject: *Re: Significant Bug
> *Date: *October 29, 2020 at 8:07:18 AM CDT
> *To: *<d...@storm.apache.org>
> *Reply-To: *d...@storm.apache.org
>
> Hello Thomas,
>
> Apologies for the delay in responding here. I tested the topology code
> provided in the storm-issue repo.
>
> *only one machine gets pegged*: Although it appears so, this is not a bug.
> This is related to Locality Awareness. Please refer to
> https://github.com/apache/storm/blob/master/docs/LocalityAwareness.md
> It appears the spout-to-bolt ratio is 200, so if there are enough bolts on
> a single node to handle the events generated by the spout, it won't send
> events out to another node unless it runs out of capacity on that node. If
> you do not like this and want to distribute events evenly, you can try
> disabling this feature. You can turn off LoadAwareShuffleGrouping by
> setting topology.disable.loadaware.messaging to true.
>
> -Kishor
>
> On 2020/10/28 15:21:54, "Thomas L. Redman" <tomred...@mchsi.com> wrote:
>
> What's the word on this? I sent this out some time ago, including a GitHub
> project that clearly demonstrates the brokenness, yet I have not heard a
> word. Is there anybody supporting Storm?
>
> On Sep 30, 2020, at 9:03 AM, Thomas L. Redman <tomred...@mchsi.com> wrote:
>
> I believe I have encountered a significant bug. It seems topologies
> employing anchored tuples do not distribute across multiple nodes,
> regardless of the computation demands of the bolts. It works fine on a
> single node, but when throwing multiple nodes into the mix, only one
> machine gets pegged. When we disable anchoring, it will distribute across
> all nodes just fine, pegging each machine appropriately.
>
> This bug manifests from version 2.1 forward.
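
For anyone skimming the thread, here is a quick illustration of the anchored
vs. unanchored emits Thomas is describing. This is a minimal, hypothetical
bolt; the class and field names are made up for illustration and are not
taken from the storm-issue repo:

    import java.util.Map;

    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    // Hypothetical bolt, for illustration only.
    public class UppercaseBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map<String, Object> topoConf, TopologyContext context,
                            OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            String text = input.getString(0);

            // Anchored emit: the output tuple is tied to the input tuple and the
            // input is acked explicitly, so the acker tracks the tuple tree.
            collector.emit(input, new Values(text.toUpperCase()));
            collector.ack(input);

            // The unanchored equivalent would simply be:
            // collector.emit(new Values(text.toUpperCase()));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("result"));
        }
    }

Per Thomas's report above, the uneven distribution only reproduces with the
anchored form.
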
> I first encountered this issue with my own production cluster on an app
> that does significant NLP computation across hundreds of millions of
> documents. This topology is fairly complex, so I developed a very simple
> exemplar that demonstrates the issue with only one spout and one bolt. I
> pushed this demonstration up to GitHub to give the developers a mechanism
> to easily isolate the bug, and maybe provide some workaround. I used Gradle
> to build and package this simple topology. The code is well documented, so
> it should be fairly simple to reproduce the issue. I first encountered the
> issue on three 32-core nodes, but when I started experimenting, I set up a
> test cluster with 8 cores per node and then increased each node to 16
> cores, with plenty of memory in every case.
>
> The topology can be accessed from GitHub at
> https://github.com/cowchipkid/storm-issue.git. Please feel free to respond
> to me directly if you have any questions that are beyond the scope of this
> mailing list.
>
> Hope this helps. Please let me know how this goes; I will upgrade to 2.2.0
> again for my next release.
>
> On Nov 13, 2020, at 12:53 PM, Michael Giroux <michael_a_gir...@yahoo.com> wrote:
>
> Hello, all,
>
> I have a topology with 16 workers running across 4 nodes. This topology
> has a bolt "transform" with executors=1 producing a stream that is
> consumed by a bolt "ontology" with executors=160. Everything is configured
> as shuffleGrouping.
>
> With Storm 1.2.3 all of the "ontology" bolts get their fair share of
> tuples. When I run Storm 2.2.0, only the "ontology" bolts that are on the
> same node as the single "transform" bolt get tuples.
>
> Same cluster - same baseline code - the only difference is binding in the
> new Maven artifact.
>
> No errors in the logs.
>
> Any thoughts would be welcome. Thanks!
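
For anyone who lands on this thread later: the workaround Kishor describes
above is a plain topology config setting. Below is a rough sketch of setting
it at submission time, assuming a typical Java topology. The spout/bolt class
names and parallelism numbers are placeholders loosely based on Michael's
description (transform with 1 executor feeding ontology with 160 executors
over shuffle grouping), not actual code from either topology in this thread:

    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.topology.TopologyBuilder;

    public class SubmitWithoutLoadAware {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();

            // Placeholder wiring: DocumentSpout, TransformBolt and OntologyBolt
            // are hypothetical stand-ins, not classes from the storm-issue repo.
            builder.setSpout("spout", new DocumentSpout(), 1);
            builder.setBolt("transform", new TransformBolt(), 1)
                   .shuffleGrouping("spout");
            builder.setBolt("ontology", new OntologyBolt(), 160)
                   .shuffleGrouping("transform");

            Config conf = new Config();
            conf.setNumWorkers(16);

            // The setting Kishor mentions: turn off LoadAwareShuffleGrouping so
            // shuffle grouping spreads tuples across all nodes instead of
            // preferring executors on the local worker/node.
            conf.put("topology.disable.loadaware.messaging", true);

            StormSubmitter.submitTopology("example-topology", conf,
                                          builder.createTopology());
        }
    }

The same key should also be settable cluster-wide in storm.yaml, but setting
it per topology as above is the most direct translation of Kishor's
suggestion.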