Hi Thomas! If you believe it's a bug, please submit it to https://issues.apache.org/jira/browse/STORM, so hopefully someone will be able to have a look.
On Mon, 16 Nov 2020 at 15:16, Thomas L. Redman <tomred...@mchsi.com> wrote:

> I think somebody on the Dev team needs to look into this. The topology I
> wrote is on GitHub and should reproduce the problem consistently. This
> feature seems broken.
>
> On Nov 16, 2020, at 4:21 AM, Michael Giroux <michael_a_gir...@yahoo.com> wrote:
>
> Hello Thomas,
>
> Thank you very much for the response. Disabling the feature allowed me to
> move from 1.2.3 => 2.2.0.
>
> I understand the intent of the feature; however, in my case the end result
> was one node getting all the load and eventual Netty heap space exceptions.
> Perhaps I should look at that... or perhaps I'll just leave the feature
> disabled.
>
> Again - thanks for the reply - VERY helpful.
>
> On Saturday, November 14, 2020, 01:05:03 PM EST, Thomas L. Redman <tomred...@mchsi.com> wrote:
>
> I have seen this same thing. I sent a query on this list and, after some
> time, got a response. The issue is reportedly the result of a new feature.
> I would assume this feature is CLEARLY broken, as I had built a test
> topology that was clearly compute-bound, not IO-bound, and there were
> adjustments to change the ratio of compute to IO. I have not tested this
> fix; I am too close to release, so I just rolled back to version 1.2.3.
>
> From version 2.1.0 forward, no matter how I changed this compute/net IO
> ratio, tuples were not distributed across nodes. Now, I could only
> reproduce this with anchored (acked) tuples. But if you anchored tuples,
> you could never span more than one node, which defeats the purpose of
> using Storm. Following is that email from Kishor Patil identifying the
> issue (and a way to disable that feature!!!):
>
> *From: *Kishor Patil <kishorvpa...@apache.org>
> *Subject: *Re: Significant Bug
> *Date: *October 29, 2020 at 8:07:18 AM CDT
> *To: *<d...@storm.apache.org>
> *Reply-To: *d...@storm.apache.org
>
> Hello Thomas,
>
> Apologies for the delay in responding here. I tested the topology code
> provided in the storm-issue repo.
>
> *only one machine gets pegged*: Although it appears so, this is not a bug.
> This is related to Locality Awareness. Please refer to
> https://github.com/apache/storm/blob/master/docs/LocalityAwareness.md
> It appears the spout-to-bolt ratio is 200, so if there are enough bolts on
> a single node to handle the events generated by the spout, it won't send
> events out to another node unless it runs out of capacity on that node. If
> you do not like this and want to distribute events evenly, you can try
> disabling this feature. You can turn off LoadAwareShuffleGrouping by
> setting topology.disable.loadaware.messaging to true.
>
> -Kishor
>
> On 2020/10/28 15:21:54, "Thomas L. Redman" <tomred...@mchsi.com> wrote:
>
> What's the word on this? I sent this out some time ago, including a GitHub
> project that clearly demonstrates the brokenness, yet I have not heard a
> word. Is there anybody supporting Storm?
>
> On Sep 30, 2020, at 9:03 AM, Thomas L. Redman <tomred...@mchsi.com> wrote:
>
> I believe I have encountered a significant bug. It seems topologies
> employing anchored tuples do not distribute across multiple nodes,
> regardless of the computation demands of the bolts. It works fine on a
> single node, but when throwing multiple nodes into the mix, only one
> machine gets pegged. When we disable anchoring, it will distribute across
> all nodes just fine, pegging each machine appropriately.
>
> This bug manifests from version 2.1 forward.
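
For anyone skimming the thread, here is a quick illustration of the anchored
vs. unanchored emits Thomas is describing. This is a minimal, hypothetical
bolt; the class and field names are made up for illustration and are not
taken from the storm-issue repo:

    import java.util.Map;

    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    // Hypothetical bolt, for illustration only.
    public class UppercaseBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map<String, Object> topoConf, TopologyContext context,
                            OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            String text = input.getString(0);

            // Anchored emit: the output tuple is tied to the input tuple and the
            // input is acked explicitly, so the acker tracks the tuple tree.
            collector.emit(input, new Values(text.toUpperCase()));
            collector.ack(input);

            // The unanchored equivalent would simply be:
            // collector.emit(new Values(text.toUpperCase()));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("result"));
        }
    }

Per Thomas's report above, the uneven distribution only reproduces with the
anchored form.
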
> I first encountered this issue with my own production cluster on an app
> that does significant NLP computation across hundreds of millions of
> documents. This topology is fairly complex, so I developed a very simple
> exemplar that demonstrates the issue with only one spout and one bolt. I
> pushed this demonstration up to GitHub to give the developers a mechanism
> to easily isolate the bug, and maybe provide some workaround. I used Gradle
> to build and package this simple topology. The code is well documented, so
> it should be fairly simple to reproduce the issue. I first encountered the
> issue on three 32-core nodes, but when I started experimenting, I set up a
> test cluster with 8 cores per node and then increased each node to 16
> cores, with plenty of memory in every case.
>
> The topology can be accessed from GitHub at
> https://github.com/cowchipkid/storm-issue.git. Please feel free to respond
> to me directly if you have any questions that are beyond the scope of this
> mailing list.
>
> Hope this helps. Please let me know how this goes; I will upgrade to 2.2.0
> again for my next release.
>
> On Nov 13, 2020, at 12:53 PM, Michael Giroux <michael_a_gir...@yahoo.com> wrote:
>
> Hello, all,
>
> I have a topology with 16 workers running across 4 nodes. This topology
> has a bolt "transform" with executors=1 producing a stream that is
> consumed by a bolt "ontology" with executors=160. Everything is configured
> as shuffleGrouping.
>
> With Storm 1.2.3 all of the "ontology" bolts get their fair share of
> tuples. When I run Storm 2.2.0, only the "ontology" bolts that are on the
> same node as the single "transform" bolt get tuples.
>
> Same cluster - same baseline code - the only difference is binding in the
> new Maven artifact.
>
> No errors in the logs.
>
> Any thoughts would be welcome. Thanks!
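
For anyone who lands on this thread later: the workaround Kishor describes
above is a plain topology config setting. Below is a rough sketch of setting
it at submission time, assuming a typical Java topology. The spout/bolt class
names and parallelism numbers are placeholders loosely based on Michael's
description (transform with 1 executor feeding ontology with 160 executors
over shuffle grouping), not actual code from either topology in this thread:

    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.topology.TopologyBuilder;

    public class SubmitWithoutLoadAware {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();

            // Placeholder wiring: DocumentSpout, TransformBolt and OntologyBolt
            // are hypothetical stand-ins, not classes from the storm-issue repo.
            builder.setSpout("spout", new DocumentSpout(), 1);
            builder.setBolt("transform", new TransformBolt(), 1)
                   .shuffleGrouping("spout");
            builder.setBolt("ontology", new OntologyBolt(), 160)
                   .shuffleGrouping("transform");

            Config conf = new Config();
            conf.setNumWorkers(16);

            // The setting Kishor mentions: turn off LoadAwareShuffleGrouping so
            // shuffle grouping spreads tuples across all nodes instead of
            // preferring executors on the local worker/node.
            conf.put("topology.disable.loadaware.messaging", true);

            StormSubmitter.submitTopology("example-topology", conf,
                                          builder.createTopology());
        }
    }

The same key should also be settable cluster-wide in storm.yaml, but setting
it per topology as above is the most direct translation of Kishor's
suggestion.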