Found the problem... I'm not serializing the JSON object, so when I call emit it's a Python dictionary. It works most of the time, but for some reason we found several values that break it.
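
A minimal sketch of that workaround, assuming the payload is a plain dict: serialize it to a string before emitting and parse it back in the downstream bolt. The helper names `pack_payload` and `unpack_payload` are hypothetical, and the `storm.emit` call mentioned in the comment is only there to mark where the string would be passed; nothing here explains why the raw dict hangs.

```python
import json


def pack_payload(payload):
    """Serialize a dict to a compact JSON string before emitting it."""
    # ensure_ascii=True (the default) keeps the output pure ASCII, so the
    # tuple value is an ordinary string with no raw non-ASCII characters.
    return json.dumps(payload, separators=(",", ":"))


def unpack_payload(raw):
    """Parse the JSON string back into a dict in the receiving bolt."""
    return json.loads(raw)


if __name__ == "__main__":
    # Hypothetical payload of roughly 75 KB, similar in size to the tuple
    # that hung the topology.
    payload = {"id": 1, "data": "x" * 75000}
    packed = pack_payload(payload)
    # In a bolt this is where the string would be emitted, e.g.
    # storm.emit([packed]) instead of storm.emit([payload]).
    assert unpack_payload(packed) == payload
    print(len(packed))
```

The cost is one extra dumps/loads pair per bolt, which is the processing overhead in question.
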
I'm not 100% sure it's not a problem with Storm's multilang support, given that emit ends up doing a json.dumps() call anyway before sending it to the ShellBolt or ShellSpout Java class, so it should not break the protocol. I have a workaround for my problem, but it would be nice to know whether it's a bug or the intended behavior, because having to serialize / unserialize that argument on every bolt would cost us some extra processing time.

Thanks.

On 28 May 2015 at 22:35, Andrew Xor <andreas.gramme...@gmail.com> wrote:

> This must be awkward as I have used storm with tuples that are quite large
> with no such problem. Try to replicate with a single spout that generates
> huge tuples and a single bolt as a consumer and report back your results
>
> Regards
> On Thu, May 28, 2015 at 10:59 PM Jeffery Maass <maas...@gmail.com> wrote:
>
>> I would take the kafka spout, JSON, your code out of the equation and
>> replicate the problem with a spout that generates strings of various
>> lengths around 75KB.
>>
>> Thank you for your time!
>>
>> +++++++++++++++++++++
>> Jeff Maass <maas...@gmail.com>
>> linkedin.com/in/jeffmaass
>> stackoverflow.com/users/373418/maassql
>> +++++++++++++++++++++
>>
>>
>> On Thu, May 28, 2015 at 2:45 PM, Carlos Perelló Marín <
>> car...@serverdensity.com> wrote:
>>
>>> Hi,
>>>
>>> While working with Apache Storm 0.9.4 with python + multilang, I found
>>> that one tuple was hanging the topology. It took me a while to figure
>>> what's going on and why it stopped processing payloads until I found that
>>> the hanged bolt was blocked waiting from input on its stdin (it hangs
>>> calling emit).
>>>
>>> After inspecting the tuple that hanged it I found that it includes a
>>> json string that is about 75KB long, it's valid JSON so it's not corrupted
>>> but for some reason breaks when it's emitted.
>>>
>>> I'm using Kafka as a way to inject tuples into my topology and the
>>> KafkaSpout is able to inject such tuple so I wonder whether it's just a
>>> limitation of the multilang implementation...
>>>
>>> Is there any hint to debug or fix it?
>>>
>>> The worse thing is that there was no errors on the supervisor nor
>>> workers logs I just found this because I inspected the processes manually
>>> with strace and adding log output on my code to find the place where it
>>> hanged.
>>>
>>> Thanks in advance!
>>>
>>> --
>>>
>>> Carlos Perelló Marín
>>> https://www.serverdensity.com
>>>
>>>
>>

--
Carlos Perelló Marín
https://www.serverdensity.com
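
The reproduction suggested in the thread (a spout that generates strings of various lengths around 75KB) could start from a generator like the one below. It is only a sketch: the function name `payloads_around`, the 5 KB spread and the 1 KB step are assumptions chosen for illustration, and wiring it into a spout is left out.

```python
import json


def payloads_around(center=75 * 1024, spread=5 * 1024, step=1024):
    """Yield JSON strings whose lengths step through center +/- spread."""
    for size in range(center - spread, center + spread + 1, step):
        # json.dumps adds two quote characters around the string, so
        # the filler is two characters shorter than the target size.
        yield json.dumps("x" * (size - 2))


if __name__ == "__main__":
    # Print the length of each generated payload.
    for body in payloads_around():
        print(len(body))
```

Emitting each of these from a single spout into a single consumer bolt would show whether size alone triggers the hang, independent of Kafka and of the application code.
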