The default (and, in older releases, the only) multilang serializer is JSON, which is indeed slow.

On May 29, 2015 8:04 AM, "Andrew Xor" <andreas.gramme...@gmail.com> wrote:
> I think in the storm documentation it clearly says that not only you have
> to serialize your objects but when using custom types it is better to
> implement your own to avoid the "native" serializer which is quite slow. I
> have not used storm multi-lang though to be honest.
>
> Regards.
>
> On Fri, May 29, 2015 at 2:33 PM, Carlos Perelló Marín <car...@serverdensity.com> wrote:
>
>> Found the problem... I'm not serializing the json object so when I call
>> emit, it's a python dictionary. It works most of the time, but for some
>> reason we found several values that break it.
>>
>> I'm not 100% it's not a problem with the storm's multilang support, given
>> that the emit ends doing a json.dumps() call anyway before sending it to
>> the ShellBolt or ShellSpout Java class, so it should not break the protocol.
>>
>> I have a workaround for my problem, but would be nice to know if it's a
>> bug or the right behavior, because having to serialize / unserialize that
>> argument on every bolt would cost us some extra processing time.
>>
>> Thanks.
>>
>> On 28 May 2015 at 22:35, Andrew Xor <andreas.gramme...@gmail.com> wrote:
>>
>>> This must be awkward as I have used storm with tuples that are quite
>>> large with no such problem. Try to replicate with a single spout that
>>> generates huge tuples and a single bolt as a consumer and report back your
>>> results
>>>
>>> Regards
>>> On Thu, May 28, 2015 at 10:59 PM Jeffery Maass <maas...@gmail.com> wrote:
>>>
>>>> I would take the kafka spout, JSON, your code out of the equation and
>>>> replicate the problem with a spout that generates strings of various
>>>> lengths around 75KB.
>>>>
>>>> Thank you for your time!
>>>>
>>>> +++++++++++++++++++++
>>>> Jeff Maass <maas...@gmail.com>
>>>> linkedin.com/in/jeffmaass
>>>> stackoverflow.com/users/373418/maassql
>>>> +++++++++++++++++++++
>>>>
>>>>
>>>> On Thu, May 28, 2015 at 2:45 PM, Carlos Perelló Marín <car...@serverdensity.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> While working with Apache Storm 0.9.4 with python + multilang, I found
>>>>> that one tuple was hanging the topology. It took me a while to figure
>>>>> what's going on and why it stopped processing payloads until I found that
>>>>> the hanged bolt was blocked waiting from input on its stdin (it hangs
>>>>> calling emit).
>>>>>
>>>>> After inspecting the tuple that hanged it I found that it includes a
>>>>> json string that is about 75KB long, it's valid JSON so it's not corrupted
>>>>> but for some reason breaks when it's emitted.
>>>>>
>>>>> I'm using Kafka as a way to inject tuples into my topology and the
>>>>> KafkaSpout is able to inject such tuple so I wonder whether it's just a
>>>>> limitation of the multilang implementation...
>>>>>
>>>>> Is there any hint to debug or fix it?
>>>>>
>>>>> The worse thing is that there was no errors on the supervisor nor
>>>>> workers logs I just found this because I inspected the processes manually
>>>>> with strace and adding log output on my code to find the place where it
>>>>> hanged.
>>>>>
>>>>> Thanks in advance!
>>>>>
>>>>> --
>>>>>
>>>>> Carlos Perelló Marín
>>>>> https://www.serverdensity.com
>>>>>
>>>>>
>>>>
>>
>>
>> --
>>
>> Carlos Perelló Marín
>> https://www.serverdensity.com
>>
>>
>
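
For anyone who wants to try the isolation step suggested in this thread (take Kafka and the topology out of the picture and test with payloads around 75KB), here is a rough Python sketch. It does not talk to Storm and will not reproduce the hang by itself; it only checks that a large payload survives multilang-style framing, which as far as I understand the protocol is one JSON document followed by a line containing `end`. The function names (`frame_message`, `read_message`, `make_payload`, `build_emit`) are made up for illustration and are not part of storm.py, and the `{"command": "emit", "tuple": [...]}` shape is my reading of the multilang emit message, so treat it as an assumption.

```python
import io
import json


def frame_message(msg):
    """Encode one message the way the multilang protocol is assumed to:
    a single-line JSON document, then a line containing only 'end'."""
    return json.dumps(msg) + "\nend\n"


def read_message(stream):
    """Read lines from a text stream up to the 'end' marker and decode the JSON."""
    lines = []
    for line in stream:
        line = line.rstrip("\n")
        if line == "end":
            break
        lines.append(line)
    return json.loads("\n".join(lines))


def make_payload(target_bytes=75 * 1024):
    """Build a dict whose JSON encoding is a little over target_bytes long."""
    return {"id": 1, "blob": "x" * target_bytes}


def build_emit(payload, as_string):
    """Build a hypothetical emit message.

    as_string=True mirrors the workaround from the thread: serialize the
    dict with json.dumps() first, so the tuple field is a plain string.
    as_string=False puts the dict into the tuple directly.
    """
    field = json.dumps(payload) if as_string else payload
    return {"command": "emit", "tuple": [field]}


if __name__ == "__main__":
    for as_string in (True, False):
        msg = build_emit(make_payload(), as_string)
        framed = frame_message(msg)
        decoded = read_message(io.StringIO(framed))
        print(as_string, len(framed), decoded == msg)
```

If both variants round-trip here but the real bolt still hangs, that would point away from the JSON encoding itself and toward the pipe between the Python process and the ShellBolt, which is where strace showed the process blocked.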