Found the problem... I'm not serializing the JSON object, so when I call
emit it's still a Python dictionary. That works most of the time, but for
some reason we found several values that break it.

I'm not 100% sure it's not a problem with Storm's multilang support, given
that emit ends up doing a json.dumps() call anyway before sending it to
the ShellBolt or ShellSpout Java class, so it should not break the protocol.

I have a workaround for my problem, but it would be nice to know whether
it's a bug or the intended behavior, because having to serialize /
deserialize that argument on every bolt would cost us some extra
processing time.
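
For reference, a minimal sketch of the kind of workaround I mean, assuming
the payload is a plain dictionary. The helper names pack_payload and
unpack_payload are hypothetical, and the actual emit call is left out, so
this only shows the serialize / deserialize round trip, not the fix for
the hang itself:

```python
import json


def pack_payload(payload):
    """Serialize the dictionary to a JSON string before passing it to emit."""
    # json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
    # so the resulting string contains only ASCII characters.
    return json.dumps(payload)


def unpack_payload(value):
    """Turn the JSON string received by the next bolt back into a dictionary."""
    return json.loads(value)


# Illustrative payload only; the field names and sizes are made up.
payload = {"host": "web-1", "metrics": {"load": 0.42}, "blob": "x" * 75000}

packed = pack_payload(payload)        # this string is what would be emitted
restored = unpack_payload(packed)     # this is what the consumer would do

assert restored == payload
```

The cost I'm worried about is exactly this extra dumps / loads pair in
every bolt, on top of the json.dumps() that emit already does.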

Thanks.

On 28 May 2015 at 22:35, Andrew Xor <andreas.gramme...@gmail.com> wrote:

> This is odd, as I have used Storm with quite large tuples and had no such
> problem. Try to replicate it with a single spout that generates huge
> tuples and a single bolt as a consumer, and report back your results.
>
> Regards
> On Thu, May 28, 2015 at 10:59 PM Jeffery Maass <maas...@gmail.com> wrote:
>
>> I would take the Kafka spout, JSON, and your code out of the equation and
>> replicate the problem with a spout that generates strings of various
>> lengths around 75KB.
>>
>> Thank you for your time!
>>
>> +++++++++++++++++++++
>> Jeff Maass <maas...@gmail.com>
>> linkedin.com/in/jeffmaass
>> stackoverflow.com/users/373418/maassql
>> +++++++++++++++++++++
>>
>>
>> On Thu, May 28, 2015 at 2:45 PM, Carlos Perelló Marín <
>> car...@serverdensity.com> wrote:
>>
>>> Hi,
>>>
>>> While working with Apache Storm 0.9.4 with Python + multilang, I found
>>> that one tuple was hanging the topology. It took me a while to figure
>>> out what was going on and why it stopped processing payloads, until I
>>> found that the hung bolt was blocked waiting for input on its stdin (it
>>> hangs calling emit).
>>>
>>> After inspecting the tuple that hung it, I found that it includes a
>>> JSON string that is about 75KB long. It's valid JSON, so it's not
>>> corrupted, but for some reason it breaks when it's emitted.
>>>
>>> I'm using Kafka as a way to inject tuples into my topology, and the
>>> KafkaSpout is able to inject such a tuple, so I wonder whether it's just
>>> a limitation of the multilang implementation...
>>>
>>> Is there any hint to debug or fix it?
>>>
>>> The worst thing is that there were no errors in the supervisor or
>>> worker logs. I only found this because I inspected the processes
>>> manually with strace and added log output to my code to find the place
>>> where it hung.
>>>
>>> Thanks in advance!
>>>
>>> --
>>>
>>> Carlos Perelló Marín
>>> https://www.serverdensity.com
>>>
>>>
>>


-- 

Carlos Perelló Marín
https://www.serverdensity.com
