The default (and, in old releases, the only) multilang serializer is JSON, which
is in fact slow.
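
To put a rough number on that cost, a minimal sketch along these lines times a JSON round trip on a payload of about 75KB, the size mentioned further down the thread. The helper names `make_payload` and `time_round_trip` and the sample record are hypothetical, and only the standard library is used. Results depend on the machine and Python version, so treat it as a way to measure rather than as a benchmark result.

```python
import json
import timeit


def make_payload(target_bytes=75 * 1024):
    """Build a dict whose JSON encoding is roughly target_bytes long (made-up test data)."""
    item = {"metric": "cpu.load", "value": 0.42, "tags": ["a", "b"]}
    item_len = len(json.dumps(item)) + 2  # +2 for the ", " separator between list items
    return {"items": [dict(item) for _ in range(target_bytes // item_len)]}


def time_round_trip(payload, repeat=200):
    """Return the average time in seconds of one json.dumps + json.loads round trip."""
    total = timeit.timeit(lambda: json.loads(json.dumps(payload)), number=repeat)
    return total / repeat


if __name__ == "__main__":
    payload = make_payload()
    size = len(json.dumps(payload))
    avg = time_round_trip(payload)
    print("payload: %d bytes, round trip: %.3f ms" % (size, avg * 1000))
```

Multiplying the measured time by the number of bolts a tuple passes through gives an estimate of the extra work an explicit serialize / deserialize step would add per tuple.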
On May 29, 2015 8:04 AM, "Andrew Xor" <andreas.gramme...@gmail.com> wrote:

> I think the Storm documentation clearly says that not only do you have to
> serialize your objects, but when using custom types it is better to
> implement your own serializer to avoid the "native" one, which is quite slow. I
> have not used Storm multi-lang though, to be honest.
>
> Regards.
>
> On Fri, May 29, 2015 at 2:33 PM, Carlos Perelló Marín <
> car...@serverdensity.com> wrote:
>
>> Found the problem... I'm not serializing the JSON object, so when I call
>> emit, it's a Python dictionary. It works most of the time, but for some
>> reason we found several values that break it.
>>
>> I'm not 100% sure it's not a problem with Storm's multilang support, given
>> that emit ends up doing a json.dumps() call anyway before sending it to
>> the ShellBolt or ShellSpout Java class, so it should not break the protocol.
>>
>> I have a workaround for my problem, but it would be nice to know if it's a
>> bug or the expected behavior, because having to serialize / deserialize that
>> argument on every bolt would cost us some extra processing time.
>>
>> Thanks.
>>
>> On 28 May 2015 at 22:35, Andrew Xor <andreas.gramme...@gmail.com> wrote:
>>
>>> This is odd, as I have used Storm with tuples that are quite
>>> large with no such problem. Try to replicate it with a single spout that
>>> generates huge tuples and a single bolt as a consumer, and report back your
>>> results.
>>>
>>> Regards
>>> On Thu, May 28, 2015 at 10:59 PM Jeffery Maass <maas...@gmail.com>
>>> wrote:
>>>
>>>> I would take the Kafka spout, JSON, and your code out of the equation and
>>>> replicate the problem with a spout that generates strings of various
>>>> lengths around 75KB.
>>>>
>>>> Thank you for your time!
>>>>
>>>> +++++++++++++++++++++
>>>> Jeff Maass <maas...@gmail.com>
>>>> linkedin.com/in/jeffmaass
>>>> stackoverflow.com/users/373418/maassql
>>>> +++++++++++++++++++++
>>>>
>>>>
>>>> On Thu, May 28, 2015 at 2:45 PM, Carlos Perelló Marín <
>>>> car...@serverdensity.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> While working with Apache Storm 0.9.4 with Python + multilang, I found
>>>>> that one tuple was hanging the topology. It took me a while to figure
>>>>> out what was going on and why it stopped processing payloads, until I found
>>>>> that the hung bolt was blocked waiting for input on its stdin (it hangs
>>>>> calling emit).
>>>>>
>>>>> After inspecting the tuple that hung it, I found that it includes a
>>>>> JSON string that is about 75KB long. It's valid JSON, so it's not corrupted,
>>>>> but for some reason it breaks when it's emitted.
>>>>>
>>>>> I'm using Kafka as a way to inject tuples into my topology, and the
>>>>> KafkaSpout is able to inject such a tuple, so I wonder whether it's just a
>>>>> limitation of the multilang implementation...
>>>>>
>>>>> Is there any hint to debug or fix it?
>>>>>
>>>>> The worst thing is that there were no errors in the supervisor or
>>>>> worker logs. I only found this because I inspected the processes manually
>>>>> with strace and added log output to my code to find the place where it
>>>>> hung.
>>>>>
>>>>> Thanks in advance!
>>>>>
>>>>> --
>>>>>
>>>>> Carlos Perelló Marín
>>>>> https://www.serverdensity.com
>>>>>
>>>>>
>>>>
>>
>>
>> --
>>
>> Carlos Perelló Marín
>> https://www.serverdensity.com
>>
>>
>
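
For anyone wanting to try the reproduction suggested in the thread without Kafka, the sketch below generates strings of various lengths around 75KB and pushes each one through the workaround described above: the dict is turned into a JSON string before emitting and parsed again on the receiving side. All helper names are hypothetical, and `frame_message` only mimics the multilang convention of a JSON document followed by a line containing `end`; it does not import or call Storm's own `storm.py`. It produces test data and checks the encoding round trip; it does not by itself reproduce or explain the hang.

```python
import json
import random
import string


def make_strings(center=75 * 1024, spread=4096, count=5, seed=0):
    """Yield random ASCII strings whose lengths vary around `center` characters."""
    rng = random.Random(seed)
    for _ in range(count):
        length = center + rng.randint(-spread, spread)
        yield "".join(rng.choices(string.ascii_letters, k=length))


def pack_field(obj):
    """Workaround from the thread: serialize the dict to a JSON string before emit."""
    return json.dumps(obj)


def unpack_field(text):
    """Counterpart for the receiving bolt: parse the JSON string back into a dict."""
    return json.loads(text)


def frame_message(values):
    """Mimic multilang framing (assumption): JSON document, then an 'end' line."""
    return json.dumps({"command": "emit", "tuple": values}) + "\nend\n"


def read_message(framed):
    """Parse a framed message back, checking the terminator line."""
    body, terminator = framed.rstrip("\n").rsplit("\n", 1)
    if terminator != "end":
        raise ValueError("missing 'end' terminator")
    return json.loads(body)


if __name__ == "__main__":
    for s in make_strings():
        values = [pack_field({"payload": s})]
        decoded = read_message(frame_message(values))
        assert unpack_field(decoded["tuple"][0]) == {"payload": s}
        print(len(s), "ok")
```

In a real test topology, the strings from `make_strings` would be emitted by a minimal spout and consumed by a single bolt, which keeps Kafka and the application code out of the picture.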
