That was exactly the thing I had in mind. I guess I should give it a try and
see how it performs and how convenient it is; I can't just speculate about that.

On Wed, May 11, 2016 at 2:44 PM, Nathan Leung <ncle...@gmail.com> wrote:

> You don't have to hold the whole tuples in supervisor memory; the data is
> already in Kafka. Just keep the tuple IDs and write to the file. When you
> close the file, ack all of the tuple IDs.
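>
> Roughly, a sketch of that pattern against the 0.9.x backtype.storm API (the
> class name, file path and rotation threshold are made-up placeholders, so
> treat it as an illustration rather than tested code):
>
> import java.io.BufferedWriter;
> import java.io.FileWriter;
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
> import java.util.Map;
>
> import backtype.storm.task.OutputCollector;
> import backtype.storm.task.TopologyContext;
> import backtype.storm.topology.OutputFieldsDeclarer;
> import backtype.storm.topology.base.BaseRichBolt;
> import backtype.storm.tuple.Tuple;
>
> public class FileBatchBolt extends BaseRichBolt {
>     private static final int TUPLES_PER_FILE = 10000;  // arbitrary rotation threshold
>
>     private OutputCollector collector;
>     private List<Tuple> pending;   // tuples written to the current file but not yet acked
>     private BufferedWriter writer;
>     private int count;
>
>     @Override
>     public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
>         this.collector = collector;
>         this.pending = new ArrayList<Tuple>();
>         openNewFile();
>     }
>
>     @Override
>     public void execute(Tuple tuple) {
>         try {
>             writer.write(tuple.getString(0));
>             writer.newLine();
>             pending.add(tuple);              // hold the ack until the file is closed
>             if (++count >= TUPLES_PER_FILE) {
>                 rotate();
>             }
>         } catch (IOException e) {
>             collector.fail(tuple);           // the Kafka spout will replay it
>         }
>     }
>
>     private void rotate() throws IOException {
>         writer.close();                      // the file is complete at this point
>         for (Tuple t : pending) {
>             collector.ack(t);                // now it is safe to ack everything in it
>         }
>         pending.clear();
>         count = 0;
>         openNewFile();
>     }
>
>     private void openNewFile() {
>         try {
>             writer = new BufferedWriter(
>                     new FileWriter("/tmp/batch-" + System.currentTimeMillis() + ".log"));
>         } catch (IOException e) {
>             throw new RuntimeException(e);
>         }
>     }
>
>     @Override
>     public void declareOutputFields(OutputFieldsDeclarer declarer) {
>         // nothing is emitted downstream
>     }
> }
>
> The usual at-least-once caveats apply: if the worker dies before a file is
> closed, its unacked tuples are replayed and may end up in a new file as well,
> and the time between rotations has to stay well under
> topology.message.timeout.secs so the pending tuples don't time out.
>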
> On May 11, 2016 5:42 PM, "Steven Lewis" <steven.le...@walmart.com> wrote:
>
>> It sounds like you want to use Spark / Spark Streaming to do that kind of
>> batched output.
>>
>> From: Milind Vaidya <kava...@gmail.com>
>> Reply-To: "user@storm.apache.org" <user@storm.apache.org>
>> Date: Wednesday, May 11, 2016 at 4:24 PM
>> To: "user@storm.apache.org" <user@storm.apache.org>
>> Subject: Re: Getting Kafka Offset in Storm Bolt
>>
>> Yeah. We have some micro-batching in place for other topologies. This one is
>> a little ambitious, in the sense that each message is 1~2 KB in size, so
>> grouping them into a reasonably sized chunk, say 500 KB ~ 1 GB, is necessary
>> (this is just my guess; I am not sure what S3 supports or what is optimal).
>> Once that chunk is uploaded, all of the messages can be acked. But isn't that
>> overkill? I guess Storm is not even meant to support that kind of use case.
>>
>> On Wed, May 11, 2016 at 12:59 PM, Nathan Leung <ncle...@gmail.com> wrote:
>>
>>> You can micro-batch the Kafka contents into a file that's replicated (e.g.
>>> on HDFS) and then ack all of the input tuples after the file has been closed.
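>>>
>>> The storm-hdfs external module already implements most of that file
>>> handling; something roughly like the following, set up in the
>>> topology-building code (check that the module is available for your Storm
>>> version and verify exactly when it acks relative to syncing and rotation;
>>> the URL, path and thresholds here are just placeholders):
>>>
>>> import org.apache.storm.hdfs.bolt.HdfsBolt;
>>> import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
>>> import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
>>> import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy;
>>> import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy.Units;
>>> import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;
>>>
>>> HdfsBolt hdfsBolt = new HdfsBolt()
>>>         .withFsUrl("hdfs://namenode:8020")                                 // placeholder namenode URL
>>>         .withFileNameFormat(new DefaultFileNameFormat().withPath("/storm/batches/"))
>>>         .withRecordFormat(new DelimitedRecordFormat().withFieldDelimiter("\t"))
>>>         .withRotationPolicy(new FileSizeRotationPolicy(512.0f, Units.MB))  // rotate at ~512 MB
>>>         .withSyncPolicy(new CountSyncPolicy(1000));                        // sync every 1000 tuples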
>>>
>>> On Wed, May 11, 2016 at 3:43 PM, Milind Vaidya <kava...@gmail.com>
>>> wrote:
>>>
>>>> In case of a failure to upload a file, or disk corruption leading to loss
>>>> of the file, we only have the current offset in the Kafka spout but no
>>>> record of which offsets were lost with the file and need to be replayed.
>>>> So these offsets can be stored externally in ZooKeeper and used to account
>>>> for the lost data. For them to be saved in ZK, they need to be available
>>>> in a bolt.
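>>>>
>>>> As a rough sketch of the ZooKeeper side, using Curator (which Storm itself
>>>> uses for ZooKeeper): the path layout is invented, and it assumes the bolt
>>>> can see the partition and offset at all, which is exactly the open
>>>> question here.
>>>>
>>>> import org.apache.curator.framework.CuratorFramework;
>>>> import org.apache.curator.framework.CuratorFrameworkFactory;
>>>> import org.apache.curator.retry.ExponentialBackoffRetry;
>>>>
>>>> public class OffsetRangeStore {
>>>>     private final CuratorFramework client;
>>>>
>>>>     public OffsetRangeStore(String zkConnect) {
>>>>         client = CuratorFrameworkFactory.newClient(zkConnect,
>>>>                 new ExponentialBackoffRetry(1000, 3));
>>>>         client.start();
>>>>     }
>>>>
>>>>     // Record the offset range that went into one file,
>>>>     // e.g. right after a successful S3 upload.
>>>>     public void saveRange(String topic, int partition,
>>>>                           long firstOffset, long lastOffset) throws Exception {
>>>>         String path = "/s3-batches/" + topic + "/" + partition;
>>>>         byte[] data = (firstOffset + "-" + lastOffset).getBytes("UTF-8");
>>>>         if (client.checkExists().forPath(path) == null) {
>>>>             client.create().creatingParentsIfNeeded().forPath(path, data);
>>>>         } else {
>>>>             client.setData().forPath(path, data);
>>>>         }
>>>>     }
>>>> }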
>>>>
>>>> On Wed, May 11, 2016 at 11:10 AM, Nathan Leung <ncle...@gmail.com>
>>>> wrote:
>>>>
>>>>> Why not just ack the tuple once it's been written to a file? If your
>>>>> topology fails, the data will be re-read from Kafka; the Kafka spout
>>>>> already does this for you. Uploading files to S3 is then the
>>>>> responsibility of another job, for example a Storm topology that monitors
>>>>> the output folder.
>>>>>
>>>>> Monitoring the data from Kafka all the way out to S3 seems unnecessary.
>>>>>
>>>>> On Wed, May 11, 2016 at 1:50 PM, Milind Vaidya <kava...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> It does not matter, in the sense that I am ready to upgrade if this is
>>>>>> on the roadmap.
>>>>>>
>>>>>> Nonetheless, the current versions are:
>>>>>>
>>>>>> kafka_2.9.2-0.8.1.1 apache-storm-0.9.4
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, May 11, 2016 at 5:53 AM, Abhishek Agarwal <
>>>>>> abhishc...@gmail.com> wrote:
>>>>>>
>>>>>>> Which version of storm-kafka are you using?
>>>>>>>
>>>>>>> On Wed, May 11, 2016 at 12:29 AM, Milind Vaidya <kava...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Anybody? Anything about this?
>>>>>>>>
>>>>>>>> On Wed, May 4, 2016 at 11:31 AM, Milind Vaidya <kava...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Is there any way I can know which Kafka offset corresponds to the
>>>>>>>>> current tuple I am processing in a bolt?
>>>>>>>>>
>>>>>>>>> Use case: I need to batch events from Kafka, persist them to a local
>>>>>>>>> file, and eventually upload the file to S3. To manage failure cases,
>>>>>>>>> I need to know the Kafka offset for each message so that it can be
>>>>>>>>> persisted to ZooKeeper and used when writing / uploading the file.
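>>>>>>>>>
>>>>>>>>> As far as I can tell, the stock storm-kafka Scheme only gets the raw
>>>>>>>>> message bytes in deserialize(byte[]), so the offset would not reach
>>>>>>>>> the bolt unless the spout or scheme is extended to emit it (I think
>>>>>>>>> newer storm-kafka versions add a metadata-aware scheme). Assuming it
>>>>>>>>> did arrive as extra tuple fields named "partition" and "offset"
>>>>>>>>> (hypothetical names), the per-file bookkeeping in the bolt might look
>>>>>>>>> roughly like:
>>>>>>>>>
>>>>>>>>> // inside execute(Tuple tuple); field names are hypothetical
>>>>>>>>> // firstOffset starts at Long.MAX_VALUE and lastOffset at -1 for each new file
>>>>>>>>> int partition = tuple.getIntegerByField("partition");
>>>>>>>>> long offset = tuple.getLongByField("offset");
>>>>>>>>> firstOffset = Math.min(firstOffset, offset);
>>>>>>>>> lastOffset = Math.max(lastOffset, offset);
>>>>>>>>> // on file close / upload:
>>>>>>>>> // offsetRangeStore.saveRange(topic, partition, firstOffset, lastOffset);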
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Abhishek Agarwal
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
