In case of a failure to upload a file, or disk corruption that loses a
file, we only have the current offset in the Kafka spout; there is no
record of which offsets were in the lost file and need to be replayed.
So those offsets can be stored externally in ZooKeeper and used to
account for the lost data. To save them in ZK, they have to be available
in a bolt.
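
For reference, a rough sketch of what that bookkeeping could look like
from a bolt, using Curator as the ZK client. The path layout, class and
field names here are illustrative assumptions, not anything storm-kafka
provides out of the box:

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class OffsetBookkeeper {
        private final CuratorFramework client;

        public OffsetBookkeeper(String zkConnect) {
            client = CuratorFrameworkFactory.newClient(
                    zkConnect, new ExponentialBackoffRetry(1000, 3));
            client.start();
        }

        // Record the offset range covered by a local file so it can be
        // replayed if the upload later fails. The znode layout is a
        // hypothetical choice for this sketch.
        public void recordFileOffsets(String topic, int partition,
                long firstOffset, long lastOffset, String fileName)
                throws Exception {
            String path = String.format("/offset-tracking/%s/%d/%s",
                    topic, partition, fileName);
            byte[] data = (firstOffset + "-" + lastOffset).getBytes("UTF-8");
            if (client.checkExists().forPath(path) == null) {
                client.create().creatingParentsIfNeeded().forPath(path, data);
            } else {
                client.setData().forPath(path, data);
            }
        }
    }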

On Wed, May 11, 2016 at 11:10 AM, Nathan Leung <ncle...@gmail.com> wrote:

> Why not just ack the tuple once it's been written to a file?  If your
> topology fails then the data will be re-read from Kafka.  Kafka spout
> already does this for you.  Then uploading files to S3 is the
> responsibility of another job.  For example, a storm topology that monitors
> the output folder.
>
> Monitoring the data from Kafka all the way out to S3 seems unnecessary.
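>
> A rough sketch of that pattern, with hypothetical file paths and minimal
> error handling (Storm 0.9.x API):
>
>     import java.io.BufferedWriter;
>     import java.io.FileWriter;
>     import java.io.IOException;
>     import java.util.Map;
>
>     import backtype.storm.task.OutputCollector;
>     import backtype.storm.task.TopologyContext;
>     import backtype.storm.topology.OutputFieldsDeclarer;
>     import backtype.storm.topology.base.BaseRichBolt;
>     import backtype.storm.tuple.Tuple;
>
>     public class FileWriterBolt extends BaseRichBolt {
>         private OutputCollector collector;
>         private BufferedWriter writer;
>
>         @Override
>         public void prepare(Map conf, TopologyContext context,
>                 OutputCollector collector) {
>             this.collector = collector;
>             try {
>                 writer = new BufferedWriter(
>                         new FileWriter("/data/out/current.part", true));
>             } catch (IOException e) {
>                 throw new RuntimeException(e);
>             }
>         }
>
>         @Override
>         public void execute(Tuple tuple) {
>             try {
>                 writer.write(tuple.getString(0));
>                 writer.newLine();
>                 writer.flush();
>                 collector.ack(tuple);   // ack only once the write succeeded
>             } catch (IOException e) {
>                 collector.fail(tuple);  // failed tuples are replayed from Kafka
>             }
>         }
>
>         @Override
>         public void declareOutputFields(OutputFieldsDeclarer declarer) {
>             // terminal bolt, emits nothing
>         }
>     }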
>
> On Wed, May 11, 2016 at 1:50 PM, Milind Vaidya <kava...@gmail.com> wrote:
>
>> It does not matter, in the sense that I am ready to upgrade if this
>> feature is on the roadmap.
>>
>> Nonetheless:
>>
>> kafka_2.9.2-0.8.1.1, apache-storm-0.9.4
>>
>> On Wed, May 11, 2016 at 5:53 AM, Abhishek Agarwal <abhishc...@gmail.com>
>> wrote:
>>
>>> Which version of storm-kafka are you using?
>>>
>>> On Wed, May 11, 2016 at 12:29 AM, Milind Vaidya <kava...@gmail.com>
>>> wrote:
>>>
>>>> Anybody? Anything about this?
>>>>
>>>> On Wed, May 4, 2016 at 11:31 AM, Milind Vaidya <kava...@gmail.com>
>>>> wrote:
>>>>
>>>>> Is there any way I can know which Kafka offset corresponds to the
>>>>> current tuple I am processing in a bolt?
>>>>>
>>>>> Use case: I need to batch events from Kafka, persist them to a local
>>>>> file, and eventually upload that file to S3. To handle failure cases, I
>>>>> need to know the Kafka offset for each message, so that it can be
>>>>> persisted to Zookeeper and used when writing / uploading the file.
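>>>>>
>>>>> For context, what I am after in the bolt would look roughly like the
>>>>> sketch below; "partition" and "offset" are hypothetical fields that some
>>>>> custom spout scheme would have to emit alongside the message (the stock
>>>>> storm-kafka StringScheme emits only the "str" field):
>>>>>
>>>>>     @Override
>>>>>     public void execute(Tuple tuple) {
>>>>>         String msg = tuple.getStringByField("str");
>>>>>         // assumed extra fields from a custom deserialization scheme
>>>>>         int partition = tuple.getIntegerByField("partition");
>>>>>         long offset = tuple.getLongByField("offset");
>>>>>         // ... append msg to the current batch file and track the
>>>>>         // (partition, offset) range it covers ...
>>>>>     }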
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Abhishek Agarwal
>>>
>>>
>>
>
