Tx aaron

On Dec 1, 2015 1:54 AM, "Aaron.Dossett" <aaron.doss...@target.com> wrote:
> Well, not all of the reasons were entirely unrelated:
>
> - If data stopped flowing from Kafka completely then a rotation might not
> happen for a very long time, and I wanted to guarantee time bounds on when
> I processed files. A time-based rotation policy would have addressed this,
> but that was not desirable for other reasons.
> - If the topology or bolt completely crashed and restarted, rotation
> actions would never be triggered, as I understand it, on the files that
> were open at the time of the crash. It's for this reason that I have never
> used rotation actions.
>
> -Aaron
>
> From: Aaron Dossett <aaron.doss...@target.com>
> Reply-To: "user@storm.apache.org" <user@storm.apache.org>
> Date: Monday, November 30, 2015 at 2:08 PM
> To: "user@storm.apache.org" <user@storm.apache.org>
> Subject: Re: Writing file to storm hdfs
>
> No, I have a separate process that runs periodically and determines which
> files haven't been processed before. Hooking directly into the rotation
> wasn't an option for me for unrelated reasons.
>
> From: Gaurav Agarwal <gaurav130...@gmail.com>
> Reply-To: "user@storm.apache.org" <user@storm.apache.org>
> Date: Monday, November 30, 2015 at 2:06 PM
> To: "user@storm.apache.org" <user@storm.apache.org>
> Subject: Re: Writing file to storm hdfs
>
> Hello Aaron,
> Please correct me if I am wrong: you start processing files as soon as
> they are written and rotated by the HDFS bolt.
>
> On Dec 1, 2015 12:41 AM, "Aaron.Dossett" <aaron.doss...@target.com> wrote:
>
>> I recently had to solve a use case like that. I decided to track which
>> files I had processed instead of records within each file. If a file is
>> still open for writing you could ignore it and come back for it later,
>> or insert it more than once if your process is idempotent.
>>
>> From: Gaurav Agarwal <gaurav130...@gmail.com>
>> Reply-To: "user@storm.apache.org" <user@storm.apache.org>
>> Date: Monday, November 30, 2015 at 1:01 PM
>> To: "user@storm.apache.org" <user@storm.apache.org>
>> Subject: Writing file to storm hdfs
>>
>> Hello
>>
>> In the Storm topology we are receiving millions of tuples from Kafka and
>> have to perform some calculations in a bolt. In parallel we have a bolt
>> that writes into HDFS; its parallelism hint is 8, so there will be 8
>> files.
>> The problem is that once the snapshot data is enriched, written to the
>> multiple files, and completed, we have to trigger another job that copies
>> the records from the files into a database.
>> With multiple files being created and bolts writing to them in parallel,
>> how can we find which is the last record written, so that we can trigger
>> the next job? Any ideas?
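For reference, a minimal sketch of the HdfsBolt setup being discussed, using the storm-hdfs API; the NameNode URL, paths, and rotation interval are placeholders. This is the combination Aaron is wary of: the timed policy bounds how long a file stays open if Kafka goes quiet, but the MoveFileAction only fires on rotation, so files open at crash time are never moved:

```java
import org.apache.storm.hdfs.bolt.HdfsBolt;
import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
import org.apache.storm.hdfs.bolt.rotation.TimedRotationPolicy;
import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;
import org.apache.storm.hdfs.common.rotation.MoveFileAction;

public class HdfsBoltConfig {
    public static HdfsBolt build() {
        return new HdfsBolt()
                .withFsUrl("hdfs://namenode:8020") // placeholder NameNode URL
                .withFileNameFormat(new DefaultFileNameFormat().withPath("/storm/open/"))
                .withRecordFormat(new DelimitedRecordFormat().withFieldDelimiter("|"))
                .withSyncPolicy(new CountSyncPolicy(1000))
                // time-based rotation bounds how long a file can sit open
                // if Kafka stops delivering (Aaron's first concern)
                .withRotationPolicy(new TimedRotationPolicy(10.0f, TimedRotationPolicy.TimeUnit.MINUTES))
                // the action only runs on rotation, so files open when the
                // topology crashes are never moved (Aaron's second concern)
                .addRotationAction(new MoveFileAction().toDestination("/storm/done/"));
    }
}
```

And a sketch of the separate periodic process Aaron describes: scan the bolt's output directory, skip files the bolt still has open, and hand closed, not-yet-seen files to the database-load job. ProcessedFileTracker and triggerLoadJob are made-up names, the in-memory set stands in for whatever durable record of processed files you keep, and isFileClosed assumes Hadoop 2.x with a DistributedFileSystem handle:

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class ProcessedFileTracker {

    // in practice, persist this (e.g. a DB table) so restarts don't re-load files
    private final Set<String> processed = new HashSet<>();

    public void scan(DistributedFileSystem fs, Path outputDir) throws IOException {
        for (FileStatus status : fs.listStatus(outputDir)) {
            Path file = status.getPath();
            if (processed.contains(file.getName())) {
                continue; // already handed off to the load job
            }
            // isFileClosed() is false while a writer (the HdfsBolt) still
            // holds the lease, i.e. the file may still grow -- come back later
            if (!fs.isFileClosed(file)) {
                continue;
            }
            triggerLoadJob(file);
            processed.add(file.getName());
        }
    }

    private void triggerLoadJob(Path file) {
        // placeholder: enqueue the path for the job that copies records into the DB
        System.out.println("ready for load: " + file);
    }
}
```

Because the tracker keys on whole files rather than records, it sidesteps the "which is the last record" question entirely, and re-running it is safe as long as the load job is idempotent, as Aaron notes.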