Re: Writing click stream data to hadoop

2012-05-30 Thread Mohit Anchlia
On Fri, May 25, 2012 at 9:30 AM, Harsh J ha...@cloudera.com wrote: Mohit, Not if you call sync (or hflush/hsync in 2.0) periodically to persist your changes to the file. SequenceFile doesn't currently have a sync API built into it (in 1.0 at least), but you can call sync on the underlying

Re: Writing click stream data to hadoop

2012-05-30 Thread alo alt
I cc'd flume-u...@incubator.apache.org because I don't know if Mohit is subscribed there. Mohit, you could use Avro to serialize the data and send it to a Flume Avro source, or you could use syslog; both are supported in Flume 1.x.
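For the Avro-source route, a Flume 1.x agent configuration might look like the sketch below. The agent name ("agent1"), component names, bind address, port, and HDFS path are all illustrative placeholders, not values from this thread:

```properties
# Hypothetical Flume 1.x agent; names, port, and path are illustrative.
agent1.sources = avroSrc
agent1.channels = memCh
agent1.sinks = hdfsSink

# Avro source: clients serialize events with Avro and send them here.
agent1.sources.avroSrc.type = avro
agent1.sources.avroSrc.bind = 0.0.0.0
agent1.sources.avroSrc.port = 41414
agent1.sources.avroSrc.channels = memCh

agent1.channels.memCh.type = memory

# HDFS sink writes the click stream into Hadoop.
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode/clickstream/
agent1.sinks.hdfsSink.channel = memCh
```

An application would then use Flume's Avro client (or the embedded agent) to deliver events to port 41414, and Flume handles batching and the HDFS writes.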

Re: Writing click stream data to hadoop

2012-05-30 Thread Luke Lu
SequenceFile.Writer#syncFs is in Hadoop 1.0.0 (actually since 0.20.205), and it calls the underlying FSDataOutputStream#sync, which is actually hflush semantically (data is not durable in the case of a data-center-wide power outage). The hsync implementation is not yet in 2.0. HDFS-744 just brought hsync in
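A minimal sketch of the syncFs call Luke describes, assuming Hadoop 1.x (hadoop-core on the classpath); the path, key/value types, and batch size are illustrative, not from the thread:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ClickStreamWriter {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Illustrative path and record types.
        Path path = new Path("/clickstream/events.seq");
        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, path, LongWritable.class, Text.class);
        for (long i = 0; i < 1000; i++) {
            writer.append(new LongWritable(i), new Text("click-" + i));
            if (i % 100 == 0) {
                // Available since 0.20.205 / 1.0.0. Hflush semantics:
                // data reaches the datanodes but is not guaranteed to be
                // on disk, so a data-center-wide power loss can lose it.
                writer.syncFs();
            }
        }
        writer.close();
    }
}
```

Periodic syncFs keeps the file readable up to the last flush point even if the writer process dies before close().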

Re: Writing click stream data to hadoop

2012-05-30 Thread Harsh J
Thanks for correcting me there on the syncFs call, Luke. I seem to have missed that method when searching the branch-1 code. On Thu, May 31, 2012 at 6:54 AM, Luke Lu l...@apache.org wrote: SequenceFile.Writer#syncFs is in Hadoop 1.0.0 (actually since 0.20.205), which calls the underlying

Re: Writing click stream data to hadoop

2012-05-25 Thread Harsh J
Mohit, Not if you call sync (or hflush/hsync in 2.0) periodically to persist your changes to the file. SequenceFile doesn't currently have a sync API built into it (in 1.0 at least), but you can call sync on the underlying output stream instead at the moment. This is possible to do in 1.0 (just
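One way to "call sync on the underlying output stream" in Hadoop 1.0, as Harsh suggests, is to open the FSDataOutputStream yourself and hand it to the SequenceFile.Writer, keeping your own reference for the sync calls. A sketch under that assumption; the path and types are illustrative:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.Text;

public class UnderlyingStreamSync {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Open the stream ourselves so we keep a handle to sync on.
        FSDataOutputStream out =
                fs.create(new Path("/clickstream/events.seq"));
        SequenceFile.Writer writer = SequenceFile.createWriter(
                conf, out, LongWritable.class, Text.class,
                CompressionType.NONE, null);
        writer.append(new LongWritable(1L), new Text("click"));
        // sync() on the stream flushes buffered data to the datanodes
        // (hflush semantics in 1.0), so readers can see it before close.
        out.sync();
        writer.close();
    }
}
```

This is the workaround for 1.0; on 1.0.0 and later the writer.syncFs() method mentioned later in the thread does the same thing without needing the extra stream reference.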