On Fri, May 25, 2012 at 9:30 AM, Harsh J ha...@cloudera.com wrote:
Mohit,
Not if you call sync (or hflush/hsync in 2.0) periodically to persist
your changes to the file. SequenceFile doesn't currently have a
sync API built into it (in 1.0 at least), but you can call sync on the
underlying output stream instead at the moment.
I cc'd flume-u...@incubator.apache.org because I don't know if Mohit subscribed
there.
Mohit,
you could use Avro to serialize the data and send it to a Flume Avro source.
Or you could use syslog - both are supported in Flume 1.x.
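For reference, a minimal Flume 1.x Avro source stanza might look like the
following (the agent, source, and channel names are illustrative, not from
the thread):

```properties
# Illustrative Flume 1.x agent: an Avro source feeding a memory channel.
agent.sources = avroSrc
agent.channels = memCh

agent.sources.avroSrc.type = avro
agent.sources.avroSrc.bind = 0.0.0.0
agent.sources.avroSrc.port = 41414
agent.sources.avroSrc.channels = memCh

agent.channels.memCh.type = memory
```

An Avro client (or the avro-client command shipped with Flume) can then send
serialized events to that port.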
SequenceFile.Writer#syncFs is in Hadoop 1.0.0 (actually since
0.20.205); it calls the underlying FSDataOutputStream#sync, which is
actually hflush semantically (data is not durable in case of a
datacenter-wide power outage). The hsync implementation is not yet in
2.0; HDFS-744 only just brought hsync in.
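A minimal sketch of the periodic syncFs pattern described above, assuming
Hadoop 1.0 on the classpath (the path, key/value types, and flush interval
are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SyncFsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, new Path("/tmp/events.seq"),
                LongWritable.class, Text.class);
        try {
            for (long i = 0; i < 1000; i++) {
                writer.append(new LongWritable(i), new Text("event-" + i));
                if (i % 100 == 0) {
                    // hflush semantics: flushed data becomes visible to
                    // readers, but is not guaranteed durable across a
                    // datacenter-wide power outage.
                    writer.syncFs();
                }
            }
        } finally {
            writer.close();
        }
    }
}
```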
Thanks for correcting me there on the syncFs call, Luke. I seem to
have missed that method when searching the branch-1 code.
On Thu, May 31, 2012 at 6:54 AM, Luke Lu l...@apache.org wrote:
SequenceFile.Writer#syncFs is in Hadoop 1.0.0 (actually since
0.20.205), which calls the underlying FSDataOutputStream#sync.
Mohit,
Not if you call sync (or hflush/hsync in 2.0) periodically to persist
your changes to the file. SequenceFile doesn't currently have a
sync API built into it (in 1.0 at least), but you can call sync on the
underlying output stream instead at the moment. This is possible to do
in 1.0 (just
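A sketch of the workaround described above, assuming Hadoop 1.0: create the
FSDataOutputStream yourself so you keep a handle to it, pass it to
SequenceFile.createWriter, and call sync on the stream directly (the path
and key/value types are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class UnderlyingStreamSync {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Open the output stream ourselves so we keep a reference to it;
        // the SequenceFile.Writer API in 1.0 doesn't expose it.
        FSDataOutputStream out = fs.create(new Path("/tmp/events.seq"));
        SequenceFile.Writer writer = SequenceFile.createWriter(
                conf, out, LongWritable.class, Text.class,
                SequenceFile.CompressionType.NONE, null);
        try {
            writer.append(new LongWritable(1L), new Text("event"));
            // hflush-like persistence in 1.0: data reaches the datanodes,
            // though it is not guaranteed to be on disk.
            out.sync();
        } finally {
            writer.close();
        }
    }
}
```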