Not sure what you are trying to do, but the HDFS sink already appends: it keeps writing to the currently open file until a roll-over condition is met, then starts a new file. So you just have to decide on your roll-over strategy. Instead of rolling every few minutes, you can set hdfs.rollInterval=0 (which disables time-based rolling) and set hdfs.rollSize to however large, in bytes, you want a file to get before the sink rolls over to a new one. You can also use hdfs.rollCount to roll over after a certain number of events. Note that hdfs.rollCount defaults to 10 events, so set it to 0 as well if you only want size-based rolling. I use rollSize for my roll-over strategy.
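
As a rough sketch of the size-based strategy described above, the sink section of an agent config might look like the following. The agent, sink, and channel names (agent1, hdfsSink, memChannel), the path, and the 128 MB threshold are placeholders I made up for illustration; only the hdfs.roll* properties come from the discussion above.

```properties
# Hypothetical names: agent1, hdfsSink, memChannel. Adjust to your own agent.
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = memChannel
# Placeholder path, not taken from the original thread.
agent1.sinks.hdfsSink.hdfs.path = /flume/events

# Disable time-based rolling.
agent1.sinks.hdfsSink.hdfs.rollInterval = 0
# Roll when the file reaches roughly 128 MB (value is in bytes; example only).
agent1.sinks.hdfsSink.hdfs.rollSize = 134217728
# Disable event-count rolling so that size is the only trigger.
agent1.sinks.hdfsSink.hdfs.rollCount = 0
```

With all three set this way, the sink should keep writing to one file until it crosses the size threshold. Pick a rollSize that suits your HDFS block size and how long you can tolerate a file staying open.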

On Tue, Apr 8, 2014 at 8:35 PM, Pritchard, Charles X. -ND <[email protected]> wrote:

> Exploring the idea of using "append" instead of creating new files with
> HDFS every few minutes.
> Are there particular design decisions / considerations?
>
> There's certainly a history of append with HDFS, mainly, earlier versions
> of Hadoop warn strongly against using file append semantics.
>
> -Charles
