I am running a few tests and would like to confirm whether this is true...

hdfs.codeC = gzip
hdfs.fileType = CompressedStream
hdfs.writeFormat = Text
hdfs.batchSize = 100


Now let's assume I have a large number of transactions and I roll the file
every 10 minutes.
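For context, the relevant sink settings look roughly like this (a sketch only; agent1 and hdfsSink are placeholder names, and rollInterval = 600 is the 10-minute roll, with size- and count-based rolling disabled):

agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.hdfs.codeC = gzip
agent1.sinks.hdfsSink.hdfs.fileType = CompressedStream
agent1.sinks.hdfsSink.hdfs.writeFormat = Text
agent1.sinks.hdfsSink.hdfs.batchSize = 100
agent1.sinks.hdfsSink.hdfs.rollInterval = 600
agent1.sinks.hdfsSink.hdfs.rollSize = 0
agent1.sinks.hdfsSink.hdfs.rollCount = 0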

It seems the .tmp file stays at 0 bytes and is flushed all at once after 10
minutes, whereas if I don't use compression, the file grows as data is
written to HDFS.

Is this correct?

Do you see any drawbacks to using CompressedStream with very large files?
In my case a 120 MB compressed file (the HDFS block size) is roughly 10x
that size uncompressed.
