I am running a few tests and would like to confirm whether this is true. My HDFS sink is configured with:

    hdfs.codeC = gzip
    hdfs.fileType = CompressedStream
    hdfs.writeFormat = Text
    hdfs.batchSize = 100
Now let's assume I have a large number of transactions and I roll the file every 10 minutes. It seems that with compression the .tmp file stays at 0 bytes and is flushed all at once after 10 minutes, whereas without compression the file grows as data is written to HDFS. Is this correct?

Do you see any drawback in using CompressedStream with very large files? In my case a 120 MB compressed file (the block size) is about 10x that size uncompressed.
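For context, here is a minimal sketch of the roll settings behind "roll the file every 10 minutes", assuming this is a Flume HDFS sink; the agent and sink names (agent1, hdfs-sink) are hypothetical:

    # hypothetical agent/sink names; roll purely on a 10-minute interval
    agent1.sinks.hdfs-sink.type = hdfs
    agent1.sinks.hdfs-sink.hdfs.rollInterval = 600
    agent1.sinks.hdfs-sink.hdfs.rollSize = 0
    agent1.sinks.hdfs-sink.hdfs.rollCount = 0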
