Re: ORC file tuning

2013-12-30 Thread Yin Huai
Hi Avrilia, in org.apache.hadoop.hive.ql.io.orc.WriterImpl, the block size is determined by Math.min(1.5GB, 2 * stripeSize). Also, you can set orc.block.padding in the table properties to control whether the writer pads HDFS blocks to prevent stripes from straddling blocks. The default value of
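
To make the formula quoted above concrete, here is a minimal Java sketch of the same arithmetic. It is not the actual WriterImpl code: the class name OrcBlockSizeSketch, the method blockSize, and the constant MAX_BLOCK_SIZE are hypothetical names chosen for illustration, and the 1.5GB cap is assumed to mean 1536 MB.

```java
// Illustrative sketch only; not the real Hive WriterImpl implementation.
class OrcBlockSizeSketch {
    // Assumed interpretation of the "1.5GB" cap mentioned in the thread.
    static final long MAX_BLOCK_SIZE = 1536L * 1024 * 1024;

    // Block size as described in the thread: min(1.5GB, 2 * stripeSize).
    static long blockSize(long stripeSize) {
        return Math.min(MAX_BLOCK_SIZE, 2 * stripeSize);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // 250 MB is the default stripe size quoted in the original question.
        long[] stripeSizes = {64 * mb, 250 * mb, 1024 * mb};
        for (long stripe : stripeSizes) {
            System.out.println(stripe / mb + " MB stripe -> "
                    + blockSize(stripe) / mb + " MB block");
        }
    }
}
```

Under these assumptions, a 250 MB stripe gives a 500 MB block, and the cap only takes effect once the stripe size exceeds 768 MB.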

ORC file tuning

2013-12-29 Thread Avrilia Floratou
Hi all, I'm using Hive 0.12 and running some experiments with the ORC file format. The HDFS block size is 128MB, and I was wondering what the best stripe size to use is. The default one (250MB) is larger than the block size. Is each stripe splittable, or in this case will each map task have to access data