Hi Avrilia,
In org.apache.hadoop.hive.ql.io.orc.WriterImpl, the block size is
determined by Math.min(1.5GB, 2 * stripeSize). You can also set the
orc.block.padding table property to control whether the writer pads
HDFS blocks to prevent stripes from straddling block boundaries. The
default value of orc.block.padding is true.
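To make the sizing rule concrete, here is a minimal Java sketch of the min(1.5GB, 2 * stripeSize) computation described above. The class and method names are illustrative only, not the actual WriterImpl internals:

```java
// Sketch of the HDFS block size choice described above:
// the ORC writer picks min(1.5 GB, 2 * stripeSize).
public class OrcBlockSizeSketch {
    static final long ONE_MB = 1024L * 1024;
    static final long ONE_GB = 1024L * ONE_MB;

    // Hypothetical helper mirroring the formula; stripeSize is in bytes.
    static long blockSizeFor(long stripeSize) {
        return Math.min((long) (1.5 * ONE_GB), 2 * stripeSize);
    }

    public static void main(String[] args) {
        // With the default 250MB stripe, 2 * 250MB = 500MB is below the
        // 1.5GB cap, so the block size comes out as 500MB.
        long defaultStripe = 250L * ONE_MB;
        System.out.println(blockSizeFor(defaultStripe) / ONE_MB + " MB");
    }
}
```

So with the defaults, each stripe pair fits one 500MB block; only very large stripe settings (above 768MB) hit the 1.5GB cap.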
Hi all,
I'm using Hive 0.12 and running some experiments with ORC files. The
HDFS block size is 128MB, and I was wondering what the best stripe size
to use is. The default one (250MB) is larger than the block size. Is
each stripe splittable, or in this case will each map task have to access data