Hi Gopal,

Thanks for your explanation.

In what cases would SET hive.merge.orcfile.stripe.level=true followed by alter table <table> concatenate not work? I have a dynamically partitioned table (stored as ORC). I ran alter table ... concatenate on one partition, but it had no effect. See my test results below.

hive> SET hive.merge.orcfile.stripe.level=true;
hive> alter table orc_merge5a partition(st=0.8) concatenate;
Starting Job = job_1424363133313_0053, Tracking URL = http://service-test-1-2.testlocal:8088/proxy/application_1424363133313_0053/
Kill Command = /usr/hdp/2.2.0.0-2041/hadoop/bin/hadoop job -kill job_1424363133313_0053
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2015-04-21 12:32:56,165 null map = 0%,  reduce = 0%
2015-04-21 12:33:05,964 null map = 100%,  reduce = 0%
Ended Job = job_1424363133313_0053
Loading data to table default.orc_merge5a partition (st=0.8)
Moved: 'hdfs://service-test-1-0.testlocal:8020/apps/hive/warehouse/orc_merge5a/st=0.8/000000_0' to trash at: hdfs://service-test-1-0.testlocal:8020/user/patcharee/.Trash/Current
Moved: 'hdfs://service-test-1-0.testlocal:8020/apps/hive/warehouse/orc_merge5a/st=0.8/000002_0' to trash at: hdfs://service-test-1-0.testlocal:8020/user/patcharee/.Trash/Current
Partition default.orc_merge5a{st=0.8} stats: [numFiles=2, numRows=0, totalSize=1067, rawDataSize=0]
MapReduce Jobs Launched:
Stage-null:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 22.839 seconds
hive> dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orc_merge5a/st=0.8/;
Found 2 items
-rw-r--r-- 3 patcharee hdfs 534 2015-04-21 12:33 /apps/hive/warehouse/orc_merge5a/st=0.8/000000_0
-rw-r--r-- 3 patcharee hdfs 533 2015-04-21 12:33 /apps/hive/warehouse/orc_merge5a/st=0.8/000001_0

It seems nothing happened when I ran alter table ... concatenate: the partition still contains two files. Any ideas?

BR,
Patcharee

On 21 April 2015 04:41, Gopal Vijayaraghavan wrote:
Hi,

> How to set the configuration hive-site.xml to automatically merge small
> orc file (output from mapreduce job) in hive 0.14 ?

Hive cannot add work-stages to a map-reduce job.

Hive honors hive.merge.mapfiles=true when Hive itself generates the plan,
by adding more work to the plan as a conditional task.
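
As a rough sketch of what that looks like for a job Hive plans itself, the
merge-related settings can be set in the session before the insert. The
property names are standard Hive settings, but the threshold values are only
illustrative and the table names (target_orc, source_table) are hypothetical:

```sql
-- Ask Hive to append the conditional merge task to plans it generates.
-- Merge outputs of map-only jobs:
SET hive.merge.mapfiles=true;
-- Merge outputs of map-reduce jobs:
SET hive.merge.mapredfiles=true;
-- Illustrative thresholds in bytes: merge when the average output file is
-- smaller than avgsize, aiming for merged files of about size.per.task.
SET hive.merge.smallfiles.avgsize=16000000;
SET hive.merge.size.per.task=256000000;

-- Hypothetical tables. The merge step applies only because Hive runs
-- this insert; files written by an external MR job are not affected.
INSERT OVERWRITE TABLE target_orc
SELECT * FROM source_table;
```

Comments are kept on their own lines because the Hive CLI may not handle
trailing comments after a statement.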

> -rwxr-xr-x   1 root hdfs      29072 2015-04-20 15:23
> /apps/hive/warehouse/coordinate/zone=2/part-r-00000

This looks like it was written by an MRv2 Reducer rather than by the Hive
FileSinkOperator, and was handled by the MR OutputCommitter instead of the
Hive MoveTask.

But 0.14 has an option which helps: "hive.merge.orcfile.stripe.level". If
that is true (like your setting), then do

"alter table <table> concatenate"

which effectively concatenates ORC blocks (without decompressing them),
while maintaining metadata linkage of start/end offsets in the footer.

Cheers,
Gopal


