Hi Gopal,
Thanks for your explanation.
In what cases would SET hive.merge.orcfile.stripe.level=true followed by
alter table <table> concatenate not work? I have a dynamically
partitioned table (stored as ORC). I tried alter table ... concatenate
on one of its partitions, but it did not work. See my test result below.
hive> SET hive.merge.orcfile.stripe.level=true;
hive> alter table orc_merge5a partition(st=0.8) concatenate;
Starting Job = job_1424363133313_0053, Tracking URL =
http://service-test-1-2.testlocal:8088/proxy/application_1424363133313_0053/
Kill Command = /usr/hdp/2.2.0.0-2041/hadoop/bin/hadoop job -kill
job_1424363133313_0053
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2015-04-21 12:32:56,165 null map = 0%, reduce = 0%
2015-04-21 12:33:05,964 null map = 100%, reduce = 0%
Ended Job = job_1424363133313_0053
Loading data to table default.orc_merge5a partition (st=0.8)
Moved:
'hdfs://service-test-1-0.testlocal:8020/apps/hive/warehouse/orc_merge5a/st=0.8/000000_0'
to trash at:
hdfs://service-test-1-0.testlocal:8020/user/patcharee/.Trash/Current
Moved:
'hdfs://service-test-1-0.testlocal:8020/apps/hive/warehouse/orc_merge5a/st=0.8/000002_0'
to trash at:
hdfs://service-test-1-0.testlocal:8020/user/patcharee/.Trash/Current
Partition default.orc_merge5a{st=0.8} stats: [numFiles=2, numRows=0,
totalSize=1067, rawDataSize=0]
MapReduce Jobs Launched:
Stage-null: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 22.839 seconds
hive> dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orc_merge5a/st=0.8/;
Found 2 items
-rw-r--r-- 3 patcharee hdfs 534 2015-04-21 12:33
/apps/hive/warehouse/orc_merge5a/st=0.8/000000_0
-rw-r--r-- 3 patcharee hdfs 533 2015-04-21 12:33
/apps/hive/warehouse/orc_merge5a/st=0.8/000001_0
It seems nothing happened when I ran alter table ... concatenate: the
partition still contains two files. Any ideas?
BR,
Patcharee
On 21 April 2015 04:41, Gopal Vijayaraghavan wrote:
Hi,
How do I configure hive-site.xml to automatically merge small
ORC files (output from a MapReduce job) in Hive 0.14?
Hive cannot add work-stages to a map-reduce job.
Hive follows hive.merge.mapfiles=true only when Hive itself generates the
plan, by adding more work to the plan as a conditional task.
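For reference, a minimal sketch of the merge-related settings that apply to Hive-generated plans. The values are illustrative assumptions, not tuning advice, and none of them affect files written by an external MapReduce job:

```sql
-- Merge small files at the end of a map-only job (Hive-generated plans only).
SET hive.merge.mapfiles=true;
-- Merge small files at the end of a map-reduce job.
SET hive.merge.mapredfiles=true;
-- Run the conditional merge task when the average output file size
-- is below this many bytes (illustrative value).
SET hive.merge.smallfiles.avgsize=16000000;
-- Target size in bytes for each merged file (illustrative value).
SET hive.merge.size.per.task=256000000;
-- For ORC output, merge at the stripe level instead of rewriting rows.
SET hive.merge.orcfile.stripe.level=true;
```

These can be set per session as above or placed in hive-site.xml as the equivalent properties.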
-rwxr-xr-x 1 root hdfs 29072 2015-04-20 15:23
/apps/hive/warehouse/coordinate/zone=2/part-r-00000
This looks like it was written by an MRv2 Reducer and not by the Hive
FileSinkOperator, and was handled by the MR OutputCommitter instead of
the Hive MoveTask.
But 0.14 has an option which helps: "hive.merge.orcfile.stripe.level". If
that is true (like your setting), then do
"alter table <table> concatenate"
which effectively concatenates ORC blocks (without decompressing them),
while maintaining metadata linkage of start/end offsets in the footer.
Cheers,
Gopal