We are using Hive 0.14.

Our input file size is around 100 GB uncompressed.

We are inserting this data into a Hive table stored as ORC with ZLIB compression.

While inserting, we also set the following two parameters:

SET hive.exec.reducers.max=10;

SET mapred.reduce.tasks=5;


The output ORC file produced is about 10 GB compressed.
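For context, the insert looks roughly like the sketch below (table, column, and partition names are placeholders, not our real schema). Our understanding is that each reducer writes one ORC file per output partition, so the reducer settings effectively cap the number of files:

```sql
-- Placeholder names, shown only to illustrate how we apply the settings.
SET hive.exec.reducers.max=10;
SET mapred.reduce.tasks=5;

-- Each reducer writes one ORC file per output partition, so with 5 reduce
-- tasks this produces at most 5 files (before any merge step runs).
INSERT OVERWRITE TABLE orc_table PARTITION (part_col='p1')
SELECT col1, col2
FROM staging_table
DISTRIBUTE BY col1;  -- forces a reduce phase so the reducer settings apply
```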

Questions:

   1. How do we control the number of output ORC files?
   2. How do we control the size of the generated ORC files?
   3. When we end up with very large files (e.g. a 10 GB ORC file) and try
   to query the table, we get an exception in Hive (query and exception
   below).
   4. Will setting hive.exec.orc.default.block.size or
   hive.exec.orc.default.stripe.size to some lower value help to control the
   output file size?
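For question 4, this is what we are thinking of trying before the insert (the values are guesses we have not validated, and as far as we understand the stripe size changes the internal layout of the file rather than the total bytes written):

```sql
-- Untested: shrink ORC stripes and the ORC writer's HDFS block size.
-- Our understanding is these affect how the file is laid out and split
-- for reading, not how many bytes the insert produces in total.
SET hive.exec.orc.default.stripe.size=67108864;   -- 64 MB stripes (guessed value)
SET hive.exec.orc.default.block.size=134217728;   -- 128 MB blocks (guessed value)
```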


Is there any limitation in ORC on file size?

We have the following Hive properties set in Ambari:

hive.merge.size.per.task=256000000
hive.merge.orcfile.stripe.level=true
hive.merge.mapfiles=true
hive.merge.mapredfiles=true
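To check the actual file sizes on disk we run the following from the Hive CLI (the warehouse path and names are placeholders for our real location):

```sql
-- Placeholder path; lists the ORC files and their sizes for one partition.
dfs -ls /apps/hive/warehouse/mydb.db/orc_table/part_col=p1;
```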



Reading Query

Select * from table where partition="big_file_size"


Exception

P-524264982-127.0.0.1-1429020129249:blk_1091744762_18097939): PathInfo{path=, state=UNUSABLE} is not usable for short circuit; giving up on BlockReaderLocal.

15/10/21 11:30:02 [LeaseRenewer:d760770@tdcdv2]: DEBUG hdfs.LeaseRenewer: Lease renewer daemon for [] with renew id 1 executed

15/10/21 11:30:04 [ORC_GET_SPLITS #1]: ERROR orc.OrcInputFormat: Unexpected Exception
java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.setIncludedColumns(OrcInputFormat.java:260)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:779)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Failed with exception java.io.IOException:java.lang.RuntimeException: serious problem

15/10/21 11:30:04 [main]: ERROR CliDriver: Failed with exception java.io.IOException:java.lang.RuntimeException: serious problem
java.io.IOException: java.lang.RuntimeException: serious problem
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:663)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:561)
	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
	at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1623)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:267)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.RuntimeException: serious problem
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$Context.waitForTasks(OrcInputFormat.java:478)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:949)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:974)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:442)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:588)
	... 15 more
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.setIncludedColumns(OrcInputFormat.java:260)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:779)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)


15/10/21 11:30:04 [main]: INFO exec.TableScanOperator: 0 finished. closing...
15/10/21 11:30:04 [main]: DEBUG exec.TableScanOperator: Closing child = SEL[2]
15/10/21 11:30:04 [main]: DEBUG exec.SelectOperator: allInitializedParentsAreClosed? parent.state = CLOSE
15/10/21 11:30:04 [main]: INFO exec.SelectOperator: 2 finished. closing...
15/10/21 11:30:04 [main]: DEBUG exec.SelectOperator: Closing child = LIM[3]
