Hi there,

We've been encountering the following exception:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveFatalException: [Error 20004]: Fatal error occurred when node tried to create too many dynamic partitions. The maximum number of dynamic partitions is controlled by hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode. Maximum was set to: 100

on a very small dataset (180 lines), using the following setup:

    CREATE TABLE enriched_data (
      enriched_json_data string
    )
    PARTITIONED BY (yyyy string, mm string, dd string, identifier string, sub_identifier string, unique_run_id string)
    CLUSTERED BY (enriched_json_data) INTO 128 BUCKETS
    LOCATION "${OUTDIR}";

    INSERT OVERWRITE TABLE enriched_data PARTITION (yyyy, mm, dd, identifier, sub_identifier, unique_run_id)
    SELECT …

We've not seen this issue before (normally our dataset is billions of lines); here it is a very small amount of data that triggers it.

After looking at the code, it appears that this is the condition that fails:

https://github.com/apache/hive/blob/branch-0.13/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L745

I downloaded and rebuilt the branch with a bit of debugging/stdout printing of the contents of the valToPaths map, and the check fails because there are 101 entries in it. All the entries look like this:

    yyyy=2015/mm=04/dd=09/identifier=1/sub-identifier=3/unique_run_id=df-345345/000047_0
    yyyy=2015/mm=04/dd=09/identifier=1/sub-identifier=3/unique_run_id=df-345345/000048_0
    yyyy=2015/mm=04/dd=09/identifier=1/sub-identifier=3/unique_run_id=df-345345/000049_0
    yyyy=2015/mm=04/dd=09/identifier=1/sub-identifier=3/unique_run_id=df-345345/000051_0
    ….

We're just confused as to why Hive considers the final part of the output path (e.g. 000047_0) to be a "dynamic partition", as it is not in our PARTITIONED BY clause.

The only explanation I can think of is that the CLUSTERED BY ... INTO 128 BUCKETS clause, combined with the dataset being so small (180 lines), means everything is processed by a single reducer task, and because the hash of each line distributes the rows fairly uniformly, that one reducer ends up with more than 100 bucket files to write.

Any help would be greatly appreciated.

With thanks,

Daniel Harper
Software Engineer, OTG
ANT BC5 A5
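
P.S. In case it helps: we can presumably work around this for now by raising the limits named in the error at the session level before running the INSERT, along these lines (the value 1000 is just an illustrative guess on our part, chosen to be comfortably above the 128 bucket files a single reducer might write per partition):

    -- raise both dynamic-partition limits above the number of bucket files
    -- one reducer may create; 1000 is an arbitrary illustrative value
    SET hive.exec.max.dynamic.partitions=1000;
    SET hive.exec.max.dynamic.partitions.pernode=1000;

That said, we'd still like to understand why the bucket-file suffix is being counted against these limits in the first place.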