Hi all, I have a query that involves 2 tables, t1 with 45M rows (not partitioned) and t2 with 61M rows (partitioned).
If I disable the parameter hive.auto.convert.join.noconditionaltask, the query runs well. If I enable it with the default value of hive.auto.convert.join.noconditionaltask.size (1145324612), the query fails. The EXPLAIN output tells me that the optimizer does a broadcast join because, based on table statistics, t1 should be eligible for broadcasting (Data size: 564604302 bytes). In fact, if I set hive.auto.convert.join.noconditionaltask.size to a value less than 564604302, the optimizer doesn't convert the join and the query works well.

The problem is that, with auto convert join enabled and default values, the query fails because of JVM errors on the Tez containers. If I look at the Map vertex's metrics, I see the following:

OUTPUT_RECORDS 42733890
OUTPUT_BYTES 1978132995
OUTPUT_BYTES_PHYSICAL 404057386
OUTPUT_BYTES_WITH_OVERHEAD 2063600823

So my question is: is it possible that the statistics on the table are incorrect, so that the optimizer thinks the resulting table can be broadcast, but at run time it turns out to have more bytes and breaks the task's JVM?

I already tried running ANALYZE TABLE t1 COMPUTE STATISTICS FOR COLUMNS (c1, c2,...) but nothing changed. Also, in the EXPLAIN output, I see that the optimizer doesn't use column statistics:

Statistics: Num rows: 176037 Data size: 564604302 Basic stats: COMPLETE Column stats: NONE

tez.task.size is 4096 MB, and increasing it is not an option. Also, if possible, I don't want to decrease hive.auto.convert.join.noconditionaltask.size, because it is set to 26% of tez.task.size, as recommended by best practices.

I'm using Hive 1.2.

Thank you

DXC Technology Company
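
P.S. To make the size mismatch concrete, here is a small sketch of the arithmetic behind my question. It only compares the figures quoted in this message and does not query Hive or Tez. The helper fits_broadcast is a hypothetical name and a simplification, not Hive's actual conversion logic. I am also assuming that the Map vertex's OUTPUT_BYTES is a fair proxy for what gets shipped to the join (which I am not sure of) and that 4096 MB means MiB.

```python
# Figures copied from this message; nothing here talks to Hive or Tez.
THRESHOLD_BYTES = 1145324612             # hive.auto.convert.join.noconditionaltask.size
ESTIMATED_T1_BYTES = 564604302           # "Data size" reported by EXPLAIN for t1
OUTPUT_BYTES = 1978132995                # Map vertex counter
OUTPUT_BYTES_WITH_OVERHEAD = 2063600823  # Map vertex counter
TASK_SIZE_BYTES = 4096 * 1024 * 1024     # assumption: 4096 MB means MiB


def fits_broadcast(size_bytes, threshold_bytes=THRESHOLD_BYTES):
    """Simplified size check (hypothetical helper, not Hive's real logic)."""
    return size_bytes <= threshold_bytes


# What the optimizer sees from the statistics versus what the vertex emitted.
estimate_fits = fits_broadcast(ESTIMATED_T1_BYTES)
observed_fits = fits_broadcast(OUTPUT_BYTES)

# How far the observed output is from the estimate, and the threshold's share
# of the task size.
observed_to_estimate = OUTPUT_BYTES / ESTIMATED_T1_BYTES
threshold_share = THRESHOLD_BYTES / TASK_SIZE_BYTES

print("estimate fits threshold:", estimate_fits)
print("observed output fits threshold:", observed_fits)
print("observed / estimated:", round(observed_to_estimate, 2))
print("threshold / task size:", round(threshold_share, 3))
```

Under those assumptions, the estimate passes the size check while the observed output does not, which is why I suspect the statistics.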