[ https://issues.apache.org/jira/browse/SPARK-28046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865236#comment-16865236 ]
Dongjoon Hyun commented on SPARK-28046:
---------------------------------------

In general, this `LongToUnsafeRowMap` issue is the same as SPARK-24912.

> OOM caused by building hash table when the compressed ratio of small table is normal
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-28046
>                 URL: https://issues.apache.org/jira/browse/SPARK-28046
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.1
>            Reporter: Ke Jia
>            Priority: Major
>         Attachments: image-2019-06-14-10-34-53-379.png
>
> Currently, Spark converts a sort merge join (SMJ) to a broadcast hash join (BHJ) when the compressed size of the small table is <= the broadcast threshold. Like Spark, adaptive execution (AE) also converts an SMJ to a BHJ based on the compressed size at runtime. In our test, with AE enabled and a 32 MB broadcast threshold, one SMJ whose small table had a compressed size of 16 MB was converted to a BHJ. However, when the hash table was built, the 16 MB small table decompressed to about 2 GB and 134,485,048 rows, because it contains a large amount of continuous and repeated values. As a result, the following OOM exception occurred while building the hash table:
> !image-2019-06-14-10-34-53-379.png!
> Based on this finding, it may not be reasonable to decide whether an SMJ is converted to a BHJ based only on the compressed size. In AE, we added a condition on the estimated decompressed size derived from the row count. Spark may also need a decompressed-size or row-count check, not only the compressed-size check, when converting an SMJ to a BHJ.
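The check proposed in the description can be sketched as a small guard that combines the existing compressed-size test with an estimated decompressed size and a row-count cap. This is only an illustration under assumed names and thresholds (`TableStats`, `shouldBroadcast`, `maxDecompressedBytes`, `maxBroadcastRows`, and the per-row size estimate are all hypothetical); it is not Spark's actual `JoinSelection` code.

{code:scala}
// Sketch only: hypothetical names and thresholds, not Spark's JoinSelection logic.
object BroadcastGuardSketch {

  case class TableStats(
      compressedSizeInBytes: Long,   // on-disk (compressed) size used by the current check
      rowCount: Option[Long],        // estimated row count, if statistics are available
      avgRowSizeInBytes: Long)       // rough in-memory size of one row

  def shouldBroadcast(
      stats: TableStats,
      broadcastThresholdBytes: Long,   // e.g. spark.sql.autoBroadcastJoinThreshold
      maxDecompressedBytes: Long,      // hypothetical cap on the in-memory hash table size
      maxBroadcastRows: Long): Boolean = {
    // Current behaviour: only the compressed size is compared to the threshold.
    val compressedOk = stats.compressedSizeInBytes <= broadcastThresholdBytes

    // Proposed additional guards: estimate the decompressed (in-memory) size from
    // the row count and refuse to broadcast when either estimate is too large.
    // When no row count is known, fall back to the compressed-size-only decision.
    val estimatedDecompressed = stats.rowCount.map(_ * stats.avgRowSizeInBytes)
    val decompressedOk = estimatedDecompressed.forall(_ <= maxDecompressedBytes)
    val rowCountOk = stats.rowCount.forall(_ <= maxBroadcastRows)

    compressedOk && decompressedOk && rowCountOk
  }

  def main(args: Array[String]): Unit = {
    // The table from this report: 16 MB compressed, but ~134 million rows that
    // decompress to roughly 2 GB in memory.
    val reported = TableStats(
      compressedSizeInBytes = 16L << 20,
      rowCount = Some(134485048L),
      avgRowSizeInBytes = 16L)

    println(shouldBroadcast(
      reported,
      broadcastThresholdBytes = 32L << 20,
      maxDecompressedBytes = 256L << 20,
      maxBroadcastRows = 50L * 1000 * 1000))   // prints: false
  }
}
{code}

With the numbers from the report (16 MB compressed, 134,485,048 rows), the row-count and estimated-decompressed-size guards reject the broadcast even though the compressed size is below a 32 MB threshold.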