Re: [I] [VL] Vanilla Spark broadcast exchange + R2C is slow sometimes [incubator-gluten]

2024-03-27 Thread via GitHub
zhztheplayer closed issue #5136: [VL] Vanilla Spark broadcast exchange + R2C is slow sometimes URL: https://github.com/apache/incubator-gluten/issues/5136 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

2024-03-27 Thread via GitHub
zhztheplayer commented on issue #5136: URL: https://github.com/apache/incubator-gluten/issues/5136#issuecomment-2022119518 Fixed in https://github.com/apache/incubator-gluten/pull/5141. I assume we can close this now.

2024-03-26 Thread via GitHub
zhztheplayer commented on issue #5136: URL: https://github.com/apache/incubator-gluten/issues/5136#issuecomment-2021919347 The major issue I have found is that the `flatMap` approach would cause `UnsafeHashedRelation` to produce duplicated rows in my case (TPC-DS q14a with current version
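The comment does not spell out why the `flatMap` approach duplicates rows. One plausible mechanism, offered here only as an assumption, is flat-mapping over a key iterator that yields one entry per stored row rather than one per distinct key, and then fetching every row stored under each key. The Python sketch below models that on a hypothetical `ToyMultiMap` class. It is not Spark's `UnsafeHashedRelation` API, and `rows_via_flatmap` / `rows_via_distinct_keys` are illustrative names only.

```python
from collections import defaultdict


class ToyMultiMap:
    """Hypothetical stand-in for a hashed relation holding several rows per key.

    This is NOT Spark's UnsafeHashedRelation; it only models the assumed
    iteration behaviour discussed in the lead-in.
    """

    def __init__(self, rows, key_fn):
        self._buckets = defaultdict(list)
        self._entry_keys = []  # one key per stored row, duplicates included
        for row in rows:
            k = key_fn(row)
            self._buckets[k].append(row)
            self._entry_keys.append(k)

    def keys(self):
        # Assumed behaviour: one key per stored entry, so a key with N rows appears N times.
        return iter(self._entry_keys)

    def get(self, key):
        # Return every row stored under `key`.
        return list(self._buckets.get(key, []))


def rows_via_flatmap(rel):
    # flatMap-style reconstruction: for each key from keys(), emit all rows for that key.
    return [row for k in rel.keys() for row in rel.get(k)]


def rows_via_distinct_keys(rel):
    # Deduplicate keys first (preserving first-seen order), so each row is emitted once.
    distinct = dict.fromkeys(rel.keys())
    return [row for k in distinct for row in rel.get(k)]


rows = [(1, "a"), (1, "b"), (2, "c")]
rel = ToyMultiMap(rows, key_fn=lambda r: r[0])
print(rows_via_flatmap(rel))        # [(1, 'a'), (1, 'b'), (1, 'a'), (1, 'b'), (2, 'c')]
print(rows_via_distinct_keys(rel))  # [(1, 'a'), (1, 'b'), (2, 'c')]
```

Deduplicating keys is just one way to avoid the blow-up in this toy model; it is not necessarily what the fix in https://github.com/apache/incubator-gluten/pull/5141 does.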

2024-03-26 Thread via GitHub
zhztheplayer commented on issue #5136: URL: https://github.com/apache/incubator-gluten/issues/5136#issuecomment-2021916024 I don't have dedicated unit tests (UTs) for it, so it was incorporated into the other PR. Still, I can open one for it if you think it's needed:

2024-03-26 Thread via GitHub
ulysses-you commented on issue #5136: URL: https://github.com/apache/incubator-gluten/issues/5136#issuecomment-2021873724 Thank you @zhztheplayer. It's a good point: columnar broadcast broadcasts the original binary data, whereas vanilla Spark broadcasts a hashed relation. So I think this
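To make the distinction in the comment concrete, here is a minimal Python sketch contrasting the two kinds of broadcast payload. The function names and the use of `pickle` are hypothetical stand-ins, not Gluten or Spark APIs. One payload carries the rows as-is (the columnar side, per the comment); the other carries a prebuilt key-to-rows table (the vanilla side). A consumer that needs a plain row stream, such as a row-to-columnar conversion, can use the first directly but has to walk the table to recover rows from the second.

```python
import pickle
from collections import defaultdict


def broadcast_raw(rows):
    # Hypothetical "columnar-style" payload: the original rows, serialized as-is.
    return pickle.dumps(list(rows))


def broadcast_hashed(rows, key_fn):
    # Hypothetical "vanilla-style" payload: rows pre-grouped into a key -> rows table.
    table = defaultdict(list)
    for row in rows:
        table[key_fn(row)].append(row)
    return pickle.dumps(dict(table))


def rows_from_raw(payload):
    # The consumer gets the original rows back directly, in input order.
    return pickle.loads(payload)


def rows_from_hashed(payload):
    # The consumer must walk the table to rebuild a row stream;
    # row order now follows key grouping, not the original input order.
    table = pickle.loads(payload)
    return [row for bucket in table.values() for row in bucket]


rows = [(2, "x"), (1, "a"), (2, "y")]
print(rows_from_raw(broadcast_raw(rows)))  # [(2, 'x'), (1, 'a'), (2, 'y')]
print(rows_from_hashed(broadcast_hashed(rows, key_fn=lambda r: r[0])))  # [(2, 'x'), (2, 'y'), (1, 'a')]
```

The toy only shows the shape of the data each side ships; it says nothing about the relative cost of the two approaches in Gluten.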