[GitHub] [spark] maheshk114 commented on pull request #41860: [SPARK-44307][SQL] Add Bloom filter for left outer join even if the left side table is smaller than broadcast threshold.

2023-07-20 Thread via GitHub
maheshk114 commented on PR #41860: URL: https://github.com/apache/spark/pull/41860#issuecomment-1643403667 @beliefer @somani @andylam-db Please let me know if any more info is required for this PR.

[GitHub] [spark] maheshk114 commented on pull request #41860: [SPARK-44307][SQL] Add Bloom filter for left outer join even if the left side table is smaller than broadcast threshold.

2023-07-16 Thread via GitHub
maheshk114 commented on PR #41860: URL: https://github.com/apache/spark/pull/41860#issuecomment-1637439815 > > I have got some numbers with varying sizes for the bloom filter. The size decides the amount of data filtered by bloom from the application side. Even if the data size is reduced to half

[GitHub] [spark] maheshk114 commented on pull request #41860: [SPARK-44307][SQL] Add Bloom filter for left outer join even if the left side table is smaller than broadcast threshold.

2023-07-16 Thread via GitHub
maheshk114 commented on PR #41860: URL: https://github.com/apache/spark/pull/41860#issuecomment-1637344666 @beliefer I have got some numbers with varying sizes for the bloom filter. The size decides the amount of data filtered by bloom from the application side. Even if the data size is reduc

[GitHub] [spark] maheshk114 commented on pull request #41860: [SPARK-44307][SQL] Add Bloom filter for left outer join even if the left side table is smaller than broadcast threshold.

2023-07-13 Thread via GitHub
maheshk114 commented on PR #41860: URL: https://github.com/apache/spark/pull/41860#issuecomment-1635347890 @beliefer Doing some experiments to check the impact of table size on the performance numbers. As far as bloom is concerned, the worst case seems to be the case when left side (bl

[GitHub] [spark] maheshk114 commented on pull request #41860: [SPARK-44307][SQL] Add Bloom filter for left outer join even if the left side table is smaller than broadcast threshold.

2023-07-12 Thread via GitHub
maheshk114 commented on PR #41860: URL: https://github.com/apache/spark/pull/41860#issuecomment-1633594468 > @maheshk114 Yes. I know that. This PR needs to prove no possibility of regression. @beliefer Regression in the TPCDS and TPCH benchmarks or in any other scenarios?

[GitHub] [spark] maheshk114 commented on pull request #41860: [SPARK-44307][SQL] Add Bloom filter for left outer join even if the left side table is smaller than broadcast threshold.

2023-07-11 Thread via GitHub
maheshk114 commented on PR #41860: URL: https://github.com/apache/spark/pull/41860#issuecomment-1630402790
> This is so strange.
>
> ```
> select *
> from test_bloom.small_table a
> left outer join test_bloom.big_table b
> on a.number = b.pk;
> ```
>
> The SQL sho
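
For readers following the thread, here is a minimal Python sketch of the idea behind the PR title, applied to the query quoted above: build a Bloom filter from the join keys of the small (left) side and use it to prune the big (right) side before a left outer join. This is only an illustration, not Spark's implementation. The `BloomFilter` class, the `left_outer_join` helper, the bit and hash counts, and the sample rows are all hypothetical. Because a Bloom filter has no false negatives, pruning the right side this way should leave the left outer join result unchanged.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter (hypothetical, not Spark's class)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, key):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # True for every added key; may also be True for others (false positives).
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def left_outer_join(left, right):
    """Naive left outer join on left['number'] == right['pk']."""
    by_pk = {}
    for row in right:
        by_pk.setdefault(row["pk"], []).append(row)
    out = []
    for l in left:
        matches = by_pk.get(l["number"])
        if matches:
            out.extend((l["number"], r["pk"]) for r in matches)
        else:
            out.append((l["number"], None))  # unmatched left row is kept
    return out


# Hypothetical sample data standing in for small_table and big_table.
small_table = [{"number": n} for n in (1, 5, 9, 1000)]
big_table = [{"pk": k} for k in range(500)]

# Build the filter from the small side's join keys.
bloom = BloomFilter()
for row in small_table:
    bloom.add(row["number"])

# Prune the big side before joining.
filtered_big = [row for row in big_table if bloom.might_contain(row["pk"])]

baseline = left_outer_join(small_table, big_table)
with_bloom = left_outer_join(small_table, filtered_big)
```

In this sketch `with_bloom` equals `baseline`, while `filtered_big` holds only the matching keys plus any false positives. How many false positives slip through depends on the filter size relative to the number of keys, which is the trade-off the later comments about varying the Bloom filter size are measuring.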

[GitHub] [spark] maheshk114 commented on pull request #41860: [SPARK-44307][SQL] Add Bloom filter for left outer join even if the left side table is smaller than broadcast threshold.

2023-07-10 Thread via GitHub
maheshk114 commented on PR #41860: URL: https://github.com/apache/spark/pull/41860#issuecomment-1630198985 > @maheshk114 Thank you for the description. You show a case that has better performance. It tells me it's worth considering. But I guess apply the runtime filter on the small side

[GitHub] [spark] maheshk114 commented on pull request #41860: [SPARK-44307] : [SQL] Add Bloom filter for left outer join even if the left side table is smaller than broadcast threshold.

2023-07-10 Thread via GitHub
maheshk114 commented on PR #41860: URL: https://github.com/apache/spark/pull/41860#issuecomment-1628961000 > > @beliefer I don't see any difference before and after either, but the intent of the PR looks good: in the case of a left outer join, a bloom filter should be added. I would like to +1 thi

[GitHub] [spark] maheshk114 commented on pull request #41860: [SPARK-44307] : [SQL] Add Bloom filter for left outer join even if the left side table is smaller than broadcast threshold.

2023-07-07 Thread via GitHub
maheshk114 commented on PR #41860: URL: https://github.com/apache/spark/pull/41860#issuecomment-1625527190 > @maheshk114 Can you please tell me which queries (TPCH and TPCDS) this will affect, so I can see the perf diff of having a bloom filter on left outer join? @oss-maker I have not n