maheshk114 commented on PR #41860:
URL: https://github.com/apache/spark/pull/41860#issuecomment-1643403667
@beliefer @somani @andylam-db
Please let me know if any more information is required for this PR.
--
This is an automated message from the Apache Git Service.
To respond to the message,
maheshk114 commented on PR #41860:
URL: https://github.com/apache/spark/pull/41860#issuecomment-1637439815
> > I have got some number with varying size for the bloom filter. This
decides the amount of data filtered by bloom from the application side. Even if
the data size is reduced to half
maheshk114 commented on PR #41860:
URL: https://github.com/apache/spark/pull/41860#issuecomment-1637344666
@beliefer
I have gathered some numbers with varying sizes for the bloom filter. The size
decides the amount of data filtered by the bloom filter on the application side.
Even if the data size is reduc
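One rough way to reason about how much application-side data a bloom filter of a given size removes is the standard false-positive approximation p ≈ (1 − e^(−kn/m))^k, for n inserted keys, m bits and k hash functions. The sketch below is illustrative only. The function names and the 10% "true match" fraction are hypothetical, and Spark sizes its runtime filter through its own configuration, not through this code.

```python
import math


def bloom_false_positive_rate(num_items: int, num_bits: int, num_hashes: int) -> float:
    """Standard approximation p = (1 - e^(-k*n/m))^k."""
    if num_items <= 0:
        return 0.0  # an empty filter rejects every key
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


def optimal_num_hashes(num_items: int, num_bits: int) -> int:
    """k = (m/n) * ln 2, rounded and clamped to at least 1."""
    return max(1, round(num_bits / num_items * math.log(2)))


def surviving_fraction(match_fraction: float, fpp: float) -> float:
    """Fraction of application-side rows that pass the filter.

    Rows whose key really exists on the creation side always pass;
    the remaining rows pass only as false positives.
    """
    return match_fraction + (1.0 - match_fraction) * fpp


if __name__ == "__main__":
    n = 1_000_000  # hypothetical number of creation-side keys
    for bits_per_key in (4, 8, 16):
        m = n * bits_per_key
        k = optimal_num_hashes(n, m)
        p = bloom_false_positive_rate(n, m, k)
        # Hypothetical: only 10% of application-side rows truly match.
        print(bits_per_key, k, round(p, 4), round(surviving_fraction(0.10, p), 4))
```

A smaller filter raises p, so more non-matching rows survive and less data is pruned; once p is small, the surviving fraction is dominated by the rows that genuinely match.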
maheshk114 commented on PR #41860:
URL: https://github.com/apache/spark/pull/41860#issuecomment-1635347890
@beliefer
I am doing some experiments to check the impact of table size on the
performance numbers. As far as the bloom filter is concerned, the worst case seems
to be the case when the left side (bl
maheshk114 commented on PR #41860:
URL: https://github.com/apache/spark/pull/41860#issuecomment-1633594468
> @maheshk114 Yes. I know that. This PR need to prove no possibility of
regression.
@beliefer Do you mean regression in the TPCDS and TPCH benchmarks, or in any other scenarios?
maheshk114 commented on PR #41860:
URL: https://github.com/apache/spark/pull/41860#issuecomment-1630402790
> This is so strange.
>
> ```
> select *
> from test_bloom.small_table a
> left outer join test_bloom.big_table b
> on a.number = b.pk;
> ```
>
> The SQL sho
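For a query shaped like the one quoted above (`small_table a LEFT OUTER JOIN big_table b ON a.number = b.pk`), a runtime bloom filter can only be built from the preserved left side and applied to the null-supplying right side. Right-side rows whose key matches no left key never appear in the output, so dropping them is safe. The sketch below is a minimal, in-memory illustration under that assumption. The `BloomFilter` class, `left_outer_join` and `prune_right_side` are hypothetical helpers, not Spark's or this PR's implementation, and they ignore SQL NULL-key semantics.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter for illustration (one byte per bit for simplicity)."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, key):
        # Derive k deterministic positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def put(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


def left_outer_join(left, right, left_key, right_key):
    """Naive hash left outer join; unmatched left rows are paired with None."""
    index = {}
    for r in right:
        index.setdefault(r[right_key], []).append(r)
    out = []
    for l in left:
        matches = index.get(l[left_key])
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))  # preserved row, null-extended
    return out


def prune_right_side(left, right, left_key, right_key, num_bits=1024, num_hashes=3):
    """Build the filter from the left (preserved) keys, apply it to the right side."""
    bf = BloomFilter(num_bits, num_hashes)
    for l in left:
        bf.put(l[left_key])
    return [r for r in right if bf.might_contain(r[right_key])]


if __name__ == "__main__":
    small = [{"number": n} for n in (1, 2, 3, 5000)]  # 5000 has no match
    big = [{"pk": pk} for pk in range(1000)]
    pruned = prune_right_side(small, big, "number", "pk")
    print(len(big), "->", len(pruned), "right-side rows after filtering")
```

Applying the filter the other way round (pruning the preserved left side) would wrongly drop rows such as `number = 5000`, which must still appear null-extended in the result; that is why the filter direction matters for outer joins.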
maheshk114 commented on PR #41860:
URL: https://github.com/apache/spark/pull/41860#issuecomment-1630198985
> @maheshk114 Thank you for the description. You display a case that have
better performance. It tell me it's worth to consider. But I guess apply the
runtime filter on the small side
maheshk114 commented on PR #41860:
URL: https://github.com/apache/spark/pull/41860#issuecomment-1628961000
> > @beliefer I don't see any difference as well before and after, but the
intent of the PR looks good, in case of left outer join, bloom filter should be
added. I would like to +1 thi
maheshk114 commented on PR #41860:
URL: https://github.com/apache/spark/pull/41860#issuecomment-1625527190
> @maheshk114~ Can you please tell me which queries (TPCH and TPCDS), this
will affect, so I can see the perf diff of have bloom filter on left outer join
@oss-maker I have not n