Re: [PR] [draft] Mode null benchmark [pinot]

2024-02-26 Thread via GitHub
gortiz commented on PR #12354: URL: https://github.com/apache/pinot/pull/12354#issuecomment-1964228389 I've created another benchmark where the null distribution is not constant but random while keeping a single null value per given number of rows (aka interval). Also in order to

Re: [PR] [draft] Mode null benchmark [pinot]

2024-02-26 Thread via GitHub
gortiz commented on PR #12354: URL: https://github.com/apache/pinot/pull/12354#issuecomment-1964199109 Extra investigations: ## Why performance in `normal` with `null handling` decreases so much when going from 2 to 4 interval? I've checked with perfnorm: ``` Benchmark

Re: [PR] [draft] Mode null benchmark [pinot]

2024-02-26 Thread via GitHub
gortiz commented on PR #12354: URL: https://github.com/apache/pinot/pull/12354#issuecomment-1963827711 So the speedup in the new operations is explained because we are actually doing a better work in the new implementations by using primitive values. These implementations: - Are faster

Re: [PR] [draft] Mode null benchmark [pinot]

2024-02-26 Thread via GitHub
gortiz commented on PR #12354: URL: https://github.com/apache/pinot/pull/12354#issuecomment-1963704893 BTW, there is another change in this implementation (although I guess we are not planning to merge it). Results in the current implementation are less precise than the new

Re: [PR] [draft] Mode null benchmark [pinot]

2024-02-23 Thread via GitHub
Jackie-Jiang commented on PR #12354: URL: https://github.com/apache/pinot/pull/12354#issuecomment-1961924193 I tried the following micro-benchmark and see very interesting result: Code: ``` package org.apache.pinot.perf; import java.io.IOException; import

Re: [PR] [draft] Mode null benchmark [pinot]

2024-02-23 Thread via GitHub
gortiz commented on PR #12354: URL: https://github.com/apache/pinot/pull/12354#issuecomment-1961194752 That is the interesting part. Aggregation operators implemented before did not used RoaringBitmap that well and they were iterating over it. Instead here we are not doing that. We are

Re: [PR] [draft] Mode null benchmark [pinot]

2024-02-22 Thread via GitHub
Jackie-Jiang commented on PR #12354: URL: https://github.com/apache/pinot/pull/12354#issuecomment-1960600876 The result is a little bit counter intuitive. In most cases, when null handling is enabled, it should be much slower than when it is disabled because we need to pay overhead of

Re: [PR] [draft] Mode null benchmark [pinot]

2024-02-16 Thread via GitHub
gortiz commented on PR #12354: URL: https://github.com/apache/pinot/pull/12354#issuecomment-1948171973 I've repeated the benchmark with: - Java 21 - 1M rows per segment (instead of 10k) - 3 consecutive nulls every 3 not null rows (instead of 1 every 127) And the results are:

Re: [PR] [draft] Mode null benchmark [pinot]

2024-02-16 Thread via GitHub
gortiz commented on PR #12354: URL: https://github.com/apache/pinot/pull/12354#issuecomment-1948107488 We could try to execute the same benchmark changing the null distribution and increasing the size of the segments, but TBH the difference seems very small when we compare with real costs

Re: [PR] [draft] Mode null benchmark [pinot]

2024-02-16 Thread via GitHub
gortiz commented on PR #12354: URL: https://github.com/apache/pinot/pull/12354#issuecomment-1948104199 I've applied similar changes in SUM and results are: In AMD Ryzen 9 3900X, Java 11: ``` Benchmark (_aQueryTemplate)

Re: [PR] [draft] Mode null benchmark [pinot]

2024-02-16 Thread via GitHub
codecov-commenter commented on PR #12354: URL: https://github.com/apache/pinot/pull/12354#issuecomment-1948072486 ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/12354?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) Report Attention:

Re: [PR] [draft] Mode null benchmark [pinot]

2024-02-15 Thread via GitHub
Jackie-Jiang commented on PR #12354: URL: https://github.com/apache/pinot/pull/12354#issuecomment-1947365697 For benchmark purpose, can we try the same idea on a cheaper aggregation such as `SUM`? This way we can amplify the performance impact -- This is an automated message from the

[PR] [draft] Mode null benchmark [pinot]

2024-02-02 Thread via GitHub
gortiz opened a new pull request, #12354: URL: https://github.com/apache/pinot/pull/12354 There are some questions about the performance on the implementation proposed in #12227. This PR modifies the code to add several Mode implementations that can be selected by using `mode(col,