Re: Streams/RocksDB: Why Universal Compaction?

Colt McNealy Wed, 26 Jul 2023 08:38:20 -0700

Guozhang,

Thanks for your response. That makes a lot of sense; I can't promise any
super-formal benchmarks but we will definitely play with the configurations
you sent and report back within a month about our high-level findings.
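
For anyone following the thread, here is a minimal sketch of how we would flip a Streams app from universal to level compaction while experimenting. It only changes the compaction style and is not necessarily the full set of changes in the PR. The class name is hypothetical, and it assumes kafka-streams (which pulls in rocksdbjni) is on the classpath.

```java
import java.util.Map;

import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.CompactionStyle;
import org.rocksdb.Options;

// Hypothetical class name, for illustration only. Switches every RocksDB
// state store in the application to level compaction.
public class LevelCompactionConfigSetter implements RocksDBConfigSetter {

    @Override
    public void setConfig(final String storeName,
                          final Options options,
                          final Map<String, Object> configs) {
        // Override the compaction style that Streams sets by default.
        options.setCompactionStyle(CompactionStyle.LEVEL);
    }

    @Override
    public void close(final String storeName, final Options options) {
        // Nothing to release: this setter allocates no RocksDB objects.
    }
}
```

The setter would be registered through the Streams properties, e.g. props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, LevelCompactionConfigSetter.class).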


For our purposes (a workflow engine), we will mostly monitor workflow
execution metrics + state store restoration times. But in the interest of a
formal benchmark that could be included in a KIP: what monitoring software,
tooling, and environment setup would you recommend? If it doesn't involve
writing copious amounts of custom code, perhaps (no promises) my team could
put something together that's more suitable for a general Streams audience
rather than just our own internal usage.

Cheers,
Colt McNealy

*Founder, LittleHorse.dev*


On Sun, Jul 23, 2023 at 11:21 AM Guozhang Wang <guozhang.wang...@gmail.com>
wrote:

> Yeah I can shed some light here: I used Universal originally since at
> the beginning of the Kafka Streams journey there were user reports
> complaining about its storage amplification. But soon enough (around
> 2019) I realized that, as an OOTB config, level compaction may be
> preferable.
>
> I had a PR dating back to that time where I suggested changing a bunch
> of OOTB configs for RocksDB, including the compaction config:
> https://github.com/apache/kafka/pull/6406/files. Unfortunately it was
> not merged, since I wanted to run some benchmarks to make sure it did
> not have any gotchas but never got the time to do so. I would in fact
> be very happy if someone could pick that up, re-examine whether the
> changes still make sense and, if so, drive it through and merge.
>
> Guozhang
>
>
> On Sun, Jul 23, 2023 at 10:29 AM Matthias J. Sax <mj...@apache.org> wrote:
> >
> > Do you happen to know?
> >
> >
> > -------- Forwarded Message --------
> > Subject: Streams/RocksDB: Why Universal Compaction?
> > Date: Fri, 23 Jun 2023 13:19:36 -0700
> > From: Colt McNealy <c...@littlehorse.io>
> > Reply-To: users@kafka.apache.org
> > To: users@kafka.apache.org
> >
> > Hello there!
> >
> > I was wondering if anyone (perhaps an early developer or power-user of
> > Kafka Streams) knows why the Streams developers made the default setting
> > for RocksDB compaction "Universal" compaction rather than "Level"
> > compaction?
> >
> > My understanding (in which I am extremely UNconfident) is as follows—
> >
> > Supposedly Universal compaction leads to lower write amplification after
> > compaction finishes. In a run of Universal compaction, all data is
> > compacted; as per the RocksDB documentation it is possible for temporary
> > write amplification of up to 2x during this process. There have also been
> > reports of "write stalls" during this process [1].
> >
> > In Level compaction, only certain levels (tiers of SST files) are
> > compacted at once, meaning that the compaction process is shorter and
> > less intensive, but that write amplification after compaction finishes
> > is higher than with universal compaction.
> >
> > Can anyone confirm/deny/correct this?
> >
> > [1] https://github.com/solana-labs/solana/issues/14586 (not
> > Streams-related, but it is RocksDB)
> >
> > Thanks in advance,
> > Colt McNealy
> >
> > *Founder, LittleHorse.dev*
> >
>
