RocksDB efficiency and keyby

2022-04-20 Thread Trystan
Hello, We have a job whose main purpose is to track whether or not we've previously seen a particular event - that's it. If it's new, we save it to an external database. If we've seen it, we block the write. There's a 3-day TTL to manage the state size. The downstream db can tolerate new data
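For readers skimming the thread, the "seen before, with a TTL" logic described above can be modelled roughly as follows. This is a plain-Java sketch, not the poster's Flink job: the class name `TtlDeduplicator`, the method `shouldWrite`, and the in-memory `HashMap` are all hypothetical stand-ins for keyed state in RocksDB, and the assumption that an entry expires three days after it is first recorded is mine.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of "have we seen this key within the TTL?" (hypothetical, not Flink code). */
class TtlDeduplicator {
    private final long ttlMillis;
    // key -> timestamp (ms) at which the key was recorded; stands in for keyed state
    private final Map<String, Long> firstSeenAt = new HashMap<>();

    TtlDeduplicator(Duration ttl) {
        this.ttlMillis = ttl.toMillis();
    }

    /** Returns true if the event should be written downstream, false if it is blocked as a duplicate. */
    boolean shouldWrite(String key, long nowMillis) {
        Long seenAt = firstSeenAt.get(key);
        if (seenAt != null && nowMillis - seenAt < ttlMillis) {
            return false; // seen within the TTL window: block the write
        }
        // New key, or the previous entry has expired: record it and allow the write
        firstSeenAt.put(key, nowMillis);
        return true;
    }

    public static void main(String[] args) {
        long day = Duration.ofDays(1).toMillis();
        TtlDeduplicator dedup = new TtlDeduplicator(Duration.ofDays(3));
        System.out.println(dedup.shouldWrite("event-1", 0L));      // first sighting
        System.out.println(dedup.shouldWrite("event-1", day));     // repeat inside the TTL
        System.out.println(dedup.shouldWrite("event-1", 3 * day)); // repeat after the TTL elapsed
    }
}
```

In a real job every lookup like `firstSeenAt.get(key)` becomes a RocksDB read, which is why the replies in this thread focus on disk IO and index/filter caching.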

Re: RocksDB efficiency and keyby

2022-04-20 Thread Yaroslav Tkachenko
Hey Trystan, Based on my personal experience, good disk IO for RocksDB matters a lot. Are you using the fastest SSD storage you can get for RocksDB folders? For example, when running on GCP, we noticed *10x* throughput improvement by switching RocksDB storage to https://cloud.google.com/compute/d

Re: RocksDB efficiency and keyby

2022-04-20 Thread Trystan
Thanks for the info! We're running EBS gp2 volumes... a while back we tested local SSDs with a different job and didn't notice any gains, but that was likely due to an under-optimized job where the bottleneck was elsewhere.

Re: RocksDB efficiency and keyby

2022-04-20 Thread Yaroslav Tkachenko
Yep, I'd give it another try. EBS could be too slow in some use cases.

Re: RocksDB efficiency and keyby

2022-04-21 Thread Yun Tang
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/config/#state-backend-rocksdb-memory-partitioned-index-filters Best, Yun Tang
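The linked documentation anchor corresponds to the Flink option `state.backend.rocksdb.memory.partitioned-index-filters`. As a rough sketch of how it might be enabled (assuming configuration through `flink-conf.yaml`; the surrounding advice in this message is cut off, so whether this alone helps the job above is not confirmed by the thread):

```yaml
# flink-conf.yaml (fragment, illustrative)
# Ask RocksDB to partition index/filter blocks so only the top-level index must stay cached.
state.backend.rocksdb.memory.partitioned-index-filters: true
```

As far as I know the option defaults to `false` and only takes effect when RocksDB memory is managed by Flink (managed memory or a fixed per-slot size), so it is worth checking the linked page for the release in use.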