On Tue, 2 Feb 2021 12:19:22 GMT, djelinski <github.com+30433125+djelin...@openjdk.org> wrote:
>> If I get the patch right, the benchmark performance improvement is a
>> trade-off between CPU and memory, by keeping expired entries while putting a
>> new entry in the cache. I'm not very sure of the performance impact on
>> memory and GC collections. Would you mind adding two more types of benchmark:
>> getting the entries and removing the entries, for cases where there are 1/10,
>> 1/4, 1/3 and 1/2 expired entries in the cache? And increase the size to some
>> big scales, like 2M and 20M.
>>
>> It looks like a spec update, as it may change the behavior of a few JDK
>> components (TLS session cache, OCSP stapling response cache, cert store
>> cache, certificate factory, etc.), because "expired entries are no longer
>> guaranteed to be removed before live ones". I'm not very sure of the
>> impact. I may suggest filing a CSR and having more eyes check the
>> compatibility impact before moving forward.
>
>> the benchmark performance improvement is a trade-off between CPU and memory,
>> by keeping expired entries while putting a new entry in the cache
>
> Not exactly. The memory use is capped by the cache size. The patch is a
> trade-off between the cache's hit/miss ratio and CPU: we get faster cache
> access at the cost of more frequent cache misses.
>
> All calls to `put()` remove expired items from the front of the queue, and
> never perform a full scan. `get()` calls shuffle the queue, moving the
> accessed item to the back. Compare this to the original code, where `put()`
> only removed expired items when the cache overflowed, and then scanned the
> entire cache. Let me give some examples.
>
> **Example 1**: insertions at a fast pace, leading to cache overflows and no
> expirations. Here the new implementation improves performance. Consider a
> cache with size=4, timeout=10, and the following sequence of events:
> T=1, put(1);
> T=2, put(2);
> T=3, put(3);
> T=4, put(4);
>
> Cache contents after these calls (the same in the old and new scenario).
> Queue order: least recently accessed items are on the left, most recently
> accessed on the right. _K_ denotes the cache key; _exp_ denotes the entry's
> expiration time, equal to its insertion time _T_ plus the timeout:
>
> |K=1, exp=11|K=2, exp=12|K=3, exp=13|K=4, exp=14|
>
> If we now add another item to the queue, it will overflow. Here's where the
> implementations behave differently, but the outcome is identical: the old one
> scans the entire list for expired entries; the new one improves performance by
> ending the search for expired entries after encountering the first
> non-expired entry (which here is the first entry). The end result is the same
> in both cases: the oldest (least recently accessed) item is dropped:
> T=5, put(5)
>
> |K=2, exp=12|K=3, exp=13|K=4, exp=14|K=5, exp=15|
>
> **Example 2**: insertions at a moderate pace, with interleaved reads. Here
> the new implementation improves performance, but at the possible cost of
> wasting cache capacity on expired entries. Consider a cache with size=4,
> timeout=7, and the following sequence of events:
> T=1, put(1);
> T=3, put(3);
> T=5, put(5);
> T=7, put(7);
> T=7, get(1);
>
> Cache contents after these calls:
>
> |K=3, exp=10|K=5, exp=12|K=7, exp=14|K=1, exp=8|
>
> The `get(1)` operation moved the item with K=1 to the back of the queue.
>
> If we wait for the item with K=1 to expire and then add another item to the
> queue, it will overflow. Here's where the implementations behave differently
> and the outcome is different: the old one scans the entire list for expired
> entries, finds the entry with K=1, and drops it; the new one gives up after
> the first non-expired entry (which is the first entry), and drops that first
> entry instead.
>
> So, when we perform:
> T=9, put(9);
>
> the old implementation will get:
>
> |K=3, exp=10|K=5, exp=12|K=7, exp=14|K=9, exp=16|
>
> and the new implementation will get:
>
> |K=5, exp=12|K=7, exp=14|K=1, exp=8 (expired)|K=9, exp=16|
>
> Note that:
> - an attempt to retrieve an expired item (i.e. `get(1)`) will immediately
>   remove that item from the cache, making room for other items
> - retrieving a non-expired item will move it to the back of the queue, behind
>   all expired items
>
> **Example 3**: insertions at a slow pace, where most items expire before the
> queue overflows. Here the new implementation improves memory consumption.
> Consider a cache with size=4, timeout=1, and the following sequence of events:
> T=1, put(1);
> T=3, put(3);
> T=5, put(5);
> T=7, put(7);
>
> Every cache item has expired by the point when a new one is added. The old
> implementation only removes expired entries when the cache overflows, so all
> entries will still be there:
>
> |K=1, exp=2 (expired)|K=3, exp=4 (expired)|K=5, exp=6 (expired)|K=7, exp=8|
>
> The new implementation removes expired entries on every put, so after the
> last put only one entry is in the cache:
>
> |K=7, exp=8|
>
> After another put, the old implementation will encounter a cache overflow and
> remove all expired items.
>
> Let me know if that helped.
>
>> add two more types of benchmark: get the entries and remove the entries
>
> Both of these operations are constant-time, both before and after my changes.
> Do you expect to see some oddities here, or do we just want a benchmark that
> could be used to compare other implementations?
>
>> increase the size to some big scales, like 2M and 20M
>
> Can do. Do you think it makes sense to also benchmark the scenario where GC
> kicks in and collects the soft references?
>
>> it may change the behavior of a few JDK components
>
> Of all the uses of Cache, only `SSLSessionContextImpl` (TLS session cache),
> `StatusResponseManager` (OCSP stapling) and `LDAPCertStoreImpl` (I'm not
> familiar with that one) set an expiration timeout; when the timeout is not
> set, the behavior is exactly the same as before.
> `StatusResponseManager` is constantly querying the same keys, and is
> liberally sized, so I don't expect much of an impact.
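The eviction behavior walked through in the examples above can be sketched as a small self-contained model. This is hypothetical illustration code, not the actual `sun.security.util.Cache` implementation: the class and method names are made up, a `LinkedHashMap` in access order stands in for the cache's queue, and a manually advanced clock stands in for real time.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of the patched cache behavior (illustrative only).
class ExpiringLruCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAt;
        Entry(T value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final int maxSize;
    private final long timeout;   // entry lifetime in ticks; 0 = never expires
    private long now;             // simulated clock
    // accessOrder=true: get()/put() move the touched entry to the back,
    // mirroring the queue shuffling described above
    private final LinkedHashMap<K, Entry<V>> map =
            new LinkedHashMap<>(16, 0.75f, true);

    ExpiringLruCache(int maxSize, long timeout) {
        this.maxSize = maxSize;
        this.timeout = timeout;
    }

    void tick(long t) { now = t; }   // advance the simulated clock

    void put(K key, V value) {
        // New behavior: drop expired entries from the front of the queue,
        // stopping at the first non-expired one -- never a full scan.
        Iterator<Map.Entry<K, Entry<V>>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue().expiresAt <= now) it.remove();
            else break;
        }
        long exp = timeout == 0 ? Long.MAX_VALUE : now + timeout;
        map.put(key, new Entry<>(value, exp));
        if (map.size() > maxSize) {
            // Overflow: evict the least recently accessed entry, even if an
            // expired entry further back could have been dropped instead.
            Iterator<K> eldest = map.keySet().iterator();
            eldest.next();
            eldest.remove();
        }
    }

    V get(K key) {
        Entry<V> e = map.get(key);   // access moves the entry to the back
        if (e == null) return null;
        if (e.expiresAt <= now) {    // expired entries are removed on access
            map.remove(key);
            return null;
        }
        return e.value;
    }

    int size() { return map.size(); }
}
```

Replaying Example 2 against this sketch (size=4, timeout=7, puts at T=1/3/5/7, `get(1)` at T=7, then `put(9)` at T=9) reproduces the queues shown above: the live front entry K=3 is evicted on overflow, while the expired K=1 stays in the queue until it is next accessed.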
> TLS session cache changes may result in fewer session resumptions and more
> full handshakes; I expect the cache performance improvement to more than
> offset the CPU cycles lost on full handshakes.
>
> How do I file a CSR?
>
> Also, what do you think about the changes done in the `Do not invalidate
> objects before GC` commit (5859a03)? They offer a minor performance
> improvement, but if clearing the values before GC is an important security
> feature of this cache, I'm prepared to drop that commit.

Added benchmarks for get & remove. Added tests for 5M cache size. Switched
time units to nanoseconds. Results:

    Benchmark           (size)  (timeout)  Mode  Cnt    Score   Error  Units
    CacheBench.get       20480      86400  avgt   25   62.999 ± 2.017  ns/op
    CacheBench.get       20480          0  avgt   25   41.519 ± 1.113  ns/op
    CacheBench.get      204800      86400  avgt   25   67.995 ± 4.530  ns/op
    CacheBench.get      204800          0  avgt   25   46.439 ± 2.222  ns/op
    CacheBench.get     5120000      86400  avgt   25   72.516 ± 0.759  ns/op
    CacheBench.get     5120000          0  avgt   25   53.471 ± 0.491  ns/op
    CacheBench.put       20480      86400  avgt   25  117.117 ± 3.424  ns/op
    CacheBench.put       20480          0  avgt   25   73.582 ± 1.484  ns/op
    CacheBench.put      204800      86400  avgt   25  116.983 ± 0.743  ns/op
    CacheBench.put      204800          0  avgt   25   73.945 ± 0.515  ns/op
    CacheBench.put     5120000      86400  avgt   25  230.878 ± 7.582  ns/op
    CacheBench.put     5120000          0  avgt   25  192.526 ± 7.048  ns/op
    CacheBench.remove    20480      86400  avgt   25   39.048 ± 2.036  ns/op
    CacheBench.remove    20480          0  avgt   25   36.293 ± 0.281  ns/op
    CacheBench.remove   204800      86400  avgt   25   43.899 ± 0.895  ns/op
    CacheBench.remove   204800          0  avgt   25   43.046 ± 0.759  ns/op
    CacheBench.remove  5120000      86400  avgt   25   51.896 ± 0.640  ns/op
    CacheBench.remove  5120000          0  avgt   25   51.537 ± 0.536  ns/op

-------------

PR: https://git.openjdk.java.net/jdk/pull/2255