[jira] Commented: (SOLR-665) FIFO Cache (Unsynchronized): 9x times performance boost

Lars Kotthoff (JIRA) Mon, 28 Jul 2008 18:51:34 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617648#action_12617648
 ]


Lars Kotthoff commented on SOLR-665:
------------------------------------

bq. Not everything in JAVA is extremely good: for instance, synchronization. 
Even for single-threaded application, it needs additionally 600 CPU cycles 
(I've read it somewhere for SUN Java 1.3 on Windows)

That's probably not true for modern JVMs though -- cf. 
http://www.ibm.com/developerworks/library/j-jtp04223.html.

bq. ...9x times performance boost...

How did you measure that exactly? The Solr admin pages will *not* give you 
exact measurements. Could you describe the test setup in detail? I'm guessing 
that you're caching the results of all queries in memory such that no disk 
access is necessary. Are you using highlighting or anything else that might be 
CPU-intensive at all? From my personal experience with Solr I wouldn't expect 
synchronization for the caches to be that big of a performance penalty. In some 
of my tests with an index of several GB, where all results were cached and 
highlighting was turned on, I've seen throughputs in excess of 400 searches per 
second. I think that the performance bottleneck in that case was the network 
interface for sending the replies.

bq. absolutely no need to synchronize get() method for FIFO!

Consider the following case: thread A performs a synchronized put, thread B 
performs an unsynchronized get on the same key. If B gets scheduled before A 
completes, the value it returns is undefined. Yes, we could do sanity checks 
to minimise these cases, but that would probably end up being more expensive 
than the synchronization.

bq. From JavaDoc: "Note that this implementation is not synchronized. If 
multiple threads access a linked hash map concurrently, and at least
one of the threads modifies the map structurally, it must be synchronized 
externally."

That's exactly the case here -- the update thread modifies the map 
structurally! It doesn't do this all the time, and probably never once the 
cache has been populated, but there's no way to *know* for sure unless you 
*explicitly* remove the put method.
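
As a minimal sketch of what "synchronized externally" means for an 
insertion-ordered LinkedHashMap: both get and put take the same lock, so a 
reader never observes the map in the middle of a structural change. The class 
name SynchronizedFifoCache and its methods are made up for illustration; this 
is neither Solr's LRUCache nor the attached FIFOCache.java.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class for illustration only; not Solr code.
class SynchronizedFifoCache<K, V> {
    private final Map<K, V> map;

    SynchronizedFifoCache(final int maxSize) {
        // accessOrder=false keeps insertion order, so the eldest entry
        // is the one inserted first (FIFO eviction).
        this.map = new LinkedHashMap<K, V>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Called by put(); evicting here is a structural modification.
                return size() > maxSize;
            }
        };
    }

    V get(K key) {
        // Same lock as put(), so a get cannot interleave with an insert/evict.
        synchronized (map) {
            return map.get(key);
        }
    }

    V put(K key, V value) {
        synchronized (map) {
            return map.put(key, value);
        }
    }

    int size() {
        synchronized (map) {
            return map.size();
        }
    }
}
```

With a maximum size of 2, inserting a third key evicts the first one inserted, 
even if that first key was read in between, because a get does not reorder 
entries when accessOrder is false.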

I'm not convinced that we should change the current implementation for the 
following reasons:
* Concurrency is notoriously hard to get right. Furthermore, the serious bugs 
tend to show up only when you actually hit race conditions and the like, i.e. 
when the machine is under heavy load and any disruption hurts the most.
* You've already started to amend your implementation with sanity checks and 
the like -- as I've said before, this might end up being more expensive than 
synchronization.
* A FIFO cache might become a bottleneck itself -- if the cache is very large 
and the most frequently accessed item is inserted just after the cache is 
created, all accesses will need to traverse all the other entries before 
getting that item.

That said, if you can show conclusively (e.g. with a profiler) that the 
synchronized access is indeed the bottleneck and incurs a heavy penalty on 
performance, then I'm all for investigating this further.
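
One rough way to start answering the measurement question outside the admin 
pages is a small timing harness like the one below. It is only a sketch under 
stated assumptions: the class SyncGetTimer and its parameters are hypothetical, 
it times Collections.synchronizedMap around an insertion-ordered LinkedHashMap 
rather than Solr's own cache, and naive timings like this are easily distorted 
by JIT warm-up and scheduling. A profiler run against the real system remains 
the stronger evidence.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical harness for illustration only; not part of Solr.
class SyncGetTimer {
    /** Returns {elapsedNanos, totalHits} for `threads` readers doing `opsPerThread` gets each. */
    static long[] run(int threads, final int opsPerThread) {
        final int keys = 1024;
        final Map<Integer, Integer> cache = Collections.synchronizedMap(
                new LinkedHashMap<Integer, Integer>(keys, 0.75f, false));
        for (int i = 0; i < keys; i++) {
            cache.put(i, i); // pre-populate so every lookup is a hit
        }

        final AtomicLong hits = new AtomicLong();
        final CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all readers at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                long local = 0;
                for (int i = 0; i < opsPerThread; i++) {
                    if (cache.get(i % keys) != null) {
                        local++;
                    }
                }
                hits.addAndGet(local);
            });
        }

        long begin = System.nanoTime();
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("readers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return new long[] { System.nanoTime() - begin, hits.get() };
    }
}
```

Calling SyncGetTimer.run(1, n) and SyncGetTimer.run(8, n) after a few warm-up 
runs gives a first impression of how much contention on the lock costs; it 
says nothing about disk access, highlighting or network overhead.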

> FIFO Cache (Unsynchronized): 9x times performance boost
> -------------------------------------------------------
>
>                 Key: SOLR-665
>                 URL: https://issues.apache.org/jira/browse/SOLR-665
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.3
>         Environment: JRockit R27 (Java 6)
>            Reporter: Fuad Efendi
>         Attachments: FIFOCache.java
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Attached is modified version of LRUCache where 
> 1. map = new LinkedHashMap(initialSize, 0.75f, false) - so that 
> "reordering"/true (performance bottleneck of LRU) is replaced to 
> "insertion-order"/false (so that it became FIFO)
> 2. Almost all (absolutely unnecessary) synchronized statements commented out
> See discussion at 
> http://www.nabble.com/LRUCache---synchronized%21--td16439831.html
> Performance metrics (taken from SOLR Admin):
> LRU
> Requests: 7638
> Average Time-Per-Request: 15300
> Average Request-per-Second: 0.06
> FIFO:
> Requests: 3355
> Average Time-Per-Request: 1610
> Average Request-per-Second: 0.11
> Performance increased 9 times which roughly corresponds to a number of CPU in 
> a system, http://www.tokenizer.org/ (Shopping Search Engine at Tokenizer.org)
> Current number of documents: 7494689
> name:          filterCache  
> class:        org.apache.solr.search.LRUCache  
> version:      1.0  
> description:  LRU Cache(maxSize=10000000, initialSize=1000)  
> stats:        lookups : 15966954582
> hits : 16391851546
> hitratio : 0.102
> inserts : 4246120
> evictions : 0
> size : 2668705
> cumulative_lookups : 16415839763
> cumulative_hits : 16411608101
> cumulative_hitratio : 0.99
> cumulative_inserts : 4246246
> cumulative_evictions : 0 
> Thanks

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
