On 11/29/2012 8:29 AM, Daniel Exner wrote:
I'll answer both your mails in one.

Shawn Heisey wrote:
On 11/29/2012 3:15 AM, Daniel Exner wrote:
I'm currently doing some benchmarking of a real Solr 3.3 instance vs
the same ported to Solr 4.0.
[..]
In the graph you can see high CPU load, all the time. This is even the
case if I reduce the QPS down to 5, so CPU is not a good metric for
comparison between Solr 3.3 and 4.0 (at least on this machine).
The missing memory data is due to the PerfMon JMeter Plugin having
time-outs sometimes.

You can also see no real increase in latency when pushing data into
the index. This is puzzling me, as rumours say one should not push new
data while under high load, as this would hurt query performance.

I don't see any attachments, or any links to external attachments, so I
can't see the graph.  I can only make general statements, and I can't
guarantee that they'll even be applicable to your scenario.  You may
need to use an external attachment service and just send us a link.
Indeed, it seems like the mailing list daemon scrubbed my attachment. I dropped it into my Dropbox, here: http://db.tt/EjYCqbpn

Are you seeing lower performance, or just worried about the CPU load?
Solr4 should be able to handle concurrent indexing and querying better
than 3.x.  It is able to do things concurrently that were not possible
before.
In general I'm interested in how much better Solr 4 performs, and whether it may be feasible to use less powerful machines to get the same low latency, or to do more data pushes, etc.

One way that performance improvements happen is that developers find
slow sections of code where the CPU is fairly idle, and rewrite them so
they are faster, but also exercise the CPU harder. When the new code
runs, CPU load goes higher, but it all runs faster.
The graphs show slightly better latency for Solr 4.0 compared to 3.3, but not while pushing data.


Another note specifically related to this part: Have you used the same
configuration and done the minimal changes required to make it run, or
have you tried to update the config for 4.0 and its considerable list of
new features?  Did you start with a blank index on 4.0, or did you copy
the 3.3 index over?
I used the same configuration and did the minimal changes.
The first runs were using the same data from Solr 3.3 in Solr 4.0 (in fact it was even the same data dir), but further runs used freshly filled, different indices.

For best results, you'll want to ensure that Solr4 is working completely from scratch, that it has never seen a 3.3 index, so that it will use its own native format. It may be a good idea to look into the example Solr4 config/schema and see whether there are improvements you can make. One note: the updateLog feature in the update handler config will generally cause performance to be lower. The features that require updateLog would make this less of an apples to apples comparison, so I wouldn't enable it unless I knew I needed it.
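As a quick way to check whether a given config has the update log turned on, here is a minimal Python sketch. It assumes the Solr 4 layout where an `<updateLog>` element sits inside `<updateHandler>` in solrconfig.xml; the function name `update_log_enabled` and the sample config strings are made up for illustration and are not taken from anyone's actual setup.

```python
import xml.etree.ElementTree as ET


def update_log_enabled(solrconfig_xml: str) -> bool:
    """Return True if the config text contains <updateHandler><updateLog>.

    Assumption: the root element is <config> and <updateHandler> is a
    direct child of it, as in the example solrconfig.xml shipped with Solr 4.
    """
    root = ET.fromstring(solrconfig_xml)
    return root.find("./updateHandler/updateLog") is not None


# Hypothetical config fragments, for illustration only.
WITH_ULOG = """
<config>
  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
  </updateHandler>
</config>
"""

WITHOUT_ULOG = """
<config>
  <updateHandler class="solr.DirectUpdateHandler2"/>
</config>
"""

if __name__ == "__main__":
    print(update_log_enabled(WITH_ULOG))
    print(update_log_enabled(WITHOUT_ULOG))
```

To check a real file, read it and pass the text in, e.g. `update_log_enabled(open("solrconfig.xml").read())`, where the path is whatever your core's conf directory uses.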

Unless the lines are labelled wrong in the legend, the graph does show higher CPU usage during the push, but lower CPU usage during the optimize and most of the rest of the time.

The graph shows that Solr4 has lower latency than 3.3 during both the push and the optimize, as well as most of the rest of the time. The latency numbers however are a lot higher than I would expect, seeming to average out at around 100 seconds (100000 ms). That is terrible performance from both versions. On my own Solr installation, which is distributed and has 78 million documents, I have a median latency of 8 milliseconds and a 95th percentile latency of 248 milliseconds.
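For comparing numbers like these across runs, a median and a 95th percentile can be computed from the raw per-request latencies. The sketch below is only an illustration: it uses the nearest-rank definition of a percentile, which may differ slightly from what JMeter or Solr's own statistics report, and the function names and sample values are hypothetical.

```python
import math
import statistics


def nearest_rank_percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize_latencies(latencies_ms):
    """Return the median and 95th percentile of latencies in milliseconds."""
    return {
        "median_ms": statistics.median(latencies_ms),
        "p95_ms": nearest_rank_percentile(latencies_ms, 95),
    }


if __name__ == "__main__":
    # Made-up sample latencies, not measurements from this thread.
    sample = [5, 7, 8, 8, 9, 12, 15, 40, 120, 250]
    print(summarize_latencies(sample))
```

The median shows what a typical query costs, while the 95th percentile exposes the slow tail that an average tends to hide.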

Is this a 64-bit platform with a 64-bit Java? How much memory have you allocated for the java heap? How big is the index?

Thanks,
Shawn
