Is it a good query performance with this data size ?

wwang525 Tue, 18 Aug 2015 08:54:55 -0700

Hi All,

I am working on a search service based on Solr (v5.1.0). The data set is 15
million records, and the index files total 860 MB. The test was performed on a
local machine with 8 cores, 32 GB of memory, and a 3.4 GHz CPU (Intel
Core i7-3770).

I found that setting docValues=true for faceting and grouping did improve
first-time search performance in the cold-cache scenario. For example, with
requests that use all the features (grouping, sorting, faceting), the
difference in faceting time alone can be as much as 300 ms.

However, the response time for the same request executed a second time seems
to be about the same whether docValues is true or false. Still, I set
docValues=true for all the faceting properties.
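
For readers who want to reproduce this kind of request, here is a minimal sketch of how a query combining filtering, faceting, grouping, and sorting might be assembled. The host, the core name `mycore`, and the field names (`category`, `brand`, `price`, `in_stock`) are hypothetical and not taken from this setup; the parameter names (`q`, `fq`, `sort`, `facet`, `facet.field`, `group`, `group.field`) are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_query_url(base_url, text, filters, facet_fields, group_field, sort):
    """Build a Solr /select URL that filters, facets, groups, and sorts."""
    params = [("q", text), ("wt", "json"), ("sort", sort)]
    # One fq parameter per filter, so each filter can be cached on its own.
    params += [("fq", fq) for fq in filters]
    params += [("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]
    params += [("group", "true"), ("group.field", group_field)]
    return base_url + "?" + urlencode(params)


url = build_query_url(
    "http://localhost:8983/solr/mycore/select",  # hypothetical host and core
    "*:*",
    ["category:books", "in_stock:true"],         # hypothetical filter queries
    ["brand", "category"],                       # hypothetical facet fields
    "brand",                                     # hypothetical grouping field
    "price asc",                                 # hypothetical sort
)
print(url)
```

Keeping each filter in its own `fq` parameter matters for the measurements discussed here, because Solr caches filter results per `fq` clause.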

The following is what I have observed:

(1) Single requests executed one by one (no load)

With a cold cache, I execute randomly generated queries one after another.
The first query routinely takes more than 1 second, but usually not more than
2 seconds. As I continue to generate random requests and execute them one by
one, the response time normally stabilizes at around 500 ms. It does not seem
to improve further as I continue to execute randomly generated queries.
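
A measurement like the one above can be sketched as follows. This is only an illustration of separating the cold-cache query from the steady state; `run_query` is a hypothetical callable that sends one request to Solr and waits for the response, and the sample numbers in the usage line are made up to mirror the figures quoted above.

```python
import statistics
import time


def time_queries(run_query, queries):
    """Run queries one by one and return each latency in milliseconds."""
    latencies_ms = []
    for query in queries:
        start = time.perf_counter()
        run_query(query)  # hypothetical: sends one request and blocks
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms, warmup=1):
    """Report the first `warmup` (cold) queries apart from the rest."""
    steady = latencies_ms[warmup:]
    return {
        "cold_ms": latencies_ms[:warmup],
        "steady_median_ms": statistics.median(steady),
        "steady_max_ms": max(steady),
    }


# Illustrative numbers only, not measured data.
summary = summarize([1500.0, 520.0, 480.0, 500.0])
print(summary)
```

Reporting the median rather than the mean keeps a single slow cold query from dominating the steady-state figure.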

(2) Load test with randomly generated requests

In the load test scenario (each core takes 4 requests per second, continuing
for 20 rounds), I can see the CPU usage jump, and the earliest requests
usually have much longer response times; they may even exceed 5 seconds.
However, the CPU usage pattern then changes to a sawtooth shape, the response
time drops, and I can see the requests being executed faster and faster. I
usually get an average response time of around 1 second.

If I execute the load test again, the average response time continues to
drop. However, it levels off at about 500 ms per request under this load if I
run more tests.
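
A load pattern of this kind can be sketched roughly as below. It assumes that "each core takes 4 requests per second" means 8 client workers each pacing 4 requests per second for 20 one-second rounds, which is an interpretation and not a description of the tool actually used. `run_query` and `make_query` are hypothetical callables for sending a request and generating a random one.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def worker(run_query, make_query, rate_per_sec, rounds):
    """Issue rate_per_sec requests per round for `rounds` one-second rounds."""
    interval = 1.0 / rate_per_sec
    latencies_ms = []
    for _ in range(rounds * rate_per_sec):
        start = time.perf_counter()
        run_query(make_query())  # hypothetical: blocks until Solr responds
        elapsed = time.perf_counter() - start
        latencies_ms.append(elapsed * 1000.0)
        # Pace the requests; if a request ran long, send the next one at once.
        time.sleep(max(0.0, interval - elapsed))
    return latencies_ms


def load_test(run_query, make_query, workers=8, rate_per_sec=4, rounds=20):
    """Run all workers concurrently and collect every latency."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(worker, run_query, make_query, rate_per_sec, rounds)
            for _ in range(workers)
        ]
        return [ms for future in futures for ms in future.result()]
```

With the defaults this issues 8 x 4 x 20 = 640 requests in total. Splitting the collected latencies by round would show the downward trend described above as the caches fill.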

These are the best results so far.

I understand that the requests were all different, so the results cannot be
compared with the case where I execute the same query twice (which usually
gives me a response time of around 150 ms).

In a production environment, many requests may be very similar, so the
filter queries will be executed faster. However, these tests generate entirely
random requests, which differs from the production environment.

In addition, cache warming may not be applicable to my test scenarios,
because the requests are randomly generated in all tests.

I tried other search solutions, and the performance was not good. That is
why I tried Solr. Now that I am using Solr, I would like to know, for a
typical Solr project:

(1) Is this a good response time for this data size without relying heavily
on the cache?
(2) Is it possible to improve further without sharding the data, for
example to get an average response time of less than 200 ms?

Additional information:
(1) The tests were done while the Solr instance was not indexing. The CPU was
dedicated to the test and there was enough RAM.

(2) Most of the settings in solrconfig.xml are the defaults. However, the
cache settings were modified.
Note: I think the autowarmCount setting may not be very beneficial to my
tests because the requests are randomly generated. However, I still got a hit
ratio of more than 50% for filter queries. This is due to the limited number
of distinct values for some filter queries.
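
The effect of a limited value set on the hit ratio can be illustrated with a small simulation. This is a simplified LRU model with made-up sizes, not Solr's actual filterCache implementation, and the function name and all numbers are assumptions chosen for illustration.

```python
import random
from collections import OrderedDict


def simulate_hit_ratio(distinct_filters, cache_size, requests, seed=0):
    """Simulate an LRU cache receiving uniformly random filter values."""
    rng = random.Random(seed)
    cache = OrderedDict()
    hits = 0
    for _ in range(requests):
        fq = rng.randrange(distinct_filters)
        if fq in cache:
            hits += 1
            cache.move_to_end(fq)  # mark as most recently used
        else:
            cache[fq] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used
    return hits / requests


# Few distinct filter values: nearly every random request is a cache hit.
few_values = simulate_hit_ratio(distinct_filters=50, cache_size=512, requests=10000)
# Far more distinct values than cache entries: hits become rare.
many_values = simulate_hit_ratio(distinct_filters=100000, cache_size=512, requests=10000)
print(few_values, many_values)
```

When the number of distinct filter values is smaller than the cache, each value misses at most once, so even fully random requests keep hitting the cache.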

Thanks
