I've been doing some performance analysis of a spatial search use case I'm
implementing in Solr 4.3.0. Basically I'm seeing search times a lot higher
than I'd like them to be and I'm hoping people may have some suggestions
for how to optimize further.

Here are the specs of what I'm doing now:

Machine:
- 16 cores @ 2.8GHz
- 256GB RAM
- 1TB (RAID 1+0 across 10 SSDs)

Content:
- 45M docs (not very big; only a few fields, with no large textual content)
- 1 geo field (using the config below)
- index is 12GB
- 1 shard
- Using MMapDirectory
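(For completeness: MMapDirectory can be selected explicitly in
solrconfig.xml with something like the line below, though on a 64-bit
JVM the default directory factory should end up using it anyway.)

<directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory"/>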

Field config:

<fieldType name="geo" class="solr.SpatialRecursivePrefixTreeFieldType"
distErrPct="0.025" maxDistErr="0.00045"
spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
units="degrees"/>

<field name="geopoint" indexed="true" multiValued="false"
required="false" stored="true" type="geo"/>


What I've figured out so far:

- Most of my time (~98%) is being spent in
java.nio.Bits.copyToByteArray(long,Object,long,long), which is being
driven by BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock(),
which from what I gather is basically reading blocks of terms from the
.tim file

- I moved from Java 1.6 to 1.7 based upon what I read here:
http://blog.vlad1.com/2011/10/05/looking-at-java-nio-buffer-performance/
and it definitely had some positive impact (I haven't been able to
measure this independently yet)

- I changed maxDistErr from 0.000009 (which is 1m precision, per the docs)
to 0.00045 (50m precision)
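(The conversion I'm using, assuming roughly 111.195 km per degree:
1 m ≈ 1/111195 ≈ 0.000009 degrees, 50 m ≈ 50/111195 ≈ 0.00045 degrees.)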

- It looks to me like the .tim files are being memory-mapped fully (i.e.
they show up in pmap output); the virtual size of the JVM is ~18GB
(heap is 6GB)
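(I'm just eyeballing this with something like
"pmap -x <solr-pid> | grep tim" and reading the JVM's virtual size from
the same output / top.)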

- I've optimized the index, but this didn't have a dramatic impact on
performance
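(Nothing fancy there either, just the stock update handler call, along
the lines of curl 'http://localhost:8983/solr/<core>/update?optimize=true'.)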

Changing the precision and upgrading the JVM yielded a drop from ~18s
avg query time to ~9s avg query time. This is fantastic, but I want to
get this down into the 1-2 second range.

At this point it seems that I'm basically bottlenecked on copying
memory out of the mapped .tim file, which leads me to think that the
only solution to my problem is to read less data or somehow read it
more efficiently.

If anyone has any suggestions for where to go with this, I'd love to
hear them.


thanks,

steve
