Hi Yonik and others,

I'm getting this Java error after switching to JVM 1.6.0_03. It occurs after the stress test has been running for a while; it failed at the 12K-document level and again at 18K. Am I doing something wrong? Please help!
Thanks!

#
# An unexpected error has been detected by Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00002aaaaadfbf6d, pid=25030, tid=1079175504
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (1.6.0_03-b05 mixed mode)
# Problematic frame:
# V  [libjvm.so+0x230f6d]
#
# An error report file with more information is saved as hs_err_pid25030.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#

-Gaku


Yonik Seeley wrote:
>
> On Wed, May 28, 2008 at 10:30 PM, Gaku Mak <[EMAIL PROTECTED]> wrote:
>> I used the admin GUI to get the java info.
>> java.vm.specification.vendor = Sun Microsystems Inc.
>
> Well, your original email listed IcedTea... but that is mostly Sun
> code, so maybe that's why the vendor is still listed as Sun.
>
> I'd recommend downloading 1.6.0_03 from java.sun.com and trying that.
>
> Later versions (1.6.0_04+) have a JVM bug that bites Lucene, so stick
> with 1.6.0_03 for now.
>
> -Yonik
>
>
>> Any suggestions? Thanks a lot for your help!!
>>
>> -Gaku
>>
>>
>> Yonik Seeley wrote:
>>>
>>> Not sure why you would be getting an OOM from just indexing, and with
>>> the 1.5G heap you've given the JVM.
>>> Have you tried Sun's JVM?
>>>
>>> -Yonik
>>>
>>> On Wed, May 28, 2008 at 7:35 PM, gaku113 <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Hi all Solr users/developers/experts,
>>>>
>>>> I have the following scenario and would appreciate any advice for
>>>> tuning my Solr master server.
>>>>
>>>> I have a field in my schema that indexes (but does not store) roughly
>>>> 10,000 ids for each document. This field is expected to govern the
>>>> size of the document. Each id can contain up to 6 characters. I figure
>>>> there are two alternatives for this field: use a multi-valued string
>>>> field, or pass a whitespace-delimited string to Solr and have Solr
>>>> tokenize it on whitespace (the text_ws fieldType). The master server
>>>> is expected to receive a constant stream of updates.
>>>>
>>>> The expected/estimated document size can range from 50k to 100k for a
>>>> single document. (I know this is quite large.) The number of documents
>>>> is expected to be around 200,000 on each master server, and there can
>>>> be multiple master servers (sharding). I wish the master could handle
>>>> more docs too, if I can figure a way out.
>>>>
>>>> Currently, I'm running some basic stress tests to simulate the
>>>> indexing side on the master server. The stress test continuously adds
>>>> new documents at a rate of about 10 documents every 30 seconds.
>>>> Autocommit is being used (50 docs and 180 seconds constraints), but I
>>>> have no idea if this is the preferred way. The goal is to keep adding
>>>> new documents until we reach at least 200,000 documents (or about 20GB
>>>> of index) on the master, or even more if the server can handle it.
>>>>
>>>> What I experienced from the indexing stress test is that the master
>>>> server stopped responding after a while, e.g. it became non-pingable
>>>> at about 30k documents. The log entries are mostly:
>>>> java.lang.OutOfMemoryError: Java heap space
>>>> OR
>>>> Ping query caused exception: null (this is probably caused by the OOM
>>>> problem)
>>>>
>>>> There were also a few cases where the java process went away entirely.
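(For illustration only: a minimal sketch of what a single update document might look like under the two alternatives described above, using the field names from the schema quoted further down. The id values here are made up.)

  <!-- Alternative 1: multi-valued string field (hex_id_multi) -->
  <add>
    <doc>
      <field name="id">doc-1</field>
      <field name="hex_id_multi">a1b2c3</field>
      <field name="hex_id_multi">d4e5f6</field>
      <!-- one <field> element per id, ~10,000 in total -->
    </doc>
  </add>

  <!-- Alternative 2: one whitespace-delimited value, tokenized by text_ws -->
  <add>
    <doc>
      <field name="id">doc-2</field>
      <!-- a single value holding ~10,000 whitespace-separated ids -->
      <field name="hex_id_string">a1b2c3 d4e5f6 0f9e8d</field>
    </doc>
  </add>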
>>>>
>>>> Questions:
>>>> 1) Is it better to use the multi-valued string field or the text_ws
>>>> field for this large field?
>>>> 2) Is it better to have more outstanding docs per commit or to commit
>>>> more frequently, in terms of making the most of server resources? What
>>>> is the preferred way to commit documents when the Solr master receives
>>>> updates frequently? How many updated docs should there be before
>>>> issuing a commit?
>>>> 3) How can I avoid the OOM problem in my case? I'm already using
>>>> -Xms1536M -Xmx1536M on a 2-GB machine. Is that not enough? I'm
>>>> concerned that adding more RAM would just delay the OOM problem. Any
>>>> additional JVM options to consider?
>>>> 4) Any recommendations for the master server configuration so that I
>>>> can maximize the number of indexed docs?
>>>> 5) How can I disable caching on the master altogether, since queries
>>>> won't hit the master?
>>>> 6) For an average doc size of 50k-100k, is that too large for Solr, or
>>>> is Solr even the right tool? If not, any alternatives? If we can
>>>> reduce the size of the docs, can we expect to index more documents?
>>>>
>>>> The following is info on the software/hardware/configuration:
>>>>
>>>> Solr version (Solr nightly build from 5/23/2008):
>>>> Solr Specification Version: 1.2.2008.05.23.08.06.59
>>>> Solr Implementation Version: nightly
>>>> Lucene Specification Version: 2.3.2
>>>> Lucene Implementation Version: 2.3.2 652650
>>>> Jetty: 6.1.3
>>>>
>>>> Schema.xml (the sections I think are relevant to the master server):
>>>>
>>>> <fieldType name="string" class="solr.StrField" sortMissingLast="true"
>>>>            omitNorms="true"/>
>>>> <fieldType name="text_ws" class="solr.TextField"
>>>>            positionIncrementGap="100">
>>>>   <analyzer>
>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>   </analyzer>
>>>> </fieldType>
>>>>
>>>> <field name="id" type="string" indexed="true" stored="true"
>>>>        required="true"/>
>>>> <field name="hex_id_multi" type="string" indexed="true" stored="false"
>>>>        multiValued="true" omitNorms="true"/>
>>>> <field name="hex_id_string" type="text_ws" indexed="true" stored="false"
>>>>        omitNorms="true"/>
>>>>
>>>> <uniqueKey>id</uniqueKey>
>>>>
>>>> Solrconfig.xml:
>>>>
>>>> <indexDefaults>
>>>>   <useCompoundFile>false</useCompoundFile>
>>>>   <mergeFactor>10</mergeFactor>
>>>>   <maxBufferedDocs>500</maxBufferedDocs>
>>>>   <ramBufferSizeMB>50</ramBufferSizeMB>
>>>>   <maxMergeDocs>5000</maxMergeDocs>
>>>>   <maxFieldLength>20000</maxFieldLength>
>>>>   <writeLockTimeout>1000</writeLockTimeout>
>>>>   <commitLockTimeout>10000</commitLockTimeout>
>>>>   <mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy>
>>>>   <mergeScheduler>org.apache.lucene.index.ConcurrentMergeScheduler</mergeScheduler>
>>>>   <lockType>single</lockType>
>>>> </indexDefaults>
>>>>
>>>> <mainIndex>
>>>>   <useCompoundFile>false</useCompoundFile>
>>>>   <ramBufferSizeMB>50</ramBufferSizeMB>
>>>>   <mergeFactor>10</mergeFactor>
>>>>   <!-- Deprecated -->
>>>>   <maxBufferedDocs>500</maxBufferedDocs>
>>>>   <maxMergeDocs>5000</maxMergeDocs>
>>>>   <maxFieldLength>20000</maxFieldLength>
>>>>   <unlockOnStartup>false</unlockOnStartup>
>>>> </mainIndex>
>>>>
>>>> <updateHandler class="solr.DirectUpdateHandler2">
>>>>   <autoCommit>
>>>>     <maxDocs>50</maxDocs>
>>>>     <maxTime>180000</maxTime>
>>>>   </autoCommit>
>>>>   <listener event="postCommit" class="solr.RunExecutableListener">
>>>>     <str name="exe">solr/bin/snapshooter</str>
<str name="dir">.</str> >>>> <bool name="wait">true</bool> >>>> </listener> >>>> </updateHandler> >>>> >>>> <query> >>>> <maxBooleanClauses>50</maxBooleanClauses> >>>> <filterCache >>>> class="solr.LRUCache" >>>> size="0" >>>> initialSize="0" >>>> autowarmCount="0"/> >>>> <queryResultCache >>>> class="solr.LRUCache" >>>> size="0" >>>> initialSize="0" >>>> autowarmCount="0"/> >>>> <documentCache >>>> class="solr.LRUCache" >>>> size="0" >>>> initialSize="0" >>>> autowarmCount="0"/> >>>> <enableLazyFieldLoading>true</enableLazyFieldLoading> >>>> >>>> <queryResultWindowSize>1</queryResultWindowSize> >>>> <queryResultMaxDocsCached>1</queryResultMaxDocsCached> >>>> <HashDocSet maxSize="1000" loadFactor="0.75"/> >>>> <listener event="newSearcher" class="solr.QuerySenderListener"> >>>> <arr name="queries"> >>>> <lst> <str name="q">user_id</str> <str name="start">0</str> <str >>>> name="rows">1</str> </lst> >>>> <lst><str name="q">static newSearcher warming query from >>>> solrconfig.xml</str></lst> >>>> </arr> >>>> </listener> >>>> <listener event="firstSearcher" class="solr.QuerySenderListener"> >>>> <arr name="queries"> >>>> <lst> <str name="q">fast_warm</str> <str name="start">0</str> >>>> <str >>>> name="rows">10</str> </lst> >>>> <lst><str name="q">static firstSearcher warming query from >>>> solrconfig.xml</str></lst> >>>> </arr> >>>> </listener> >>>> <useColdSearcher>false</useColdSearcher> >>>> <maxWarmingSearchers>4</maxWarmingSearchers> >>>> </query> >>>> >>>> Replication: >>>> The snappuller is scheduled to run every 15 mins for now. >>>> >>>> Hardware: >>>> AMD (2.1GHz) dual core with 2GB ram 160GB SATA harddrive >>>> >>>> OS: >>>> Fedora 8 (64-bit) >>>> >>>> JVM version: >>>> java version "1.7.0" >>>> IcedTea Runtime Environment (build 1.7.0-b21) >>>> IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode) >>>> >>>> Java options: >>>> java -Djetty.home=/path/to/solr/home -d64 -Xms1536M -Xmx1536M >>>> -XX:+UseParallelGC -jar start.jar >>>> >>>> >>>> -- >>>> View this message in context: >>>> http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17524364.html >>>> Sent from the Solr - User mailing list archive at Nabble.com. >>>> >>>> >>> >>> >> >> -- >> View this message in context: >> http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17526135.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > -- View this message in context: http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17550056.html Sent from the Solr - User mailing list archive at Nabble.com.