Re: Solr indexing configuration help

Gaku Mak Thu, 29 May 2008 20:11:23 -0700

Looking further at the Java error, the crashes are mostly related to GC:


VM_Operation (0x0000000041b429e0): parallel gc failed allocation, mode:
safepoint, requested by thread 0x00002aab1988c400
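
For anyone comparing several hs_err_pid*.log files, a small script can pull
out the VM operation that was in progress at crash time. This is only a
sketch: the regular expression is modelled on the single line quoted above
(other JVM builds may format it differently), and the function name
parse_vm_operation is made up for illustration.

```python
import re

# Pattern modelled on the one VM_Operation line quoted above; this is an
# assumption about the format, not a documented hs_err grammar.
VM_OP = re.compile(
    r"VM_Operation\s+\((?P<addr>0x[0-9a-fA-F]+)\):\s+(?P<op>[^,]+),\s+"
    r"mode:\s+(?P<mode>\w+),\s+requested by thread\s+(?P<thread>0x[0-9a-fA-F]+)"
)

def parse_vm_operation(text):
    """Return the VM operation fields found in hs_err text, or None."""
    match = VM_OP.search(text)
    return match.groupdict() if match else None

# The line from this message, including the line wrap added by the mailer.
sample = (
    "VM_Operation (0x0000000041b429e0): parallel gc failed allocation, mode:\n"
    "safepoint, requested by thread 0x00002aab1988c400"
)
info = parse_vm_operation(sample)
print(info["op"], "/", info["mode"])
```

If the same operation shows up in every crash log, that supports the GC
theory; if it varies from crash to crash, bad hardware becomes more plausible.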

I'm following
http://java.sun.com/javase/6/webnotes/trouble/TSG-VM/html/gbyzo.html
to see whether their workaround does the trick. If not, I will try it on a
different (better) machine.

Thanks!

-Gaku


Yonik Seeley wrote:
> 
> It's most likely a
> 1) hardware issue: bad memory
>  OR
> 2) incompatible libraries (most likely libc version for the JVM).
> 
> If you have another box around, try that.
> 
> -Yonik
> 
> On Thu, May 29, 2008 at 9:51 PM, Gaku Mak <[EMAIL PROTECTED]> wrote:
>>
>> Hi Yonik and others,
>>
>> I'm getting this Java error after switching to JVM 1.6.0_03.  The error
>> occurs after the stress test has been running for a while; it failed at the
>> 12K-doc level and again at 18K.  Am I doing something wrong?  Please help!
>>
>> Thanks!
>>
>> #
>> # An unexpected error has been detected by Java Runtime Environment:
>> #
>> #  SIGSEGV (0xb) at pc=0x00002aaaaadfbf6d, pid=25030, tid=1079175504
>> #
>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (1.6.0_03-b05 mixed mode)
>> # Problematic frame:
>> # V  [libjvm.so+0x230f6d]
>> #
>> # An error report file with more information is saved as hs_err_pid25030.log
>> #
>> # If you would like to submit a bug report, please visit:
>> #   http://java.sun.com/webapps/bugreport/crash.jsp
>> #
>>
>> -Gaku
>>
>>
>> Yonik Seeley wrote:
>>>
>>> On Wed, May 28, 2008 at 10:30 PM, Gaku Mak <[EMAIL PROTECTED]> wrote:
>>>> I used the admin GUI to get the java info.
>>>> java.vm.specification.vendor = Sun Microsystems Inc.
>>> Well, your original email listed IcedTea... but that is mostly Sun
>>> code,  so maybe that's why the vendor is still listed as Sun.
>>>
>>> I'd recommend downloading 1.6.0_03 from java.sun.com and trying that.
>>>
>>> Later versions (1.6.0_04+) have a JVM bug that bites Lucene, so stick
>>> with 1.6.0_03 for now.
>>>
>>> -Yonik
>>>
>>>
>>>> Any suggestion?  Thanks a lot for your help!!
>>>>
>>>> -Gaku
>>>>
>>>>
>>>> Yonik Seeley wrote:
>>>>>
>>>>> Not sure why you would be getting an OOM from just indexing, and with
>>>>> the 1.5G heap you've given the JVM.
>>>>> Have you tried Sun's JVM?
>>>>>
>>>>> -Yonik
>>>>>
>>>>> On Wed, May 28, 2008 at 7:35 PM, gaku113 <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>> Hi all Solr users/developers/experts,
>>>>>>
>>>>>> I have the following scenario and would appreciate any advice on tuning
>>>>>> my Solr master server.
>>>>>>
>>>>>> I have a field in my schema that indexes (but does not store) about
>>>>>> 10,000 ids for each document.  This field is expected to govern the
>>>>>> size of the document.  Each id can contain up to 6 characters.  I see
>>>>>> two alternatives for this field: one is to use a multi-valued string
>>>>>> field, and the other is to pass a whitespace-delimited string to Solr
>>>>>> and have Solr tokenize it on whitespace (the text_ws fieldType).  The
>>>>>> master server is expected to receive a constant stream of updates.
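
To make the two alternatives concrete, here is a rough sketch of what the
update message for one document could look like in each case. It uses the
field names from the schema quoted further down (hex_id_multi and
hex_id_string); the helper name build_add, the document id "doc-1" and the
generated sample ids are all made up for illustration. It only compares the
request payloads and says nothing about index size or heap usage.

```python
import xml.etree.ElementTree as ET

def build_add(doc_id, ids, multi_valued):
    """Build a Solr <add> update message for one document."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    if multi_valued:
        # One <field> element per value for the multi-valued string field.
        for value in ids:
            ET.SubElement(doc, "field", name="hex_id_multi").text = value
    else:
        # A single whitespace-delimited value, to be split by text_ws.
        ET.SubElement(doc, "field", name="hex_id_string").text = " ".join(ids)
    return ET.tostring(add, encoding="unicode")

# Hypothetical sample data: 10000 six-character hex ids.
ids = [format(i, "06x") for i in range(10000)]
multi = build_add("doc-1", ids, multi_valued=True)
single = build_add("doc-1", ids, multi_valued=False)
print("multi-valued payload chars:", len(multi))
print("whitespace payload chars:  ", len(single))
```

The multi-valued form repeats the field markup for every id, so the request
is noticeably larger on the wire. Whether that matters on the indexing side
is a separate question that would need to be measured.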
>>>>>>
>>>>>> The estimated size of a single document ranges from 50k to 100k.  (I
>>>>>> know this is quite large.)  The number of documents is expected to be
>>>>>> around 200,000 on each master server, and there can be multiple master
>>>>>> servers (sharding).  I would like the master to handle more docs too, if
>>>>>> I can figure out a way.
>>>>>>
>>>>>> Currently, I'm performing some basic stress tests to simulate the
>>>>>> indexing side on the master server.  The stress test continuously adds
>>>>>> new documents at a rate of about 10 documents every 30 seconds.
>>>>>> Autocommit is being used (50 docs and 180 seconds constraints), but I
>>>>>> have no idea if this is the preferred way.  The goal is to keep adding
>>>>>> new documents until we get at least 200,000 documents (or about 20GB of
>>>>>> index) on the master (or even more if the server can handle it).
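
A quick back-of-the-envelope check of the figures quoted just above may help
put the test in perspective. The sketch below assumes the add rate stays
constant, that autocommit fires on whichever limit is reached first, and that
1 GB is 1,000,000 KB; the variable names are mine, not Solr settings.

```python
# Figures taken from the message above.
docs_per_batch = 10
batch_interval_s = 30
target_docs = 200_000
autocommit_max_docs = 50
autocommit_max_time_s = 180
doc_size_kb = (50, 100)

docs_per_second = docs_per_batch / batch_interval_s

# Time for the stress test to reach the target document count.
days_to_target = target_docs / docs_per_second / 86_400

# Seconds needed to accumulate maxDocs at this rate; if this is below
# maxTime, the document-count limit is reached first.
seconds_to_max_docs = autocommit_max_docs / docs_per_second
max_docs_fires_first = seconds_to_max_docs < autocommit_max_time_s
commits_to_target = target_docs / autocommit_max_docs

# Raw input volume for the low and high document-size estimates.
raw_gb = tuple(target_docs * kb / 1_000_000 for kb in doc_size_kb)

print(f"days to reach target:   {days_to_target:.1f}")
print(f"seconds to maxDocs:     {seconds_to_max_docs:.0f}")
print(f"maxDocs fires first:    {max_docs_fires_first}")
print(f"commits over the run:   {commits_to_target:.0f}")
print(f"raw input volume in GB: {raw_gb[0]:.0f} to {raw_gb[1]:.0f}")
```

At the stated rate this works out to roughly 6.9 days to reach 200,000
documents, about 150 seconds to accumulate 50 documents (so the 50-doc limit
would be reached before the 180-second one), and about 4,000 commits over the
whole run. The raw input volume of 10 to 20 GB is in line with the "about
20GB" estimate.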
>>>>>>
>>>>>> What I found in the indexing stress test is that the master server
>>>>>> stops responding after a while (for example, it cannot be pinged) once
>>>>>> there are about 30k documents.  The log entries are mostly:
>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>> OR
>>>>>> Ping query caused exception: null (this is probably caused by the OOM
>>>>>> problem)
>>>>>>
>>>>>> There were also a few cases where the Java process went away entirely.
>>>>>>
>>>>>> Questions:
>>>>>> 1)      Is it better to use the multi-valued string field or the
>>>>>> text_ws field for this large field?
>>>>>> 2)      Is it better to have more outstanding docs per commit or more
>>>>>> frequent commits, in terms of maximizing server resources?  What is
>>>>>> the preferred way to commit documents, assuming that the Solr master
>>>>>> receives updates frequently?  How many updated docs should there be
>>>>>> before issuing a commit?
>>>>>> 3)      How can I avoid the OOM problem in my case?  I'm already using
>>>>>> (-Xms1536M -Xmx1536M) on a 2-GB machine.  Is that not enough?  I'm
>>>>>> concerned that adding more RAM would just delay the OOM problem.  Any
>>>>>> additional JVM options to consider?
>>>>>> 4)      Any recommendation for the master server configuration, so that
>>>>>> I can maximize the number of indexed docs?
>>>>>> 5)      How can I disable caching on the master altogether, as queries
>>>>>> won't hit the master?
>>>>>> 6)      Is an average doc size of 50k-100k too large for Solr, and is
>>>>>> Solr even the right tool?  If not, any alternatives?  If we are able to
>>>>>> reduce the size of docs, can we expect to index more documents?
>>>>>>
>>>>>> The following is the relevant software/hardware/configuration info:
>>>>>>
>>>>>> Solr version (solr nightly build on 5/23/2008)
>>>>>>        Solr Specification Version: 1.2.2008.05.23.08.06.59
>>>>>>        Solr Implementation Version: nightly
>>>>>>        Lucene Specification Version: 2.3.2
>>>>>>        Lucene Implementation Version: 2.3.2 652650
>>>>>>        Jetty: 6.1.3
>>>>>>
>>>>>> Schema.xml (the sections that I think are relevant to the master
>>>>>> server):
>>>>>>
>>>>>>    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
>>>>>>    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
>>>>>>      <analyzer>
>>>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>>      </analyzer>
>>>>>>    </fieldType>
>>>>>>
>>>>>>    <field name="id" type="string" indexed="true" stored="true" required="true"/>
>>>>>>    <field name="hex_id_multi" type="string" indexed="true" stored="false" multiValued="true" omitNorms="true"/>
>>>>>>    <field name="hex_id_string" type="text_ws" indexed="true" stored="false" omitNorms="true"/>
>>>>>>
>>>>>>    <uniqueKey>id</uniqueKey>
>>>>>>
>>>>>> Solrconfig.xml
>>>>>>  <indexDefaults>
>>>>>>    <useCompoundFile>false</useCompoundFile>
>>>>>>    <mergeFactor>10</mergeFactor>
>>>>>>    <maxBufferedDocs>500</maxBufferedDocs>
>>>>>>    <ramBufferSizeMB>50</ramBufferSizeMB>
>>>>>>    <maxMergeDocs>5000</maxMergeDocs>
>>>>>>    <maxFieldLength>20000</maxFieldLength>
>>>>>>    <writeLockTimeout>1000</writeLockTimeout>
>>>>>>    <commitLockTimeout>10000</commitLockTimeout>
>>>>>>
>>>>>>    <mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy>
>>>>>>    <mergeScheduler>org.apache.lucene.index.ConcurrentMergeScheduler</mergeScheduler>
>>>>>>    <lockType>single</lockType>
>>>>>>  </indexDefaults>
>>>>>>
>>>>>>  <mainIndex>
>>>>>>    <useCompoundFile>false</useCompoundFile>
>>>>>>    <ramBufferSizeMB>50</ramBufferSizeMB>
>>>>>>    <mergeFactor>10</mergeFactor>
>>>>>>    <!-- Deprecated -->
>>>>>>    <maxBufferedDocs>500</maxBufferedDocs>
>>>>>>    <maxMergeDocs>5000</maxMergeDocs>
>>>>>>    <maxFieldLength>20000</maxFieldLength>
>>>>>>    <unlockOnStartup>false</unlockOnStartup>
>>>>>>  </mainIndex>
>>>>>>  <updateHandler class="solr.DirectUpdateHandler2">
>>>>>>
>>>>>>    <autoCommit>
>>>>>>      <maxDocs>50</maxDocs>
>>>>>>      <maxTime>180000</maxTime>
>>>>>>    </autoCommit>
>>>>>>    <listener event="postCommit" class="solr.RunExecutableListener">
>>>>>>      <str name="exe">solr/bin/snapshooter</str>
>>>>>>      <str name="dir">.</str>
>>>>>>      <bool name="wait">true</bool>
>>>>>>    </listener>
>>>>>>  </updateHandler>
>>>>>>
>>>>>>  <query>
>>>>>>    <maxBooleanClauses>50</maxBooleanClauses>
>>>>>>    <filterCache
>>>>>>      class="solr.LRUCache"
>>>>>>      size="0"
>>>>>>      initialSize="0"
>>>>>>      autowarmCount="0"/>
>>>>>>    <queryResultCache
>>>>>>      class="solr.LRUCache"
>>>>>>      size="0"
>>>>>>      initialSize="0"
>>>>>>      autowarmCount="0"/>
>>>>>>    <documentCache
>>>>>>      class="solr.LRUCache"
>>>>>>      size="0"
>>>>>>      initialSize="0"
>>>>>>      autowarmCount="0"/>
>>>>>>    <enableLazyFieldLoading>true</enableLazyFieldLoading>
>>>>>>
>>>>>>    <queryResultWindowSize>1</queryResultWindowSize>
>>>>>>    <queryResultMaxDocsCached>1</queryResultMaxDocsCached>
>>>>>>    <HashDocSet maxSize="1000" loadFactor="0.75"/>
>>>>>>    <listener event="newSearcher" class="solr.QuerySenderListener">
>>>>>>      <arr name="queries">
>>>>>>        <lst> <str name="q">user_id</str> <str name="start">0</str> <str name="rows">1</str> </lst>
>>>>>>        <lst><str name="q">static newSearcher warming query from solrconfig.xml</str></lst>
>>>>>>      </arr>
>>>>>>    </listener>
>>>>>>    <listener event="firstSearcher" class="solr.QuerySenderListener">
>>>>>>      <arr name="queries">
>>>>>>        <lst> <str name="q">fast_warm</str> <str name="start">0</str> <str name="rows">10</str> </lst>
>>>>>>        <lst><str name="q">static firstSearcher warming query from solrconfig.xml</str></lst>
>>>>>>      </arr>
>>>>>>    </listener>
>>>>>>    <useColdSearcher>false</useColdSearcher>
>>>>>>    <maxWarmingSearchers>4</maxWarmingSearchers>
>>>>>>  </query>
>>>>>>
>>>>>> Replication:
>>>>>>        The snappuller is scheduled to run every 15 mins for now.
>>>>>>
>>>>>> Hardware:
>>>>>>        AMD (2.1GHz) dual core with 2GB ram 160GB SATA harddrive
>>>>>>
>>>>>> OS:
>>>>>>        Fedora 8 (64-bit)
>>>>>>
>>>>>> JVM version:
>>>>>>        java version "1.7.0"
>>>>>>        IcedTea Runtime Environment (build 1.7.0-b21)
>>>>>>        IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)
>>>>>>
>>>>>> Java options:
>>>>>>        java -Djetty.home=/path/to/solr/home -d64 -Xms1536M -Xmx1536M -XX:+UseParallelGC -jar start.jar
>>>>>>
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17524364.html
>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
> 
> 

