Hi Yonik and others,

I'm getting this Java error after switching to JVM 1.6.0_03. It occurs after the stress test has been running for a while; it failed at the 12K-document level and again at 18K. Am I doing something wrong? Please help!
Thanks!

#
# An unexpected error has been detected by Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00002aaaaadfbf6d, pid=25030, tid=1079175504
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (1.6.0_03-b05 mixed mode)
# Problematic frame:
# V  [libjvm.so+0x230f6d]
#
# An error report file with more information is saved as hs_err_pid25030.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#

-Gaku


Yonik Seeley wrote:
>
> On Wed, May 28, 2008 at 10:30 PM, Gaku Mak <[EMAIL PROTECTED]> wrote:
>> I used the admin GUI to get the java info.
>> java.vm.specification.vendor = Sun Microsystems Inc.
>
> Well, your original email listed IcedTea... but that is mostly Sun
> code, so maybe that's why the vendor is still listed as Sun.
>
> I'd recommend downloading 1.6.0_03 from java.sun.com and trying that.
>
> Later versions (1.6.0_04+) have a JVM bug that bites Lucene, so stick
> with 1.6.0_03 for now.
>
> -Yonik
>
>
>> Any suggestions? Thanks a lot for your help!!
>>
>> -Gaku
>>
>>
>> Yonik Seeley wrote:
>>>
>>> Not sure why you would be getting an OOM from just indexing, and with
>>> the 1.5G heap you've given the JVM.
>>> Have you tried Sun's JVM?
>>>
>>> -Yonik
>>>
>>> On Wed, May 28, 2008 at 7:35 PM, gaku113 <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Hi all Solr users/developers/experts,
>>>>
>>>> I have the following scenario and would appreciate any advice for
>>>> tuning my Solr master server.
>>>>
>>>> I have a field in my schema that indexes (but does not store) roughly
>>>> 10,000 ids for each document. This field is expected to govern the
>>>> size of the document. Each id can contain up to 6 characters. I figure
>>>> there are two alternatives for this field: use a multi-valued string
>>>> field, or pass a whitespace-delimited string to Solr and have Solr
>>>> tokenize it on whitespace (the text_ws fieldType). The master server
>>>> is expected to receive a constant stream of updates.
>>>>
>>>> The expected/estimated document size can range from 50k to 100k for a
>>>> single document. (I know this is quite large.) The number of documents
>>>> is expected to be around 200,000 on each master server, and there can
>>>> be multiple master servers (sharding). I wish the master could handle
>>>> more docs too, if I can figure a way out.
>>>>
>>>> Currently, I'm running some basic stress tests to simulate the
>>>> indexing side on the master server. The stress test continuously adds
>>>> new documents at a rate of about 10 documents every 30 seconds.
>>>> Autocommit is being used (50 docs and 180 seconds constraints), but I
>>>> have no idea if this is the preferred way. The goal is to keep adding
>>>> new documents until we reach at least 200,000 documents (or about 20GB
>>>> of index) on the master, or even more if the server can handle it.
>>>>
>>>> What I experienced from the indexing stress test is that the master
>>>> server stopped responding after a while, e.g. it became non-pingable
>>>> at about 30k documents. The log entries are mostly:
>>>> java.lang.OutOfMemoryError: Java heap space
>>>> OR
>>>> Ping query caused exception: null (this is probably caused by the OOM
>>>> problem)
>>>>
>>>> There were also a few cases where the java process went away entirely.
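(For illustration only: a minimal sketch of what a single update document might look like under the two alternatives described above, using the field names from the schema quoted further down. The id values here are made up.)

  <!-- Alternative 1: multi-valued string field (hex_id_multi) -->
  <add>
    <doc>
      <field name="id">doc-1</field>
      <field name="hex_id_multi">a1b2c3</field>
      <field name="hex_id_multi">d4e5f6</field>
      <!-- one <field> element per id, ~10,000 in total -->
    </doc>
  </add>

  <!-- Alternative 2: one whitespace-delimited value, tokenized by text_ws -->
  <add>
    <doc>
      <field name="id">doc-2</field>
      <!-- a single value holding ~10,000 whitespace-separated ids -->
      <field name="hex_id_string">a1b2c3 d4e5f6 0f9e8d</field>
    </doc>
  </add>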
>>>>
>>>> Questions:
>>>> 1) Is it better to use the multi-valued string field or the text_ws
>>>> field for this large field?
>>>> 2) Is it better to have more outstanding docs per commit or to commit
>>>> more frequently, in terms of making the most of server resources? What
>>>> is the preferred way to commit documents when the Solr master receives
>>>> updates frequently? How many updated docs should there be before
>>>> issuing a commit?
>>>> 3) How can I avoid the OOM problem in my case? I'm already using
>>>> -Xms1536M -Xmx1536M on a 2-GB machine. Is that not enough? I'm
>>>> concerned that adding more RAM would just delay the OOM problem. Any
>>>> additional JVM options to consider?
>>>> 4) Any recommendations for the master server configuration so that I
>>>> can maximize the number of indexed docs?
>>>> 5) How can I disable caching on the master altogether, since queries
>>>> won't hit the master?
>>>> 6) For an average doc size of 50k-100k, is that too large for Solr, or
>>>> is Solr even the right tool? If not, any alternatives? If we can
>>>> reduce the size of the docs, can we expect to index more documents?
>>>>
>>>> The following is info on the software/hardware/configuration:
>>>>
>>>> Solr version (Solr nightly build from 5/23/2008):
>>>> Solr Specification Version: 1.2.2008.05.23.08.06.59
>>>> Solr Implementation Version: nightly
>>>> Lucene Specification Version: 2.3.2
>>>> Lucene Implementation Version: 2.3.2 652650
>>>> Jetty: 6.1.3
>>>>
>>>> Schema.xml (the sections I think are relevant to the master server):
>>>>
>>>> <fieldType name="string" class="solr.StrField" sortMissingLast="true"
>>>>            omitNorms="true"/>
>>>> <fieldType name="text_ws" class="solr.TextField"
>>>>            positionIncrementGap="100">
>>>>   <analyzer>
>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>   </analyzer>
>>>> </fieldType>
>>>>
>>>> <field name="id" type="string" indexed="true" stored="true"
>>>>        required="true"/>
>>>> <field name="hex_id_multi" type="string" indexed="true" stored="false"
>>>>        multiValued="true" omitNorms="true"/>
>>>> <field name="hex_id_string" type="text_ws" indexed="true" stored="false"
>>>>        omitNorms="true"/>
>>>>
>>>> <uniqueKey>id</uniqueKey>
>>>>
>>>> Solrconfig.xml:
>>>>
>>>> <indexDefaults>
>>>>   <useCompoundFile>false</useCompoundFile>
>>>>   <mergeFactor>10</mergeFactor>
>>>>   <maxBufferedDocs>500</maxBufferedDocs>
>>>>   <ramBufferSizeMB>50</ramBufferSizeMB>
>>>>   <maxMergeDocs>5000</maxMergeDocs>
>>>>   <maxFieldLength>20000</maxFieldLength>
>>>>   <writeLockTimeout>1000</writeLockTimeout>
>>>>   <commitLockTimeout>10000</commitLockTimeout>
>>>>   <mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy>
>>>>   <mergeScheduler>org.apache.lucene.index.ConcurrentMergeScheduler</mergeScheduler>
>>>>   <lockType>single</lockType>
>>>> </indexDefaults>
>>>>
>>>> <mainIndex>
>>>>   <useCompoundFile>false</useCompoundFile>
>>>>   <ramBufferSizeMB>50</ramBufferSizeMB>
>>>>   <mergeFactor>10</mergeFactor>
>>>>   <!-- Deprecated -->
>>>>   <maxBufferedDocs>500</maxBufferedDocs>
>>>>   <maxMergeDocs>5000</maxMergeDocs>
>>>>   <maxFieldLength>20000</maxFieldLength>
>>>>   <unlockOnStartup>false</unlockOnStartup>
>>>> </mainIndex>
>>>>
>>>> <updateHandler class="solr.DirectUpdateHandler2">
>>>>   <autoCommit>
>>>>     <maxDocs>50</maxDocs>
>>>>     <maxTime>180000</maxTime>
>>>>   </autoCommit>
>>>>   <listener event="postCommit" class="solr.RunExecutableListener">
>>>>     <str name="exe">solr/bin/snapshooter</str>
<str name="dir">.</str> >>>> <bool name="wait">true</bool> >>>> </listener> >>>> </updateHandler> >>>> >>>> <query> >>>> <maxBooleanClauses>50</maxBooleanClauses> >>>> <filterCache >>>> class="solr.LRUCache" >>>> size="0" >>>> initialSize="0" >>>> autowarmCount="0"/> >>>> <queryResultCache >>>> class="solr.LRUCache" >>>> size="0" >>>> initialSize="0" >>>> autowarmCount="0"/> >>>> <documentCache >>>> class="solr.LRUCache" >>>> size="0" >>>> initialSize="0" >>>> autowarmCount="0"/> >>>> <enableLazyFieldLoading>true</enableLazyFieldLoading> >>>> >>>> <queryResultWindowSize>1</queryResultWindowSize> >>>> <queryResultMaxDocsCached>1</queryResultMaxDocsCached> >>>> <HashDocSet maxSize="1000" loadFactor="0.75"/> >>>> <listener event="newSearcher" class="solr.QuerySenderListener"> >>>> <arr name="queries"> >>>> <lst> <str name="q">user_id</str> <str name="start">0</str> <str >>>> name="rows">1</str> </lst> >>>> <lst><str name="q">static newSearcher warming query from >>>> solrconfig.xml</str></lst> >>>> </arr> >>>> </listener> >>>> <listener event="firstSearcher" class="solr.QuerySenderListener"> >>>> <arr name="queries"> >>>> <lst> <str name="q">fast_warm</str> <str name="start">0</str> >>>> <str >>>> name="rows">10</str> </lst> >>>> <lst><str name="q">static firstSearcher warming query from >>>> solrconfig.xml</str></lst> >>>> </arr> >>>> </listener> >>>> <useColdSearcher>false</useColdSearcher> >>>> <maxWarmingSearchers>4</maxWarmingSearchers> >>>> </query> >>>> >>>> Replication: >>>> The snappuller is scheduled to run every 15 mins for now. >>>> >>>> Hardware: >>>> AMD (2.1GHz) dual core with 2GB ram 160GB SATA harddrive >>>> >>>> OS: >>>> Fedora 8 (64-bit) >>>> >>>> JVM version: >>>> java version "1.7.0" >>>> IcedTea Runtime Environment (build 1.7.0-b21) >>>> IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode) >>>> >>>> Java options: >>>> java -Djetty.home=/path/to/solr/home -d64 -Xms1536M -Xmx1536M >>>> -XX:+UseParallelGC -jar start.jar >>>> >>>> >>>> -- >>>> View this message in context: >>>> http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17524364.html >>>> Sent from the Solr - User mailing list archive at Nabble.com. >>>> >>>> >>> >>> >> >> -- >> View this message in context: >> http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17526135.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > -- View this message in context: http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17550056.html Sent from the Solr - User mailing list archive at Nabble.com.