Not sure why you would be getting an OOM from just indexing, and with the 1.5G heap you've given the JVM. Have you tried Sun's JVM?
-Yonik

On Wed, May 28, 2008 at 7:35 PM, gaku113 <[EMAIL PROTECTED]> wrote:
>
> Hi all Solr users/developers/experts,
>
> I have the following scenario and would appreciate any advice on tuning my
> Solr master server.
>
> I have a field in my schema that would be indexed (but not stored) and would
> hold about 10000 ids for each document. This field is expected to govern the
> size of the document. Each id can contain up to 6 characters. I figure that
> there are two alternatives for this field: one is to use a multi-valued
> string field, and the other is to pass a whitespace-delimited string to Solr
> and have Solr tokenize that string on whitespace (the text_ws fieldType).
> The master server is expected to receive a constant stream of updates.
>
> The expected/estimated document size can range from 50k to 100k for a single
> document. (I know this is quite large.) The number of documents is expected
> to be around 200,000 on each master server, and there can be multiple master
> servers (sharding). I would like the master to handle more docs too, if I
> can figure out a way.
>
> Currently, I'm performing some basic stress tests to simulate the indexing
> side on the master server. The stress test continuously adds new documents
> at a rate of about 10 documents every 30 seconds. Autocommit is being used
> (50 docs and 180 seconds constraints), but I have no idea if this is the
> preferred way. The goal is to keep adding new documents until we have at
> least 200,000 documents (or about 20GB of index) on the master (or even more
> if the server can handle it).
>
> What I experienced from the indexing stress test is that the master server
> stopped responding after a while, for example becoming non-pingable at about
> 30k documents.
> When looking at the log, the entries are mostly:
>   java.lang.OutOfMemoryError: Java heap space
> OR
>   Ping query caused exception: null (this is probably caused by the OOM
>   problem)
>
> There were also a few cases where the Java process went away entirely.
>
> Questions:
> 1) Is it better to use the multi-valued string field or the text_ws field
>    for this large field?
> 2) Is it better to have more outstanding docs per commit or more frequent
>    commits, in terms of maximizing server resources? What is the preferred
>    way to commit documents, assuming that the Solr master receives updates
>    frequently? How many updated docs should there be before issuing a
>    commit?
> 3) How can I avoid the OOM problem in my case? I'm already using (-Xms1536M
>    -Xmx1536M) on a 2-GB machine. Is that not enough? I'm concerned that
>    adding more RAM would just delay the OOM problem. Are there any
>    additional JVM options to consider?
> 4) Any recommendation for the master server configuration, in the sense
>    that I can maximize the number of indexed docs?
> 5) How can I disable caching on the master altogether, as queries won't hit
>    the master?
> 6) Is an average doc size of 50k-100k too large for Solr, or is Solr even
>    the right tool? If not, is there any alternative? If we are able to
>    reduce the size of docs, can we expect to index more documents?
>
> The following is info related to software/hardware/configuration:
>
> Solr version (Solr nightly build on 5/23/2008)
>   Solr Specification Version: 1.2.2008.05.23.08.06.59
>   Solr Implementation Version: nightly
>   Lucene Specification Version: 2.3.2
>   Lucene Implementation Version: 2.3.2 652650
>   Jetty: 6.1.3
>
> Schema.xml (the sections that I think are relevant to the master server):
>
> <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
> <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>   </analyzer>
> </fieldType>
>
> <field name="id" type="string" indexed="true" stored="true" required="true"/>
> <field name="hex_id_multi" type="string" indexed="true" stored="false" multiValued="true" omitNorms="true"/>
> <field name="hex_id_string" type="text_ws" indexed="true" stored="false" omitNorms="true"/>
>
> <uniqueKey>id</uniqueKey>
>
> Solrconfig.xml
> <indexDefaults>
>   <useCompoundFile>false</useCompoundFile>
>   <mergeFactor>10</mergeFactor>
>   <maxBufferedDocs>500</maxBufferedDocs>
>   <ramBufferSizeMB>50</ramBufferSizeMB>
>   <maxMergeDocs>5000</maxMergeDocs>
>   <maxFieldLength>20000</maxFieldLength>
>   <writeLockTimeout>1000</writeLockTimeout>
>   <commitLockTimeout>10000</commitLockTimeout>
>
>   <mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy>
>   <mergeScheduler>org.apache.lucene.index.ConcurrentMergeScheduler</mergeScheduler>
>   <lockType>single</lockType>
> </indexDefaults>
>
> <mainIndex>
>   <useCompoundFile>false</useCompoundFile>
>   <ramBufferSizeMB>50</ramBufferSizeMB>
>   <mergeFactor>10</mergeFactor>
>   <!-- Deprecated -->
>   <maxBufferedDocs>500</maxBufferedDocs>
>   <maxMergeDocs>5000</maxMergeDocs>
>   <maxFieldLength>20000</maxFieldLength>
>   <unlockOnStartup>false</unlockOnStartup>
> </mainIndex>
> <updateHandler class="solr.DirectUpdateHandler2">
>
>   <autoCommit>
>     <maxDocs>50</maxDocs>
>     <maxTime>180000</maxTime>
>   </autoCommit>
>   <listener event="postCommit" class="solr.RunExecutableListener">
>     <str name="exe">solr/bin/snapshooter</str>
>     <str name="dir">.</str>
>     <bool name="wait">true</bool>
>   </listener>
> </updateHandler>
>
> <query>
>   <maxBooleanClauses>50</maxBooleanClauses>
>   <filterCache
>     class="solr.LRUCache"
>     size="0"
>     initialSize="0"
>     autowarmCount="0"/>
>   <queryResultCache
>     class="solr.LRUCache"
>     size="0"
>     initialSize="0"
>     autowarmCount="0"/>
>   <documentCache
>     class="solr.LRUCache"
>     size="0"
>     initialSize="0"
>     autowarmCount="0"/>
>   <enableLazyFieldLoading>true</enableLazyFieldLoading>
>
>   <queryResultWindowSize>1</queryResultWindowSize>
>   <queryResultMaxDocsCached>1</queryResultMaxDocsCached>
>   <HashDocSet maxSize="1000" loadFactor="0.75"/>
>   <listener event="newSearcher" class="solr.QuerySenderListener">
>     <arr name="queries">
>       <lst> <str name="q">user_id</str> <str name="start">0</str> <str name="rows">1</str> </lst>
>       <lst><str name="q">static newSearcher warming query from solrconfig.xml</str></lst>
>     </arr>
>   </listener>
>   <listener event="firstSearcher" class="solr.QuerySenderListener">
>     <arr name="queries">
>       <lst> <str name="q">fast_warm</str> <str name="start">0</str> <str name="rows">10</str> </lst>
>       <lst><str name="q">static firstSearcher warming query from solrconfig.xml</str></lst>
>     </arr>
>   </listener>
>   <useColdSearcher>false</useColdSearcher>
>   <maxWarmingSearchers>4</maxWarmingSearchers>
> </query>
>
> Replication:
> The snappuller is scheduled to run every 15 mins for now.
>
> Hardware:
> AMD (2.1GHz) dual core with 2GB ram 160GB SATA harddrive
>
> OS:
> Fedora 8 (64-bit)
>
> JVM version:
> java version "1.7.0"
> IcedTea Runtime Environment (build 1.7.0-b21)
> IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)
>
> Java options:
> java -Djetty.home=/path/to/solr/home -d64 -Xms1536M -Xmx1536M -XX:+UseParallelGC -jar start.jar
>
>
> --
> View this message in context:
> http://www.nabble.com/Solr-indexing-configuration-help-tp17524364p17524364.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
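
For readers comparing the two field alternatives in question 1, here is a rough sketch of how the same set of ids could be packaged as a Solr XML `<add>` message in either form. It only builds the payloads and reports their sizes; it does not talk to a server. The field names `id`, `hex_id_multi`, and `hex_id_string` come from the schema quoted above. The helper names `make_ids` and `build_add_message`, the random hex ids, and the document id `doc-1` are assumptions made up for illustration, not anything from the original poster's setup.

```python
import random
import xml.etree.ElementTree as ET


def make_ids(n=10000, seed=0):
    """Return n pseudo-random 6-character hex ids (hypothetical test data)."""
    rng = random.Random(seed)
    return ["%06x" % rng.randrange(16 ** 6) for _ in range(n)]


def build_add_message(doc_id, ids, multi_valued):
    """Build a Solr XML <add> message for one document, returned as bytes.

    multi_valued=True  -> one <field name="hex_id_multi"> element per id
    multi_valued=False -> a single <field name="hex_id_string"> holding all
                          ids separated by spaces (for the text_ws fieldType)
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    if multi_valued:
        for value in ids:
            ET.SubElement(doc, "field", name="hex_id_multi").text = value
    else:
        ET.SubElement(doc, "field", name="hex_id_string").text = " ".join(ids)
    return ET.tostring(add, encoding="utf-8")


ids = make_ids()
multi = build_add_message("doc-1", ids, multi_valued=True)
flat = build_add_message("doc-1", ids, multi_valued=False)

# Compare the size of the update request for the two representations.
print("multi-valued payload bytes:", len(multi))
print("whitespace payload bytes:  ", len(flat))
```

The multi-valued form repeats the `<field>` wrapper for every id, so its request body is several times larger than the whitespace-delimited one. That difference concerns the update request only; this sketch says nothing about how large the resulting index is or how much heap either form needs.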