Hi,

To disable auto commit, remove (or comment out) both the <autoCommit> and
<autoSoftCommit> definitions in solrconfig.xml.
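
For example, the sections to remove or comment out look like this (a
minimal sketch; the values shown are illustrative and the contents of your
file may differ):

   <!-- illustrative values; commenting out disables auto commits
   <autoCommit>
     <maxDocs>15000</maxDocs>
     <openSearcher>false</openSearcher>
   </autoCommit>

   <autoSoftCommit>
     <maxTime>1000</maxTime>
   </autoSoftCommit>
   -->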

To disable the transaction log (tlog), remove

   <updateLog>
     <str name="dir">${solr.ulog.dir:}</str>
   </updateLog>

from solrconfig.xml.

To commit at the end, pass the commit=true parameter with the import
command: ?command=full-import&commit=true
There is also a checkbox for this on the Data Import admin page.
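
For example, a full request would look like the following (a sketch: the
host, port, and core name collection1 are illustrative, and clean=false is
taken from the request parameters quoted below):

   http://localhost:8983/solr/collection1/dataimport?command=full-import&clean=false&commit=true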



On Saturday, April 5, 2014 1:27 AM, Candygram For Mongo 
<candygram.for.mo...@gmail.com> wrote:
I might have forgotten to mention that we are using the DataImportHandler.  I
think we know how to remove auto commit.  How would we force a commit at
the end?


On Fri, Apr 4, 2014 at 3:18 PM, Candygram For Mongo <
candygram.for.mo...@gmail.com> wrote:

> We would be happy to try that.  That sounds counterintuitive for the high
> volume of records we have.  Can you help me understand how that might solve
> our problem?
>
>
>
> On Fri, Apr 4, 2014 at 2:34 PM, Ahmet Arslan <iori...@yahoo.com> wrote:
>
>> Hi,
>>
>> Can you remove auto commit for bulk import. Commit at the very end?
>>
>> Ahmet
>>
>>
>>
>> On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo <
>> candygram.for.mo...@gmail.com> wrote:
>> In case the attached database.xml file didn't show up, I have pasted the
>> contents below:
>>
>> <dataConfig>
>>   <dataSource
>>     name="org_only"
>>     type="JdbcDataSource"
>>     driver="oracle.jdbc.OracleDriver"
>>     url="jdbc:oracle:thin:@test2.abc.com:1521:ORCL"
>>     user="admin"
>>     password="admin"
>>     readOnly="false"
>>     batchSize="100"
>>   />
>>
>>   <document>
>>
>>     <entity name="full-index" query="
>>       select
>>         NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(100)), 'null') as SOLR_ID,
>>         'ORCL.ADDRESS_ACCT_ALL' as SOLR_CATEGORY,
>>         NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(255)), ' ') as ADDRESSALLROWID,
>>         NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as ADDRESSALLADDRTYPECD,
>>         NVL(cast(ORCL.ADDRESS_ACCT_ALL.LONGITUDE as varchar2(255)), ' ') as ADDRESSALLLONGITUDE,
>>         NVL(cast(ORCL.ADDRESS_ACCT_ALL.LATITUDE as varchar2(255)), ' ') as ADDRESSALLLATITUDE,
>>         NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_NAME as varchar2(255)), ' ') as ADDRESSALLADDRNAME,
>>         NVL(cast(ORCL.ADDRESS_ACCT_ALL.CITY as varchar2(255)), ' ') as ADDRESSALLCITY,
>>         NVL(cast(ORCL.ADDRESS_ACCT_ALL.STATE as varchar2(255)), ' ') as ADDRESSALLSTATE,
>>         NVL(cast(ORCL.ADDRESS_ACCT_ALL.EMAIL_ADDR as varchar2(255)), ' ') as ADDRESSALLEMAILADDR
>>       from ORCL.ADDRESS_ACCT_ALL
>>     ">
>>
>>       <field column="SOLR_ID" name="id" />
>>       <field column="SOLR_CATEGORY" name="category" />
>>       <field column="ADDRESSALLROWID" name="ADDRESS_ACCT_ALL.RECORD_ID_abc" />
>>       <field column="ADDRESSALLADDRTYPECD" name="ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc" />
>>       <field column="ADDRESSALLLONGITUDE" name="ADDRESS_ACCT_ALL.LONGITUDE_abc" />
>>       <field column="ADDRESSALLLATITUDE" name="ADDRESS_ACCT_ALL.LATITUDE_abc" />
>>       <field column="ADDRESSALLADDRNAME" name="ADDRESS_ACCT_ALL.ADDR_NAME_abc" />
>>       <field column="ADDRESSALLCITY" name="ADDRESS_ACCT_ALL.CITY_abc" />
>>       <field column="ADDRESSALLSTATE" name="ADDRESS_ACCT_ALL.STATE_abc" />
>>       <field column="ADDRESSALLEMAILADDR" name="ADDRESS_ACCT_ALL.EMAIL_ADDR_abc" />
>>
>>     </entity>
>>
>>     <!-- Variables -->
>>     <!-- '${dataimporter.last_index_time}' -->
>>   </document>
>> </dataConfig>
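>>
>> For reference, a DIH config file like this is normally registered in
>> solrconfig.xml along these lines (a sketch; the handler path is an
>> assumption, and database.xml is the file name mentioned in this thread):
>>
>> <requestHandler name="/dataimport"
>>     class="org.apache.solr.handler.dataimport.DataImportHandler">
>>   <lst name="defaults">
>>     <!-- points at the DIH configuration pasted above -->
>>     <str name="config">database.xml</str>
>>   </lst>
>> </requestHandler>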
>>
>>
>>
>>
>>
>>
>> On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo <
>> candygram.for.mo...@gmail.com> wrote:
>>
>> > In this case we are indexing an Oracle database.
>> >
>> > We do not include the data-config.xml in our distribution.  We store the
>> > database information in the database.xml file.  I have attached the
>> > database.xml file.
>> >
>> > When we use the default merge policy settings, we get the same results.
>> >
>> >
>> >
>> > We have not tried to dump the table to a comma-separated file.  We think
>> > that dumping a table of this size to disk would introduce other memory
>> > problems related to managing very large files.  We have not tested that
>> > case.
>> >
>> >
>> > On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan <iori...@yahoo.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> Which database are you using? Can you send us data-config.xml?
>> >>
>> >> What happens when you use default merge policy settings?
>> >>
>> >> What happens when you dump your table to a comma-separated file and feed
>> >> that file to Solr?
>> >>
>> >> Ahmet
>> >>
>> >> On Friday, April 4, 2014 5:10 PM, Candygram For Mongo <
>> >> candygram.for.mo...@gmail.com> wrote:
>> >>
>> >> The ramBufferSizeMB was set to 6MB only on the test system to make the
>> >> system crash sooner.  In production that tag is commented out, which I
>> >> believe causes the default value to be used.
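>> >>
>> >> As a sketch, the relevant production setting presumably looks like this
>> >> (assuming the Solr 4.x default of 100 MB applies when the tag is
>> >> commented out):
>> >>
>> >> <!-- commented out in production; the default (assumed 100 MB) is used
>> >> <ramBufferSizeMB>100</ramBufferSizeMB>
>> >> -->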
>> >>
>> >>
>> >>
>> >>
>> >> On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan <iori...@yahoo.com>
>> wrote:
>> >>
>> >> Hi,
>> >> >
>> >> >out of curiosity, why did you set ramBufferSizeMB to 6?
>> >> >
>> >> >Ahmet
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >On Friday, April 4, 2014 3:27 AM, Candygram For Mongo <
>> >> candygram.for.mo...@gmail.com> wrote:
>> >> >*Main issue: Full Indexing is Causing a Java Heap Out of Memory
>> >> >Exception
>> >> >
>> >> >*SOLR/Lucene version: *4.2.1*
>> >> >
>> >> >
>> >> >*JVM version:
>> >> >
>> >> >Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
>> >> >
>> >> >Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
>> >> >
>> >> >
>> >> >
>> >> >*Indexer startup command:
>> >> >
>> >> >set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m
>> >> >
>> >> >
>> >> >
>> >> >java %JVMARGS% ^
>> >> >
>> >> >-Dcom.sun.management.jmxremote.port=1092 ^
>> >> >
>> >> >-Dcom.sun.management.jmxremote.ssl=false ^
>> >> >
>> >> >-Dcom.sun.management.jmxremote.authenticate=false ^
>> >> >
>> >> >-jar start.jar
>> >> >
>> >> >
>> >> >
>> >> >*SOLR indexing HTTP parameters request:
>> >> >
>> >> >webapp=/solr path=/dataimport
>> >> >params={clean=false&command=full-import&wt=javabin&version=2}
>> >> >
>> >> >
>> >> >
>> >> >We are getting a Java heap OOM exception when indexing (updating) 27
>> >> >million records.  If we increase the Java heap memory settings, the
>> >> >problem goes away, but we believe the problem has not been fixed and
>> >> >that we will eventually get the same OOM exception.  We have other
>> >> >processes on the server that also require resources, so we cannot
>> >> >continually increase the memory settings to resolve the OOM issue.  We
>> >> >are trying to find a way to configure the SOLR instance to reduce or,
>> >> >preferably, eliminate the possibility of an OOM exception.
>> >> >
>> >> >
>> >> >
>> >> >We can reproduce the problem on a test machine.  We set the Java heap
>> >> >memory size to 64MB to accelerate the exception.  If we increase this
>> >> >setting, the same problem occurs, just hours later.  In the test
>> >> >environment, we are using the following parameters:
>> >> >
>> >> >
>> >> >
>> >> >JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m
>> >> >
>> >> >
>> >> >
>> >> >Normally we use the default solrconfig.xml file with only the
>> >> >following jar file references added:
>> >> >
>> >> >
>> >> >
>> >> ><lib path="../../../../default/lib/common.jar" />
>> >> >
>> >> ><lib path="../../../../default/lib/webapp.jar" />
>> >> >
>> >> ><lib path="../../../../default/lib/commons-pool-1.4.jar" />
>> >> >
>> >> >
>> >> >
>> >> >Using these values and trying to index 6 million records from the
>> >> >database, the Java Heap Out of Memory exception is thrown very quickly.
>> >> >
>> >> >
>> >> >
>> >> >We were able to complete a successful indexing by further modifying
>> >> >the solrconfig.xml and removing all, or all but one, of the
>> >> ><copyfield> tags from the schema.xml file.
>> >> >
>> >> >
>> >> >
>> >> >The following solrconfig.xml values were modified:
>> >> >
>> >> >
>> >> >
>> >> ><ramBufferSizeMB>6</ramBufferSizeMB>
>> >> >
>> >> >
>> >> >
>> >> ><mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>> >> >
>> >> ><int name="maxMergeAtOnce">2</int>
>> >> >
>> >> ><int name="maxMergeAtOnceExplicit">2</int>
>> >> >
>> >> ><int name="segmentsPerTier">10</int>
>> >> >
>> >> ><int name="maxMergedSegmentMB">150</int>
>> >> >
>> >> ></mergePolicy>
>> >> >
>> >> >
>> >> >
>> >> ><autoCommit>
>> >> >
>> >> ><maxDocs>15000</maxDocs>  <!-- this tag was maxTime before this change -->
>> >> >
>> >> ><openSearcher>false</openSearcher>
>> >> >
>> >> ></autoCommit>
>> >> >
>> >> >
>> >> >
>> >> >Using our customized schema.xml file with two or more <copyfield>
>> >> >tags, the OOM exception is always thrown.  Based on the errors, the
>> >> >problem occurs when the process is trying to do the merge.  The error
>> >> >is provided below:
>> >> >
>> >> >
>> >> >
>> >> >Exception in thread "Lucene Merge Thread #156" org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
>> >> >    at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541)
>> >> >    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)
>> >> >Caused by: java.lang.OutOfMemoryError: Java heap space
>> >> >    at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:180)
>> >> >    at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:146)
>> >> >    at org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301)
>> >> >    at org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:259)
>> >> >    at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:233)
>> >> >    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:137)
>> >> >    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3693)
>> >> >    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3296)
>> >> >    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:401)
>> >> >    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:478)
>> >> >
>> >> >Mar 12, 2014 12:17:40 AM org.apache.solr.common.SolrException log
>> >> >SEVERE: auto commit error...:java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
>> >> >    at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:3971)
>> >> >    at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2744)
>> >> >    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2827)
>> >> >    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2807)
>> >> >    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:536)
>> >> >    at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
>> >> >    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> >> >    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> >> >    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> >> >    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>> >> >    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>> >> >    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >> >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> >> >    at java.lang.Thread.run(Thread.java:722)
>> >> >
>> >> >
>> >> >
>> >> >We think, but are not 100% sure, that the problem is related to the
>> >> >merge.
>> >> >
>> >> >
>> >> >
>> >> >Normally our schema.xml contains a lot of field specifications (like
>> >> >the ones seen in the file fragment below):
>> >> >
>> >> >
>> >> >
>> >> ><copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case_abc" />
>> >> ><copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case.soundex_abc" />
>> >> ><copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case_nvl_abc" />
>> >> >
>> >> >
>> >> >
>> >> >In tests using the default schema.xml file and no <copyfield> tags,
>> >> >indexing completed successfully.  6 million records produced a 900 MB
>> >> >data directory.
>> >> >
>> >> >
>> >> >
>> >> >When I included just one <copyfield> tag, indexing completed
>> >> >successfully.  6 million records produced a 990 MB data directory
>> >> >(90 MB bigger).
>> >> >
>> >> >
>> >> >
>> >> >When I included just two <copyfield> tags, the index crashed with an
>> >> >OOM exception.
>> >> >
>> >> >
>> >> >
>> >> >Changing parameters like maxMergedSegmentMB or maxDocs only postponed
>> >> >the crash.
>> >> >
>> >> >
>> >> >
>> >> >The net of our test results is as follows:
>> >> >
>> >> >
>> >> >
>> >> >solrconfig.xml                     schema.xml                         result
>> >> >---------------------------------  ---------------------------------  -------
>> >> >default plus only jar references   default (no copyfield tags)        success
>> >> >default plus only jar references   modified with one copyfield tag    success
>> >> >default plus only jar references   modified with two copyfield tags   crash
>> >> >additional modified settings       default (no copyfield tags)        success
>> >> >additional modified settings       modified with one copyfield tag    success
>> >> >additional modified settings       modified with two copyfield tags   crash
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >Our question is, what can we do to eliminate these OOM exceptions?
>> >> >
>> >> >
>> >>
>> >
>> >
>>
>>
>
