Guessing that the attachments won't work, I am pasting one file in each of four separate emails.
database.xml:

<dataConfig>
  <dataSource
    name="org_only"
    type="JdbcDataSource"
    driver="oracle.jdbc.OracleDriver"
    url="jdbc:oracle:thin:@test.abcdata.com:1521:ORCL"
    user="admin"
    password="admin"
    readOnly="false"
  />
  <document>
    <entity name="full-index" query="
      select
      NVL(cast(ORACLE.ADDRESS_ALL.RECORD_ID as varchar2(100)), 'null') as SOLR_ID,
      'ORACLE.ADDRESS_ALL' as SOLR_CATEGORY,
      NVL(cast(ORACLE.ADDRESS_ALL.RECORD_ID as varchar2(255)), ' ') as ADDRESSALLROWID,
      NVL(cast(ORACLE.ADDRESS_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as ADDRESSALLADDRTYPECD,
      NVL(cast(ORACLE.ADDRESS_ALL.LONGITUDE as varchar2(255)), ' ') as ADDRESSALLLONGITUDE,
      NVL(cast(ORACLE.ADDRESS_ALL.LATITUDE as varchar2(255)), ' ') as ADDRESSALLLATITUDE,
      NVL(cast(ORACLE.ADDRESS_ALL.ADDR_NAME as varchar2(255)), ' ') as ADDRESSALLADDRNAME,
      NVL(cast(ORACLE.ADDRESS_ALL.CITY as varchar2(255)), ' ') as ADDRESSALLCITY,
      NVL(cast(ORACLE.ADDRESS_ALL.STATE as varchar2(255)), ' ') as ADDRESSALLSTATE,
      NVL(cast(ORACLE.ADDRESS_ALL.EMAIL_ADDR as varchar2(255)), ' ') as ADDRESSALLEMAILADDR
      from ORACLE.ADDRESS_ALL
    ">
      <field column="SOLR_ID" name="id" />
      <field column="SOLR_CATEGORY" name="category" />
      <field column="ADDRESSALLROWID" name="ADDRESS_ALL.RECORD_ID_abc" />
      <field column="ADDRESSALLADDRTYPECD" name="ADDRESS_ALL.ADDR_TYPE_CD_abc" />
      <field column="ADDRESSALLLONGITUDE" name="ADDRESS_ALL.LONGITUDE_abc" />
      <field column="ADDRESSALLLATITUDE" name="ADDRESS_ALL.LATITUDE_abc" />
      <field column="ADDRESSALLADDRNAME" name="ADDRESS_ALL.ADDR_NAME_abc" />
      <field column="ADDRESSALLCITY" name="ADDRESS_ALL.CITY_abc" />
      <field column="ADDRESSALLSTATE" name="ADDRESS_ALL.STATE_abc" />
      <field column="ADDRESSALLEMAILADDR" name="ADDRESS_ALL.EMAIL_ADDR_abc" />
    </entity>

    <!-- Variables -->
    <!-- '${dataimporter.last_index_time}' -->
  </document>
</dataConfig>

On Fri, Apr 4, 2014 at 4:57 PM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

Does this user list allow attachments? I have four files attached (database.xml, error.txt, schema.xml, solrconfig.xml). We just ran the process again using the parameters you suggested, but not to a CSV file. It errored out quickly. We are working on the CSV file run.

We removed both the <autoCommit> and <autoSoftCommit> definitions from solrconfig.xml.

We disabled the tlog by removing

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>

from solrconfig.xml.

We used the commit=true parameter: ?commit=true&command=full-import

On Fri, Apr 4, 2014 at 3:29 PM, Ahmet Arslan <iori...@yahoo.com> wrote:

Hi,

This may not solve your problem, but it is generally recommended to disable auto commit and transaction logs for bulk indexing, and to issue one commit at the very end. Do you have tlogs enabled? I see "commit failed" in the error message, which is why I suggest this.

Regarding comma-separated values: with that approach you focus on just the Solr import process and separate out the data-acquisition phase. Loading even big CSV files is very fast: http://wiki.apache.org/solr/UpdateCSV
I have never experienced an OOM during indexing, so I suspect data acquisition plays a role in it.

Ahmet

On Saturday, April 5, 2014 1:18 AM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

We would be happy to try that. That sounds counterintuitive for the high volume of records we have. Can you help me understand how that might solve our problem?
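For concreteness, a minimal sketch of the CSV route suggested above could look like the commands below. It assumes the stock /update/csv handler from the example solrconfig.xml is in place; the host, the core name "collection1", and the file addresses.csv are placeholders rather than details from this thread.

# Sketch only: stream a pre-dumped CSV file through Solr's CSV handler
# (see http://wiki.apache.org/solr/UpdateCSV). localhost:8983, "collection1",
# and addresses.csv are hypothetical placeholders.
curl "http://localhost:8983/solr/collection1/update/csv?commit=false&header=true" \
     -H "Content-type:text/csv; charset=utf-8" \
     --data-binary @addresses.csv

# Single commit at the very end, as recommended above.
curl "http://localhost:8983/solr/collection1/update?commit=true"

The point of this route is that Solr only has to stream the file; the expensive JDBC fetch from Oracle happens separately, ahead of time.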
On Fri, Apr 4, 2014 at 2:34 PM, Ahmet Arslan <iori...@yahoo.com> wrote:

Hi,

Can you remove auto commit for the bulk import and commit at the very end?

Ahmet

On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

In case the attached database.xml file didn't show up, I have pasted the contents below:

<dataConfig>
  <dataSource
    name="org_only"
    type="JdbcDataSource"
    driver="oracle.jdbc.OracleDriver"
    url="jdbc:oracle:thin:@test2.abc.com:1521:ORCL"
    user="admin"
    password="admin"
    readOnly="false"
    batchSize="100"
  />
  <document>
    <entity name="full-index" query="
      select
      NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(100)), 'null') as SOLR_ID,
      'ORCL.ADDRESS_ACCT_ALL' as SOLR_CATEGORY,
      NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(255)), ' ') as ADDRESSALLROWID,
      NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as ADDRESSALLADDRTYPECD,
      NVL(cast(ORCL.ADDRESS_ACCT_ALL.LONGITUDE as varchar2(255)), ' ') as ADDRESSALLLONGITUDE,
      NVL(cast(ORCL.ADDRESS_ACCT_ALL.LATITUDE as varchar2(255)), ' ') as ADDRESSALLLATITUDE,
      NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_NAME as varchar2(255)), ' ') as ADDRESSALLADDRNAME,
      NVL(cast(ORCL.ADDRESS_ACCT_ALL.CITY as varchar2(255)), ' ') as ADDRESSALLCITY,
      NVL(cast(ORCL.ADDRESS_ACCT_ALL.STATE as varchar2(255)), ' ') as ADDRESSALLSTATE,
      NVL(cast(ORCL.ADDRESS_ACCT_ALL.EMAIL_ADDR as varchar2(255)), ' ') as ADDRESSALLEMAILADDR
      from ORCL.ADDRESS_ACCT_ALL
    ">
      <field column="SOLR_ID" name="id" />
      <field column="SOLR_CATEGORY" name="category" />
      <field column="ADDRESSALLROWID" name="ADDRESS_ACCT_ALL.RECORD_ID_abc" />
      <field column="ADDRESSALLADDRTYPECD" name="ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc" />
      <field column="ADDRESSALLLONGITUDE" name="ADDRESS_ACCT_ALL.LONGITUDE_abc" />
      <field column="ADDRESSALLLATITUDE" name="ADDRESS_ACCT_ALL.LATITUDE_abc" />
      <field column="ADDRESSALLADDRNAME" name="ADDRESS_ACCT_ALL.ADDR_NAME_abc" />
      <field column="ADDRESSALLCITY" name="ADDRESS_ACCT_ALL.CITY_abc" />
      <field column="ADDRESSALLSTATE" name="ADDRESS_ACCT_ALL.STATE_abc" />
      <field column="ADDRESSALLEMAILADDR" name="ADDRESS_ACCT_ALL.EMAIL_ADDR_abc" />
    </entity>

    <!-- Variables -->
    <!-- '${dataimporter.last_index_time}' -->
  </document>
</dataConfig>

On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

In this case we are indexing an Oracle database.

We do not include the data-config.xml in our distribution. We store the database information in the database.xml file, which I have attached.

When we use the default merge policy settings, we get the same results.

We have not tried dumping the table to a comma-separated file. We think that dumping a table of this size to disk will introduce other memory problems with big-file management. We have not tested that case.

On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan <iori...@yahoo.com> wrote:

Hi,

Which database are you using? Can you send us data-config.xml?

What happens when you use the default merge policy settings?
What happens when you dump your table to a comma-separated file and feed that file to Solr?

Ahmet

On Friday, April 4, 2014 5:10 PM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

The ramBufferSizeMB was set to 6MB only on the test system, to make the system crash sooner. In production that tag is commented out, which I believe forces the default value to be used.

On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan <iori...@yahoo.com> wrote:

Hi,

Out of curiosity, why did you set ramBufferSizeMB to 6?

Ahmet

On Friday, April 4, 2014 3:27 AM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

Main issue: full indexing is causing a Java heap out-of-memory exception.

SOLR/Lucene version: 4.2.1

JVM version:
Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)

Indexer startup command:

set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m

java %JVMARGS% ^
 -Dcom.sun.management.jmxremote.port=1092 ^
 -Dcom.sun.management.jmxremote.ssl=false ^
 -Dcom.sun.management.jmxremote.authenticate=false ^
 -jar start.jar

SOLR indexing HTTP request parameters:

webapp=/solr path=/dataimport params={clean=false&command=full-import&wt=javabin&version=2}

We are getting a Java heap OOM exception when indexing (updating) 27 million records. If we increase the Java heap memory settings the problem goes away, but we believe the problem has not actually been fixed and that we will eventually hit the same OOM exception. We have other processes on the server that also require resources, so we cannot keep increasing the memory settings to work around the OOM issue. We are trying to find a way to configure the SOLR instance to reduce or, preferably, eliminate the possibility of an OOM exception.

We can reproduce the problem on a test machine. We set the Java heap size to 64MB to accelerate the exception. If we increase this setting the same problem occurs, just hours later. In the test environment, we are using the following parameters:

JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m

Normally we use the default solrconfig.xml file with only the following jar file references added:

<lib path="../../../../default/lib/common.jar" />
<lib path="../../../../default/lib/webapp.jar" />
<lib path="../../../../default/lib/commons-pool-1.4.jar" />

Using these values and trying to index 6 million records from the database, the Java heap out-of-memory exception is thrown very quickly.
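To pin down what is filling the heap when the crash happens, the test startup settings above could be extended with HotSpot's standard heap-dump and GC-logging flags. This is a sketch only; C:\dumps is a placeholder path, not something from this thread.

rem Sketch only: same test-machine heap settings as above, plus flags that
rem write a heap dump and a GC log when the OutOfMemoryError occurs.
rem C:\dumps is a hypothetical directory.
set JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=C:\dumps -verbose:gc -XX:+PrintGCDetails -Xloggc:C:\dumps\gc.log

java %JVMARGS% ^
 -Dcom.sun.management.jmxremote.port=1092 ^
 -Dcom.sun.management.jmxremote.ssl=false ^
 -Dcom.sun.management.jmxremote.authenticate=false ^
 -jar start.jar

Loading the resulting .hprof file into a heap analyzer would show whether merge-time structures (such as the norms for the copyField targets) or the JDBC result set dominate the heap.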
We were able to complete a successful indexing run by further modifying solrconfig.xml and removing all, or all but one, of the <copyField> tags from the schema.xml file.

The following solrconfig.xml values were modified:

<ramBufferSizeMB>6</ramBufferSizeMB>

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">2</int>
  <int name="maxMergeAtOnceExplicit">2</int>
  <int name="segmentsPerTier">10</int>
  <int name="maxMergedSegmentMB">150</int>
</mergePolicy>

<autoCommit>
  <maxDocs>15000</maxDocs> <!-- This tag was maxTime before this change -->
  <openSearcher>false</openSearcher>
</autoCommit>

Using our customized schema.xml file with two or more <copyField> tags, the OOM exception is always thrown. Based on the errors, the problem occurs when the process is trying to do a merge. The error is provided below:

Exception in thread "Lucene Merge Thread #156" org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:180)
        at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:146)
        at org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301)
        at org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:259)
        at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:233)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:137)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3693)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3296)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:401)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:478)

Mar 12, 2014 12:17:40 AM org.apache.solr.common.SolrException log
SEVERE: auto commit error...:java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
        at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:3971)
        at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2744)
        at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2827)
        at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2807)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:536)
        at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

We think, but are not 100% sure, that the problem is related to the merge.

Normally our schema.xml contains a lot of field specifications, like the ones in this fragment:

<copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case_abc" />
<copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case.soundex_abc" />
<copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case_nvl_abc" />

In tests using the default schema.xml file with no <copyField> tags, indexing completed successfully; 6 million records produced a 900 MB data directory.

When I included just one <copyField> tag, indexing completed successfully; 6 million records produced a 990 MB data directory (90 MB bigger).

When I included just two <copyField> tags, the indexing crashed with an OOM exception.

Changing parameters like maxMergedSegmentMB or maxDocs only postponed the crash.

The net of our test results is as follows:

solrconfig.xml                     schema.xml                         result
default plus only jar references   default (no copyField tags)        success
default plus only jar references   modified with one copyField tag    success
default plus only jar references   modified with two copyField tags   crash
additional modified settings       default (no copyField tags)        success
additional modified settings       modified with one copyField tag    success
additional modified settings       modified with two copyField tags   crash

Our question is: what can we do to eliminate these OOM exceptions?
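One way to combine the suggestions from earlier in this thread (no auto commit, no transaction log, one commit at the very end) with the existing DIH setup is sketched below; localhost:8983 and the core name "collection1" are placeholders, not details from this thread.

# Sketch only: run the existing full-import without intermediate commits,
# then issue one explicit commit once the import has finished.
# localhost:8983 and "collection1" are hypothetical placeholders.
curl "http://localhost:8983/solr/collection1/dataimport?command=full-import&clean=false&commit=false&optimize=false"

# Poll the import status until it reports that indexing has completed.
curl "http://localhost:8983/solr/collection1/dataimport?command=status"

# Single commit at the very end.
curl "http://localhost:8983/solr/collection1/update?commit=true"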