Siddharth,

At the end of your email you said: "One option I see is to break the file in chunks, but with this, I won't be able to search with multiple words if they are distributed in different documents."
Unless I'm missing something unusual about your application, I don't think the above is technically correct. Have you tried doing this, and then tried your searches? Everything should still work, even if you index one document at a time.

Otis--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

________________________________
From: "Gargate, Siddharth" <sgarg...@ptc.com>
To: solr-user@lucene.apache.org
Sent: Monday, February 16, 2009 2:00:58 PM
Subject: Outofmemory error for large files

I am trying to index a roughly 150 MB text file with a 1024 MB max heap, but I get an OutOfMemoryError in the SolrJ code:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2882)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:572)
    at java.lang.StringBuffer.append(StringBuffer.java:320)
    at java.io.StringWriter.write(StringWriter.java:60)
    at org.apache.solr.common.util.XML.escape(XML.java:206)
    at org.apache.solr.common.util.XML.escapeCharData(XML.java:79)
    at org.apache.solr.common.util.XML.writeXML(XML.java:149)
    at org.apache.solr.client.solrj.util.ClientUtils.writeXML(ClientUtils.java:115)
    at org.apache.solr.client.solrj.request.UpdateRequest.writeXML(UpdateRequest.java:200)
    at org.apache.solr.client.solrj.request.UpdateRequest.getXML(UpdateRequest.java:178)
    at org.apache.solr.client.solrj.request.UpdateRequest.getContentStreams(UpdateRequest.java:173)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:136)
    at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:243)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)

I modified the UpdateRequest class to initialize the StringWriter in UpdateRequest.getXML with an initial size, and I cleared the SolrInputDocument that holds the reference to the file text.
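For context on why those tweaks still run out of memory: a back-of-the-envelope estimate of the heap needed by the XML update path in the traces above. This is a rough sketch, not a measurement; the list of copies (UTF-16 String in the document, the StringWriter buffer, transient StringBuffer growth, and the byte[] from String.getBytes) is inferred from the stack traces, and the multipliers are illustrative.

```java
// Rough heap arithmetic for pushing a 150 MB text file through the
// XML update path. Assumes ~1 byte per char in the source file (ASCII)
// and the in-memory copies visible in the stack traces above.
public class HeapEstimate {
    public static long estimateBytes(long fileSizeBytes) {
        long chars   = fileSizeBytes;   // character count, assuming ASCII input
        long inDoc   = chars * 2;       // String inside SolrInputDocument (UTF-16, 2 bytes/char)
        long writer  = chars * 2;       // StringWriter copy built by XML.escape/writeXML
        long growth  = writer;          // transient copies while StringBuffer doubles its capacity
        long encoded = chars;           // byte[] produced by String.getBytes in StringStream
        return inDoc + writer + growth + encoded;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Under these assumptions, 150 MB of text needs on the order of a gigabyte
        // of heap before the server-side XML parser even starts making its own copies.
        System.out.println(estimateBytes(150 * mb) / mb + " MB");  // prints 1050 MB
    }
}
```

Under these assumptions a 1024 MB heap is already too small for the client side alone, and 1250 MB leaves little room for the server-side parse, which matches the symptoms reported below.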
Then I get an OOM as below:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.lang.StringCoding.safeTrim(StringCoding.java:64)
    at java.lang.StringCoding.access$300(StringCoding.java:34)
    at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:251)
    at java.lang.StringCoding.encode(StringCoding.java:272)
    at java.lang.String.getBytes(String.java:947)
    at org.apache.solr.common.util.ContentStreamBase$StringStream.getStream(ContentStreamBase.java:142)
    at org.apache.solr.common.util.ContentStreamBase$StringStream.getReader(ContentStreamBase.java:154)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:61)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
    at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:249)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)

After I increase the heap size to 1250 MB, I get an OOM as:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3209)
    at java.lang.String.<init>(String.java:216)
    at java.lang.StringBuffer.toString(StringBuffer.java:585)
    at com.ctc.wstx.util.TextBuffer.contentsAsString(TextBuffer.java:403)
    at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:821)
    at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:276)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
    at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:249)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)

So it looks like I can't get past these OOMs. Is there any way to avoid them? One option I see is to break the file into chunks, but then I won't be able to search with multiple words if they are distributed across different documents. Also, can somebody tell me the minimum heap size required, relative to file size, for a document to be indexed successfully?

Thanks,
Siddharth
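The chunking option discussed in this thread could be sketched as below. This is a minimal, stdlib-only illustration, not SolrJ code: the `chunk` helper and the chunk size are hypothetical, and the SolrJ calls (wrapping each chunk in a SolrInputDocument and sending it via server.add, as in the traces above) are only indicated in comments. The point is that each add() then buffers one chunk rather than the whole 150 MB body.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical chunking helper: split a large text into fixed-size pieces.
// In the SolrJ scenario from this thread, each chunk would become its own
// SolrInputDocument (e.g. sharing a parent-id field) and be indexed with
// a separate server.add(doc) call, keeping per-request memory bounded.
public class Chunker {
    public static List<String> chunk(String text, int chunkChars) {
        List<String> chunks = new ArrayList<String>();
        for (int start = 0; start < text.length(); start += chunkChars) {
            int end = Math.min(start + chunkChars, text.length());
            chunks.add(text.substring(start, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Toy input standing in for the 150 MB file.
        System.out.println(chunk("abcdefghij", 4));  // prints [abcd, efgh, ij]
    }
}
```

Note the trade-off Siddharth raises still applies: a query requiring several terms in one document will only match if those terms land in the same chunk, so chunk size (or overlapping chunk boundaries) affects recall for multi-word searches.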