I am trying to index a text file of around 150 MB with a 1024 MB max heap, but I get an OutOfMemoryError in the SolrJ code.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2882)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:572)
    at java.lang.StringBuffer.append(StringBuffer.java:320)
    at java.io.StringWriter.write(StringWriter.java:60)
    at org.apache.solr.common.util.XML.escape(XML.java:206)
    at org.apache.solr.common.util.XML.escapeCharData(XML.java:79)
    at org.apache.solr.common.util.XML.writeXML(XML.java:149)
    at org.apache.solr.client.solrj.util.ClientUtils.writeXML(ClientUtils.java:115)
    at org.apache.solr.client.solrj.request.UpdateRequest.writeXML(UpdateRequest.java:200)
    at org.apache.solr.client.solrj.request.UpdateRequest.getXML(UpdateRequest.java:178)
    at org.apache.solr.client.solrj.request.UpdateRequest.getContentStreams(UpdateRequest.java:173)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:136)
    at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:243)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)

I modified the UpdateRequest class to initialize the StringWriter object in UpdateRequest.getXML with an initial size, and cleared the SolrInputDocument that holds the reference to the file text.
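
To illustrate the kind of presizing I mean, here is a standalone sketch. It is not the actual UpdateRequest patch: the class name, the wrap method, the field name, and the 64-character padding are all made up for illustration, and it skips the XML escaping that the real code performs.

```java
import java.io.StringWriter;

public class PresizedWriterSketch {

    // Hypothetical helper: wraps the field text in an <add><doc> envelope using a
    // StringWriter whose buffer is allocated once up front, so that appending the
    // large text does not trigger repeated grow-and-copy of the internal buffer.
    static String wrap(String fieldText) {
        // 64 is an assumed allowance for the surrounding tags only.
        StringWriter writer = new StringWriter(fieldText.length() + 64);
        writer.write("<add><doc><field name=\"text\">");
        writer.write(fieldText);
        writer.write("</field></doc></add>");
        // toString() still creates one more full copy of the content.
        return writer.toString();
    }

    public static void main(String[] args) {
        System.out.println(wrap("hello"));
    }
}
```

Presizing only avoids the intermediate copies made while the buffer grows; the final toString() and any later getBytes() call each still need their own full-size copy.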

Then I get the following OOM:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.lang.StringCoding.safeTrim(StringCoding.java:64)
    at java.lang.StringCoding.access$300(StringCoding.java:34)
    at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:251)
    at java.lang.StringCoding.encode(StringCoding.java:272)
    at java.lang.String.getBytes(String.java:947)
    at org.apache.solr.common.util.ContentStreamBase$StringStream.getStream(ContentStreamBase.java:142)
    at org.apache.solr.common.util.ContentStreamBase$StringStream.getReader(ContentStreamBase.java:154)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:61)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
    at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:249)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)

After I increased the heap size to 1250 MB, I get this OOM:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3209)
    at java.lang.String.<init>(String.java:216)
    at java.lang.StringBuffer.toString(StringBuffer.java:585)
    at com.ctc.wstx.util.TextBuffer.contentsAsString(TextBuffer.java:403)
    at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:821)
    at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:276)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
    at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:249)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)

So it looks like I won't be able to get out of these OOMs. Is there any way to avoid them? One option I see is to break the file into chunks, but then I won't be able to search for multiple words if they end up in different documents. Also, can somebody tell me the minimum heap size required relative to the file size so that a document gets indexed successfully?

Thanks,
Siddharth