I am trying to index a text file of around 150 MB with a maximum heap of
1024 MB, but I get an OutOfMemoryError in the SolrJ code:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2882)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:572)
        at java.lang.StringBuffer.append(StringBuffer.java:320)
        at java.io.StringWriter.write(StringWriter.java:60)
        at org.apache.solr.common.util.XML.escape(XML.java:206)
        at org.apache.solr.common.util.XML.escapeCharData(XML.java:79)
        at org.apache.solr.common.util.XML.writeXML(XML.java:149)
        at org.apache.solr.client.solrj.util.ClientUtils.writeXML(ClientUtils.java:115)
        at org.apache.solr.client.solrj.request.UpdateRequest.writeXML(UpdateRequest.java:200)
        at org.apache.solr.client.solrj.request.UpdateRequest.getXML(UpdateRequest.java:178)
        at org.apache.solr.client.solrj.request.UpdateRequest.getContentStreams(UpdateRequest.java:173)
        at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:136)
        at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:243)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)


I modified the UpdateRequest class so that the StringWriter object in
UpdateRequest.getXML is created with an initial size, and I cleared the
SolrInputDocument that holds the reference to the file text. After that,
I get the following OOM instead:


Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2786)
        at java.lang.StringCoding.safeTrim(StringCoding.java:64)
        at java.lang.StringCoding.access$300(StringCoding.java:34)
        at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:251)
        at java.lang.StringCoding.encode(StringCoding.java:272)
        at java.lang.String.getBytes(String.java:947)
        at org.apache.solr.common.util.ContentStreamBase$StringStream.getStream(ContentStreamBase.java:142)
        at org.apache.solr.common.util.ContentStreamBase$StringStream.getReader(ContentStreamBase.java:154)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:61)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
        at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
        at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:249)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)
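
For reference, the pre-sizing change described above amounts to something
like the following. This is only a minimal sketch, not the actual patch to
UpdateRequest: the class name, the estimateCapacity helper, the 64-character
headroom and the field name "text" are all made up for illustration.

```java
import java.io.StringWriter;

/** Sketch only: pre-sizing a StringWriter so its buffer need not grow while XML is written. */
final class PresizedWriterSketch {

    /**
     * Hypothetical helper (not part of SolrJ): guesses a buffer capacity from
     * the text length plus some headroom for the surrounding markup.
     */
    static int estimateCapacity(int textLength, int headroom) {
        long wanted = (long) textLength + headroom;
        // Clamp to roughly what a single Java array can hold.
        return (int) Math.min(wanted, Integer.MAX_VALUE - 8);
    }

    /** Writes a minimal <add><doc> wrapper around text that is already XML-escaped. */
    static String wrap(String escapedText) {
        StringWriter writer = new StringWriter(estimateCapacity(escapedText.length(), 64));
        writer.write("<add><doc><field name=\"text\">"); // "text" is a hypothetical field name
        writer.write(escapedText);
        writer.write("</field></doc></add>");
        return writer.toString(); // still allocates one more full-size copy
    }

    public static void main(String[] args) {
        System.out.println(wrap("hello world"));
    }
}
```

Pre-sizing only avoids the repeated buffer growth seen in the first trace.
The toString() call and the later String.getBytes() each still need their
own full-size copy, which matches the failure moving to String.getBytes in
the trace above.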


After I increased the heap size to 1250 MB, I got this OOM:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:3209)
        at java.lang.String.<init>(String.java:216)
        at java.lang.StringBuffer.toString(StringBuffer.java:585)
        at com.ctc.wstx.util.TextBuffer.contentsAsString(TextBuffer.java:403)
        at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:821)
        at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:276)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
        at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
        at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:249)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)
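
Each of the three traces fails while allocating another full-size copy of
the text: first in the StringBuffer behind the StringWriter, then in
String.getBytes, then in the XML parser's getText. A rough back-of-envelope
tally is sketched below. It is not a measurement: it assumes a JVM that
stores every char in 2 bytes, about one byte per character in the file, and
a guessed number of copies alive at the same time (three char-based copies
and one byte[] copy). All names in it are made up for illustration.

```java
/** Back-of-envelope sketch; every multiplier here is an assumption, not a measurement. */
final class HeapEstimateSketch {

    static final long MB = 1024L * 1024L;

    /** Heap bytes needed to hold charCount characters as Java chars (2 bytes each). */
    static long asChars(long charCount) {
        return charCount * 2;
    }

    /**
     * Rough peak: charCopies simultaneous char-based copies of the text
     * (String, StringBuffer, parser buffer) plus byteCopies encoded byte[]
     * copies, assuming one byte per character once encoded.
     */
    static long peakBytes(long charCount, int charCopies, int byteCopies) {
        return charCopies * asChars(charCount) + byteCopies * charCount;
    }

    public static void main(String[] args) {
        long chars = 150 * MB; // assumes ~1 byte per character in the 150 MB file
        // Assumed copies alive while parsing: XML String, parser buffer,
        // extracted field String, plus one encoded byte[].
        System.out.println(peakBytes(chars, 3, 1) / MB + " MB");
    }
}
```

Under those assumptions the text alone accounts for roughly 1050 MB, which
is already close to a 1250 MB heap before anything else is counted.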


So it looks like I won't be able to get rid of these OOMs.
Is there any way to avoid them? One option I see is to break the
file into chunks, but then I won't be able to search for multiple
words if they end up in different documents.
Also, can somebody tell me the minimum heap size required relative to the
file size so that a document gets indexed successfully?
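
To make the chunking option concrete, here is a minimal sketch of splitting
the text into chunks that overlap their neighbours, so that any run of up
to overlapChars characters lands whole in at least one chunk. It only
partly addresses the concern above: words that are further apart than the
overlap would still end up in different documents. The class and method
names are made up, and it uses Java 8 syntax for brevity.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

/** Sketch only: streams a large text as fixed-size chunks that overlap their neighbours. */
final class OverlappingChunks {

    /**
     * Reads from in and hands each chunk of at most chunkChars characters to
     * sink. Consecutive chunks share overlapChars characters. Only one chunk
     * is held in memory at a time.
     */
    static void forEachChunk(Reader in, int chunkChars, int overlapChars, Consumer<String> sink) {
        if (chunkChars <= 0 || overlapChars < 0 || overlapChars >= chunkChars) {
            throw new IllegalArgumentException("need 0 <= overlapChars < chunkChars");
        }
        char[] buf = new char[chunkChars];
        int filled = 0;         // valid chars currently in buf
        boolean hasNew = false; // true if buf holds chars not yet emitted
        try {
            while (true) {
                int n = in.read(buf, filled, chunkChars - filled);
                if (n == -1) {
                    if (hasNew) {
                        sink.accept(new String(buf, 0, filled)); // final short chunk
                    }
                    return;
                }
                filled += n;
                hasNew = true;
                if (filled == chunkChars) {
                    sink.accept(new String(buf, 0, filled));
                    // Keep the tail of this chunk as the head of the next one.
                    System.arraycopy(buf, chunkChars - overlapChars, buf, 0, overlapChars);
                    filled = overlapChars;
                    hasNew = false;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In real use the sink would add one document per chunk, for example a
        // SolrInputDocument with a hypothetical id such as "file-0", "file-1".
        forEachChunk(new StringReader("abcdefghijk"), 4, 1, System.out::println);
    }
}
```

The sizes in main are tiny only to keep the demonstration readable. Chunks
are cut at character positions, so a word can be split at a boundary; the
overlap keeps an intact copy of it in the neighbouring chunk as long as the
word is no longer than the overlap.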

Thanks,
Siddharth
