Ok, what is confusing me is the implicit assumption that FieldCache contains the "field" values and that Lucene uses an in-memory sort instead of using the file-system "index"...

Array size: 100MB (25M x 4 bytes), and it is just pointers (4-byte integers) to documents in the index.

org.apache.lucene.search.FieldCacheImpl$10.createValue
...
357: protected Object createValue(IndexReader reader, Object fieldKey)
358:   throws IOException {
359:   String field = ((String) fieldKey).intern();
360:   final int[] retArray = new int[reader.maxDoc()]; // OutOfMemoryError!!!
...
408:   StringIndex value = new StringIndex (retArray, mterms);
409:   return value;
410: }
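For reference, a quick arithmetic check (my own sketch, not Lucene code): the `new int[reader.maxDoc()]` allocation for 25,191,979 documents comes to almost exactly the failed allocation size reported in the OOM elsewhere in this thread; the small difference is the array object's header overhead:

```java
// Sketch: check that the int[] ordinal array accounts for the OOM size.
public class FieldCacheMath {

    // One 4-byte int per document in the index (reader.maxDoc()).
    static long intArrayBytes(long maxDoc) {
        return maxDoc * 4L;
    }

    public static void main(String[] args) {
        long numElements = 25_191_979L; // "Num elements" from the OOM message
        // Prints 100767916 -- the reported object size was 100767936,
        // i.e. the same plus ~20 bytes of array header overhead.
        System.out.println(intArrayBytes(numElements));
    }
}
```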
...

It's very confusing; I wasn't aware of these internals...


<field name="XXX" type="string" indexed="true" stored="true" termVectors="true"/>
The sorting is done on a string field.


I think Sundar should not use [termVectors="true"]...



Quoting Mark Miller <[EMAIL PROTECTED]>:

Hmmm...I think it's 32 bits per integer, with an index entry for each doc, so


   **25 000 000 x 32 bits = 95.3674316 megabytes**

Then you have the string array that contains each unique term from your
index...you can estimate that based on the number of unique terms in your
index and a guess at the average term length.

There is some other overhead beyond the sort cache as well, but that's
the bulk of what it will add. I think my memory may be bad with my
original estimate :)

Fuad Efendi wrote:
Thank you very much Mark,

it explains a lot;

I am guessing: for 1,000,000 documents with a [string] field of average size 1024 bytes, I would need 1GB for a single IndexSearcher instance; the field-level cache is used internally by Lucene (can Lucene manage its size?); we can't have 1G of such documents without having 1TB of RAM...



Quoting Mark Miller <[EMAIL PROTECTED]>:

Fuad Efendi wrote:
SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
size: 100767936, Num elements: 25191979


I just noticed, this is an exact number of documents in index: 25191979

(http://www.tokenizer.org/, you can sort - click the headers Id, [Country, Site, Price] in the table; experimental)


If the array is allocated ONLY when a new searcher warms up, I am _extremely_ happy... I had constant OOMs during the past month (Sun Java 5).
It is only on warmup - I believe it's lazy loaded, so the first time a
search is done (Solr does a search as part of warmup, I believe) the
FieldCache is loaded. The underlying IndexReader is the key to the
FieldCache, so until you get a new IndexReader (SolrSearcher in Solr
world?) the field cache will be good. Keep in mind that as a searcher
is warming, the other searcher is still serving, so that will up the RAM
requirements...and since I think you can have >1 searchers on
deck...you get the idea.

As far as the number I gave, that's from memory, made months and months
ago, so go with what you see.



Quoting Fuad Efendi <[EMAIL PROTECTED]>:

I've even seen exceptions (posted here) when "sort"-type queries caused
Lucene to allocate 100Mb arrays, here is what happened to me:

SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
size: 100767936, Num elements: 25191979
      at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:360)
      at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)

- it does not happen after I increased the heap from 4096M to 8192M (JRockit R27; a more intelligent stack trace, isn't it?)

Thanks Mark; I didn't know that it happens only once (on warming up a
searcher).



Quoting Mark Miller <[EMAIL PROTECTED]>:

Because to sort efficiently, Solr loads the term to sort on for each
doc in the index into an array. For ints, longs, etc. it's just an array
the size of the number of docs in your index (deleted or not, I
believe). For a String it's an array holding each unique string plus an
array of ints indexing into the String array.

So if you do a sort, and search for something that only gets 1 doc as a
hit...you're still loading up that field cache for every single doc in
your index on the first search. With Solr, this happens in the
background as it warms up the searcher. The end story is, you most
likely need more RAM to accommodate the sort...have you upped your -Xmx
setting? I think you can roughly say a 2 million doc index would need
40-50 MB (rough, and it depends, but to give an idea) per field you're
sorting on.

- Mark
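Mark's rule of thumb above can be sketched as a back-of-the-envelope estimator. The 2-bytes-per-char and ~40-byte per-String overhead figures below are my assumptions for illustration, not exact Lucene/JVM internals:

```java
// Rough estimator for the per-field StringIndex sort cache:
// one int ordinal per document, plus one String per unique term.
public class SortCacheEstimate {

    static long estimateBytes(long maxDoc, long uniqueTerms, int avgTermLen) {
        long ordinals = maxDoc * 4L;                        // int[] indexing into the term array
        long terms = uniqueTerms * (2L * avgTermLen + 40L); // chars + assumed ~40B String overhead
        return ordinals + terms;
    }

    public static void main(String[] args) {
        // Hypothetical 2M-doc index with 700k unique terms of avg length 10:
        long bytes = estimateBytes(2_000_000L, 700_000L, 10);
        System.out.println(bytes / (1024 * 1024) + " MB"); // ~47 MB, in Mark's 40-50 MB range
    }
}
```

Multiply by the number of fields you sort on, and again if overlapping warming searchers each hold their own cache.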

sundar shankar wrote:
Thanks Fuad.
But why does just sorting cause an OOM? I executed the query without the sort clause and it executed perfectly. In fact I even tried removing maxrows=10 and executing; it came out fine. Queries with bigger results seem to come out fine too. So why does just the sort fail, and that too with just 10 rows?
-Sundar




Date: Tue, 22 Jul 2008 12:24:35 -0700
From: [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Subject: RE: Out of memory on Solr sorting

org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403)

- this piece of code does not request an Array[100M] (as I have seen with Lucene), it asks only for a few bytes/KB for a field...

Probably 128 - 512 is not enough; it is also advisable to use equal sizes
-Xms1024M -Xmx1024M
(it minimizes GC frequency, and it ensures that 1024M is available at startup)

OOM also happens with fragmented memory, when the application requests a big contiguous fragment and GC is unable to optimize; it looks like your application requests a little and the memory is not available...

Quoting sundar shankar <[EMAIL PROTECTED]>:

From: [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Subject: Out of memory on Solr sorting
Date: Tue, 22 Jul 2008 19:11:02 +0000

Hi,
Sorry again fellows. I am not sure what's happening. The day with solr is bad for me I guess. EZMLM didn't let me send any mails this morning. Asked me to confirm subscription and when I did, it said I was already a member. Now my mails are all coming out bad. Sorry for troubling y'all this bad. I hope this mail comes out right.

Hi,
We are developing a product in an agile manner and the current implementation has data of size just about 800 megs in dev.
The memory allocated to solr on dev (dual core Linux box) is 128-512.

My config
=========

<!-- autocommit pending docs if certain criteria are met
<autoCommit>
  <maxDocs>10000</maxDocs>
  <maxTime>1000</maxTime>
</autoCommit>
-->

<filterCache
  class="solr.LRUCache"
  size="512"
  initialSize="512"
  autowarmCount="256"/>

<queryResultCache
  class="solr.LRUCache"
  size="512"
  initialSize="512"
  autowarmCount="256"/>

<documentCache
  class="solr.LRUCache"
  size="512"
  initialSize="512"
  autowarmCount="0"/>

<enableLazyFieldLoading>true</enableLazyFieldLoading>

My Field
=======

<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all" />
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all" />
  </analyzer>
</fieldType>

Problem
======

I execute a query that returns 24 rows of results. I pick 10 out of it. I have no problem when I execute this.
But when I sort it by a String field fetched from this result, I get an OOM. I am able to execute several other queries with no problem. Just having a sort asc clause added to the query throws an OOM. Why is that?
What should I have ideally done? My config on QA is pretty similar to the dev box and probably has more data than dev.
It didn't throw any OOM during the integration test. The autocomplete is a new field we added recently.

Another point is that the indexing is done with a field of type string:

<field name="XXX" type="string" indexed="true" stored="true" termVectors="true"/>

and the autocomplete field is a copy field.

The sorting is done based on a string field.

Please do let me know what mistake I am making?

Regards
Sundar

P.S: The stack trace of the exception is

Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing query
  at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:86)
  at org.apache.solr.client.solrj.impl.BaseSolrServer.query(BaseSolrServer.java:101)
  at com.apollo.sisaw.solr.service.AbstractSolrSearchService.makeSolrQuery(AbstractSolrSearchService.java:193)
  ... 105 more
Caused by: org.apache.solr.common.SolrException: Java heap space java.lang.OutOfMemoryError: Java heap space
  at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403)
  at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
  at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:352)
  at org.apache.lucene.search.FieldSortedHitQueue.comparatorString(FieldSortedHitQueue.java:416)
  at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:207)
  at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
  at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
  at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:56)
  at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:907)
  at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
  at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
  at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:156)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:128)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1025)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
  at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
  at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:175)
  at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:74)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
  at org.jboss.web.tomcat.tc5.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:156)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)











