I haven't seen the source code before, but I don't understand why the sorting isn't done after the fetch completes. Wouldn't that be faster, at least in the case of field-level sorting? I could be wrong, and the implementation may well be better than my guess, but I don't see why all of the field values have to be loaded.
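As a back-of-envelope check on the numbers discussed in the quoted thread below, here is a small sketch (my own, not Solr/Lucene code; class and method names are illustrative) of the memory a per-field string sort cache needs: one int ordinal per document, plus the payload of each unique term.

```java
// Rough estimate (illustrative sketch, not Solr API) of the per-field memory
// a StringIndex-style sort cache needs: an int[maxDoc] of term ordinals plus
// the contents of the unique-term String[].
public class SortCacheEstimate {

    static long estimateBytes(long maxDoc, long uniqueTerms, long avgTermBytes) {
        long ordinals = maxDoc * 4L;                 // int[maxDoc] of ordinals
        long terms = uniqueTerms * avgTermBytes;     // unique term payloads
        return ordinals + terms;
    }

    public static void main(String[] args) {
        // 25,191,979 docs, as in the index from the stack trace below:
        // the int[] alone is ~100 MB, close to the failed 100,767,936-byte
        // allocation reported in the OOM.
        System.out.println(estimateBytes(25_191_979L, 25_191_979L, 0));
    }
}
```

Note that for 2,000,000 docs with ~20-byte unique terms this gives roughly 48 MB, which matches the 40-50 MB per sort field estimate quoted below.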
> Date: Tue, 22 Jul 2008 14:26:26 -0700
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: Re: Out of memory on Solr sorting
>
> Ok, after some analysis of FieldCacheImpl:
>
> - it is supposed that the (sorted) Enumeration of "terms" is less than the
>   total number of documents (so that SOLR uses a specific field type for
>   sorted searches: solr.StrField with omitNorms="true")
>
> It creates an int[reader.maxDoc()] array, checks the (sorted) Enumeration of
> "terms" (untokenized solr.StrField), and populates the array with document Ids.
>
> - it also creates an array of String:
>   String[] mterms = new String[reader.maxDoc()+1];
>
> Why do we need that? For 1G documents with an average term/StrField size of
> 100 bytes (which could be unique text!!!) it will create a kind of huge
> 100Gb cache which is not really needed...
>
>   StringIndex value = new StringIndex(retArray, mterms);
>
> If I understand correctly... StringIndex _must_ be a file in a filesystem
> for such a case...
> We create StringIndex, and retrieve the top 10 documents - huge overhead.
>
> Quoting Fuad Efendi <[EMAIL PROTECTED]>:
>
>> Ok, what is confusing me is the implicit guess that FieldCache contains
>> the "field" and Lucene uses an in-memory sort instead of using the
>> file-system "index"...
>>
>> Array size: 100Mb (25M x 4 bytes), and it is just pointers (4-byte
>> integers) to documents in the index.
>>
>> org.apache.lucene.search.FieldCacheImpl$10.createValue
>> ...
>> 357: protected Object createValue(IndexReader reader, Object fieldKey)
>> 358:     throws IOException {
>> 359:   String field = ((String) fieldKey).intern();
>> 360:   final int[] retArray = new int[reader.maxDoc()]; // OutOfMemoryError!!!
>> ...
>> 408:   StringIndex value = new StringIndex(retArray, mterms);
>> 409:   return value;
>> 410: }
>> ...
>>
>> It's very confusing, I don't know such internals...
>>
>>>> <field name="XXX" type="string" indexed="true" stored="true" termVectors="true"/>
>>>> The sorting is done based on the string field.
>>
>> I think Sundar should not use [termVectors="true"]...
>>
>> Quoting Mark Miller <[EMAIL PROTECTED]>:
>>
>>> Hmmm... I think its 32 bits an integer with an index entry for each doc, so
>>>
>>> **25 000 000 x 32 bits = 95.3674316 megabytes**
>>>
>>> Then you have the string array that contains each unique term from your
>>> index... you can guess that based on the number of terms in your index
>>> and an avg length guess.
>>>
>>> There is some other overhead beyond the sort cache as well, but thats
>>> the bulk of what it will add.
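The createValue excerpt quoted above can be modelled roughly as follows. This is a simplified sketch under my own assumptions (class and field names are illustrative, and the real FieldCacheImpl walks terms via TermEnum/TermDocs rather than taking a Map): every document gets an int ordinal into a sorted array of unique terms, so sorting documents by a string field reduces to comparing ints.

```java
import java.util.Arrays;
import java.util.Map;

// Simplified model (illustrative, not the real FieldCacheImpl) of the
// StringIndex structure discussed above: order[] plays the role of the
// int[maxDoc] retArray, lookup[] the role of the String[] mterms.
public class StringIndexSketch {
    final int[] order;     // per-document ordinal of its term
    final String[] lookup; // ordinal -> term, in sorted order

    StringIndexSketch(Map<String, int[]> postings, int maxDoc) {
        order = new int[maxDoc];
        lookup = postings.keySet().toArray(new String[0]);
        Arrays.sort(lookup);              // TermEnum yields terms in sorted order
        for (int ord = 0; ord < lookup.length; ord++) {
            for (int doc : postings.get(lookup[ord])) {
                order[doc] = ord;         // record each doc's term ordinal
            }
        }
    }

    // Comparing two docs by the sort field never touches the strings.
    int compare(int docA, int docB) {
        return Integer.compare(order[docA], order[docB]);
    }
}
```

Both arrays are sized by the index, not by the query, which is why the allocation cost is the same whether a search matches one document or millions.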
>>> I think my memory may be bad with my original estimate :)
>>>
>>> Fuad Efendi wrote:
>>>
>>>> Thank you very much Mark,
>>>>
>>>> it explains a lot to me; I am guessing: for 1,000,000 documents with a
>>>> [string] field of average size 1024 bytes I need 1Gb for a single
>>>> IndexSearcher instance; the field-level cache is used internally by
>>>> Lucene (can Lucene manage the size of it?); we can't have 1G of such
>>>> documents without having 1Tb RAM...
>>>>
>>>> Quoting Mark Miller <[EMAIL PROTECTED]>:
>>>>
>>>>> Fuad Efendi wrote:
>>>>>
>>>>>> SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 100767936, Num elements: 25191979
>>>>>>
>>>>>> I just noticed, this is the exact number of documents in the index: 25191979
>>>>>> (http://www.tokenizer.org/, you can sort - click headers Id,
>>>>>> [Country, Site, Price] in a table; experimental)
>>>>>>
>>>>>> If the array is allocated ONLY on new-searcher warm-up I am
>>>>>> _extremely_ happy... I had constant OOMs during the past month (SUN Java 5).
>>>>>
>>>>> It is only on warmup - I believe its lazy loaded, so the first time a
>>>>> search is done (solr does the search as part of warmup I believe) the
>>>>> fieldcache is loaded. The underlying IndexReader is the key to the
>>>>> fieldcache, so until you get a new IndexReader (SolrSearcher in solr
>>>>> world?) the field cache will be good.
>>>>> Keep in mind that as a searcher is warming, the other searcher is still
>>>>> serving, so that will up the RAM requirements... and since I think you
>>>>> can have >1 searchers on deck... you get the idea.
>>>>>
>>>>> As far as the number I gave, thats from a memory made months and months
>>>>> ago, so go with what you see.
>>>>>
>>>>> Quoting Fuad Efendi <[EMAIL PROTECTED]>:
>>>>>
>>>>>> I've even seen exceptions (posted here) when "sort"-type queries caused
>>>>>> Lucene to allocate 100Mb arrays; here is what happened to me:
>>>>>>
>>>>>> SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 100767936, Num elements: 25191979
>>>>>>     at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:360)
>>>>>>     at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
>>>>>>
>>>>>> It does not happen after I increased from 4096M to 8192M (JRockit R27;
>>>>>> more intelligent stacktrace, isn't it?)
>>>>>>
>>>>>> Thanks Mark; I didn't know that it happens only once (on warming up a searcher).
>>>>>>
>>>>>> Quoting Mark Miller <[EMAIL PROTECTED]>:
>>>>>>
>>>>>>> Because to sort efficiently, Solr loads the term to sort on for each
>>>>>>> doc in the index into an array. For ints, longs, etc its just an array
>>>>>>> the size of the number of docs in your index (i believe deleted or
>>>>>>> not). For a String its an array to hold each unique string and an
>>>>>>> array of ints indexing into the String array.
>>>>>>>
>>>>>>> So if you do a sort, and search for something that only gets 1 doc as
>>>>>>> a hit... your still loading up that field cache for every single doc
>>>>>>> in your index on the first search. With solr, this happens in the
>>>>>>> background as it warms up the searcher.
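The lazy, per-IndexReader caching described above can be sketched like this (my own illustration, not Lucene's actual Cache class): the expensive per-field value is built once, the first time it is needed for a given reader, then reused until a new reader (new searcher) replaces it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch (not Lucene code) of lazy, per-reader field caching:
// the loader runs once per key, no matter how many searches reuse the value.
public class LazyPerReaderCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> loader; // expensive: walks every doc in the index
    int loads = 0;                       // exposed so the cost is observable

    LazyPerReaderCache(Function<K, V> loader) {
        this.loader = loader;
    }

    synchronized V get(K readerKey) {
        return cache.computeIfAbsent(readerKey, k -> {
            loads++;                     // even a 1-hit query pays this once
            return loader.apply(k);
        });
    }
}
```

Two sorted searches against the same reader trigger one load; a new reader triggers another, which is why a warming searcher overlapping a serving one temporarily doubles the footprint.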
>>>>>>> The end story is, you need more RAM to accommodate the sort most
>>>>>>> likely... have you upped your xmx setting? I think you can roughly
>>>>>>> say a 2 million doc index would need 40-50 MB (depending and rough,
>>>>>>> but to give an idea) per field your sorting on.
>>>>>>>
>>>>>>> - Mark
>>>>>>>
>>>>>>> sundar shankar wrote:
>>>>>>>
>>>>>>>> Thanks Fuad.
>>>>>>>> But why does just sorting produce an OOM? I executed the query
>>>>>>>> without adding the sort clause and it executed perfectly. In fact I
>>>>>>>> even tried removing the maxrows=10 and executed; it came out fine.
>>>>>>>> Queries with bigger results seem to come out fine too. But why just
>>>>>>>> a sort, and that too of just 10 rows??
>>>>>>>> -Sundar
>>>>>>>>
>>>>>>>>> Date: Tue, 22 Jul 2008 12:24:35 -0700
>>>>>>>>> From: [EMAIL PROTECTED]
>>>>>>>>> To: solr-user@lucene.apache.org
>>>>>>>>> Subject: RE: Out of memory on Solr sorting
>>>>>>>>>
>>>>>>>>> org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403)
>>>>>>>>> - this piece of code does not request an Array[100M] (as I have seen
>>>>>>>>> with Lucene), it asks for only a few bytes / Kb for a field...
>>>>>>>>>
>>>>>>>>> Probably 128 - 512 is not enough; it is also advisable to use equal sizes:
>>>>>>>>> -Xms1024M -Xmx1024M
>>>>>>>>> (it minimizes GC frequency, and it ensures that 1024M is available at startup)
>>>>>>>>>
>>>>>>>>> OOM happens also with fragmented memory, when the application
>>>>>>>>> requests a big contiguous fragment and GC is unable to optimize;
>>>>>>>>> looks like your application requests a little and memory is not available...
>>>>>>>>>
>>>>>>>>> Quoting sundar shankar <[EMAIL PROTECTED]>:
>>>>>>>>>
>>>>>>>>>> From: [EMAIL PROTECTED]
>>>>>>>>>> To: solr-user@lucene.apache.org
>>>>>>>>>> Subject: Out of memory on Solr sorting
>>>>>>>>>> Date: Tue, 22 Jul 2008 19:11:02 +0000
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>> Sorry again fellos. I am not sure whats happening. The day with
>>>>>>>>>> solr is bad for me I guess. EZMLM didnt let me send any mails this morning.
>>>>>>>>>> Asked me to confirm subscription and when I did, it said I was
>>>>>>>>>> already a member. Now my mails are all coming out bad. Sorry for
>>>>>>>>>> troubling y'all this bad. I hope this mail comes out right.
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>> We are developing a product in an agile manner and the current
>>>>>>>>>> implementation has data of size just about 800 megs in dev.
>>>>>>>>>> The memory allocated to solr on dev (Dual core Linux box) is 128-512.
>>>>>>>>>>
>>>>>>>>>> My config
>>>>>>>>>> =========
>>>>>>>>>>
>>>>>>>>>> <!-- autocommit pending docs if certain criteria are met
>>>>>>>>>> <autoCommit>
>>>>>>>>>>   <maxDocs>10000</maxDocs>
>>>>>>>>>>   <maxTime>1000</maxTime>
>>>>>>>>>> </autoCommit>
>>>>>>>>>> -->
>>>>>>>>>>
>>>>>>>>>> <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>
>>>>>>>>>>
>>>>>>>>>> <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>
>>>>>>>>>>
>>>>>>>>>> <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
>>>>>>>>>>
>>>>>>>>>> <enableLazyFieldLoading>true</enableLazyFieldLoading>
>>>>>>>>>>
>>>>>>>>>> My Field
>>>>>>>>>> ========
>>>>>>>>>>
>>>>>>>>>> <fieldType name="autocomplete" class="solr.TextField">
>>>>>>>>>>   <analyzer type="index">
>>>>>>>>>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
>>>>>>>>>>     <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1"/>
>>>>>>>>>>   </analyzer>
>>>>>>>>>>   <analyzer type="query">
>>>>>>>>>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
>>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
>>>>>>>>>>   </analyzer>
>>>>>>>>>> </fieldType>
>>>>>>>>>>
>>>>>>>>>> Problem
>>>>>>>>>> =======
>>>>>>>>>>
>>>>>>>>>> I execute a query that returns 24 rows of result. I pick 10 out of it.
>>>>>>>>>> I have no problem when I execute this. But when I sort it by a
>>>>>>>>>> String field that is fetched in this result, I get an OOM. I am
>>>>>>>>>> able to execute several other queries with no problem. Just having
>>>>>>>>>> a sort asc clause added to the query throws an OOM. Why is that?
>>>>>>>>>> What should I have ideally done? My config on QA is pretty similar
>>>>>>>>>> to the dev box and probably has more data than on dev. It didnt
>>>>>>>>>> throw any OOM during the integration test. The autocomplete is a
>>>>>>>>>> new field we added recently.
>>>>>>>>>>
>>>>>>>>>> Another point is that the indexing is done with a field of type string:
>>>>>>>>>> <field name="XXX" type="string" indexed="true" stored="true" termVectors="true"/>
>>>>>>>>>>
>>>>>>>>>> and the autocomplete field is a copy field. The sorting is done
>>>>>>>>>> based on the string field.
>>>>>>>>>>
>>>>>>>>>> Please do lemme know what mistake am I doing?
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> Sundar
>>>>>>>>>>
>>>>>>>>>> P.S: The stack trace of the exception is:
>>>>>>>>>>
>>>>>>>>>> Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing query
>>>>>>>>>>     at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:86)
>>>>>>>>>>     at org.apache.solr.client.solrj.impl.BaseSolrServer.query(BaseSolrServer.java:101)
>>>>>>>>>>     at com.apollo.sisaw.solr.service.AbstractSolrSearchService.makeSolrQuery(AbstractSolrSearchService.java:193)
>>>>>>>>>>     ... 105 more
>>>>>>>>>> Caused by: org.apache.solr.common.SolrException: Java heap space java.lang.OutOfMemoryError: Java heap space
>>>>>>>>>>     at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403)
>>>>>>>>>>     at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
>>>>>>>>>>     at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:352)
>>>>>>>>>>     at org.apache.lucene.search.FieldSortedHitQueue.comparatorString(FieldSortedHitQueue.java:416)
>>>>>>>>>>     at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:207)
>>>>>>>>>>     at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
>>>>>>>>>>     at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
>>>>>>>>>>     at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:56)
>>>>>>>>>>     at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:907)
>>>>>>>>>>     at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
>>>>>>>>>>     at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
>>>>>>>>>>     at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
>>>>>>>>>>     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:156)
>>>>>>>>>>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:128)
>>>>>>>>>>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1025)
>>>>>>>>>>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>>>>>>>>>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
>>>>>>>>>>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
>>>>>>>>>>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
>>>>>>>>>>     at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
>>>>>>>>>>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
>>>>>>>>>>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
>>>>>>>>>>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
>>>>>>>>>>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
>>>>>>>>>>     at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:175)
>>>>>>>>>>     at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:74)
>>>>>>>>>>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
>>>>>>>>>>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
>>>>>>>>>>     at org.jboss.web.tomcat.tc5.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:156)
>>>>>>>>>>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
>>>>>>>>>>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
>>>>>>>>>>     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)