I haven't seen the source code before, but I don't understand why the sorting isn't done after the fetch completes. Wouldn't that be faster, at least in the case of field-level sorting? I could be wrong, and the implementation may well be better than my guess, but I don't see why all of the field values have to be loaded.
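As a back-of-envelope check on the numbers discussed in the quoted thread below, here is a small sketch (my own, not Solr/Lucene code; class and method names are illustrative) of the memory a per-field string sort cache needs: one int ordinal per document, plus the payload of each unique term.

```java
// Rough estimate (illustrative sketch, not Solr API) of the per-field memory
// a StringIndex-style sort cache needs: an int[maxDoc] of term ordinals plus
// the contents of the unique-term String[].
public class SortCacheEstimate {

    static long estimateBytes(long maxDoc, long uniqueTerms, long avgTermBytes) {
        long ordinals = maxDoc * 4L;                 // int[maxDoc] of ordinals
        long terms = uniqueTerms * avgTermBytes;     // unique term payloads
        return ordinals + terms;
    }

    public static void main(String[] args) {
        // 25,191,979 docs, as in the index from the stack trace below:
        // the int[] alone is ~100 MB, close to the failed 100,767,936-byte
        // allocation reported in the OOM.
        System.out.println(estimateBytes(25_191_979L, 25_191_979L, 0));
    }
}
```

Note that for 2,000,000 docs with ~20-byte unique terms this gives roughly 48 MB, which matches the 40-50 MB per sort field estimate quoted below.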
> Date: Tue, 22 Jul 2008 14:26:26 -0700
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: Re: Out of memory on Solr sorting
>
> Ok, after some analysis of FieldCacheImpl:
>
> - it is supposed that the (sorted) Enumeration of "terms" is less than the
>   total number of documents (so that SOLR uses a specific field type for
>   sorted searches: solr.StrField with omitNorms="true")
>
> It creates an int[reader.maxDoc()] array, checks the (sorted) Enumeration of
> "terms" (untokenized solr.StrField), and populates the array with document Ids.
>
> - it also creates an array of String:
>   String[] mterms = new String[reader.maxDoc()+1];
>
> Why do we need that? For 1G documents with an average term/StrField size of
> 100 bytes (which could be unique text!!!) it will create a kind of huge
> 100Gb cache which is not really needed...
>
>   StringIndex value = new StringIndex(retArray, mterms);
>
> If I understand correctly... StringIndex _must_ be a file in a filesystem
> for such a case...
> We create StringIndex, and retrieve the top 10 documents - huge overhead.
>
> Quoting Fuad Efendi <[EMAIL PROTECTED]>:
>
>> Ok, what is confusing me is the implicit guess that FieldCache contains
>> the "field" and Lucene uses an in-memory sort instead of using the
>> file-system "index"...
>>
>> Array size: 100Mb (25M x 4 bytes), and it is just pointers (4-byte
>> integers) to documents in the index.
>>
>> org.apache.lucene.search.FieldCacheImpl$10.createValue
>> ...
>> 357: protected Object createValue(IndexReader reader, Object fieldKey)
>> 358:     throws IOException {
>> 359:   String field = ((String) fieldKey).intern();
>> 360:   final int[] retArray = new int[reader.maxDoc()]; // OutOfMemoryError!!!
>> ...
>> 408:   StringIndex value = new StringIndex(retArray, mterms);
>> 409:   return value;
>> 410: }
>> ...
>>
>> It's very confusing, I don't know such internals...
>>
>>>> <field name="XXX" type="string" indexed="true" stored="true" termVectors="true"/>
>>>> The sorting is done based on the string field.
>>
>> I think Sundar should not use [termVectors="true"]...
>>
>> Quoting Mark Miller <[EMAIL PROTECTED]>:
>>
>>> Hmmm... I think its 32 bits an integer with an index entry for each doc, so
>>>
>>> **25 000 000 x 32 bits = 95.3674316 megabytes**
>>>
>>> Then you have the string array that contains each unique term from your
>>> index... you can guess that based on the number of terms in your index
>>> and an avg length guess.
>>>
>>> There is some other overhead beyond the sort cache as well, but thats
>>> the bulk of what it will add.
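The createValue excerpt quoted above can be modelled roughly as follows. This is a simplified sketch under my own assumptions (class and field names are illustrative, and the real FieldCacheImpl walks terms via TermEnum/TermDocs rather than taking a Map): every document gets an int ordinal into a sorted array of unique terms, so sorting documents by a string field reduces to comparing ints.

```java
import java.util.Arrays;
import java.util.Map;

// Simplified model (illustrative, not the real FieldCacheImpl) of the
// StringIndex structure discussed above: order[] plays the role of the
// int[maxDoc] retArray, lookup[] the role of the String[] mterms.
public class StringIndexSketch {
    final int[] order;     // per-document ordinal of its term
    final String[] lookup; // ordinal -> term, in sorted order

    StringIndexSketch(Map<String, int[]> postings, int maxDoc) {
        order = new int[maxDoc];
        lookup = postings.keySet().toArray(new String[0]);
        Arrays.sort(lookup);              // TermEnum yields terms in sorted order
        for (int ord = 0; ord < lookup.length; ord++) {
            for (int doc : postings.get(lookup[ord])) {
                order[doc] = ord;         // record each doc's term ordinal
            }
        }
    }

    // Comparing two docs by the sort field never touches the strings.
    int compare(int docA, int docB) {
        return Integer.compare(order[docA], order[docB]);
    }
}
```

Both arrays are sized by the index, not by the query, which is why the allocation cost is the same whether a search matches one document or millions.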
>>> I think my memory may be bad with my original estimate :)
>>>
>>> Fuad Efendi wrote:
>>>
>>>> Thank you very much Mark,
>>>>
>>>> it explains a lot to me; I am guessing: for 1,000,000 documents with a
>>>> [string] field of average size 1024 bytes I need 1Gb for a single
>>>> IndexSearcher instance; the field-level cache is used internally by
>>>> Lucene (can Lucene manage the size of it?); we can't have 1G of such
>>>> documents without having 1Tb RAM...
>>>>
>>>> Quoting Mark Miller <[EMAIL PROTECTED]>:
>>>>
>>>>> Fuad Efendi wrote:
>>>>>
>>>>>> SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 100767936, Num elements: 25191979
>>>>>>
>>>>>> I just noticed, this is the exact number of documents in the index: 25191979
>>>>>> (http://www.tokenizer.org/, you can sort - click headers Id,
>>>>>> [Country, Site, Price] in a table; experimental)
>>>>>>
>>>>>> If the array is allocated ONLY on new-searcher warm-up I am
>>>>>> _extremely_ happy... I had constant OOMs during the past month (SUN Java 5).
>>>>>
>>>>> It is only on warmup - I believe its lazy loaded, so the first time a
>>>>> search is done (solr does the search as part of warmup I believe) the
>>>>> fieldcache is loaded. The underlying IndexReader is the key to the
>>>>> fieldcache, so until you get a new IndexReader (SolrSearcher in solr
>>>>> world?) the field cache will be good.
>>>>> Keep in mind that as a searcher is warming, the other searcher is still
>>>>> serving, so that will up the RAM requirements... and since I think you
>>>>> can have >1 searchers on deck... you get the idea.
>>>>>
>>>>> As far as the number I gave, thats from a memory made months and months
>>>>> ago, so go with what you see.
>>>>>
>>>>> Quoting Fuad Efendi <[EMAIL PROTECTED]>:
>>>>>
>>>>>> I've even seen exceptions (posted here) when "sort"-type queries caused
>>>>>> Lucene to allocate 100Mb arrays; here is what happened to me:
>>>>>>
>>>>>> SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 100767936, Num elements: 25191979
>>>>>>     at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:360)
>>>>>>     at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
>>>>>>
>>>>>> It does not happen after I increased from 4096M to 8192M (JRockit R27;
>>>>>> more intelligent stacktrace, isn't it?)
>>>>>>
>>>>>> Thanks Mark; I didn't know that it happens only once (on warming up a searcher).
>>>>>>
>>>>>> Quoting Mark Miller <[EMAIL PROTECTED]>:
>>>>>>
>>>>>>> Because to sort efficiently, Solr loads the term to sort on for each
>>>>>>> doc in the index into an array. For ints, longs, etc its just an array
>>>>>>> the size of the number of docs in your index (i believe deleted or
>>>>>>> not). For a String its an array to hold each unique string and an
>>>>>>> array of ints indexing into the String array.
>>>>>>>
>>>>>>> So if you do a sort, and search for something that only gets 1 doc as
>>>>>>> a hit... your still loading up that field cache for every single doc
>>>>>>> in your index on the first search. With solr, this happens in the
>>>>>>> background as it warms up the searcher.
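The lazy, per-IndexReader caching described above can be sketched like this (my own illustration, not Lucene's actual Cache class): the expensive per-field value is built once, the first time it is needed for a given reader, then reused until a new reader (new searcher) replaces it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch (not Lucene code) of lazy, per-reader field caching:
// the loader runs once per key, no matter how many searches reuse the value.
public class LazyPerReaderCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> loader; // expensive: walks every doc in the index
    int loads = 0;                       // exposed so the cost is observable

    LazyPerReaderCache(Function<K, V> loader) {
        this.loader = loader;
    }

    synchronized V get(K readerKey) {
        return cache.computeIfAbsent(readerKey, k -> {
            loads++;                     // even a 1-hit query pays this once
            return loader.apply(k);
        });
    }
}
```

Two sorted searches against the same reader trigger one load; a new reader triggers another, which is why a warming searcher overlapping a serving one temporarily doubles the footprint.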
>>>>>>> The end story is, you need more RAM to accommodate the sort most
>>>>>>> likely... have you upped your xmx setting? I think you can roughly
>>>>>>> say a 2 million doc index would need 40-50 MB (depending and rough,
>>>>>>> but to give an idea) per field your sorting on.
>>>>>>>
>>>>>>> - Mark
>>>>>>>
>>>>>>> sundar shankar wrote:
>>>>>>>
>>>>>>>> Thanks Fuad.
>>>>>>>> But why does just sorting produce an OOM? I executed the query
>>>>>>>> without adding the sort clause and it executed perfectly. In fact I
>>>>>>>> even tried removing the maxrows=10 and executed; it came out fine.
>>>>>>>> Queries with bigger results seem to come out fine too. But why just
>>>>>>>> a sort, and that too of just 10 rows??
>>>>>>>> -Sundar
>>>>>>>>
>>>>>>>>> Date: Tue, 22 Jul 2008 12:24:35 -0700
>>>>>>>>> From: [EMAIL PROTECTED]
>>>>>>>>> To: solr-user@lucene.apache.org
>>>>>>>>> Subject: RE: Out of memory on Solr sorting
>>>>>>>>>
>>>>>>>>> org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403)
>>>>>>>>> - this piece of code does not request an Array[100M] (as I have seen
>>>>>>>>> with Lucene), it asks for only a few bytes / Kb for a field...
>>>>>>>>>
>>>>>>>>> Probably 128 - 512 is not enough; it is also advisable to use equal sizes:
>>>>>>>>> -Xms1024M -Xmx1024M
>>>>>>>>> (it minimizes GC frequency, and it ensures that 1024M is available at startup)
>>>>>>>>>
>>>>>>>>> OOM happens also with fragmented memory, when the application
>>>>>>>>> requests a big contiguous fragment and GC is unable to optimize;
>>>>>>>>> looks like your application requests a little and memory is not available...
>>>>>>>>>
>>>>>>>>> Quoting sundar shankar <[EMAIL PROTECTED]>:
>>>>>>>>>
>>>>>>>>>> From: [EMAIL PROTECTED]
>>>>>>>>>> To: solr-user@lucene.apache.org
>>>>>>>>>> Subject: Out of memory on Solr sorting
>>>>>>>>>> Date: Tue, 22 Jul 2008 19:11:02 +0000
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>> Sorry again fellos. I am not sure whats happening. The day with
>>>>>>>>>> solr is bad for me I guess. EZMLM didnt let me send any mails this morning.
>>>>>>>>>> Asked me to confirm subscription and when I did, it said I was
>>>>>>>>>> already a member. Now my mails are all coming out bad. Sorry for
>>>>>>>>>> troubling y'all this bad. I hope this mail comes out right.
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>> We are developing a product in an agile manner and the current
>>>>>>>>>> implementation has data of size just about 800 megs in dev.
>>>>>>>>>> The memory allocated to solr on dev (Dual core Linux box) is 128-512.
>>>>>>>>>>
>>>>>>>>>> My config
>>>>>>>>>> =========
>>>>>>>>>>
>>>>>>>>>> <!-- autocommit pending docs if certain criteria are met
>>>>>>>>>> <autoCommit>
>>>>>>>>>>   <maxDocs>10000</maxDocs>
>>>>>>>>>>   <maxTime>1000</maxTime>
>>>>>>>>>> </autoCommit>
>>>>>>>>>> -->
>>>>>>>>>>
>>>>>>>>>> <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>
>>>>>>>>>>
>>>>>>>>>> <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>
>>>>>>>>>>
>>>>>>>>>> <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
>>>>>>>>>>
>>>>>>>>>> <enableLazyFieldLoading>true</enableLazyFieldLoading>
>>>>>>>>>>
>>>>>>>>>> My Field
>>>>>>>>>> ========
>>>>>>>>>>
>>>>>>>>>> <fieldType name="autocomplete" class="solr.TextField">
>>>>>>>>>>   <analyzer type="index">
>>>>>>>>>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
>>>>>>>>>>     <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1"/>
>>>>>>>>>>   </analyzer>
>>>>>>>>>>   <analyzer type="query">
>>>>>>>>>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
>>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
>>>>>>>>>>   </analyzer>
>>>>>>>>>> </fieldType>
>>>>>>>>>>
>>>>>>>>>> Problem
>>>>>>>>>> =======
>>>>>>>>>>
>>>>>>>>>> I execute a query that returns 24 rows of result. I pick 10 out of it.
>>>>>>>>>> I have no problem when I execute this. But when I sort it by a
>>>>>>>>>> String field that is fetched in this result, I get an OOM. I am
>>>>>>>>>> able to execute several other queries with no problem. Just having
>>>>>>>>>> a sort asc clause added to the query throws an OOM. Why is that?
>>>>>>>>>> What should I have ideally done? My config on QA is pretty similar
>>>>>>>>>> to the dev box and probably has more data than on dev. It didnt
>>>>>>>>>> throw any OOM during the integration test. The autocomplete is a
>>>>>>>>>> new field we added recently.
>>>>>>>>>>
>>>>>>>>>> Another point is that the indexing is done with a field of type string:
>>>>>>>>>> <field name="XXX" type="string" indexed="true" stored="true" termVectors="true"/>
>>>>>>>>>>
>>>>>>>>>> and the autocomplete field is a copy field. The sorting is done
>>>>>>>>>> based on the string field.
>>>>>>>>>>
>>>>>>>>>> Please do lemme know what mistake am I doing?
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> Sundar
>>>>>>>>>>
>>>>>>>>>> P.S: The stack trace of the exception is:
>>>>>>>>>>
>>>>>>>>>> Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing query
>>>>>>>>>>     at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:86)
>>>>>>>>>>     at org.apache.solr.client.solrj.impl.BaseSolrServer.query(BaseSolrServer.java:101)
>>>>>>>>>>     at com.apollo.sisaw.solr.service.AbstractSolrSearchService.makeSolrQuery(AbstractSolrSearchService.java:193)
>>>>>>>>>>     ... 105 more
>>>>>>>>>> Caused by: org.apache.solr.common.SolrException: Java heap space java.lang.OutOfMemoryError: Java heap space
>>>>>>>>>>     at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403)
>>>>>>>>>>     at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
>>>>>>>>>>     at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:352)
>>>>>>>>>>     at org.apache.lucene.search.FieldSortedHitQueue.comparatorString(FieldSortedHitQueue.java:416)
>>>>>>>>>>     at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:207)
>>>>>>>>>>     at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
>>>>>>>>>>     at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
>>>>>>>>>>     at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:56)
>>>>>>>>>>     at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:907)
>>>>>>>>>>     at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
>>>>>>>>>>     at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
>>>>>>>>>>     at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
>>>>>>>>>>     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:156)
>>>>>>>>>>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:128)
>>>>>>>>>>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1025)
>>>>>>>>>>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>>>>>>>>>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
>>>>>>>>>>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
>>>>>>>>>>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
>>>>>>>>>>     at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
>>>>>>>>>>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
>>>>>>>>>>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
>>>>>>>>>>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
>>>>>>>>>>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
>>>>>>>>>>     at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:175)
>>>>>>>>>>     at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:74)
>>>>>>>>>>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
>>>>>>>>>>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
>>>>>>>>>>     at org.jboss.web.tomcat.tc5.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:156)
>>>>>>>>>>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
>>>>>>>>>>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
>>>>>>>>>>     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)