Hi Daniel,
I am afraid that didn't solve my problem. My guess was that I have too much
data and too little memory allocated for it; a couple of posts I read
mentioned that I need a VM heap close to the size of my data folder. I have
about 540 megs now and a little more than a million and a half docs, so
ideally 512 megs should be enough. In fact I am able to perform all the
other operations now - commit, optimize, select, update, the nightly cron
jobs that reindex the data, and so on - with no hassles, and even my load
tests perform very well. Just the sort doesn't seem to work. I have now
allocated 2 gigs of memory, with the same results, and I used the GC params
you gave me too - no change whatsoever. I am not sure what's going on. Is
there something I can do to find out how much memory is actually needed, so
that my production server can be configured accordingly?
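
For what it's worth, here is the back-of-the-envelope model I put together
from this thread (an int ord per document plus one String per unique term,
as described in the quoted mails below; the ~40-byte per-String overhead is
my own assumption for Sun JVM 1.5), just to get a feel for the numbers:

    // Rough sketch, not authoritative: per-searcher memory for sorting
    // on ONE string field. Model: int[maxDoc] ords + String[] unique terms.
    public class SortCacheEstimate {
        static long estimateBytes(long maxDoc, long uniqueTerms, long avgTermChars) {
            long ordArray = maxDoc * 4;                            // int[maxDoc]
            long strings  = uniqueTerms * (40 + 2 * avgTermChars); // assumed String overhead
            return ordArray + strings;
        }

        public static void main(String[] args) {
            // My index: 1,855,013 docs; worst case all sort values unique, ~100 chars.
            long bytes = estimateBytes(1855013, 1855013, 100);
            System.out.println(bytes / (1024 * 1024) + " MB per searcher"); // ~431 MB
        }
    }

By that worst-case estimate, a single sort field could want ~431 MB per
searcher, and with a second searcher warming in the background that alone
would exhaust 512 megs - though it would not explain the failures at 2 gigs.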

I don't store any documents; we basically fetch standard column data from an
Oracle database and store it in Solr fields. Before I had EdgeNGram
configured, on Solr 1.2, my data size was less than half of what it is right
now - if I remember right, on the order of 100 megs. The max size of a field
right now shouldn't cross 100 chars either. Puzzled even more now.
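
(The index growth itself is probably just the EdgeNGram analysis. A sketch
of what the index-time chain from my schema - quoted at the bottom of this
thread - would emit for the value "Solr 1.3":

    "Solr 1.3"
      KeywordTokenizerFactory  -> "Solr 1.3"   (kept as one token)
      LowerCaseFilterFactory   -> "solr 1.3"
      PatternReplaceFilter     -> "solr13"     (strips all but [a-z0-9])
      EdgeNGramFilterFactory   -> "s", "so", "sol", "solr", "solr1", "solr13"

So an N-character value is indexed as N prefix terms, which would go a long
way toward explaining the jump from ~100 megs to 540 megs.)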

-Sundar

P.S. My configuration:
Solr 1.3
Red Hat Linux
540 megs of data (1,855,013 docs)
2 gigs of memory installed, allocated like this:
JAVA_OPTS="$JAVA_OPTS -Xms2048m -Xmx2048m -XX:MinHeapFreeRatio=50
  -XX:NewSize=1024m -XX:NewRatio=2 -Dsun.rmi.dgc.client.gcInterval=3600000
  -Dsun.rmi.dgc.server.gcInterval=3600000"

JBoss 4.0.5
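
One thing I still plan to rule out is whether the 2 gigs are actually
reaching the Solr JVM - if JAVA_OPTS is set in the wrong place, JBoss
silently keeps its defaults. A throwaway check dropped into a JSP or
servlet on the same instance:

    // Prints what the running JVM actually got, regardless of what the
    // startup script claims. If max heap isn't ~2048 MB, the flags are
    // not being picked up.
    Runtime rt = Runtime.getRuntime();
    System.out.println("max heap:   " + rt.maxMemory()   / (1024 * 1024) + " MB");
    System.out.println("total heap: " + rt.totalMemory() / (1024 * 1024) + " MB");
    System.out.println("free heap:  " + rt.freeMemory()  / (1024 * 1024) + " MB");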


> Subject: RE: Out of memory on Solr sorting
> Date: Wed, 23 Jul 2008 10:49:06 +0100
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> 
> Hi
> 
> I haven't read the whole thread, so I'll take my chances here.
> 
> I've been fighting recently to keep my Solr instances stable because
> they were frequently crashing with OutOfMemoryErrors. I'm using Solr 1.2,
> and when that happens a bug leaves the index locked until you restart
> Solr... so in my scenario it was extremely damaging.
> 
> After some profiling I realized that my major problem was caused by the
> way the JVM heap was being used: I hadn't configured it with any advanced
> options, I had just made it bigger (Xmx and Xms 1.5 GB). It's running on
> Sun JVM 1.5 (the most recent 1.5 available), deployed on JBoss 4.2 on
> RHEL.
> 
> My findings were that too many objects were being allocated in the old
> generation area of the heap, which makes them harder to dispose of, and
> that the default behaviour let the heap fill up too far before kicking
> off a GC. According to the JVM specs, the default is that if a full GC
> frees less than a certain percentage of the heap, an OutOfMemoryError
> should be thrown.
> 
> I've changed my JVM startup params and it has been running extremely
> stably since then:
> 
> -Xmx2048m -Xms2048m -XX:MinHeapFreeRatio=50 -XX:NewSize=1024m
> -XX:NewRatio=2 -Dsun.rmi.dgc.client.gcInterval=3600000
> -Dsun.rmi.dgc.server.gcInterval=3600000
> 
> I hope it helps.
> 
> Regards,
> Daniel Alheiros
> 
> -----Original Message-----
> From: Fuad Efendi [mailto:[EMAIL PROTECTED] 
> Sent: 22 July 2008 23:23
> To: solr-user@lucene.apache.org
> Subject: RE: Out of memory on Solr sorting
> 
> Yes, it is a cache: it stores an array of document IDs ordered by the
> sort field, together with the sorted field values; query results can be
> intersected with it and reordered accordingly.
> 
> But the memory requirements should be well documented.
> 
> Internally it uses a WeakHashMap, which is not good(!!!) - a lot of
> "underground" cache warm-ups that SOLR is not aware of... could be.
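> 
> A tiny sketch (illustrative only, not Lucene's actual code) of why a
> WeakHashMap-backed cache is invisible to its users - in FieldCacheImpl
> the key is the IndexReader:
> 
>     import java.util.Map;
>     import java.util.WeakHashMap;
> 
>     public class WeakCacheDemo {
>         public static void main(String[] args) {
>             Map<Object, String> cache = new WeakHashMap<Object, String>();
>             Object key = new Object();        // stands in for an IndexReader
>             cache.put(key, "expensively built sort arrays");
>             key = null;                       // last strong reference dropped
>             System.gc();                      // entry may vanish at any GC...
>             System.out.println(cache.size()); // ...typically prints 0 here
>         }
>     }
> 
> So the cached arrays live exactly as long as the reader they were built
> for, and every new reader pays the full rebuild ("warming") cost again.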
> 
> I think Lucene-SOLR developers should join this discussion:
> 
> 
> /**
>  * Expert: The default cache implementation, storing all values in memory.
>  * A WeakHashMap is used for storage.
>  *
> ..............
> 
>   // inherit javadocs
>   public StringIndex getStringIndex(IndexReader reader, String field)
>       throws IOException {
>     return (StringIndex) stringsIndexCache.get(reader, field);
>   }
> 
>   Cache stringsIndexCache = new Cache() {
> 
>     protected Object createValue(IndexReader reader, Object fieldKey)
>         throws IOException {
>       String field = ((String) fieldKey).intern();
>       final int[] retArray = new int[reader.maxDoc()];
>       String[] mterms = new String[reader.maxDoc()+1];
>       TermDocs termDocs = reader.termDocs();
>       TermEnum termEnum = reader.terms(new Term(field, ""));
> ....................
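> 
> To see the two arrays directly, here is a rough sketch against the
> Lucene 2.x-era FieldCache API that Solr 1.2/1.3 ships with (the index
> path is a placeholder; "XXX" is the sort field from Sundar's schema):
> 
>     import java.io.IOException;
>     import org.apache.lucene.index.IndexReader;
>     import org.apache.lucene.search.FieldCache;
> 
>     public class SortCachePeek {
>         public static void main(String[] args) throws IOException {
>             IndexReader reader = IndexReader.open("/path/to/index"); // placeholder
>             FieldCache.StringIndex idx =
>                 FieldCache.DEFAULT.getStringIndex(reader, "XXX");
>             System.out.println("order:  " + idx.order.length  + " ints (one per doc)");
>             System.out.println("lookup: " + idx.lookup.length + " Strings (one per unique term, +1)");
>             reader.close();
>         }
>     }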
> 
> Quoting Fuad Efendi <[EMAIL PROTECTED]>:
> 
> > I am hoping [new StringIndex (retArray, mterms)] is called only once
> > per sort field and cached somewhere in Lucene;
> >
> > theoretically you need to multiply the number of documents by the size
> > of the field (supposing the field contains unique text); you need not
> > tokenize this field; you need not store a TermVector.
> >
> > for 2,000,000 documents with a simple untokenized text field such as
> > the title of a book (256 bytes) you probably need 512,000,000 bytes per
> > Searcher, and as Mark mentioned you should limit the number of
> > searchers in SOLR.
> >
> > So Xmx512M is definitely not enough even for simple cases...
> >
> >
> > Quoting sundar shankar <[EMAIL PROTECTED]>:
> >
> >> I haven't seen the source code before, but I don't know why the
> >> sorting isn't done after the fetch. Wouldn't that make it faster, at
> >> least in the case of field-level sorting? I could be wrong, and the
> >> implementation is probably better than I think, but I don't know why
> >> all of the field values have to be loaded.
> >>
> >>> Date: Tue, 22 Jul 2008 14:26:26 -0700
> >>> From: [EMAIL PROTECTED]
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Re: Out of memory on Solr sorting
> >>>
> >>> Ok, after some analysis of FieldCacheImpl:
> >>>
> >>> - it is supposed that the (sorted) Enumeration of "terms" is smaller
> >>> than the total number of documents (so SOLR uses a specific field type
> >>> for sorted searches: solr.StrField with omitNorms="true"). It creates
> >>> an int[reader.maxDoc()] array, checks the (sorted) Enumeration of
> >>> "terms" (untokenized solr.StrField), and populates the array with
> >>> document Ids.
> >>>
> >>> - it also creates an array of String:
> >>> String[] mterms = new String[reader.maxDoc()+1];
> >>> Why do we need that? For 1G documents with an average term/StrField
> >>> size of 100 bytes (which could be unique text!!!) it will create a
> >>> huge 100Gb cache which is not really needed...
> >>> StringIndex value = new StringIndex (retArray, mterms);
> >>> If I understand correctly... StringIndex _must_ be a file in a
> >>> filesystem for such a case... We create the StringIndex and retrieve
> >>> the top 10 documents; huge overhead.
> >>>
> >>> Quoting Fuad Efendi <[EMAIL PROTECTED]>:
> >>>
> >>>> Ok, what is confusing me is the implicit guess that FieldCache
> >>>> contains the "field" and that Lucene uses an in-memory sort instead
> >>>> of using the file-system "index"...
> >>>>
> >>>> Array size: 100Mb (25M x 4 bytes), and it is just pointers (4-byte
> >>>> integers) to documents in the index.
> >>>>
> >>>> org.apache.lucene.search.FieldCacheImpl$10.createValue
> >>>> ...
> >>>> 357: protected Object createValue(IndexReader reader, Object fieldKey)
> >>>> 358:     throws IOException {
> >>>> 359:   String field = ((String) fieldKey).intern();
> >>>> 360:   final int[] retArray = new int[reader.maxDoc()]; // OutOfMemoryError!!!
> >>>> ...
> >>>> 408:   StringIndex value = new StringIndex (retArray, mterms);
> >>>> 409:   return value;
> >>>> 410: }
> >>>> ...
> >>>>
> >>>> It's very confusing; I don't know these internals...
> >>>>
> >>>>> <field name="XXX" type="string" indexed="true" stored="true"
> >>>>>        termVectors="true"/>
> >>>>> The sorting is done based on the string field.
> >>>>
> >>>> I think Sundar should not use [termVectors="true"]...
> >>>>
> >>>> Quoting Mark Miller <[EMAIL PROTECTED]>:
> >>>>
> >>>>> Hmmm... I think it's 32 bits an integer, with an index entry for
> >>>>> each doc, so
> >>>>>
> >>>>> 25 000 000 x 32 bits = 95.3674316 megabytes
> >>>>>
> >>>>> Then you have the string array that contains each unique term from
> >>>>> your index... you can guess that based on the number of terms in
> >>>>> your index and an avg length guess.
> >>>>>
> >>>>> There is some other overhead beyond the sort cache as well, but
> >>>>> that's the bulk of what it will add. I think my memory may be bad
> >>>>> with my original estimate :)
> >>>>>
> >>>>> Fuad Efendi wrote:
> >>>>>
> >>>>>> Thank you very much Mark, it explains a lot to me;
> >>>>>>
> >>>>>> I am guessing: for 1,000,000 documents with a [string] field of
> >>>>>> average size 1024 bytes I need 1Gb for a single IndexSearcher
> >>>>>> instance; the field-level cache is used internally by Lucene (can
> >>>>>> Lucene manage the size of it?); we can't have 1G of such documents
> >>>>>> without having 1Tb of RAM...
> >>>>>>
> >>>>>> Quoting Mark Miller <[EMAIL PROTECTED]>:
> >>>>>>
> >>>>>>> Fuad Efendi wrote:
> >>>>>>>
> >>>>>>>> SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray -
> >>>>>>>> Object size: 100767936, Num elements: 25191979
> >>>>>>>>
> >>>>>>>> I just noticed, this is the exact number of documents in the
> >>>>>>>> index: 25191979
> >>>>>>>> (http://www.tokenizer.org/, you can sort - click the headers Id,
> >>>>>>>> [Country, Site, Price] in a table; experimental)
> >>>>>>>>
> >>>>>>>> If the array is allocated ONLY on new-searcher warm-up I am
> >>>>>>>> _extremely_ happy... I had constant OOMs during the past month
> >>>>>>>> (SUN Java 5).
> >>>>>>>
> >>>>>>> It is only on warmup - I believe it's lazy-loaded, so the first
> >>>>>>> time a search is done (solr does the search as part of warmup, I
> >>>>>>> believe) the fieldcache is loaded. The underlying IndexReader is
> >>>>>>> the key to the fieldcache, so until you get a new IndexReader
> >>>>>>> (SolrSearcher in solr world?) the field cache will be good. Keep
> >>>>>>> in mind that as a searcher is warming, the other searcher is
> >>>>>>> still serving, so that will up the ram requirements... and since
> >>>>>>> I think you can have >1 searchers on deck... you get the idea.
> >>>>>>>
> >>>>>>> As far as the number I gave, that's from a memory made months and
> >>>>>>> months ago, so go with what you see.
> >>>>>>>
> >>>>>>>> Quoting Fuad Efendi <[EMAIL PROTECTED]>:
> >>>>>>>>
> >>>>>>>>> I've even seen exceptions (posted here) where "sort"-type
> >>>>>>>>> queries caused Lucene to allocate 100Mb arrays; here is what
> >>>>>>>>> happened to me:
> >>>>>>>>>
> >>>>>>>>> SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray -
> >>>>>>>>> Object size: 100767936, Num elements: 25191979
> >>>>>>>>>   at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:360)
> >>>>>>>>>   at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
> >>>>>>>>>
> >>>>>>>>> - it does not happen after I increased from 4096M to 8192M
> >>>>>>>>> (JRockit R27; a more intelligent stacktrace, isn't it?)
> >>>>>>>>>
> >>>>>>>>> Thanks Mark; I didn't know that it happens only once (on
> >>>>>>>>> warming up a searcher).
> >>>>>>>>>
> >>>>>>>>> Quoting Mark Miller <[EMAIL PROTECTED]>:
> >>>>>>>>>
> >>>>>>>>>> Because to sort efficiently, Solr loads the term to sort on
> >>>>>>>>>> for each doc in the index into an array. For ints, longs, etc.
> >>>>>>>>>> it's just an array the size of the number of docs in your
> >>>>>>>>>> index (deleted or not, I believe). For a String it's an array
> >>>>>>>>>> to hold each unique string and an array of ints indexing into
> >>>>>>>>>> the String array.
> >>>>>>>>>>
> >>>>>>>>>> So if you do a sort, and search for something that only gets 1
> >>>>>>>>>> doc as a hit... you're still loading up that field cache for
> >>>>>>>>>> every single doc in your index on the first search. With solr,
> >>>>>>>>>> this happens in the background as it warms up the searcher.
> >>>>>>>>>> The end story is, you most likely need more RAM to accommodate
> >>>>>>>>>> the sort... have you upped your xmx setting? I think you can
> >>>>>>>>>> roughly say a 2 million doc index would need 40-50 MB (rough,
> >>>>>>>>>> but to give an idea) per field you're sorting on.
> >>>>>>>>>>
> >>>>>>>>>> - Mark
> >>>>>>>>>>
> >>>>>>>>>> sundar shankar wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Thanks Fuad.
> >>>>>>>>>>> But why does just sorting produce an OOM? I executed the
> >>>>>>>>>>> query without adding the sort clause and it executed
> >>>>>>>>>>> perfectly. In fact I even tried removing the maxrows=10 and
> >>>>>>>>>>> executing; it came out fine. Queries with bigger results seem
> >>>>>>>>>>> to come out fine too. So why just the sort, and that too of
> >>>>>>>>>>> just 10 rows??
> >>>>>>>>>>> -Sundar
> >>>>>>>>>>>
> >>>>>>>>>>>> Date: Tue, 22 Jul 2008 12:24:35 -0700
> >>>>>>>>>>>> From: [EMAIL PROTECTED]
> >>>>>>>>>>>> To: solr-user@lucene.apache.org
> >>>>>>>>>>>> Subject: RE: Out of memory on Solr sorting
> >>>>>>>>>>>>
> >>>>>>>>>>>> org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403)
> >>>>>>>>>>>> - this piece of code does not request an Array[100M] (as I
> >>>>>>>>>>>> have seen with Lucene); it asks for only a few bytes/Kb per
> >>>>>>>>>>>> field...
> >>>>>>>>>>>>
> >>>>>>>>>>>> Probably 128 - 512 is not enough; it is also advisable to
> >>>>>>>>>>>> use equal sizes -Xms1024M -Xmx1024M (it minimizes GC
> >>>>>>>>>>>> frequency, and it ensures that 1024M is available at
> >>>>>>>>>>>> startup).
> >>>>>>>>>>>>
> >>>>>>>>>>>> OOM also happens with fragmented memory, when the
> >>>>>>>>>>>> application requests a big contiguous fragment and GC is
> >>>>>>>>>>>> unable to optimize; it looks like your application requests
> >>>>>>>>>>>> a little and the memory is not available...
> >>>>>>>>>>>>
> >>>>>>>>>>>> Quoting sundar shankar <[EMAIL PROTECTED]>:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> From: [EMAIL PROTECTED]
> >>>>>>>>>>>>> To: solr-user@lucene.apache.org
> >>>>>>>>>>>>> Subject: Out of memory on Solr sorting
> >>>>>>>>>>>>> Date: Tue, 22 Jul 2008 19:11:02 +0000
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>> Sorry again, fellows. I am not sure what's happening; the
> >>>>>>>>>>>>> day with Solr is a bad one for me, I guess. EZMLM didn't
> >>>>>>>>>>>>> let me send any mails this morning; it asked me to confirm
> >>>>>>>>>>>>> my subscription, and when I did, it said I was already a
> >>>>>>>>>>>>> member. Now my mails are all coming out bad. Sorry for
> >>>>>>>>>>>>> troubling y'all this badly. I hope this mail comes out
> >>>>>>>>>>>>> right.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>> We are developing a product in an agile manner, and the
> >>>>>>>>>>>>> current implementation has data of just about 800 megs in
> >>>>>>>>>>>>> dev. The memory allocated to Solr on dev (a dual-core Linux
> >>>>>>>>>>>>> box) is 128-512.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> My config
> >>>>>>>>>>>>> =========
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> <!-- autocommit pending docs if certain criteria are met
> >>>>>>>>>>>>>   <autoCommit>
> >>>>>>>>>>>>>     <maxDocs>10000</maxDocs>
> >>>>>>>>>>>>>     <maxTime>1000</maxTime>
> >>>>>>>>>>>>>   </autoCommit>
> >>>>>>>>>>>>> -->
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> <filterCache
> >>>>>>>>>>>>>     class="solr.LRUCache"
> >>>>>>>>>>>>>     size="512"
> >>>>>>>>>>>>>     initialSize="512"
> >>>>>>>>>>>>>     autowarmCount="256"/>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> <queryResultCache
> >>>>>>>>>>>>>     class="solr.LRUCache"
> >>>>>>>>>>>>>     size="512"
> >>>>>>>>>>>>>     initialSize="512"
> >>>>>>>>>>>>>     autowarmCount="256"/>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> <documentCache
> >>>>>>>>>>>>>     class="solr.LRUCache"
> >>>>>>>>>>>>>     size="512"
> >>>>>>>>>>>>>     initialSize="512"
> >>>>>>>>>>>>>     autowarmCount="0"/>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> <enableLazyFieldLoading>true</enableLazyFieldLoading>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> My Field
> >>>>>>>>>>>>> ========
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> <fieldType name="autocomplete" class="solr.TextField">
> >>>>>>>>>>>>>   <analyzer type="index">
> >>>>>>>>>>>>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >>>>>>>>>>>>>     <filter class="solr.LowerCaseFilterFactory" />
> >>>>>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory"
> >>>>>>>>>>>>>             pattern="([^a-z0-9])" replacement="" replace="all" />
> >>>>>>>>>>>>>     <filter class="solr.EdgeNGramFilterFactory"
> >>>>>>>>>>>>>             maxGramSize="100" minGramSize="1" />
> >>>>>>>>>>>>>   </analyzer>
> >>>>>>>>>>>>>   <analyzer type="query">
> >>>>>>>>>>>>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >>>>>>>>>>>>>     <filter class="solr.LowerCaseFilterFactory" />
> >>>>>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory"
> >>>>>>>>>>>>>             pattern="([^a-z0-9])" replacement="" replace="all" />
> >>>>>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory"
> >>>>>>>>>>>>>             pattern="^(.{20})(.*)?" replacement="$1" replace="all" />
> >>>>>>>>>>>>>   </analyzer>
> >>>>>>>>>>>>> </fieldType>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Problem
> >>>>>>>>>>>>> =======
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I execute a query that returns 24 rows of results and pick
> >>>>>>>>>>>>> 10 out of it. I have no problem when I execute this. But
> >>>>>>>>>>>>> when I sort it by a String field that is fetched from this
> >>>>>>>>>>>>> result, I get an OOM. I am able to execute several other
> >>>>>>>>>>>>> queries with no problem; just having a sort asc clause
> >>>>>>>>>>>>> added to the query throws an OOM. Why is that? What should
> >>>>>>>>>>>>> I ideally have done? My config on QA is pretty similar to
> >>>>>>>>>>>>> the dev box and probably has more data than dev. It didn't
> >>>>>>>>>>>>> throw any OOM during the integration tests. The
> >>>>>>>>>>>>> autocomplete field is a new field we added recently.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Another point is that the indexing is done with a field of
> >>>>>>>>>>>>> type string:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> <field name="XXX" type="string" indexed="true" stored="true"
> >>>>>>>>>>>>>        termVectors="true"/>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> and the autocomplete field is a copyField. The sorting is
> >>>>>>>>>>>>> done on the string field.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Please do let me know what mistake I am making.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Regards
> >>>>>>>>>>>>> Sundar
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> P.S: The stack trace of the exception is:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing query
> >>>>>>>>>>>>>   at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:86)
> >>>>>>>>>>>>>   at org.apache.solr.client.solrj.impl.BaseSolrServer.query(BaseSolrServer.java:101)
> >>>>>>>>>>>>>   at com.apollo.sisaw.solr.service.AbstractSolrSearchService.makeSolrQuery(AbstractSolrSearchService.java:193)
> >>>>>>>>>>>>>   ... 105 more
> >>>>>>>>>>>>> Caused by: org.apache.solr.common.SolrException: Java heap space
> >>>>>>>>>>>>>     java.lang.OutOfMemoryError: Java heap space
> >>>>>>>>>>>>>   at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403)
> >>>>>>>>>>>>>   at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
> >>>>>>>>>>>>>   at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:352)
> >>>>>>>>>>>>>   at org.apache.lucene.search.FieldSortedHitQueue.comparatorString(FieldSortedHitQueue.java:416)
> >>>>>>>>>>>>>   at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:207)
> >>>>>>>>>>>>>   at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
> >>>>>>>>>>>>>   at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
> >>>>>>>>>>>>>   at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:56)
> >>>>>>>>>>>>>   at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:907)
> >>>>>>>>>>>>>   at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
> >>>>>>>>>>>>>   at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
> >>>>>>>>>>>>>   at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
> >>>>>>>>>>>>>   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:156)
> >>>>>>>>>>>>>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:128)
> >>>>>>>>>>>>>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1025)
> >>>>>>>>>>>>>   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> >>>>>>>>>>>>>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
> >>>>>>>>>>>>>   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
> >>>>>>>>>>>>>   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
> >>>>>>>>>>>>>   at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
> >>>>>>>>>>>>>   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
> >>>>>>>>>>>>>   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
> >>>>>>>>>>>>>   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
> >>>>>>>>>>>>>   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
> >>>>>>>>>>>>>   at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:175)
> >>>>>>>>>>>>>   at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:74)
> >>>>>>>>>>>>>   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
> >>>>>>>>>>>>>   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
> >>>>>>>>>>>>>   at org.jboss.web.tomcat.tc5.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:156)
> >>>>>>>>>>>>>   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
> >>>>>>>>>>>>>   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
> >>>>>>>>>>>>>   at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)

