Hi Erick, I agree. The 300K docs in one search is an anomaly. We do, however, use 'fq' to return a large number of docs for the purpose of generating statistics for the whole index, and we already use CursorMark extensively. Thanks!
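
A minimal SolrJ sketch of one way to get whole-index statistics without bringing the documents back: rows=0 plus the stats component, so Solr computes the numbers server-side (JSON facets or the Streaming Expressions Erick mentions are alternatives). The collection URL, the type:invoice filter, and the "amount" field are made-up placeholders, not details from this thread:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.FieldStatsInfo;
import org.apache.solr.client.solrj.response.QueryResponse;

public class WholeIndexStats {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
      SolrQuery q = new SolrQuery("*:*");
      q.addFilterQuery("type:invoice");   // hypothetical fq selecting the docs to summarize
      q.setRows(0);                       // return no documents, only the stats section
      q.setGetFieldStatistics(true);
      q.setGetFieldStatistics("amount");  // hypothetical numeric field to summarize
      QueryResponse rsp = client.query(q);
      FieldStatsInfo stats = rsp.getFieldStatsInfo().get("amount");
      System.out.println("count=" + stats.getCount()
          + " sum=" + stats.getSum() + " mean=" + stats.getMean());
    }
  }
}
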
Reinaldo

On Tue, Jul 14, 2020 at 8:55 AM Erick Erickson <erickerick...@gmail.com> wrote:

> I’d add that you’re abusing Solr horribly by returning 300K documents in a single go.
>
> Solr is built to return the top N docs, where N is usually quite small (< 100). If you allow an unlimited number of docs to be returned, you’re simply kicking the can down the road; somebody will ask for 1,000,000 docs sometime and you’ll be back where you started.
>
> I _strongly_ recommend you do one of two things for such large result sets:
>
> 1> Use Streaming. Perhaps Streaming Expressions will do what you want without you having to process all those docs on the client, if you’re doing some kind of analytics.
>
> 2> If you really, truly need all 300K docs, try getting them in chunks using CursorMark.
>
> Best,
> Erick
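
A minimal SolrJ sketch of the chunked CursorMark approach Erick describes above. The collection URL, the type:invoice filter, the 1000-row chunk size, and the assumption that "id" is the uniqueKey are placeholders, not values from this thread:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorMarkFetch {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
      SolrQuery q = new SolrQuery("*:*");
      q.addFilterQuery("type:invoice");           // hypothetical fq selecting the large result set
      q.setRows(1000);                            // chunk size per request
      q.setSort(SolrQuery.SortClause.asc("id"));  // cursor requires a sort ending on the uniqueKey
      String cursor = CursorMarkParams.CURSOR_MARK_START;
      while (true) {
        q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
        QueryResponse rsp = client.query(q);
        for (SolrDocument doc : rsp.getResults()) {
          // process one chunk at a time instead of holding all 300K docs at once
        }
        String next = rsp.getNextCursorMark();
        if (cursor.equals(next)) {
          break;                                  // cursor stopped moving: everything has been read
        }
        cursor = next;
      }
    }
  }
}

The chunk size is a trade-off: larger chunks mean fewer round trips but bigger per-response allocations, which is exactly the effect discussed further down in the thread.
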
> > On Jul 13, 2020, at 10:03 PM, Odysci <ody...@gmail.com> wrote:
> >
> > Shawn,
> >
> > Thanks for the extra info. The OOM errors were indeed because of heap space. In my case most of the GC calls were not full GCs; only when the heap was really near the top was a full GC done.
> >
> > I'll try out your suggestion of increasing the G1 heap region size. I've been using 4m, and from what you said, a 2m allocation would be considered humongous. My test cases have a few allocations that are definitely bigger than 2m (estimating based on the number of docs returned), but most of them are not.
> >
> > When I was using maxRamMB, the size used was "compatible" with the size values, assuming the avg 2KB docs that our index has. As far as I could tell in my runs, removing maxRamMB did change the GC behavior for the better. That is, now the heap goes up and down as expected, whereas before (with maxRamMB) it seemed to increase continuously.
> >
> > Thanks
> >
> > Reinaldo
> >
> > On Sun, Jul 12, 2020 at 1:02 AM Shawn Heisey <apa...@elyograg.org> wrote:
> >
> >> On 6/25/2020 2:08 PM, Odysci wrote:
> >>> I have a solrcloud setup with a 12GB heap and I've been trying to optimize it to avoid OOM errors. My index has about 30 million docs and about 80GB total, 2 shards, 2 replicas.
> >>
> >> Have you seen the full OutOfMemoryError exception text? OOME can be caused by problems that are not actually memory-related. Unless the error specifically mentions "heap space" we might be chasing the wrong thing here.
> >>
> >>> When the queries return a smallish number of docs (say, below 1000), the heap behavior seems "normal". Monitoring the GC log, I see that the young generation grows, then when GC kicks in it goes considerably down. And the old generation grows just a bit.
> >>>
> >>> However, at some point I have a query that returns over 300K docs (for a total size of approximately 1GB). At this very point the OLD generation size grows (almost by 2GB), and it remains high for all the remaining time. Even as new queries are executed, the OLD generation size does not go down, despite multiple GC calls done afterwards.
> >>
> >> Assuming the OOME exceptions were indeed caused by running out of heap, then the following paragraphs will apply:
> >>
> >> G1 has this concept called "humongous allocations". In order to reach this designation, a memory allocation must get to half of the G1 heap region size. You have set this to 4 megabytes, so any allocation of 2 megabytes or larger is humongous. Humongous allocations bypass the new generation entirely and go directly into the old generation. The max value that can be set for the G1 region size is 32MB. If you increase the region size and the behavior changes, then humongous allocations could be something to investigate.
> >>
> >> In the versions of Java that I have used, humongous allocations can only be reclaimed as garbage by a full GC. I do not know if Oracle has changed this so that the smaller collections will do it or not.
> >>
> >> Were any of those multiple GCs a full GC? If they were, then there is probably little or no garbage to collect. You've gotten a reply from "Zisis T." with some possible causes for this; I do not have anything to add.
> >>
> >> I did not know about any problems with maxRamMB, but if I were attempting to limit cache sizes, I would do so by the size values, not a specific RAM size. The size values you have chosen (8192 and 16384) will most likely result in a total cache size well beyond the limits you've indicated with maxRamMB. So if there are any bugs in the code with the maxRamMB parameter, you might end up using a LOT of memory that you didn't expect to be using.
> >>
> >> Thanks,
> >> Shawn
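
For reference, a rough sketch of what Shawn's two suggestions look like in practice. The specific values are illustrative, not settings confirmed in this thread.

The G1 region-size flag, added to the existing GC options (GC_TUNE in solr.in.sh); the cap is 32m, and at 16m only allocations of 8 MB or more are treated as humongous:

  -XX:G1HeapRegionSize=16m

And in solrconfig.xml, limiting a cache by entry count alone instead of maxRamMB; 8192 is one of the counts mentioned in the thread and may need to be lowered to actually bound memory, and the cache class should match whatever the existing config already uses (similarly for queryResultCache and documentCache):

  <filterCache class="solr.FastLRUCache" size="8192" initialSize="8192" autowarmCount="0"/>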