Hi Maulin,

I'll chime in by referring to my own findings from analyzing Solr performance: https://www.mail-archive.com/solr-user@lucene.apache.org/msg135857.html

Yonik has a good article about paging: http://yonik.com/solr/paging-and-deep-paging/. While it's about deep paging, the same issues affect searches that request too many rows. We switched to cursorMark when we need a large set of records and found that it not only consumes less memory but also performs much better when fetching something like 100 000 records. We also limited paging in our UI to the first 100 000 hits, which avoids the intermittent issues we saw when a user tried to, say, jump to the last page of a really large result set. Unfortunately there's no mechanism that lets you present a large result set completely while still allowing jumps to a random page.
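If it helps, a minimal SolrJ sketch of a cursorMark loop could look something like the following. The Solr URL, collection name, page size and per-document processing are placeholders; the only hard requirement is that the sort include the uniqueKey field (assumed here to be "id") as a tie-breaker:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorMarkExample {
  public static void main(String[] args) throws Exception {
    // Placeholder URL/collection; point this at your own node and collection.
    try (HttpSolrClient solr =
             new HttpSolrClient.Builder("http://localhost:8983/solr/documents").build()) {
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(1000);                       // small pages instead of rows=10000000
      q.setSort("id", SolrQuery.ORDER.asc);  // sort must include the uniqueKey field
      String cursor = CursorMarkParams.CURSOR_MARK_START;
      while (true) {
        q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
        QueryResponse rsp = solr.query(q);
        for (SolrDocument doc : rsp.getResults()) {
          // process each document here
        }
        String next = rsp.getNextCursorMark();
        if (cursor.equals(next)) {
          break;                             // cursor did not advance -> all results fetched
        }
        cursor = next;
      }
    }
  }
}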

--Ere

Maulin Rathod wrote on 7.2.2018 at 14.20:
Hi Erick,



Thanks for your response. The GC pauses do show up in the Solr GC logs (see the excerpt below, which shows a 138.4138211 sec pause).



It seems some bad queries were causing high memory allocation. Analyzing the issue further, we found that asking for too many rows (e.g. rows=10000000) can cause full GC problems, as mentioned in the links below. Some of our queries were asking for far too many rows, so for now we have reduced the number of rows requested. After this change we no longer see any large GC pauses (the max GC pause is 3-4 seconds), so the issue seems resolved for now. But please let me know if you have any other suggestions that could help us understand the root cause.


https://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems
https://sbdevel.wordpress.com/2015/10/05/speeding-up-core-search/
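As a purely illustrative safeguard (the class, method and limit below are hypothetical; Solr 6.1 itself does not enforce a maximum rows value), the application layer can clamp the rows parameter before a query is ever sent, and fall back to cursorMark when a caller genuinely needs the full result set:

import org.apache.solr.client.solrj.SolrQuery;

public final class QueryGuards {
  // Hypothetical application-side cap; adjust to whatever page size your UI actually needs.
  private static final int MAX_ROWS = 1000;

  private QueryGuards() {}

  /** Clamp the requested page size so a single query can never ask for millions of rows. */
  public static SolrQuery withSafeRows(SolrQuery query, int requestedRows) {
    query.setRows(Math.min(requestedRows, MAX_ROWS));
    return query;
  }
}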





Solr GC Log
==========

2018-01-04T12:13:40.346+0000: 140501.570: [CMS-concurrent-abortable-preclean-start]
{Heap before GC invocations=8909 (full 1001):
 par new generation   total 10922688K, used 4187699K [0x0000000080000000, 0x00000003a0000000, 0x00000003a0000000)
   eden space 8738176K,  41% used [0x0000000080000000, 0x000000015fb1f6d8, 0x0000000295560000)
   from space 2184512K,  23% used [0x0000000295560000, 0x00000002b53cd8b8, 0x000000031aab0000)
   to   space 2184512K,   0% used [0x000000031aab0000, 0x000000031aab0000, 0x00000003a0000000)
 concurrent mark-sweep generation total 39321600K, used 38082444K [0x00000003a0000000, 0x0000000d00000000, 0x0000000d00000000)
 Metaspace       used 46676K, capacity 47372K, committed 50136K, reserved 51200K
2018-01-04T12:13:40.393+0000: 140501.618: [GC (Allocation Failure) 140501.618: [CMS2018-01-04T12:13:40.534+0000: 140501.766: [CMS-concurrent-abortable-preclean: 0.149/0.196 secs] [Times: user=0.41 sys=0.00, real=0.19 secs]
  (concurrent mode failure): 38082444K->18127660K(39321600K), 138.3709254 secs] 42270144K->18127660K(50244288K), [Metaspace: 46676K->46676K(51200K)], 138.3746435 secs] [Times: user=12.75 sys=22.89, real=138.38 secs]
Heap after GC invocations=8910 (full 1002):
 par new generation   total 10922688K, used 0K [0x0000000080000000, 0x00000003a0000000, 0x00000003a0000000)
   eden space 8738176K,   0% used [0x0000000080000000, 0x0000000080000000, 0x0000000295560000)
   from space 2184512K,   0% used [0x0000000295560000, 0x0000000295560000, 0x000000031aab0000)
   to   space 2184512K,   0% used [0x000000031aab0000, 0x000000031aab0000, 0x00000003a0000000)
 concurrent mark-sweep generation total 39321600K, used 18127660K [0x00000003a0000000, 0x0000000d00000000, 0x0000000d00000000)
 Metaspace       used 46676K, capacity 47372K, committed 50136K, reserved 51200K
}
2018-01-04T12:15:58.772+0000: 140639.993: Total time for which application threads were stopped: 138.4138211 seconds, Stopping threads took: 0.0369886 seconds
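For digging through a large GC log like the one above, a small sketch such as the following (the file name and threshold are placeholders) can list every stop-the-world pause longer than a chosen limit by matching the "Total time for which application threads were stopped" lines:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class GcPauseScanner {
  // Matches lines such as:
  // "... Total time for which application threads were stopped: 138.4138211 seconds, ..."
  private static final Pattern STOPPED =
      Pattern.compile("Total time for which application threads were stopped: ([0-9.]+) seconds");

  public static void main(String[] args) throws IOException {
    double thresholdSecs = 5.0;                                           // report pauses above 5 seconds
    try (Stream<String> lines = Files.lines(Paths.get("solr_gc.log"))) {  // placeholder file name
      lines.forEach(line -> {
        Matcher m = STOPPED.matcher(line);
        if (m.find() && Double.parseDouble(m.group(1)) > thresholdSecs) {
          System.out.println(line);
        }
      });
    }
  }
}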







Regards,



Maulin





-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 31 January 2018 22:47
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Long GC Pauses



Just to double check: when you say you're seeing 60-200 sec GC pauses, are you looking at the GC logs (or using some kind of monitor), or is that the time it takes the query to respond to the client? Because a single GC pause that long on 40G is unusual no matter what. Another take on Jason's question: for all the JVMs you're running, how much _total_ heap is allocated?

And how much physical memory is on the box? I generally start with _at least_ 
half the memory left to the OS....



These are fairly horrible; what generates such queries?

AND * AND *



Best,

Erick







On Wed, Jan 31, 2018 at 7:28 AM, Jason Gerlowski <gerlowsk...@gmail.com> wrote:

Hi Maulin,

To clarify, when you said "...allocated 40 GB RAM to each shard." above, I'm going to assume you meant "to each node" instead.  If you actually did mean "to each shard" above, please correct me and anyone who chimes in afterward.

Firstly, it's really hard to even take guesses about potential causes or remediations without more details about your load characteristics (average/peak QPS, index size, average document size, etc.).  If no one gives any satisfactory advice, please consider uploading additional details to help us help you.

Secondly, I don't know anything about the load characteristics you're putting on your Solr cluster, but I'm curious whether you've experimented with lower RAM settings.  Generally speaking, the more RAM you have, the longer your GC pauses are likely to be (even with the tuning that various GC settings provide).  If you can get away with giving the Solr process less RAM, you should see your GC pauses shrink.  Was 40GB chosen after some trial-and-error experimentation, or is it something you could investigate?

For a bit more overview on this, see this slightly outdated (but still useful) wiki page:
https://wiki.apache.org/solr/SolrPerformanceProblems#RAM

Hope that helps, even if just to disqualify some potential causes/solutions to close in on a real fix.

Best,

Jason



On Wed, Jan 31, 2018 at 8:17 AM, Maulin Rathod <mrat...@asite.com> wrote:



Hi,

We are using Solr Cloud 6.1. We have around 20 collections on 4 nodes (2 shards, each shard with 2 replicas) and have allocated 40 GB RAM to each shard.

Intermittently we see long GC pauses (60 sec to 200 sec), during which Solr stops responding and collections go into recovering mode. Recovering all collections takes a minimum of 5-10 minutes (sometimes longer, and we have to restart the Solr node). We are using the default GC settings (CMS) as per solr.cmd.

We also tried G1 GC to see if it helps, but we still see long GC pauses (60 sec to 200 sec) and found that memory usage is higher with G1 GC.

What could be the reason for the long GC pauses and how can we fix it? Insufficient memory, a problem with the GC settings, or something else? Any suggestion would be greatly appreciated.

In our analysis we also found some inefficient queries (which use * many times) in the Solr logs. Could they be the reason for high memory usage?

Slow Query
--------------



INFO  (qtp1239731077-498778) [c:documents s:shard1 r:core_node1 x:documents] o.a.s.c.S.Request [documents]  webapp=/solr path=/select params={df=summary&distrib=false&fl=id&shards.purpose=4&start=0&fsv=true&sort=description+asc,id+desc&fq=&shard.url=s1.asite.com:8983/solr/documents|s1r1.asite.com:8983/solr/documents&rows=250&version=2&q=((id:(REV78364_24705418+REV78364_24471492+REV78364_24471429+REV78364_24470771+REV78364_24470271+))+OR+summary:((HPC*+AND+*+AND+*+AND+OH1150*+AND+*+AND+*+AND+U0*+AND+*+AND+*+AND+HGS*+AND+*+AND+*+AND+MDL*+AND+*+AND+*+AND+100067*+AND+*+AND+-*+AND+Reinforcement*+AND+*+AND+Mode*)+))++AND++(title:((*HPC\+\-\+OH1150\+\-\+U0\+\-\+HGS\+\-\+MDL\+\-\+100067\+-\+Reinforcement\+Mode*)+))+AND+project_id:(-2+78243+78365+78364)+AND+is_active:true+AND+((isLatest:(true)+AND+isFolderActive:true+AND+isXref:false+AND+-document_type_id:(3+7)+AND+((is_public:true+OR+distribution_list:4858120+OR+folderadmin_list:4858120+OR+author_user_id:4858120)+AND+((defaultAccess:(true)+OR+allowedUsers:(4858120)+OR+allowedRoles:(6342201+172408+6336860)+OR+combinationUsers:(4858120))+AND+-blockedUsers:(4858120))))+OR+(isLatestRevPrivate:(true)+AND+allowedUsersForPvtRev:(4858120)+AND+-folderadmin_list:(4858120)))&shards.tolerant=true&NOW=1516786982952&isShard=true&wt=javabin} hits=0 status=0 QTime=83309
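As a purely illustrative sketch (this is not the application's actual code; the field names come from the logged query, everything else is assumed), the bare '*' wildcard terms in the summary clause above could be dropped in favor of a phrase search against the tokenized fields, which is far cheaper for the query parser and the heap:

import org.apache.solr.client.solrj.SolrQuery;

public class QuerySketch {
  /** Hypothetical helper: search summary/title with a phrase instead of chained '*' wildcards. */
  public static SolrQuery buildSearch(String userInput) {
    // Escape embedded double quotes so the raw input can sit safely inside a phrase query.
    String phrase = userInput.replace("\"", "\\\"");
    SolrQuery q = new SolrQuery();
    q.setQuery("summary:\"" + phrase + "\" OR title:\"" + phrase + "\"");
    q.setRows(250);  // same page size as the logged request
    return q;
  }
}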









Regards,



Maulin








--
Ere Maijala
Kansalliskirjasto / The National Library of Finland
