Hi,

I looked at those metrics outputs, but nothing jumps out at me as
problematic.

How full are your JVM heap memory pools?  If you are using SPM to monitor
your Solr/Tomcat/Jetty/..., look for a chart that looks like this:
https://apps.sematext.com/spm-reports/s/zB3JcdZyRn

If any of these lines get close to 100% and stay there, that's typically a
bad sign.
Next, look at your Garbage Collection times and counts.  If you look at
your GC metrics over, say, a month and see a recent increase in GC times or
counts, then yes, you have a memory/heap issue, and that is what is driving
up your CPU usage.
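If you don't have a monitoring chart in front of you, a quick check from
the shell gives you the same signals (assuming the JDK tools are on the
path and <pid> is your Solr JVM's process id):

  jstat -gcutil <pid> 5000 12

The S0/S1/E/O/M columns are pool occupancy in percent (the exact columns
vary a bit by JDK version); old gen (O) pinned near 100% is the bad sign.
YGC/YGCT and FGC/FGCT are the GC counts and times to watch for a recent
increase.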

If it looks like heap/GC is not the issue and it's really something inside
Solr, you could profile it with one of the standard profilers, or with
something like
https://sematext.com/blog/2016/03/17/on-demand-java-profiling/ .  If
something in Solr is chewing on the CPU, this should show it.
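If you just want a quick look before reaching for a full profiler, the
usual poor-man's approach is to match the hottest thread to a stack trace
(again assuming JDK tools and <pid> being the Solr process; the thread id
and nid below are placeholders):

  top -H -b -n 1 -p <pid> | head -20    # per-thread CPU; note the id of the busiest thread
  printf '0x%x\n' <thread-id>           # convert that id to hex
  jstack <pid> | grep -A 30 'nid=0x...' # look up that nid in a thread dump

Whatever stack keeps showing up for the hot thread is usually the culprit.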

I hope this helps.

Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/


On Wed, Mar 16, 2016 at 10:52 AM, YouPeng Yang <yypvsxf19870...@gmail.com>
wrote:

> Hi
>  It happened again, and worse, this time the system crashed completely; we
> could not even connect to it over ssh.
>  I used the sar command to capture statistics about it.  Here are the
> details:
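>
> The outputs below come from sar -u, sar -r and sar -v.  For a day other
> than today, sysstat keeps its history under /var/log/sa/ on most
> distributions, for example (sa16 would be the 16th of the month):
>
>     sar -u -f /var/log/sa/sa16     # CPU utilization
>     sar -r -f /var/log/sa/sa16     # memory utilization
>     sar -v -f /var/log/sa/sa16     # inode/file/dentry kernel tables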
>
>
> [1] CPU (by using sar -u).  We had to restart the system, as marked by the
> LINUX RESTART line in the log below.
>
> --------------------------------------------------------------------------------------------------
>                 CPU     %user     %nice   %system   %iowait    %steal     %idle
> 03:00:01 PM     all      7.61      0.00      0.92      0.07      0.00     91.40
> 03:10:01 PM     all      7.71      0.00      1.29      0.06      0.00     90.94
> 03:20:01 PM     all      7.62      0.00      1.98      0.06      0.00     90.34
> 03:30:35 PM     all      5.65      0.00     31.08      0.04      0.00     63.23
> 03:42:40 PM     all     47.58      0.00     52.25      0.00      0.00      0.16
> Average:        all      8.21      0.00      1.57      0.05      0.00     90.17
>
> 04:42:04 PM       LINUX RESTART
>
> 04:50:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
> 05:00:01 PM     all      3.49      0.00      0.62      0.15      0.00     95.75
> 05:10:01 PM     all      9.03      0.00      0.92      0.28      0.00     89.77
> 05:20:01 PM     all      7.06      0.00      0.78      0.05      0.00     92.11
> 05:30:01 PM     all      6.67      0.00      0.79      0.06      0.00     92.48
> 05:40:01 PM     all      6.26      0.00      0.76      0.05      0.00     92.93
> 05:50:01 PM     all      5.49      0.00      0.71      0.05      0.00     93.75
>
> --------------------------------------------------------------------------------------------------
>
> [2] Memory (by using sar -r)
>
> --------------------------------------------------------------------------------------------------
>             kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
> 03:00:01 PM   1519272 196633272     99.23    361112  76364340 143574212     47.77
> 03:10:01 PM   1451764 196700780     99.27    361196  76336340 143581608     47.77
> 03:20:01 PM   1453400 196699144     99.27    361448  76248584 143551128     47.76
> 03:30:35 PM   1513844 196638700     99.24    361648  76022016 143828244     47.85
> 03:42:40 PM   1481108 196671436     99.25    361676  75718320 144478784     48.07
> Average:      5051607 193100937     97.45    362421  81775777 142758861     47.50
>
> 04:42:04 PM       LINUX RESTART
>
> 04:50:01 PM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
> 05:00:01 PM 154357132  43795412     22.10     92012  18648644 134950460     44.90
> 05:10:01 PM 136468244  61684300     31.13    219572  31709216 134966548     44.91
> 05:20:01 PM 135092452  63060092     31.82    221488  32162324 134949788     44.90
> 05:30:01 PM 133410464  64742080     32.67    233848  32793848 134976828     44.91
> 05:40:01 PM 132022052  66130492     33.37    235812  33278908 135007268     44.92
> 05:50:01 PM 130630408  67522136     34.08    237140  33900912 135099764     44.95
> Average:    136996792  61155752     30.86    206645  30415642 134991776     44.91
>
> --------------------------------------------------------------------------------------------------
>
>
> As the figures above show, the machine went bad starting at 03:30:35 and
> stayed hung until I restarted it manually at 04:42:04.
> All of the above only snapshots the performance around the crash; nothing in
> it explains the reason.  I have also checked /var/log/messages and found
> nothing useful.
>
> Note that I also ran sar -v.  It shows something abnormal:
>
> ------------------------------------------------------------------------------------------------
>              dentunusd   file-nr  inode-nr    pty-nr
> 02:50:01 PM  11542262      9216     76446       258
> 03:00:01 PM  11645526      9536     76421       258
> 03:10:01 PM  11748690      9216     76451       258
> 03:20:01 PM  11850191      9152     76331       258
> 03:30:35 PM  11972313     10112    132625       258
> 03:42:40 PM  12177319     13760    340227       258
> Average:      8293601      8950     68187       161
>
> 04:42:04 PM       LINUX RESTART
>
> 04:50:01 PM dentunusd   file-nr  inode-nr    pty-nr
> 05:00:01 PM     35410      7616     35223         4
> 05:10:01 PM    137320      7296     42632         6
> 05:20:01 PM    247010      7296     42839         9
> 05:30:01 PM    358434      7360     42697         9
> 05:40:01 PM    471543      7040     42929        10
> 05:50:01 PM    583787      7296     42837        13
>
> ------------------------------------------------------------------------------------------------
>
> and I checked the man page for the -v option:
>
> ------------------------------------------------------------------------------------------------
> -v  Report status of inode, file and other kernel tables.  The following
>     values are displayed:
>
>     dentunusd
>         Number of unused cache entries in the directory cache.
>     file-nr
>         Number of file handles used by the system.
>     inode-nr
>         Number of inode handlers used by the system.
>     pty-nr
>         Number of pseudo-terminals used by the system.
>
> ------------------------------------------------------------------------------------------------
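>
> These counters come straight from the kernel's /proc interface, so on a
> standard Linux /proc layout they can also be read directly:
>
>     cat /proc/sys/fs/dentry-state   # dentry cache; the second field is the unused count (dentunusd)
>     cat /proc/sys/fs/file-nr        # allocated file handles, free handles, system-wide maximum
>     cat /proc/sys/fs/inode-nr       # allocated inodes, free inodes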
>
> Is there any clue about the crash?  Would you please give me some suggestions?
>
>
> Best Regards.
>
>
> 2016-03-16 14:01 GMT+08:00 YouPeng Yang <yypvsxf19870...@gmail.com>:
>
> > Hello
> >    The problem has appeared several times, but I could not capture the top
> > output.  My script is below.
> >    It checks whether the system CPU usage exceeds 30%; all the other metric
> > information can be dumped successfully, but not the top output.
> >    Would you please check my script?  I am not able to figure out what is
> > wrong.
> >
> >
> >
> > -------------------------------------------------------------------------------------------------
> > #!/bin/bash
> >
> > while :
> >   do
> >     # 1 if the %sys column reported by mpstat is below 30, 0 otherwise
> >     sysusage=$(mpstat 2 1 | grep -A 1 "%sys" | tail -n 1 | awk '{if($6 < 30) print 1; else print 0;}')
> >
> >     if [ "$sysusage" -eq 0 ]; then
> >         #echo $sysusage
> >         #perf record -o perf$(date +%Y%m%d%H%M%S).data -a -g -F 1000 sleep 30
> >         file=$(date +%Y%m%d%H%M%S)
> >         # top needs -b (batch mode) when its output is redirected to a file
> >         top -b -n 2 >> top$file.data
> >         iotop -b -n 2 >> iotop$file.data
> >         iostat >> iostat$file.data
> >         netstat -an | awk '/^tcp/ {++state[$NF]} END {for(i in state) print i,"\t",state[i]}' >> netstat$file.data
> >     fi
> >     sleep 5
> >   done
> >
> >
> >
> >
> > -------------------------------------------------------------------------------------------------
> >
> > 2016-03-08 21:39 GMT+08:00 YouPeng Yang <yypvsxf19870...@gmail.com>:
> >
> >> Hi all
> >>   Thanks for your replies.  I have been investigating this for a while,
> >> and I will post some 'top' and IO logs in a few days, when the crash
> >> happens again.
> >>
> >> 2016-03-08 10:45 GMT+08:00 Shawn Heisey <apa...@elyograg.org>:
> >>
> >>> On 3/7/2016 2:23 AM, Toke Eskildsen wrote:
> >>> > How does this relate to YouPeng reporting that the CPU usage
> >>> > increases?
> >>> >
> >>> > This is not a snark. YouPeng mentions kernel issues. It might very
> >>> > well
> >>> > be that IO is the real problem, but that it manifests in a
> >>> non-intuitive
> >>> > way. Before memory-mapping it was easy: Just look at IO-Wait. Now I
> >>> > am
> >>> > not so sure. Can high kernel load (Sy% in *nix top) indicate that the
> >>> IO
> >>> > system is struggling, even if IO-Wait is low?
> >>>
> >>> It might turn out to be not directly related to memory, you're right
> >>> about that.  A very high query rate or particularly CPU-heavy queries
> >>> or
> >>> analysis could cause high CPU usage even when memory is plentiful, but
> >>> in that situation I would expect high user percentage, not kernel.  I'm
> >>> not completely sure what might cause high kernel usage if iowait is
> >>> low,
> >>> but no specific information was given about iowait.  I've seen iowait
> >>> percentages of 10% or less with problems clearly caused by iowait.
> >>>
> >>> With the available information (especially seeing 700GB of index data),
> >>> I believe that the "not enough memory" scenario is more likely than
> >>> anything else.  If the OP replies and says they have plenty of memory,
> >>> then we can move on to the less common (IMHO) reasons for high CPU with
> >>> a large index.
> >>>
> >>> If the OS is one that reports load average, I am curious what the 5
> >>> minute average is, and how many real (non-HT) CPU cores there are.
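> >>>
> >>> On Linux, for example, both are easy to grab:
> >>>
> >>>     uptime                                # the last three numbers are the 1/5/15-minute load averages
> >>>     lscpu | grep -E 'Socket|Core|Thread'  # sockets, cores per socket, threads per core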
> >>>
> >>> Thanks,
> >>> Shawn
> >>>
> >>>
> >>
> >
>
