I'm not sure that I can. I clicked the "Archive" link for the logging-es pod and then changed the query in Kibana to "kubernetes_container_name: logging-es-cycd8veb && kubernetes_namespace_name: logging". I got no results; instead, I got this error:
- *Index:* unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.12 *Shard:* 2 *Reason:* EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@6b1f2699]
- *Index:* unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.14 *Shard:* 2 *Reason:* EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@66b9a5fb]
- *Index:* unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.15 *Shard:* 2 *Reason:* EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@512820e]
- *Index:* unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.29 *Shard:* 2 *Reason:* EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@3dce96b9]
- *Index:* unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.30 *Shard:* 2 *Reason:* EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@2f774477]

When I initially clicked the "Archive" link, I saw a lot of messages with the kubernetes_container_name "logging-fluentd", which is not what I expected to see.

On Fri, Jul 15, 2016 at 10:44 AM, Peter Portante <pport...@redhat.com> wrote:

> Can you go back further in the logs to the point where the errors started?
>
> I am thinking about possible Java HEAP issues, or possibly ES
> restarting for some reason.
>
> -peter
>
> On Fri, Jul 15, 2016 at 11:37 AM, Lukáš Vlček <lvl...@redhat.com> wrote:
> > Also looking at this.
> >
> > Alex, is it possible to investigate if you were having some kind of
> > network connection issues in the ES cluster (I mean between individual
> > cluster nodes)?
> >
> > Regards,
> > Lukáš
> >
> >> On 15 Jul 2016, at 17:08, Peter Portante <pport...@redhat.com> wrote:
> >>
> >> Just catching up on the thread, will get back to you all in a few ...
> >>
> >> On Fri, Jul 15, 2016 at 10:08 AM, Eric Wolinetz <ewoli...@redhat.com> wrote:
> >>> Adding Lukas and Peter
> >>>
> >>> On Fri, Jul 15, 2016 at 8:07 AM, Luke Meyer <lme...@redhat.com> wrote:
> >>>>
> >>>> I believe the "queue capacity" there is the number of parallel searches
> >>>> that can be queued while the existing search workers operate. It sounds
> >>>> like it has plenty of capacity there and it has a different reason for
> >>>> rejecting the query. I would guess the data requested is missing, given
> >>>> it couldn't fetch shards it expected to.
> >>>>
> >>>> The number of shards is a multiple (for redundancy) of the number of
> >>>> indices, and there is an index created per project per day. So even for
> >>>> a small cluster this doesn't sound out of line.
> >>>>
> >>>> Can you give a little more information about your logging deployment?
> >>>> Have you deployed multiple ES nodes for redundancy, and what are you
> >>>> using for storage? Could you attach full ES logs? How many OpenShift
> >>>> nodes and projects do you have? Any history of events that might have
> >>>> resulted in lost data?
> >>>>
> >>>> On Thu, Jul 14, 2016 at 4:06 PM, Alex Wauck <alexwa...@exosite.com> wrote:
> >>>>>
> >>>>> When doing searches in Kibana, I get error messages similar to "Courier
> >>>>> Fetch: 919 of 2020 shards failed". Deeper inspection reveals errors like
> >>>>> this: "EsRejectedExecutionException[rejected execution (queue capacity
> >>>>> 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@14522b8e]".
> >>>>>
> >>>>> A bit of investigation led me to conclude that our Elasticsearch server
> >>>>> was not sufficiently powerful, so I spun up a new one with four times
> >>>>> the CPU and RAM of the original one, but the queue capacity is still
> >>>>> only 1000. Also, 2020 seems like a really ridiculous number of shards.
> >>>>> Any idea what's going on here?
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Alex Wauck // DevOps Engineer
> >>>>>
> >>>>> E X O S I T E
> >>>>> www.exosite.com
> >>>>>
> >>>>> Making Machines More Human.
> >>>>>
> >>>>> _______________________________________________
> >>>>> users mailing list
> >>>>> users@lists.openshift.redhat.com
> >>>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users

--

Alex Wauck // DevOps Engineer

*E X O S I T E*
*www.exosite.com <http://www.exosite.com/>*

Making Machines More Human.
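[Editor's note] Luke's point about the shard count can be sketched with arithmetic. The numbers below are illustrative assumptions, not values from this cluster: OpenShift's logging stack creates one Elasticsearch index per project per day, and each index carries several primary shards plus replica copies, so totals in the low thousands arise quickly.

```python
# Hypothetical arithmetic only: none of these values come from the
# cluster discussed in this thread.
projects = 20            # assumed number of OpenShift projects
days_retained = 10       # assumed days of daily indices kept around
shards_per_index = 5     # Elasticsearch's default primary shard count
replicas = 1             # Elasticsearch's default replica count

indices = projects * days_retained
total_shards = indices * shards_per_index * (1 + replicas)
print(total_shards)  # -> 2000, the same order of magnitude as the 2020 in the error
```

So a modest cluster with a couple dozen projects and a week or two of retention lands near 2020 shards without anything being wrong with the index layout itself.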
_______________________________________________ users mailing list users@lists.openshift.redhat.com http://lists.openshift.redhat.com/openshiftmm/listinfo/users
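[Editor's note] The rejection itself is mechanical: the Elasticsearch search thread pool fronts a bounded queue (capacity 1000 here), and requests arriving while the queue is full are rejected immediately rather than waited on. A minimal stand-in, which is not Elasticsearch code and uses a capacity of 5 purely for illustration:

```python
from queue import Queue, Full

# Toy model of a bounded executor queue: once it is full, further
# submissions are rejected outright -- the same behavior that
# EsRejectedExecutionException[... (queue capacity 1000) ...] reports.
QUEUE_CAPACITY = 5  # stand-in for the search queue capacity of 1000

work_queue = Queue(maxsize=QUEUE_CAPACITY)
rejected = 0
for task_id in range(8):  # 8 submissions, no workers draining the queue
    try:
        work_queue.put_nowait(task_id)
    except Full:
        rejected += 1

print(work_queue.qsize(), rejected)  # -> 5 3
```

Raising the queue size (threadpool.search.queue_size in elasticsearch.yml on the 1.x/2.x line) only hides the back-pressure; sustained rejections usually mean each query is fanning out to too many shards, which is why the per-project-per-day index count matters.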