Hi there,
Is there any reason we block read-only requests as well for a PG when the
acting set size is less than min_size?
Thanks,
Guang
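For context, here is a rough model of the gate as I understand it (illustrative
Python only, not the actual PG code): both reads and writes wait on the PG
being active, and the PG refuses to go active while the acting set is below
min_size, so reads stall too.

    # Illustrative model only (not the actual OSD code): reads and
    # writes alike are queued while the PG is not active, and the PG
    # will not go active when |acting| < min_size.
    def pg_accepts_io(acting, min_size):
        return len(acting) >= min_size

    assert pg_accepts_io(acting=[0, 3, 7], min_size=2)   # ops served
    assert not pg_accepts_io(acting=[0], min_size=2)     # all ops queued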
Thanks Greg.
> Date: Tue, 27 Oct 2015 12:47:34 -0700
> Subject: Re: PG: all requests stuck when acting set < min_size
> From: gfar...@redhat.com
> To: yguan...@outlook.com
> CC: ceph-devel@vger.kernel.org
>
> On Tue, Oct 27, 201
even though we would have exposed the writes up to
> 1000 to the client.
[yguang] Thanks for the example, that is true. What about EC pools? It looks
like for EC pools we don't have this problem.
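(A toy illustration of why, under the usual k+m assumption: an EC object is
decodable from any k of its k+m shards, so with fewer than k shards a read
cannot be served at all, unlike a replicated pool where a single copy
suffices.)

    # Toy sketch: decoding needs at least k shards, no matter which.
    def ec_can_read(available_shards, k):
        return len(available_shards) >= k

    assert ec_can_read(list(range(8)), k=8)      # 8+3 pool, k shards up
    assert not ec_can_read(list(range(7)), k=8)  # below k: reads impossible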
> -Sam
>
> On Tue, Oct 27, 2015 at 12:47 PM, Gregory Farnum <gfar...@redhat.com> wrote:
>
Hi Sam/David,
We came across this problem a couple of times and it is extremely painful to
work around it via operational steps. I would like to work on a patch, but
before I start, it would be nice to hear your suggestions.
The problem is:
On an erasure-coded pool, when there is a corruption, and
To: yguan...@outlook.com
> CC: sw...@redhat.com; mgo...@mirantis.com; sj...@redhat.com;
> ceph-devel@vger.kernel.org; dzaf...@redhat.com
> Subject: Re: Pool setting for recovery priority
>
> On Tue, Sep 29, 2015 at 10:23:54AM -0700, GuangYang wrote:
>
>> I sort of mis
> Subject: Re: Pool setting for recovery priority
>
> Hi Mykola,
>
> On Fri, 25 Sep 2015, Mykola Golub wrote:
>> Hi,
>>
>> On Mon, Sep 21, 2015 at 04:32:19PM +0300, Mykola Golub wrote:
>>> On Wed, Sep 16, 2015 at 09:23:07AM -0700, Sage Weil wrote:
>>>> On Wed, 16 Sep 2015
Hello,
While doing a 'ceph pg {id} query', it dumps the info from all peers; however,
for each peer it only shows 'num_objects_missing_on_primary', which is the
same across all peers.
Isn't it better to show 'num_objects_missing' for the peer rather than for the
primary?
Thanks,
Guang
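Something like the following is what I would like to be able to do per peer (a
sketch; the JSON field names are from memory, so treat the exact paths as
assumptions):

    import json, subprocess

    def peer_missing(pgid):
        # 'ceph pg <pgid> query' dumps the primary's info plus one
        # entry per peer under 'peer_info'.
        out = subprocess.check_output(['ceph', 'pg', pgid, 'query',
                                       '--format=json'])
        q = json.loads(out)
        for peer in q.get('peer_info', []):
            stat_sum = peer['stats']['stat_sum']
            # Today this only carries num_objects_missing_on_primary,
            # identical for every peer; a per-peer num_objects_missing
            # is what this mail is asking for.
            print(peer.get('peer'),
                  stat_sum.get('num_objects_missing_on_primary'))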
Hi,
Looking at the reporting from the command line tools (ceph status, pg query,
etc.), I can't find a way to tell how degraded the recovering objects are for a
given PG. The use case could be: when doing deployment, we don't want to stop
the procedure upon a recovering PG, but at the same time, we
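(To illustrate the kind of check I mean: today the best available is the
cluster-wide counter, roughly as below; the pgmap field names are assumptions
based on the JSON output of this era, and there is no per-PG equivalent, which
is the gap.)

    import json, subprocess

    def degraded_summary():
        # Cluster-wide degraded counters from 'ceph status'.
        out = subprocess.check_output(['ceph', 'status', '--format=json'])
        pgmap = json.loads(out)['pgmap']
        return pgmap.get('degraded_objects'), pgmap.get('degraded_total')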
Hi Sam,
As part of the effort to solve problems similar to issue #13104
(http://tracker.ceph.com/issues/13104), do you think it is appropriate to add
some parameters to pool setting:
1. recovery priority of the pool - we have a customized pool recovery
priority (like a process's nice value) to
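(The shape I have in mind is a per-pool value following the existing
'ceph osd pool set' convention; the option name below is hypothetical:)

    import subprocess

    # Hypothetical per-pool knob (name made up for illustration),
    # following the 'ceph osd pool set <pool> <key> <value>' form.
    subprocess.check_call(['ceph', 'osd', 'pool', 'set', 'mypool',
                           'recovery_priority', '5'])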
> Subject: Re: Pool setting for recovery priority
>
> On Wed, 16 Sep 2015, GuangYang wrote:
>> Hi Sam,
>> As part of the effort to solve problems similar to issue #13104
>> (http://tracker.ceph.com/issues/13104), do you think it is appropriate to
>> add some parame
1.
Thanks,
Guang
> Date: Fri, 11 Sep 2015 05:57:42 -0700
> From: s...@newdream.net
> To: yguan...@outlook.com
> CC: ceph-devel@vger.kernel.org; sj...@redhat.com
> Subject: Re: Backfill
>
> On Thu, 10 Sep 2015, GuangYang wrote:
>
Today I played around with recovery and backfill of a Ceph cluster (by manually
bringing some OSDs down/out), and I have one question regarding the current flow:
Does backfill push everything to the backfill target regardless of what the
backfill target already has? The scenario is like - acting set of
Date: Fri, 28 Aug 2015 12:07:39 +0100
From: gfar...@redhat.com
To: vickey.singh22...@gmail.com
CC: ceph-us...@lists.ceph.com; ceph-us...@ceph.com; ceph-devel@vger.kernel.org
Subject: Re: [ceph-users] Opensource plugin for pulling out cluster recovery
Probably! A quick glance at do_mon_report doesn't seem to turn up
anything I'd expect to be really hard to refactor. You do need to
break out the required data (into OSDService, I'd think) so that the
lock is not necessary.
-Sam
On Mon, Aug 17, 2015 at 6:10 PM, GuangYang yguan...@outlook.com
:34 -0700
Subject: Re: radosgw - stuck ops
From: ysade...@redhat.com
To: yguan...@outlook.com
CC: sw...@redhat.com; sj...@redhat.com; yeh...@redhat.com;
ceph-devel@vger.kernel.org
On Tue, Aug 4, 2015 at 3:23 PM, GuangYang yguan...@outlook.com wrote:
Thanks to Sage, Yehuda and Sam for the quick
Thanks to Sage, Yehuda and Sam for the quick reply.
Given the discussion so far, could I summarize it into the following bullet points:
1. The first step we would like to pursue is to implement the following
mechanism to avoid infinite waiting at the radosgw side:
1.1. radosgw - send OP with a
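(Roughly the shape of such a bound on the client side, as a sketch only: rados
ops have no per-op timeout here, so the wait is wrapped externally, and
submit_fn is a hypothetical stand-in for kicking off the op.)

    import threading

    def op_with_deadline(submit_fn, timeout_s):
        # Bound the wait on an in-flight rados op so a stuck PG cannot
        # pin an rgw worker thread forever. submit_fn starts the op and
        # calls back with the return code on completion.
        done = threading.Event()
        result = {}

        def on_complete(rc):
            result['rc'] = rc
            done.set()

        submit_fn(on_complete)
        if not done.wait(timeout_s):
            raise TimeoutError('OSD op exceeded %gs; fail the request '
                               'instead of blocking the worker' % timeout_s)
        return result['rc']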
; ceph-devel@vger.kernel.org
On Mon, Aug 3, 2015 at 6:53 PM, GuangYang
yguan...@outlook.com wrote:
Hi Yehuda,
Recently with our pre-production clusters (with radosgw), we had an
outage in which all radosgw worker threads got stuck and all client
requests
Hi Yehuda,
Recently with our pre-production clusters (with radosgw), we had an outage in
which all radosgw worker threads got stuck and all client requests resulted in
500 errors because there was no worker thread to take care of them.
What we observed from the cluster is that there was a PG stuck at
Hi Cephers,
I have a (test) ceph cluster, on which I had set some wrong CRUSH weights (my
mistake), then I tried to set the correct CRUSH weights (e.g. changing the
weight from 20 to 5). Right after that, the cluster went into cascading-failure
mode, and lots of OSDs started getting
Looks like we were hitting 12523, and we are working on a fix.
Thanks,
Guang
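For anyone hitting the same thing: moving a CRUSH weight in small steps bounds
how much data migrates at once, instead of the single 20 -> 5 jump above (a
sketch using the standard reweight command; step size and pause are arbitrary):

    import subprocess, time

    def gradual_crush_reweight(osd_name, current, target,
                               step=1.0, pause_s=600):
        # Walk the weight toward the target in small increments,
        # letting recovery settle between steps instead of triggering
        # one huge rebalance.
        w = current
        while abs(w - target) > 1e-9:
            w = max(target, w - step) if target < w else min(target, w + step)
            subprocess.check_call(['ceph', 'osd', 'crush', 'reweight',
                                   osd_name, str(w)])
            time.sleep(pause_s)  # better: poll until recovery quiesces

    gradual_crush_reweight('osd.12', current=20.0, target=5.0)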
From: yguan...@outlook.com
To: ceph-devel@vger.kernel.org
Subject: Cascading failure
Date: Wed, 29 Jul 2015 08:55:40 -0700
Hi Cephers,
I have a (test) ceph cluster, on
Hi Yehuda,
Is there any plan to add bucket/object lifecycle management to radosgw?
Thanks,
Guang
; ceph-us...@lists.ceph.com
Subject: Re: radosgw crash within libfcgi
- Original Message -
From: GuangYang yguan...@outlook.com
To: ceph-devel@vger.kernel.org, ceph-us...@lists.ceph.com, yeh...@redhat.com
Sent: Wednesday, June 24, 2015 10:09:58 AM
Subject: radosgw crash within
Date: Wed, 24 Jun 2015 17:04:05 -0400
From: yeh...@redhat.com
To: yguan...@outlook.com
CC: ceph-devel@vger.kernel.org; ceph-us...@lists.ceph.com
Subject: Re: radosgw crash within libfcgi
- Original Message -
From: GuangYang yguan
Hello Cephers,
Recently we have had several radosgw daemon crashes, all with the same kernel
log:
Jun 23 14:17:38 xxx kernel: radosgw[68180]: segfault at f0 ip 7ffa069996f2
sp 7ff55c432710 error 6 in libfcgi.so.0.0.0[7ffa06995000+a000] in
libfcgi.so.0.0.0[7ffa06995000+a000]
Looking
Hi Cephers,
While looking at disk utilization on an OSD, I noticed the disk was constantly
busy with a large number of small writes. Further investigation showed that
radosgw uses xattrs to store metadata (e.g. etag, content-type, etc.), which
made the xattrs spill from local (in-inode) format to extents, which
with radosgw
On Tue, 16 Jun 2015, GuangYang wrote:
Hi Cephers,
While looking at disk utilization on an OSD, I noticed the disk was constantly
busy with a large number of small writes. Further investigation showed that
radosgw uses xattrs to store metadata (e.g. etag, content-type, etc.),
which
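(To see whether rgw xattrs have outgrown the inode, something like this over a
sample of object files in an OSD's data directory works; a sketch, and the
mkfs hint in the comment is an assumption about XFS inode sizing:)

    import os

    def xattr_footprint(path):
        # Sum the xattr payload on one rados object file; large totals
        # are what push XFS attrs from 'local' (in-inode) format out to
        # extent blocks, turning metadata touches into extra small IOs.
        return sum(len(n) + len(os.getxattr(path, n))
                   for n in os.listxattr(path))

    # Compare the totals against the room in your inodes; an assumption:
    # formatting with larger inodes (e.g. mkfs.xfs -i size=2048) keeps
    # the attrs local.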
Faster Peering / Lower Tail Latency
https://wiki.ceph.com/Planning/Blueprints/Infernalis/osd%3A_Faster_Peering
https://wiki.ceph.com/Planning/Blueprints/Infernalis/Improve_tail_latency
http://pad.ceph.com/p/I-faster-peering_tailing
In addition to what
Hi Sage,
Is there any timeline around the switch? So that we can plan ahead for the
testing.
We are running apache + mod-fastcgi in production at scale (540 OSDs, 9 RGW
hosts) and it looks good so far, although at the beginning we came across a
problem with a large volume of 500 errors, which
Date: Thu, 12 Feb 2015 06:57:19 -0800
Subject: Re: Upgrade/rollback
From: g...@gregs42.com
To: yguan...@outlook.com
CC: ceph-devel@vger.kernel.org
On Thu, Feb 12, 2015 at 12:48 AM, GuangYang yguan...@outlook.com wrote:
Thanks Sage and Greg
Hi ceph-devel,
Recently we tried the upgrade from Firefly to Giant and it went pretty
smoothly; however, the problem is that it does not support rollback, and it
seems like that is by design. For example, there are new feature flags /
metadata [1] added in the new version and they are persisted.
Thanks Sage!
Date: Mon, 9 Feb 2015 02:24:33 -0800
From: sw...@redhat.com
To: yguan...@outlook.com
CC: ceph-devel@vger.kernel.org; simon.lei...@gmail.com
Subject: RE: scrub scheduling
On Mon, 9 Feb 2015, GuangYang wrote:
Hi Sage,
Another potential
Hi Sage,
Another potential problem with scrub scheduling, as observed in our deployment
(2PB cluster, 70% full), was that some PGs hadn't been scrubbed for 1.5 months,
even though we have the configuration to do deep scrubbing weekly.
With our deployment and the percentage full of the cluster, as well
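(This is how we spot the stragglers, as a sketch; the 'pg_stats' /
'last_deep_scrub_stamp' paths are per the pg dump JSON as I recall it, so
treat the field names as assumptions:)

    import json, subprocess
    from datetime import datetime, timedelta

    def stale_deep_scrubs(max_age_days=7):
        # List PGs whose last deep scrub is older than the configured
        # weekly interval.
        out = subprocess.check_output(['ceph', 'pg', 'dump',
                                       '--format=json'])
        cutoff = datetime.utcnow() - timedelta(days=max_age_days)
        stale = []
        for pg in json.loads(out)['pg_stats']:
            stamp = datetime.strptime(pg['last_deep_scrub_stamp'][:19],
                                      '%Y-%m-%d %H:%M:%S')
            if stamp < cutoff:
                stale.append((pg['pgid'], stamp))
        return stale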
Hi ceph-devel,
In our ceph cluster (with rgw), we came across a problem where all rgw processes
were stuck (all worker threads waiting for responses from OSDs, and starting to
give 500 to clients). An objecter_requests dump showed the slow in-flight
requests were caused by one OSD, which had 2 PGs doing
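(For reference, the dump came from the rgw admin socket, roughly like this; the
socket path is deployment-specific and the op field names are from memory, so
treat both as assumptions:)

    import json, subprocess

    def inflight_by_osd(asok='/var/run/ceph/ceph-client.rgw.asok'):
        # Dump the objecter's in-flight ops from the radosgw admin
        # socket and bucket them by target OSD to spot the one holding
        # everything up.
        out = subprocess.check_output(['ceph', '--admin-daemon', asok,
                                       'objecter_requests'])
        by_osd = {}
        for op in json.loads(out).get('ops', []):
            by_osd.setdefault(op.get('osd'), []).append(op.get('object_id'))
        return by_osd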
Hi Sam,
Yesterday there was one PG down in our cluster and I am confused by the PG
state. I am not sure if it is a bug (or an issue that has already been fixed,
as I see a couple of related fixes in giant); it would be nice if you could
help take a look.
Here is what happened:
We are using an EC pool with 8 data
allowable size and
went active with osd 8. At that point you needed every member of that
acting set to go active later on to avoid losing writes. You can
prevent this by setting a min_size above the number of data chunks.
-Sam
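(Concretely, for a k=8 EC pool that means something like the following, so an
acting set at exactly k can never go active and take writes it may later have
to forget; the pool name is an example:)

    import subprocess

    # k = 8 data chunks, so require at least k + 1 = 9 shards before
    # the PG may go active ('ecpool' is an example name).
    subprocess.check_call(['ceph', 'osd', 'pool', 'set', 'ecpool',
                           'min_size', '9'])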
On Thu, Nov 13, 2014 at 4:15 AM, GuangYang yguan...@outlook.com
down to 8. Note, I think you could have marked osd 8 lost and then
marked the unrecoverable objects lost.
-Sam
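(For the archive, the commands Sam is referring to are along these lines; the
pgid is a made-up example, and both commands are destructive, hence the guard
flag:)

    import subprocess

    # Declare the dead OSD permanently gone so peering stops waiting
    # on it...
    subprocess.check_call(['ceph', 'osd', 'lost', '8',
                           '--yes-i-really-mean-it'])
    # ...then give up the objects unrecoverable without it, reverting
    # each to the best remaining version ('3.1f' is a hypothetical pgid).
    subprocess.check_call(['ceph', 'pg', '3.1f', 'mark_unfound_lost',
                           'revert'])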
On Thu, Nov 13, 2014 at 11:20 AM, GuangYang yguan...@outlook.com wrote:
Thanks Sam for the quick response. Just want to make sure I understand it
correctly:
If we have [1, 2, 3, 4
Thanks Sage!
Date: Fri, 7 Nov 2014 02:19:06 -0800
From: s...@newdream.net
To: yguan...@outlook.com
CC: ceph-devel@vger.kernel.org; ceph-us...@lists.ceph.com
Subject: Re: PG inconsistency
On Thu, 6 Nov 2014, GuangYang wrote:
Hello Cephers,
Recently
Hello Cephers,
Recently we observed a couple of inconsistencies in our Ceph cluster. There
were two major patterns leading to inconsistency as I observed: 1) EIO when
reading the file; 2) the digest is inconsistent (for EC) even though there is
no read error.
While Ceph has built-in tool sets to repair
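(The built-in path being per-PG scrub and repair, e.g. the following; the pgid
is an example taken from 'ceph health detail':)

    import subprocess

    # Deep-scrub the inconsistent PG, then ask the primary to repair it
    # from the authoritative copies ('3.1a' is an example pgid).
    subprocess.check_call(['ceph', 'pg', 'deep-scrub', '3.1a'])
    subprocess.check_call(['ceph', 'pg', 'repair', '3.1a'])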
-us...@lists.ceph.com
What is your version of Ceph?
0.80.0 - 0.80.3
https://github.com/ceph/ceph/commit/7557a8139425d1705b481d7f010683169fd5e49b
Thu Nov 06 2014 at 16:24:21, GuangYang
yguan...@outlook.com:
Hello Cephers,
Recently we observed a couple
things further. Thanks.
Best Regards,
Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com || http://community.redhat.com
@scuttlemonkey || @ceph
On Tue, Oct 28, 2014 at 8:45 PM, GuangYang yguan...@outlook.com wrote:
I see. Can we re-schedule to Wednesday sessions which
@vger.kernel.org
Subject: Re: OSD crashed due to filestore EIO
On Wed, 29 Oct 2014, GuangYang wrote:
Recently we observed an OSD crash due to file corruption in the filesystem,
which led to an assertion failure in FileStore::read, as EIO is not
tolerated. As file corruption is common in large deployments, I
Recently we observed an OSD crash due to file corruption in the filesystem,
which led to an assertion failure in FileStore::read, as EIO is not tolerated.
As file corruption is common in large deployments, I am wondering whether that
behavior is too aggressive, especially for EC pools.
After searching, I
Date: Thu, 23 Oct 2014 21:26:07 -0700
From: s...@newdream.net
To: yguan...@outlook.com
CC: ceph-devel@vger.kernel.org; ceph-us...@lists.ceph.com
Subject: RE: Filestore throttling
On Fri, 24 Oct 2014, GuangYang wrote:
commit
Hi Patrick,
Sorry I am not able to join today's session on bucket index scalability. I am
working with Yehuda to polish the implementation according to our discussion at
the last CDS, and hopefully we will merge the patch soon. There is not much of
an update (in terms of what we will implement and
at 6:33 AM, GuangYang yguan...@outlook.com wrote:
Hi Patrick,
Sorry I am not able to join today's session on bucket index
scalability. I am working with Yehuda to polish the implementation according
to our discussion at the last CDS, and hopefully we will merge the patch soon
---
Date: Thu, 23 Oct 2014 06:58:58 -0700
From: s...@newdream.net
To: yguan...@outlook.com
CC: ceph-devel@vger.kernel.org; ceph-us...@lists.ceph.com
Subject: RE: Filestore throttling
On Thu, 23 Oct 2014, GuangYang wrote:
Thanks Sage for the quick
Hello Cephers,
During our testing, I found that the filestore throttling became a limiting
factor for performance. The four settings (with default values) are:
filestore queue max ops = 50
filestore queue max bytes = 100 << 20
filestore queue committing max ops = 500
filestore queue committing max bytes = 100 << 20
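For experimenting without restarting OSDs, the limits can be bumped at runtime,
e.g. as below (the values are just what we tried, not a recommendation):

    import subprocess

    # Inject larger filestore queue limits into all OSDs at runtime
    # (example values; tune against your journal and disks).
    subprocess.check_call([
        'ceph', 'tell', 'osd.*', 'injectargs',
        '--filestore_queue_max_ops 500 '
        '--filestore_queue_max_bytes %d' % (200 << 20),
    ])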
Resend with plain text format...
From: yguan...@outlook.com
To: yeh...@inktank.com; sw...@redhat.com
CC: ceph-devel@vger.kernel.org
Subject: Conditional PUT on radosgw
Date: Mon, 20 Oct 2014 12:00:23 +
Hi Sage and Yehuda,
One use case we would
Hi ceph-users and ceph-devel,
I came across an issue after restarting the monitors of the cluster:
authentication fails, which prevents running any ceph command.
After we did some maintenance work, I restarted an OSD; however, I found that
the OSD would not join the cluster automatically after being