On Wed, Jan 22, 2020 at 12:24 AM Patrick Donnelly
wrote:
> On Tue, Jan 21, 2020 at 8:32 AM John Madden wrote:
> >
> > On 14.2.5 but also present in Luminous, buffer_anon memory use spirals
> > out of control when scanning many thousands of files. The use case is
> > more or less "look up this fi
t 12:08 PM Nick Fisk wrote:
> On Thursday, January 16, 2020 09:15 GMT, Dan van der Ster <
> d...@vanderster.com> wrote:
>
> > Hi Nick,
> >
> > We saw the exact same problem yesterday after a network outage -- a few
> of
> > our down OSDs were stuc
Hi Nick,
We saw the exact same problem yesterday after a network outage -- a few of
our down OSDs were stuck down until we restarted their processes.
-- Dan
On Wed, Jan 15, 2020 at 3:37 PM Nick Fisk wrote:
> Hi All,
>
> Running 14.2.5, currently experiencing some network blips isolated to a
>
Hi,
One way this can happen is if you change the crush rule of a pool after the
balancer has been running awhile.
This is because the balancer upmaps are only validated when they are
initially created.
ceph osd dump | grep upmap
Does it explain your issue?
.. Dan
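The check above can be sketched end to end; the dump lines below are invented stand-ins for real `ceph osd dump` output (PG ids and OSD numbers are illustrative), and on a live cluster a stale entry is dropped with `ceph osd rm-pg-upmap-items`:

```shell
# Inspect the upmap exceptions recorded in the osdmap. The sample file
# stands in for `ceph osd dump` output on a live cluster.
cat <<'EOF' > /tmp/osd_dump_sample.txt
pg_upmap_items 1.7f [23,123]
pg_upmap_items 1.12 [45,67]
EOF
grep upmap /tmp/osd_dump_sample.txt
# An entry invalidated by a later crush-rule change can be removed with:
#   ceph osd rm-pg-upmap-items <pgid>
```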
On Tue, 14 Jan 2020, 04:17 Yi
e only some
> specific unsafe scenarios?
>
> Best regards,
>
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ____
> From: ceph-users on behalf of Dan van der
> Ster
> Sent: 03 December
I created https://tracker.ceph.com/issues/43106 and we're downgrading
our osds back to 13.2.6.
-- dan
On Tue, Dec 3, 2019 at 4:09 PM Dan van der Ster wrote:
>
> Hi all,
>
> We're midway through an update from 13.2.6 to 13.2.7 and started
> getting OSDs crashing regula
Hi all,
We're midway through an update from 13.2.6 to 13.2.7 and started
getting OSDs crashing regularly like this [1].
Does anyone obviously know what the issue is? (Maybe
https://github.com/ceph/ceph/pull/26448/files ?)
Or is it some temporary problem while we still have v13.2.6 and
v13.2.7 osds
You were running v14.2.2 before?
It seems that that ceph_assert you're hitting was indeed added
between v14.2.2. and v14.2.3 in this commit
https://github.com/ceph/ceph/commit/12f8b813b0118b13e0cdac15b19ba8a7e127730b
There's a comment in the tracker for that commit which says the
original fix wa
Thanks. The version and balancer config look good.
So you can try `ceph osd reweight osd.10 0.8` to see if it helps to
get you out of this.
-- dan
On Mon, Aug 26, 2019 at 11:35 AM Simon Oosthoek
wrote:
>
> On 26-08-19 11:16, Dan van der Ster wrote:
> > Hi,
> >
> > Which
Hi,
Which version of ceph are you using? Which balancer mode?
The balancer score isn't a percent-error or anything humanly usable.
`ceph osd df tree` can better show you exactly which osds are
over/under utilized and by how much.
You might be able to manually fix things by using `ceph osd reweigh
On Mon, Jul 29, 2019 at 3:47 PM Yan, Zheng wrote:
>
> On Mon, Jul 29, 2019 at 9:13 PM Dan van der Ster wrote:
> >
> > On Mon, Jul 29, 2019 at 2:52 PM Yan, Zheng wrote:
> > >
> > > On Fri, Jul 26, 2019 at 4:45 PM Dan van der Ster
> > > wrote:
>
On Mon, Jul 29, 2019 at 2:52 PM Yan, Zheng wrote:
>
> On Fri, Jul 26, 2019 at 4:45 PM Dan van der Ster wrote:
> >
> > Hi all,
> >
> > Last night we had 60 ERRs like this:
> >
> > 2019-07-26 00:56:44.479240 7efc6cca1700 0 mds.2.cache.dir(0x617)
>
Hi all,
Last night we had 60 ERRs like this:
2019-07-26 00:56:44.479240 7efc6cca1700 0 mds.2.cache.dir(0x617)
_fetched badness: got (but i already had) [inode 0x10006289992
[...2,head] ~mds2/stray1/10006289992 auth v14438219972 dirtyparent
s=116637332 nl=8 n(v0 rc2019-07-26 00:56:17.199090 b116
Hi all,
In September we'll need to power down a CephFS cluster (currently
mimic) for a several-hour electrical intervention.
Having never done this before, I thought I'd check with the list.
Here's our planned procedure:
1. umount /cephfs from all hpc clients.
2. ceph osd set noout
3. wait unti
On Mon, Jul 8, 2019 at 1:02 PM Lars Marowsky-Bree wrote:
>
> On 2019-07-08T12:25:30, Dan van der Ster wrote:
>
> > Is there a specific bench result you're concerned about?
>
> We're seeing ~5800 IOPS, ~23 MiB/s on 4 KiB IO (stripe_width 8192) on a
> pool that
Hi Lars,
Is there a specific bench result you're concerned about?
I would think that small write perf could be kept reasonable thanks to
bluestore's deferred writes.
FWIW, our bench results (all flash cluster) didn't show a massive
performance difference between 3 replica and 4+2 EC.
I agree abou
http://tracker.ceph.com/issues/40480
On Thu, Jun 20, 2019 at 9:12 PM Dan van der Ster wrote:
>
> I will try to reproduce with logs and create a tracker once I find the
> smoking gun...
>
> It's very strange -- I had the osd mode set to 'passive', and pool
> optio
Fedotov wrote:
>
> I'd like to see more details (preferably backed with logs) on this...
>
> On 6/20/2019 6:23 PM, Dan van der Ster wrote:
> > P.S. I know this has been discussed before, but the
> > compression_(mode|algorithm) pool options [1] seem completely bro
...)
Now I'll try to observe any performance impact of increased
min_blob_size... Do you recall if there were some benchmarks done to
pick those current defaults?
Thanks!
Dan
-- Dan
>
>
> Thanks,
>
> Igor
>
> On 6/20/2019 5:33 PM, Dan van der Ster wrote:
> > Hi
o set
bluestore_compression_mode=force on the osd.
-- dan
[1] http://docs.ceph.com/docs/master/rados/operations/pools/#set-pool-values
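Forcing compression on the OSD side, as mentioned above, can be sketched as a ceph.conf fragment; the option names are the standard bluestore ones, the values are only illustrative:

```ini
# ceph.conf sketch (values illustrative, not a recommendation): force
# bluestore compression regardless of per-pool settings.
[osd]
bluestore_compression_mode = force
bluestore_compression_algorithm = zstd
```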
On Thu, Jun 20, 2019 at 4:33 PM Dan van der Ster wrote:
>
> Hi all,
>
> I'm trying to compress an rbd pool via backfilling the existing data,
> and th
Hi all,
I'm trying to compress an rbd pool via backfilling the existing data,
and the allocated space doesn't match what I expect.
Here is the test: I marked osd.130 out and waited for it to erase all its data.
Then I set (on the pool) compression_mode=force and compression_algorithm=zstd.
Then I
fore, but for some reasons we could
> not react quickly. We accepted the risk of the bucket becoming slow, but
> had not thought of further risks ...
>
> On 17.06.19 10:15, Dan van der Ster wrote:
> > Nice to hear this was resolved in the end.
> >
> > Coming bac
Nice to hear this was resolved in the end.
Coming back to the beginning -- is it clear to anyone what was the
root cause and how other users can avoid this from happening? Maybe
some better default configs to warn users earlier about too-large
omaps?
Cheers, Dan
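One knob for the "warn earlier" idea is the deep-scrub large-omap threshold; a hedged ceph.conf sketch (thresholds illustrative, defaults vary by release):

```ini
# ceph.conf sketch: flag objects whose omap exceeds these limits during
# deep scrub (values illustrative; check your release's defaults).
[osd]
osd_deep_scrub_large_omap_object_key_threshold = 200000
osd_deep_scrub_large_omap_object_value_sum_threshold = 1073741824
```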
On Thu, Jun 13, 2019 at 7:36 PM H
Ahh I was thinking of chooseleaf_vary_r, which you already have.
So probably not related to tunables. What is your `ceph osd tree` ?
By the way, 12.2.9 has an unrelated bug (details
http://tracker.ceph.com/issues/36686)
AFAIU you will just need to update to v12.2.11 or v12.2.12 for that fix.
-- D
Hi,
This looks like a tunables issue.
What is the output of `ceph osd crush show-tunables`?
-- Dan
On Fri, Jun 14, 2019 at 11:19 AM Luk wrote:
>
> Hello,
>
> Maybe somone was fighting with this kind of stuck in ceph already.
> This is production cluster, can't/don't want to make wrong s
On Thu, Jun 6, 2019 at 8:00 PM Sage Weil wrote:
>
> Hello RBD users,
>
> Would you mind running this command on a random OSD on your RBD-oriented
> cluster?
>
> ceph-objectstore-tool \
> --data-path /var/lib/ceph/osd/ceph-NNN \
>
> '["meta",{"oid":"snapmapper","key":"","snapid":0,"hash":2758339
Hi all,
Just a quick heads up, and maybe a check if anyone else is affected.
After upgrading our MDS's from v12.2.11 to v12.2.12, we started
getting crashes with
/builddir/build/BUILD/ceph-12.2.12/src/mds/MDSRank.cc: 1304:
FAILED assert(session->get_nref() == 1)
I opened a ticket here with
Hi Reed and Brad,
Did you ever learn more about this problem?
We currently have a few inconsistencies arriving with the same env
(cephfs, v13.2.5) and symptoms.
PG Repair doesn't fix the inconsistency, nor does Brad's omap
workaround earlier in the thread.
In our case, we can fix by cp'ing the fi
Tuesday Sept 17 is indeed the correct day!
We had to move it by one day to get a bigger room... sorry for the confusion.
-- dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
On Mon, May 27, 2019 at 11:54 AM Oliver Freyermuth
wrote:
>
> Dear Dan,
>
> thanks for the quick reply!
>
> Am 27.05.19 um 11:44 schrieb Dan van der Ster:
> > Hi Oliver,
> >
> > We saw the same issue after upgrading to mimic.
> >
> > IIRC we could
Hi Oliver,
We saw the same issue after upgrading to mimic.
IIRC we could make the max_bytes xattr visible by touching an empty
file in the dir (thereby updating the dir inode).
e.g. touch /cephfs/user/freyermu/.quota; rm /cephfs/user/freyermu/.quota
Does that work?
-- dan
On Mon, May 27, 2
>
>io:
> client: 211KiB/s rd, 46.0KiB/s wr, 158op/s rd, 0op/s wr
>
> On 23.05.19 10:54 vorm., Dan van der Ster wrote:
> > What's the full ceph status?
> > Normally recovery_wait just means that the relevant osd's are busy
> > recovering/backfill
What's the full ceph status?
Normally recovery_wait just means that the relevant osd's are busy
recovering/backfilling another PG.
On Thu, May 23, 2019 at 10:53 AM Kevin Flöh wrote:
>
> Hi,
>
> we have set the PGs to recover and now they are stuck in
> active+recovery_wait+degraded and instructi
Did I understand correctly: you have a crush tree with both ssd and
hdd devices, and you want to direct PGs to the ssds, until they reach
some fullness threshold, and only then start directing PGs to the
hdds?
I can't think of a crush rule alone to achieve that. But something you
could do is add a
On Wed, May 22, 2019 at 3:03 PM Rainer Krienke wrote:
>
> Hello,
>
> I created an erasure code profile named ecprofile-42 with the following
> parameters:
>
> $ ceph osd erasure-code-profile set ecprofile-42 plugin=jerasure k=4 m=2
>
> Next I created a new pool using the ec profile from above:
>
>
> "2(0)",
> "4(1)",
> "23(2)",
> "24(0)",
> "72(1)",
> "79(3)"
> ],
> "down_osds_w
On Tue, May 14, 2019 at 10:59 AM Kevin Flöh wrote:
>
>
> On 14.05.19 10:08 vorm., Dan van der Ster wrote:
>
> On Tue, May 14, 2019 at 10:02 AM Kevin Flöh wrote:
>
> On 13.05.19 10:51 nachm., Lionel Bouton wrote:
>
> Le 13/05/2019 à 16:20, Kevin Flöh a écrit :
>
>
On Tue, May 14, 2019 at 10:02 AM Kevin Flöh wrote:
>
> On 13.05.19 10:51 nachm., Lionel Bouton wrote:
> > Le 13/05/2019 à 16:20, Kevin Flöh a écrit :
> >> Dear ceph experts,
> >>
> >> [...] We have 4 nodes with 24 osds each and use 3+1 erasure coding. [...]
> >> Here is what happened: One osd daem
Presumably the 2 OSDs you marked as lost were hosting those incomplete PGs?
It would be useful to double confirm that: check with `ceph pg
query` and `ceph pg dump`.
(If so, this is why the ignore history les thing isn't helping; you
don't have the minimum 3 stripes up for those 3+1 PGs.)
If thos
y restarting
> the osd that it is reading from?
>
>
>
>
> -----Original Message-----
> From: Dan van der Ster [mailto:d...@vanderster.com]
> Sent: donderdag 2 mei 2019 8:51
> To: Yan, Zheng
> Cc: ceph-users; pablo.llo...@cern.ch
> Subject: Re: [ceph-users] co-located
On Mon, Apr 1, 2019 at 1:46 PM Yan, Zheng wrote:
>
> On Mon, Apr 1, 2019 at 6:45 PM Dan van der Ster wrote:
> >
> > Hi all,
> >
> > We have been benchmarking a hyperconverged cephfs cluster (kernel
> > clients + osd on same machines) for awhile. Over the wee
On Tue, Apr 30, 2019 at 9:01 PM Igor Podlesny wrote:
>
> On Wed, 1 May 2019 at 01:26, Igor Podlesny wrote:
> > On Wed, 1 May 2019 at 01:01, Dan van der Ster wrote:
> > >> > The upmap balancer in v12.2.12 works really well... Perfectly uniform
> > >> >
On Tue, Apr 30, 2019 at 8:26 PM Igor Podlesny wrote:
>
> On Wed, 1 May 2019 at 01:01, Dan van der Ster wrote:
> >> > The upmap balancer in v12.2.12 works really well... Perfectly uniform on
> >> > our clusters.
> >>
> >> mode upmap ?
> >
Removing pools won't make a difference.
Read up to slide 22 here:
https://www.slideshare.net/mobile/Inktank_Ceph/ceph-day-berlin-mastering-ceph-operations-upmap-and-the-mgr-balancer
..
Dan
(Apologies for terseness, I'm mobile)
On Tue, 30 Apr 2019, 20:02 Shain Miley, wrote:
> Here is the per
On Tue, 30 Apr 2019, 19:32 Igor Podlesny, wrote:
> On Wed, 1 May 2019 at 00:24, Dan van der Ster wrote:
> >
> > The upmap balancer in v12.2.12 works really well... Perfectly uniform on
> our clusters.
> >
> > .. Dan
>
> mode upmap ?
>
yes, mgr balancer
The upmap balancer in v12.2.12 works really well... Perfectly uniform on
our clusters.
.. Dan
On Tue, 30 Apr 2019, 19:22 Kenneth Van Alstyne,
wrote:
> Unfortunately it looks like he’s still on Luminous, but if upgrading is an
> option, the options are indeed significantly better. If I recall
On Mon, 22 Apr 2019, 22:20 Gregory Farnum, wrote:
> On Sat, Apr 20, 2019 at 9:29 AM Igor Podlesny wrote:
> >
> > I remember seeing reports in regards but it's being a while now.
> > Can anyone tell?
>
> No, this hasn't changed. It's unlikely it ever will; I think NFS
> resolved the issue but it
proposals will be available by mid-May.
All the Best,
Dan van der Ster
CERN IT Department
Ceph Governing Board, Academic Liaison
[1] Sept 16 is the day after CERN Open Days, where there will be
plenty to visit on our campus if you arrive a couple of days before
https://home.cern/news/news/cern/cern
; sleep 5 ; done
After running that for awhile the PG filestore structure has merged
down and now listing the pool and backfilling are back to normal.
Thanks!
Dan
On Tue, Apr 9, 2019 at 7:05 PM Dan van der Ster wrote:
>
> Hi all,
>
> We have a slight issue while trying to migrate
Hi all,
We have a slight issue while trying to migrate a pool from filestore
to bluestore.
This pool used to have 20 million objects in filestore -- it now has
50,000. During its life, the filestore pgs were internally split
several times, but never merged. Now the pg _head dirs have mostly
empty
Which OS are you using?
With CentOS we find that the heap is not always automatically
released. (You can check the heap freelist with `ceph tell osd.0 heap
stats`).
As a workaround we run this hourly:
ceph tell mon.* heap release
ceph tell osd.* heap release
ceph tell mds.* heap release
-- Dan
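The hourly workaround could be wired up as a cron fragment like the following (the path and schedule are assumptions, not from the original mail):

```
# /etc/cron.d/ceph-heap-release (sketch)
0 * * * *  root  ceph tell mon.\* heap release
5 * * * *  root  ceph tell osd.\* heap release
10 * * * * root  ceph tell mds.\* heap release
```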
O
d tends to get imbalanced again as soon as i
> need to replace disks.
>
> On Thu, Apr 4, 2019 at 10:49 AM Iain Buclaw wrote:
>>
>> On Mon, 18 Mar 2019 at 16:42, Dan van der Ster wrote:
>> >
>> > The balancer optimizes # PGs / crush weight. That host looks alr
https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Mon, Apr 1, 2019 at 12:45 PM Dan van der Ster wrote:
> >
> > Hi all,
> >
> > We have been benchmarking a hyperconverged cephfs cluster (k
Hi all,
We have been benchmarking a hyperconverged cephfs cluster (kernel
clients + osd on same machines) for awhile. Over the weekend (for the
first time) we had one cephfs mount deadlock while some clients were
running ior.
All the ior processes are stuck in D state with this stack:
[] wait_on
See http://tracker.ceph.com/issues/38849
As an immediate workaround you can increase `mds bal fragment size
max` to 20 (which will increase the max number of strays to 2
million.)
(Try injecting that option to the mds's -- I think it is read at runtime).
And you don't need to stop the mds's a
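As a persistent alternative to injectargs, the option can be set in ceph.conf; the exact value is truncated in the excerpt above, so the figure below is only an illustration:

```ini
# ceph.conf sketch: raise the per-fragment size cap, and with it the
# stray limit (value illustrative).
[mds]
mds_bal_fragment_size_max = 200000
```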
ccounted in their quota by CephFS :/
>
>
> Paul
> On Wed, Mar 20, 2019 at 4:34 PM Dan van der Ster
> wrote:
> >
> > Hi all,
> >
> > We're currently upgrading our cephfs (managed by OpenStack Manila)
> > clusters to Mimic, and want to start enabling sna
; --> ceph-volume lvm activate successful for osd ID: 3
> --> ceph-volume lvm create successful for: /dev/sda
>
Yes that's it! Worked for me too.
Thanks!
Dan
> This is a Nautilus test cluster, but I remember having this on a
> Luminous cluster, too. I hope this helps.
>
&
On Thu, Mar 21, 2019 at 1:50 PM Tom Barron wrote:
>
> On 20/03/19 16:33 +0100, Dan van der Ster wrote:
> >Hi all,
> >
> >We're currently upgrading our cephfs (managed by OpenStack Manila)
> >clusters to Mimic, and want to start enabling snapshots of the file
>
On Thu, Mar 21, 2019 at 8:51 AM Gregory Farnum wrote:
>
> On Wed, Mar 20, 2019 at 6:06 PM Dan van der Ster wrote:
>>
>> On Tue, Mar 19, 2019 at 9:43 AM Erwin Bogaard
>> wrote:
>> >
>> > Hi,
>> >
>> >
>> >
>> > For
Hi all,
We're currently upgrading our cephfs (managed by OpenStack Manila)
clusters to Mimic, and want to start enabling snapshots of the file
shares.
There are different ways to approach this, and I hope someone can
share their experiences with:
1. Do you give users the 's' flag in their cap, so
On Tue, Mar 19, 2019 at 9:43 AM Erwin Bogaard wrote:
>
> Hi,
>
>
>
> For a number of application we use, there is a lot of file duplication. This
> wastes precious storage space, which I would like to avoid.
>
> When using a local disk, I can use a hard link to let all duplicate files
> point to
On Tue, Mar 19, 2019 at 12:25 PM Dan van der Ster wrote:
>
> On Tue, Mar 19, 2019 at 12:17 PM Alfredo Deza wrote:
> >
> > On Tue, Mar 19, 2019 at 7:00 AM Alfredo Deza wrote:
> > >
> > > On Tue, Mar 19, 2019 at 6:47 AM Dan van der Ster
> > > wrote:
On Tue, Mar 19, 2019 at 1:05 PM Alfredo Deza wrote:
>
> On Tue, Mar 19, 2019 at 7:26 AM Dan van der Ster wrote:
> >
> > On Tue, Mar 19, 2019 at 12:17 PM Alfredo Deza wrote:
> > >
> > > On Tue, Mar 19, 2019 at 7:00 AM Alfredo Deza wrote:
> > > >
On Tue, Mar 19, 2019 at 12:17 PM Alfredo Deza wrote:
>
> On Tue, Mar 19, 2019 at 7:00 AM Alfredo Deza wrote:
> >
> > On Tue, Mar 19, 2019 at 6:47 AM Dan van der Ster
> > wrote:
> > >
> > > Hi all,
> > >
> > > We've just hit ou
Hi all,
We've just hit our first OSD replacement on a host created with
`ceph-volume lvm batch` with mixed hdds+ssds.
The hdd /dev/sdq was prepared like this:
# ceph-volume lvm batch /dev/sd[m-r] /dev/sdac --yes
Then /dev/sdq failed and was then zapped like this:
# ceph-volume lvm zap /dev/
ule is confusing the new
>>>> upmap cleaning.
>>>> (debug_mon 10 on the active mon should show those cleanups).
>>>>
>>>> I'm copying Xie Xingguo, and probably you should create a tracker for this.
>>>>
>>>> -- dan
>&
one set --rgw-zone=default --infile=zone.json
and now I can safely remove the default.rgw.meta pool.
-- Dan
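The zone edit being referenced can be sketched offline; the JSON below is an invented stand-in for real `radosgw-admin zone get` output, with `sed` doing the edit that would normally be made by hand:

```shell
# Blank the obsolete metadata_heap field in an exported zone config.
cat <<'EOF' > /tmp/zone.json
{ "name": "default", "metadata_heap": "default.rgw.meta" }
EOF
sed 's/"metadata_heap": "[^"]*"/"metadata_heap": ""/' \
    /tmp/zone.json > /tmp/zone.new.json
cat /tmp/zone.new.json
# then, on a real cluster:
#   radosgw-admin zone set --rgw-zone=default --infile=/tmp/zone.new.json
```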
On Tue, Mar 12, 2019 at 3:17 PM Dan van der Ster wrote:
>
> Hi all,
>
> We have an S3 cluster with >10 million objects in default.rgw.meta.
>
> # radosgw-
Hi all,
We have an S3 cluster with >10 million objects in default.rgw.meta.
# radosgw-admin zone get | jq .metadata_heap
"default.rgw.meta"
In these old tickets I realized that this setting is obsolete, and
those objects are probably useless:
http://tracker.ceph.com/issues/17256
http://tra
e in the same host. So this change should be perfectly
> acceptable by the rule set.
> Something must be blocking the change, but i can't find anything about it in
> any logs.
>
> - Kári
>
> On Thu, Feb 28, 2019 at 8:07 AM Dan van der Ster wrote:
>>
>> Hi,
>>
Hi,
pg-upmap-items became more strict in v12.2.11 when validating upmaps.
E.g., it now won't let you put two PGs in the same rack if the crush
rule doesn't allow it.
Where are OSDs 23 and 123 in your cluster? What is the relevant crush rule?
-- dan
On Wed, Feb 27, 2019 at 9:17 PM Kári Bertilss
osd bench, etc)?
>
> On Fri, Feb 15, 2019 at 3:13 PM M Ranga Swami Reddy
> wrote:
> >
> > today I again hit the warn with 30G also...
> >
> > On Thu, Feb 14, 2019 at 7:39 PM Sage Weil wrote:
> > >
> > > On Thu, 7 Feb 2019, Dan van der Ster wrote:
On Thu, Feb 14, 2019 at 2:31 PM Sage Weil wrote:
>
> On Thu, 7 Feb 2019, Dan van der Ster wrote:
> > On Thu, Feb 7, 2019 at 12:17 PM M Ranga Swami Reddy
> > wrote:
> > >
> > > Hi Dan,
> > > >During backfilling scenarios, the mons keep old ma
On Fri, Feb 15, 2019 at 12:01 PM Willem Jan Withagen wrote:
>
> On 15/02/2019 11:56, Dan van der Ster wrote:
> > On Fri, Feb 15, 2019 at 11:40 AM Willem Jan Withagen
> > wrote:
> >>
> >> On 15/02/2019 10:39, Ilya Dryomov wrote:
> >>> On
On Fri, Feb 15, 2019 at 11:40 AM Willem Jan Withagen wrote:
>
> On 15/02/2019 10:39, Ilya Dryomov wrote:
> > On Fri, Feb 15, 2019 at 12:05 AM Mike Perez wrote:
> >>
> >> Hi Marc,
> >>
> >> You can see previous designs on the Ceph store:
> >>
> >> https://www.proforma.com/sdscommunitystore
> >
> >
On Thu, Feb 14, 2019 at 12:07 PM Wido den Hollander wrote:
>
>
>
> On 2/14/19 11:26 AM, Dan van der Ster wrote:
> > On Thu, Feb 14, 2019 at 11:13 AM Wido den Hollander wrote:
> >>
> >> On 2/14/19 10:20 AM, Dan van der Ster wrote:
> >>> On T
On Thu, Feb 14, 2019 at 11:13 AM Wido den Hollander wrote:
>
> On 2/14/19 10:20 AM, Dan van der Ster wrote:
> > On Thu., Feb. 14, 2019, 6:17 a.m. Wido den Hollander >>
> >> Hi,
> >>
> >> On a cluster running RGW only I'm running into Blu
On Fri, Feb 1, 2019 at 10:18 PM Neha Ojha wrote:
>
> On Fri, Feb 1, 2019 at 1:09 PM Robert Sander
> wrote:
> >
> > Am 01.02.19 um 19:06 schrieb Neha Ojha:
> >
> > > If you would have hit the bug, you should have seen failures like
> > > https://tracker.ceph.com/issues/36686.
> > > Yes, pglog_hard
may not be safe to restart the
> ceph-mon, instead prefer to do the compact on non-leader mons.
> Is this ok?
>
Compaction doesn't solve this particular problem, because the maps
have not yet been deleted by the ceph-mon process.
-- dan
> Thanks
> Swami
>
> On Thu, Feb 7,
showing > 15G, do I need to run the compact commands
> to do the trimming?
Compaction isn't necessary -- you should only need to restart all
peons, then the leader. A few minutes later the DBs should start
trimming.
-- dan
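The ordering matters (peons first, leader last, so there is a single re-election); a sketch with invented mon names, printing rather than executing the restarts:

```shell
# Restart peons first, the leader last. Names are illustrative; get the
# real leader from `ceph quorum_status`.
leader=mon1
for m in mon1 mon2 mon3; do
  if [ "$m" != "$leader" ]; then
    echo "restart ceph-mon@$m"   # peon
  fi
done
echo "restart ceph-mon@$leader"  # leader last
```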
>
> Thanks
> Swami
>
> On Wed, Feb
Note that there are some improved upmap balancer heuristics in
development here: https://github.com/ceph/ceph/pull/26187
-- dan
On Tue, Feb 5, 2019 at 10:18 PM Kári Bertilsson wrote:
>
> Hello
>
> I previously enabled upmap and used automatic balancing with "ceph balancer
> on". I got very good
Hi,
With HEALTH_OK a mon data dir should be under 2GB for even such a large cluster.
During backfilling scenarios, the mons keep old maps and grow quite
quickly. So if you have balancing, pg splitting, etc. ongoing for
awhile, the mon stores will eventually trigger that 15GB alarm.
But the intend
No idea, but maybe this commit which landed in v12.2.11 is relevant:
commit 187bc76957dcd8a46a839707dea3c26b3285bd8f
Author: runsisi
Date: Mon Nov 12 20:01:32 2018 +0800
librbd: fix missing unblock_writes if shrink is not allowed
Fixes: http://tracker.ceph.com/issues/36778
Signed
On Tue, Jan 22, 2019 at 3:33 PM Yan, Zheng wrote:
>
> On Tue, Jan 22, 2019 at 9:08 PM Dan van der Ster wrote:
> >
> > Hi Zheng,
> >
> > We also just saw this today and got a bit worried.
> > Should we change to:
> >
>
> What is the error message
Hi Zheng,
We also just saw this today and got a bit worried.
Should we change to:
diff --git a/src/mds/CInode.cc b/src/mds/CInode.cc
index e8c1bc8bc1..e2539390fb 100644
--- a/src/mds/CInode.cc
+++ b/src/mds/CInode.cc
@@ -2040,7 +2040,7 @@ void CInode::finish_scatter_gather_update(int type)
On Wed, Jan 16, 2019 at 11:17 PM Patrick Donnelly wrote:
>
> On Wed, Jan 16, 2019 at 1:21 AM Marvin Zhang wrote:
> > Hi CephFS experts,
> > From document, I know multi-fs within a cluster is still experiment feature.
> > 1. Is there any estimation about stability and performance for this feature?
On Wed, Sep 19, 2018 at 7:01 PM Bryan Stillwell wrote:
>
> > On 08/30/2018 11:00 AM, Joao Eduardo Luis wrote:
> > > On 08/30/2018 09:28 AM, Dan van der Ster wrote:
> > > Hi,
> > > Is anyone else seeing rocksdb mon stores slowly growing to >15GB,
> > &g
Hi Wido,
`rpm -q --scripts ceph-selinux` will tell you why.
It was the same from 12.2.8 to 12.2.10: http://tracker.ceph.com/issues/21672
And the problem is worse than you described, because the daemons are
even restarted before all the package files have been updated.
Our procedure on these upg
not between racks
> (since the very few resources) ?
> Thanks, Massimo
>
> On Mon, Jan 14, 2019 at 3:29 PM Dan van der Ster wrote:
>>
>> On Mon, Jan 14, 2019 at 3:18 PM Massimo Sgaravatto
>> wrote:
>> >
>> > Thanks for the prompt reply
>> >
&g
.00000
> 8 hdd 5.45609 osd.8 up 1.0 1.0
> 9 hdd 5.45609 osd.9 up 1.0 1.0
> [root@ceph-mon-01 ~]#
>
> On Mon, Jan 14, 2019 at 3:13 PM Dan van der Ster wrote:
>>
>> On Mon, Jan 14, 2019 at 3:06
On Mon, Jan 14, 2019 at 3:06 PM Massimo Sgaravatto
wrote:
>
> I have a ceph luminous cluster running on CentOS7 nodes.
> This cluster has 50 OSDs, all with the same size and all with the same weight.
>
> Since I noticed that there was a quite "unfair" usage of OSD nodes (some used
> at 30 %, some
Hi Caspar,
On Thu, Jan 10, 2019 at 1:31 PM Caspar Smit wrote:
>
> Hi all,
>
> I wanted to test Dan's upmap-remapped script for adding new osd's to a
> cluster. (Then letting the balancer gradually move pgs to the new OSD
> afterwards)
Cool. Insert "no guarantees or warranties" comment here.
An
Hi Bryan,
I think this is the old hammer thread you refer to:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/013060.html
We also have osdmaps accumulating on v12.2.8 -- ~12000 per osd at the moment.
I'm trying to churn the osdmaps like before, but our maps are not being trimm
On Tue, Jan 8, 2019 at 12:48 PM Thomas Byrne - UKRI STFC
wrote:
>
> For what it's worth, I think the behaviour Pardhiv and Bryan are describing
> is not quite normal, and sounds similar to something we see on our large
> luminous cluster with elderly (created as jewel?) monitors. After large
>
Hey Andras,
Three mons is possibly too few for such a large cluster. We've had lots of
good stable experience with 5-mon clusters. I've never tried 7, so I can't
say if that would lead to other problems (e.g. leader/peon sync
scalability).
That said, our 1-osd bigbang tests managed with only
ames !!!
>
>Ciao ciao
>
> Fulvio
>
> Original Message
> Subject: Re: [ceph-users] Luminous (12.2.8 on CentOS), recover or
> recreate incomplete PG
> From: Dan van der Ster
> To: fulvio.galea...@garr.it
> CC: ceph-users
&g
Hi Joao,
Has that broken the Slack connection? I can't tell if its broken or
just quiet... last message on #ceph-devel was today at 1:13am.
-- Dan
On Tue, Dec 18, 2018 at 12:11 PM Joao Eduardo Luis wrote:
>
> All,
>
>
> Earlier this week our IRC channels were set to require users to be
> regis
Hi Fulvio!
Are you able to query that pg -- which osd is it waiting for?
Also, since you're prepared for data loss anyway, you might have
success setting osd_find_best_info_ignore_history_les=true on the
relevant osds (set it conf, restart those osds).
-- dan
On Tue, Dec 18, 2018 at 11
Hi all,
Bringing up this old thread with a couple questions:
1. Did anyone ever follow up on the 2nd part of this thread? -- is
there any way to cache keystone EC2 credentials?
2. A question for Valery: could you please explain exactly how you
added the EC2 credentials to the local backend (your
Luminous has:
osd_scrub_begin_week_day
osd_scrub_end_week_day
Maybe these aren't documented. I usually check here for available option:
https://github.com/ceph/ceph/blob/luminous/src/common/options.cc#L2533
-- Dan
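A ceph.conf sketch of those options (day values illustrative; verify the exact numbering in your release's options.cc):

```ini
# ceph.conf sketch: restrict scrubbing to a window of weekdays
# (0 = Sunday in the option description; values illustrative).
[osd]
osd_scrub_begin_week_day = 1
osd_scrub_end_week_day = 5
```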
On Fri, Dec 14, 2018 at 12:25 PM Caspar Smit wrote:
>
> Hi all,
>
> We have op
Hey Abhishek,
We just noticed that the debuginfo is missing for 12.2.10:
http://download.ceph.com/rpm-luminous/el7/x86_64/ceph-debuginfo-12.2.10-0.el7.x86_64.rpm
Did something break in the publishing?
Cheers, Dan
On Tue, Nov 27, 2018 at 3:50 PM Abhishek Lekshmanan wrote:
>
>
> We're happy to a