On Tue, Nov 5, 2019 at 2:21 PM Janne Johansson wrote:
> I seem to recall some ticket where zap would "only" clear 100M of the drive,
> but lvm and all partition info needed more to be cleared, so using dd
> bs=1M count=1024 (or more!) would be needed to make sure no part of the OS
> picks
On Tue, Nov 5, 2019 at 3:18 AM Paul Emmerich wrote:
> could be a new feature, I've only realized this exists/works since Nautilus.
> You seem to be a relatively old version since you still have ceph-disk
> installed
None of this is using ceph-disk? It's all done with ceph-volume.
The ceph clus
On Mon, Nov 4, 2019 at 1:32 PM Paul Emmerich wrote:
> BTW: you can run destroy before stopping the OSD, you won't need the
> --yes-i-really-mean-it if it's drained in this case
This actually does not seem to work:
$ sudo ceph osd safe-to-destroy 42
OSD(s) 42 are safe to destroy without reducing
On Mon, Nov 4, 2019 at 1:32 PM Paul Emmerich wrote:
> That's probably the ceph-disk udev script being triggered from
> something somewhere (and a lot of things can trigger that script...)
That makes total sense.
> Work-around: convert everything to ceph-volume simple first by running
> "ceph-vol
While converting a luminous cluster from filestore to bluestore, we
are running into a weird race condition on a fairly regular basis.
We have a master script that writes upgrade scripts for each OSD
server. The script for an OSD looks like this:
ceph osd out 68
while ! ceph osd safe-to-destroy
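The loop above is cut off by the digest, but it presumably polls until the OSD is fully drained. A hedged reconstruction, where the OSD id 68 comes from the visible `ceph osd out 68` and the poll interval is an assumption:

```shell
#!/bin/sh
# Sketch of the per-OSD drain step; the 10-second interval is an
# illustrative assumption, not from the original script.
OSD=68
ceph osd out "$OSD"
# "ceph osd safe-to-destroy" exits non-zero while removing the OSD
# would still reduce data durability or availability, so loop on it.
while ! ceph osd safe-to-destroy "$OSD"; do
    sleep 10
done
```

This is the same pattern the Ceph docs use for filestore-to-bluestore migration: out the OSD, then wait for `safe-to-destroy` before stopping and destroying it.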
On Wed, Aug 1, 2018 at 9:53 PM, Brad Hubbard wrote:
> What is the status of the cluster with this osd down and out?
Briefly, miserable.
All client IO was blocked.
36 pgs were stuck “down.” pg query reported that they were blocked by
that OSD, despite that OSD not holding any replicas for them,
seemed very happy to see it
again.
Not sure if this solution works generally or if it was specific to
this case, or if it was not a solution and the cluster will eat itself
overnight. But, so far so good!
Thanks!
On Wed, Aug 1, 2018 at 3:42 PM, J David wrote:
> Hello all,
>
> On
Hello all,
On Luminous 12.2.7, during the course of recovering from a failed OSD,
one of the other OSDs started repeatedly crashing every few seconds
with an assertion failure:
2018-08-01 12:17:20.584350 7fb50eded700 -1 log_channel(cluster) log
[ERR] : 2.621 past_interal bound [19300,21449) end d
On Thu, Oct 19, 2017 at 9:42 PM, Brad Hubbard wrote:
> I guess you have both read and followed
> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/?highlight=backfill#debugging-slow-requests
>
> What was the result?
Not sure if you’re asking Ольга or myself, but in my cas
On Wed, Oct 18, 2017 at 8:12 AM, Ольга Ухина wrote:
> I have a problem with ceph luminous 12.2.1.
> […]
> I have slow requests on different OSDs on random time (for example at night,
> but I don’t see any problems at the time of problem
> […]
> 2017-10-18 01:20:38.187326 mon.st3 mon.0 10.192.1.78:
ely starving
other normal or lower priority requests. Is that how it works? Or is
the queue in question a simple FIFO queue?
Is there anything else I can try to help narrow this down?
Thanks!
On Sat, Oct 14, 2017 at 6:51 PM, J David wrote:
> On Sat, Oct 14, 2017 at 9:33 AM, David Turner
On Sat, Oct 14, 2017 at 9:33 AM, David Turner wrote:
> First, there is no need to deep scrub your PGs every 2 days.
They aren’t being deep scrubbed every two days, nor is there any
attempt (or desire) to do so. That would require 8+ scrubs running
at once. Currently, it takes between 2 and 3
Thanks all for input on this.
It’s taken a couple of weeks, but based on the feedback from the list,
we’ve got our version of a scrub-one-at-a-time cron script running and
confirmed that it’s working properly.
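The actual script isn't shown in the thread, but a scrub-one-at-a-time cron job along these lines is a common sketch. The awk field positions are assumptions (the column layout of `ceph pg dump` varies by release), so treat this as illustrative:

```shell
#!/bin/sh
# Deep-scrub exactly one PG per run: pick the PG whose last deep
# scrub timestamp is oldest, then scrub only it. Adapt the field
# numbers against your own "ceph pg dump" output before using.
PG=$(ceph pg dump 2>/dev/null \
      | awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ {print $(NF-1), $NF, $1}' \
      | sort | head -n 1 | awk '{print $3}')
[ -n "$PG" ] && ceph pg deep-scrub "$PG"
```

Run from cron at whatever interval lets one scrub finish before the next starts.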
Unfortunately, this hasn’t really solved the real problem. Even with
just one scrub an
With “osd max scrubs” set to 1 in ceph.conf, which I believe is also
the default, at almost all times, there are 2-3 deep scrubs running.
3 simultaneous deep scrubs is enough to cause a constant stream of:
mon.ceph1 [WRN] Health check update: 69 slow requests are blocked > 32
sec (REQUEST_SLOW)
On Wed, Oct 26, 2016 at 8:55 AM, Andreas Davour wrote:
> If there are 1 MON in B, that cluster will have quorum within itself and
> keep running, and in A the MON cluster will vote and reach quorum again.
Quorum requires a majority of all monitors. One monitor by itself (in
a cluster with at lea
On Tue, Oct 25, 2016 at 3:10 PM, Steve Taylor wrote:
> Recently we tested an upgrade from 0.94.7 to 10.2.3 and found exactly the
> opposite. Upgrading the clients first worked for many operations, but we
> got "function not implemented" errors when we would try to clone RBD
> snapshots.
>
Yes, w
What are the potential consequences of using out-of-date client
libraries with RBD against newer clusters?
Specifically, what are the potential ill-effects of using Firefly
client libraries (0.80.7 and 0.80.8) to access Hammer or Jewel
(10.2.3) clusters?
The upgrading instructions (
http://docs.c
Yes, given the architectural design limitations of ZFS, there will
indeed always be performance consequences for using it in an
environment its creators never envisioned, like Ceph. But ZFS offers
many advanced features not found on other filesystems, and for
production environments that depend on
For a variety of reasons, a ZFS pool in a QEMU/KVM virtual machine
backed by a Ceph RBD doesn’t perform very well.
Does anyone have any tuning tips (on either side) for this workload?
A fair amount of the problem is probably related to two factors.
First, ZFS always assumes it is talking to bare
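Guest-side ZFS properties are one place to start. These values are illustrative assumptions (and note that the host-side `rbd cache` option is deliberately omitted here, since running a cache ZFS is unaware of is rejected elsewhere in this thread):

```shell
# Hypothetical dataset name; values are starting points to benchmark,
# not recommendations.
zfs set recordsize=64K tank/data      # match the workload I/O size
                                      # instead of the 128K default
zfs set logbias=throughput tank/data  # avoid double-writing small
                                      # sync I/O through the ZIL
```

Both `recordsize` and `logbias` are standard ZFS dataset properties; the right values depend on the guest workload and the RBD object size.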
On Mon, Oct 19, 2015 at 7:09 PM, John Wilkins wrote:
> The classic case is when you are just trying Ceph out on a laptop (e.g.,
> using file directories for OSDs, setting the replica size to 2, and setting
> osd_crush_chooseleaf_type to 0).
Sure, but the text isn’t really applicable in that situa
In the Ceph docs, at:
http://docs.ceph.com/docs/master/rados/deployment/ceph-deploy-osd/
It says (under "Prepare OSDs"):
"Note: When running multiple Ceph OSD daemons on a single node, and
sharing a partitioned journal with each OSD daemon, you should consider
the entire node the minimum failure d
Thanks & Regards
> Somnath
> Subject: Re: [ceph-users] Ceph, SSD, and NVMe
On Tue, Sep 29, 2015 at 7:32 AM, Jiri Kanicky wrote:
> Thank you for your reply. In this case I am considering to create separate
> partitions for each disk on the SSD drive. Would be good to know what is the
> performance difference, because creating partitions is kind of waste of
> space.
It ma
Because we have a good thing going, our Ceph clusters are still
running Firefly on all of our clusters including our largest, all-SSD
cluster.
If I understand right, newer versions of Ceph make much better use of
SSDs and give overall much higher performance on the same equipment.
However, the imp
On Wed, Sep 30, 2015 at 8:19 AM, Mark Nelson wrote:
> FWIW, I've mentioned to Supermicro that I would *really* love a version of the
> 5018A-AR12L that replaced the Atom with an embedded Xeon-D 1540. :)
Is even that enough? (It's a serious question; due to our insatiable
need for IOPs rather tha
On Thu, Sep 3, 2015 at 3:49 PM, Gurvinder Singh
wrote:
>> The density would be higher than the 36 drive units but lower than the
>> 72 drive units (though with shorter rack depth afaik).
> You mean the 1U solution with 12 disk is longer in length than 72 disk
> 4U version ?
This is a bit old and
On Fri, Jul 17, 2015 at 12:19 PM, Mark Nelson wrote:
> Maybe try some iperf tests between the different OSD nodes in your
> cluster and also the client to the OSDs.
This proved to be an excellent suggestion. One of these is not like the others:
f16 inbound: 6Gbps
f16 outbound: 6Gbps
f17 inbound
On Fri, Jul 17, 2015 at 11:15 AM, Quentin Hartman
wrote:
> That looks a lot like what I was seeing initially. The OSDs getting marked
> out was relatively rare and it took a bit before I saw it.
Our problem is "most of the time" and does not appear confined to a
specific ceph cluster node or OSD:
On Fri, Jul 17, 2015 at 10:47 AM, Quentin Hartman
wrote:
> What does "ceph status" say?
Usually it says everything is cool. However just now it gave this:
cluster e9c32e63-f3eb-4c25-b172-4815ed566ec7
health HEALTH_WARN 2 requests are blocked > 32 sec
monmap e3: 3 mons at
{f16=192.
On Fri, Jul 17, 2015 at 10:21 AM, Mark Nelson wrote:
> rados -p 30 bench write
>
> just to see how it handles 4MB object writes.
Here's that, from the VM host:
Total time run: 52.062639
Total writes made: 66
Write size: 4194304
Bandwidth (MB/sec): 5.071
Stddev Ban
This is the same cluster I posted about back in April. Since then,
the situation has gotten significantly worse.
Here is what iostat looks like for the one active RBD image on this cluster:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
avgrq-sz avgqu-sz await r_await w_awai
On Fri, Apr 24, 2015 at 10:58 AM, Nick Fisk wrote:
> 7.2k drives tend to do about 80 iops at 4kb IO sizes, as the IO size
> increases the number of iops will start to fall. You will probably get
> around 70 iops for 128kb. But please benchmark your raw disks to get some
> accurate numbers if neede
On Fri, Apr 24, 2015 at 6:39 AM, Nick Fisk wrote:
> From the Fio runs, I see you are getting around 200 iops at 128kb write io
> size. I would imagine you should be getting somewhere around 200-300 iops
> for the cluster you posted in the initial post, so it looks like its
> performing about right
On Thu, Apr 23, 2015 at 4:23 PM, Mark Nelson wrote:
> If you want to adjust the iodepth, you'll need to use an asynchronous
> ioengine like libaio (you also need to use direct=1)
Ah yes, libaio makes a big difference. With 1 job:
testfile: (g=0): rw=randwrite, bs=128K-128K/128K-128K/128K-128K,
On Thu, Apr 23, 2015 at 3:05 PM, Nick Fisk wrote:
> I have had a look through the fio runs, could you also try and run a couple
> of jobs with iodepth=64 instead of numjobs=64. I know they should do the
> same thing, but the numbers with the former are easier to understand.
Maybe it's an issue of
On Thu, Apr 23, 2015 at 3:05 PM, Nick Fisk wrote:
> If you can let us know the avg queue depth that ZFS is generating that will
> probably give a good estimation of what you can expect from the cluster.
How would that be measured?
> I have had a look through the fio runs, could you also try and
On Wed, Apr 22, 2015 at 4:07 PM, Somnath Roy wrote:
> I am suggesting synthetic workload like fio to run on top of VM to identify
> where the bottleneck is. For example, if fio is giving decent enough output,
> I guess ceph layer is doing fine. It is your client that is not driving
> enough.
A
On Wed, Apr 22, 2015 at 4:30 PM, Nick Fisk wrote:
> I suspect you are hitting problems with sync writes, which Ceph isn't known
> for being the fastest thing for.
There's "not being the fastest thing" and "an expensive cluster of
hardware that performs worse than a single SATA drive." :-(
> I'm
On Wed, Apr 22, 2015 at 2:16 PM, Gregory Farnum wrote:
> Uh, looks like it's the contents of the "omap" directory (inside of
> "current") are the levelDB store. :)
OK, here's du -sk of all of those:
36740 ceph-0/current/omap
35736 ceph-1/current/omap
37356 ceph-2/current/omap
38096 ceph-3/curren
On Wed, Apr 22, 2015 at 2:54 PM, Somnath Roy wrote:
> What ceph version are you using ?
Firefly, 0.80.9.
> Could you try with rbd_cache=false or true and see if behavior changes ?
As this is ZFS, running a cache layer below it that it is not aware of
violates data integrity and can cause corrup
A very small 3-node Ceph cluster with this OSD tree:
http://pastebin.com/mUhayBk9
has some performance issues. All 27 OSDs are 5TB SATA drives, it
keeps two copies of everything, and it's really only intended for
nearline backups of large data objects.
All of the OSDs look OK in terms of utiliz
On Thu, Apr 16, 2015 at 8:02 PM, Gregory Farnum wrote:
> Since I now realize you did a bunch of reweighting to try and make
> data match up I don't think you'll find something like badly-sized
> LevelDB instances, though.
It's certainly something I can check, just to be sure. Erm, what does
a Le
On Wed, Apr 22, 2015 at 7:12 AM, Stefan Priebe - Profihost AG
wrote:
> Also a reweight-by-utilization does nothing.
As a fellow sufferer from this issue, mostly what I can offer you is
sympathy rather than actual help. However, this may be beneficial:
By default, reweight-by-utilization only al
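The command takes an optional threshold argument, where the default of 120 means only OSDs above 120% of mean utilization are adjusted. A sketch with 110 as an illustrative value:

```shell
# Lowering the threshold widens the set of OSDs eligible for
# reweighting. The dry-run variant exists in newer releases only;
# the 110 value here is illustrative.
ceph osd test-reweight-by-utilization 110   # preview the changes
ceph osd reweight-by-utilization 110        # apply them
```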
On Thu, Apr 9, 2015 at 7:20 PM, Gregory Farnum wrote:
> Okay, but 118/85 = 1.38. You say you're seeing variance from 53%
> utilization to 96%, and 53%*1.38 = 73.5%, which is *way* off your
> numbers.
53% to 96% is with all weights set to default (i.e. disk size) and all
reweights set to 1. (I.e.
On Wed, Apr 8, 2015 at 11:40 AM, Gregory Farnum wrote:
> "ceph pg dump" will output the size of each pg, among other things.
Among many other things. :)
Here is the raw output, in case I'm misinterpreting it:
http://pastebin.com/j4ySNBdQ
It *looks* like the pg's are roughly uniform in size. T
On Wed, Apr 8, 2015 at 11:33 AM, Gregory Farnum wrote:
> Is this a problem with your PGs being placed unevenly, with your PGs being
> sized very differently, or both?
Please forgive the silly question, but how would one check that?
Thanks!
Getting placement groups to be placed evenly continues to be a major
challenge for us, bordering on impossible.
When we first reported trouble with this, the ceph cluster had 12
OSD's (each Intel DC S3700 400GB) spread across three nodes. Since
then, it has grown to 8 nodes with 38 OSD's.
The av
On Wed, Jan 21, 2015 at 5:53 PM, Gregory Farnum wrote:
> Depending on how you configured things it's possible that the min_size
> is also set to 2, which would be bad for your purposes (it should be
> at 1).
This was exactly the problem. Setting min_size=1 (which I believe
used to be the default
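For reference, the setting is per-pool; the pool name below is illustrative:

```shell
# Allow writes while only one replica is up (for a size=2 pool).
# Trade-off: min_size=1 widens the window in which losing the sole
# remaining copy loses data, so use it knowingly.
ceph osd pool set rbd min_size 1
ceph osd pool get rbd min_size
```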
A couple of weeks ago, we had some involuntary maintenance come up
that required us to briefly turn off one node of a three-node ceph
cluster.
To our surprise, this resulted in failure to write on the VM's on that
ceph cluster, even though we set noout before the maintenance.
This cluster is for
On Tue, Sep 2, 2014 at 3:47 PM, Alfredo Deza wrote:
> This is an actual issue, so I created:
>
> http://tracker.ceph.com/issues/9319
>
> And should be fixing it soon.
Thank you!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com
On Tue, Sep 2, 2014 at 2:50 PM, Konrad Gutkowski
wrote:
> You need to set higher priority for ceph repo, check "ceph-deploy with
> --release (--stable) for dumpling?" thread.
Right, this is the same issue as that. It looks like the 0.80.1
packages are coming from Ubuntu; this is the first time w
On Tue, Sep 2, 2014 at 1:00 PM, Alfredo Deza wrote:
> correct, if you don't specify what release you want/need, ceph-deploy
> will use the latest stable release (firefly as of this writing)
So, ceph-deploy set up emperor repositories in
/etc/apt/sources.list.d/ceph.list and then didn't use them?
While adding some nodes to a ceph emperor cluster using ceph-deploy,
the new nodes somehow wound up with 0.80.1, which I think is a Firefly
release.
The ceph version on existing nodes:
$ ceph --version
ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
The repository on the new nodes
On Fri, Aug 29, 2014 at 2:53 AM, Christian Balzer wrote:
>> Now, 1200 is not a power of two, but it makes sense. (12 x 100).
> Should have been 600 and then upped to 1024.
At the time, there was a reason why doing that did not work, but I
don't remember the specifics. All messages sent back in
On Fri, Aug 29, 2014 at 12:52 AM, pragya jain wrote:
> #2: why odd no. of monitors are recommended for production cluster, not even
> no.?
Because to achieve a quorum, you must always have participation of
more than 50% of the monitors. Not 50%. More than 50%. With an even
number of monitors,
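The strict-majority rule can be made concrete in a few lines (a sketch of the arithmetic, not Ceph code):

```python
def quorum_size(n_mons: int) -> int:
    """Smallest strict majority of n_mons monitors (> 50%)."""
    return n_mons // 2 + 1

def tolerable_failures(n_mons: int) -> int:
    """Monitors that can fail while a quorum can still form."""
    return n_mons - quorum_size(n_mons)

# An even monitor count tolerates no more failures than the odd
# count just below it, so the extra monitor buys nothing:
for n in (1, 2, 3, 4, 5):
    print(n, quorum_size(n), tolerable_failures(n))
```

Three monitors and four monitors both tolerate exactly one failure, which is why odd counts are recommended.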
On Thu, Aug 28, 2014 at 10:47 PM, Christian Balzer wrote:
>> There are 1328 PG's in the pool, so about 110 per OSD.
>>
> And just to be pedantic, the PGP_NUM is the same?
Ah, "ceph status" reports 1328 pgs. But:
$ sudo ceph osd pool get rbd pg_num
pg_num: 1200
$ sudo ceph osd pool get rbd pgp_n
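If that truncated query shows pgp_num lagging pg_num, placement only uses pgp_num groups even though more PGs exist. Bringing them in line is one command (pool name and count taken from the transcript above):

```shell
# pgp_num is the PG count actually used when hashing objects to
# placements; raising it to match pg_num triggers rebalancing.
ceph osd pool set rbd pgp_num 1200
```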
On Thu, Aug 28, 2014 at 7:00 PM, Robert LeBlanc wrote:
> How many PGs do you have in your pool? This should be about 100/OSD.
There are 1328 PG's in the pool, so about 110 per OSD.
Thanks!
Hello,
Is there any way to provoke a ceph cluster to level out its OSD usage?
Currently, a cluster of 3 servers with 4 identical OSDs each is
showing disparity of about 20% between the most-used OSD and the
least-used OSD. This wouldn't be too big of a problem, but the
most-used OSD is now at 86