Re: [ceph-users] Need to replace OSD. How do I find physical disk

2019-07-19 Thread Tarek Zegar
On the host with the OSD run: ceph-volume lvm list From: "☣Adam" To: ceph-users@lists.ceph.com Date: 07/18/2019 03:25 PM Subject: [EXTERNAL] Re: [ceph-users] Need to replace OSD. How do I find physical disk Sent by: "ceph-users" The block device can
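For reference, a minimal sketch of mapping an OSD back to its physical disk with ceph-volume (the OSD id and device path mentioned in the comments are hypothetical):

    # On the host that carries the OSD:
    ceph-volume lvm list
    # Each block in the output names an OSD (e.g. osd.3) and includes a
    # "devices" line with the backing physical disk (e.g. /dev/sdb) --
    # that is the drive to pull and replace.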

Re: [ceph-users] Client admin socket for RBD

2019-06-25 Thread Tarek Zegar
advanced mon_osd_down_out_interval 30 mon.hostmonitor1 advanced mon_osd_min_in_ratio 0.10 root@hostmonitor1:~# ceph config get client.admin WHO MASK LEVEL OPTION VALUE RO <-blank What am I missing from what you're suggesting? Thank you for
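For context, a running client's effective configuration is usually read through its admin socket rather than `ceph config get`; a rough sketch, assuming the admin socket has been enabled for the client (the socket path below is an assumption, adjust to your setup):

    # In ceph.conf on the client node:
    # [client]
    #     admin socket = /var/run/ceph/$cluster-$type.$id.$pid.asok
    # Then, against the running client process:
    ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.asok config show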

Re: [ceph-users] Enable buffered write for bluestore

2019-06-13 Thread Tarek Zegar
http://docs.ceph.com/docs/master/rbd/rbd-config-ref/ From: Trilok Agarwal To: ceph-users@lists.ceph.com Date: 06/12/2019 07:31 PM Subject:[EXTERNAL] [ceph-users] Enable buffered write for bluestore Sent by:"ceph-users" Hi How can we enable bluestore_default_buffer
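For reference, a rough sketch of setting the option through the central config database, assuming the option the preview truncates is bluestore_default_buffered_write (verify the exact name against the docs linked above):

    ceph config set osd bluestore_default_buffered_write true
    ceph config get osd.0 bluestore_default_buffered_write   # spot-check one (hypothetical) OSD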

Re: [ceph-users] Reweight OSD to 0, why doesn't report degraded if UP set under Pool Size

2019-06-09 Thread Tarek Zegar
Hi Huang, so you are suggesting that even though osd.4 in this case has weight 0, it's still getting new data written to it? I find that counter to what weight 0 means. Thanks Tarek From: huang jun To: Tarek Zegar Cc: Paul Emmerich , Ceph Users Date:

Re: [ceph-users] Reweight OSD to 0, why doesn't report degraded if UP set under Pool Size

2019-06-07 Thread Tarek Zegar
p TOTAL 90 GiB 15 GiB 6.2 GiB 61 KiB 9.0 GiB 75 GiB 16.92 MIN/MAX VAR: 0.89/1.13 STDDEV: 1.32 Tarek Zegar Senior SDS Engineer Email tze...@us.ibm.com Mobile 630.974.7172 From: Paul Emmerich To: Tarek Zegar Cc: Ceph Users Date: 06/07/2019 05:25 AM Subjec

[ceph-users] Reweight OSD to 0, why doesn't report degraded if UP set under Pool Size

2019-06-06 Thread Tarek Zegar
For testing purposes I set a bunch of OSDs to 0 weight, which correctly forces Ceph to not use those OSDs. I took enough out such that the UP set only had the pool's min_size number of OSDs (i.e. 2 OSDs). Two questions: 1. Why doesn't the acting set eventually match the UP set and simply point to [6,5] only? 2. Why
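For reproducing this, a minimal sketch (the OSD id and PG id are hypothetical, and whether the test used the override weight or the CRUSH weight isn't visible in the preview):

    ceph osd reweight 4 0              # override weight; CRUSH alternative: ceph osd crush reweight osd.4 0
    ceph pg map 1.0                    # prints that PG's "up" and "acting" OSD sets
    ceph pg dump pgs_brief             # up/acting for every PG at once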

Re: [ceph-users] Fix scrub error in bluestore.

2019-06-06 Thread Tarek Zegar
Look here http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#pgs-inconsistent A read error is typically a disk issue. The doc is not clear on how to resolve that From: Alfredo Rezinovsky To: Ceph Users Date: 06/06/2019 10:58 AM Subject:[EXTERNAL] [ce
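For reference, the usual sequence from that troubleshooting page, sketched with a hypothetical PG id (a genuine read error still points at failing media, so check SMART/dmesg as well):

    rados list-inconsistent-obj 2.1f --format=json-pretty   # identify the bad object/shard
    ceph pg repair 2.1f                                     # rewrite the bad copy from a good replica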

Re: [ceph-users] Balancer: uneven OSDs

2019-05-29 Thread Tarek Zegar
:06:54.327 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/upmap_max_iterations 2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/upmap_max_deviation 2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr[balancer] pools ['rbd'] 2019-05-29 17:06:54.327 7f40cd3e870

Re: [ceph-users] Balancer: uneven OSDs

2019-05-29 Thread Tarek Zegar
ions/upmap/ Of course, please be aware that your clients must be recent enough (especially for kernel clients). Sadly, if the compat-level is too old for upmap, you'll only find a small warning about that in the logfiles, but no error on the terminal when activating the balancer or any other
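For reference, a quick way to check and raise the client compatibility level before relying on upmap (a rough sketch; run `ceph features` first so you don't lock out older clients):

    ceph features                                    # feature release reported by each connected client
    ceph osd set-require-min-compat-client luminous  # required for the upmap balancer mode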

[ceph-users] Balancer: uneven OSDs

2019-05-29 Thread Tarek Zegar
Can anyone help with this? Why can't I optimize this cluster? The PG counts and data distribution are way off. I enabled the balancer plugin and even tried to manually invoke it, but it won't allow any changes. Looking at ceph osd df, it's not even at all. Thoughts? root@hostadm

[ceph-users] Balancer: uneven OSDs

2019-05-28 Thread Tarek Zegar
I enabled the balancer plugin and even tried to manually invoke it, but it won't allow any changes. Looking at ceph osd df, it's not even at all. Thoughts? root@hostadmin:~# ceph osd df ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS 1 hdd 0.0098000 B 0 B 0
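For reference, a minimal sketch of turning the balancer on and checking whether it is actually allowed to make changes (command set as in Mimic/Nautilus):

    ceph mgr module enable balancer
    ceph balancer mode upmap        # or crush-compat for older clients
    ceph balancer on
    ceph balancer status            # shows mode, whether it's active, and queued plans
    ceph balancer eval              # distribution score; lower is better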

[ceph-users] PG stuck in Unknown after removing OSD - Help?

2019-05-20 Thread Tarek Zegar
Set 3 OSDs to "out"; all were on the same host and should not impact the pool because it's 3x replication and CRUSH places one OSD per host. However, now we have one PG stuck UNKNOWN. Not sure why this is the case; I did have background writes going on at the time of the OSD out. Thoughts? ceph osd tree ID
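For reference, a few commands that help narrow down a PG stuck unknown (the PG id below is hypothetical):

    ceph health detail          # names the stuck PG(s)
    ceph pg map 1.2a            # shows which OSDs the PG currently maps to
    ceph pg 1.2a query          # per-PG state detail (may not return if no OSD reports the PG)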

Re: [ceph-users] Lost OSD from PCIe error, recovered, HOW to restore OSD process

2019-05-16 Thread Tarek Zegar
ceph-volume lvm activate --all 6. You should see the drive somewhere in the ceph tree, move it to the right host Tarek From: "Tarek Zegar" To: Alfredo Deza Cc: ceph-users Date: 05/15/2019 10:32 AM Subject:[EXTERNAL] Re: [ceph-users] Lost OSD from PCIe erro
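For that last step, a rough sketch of relocating the OSD under the correct host bucket (the weight and hostname are placeholders; osd.122 is the id from this thread):

    ceph osd tree                                                       # find where osd.122 landed
    ceph osd crush set osd.122 0.80 root=default host=<correct-host>   # weight and host are placeholders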

Re: [ceph-users] Lost OSD from PCIe error, recovered, to restore OSD process

2019-05-15 Thread Tarek Zegar
-osd@122.service: Start request repeated too quickly. May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service: Failed with result 'exit-code'. May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: Failed to start Ceph object storage daemon osd.122 From: Alfredo Deza
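For reference, once systemd's start limit has been hit, the unit has to have its failure state cleared before it will start again (a sketch; osd.122 is the id from the log above):

    systemctl reset-failed ceph-osd@122.service
    systemctl start ceph-osd@122.service
    journalctl -u ceph-osd@122.service    # review why it kept failing in the first place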

Re: [ceph-users] Rolling upgrade fails with flag norebalance with background IO [EXT]

2019-05-14 Thread Tarek Zegar
https://github.com/ceph/ceph-ansible/issues/3961 <--- created ticket Thanks Tarek From: Matthew Vernon To: Tarek Zegar , solarflo...@gmail.com Cc: ceph-users@lists.ceph.com Date: 05/14/2019 04:41 AM Subject:[EXTERNAL] Re: [ceph-users] Rolling upgrade fails with f

[ceph-users] Lost OSD from PCIe error, recovered, to restore OSD process

2019-05-14 Thread Tarek Zegar
Someone nuked an OSD that had 1-replica PGs. They accidentally did echo 1 > /sys/block/nvme0n1/device/device/remove. We got it back by doing echo 1 > /sys/bus/pci/rescan. However, it re-enumerated as a different drive number (guess we didn't have udev rules). They restored the LVM volume (vgcfgrestore
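The rough recovery sequence described above, sketched end to end (the VG name is a placeholder; the exact vgcfgrestore target depends on what `vgs` and `vgcfgrestore --list` report):

    echo 1 > /sys/bus/pci/rescan          # bring the removed NVMe back
    vgcfgrestore <ceph-vg-name>           # restore the LVM metadata backup
    vgchange -ay <ceph-vg-name>           # activate the volume group
    ceph-volume lvm activate --all        # remount the OSD dir and start the OSD service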

Re: [ceph-users] Rolling upgrade fails with flag norebalance with background IO

2019-05-13 Thread Tarek Zegar
It's not just mimic to nautilus; I confirmed the same with luminous to mimic. They are checking for clean PGs with the flags set; they should unset the flags, then check, set the flags again, and move on to the next OSD. - Original message - From: solarflow99 To: Tarek Zegar Cc: Ceph Users Subject: [EXTERNA

[ceph-users] Ceph MGR CRASH : balancer module

2019-05-13 Thread Tarek Zegar
Hello, my manager keeps dying; the last crash meta log is below. What is causing this? I do have two roots in the OSD tree with shared hosts (see below); I can't imagine that is causing the balancer to fail? meta log: { "crash_id": "2019-05-11_19:09:17.999875Z_aa7afa7c-bc7e-43ec-b32a-821bd47bd68b",
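For reference, if the crash module is available (Nautilus and later), the stored crash metadata can be pulled back out to see the full balancer traceback (the crash id is the one from the log above):

    ceph crash ls
    ceph crash info 2019-05-11_19:09:17.999875Z_aa7afa7c-bc7e-43ec-b32a-821bd47bd68b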

[ceph-users] Rolling upgrade fails with flag norebalance with background IO

2019-05-10 Thread Tarek Zegar
Ceph-ansible 3.2, rolling upgrade mimic -> nautilus. The ansible playbook sets the flag "norebalance". When there is *no* I/O to the cluster, the upgrade works fine. When upgrading with I/O running in the background, some PGs become `active+undersized+remapped+backfilling`. The norebalance flag prevents them from ba
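As a manual workaround while the ceph-ansible issue is open, the flag can be dropped long enough for the remapped PGs to settle between OSD restarts (a sketch; only norebalance is confirmed in this thread, handle any other flags the playbook sets the same way):

    ceph osd unset norebalance    # let the remapped PGs backfill
    ceph -s                       # wait for active+clean
    ceph osd set norebalance      # re-set before restarting the next OSD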

[ceph-users] PG in UP set but not Acting? Backfill halted

2019-05-09 Thread Tarek Zegar
Hello, I've been working with Ceph for only a few weeks and have a small cluster in VMs. I did a ceph-ansible rolling_update to nautilus from mimic and some of my PGs were stuck in 'active+undersized+remapped+backfilling' with no progress. All OSDs were up and in (see ceph tree below). The PGs only had 2
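For reference, a couple of checks that point at the likely cause here (flags left set during a ceph-ansible run are the usual suspect, per the norebalance thread above):

    ceph osd dump | grep flags       # look for norebalance / nobackfill left behind
    ceph pg dump_stuck undersized    # list the PGs that aren't making progress
    ceph osd unset norebalance       # if the flag was left set, backfill resumes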