[ceph-users] All shards of PG missing object and inconsistent

2018-09-21 Thread Thomas White
Hi all, I have recently performed a few tasks, namely purging several buckets from our RGWs and adding additional hosts into Ceph, causing some data movement for a rebalance. As this is now almost completed, I kicked off some deep scrubs and one PG is now returning the following information:

[ceph-users] Bluestore DB showing as ssd

2018-09-21 Thread Brett Chancellor
Hi all. Quick question about OSD metadata information. I have several OSDs set up with the data dir on HDD and the DB going to a partition on SSD. But when I look at the metadata for all the OSDs, it's showing the DB as "hdd". Does this affect anything? And is there any way to change it? $ sudo
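For anyone wanting to check this on their own cluster, a minimal sketch (the OSD id 12 is only a placeholder); the hdd/ssd value reported in the metadata is normally derived from the kernel's rotational flag for the underlying device:

  $ ceph osd metadata 12 | grep -E 'rotational|bluefs_db'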

Re: [ceph-users] crush map reclassifier

2018-09-21 Thread Paul Emmerich
I've used a crush location hook script to handle this before device classes existed. It checked the device type on startup and assigned the CRUSH position based on that. I no longer have that crush map anywhere, but the basic version of it looked like this: two roots, "hdd" and "ssd". The
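Roughly, such a hook is a small executable pointed to by "crush location hook" in ceph.conf; it is called per OSD and prints a CRUSH location on stdout. A sketch under those assumptions (this is not Paul's original script; the device detection and naming below are illustrative only):

  #!/bin/sh
  # Illustrative sketch, not the original hook. Ceph invokes the hook with
  # arguments like "--cluster ceph --id <osd-id> --type osd" and expects
  # "key=value" CRUSH location pairs on stdout.
  while [ $# -gt 0 ]; do
      case "$1" in
          --id) OSD_ID="$2"; shift ;;
      esac
      shift
  done
  # Decide hdd vs ssd from the rotational flag of the OSD's backing device
  # (assumes a plain /dev/sdXN partition; LVM/NVMe layouts need more work).
  DEV=$(findmnt -n -o SOURCE "/var/lib/ceph/osd/ceph-${OSD_ID}" | sed -e 's|^/dev/||' -e 's/[0-9]*$//')
  if [ "$(cat /sys/block/${DEV}/queue/rotational 2>/dev/null)" = "0" ]; then
      ROOT=ssd
  else
      ROOT=hdd
  fi
  echo "host=$(hostname -s)-${ROOT} root=${ROOT}"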

[ceph-users] crush map reclassifier

2018-09-21 Thread Sage Weil
Hi everyone, In Luminous we added the CRUSH device classes that automagically categorize your OSDs as hdd, ssd, etc., and allow you to write CRUSH rules that target a subset of devices. Prior to this it was necessary to make custom edits to your CRUSH map with parallel hierarchies for each OSD
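As a minimal sketch of what device classes enable (the rule name and pool name here are only examples):

  $ ceph osd crush rule create-replicated fast-ssd default host ssd
  $ ceph osd pool set <pool> crush_rule fast-ssd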

[ceph-users] radosgw REST API to retrieve rgw log entries

2018-09-21 Thread Jin Mao
I am looking for an API equivalent of 'radosgw-admin log list' and 'radosgw-admin log show'. The existing /usage API only reports bucket-level numbers, like 'radosgw-admin usage show' does. Does anyone know if this is possible from the REST API? Thanks. Jin.

Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
On Friday, 21 September 2018 at 19:45 +0200, Paul Emmerich wrote: > The cache tiering has nothing to do with the PG of the underlying > pool > being incomplete. > You are just seeing these requests as stuck because it's the only > thing trying to write to the underlying pool. I agree, it was

Re: [ceph-users] Proxmox/ceph upgrade and addition of a new node/OSDs

2018-09-21 Thread Fabian Grünbichler
On Fri, Sep 21, 2018 at 09:03:15AM +0200, Hervé Ballans wrote: > Hi MJ (and all), > > So we upgraded our Proxmox/Ceph cluster, and if we have to summarize the > operation in a few words: overall, everything went well :) > The most critical operation of all is the 'osd crush tunables optimal', I

Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Paul Emmerich
The cache tiering has nothing to do with the PG of the underlying pool being incomplete. You are just seeing these requests as stuck because it's the only thing trying to write to the underlying pool. What you need to fix is the PG showing incomplete. I assume you already tried reducing the
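The snippet is cut off here. For general context on "incomplete" PGs, one commonly checked knob is the pool's min_size; a minimal sketch, with the EC pool name as a placeholder (lowering min_size is a temporary recovery measure and should be reverted once the PG has recovered):

  $ ceph osd pool get <ec-pool> min_size
  $ ceph osd pool set <ec-pool> min_size 4    # temporary, for a 4+2 profile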

Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
So I've totally disabled cache-tiering and overlay. Now OSDs 68 & 69 are fine, no longer blocked. But OSD 32 is still blocked, and PG 37.9c is still marked incomplete with: "recovery_state": [ { "name": "Started/Primary/Peering/Incomplete", "enter_time": "2018-09-21

Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Maks Kowalik
According to the query output you pasted, shards 1 and 2 are broken. But, on the other hand, the EC profile (4+2) should make it possible to recover from 2 shards lost simultaneously... On Fri, 21 Sep 2018 at 16:29, Olivier Bonvalet wrote: > Well, on disk I can find those parts: > > - cs0 on OSD 29
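To double-check what k and m the pool actually uses, the erasure-code profile can be inspected (the profile name is a placeholder):

  $ ceph osd erasure-code-profile ls
  $ ceph osd erasure-code-profile get <profile-name>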

Re: [ceph-users] No fix for 0x6706be76 CRCs ? [SOLVED] (WORKAROUND)

2018-09-21 Thread Alfredo Daniel Rezinovsky
I have Ubuntu servers. With ukuu I installed kernel 4.8.17-040817 (the last available kernel < 4.9) and I haven't had any 0x6706be76 CRCs since, nor any inconsistencies. On 19/09/18 12:01, Alfredo Daniel Rezinovsky wrote: Tried 4.17 with the same problem. Just downgraded to 4.8. Let's see if no

Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
Well, on disk I can find those parts: - cs0 on OSD 29 and 30 - cs1 on OSD 18 and 19 - cs2 on OSD 13 - cs3 on OSD 66 - cs4 on OSD 0 - cs5 on OSD 75 And I can read those files too. And all those OSDs are UP and IN. On Friday, 21 September 2018 at 13:10, Eugen Block wrote: > > > I

Re: [ceph-users] rbd-nbd map question

2018-09-21 Thread Vikas Rana
Hi, I’m using 10.2.10. Thx On Fri, Sep 21, 2018 at 9:14 AM Mykola Golub wrote: > Vikas, could you tell what version you observe this on? > > Because I can reproduce this only on jewel, and it has been fixed > since luminous 12.2.1 [1]. > > [1] http://tracker.ceph.com/issues/20426 > >

Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
Yep: pool 38 'cache-bkp-foo' replicated size 3 min_size 2 crush_rule 26 object_hash rjenkins pg_num 128 pgp_num 128 last_change 585369 lfor 68255/68255 flags hashpspool,incomplete_clones tier_of 37 cache_mode readproxy target_bytes 209715200 hit_set bloom{false_positive_probability: 0.05,

Re: [ceph-users] Dashboard Object Gateway

2018-09-21 Thread Volker Theile
Hi Hendrik, thank you for reporting the issue. I've opened a tracker issue for that, see https://tracker.ceph.com/issues/36109. As a workaround, manually configure the host and port via the CLI using "ceph dashboard set-rgw-api-host" and "ceph dashboard set-rgw-api-port". Regards Volker On 18.09.2018
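Spelled out with placeholder values (substitute the real RGW endpoint), the workaround would be:

  $ ceph dashboard set-rgw-api-host <rgw-host>
  $ ceph dashboard set-rgw-api-port <rgw-port>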

Re: [ceph-users] customized ceph cluster name by ceph-deploy

2018-09-21 Thread Paul Emmerich
Cluster names are deprecated, don't use them. I think they might have been removed with ceph-deploy 2.x (?) Paul On Fri, 21 Sep 2018 at 15:13, Joshua Chen wrote: > > Hi all, > I am using ceph-deploy 2.0.1 to create my testing cluster with this command: > > ceph-deploy --cluster

Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Eugen Block
I also switched the cache tier to "readproxy", to avoid using this cache. But it's still blocked. You could change the cache mode to "none" to disable it. Could you paste the output of: ceph osd pool ls detail | grep cache-bkp-foo Quoting Olivier Bonvalet: In fact, one object (only
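If disabling the tier entirely turns out to be necessary, the mode change Eugen mentions would look roughly like this (a sketch; depending on the release the safety flag may or may not be required):

  $ ceph osd tier cache-mode cache-bkp-foo none --yes-i-really-mean-it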

Re: [ceph-users] rbd-nbd map question

2018-09-21 Thread Mykola Golub
Vikas, could you tell what version you observe this on? Because I can reproduce this only on jewel, and it has been fixed since luminous 12.2.1 [1]. [1] http://tracker.ceph.com/issues/20426 On Wed, Sep 19, 2018 at 03:48:44PM -0400, Jason Dillaman wrote: > Thanks for reporting this

[ceph-users] customized ceph cluster name by ceph-deploy

2018-09-21 Thread Joshua Chen
Hi all, I am using ceph-deploy 2.0.1 to create my testing cluster with this command: ceph-deploy --cluster pescadores new --cluster-network 100.109.240.0/24 --public-network 10.109.240.0/24 cephmon1 cephmon2 cephmon3 but the --cluster pescadores (name of the cluster) doesn't seem to work.

Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Eugen Block
I tried to flush the cache with "rados -p cache-bkp-foo cache-flush-evict-all", but it blocks on the object "rbd_data.f66c92ae8944a.000f2596". This is the object that's stuck in the cache tier (according to your output in https://pastebin.com/zrwu5X0w). Can you verify if that block

Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Maks Kowalik
Could you please paste the output of pg 37.9c query? On Fri, 21 Sep 2018 at 14:39, Olivier Bonvalet wrote: > In fact, one object (only one) seems to be blocked on the cache tier > (writeback). > > I tried to flush the cache with "rados -p cache-bkp-foo cache-flush- > evict-all", but it blocks on
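For reference, the query being asked for is:

  $ ceph pg 37.9c query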

Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
In fact, one object (only one) seems to be blocked on the cache tier (writeback). I tried to flush the cache with "rados -p cache-bkp-foo cache-flush-evict-all", but it blocks on the object "rbd_data.f66c92ae8944a.000f2596". So I reduced (a lot) the cache tier to 200MB, "rados -p

Re: [ceph-users] Hyper-v ISCSI support

2018-09-21 Thread Jason Dillaman
On Fri, Sep 21, 2018 at 6:48 AM Glen Baars wrote: > > Hello Ceph Users, > > > > We have been using ceph-iscsi-cli for some time now with vmware and it is > performing ok. > > > > We would like to use the same iscsi service to store our Hyper-v VMs via > windows clustered shared volumes. When we

Re: [ceph-users] Hyper-v ISCSI support

2018-09-21 Thread Maged Mokhtar
Hi Glen, Yes you need clustered SCSI-3 persistent reservations support. This is supported in SUSE SLE kernels, you may also be interested in PetaSAN: http://www.petasan.org which is based on these kernels. Maged On 21/09/18 12:48, Glen Baars wrote: Hello Ceph Users, We have been using

Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
OK, so it's a replica 3 pool, and OSDs 68 & 69 are on the same host. On Friday, 21 September 2018 at 11:09, Eugen Block wrote: > > cache-tier on this pool has 26GB of data (for 5.7TB of data on the > > EC > > pool). > > We tried to flush the cache tier, and restart OSDs 68 & 69, without >

Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Eugen Block
cache-tier on this pool has 26GB of data (for 5.7TB of data on the EC pool). We tried to flush the cache tier, and restart OSDs 68 & 69, without any success. I meant the replication size of the pool: ceph osd pool ls detail | grep ... In the experimental state of our cluster we had a cache tier

Re: [ceph-users] Remotely tell an OSD to stop ?

2018-09-21 Thread Nicolas Huillard
Thanks! I was in the process of upgrading, so "noout" was already set, probably preventing setting "noin". I thus just did "ceph osd set noup", then "ceph osd down ", which stopped activity on the disks (probably not enough to clean everything in BlueStore, but I decided to trust its inner workings). I

Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
Hi, cache-tier on this pool has 26GB of data (for 5.7TB of data on the EC pool). We tried to flush the cache tier, and restart OSDs 68 & 69, without any success. But I don't see any related data on the cache-tier OSDs (filestore) with: find /var/lib/ceph/osd/ -maxdepth 3 -name '*37.9c*' I

[ceph-users] Hyper-v ISCSI support

2018-09-21 Thread Glen Baars
Hello Ceph Users, We have been using ceph-iscsi-cli for some time now with VMware and it is performing OK. We would like to use the same iSCSI service to store our Hyper-V VMs via Windows clustered shared volumes. When we add the volume to Windows failover manager we get a device is not ready

Re: [ceph-users] ceph-ansible

2018-09-21 Thread Alfredo Deza
On Thu, Sep 20, 2018 at 7:04 PM solarflow99 wrote: > > oh, was that all it was... git clone https://github.com/ceph/ceph-ansible/ > I installed the notario package from EPEL, > python2-notario-0.0.11-2.el7.noarch and that's the newest they have Hey Ken, I thought the latest versions were

Re: [ceph-users] Remotely tell an OSD to stop ?

2018-09-21 Thread Alexandru Cucu
Hi, You won't be able to stop them, but if the OSDs are still running I would just set them as out, wait for all data to be moved from them and then it should be safe to power off the host. --- Alex On Fri, Sep 21, 2018 at 11:50 AM Nicolas Huillard wrote: > > Hi all, > > One of my server

Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Eugen Block
Hi Olivier, how large is the cache tier? You could set the cache mode to forward and flush it; maybe restarting those OSDs (68, 69) helps, too. Or there could be an issue with the cache tier; what do those logs say? Regards, Eugen Quoting Olivier Bonvalet: Hello, on a Luminous

Re: [ceph-users] Remotely tell an OSD to stop ?

2018-09-21 Thread Patrick Nawracay
Hi, you'll need to set `noup` to prevent OSDs from being marked up again automatically. The `noin` flag prevents the cluster from setting the OSD `in` again after it has been set `out`. `ceph osd set noup` before `ceph osd down `, and `ceph osd set noin` before `ceph osd out `. Those global flags
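Putting those steps together for a single OSD, the sequence would look something like this (the OSD id 12 is only a placeholder; unset the flags once maintenance is finished):

  $ ceph osd set noup
  $ ceph osd down 12
  $ ceph osd set noin
  $ ceph osd out 12
  # ... maintenance / reboot ...
  $ ceph osd unset noup
  $ ceph osd unset noin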

[ceph-users] Remotely tell an OSD to stop ?

2018-09-21 Thread Nicolas Huillard
Hi all, One of my servers crashed its root filesystem, i.e. the currently open shell just says "command not found" for any basic command (ls, df, mount, dmesg, etc.). ACPI soft power-off won't work because it needs scripts on /... Before I reset the hardware, I'd like to cleanly stop the OSDs on

[ceph-users] how dynamic bucket sharding works

2018-09-21 Thread Tomasz Płaza
Hi Cephers, Could someone explain to me how dynamic bucket index sharding works? I have created a test bucket with 4 million objects on Ceph 12.2.8; it showed 80 shards (ver, master_ver, max_marker from 0 to 79 in bucket stats) and I left it for a night. The next morning I found this in
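For checking what dynamic resharding has done so far, these are the usual starting points (the bucket name is a placeholder; exact output varies by release):

  $ radosgw-admin bucket stats --bucket=<bucket>
  $ radosgw-admin bucket limit check
  $ radosgw-admin reshard list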

Re: [ceph-users] backup ceph

2018-09-21 Thread Oliver Freyermuth
Hi, On 21.09.18 at 03:28, ST Wong (ITSC) wrote: > Hi, > >>> Will the RAID 6 be mirrored to another storage in remote site for DR >>> purpose? >> >> Not yet. Our goal is to have the backup ceph to which we will replicate >> spread across three different buildings, with 3 replicas. > > May I

Re: [ceph-users] Proxmox/ceph upgrade and addition of a new node/OSDs

2018-09-21 Thread mj
Hi Hervé! Thanks for the detailed summary, much appreciated! Best, MJ On 09/21/2018 09:03 AM, Hervé Ballans wrote: Hi MJ (and all), So we upgraded our Proxmox/Ceph cluster, and if we have to summarize the operation in a few words: overall, everything went well :) The most critical

Re: [ceph-users] Proxmox/ceph upgrade and addition of a new node/OSDs

2018-09-21 Thread Hervé Ballans
Hi MJ (and all), So we upgraded our Proxmox/Ceph cluster, and if we have to summarize the operation in a few words: overall, everything went well :) The most critical operation of all is the 'osd crush tunables optimal'; I talk about it in more detail below... The Proxmox documentation is
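For context, the tunables change itself is a single command, and the resulting rebalance can be followed from another terminal:

  $ ceph osd crush tunables optimal
  $ ceph -w    # or 'ceph -s' to check progress periodically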