[ceph-users] 2 pgs stuck in undersized after cluster recovery

2018-06-29 Thread shadow_lin
Hi list, After recovering from losing some of the OSDs I got 2 PGs stuck in undersized. ceph health detail returns PG_DEGRADED Degraded data redundancy: 2 pgs undersized pg 4.2 is stuck undersized for 3081.012062, current state active+undersized, last acting [13] pg 4.33
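A hedged sketch of the usual first diagnostic steps for PGs that stay undersized after recovery (the PG id is from the post; everything else is generic):

    ceph pg 4.2 query           # inspect the up/acting sets and the peering state
    ceph osd pool ls detail     # check the pool's size/min_size and crush_rule
    ceph osd tree               # are enough OSDs up in each failure domain?
    ceph osd crush rule dump    # can the rule still find a second OSD to map to?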

Re: [ceph-users] VMWARE and RBD

2018-06-29 Thread Enrico Kern
We use Ceph iSCSI with VMware, mainly for disks in Veeam to back up some other VMware clusters with local disks. No problems so far. I think it really depends on the use case. Steven Vacaroaia wrote on Fri, 29 Jun 2018, 15:19: > Hi Horace > > Thanks > > Would you be willing to share instructions f

Re: [ceph-users] CephFS+NFS For VMWare

2018-06-29 Thread Paul Emmerich
VMWare can be quite picky about NFS servers. Some things that you should test before deploying anything with that in production: * failover * reconnects after NFS reboots or outages * NFS3 vs NFS4 * Kernel NFS (which kernel version? cephfs-fuse or cephfs-kernel?) vs NFS Ganesha (VFS FSAL vs. Ceph

Re: [ceph-users] radosgw multizone not syncing large bucket completely to other zone

2018-06-29 Thread Enrico Kern
Hmm, that also pops up right away when I restart all radosgw instances. But I will check further and see if I can find something. Maybe I will do the upgrade to Mimic too. That bucket is basically under load on the master zone all the time as we use it as historical storage for Druid, so there is const

[ceph-users] CephFS+NFS For VMWare

2018-06-29 Thread Nick Fisk
This is for us peeps using Ceph with VMWare. My current favoured solution for consuming Ceph in VMWare is via RBDs formatted with XFS and exported via NFS to ESXi. This seems to perform better than iSCSI+VMFS, which seems to not play nicely with Ceph's PG contention issues, particularly if wor
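For reference, a minimal sketch of that RBD + XFS + NFS pattern; pool, image, mount point and export network are placeholders, not taken from the post:

    rbd create --size 1T vmware/nfs01
    rbd map vmware/nfs01            # krbd; older kernels may need some image features disabled
    mkfs.xfs /dev/rbd0
    mkdir -p /export/nfs01 && mount /dev/rbd0 /export/nfs01
    echo '/export/nfs01 10.0.0.0/24(rw,sync,no_root_squash)' >> /etc/exports
    exportfs -ra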

Re: [ceph-users] Problems setting up iSCSI

2018-06-29 Thread Bernhard Dick
On 29.06.2018 at 17:45, Jason Dillaman wrote: The 7d0023e73855a42ac25038403387ab41ca10753a version should be fine, it's what I use in my test environment. I really cannot explain why tcmu-runner is missing the rbd handler. I can only assume you have restarted the daemon after installing it. Th

[ceph-users] Performance tuning for SAN SSD config

2018-06-29 Thread Matthew Stroud
We back some of our ceph clusters with SAN SSD disks, particularly VSP G/F and Purestorage. I'm curious what settings we should look into modifying to take advantage of our SAN arrays. We had to manually set the class for the LUNs to the SSD class, which was a big improvement. However we stil
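For anyone following along, the device-class step mentioned above looks roughly like this on Luminous or newer (OSD ids are placeholders):

    ceph osd crush rm-device-class osd.0 osd.1      # clear any auto-detected class first
    ceph osd crush set-device-class ssd osd.0 osd.1
    ceph osd crush class ls                         # confirm the classes now in use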

Re: [ceph-users] Ceph snapshots

2018-06-29 Thread Paul Emmerich
IIRC it can be changed and takes effect immediately. The message is only an implementation detail: there is no observer registered that explicitly takes some action when it's changed, but the value is re-read anyway. It's been some time since I had to change this value at run time, but I'm pretty sure i

Re: [ceph-users] Ceph snapshots

2018-06-29 Thread Marc Schöchlin
It seems that this might be interesting - unfortunately this cannot be changed dynamically: # ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.025' osd.0: osd_snap_trim_sleep = '0.025000' (not observed, change may require restart) osd.1: osd_snap_trim_sleep = '0.025000' (not observed, change may r
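Since the value is reported as "not observed", a sketch of making it permanent, assuming ceph.conf-based configuration on a Luminous-era cluster (the OSD id is a placeholder):

    # add to the [osd] section of ceph.conf on each OSD host:
    #   osd_snap_trim_sleep = 0.025
    systemctl restart ceph-osd@0
    ceph daemon osd.0 config get osd_snap_trim_sleep   # run on the OSD's host to verify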

Re: [ceph-users] radosgw multizone not syncing large bucket completely to other zone

2018-06-29 Thread Yehuda Sadeh-Weinraub
On Fri, Jun 29, 2018 at 8:48 AM, Enrico Kern wrote: > also when I try to sync the bucket manually I get this error: > > ERROR: sync.run() returned ret=-16 > 2018-06-29 15:47:50.137268 7f54b7e4ecc0 0 data sync: ERROR: failed to > read sync status for bucketname:6a9448d2-bdba-4bec- > aad6-aba72cd8ea

Re: [ceph-users] radosgw multizone not syncing large bucket completely to other zone

2018-06-29 Thread Enrico Kern
Also, when I try to sync the bucket manually I get this error: ERROR: sync.run() returned ret=-16 2018-06-29 15:47:50.137268 7f54b7e4ecc0 0 data sync: ERROR: failed to read sync status for bucketname:6a9448d2-bdba-4bec-aad6-aba72cd8eac6.27150814.1 It works flawlessly with all other buckets. On Fri,
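A few commands that may help narrow this down (the bucket name is a placeholder; run on the affected zone):

    radosgw-admin sync status
    radosgw-admin bucket sync status --bucket=bucketname
    radosgw-admin bucket sync run --bucket=bucketname    # the manual per-bucket sync
    radosgw-admin sync error list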

Re: [ceph-users] Problems setting up iSCSI

2018-06-29 Thread Jason Dillaman
The 7d0023e73855a42ac25038403387ab41ca10753a version should be fine, it's what I use in my test environment. I really cannot explain why tcmu-runner is missing the rbd handler. I can only assume you have restarted the daemon after installing it. Any other log messages from tcmu-runner? Do you have

Re: [ceph-users] Problems setting up iSCSI

2018-06-29 Thread Bernhard Dick
On 29.06.2018 at 17:26, Jason Dillaman wrote: OK, so your tcmu-runner doesn't have support for rbd images for some reason. Where did you get your copy of tcmu-runner? Hm, originally I took it from the gluster40 repository of the CentOS Storage SIG. You should build it from upstream or pull a

Re: [ceph-users] radosgw multizone not syncing large bucket completely to other zone

2018-06-29 Thread Enrico Kern
Hello, thanks for the reply. We have around 200k objects in the bucket. It is not automatically resharded (is that even supported in multisite?). What I see when I run a complete data sync with the debug logs: after a while I see a lot of information that it is unable to perform some log and also some
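To see whether the bucket index itself is the limiting factor, something like the following can help (the bucket name is a placeholder); note that in Luminous, dynamic resharding is generally documented as unsupported in multisite setups:

    radosgw-admin bucket stats --bucket=bucketname   # object count and marker info
    radosgw-admin bucket limit check                 # shard count and fill status per bucket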

Re: [ceph-users] RBD gets resized when used as iSCSI target

2018-06-29 Thread Jason Dillaman
gwcli doesn't allow you to shrink images (it silently ignores you). Use 'rbd resize' and restart the GWs to pick up the new size. On Fri, Jun 29, 2018 at 11:36 AM Wladimir Mutel wrote: > Wladimir Mutel wrote: > > > it back to gwcli/disks), I discover that its size is rounded up to 3 > > TiB, i.
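The rbd side of that advice, as a sketch (pool/image names are placeholders):

    rbd resize --size 2861592M --allow-shrink rbd/iscsi-disk01
    systemctl restart rbd-target-gw    # on each gateway, so LIO picks up the new size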

[ceph-users] crushmap shows wrong osd for PGs (EC-Pool)

2018-06-29 Thread ulembke
Hi all, I had an issue on a Hammer cluster (0.94.9, upgraded from 0.94.7 today). There are three PGs incomplete: root@ceph-06:~# ceph health detail HEALTH_WARN 3 pgs incomplete; 3 pgs stuck inactive; 3 pgs stuck unclean pg 24.cc is stuck inactive for 595902.285007, current state incomplete, l
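A hedged starting point for digging into incomplete PGs (the PG id is from the post):

    ceph pg dump_stuck inactive
    ceph pg 24.cc query     # check down_osds_we_would_probe and the past intervals
    ceph pg map 24.cc       # compare the CRUSH mapping with the current acting set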

Re: [ceph-users] Ceph snapshots

2018-06-29 Thread Paul Emmerich
It's usually the snapshot deletion that triggers slowness. Are you also deleting/rotating old snapshots when creating new ones? In this case: try to increase osd_snap_trim_sleep a little bit. Even 0.025 can help a lot with a lot of concurrent snapshot deletions. (That's what we set as default f

Re: [ceph-users] RBD gets resized when used as iSCSI target

2018-06-29 Thread Wladimir Mutel
Wladimir Mutel wrote: it back to gwcli/disks), I discover that its size is rounded up to 3 TiB, i.e. 3072 GiB or 786432*4M Ceph objects. As we know, GPT is stored 'targetcli ls /' (there, it is still 3.0T). Also, when I restart rbd-target-gw.service, it gets resized back up to 3.0T as shown

Re: [ceph-users] Ceph snapshots

2018-06-29 Thread Marc Schöchlin
Hi Gregory, thanks for the link - very interesting talk. You mentioned the following settings in your talk, but I was not able to find any documentation in the OSD config reference (http://docs.ceph.com/docs/luminous/rados/configuration/osd-config-ref/). My cluster's settings look like this (lumi

Re: [ceph-users] Problems setting up iSCSI

2018-06-29 Thread Jason Dillaman
OK, so your tcmu-runner doesn't have support for rbd images for some reason. Where did you get your copy of tcmu-runner? You should build it from upstream or pull a copy from here [1]. [1] https://shaman.ceph.com/repos/tcmu-runner/ On Fri, Jun 29, 2018 at 11:12 AM Bernhard Dick wrote: > Hi, > >
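A quick way to confirm whether the rbd handler plugin is installed at all; the handler directory is an assumption (it may be /usr/lib/tcmu-runner on some distributions):

    ls /usr/lib64/tcmu-runner/          # should contain handler_rbd.so
    systemctl restart tcmu-runner
    tail -f /var/log/tcmu-runner.log    # watch for handler load errors at startup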

Re: [ceph-users] Problems setting up iSCSI

2018-06-29 Thread Bernhard Dick
Hi, On 29.06.2018 at 17:04, Jason Dillaman wrote: Is 'tcmu-runner' running on that node? Yes, it is running. Any errors in dmesg or /var/log/tcmu-runner.log? There are no errors in dmesg; the following error is shown in the log: [ERROR] add_device:436: could not find handler for uio0 Regards Bernhard On

[ceph-users] RBD gets resized when used as iSCSI target

2018-06-29 Thread Wladimir Mutel
Dear all, I create an RBD to be used as iSCSI target, with size close to the most popular 3TB HDD size, 5860533168 512-byte sectors, or 715398*4M Ceph objects (2.7 TB or 2794.4 GB). Then I add it into gwcli/disks (having to specify the same size, 2861592M), and then, after some manipu

Re: [ceph-users] Problems setting up iSCSI

2018-06-29 Thread Jason Dillaman
Is 'tcmu-runner' running on that node? Any errors in dmesg or /var/log/tcmu-runner.log? On Fri, Jun 29, 2018 at 10:43 AM Bernhard Dick wrote: > Hi, > > Am 28.06.2018 um 18:09 schrieb Jason Dillaman: > > Do you have the ansible backtrace from the "ceph-iscsi-gw : igw_lun | > > configureluns (crea

Re: [ceph-users] Problems setting up iSCSI

2018-06-29 Thread Bernhard Dick
Hi, Am 28.06.2018 um 18:09 schrieb Jason Dillaman: Do you have the ansible backtrace from the "ceph-iscsi-gw : igw_lun | configureluns (create/map rbds and add to lio)]" step? I assume you mean the following (from running with verbosity 3, after running the purge-iscsi-gateways playbook before

Re: [ceph-users] cephfs compression?

2018-06-29 Thread Youzhong Yang
Thanks Richard. Yes, it seems to be working, according to perf dump: osd.6 "bluestore_compressed": 62622444, "bluestore_compressed_allocated": 186777600, "bluestore_compressed_original": 373555200, It's very interesting that bluestore_compressed_allocated is approxima
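For reference, the counters can be pulled per OSD like this (run on the OSD's host); the pool name below is a placeholder for enabling compression in the first place:

    ceph daemon osd.6 perf dump | grep bluestore_compressed
    ceph osd pool set cephfs_data compression_mode aggressive
    ceph osd pool set cephfs_data compression_algorithm snappy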

Re: [ceph-users] VMWARE and RBD

2018-06-29 Thread Steven Vacaroaia
Hi Horace, Thanks. Would you be willing to share instructions for using SCST instead of ceph-iscsi? Thanks Steven On Thu, 28 Jun 2018 at 23:59, Horace wrote: > Seems there's no plan for that and the vmware kernel documentation will > only be shared with partners. You would be better off using iscsi. B

Re: [ceph-users] Luminous Bluestore performance, bcache

2018-06-29 Thread Andrei Mikhailovsky
Thanks Richard, That sounds impressive, especially the roughly 30% hit ratio. That would be ideal for me, but we were only getting single-digit results during my trials. I think around 5% was the figure, if I remember correctly. However, most of our VMs were created a bit chaotically (not using p

[ceph-users] How to secure Prometheus endpoints (mgr plugin and node_exporter)

2018-06-29 Thread Martin Palma
Since Prometheus uses a pull model over HTTP for collecting metrics, what are the best practices to secure these HTTP endpoints? - With a reverse proxy with authentication? - Export the node_exporter only on the cluster network? (not usable for the mgr plugin and for nodes like mons, mdss,...) - N
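One low-effort option is to bind the exporters to an internal address only; a sketch with placeholder addresses (the config-key form below is the Luminous-era syntax, newer releases use 'ceph config set mgr mgr/prometheus/server_addr ...'):

    ceph config-key set mgr/prometheus/server_addr 10.0.1.10
    ceph mgr module disable prometheus && ceph mgr module enable prometheus   # reload the module
    node_exporter --web.listen-address=10.0.1.10:9100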

Re: [ceph-users] ceph-volume: failed to activate some bluestore osds

2018-06-29 Thread Dan van der Ster
On Thu, Jun 7, 2018 at 8:40 PM Dan van der Ster wrote: > > On Thu, Jun 7, 2018 at 6:33 PM Sage Weil wrote: > > > > On Thu, 7 Jun 2018, Dan van der Ster wrote: > > > > > Wait, we found something!!! > > > > > > > > > > In the 1st 4k on the block we found the block.db pointing at the wrong > > > > >
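For anyone hitting something similar, the on-disk labels can be inspected without reading raw offsets; the device paths are placeholders:

    ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block
    ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block.db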

Re: [ceph-users] pre-sharding s3 buckets

2018-06-29 Thread Sean Purdy
On Wed, 27 Jun 2018, Matthew Vernon said: > Hi, > > On 27/06/18 11:18, Thomas Bennett wrote: > > > We have a particular use case where we know we're going to be > > writing lots of objects (up to 3 million) into a bucket. To take > > advantage of sharding, I want to shard buckets, withou
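A hedged sketch of the two usual approaches: a default shard count for newly created buckets, or an explicit reshard of a single bucket (names and counts are placeholders):

    # ceph.conf on the RGW hosts; applies to buckets created afterwards:
    #   rgw_override_bucket_index_max_shards = 64
    radosgw-admin bucket reshard --bucket=mybucket --num-shards=64
    radosgw-admin bucket limit check    # shows the resulting shard count and fill status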

Re: [ceph-users] Ceph FS (kernel driver) - Unable to set extended file attributed

2018-06-29 Thread Yu Haiyang
Problem solved. It seems I can't set stripe_unit to a value larger than object_size; I should increase the object_size attribute before increasing the stripe_unit attribute. Hope this helps someone. :) On Jun 29, 2018, at 12:20 PM, Yu Haiyang wrote: Hi, I want to play
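For reference, a sketch of the ordering that worked, using the CephFS layout xattrs (path and sizes are placeholders; layouts can only be changed on empty files):

    setfattr -n ceph.file.layout.object_size -v 8388608 /mnt/cephfs/testfile
    setfattr -n ceph.file.layout.stripe_unit -v 8388608 /mnt/cephfs/testfile
    getfattr -n ceph.file.layout /mnt/cephfs/testfile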