Re: [ceph-users] cephfs mds millions of caps

2017-12-14 Thread Wei Jin
> So, questions: does that really matter? What are the possible impacts? What could have caused these 2 hosts to hold so many capabilities? One of the hosts is for test purposes, traffic is close to zero. The other host wasn't using cephfs at all. All services stopped. The reason might be

Re: [ceph-users] cephfs mds millions of caps

2017-12-14 Thread Patrick Donnelly
On Thu, Dec 14, 2017 at 4:44 PM, Webert de Souza Lima wrote: > Hi Patrick, > On Thu, Dec 14, 2017 at 7:52 PM, Patrick Donnelly wrote: >> It's likely you're a victim of a kernel backport that removed a dentry invalidation mechanism for FUSE

Re: [ceph-users] 1 osd Segmentation fault in test cluster

2017-12-14 Thread Konstantin Shalygin
> Is this useful for someone? Yes! See http://tracker.ceph.com/issues/21259 The latest luminous branch (which you can get from https://shaman.ceph.com/builds/ceph/luminous/) has some additional debugging on OSD shutdown that should help me figure out what is causing this. If this is
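For anyone chasing the same crash, raising the OSD's debug level before reproducing makes the shutdown path in that tracker issue easier to capture. A minimal sketch, assuming a standard deployment (osd.12 is just a placeholder id):

  # Raise debug levels on the affected OSD before reproducing the segfault:
  ceph tell osd.12 injectargs '--debug-osd 20 --debug-ms 1'
  # ...reproduce, then collect the log for the tracker:
  less /var/log/ceph/ceph-osd.12.log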

Re: [ceph-users] Odd object blocking IO on PG

2017-12-14 Thread Brad Hubbard
On Wed, Dec 13, 2017 at 11:39 PM, Nick Fisk wrote: > Boom!! Fixed it. Not sure if the behavior I stumbled on is correct, but this has the potential to break a few things for people moving from Jewel to Luminous if they had a few too many PGs. > Firstly, how
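Luminous enforces a per-OSD PG limit that Jewel did not, which is usually how "a few too many PGs" bites after an upgrade. A rough sketch for checking where you stand (the option name applies to Luminous-era releases, and raising it is only a temporary workaround while you clean up):

  # The PGS column shows how many PGs each OSD carries:
  ceph osd df
  # If the per-OSD PG limit is the blocker, it can be raised temporarily:
  ceph tell mon.* injectargs '--mon_max_pg_per_osd 300'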

[ceph-users] cephfs miss data for 15s when master mds rebooting

2017-12-14 Thread 13605702...@163.com
Hi, I used 3 nodes to deploy mds (each node also has a mon on it). My config:
[mds.ceph-node-10-101-4-17]
mds_standby_replay = true
mds_standby_for_rank = 0
[mds.ceph-node-10-101-4-21]
mds_standby_replay = true
mds_standby_for_rank = 0
[mds.ceph-node-10-101-4-22]
mds_standby_replay = true
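When testing a setup like this, the failover itself can be watched while the active MDS reboots; a small sketch using standard commands on a Luminous-era cluster:

  # Show which MDS is active and which are standby / standby-replay:
  ceph fs status
  ceph mds stat
  # Watch the takeover happen while rebooting the active MDS:
  watch -n 1 ceph mds stat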

Re: [ceph-users] cephfs mds millions of caps

2017-12-14 Thread Yan, Zheng
On Fri, Dec 15, 2017 at 1:18 AM, Webert de Souza Lima wrote: > Hi, I've been looking at ceph mds perf counters and I saw that one of my clusters was hugely different from the others in number of caps: > rlat inos caps | hsr hcs hcr | writ read actv | recd recy stry

Re: [ceph-users] cephfs mds millions of caps

2017-12-14 Thread Webert de Souza Lima
Hi Patrick, On Thu, Dec 14, 2017 at 7:52 PM, Patrick Donnelly wrote: > It's likely you're a victim of a kernel backport that removed a dentry invalidation mechanism for FUSE mounts. The result is that ceph-fuse can't trim dentries. Even though I'm not using FUSE?

[ceph-users] S3 objects deleted but storage doesn't free space

2017-12-14 Thread Jan-Willem Michels
Hi there all, Perhaps someone can help. We tried to free some storage, so we deleted a lot of S3 objects. The bucket also has valuable data, so we can't delete the whole bucket. Everything went fine, but the used storage space doesn't decrease. We are expecting several TB of data to be freed. We
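Deleted S3 objects are reclaimed asynchronously by the RGW garbage collector, so used space normally lags behind the deletes. A minimal sketch for checking whether GC is the bottleneck (run on an RGW node):

  # Objects deleted from S3 but not yet reclaimed:
  radosgw-admin gc list --include-all | head
  # Kick off a GC pass now instead of waiting for the schedule:
  radosgw-admin gc process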

Re: [ceph-users] Understanding reshard issues

2017-12-14 Thread Graham Allan
On 12/14/2017 04:00 AM, Martin Emrich wrote: Hi! On 13.12.17 at 20:50, Graham Allan wrote: After our Jewel to Luminous 12.2.2 upgrade, I ran into some of the same issues reported earlier on the list under "rgw resharding operation seemingly won't end". Yes, those were/are my threads, I also have this issue. I was able to correct the

Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)

2017-12-14 Thread Cary
James, Usually once the misplaced data has balanced out, the cluster should reach a healthy state. If you run "ceph health detail", Ceph will show you some more detail about what is happening. Is Ceph still recovering, or has it stalled? Has the "objects misplaced (62.511%)" figure changed to a lower
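A small sketch of the commands referred to above, for watching the recovery progress:

  ceph health detail                                   # per-PG detail on degraded/misplaced data
  ceph -s                                              # overall status, including objects misplaced %
  watch -n 5 'ceph -s | grep -E "misplaced|degraded"'  # confirm the percentage keeps dropping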

Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)

2017-12-14 Thread James Okken
Thanks Cary! Your directions worked on my first server (once I found the missing carriage return in your list of commands; the email must have messed it up). For anyone else:
chown -R ceph:ceph /var/lib/ceph/osd/ceph-4
ceph auth add osd.4 osd 'allow *' mon 'allow profile osd' -i
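For completeness, a hedged sketch of the steps that typically follow the two commands above on a filestore-era cluster (the weight 3.64, host name node1 and osd id 4 are placeholders for your own values):

  ceph osd crush add osd.4 3.64 host=node1   # place the OSD in the CRUSH map
  systemctl start ceph-osd@4                 # start the daemon
  ceph osd tree                              # confirm it comes up under the right host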

Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)

2017-12-14 Thread Ronny Aasen
On 14.12.2017 18:34, James Okken wrote: Hi all, Please let me know if I am missing steps or using the wrong steps. I'm hoping to expand my small CEPH cluster by adding 4TB hard drives to each of the 3 servers in the cluster. I also need to change my replication factor from 1 to 3. This is

Re: [ceph-users] cephfs mds millions of caps

2017-12-14 Thread Patrick Donnelly
On Thu, Dec 14, 2017 at 9:18 AM, Webert de Souza Lima wrote: > So, questions: does that really matter? What are the possible impacts? What could have caused these 2 hosts to hold so many capabilities? One of the hosts is for test purposes, traffic is close to zero. The other

Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)

2017-12-14 Thread Cary
Jim, I am not an expert, but I believe I can assist. Normally you will only have 1 OSD per drive. I have heard discussions about using multiple OSDs per disk when using SSDs, though. Once your drives have been installed you will have to format them, unless you are using Bluestore. My steps
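As a rough illustration of the formatting step mentioned above for a filestore OSD (device /dev/sdX and osd id 4 are placeholders; skip this entirely for Bluestore):

  parted -s /dev/sdX mklabel gpt mkpart primary xfs 0% 100%
  mkfs.xfs -f /dev/sdX1
  mkdir -p /var/lib/ceph/osd/ceph-4
  mount /dev/sdX1 /var/lib/ceph/osd/ceph-4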

[ceph-users] add hard drives to 3 CEPH servers (3 server cluster)

2017-12-14 Thread James Okken
Hi all, Please let me know if I am missing steps or using the wrong steps. I'm hoping to expand my small CEPH cluster by adding 4TB hard drives to each of the 3 servers in the cluster. I also need to change my replication factor from 1 to 3. This is part of an Openstack environment deployed by
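The replication factor is a per-pool setting; a minimal sketch of the change (pool name "volumes" is a placeholder, and raising size from 1 to 3 will trigger a lot of data movement):

  ceph osd pool set volumes size 3
  ceph osd pool set volumes min_size 2
  ceph osd pool ls detail    # verify size/min_size on every pool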

[ceph-users] cephfs mds millions of caps

2017-12-14 Thread Webert de Souza Lima
Hi, I've been looking at ceph mds perf counters and I saw that one of my clusters was hugely different from the others in number of caps:
rlat inos caps | hsr hcs hcr | writ read actv | recd recy stry purg | segs evts subm
0 3.0M 5.1M | 0 0 595 | 30440 | 0 0 13k 0
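For anyone comparing their own clusters, those columns come from the MDS perf counters, and the per-client breakdown is visible on the MDS admin socket; a small sketch (mds name "a" is a placeholder):

  ceph daemon mds.a perf dump | grep -E '"inodes"|"caps"'
  ceph daemon mds.a session ls    # per-client sessions, including how many caps each holds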

Re: [ceph-users] High Load and High Apply Latency

2017-12-14 Thread David Turner
We show high disk latencies on a node when the controller's cache battery dies. This is assuming that you're using a controller with cache enabled for your disks. In any case, I would look at the hardware on the server. On Thu, Dec 14, 2017 at 10:15 AM John Petrini

Re: [ceph-users] Snap trim queue length issues

2017-12-14 Thread David Turner
I've tracked this in a much more manual way. I would grab a random subset of PGs in the pool and query the PGs, counting how many objects were in their queues. After that, you average it out over how many PGs you queried and how many objects there were, and multiply it back out by how many PGs are in the
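A rough sketch of that sampling approach in shell (pool name "rbd" and the sample size are placeholders; where exactly the snap trim queue shows up in the query output varies between releases):

  for pg in $(ceph pg ls-by-pool rbd | awk 'NR>1 {print $1}' | shuf -n 10); do
      echo -n "$pg: "
      ceph pg $pg query | grep -m1 snap_trimq
  done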

Re: [ceph-users] Ceph luminous nfs-ganesha-ceph

2017-12-14 Thread Daniel Gryniewicz
On 12/14/2017 09:46 AM, nigel davies wrote: Is this nfs-ganesha exporting Cephfs? Yes. Are you using NFS for a VMware Datastore? Yes. What are you using for the NFS failover? (This is where I could be going wrong.) When creating the NFS Datastore I added the two NFS servers' IP addresses in

Re: [ceph-users] High Load and High Apply Latency

2017-12-14 Thread John Petrini
Anyone have any ideas on this?

Re: [ceph-users] Ceph luminous nfs-ganesha-ceph

2017-12-14 Thread nigel davies
Is this nfs-ganesha exporting Cephfs? Yes. Are you using NFS for a VMware Datastore? Yes. What are you using for the NFS failover? (This is where I could be going wrong.) When creating the NFS Datastore I added the two NFS servers' IP addresses in On Thu, Dec 14, 2017 at 2:29 PM, David C

Re: [ceph-users] Corrupted files on CephFS since Luminous upgrade

2017-12-14 Thread Yan, Zheng
On Thu, Dec 14, 2017 at 8:52 PM, Florent B wrote: > On 14/12/2017 03:38, Yan, Zheng wrote: >> On Thu, Dec 14, 2017 at 12:49 AM, Florent B wrote: >>> Systems are on Debian Jessie: kernel 3.16.0-4-amd64 & libfuse 2.9.3-15. >>> I don't know pattern

[ceph-users] Snap trim queue length issues

2017-12-14 Thread Piotr Dałek
Hi, We recently ran into low disk space issues on our clusters, and it wasn't because of actual data. On those affected clusters we're hosting VMs and volumes, so naturally there are snapshots involved. For some time, we observed increased disk space usage that we couldn't explain, as there

Re: [ceph-users] Ceph luminous nfs-ganesha-ceph

2017-12-14 Thread David C
Is this nfs-ganesha exporting Cephfs? Are you using NFS for a VMware Datastore? What are you using for the NFS failover? We need more info, but this does sound like a VMware/NFS question rather than specifically a ceph/nfs-ganesha one. On Thu, Dec 14, 2017 at 1:47 PM, nigel davies

[ceph-users] Max number of objects per bucket

2017-12-14 Thread Prasad Bhalerao
Hello, I have the following doubts; could you please help me out? I am using the S3 APIs. What is the max number of objects a bucket can have when using an indexless bucket? What is the max number of buckets a user can create? Can we have both indexless and indexed buckets at the same time? Do we have any
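On the per-user bucket limit at least, the current value can be checked and raised with radosgw-admin; a minimal sketch (uid "testuser" and the new limit are placeholders):

  radosgw-admin user info --uid=testuser | grep max_buckets
  radosgw-admin user modify --uid=testuser --max-buckets=5000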

Re: [ceph-users] measure performance / latency in bluestore

2017-12-14 Thread Sage Weil
On Thu, 14 Dec 2017, Stefan Priebe - Profihost AG wrote: > On 14.12.2017 at 13:22, Sage Weil wrote: >> On Thu, 14 Dec 2017, Stefan Priebe - Profihost AG wrote: >>> Hello, on 21.11.2017 at 11:06 Stefan Priebe - Profihost AG wrote: >>>> Hello, to measure performance /

Re: [ceph-users] measure performance / latency in bluestore

2017-12-14 Thread Stefan Priebe - Profihost AG
On 14.12.2017 at 13:22, Sage Weil wrote: > On Thu, 14 Dec 2017, Stefan Priebe - Profihost AG wrote: >> Hello, on 21.11.2017 at 11:06 Stefan Priebe - Profihost AG wrote: >>> Hello, to measure performance / latency for filestore we used: >>> filestore:apply_latency >>>

[ceph-users] Ceph luminous nfs-ganesha-ceph

2017-12-14 Thread nigel davies
Hey all, I am in the process of trying to set up a VMware storage environment. I have been reading and found that iSCSI (on the Jewel release) can cause issues and the datastore can drop out. I have been looking at using nfs-ganesha with my ceph platform; it all looked good until I looked at failover to our
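For reference, a bare-bones sketch of a CephFS export for nfs-ganesha, written here via a heredoc (the export id, paths and option set are placeholders, and exact option names differ between nfs-ganesha versions):

  cat >> /etc/ganesha/ganesha.conf <<'EOF'
  EXPORT {
      Export_Id = 1;
      Path = "/";
      Pseudo = "/cephfs";
      Access_Type = RW;
      Squash = No_Root_Squash;
      FSAL { Name = CEPH; }
  }
  EOF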

Re: [ceph-users] One OSD misbehaving (spinning 100% CPU, delayed ops)

2017-12-14 Thread Matthew Vernon
On 29/11/17 17:24, Matthew Vernon wrote: > We have a 3,060 OSD ceph cluster (running Jewel 10.2.7-0ubuntu0.16.04.1), and one OSD on one host keeps misbehaving - by which I mean it keeps spinning ~100% CPU (cf ~5% for other OSDs on that host), and having ops blocking on it for some time. It
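When one OSD spins like this, the admin socket usually shows what it is chewing on; a small sketch (osd.123 is a placeholder for the misbehaving OSD, run on its host):

  ceph daemon osd.123 dump_ops_in_flight    # ops currently being processed
  ceph daemon osd.123 dump_blocked_ops      # ops blocked behind something
  ceph daemon osd.123 dump_historic_ops     # recent slow ops with per-step timings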

Re: [ceph-users] measure performance / latency in bluestore

2017-12-14 Thread Sage Weil
On Thu, 14 Dec 2017, Stefan Priebe - Profihost AG wrote: > Hello, on 21.11.2017 at 11:06 Stefan Priebe - Profihost AG wrote: >> Hello, to measure performance / latency for filestore we used: >> filestore:apply_latency >> filestore:commitcycle_latency >>
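The BlueStore equivalents live under the OSD admin socket; a sketch of pulling the latency counters (osd.0 is a placeholder, and the exact counter names shift a little between releases):

  ceph daemon osd.0 perf dump | python -m json.tool | grep -A2 -E '"commit_lat"|"kv_flush_lat"'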

Re: [ceph-users] cephfs automatic data pool cleanup

2017-12-14 Thread Yan, Zheng
On Thu, Dec 14, 2017 at 12:52 AM, Jens-U. Mozdzen wrote: > Hi Yan, quoting "Yan, Zheng": >> [...] It's likely some clients had caps on unlinked inodes, which prevent the MDS from purging objects. When a file gets deleted, the mds notifies all
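The stray counts mentioned here can be watched directly on the MDS to see whether purging is progressing; a minimal sketch (mds name "a" is a placeholder):

  ceph daemon mds.a perf dump | grep -E 'num_strays|strays_'
  # If the numbers never drop, clients holding caps on unlinked inodes are the usual suspects.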

Re: [ceph-users] how to troubleshoot "heartbeat_check: no reply" in OSD log

2017-12-14 Thread Tristan Le Toullec
Hi Jared, did you find a solution to your problem? It appears that I have the same osd problem, and tcpdump captures haven't pointed to a solution. All OSD nodes produced logs like:
2017-12-14 11:25:11.756552 7f0cc5905700 -1 osd.49 29546 heartbeat_check: no reply from 172.16.5.155:6817
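A quick sketch for narrowing this down, using the host and port straight from the log line above:

  ceph osd find 49          # which host/interfaces osd.49 lives on
  ping -c3 172.16.5.155     # basic reachability from the complaining node
  nc -zv 172.16.5.155 6817  # is the heartbeat port actually reachable?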

Re: [ceph-users] Understanding reshard issues

2017-12-14 Thread Martin Emrich
Hi! On 13.12.17 at 20:50, Graham Allan wrote: After our Jewel to Luminous 12.2.2 upgrade, I ran into some of the same issues reported earlier on the list under "rgw resharding operation seemingly won't end". Yes, those were/are my threads, I also have this issue. I was able to correct the
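For anyone else hitting this, the Luminous reshard queue can be inspected and stuck jobs cancelled; a hedged sketch (bucket name "mybucket" is a placeholder):

  radosgw-admin reshard list
  radosgw-admin reshard status --bucket=mybucket
  radosgw-admin reshard cancel --bucket=mybucket
  # Dynamic resharding can also be disabled in ceph.conf while debugging:
  #   rgw_dynamic_resharding = false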

Re: [ceph-users] Cache tier unexpected behavior: promote on lock

2017-12-14 Thread Захаров Алексей
Hi Gregory, thank you for your answer! Is there a way to not promote on "locking" when not using EC pools? Is it possible to make this configurable? We don't use an EC pool, so for us this mechanism is overhead. It only adds more load on both pools and the network. 14.12.2017, 01:16, "Gregory Farnum"

[ceph-users] Ceph scrub logs: _scan_snaps no head for $object?

2017-12-14 Thread Stefan Kooman
Hi, We see the following in the logs after we start a scrub for some osds:
ceph-osd.2.log:2017-12-14 06:50:47.180344 7f0f47db2700 0 log_channel(cluster) log [DBG] : 1.2d8 scrub starts
ceph-osd.2.log:2017-12-14 06:50:47.180915 7f0f47db2700 -1 osd.2 pg_epoch: 11897 pg[1.2d8( v 11890'165209

Re: [ceph-users] Blocked requests

2017-12-14 Thread Fulvio Galeazzi
Hallo Matthew, thanks for your feedback! Please clarify one point: you mean that you recreated the pool as an erasure-coded one, or that you recreated it as a regular replicated one? I mean, you now have an erasure-coded pool in production as a gnocchi backend? In any case, from the