Re: [ceph-users] Bad performances in recovery

2015-08-20 Thread Christian Balzer
Hello, of all the pertinent points by Somnath, the one about pre-conditioning would be pretty high on my list, especially if this slowness persists and nothing else (scrub) is going on. This might be fixed by doing an fstrim. Additionally, the LevelDBs per OSD are of course sync'ing heavily
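
A minimal sketch of running fstrim across the OSD data filesystems; the default /var/lib/ceph/osd/ceph-<id> mount points are an assumption, adjust to your layout:

  # Trim unused blocks on each SSD-backed OSD filesystem, one at a time
  for osd in /var/lib/ceph/osd/ceph-*; do
      fstrim -v "$osd"
  done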

[ceph-users] ceph osd debug question / proposal

2015-08-20 Thread Goncalo Borges
Dear Ceph gurus... Just wanted to report something that may be interesting to enhance... or maybe I am not doing the right debugging procedure. 1. I am working with 0.92.2 and I am testing the cluster in several disaster scenarios. 2. I have 32 OSDs distributed in 4 servers,

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
Is there a bug for this in the tracker? -Sam On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: The issue is that in forward mode fstrim doesn't work properly, and when we take a snapshot the data is not properly updated in the cache layer, and the client (ceph) sees a damaged snap.. As

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
Not yet. I will create one. But according to the mailing lists and Inktank docs, it's expected behaviour when a cache is enabled 2015-08-20 19:56 GMT+03:00 Samuel Just sj...@redhat.com: Is there a bug for this in the tracker? -Sam On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
Which docs? -Sam On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Not yet. I will create one. But according to the mailing lists and Inktank docs, it's expected behaviour when a cache is enabled 2015-08-20 19:56 GMT+03:00 Samuel Just sj...@redhat.com: Is there a bug

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
Inktank: https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf Mail-list: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18338.html 2015-08-20 20:06 GMT+03:00 Samuel Just sj...@redhat.com: Which docs? -Sam On Thu, Aug 20, 2015 at 9:57 AM,

[ceph-users] PCIE-SSD OSD bottom performance issue

2015-08-20 Thread scott_tan...@yahoo.com
Dear all: I used a PCIe SSD as an OSD disk, but I found its performance very poor. I have two hosts, each with 1 PCIe SSD, so I created two OSDs on the PCIe SSDs.

  ID WEIGHT  TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
  -1 0.35999 root default
  -2 0.17999     host tds_node03
   0
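
A quick way to get a baseline from the cluster itself is a short rados bench run; this is a generic sketch, not from the original mail, and the pool name "rbd" and thread count are placeholders:

  # 60-second 4K write benchmark against a test pool; rados bench removes
  # its own objects when run without --no-cleanup
  rados bench -p rbd 60 write -b 4096 -t 16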

Re: [ceph-users] requests are blocked - problem

2015-08-20 Thread Jacek Jarosiewicz
On 08/19/2015 03:41 PM, Nick Fisk wrote: Although you may get some benefit from tweaking parameters, I suspect you are nearer the performance ceiling for the current implementation of the tiering code. Could you post all the variables you set for the tiering including target_max_bytes and the
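
For reference, the current tiering settings can be dumped straight from the cache pool; a sketch, with "hot-pool" as a placeholder name:

  # Print the main cache-tiering knobs for the cache pool
  for opt in hit_set_type hit_set_count hit_set_period \
             target_max_bytes target_max_objects \
             cache_target_dirty_ratio cache_target_full_ratio \
             cache_min_flush_age cache_min_evict_age; do
      ceph osd pool get hot-pool $opt
  done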

Re: [ceph-users] ceph osd debug question / proposal

2015-08-20 Thread Jan Schermer
Just to clarify - you unmounted the filesystem with umount -l? That's almost never a good idea, and it puts the OSD in a very unusual situation where IO will actually work on the already-open files, but it can't open any new ones. I think this would be enough to confuse just about any piece of software.
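
A minimal sketch of the safer order - stop the daemon first, then unmount normally; osd.3 and the init commands are illustrative, adjust for your init system:

  stop ceph-osd id=3            # upstart, e.g. Ubuntu 14.04
  # service ceph stop osd.3     # sysvinit equivalent
  umount /var/lib/ceph/osd/ceph-3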

Re: [ceph-users] requests are blocked - problem

2015-08-20 Thread Jacek Jarosiewicz
On 08/20/2015 03:07 AM, Christian Balzer wrote: For a realistic comparison with your current setup, a total rebuild would be in order. Provided your cluster is testing only at this point. Given your current HW, that means the same 2-3 HDDs per storage node and 1 SSD as journal. What exact

Re: [ceph-users] Ceph OSD nodes in XenServer VMs

2015-08-20 Thread Christian Balzer
Hello, On Thu, 20 Aug 2015 11:55:55 +1000 Jiri Kanicky wrote: Hi all, We are experimenting with an idea to run OSD nodes in XenServer VMs. We believe this could provide better flexibility, backups for the nodes etc. For example: Xenserver with 4 HDDs dedicated for Ceph. We would

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Ilya Dryomov
On Fri, Aug 21, 2015 at 2:02 AM, Samuel Just sj...@redhat.com wrote: What's supposed to happen is that the client transparently directs all requests to the cache pool rather than the cold pool when there is a cache pool. If the kernel is sending requests to the cold pool, that's probably

[ceph-users] Rados: Undefined symbol error

2015-08-20 Thread Aakanksha Pudipeddi-SSI
Hello, I cloned the master branch of Ceph and after setting up the cluster, when I tried to use the rados commands, I got this error: rados: symbol lookup error: rados: undefined symbol: _ZN5MutexC1ERKSsbbbP11CephContext I saw a similar post here: http://tracker.ceph.com/issues/12563 but I am
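
One thing worth checking with a master build (the paths below are illustrative, not from the original report) is whether the rados binary is resolving an older system librados instead of the freshly built one:

  # See which librados the binary actually loads at runtime
  ldd "$(which rados)" | grep librados
  # Point the loader at the fresh build if it lives outside the default path
  LD_LIBRARY_PATH=/path/to/ceph/build/lib rados lspools
  # Or refresh the linker cache after a make install
  sudo ldconfig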

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
An image? One? We started deleting images only to fix this (export/import); before that, 1-4 times per day (when a VM was destroyed)... 2015-08-21 1:44 GMT+03:00 Samuel Just sj...@redhat.com: Interesting. How often do you delete an image? I'm wondering if whatever this is happened when you deleted these

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Snapshotting with cache/tiering *is* supposed to work. Can you open a bug? -Sam On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic andrija.pa...@gmail.com wrote: This was related to the caching layer, which doesn't support snapshotting per the docs... for the sake of closing the thread. On 17 August 2015

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Good joke ) 2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com: Certainly, don't reproduce this with a cluster you care about :). -Sam On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just sj...@redhat.com wrote: What's supposed to happen is that the client transparently directs all

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
We switched to forward mode as a step toward switching the cache layer off. Right now we have Samsung 850 Pro in the cache layer (10 SSDs, 2 per node) and they show 2 MB/s for 4K blocks... 250 IOPS... instead of the 18-20K for the Intel S3500 240G which we chose as a replacement.. So with such good disks - cache layer - very
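
A commonly used way to compare SSDs for journal use is a single-job synchronous 4K write test with fio; this is a generic sketch, not a command from the thread, and it writes directly to the device, so only point it at a disk you can wipe (/dev/sdX is a placeholder):

  # DESTRUCTIVE: raw sync-write test of journal-style I/O on an empty SSD
  fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 \
      --rw=write --bs=4k --numjobs=1 --iodepth=1 \
      --runtime=60 --time_based --group_reporting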

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Created a ticket to improve our testing here -- this appears to be a hole. http://tracker.ceph.com/issues/12742 -Sam On Thu, Aug 20, 2015 at 4:09 PM, Samuel Just sj...@redhat.com wrote: So you started draining the cache pool before you saw either the inconsistent pgs or the anomalous snap

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Right. But issues started... 2015-08-21 2:20 GMT+03:00 Samuel Just sj...@redhat.com: But that was still in writeback mode, right? -Sam On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: WE haven't set values for max_bytes / max_objects.. and all data

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Yeah, I'm trying to confirm that the issues did happen in writeback mode. -Sam On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Right. But issues started... 2015-08-21 2:20 GMT+03:00 Samuel Just sj...@redhat.com: But that was still in writeback mode,

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
As we use journal collocation now (because we want to utilize the cache layer ((( ), I use ceph-disk to create the new OSDs (with the journal size changed in ceph.conf). I don't prefer manual work)) So I created a very simple script to update the journal size 2015-08-21 2:25 GMT+03:00 Voloshanenko Igor
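
A rough sketch of the per-OSD swap that picks up a larger journal; osd.12 and /dev/sdc are placeholders, and it assumes osd journal size was already raised in ceph.conf so ceph-disk builds the bigger collocated journal:

  ceph osd out 12
  # wait for the data to drain / rebalance, then:
  stop ceph-osd id=12
  ceph osd crush remove osd.12
  ceph auth del osd.12
  ceph osd rm 12
  ceph-disk zap /dev/sdc
  ceph-disk prepare /dev/sdc     # journal collocated on the same device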

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Will do, Sam! Thanks in advance for your help! 2015-08-21 2:28 GMT+03:00 Samuel Just sj...@redhat.com: Ok, create a ticket with a timeline and all of this information, I'll try to look into it more tomorrow. -Sam On Thu, Aug 20, 2015 at 4:25 PM, Voloshanenko Igor igor.voloshane...@gmail.com

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
Hi Samuel, we tried to fix it in a tricky way. We checked all the affected rbd_data chunks from the OSD logs, then queried rbd info to work out which rbd contains the bad rbd_data; after that we mapped this rbd as rbd0, created an empty rbd, and dd'd everything from the bad volume to the new one. But after that - scrub
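
A small sketch of matching an rbd_data prefix from the scrub logs back to an image (pool and image names are placeholders): the hex id in rbd_data.<id>.* is the image's block_name_prefix, and rbd cp copies inside the cluster without mapping devices at all:

  # List each image together with its block_name_prefix
  for img in $(rbd -p rbd ls); do
      prefix=$(rbd -p rbd info "$img" | awk '/block_name_prefix/ {print $2}')
      echo "$img $prefix"
  done
  # Copy the suspect image to a fresh one
  rbd -p rbd cp broken-image rescued-image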

[ceph-users] Testing CephFS

2015-08-20 Thread Simon Hallam
Hey all, We are currently testing CephFS on a small (3 node) cluster. The setup is currently: Each server has 12 OSDs, 1 Monitor and 1 MDS running on it: The servers are running: 0.94.2-0.el7 The clients are running: Ceph: 0.80.10-1.fc21, Kernel: 4.0.6-200.fc21.x86_64 ceph -s cluster

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
We hadn't set values for max_bytes / max_objects.. so all data initially was written only to the cache layer and not flushed to the cold layer at all. Then we received a notification from monitoring that we had collected about 750GB in the hot pool ) So I changed the max_bytes value to be 0.9 of the disk size...
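
For reference, a minimal sketch of giving the cache pool explicit flush/evict targets so the tiering agent starts working before the pool fills up; "hot-pool" and the numbers are placeholders, not this cluster's values:

  ceph osd pool set hot-pool target_max_bytes 750000000000
  ceph osd pool set hot-pool target_max_objects 1000000
  ceph osd pool set hot-pool cache_target_dirty_ratio 0.4
  ceph osd pool set hot-pool cache_target_full_ratio 0.8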

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
But that was still in writeback mode, right? -Sam On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: We hadn't set values for max_bytes / max_objects.. so all data initially was written only to the cache layer and not flushed to the cold layer at all. Then we received

Re: [ceph-users] PCIE-SSD OSD bottom performance issue

2015-08-20 Thread scott_tan...@yahoo.com
my ceph.conf:

  [global]
  auth_service_required = cephx
  osd_pool_default_size = 2
  filestore_xattr_use_omap = true
  auth_client_required = cephx
  auth_cluster_required = cephx
  mon_host = 172.168.2.171
  mon_initial_members = tds_node01
  fsid =

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
I have already killed the cache layer, but I will try to reproduce it in the lab 2015-08-21 1:58 GMT+03:00 Samuel Just sj...@redhat.com: Hmm, that might actually be client side. Can you attempt to reproduce with rbd-fuse (different client side implementation from the kernel)? -Sam On Thu, Aug 20, 2015 at 3:56

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
So you started draining the cache pool before you saw either the inconsistent pgs or the anomalous snap behavior? (That is, writeback mode was working correctly?) -Sam On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Good joke ) 2015-08-21 2:06

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Specifically, the snap behavior (we already know that the pgs went inconsistent while the pool was in writeback mode, right?). -Sam On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just sj...@redhat.com wrote: Yeah, I'm trying to confirm that the issues did happen in writeback mode. -Sam On Thu, Aug

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
I mean in forward mode it's a permanent problem - snapshots are not working. And in writeback mode, after we changed the max_bytes/max_objects values, it's around 30 to 70... 70% of the time it works... 30% - not. It looks like for old images snapshots work fine (images which already existed before we changed

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
What's supposed to happen is that the client transparently directs all requests to the cache pool rather than the cold pool when there is a cache pool. If the kernel is sending requests to the cold pool, that's probably where the bug is. Odd. It could also be a bug specific to 'forward' mode

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Our initial values for the journal sizes were enough, but the flush time was 5 secs, so we increased the journal size to fit a flush timeframe of min|max 29/30 seconds. I mean filestore max sync interval = 30, filestore min sync interval = 29, when I said flush time 2015-08-21 2:16 GMT+03:00 Samuel Just
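
For illustration, the kind of [osd] ceph.conf fragment being described - a larger collocated journal plus the wider filestore sync window; the journal size shown is an assumption, not the thread's value:

  [osd]
  osd journal size = 10240          ; MB, used by ceph-disk at OSD creation
  filestore min sync interval = 29
  filestore max sync interval = 30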

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
It would help greatly if, on a disposable cluster, you could reproduce the snapshot problem with debug osd = 20 debug filestore = 20 debug ms = 1 on all of the osds and attach the logs to the bug report. That should make it easier to work out what is going on. -Sam On Thu, Aug 20, 2015 at 4:40
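
If it helps, the requested debug levels can be raised at runtime on all OSDs and turned back down once the logs are captured (log volume grows very quickly at these levels):

  ceph tell osd.* injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'
  # ... reproduce the snapshot problem, collect /var/log/ceph/ceph-osd.*.log ...
  # then restore quieter levels, for example:
  ceph tell osd.* injectargs '--debug-osd 0/5 --debug-filestore 1/3 --debug-ms 0/5'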

Re: [ceph-users] PCIE-SSD OSD bottom performance issue

2015-08-20 Thread Christian Balzer
Hello, On Thu, 20 Aug 2015 15:47:46 +0800 scott_tan...@yahoo.com wrote: The reason that you're not getting any replies is because we're not psychic/telepathic/clairvoyant. Meaning that you're not giving us enough information by far. dear ALL: I used PCIE-SSD to OSD disk . But I found

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
Interesting. How often do you delete an image? I'm wondering if whatever this is happened when you deleted these two images. -Sam On Thu, Aug 20, 2015 at 3:42 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Sam, i try to understand which rbd contain this chunks.. but no luck. No rbd

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Also, can you include the kernel version? -Sam On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com wrote: Snapshotting with cache/tiering *is* supposed to work. Can you open a bug? -Sam On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic andrija.pa...@gmail.com wrote: This was

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Yes, will do. What we see: when the cache tier is in forward mode, if I do rbd snap create it uses the rbd_header not from the cold tier but from the hot tier, and these 2 headers are not synced. And the header can't be evicted from hot storage, as it's locked by KVM (Qemu). If I kill the lock and evict the header - everything starts to work..
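
A sketch of inspecting and clearing that state; the pool/image names and the header object id are placeholders chosen to match the rbd_header.<image id> naming of format-2 images:

  # Who holds the image open, and who watches its header object?
  rbd -p rbd lock list vm-disk-1
  rados -p hot-pool listwatchers rbd_header.eb5f22eb141f2
  # Once it is no longer locked/watched, push the header out of the cache tier
  rados -p hot-pool cache-flush rbd_header.eb5f22eb141f2
  rados -p hot-pool cache-evict rbd_header.eb5f22eb141f2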

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Certainly, don't reproduce this with a cluster you care about :). -Sam On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just sj...@redhat.com wrote: What's supposed to happen is that the client transparently directs all requests to the cache pool rather than the cold pool when there is a cache pool.

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Also, what do you mean by change journal side? -Sam On Thu, Aug 20, 2015 at 4:15 PM, Samuel Just sj...@redhat.com wrote: Not sure what you mean by: but it's stop to work in same moment, when cache layer fulfilled with data and evict/flush started... -Sam On Thu, Aug 20, 2015 at 4:11 PM,

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Not sure what you mean by: but it's stop to work in same moment, when cache layer fulfilled with data and evict/flush started... -Sam On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: No, when we start draining cache - bad pgs was in place... We have big

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Ok, create a ticket with a timeline and all of this information, I'll try to look into it more tomorrow. -Sam On Thu, Aug 20, 2015 at 4:25 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Exactly On Friday, 21 August 2015, Samuel Just wrote: And you adjusted the

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Attachment blocked, so post as text...

  root@zzz:~# cat update_osd.sh
  #!/bin/bash
  ID=$1
  echo Process OSD# ${ID}
  DEV=`mount | grep ceph-${ID} | cut -d " " -f 1`
  echo OSD# ${ID} hosted on ${DEV::-1}
  TYPE_RAW=`smartctl -a ${DEV} | grep Rota | cut -d " " -f 6`
  if [ ${TYPE_RAW} == Solid ]
  then

[ceph-users] PCIE-SSD OSD bottom performance issue

2015-08-20 Thread scott_tan...@yahoo.com
Dear Loic: I'm sorry to bother you, but I have a question about Ceph. I used a PCIe SSD as an OSD disk, but I found its performance very poor. I have two hosts, each with 1 PCIe SSD, so I created two OSDs on the PCIe SSDs. ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY -1 0.35999

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
Ok, so images are regularly removed. In that case, these two objects probably are left over from previously removed images. Once ceph-objectstore-tool can dump the SnapSet from those two objects, you will probably find that those two snapdir objects each have only one bogus clone, in which case

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
root@test:~# uname -a Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux 2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com: Also, can you include the kernel version? -Sam On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Right ( but there was also a rebalancing cycle 2 days before the pgs got corrupted) 2015-08-21 2:23 GMT+03:00 Samuel Just sj...@redhat.com: Specifically, the snap behavior (we already know that the pgs went inconsistent while the pool was in writeback mode, right?). -Sam On Thu, Aug 20, 2015 at 4:22 PM,

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Exactly On Friday, 21 August 2015, Samuel Just wrote: And you adjusted the journals by removing the osd, recreating it with a larger journal, and reinserting it? -Sam On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Right

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
And you adjusted the journals by removing the osd, recreating it with a larger journal, and reinserting it? -Sam On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Right ( but also was rebalancing cycle 2 day before pgs corrupted) 2015-08-21 2:23 GMT+03:00

Re: [ceph-users] Ceph OSD nodes in XenServer VMs

2015-08-20 Thread Steven McDonald
Hi Jiri, On Thu, 20 Aug 2015 11:55:55 +1000 Jiri Kanicky j...@ganomi.com wrote: We are experimenting with an idea to run OSD nodes in XenServer VMs. We believe this could provide better flexibility, backups for the nodes etc. Could you expand on this? As written, it seems like a bad idea to

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Hmm, that might actually be client side. Can you attempt to reproduce with rbd-fuse (different client side implementation from the kernel)? -Sam On Thu, Aug 20, 2015 at 3:56 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: root@test:~# uname -a Linux ix-s5 4.0.4-040004-generic

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
We used the 4.x branch, as we have the very good Samsung 850 Pro in production, and they don't support queued TRIM (NCQ TRIM)... and 4.x is the first branch which includes the exceptions for this in libata. Sure, we could backport this 1 line to the 3.x branch, but we prefer not to go deeper if a package for a new kernel exists.

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
No, when we started draining the cache the bad pgs were already in place... We had a big rebalance (disk by disk - to change the journal size on both hot/cold layers).. All was OK, but after 2 days scrub errors arrived and 2 pgs were inconsistent... In writeback - yes, it looks like snapshots work well, but it stopped to

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Andrija Panic
Guys, I'm Igor's colleague, working a bit on Ceph together with Igor. This is a production cluster, and we are becoming more desperate as time goes by. I'm not sure if this is the appropriate place to seek commercial support, but anyhow, here goes... If anyone feels like it and has some experience

Re: [ceph-users] Ceph File System ACL Support

2015-08-20 Thread Yan, Zheng
The code is at https://github.com/ceph/samba.git wip-acl. So far the code does not handle default ACL (files created by samba do not inherit parent directory's default ACL) Regards Yan, Zheng On Tue, Aug 18, 2015 at 6:57 PM, Gregory Farnum gfar...@redhat.com wrote: On Mon, Aug 17, 2015 at 4:12

Re: [ceph-users] Bad performances in recovery

2015-08-20 Thread J-P Methot
Hi, Just to update the mailing list, we ended up going back to default ceph.conf without any additional settings than what is mandatory. We are now reaching speeds we never reached before, both in recovery and in regular usage. There was definitely something we set in the ceph.conf bogging

Re: [ceph-users] requests are blocked - problem

2015-08-20 Thread Nick Fisk
-Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jacek Jarosiewicz Sent: 20 August 2015 07:31 To: Nick Fisk n...@fisk.me.uk; ceph-us...@ceph.com Subject: Re: [ceph-users] requests are blocked - problem On 08/19/2015 03:41 PM, Nick

Re: [ceph-users] Bad performances in recovery

2015-08-20 Thread Alex Gorbachev
Just to update the mailing list, we ended up going back to default ceph.conf without any additional settings than what is mandatory. We are now reaching speeds we never reached before, both in recovery and in regular usage. There was definitely something we set in the ceph.conf bogging

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
Ah, this is kind of silly. I think you don't have 37 errors, but 2 errors. pg 2.490 object 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 is missing snap 141. If you look at the objects after that in the log: 2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 : cluster

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
The feature bug for the tool is http://tracker.ceph.com/issues/12740. -Sam On Thu, Aug 20, 2015 at 2:52 PM, Samuel Just sj...@redhat.com wrote: Ah, this is kind of silly. I think you don't have 37 errors, but 2 errors. pg 2.490 object

[ceph-users] Email lgx...@nxtzas.com trying to subscribe to tracker.ceph.com

2015-08-20 Thread Dan Mick
Someone using the email address lgx...@nxtzas.com is trying to subscribe to the Ceph Redmine tracker, but neither redmine nor I can use that email address; it bounces with lgx...@nxtzas.com: Host or domain name not found. Name service error for name=nxtzas.com type=: Host not found

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
Thank you Sam! I also noticed these linked errors during scrub... Now it all looks reasonable! So we will wait for the bug to be closed. Do you need any help with it? I mean I can help with coding/testing/etc... 2015-08-21 0:52 GMT+03:00 Samuel Just sj...@redhat.com: Ah, this is kind of silly.

Re: [ceph-users] requests are blocked - problem

2015-08-20 Thread Christian Balzer
Hello, On Thu, 20 Aug 2015 08:25:16 +0200 Jacek Jarosiewicz wrote: On 08/20/2015 03:07 AM, Christian Balzer wrote: For a realistic comparison with your current setup, a total rebuild would be in order. Provided your cluster is testing only at this point. Given your current HW, that

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
Samuel, we turned off the cache layer a few hours ago... I will post ceph.log in a few minutes. For the snap issue - we found it, it was connected with the cache tier.. 2015-08-20 19:23 GMT+03:00 Samuel Just sj...@redhat.com: Ok, you appear to be using a replicated cache tier in front of a replicated base tier.
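
For context, the usual sequence for retiring a writeback cache tier looks roughly like this (pool names are placeholders): forward mode stops new writes landing in the hot pool, everything dirty is flushed, and only then is the overlay removed:

  ceph osd tier cache-mode hot-pool forward
  rados -p hot-pool cache-flush-evict-all
  ceph osd tier remove-overlay cold-pool
  ceph osd tier remove cold-pool hot-pool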

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
What was the issue? -Sam On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Samuel, we turned off the cache layer a few hours ago... I will post ceph.log in a few minutes. For the snap issue - we found it, it was connected with the cache tier.. 2015-08-20 19:23 GMT+03:00 Samuel

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
The issue is that in forward mode fstrim doesn't work properly, and when we take a snapshot the data is not properly updated in the cache layer, and the client (ceph) sees a damaged snap.. as the headers are requested from the cache layer. 2015-08-20 19:53 GMT+03:00 Samuel Just sj...@redhat.com: What was the issue? -Sam On Thu,

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
Ok, you appear to be using a replicated cache tier in front of a replicated base tier. Please scrub both inconsistent pgs and post the ceph.log from before when you started the scrub until after. Also, what command are you using to take snapshots? -Sam On Thu, Aug 20, 2015 at 3:59 AM,

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
Also, was there at any point a power failure/power cycle event, perhaps on osd 56? -Sam On Thu, Aug 20, 2015 at 9:23 AM, Samuel Just sj...@redhat.com wrote: Ok, you appear to be using a replicated cache tier in front of a replicated base tier. Please scrub both inconsistent pgs and post the

Re: [ceph-users] Bad performances in recovery

2015-08-20 Thread Jan Schermer
Are you sure it was because of configuration changes? Maybe it was restarting the OSDs that fixed it? We often hit an issue with backfill_toofull where the recovery/backfill processes get stuck until we restart the daemons (sometimes setting recovery_max_active helps as well). It still shows

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Samuel Just
Actually, now that I think about it, you probably didn't remove the images for 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2, but other images (that's why the scrub errors went down briefly, those objects -- which were

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Andrija Panic
This was related to the caching layer, which doesn't support snapshotting per the docs... for the sake of closing the thread. On 17 August 2015 at 21:15, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi all, can you please help me with an unexplained situation... All the snapshots inside ceph are broken...

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
Sam, I tried to understand which rbd contains these chunks.. but no luck. No rbd image's block names start with this... Actually, now that I think about it, you probably didn't remove the images for 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and