Re: [ceph-users] Reply: how to see file object-mappings for cephfuse client

2015-12-07 Thread Yan, Zheng
On Mon, Dec 7, 2015 at 1:52 PM, Wuxiangwei wrote: > Thanks Yan, what if we wanna see some more specific or detailed information? > E.g. with cephfs we may run 'cephfs /mnt/a.txt show_location --offset' to > find the location of a given offset. > When using default layout, object size is 4M, (of

Re: [ceph-users] rbd_inst.create

2015-12-07 Thread NEVEU Stephane
Hi, One more question about rbd.py / rados.py: is there a way to retrieve watchers on rbd images? The ImageBusy exception is raised when trying to use lock_exclusive() or lock_shared(), but I cannot find how to list watchers... Thanks [@@ THALES GROUP INTERNAL @@] -Original Message- From: Jason Dill

[ceph-users] Reply: Reply: how to see file object-mappings for cephfuse client

2015-12-07 Thread Wuxiangwei
it looks simple if everything stays at its default value. However, we do want to change the stripe unit size and stripe count to achieve possibly higher performance. If so, it would be too troublesome to manually do the calculation every time we want to locate a given offset (and maybe a ce

[ceph-users] french meetup website

2015-12-07 Thread eric mourgaya
Hi, I am glad to write that a new website (in French) is now available. This website is managed by the ceph breizh community. You can find a report of the last meetup on this page: ceph breizh (http://ceph.bzh) Enjoy it and join us. -- Eric Mourgaya, Respectons la

Re: [ceph-users] Reply: Reply: how to see file object-mappings for cephfuse client

2015-12-07 Thread John Spray
On Mon, Dec 7, 2015 at 9:13 AM, Wuxiangwei wrote: > > it looks simple if everything stays as its default value. However, we do want > to change the stripe unit size and stripe count to achieve possible higher > performance. If so, it would be too troublesome to manually do the > calculation eve
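For anyone who wants the arithmetic behind show_location without doing it by hand each time: below is a minimal Python sketch of the offset-to-object mapping Ceph file striping performs, following the striping algorithm described in the Ceph architecture documentation. The function and variable names are our own, not any Ceph API; the resulting object name would then be formed from the file's inode number as `<inode-hex>.<object_no as %08x>`.

```python
# Sketch of the striping math behind "show_location" (illustrative names).
# Layout parameters come from the file's ceph.file.layout attributes.

def locate(offset, stripe_unit, stripe_count, object_size):
    """Map a file offset to (object_no, offset_within_object)."""
    stripes_per_object = object_size // stripe_unit
    block_no = offset // stripe_unit          # which stripe unit overall
    stripe_no = block_no // stripe_count      # which stripe (row)
    stripe_pos = block_no % stripe_count      # which object within the set
    object_set = stripe_no // stripes_per_object
    object_no = object_set * stripe_count + stripe_pos
    obj_off = (stripe_no % stripes_per_object) * stripe_unit \
        + offset % stripe_unit
    return object_no, obj_off

MB = 1 << 20
# Default layout: 4M stripe unit, count 1, 4M objects -> trivial mapping.
print(locate(5 * MB, 4 * MB, 1, 4 * MB))  # (1, 1048576)
# 1M stripe units striped across sets of 4 objects of 4M each.
print(locate(5 * MB, 1 * MB, 4, 4 * MB))  # (1, 1048576)
```

With a non-default layout the answer is no longer "offset // object_size", which is exactly why doing this by hand gets troublesome.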

Re: [ceph-users] ceph-disk list crashes in infernalis

2015-12-07 Thread Loic Dachary
Thanks ! On 06/12/2015 17:50, Stolte, Felix wrote: > Hi Loic, > > output is: > > /dev: > insgesamt 0 > crw--- 1 root root 10, 235 Dez 2 17:02 autofs > drwxr-xr-x 2 root root1000 Dez 2 17:02 block > drwxr-xr-x 2 root root 60 Dez 2 17:02 bsg > crw--- 1 root root

[ceph-users] poor performance when recovering

2015-12-07 Thread Libin Wu
Hi, cephers. I'm doing a performance test of ceph during recovery. The scenario is simple: 1. run fio on 6 krbd devices 2. stop one OSD for 10 seconds 3. start that OSD. However, when the OSD comes up and starts recovering, fio performance drops from 9k to 1k for about 20 seconds. At the same tii

Re: [ceph-users] poor performance when recovering

2015-12-07 Thread Libin Wu
Btw, my ceph version is 0.80.11 2015-12-07 21:45 GMT+08:00 Libin Wu : > Hi, cephers > > I'm doing the performance test of ceph when recovering. The scene is simple: > 1. run fio on 6 krbd device > 2. stop one OSD for 10 seconds > 3. start that OSD > > However, when the OSD up and start recovering,

Re: [ceph-users] poor performance when recovering

2015-12-07 Thread Oliver Dzombic
Hi, maybe you should first upgrade. " Posted by sage November 19th, 2015 This is a bugfix release for Firefly. As the Firefly 0.80.x series is nearing its planned end of life in January 2016 it may also be the last. " I think you are wasting time, trying to analyse/fix issues on a ver

[ceph-users] osd process threads stack up on osds failure

2015-12-07 Thread Kostis Fardelas
Hi cephers, after one OSD node crash (6 OSDs in total), we experienced an increase of approximately 230-260 threads on every other OSD node. We have 26 OSD nodes with 6 OSDs per node, so this is approximately 40 threads per osd. The OSD node rejoined the cluster after 15-20 minutes. The only wo

Re: [ceph-users] rbd_inst.create

2015-12-07 Thread Jason Dillaman
The lock_exclusive() / lock_shared() methods are not related to image watchers. Instead, they are tied to the advisory locking mechanism -- and list_lockers() can be used to query who has a lock. -- Jason Dillaman - Original Message - > From: "NEVEU Stephane" > To: "Jason Dillaman" > Cc
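To make the distinction concrete, here is a hedged Python sketch working on the result shape that rbd.py's Image.list_lockers() is believed to return ({'tag': ..., 'exclusive': ..., 'lockers': [(client, cookie, address), ...]}); the helper function and the sample data are invented for illustration, not part of rbd.py:

```python
# Hedged sketch: flatten a list_lockers()-style result into readable
# lines. The dict/tuple shape is an assumption about rbd.py's return
# value; describe_lockers() itself is our own helper.

def describe_lockers(lockers_result):
    """Turn {'exclusive': bool, 'lockers': [(client, cookie, addr)]}
    into human-readable strings, one per locker."""
    kind = "exclusive" if lockers_result.get("exclusive") else "shared"
    return ["%s lock by %s (cookie=%s, addr=%s)" % (kind, c, k, a)
            for c, k, a in lockers_result.get("lockers", [])]

sample = {"tag": "", "exclusive": True,
          "lockers": [("client.4125", "mycookie", "192.168.0.10:0/1012")]}
print(describe_lockers(sample))
```

Note this lists advisory lock holders, not watchers; a plain mapped krbd/librbd client watches the image header without holding any lock, so it will not appear here.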

Re: [ceph-users] osd process threads stack up on osds failure

2015-12-07 Thread Gregory Farnum
On Mon, Dec 7, 2015 at 6:59 AM, Kostis Fardelas wrote: > Hi cephers, > after one OSD node crash (6 OSDs in total), we experienced an increase > of approximately 230-260 threads for every other OSD node. We have 26 > OSD nodes with 6 OSDs per node, so this is approximately 40 threads > per osd. The

[ceph-users] Another script to make backups/replication of RBD images

2015-12-07 Thread Vandeir Eduardo
Hi, just sharing a little script I made to backup/replicate RBD images: https://github.com/vandeir/rbd-backup It was based on the script available at this URL: https://www.rapide.nl/blog/item/ceph_-_rbd_replication.html It is used to create snapshots of Ceph RBD images and then export those s
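A sketch of the full-vs-incremental decision such a backup script typically makes (the pool, image, and snapshot names below are made up for the example; only the `rbd export-diff` subcommand and its `--from-snap` flag are real):

```python
# Illustrative helper: build the rbd command line for a backup step.
# A first snapshot gets a full export-diff; later snapshots get an
# incremental one relative to the previous snapshot.

def export_cmd(pool, image, snap, dest, from_snap=None):
    """Return the rbd export-diff argv for one backup step."""
    cmd = ["rbd", "export-diff"]
    if from_snap:
        cmd += ["--from-snap", from_snap]   # incremental since from_snap
    cmd += ["%s/%s@%s" % (pool, image, snap), dest]
    return cmd

first = export_cmd("rbd", "vm1", "snap1", "/backup/vm1-snap1.diff")
incr = export_cmd("rbd", "vm1", "snap2", "/backup/vm1-snap2.diff",
                  from_snap="snap1")
print(" ".join(first))
print(" ".join(incr))
```

On the receiving cluster the mirror operation would be `rbd import-diff` of each file in order, which is the pattern the linked rapide.nl script follows.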

[ceph-users] CEPH Replication

2015-12-07 Thread Le Quang Long
Hi all, I have one question. Why did the default replication change to 3 in Ceph Firefly? I think 2 copies of an object are enough for backup. And increasing the number of replicas also increases latency, since the object has to be replicated to the secondary and tertiary OSDs. So why is the default replication 3, not 2

Re: [ceph-users] OSD:s failing out after upgrade to 9.2.0 on Ubuntu 14.04

2015-12-07 Thread Claes Sahlström
Hi all, Sorry to bother with this, I am trying my best to solve it, but I am quite stuck. I will continue to try to dig out more information any way I can, but I am quite clueless right now as to why my OSDs don't come up. Any help is of course very appreciated. I increased the logg

[ceph-users] scrub error with ceph

2015-12-07 Thread Erming Pei
Hi, I found there are 128 scrub errors in my ceph system. I checked with health detail and found many pgs with a stuck unclean issue. Should I repair all of them, or what should I do? [root@gcloudnet ~]# ceph -s cluster a4d0879f-abdc-4f9d-8a4b-53ce57d822f1 health HEALTH_ERR 128 pgs inc

Re: [ceph-users] osd process threads stack up on osds failure

2015-12-07 Thread Kostis Fardelas
Hi Greg, the node rebooted unexpectedly. The timeline goes like this according to the ceph cluster logs: 12:36:56 - 12:37:02 osds reported down 12:42:00 - 12:42:05 osds reported out 13:50:44 - 13:50:49 osds booted again The thread count on all other OSD nodes was ramping up from 12:36 until approx. 14:00

Re: [ceph-users] Kernel RBD hang on OSD Failure

2015-12-07 Thread Blair Bethwaite
Hi Matt, (CC'ing in ceph-users too - similar reports there: http://www.spinics.net/lists/ceph-users/msg23037.html) We've seen something similar for KVM [lib]RBD clients acting as NFS gateways within our OpenStack cloud, the NFS services were locking up and causing client timeouts whenever we star

[ceph-users] osd wasn't marked as down/out when it's storage folder was deleted

2015-12-07 Thread Kane Kim
I've deleted the folder: rm -rf /var/lib/ceph/osd/ceph-0/current but it looks like ceph never noticed: ceph osd df ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS 4 0.09270 1.0 97231M 54661M 42570M 56.22 1.34 95 0 0.09270 1.0 97231M 139M 97091M 0.14 0.00 69 5 0.09270 1.000

[ceph-users] rbd merge-diff error

2015-12-07 Thread Alex Gorbachev
When trying to merge two results of rbd export-diff, the following error occurs: iss@lab2-b1:~$ rbd export-diff --from-snap autosnap120720151500 spin1/scrun1@autosnap120720151502 /data/volume1/scrun1-120720151502.bck iss@lab2-b1:~$ rbd export-diff --from-snap autosnap120720151504 spin1/scrun1@au

Re: [ceph-users] rbd merge-diff error

2015-12-07 Thread Josh Durgin
On 12/07/2015 03:29 PM, Alex Gorbachev wrote: When trying to merge two results of rbd export-diff, the following error occurs: iss@lab2-b1:~$ rbd export-diff --from-snap autosnap120720151500 spin1/scrun1@autosnap120720151502 /data/volume1/scrun1-120720151502.bck iss@lab2-b1:~$ rbd export-diff -
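When merge-diff fails, a first sanity check is whether both inputs are valid "rbd diff v1" streams and their snapshots chain correctly (the second file's from-snap must match the first file's to-snap). Below is a hedged Python sketch of a header check, based on the stream layout described in Ceph's doc/dev/rbd-diff.rst (magic string, then tagged records where 'f'/'t' carry the from/to snapshot name as a little-endian u32 length plus the name); the function itself is our own, not a Ceph tool:

```python
import struct

DIFF_MAGIC = b"rbd diff v1\n"

def check_diff_header(data):
    """Return (from_snap, to_snap) parsed from an 'rbd diff v1' stream.
    Only the leading 'f' (from-snap) and 't' (to-snap) records are read;
    the record layout follows doc/dev/rbd-diff.rst."""
    if not data.startswith(DIFF_MAGIC):
        raise ValueError("not an rbd diff v1 stream")
    pos = len(DIFF_MAGIC)
    snaps = {}
    while pos < len(data) and data[pos:pos + 1] in (b"f", b"t"):
        tag = data[pos:pos + 1]
        (n,) = struct.unpack_from("<I", data, pos + 1)  # name length
        snaps[tag] = data[pos + 5:pos + 5 + n].decode()
        pos += 5 + n
    return snaps.get(b"f"), snaps.get(b"t")

# Synthetic stream with from/to snapshot records:
blob = DIFF_MAGIC
blob += b"f" + struct.pack("<I", 5) + b"snap1"
blob += b"t" + struct.pack("<I", 5) + b"snap2"
print(check_diff_header(blob))  # ('snap1', 'snap2')
```

Running this over both .bck files and confirming that the first stream's to-snap equals the second's from-snap would rule out the most common cause of a merge-diff error.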

Re: [ceph-users] Reply: How long will the logs be kept?

2015-12-07 Thread David Zafman
dout() is used by an OSD to log information about what it is doing locally and might become very chatty. It is saved on the local node's disk only. clog is the cluster log and is used for major events that should be known to the administrator (see ceph -w). Clog should be used sparingly a

Re: [ceph-users] poor performance when recovering

2015-12-07 Thread Libin Wu
Yeah, we will upgrade in the near future. But I'm afraid the recovery problem also exists in the hammer version. So, why does recovery affect performance so much, and is there any plan to improve it? 2015-12-07 22:29 GMT+08:00 Oliver Dzombic : > Hi, > > maybe you should first upgrade. > > " > > Posted by sage

[ceph-users] [Ceph-Users] Upgrade Path to Hammer

2015-12-07 Thread Shinobu Kinjo
Hello, Have any of you tried to upgrade a Ceph cluster through the following upgrade path? Dumpling -> Firefly -> Hammer * Each version is the newest. After upgrading from Dumpling through Firefly to Hammer following this: http://docs.ceph.com/docs/master/install/upgrading-ceph/ I ended up with hitti

Re: [ceph-users] [Ceph-Users] Upgrade Path to Hammer

2015-12-07 Thread Gregory Farnum
As that ticket indicates, older versions of the code didn't create the backtraces, so obviously they aren't present. That certainly includes Dumpling! -Greg On Monday, December 7, 2015, Shinobu Kinjo wrote: > Hello, > > Have any of you tried to upgrade the Ceph cluster through the following > up

Re: [ceph-users] french meetup website

2015-12-07 Thread Alexandre DERUMIER
Hi Eric, too bad, I was not aware of this meetup; could you make an announcement for the next one? I'd be glad to share my experience with my full-ssd production cluster. Regards, Alexandre - Original Message - From: "eric mourgaya" To: "ceph-users" Sent: Monday, December 7

Re: [ceph-users] [Ceph-Users] Upgrade Path to Hammer

2015-12-07 Thread Shinobu Kinjo
Is there anything we have to do? Or is that upgrade path not doable... Shinobu - Original Message - From: "Gregory Farnum" To: "Shinobu Kinjo" Cc: "ceph-users" Sent: Tuesday, December 8, 2015 10:36:34 AM Subject: Re: [ceph-users] [Ceph-Users] Upgrade Path to Hammer As that ticket ind

Re: [ceph-users] [Ceph-Users] Upgrade Path to Hammer

2015-12-07 Thread Gregory Farnum
The warning is informational -- it doesn't harm anything. Future writes to those files/directories will generate backtraces and it'll go away. On Monday, December 7, 2015, Shinobu Kinjo wrote: > Is there anything we have to do? > or that upgrade path is not doable... > > Shinobu > > - Origi

Re: [ceph-users] [Ceph-Users] Upgrade Path to Hammer

2015-12-07 Thread Shinobu Kinjo
Thanks! - Original Message - From: "Gregory Farnum" To: "Shinobu Kinjo" Cc: "ceph-users" Sent: Tuesday, December 8, 2015 12:06:51 PM Subject: Re: [ceph-users] [Ceph-Users] Upgrade Path to Hammer The warning is informational -- it doesn't harm anything. Future writes to those files/dire

Re: [ceph-users] osd wasn't marked as down/out when it's storage folder was deleted

2015-12-07 Thread GuangYang
It is actually not part of ceph. Some files under that folder are only accessed during OSD boot, so removing them would not cause a problem there. For other files, the OSD keeps an open handle, in which case, even if you remove those files from the filesystem, they are not erased as
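This POSIX behavior is easy to demonstrate in a few lines (a small illustration, not Ceph-specific):

```python
import os
import tempfile

# Unlinking a file that a process (like an OSD) holds open removes it
# from the directory namespace, but the data stays readable through the
# open handle until the last descriptor is closed.
fd, path = tempfile.mkstemp()
os.write(fd, b"osd data")
os.unlink(path)                  # the "rm" while the file is open
print(os.path.exists(path))      # False: gone from the namespace
os.lseek(fd, 0, os.SEEK_SET)
print(os.read(fd, 8))            # b'osd data': still readable via the fd
os.close(fd)                     # only now is the space reclaimed
```

This is why removing an OSD's current/ directory out from under a running daemon goes unnoticed for a while: the daemon's open handles keep working until it restarts.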

Re: [ceph-users] scrub error with ceph

2015-12-07 Thread GuangYang
Before issuing a scrub, you may check whether those scrub errors point to one (or a small subset of) disks/OSDs, and if so, whether those objects were written in a specific interval. That is a large number of scrub errors for a small cluster, which might be caused by some hardware issue.

[ceph-users] after loss of journal, osd fails to start with failed assert OSDMapRef OSDService::get_map(epoch_t) ret != null

2015-12-07 Thread Benedikt Fraunhofer
Hello List, after a crash of a box, the journal vanished. Creating a new one with --mkjournal results in the osd being unable to start. Does anyone want to dissect this any further, or should I just trash the osd and recreate it? Thx in advance Benedikt 2015-12-01 07:46:31.505255 7fadb7f1e9

[ceph-users] osd dies on pg repair with FAILED assert(!out->snaps.empty())

2015-12-07 Thread Benedikt Fraunhofer
Hello Cephers! trying to repair an inconsistent PG results in the osd dying with an assertion failure: 0> 2015-12-01 07:22:13.398006 7f76d6594700 -1 osd/SnapMapper.cc: In function 'int SnapMapper::get_snaps(const hobject_t& , SnapMapper::object_snaps*)' thread 7f76d6594700 time 2015-12-01 07

[ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000

2015-12-07 Thread Benedikt Fraunhofer
Hello Cephers, lately, our ceph cluster has started to show some weird behavior: the osd boxes show a load of 5000-15000 before the osds get marked down. Usually the box is fully usable, even "apt-get dist-upgrade" runs smoothly, you can read and write to any disk; the only things you can't do are strace

Re: [ceph-users] after loss of journal, osd fails to start with failed assert OSDMapRef OSDService::get_map(epoch_t) ret != null

2015-12-07 Thread Jan Schermer
The rule of thumb is that the data on OSD is gone if the related journal is gone. Journal doesn't just "vanish", though, so you should investigate further... This log is from the new empty journal, right? Jan > On 08 Dec 2015, at 08:08, Benedikt Fraunhofer wrote: > > Hello List, > > after so

Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000

2015-12-07 Thread Jan Schermer
What is the setting of sysctl kernel.pid_max? You really need to have this: kernel.pid_max = 4194304 (I think it also sets this as well: kernel.threads-max = 4194304) I think you are running out of process IDs. Jan > On 08 Dec 2015, at 08:10, Benedikt Fraunhofer wrote: > > Hello Cephers, > >
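A small, hedged sketch of the arithmetic behind this advice: parse the per-process thread counts from /proc/<pid>/status and compare the total against kernel.pid_max. The helper names are our own, and the 80% headroom threshold is an arbitrary example value:

```python
# Since every thread consumes a PID, summing the Threads: lines from
# /proc/<pid>/status for all processes approximates PID usage.

def parse_threads(status_text):
    """Extract the Threads: count from one /proc/<pid>/status blob."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    return 0

def near_limit(total_threads, pid_max, headroom=0.8):
    """True if thread/PID usage is past `headroom` of pid_max."""
    return total_threads > headroom * pid_max

print(parse_threads("Name:\tceph-osd\nThreads:\t120\n"))  # 120
# A 65k pid_max is uncomfortably tight for ~60k threads:
print(near_limit(60000, 65536))    # True
print(near_limit(60000, 4194304))  # False with the recommended value
```

With ~40 extra threads per OSD appearing during peering/recovery, a box with many OSDs can blow past a 65k pid_max exactly when the cluster is already degraded, which matches the symptom of huge load with unkillable processes.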

Re: [ceph-users] after loss of journal, osd fails to start with failed assert OSDMapRef OSDService::get_map(epoch_t) ret != null

2015-12-07 Thread Benedikt Fraunhofer
Hi Jan, 2015-12-08 8:12 GMT+01:00 Jan Schermer : > Journal doesn't just "vanish", though, so you should investigate further... We tried putting journals in files to work around the changes in ceph-deploy where you can't have the journals unencrypted but only the disks themselves. (and/or you can't have

Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000

2015-12-07 Thread Benedikt Fraunhofer
Hi Jan, we initially had to bump it once we had more than 12 osds per box, but we'll change it to the values you provided. Thx! Benedikt 2015-12-08 8:15 GMT+01:00 Jan Schermer : > What is the setting of sysctl kernel.pid_max? > You relly need to have this: > kernel.pid_max = 4194304 > (I thi

Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000

2015-12-07 Thread Jan Schermer
And how many pids do you have currently? This should do it, I think: # ps axH | wc -l Jan > On 08 Dec 2015, at 08:26, Benedikt Fraunhofer wrote: > > Hi Jan, > > we initially had to bump it once we had more than 12 osds > per box. But it'll change that to the values you provided. > > Thx! > > Be

Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000

2015-12-07 Thread Benedikt Fraunhofer
Hi Jan, we had 65k for pid_max, which made kernel.threads-max = 1030520, or kernel.threads-max = 256832 (it looks like it depends on the number of cpus?). Currently we have: root@ceph1-store209:~# sysctl -a | grep -e thread -e pid kernel.cad_pid = 1 kernel.core_uses_pid = 0 kernel.ns_last_pid = 60298 k

Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000

2015-12-07 Thread Jan Schermer
It doesn't look near the limit currently (but I suppose you rebooted it in the meantime?). Did iostat say anything about the drives? (btw, what are dm-1 and dm-6? Are those your data drives?) - were they really overloaded? Jan > On 08 Dec 2015, at 08:41, Benedikt Fraunhofer wrote: > > Hi Jan, > >

Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000

2015-12-07 Thread Benedikt Fraunhofer
Hi Jan, > Doesn't look near the limit currently (but I suppose you rebooted it in the > meantime?). the box these numbers came from has an uptime of 13 days, so it's one of the boxes that survived yesterday's half-cluster-wide reboot. > Did iostat say anything about the drives? (btw dm-1 and dm