[ceph-users] running Firefly client (0.80.1) against older version (dumpling 0.67.10) cluster?

2014-08-14 Thread Nigel Williams
Anyone know if this is safe in the short term? We're rebuilding our nova-compute nodes and can make sure the Dumpling versions are pinned as part of the process in the future.
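
A minimal sketch of pinning the client packages to Dumpling on the rebuilt nodes, assuming Ubuntu/apt and the ceph.com dumpling repository (the file path and package list are illustrative, not from the original mail):

    Explanation: /etc/apt/preferences.d/ceph-dumpling - keep the RBD client at 0.67.x until the cluster is upgraded
    Package: ceph ceph-common librados2 librbd1 python-ceph
    Pin: version 0.67*
    Pin-Priority: 1001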

Re: [ceph-users] running Firefly client (0.80.1) against older version (dumpling 0.67.10) cluster?

2014-08-14 Thread Alexandre DERUMIER
Same question here. I'm a contributor on Proxmox, and we don't know if we can upgrade librbd safely for users with a Dumpling cluster. Also, for Ceph Enterprise, does Inktank support Dumpling enterprise + Firefly librbd? - Original Mail - From: Nigel Williams

Re: [ceph-users] ceph cluster expansion

2014-08-14 Thread Christian Balzer
Hello, On Wed, 13 Aug 2014 14:55:29 +0100 James Eckersall wrote: Hi Christian, Most of our backups are rsync or robocopy (windows), so they are incremental file-based backups. There will be a high level of parallelism as the backups run mostly overnight with similar start times. In that

Re: [ceph-users] Fixed all active+remapped PGs stuck forever (but I have no clue why)

2014-08-14 Thread John Morris
On 08/13/2014 11:36 PM, Christian Balzer wrote: Hello, On Thu, 14 Aug 2014 03:38:11 + David Moreau Simard wrote: Hi, Trying to update my continuous integration environment.. same deployment method with the following specs: - Ubuntu Precise, Kernel 3.2, Emperor (0.72.2) - Yields a

Re: [ceph-users] running Firefly client (0.80.1) against older version (dumpling 0.67.10) cluster?

2014-08-14 Thread Wido den Hollander
On 08/14/2014 08:18 AM, Nigel Williams wrote: Anyone know if this is safe in the short term? we're rebuilding our nova-compute nodes and can make sure the Dumpling versions are pinned as part of the process in the future. The client should be fully backwards compatible. It figures out which

[ceph-users] Tracking the system calls for OSD write

2014-08-14 Thread Sudarsan, Rajesh
Hi, I am trying to track the actual open and write system calls on an OSD when a new file is created and written. So far, my tracking is as follows: using the debug log messages, I located the first write call in the do_osd_ops function (case CEPH_OSD_OP_WRITE) in osd/ReplicatedPG.cc (line 3727)

Re: [ceph-users] HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean

2014-08-14 Thread Riederer, Michael
Hi Craig, Yes, we have stability problems. The cluster is definitely not suitable for a production environment. I will not describe the details here. I want to get to know Ceph, and this is possible with the test cluster. Some OSDs are very slow, less than 15 MB/s write. Also increases

Re: [ceph-users] Fixed all active+remapped PGs stuck forever (but I have no clue why)

2014-08-14 Thread Christian Balzer
Hello, On Thu, 14 Aug 2014 01:38:05 -0500 John Morris wrote: On 08/13/2014 11:36 PM, Christian Balzer wrote: Hello, On Thu, 14 Aug 2014 03:38:11 + David Moreau Simard wrote: Hi, Trying to update my continuous integration environment.. same deployment method with the

Re: [ceph-users] CRUSH map advice

2014-08-14 Thread Christian Balzer
Hello, On Tue, 12 Aug 2014 10:53:21 -0700 Craig Lewis wrote: On Mon, Aug 11, 2014 at 11:26 PM, John Morris j...@zultron.com wrote: On 08/11/2014 08:26 PM, Craig Lewis wrote: Your MON nodes are separate hardware from the OSD nodes, right? Two nodes are OSD + MON, plus a separate
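
For reference, a minimal sketch of the kind of rule under discussion, spreading replicas across hosts rather than across OSDs (assumes the default CRUSH root is named 'default'; this is not Craig's or John's actual map):

    rule replicated_across_hosts {
            ruleset 1
            type replicated
            min_size 1
            max_size 10
            step take default
            step chooseleaf firstn 0 type host
            step emit
    }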

Re: [ceph-users] cache pools on hypervisor servers

2014-08-14 Thread Robert van Leeuwen
Personally I am not worried too much about the hypervisor-to-hypervisor traffic, as I am using a dedicated InfiniBand network for storage. It is not used for guest-to-guest or internet traffic or anything else. I would like to decrease or at least smooth out the traffic peaks between

Re: [ceph-users] ceph cluster inconsistency?

2014-08-14 Thread Kenneth Waegeman
I have: osd_objectstore = keyvaluestore-dev in the global section of my ceph.conf [root@ceph002 ~]# ceph osd erasure-code-profile get profile11 directory=/usr/lib64/ceph/erasure-code k=8 m=3 plugin=jerasure ruleset-failure-domain=osd technique=reed_sol_van the ecdata pool has this as profile
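
For context, a profile like the one Kenneth shows can be created and attached to a pool roughly like this (a sketch; the pool name and PG counts are illustrative):

    ceph osd erasure-code-profile set profile11 k=8 m=3 plugin=jerasure \
        technique=reed_sol_van ruleset-failure-domain=osd
    ceph osd erasure-code-profile get profile11
    ceph osd pool create ecdata 128 128 erasure profile11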

[ceph-users] OSD disk replacement best practise

2014-08-14 Thread Guang Yang
Hi cephers, I am currently drafting the runbooks for OSD disk replacement. I think the rule of thumb is to reduce data migration (recovery/backfill), and I thought the following procedure should achieve that purpose: 1. ceph osd out osd.XXX (mark it out to trigger data migration) 2. ceph
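
A sketch of the usual removal sequence once the failed disk's data has migrated off (hypothetical osd id 12; whether marking out first causes an extra rebalance is exactly the point debated later in the thread):

    ceph osd out osd.12            # triggers backfill away from the disk
    # wait for the cluster to return to active+clean, then on its host:
    service ceph stop osd.12
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12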

Re: [ceph-users] running Firefly client (0.80.1) against older version (dumpling 0.67.10) cluster?

2014-08-14 Thread Sage Weil
On Thu, 14 Aug 2014, Nigel Williams wrote: Anyone know if this is safe in the short term? we're rebuilding our nova-compute nodes and can make sure the Dumpling versions are pinned as part of the process in the future. It's safe, with the possible exception of radosgw, which generally needs
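
A quick way to confirm what is actually running on each side (hypothetical osd id 0; a sketch, not from the original thread):

    ceph --version             # on the rebuilt nova-compute node: 0.80.x client
    rbd --version
    ceph tell osd.0 version    # against the cluster: should still report 0.67.10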

[ceph-users] osd pool stats

2014-08-14 Thread Luis Periquito
Hi, I've just added a few more OSDs to my cluster. As expected, the system started rebalancing all the PGs to the new nodes. pool .rgw.buckets id 24 -221/-182 objects degraded (121.429%) recovery io 27213 kB/s, 53 objects/s client io 27434 B/s rd, 0 B/s wr, 66 op/s the status
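
For watching a single pool rather than the whole cluster, something like this should show the same counters (a sketch):

    ceph osd pool stats .rgw.buckets    # per-pool recovery and client io
    ceph pg dump_stuck unclean          # if the negative degraded count persists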

Re: [ceph-users] Performance really drops from 700MB/s to 10MB/s

2014-08-14 Thread Mariusz Gronczewski
Actual OSD (/var/log/ceph/ceph-osd.$id) logs would be more useful. Few ideas: * do 'ceph health detail' to get detail of which OSD is stalling * 'ceph osd perf' to see latency of each osd * 'ceph --admin-daemon /var/run/ceph/ceph-osd.$id.asok dump_historic_ops' shows recent slow ops I actually
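
Put together, the checks Mariusz lists look roughly like this (hypothetical osd id 3, run on its own host for the admin socket command):

    ceph health detail | grep -i slow    # which OSDs report slow requests
    ceph osd perf                        # fs_commit/fs_apply latency per OSD
    ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok dump_historic_ops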

[ceph-users] Cache tiering and target_max_bytes

2014-08-14 Thread Paweł Sadowski
Hello, I've a cluster of 35 OSD (30 HDD, 5 SSD) with cache tiering configured. During tests it looks like ceph is not respecting target_max_bytes settings. Steps to reproduce: - configure cache tiering - set target_max_bytes to 32G (on hot pool) - write more than 32G of data - nothing happens
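
For comparison, the knobs involved look roughly like this (hypothetical hot-pool name 'hot'); flushing/eviction is driven by the cache_target_* ratios relative to target_max_bytes, which may be the missing piece here, though that is an assumption rather than a confirmed diagnosis:

    ceph osd pool set hot target_max_bytes 34359738368    # 32 GiB
    ceph osd pool set hot cache_target_dirty_ratio 0.4    # start flushing at 40% of target
    ceph osd pool set hot cache_target_full_ratio 0.8     # start evicting at 80% of target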

Re: [ceph-users] Performance really drops from 700MB/s to 10MB/s

2014-08-14 Thread German Anders
Hi Mariusz, Thanks a lot for the ideas. I've rebooted the client server, mapped the rbd again and launched the fio test again, and this time it worked... very strange. While running the test I also ran: ceph@cephmon01:~$ ceph osd perf osdid fs_commit_latency(ms) fs_apply_latency(ms) 0

Re: [ceph-users] rados bench no clean cleanup

2014-08-14 Thread Kenneth Waegeman
- Message from zhu qiang zhu_qiang...@foxmail.com - Date: Fri, 8 Aug 2014 10:00:18 +0800 From: zhu qiang zhu_qiang...@foxmail.com Subject: RE: [ceph-users] rados bench no clean cleanup To: 'Kenneth Waegeman' kenneth.waege...@ugent.be, 'ceph-users'

Re: [ceph-users] Cache tiering and target_max_bytes

2014-08-14 Thread Sage Weil
On Thu, 14 Aug 2014, Paweł Sadowski wrote: Hello, I've a cluster of 35 OSD (30 HDD, 5 SSD) with cache tiering configured. During tests it looks like ceph is not respecting target_max_bytes settings. Steps to reproduce: - configure cache tiering - set target_max_bytes to 32G (on hot pool)

Re: [ceph-users] Fixed all active+remapped PGs stuck forever (but I have no clue why)

2014-08-14 Thread David Moreau Simard
Ah, I was afraid it would be related to the amount of replicas versus the amount of host buckets. Makes sense. I was unable to reproduce the issue with three hosts and one OSD on each host. Thanks. -- David Moreau Simard On Aug 14, 2014, at 12:36 AM, Christian Balzer

Re: [ceph-users] cache pools on hypervisor servers

2014-08-14 Thread Andrei Mikhailovsky
Hi guys, Could someone from the ceph team please comment on running osd cache pool on the hypervisors? Is this a good idea, or will it create a lot of performance issues? Anyone in the ceph community that has done this? Any results to share? Many thanks Andrei - Original Message

Re: [ceph-users] OSD disk replacement best practise

2014-08-14 Thread Smart Weblications GmbH - Florian Wiessner
Am 14.08.2014 13:29, schrieb Guang Yang: Hi cephers, Most recently I am drafting the run books for OSD disk replacement, I think the rule of thumb is to reduce data migration (recover/backfill), and I thought the following procedure should achieve the purpose: 1. ceph osd out osd.XXX

Re: [ceph-users] cache pools on hypervisor servers

2014-08-14 Thread Sage Weil
On Thu, 14 Aug 2014, Andrei Mikhailovsky wrote: Hi guys, Could someone from the ceph team please comment on running osd cache pool on the hypervisors? Is this a good idea, or will it create a lot of performance issues? It doesn't sound like an especially good idea. In general you want the

Re: [ceph-users] HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean

2014-08-14 Thread Craig Lewis
It sounds like you need to throttle recovery. I have this in my ceph.conf: [osd] osd max backfills = 1 osd recovery max active = 1 osd recovery op priority = 1 Those configs, plus SSD journals, really helped the stability of my cluster during recovery. Before I made those changes, I
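
For what it's worth, the same throttles can be applied to a running cluster without restarting the OSDs (a sketch):

    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'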

[ceph-users] Translating a RadosGW object name into a filename on disk

2014-08-14 Thread Craig Lewis
In my effort to learn more of the details of Ceph, I'm trying to figure out how to get from an object name in RadosGW, through the layers, down to the files on disk. clewis@clewis-mac ~ $ s3cmd ls s3://cpltest/ 2014-08-13 23:02  14M  28dde9db15fdcb5a342493bc81f91151
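
The usual path from an S3 object down to a file on an OSD looks roughly like this (a sketch with hypothetical names; <object>, the osd id and the pg id are placeholders, not values from Craig's listing):

    radosgw-admin object stat --bucket=cpltest --object=<object>   # shows the underlying rados object name(s)
    ceph osd map .rgw.buckets <rados_object_name>                  # maps it to a pg and its acting OSDs
    # then on the primary OSD's host (e.g. osd.5, pg 12.7f):
    ls /var/lib/ceph/osd/ceph-5/current/12.7f_head/ | grep <object>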

Re: [ceph-users] Performance really drops from 700MB/s to 10MB/s

2014-08-14 Thread Craig Lewis
I find graphs really help here. One screen that has all the disk I/O and latency for all OSDs makes it easy to pinpoint the bottleneck. If you don't have that, I'd go low tech: watch the blinky lights. It's really easy to see which disk is the hotspot. On Thu, Aug 14, 2014 at 6:56 AM,

[ceph-users] How to create multiple OSD's per host?

2014-08-14 Thread Bruce McFarland
I've tried using ceph-deploy but it wants to assign the same id to each OSD, and I end up with a bunch of prepared ceph-disks and only 1 active. If I use the manual short-form method, the activate step fails and there are no xfs mount points on the ceph-disks. If I use the manual long form it

Re: [ceph-users] Performance really drops from 700MB/s to 10MB/s

2014-08-14 Thread German Anders
I use nmon on each OSD server; it's a really good tool to find out what is going on with CPU, memory, disks and networking. German Anders --- Original message --- Subject: Re: [ceph-users] Performance really drops from 700MB/s to 10MB/s From: Craig Lewis

Re: [ceph-users] CRUSH map advice

2014-08-14 Thread Craig Lewis
On Thu, Aug 14, 2014 at 12:47 AM, Christian Balzer ch...@gol.com wrote: Hello, On Tue, 12 Aug 2014 10:53:21 -0700 Craig Lewis wrote: That's a low probability, given the number of disks you have. I would've taken that bet (with backups). As the number of OSDs goes up, the probability of

[ceph-users] Musings

2014-08-14 Thread Robert LeBlanc
We are looking to deploy Ceph in our environment and I have some musings that I would like some feedback on. There are concerns about scaling a single Ceph instance to the PBs of size we would use, so the idea is to start small, like one Ceph cluster per rack or two. Then as we feel more

Re: [ceph-users] librados: client.admin authentication error

2014-08-14 Thread John Wilkins
Can you provide some background? I've just reworked the cephx authentication sections. They are still in a wip branch, and as you ask the question, it occurs to me that we do not have a troubleshooting section for authentication issues. It could be any number of things: 1. you don't have the

Re: [ceph-users] ceph --status Missing keyring

2014-08-14 Thread John Wilkins
Dan, Do you have /etc/ceph/ceph.client.admin.keyring, or is that in a local directory? Ceph will be looking for it in the /etc/ceph directory by default. See if adding read permissions works, e.g., sudo chmod +r. You can also try sudo when executing ceph. On Wed, Aug 6, 2014 at 6:55 AM,
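
A quick sequence for checking the usual suspects (a sketch based on John's suggestions):

    ls -l /etc/ceph/ceph.client.admin.keyring        # present and readable?
    sudo chmod +r /etc/ceph/ceph.client.admin.keyring
    # or point the client at wherever the keyring actually lives:
    ceph -s --id admin --keyring /path/to/ceph.client.admin.keyring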

Re: [ceph-users] Cache tiering and target_max_bytes

2014-08-14 Thread Paweł Sadowski
On 14.08.2014 17:20, Sage Weil wrote: On Thu, 14 Aug 2014, Paweł Sadowski wrote: Hello, I've a cluster of 35 OSD (30 HDD, 5 SSD) with cache tiering configured. During tests it looks like ceph is not respecting target_max_bytes settings. Steps to reproduce: - configure cache tiering -

Re: [ceph-users] Cache tiering and target_max_bytes

2014-08-14 Thread Sage Weil
On Thu, 14 Aug 2014, Paweł Sadowski wrote: On 14.08.2014 17:20, Sage Weil wrote: On Thu, 14 Aug 2014, Paweł Sadowski wrote: Hello, I've a cluster of 35 OSD (30 HDD, 5 SSD) with cache tiering configured. During tests it looks like ceph is not respecting target_max_bytes settings.

Re: [ceph-users] How to create multiple OSD's per host?

2014-08-14 Thread Bruce McFarland
This is an example of the output from 'ceph-deploy osd create [data] [journal]'. I've noticed that all of the 'ceph-conf' commands use the same parameter of '--name=osd.' every time ceph-deploy is called. I end up with 30 OSDs - 29 prepared and 1 active according to 'ceph-disk list'

Re: [ceph-users] can osd start up if journal is lost and it has not been replayed?

2014-08-14 Thread yuelongguang
Hi, could you tell me the reason why 'if the journal is lost, the OSD is lost'? If the journal is lost, it actually only loses the part which was not replayed. Let's take a similar case as an example: an OSD is down for some time, its journal is out of date (it has lost part of the journal), but it can catch up with other

[ceph-users] help to confirm if journal includes everything a OP has

2014-08-14 Thread yuelongguang
Hi all, by reading the code I notice that everything of an OP is encoded into a Transaction which is written into the journal later. Does the journal record everything (meta, xattr, file data...) of an OP? If so, everything is written to disk twice, and the journal always reaches the full state, right? Thanks

Re: [ceph-users] How to create multiple OSD's per host?

2014-08-14 Thread Jason King
2014-08-15 7:56 GMT+08:00 Bruce McFarland bruce.mcfarl...@taec.toshiba.com : This is an example of the output from 'ceph-deploy osd create [data] [journal]'. I've noticed that all of the 'ceph-conf' commands use the same parameter of '--name=osd.' every time ceph-deploy is called. I end up

Re: [ceph-users] ceph cluster inconsistency?

2014-08-14 Thread Haomai Wang
Hi Kenneth, I don't find valuable info in your logs; they lack the necessary debug output from the crashing code path. But I scanned the encode/decode implementation in GenericObjectMap and found something bad. For example, two oids have the same hash and their names are: A: rb.data.123 B: rb-123 In

Re: [ceph-users] Tracking the system calls for OSD write

2014-08-14 Thread Shu, Xinxin
The system call is invoked in FileStore::_do_transaction(). Cheers, xinxin From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Sudarsan, Rajesh Sent: Thursday, August 14, 2014 3:01 PM To: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com Subject: [ceph-users] Tracking
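
To watch the resulting syscalls directly, strace on the OSD process is one option (a sketch; hypothetical osd id 0, and the pgrep pattern depends on how the daemon was started):

    strace -f -p $(pgrep -f 'ceph-osd -i 0' | head -1) \
        -e trace=open,openat,pwrite64,writev,fsync,fdatasync,sync_file_range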

Re: [ceph-users] How to create multiple OSD's per host?

2014-08-14 Thread Bruce McFarland
I'll try the prepare/activate commands again. I spent the least amount of time with them since activate _always_ failed for me. I'll go back and check my logs, but it's probably because I was attempting to activate the same location I used in 'prepare' instead of partition 1 like you
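
For reference, the prepare/activate pair that usually works looks like this (hypothetical host and devices; prepare gets the whole data disk plus a journal device, activate gets the data partition that prepare created):

    ceph-deploy osd prepare  osdhost1:/dev/sdb:/dev/sdk1
    ceph-deploy osd activate osdhost1:/dev/sdb1
    ceph-disk list            # each data disk should now show 'active', not 'prepared'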