Re: [ceph-users] OSD balancing problems

2014-11-19 Thread Lei Dong
We've encountered this problem a lot. As far as I know, the best practice is to make the distribution of PGs across OSDs as even as you can after you create the pool and before you write any data. 1. disk utilization = (PGs per OSD) * (files per PG). Ceph is good at making (files per PG
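For reference, a rough way to check how evenly PGs have landed on the OSDs; this is only a sketch, since the JSON layout of "pg dump" differs a little between releases, so the key names may need adjusting:

ceph pg dump --format json 2>/dev/null | python -c '
import collections, json, sys
# count how often each OSD appears in the acting set of every PG
pgs = json.load(sys.stdin)["pg_stats"]
count = collections.Counter(osd for pg in pgs for osd in pg["acting"])
for osd, n in sorted(count.items()):
    print("osd.%d: %d PGs" % (osd, n))'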

[ceph-users] How to collect ceph linux rbd log

2014-11-19 Thread lijian
Hi, I want to collect the Linux kernel rbd log. I know it uses the dout() method for debugging, from reading the kernel code, but how do I enable it and where can I find the output? Thanks, Jian Li
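The kernel rbd/libceph dout() calls go through the dynamic debug facility, so assuming a kernel built with CONFIG_DYNAMIC_DEBUG and a mounted debugfs, something like this enables them; the messages land in the kernel log (dmesg/syslog):

mount -t debugfs none /sys/kernel/debug 2>/dev/null    # if not already mounted
echo 'module rbd +p'     > /sys/kernel/debug/dynamic_debug/control
echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control
dmesg | tail                                           # dout() output shows up here

Use '-p' instead of '+p' to switch it off again.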

Re: [ceph-users] Ceph performance - 10 times slower

2014-11-19 Thread Mark Nelson
On 11/19/2014 06:51 PM, Jay Janardhan wrote: Can someone help me figure out what I can tune to improve the performance? The cluster is pushing data at about 13 MB/s with a single copy of data while the underlying disks can push 100+ MB/s. Can anyone help me with this? *rados bench results:* Concurrency R

[ceph-users] Ceph performance - 10 times slower

2014-11-19 Thread Jay Janardhan
Can someone help me figure out what I can tune to improve performance? The cluster is pushing data at about 13 MB/s with a single copy of the data, while the underlying disks can push 100+ MB/s. Can anyone help me with this? *rados bench results:* Concurrency Replication size Write(MB/s) Seq Read(
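For anyone who wants to reproduce the numbers, this is roughly the rados bench invocation being discussed; the pool name, runtime and concurrency are examples only:

rados bench -p test 60 write -t 16 --no-cleanup   # 60 s of writes with 16 concurrent ops, keep the objects
rados bench -p test 60 seq -t 16                  # sequential reads of the objects written above
rados -p test cleanup                             # remove the benchmark objects (if your rados version has the cleanup subcommand)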

[ceph-users] Stuck OSD

2014-11-19 Thread Jon Kåre Hellan
Hi, I'm testing a Giant cluster. There are 6 OSDs on 3 virtual machines. One OSD is marked down and out. The process still exists; it is in uninterruptible sleep. It has stopped logging. I've uploaded what I think are relevant fragments of the log to pastebin: http://pastebin.com/Y42GvGjr Ca
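When a process sits in uninterruptible sleep (D state) it is usually blocked inside the kernel on I/O; a quick way to see where, run as root with the real PID substituted:

ps -eo pid,stat,wchan:32,cmd | grep '[c]eph-osd'   # a D in the STAT column marks the stuck daemon
cat /proc/<pid>/stack                              # kernel stack of the blocked task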

[ceph-users] pg's degraded

2014-11-19 Thread JIten Shah
After rebuilding a few OSDs, I see that the PGs are stuck in a degraded state. Some are unclean and others are stale. Somehow the MDS is also degraded. How do I recover the OSDs and the MDS back to healthy? I have read through the documentation and searched the web, but no luck so far. p
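A few standard starting points for narrowing this down (<pgid> is a placeholder for one of the stuck PGs):

ceph health detail
ceph pg dump_stuck unclean
ceph pg dump_stuck stale
ceph pg <pgid> query    # shows which OSDs the PG maps to and what it is waiting for
ceph mds stat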

Re: [ceph-users] How to add/remove/move an MDS?

2014-11-19 Thread Gregory Farnum
You don't really need to do much. There are some "ceph mds" commands that let you clean things up in the MDSMap if you like, but moving an MDS essentially boils down to: 1) make sure your new node has a cephx key (probably for a new MDS entity named after the new host, but not strictly necessary
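A rough sketch of what that looks like on the new host; the entity name "newhost" and the paths are examples, not taken from the thread:

mkdir -p /var/lib/ceph/mds/ceph-newhost
ceph auth get-or-create mds.newhost mon 'allow profile mds' osd 'allow rwx' mds 'allow' \
    -o /var/lib/ceph/mds/ceph-newhost/keyring
ceph-mds -i newhost     # or start it via your init system
ceph mds stat           # the new daemon should appear as active or standby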

Re: [ceph-users] Giant upgrade - stability issues

2014-11-19 Thread Andrei Mikhailovsky
Well, the heartbeats are failing due to networking errors preventing the heartbeats from arriving. That is causing osds to go down, and that is causing pgs to become degraded. You'll have to work out what is preventing the tcp connections from being stable. -Sam AM: Sam, I will start the ne

Re: [ceph-users] Giant upgrade - stability issues

2014-11-19 Thread Samuel Just
Well, the heartbeats are failing due to networking errors preventing the heartbeats from arriving. That is causing osds to go down, and that is causing pgs to become degraded. You'll have to work out what is preventing the tcp connections from being stable. -Sam On Wed, Nov 19, 2014 at 1:39 PM,

Re: [ceph-users] Giant upgrade - stability issues

2014-11-19 Thread Andrei Mikhailovsky
> You indicated that osd 12 and 16 were the ones marked down, but it
> looks like only 0,1,2,3,7 were marked down in the ceph.log you sent.
> The logs for 12 and 16 did indicate that they had been partitioned
> from the other nodes. I'd bet that you are having intermittent
> network trouble since

Re: [ceph-users] OSD balancing problems

2014-11-19 Thread Gregory Farnum
I think these numbers are about what is expected. You could try a couple of things to improve it, but neither of them is common: 1) increase the number of PGs (and pgp_num) a lot more. If you decide to experiment with this, watch your CPU and memory numbers carefully. 2) try to correct for the inequ
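For the record, both suggestions map onto standard commands; the pool name and numbers below are made-up examples, and both operations move a lot of data, so go gently:

ceph osd pool set rbd pg_num 2048      # more PGs in the pool...
ceph osd pool set rbd pgp_num 2048     # ...and let them actually be re-placed
ceph osd reweight-by-utilization 120   # lower the weight of OSDs more than 20% above mean utilization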

Re: [ceph-users] Ceph Monitoring with check_MK

2014-11-19 Thread Kostis Fardelas
Hi Robert, an improvement to your checks could be the addition of check parameters (instead of using hard coded values for warn and crit) so that someone can change their values in main.mk. Hope to find some time soon and send you a PR about it. Nice job btw! On 19 November 2014 18:23, Robert Sand

Re: [ceph-users] CephFS unresponsive at scale (2M files,

2014-11-19 Thread Kevin Sumner
Making mds cache size 5 million seems to have helped significantly, but we’re still seeing issues occasionally on metadata reads while under load. Settings over 5 million don’t seem to have any noticeable impact on this problem. I’m starting the upgrade to Giant today. -- Kevin Sumner ke...@su
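For context, the cache size being discussed is set like this (ceph.conf on the MDS host; the value is the 5 million from this thread and counts inodes, not bytes):

[mds]
    mds cache size = 5000000

A runtime change should also work through the admin socket on the MDS host, e.g. ceph daemon mds.<name> config set mds_cache_size 5000000 (the name is a placeholder).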

[ceph-users] How to add/remove/move an MDS?

2014-11-19 Thread Erik Logtenberg
Hi, I noticed that the docs [1] on adding and removing an MDS are not yet written... [1] https://ceph.com/docs/master/rados/deployment/ceph-deploy-mds/ I would like to do exactly that, however. I have an MDS on one machine, but I'd like a faster machine to take over instead. In fact, it would be

Re: [ceph-users] Ceph Monitoring with check_MK

2014-11-19 Thread Robert Sander
Hi, On 14.11.2014 11:38, Nick Fisk wrote:
> I've just been testing your ceph check and I have made a small modification
> to allow it to adjust itself to suit the autoscaling of the units Ceph
> outputs.
Thanks for the feedback. I took your idea, added PB and KB, and pushed it to github again:

[ceph-users] Can I limit buffering for each object in radosgw?

2014-11-19 Thread Mustafa Muhammad
Hi, I am using radosgw and I have a large number of connections with huge files. The memory usage keeps getting higher until the process is killed by the kernel, I *think* because of buffering for each request. Is there a way to limit the buffer size for each object (or for each connection)? And what do you suggest?

Re: [ceph-users] Giant upgrade - stability issues

2014-11-19 Thread Samuel Just
logs/andrei » grep failed ceph.log.7
2014-11-12 01:37:19.857143 mon.0 192.168.168.13:6789/0 969265 : cluster [INF] osd.3 192.168.168.200:6818/26170 failed (3 reports from 3 peers after 22.000550 >= grace 20.995772)
2014-11-12 01:37:21.176073 mon.0 192.168.168.13:6789/0 969287 : cluster [INF] osd.0

Re: [ceph-users] Dependency issues in fresh ceph/CentOS 7 install

2014-11-19 Thread Travis Rhoden
Hi Massimiliano, On Tue, Nov 18, 2014 at 5:23 PM, Massimiliano Cuttini wrote:
> Then. ...very good! :)
> Ok, the next bad thing is that I have installed GIANT on the admin node.
> However, ceph-deploy ignores the admin node installation and installs FIREFLY.
> Now I have ceph-deploy of Giant on my
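If the issue is just ceph-deploy defaulting to firefly, pinning the release explicitly may help (assuming a reasonably recent ceph-deploy; the host names here are placeholders):

ceph-deploy install --release giant admin-node node1 node2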

[ceph-users] rogue mount in /var/lib/ceph/tmp/mnt.eml1yz ?

2014-11-19 Thread SCHAER Frederic
Hi, I rebooted a node (I'm doing some tests and breaking many things ;)), and I see I have:
[root@ceph0 ~]# mount | grep sdp1
/dev/sdp1 on /var/lib/ceph/tmp/mnt.eml1yz type xfs (rw,noatime,attr2,inode64,noquota)
/dev/sdp1 on /var/lib/ceph/osd/ceph-55 type xfs (rw,noatime,attr2,inode64,noquota)
[
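The mnt.XXXXXX directory under /var/lib/ceph/tmp is the temporary mount point ceph-disk uses while activating an OSD, so the second entry looks like a leftover. Assuming osd.55 is confirmed up and serving from /var/lib/ceph/osd/ceph-55, dropping the duplicate should be safe (verify first; this is a guess from the output above):

ceph osd tree | grep osd.55          # osd.55 should be up and in
umount /var/lib/ceph/tmp/mnt.eml1yz
mount | grep sdp1                    # only the ceph-55 mount should remain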

[ceph-users] OSD balancing problems

2014-11-19 Thread Stephane Boisvert
Hi, I know a lot of people have already asked these questions, but looking at all the answers I'm still having problems with my OSD balancing. The disks are 4 TB drives. We are seeing differences of up to 10%. Here is a df from one server:
/dev/sdc1  3.7T  3.0T  680G  82%  /var/lib/cep

Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-19 Thread JF Le Fillâtre
Hello again, So whatever magic allows the Dell MD1200 to report the slot position for each disk isn't present in your JBODs. Time for something else. There are two sides to your problem: 1) Identifying which disk is where in your JBOD. Quite easy. Again, I'd go for a udev rule + script that will
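(Not the udev rule from this mail, just two generic building blocks that often help with slot identification; they assume an SES-capable enclosure and the ledmon package, and the device name is only an example:)

ls -l /dev/disk/by-path/ | grep sdp   # the by-path symlink encodes which HBA/expander/phy the disk hangs off
ledctl locate=/dev/sdp                # blink the locate LED of the slot holding that disk
ledctl locate_off=/dev/sdp            # switch it off again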

Re: [ceph-users] incorrect pool size, wrong ruleset?

2014-11-19 Thread houmles
Currently Firefly on Debian stable, all updated. I already tried it with Giant and it's the same. But it looks like I solved it: I changed the crush tunables to optimal and now it shows the size right. And even when I switch back to default it shows it right. It's weird, but hopefully it's solved for now.
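For reference, the tunables change mentioned above corresponds to the following (note that changing tunables on a populated cluster triggers data movement):

ceph osd crush tunables optimal
ceph osd crush show-tunables     # check which tunables the cluster is currently using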

Re: [ceph-users] Poor RBD performance as LIO iSCSI target

2014-11-19 Thread David Moreau Simard
Rama, Thanks for your reply. My end goal is to use iSCSI (with LIO/targetcli) to export rbd block devices. I was encountering issues with iSCSI which are explained in my previous emails. I ended up being able to reproduce the problem at will on various Kernel and OS combinations, even on raw RB

Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-19 Thread SCHAER Frederic
Hi Thanks. I hoped it would be it, but no ;) With this mapping:
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdb -> ../../devices/pci0000:00/0000:00:04.0/0000:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:0/end_device-1:1:0/target1:0:1/1:0:1:0/block/sdb
lrwxrwxr

[ceph-users] Regarding Federated Gateways - Zone Sync Issues

2014-11-19 Thread Vinod H I
Hi, I am using firefly version 0.80.7. I am testing the disaster recovery mechanism for rados gateways. I have followed the federated gateway setup as described in the docs. There is one region with two zones on the same cluster. After syncing (using radosgw-agent with "--sync-scope=full"), container crea

Re: [ceph-users] Bonding woes

2014-11-19 Thread Roland Giesler
Maybe I should rephrase my question by asking what the relationship is between bonding and ethtool? *Roland* On 18 November 2014 22:14, Roland Giesler wrote:
> Hi people, I have two identical servers (both Sun X2100 M2's) that form
> part of a cluster of 3 machines (other machines will be adde
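Roughly speaking, the bonding driver aggregates slave NICs and reports its state through /proc, while ethtool queries the individual slave drivers (link, speed, duplex, offloads); some bonding modes decide slave health from exactly that MII/ethtool link status. Interface names below are only examples:

cat /proc/net/bonding/bond0   # bond mode, MII polling status and per-slave state
ethtool eth0                  # per-slave link/speed/duplex as reported by the driver
ethtool -i eth0               # driver and firmware version of the slave NIC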

[ceph-users] ceph osd perf

2014-11-19 Thread NEVEU Stephane
Hi all, I'm running firefly 0.80.5 and here is the osd perf output:
ceph osd perf
osdid fs_commit_latency(ms) fs_apply_latency(ms) 0 1111 1 1111 2 1370 3