Re: [ceph-users] Ceph OSDs with bcache experience

2015-10-22 Thread Wido den Hollander
On 10/21/2015 03:30 PM, Mark Nelson wrote: > > > On 10/21/2015 01:59 AM, Wido den Hollander wrote: >> On 10/20/2015 07:44 PM, Mark Nelson wrote: >>> On 10/20/2015 09:00 AM, Wido den Hollander wrote: Hi, In the "newstore direction" thread on ceph-devel I wrote that I'm using

Re: [ceph-users] ceph-fuse and its memory usage

2015-10-22 Thread Yan, Zheng
On Thu, Oct 22, 2015 at 4:47 AM, Gregory Farnum wrote: > On Tue, Oct 13, 2015 at 10:09 PM, Goncalo Borges > wrote: >> Hi all... >> >> Thank you for the feedback, and I am sorry for my delay in replying. >> >> 1./ Just to recall the problem, I was

Re: [ceph-users] CephFS and page cache

2015-10-22 Thread Burkhard Linke
Hi, On 10/22/2015 02:54 AM, Gregory Farnum wrote: On Sun, Oct 18, 2015 at 8:27 PM, Yan, Zheng wrote: On Sat, Oct 17, 2015 at 1:42 AM, Burkhard Linke wrote: Hi, I've noticed that CephFS (both ceph-fuse and kernel client in

Re: [ceph-users] ceph and upgrading OS version

2015-10-22 Thread Luis Periquito
There are several routes you can follow for this work. The best one will depend on cluster size, current data, pool definition (size), performance expectations, etc. They range from doing dist-upgrade a node at a time, to remove-upgrade-then-add nodes to the cluster. But knowing that ceph is

Re: [ceph-users] ceph and upgrading OS version

2015-10-22 Thread Andrei Mikhailovsky
Any thoughts anyone? Is it safe to perform OS version upgrade on the osd and mon servers? Thanks Andrei - Original Message - From: "Andrei Mikhailovsky" To: ceph-us...@ceph.com Sent: Tuesday, 20 October, 2015 8:05:19 PM Subject: [ceph-users] ceph and

Re: [ceph-users] how to understand deep flatten implementation

2015-10-22 Thread Jason Dillaman
The flatten operation is implemented by writing zero bytes to each object within a clone image. This causes librbd to copyup the backing object from the parent image to the clone image. A copyup is just a guarded write that will not write to the clone if the object already exists (i.e. new
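For context, a minimal sketch of how flatten is typically driven from the CLI (the pool and image names below are hypothetical):

    # Copy all parent data into the clone; afterwards the clone no longer
    # depends on the parent snapshot.
    rbd flatten rbd/my-clone
    # Once no clones depend on it, the parent snapshot can be unprotected and removed:
    rbd snap unprotect rbd/parent-image@base
    rbd snap rm rbd/parent-image@base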

Re: [ceph-users] ceph-deploy for "deb http://ceph.com/debian-hammer/ trusty main"

2015-10-22 Thread David Clarke
On 23/10/15 09:08, Kjetil Jørgensen wrote: > Hi, > > this seems to not get me ceph-deploy from ceph.com . > > http://download.ceph.com/debian-hammer/pool/main/c/ceph/ceph_0.94.4-1trusty_amd64.deb > does seem to contain /usr/share/man/man8/ceph-deploy.8.gz, which > conflicts with

[ceph-users] rbd unmap immediately consistent?

2015-10-22 Thread Allen Liao
Does ceph guarantee image consistency if an rbd image is unmapped on one machine then immediately mapped on another machine? If so, does the same apply to issuing a snapshot command on machine B as soon as the unmap command finishes on machine A? In other words, does the unmap operation flush
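For reference, the sequence being asked about looks roughly like this (device path and image name are hypothetical; this only illustrates the question, not a guarantee):

    # On machine A: unmap releases the block device
    rbd unmap /dev/rbd0
    # On machine B: map the same image immediately afterwards
    rbd map rbd/my-image
    # ...or issue the snapshot from machine B right after the unmap returns
    rbd snap create rbd/my-image@snap1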

[ceph-users] [0.94.4] radosgw initialization timeout, failed to initialize

2015-10-22 Thread James O'Neill
I upgraded to 0.94.4 yesterday and now radosgw will not run on any of the servers. The service itself will run, but it's not listening (using civetweb with port 80 specified). Run manually, I get the following output: root@dbp-ceph01:~# /usr/bin/radosgw --cluster=ceph --id radosgw.gateway -d
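For reference, a civetweb frontend on port 80 is normally declared along these lines in ceph.conf (the section name is an assumption based on the --id used above):

    [client.radosgw.gateway]
        rgw frontends = civetweb port=80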

Re: [ceph-users] hanging nfsd requests on an RBD to NFS gateway

2015-10-22 Thread Wido den Hollander
On 10/22/2015 10:57 PM, John-Paul Robinson wrote: > Hi, > > Has anyone else experienced a problem with RBD-to-NFS gateways blocking > nfsd server requests when their ceph cluster has a placement group that > is not servicing I/O for some reason, eg. too few replicas or an osd > with slow request

[ceph-users] tracker.ceph.com downtime today

2015-10-22 Thread Dan Mick
tracker.ceph.com will be brought down today for upgrade and move to a new host. I plan to do this at about 4PM PST (40 minutes from now). Expect a downtime of about 15-20 minutes. More notification to follow. -- Dan Mick Red Hat, Inc. Ceph docs: http://ceph.com/docs

[ceph-users] hanging nfsd requests on an RBD to NFS gateway

2015-10-22 Thread John-Paul Robinson
Hi, Has anyone else experienced a problem with RBD-to-NFS gateways blocking nfsd server requests when their ceph cluster has a placement group that is not servicing I/O for some reason, eg. too few replicas or an osd with slow request warnings? We have an RBD-NFS gateway that stops responding to

Re: [ceph-users] Core dump when running OSD service

2015-10-22 Thread David Zafman
I was focused on fixing the OSD, but you need to determine if some misconfiguration or hardware issue caused a filesystem corruption. David On 10/22/15 3:08 PM, David Zafman wrote: There is a corruption of the osdmaps on this particular OSD. You need to determine which maps are bad, probably

Re: [ceph-users] pg incomplete state

2015-10-22 Thread John-Paul Robinson
Greg, Thanks for providing this background on the incomplete state. With that context, and a little more digging online and in our environment, I was able to resolve the issue. My cluster is back in health ok. The key to fixing the incomplete state was the information provided by pg query. I
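For anyone hitting the same state, the pg query output referred to here comes from something like the following (the pg id 2.1f is only a placeholder):

    ceph health detail        # lists the pgs stuck incomplete
    ceph pg 2.1f query        # shows peering state, blocked_by, probing_osds, etc.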

Re: [ceph-users] hanging nfsd requests on an RBD to NFS gateway

2015-10-22 Thread John-Paul Robinson
On 10/22/2015 04:03 PM, Wido den Hollander wrote: > On 10/22/2015 10:57 PM, John-Paul Robinson wrote: >> Hi, >> >> Has anyone else experienced a problem with RBD-to-NFS gateways blocking >> nfsd server requests when their ceph cluster has a placement group that >> is not servicing I/O for some

Re: [ceph-users] Core dump when running OSD service

2015-10-22 Thread David Zafman
There is a corruption of the osdmaps on this particular OSD. You need to determine which maps are bad, probably by bumping the osd debug level to 20, then transfer them from a working OSD. The newest ceph-objectstore-tool has features to write the maps, but you'll need to build a version
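A minimal sketch of the first step, assuming the failing OSD has id 12 (hypothetical); the debug level can also be raised in ceph.conf instead:

    # Run the failing OSD in the foreground with verbose logging to see
    # which osdmap epoch it trips over:
    ceph-osd -i 12 -f --debug-osd 20 --debug-ms 1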

[ceph-users] PGs stuck in active+clean+replay

2015-10-22 Thread Andras Pataki
Hi ceph users, We’ve upgraded to 0.94.4 (all ceph daemons got restarted) – and are in the middle of doing some rebalancing due to crush changes (removing some disks). During the rebalance, I see that some placement groups get stuck in ‘active+clean+replay’ for a long time (essentially until I

Re: [ceph-users] tracker.ceph.com downtime today

2015-10-22 Thread Dan Mick
tracker.ceph.com down now On 10/22/2015 03:20 PM, Dan Mick wrote: > tracker.ceph.com will be brought down today for upgrade and move to a > new host. I plan to do this at about 4PM PST (40 minutes from now). > Expect a downtime of about 15-20 minutes. More notification to follow. > -- Dan

Re: [ceph-users] hanging nfsd requests on an RBD to NFS gateway

2015-10-22 Thread Ryan Tokarek
> On Oct 22, 2015, at 3:57 PM, John-Paul Robinson wrote: > > Hi, > > Has anyone else experienced a problem with RBD-to-NFS gateways blocking > nfsd server requests when their ceph cluster has a placement group that > is not servicing I/O for some reason, eg. too few replicas or

Re: [ceph-users] tracker.ceph.com downtime today

2015-10-22 Thread Dan Mick
It's back. New DNS info is propagating its way around. If you absolutely must get to it, newtracker.ceph.com is the new address, but please don't bookmark that, as it will be going away after the transition. Please let me know of any problems you have. On 10/22/2015 04:09 PM, Dan Mick wrote: >

Re: [ceph-users] tracker.ceph.com downtime today

2015-10-22 Thread Dan Mick
Fixed a configuration problem preventing updating issues, and switched the mailer to use ipv4; if you updated and failed, or missed an email notification, that may have been why. On 10/22/2015 04:51 PM, Dan Mick wrote: > It's back. New DNS info is propagating its way around. If you > absolutely

Re: [ceph-users] Ceph OSDs with bcache experience

2015-10-22 Thread Wido den Hollander
On 10/21/2015 11:25 AM, Jan Schermer wrote: > >> On 21 Oct 2015, at 09:11, Wido den Hollander wrote: >> >> On 10/20/2015 09:45 PM, Martin Millnert wrote: >>> The thing that worries me with your next-gen design (actually your current >>> design aswell) is SSD wear. If you use

Re: [ceph-users] Network performance

2015-10-22 Thread Udo Lembke
Hi Jonas, you can create a bond over multiple NICs (which modes are possible depends on your switch) to use one IP address but more than one NIC. Udo On 21.10.2015 10:23, Jonas Björklund wrote: > Hello, > > In the configuration I have read about "cluster network" and "cluster addr". > Is it
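A sketch of such a bond on Debian/Ubuntu (interface names, addresses and the 802.3ad mode are assumptions; which mode is usable depends on the switch):

    # /etc/network/interfaces (requires the ifenslave package)
    auto bond0
    iface bond0 inet static
        address 192.168.10.11
        netmask 255.255.255.0
        bond-slaves eth0 eth1
        bond-mode 802.3ad
        bond-miimon 100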

[ceph-users] Problems with ceph_rest_api after update

2015-10-22 Thread Jon Heese
Hello, We are running a Ceph cluster with 3x CentOS 7 MON nodes, and after we updated the ceph packages on the MONs yesterday (from 0.94.3 to 0.94.4), the ceph_rest_api started refusing to run, giving the following error 30 seconds after it's started: [root@ceph-mon01 ~]#

[ceph-users] when an osd is started up, IO will be blocked

2015-10-22 Thread wangsongbo
Hi all, When an OSD is started, the related IO will be blocked. According to the test results, the higher the IOPS the clients send, the longer this takes. Adjusting all the parameters associated with recovery operations was also found to be useless. How to reduce the impact of this process
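For reference, the recovery-related parameters usually meant here are the ones below (hammer-era names, example values only; the poster reports tuning them did not help):

    [osd]
        osd max backfills = 1
        osd recovery max active = 1
        osd recovery op priority = 1
        osd client op priority = 63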

Re: [ceph-users] tracker.ceph.com downtime today

2015-10-22 Thread Dan Mick
Found that issue; reverted the database to the non-backlog-plugin state, created a test bug. Retry? On 10/22/2015 06:54 PM, Dan Mick wrote: > I see that too. I suspect this is because of leftover database columns > from the backlogs plugin, which is removed. Looking into it. > > On 10/22/2015

Re: [ceph-users] [performance] rbd kernel module versus qemu librbd

2015-10-22 Thread hzwuli...@gmail.com
Hi, list We are still stuck on this problem; when it occurs, the CPU usage of qemu-system-x86 is very high (1420%): 15801 libvirt- 20 0 33.7g 1.4g 11m R 1420 0.6 1322:26 qemu-system-x86 The qemu-system-x86 process 15801 is responsible for the VM. Has anyone ever run into this

Re: [ceph-users] Core dump when running OSD service

2015-10-22 Thread James O'Neill
Hi David, Thank you for your suggestion. Unfortunately I did not understand what was involved, and in the process of trying to figure it out I think I made it worse. Thankfully it's just a test environment, so I just rebuilt all the Ceph servers involved and now it's working. Regards, James

Re: [ceph-users] Network performance

2015-10-22 Thread Jonas Björklund
On Thu, 22 Oct 2015, Udo Lembke wrote: Hi Jonas, you can create a bond over multiple NICs (which modes are possible depends on your switch) to use one IP address but more than one NIC. Yes, but if an OSD would listen to all IP addresses on the server and use all NICs we could have more

Re: [ceph-users] ceph-fuse and its memory usage

2015-10-22 Thread Goncalo Borges
Thank you! It seems there is an explanation for the behavior, which is good! On 10/23/2015 12:44 AM, Gregory Farnum wrote: On Thu, Oct 22, 2015 at 1:59 AM, Yan, Zheng wrote: direct IO only bypasses the kernel page cache. Data can still be cached in ceph-fuse. If I'm correct, the

Re: [ceph-users] [performance] rbd kernel module versus qemu librbd

2015-10-22 Thread hzwuli...@gmail.com
Yeah, you are right. Testing the rbd volume from the host is fine. Now, at least we can confirm it's a qemu or kvm problem, not ceph. hzwuli...@gmail.com From: Alexandre DERUMIER Date: 2015-10-23 12:51 To: hzwulibin CC: ceph-users Subject: Re: [ceph-users] [performance] rbd kernel module versus

Re: [ceph-users] tracker.ceph.com downtime today

2015-10-22 Thread Dan Mick
I see that too. I suspect this is because of leftover database columns from the backlogs plugin, which is removed. Looking into it. On 10/22/2015 06:43 PM, Kyle Bader wrote: > I tried to open a new issue and got this error: > > Internal error > > An error occurred on the page you were trying

Re: [ceph-users] hanging nfsd requests on an RBD to NFS gateway

2015-10-22 Thread Ryan Tokarek
> On Oct 22, 2015, at 10:19 PM, John-Paul Robinson wrote: > > A few clarifications on our experience: > > * We have 200+ rbd images mounted on our RBD-NFS gateway. (There's > nothing easier for a user to understand than "your disk is full".) Same here, and agreed. It sounds

Re: [ceph-users] [performance] rbd kernel module versus qemu librbd

2015-10-22 Thread hzwuli...@gmail.com
btw, we use perf to track the process qemu-system-x86(15801), there is an abnormal function: Samples: 1M of event 'cycles', Event count (approx.): 1057109744252
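For reference, a call graph for that process can be captured with something like the following (the PID is taken from the top output above):

    perf record -g -p 15801 -- sleep 30
    perf report --stdio | head -50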

Re: [ceph-users] [performance] rbd kernel module versus qemu librbd

2015-10-22 Thread hzwuli...@gmail.com
Oh no, judging from the phenomenon, IO in the VM is waiting for the host to complete. The CPU wait in the VM is very high. Anyway, I could try to collect something; maybe there are some clues. hzwuli...@gmail.com From: Alexandre DERUMIER Date: 2015-10-23 12:39 To: hzwulibin CC: ceph-users Subject: Re:

Re: [ceph-users] [performance] rbd kernel module versus qemu librbd

2015-10-22 Thread Alexandre DERUMIER
>>Anyway, I could try to collect something; maybe there are some clues. And you don't have problems reading/writing to this rbd from the host with fio-rbd? (try a full read of the rbd volume, for example) - Original message - From: hzwuli...@gmail.com To: "aderumier" Cc:
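A minimal fio job for reading the volume directly through librbd from the host might look like this (pool, image and client names are assumptions; requires fio built with rbd support):

    [global]
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=my-image
    direct=1

    [seq-read]
    rw=read
    bs=4M
    iodepth=16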

Re: [ceph-users] hanging nfsd requests on an RBD to NFS gateway

2015-10-22 Thread John-Paul Robinson
A few clarifications on our experience: * We have 200+ rbd images mounted on our RBD-NFS gateway. (There's nothing easier for a user to understand than "your disk is full".) * I'd expect more contention potential with a single shared RBD back end, but with many distinct and presumably isolated

Re: [ceph-users] [performance] rbd kernel module versus qemu librbd

2015-10-22 Thread Alexandre DERUMIER
Have you tried to use perf inside the faulty guest too? - Original message - From: hzwuli...@gmail.com To: "aderumier" Cc: "ceph-users" Sent: Friday, 23 October 2015 06:15:07 Subject: Re: Re: [ceph-users] [performance] rbd kernel module versus

Re: [ceph-users] ceph-fuse and its memory usage

2015-10-22 Thread Gregory Farnum
On Thu, Oct 22, 2015 at 1:59 AM, Yan, Zheng wrote: > direct IO only bypasses the kernel page cache. Data can still be cached in > ceph-fuse. If I'm correct, the test repeatedly writes data to 8M > files. The cache makes multiple writes assimilate into a single OSD > write Ugh, of
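For reference, the behaviour being discussed can be reproduced with a direct write like the one below (path and sizes are hypothetical); O_DIRECT bypasses the kernel page cache, but the data may still be buffered in ceph-fuse's own cache:

    dd if=/dev/zero of=/mnt/cephfs/testfile bs=8M count=1 oflag=direct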

Re: [ceph-users] ceph same rbd on multiple client

2015-10-22 Thread gjprabu
Hi Frederic, Can you give me a solution? We are spending a lot of time trying to solve this issue. Regards Prabu On Thu, 15 Oct 2015 17:14:13 +0530 Tyler Bishop tyler.bis...@beyondhosting.net wrote I don't know enough on ocfs to help. Sounds like you

Re: [ceph-users] ceph-hammer and debian jessie - missing files on repository

2015-10-22 Thread Björn Lässig
On 10/21/2015 08:50 PM, Alfredo Deza wrote: This shouldn't be a problem, would you mind trying again? I just managed to install on Debian Jessie without problems. After syncing our mirror again via IPv4 (IPv6 is still broken), we have the missing bpo80 packages for debian-jessie! Thanks for

Re: [ceph-users] ceph and upgrading OS version

2015-10-22 Thread Andrei Mikhailovsky
Thanks Luis, I was hoping I could do a dist-upgrade one node at a time. My cluster is very small, only 18 osds between the two osd servers and three mons. I guess it should be safe to shut down ceph on one of the osd servers, do the upgrade, reboot, wait for PGs to become active+clean and
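A common sequence for the node-at-a-time approach being described (a sketch, not a full procedure):

    ceph osd set noout     # stop CRUSH from rebalancing while the node is down
    # upgrade and reboot the node, then wait for its OSDs to rejoin
    ceph -s                # wait for all PGs to return to active+clean
    ceph osd unset noout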

Re: [ceph-users] Problems with ceph_rest_api after update

2015-10-22 Thread John Spray
On Thu, Oct 22, 2015 at 3:36 PM, Jon Heese wrote: > Hello, > > > > We are running a Ceph cluster with 3x CentOS 7 MON nodes, and after we > updated the ceph packages on the MONs yesterday (from 0.94.3 to 0.94.4), the > ceph_rest_api started refusing to run, giving the following

Re: [ceph-users] Problems with ceph_rest_api after update

2015-10-22 Thread Jon Heese
John, Aha, thanks for that -- that got me closer to the problem. I forgot an important detail: A few days before the upgrade, I set the cluster and public networks in the config files on the nodes to the "back-end" network, which the MON nodes don't have access to. I suspected that this was a
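For reference, the relevant ceph.conf settings look like this (subnets are hypothetical); the MONs have to be reachable on the public network:

    [global]
        public network  = 10.0.0.0/24    # client- and MON-facing network
        cluster network = 10.0.1.0/24    # OSD replication ("back-end") network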