Re: Memory leak bug in the latest master

2012-11-04 Thread Sage Weil
Hi Yan- Thanks! Tracked this down to the Pipe AuthSessionHandler. Fixed in bcefc0e80ac2bda6d3b4550a6d849302bcc201d6. sage On Sat, 3 Nov 2012, Yan, Zheng wrote: Hi Everybody, I found there is a memory leak bug in the latest master (c51e1f9b64). The bug affects both ceph-osd and

Re: OSD sizes

2012-11-04 Thread Denis Fondras
Hello, As far as I'm concerned, I think that 12 disks per server is way too much. From your experience, what is the correct number of OSDs per server? Denis

Re: poor performance

2012-11-04 Thread Gregory Farnum
On Sun, Nov 4, 2012 at 10:58 AM, Aleksey Samarin nrg3...@gmail.com wrote: Hi all, I'm planning to use ceph for cloud storage. My test setup is 2 servers connected via infiniband 40Gb, 6x2TB disks per node. CentOS 6.2, Ceph 0.52 from http://ceph.com/rpms/el6/x86_64 This is my config

Re: [PATCH V2 2/2] ceph: Fix i_size update race

2012-11-04 Thread Sage Weil
Hi Yan, On Sat, 3 Nov 2012, Yan, Zheng wrote: From: Yan, Zheng zheng.z@intel.com ceph_aio_write() has an optimization that marks cap CEPH_CAP_FILE_WR dirty before data is copied to the page cache and the inode size is updated. If ceph_check_caps() flushes the dirty cap before the inode size is

explicitly specifying pgnum on pool creation

2012-11-04 Thread Sage Weil
The wip-explicit-pgnum branch changes the 'ceph osd pool create name pgnum' command to require the pg_num value instead of defaulting to 8. This would make it harder for users to get this wrong. On the other hand, it probably also breaks some scripts for deploying OpenStack that create volume and
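
As an illustration of the proposed syntax, a minimal sketch (the pool name and pg count below are example values, not taken from the thread):

    # pg_num must be supplied explicitly under wip-explicit-pgnum; 128 is only an example
    ceph osd pool create volumes 128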

Re: poor performance

2012-11-04 Thread Gregory Farnum
On Sun, Nov 4, 2012 at 1:04 PM, Aleksey Samarin nrg3...@gmail.com wrote: Hi! This command? ceph tell osd \* bench Output: tell target 'osd' not a valid entity name Well, I created a pool with the command ceph osd pool create bench2 120 This is the output of rados -p bench2 bench 30 write --no-cleanup rados

Re: Handle bad file descriptors with EBADF

2012-11-04 Thread Sage Weil
On Sat, 3 Nov 2012, Noah Watkins wrote: The libcephfs interface asserts valid file descriptors, so clients are forced to manage this or risk crashes. Pushed to wip-bad-fd are changes that handle this case by returning -EBADF. Review welcome :) This looks good to me. The one thing I'd change

Re: poor performance

2012-11-04 Thread Gregory Farnum
[Sorry for the blank email; I missed!] On Sun, Nov 4, 2012 at 1:04 PM, Aleksey Samarin nrg3...@gmail.com wrote: Hi! This command? ceph tell osd \* bench Output: tell target 'osd' not a valid entity name I guess it's ceph osd tell \* bench. Try that one. :) Well, I created a pool with the command ceph

Re: poor performance

2012-11-04 Thread Aleksey Samarin
It's ok! Output: 2012-11-04 16:19:23.195891 osd.0 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 11.441035 sec at 91650 KB/sec 2012-11-04 16:19:24.981631 osd.1 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.225048 sec at 79287 KB/sec 2012-11-04 16:19:25.672896 osd.2 [INF] bench: wrote
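
As a quick sanity check on the figures above: 1024 MB is 1,048,576 KB, and 1,048,576 KB / 11.441035 s ≈ 91,650 KB/s for osd.0 (roughly 89 MB/s); osd.1's 13.225048 s run works out to the quoted ~79,287 KB/s the same way, so the reported rates are internally consistent.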

Re: poor performance

2012-11-04 Thread Mark Nelson
On 11/04/2012 03:58 AM, Aleksey Samarin wrote: Hi all, I'm planning to use ceph for cloud storage. My test setup is 2 servers connected via infiniband 40Gb, 6x2TB disks per node. CentOS 6.2, Ceph 0.52 from http://ceph.com/rpms/el6/x86_64 This is my config http://pastebin.com/Pzxafnsm One thing that

Re: poor performance

2012-11-04 Thread Gregory Farnum
That's only nine — where are the other three? If you have three slow disks, that could definitely cause the troubles you're seeing. Also, what Mark said about sync versus syncfs. On Sun, Nov 4, 2012 at 1:26 PM, Aleksey Samarin nrg3...@gmail.com wrote: It's ok! Output: 2012-11-04

Re: poor performance

2012-11-04 Thread Aleksey Samarin
Ok! Well, I'll run these tests and write about the results. By the way, the disks are all the same; how could some be faster than others? 2012/11/4 Gregory Farnum g...@inktank.com: That's only nine — where are the other three? If you have three slow disks, that could definitely cause the troubles you're seeing.

Re: poor performance

2012-11-04 Thread Aleksey Samarin
Well, I created a ceph cluster with 2 osds (1 osd per node), 2 mons, 2 mds. Here is what I did: ceph osd pool create bench; ceph osd tell \* bench; rados -p bench bench 30 write --no-cleanup. Output: Maintaining 16 concurrent writes of 4194304 bytes for at least 30 seconds. Object prefix:
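
For reference, the benchmark sequence quoted above as a cleaned-up sketch (the explicit pg_num argument is added here only as an example; the original command omitted it, which falls back to the default of 8):

    ceph osd pool create bench 128                 # 128 is an example pg_num
    ceph osd tell \* bench                         # run each OSD's built-in write benchmark
    rados -p bench bench 30 write --no-cleanup     # 30-second RADOS write benchmark on the pool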

RE: RGW: Pools .rgw .rgw.control .users.uid .users.email .users

2012-11-04 Thread Yann ROBIN
Hi, Does it mean that we should destroy those pools and recreate them with a higher pgnum? -- Yann -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On behalf of Yehuda Sadeh Sent: Friday, 2 November 2012 20:20 To: Sylvain Munaut

Re: [PATCH V2 2/2] ceph: Fix i_size update race

2012-11-04 Thread Yan, Zheng
On Sun, Nov 4, 2012 at 7:29 PM, Sage Weil s...@inktank.com wrote: Hi Yan, On Sat, 3 Nov 2012, Yan, Zheng wrote: From: Yan, Zheng zheng.z@intel.com ceph_aio_write() has an optimization that marks cap CEPH_CAP_FILE_WR dirty before data is copied to the page cache and the inode size is updated.

Re: Ignoresync hack no longer applies on 3.6.5

2012-11-04 Thread Sage Weil
On Fri, 2 Nov 2012, Nick Bartos wrote: Sage, A while back you gave us a small kernel hack which allowed us to mount the underlying OSD xfs filesystems in a way that they would ignore system-wide syncs (kernel hack + mounting with the reused 'mand' option), to work around a deadlock problem
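
As a sketch of how such a mount might be invoked, assuming the patched kernel repurposes the 'mand' flag to mean ignore-sync (device and mount point are example paths):

    # 'mand' is reinterpreted by the patched kernel so this fs skips system-wide syncs
    mount -t xfs -o mand /dev/sdb1 /var/lib/ceph/osd/ceph-0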

Re: poor performance

2012-11-04 Thread Mark Nelson
On 11/04/2012 07:18 AM, Aleksey Samarin wrote: Well, I created a ceph cluster with 2 osds (1 osd per node), 2 mons, 2 mds. Here is what I did: ceph osd pool create bench; ceph osd tell \* bench; rados -p bench bench 30 write --no-cleanup. Output: Maintaining 16 concurrent writes of 4194304

Re: [PATCH V2 2/2] ceph: Fix i_size update race

2012-11-04 Thread Sage Weil
On Sun, 4 Nov 2012, Yan, Zheng wrote: On Sun, Nov 4, 2012 at 7:29 PM, Sage Weil s...@inktank.com wrote: Hi Yan, On Sat, 3 Nov 2012, Yan, Zheng wrote: From: Yan, Zheng zheng.z@intel.com ceph_aio_write() has an optimization that marks cap CEPH_CAP_FILE_WR dirty before data is

Re: BUG: kvm crashing in void librbd::AioCompletion::complete_request

2012-11-04 Thread Sage Weil
On Sun, 4 Nov 2012, Stefan Priebe wrote: Hello, while benchmarking the new v2 rbd format I saw several crashes of KVM. First I thought this was kernel related, but it was just happening faster with one kernel than the other. Can you try the wip-rbd-read branch? It has a bunch of fixes that

Re: BUG: kvm crashing in void librbd::AioCompletion::complete_request

2012-11-04 Thread Stefan Priebe
Can I merge wip-rbd-read into master? Stefan On 04.11.2012 15:06, Sage Weil wrote: On Sun, 4 Nov 2012, Stefan Priebe wrote: Hello, while benchmarking the new v2 rbd format I saw several crashes of KVM. First I thought this was kernel related, but it was just happening faster with one kernel

Re: BUG: kvm crashing in void librbd::AioCompletion::complete_request

2012-11-04 Thread Sage Weil
On Sun, 4 Nov 2012, Stefan Priebe wrote: Can I merge wip-rbd-read into master? Yeah. I'm going to do a bit more testing before I do it, but it should apply cleanly. Hopefully later today. sage Stefan On 04.11.2012 15:06, Sage Weil wrote: On Sun, 4 Nov 2012, Stefan Priebe

Re: [PATCH V2 2/2] ceph: Fix i_size update race

2012-11-04 Thread Yan, Zheng
On Sun, Nov 4, 2012 at 10:01 PM, Sage Weil s...@inktank.com wrote: On Sun, 4 Nov 2012, Yan, Zheng wrote: On Sun, Nov 4, 2012 at 7:29 PM, Sage Weil s...@inktank.com wrote: Hi Yan, On Sat, 3 Nov 2012, Yan, Zheng wrote: From: Yan, Zheng zheng.z@intel.com ceph_aio_write() has an

Re: slow fio random read benchmark, need help

2012-11-04 Thread Alexandre DERUMIER
Did your RAID setup improve anything? I have tried to launch 2 fio tests in parallel, on 2 disks in the same guest VM; I get 2500 iops for each test. Running 2 fio tests on 2 different guests gives me 5000 iops for each test. I really don't understand... Maybe something doesn't use parallelism
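
For context, one way to run two random-read jobs in parallel inside a single guest looks roughly like this (device paths, block size, queue depth, and runtime are assumptions, not values from the thread):

    fio --name=rr1 --filename=/dev/vdb --rw=randread --bs=4k --ioengine=libaio \
        --direct=1 --iodepth=32 --runtime=60 --time_based &
    fio --name=rr2 --filename=/dev/vdc --rw=randread --bs=4k --ioengine=libaio \
        --direct=1 --iodepth=32 --runtime=60 --time_based &
    wait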

Re: Handle bad file descriptors with EBADF

2012-11-04 Thread Noah Watkins
On Sun, Nov 4, 2012 at 4:17 AM, Sage Weil s...@inktank.com wrote: but this is actually doing the lookup twice. A better pattern: map<foo, bar>::iterator p = container.find(key); if (p == container.end()) return NULL; return p->second; Cool, thanks for

Re: poor performance

2012-11-04 Thread Aleksey Samarin
What are possible solutions? Update CentOS to 6.3? About the issue with writes to lots of disks, I think a parallel dd command will be a good test! :) 2012/11/4 Mark Nelson mark.nel...@inktank.com: On 11/04/2012 07:18 AM, Aleksey Samarin wrote: Well, I created a ceph cluster with 2 osds (1 osd per
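
A minimal sketch of such a parallel dd test, assuming the OSD data disks are mounted and the target paths below are placeholders:

    # write 1 GB to every data disk at the same time, bypassing the page cache
    for dir in /data/osd0 /data/osd1 /data/osd2; do
        dd if=/dev/zero of=$dir/ddtest bs=1M count=1024 oflag=direct &
    done
    wait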

Re: OSD sizes

2012-11-04 Thread Sébastien Han
Well, there is not really an ideal number. For instance, at Dreamhost they put 12 disks per machine, but they have really big and powerful servers. The problem is: the more disks per machine you have, the more data you have to migrate in case of failure. In the end, it's up to you... It also highly
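
As a rough illustration of that point: with 12 disks of, say, 2 TB each (sizes assumed here, not stated in the thread), losing a whole node means re-replicating up to 12 x 2 TB = 24 TB of data across the surviving machines.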

Re: [PATCH V2 2/2] ceph: Fix i_size update race

2012-11-04 Thread Sage Weil
On Sun, 4 Nov 2012, Yan, Zheng wrote: A short write happens when we fail to get the 'Fb' cap for all pages. Why shouldn't we fall back to sync write? I think some user programs assume short writes never happen unless ENOSPC. If generic_file_aio_write() returns and its return value shows there was

btrfs for mon server?

2012-11-04 Thread Stefan Priebe
Hello, does the mon service also suffer from btrfs as its fs? Right now only the osds are running on btrfs. Greets, Stefan

Re: BUG: kvm crashing in void librbd::AioCompletion::complete_request

2012-11-04 Thread Sage Weil
On Sun, 4 Nov 2012, Stefan Priebe wrote: On 04.11.2012 15:12, Sage Weil wrote: On Sun, 4 Nov 2012, Stefan Priebe wrote: Can I merge wip-rbd-read into master? Yeah. I'm going to do a bit more testing before I do it, but it should apply cleanly. Hopefully later today. Any

Re: BUG: kvm crashing in void librbd::AioCompletion::complete_request

2012-11-04 Thread Stefan Priebe
On 04.11.2012 20:19, Sage Weil wrote: On Sun, 4 Nov 2012, Stefan Priebe wrote: On 04.11.2012 15:12, Sage Weil wrote: On Sun, 4 Nov 2012, Stefan Priebe wrote: Can I merge wip-rbd-read into master? Yeah. I'm going to do a bit more testing before I do it, but it should apply

Re: BUG: kvm crashing in void librbd::AioCompletion::complete_request

2012-11-04 Thread Sage Weil
On Sun, 4 Nov 2012, Stefan Priebe wrote: On 04.11.2012 20:19, Sage Weil wrote: On Sun, 4 Nov 2012, Stefan Priebe wrote: On 04.11.2012 15:12, Sage Weil wrote: On Sun, 4 Nov 2012, Stefan Priebe wrote: Can I merge wip-rbd-read into master? Yeah. I'm going to do a bit more

Re: Ignoresync hack no longer applies on 3.6.5

2012-11-04 Thread Nick Bartos
Awesome, thanks! I'll let you know how it goes. On Sun, Nov 4, 2012 at 5:50 AM, Sage Weil s...@inktank.com wrote: On Fri, 2 Nov 2012, Nick Bartos wrote: Sage, A while back you gave us a small kernel hack which allowed us to mount the underlying OSD xfs filesystems in a way that they would

Re: Ubuntu R-Series plans for Ceph

2012-11-04 Thread Wido den Hollander
On 16-10-12 19:14, James Page wrote: Hi All, I've started to put together a Launchpad Blueprint to act as a placeholder for discussion at the Ubuntu Developer Summit in a couple of weeks' time. I'd be interested in feedback on what folks would

Re: [PATCH V2 2/2] ceph: Fix i_size update race

2012-11-04 Thread Yan, Zheng
On Mon, Nov 5, 2012 at 12:45 AM, Sage Weil s...@inktank.com wrote: On Sun, 4 Nov 2012, Yan, Zheng wrote: A short write happens when we fail to get the 'Fb' cap for all pages. Why shouldn't we fall back to sync write? I think some user programs assume short writes never happen unless ENOSPC. If

Re: Ignoresync hack no longer applies on 3.6.5

2012-11-04 Thread Nick Bartos
Unfortunately I'm still seeing deadlocks. The trace was taken after a 'sync' from the command line had been hung for a couple of minutes. There was only one debug message (one fs on the system was mounted with 'mand'): kernel: [11441.168954] [8113538a] ? sync_fs_one_sb+0x4d/0x4d Here's the

Large numbers of OSD per node

2012-11-04 Thread Andrew Thrift
Hi, We are evaluating Ceph for deployment. I was wondering if there are any current best practices around the number of OSDs per node? e.g. We are looking at deploying 3 nodes, each with 72x SAS disks and 2x 10-gigabit Ethernet bonded. Would this best be configured as 72 OSDs per