[RESEND PATCH V5] ceph:Implement readv/preadv for sync operation

2013-09-24 Thread majianpeng
For the readv/preadv sync operation, ceph only handles the first iov and ignores the remaining iovs. This patch implements the missing handling. V5: -before getattr, put the caps that are already held, to avoid deadlock. -only do generic_segment_checks for sync-read, to avoid doing it again in generic_file_aio_read. V4: -modify one b
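For illustration, here is a minimal userspace sketch of the behavior the patch adds, assuming a plain POSIX fd rather than the real ceph sync-read path (the ceph version also has to split reads at object boundaries and manage caps). The helper name sync_preadv_all and its short-read policy are illustrative, not taken from the patch.

    #include <stdio.h>
    #include <sys/uio.h>
    #include <unistd.h>

    /* Read into every iovec segment in turn, advancing the offset as we go.
     * A short read (EOF in this sketch, an object boundary in ceph) stops
     * the loop and returns whatever was read so far. */
    static ssize_t sync_preadv_all(int fd, const struct iovec *iov, int iovcnt,
                                   off_t off)
    {
            ssize_t total = 0;

            for (int i = 0; i < iovcnt; i++) {
                    ssize_t n = pread(fd, iov[i].iov_base, iov[i].iov_len,
                                      off + total);
                    if (n < 0)
                            return total ? total : -1;
                    total += n;
                    if ((size_t)n < iov[i].iov_len)
                            break;
            }
            return total;
    }

The point of the patch is essentially this loop: the old sync path issued the read for iov[0] only, so readv/preadv callers silently got partial data.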

Re: [RESEND PATCH V5] ceph:Implement readv/preadv for sync operation

2013-09-24 Thread Yan, Zheng
On 09/24/2013 04:05 PM, majianpeng wrote: > diff --git a/fs/ceph/file.c b/fs/ceph/file.c > index 3de8982..5422d8e 100644 > --- a/fs/ceph/file.c > +++ b/fs/ceph/file.c > @@ -408,51 +408,94 @@ more: > * > * If the read spans object boundary, just do multiple reads. > */ > -static ssize_t ceph_s

Re: Re: [RESEND PATCH V5] ceph:Implement readv/preadv for sync operation

2013-09-24 Thread majianpeng
>On 09/24/2013 04:05 PM, majianpeng wrote: >> diff --git a/fs/ceph/file.c b/fs/ceph/file.c >> index 3de8982..5422d8e 100644 >> --- a/fs/ceph/file.c >> +++ b/fs/ceph/file.c >> @@ -408,51 +408,94 @@ more: >> * >> * If the read spans object boundary, just do multiple reads. >> */ >> -static ssiz

Re: Object Write Latency

2013-09-24 Thread Dan van der Ster
Hi, Yes, the journal is a file on the same XFS partition as the data. I have some results from a rados bench to a single-copy pool on the JBOD disks. [root@p05151113777233 ~]# rados bench -p test 30 write -t 1 -b 5 ... Total time run: 30.027296 Total writes made: 612 Write size:
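As a rough cross-check of the per-object write latency that rados bench -t 1 reports, the same single synchronous write can be timed from librados directly. A sketch, assuming a pool named "test" and the default /etc/ceph/ceph.conf; the object name and 5-byte payload are made up (build with -lrados).

    #include <rados/librados.h>
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
            rados_t cluster;
            rados_ioctx_t io;
            struct timespec t0, t1;
            const char buf[5] = "ping";        /* tiny payload, like -b 5 */

            rados_create(&cluster, NULL);
            rados_conf_read_file(cluster, "/etc/ceph/ceph.conf");
            if (rados_connect(cluster) < 0 ||
                rados_ioctx_create(cluster, "test", &io) < 0) {
                    fprintf(stderr, "connect/ioctx failed\n");
                    return 1;
            }

            clock_gettime(CLOCK_MONOTONIC, &t0);
            /* synchronous: returns once the cluster acknowledges the write */
            int r = rados_write_full(io, "latency-probe", buf, sizeof(buf));
            clock_gettime(CLOCK_MONOTONIC, &t1);

            printf("rc=%d latency=%.2f ms\n", r,
                   (t1.tv_sec - t0.tv_sec) * 1e3 +
                   (t1.tv_nsec - t0.tv_nsec) / 1e6);

            rados_ioctx_destroy(io);
            rados_shutdown(cluster);
            return 0;
    }

Repeating the write in a loop and averaging would mirror the 30-second bench above more closely.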

Re: Object Write Latency

2013-09-24 Thread Dan van der Ster
On Mon, Sep 23, 2013 at 8:18 PM, Sage Weil wrote: > You > might try measuring that directly and comparing it to the 33ms > append+fsync that you previously saw. dd with fsync is quite slow... [root@p05151113777233 fio]# time dd if=/dev/zero of=/var/lib/ceph/osd/osd.1045/testtest bs=5 count=1 1+0
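To measure the append+fsync cost that Sage mentions without dd's startup overhead, a tiny C timer can do the same thing. A sketch only; the file path and the 512-byte size are placeholders, so point it at the OSD's journal filesystem.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
            char buf[512];
            struct timespec t0, t1;
            int fd = open("testfile", O_WRONLY | O_CREAT | O_APPEND, 0644);

            if (fd < 0) {
                    perror("open");
                    return 1;
            }
            memset(buf, 0, sizeof(buf));

            clock_gettime(CLOCK_MONOTONIC, &t0);
            if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf) || fsync(fd) != 0)
                    perror("write/fsync");
            clock_gettime(CLOCK_MONOTONIC, &t1);

            printf("append+fsync: %.3f ms\n",
                   (t1.tv_sec - t0.tv_sec) * 1e3 +
                   (t1.tv_nsec - t0.tv_nsec) / 1e6);
            close(fd);
            return 0;
    }

Looping this and taking an average gives a steadier number than a single dd run, and swapping O_APPEND for O_DIRECT (with an aligned buffer) would approximate the directio test further down the thread.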

Re: [RESEND PATCH V5] ceph:Implement readv/preadv for sync operation

2013-09-24 Thread Yan, Zheng
On 09/24/2013 04:54 PM, majianpeng wrote: >> On 09/24/2013 04:05 PM, majianpeng wrote: >>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c >>> index 3de8982..5422d8e 100644 >>> --- a/fs/ceph/file.c >>> +++ b/fs/ceph/file.c >>> @@ -408,51 +408,94 @@ more: >>> * >>> * If the read spans object bounda

Re: Re: [RESEND PATCH V5] ceph:Implement readv/preadv for sync operation

2013-09-24 Thread majianpeng
>On 09/24/2013 04:54 PM, majianpeng wrote: >>> On 09/24/2013 04:05 PM, majianpeng wrote: diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 3de8982..5422d8e 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -408,51 +408,94 @@ more: * * If the read spans obje

Re: Re: [RESEND PATCH V5] ceph:Implement readv/preadv for sync operation

2013-09-24 Thread majianpeng
>On 09/24/2013 04:05 PM, majianpeng wrote: >> diff --git a/fs/ceph/file.c b/fs/ceph/file.c >> index 3de8982..5422d8e 100644 >> --- a/fs/ceph/file.c >> +++ b/fs/ceph/file.c >> @@ -408,51 +408,94 @@ more: >> * >> * If the read spans object boundary, just do multiple reads. >> */ >> -static ssiz

Re: Object Write Latency

2013-09-24 Thread Dan van der Ster
Some more tests... ~journal in XFS: [root@p05151113777233 fio]# time dd if=/dev/zero of=/var/lib/ceph/osd/osd.1045/testtest bs=512 count=1 conv=fsync 1+0 records in 1+0 records out 512 bytes (512 B) copied, 0.0423365 s, 12.1 kB/s real 0m0.044s user 0m0.000s sys 0m0.001s ~directio via the FS [ro

Re: Object Write Latency

2013-09-24 Thread Sébastien Han
Ideally, a partition using the first sectors of the disk. I usually create a tiny partition at the beginning of the device and leave the rest for osd_data. Sébastien Han Cloud Engineer “Always give 100%. Unless you're giving blood.” Phone: +33 (0)1 49 70 99 72 Mail: sebastien@enovance.com A

Re: Object Write Latency

2013-09-24 Thread Dan van der Ster
Thanks Sebastien, we will try that. BTW, as you know we're using the enovance puppet modules, which set up a file-based journal. Did you guys already puppetize a blockdev journal? I'll be doing that now if not... Cheers, Dan CERN IT On Tue, Sep 24, 2013 at 2:30 PM, Sébastien Han wrote: > Ideal

"ceph osd map" pointer to the code

2013-09-24 Thread Sébastien Han
Hi guys, Can anyone point me to the piece of code called during the “ceph osd map ‘pool’ ‘object’” command, please? Thanks! Sébastien Han Cloud Engineer “Always give 100%. Unless you're giving blood.” Phone: +33 (0)1 49 70 99 72 Mail: sebastien@enovance.com Address : 10, rue de la

Re: Object Write Latency

2013-09-24 Thread Sébastien Han
Hi Dan, Yes I noticed :), no we haven’t done anything on that side yet. But I’ll be happy to see this happen; that’s way better than a file on the fs. It would be nice if you could push something then :) Cheers. Sébastien Han Cloud Engineer “Always give 100%. Unless you're giving blood.” Ph

Re: "ceph osd map" pointer to the code

2013-09-24 Thread Joao Eduardo Luis
On 09/24/2013 01:36 PM, Sébastien Han wrote: Hi guys, Can anyone point me to the piece of code called during the “ceph osd map ‘pool’ ‘object’” command, please? Thanks! $ git grep -n 'osd map' src/mon src/mon/MonCommands.h:338:COMMAND("osd map " \ src/mon/OSDMonitor.cc:2093: } else if (p

Re: "ceph osd map" pointer to the code

2013-09-24 Thread Sébastien Han
Thanks Joao. Sébastien Han Cloud Engineer “Always give 100%. Unless you're giving blood.” Phone: +33 (0)1 49 70 99 72 Mail: sebastien@enovance.com Address : 10, rue de la Victoire - 75009 Paris Web : www.enovance.com - Twitter : @enovance On September 24, 2013 at 3:07:52 PM, Joao Eduar

Re: Object Write Latency

2013-09-24 Thread Sage Weil
On Tue, 24 Sep 2013, Dan van der Ster wrote: > Hi, > Yes, the journal is a file on the same XFS partition as the data. > > I have some results from a rados bench to a single-copy pool on the JBOD > disks. > > [root@p05151113777233 ~]# rados bench -p test 30 write -t 1 -b 5 > ... > Total time run:

Re: Fwd: 4 failed, 298 passed in dzafman-2013-09-23_17:50:06-rados-wip-5862-testing-basic-plana

2013-09-24 Thread Sage Weil
On Tue, 24 Sep 2013, David Zafman wrote: > > Rados suite test run results for wip-5862. 2 "scrub mismatch" from mon > (known problem). 2 are valgrind issues found with mds and osd. What is the osd valgrind failure? And the osd.4 crash on 13443? (Note that the teuthology.log will include mess

Re: 4 failed, 298 passed in dzafman-2013-09-23_17:50:06-rados-wip-5862-testing-basic-plana

2013-09-24 Thread David Zafman
The osd.4 crash in 13443 is bug #5951: 2013-09-23 21:23:28.378428 1034e700 0 filestore(/var/lib/ceph/osd/ceph-4) error (17) File exists not handled on operation 20 (6579.0.0, or op 0, counting from 0) 2013-09-23 21:23:28.862204 1034e700 0 filestore(/var/lib/ceph/osd/ceph-4) unexpected error

Re: [ceph-users] Scaling RBD module

2013-09-24 Thread Sage Weil
On Tue, 24 Sep 2013, Travis Rhoden wrote: > This "noshare" option may have just helped me a ton -- I sure wish I would > have asked similar questions sooner, because I have seen the same failure to > scale. =) > > One question -- when using the "noshare" option (or really, even without it) > are

Re: [ceph-users] Scaling RBD module

2013-09-24 Thread Travis Rhoden
On Tue, Sep 24, 2013 at 5:16 PM, Sage Weil wrote: > On Tue, 24 Sep 2013, Travis Rhoden wrote: >> This "noshare" option may have just helped me a ton -- I sure wish I would >> have asked similar questions sooner, because I have seen the same failure to >> scale. =) >> >> One question -- when using

RE: [ceph-users] Scaling RBD module

2013-09-24 Thread Sage Weil
Hi Somnath! On Tue, 24 Sep 2013, Somnath Roy wrote: > > Hi Sage, > > We did quite a few experiments to see how ceph read performance can scale up. Here is the summary. > > > > 1. > > First we tried to see how far a single node cluster with one osd can scale up. We started with cuttlefish

possible bug in blacklist

2013-09-24 Thread Mandell Degerness
See trace below. We run this command on system restart in order to clear any blacklist entries created while the node was misbehaving. Now, rather than giving a reasonable error, it causes a Traceback: [root@node-172-20-0-13 ~]# ceph osd blacklist rm 172.20.0.13 Traceback (most recent call last):

RE: [ceph-users] Scaling RBD module

2013-09-24 Thread Somnath Roy
Hi Sage, Thanks for your input. I will try those. Please see my response inline. Thanks & Regards Somnath -Original Message- From: Sage Weil [mailto:s...@inktank.com] Sent: Tuesday, September 24, 2013 3:47 PM To: Somnath Roy Cc: Travis Rhoden; Josh Durgin; ceph-devel@vger.kernel.org; Anir

[RESEND PATCH V6] ceph:Implement readv/preadv for sync operation

2013-09-24 Thread majianpeng
For the readv/preadv sync operation, ceph only handles the first iov and ignores the remaining iovs. This patch implements the missing handling. V6: Fix some bugs; V5: -before getattr, put the caps that are already held, to avoid deadlock. -only do generic_segment_checks for sync-read, to avoid doing it again in generic_file_aio_re

Re: [RESEND PATCH V6] ceph:Implement readv/preadv for sync operation

2013-09-24 Thread Yan, Zheng
On 09/25/2013 09:07 AM, majianpeng wrote: > For the readv/preadv sync operation, ceph only handles the first iov > and ignores the remaining iovs. This patch implements the missing handling. > > V6: > Fix some bugs; > V5: > -before getattr, put the caps that are already held, to avoid deadlock. > -only do generic_segment_checks f

Re: Re: [RESEND PATCH V6] ceph:Implement readv/preadv for sync operation

2013-09-24 Thread majianpeng
>On 09/25/2013 09:07 AM, majianpeng wrote: >> For the readv/preadv sync operation, ceph only handles the first iov >> and ignores the remaining iovs. This patch implements the missing handling. >> >> V6: >> Fix some bugs; >> V5: >> -before getattr, put the caps that are already held, to avoid deadlock. >> -only do generic_segment

Re: [RESEND PATCH V6] ceph:Implement readv/preadv for sync operation

2013-09-24 Thread Yan, Zheng
On 09/25/2013 11:23 AM, majianpeng wrote: >> On 09/25/2013 09:07 AM, majianpeng wrote: >>> For the readv/preadv sync operation, ceph only handles the first iov >>> and ignores the remaining iovs. This patch implements the missing handling. >>> >>> V6: >>> Fix some bugs; >>> V5: >>> -before getattr, put the caps that are already ho

Re: [RESEND PATCH V6] ceph:Implement readv/preadv for sync operation

2013-09-24 Thread Sage Weil
Applied this to the testing branch. Thanks, everyone! sage On Wed, 25 Sep 2013, Yan, Zheng wrote: > On 09/25/2013 11:23 AM, majianpeng wrote: > >> On 09/25/2013 09:07 AM, majianpeng wrote: > >>> For the readv/preadv sync operation, ceph only handles the first iov > >>> and ignores the remaining iovs. This patch imp