Re: RGW multi-tenancy APIs overview

2015-11-18 Thread Pete Zaitcev
On Mon, 9 Nov 2015 21:36:47 -0800 Yehuda Sadeh-Weinraub wrote: > In the supported domains configuration, we can specify for each domain > whether a subdomain for it would be a bucket (as it is now), or > whether it would be a tenant (which implies the possibility of > bucket.tenant). This only af

Re: request_queue use-after-free - inode_detach_wb()

2015-11-18 Thread Tejun Heo
Hello, Ilya. On Wed, Nov 18, 2015 at 04:48:06PM +0100, Ilya Dryomov wrote: > Just to be clear, the bdi/wb vs inode lifetime rules are that inodes > should always be within bdi/wb? There's been a lot of churn in this Yes, that's where *I* think we should be headed. Stuff in lower layers should s

Re: request_queue use-after-free - inode_detach_wb()

2015-11-18 Thread Ilya Dryomov
On Wed, Nov 18, 2015 at 4:30 PM, Tejun Heo wrote: > Hello, Ilya. > > On Wed, Nov 18, 2015 at 04:12:07PM +0100, Ilya Dryomov wrote: >> > It's stinky that the bdi is going away while the inode is still there. >> > Yeah, blkdev inodes are special and created early but I think it makes >> > sense to k

Re: request_queue use-after-free - inode_detach_wb()

2015-11-18 Thread Tejun Heo
Hello, Ilya. On Wed, Nov 18, 2015 at 04:12:07PM +0100, Ilya Dryomov wrote: > > It's stinky that the bdi is going away while the inode is still there. > > Yeah, blkdev inodes are special and created early but I think it makes > > sense to keep the underlying structures (queue and bdi) around while

Re: request_queue use-after-free - inode_detach_wb()

2015-11-18 Thread Ilya Dryomov
On Tue, Nov 17, 2015 at 9:56 PM, Tejun Heo wrote: > Hello, Ilya. > > On Mon, Nov 16, 2015 at 09:59:18PM +0100, Ilya Dryomov wrote: > ... >> Looking at __blkdev_put(), the issue becomes clear: we are taking >> precautions to flush before calling out to ->release() because, at >> least according to

Re: [PATCH 2/3] net/ceph: do not define list_entry_next

2015-11-18 Thread Sergey Senozhatsky
On (11/18/15 14:13), Ilya Dryomov wrote: [..] > > Someone beat you to it ;) > > https://github.com/ceph/ceph-client/commit/76b4a27faebb369c1c50df01ef08b614a2854fc5 Oh, OK then :) Thanks! -ss -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a messag

Re: [PATCH 2/3] net/ceph: do not define list_entry_next

2015-11-18 Thread Ilya Dryomov
On Wed, Nov 18, 2015 at 1:13 PM, Sergey Senozhatsky wrote: > Cosmetic. > > Do not define list_entry_next() and use list_next_entry() > from list.h. > > Signed-off-by: Sergey Senozhatsky > --- > net/ceph/messenger.c | 8 +++- > 1 file changed, 3 insertions(+), 5 deletions(-) > > diff --git a/
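For context: the patch drops the private list_entry_next() macro that net/ceph/messenger.c had been carrying and uses the list_next_entry() helper already provided by include/linux/list.h. The shape of the helpers involved (list.h definitions as in the kernel; the duplicated messenger.c copy was equivalent modulo parenthesization):

    /* include/linux/list.h */
    #define list_entry(ptr, type, member) \
            container_of(ptr, type, member)
    #define list_next_entry(pos, member) \
            list_entry((pos)->member.next, typeof(*(pos)), member)

    /* net/ceph/messenger.c had been carrying an equivalent private copy: */
    #define list_entry_next(pos, member) \
            list_entry((pos)->member.next, typeof(*(pos)), member)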

Re: problem about pgmeta object?

2015-11-18 Thread Sage Weil
On Wed, 18 Nov 2015, Ning Yao wrote: > Hi, Sage > > pgmeta object is a meta-object (like __head___2) without > significant information. It is created in PG::_init() when > handling pg_create and split_coll and always exists there during the pg's > life cycle until the pg is removed in RemoveWQ

Re: Crc32 Challenge

2015-11-18 Thread Dan van der Ster
Hi, I checked the partial crc after each iteration in google's python implementation and found that the crc of the last iteration matches ceph's [1]: >>> from crc32c import crc >>> crc('foo bar baz') crc 1197962378 crc 3599162226 crc 2946501991 crc 2501826906 crc 3132034983 crc 3851841059 crc 274
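For context, the chaining property being checked can be demonstrated with a plain bitwise CRC32-C: seeding each call with the previous chunk's result yields the same value as one pass over the whole buffer. A self-contained C sketch (reflected polynomial 0x82F63B78; whether the Python library's seed conditioning matches Ceph's is exactly what the thread is verifying):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Plain bitwise CRC32-C (Castagnoli). */
    static uint32_t crc32c(uint32_t crc, const uint8_t *buf, size_t len)
    {
        crc = ~crc;
        while (len--) {
            crc ^= *buf++;
            for (int k = 0; k < 8; k++)
                crc = (crc >> 1) ^ (0x82F63B78 & -(crc & 1));
        }
        return ~crc;
    }

    int main(void)
    {
        const char *s = "foo bar baz";
        uint32_t whole = crc32c(0, (const uint8_t *)s, strlen(s));
        uint32_t part = crc32c(0, (const uint8_t *)s, 4);       /* "foo " */
        part = crc32c(part, (const uint8_t *)s + 4, strlen(s) - 4);
        printf("%u %u\n", whole, part);  /* prints the same value twice */
        return 0;
    }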

Re: RGW multi-tenancy APIs overview

2015-11-17 Thread Pete Zaitcev
need to add this. Good point. I think I'll drop that. > >> Does that work the same for object copy, and acls? > > > > ACLs do not list buckets, only users, which may be qualified (tenant$user). > > Not tenant:user? I forgot what happened when we did tenant:user. There was s

Re: RGW multi-tenancy APIs overview

2015-11-17 Thread Yehuda Sadeh-Weinraub
On Tue, Nov 17, 2015 at 3:47 PM, Pete Zaitcev wrote: > On Mon, 9 Nov 2015 21:36:47 -0800 > Yehuda Sadeh-Weinraub wrote: > > We discussed this a bit on RGW team meeting in BJ, and there were some > developments, so for the sake of update here goes. > >> > #1 Back-end and radosgw-admin use '/' or "

Re: RGW multi-tenancy APIs overview

2015-11-17 Thread Pete Zaitcev
On Mon, 9 Nov 2015 21:36:47 -0800 Yehuda Sadeh-Weinraub wrote: We discussed this a bit on RGW team meeting in BJ, and there were some developments, so for the sake of update here goes. > > #1 Back-end and radosgw-admin use '/' or "tenant/bucket". This is what is > > literally stored in RADOS, be
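The separators under discussion: '/' between tenant and bucket in the back end (stored literally in RADOS) and '$' for qualified users in ACLs. A hypothetical C sketch of splitting such names, just to illustrate the syntax (helper name and buffer handling are invented, not RGW code):

    #include <stdio.h>
    #include <string.h>

    /* Split "tenant<sep>rest"; empty tenant if no separator present. */
    static void split_qualified(const char *in, char sep,
                                char *tenant, char *rest, size_t n)
    {
        const char *p = strchr(in, sep);
        if (p) {
            snprintf(tenant, n, "%.*s", (int)(p - in), in);
            snprintf(rest, n, "%s", p + 1);
        } else {
            tenant[0] = '\0';
            snprintf(rest, n, "%s", in);
        }
    }

    int main(void)
    {
        char tenant[128], rest[128];
        split_qualified("acme/photos", '/', tenant, rest, sizeof(rest));
        printf("tenant=%s bucket=%s\n", tenant, rest);
        split_qualified("acme$alice", '$', tenant, rest, sizeof(rest));
        printf("tenant=%s user=%s\n", tenant, rest);
        return 0;
    }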

Re: request_queue use-after-free - inode_detach_wb()

2015-11-17 Thread Tejun Heo
Hello, Ilya. On Mon, Nov 16, 2015 at 09:59:18PM +0100, Ilya Dryomov wrote: ... > Looking at __blkdev_put(), the issue becomes clear: we are taking > precautions to flush before calling out to ->release() because, at > least according to the comment, ->release() can free queue; we are > recording o

Re: [PATCH 2/3] libceph: use list_next_entry instead of list_entry_next

2015-11-17 Thread Ilya Dryomov
On Mon, Nov 16, 2015 at 2:46 PM, Geliang Tang wrote: > list_next_entry has been defined in list.h, so I replace list_entry_next > with it. > > Signed-off-by: Geliang Tang > --- > net/ceph/messenger.c | 7 ++- > 1 file changed, 2 insertions(+), 5 deletions(-) > > diff --git a/net/ceph/messeng

Re: [PATCH 2/2] fs/ceph: ceph_frag_contains_value can be boolean

2015-11-17 Thread Yan, Zheng
> On Nov 17, 2015, at 14:52, Yaowei Bai wrote: > > This patch makes ceph_frag_contains_value return bool to improve > readability due to this particular function only using either one or > zero as its return value. > > No functional change. > > Signed-off-by: Yaowei Bai > --- > include/linux/
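The change is a pure return-type cleanup: the helper only ever yields 0 or 1, so bool states the intent. Roughly (a sketch of the after state; the helper lives in the kernel's include/linux/ceph/ceph_frag.h and the body here may not be a verbatim copy):

    /* Before: static inline int ceph_frag_contains_value(__u32 f, __u32 v) */

    /* After - the comparison already produces a boolean result: */
    static inline bool ceph_frag_contains_value(__u32 f, __u32 v)
    {
        return (v & ceph_frag_mask(f)) == ceph_frag_value(f);
    }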

Re: [CEPH] OSD daemons running with a large number of threads

2015-11-17 Thread Sage Weil
On Tue, 17 Nov 2015, ghislain.cheval...@orange.com wrote: > Hi, > > Context: > Firefly 0.80.9 > Ubuntu 14.04.1 > Almost a production platform in an openstack environment > 176 OSD (SAS and SSD), 2 crushmap-oriented storage classes , 8 servers in 2 > rooms, 3 monitors on openstack controllers > U

Re: Newly added monitor infinitely sync store

2015-11-16 Thread Guang Yang
On Mon, Nov 16, 2015 at 5:42 PM, Sage Weil wrote: > On Mon, 16 Nov 2015, Guang Yang wrote: >> I spoke to a leveldb expert; it looks like this is a known pattern in >> LSM-tree data structures - the tail latency for a range scan can be far >> longer than the avg/median since it might need to mmap severa

Re: Newly added monitor infinitely sync store

2015-11-16 Thread Sage Weil
On Mon, 16 Nov 2015, Guang Yang wrote: > I spoke to a leveldb expert; it looks like this is a known pattern in > LSM-tree data structures - the tail latency for a range scan can be far > longer than the avg/median since it might need to mmap several sst files > to get the record. > > Hi Sage, > Do you

Re: Newly added monitor infinitely sync store

2015-11-16 Thread Guang Yang
I spoke to a leveldb expert; it looks like this is a known pattern in LSM-tree data structures - the tail latency for a range scan can be far longer than the avg/median since it might need to mmap several sst files to get the record. Hi Sage, Do you see any harm in increasing the default value for this s

Re: v0.80.11 QE validation status

2015-11-16 Thread Loic Dachary
c: "ceph-qa" , "Ceph Development" > , "Sage Weil" , "Alfredo Deza" > > Sent: Monday, November 16, 2015 2:01:42 PM > Subject: Re: v0.80.11 QE validation status > > Loic, > > I am not actually sure about resolving #11104. > &g

Re: v0.80.11 QE validation status

2015-11-16 Thread Tamil Muthamizhan
" fix. Regards, Tamil - Original Message - From: "Yuri Weinstein" To: "Loic Dachary" Cc: "ceph-qa" , "Ceph Development" , "Sage Weil" , "Alfredo Deza" Sent: Monday, November 16, 2015 2:01:42 PM Subject: Re: v0.80.11 QE v

Re: v0.80.11 QE validation status

2015-11-16 Thread Yuri Weinstein
Loic, I am not actually sure about resolving #11104. Warren? Thx YuriW On Mon, Nov 16, 2015 at 1:04 PM, Loic Dachary wrote: > Hi Yuri, > > Thanks for the update :-) Should we mark #11104 as resolved ? > > Cheers > > On 16/11/2015 19:45, Yuri Weinstein wrote: >> This release QE validation took

Re: v0.80.11 QE validation status

2015-11-16 Thread Loic Dachary
Hi Yuri, Thanks for the update :-) Should we mark #11104 as resolved ? Cheers On 16/11/2015 19:45, Yuri Weinstein wrote: > This release QE validation took longer than usual due to the additional > #11104 fixing/testing and the related issues #13794 and #13622 > discovered along the way. > > We agreed to release v0.80

Re: v0.80.11 QE validation status

2015-11-16 Thread Yuri Weinstein
This release QE validation took longer than usual due to the additional #11104 fixing/testing and the related issues #13794 and #13622 discovered along the way. We agreed to release v0.80.11 based on the test results. Thx YuriW On Wed, Oct 28, 2015 at 9:04 AM, Yuri Weinstein wrote: > Summary of suites executed for this

Re: scrub randomization and load threshold

2015-11-16 Thread Dan van der Ster
On Mon, Nov 16, 2015 at 6:13 PM, Sage Weil wrote: > On Mon, 16 Nov 2015, Dan van der Ster wrote: >> On Mon, Nov 16, 2015 at 4:58 PM, Dan van der Ster >> wrote: >> > On Mon, Nov 16, 2015 at 4:32 PM, Dan van der Ster >> > wrote: >> >> On Mon, Nov 16, 2015 at 4:20 PM, Sage Weil wrote: >> >>> On

Re: scrub randomization and load threshold

2015-11-16 Thread Sage Weil
On Mon, 16 Nov 2015, Dan van der Ster wrote: > On Mon, Nov 16, 2015 at 4:58 PM, Dan van der Ster wrote: > > On Mon, Nov 16, 2015 at 4:32 PM, Dan van der Ster > > wrote: > >> On Mon, Nov 16, 2015 at 4:20 PM, Sage Weil wrote: > >>> On Mon, 16 Nov 2015, Dan van der Ster wrote: > Instead of ke

Re: scrub randomization and load threshold

2015-11-16 Thread Dan van der Ster
On Mon, Nov 16, 2015 at 4:58 PM, Dan van der Ster wrote: > On Mon, Nov 16, 2015 at 4:32 PM, Dan van der Ster wrote: >> On Mon, Nov 16, 2015 at 4:20 PM, Sage Weil wrote: >>> On Mon, 16 Nov 2015, Dan van der Ster wrote: Instead of keeping a 24hr loadavg, how about we allow scrubs whenever >>>

Re: scrub randomization and load threshold

2015-11-16 Thread Dan van der Ster
On Mon, Nov 16, 2015 at 4:32 PM, Dan van der Ster wrote: > On Mon, Nov 16, 2015 at 4:20 PM, Sage Weil wrote: >> On Mon, 16 Nov 2015, Dan van der Ster wrote: >>> Instead of keeping a 24hr loadavg, how about we allow scrubs whenever >>> the loadavg is decreasing (or below the threshold)? As long as

Re: scrub randomization and load threshold

2015-11-16 Thread Dan van der Ster
On Mon, Nov 16, 2015 at 4:20 PM, Sage Weil wrote: > On Mon, 16 Nov 2015, Dan van der Ster wrote: >> Instead of keeping a 24hr loadavg, how about we allow scrubs whenever >> the loadavg is decreasing (or below the threshold)? As long as the >> 1min loadavg is less than the 15min loadavg, we should

Re: scrub randomization and load threshold

2015-11-16 Thread Sage Weil
On Mon, 16 Nov 2015, Dan van der Ster wrote: > Instead of keeping a 24hr loadavg, how about we allow scrubs whenever > the loadavg is decreasing (or below the threshold)? As long as the > 1min loadavg is less than the 15min loadavg, we should be ok to allow > new scrubs. If you agree I'll add the p
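The rule being proposed is simple to state: allow a scrub when the load is below the configured threshold, or when the 1-minute load average is below the 15-minute one (i.e. load is trending down). A small C sketch of that check using getloadavg(3) (names are illustrative, not the actual OSD code):

    #include <stdbool.h>
    #include <stdlib.h>

    static bool scrub_load_ok(double load_threshold)
    {
        double la[3];                 /* 1-, 5-, 15-minute load averages */
        if (getloadavg(la, 3) != 3)
            return false;             /* be conservative if unavailable */
        /* Below threshold, or load is decreasing (1-min < 15-min). */
        return la[0] < load_threshold || la[0] < la[2];
    }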

Re: a problem about FileStore::_destroy_collection

2015-11-16 Thread Sage Weil
On Mon, 16 Nov 2015, yangruifeng.09...@h3c.com wrote: > an ENOTEMPTY error may happen when removing a pg in previous > versions, but the error is hidden in new versions? When did this change? sage > _destroy_collection may return 0 when get_index or prep_delete return < 0; > > is this intend

Re: scrub randomization and load threshold

2015-11-16 Thread Dan van der Ster
On Thu, Nov 12, 2015 at 4:34 PM, Dan van der Ster wrote: > On Thu, Nov 12, 2015 at 4:10 PM, Sage Weil wrote: >> On Thu, 12 Nov 2015, Dan van der Ster wrote: >>> On Thu, Nov 12, 2015 at 2:29 PM, Sage Weil wrote: >>> > On Thu, 12 Nov 2015, Dan van der Ster wrote: >>> >> Hi, >>> >> >>> >> Firstly,

Re: Newly added monitor infinitely sync store

2015-11-13 Thread Guang Yang
Thanks Sage! I will definitely try those patches. For this one, I finally managed to bring the new monitor in by increasing the mon_sync_timeout from its default 60 to 6 to make sure the syncing does not restart and result in an infinite loop.. On Fri, Nov 13, 2015 at 5:04 PM, Sage Weil wrot

Re: Newly added monitor infinitely sync store

2015-11-13 Thread Sage Weil
On Fri, 13 Nov 2015, Guang Yang wrote: > Thanks Sage! > > On Fri, Nov 13, 2015 at 4:15 PM, Sage Weil wrote: > > On Fri, 13 Nov 2015, Guang Yang wrote: > >> I was wrong in the previous analysis; it was not that the iterator got reset. > >> The problem I can see now is that during the syncing, a new round

Re: Newly added monitor infinitely sync store

2015-11-13 Thread Guang Yang
Thanks Sage! On Fri, Nov 13, 2015 at 4:15 PM, Sage Weil wrote: > On Fri, 13 Nov 2015, Guang Yang wrote: >> I was wrong in the previous analysis; it was not that the iterator got reset. >> The problem I can see now is that during the syncing, a new round of >> election kicked off and thus it needs to pro

Re: Newly added monitor infinitely sync store

2015-11-13 Thread Sage Weil
On Fri, 13 Nov 2015, Guang Yang wrote: > I was wrong in the previous analysis; it was not that the iterator got reset. > The problem I can see now is that during the syncing, a new round of > election kicked off and thus it needs to probe the newly added > monitor; however, since it hasn't been synced yet

Re: Firefly EOL date - still Jan 2016?

2015-11-13 Thread Nathan Cutler
> Does anyone on the stable release team have an interest in doing > releases beyond that date, or should we announce that as a firm date? For now my vote is to stick to the schedule and declare EOL on January 31, but I'm willing to negotiate :-) Nathan -- To unsubscribe from this list: send the

Re: Firefly EOL date - still Jan 2016?

2015-11-13 Thread Loic Dachary
Hi Ken, On 13/11/2015 22:15, Ken Dreyer wrote: > Hi folks, > > This is mainly directed at the stable release team members > (http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO), since > they are the ones doing the work of backporting :) > > On http://docs.ceph.com/docs/master/releases/, i

Re: recommendations for good newbie bug to look at?

2015-11-13 Thread Snyder, Emile
>Looks like we only have two tagged right now :( but periodically >things in the tracker get tagged with "new-dev". > >http://tracker.ceph.com/projects/ceph/search?utf8=✓&issues=1&q=new-dev > >...and looking at that, the osdmap_subscribe ones I think are mostly >dealt with in https://github.com/ce

Re: recommendations for good newbie bug to look at?

2015-11-13 Thread Gregory Farnum
Looks like we only have two tagged right now :( but periodically things in the tracker get tagged with "new-dev". http://tracker.ceph.com/projects/ceph/search?utf8=✓&issues=1&q=new-dev ...and looking at that, the osdmap_subscribe ones I think are mostly dealt with in https://github.com/ceph/ceph/

Re: Notes from a discussion a design to allow EC overwrites

2015-11-13 Thread Tianshan Qu
; case which becomes rare-er and rare-er as you scale-out) > > Allen Samuels > Software Architect, Emerging Storage Solutions > > 2880 Junction Avenue, Milpitas, CA 95134 > T: +1 408 801 7030| M: +1 408 780 6416 > allen.samu...@sandisk.com > > > -Original Message-----

RE: Notes from a discussion a design to allow EC overwrites

2015-11-13 Thread Allen Samuels
Storage Solutions 2880 Junction Avenue, Milpitas, CA 95134 T: +1 408 801 7030| M: +1 408 780 6416 allen.samu...@sandisk.com -Original Message- From: Samuel Just [mailto:sj...@redhat.com] Sent: Friday, November 13, 2015 7:39 AM To: Sage Weil Cc: ceph-devel@vger.kernel.org; Allen Samuels ;

Re: Notes from a discussion a design to allow EC overwrites

2015-11-13 Thread Samuel Just
Lazily persisting the intermediate entries would certainly also work, but there's an argument that it needlessly adds to the write transaction. Actually, we probably want to avoid having small writes be full stripe writes -- with an 8+3 code the difference between modifying a single stripelet and m

Re: Notes from a discussion a design to allow EC overwrites

2015-11-13 Thread Sage Weil
On Thu, 12 Nov 2015, Samuel Just wrote: > I was present for a discussion about allowing EC overwrites and thought it > would be good to summarize it for the list: > > Commit Protocol: > 1) client sends write to primary > 2) primary reads in partial stripes needed for partial stripe > overwrites fr
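To see why partial stripe overwrites force reads, consider which stripelets a byte-range write touches in a k+m code. A toy C calculation (illustrative math only, assuming the write stays within one stripe; not Ceph code):

    #include <stdio.h>

    int main(void)
    {
        const unsigned long k = 8, S = 4096;   /* 8+3 code, 4K stripelets */
        unsigned long off = 6000, len = 3000;  /* small partial overwrite */
        unsigned long first = (off % (k * S)) / S;
        unsigned long last  = ((off + len - 1) % (k * S)) / S;
        /* Partially covered stripelets must be read back before
         * the parity chunks can be re-encoded. */
        printf("stripelets %lu..%lu need read-modify-write\n", first, last);
        return 0;
    }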

Re: data-at-rest compression

2015-11-13 Thread Sage Weil
On Fri, 13 Nov 2015, Alyona Kiselyova wrote: > Hi, > I was working on a pluggable compression interface in this work > (https://github.com/ceph/ceph/pull/6361). In Igor's pull request it was > suggested to reuse the common plugin infrastructure from the unmerged > wip-plugin branch. Now I'm working on adaptati

Re: data-at-rest compression

2015-11-13 Thread Alyona Kiselyova
Hi, I was working on a pluggable compression interface in this work (https://github.com/ceph/ceph/pull/6361). In Igor's pull request it was suggested to reuse the common plugin infrastructure from the unmerged wip-plugin branch. Now I'm working on the adaptation of it, and as I see it, I need only these two commits fr

RE: Increasing # Shards vs multi-OSDs per device

2015-11-12 Thread Blinick, Stephen L
evel@vger.kernel.org; Mark Nelson; Samuel Just; Kyle Bader; Somnath Roy Subject: Re: Increasing # Shards vs multi-OSDs per device -BEGIN PGP SIGNED MESSAGE- Hash: SHA256 I should have the weighted round robin queue ready in the next few days. I'm shaking out a few bugs from converting it fro

RE: [CEPH][Crush][Tunables] issue when updating tunables

2015-11-12 Thread ghislain.chevalier
Thx Sage. It's clear now. Best regards -Original Message- From: Sage Weil [mailto:s...@newdream.net] Sent: Thursday, November 12, 2015 16:01 To: CHEVALIER Ghislain IMT/OLPS Cc: ceph-devel@vger.kernel.org Subject: RE: [CEPH][Crush][Tunables] issue when updating tunables On Thu, 1

Re: scrub randomization and load threshold

2015-11-12 Thread Dan van der Ster
On Thu, Nov 12, 2015 at 4:10 PM, Sage Weil wrote: > On Thu, 12 Nov 2015, Dan van der Ster wrote: >> On Thu, Nov 12, 2015 at 2:29 PM, Sage Weil wrote: >> > On Thu, 12 Nov 2015, Dan van der Ster wrote: >> >> Hi, >> >> >> >> Firstly, we just had a look at the new >> >> osd_scrub_interval_randomize_r

Re: scrub randomization and load threshold

2015-11-12 Thread Sage Weil
On Thu, 12 Nov 2015, Dan van der Ster wrote: > On Thu, Nov 12, 2015 at 2:29 PM, Sage Weil wrote: > > On Thu, 12 Nov 2015, Dan van der Ster wrote: > >> Hi, > >> > >> Firstly, we just had a look at the new > >> osd_scrub_interval_randomize_ratio option and found that it doesn't > >> really solve the

RE: [CEPH][Crush][Tunables] issue when updating tunables

2015-11-12 Thread Sage Weil
if it is a fresh cluster you are better off with straw_calc_version = 1. (Same goes for old clusters, if you can tolerate a bit of initial rebalancing.) sage > > Best regards > > -Original Message- > From: Sage Weil [mailto:s...@newdream.net] > Sent: Tuesday

Re: scrub randomization and load threshold

2015-11-12 Thread Dan van der Ster
On Thu, Nov 12, 2015 at 2:29 PM, Sage Weil wrote: > On Thu, 12 Nov 2015, Dan van der Ster wrote: >> Hi, >> >> Firstly, we just had a look at the new >> osd_scrub_interval_randomize_ratio option and found that it doesn't >> really solve the deep scrubbing problem. Given the default options, >> >> o

RE: [CEPH][Crush][Tunables] issue when updating tunables

2015-11-12 Thread ghislain.chevalier
0, > "require_feature_tunables": 1, > "require_feature_tunables2": 1, > "require_feature_tunables3": 1, > "has_v2_rules": 0, > "has_v3_rules": 0} - there's an issue with the tunables detection and update? Best regards

Re: scrub randomization and load threshold

2015-11-12 Thread Sage Weil
On Thu, 12 Nov 2015, Dan van der Ster wrote: > Hi, > > Firstly, we just had a look at the new > osd_scrub_interval_randomize_ratio option and found that it doesn't > really solve the deep scrubbing problem. Given the default options, > > osd_scrub_min_interval = 60*60*24 > osd_scrub_max_interval
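For reference, the randomize ratio spreads each PG's next scrub over a window instead of firing exactly at the minimum interval. A rough C sketch of that scheduling rule as described (illustrative, not the actual OSD code):

    #include <stdlib.h>
    #include <time.h>

    /* Next scrub lands in [min_interval, min_interval * (1 + ratio)). */
    static time_t next_scrub_time(time_t last_scrub,
                                  double min_interval,    /* 60*60*24 */
                                  double randomize_ratio) /* e.g. 0.5 */
    {
        double r = (double)rand() / RAND_MAX;  /* uniform in [0, 1] */
        return last_scrub +
               (time_t)(min_interval * (1.0 + r * randomize_ratio));
    }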

Re: [PATCH] mm: Allow GFP_IOFS for page_cache_read page cache allocation

2015-11-12 Thread Jan Kara
On Wed 11-11-15 15:13:53, mho...@kernel.org wrote: > From: Michal Hocko > > page_cache_read has been historically using page_cache_alloc_cold to > allocate a new page. This means that mapping_gfp_mask is used as the > base for the gfp_mask. Many filesystems are setting this mask to > GFP_NOFS to

Re: test

2015-11-11 Thread Mark Nelson
whatever you did, it appears to work. :) On 11/11/2015 05:44 PM, Somnath Roy wrote: Sorry for the spam , having some issues with devl -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger

Re: Increasing # Shards vs multi-OSDs per device

2015-11-11 Thread Robert LeBlanc
rnals to one device. > > 3-- Page cache effects should be negated in these numbers. As you can see in > the other presentation we did one run with 2TB of data, which showed higher > performance (1.35M IOPS). But the rest of the tests were run with 4.8TB data > (replicated twice)

RE: Increasing # Shards vs multi-OSDs per device

2015-11-11 Thread Blinick, Stephen L
ers. As you can see in the other presentation we did one run with 2TB of data, which showed higher performance (1.35M IOPS). But the rest of the tests were run with 4.8TB data (replicated twice), and uniform random distribution. While we did use 'norandommap' for client performan

RE: Increasing # Shards vs multi-OSDs per device

2015-11-11 Thread Somnath Roy
Thanks for the data Stephen. Some feedback: 1. I don't think a single OSD is yet able to serve 460K read iops irrespective of how many shards/threads you are running. I didn't have your NVMe data earlier :-)... But probably, for 50/60K SAS SSD iops, a single OSD per drive is good enough. I hope you

Re: Increasing # Shards vs multi-OSDs per device

2015-11-11 Thread Mark Nelson
Hi Stephen, That's about what I expected to see, other than the write performance drop with more shards. We clearly still have some room for improvement. Good job doing the testing! Mark On 11/11/2015 02:57 PM, Blinick, Stephen L wrote: Sorry about the microphone issues in the performance

RE: 11/11/2015 Weekly Ceph Performance Meeting IS ON!

2015-11-11 Thread James (Fei) Liu-SSI
Hi Mark, I have been busy this morning and missed the meeting today. Would it be possible to upload the recording video to the site at your convenience? Thanks, James -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf

Re: new scrub and repair discussion

2015-11-11 Thread kefu chai
On Wed, Nov 11, 2015 at 10:43 PM, 王志强 wrote: > 2015-11-11 19:44 GMT+08:00 kefu chai : >> currently, scrub and repair are pretty primitive. there are several >> improvements which need to be made: >> >> - user should be able to initialize scrub of a PG or an object >> - int scrub(pg_t, AioCompl

Re: new scrub and repair discussion

2015-11-11 Thread kefu chai
On Wed, Nov 11, 2015 at 9:25 PM, Sage Weil wrote: > On Wed, 11 Nov 2015, kefu chai wrote: >> currently, scrub and repair are pretty primitive. there are several >> improvements which need to be made: >> >> - user should be able to initialize scrub of a PG or an object >> - int scrub(pg_t, AioC

Re: new scrub and repair discussion

2015-11-11 Thread 王志强
2015-11-11 19:44 GMT+08:00 kefu chai : > currently, scrub and repair are pretty primitive. there are several > improvements which need to be made: > > - user should be able to initialize scrub of a PG or an object > - int scrub(pg_t, AioCompletion*) > - int scrub(const string& pool, const s

Re: disabling buffer::raw crc cache

2015-11-11 Thread Sage Weil
On Wed, 11 Nov 2015, Ning Yao wrote: > 2015-11-11 21:13 GMT+08:00 Sage Weil : > > On Wed, 11 Nov 2015, Ning Yao wrote: > >> >>>the code logic would touch crc cache is bufferlist::crc32c and > >> >>>invalidate_crc. > >> >>Also for pg_log::_write_log(), but seems it is always miss and use at > >> >>

Re: disabling buffer::raw crc cache

2015-11-11 Thread Ning Yao
2015-11-11 21:13 GMT+08:00 Sage Weil : > On Wed, 11 Nov 2015, Ning Yao wrote: >> >>>the code logic would touch crc cache is bufferlist::crc32c and >> >>>invalidate_crc. >> >>Also for pg_log::_write_log(), but seems it is always miss and use at >> >>once, no need to cache crc actually? >> > Oh, no,

Re: new scrub and repair discussion

2015-11-11 Thread Sage Weil
On Wed, 11 Nov 2015, kefu chai wrote: > currently, scrub and repair are pretty primitive. there are several > improvements which need to be made: > > - user should be able to initialize scrub of a PG or an object > - int scrub(pg_t, AioCompletion*) > - int scrub(const string& pool, const s

Re: disabling buffer::raw crc cache

2015-11-11 Thread Sage Weil
On Wed, 11 Nov 2015, Ning Yao wrote: > >>>the code logic would touch crc cache is bufferlist::crc32c and > >>>invalidate_crc. > >>Also for pg_log::_write_log(), but seems it is always miss and use at > >>once, no need to cache crc actually? > > Oh, no, it will be hit in FileJournal writing > Still

Re: disabling buffer::raw crc cache

2015-11-11 Thread Ning Yao
>>>the code logic would touch crc cache is bufferlist::crc32c and >>>invalidate_crc. >>Also for pg_log::_write_log(), but seems it is always miss and use at >>once, no need to cache crc actually? > Oh, no, it will be hit in FileJournal writing Still miss as buffer::ptr length diff with ::encode(cr

Re: disabling buffer::raw crc cache

2015-11-11 Thread Ning Yao
>>the code logic would touch crc cache is bufferlist::crc32c and invalidate_crc. >Also for pg_log::_write_log(), but seems it is always miss and use at >once, no need to cache crc actually? Oh, no, it will be hit in FileJournal writing Regards Ning Yao 2015-11-11 18:03 GMT+08:00 Ning Yao : >>>the

Re: disabling buffer::raw crc cache

2015-11-11 Thread Ning Yao
>>the code logic would touch crc cache is bufferlist::crc32c and invalidate_crc. Also for pg_log::_write_log(), but seems it is always miss and use at once, no need to cache crc actually? So we may need to add some option to enable or disable it, or some identifier to instruct bufferlist wheher cr

Re: disabling buffer::raw crc cache

2015-11-11 Thread 池信泽
the code logic would touch crc cache is bufferlist::crc32c and invalidate_crc. we call bufferlist::crc32 when sending or receiving message and writing filejournal. Am I missing something critical? I agree with you that the benefit from that crc cache is very limited. 2015-11-11 16:25 GMT+08:00 Evge

Re: disabling buffer::raw crc cache

2015-11-11 Thread Evgeniy Firsov
Rb-tree construction and insertion, which need memory allocation and mutex lock/unlock, are more CPU expensive than a streamlined crc calculation of sometimes 100 bytes or less. On 11/11/15, 12:03 AM, "池信泽" wrote: >Ah, I am confused about why the crc cache logic would exhaust so much cpu. > >2015-11-11 15:27

Re: disabling buffer::raw crc cache

2015-11-11 Thread 池信泽
Ah, I am confused about why the crc cache logic would exhaust so much cpu. 2015-11-11 15:27 GMT+08:00 Evgeniy Firsov : > Hello, Guys! > > While running CPU bound 4k block workload, I found that disabling crc > cache in the buffer::raw gives around 7% performance improvement. > > If there is no strong u

Re: [ceph-users] Permanent MDS restarting under load

2015-11-10 Thread Oleksandr Natalenko
10.11.2015 22:38, Gregory Farnum wrote: Which requests are they? Are these MDS operations or OSD ones? Those requests appeared in ceph -w output and are as follows: https://gist.github.com/5045336f6fb7d532138f Is it correct that there are OSD operations blocked? osd.3 is one of data poo

Re: why ShardedWQ in osd using smart pointer for PG?

2015-11-10 Thread Gregory Farnum
The xlist has means of efficiently removing entries from a list. I think you'll find those in the path where we start tearing down a PG, and membership on this list is a bit different from membership in the ShardedThreadPool. It's all about the particulars of each design, and I don't have that in m
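The property Greg is pointing at: an intrusive list node embedded in the object lets the object unlink itself in O(1), with no traversal. A generic C sketch of the pattern (not Ceph's actual xlist code):

    /* Intrusive doubly-linked node embedded in the owning object. */
    struct xnode {
        struct xnode *prev, *next;
    };

    /* Unlinking needs only the node itself - O(1), no list walk. */
    static void xnode_remove(struct xnode *n)
    {
        if (n->prev) n->prev->next = n->next;
        if (n->next) n->next->prev = n->prev;
        n->prev = n->next = 0;
    }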

Re: why ShardedWQ in osd using smart pointer for PG?

2015-11-10 Thread 池信泽
I wonder, if we want to keep the PG from going out of scope at an inopportune time, why are snap_trim_queue and scrub_queue declared as xlist<PG*> instead of xlist<PGRef>? 2015-11-11 2:28 GMT+08:00 Gregory Farnum : > On Tue, Nov 10, 2015 at 7:19 AM, 池信泽 wrote: >> hi, all: >> >> op_wq is declared as ShardedTh

Re: [ceph-users] v9.2.0 Infernalis released

2015-11-10 Thread Alfredo Deza
* build: fix junit detection on Fedora 22 (Ira Cooper) > * build: fix pg ref disabling (William A. Kennington III) > * build: fix ppc build (James Page) > * build: install-deps: misc fixes (Loic Dachary) > * build: install-deps.sh improvements (Loic Dachary) > * build: install-deps

Re: [ceph-users] Permanent MDS restarting under load

2015-11-10 Thread Gregory Farnum
On Tue, Nov 10, 2015 at 6:32 AM, Oleksandr Natalenko wrote: > Hello. > > We have CephFS deployed over Ceph cluster (0.94.5). > > We experience constant MDS restarting under high IOPS workload (e.g. > rsyncing lots of small mailboxes from another storage to CephFS using > ceph-fuse client). First,

Re: why ShardedWQ in osd using smart pointer for PG?

2015-11-10 Thread Gregory Farnum
On Tue, Nov 10, 2015 at 7:19 AM, 池信泽 wrote: > hi, all: > > op_wq is declared as ShardedThreadPool::ShardedWQ<pair<PGRef, OpRequestRef>> > &op_wq. I do not know why we should use PGRef in this? > > Because the overhead of the smart pointer is not small. Maybe the > raw pointer PG* is also OK? >

Re: infernalis build package on debian jessie : dh_install: ceph missing files (usr/lib/libos_tp.so.*), aborting

2015-11-10 Thread Alexandre DERUMIER
Sorry, my fault, I had an old --without-lttng flag in my build packages. - Original Message - From: "aderumier" To: "ceph-devel" Sent: Tuesday, November 10, 2015 15:06:19 Subject: infernalis build package on debian jessie : dh_install: ceph missing files (usr/lib/libos_tp.so.*), aborting Hi, I

Re: Backlog for the Ceph tracker

2015-11-10 Thread Loic Dachary
On 10/11/2015 16:34, Loic Dachary wrote: > But http://tracker.ceph.com/projects/ceph/agile_versions looks better :-) It appears to be a crippled version of a proprietary product http://www.redminecrm.com/projects/agile/pages/last My vote would be to de-install it since it is even less flexible

Re: Backlog for the Ceph tracker

2015-11-10 Thread Loic Dachary
But http://tracker.ceph.com/projects/ceph/agile_versions looks better :-) On 10/11/2015 16:28, Loic Dachary wrote: > Hi Sam, > > I crafted a custom query that could be used as a replacement for the backlog > plugin > >http://tracker.ceph.com/projects/ceph/issues?query_id=86 > > It displays

Re: a home for backport snippets

2015-11-10 Thread Loic Dachary
Hi, The new snippets home is at https://pypi.python.org/pypi/ceph-workbench and http://ceph-workbench.dachary.org/root/ceph-workbench. The first snippet was merged by Nathan yesterday[1], the backport documentation updated accordingly[2], and I used it after merging half a dozen hammer backpor

Re: [CEPH][Crush][Tunables] issue when updating tunables

2015-11-10 Thread Sage Weil
On Tue, 10 Nov 2015, ghislain.cheval...@orange.com wrote: > Hi all, > > Context: > Firefly 0.80.9 > Ubuntu 14.04.1 > Almost a production platform in an openstack environment > 176 OSD (SAS and SSD), 2 crushmap-oriented storage classes , 8 servers in 2 > rooms, 3 monitors on openstack controllers

Re: How to modify affiliation?

2015-11-10 Thread Loic Dachary
Hi, You can submit a patch to https://github.com/ceph/ceph/blob/master/.organizationmap Cheers On 10/11/2015 09:21, chen kael wrote: > Hi,ceph-dev > who can tell me how to modify my affiliation? > Thanks! > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the bo

Re: NIC with Erasure offload feature support and Ceph

2015-11-10 Thread Mike Almateia
03-Nov-15 18:07, Gregory Farnum wrote: On Tue, Nov 3, 2015 at 3:15 AM, Mike wrote: Hello! In our project we are planning to build a petabyte cluster with an Erasure pool. We are also looking at Mellanox ConnectX-4 Lx EN Cards/ConnectX-4 EN Cards to use their erasure code offloading feature. Someone use

Re: make check bot resumed

2015-11-09 Thread Loic Dachary
is enough to rebase and repush it. Cheers On 09/11/2015 15:33, Loic Dachary wrote: > Hi, > > The machine sending notifications for the make check bot failed during the > week-end. It was rebooted and it should resume its work. > > The virtual machine was actually re

Re: RGW multi-tenancy APIs overview

2015-11-09 Thread Yehuda Sadeh-Weinraub
On Mon, Nov 9, 2015 at 9:10 PM, Pete Zaitcev wrote: > With ticket 5073 getting close to complete, we're getting the APIs mostly Great! thanks for all the work you've done to get this closer to completion. > nailed down. Most of them come down to selecting a syntax separator > character. Unfortun

Re: Request for Comments: Weighted Round Robin OP Queue

2015-11-09 Thread Milosz Tanski
On Mon, Nov 9, 2015 at 3:49 PM, Samuel Just wrote: > On Mon, Nov 9, 2015 at 12:31 PM, Robert LeBlanc wrote: >> -BEGIN PGP SIGNED MESSAGE- >> Hash: SHA256 >> >> On Mon, Nov 9, 2015 at 12:47 PM, Samuel Just wrote: >>> What I really want from PrioritizedQueue (and from the dmclock/mclock >>

Re: Request for Comments: Weighted Round Robin OP Queue

2015-11-09 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 It sounds like dmclock/mclock will alleviate a lot of the concerns I have as long as it can be smart like you said. It sounds like the queue thread was already tried so there is experience behind the current implementation vs. me thinking it might be

Re: Request for Comments: Weighted Round Robin OP Queue

2015-11-09 Thread Samuel Just
On Mon, Nov 9, 2015 at 1:30 PM, Robert LeBlanc wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA256 > > On Mon, Nov 9, 2015 at 1:49 PM, Samuel Just wrote: >> We basically don't want a single thread to see all of the operations -- it >> would cause a tremendous bottleneck and complicate the

Re: Request for Comments: Weighted Round Robin OP Queue

2015-11-09 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 On Mon, Nov 9, 2015 at 1:49 PM, Samuel Just wrote: > We basically don't want a single thread to see all of the operations -- it > would cause a tremendous bottleneck and complicate the design > immensely. It shouldn't be necessary anyway since PG

Re: [PATCH 1/9] drivers/staging/media/davinci_vpfe/vpfe_mc_capture.c: use correct structure type name in sizeof

2015-11-09 Thread Laurent Pinchart
Hi Julia, Thank you for the patch. On Tuesday 29 July 2014 17:16:43 Julia Lawall wrote: > From: Julia Lawall > > Correct typo in the name of the type given to sizeof. Because it is the > size of a pointer that is wanted, the typo has no impact on compilation or > execution. > > This problem w

Re: Request for Comments: Weighted Round Robin OP Queue

2015-11-09 Thread Samuel Just
On Mon, Nov 9, 2015 at 12:31 PM, Robert LeBlanc wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA256 > > On Mon, Nov 9, 2015 at 12:47 PM, Samuel Just wrote: >> What I really want from PrioritizedQueue (and from the dmclock/mclock >> approaches that are also being worked on) is a solution to

Re: Request for Comments: Weighted Round Robin OP Queue

2015-11-09 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 On Mon, Nov 9, 2015 at 12:47 PM, Samuel Just wrote: > What I really want from PrioritizedQueue (and from the dmclock/mclock > approaches that are also being worked on) is a solution to the problem > of efficiently deciding which op to do next taking

Re: Request for Comments: Weighted Round Robin OP Queue

2015-11-09 Thread Samuel Just
What I really want from PrioritizedQueue (and from the dmclock/mclock approaches that are also being worked on) is a solution to the problem of efficiently deciding which op to do next taking into account fairness across io classes and ops with different costs. On Mon, Nov 9, 2015 at 11:19 AM, Rob

Re: Request for Comments: Weighted Round Robin OP Queue

2015-11-09 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Thanks, I think some of the fog is clearing. I was wondering how operations between threads were keeping the order of operations in PGs, that explains it. My original thoughts were to have a queue in front and behind the Prio/WRR queue. Threads sche

Re: Request for Comments: Weighted Round Robin OP Queue

2015-11-09 Thread Haomai Wang
On Tue, Nov 10, 2015 at 2:19 AM, Samuel Just wrote: > Ops are hashed from the messenger (or any of the other enqueue sources > for non-message items) into one of N queues, each of which is serviced > by M threads. We can't quite have a single thread own a single queue > yet because the current de

Re: Request for Comments: Weighted Round Robin OP Queue

2015-11-09 Thread Samuel Just
Ops are hashed from the messenger (or any of the other enqueue sources for non-message items) into one of N queues, each of which is serviced by M threads. We can't quite have a single thread own a single queue yet because the current design allows multiple threads/queue (important because if a sy
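In other words: each op is hashed to one of N shard queues, and each shard is serviced by M threads; because every op for a given PG lands in the same shard, per-PG ordering survives without a single global queue. A toy C sketch of just the placement rule (shard count and hash are invented for illustration):

    #include <stdint.h>

    #define NUM_SHARDS 5  /* N queues, each serviced by M worker threads */

    /* All ops for the same PG map to the same shard, preserving
     * per-PG ordering without one global queue. */
    static unsigned shard_for_pg(uint64_t pool, uint32_t pg_seed)
    {
        uint64_t h = pool * 0x9e3779b97f4a7c15ULL + pg_seed; /* cheap mix */
        return (unsigned)(h % NUM_SHARDS);
    }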
