RE: chooseleaf may cause some unnecessary pg migrations

2015-10-13 Thread Xusangdi
Straw2. But I had also run the same test for straw alg, which generated quite similar results. > -Original Message- > From: Robert LeBlanc [mailto:rob...@leblancnet.us] > Sent: Tuesday, October 13, 2015 10:21 PM > To: xusangdi 11976 (RD) > Cc: sw...@redhat.com; ceph-devel@vger.kernel.org

Re: dump_historic_ops, slow requests

2015-10-13 Thread Gregory Farnum
On Mon, Oct 12, 2015 at 2:22 PM, Deneau, Tom wrote: > I have a small ceph cluster (3 nodes, 5 osds each, journals all just > partitions > on the spinner disks) and I have noticed that when I hit it with a bunch of > rados bench clients all doing writes of large (40M objects)
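
For reference, a rough sketch of the kind of client load and inspection described above (pool name, object size, runtime and thread count are illustrative, not taken from the message):

    # drive writes of ~40M objects from a rados bench client (values are assumptions)
    rados bench -p testpool 120 write -b 41943040 -t 16 --no-cleanup
    # then inspect slow/recent operations on a given OSD via the admin socket
    ceph daemon osd.0 dump_historic_ops
    ceph daemon osd.0 dump_ops_in_flight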

RE: chooseleaf may cause some unnecessary pg migrations

2015-10-13 Thread Xusangdi
Please see inline. > -Original Message- > From: ceph-devel-ow...@vger.kernel.org > [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of > Sage Weil > Sent: Wednesday, October 14, 2015 12:45 AM > To: xusangdi 11976 (RD) > Cc: ceph-devel@vger.kernel.org > Subject: Re: chooseleaf may

Re: Re: The questions of data collection and cache tiering in Ceph

2015-10-13 Thread Gregory Farnum
On Mon, Oct 12, 2015 at 6:37 AM, 蔡毅 wrote: > Greg, > Thank you a lot for your timely reply. These are really helpful for me. I > also have some doubts. > In Ceph, besides monitoring pool, pg, object, it can also acquire other > statistics such as CPU, IOPS, BW. In

Re: MDS stuck in a crash loop

2015-10-13 Thread Gregory Farnum
On Sun, Oct 11, 2015 at 7:36 PM, Milosz Tanski wrote: > On Sun, Oct 11, 2015 at 6:44 PM, Milosz Tanski wrote: >> On Sun, Oct 11, 2015 at 6:01 PM, Milosz Tanski wrote: >>> On Sun, Oct 11, 2015 at 5:33 PM, Milosz Tanski

Re: [ceph-users] v9.1.0 Infernalis release candidate released

2015-10-13 Thread Goncalo Borges
Hi Sage... I've seen that the rh6 derivatives have been ruled out. This is a problem in our case since the OS choice in our systems is, somehow, imposed by CERN. The experiments software is certified for SL6 and the transition to SL7 will take some time. This is kind of a showstopper

Re: [ceph-users] Initial performance cluster SimpleMessenger vs AsyncMessenger results

2015-10-13 Thread Haomai Wang
Yep, as I said below, I'm considering adding auto scale up/down for worker threads with connection load balancing. It may save users from worrying about how many threads they need. :-( Actually, the thread count config value is a pain in the ceph osd io stack. On Tue, Oct 13, 2015 at 2:45 PM,
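
For context, the thread count being discussed is a fixed config value today; a minimal sketch of the relevant settings, assuming the hammer/infernalis-era option names (the value shown is an arbitrary example):

    # ceph.conf excerpt -- select the async messenger and pin its worker thread count
    ms_type = async
    ms_async_op_threads = 5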

RE: [ceph-users] Initial performance cluster SimpleMessenger vs AsyncMessenger results

2015-10-13 Thread Dałek, Piotr
> -Original Message- > From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel- > ow...@vger.kernel.org] On Behalf Of Somnath Roy > Sent: Tuesday, October 13, 2015 8:46 AM > > Thanks Haomai.. > Since Async messenger is always using a constant number of threads , there > could be a

RE: [ceph-users] Initial performance cluster SimpleMessenger vs AsyncMessenger results

2015-10-13 Thread Somnath Roy
Thanks Haomai.. Since the Async messenger always uses a constant number of threads, could there be a potential performance problem when scaling up client connections while keeping the number of OSDs constant? Maybe it's a good tradeoff.. Regards Somnath -Original Message- From: Haomai

Re: [ceph-users] Initial performance cluster SimpleMessenger vs AsyncMessenger results

2015-10-13 Thread Haomai Wang
On Tue, Oct 13, 2015 at 12:18 PM, Somnath Roy wrote: > Mark, > > Thanks for this data. This means probably simple messenger (not OSD core) is > not doing optimal job of handling memory. > > > > Haomai, > > I am not that familiar with Async messenger code base, do you have

chooseleaf may cause some unnecessary pg migrations

2015-10-13 Thread Xusangdi
Hi Sage, Recently when I was learning about the crush rules I noticed that the step chooseleaf may cause some unnecessary pg migrations when OSDs are outed. For example, for a cluster of 4 hosts with 2 OSDs each, after host1 (osd.2, osd.3) is down, the mapping differences would be like this:
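
A sketch of how such a mapping comparison can be reproduced offline with crushtool (rule id, replica count and device numbers are illustrative):

    # extract and decompile the current CRUSH map
    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt
    # the rule in question ends in a single chooseleaf step, e.g.:
    #   step take default
    #   step chooseleaf firstn 0 type host
    #   step emit
    # compare PG mappings before and after weighting out host1's OSDs (osd.2, osd.3)
    crushtool -i crush.bin --test --rule 0 --num-rep 2 --show-mappings > before.txt
    crushtool -i crush.bin --test --rule 0 --num-rep 2 --show-mappings --weight 2 0 --weight 3 0 > after.txt
    diff before.txt after.txt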

Re: enable rbd on ec pool ?

2015-10-13 Thread Loic Dachary
Hi Tomy, On 13/10/2015 06:13, Tomy Cheru wrote: > Is there a patch available to enable rbd over an EC pool ? You have to go through a cache tier instead of using it directly. See http://docs.ceph.com/docs/master/rados/operations/cache-tiering/ for more information. Cheers > > Currently its
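
For reference, a minimal sketch of the cache-tier setup being referred to (pool names, erasure-code profile and PG counts are illustrative; a real deployment would also tune hit_set and target_max_* settings):

    # create an EC base pool and a replicated cache pool
    ceph osd pool create ecpool 128 128 erasure default
    ceph osd pool create cachepool 128 128
    # put the cache pool in front of the EC pool in writeback mode
    ceph osd tier add ecpool cachepool
    ceph osd tier cache-mode cachepool writeback
    ceph osd tier set-overlay ecpool cachepool
    # rbd images are then created against the base pool and served through the overlay
    rbd create --pool ecpool --size 10240 testimage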

Re: enable rbd on ec pool ?

2015-10-13 Thread Loic Dachary
On 13/10/2015 09:59, Tomy Cheru wrote: > Hi Loic, > Thanks for your response, > > however am specifically looking for a patch to enable rbd on ec pool(am aware > of cache tier option). Ah :-) I'm not aware of such a patch, even in draft state. > Thanks, > tomy > >

Re: Kernel RBD Readahead

2015-10-13 Thread Olivier Bonvalet
On Tuesday, 25 August 2015 at 17:50 +0300, Ilya Dryomov wrote: > > Ok. I might try and create a 4.1 kernel with the blk-mq queue > depth/IO size + readahead + max_segments fixes in, as I think the > TCP_NODELAY bug will still be present in my old 3.14 kernel. > > I can build 4.2-rc8 + readahead
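
For reference, readahead on a mapped krbd device can be adjusted from userspace while experimenting with kernels (device name and values are illustrative):

    # query and raise the readahead window for a mapped RBD block device (512-byte sectors)
    blockdev --getra /dev/rbd0
    blockdev --setra 8192 /dev/rbd0
    # equivalent sysfs knob, in KiB
    echo 4096 > /sys/block/rbd0/queue/read_ahead_kb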

Re: Kernel RBD Readahead

2015-10-13 Thread Ilya Dryomov
On Tue, Oct 13, 2015 at 11:33 AM, Olivier Bonvalet wrote: > On Tuesday, 25 August 2015 at 17:50 +0300, Ilya Dryomov wrote: >> > Ok. I might try and create a 4.1 kernel with the blk-mq queue >> depth/IO size + readahead + max_segments fixes in, as I think the >> TCP_NODELAY bug

Re: enable rbd on ec pool ?

2015-10-13 Thread Sage Weil
On Tue, 13 Oct 2015, Loic Dachary wrote: > > > On 13/10/2015 09:59, Tomy Cheru wrote: > > Hi Loic, > > Thanks for your response, > > > > however am specifically looking for a patch to enable rbd on ec pool(am > > aware of cache tier option). > > Ah :-) I'm not aware of such a

Re: Fwd: monitor crashing

2015-10-13 Thread Sage Weil
On Tue, 13 Oct 2015, Luis Periquito wrote: > the store.db dir is 3.4GB big :( > > can I do it on my side? Nevermind, I was able to reproduce it from the bugzilla. I've pushed a branch wip-ecpool-hammer. Not sure which distro you're on, but packages will appear at gitbuilder.ceph.com in 30-45

Re: Fwd: monitor crashing

2015-10-13 Thread Loic Dachary
https://github.com/ceph/ceph/compare/hammer...wip-ecpool-hammer In order to bypass the crush verification, you could: ceph tell mon.* injectargs --crushtool /bin/true Cheers On 13/10/2015 15:41, Sage Weil wrote: > On Tue, 13 Oct 2015, Luis Periquito wrote: >> the store.db dir is 3.4GB big :(

Re: Fwd: monitor crashing

2015-10-13 Thread Sage Weil
On Tue, 13 Oct 2015, Loic Dachary wrote: > https://github.com/ceph/ceph/compare/hammer...wip-ecpool-hammer > > In order to bypass the crush verification, you could: > > ceph tell mon.* injectargs --crushtool /bin/true Ah, good trick! http://tracker.ceph.com/issues/13477 is the ticket,

Re: Fwd: monitor crashing

2015-10-13 Thread Sage Weil
On Tue, 13 Oct 2015, Luis Periquito wrote: > Any ideas? I'm growing desperate :( > > I've tried compiling from source, and including > https://github.com/ceph/ceph/pull/5276, but it still crashes on boot > of the ceph-mon If you can email a (link to a) tarball of your mon data directory I'd love

Re: Fwd: monitor crashing

2015-10-13 Thread Luis Periquito
the store.db dir is 3.4GB big :( can I do it on my side? On Tue, Oct 13, 2015 at 2:25 PM, Sage Weil wrote: > On Tue, 13 Oct 2015, Luis Periquito wrote: >> Any ideas? I'm growing desperate :( >> >> I've tried compiling from source, and including >>

Re: Kernel RBD Readahead

2015-10-13 Thread Olivier Bonvalet
On Tuesday, 13 October 2015 at 12:20 +0200, Ilya Dryomov wrote: > On Tue, Oct 13, 2015 at 11:33 AM, Olivier Bonvalet < > ceph.l...@daevel.fr> wrote: > > On Tuesday, 25 August 2015 at 17:50 +0300, Ilya Dryomov wrote: > > > > Ok. I might try and create a 4.1 kernel with the blk-mq queue > > > depth/IO

Fwd: monitor crashing

2015-10-13 Thread Luis Periquito
Any ideas? I'm growing desperate :( I've tried compiling from source, and including https://github.com/ceph/ceph/pull/5276, but it still crashes on boot of the ceph-mon -- Forwarded message -- From: Luis Periquito Date: Tue, Oct 13, 2015 at 12:26 PM Subject:

Re: throttles

2015-10-13 Thread Sage Weil
On Mon, 12 Oct 2015, Deneau, Tom wrote: > Looking at the perf counters on my osds, I see wait counts for the following > throttle related perf counters: (This is from trying to benchmark using > multiple rados bench client processes). > >throttle-filestore_bytes
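
The wait counts in question come from the OSD admin socket; a sketch of how to pull them (the OSD id is illustrative):

    # dump all perf counters for one OSD and look at a specific throttle
    ceph daemon osd.0 perf dump | python -m json.tool | grep -A 12 throttle-filestore_bytes
    # the same thing via the socket path
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump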

Re: [ceph-users] Initial performance cluster SimpleMessenger vs AsyncMessenger results

2015-10-13 Thread Mark Nelson
Hi Haomai, Great! I haven't had a chance to dig in and look at it with valgrind yet, but if I get a chance after I'm done with newstore fragment testing and somnath's writepath work I'll try to go back and dig in if you haven't had a chance yet. Mark On 10/12/2015 09:56 PM, Haomai Wang

Re: [ceph-users] Initial performance cluster SimpleMessenger vs AsyncMessenger results

2015-10-13 Thread Sage Weil
On Tue, 13 Oct 2015, Haomai Wang wrote: > resend > > On Tue, Oct 13, 2015 at 10:56 AM, Haomai Wang wrote: > > COOL > > > > Interesting that async messenger will consume more memory than simple, in my > > mind I always think async should use less memory. I will give a look

Re: Fwd: monitor crashing

2015-10-13 Thread Sage Weil
On Tue, 13 Oct 2015, Luis Periquito wrote: > Hi Sage, > > awesome help. > > Sorry for not telling before, but I'm running 2xMON in precise and > 1xMON in trusty. Looking at the status page > (http://ceph.com/gitbuilder.cgi) it seems the precise build is > failing... Can you have a look? I've

Re: chooseleaf may cause some unnecessary pg migrations

2015-10-13 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Are you testing with straw or straw2? - Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Tue, Oct 13, 2015 at 2:22 AM, Xusangdi wrote: > Hi Sage, > > Recently when I was learning about the
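
For reference, the bucket algorithm in use can be checked without decompiling the map (a sketch):

    # show the algorithm (straw vs straw2) for each CRUSH bucket, plus the tunables profile
    ceph osd crush dump -f json-pretty | grep '"alg"'
    ceph osd crush show-tunables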

Re: Fwd: monitor crashing

2015-10-13 Thread Luis Periquito
Hi Sage, awesome help. Sorry for not telling before, but I'm running 2xMON in precise and 1xMON in trusty. Looking at the status page (http://ceph.com/gitbuilder.cgi) it seems the precise build is failing... Can you have a look? thanks, On Tue, Oct 13, 2015 at 2:59 PM, Sage Weil

RE: enable rbd on ec pool ?

2015-10-13 Thread Tomy Cheru
Thanks Sage, Loic -Original Message- From: Sage Weil [mailto:s...@newdream.net] Sent: Tuesday, October 13, 2015 6:51 PM To: Loic Dachary Cc: Tomy Cheru; ceph-devel@vger.kernel.org Subject: Re: enable rbd on ec pool ? On Tue, 13 Oct 2015, Loic Dachary wrote: > > > On 13/10/2015 09:59,

Re: Initial performance cluster SimpleMessenger vs AsyncMessenger results

2015-10-13 Thread Mark Nelson
On 10/12/2015 11:12 PM, Gregory Farnum wrote: On Mon, Oct 12, 2015 at 9:50 AM, Mark Nelson wrote: Hi Guy, Given all of the recent data on how different memory allocator configurations improve SimpleMessenger performance (and the effect of memory allocators and transparent

RE: throttles

2015-10-13 Thread Deneau, Tom
> -Original Message- > From: Sage Weil [mailto:s...@newdream.net] > Sent: Tuesday, October 13, 2015 7:44 AM > To: Deneau, Tom > Cc: ceph-devel@vger.kernel.org > Subject: Re: throttles > > On Mon, 12 Oct 2015, Deneau, Tom wrote: > > Looking at the perf counters on my osds, I see wait

RE: throttles

2015-10-13 Thread Somnath Roy
BTW, you can completely turn off these throttles (other than the filestore throttle) by setting the value to 0. Thanks & Regards Somnath -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Deneau, Tom Sent: Tuesday, October
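
A minimal sketch of what that looks like, assuming hammer-era option names behind the counters listed earlier (per Somnath, 0 disables the throttle; verify before using in production):

    # ceph.conf excerpt -- disable the messenger and journal throttles
    osd_client_message_size_cap = 0
    osd_client_message_cap = 0
    journal_queue_max_bytes = 0
    journal_queue_max_ops = 0
    # the filestore queue throttle (filestore_queue_max_ops/bytes) is the one left in place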

RE: throttles

2015-10-13 Thread Deneau, Tom
I remember previously there were some options that could be reset thru the admin socket and some that required an osd restart. Do the ones below require an osd restart? -- Tom > -Original Message- > From: Somnath Roy [mailto:somnath@sandisk.com] > Sent: Tuesday, October 13, 2015

Re: chooseleaf may cause some unnecessary pg migrations

2015-10-13 Thread Sage Weil
Hi Sangdi, On Tue, 13 Oct 2015, Xusangdi wrote: > Hi Sage, > > Recently when I was learning about the crush rules I noticed that the step > chooseleaf may cause some unnecessary pg migrations when OSDs are outed. > For example, for a cluster of 4 hosts with 2 OSDs each, after host1(osd.2, >

Re: Fwd: monitor crashing

2015-10-13 Thread Luis Periquito
Thanks for all the help Sage. The cluster is now back to life with your awesome patch. On Tue, Oct 13, 2015 at 3:35 PM, Sage Weil wrote: > On Tue, 13 Oct 2015, Luis Periquito wrote: >> Hi Sage, >> >> awesome help. >> >> Sorry for not telling before, but I'm running 2xMON in

Re: [ceph-users] Potential OSD deadlock?

2015-10-13 Thread Sage Weil
On Mon, 12 Oct 2015, Robert LeBlanc wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA256 > > After a weekend, I'm ready to hit this from a different direction. > > I replicated the issue with Firefly so it doesn't seem an issue that > has been introduced or resolved in any nearby version.

Re: throttles

2015-10-13 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 In my experience with throttles, we had to restart the OSD; the admin socket would not apply the change to a running OSD. - Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Tue, Oct 13, 2015 at
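
For reference, the two ways of applying such a change (option name and OSD id are illustrative; per Robert's experience the throttles may only take effect after a restart):

    # attempt a runtime change via the admin socket or injectargs
    ceph daemon osd.0 config set osd_client_message_size_cap 0
    ceph tell osd.0 injectargs '--osd_client_message_size_cap 0'
    # if the change does not take effect, set it in ceph.conf and restart the OSD,
    # e.g. with upstart on Ubuntu of that era:
    restart ceph-osd id=0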

streamlining release notes

2015-10-13 Thread Sage Weil
The process of walking through commits for each release and writing up the notes is tedious and error-prone, and used to take 1-2 hours for each release. Since we've not been doing dev releases as frequently this cycle, the ~2000-odd commits for the infernalis rc promise to take even longer.

Re: streamlining release notes

2015-10-13 Thread Ken Dreyer
On Tue, Oct 13, 2015 at 11:34 AM, Sage Weil wrote: > What do you think? Great idea. This should also help when sharing information between the Hammer and Firefly release notes, since we can copy and paste in each PR. - Ken -- To unsubscribe from this list: send the line

pre-Infernalis ceph-disk bug

2015-10-13 Thread Jeremy Hanmer
I think I've found a bug in ceph-disk when running on Ubuntu 14.04 (and I believe 12.04 as well, but haven't confirmed) and using --dmcrypt. The problem is that when update_partition() is called, partprobe is used to re-read the partition table (as opposed to partx on all other distros) and it
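
For illustration, the difference being described (device path is illustrative; the actual fix would live in ceph-disk's update_partition()):

    # what ceph-disk does on Ubuntu in the reported case: re-read the whole partition table
    partprobe /dev/sdb
    # what it does on other distros: update the kernel's view of the partitions in place
    partx -u /dev/sdb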

v9.1.0 Infernalis release candidate released

2015-10-13 Thread Sage Weil
This is the first Infernalis release candidate. There have been some major changes since hammer, and the upgrade process is non-trivial. Please read carefully. Getting the release candidate: The v9.1.0 packages are pushed to the development release repositories:

Re: v9.1.0 Infernalis release candidate released

2015-10-13 Thread Joao Eduardo Luis
On 13/10/15 22:01, Sage Weil wrote: > * *RADOS*: > * The RADOS cache tier can now proxy write operations to the base > tier, allowing writes to be handled without forcing migration of > an object into the cache. > * The SHEC erasure coding support is no longer flagged as >

Re: pre-Infernalis ceph-disk bug

2015-10-13 Thread Loic Dachary
Hi, On 14/10/2015 00:02, Jeremy Hanmer wrote: > I think I've found a bug in ceph-disk when running on Ubuntu 14.04 > (and I believe 12.04 as well, but haven't confirmed) and using > --dmcrypt. > > The problem is that when update_partition() is called, partprobe is > used to re-read the partition

Re: pre-Infernalis ceph-disk bug

2015-10-13 Thread Jeremy Hanmer
Cool, I'll try to clean things up and submit a PR (probably one for wrote: > Hi, > > On 14/10/2015 00:02, Jeremy Hanmer wrote: >> I think I've found a bug in ceph-disk when running on Ubuntu 14.04 >> (and I believe 12.04 as well, but haven't confirmed) and using >> --dmcrypt. >> >> The problem is