Re: ceph-mon keeps calling elections when the crushmap is changed in firefly

2015-09-24 Thread Sage Weil
On Thu, 24 Sep 2015, Alexander Yang wrote: > I use 'ceph osd crush dump | tail -n 20' get : > > "type": 1, > "min_size": 1, > "max_size": 10, > "steps": [ > { "op": "take", > "item": -62, >

RE: Very slow recovery/peering with latest master

2015-09-24 Thread Podoski, Igor
> -Original Message- > From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel- > ow...@vger.kernel.org] On Behalf Of Sage Weil > Sent: Thursday, September 24, 2015 3:32 AM > To: Handzik, Joe > Cc: Somnath Roy; Samuel Just; Samuel Just (sam.j...@inktank.com); ceph- > devel > Subject: Re:

aarch64 not using crc32?

2015-09-24 Thread Sage Weil
Hi Pankaj, In order to get the build going on the new trusty gitbuilder I had to make this change: https://github.com/ceph/ceph/commit/3123b2c5d3b72c9d43b10d8f296305d41b68b730 It was clearly a bug, but what worries me is that the fact that I hit it means the HWCAP_CRC32 is not
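For anyone wanting to probe their own hardware, here is a minimal sketch of the runtime detection in question, assuming a Linux/aarch64 toolchain that exposes HWCAP_CRC32 in <asm/hwcap.h> (which header actually defines it is exactly the wrinkle discussed in this thread, hence the guard):

    /* crc32_probe.c - probe the aarch64 CRC32 hardware capability at runtime.
     * Build on an arm64 box with: cc -o crc32_probe crc32_probe.c */
    #include <stdio.h>
    #include <sys/auxv.h>     /* getauxval(), AT_HWCAP */
    #include <asm/hwcap.h>    /* HWCAP_CRC32 on aarch64 (toolchain-dependent) */

    int main(void)
    {
    #ifdef HWCAP_CRC32
        unsigned long hwcap = getauxval(AT_HWCAP);
        printf("HWCAP_CRC32 %s\n",
               (hwcap & HWCAP_CRC32) ? "reported by kernel" : "not reported by kernel");
    #else
        printf("HWCAP_CRC32 not defined by this toolchain's headers\n");
    #endif
        return 0;
    }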

aarch64 test builds for trusty now available

2015-09-24 Thread Sage Weil
We now have a gitbuilder up and running building test packages for arm64 (aarch64). The hardware for these builds has been graciously provided by Cavium (thank you!). Trusty aarch64 users can now install packages with ceph-deploy install --dev BRANCH HOST and build results are visible at

[Manila] CephFS native driver

2015-09-24 Thread John Spray
Hi all, I've recently started work on a CephFS driver for Manila. The (early) code is here: https://github.com/openstack/manila/compare/master...jcsp:ceph It requires a special branch of ceph which is here: https://github.com/ceph/ceph/compare/master...jcsp:wip-manila This isn't done yet

full cluster/pool handling

2015-09-24 Thread Sage Weil
Xuan Liu recently pointed out that there is a problem with our handling for full clusters/pools: we don't allow any writes when full, including delete operations. While fixing a separate full issue I ended up making several fixes and cleanups in the full handling code in
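Not the actual OSD code, just a sketch of the policy under discussion (the struct and helper names here are invented for illustration): when the cluster or pool is full, reject operations that may grow usage but let deletes and other space-reclaiming operations through.

    /* Illustrative only: a hypothetical gate for ops arriving at a full
     * cluster/pool; the real checks live in the full-handling code this
     * thread refers to. */
    #include <stdbool.h>

    struct op_info {
        bool may_write;          /* op can create or grow data */
        bool is_delete;          /* op only removes objects/bytes */
        bool client_full_try;    /* client explicitly asked to try despite full */
    };

    static bool allow_when_full(const struct op_info *op)
    {
        if (op->is_delete)
            return true;               /* deletes free space: let them through */
        if (!op->may_write)
            return true;               /* reads are unaffected by fullness */
        return op->client_full_try;    /* otherwise only if the client opted in */
    }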

Re: Seek advice for using Ceph to provice NAS service

2015-09-24 Thread Jevon Qiao
Any comments or suggestions? Thanks, Jevon On 23/9/15 10:21, Jevon Qiao wrote: Hi Sage and other Ceph experts, greetings from Jevon. I'm from China and work at a company that uses Ceph as its backend storage. At present, I'm evaluating the following two options of using

Re: full cluster/pool handling

2015-09-24 Thread Sage Weil
On Thu, 24 Sep 2015, Yehuda Sadeh-Weinraub wrote: > On Thu, Sep 24, 2015 at 5:30 AM, Sage Weil wrote: > > Xuan Liu recently pointed out that there is a problem with our handling > > for full clusters/pools: we don't allow any writes when full, > > including delete operations. >

Re: full cluster/pool handling

2015-09-24 Thread Yehuda Sadeh-Weinraub
On Thu, Sep 24, 2015 at 5:30 AM, Sage Weil wrote: > Xuan Liu recently pointed out that there is a problem with our handling > for full clusters/pools: we don't allow any writes when full, > including delete operations. > > While fixing a separate full issue I ended up making

Re: full cluster/pool handling

2015-09-24 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 On Thu, Sep 24, 2015 at 6:30 AM, Sage Weil wrote: > Xuan Liu recently pointed out that there is a problem with our handling > for full clusters/pools: we don't allow any writes when full, > including delete operations. > > While fixing a separate

Re: full cluster/pool handling

2015-09-24 Thread Sage Weil
On Thu, 24 Sep 2015, Robert LeBlanc wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA256 > > > On Thu, Sep 24, 2015 at 6:30 AM, Sage Weil wrote: > > Xuan Liu recently pointed out that there is a problem with our handling > > for full clusters/pools: we don't allow any writes when full, >

Re: Adding Data-At-Rest compression support to Ceph

2015-09-24 Thread Igor Fedotov
Samuel, I completely agree about the need to have a blueprint before the implementation. But I think we should first settle which approach to use (when and how to perform the compression). I'll summarize the existing suggestions and their pros and cons shortly. Thus we'll be able to discuss them more

Re: full cluster/pool handling

2015-09-24 Thread Yehuda Sadeh-Weinraub
On Thu, Sep 24, 2015 at 7:50 AM, Sage Weil wrote: > On Thu, 24 Sep 2015, Yehuda Sadeh-Weinraub wrote: >> On Thu, Sep 24, 2015 at 5:30 AM, Sage Weil wrote: >> > Xuan Liu recently pointed out that there is a problem with our handling >> > for full

Re: Adding Data-At-Rest compression support to Ceph

2015-09-24 Thread Sage Weil
On Thu, 24 Sep 2015, Igor Fedotov wrote: > On 23.09.2015 21:03, Gregory Farnum wrote: > > On Wed, Sep 23, 2015 at 6:15 AM, Sage Weil wrote: > > > > > > > > > > The idea of making the primary responsible for object compression > > > > > really concerns me. It means for instance

Re: Adding Data-At-Rest compression support to Ceph

2015-09-24 Thread Igor Fedotov
As for me, that's the first time I've heard about it. But if we introduce pluggable compression back-ends, that would be pretty easy to try. Thanks, Igor. On 24.09.2015 18:41, HEWLETT, Paul (Paul) wrote: Out of curiosity have you considered the Google compression algos:

Re: Adding Data-At-Rest compression support to Ceph

2015-09-24 Thread Igor Fedotov
On 23.09.2015 21:03, Gregory Farnum wrote: On Wed, Sep 23, 2015 at 6:15 AM, Sage Weil wrote: The idea of making the primary responsible for object compression really concerns me. It means for instance that a single random access will likely require access to multiple

Re: Adding Data-At-Rest compression support to Ceph

2015-09-24 Thread HEWLETT, Paul (Paul)
Out of curiosity have you considered the Google compression algos: http://google-opensource.blogspot.co.uk/2015/09/introducing-brotli-new-compression.html Paul On 24/09/2015 16:34, "ceph-devel-ow...@vger.kernel.org on behalf of Sage Weil"

Re: Adding Data-At-Rest compression support to Ceph

2015-09-24 Thread Igor Fedotov
On 24.09.2015 18:34, Sage Weil wrote: I was also assuming each stripe unit would be independently compressed, but I didn't think about the efficiency. This approach implies that you'd want a relatively large stripe size (100s of KB or more). Hmm, a quick google search suggests the zlib
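To get a feel for the stripe-size trade-off, a small self-contained zlib experiment (this uses zlib's compress2/compressBound API; the buffer and unit sizes are arbitrary choices for illustration, not anything the EC pool actually uses):

    /* stripe_zlib.c - compress the same data at different "stripe unit" sizes
     * to see how the ratio degrades for small units.
     * Build with: cc stripe_zlib.c -lz */
    #include <stdio.h>
    #include <stdlib.h>
    #include <zlib.h>

    int main(void)
    {
        const size_t total = 4 * 1024 * 1024;        /* 4 MB of sample data */
        unsigned char *src = malloc(total);
        for (size_t i = 0; i < total; i++)           /* mildly compressible filler */
            src[i] = (unsigned char)((i / 64) & 0xff);

        size_t units[] = { 4096, 65536, 512 * 1024 };
        for (int u = 0; u < 3; u++) {
            size_t unit = units[u], out_total = 0;
            for (size_t off = 0; off < total; off += unit) {
                uLongf dlen = compressBound(unit);
                unsigned char *dst = malloc(dlen);
                compress2(dst, &dlen, src + off, unit, Z_DEFAULT_COMPRESSION);
                out_total += dlen;                   /* sum compressed sizes */
                free(dst);
            }
            printf("unit %7zu: %zu -> %zu bytes\n", unit, total, out_total);
        }
        free(src);
        return 0;
    }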

Re: aarch64 not using crc32?

2015-09-24 Thread Pankaj Garg
Hi Sage, I actually had the same issue just a couple of weeks back. The hardware actually has the CRC32 capability and we have tested it. The issue lies in the toolchain on the machine. There are several versions of .h files present with different HWCAP #defines. We need to fix it so that we are

Re: Adding Data-At-Rest compression support to Ceph

2015-09-24 Thread Igor Fedotov
On 24.09.2015 19:03, Sage Weil wrote: On Thu, 24 Sep 2015, Igor Fedotov wrote: There is probably no need for strict alignment with the stripe size. We can use the block sizes that clients provide on write dynamically. If some client writes in stripes, then we compress that block. If others use

RE: Very slow recovery/peering with latest master

2015-09-24 Thread Somnath Roy
Yeah, Igor, maybe. Meanwhile, I am able to get a gdb trace of the hang: (gdb) bt #0 0x7f6f6bf043bd in read () at ../sysdeps/unix/syscall-template.S:81 #1 0x7f6f6af3b066 in ?? () from /lib/x86_64-linux-gnu/libblkid.so.1 #2 0x7f6f6af43ae2 in ?? () from

Re: Adding Data-At-Rest compression support to Ceph

2015-09-24 Thread Igor Fedotov
On 24.09.2015 19:03, Sage Weil wrote: On Thu, 24 Sep 2015, Igor Fedotov wrote: Dynamic stripe sizes are possible but it's a significant change from the way the EC pool currently works. I would make that a separate project (as its useful in its own right) and not complicate the compression

Re: Adding Data-At-Rest compression support to Ceph

2015-09-24 Thread Gregory Farnum
On Thu, Sep 24, 2015 at 8:13 AM, Igor Fedotov wrote: > On 23.09.2015 21:03, Gregory Farnum wrote: >> >> On Wed, Sep 23, 2015 at 6:15 AM, Sage Weil wrote: > > > The idea of making the primary responsible for object compression > really

Re: Where does the data go ??

2015-09-24 Thread Gregory Farnum
On Tue, Sep 22, 2015 at 6:58 PM, Tomy Cheru wrote: > Noticed while benchmarking newstore with the rocksdb backend that the data is > missing in "dev/osd0/fragments" > >>64k sized objects produce content in the above mentioned dir, however missing >>with <=64k sized objects >

Re: Adding Data-At-Rest compression support to Ceph

2015-09-24 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 I'm probably missing something, but since we are talking about data at rest, can't we just have the OSD compress the object as it goes to disk? Instead of rbd\udata.1ba49c10d9b00c.6859__head_2AD1002B__11 it would be

Re: Adding Data-At-Rest compression support to Ceph

2015-09-24 Thread Samuel Just
The catch is that currently accessing 4k in the middle of a 4MB object does not require reading the whole object, so you'd need some kind of logical offset -> compressed offset mapping. -Sam On Thu, Sep 24, 2015 at 10:36 AM, Robert LeBlanc wrote: > -BEGIN PGP SIGNED
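A rough sketch of what such a logical-to-compressed offset map could look like (the structures and names here are invented, not Ceph code): keep a per-block record and look it up on read, so a 4k read only decompresses the one block that covers it.

    /* Illustration of the point above: with per-block compression you need a
     * map from logical offsets to compressed extents so a small read doesn't
     * force decompressing the whole object. Hypothetical structures only. */
    #include <stddef.h>
    #include <stdint.h>

    struct comp_extent {
        uint64_t logical_off;     /* start of the block in the logical object */
        uint32_t logical_len;     /* uncompressed length of the block */
        uint64_t physical_off;    /* where the compressed bytes live on disk */
        uint32_t physical_len;    /* compressed length */
    };

    /* Find the extent covering a logical offset; returns NULL if unmapped. */
    const struct comp_extent *
    find_extent(const struct comp_extent *map, size_t n, uint64_t logical_off)
    {
        for (size_t i = 0; i < n; i++) {
            if (logical_off >= map[i].logical_off &&
                logical_off < map[i].logical_off + map[i].logical_len)
                return &map[i];   /* caller decompresses just this block */
        }
        return NULL;
    }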

Re: full cluster/pool handling

2015-09-24 Thread Gregory Farnum
On Thu, Sep 24, 2015 at 8:04 AM, Sage Weil wrote: > On Thu, 24 Sep 2015, Robert LeBlanc wrote: >> -BEGIN PGP SIGNED MESSAGE- >> Hash: SHA256 >> >> >> On Thu, Sep 24, 2015 at 6:30 AM, Sage Weil wrote: >> > Xuan Liu recently pointed out that there is a problem with our

Re: Very slow recovery/peering with latest master

2015-09-24 Thread Gregory Farnum
On Wed, Sep 23, 2015 at 4:42 PM, Handzik, Joe wrote: > Ok. When configuring with ceph-disk, it does something nifty and actually > gives the OSD the uuid of the disk's partition as its fsid. I bootstrap off > that to get an argument to pass into the function you have

Re: full cluster/pool handling

2015-09-24 Thread John Spray
On Thu, Sep 24, 2015 at 7:26 PM, Gregory Farnum wrote: > On Thu, Sep 24, 2015 at 8:04 AM, Sage Weil wrote: >> On Thu, 24 Sep 2015, Robert LeBlanc wrote: >>> -BEGIN PGP SIGNED MESSAGE- >>> Hash: SHA256 >>> >>> >>> On Thu, Sep 24, 2015 at 6:30 AM, Sage

Re: full cluster/pool handling

2015-09-24 Thread Gregory Farnum
On Thu, Sep 24, 2015 at 1:36 PM, John Spray wrote: > On Thu, Sep 24, 2015 at 7:26 PM, Gregory Farnum wrote: >> That latter switch already exists, by the way, although I don't think >> it's actually enforced via cephx caps (it should be) — the Objecter >>

a patch to improve cephfs direct io performance

2015-09-24 Thread zhucaifeng
Hi all, when using cephfs, we find that cephfs direct io is very slow. For example, installing Windows 7 takes more than 1 hour on a virtual machine whose disk is a file in cephfs. The cause is that when doing direct io, both ceph_sync_direct_write and ceph_sync_read iterate iov elements one by
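A user-space analogy of the problem (not the kernel patch itself): issuing one syscall per iovec element versus handing the whole vector over in a single call, which is roughly the kind of batching the patch aims for.

    /* Illustration only: per-element reads vs. a single vectored read.
     * The kernel-side fix is different code, but the intuition is the same -
     * fewer round trips for the same iovec. Compile with: cc -c iov_demo.c */
    #include <sys/uio.h>
    #include <unistd.h>

    /* Slow pattern: one read() per iov element (what per-element direct I/O
     * behaves like). */
    ssize_t read_one_by_one(int fd, struct iovec *iov, int iovcnt)
    {
        ssize_t total = 0;
        for (int i = 0; i < iovcnt; i++) {
            ssize_t r = read(fd, iov[i].iov_base, iov[i].iov_len);
            if (r < 0)
                return r;
            total += r;
            if ((size_t)r < iov[i].iov_len)
                break;                    /* short read: stop early */
        }
        return total;
    }

    /* Fast pattern: submit the whole vector in one call. */
    ssize_t read_batched(int fd, struct iovec *iov, int iovcnt)
    {
        return readv(fd, iov, iovcnt);
    }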

how to specify the device an rbd is mapped to?

2015-09-24 Thread Jaze Lee
Hello, I know we can map an rbd image to a kernel block device with the 'rbd map' command, but we cannot specify which block device. For example, if I have an rbd named rbd_0, I want it mapped to /dev/rbd_0 when it is mapped. Is there some way to do that?