Re: Is BlueFS an alternative of BlueStore?

2016-01-07 Thread Sage Weil
On Thu, 7 Jan 2016, Javen Wu wrote: > Hi Sage, > > Sorry to bother you. I am not sure if it is appropriate to send email to you > directly, but I cannot find any useful information to address my confusion > from the Internet. Hope you can help me. > > Occasionally, I heard that you are going to

Re: Long peering - throttle at FileStore::queue_transactions

2016-01-04 Thread Sage Weil
On Mon, 4 Jan 2016, Guang Yang wrote: > Hi Cephers, > Happy New Year! I have a question regarding the long PG peering. > > Over the last several days I have been looking into the *long peering* > problem when we start an OSD / OSD host, and what I observed was that the > two peering working threads

Re: Fwd: how io works when backfill

2015-12-29 Thread Sage Weil
, 1, 2] > > is my analysis right? Yep! sage > > 2015-12-29 1:30 GMT+08:00 Sage Weil <s...@newdream.net>: > > On Mon, 28 Dec 2015, Zhiqiang Wang wrote: > >> 2015-12-27 20:48 GMT+08:00 Dong Wu <archer.wud...@gmail.com>: > >> > Hi, > >>

Re: Fwd: how io works when backfill

2015-12-28 Thread Sage Weil
On Mon, 28 Dec 2015, Zhiqiang Wang wrote: > 2015-12-27 20:48 GMT+08:00 Dong Wu : > > Hi, > > When add osd or remove osd, ceph will backfill to rebalance data. > > eg: > > - pg1.0[1, 2, 3] > > - add an osd(eg. osd.7) > > - ceph start backfill, then pg1.0 osd set changes

Re: How to configure if there are tow network cards in Client

2015-12-28 Thread Sage Weil
On Fri, 25 Dec 2015, ?? wrote: > Hi all, > When we read the code, we haven't found the function that lets the client > bind a specific IP. In Ceph's configuration, we could only find the parameter > 'public network', but it seems to act on the OSD but not the client. > There is a scenario

Re: [ceph-users] why not add (offset,len) to pglog

2015-12-25 Thread Sage Weil
On Fri, 25 Dec 2015, Ning Yao wrote: > Hi, Dong Wu, > > 1. As I currently work on other things, this proposal has been abandoned for > a long time > 2. This is a complicated task, as we need to consider a lot of cases (not > just writeOp, but also truncate and delete) and also need to > consider the

Re: Fwd: Client still connect failed leader after that mon down

2015-12-21 Thread Sage Weil
ection mode. > >> > >> After we back ported non-blocking mode of async msg from higher ceph > >> version, we haven't encountered such issue yet. > >> > >> > >> Regards, > >> Zhi Zhang (David) > >> Contact: zhang.david2...@gmail.com >

Re: Client still connect failed leader after that mon down

2015-12-17 Thread Sage Weil
On Thu, 17 Dec 2015, Jaze Lee wrote: > Hello cephers: > In our test, there are three monitors. We find that running a ceph > command on a client is slow when the leader mon is down. Even after a long time, a > client running a ceph command is also slow the first time. > From strace, we find that the client first

Re: puzzling disapearance of /dev/sdc1

2015-12-17 Thread Sage Weil
On Thu, 17 Dec 2015, Loic Dachary wrote: > Hi Ilya, > > This is another puzzling behavior (the log of all commands is at > http://tracker.ceph.com/issues/14094#note-4). In a nutshell, after a > series of sgdisk -i commands to examine various devices including > /dev/sdc1, the /dev/sdc1 file

Re: cmake

2015-12-16 Thread Sage Weil
On Wed, 16 Dec 2015, John Spray wrote: > On Wed, Dec 16, 2015 at 5:33 PM, Sage Weil <sw...@redhat.com> wrote: > > The work to transition to cmake has stalled somewhat. I've tried to use > > it a few times but keep running into issues that make it unusable for me. >

cmake

2015-12-16 Thread Sage Weil
The work to transition to cmake has stalled somewhat. I've tried to use it a few times but keep running into issues that make it unusable for me. Not having make check is a big one, but I think the hackery required to get that going points to the underlying problem(s). It seems like the main

Re: Improving Data-At-Rest encryption in Ceph

2015-12-16 Thread Sage Weil
On Wed, 16 Dec 2015, Adam Kupczyk wrote: > On Tue, Dec 15, 2015 at 3:23 PM, Lars Marowsky-Bree wrote: > > On 2015-12-14T14:17:08, Radoslaw Zarzynski wrote: > > > > Hi all, > > > > great to see this revived. > > > > However, I have come to see some concerns

Re: Improving Data-At-Rest encryption in Ceph

2015-12-15 Thread Sage Weil
I agree with Lars's concerns: the main problems with the current dm-crypt approach are that there isn't any key management integration yet and the root volume and swap aren't encrypted. Those are easy to solve (and I'm hoping we'll be able to address them in time for Jewel). On the other hand,

Re: The max single write IOPS on single RBD

2015-12-11 Thread Sage Weil
On Fri, 11 Dec 2015, Zhi Zhang wrote: > Hi Guys, > > We have a small 4 nodes cluster. Here is the hardware configuration. > > 11 x 300GB SSD, 24 cores, 32GB memory per one node. > all the nodes connected within one 1Gb/s network. > > So we have one Monitor and 44 OSDs for testing kernel RBD

Re: [ceph-users] Client io blocked when removing snapshot

2015-12-10 Thread Sage Weil
On Thu, 10 Dec 2015, Jan Schermer wrote: > Removing snapshot means looking for every *potential* object the snapshot can > have, and this takes a very long time (6TB snapshot will consist of 1.5M > objects (in one replica) assuming the default 4MB object size). The same > applies to large thin

Re: Quering since when a PG is inactive

2015-12-09 Thread Sage Weil
Hi Wido! On Wed, 9 Dec 2015, Wido den Hollander wrote: > Hi, > > I'm working on a patch in PGMonitor.cc that sets the state to HEALTH_ERR > if >= X PGs are stuck non-active. > > This works for me now, but I would like to add a timer that a PG has to > be inactive for more than Y seconds. > >

Re: Filestore without journal

2015-12-09 Thread Sage Weil
On Wed, 9 Dec 2015, changtao381 wrote: > Hi Cephers, > > Why is a journal used with Filestore? From my understanding, it is used to > prevent partial writes. > > In my view, a journal isn't needed for Filestore in the scenarios of an EC > backend and the RGW object storage application. > > For EC

Re: problem about pgmeta object?

2015-12-09 Thread Sage Weil
On Wed, 9 Dec 2015, Ning Yao wrote: > The functions and transactions corresponding with the pgmeta object are listed > below: > touch() and remove() for pgmeta object creation and deletion > _omap_setkeys() and _omap_rmkeys() to update k/v data in omap for > pgmeta(pg_info, epoch,

Re: new OSD re-using old OSD id fails to boot

2015-12-09 Thread Sage Weil
On Wed, 9 Dec 2015, Wei-Chung Cheng wrote: > Hi Loic, > > I try to reproduce this problem on my CentOS7. > I can not do the same issue. > This is my version: > ceph version 10.0.0-928-g8eb0ed1 (8eb0ed1dcda9ee6180a06ee6a4415b112090c534) > Would you describe more detail? > > > Hi David, Sage, >

Re: new OSD re-using old OSD id fails to boot

2015-12-09 Thread Sage Weil
roken device?) > > > > That can replace the failed osd before it goes into the `out` state. > > Or we could always set the osd noout? > > > > In fact, I think these are different problems between David and Loic. > > (these two problems are the same import

Re: problem about pgmeta object?

2015-12-08 Thread Sage Weil
On Tue, 8 Dec 2015, Ning Yao wrote: > Umm, it seems that MemStore requires in memory meta object to keep the > attributes. So it is not a direct way to remove the pg_meta object > backend storage. Any suggestions? > I think we can just skip the pg_meta operation in FileStore api based > on the

Re: new OSD re-using old OSD id fails to boot

2015-12-08 Thread Sage Weil
On Tue, 8 Dec 2015, David Zafman wrote: > Remember I really think we want a disk replacement feature that would retain > the OSD id so that it avoids unnecessary data movement. See tracker > http://tracker.ceph.com/issues/13732 Yeah, I totally agree. We just need to form an opinion on how...

Re: OSD public / cluster network isolation using VRF:s

2015-12-07 Thread Sage Weil
On Mon, 7 Dec 2015, Martin Millnert wrote: > > Note that on a largish cluster the public/client traffic is all > > north-south, while the backend traffic is also mostly north-south to the > > top-of-rack and then east-west. I.e., within the rack, almost everything > > is north-south, and

[GIT PULL] Ceph update for -rc4

2015-12-04 Thread Sage Weil
Hi Linus, Please pull the following fix from git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git for-linus This addresses a refcounting bug that leads to a use-after-free. Thanks! sage Ilya Dryomov (1):

RE: queue_transaction interface + unique_ptr + performance

2015-12-03 Thread Sage Weil
1- I agree we should avoid shared_ptr whenever possible. 2- unique_ptr should not have any more overhead than a raw pointer--the compiler is enforcing the single-owner semantics. See for example https://msdn.microsoft.com/en-us/library/hh279676.aspx "It is exactly as efficient as a

Re: queue_transaction interface + unique_ptr + performance

2015-12-03 Thread Sage Weil
On Thu, 3 Dec 2015, Casey Bodley wrote: > > Well, yeah we are, it's just the actual Transaction structure which > > wouldn't be dynamic -- the buffers and many other fields would still > > hit the allocator. > > -Sam > > Sure. I was looking specifically at the tradeoffs between allocating > and

Re: OSD public / cluster network isolation using VRF:s

2015-12-03 Thread Sage Weil
On Thu, 3 Dec 2015, w...@42on.com wrote: > Why all the trouble and complexity? I personally always try to avoid the > two networks and run with one. Also in large L3 envs. > > I like the idea that one machine has one IP I have to monitor. > > I would rethink about what a cluster network really

ack vs commit

2015-12-03 Thread Sage Weil
From the beginning Ceph has had two kinds of acks for rados write/update operations: ack (indicating the operation is accepted, serialized, and staged in the osd's buffer cache) and commit (indicating the write is durable). The client, if it saw a failure on the OSD before getting the

Re: Fwd: Fwd: [newstore (again)] how disable double write WAL

2015-12-01 Thread Sage Weil
Hi David, On Tue, 1 Dec 2015, David Casier wrote: > Hi Sage, > With a standard disk (4 to 6 TB), and a small flash drive, it's easy > to create an ext4 FS with metadata on flash > > Example with sdg1 on flash and sdb on hdd : > > size_of() { > blockdev --getsize $1 > } > > mkdmsetup() { >

Re: Compiling for FreeBSD

2015-12-01 Thread Sage Weil
anding issue that I know of. It breaks > >> interoperability between FreeBSD and Linux Ceph nodes. I posted a > >> patch to fix it, but it doesn't look like it's been merged yet. > >> http://tracker.ceph.com/issues/6636 > > > > > > In the issues I fin

Re: CodingStyle on existing code

2015-12-01 Thread Sage Weil
On Tue, 1 Dec 2015, Wido den Hollander wrote: > > On 01-12-15 16:00, Gregory Farnum wrote: > > On Tue, Dec 1, 2015 at 5:47 AM, Loic Dachary wrote: > >> > >> > >> On 01/12/2015 14:10, Wido den Hollander wrote: > >>> Hi, > >>> > >>> While working on mon/PGMonitor.cc I see that

Re: Compiling for FreeBSD

2015-12-01 Thread Sage Weil
On Tue, 1 Dec 2015, Willem Jan Withagen wrote: > On 1-12-2015 14:30, Sage Weil wrote: > > On Tue, 1 Dec 2015, Willem Jan Withagen wrote: > > > On 30-11-2015 14:21, Sage Weil wrote: > > > > The problem with all of the porting code in general is that it is doomed &

Re: Compiling for FreeBSD

2015-12-01 Thread Sage Weil
On Tue, 1 Dec 2015, Willem Jan Withagen wrote: > On 30-11-2015 14:21, Sage Weil wrote: > > The problem with all of the porting code in general is that it is doomed > > to break later on if we don't have (at least) ongoing build tests. In > > order for a FreeBSD or OSX port t

Re: Compiling for FreeBSD

2015-11-30 Thread Sage Weil
The problem with all of the porting code in general is that it is doomed to break later on if we don't have (at least) ongoing build tests. In order for a FreeBSD or OSX port to continue working we need VMs that run either gitbuilder or a jenkins job or similar so that we can tell when it

Re: How to open clog debug

2015-11-30 Thread Sage Weil
On Mon, 30 Nov 2015, Wukongming wrote: > Hi, All > > Does anyone know how to open clog debug? It's usually something like monc->clog.debug() << "hi there\n"; sage -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to

Re: why my cluster become unavailable (min_size of pool)

2015-11-26 Thread Sage Weil
> hzwulibin > 2015-11-26 > > - > From: "hzwulibin"<hzwuli...@gmail.com> > Date: 2015-11-23 09:00 > To: Sage Weil, Haomai Wang > Cc: ceph-devel > Subject: Re: why my cluster become unavailable > > Hi, Sage > > Thanks!

Re: rgw/civetweb privileged port bind

2015-11-26 Thread Sage Weil
On Thu, 26 Nov 2015, Karol Mroz wrote: > Hello, > > As I understand it, with the release of infernalis, ceph > daemons are no longer being run as root. Thus, rgw/civetweb > is unable to bind to privileged ports: > > http://tracker.ceph.com/issues/13600 > > We encountered this problem as well in

Re: Cache Tiering Investigation and Potential Patch

2015-11-25 Thread Sage Weil
On Wed, 25 Nov 2015, Nick Fisk wrote: > Presentation from the performance meeting. > > I seem to be unable to post to Ceph-devel, so can someone please repost > there if useful. Copying ceph-devel. The problem is just that your email is HTML-formatted. If you send it in plaintext vger won't

RE: Cache Tiering Investigation and Potential Patch

2015-11-25 Thread Sage Weil
On Wed, 25 Nov 2015, Nick Fisk wrote: > Hi Sage > > > -Original Message- > > From: Sage Weil [mailto:s...@newdream.net] > > Sent: 25 November 2015 17:38 > > To: Nick Fisk <n...@fisk.me.uk> > > Cc: 'ceph-users' <ceph-us...@lists.ceph.com&g

RE: Cache Tiering Investigation and Potential Patch

2015-11-25 Thread Sage Weil
On Wed, 25 Nov 2015, Nick Fisk wrote: > > > Yes I think that should definitely be an improvement. I can't quite > > > get my head around how it will perform in instances where you miss 1 > > > hitset but all others are a hit. Like this: > > > > > > H H H M H H H H H H H H > > > > > > And recency

Re: cluster busy, cause heartbeat exceptional, cluster becomes more busy

2015-11-25 Thread Sage Weil
On Wed, 25 Nov 2015, Chenxiaowei wrote: > We met another serious problem as follows: > > During backfill, the rbd client sends ops to the cluster and slow requests come up, so > when the osd heartbeat comes in, the cct->get_heartbeat_map()->is_healthy() check > returns false, > so other osds will not

Re: Fwd: [newstore (again)] how disable double write WAL

2015-11-24 Thread Sage Weil
e work together ? > > Regards, > Sébastien > > > Begin forwarded message: > > > > From: David Casier <david.cas...@aevoo.fr> > > Date: 12 October 2015 20:52:26 UTC+2 > > To: Sage Weil <s...@newdream.net>, Ceph Development > > <c

Re: Multiple OSDs suicide because of client issues?

2015-11-23 Thread Sage Weil
On Mon, 23 Nov 2015, Robert LeBlanc wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA256 > > We set the debugging to 0/0, but are you talking about lines like: > >-12> 2015-11-20 20:59:47.138746 7f70067de700 -1 osd.177 103793 > heartbeat_check: no reply from osd.133 since back

Re: Multiple OSDs suicide because of client issues?

2015-11-23 Thread Sage Weil
On Mon, 23 Nov 2015, Robert LeBlanc wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA256 > > Is there a way through the admin socket or inject args that can tell > the OSD process to dump the in memory logs without crashing? Do you Yep, 'ceph daemon osd.NN log dump'. > have an idea of the

Re: Crc32 Challenge

2015-11-23 Thread Sage Weil
On Mon, 23 Nov 2015, Gregory Farnum wrote: > On Tue, Nov 17, 2015 at 10:51 AM, chris holcombe > wrote: > > Hello Ceph Devs, > > > > I'm almost certain at this point that I have discovered a major bug in > > ceph's crc32c mechanism.

Re: why my cluster become unavailable

2015-11-21 Thread Sage Weil
On Sun, 22 Nov 2015, Haomai Wang wrote: > On Thu, Nov 19, 2015 at 11:26 PM, Libin Wu wrote: > > Hi, cepher > > > > I have a cluster of 6 OSD server, every server has 8 OSDs. > > > > I out 4 OSDs on every server, then my client io is blocking. > > > > I reboot my client and

Re: OSD replacement feature

2015-11-20 Thread Sage Weil
On Fri, 20 Nov 2015, Wei-Chung Cheng wrote: > Hi Loic and cephers, > > Sure, I have time to help (comment) on this feature to replace a disk. > This is a useful feature to handle disk failure :p > > A simple procedure is described on http://tracker.ceph.com/issues/13732 : > 1. set noout flag - if the

Re: problem about pgmeta object?

2015-11-18 Thread Sage Weil
On Wed, 18 Nov 2015, Ning Yao wrote: > Hi, Sage > > pgmeta object is a meta-object (like __head___2) without > significant information. It is created in PG::_init() when > handling pg_create and split_coll and always exists there during the pg's > life cycle until the pg is removed in

Re: [CEPH] OSD daemons running with a large number of threads

2015-11-17 Thread Sage Weil
On Tue, 17 Nov 2015, ghislain.cheval...@orange.com wrote: > Hi, > > Context: > Firefly 0.80.9 > Ubuntu 14.04.1 > Almost a production platform in an openstack environment > 176 OSD (SAS and SSD), 2 crushmap-oriented storage classes , 8 servers in 2 > rooms, 3 monitors on openstack controllers >

Re: scrub randomization and load threshold

2015-11-16 Thread Sage Weil
On Mon, 16 Nov 2015, Dan van der Ster wrote: > Instead of keeping a 24hr loadavg, how about we allow scrubs whenever > the loadavg is decreasing (or below the threshold)? As long as the > 1min loadavg is less than the 15min loadavg, we should be ok to allow > new scrubs. If you agree I'll add the

Re: a problem about FileStore::_destroy_collection

2015-11-16 Thread Sage Weil
On Mon, 16 Nov 2015, yangruifeng.09...@h3c.com wrote: > an ENOTEMPTY error may happen when removing a pg in previous > versions, but the error is hidden in new versions? When did this change? sage > _destroy_collection may return 0 when get_index or prep_delete return < 0; > > is this

Re: scrub randomization and load threshold

2015-11-16 Thread Sage Weil
On Mon, 16 Nov 2015, Dan van der Ster wrote: > On Mon, Nov 16, 2015 at 4:58 PM, Dan van der Ster <d...@vanderster.com> wrote: > > On Mon, Nov 16, 2015 at 4:32 PM, Dan van der Ster <d...@vanderster.com> > > wrote: > >> On Mon, Nov 16, 2015 at 4:20 PM,

Re: Newly added monitor infinitely sync store

2015-11-16 Thread Sage Weil
te: > > Thanks Sage! I will definitely try those patches. > > > > For this one, I finally managed to bring the new monitor in by > > increasing the mon_sync_timeout from its default 60 to 6 to make > > sure the syncing does not restart and result in an infinite loop.

Re: Newly added monitor infinitely sync store

2015-11-13 Thread Sage Weil
On Fri, 13 Nov 2015, Guang Yang wrote: > I was wrong the previous analysis, it was not the iterator got reset, > the problem I can see now, is that during the syncing, a new round of > election kicked off and thus it needs to probe the newly added > monitor, however, since it hasn't been synced

Re: Newly added monitor infinitely sync store

2015-11-13 Thread Sage Weil
On Fri, 13 Nov 2015, Guang Yang wrote: > Thanks Sage! > > On Fri, Nov 13, 2015 at 4:15 PM, Sage Weil <s...@newdream.net> wrote: > > On Fri, 13 Nov 2015, Guang Yang wrote: > >> I was wrong the previous analysis, it was not the iterator got reset, > >> th

Re: data-at-rest compression

2015-11-13 Thread Sage Weil
On Fri, 13 Nov 2015, Alyona Kiselyova wrote: > Hi, > I was working on a pluggable compression interface in this work > (https://github.com/ceph/ceph/pull/6361). In Igor's pull request it was > suggested to reuse the common plugin infrastructure from the unmerged > wip-plugin branch. Now I'm working on

Re: Notes from a discussion a design to allow EC overwrites

2015-11-13 Thread Sage Weil
On Thu, 12 Nov 2015, Samuel Just wrote: > I was present for a discussion about allowing EC overwrites and thought it > would be good to summarize it for the list: > > Commit Protocol: > 1) client sends write to primary > 2) primary reads in partial stripes needed for partial stripe > overwrites

Re: scrub randomization and load threshold

2015-11-12 Thread Sage Weil
On Thu, 12 Nov 2015, Dan van der Ster wrote: > Hi, > > Firstly, we just had a look at the new > osd_scrub_interval_randomize_ratio option and found that it doesn't > really solve the deep scrubbing problem. Given the default options, > > osd_scrub_min_interval = 60*60*24 > osd_scrub_max_interval

Re: scrub randomization and load threshold

2015-11-12 Thread Sage Weil
On Thu, 12 Nov 2015, Dan van der Ster wrote: > On Thu, Nov 12, 2015 at 2:29 PM, Sage Weil <s...@newdream.net> wrote: > > On Thu, 12 Nov 2015, Dan van der Ster wrote: > >> Hi, > >> > >> Firstly, we just had a look at the new > >> osd_scrub_interv

RE: [CEPH][Crush][Tunables] issue when updating tunables

2015-11-12 Thread Sage Weil
can trigger some data movement the next time the crush map is adjusted, so we leave it off to be conservative. And we want the profile to match exactly what setting the profile sets. But it's confusing since it isn't 1:1 with what clients support. And if it is a fresh cluster you are better off with

merge commits reminder

2015-11-11 Thread Sage Weil
Just a reminder: we'd like to generate the release changelog from the merge commits. Whenever merging a pull request, please remember to: - edit the first line to be what will appear in the changelog. Prefix it with the subsystem and give it a short, meaningful description. - if the

Re: disabling buffer::raw crc cache

2015-11-11 Thread Sage Weil
On Wed, 11 Nov 2015, Ning Yao wrote: > 2015-11-11 21:13 GMT+08:00 Sage Weil <s...@newdream.net>: > > On Wed, 11 Nov 2015, Ning Yao wrote: > >> >>>the code logic would touch crc cache is bufferlist::crc32c and > >> >>>invalidate_crc. > >

Re: new scrub and repair discussion

2015-11-11 Thread Sage Weil
On Wed, 11 Nov 2015, kefu chai wrote: > currently, scrub and repair are pretty primitive. there are several > improvements which need to be made: > > - user should be able to initialize scrub of a PG or an object > - int scrub(pg_t, AioCompletion*) > - int scrub(const string& pool, const

Re: disabling buffer::raw crc cache

2015-11-11 Thread Sage Weil
On Wed, 11 Nov 2015, Ning Yao wrote: > >>>the code logic would touch crc cache is bufferlist::crc32c and > >>>invalidate_crc. > >>Also for pg_log::_write_log(), but seems it is always miss and use at > >>once, no need to cache crc actually? > > Oh, no, it will be hit in FileJournal writing >

Re: [CEPH][Crush][Tunables] issue when updating tunables

2015-11-10 Thread Sage Weil
On Tue, 10 Nov 2015, ghislain.cheval...@orange.com wrote: > Hi all, > > Context: > Firefly 0.80.9 > Ubuntu 14.04.1 > Almost a production platform in an openstack environment > 176 OSD (SAS and SSD), 2 crushmap-oriented storage classes , 8 servers in 2 > rooms, 3 monitors on openstack

RE: Cannot start osd due to permission of journal raw device

2015-11-09 Thread Sage Weil
ckage should install it in /lib/udev/rules.d or similar... sage > > -Original Message----- > > From: Sage Weil [mailto:s...@newdream.net] > > Sent: Friday, November 6, 2015 6:33 PM > > To: Chen, Xiaoxi > > Cc: ceph-devel@vger.kernel.org > > Subject: Re: C

Re: Help on ext4/xattr linux kernel stability issue / ceph xattr use?

2015-11-09 Thread Sage Weil
On Mon, 9 Nov 2015, Laurent GUERBY wrote: > Hi, > > Part of our ceph cluster is using ext4 and we recently hit major kernel > instability in the form of kernel lockups every few hours, issues > opened: > > http://tracker.ceph.com/issues/13662 > https://bugzilla.kernel.org/show_bug.cgi?id=107301

RE: Cannot start osd due to permission of journal raw device

2015-11-09 Thread Sage Weil
ample, https://github.com/ceph/ceph/blob/master/udev/95-ceph-osd.rules#L4-L5 sage > > > -Original Message----- > > From: Sage Weil [mailto:s...@newdream.net] > > Sent: Monday, November 9, 2015 9:18 PM > > To: Chen, Xiaoxi > > Cc: ceph-devel@vger.kernel.org >

Re: ceph encoding optimization

2015-11-09 Thread Sage Weil
On Mon, 9 Nov 2015, Gregory Farnum wrote: > On Wed, Nov 4, 2015 at 7:07 AM, Gregory Farnum wrote: > > The problem with this approach is that the encoded versions need to be > > platform-independent: they are shared over the wire and written to > > disks that might get

There is no next; only jewel

2015-11-09 Thread Sage Weil
Hey everyone, Just a reminder that now that infernalis is out and we're back to focusing on jewel, we should send all bug fixes to the 'jewel' branch (which functions the same way the old 'next' branch did). That is: bug fixes -> jewel, new features -> master. Every dev release (hopefully

Re: ceph encoding optimization

2015-11-08 Thread Sage Weil
On Sat, 7 Nov 2015, Haomai Wang wrote: > Hi sage, > > Could we know about your progress to refactor MSubOP and hobject_t, > pg_stat_t decode problem? > > We could work on this based on your work if any. See Piotr's last email on this thread... it has Josh's patch attached. sage > > > On

Re: Request for Comments: Weighted Round Robin OP Queue

2015-11-08 Thread Sage Weil
IGNED MESSAGE- > > Hash: SHA256 > > > > On Fri, Nov 6, 2015 at 3:12 AM, Sage Weil wrote: > >> On Thu, 5 Nov 2015, Robert LeBlanc wrote: > >>> -BEGIN PGP SIGNED MESSAGE- > >>> Hash: SHA256 > >>> > >>> Thanks Greg

Re: Request for Comments: Weighted Round Robin OP Queue

2015-11-06 Thread Sage Weil
On Thu, 5 Nov 2015, Robert LeBlanc wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA256 > > Thanks Gregory, > > People are most likely busy and haven't had time to digest this and I > may be expecting more excitement from it (I'm excited due to the > results and probably also that such a

RE: Specify omap path for filestore

2015-11-06 Thread Sage Weil
can give it both an ssd and hdd. sage >   > > Thus for New-Newstore, we just focus on data pool? > >   > > From: Sage Weil [mailto:s...@newdream.net] > Sent: Friday, November 6, 2015 1:11 AM > To: Ning Yao; Chen, Xiaoxi > Cc: Xue, Chendi; Samuel Just; ceph-devel@vger

v9.2.0 Infernalis released

2015-11-06 Thread Sage Weil
* aarch64: add optimized version of crc32c (Yazen Ghannam, Steve Capper) * auth: cache/reuse crypto lib key objects, optimize msg signature check (Sage Weil) * auth: reinit NSS after fork() (#11128 Yan, Zheng) * autotools: fix out of tree build (Krzysztof Kosinski) * autotools: impr

dm-clock queue

2015-11-04 Thread Sage Weil
Hi Gunna, Eric- I wanted to make sure you were connected, as we've talked to both of you independently about the new request queue in the OSD to support dm-clock and I want to make sure our efforts are coordinated. I think the first goal is probably to implement something that works and

Re: civetweb upstream/downstream divergence

2015-11-04 Thread Sage Weil
On Wed, 4 Nov 2015, Ken Dreyer wrote: > On Wed, Nov 4, 2015 at 1:25 PM, Ken Dreyer <kdre...@redhat.com> wrote: > > On Tue, Nov 3, 2015 at 4:22 AM, Sage Weil <sw...@redhat.com> wrote: > >> On Tue, 3 Nov 2015, Nathan Cutler wrote: > >>> IMHO the first step

Re: ceph encoding optimization

2015-11-04 Thread Sage Weil
On Wed, 4 Nov 2015, ??? wrote: > hi, all: > > I am focusing on the cpu usage of ceph now. I find that encoding and > decoding structs (such as pg_info_t, transaction and so on) exhausts too > much cpu resource. > > For now, we should encode every member variable one by one, which > calling

Re: civetweb upstream/downstream divergence

2015-11-03 Thread Sage Weil
On Tue, 3 Nov 2015, Nathan Cutler wrote: > IMHO the first step should be to get rid of the evil submodule. Arguably > the most direct path leading to this goal is to simply package up the > downstream civetweb (i.e. 1.6 plus all the downstream patches) for all > the supported distros. The

ordered writeback for rbd client cache

2015-11-02 Thread Sage Weil
Just found this: https://www.usenix.org/conference/fast13/technical-sessions/presentation/koller which should be helpful in constructing a persistent client-side writeback cache for RBD that preserves consistency. sage

Re: why we use two ObjectStore::Transaction in ReplicatedBackend::submit_transaction?

2015-11-01 Thread Sage Weil
On Sun, 1 Nov 2015, Sage Weil wrote: > On Sun, 1 Nov 2015, ??? wrote: > > Yes, I think so. > > keeping them separate and pass them to > > ObjectStore::queue_transactions() would increase the time on > > transaction encode process and exhaust more cpu. > > >

Re: why we use two ObjectStore::Transaction in ReplicatedBackend::submit_transaction?

2015-10-31 Thread Sage Weil
On Sat, 31 Oct 2015, ??? wrote: > hi, all: > > There are two ObjectStore::Transactions in > ReplicatedBackend::submit_transaction: one is op_t and the other > is local_t. Is there some critical logic we should consider? > > If we could reuse the variable op_t it would be great.

Re: Fix OP dequeuing order

2015-10-28 Thread Sage Weil
On Wed, 28 Oct 2015, Robert LeBlanc wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA256 > > I created a pull request to fix an op dequeuing order problem. I'm not > sure if I need to mention it here. > > https://github.com/ceph/ceph/pull/6417 Wow, good catch. Have you found that this

Re: values of "ceph daemon osd.x perf dump objecters " are zero

2015-10-28 Thread Sage Weil
Objecter is the client side, but you're dumping stats on the osd. The only time it is used as a client there is with cache tiering. sage On Wed, 28 Oct 2015, Libin Wu wrote: > Hi, all > > As I understand it, the command "ceph daemon osd.x perf dump objecters" should > output the perf data of

Re: pg scrub check problem

2015-10-28 Thread Sage Weil
On Wed, 28 Oct 2015, changtao381 wrote: > Hi, > > I'm testing the deep-scrub function of ceph. The test steps are below: > > 1) I put an object on ceph using the command: > rados put test.txt test.txt -p testpool > > The size of testpool is 3, so there are three replicas on three osds: > >

Re: [Newstore] FIO Read's from Cient causes OSD *** Caught signal (Aborted) **

2015-10-28 Thread Sage Weil
Hi Vish- This is not too surprising, but I am inclined to ignore it for now: I'm in the midst of a major rewrite anyway to use a raw block device instead of the file system. sage On Wed, 28 Oct 2015, Vish (Vishwanath) Maram-SSI wrote: > Hi, > > We are observing a crash of the OSD whenever we

Re: why package ceph-fuse needs packages ceph?

2015-10-26 Thread Sage Weil
On Mon, 26 Oct 2015, Jaze Lee wrote: > Hello, > I think ceph-fuse is just a client, so why does it need the ceph > package? I found that when I install ceph-fuse, it installs the ceph package, > but when I install ceph-common, it does not. > > Maybe ceph-fuse is not

v0.94.5 Hammer released

2015-10-26 Thread Sage Weil
cache read (#13559, Jason Dillaman) * osd: osd/ReplicatedPG: remove stray debug line (#13455, Sage Weil) * tests: qemu workunit refers to apt-mirror.front.sepia.ceph.com (#13420, Yuan Zhou) For the complete changelog, see http://docs.ceph.com/docs/master/_downloads/v0.94.5.txt Getting Ceph

RE: Lock contention in do_rule

2015-10-24 Thread Sage Weil
Message- > From: ceph-devel-ow...@vger.kernel.org > [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy > Sent: Friday, October 23, 2015 7:02 PM > To: Sage Weil > Cc: ceph-devel@vger.kernel.org > Subject: RE: Lock contention in do_rule > > Thanks for the clari

[GIT PULL] Ceph updates for -rc7

2015-10-23 Thread Sage Weil
Hi Linus, Please pull the following two patches from git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git for-linus One is a stopgap to prevent a stack blowout when users have a deep chain of image clones. (We'll rewrite this code to be non-recursive for the next window, but

Re: Really slow cache-evict

2015-10-23 Thread Sage Weil
On Fri, 23 Oct 2015, Robert LeBlanc wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA256 > > We are testing out cache tiering, but when evicting the cache on an > idle cluster it is extremely slow (10 objects per minutes). Looking at > top some of the OSD processes are busy, but the disks

Re: Lock contention in do_rule

2015-10-23 Thread Sage Weil
On Sat, 24 Oct 2015, Somnath Roy wrote: > Hi Sage, > We are seeing the following mapper_lock is heavily contended and commenting > out this lock is improving performance ~10 % (in the short circuit path). > This is called for every io from osd_is_valid_op_target(). > I looked into the code ,but,

Re: newstore direction

2015-10-22 Thread Sage Weil
On Wed, 21 Oct 2015, Ric Wheeler wrote: > You will have to trust me on this as the Red Hat person who spoke to pretty > much all of our key customers about local file systems and storage - customers > all have migrated over to using normal file systems under Oracle/DB2. > Typically, they use XFS

Re: MDS stuck in a crash loop

2015-10-22 Thread Sage Weil
On Thu, 22 Oct 2015, John Spray wrote: > On Thu, Oct 22, 2015 at 1:43 PM, Milosz Tanski wrote: > > On Wed, Oct 21, 2015 at 5:33 PM, John Spray wrote: > >> On Wed, Oct 21, 2015 at 10:33 PM, John Spray wrote: > John, I know you've got >

Re: newstore direction

2015-10-21 Thread Sage Weil
On Wed, 21 Oct 2015, Ric Wheeler wrote: > On 10/21/2015 04:22 AM, Orit Wasserman wrote: > > On Tue, 2015-10-20 at 14:31 -0400, Ric Wheeler wrote: > > > On 10/19/2015 03:49 PM, Sage Weil wrote: > > > > The current design is based on two simple ideas: > > >

librbd regression with Hammer v0.94.4 -- use caution!

2015-10-21 Thread Sage Weil
There is a regression in librbd in v0.94.4 that can cause VMs to crash. For now, please refrain from upgrading hypervisor nodes or other librbd users to v0.94.4. http://tracker.ceph.com/issues/13559 The problem does not affect server-side daemons (ceph-mon, ceph-osd, etc.). Jason's

Re: newstore direction

2015-10-21 Thread Sage Weil
On Tue, 20 Oct 2015, Ric Wheeler wrote: > > Now: > > 1 io to write a new file > >1-2 ios to sync the fs journal (commit the inode, alloc change) > >(I see 2 journal IOs on XFS and only 1 on ext4...) > > 1 io to commit the rocksdb journal (currently 3, but will drop to >

Re: what does ms_objecter do in OSD ?

2015-10-21 Thread Sage Weil
On Wed, 21 Oct 2015, Jaze Lee wrote: > Hello, > I find this messenger does not bind to any ip, so I do not know why we do > that. > Does anyone know what ms_objecter can do? Thanks a lot. It is the librados client that is used by the rados copy-from operation and for cache tiering (to

Re: newstore direction

2015-10-20 Thread Sage Weil
On Tue, 20 Oct 2015, Haomai Wang wrote: > On Tue, Oct 20, 2015 at 3:49 AM, Sage Weil <sw...@redhat.com> wrote: > > The current design is based on two simple ideas: > > > > 1) a key/value interface is better way to manage all of our internal > > metadata (object me

RE: newstore direction

2015-10-20 Thread Sage Weil
api and storing the data directly on a block/page interface makes more sense to me. sage > > -Original Message- > > From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel- > > ow...@vger.kernel.org] On Behalf Of James (Fei) Liu-SSI > > Sent: Tuesday, October 20,
