hammer mon failure

2016-01-05 Thread Samuel Just
http://tracker.ceph.com/issues/14236 New hammer mon failure in the nightlies (missing a map apparently?), can you take a look? -Sam

Re: OSD data file are OSD logs

2016-01-04 Thread Samuel Just
IIRC, you are running giant. I think that's the log rotate dangling fd bug (not fixed in giant since giant is EOL). It was fixed upstream in 8778ab3a1ced7fab07662248af0c773df759653d; the firefly backport is b8e3f6e190809febf80af66415862e7c7e415214. -Sam On Mon, Jan 4, 2016 at 3:37 PM, Guang Yang

Re: Long peering - throttle at FileStore::queue_transactions

2016-01-04 Thread Samuel Just
We need every OSDMap persisted before persisting later ones because we rely on there being no holes for a bunch of reasons. The deletion transactions are more interesting. They're not part of the boot process; these are deletions resulting from merging in a log from a peer which logically removed
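The no-holes rule can be illustrated with a toy store. This is a minimal sketch, not Ceph's code: `EpochStore` and its methods are hypothetical, and the only thing it models is that epoch N+1 may be persisted only once epoch N already is.

```python
class EpochStore:
    """Toy map store that refuses to leave holes between persisted epochs."""

    def __init__(self):
        self.maps = {}     # epoch -> encoded map (opaque in this sketch)
        self.newest = 0    # highest epoch persisted; epochs 1..newest are contiguous

    def persist(self, epoch, encoded):
        # Only the next epoch in sequence may be written: persisting a later
        # one first would leave a hole that readers assume cannot exist.
        if epoch != self.newest + 1:
            raise ValueError(f"refusing epoch {epoch}: expected {self.newest + 1}")
        self.maps[epoch] = encoded
        self.newest = epoch

    def get_range(self, first, last):
        # Direct indexing is safe because there are no holes in 1..newest.
        return [self.maps[e] for e in range(first, last + 1)]


store = EpochStore()
for e in (1, 2, 3):
    store.persist(e, f"map-{e}")
try:
    store.persist(5, "map-5")   # epoch 4 is missing, so this is rejected
except ValueError as err:
    print(err)                  # refusing epoch 5: expected 4
```

Because readers can rely on contiguity, `get_range` never has to check for missing entries.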

Re: Notes from a discussion a design to allow EC overwrites

2015-12-14 Thread Samuel Just
> subsequent release (i.e., I'm out of large blocks, but there's plenty of fragmented available space -- This can happen, but it's a pretty pathological case which becomes rarer and rarer as you scale out) > > Allen Samuels > Software Architect, Emerging Storage S

Re: ceph-osd mem usage growth

2015-12-10 Thread Samuel Just
The short answer is that you aren't supposed to store large things in xattrs at all. If you feel it's a "vulnerability", then we could add a config option to reject xattrs over a particular size. -Sam On Thu, Dec 10, 2015 at 8:24 AM, Igor Fedotov wrote: > Hi Cephers, > >
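A config-driven size check of the kind suggested could look like the following. This is a hypothetical sketch: `max_xattr_value_size`, `setxattr`, and `XattrSizeError` are illustrative names, not existing Ceph options or functions. The error code mirrors the `E2BIG` that the Linux setxattr(2) call can return for oversized values.

```python
import errno


class XattrSizeError(OSError):
    pass


def setxattr(attrs, name, value, max_xattr_value_size=64 * 1024):
    """Store an xattr in `attrs`, rejecting values above a configurable limit.

    max_xattr_value_size is a hypothetical config knob; 0 disables the check.
    """
    if max_xattr_value_size and len(value) > max_xattr_value_size:
        raise XattrSizeError(errno.E2BIG, f"xattr {name!r} is {len(value)} bytes")
    attrs[name] = value
```

Rejecting at the write keeps an oversized value from ever reaching the object, rather than paying for it later in memory.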

Re: queue_transaction interface + unique_ptr + performance

2015-12-03 Thread Samuel Just
Well, yeah, we are; it's just the actual Transaction structure that wouldn't be dynamic -- the buffers and many other fields would still hit the allocator. -Sam On Thu, Dec 3, 2015 at 11:29 AM, Casey Bodley wrote: > > - Original Message - >> Eh, Sage had a point that

Re: queue_transaction interface + unique_ptr + performance

2015-12-03 Thread Samuel Just
Eh, Sage had a point that Transaction has a bunch of little fields which would have to be filled in -- its move constructor would be less trivial than unique_ptr's. -Sam On Thu, Dec 3, 2015 at 11:12 AM, Adam C. Emerson wrote: > On 03/12/2015, Casey Bodley wrote: > [snip] >>

Re: queue_transaction interface + unique_ptr + performance

2015-12-03 Thread Samuel Just
From a simplicity point of view, I'd rather just move a Transaction object than use a unique_ptr. Maybe the overhead doesn't end up being significant? -Sam On Thu, Dec 3, 2015 at 1:23 PM, Casey Bodley wrote: > > - Original Message - >> On Thu, 3 Dec 2015, Casey

Re: queue_transaction interface + unique_ptr + performance

2015-12-03 Thread Samuel Just
. If there is an actual user which requires the transaction to stick around afterwards *and* cannot afford to copy it, then we can talk about a design which accomplishes that. -Sam On Thu, Dec 3, 2015 at 9:50 AM, Samuel Just <sj...@redhat.com> wrote: > As far as I know, there are no current us

Re: queue_transaction interface + unique_ptr + performance

2015-12-03 Thread Samuel Just
and if it is using Transaction object afterwards. > > Thanks & Regards > Somnath > -Original Message- > From: Adam C. Emerson [mailto:aemer...@redhat.com] > Sent: Thursday, December 03, 2015 9:25 AM > To: Somnath Roy > Cc: Sage Weil; Samuel Just (sam.j...@inktank.

Re: queue_transaction interface + unique_ptr + performance

2015-12-03 Thread Samuel Just
e are getting.. > > Thanks & Regards > Somnath > > -Original Message- > From: Adam C. Emerson [mailto:aemer...@redhat.com] > Sent: Thursday, December 03, 2015 9:17 AM > To: Somnath Roy > Cc: Casey Bodley; Sage Weil; Samuel Just (sam.j...@inktank.com); > ceph-deve

Re: OSD memory usage during startup - advice needed

2015-11-19 Thread Samuel Just
Actually, looks like Xiaoxi beat you to it for infernalis! 42a3ab95ec459042e92198fb061c8393146bd1b4 -Sam On Thu, Nov 19, 2015 at 12:30 PM, Marcin Gibuła wrote: >> Judging from debug output, the problem is in journal recovery, when it >> tries to delete object with huge

Re: Notes from a discussion a design to allow EC overwrites

2015-11-13 Thread Samuel Just
carefully about the implications of that. -Sam On Fri, Nov 13, 2015 at 5:35 AM, Sage Weil <sw...@redhat.com> wrote: > On Thu, 12 Nov 2015, Samuel Just wrote: >> I was present for a discussion about allowing EC overwrites and thought it >> would be good to summarize it for the list:

Re: Request for Comments: Weighted Round Robin OP Queue

2015-11-09 Thread Samuel Just
> Robert LeBlanc > > On Mon, Nov 9, 2015 at 9:49 AM, Samuel Just <sj...@redhat.com> wrote: >> It's partially in the unified queue. The primary's background work >> for kicking off a reco

Re: Request for Comments: Weighted Round Robin OP Queue

2015-11-09 Thread Samuel Just
It's partially in the unified queue. The primary's background work for kicking off a recovery operation is not in the unified queue, but the messages to the replicas (pushes, pulls, backfill scans) as well as their replies are in the unified queue as normal messages. I've got a branch moving the

Re: Request for Comments: Weighted Round Robin OP Queue

2015-11-09 Thread Samuel Just
On Mon, Nov 9, 2015 at 12:31 PM, Robert LeBlanc <rob...@leblancnet.us> wrote: > On Mon, Nov 9, 2015 at 12:47 PM, Samuel Just wrote: >> What I really want from PrioritizedQueue (and from the dmclock/mclock >>

Re: Request for Comments: Weighted Round Robin OP Queue

2015-11-09 Thread Samuel Just
> Robert LeBlanc

Re: Request for Comments: Weighted Round Robin OP Queue

2015-11-09 Thread Samuel Just
On Mon, Nov 9, 2015 at 1:30 PM, Robert LeBlanc <rob...@leblancnet.us> wrote: > On Mon, Nov 9, 2015 at 1:49 PM, Samuel Just wrote: >> We basically don't want a single thread to see all of the operations -- it >

Re: Question about how rebuild works.

2015-11-06 Thread Samuel Just
; object ops we'd have to compare against both of those values. But I'm > not sure how many sites that's likely to be, what other kinds of paths > rely on last_backfill_started, or if I'm missing something. > -Greg > > On Fri, Nov 6, 2015 at 8:30 AM, Samuel Just <sj...@redhat.com>

Re: Request for Comments: Weighted Round Robin OP Queue

2015-11-04 Thread Samuel Just
I didn't look into it closely, but that almost certainly means that your queue is reordering primary->replica replicated write messages. -Sam On Wed, Nov 4, 2015 at 8:54 AM, Robert LeBlanc wrote: > I've got some rough
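The diagnosis points at an invariant any replacement queue has to keep: messages that must stay ordered (such as replicated writes from a primary for a PG) cannot be reordered relative to each other, even while different classes of work are interleaved by weight. Below is a minimal weighted round robin sketch that keeps strict FIFO order within each class; the class names, weights, and the `WRRQueue` API are illustrative assumptions, not the design proposed in the RFC.

```python
from collections import deque


class WRRQueue:
    """Weighted round robin across classes, strict FIFO within each class."""

    def __init__(self, weights):
        # weights: class name -> items served per round; must be positive,
        # otherwise a non-empty class could never drain.
        if any(w <= 0 for w in weights.values()):
            raise ValueError("weights must be positive")
        self.weights = dict(weights)
        self.queues = {name: deque() for name in self.weights}

    def enqueue(self, cls, item):
        self.queues[cls].append(item)

    def drain(self):
        """Yield every queued item; each round serves up to `weight` per class."""
        while any(self.queues.values()):
            for cls, weight in self.weights.items():
                q = self.queues[cls]
                for _ in range(min(weight, len(q))):
                    yield q.popleft()   # popleft preserves per-class order
```

Ordering is only guaranteed within a class here: if replicated writes for one PG were spread across classes, a queue like this could still reorder them, which is exactly the kind of bug described above.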

Re: Specify omap path for filestore

2015-11-02 Thread Samuel Just
er.kernel.org [mailto:ceph-devel- >> ow...@vger.kernel.org] On Behalf Of Xue, Chendi >> Sent: Friday, October 30, 2015 10:05 AM >> To: 'Samuel Just' >> Cc: ceph-devel@vger.kernel.org >> Subject: Specify omap path for filestore >> >> Hi, Sam >> >> Last week I i

Re: Specify omap path for filestore

2015-11-02 Thread Samuel Just
The osd keeps some metadata in the leveldb store, so you don't want to delete it. I'm still not clear on why pg data being there causes trouble. -Sam On Mon, Nov 2, 2015 at 10:26 AM, Samuel Just <sj...@redhat.com> wrote: > Maybe, I figured that the call to DBObjectMap::sync in FileSt

Re: Re: Re: Re: another peering stuck caused by net problem.

2015-11-02 Thread Samuel Just
.09...@h3c.com <yangruifeng.09...@h3c.com> wrote: > ok. > > thanks > Ruifeng Yang > > -Original Message- > From: Samuel Just [mailto:sj...@redhat.com] > Sent: November 3, 2015 9:03 > To: yangruifeng 09209 (RD) > Cc: chenxiaowei 11245 (RD); Sage Weil (sw...@redhat.com) >

Re: Re: Re: Re: Re: another peering stuck caused by net problem.

2015-11-02 Thread Samuel Just
> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Samuel Just > Sent: November 3, 2015 9:12 > To: yangruifeng 09209 (RD) > Cc: chenxiaowei 11245 (RD); Sage Weil (sw...@redhat.com); > ceph-devel@vger.kernel.org > Subject: Re: Re: Re: Re: another peering stuck caused by net problem. > > The prob

Re: Re: Re: Re: Re: Re: another peering stuck caused by net problem.

2015-11-02 Thread Samuel Just
are related to > peering is correctly received by peers? > > thanks > Ruifeng Yang. > > -----Original Message- > From: Samuel Just [mailto:sj...@redhat.com] > Sent: November 3, 2015 9:28 > To: yangruifeng 09209 (RD) > Cc: chenxiaowei 11245 (RD); Sage Weil (sw...@redhat.com); > ceph-d

Re: PG: all requests stuck when acting set < min_size

2015-10-28 Thread Samuel Just
to not have committed those updates. -Sam On Tue, Oct 27, 2015 at 5:19 PM, Brad Hubbard <bhubb...@redhat.com> wrote: > - Original Message - >> From: "Samuel Just" <sj...@redhat.com> >> To: "Gregory Farnum" <gfar...@redhat.com> >> Cc: "

Re: Async reads, sync writes, op thread model discussion

2015-10-28 Thread Samuel Just
the overhead and the performance improvement for an async backend. -Sam On Fri, Aug 28, 2015 at 2:25 PM, Samuel Just <sj...@redhat.com> wrote: > Oh, yeah, we'll definitely test for correctness for async reads on > filestore, I'm just worried about validating the performance > assumpti

Re: PG: all requests stuck when acting set < min_size

2015-10-27 Thread Samuel Just
Actually, we really can't accept reads below min_size and still keep the properties we want it to have. Suppose we have 3 osds (a, b, and c) which see writes 0...1000. min_size is 2. If a and b are then powered off only having committed up to 900 (therefore the client could only have seen up to

Re: newstore direction

2015-10-22 Thread Samuel Just
Since the changes which moved the pg log and the pg info into the pg object space, I think it's now the case that any transaction submitted to the objectstore updates a disjoint range of objects determined by the sequencer. It might be easier to exploit that parallelism if we control allocation
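If each sequencer's transactions touch a disjoint set of objects, transactions from different sequencers cannot conflict and could be applied in parallel. A small checker makes that property concrete. This is a sketch with hypothetical names and object labels; real transactions also touch collections and omap keys, and the follow-up message names the snapmapper as the current exception.

```python
from itertools import combinations


def find_sequencer_overlap(touched):
    """touched: sequencer id -> set of objects its transactions modify.

    Returns (seq_a, seq_b, shared_objects) for the first overlapping pair,
    or None if the per-sequencer object sets are pairwise disjoint, in which
    case transactions from different sequencers cannot conflict.
    """
    for (a, objs_a), (b, objs_b) in combinations(touched.items(), 2):
        shared = objs_a & objs_b
        if shared:
            return a, b, shared
    return None
```

A shared object such as a global snapmapper shows up immediately as an overlap, which is why splitting it per PG would matter for this kind of parallelism.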

Re: newstore direction

2015-10-22 Thread Samuel Just
Ah, except for the snapmapper. We can split the snapmapper in the same way, though, as long as we are careful with the name. -Sam On Thu, Oct 22, 2015 at 4:42 PM, Samuel Just <sj...@redhat.com> wrote: > Since the changes which moved the pg log and the pg info into the pg > object sp

Re: Ceph erasure coding

2015-10-22 Thread Samuel Just
Not on purpose... out of curiosity, why do you want to do that? -Sam On Thu, Oct 22, 2015 at 9:44 AM, Kjetil Babington wrote: > Hi, > > I have a question about the capabilities of the erasure coding API in > Ceph. Let's say that I have 10 data disks and 4 parity disks, is it

Re: rados and the next hammer release v0.94.4

2015-10-06 Thread Samuel Just
The failure labeled 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f -i 5' is a bit odd; the osd crashed due to 0> 2015-10-03 05:21:10.776554 7fce619f0700 -1 common/HeartbeatMap.cc: In function 'bool

Re: Data-at-rest compression at EC pools blueprint

2015-09-30 Thread Samuel Just
Seems like a reasonable start. A quick back-of-the-envelope calculation suggests 2KB of (logical_offset, compressed_offset) pairs per 1MB of data with 8B/pair and 4KB chunks, which is probably OK to stuff into an xattr. You should restructure the blueprint to make it independent of EC.
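The estimate can be checked directly, assuming one (logical_offset, compressed_offset) pair per 4KiB logical chunk and 8 bytes per pair (for example two 32-bit offsets). With those assumptions, 1MiB of data needs 256 pairs, or 2048 bytes of map.

```python
KiB = 1024
MiB = 1024 * KiB


def offset_map_bytes(data_bytes, chunk_bytes=4 * KiB, pair_bytes=8):
    """Size of a (logical_offset, compressed_offset) table covering data_bytes."""
    pairs = -(-data_bytes // chunk_bytes)   # ceiling division: one pair per chunk
    return pairs * pair_bytes


print(offset_map_bytes(1 * MiB))   # 2048
print(offset_map_bytes(4 * MiB))   # 8192, for a full 4MB object
```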

Re: Adding Data-At-Rest compression support to Ceph

2015-09-24 Thread Samuel Just
The catch is that currently accessing 4k in the middle of a 4MB object does not require reading the whole object, so you'd need some kind of logical offset -> compressed offset mapping. -Sam On Thu, Sep 24, 2015 at 10:36 AM, Robert LeBlanc wrote:
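One way to picture such a mapping, assuming each fixed-size logical chunk is compressed independently and the map records where each compressed chunk starts: a small read in the middle of the object then only needs the compressed bytes of the chunks it overlaps. A sketch with hypothetical names; it says nothing about how the blueprint would actually store the map.

```python
def compressed_range(offset, length, chunk_size, comp_starts, comp_end):
    """Map a logical read [offset, offset + length) onto the compressed stream.

    comp_starts[i] is where logical chunk i begins in the compressed data and
    comp_end is the total compressed size. Chunks are assumed to be compressed
    independently, so only the overlapping ones need to be read.
    Returns (first_chunk, last_chunk, compressed_start, compressed_end).
    """
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    start = comp_starts[first]
    end = comp_starts[last + 1] if last + 1 < len(comp_starts) else comp_end
    return first, last, start, end


# Four 4k logical chunks that compressed to 1000, 1500, 800 and 2000 bytes.
starts = [0, 1000, 2500, 3300]
print(compressed_range(4096 + 100, 4096, 4096, starts, 5300))  # (1, 2, 1000, 3300)
```

The caller would read the returned compressed span, decompress chunks `first..last`, and slice out the requested bytes.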

Re: Adding Data-At-Rest compression support to Ceph

2015-09-23 Thread Samuel Just
I think before moving forward with any sort of implementation, the design would need to be pretty much completely mapped out -- particularly how the offset mapping will be handled and stored. The right thing to do would be to produce a blueprint and submit it to the list. I also would vastly

Re: perf counters from a performance discrepancy

2015-09-23 Thread Samuel Just
Just to eliminate a variable, can you reproduce this on master, first with the simple messenger, and then with the async messenger? (make sure to switch the messengers on all daemons and clients, just put it in the [global] section on all configs). -Sam On Wed, Sep 23, 2015 at 1:05 PM, Deneau,
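For reference, the messenger is chosen with the `ms_type` option, and putting it in `[global]` means every daemon and client reading that config agrees. This is a sketch only: in releases of this era the async messenger was experimental and may need extra settings, so check the documentation for the version under test.

```ini
[global]
# first run: simple messenger
ms_type = simple
# second run: switch to the async messenger everywhere, then restart
# all daemons and clients
# ms_type = async
```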

async messenger peering hang

2015-09-23 Thread Samuel Just
I'm seeing some rados runs stuck on peering messages not getting sent by the async messenger: http://tracker.ceph.com/issues/13213. Can you take a look? -Sam

Re: Very slow recovery/peering with latest master

2015-09-23 Thread Samuel Just
Wow. Why would that take so long? I think you are correct that it's only used for metadata; we could just add a config value to disable it. -Sam On Wed, Sep 23, 2015 at 3:48 PM, Somnath Roy wrote: > Sam/Sage, > I debugged it down and found out that the >

Re: [ceph-users] Potential OSD deadlock?

2015-09-22 Thread Samuel Just
I looked at the logs; it looks like there was a 53-second delay between when osd.17 started sending the osd_repop message and when osd.13 started reading it, which is pretty weird. Sage, didn't we once see a kernel issue which caused some messages to be mysteriously delayed for many 10s of

Re: Async reads, sync writes, op thread model discussion

2015-08-28 Thread Samuel Just
- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Samuel Just Sent: Thursday, August 27, 2015 4:22 PM To: Milosz Tanski Cc: Matt Benjamin; Haomai Wang; Yehuda Sadeh-Weinraub; Sage Weil; ceph-devel Subject: Re: Async reads, sync writes, op thread

Re: Async reads, sync writes, op thread model discussion

2015-08-27 Thread Samuel Just
- Original Message - From: Milosz Tanski mil...@adfin.com To: Haomai Wang haomaiw...@gmail.com Cc: Yehuda Sadeh-Weinraub ysade...@redhat.com, Samuel Just sj...@redhat.com, Sage Weil s

repushed next branch for now -- don't use it, use infernalis instead

2015-08-27 Thread Samuel Just
I repushed next to make the upgrade suites happy for tonight. -Sam

Re: OSD::do_mon_report - do we need holding osd_lock

2015-08-18 Thread Samuel Just
Probably! A quick glance at do_mon_report doesn't seem to turn up anything I'd expect to be really hard to refactor. You do need to break out the required data (into OSDService, I'd think) so that the lock is not necessary. -Sam On Mon, Aug 17, 2015 at 6:10 PM, GuangYang yguan...@outlook.com

Async reads, sync writes, op thread model discussion

2015-08-11 Thread Samuel Just
Currently, there are some deficiencies in how the OSD maps ops onto threads: 1. Reads are always synchronous, limiting the queue depth seen from the device and therefore the possible parallelism. 2. Writes are always asynchronous, forcing even very fast writes to be completed in a separate
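The queue-depth point in item 1 can be illustrated with a toy simulation: if a thread blocks on each read, at most one read per thread is outstanding, whereas submitting reads asynchronously lets many be in flight at once. This uses Python's asyncio with a sleep standing in for device latency; it models the idea only, not the OSD's thread pools or any Ceph API.

```python
import asyncio


class FakeDevice:
    """Counts how many reads are outstanding at the same time."""

    def __init__(self, latency=0.01):
        self.latency = latency
        self.in_flight = 0
        self.max_in_flight = 0

    async def read(self, offset):
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        await asyncio.sleep(self.latency)   # stand-in for device service time
        self.in_flight -= 1
        return offset


async def serial_reads(dev, offsets):
    # One read at a time, like a thread blocking on each synchronous read.
    return [await dev.read(o) for o in offsets]


async def concurrent_reads(dev, offsets):
    # All reads submitted together, so the device sees a deeper queue.
    return await asyncio.gather(*(dev.read(o) for o in offsets))


def max_queue_depth(runner, n=8):
    dev = FakeDevice()
    asyncio.run(runner(dev, range(n)))
    return dev.max_in_flight


print(max_queue_depth(serial_reads), max_queue_depth(concurrent_reads))  # 1 8
```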

Re: Async reads, sync writes, op thread model discussion

2015-08-11 Thread Samuel Just
2015, Samuel Just wrote: Currently, there are some deficiencies in how the OSD maps ops onto threads: 1. Reads are always synchronous, limiting the queue depth seen from the device and therefore the possible parallelism. 2. Writes are always asynchronous, forcing even very fast writes

Re: c++11 merged in master

2015-08-07 Thread Samuel Just
lambdas and autos for everyone! -Sam On Fri, Aug 7, 2015 at 10:52 AM, Sage Weil sw...@redhat.com wrote: Yay! Thanks, Casey! This initially breaks the precise and el6 builds on master. I've fixed the precise gitbuilders with sudo apt-get install python-software-properties echo | sudo

Re: rados and the next hammer release v0.94.3

2015-08-04 Thread Samuel Just
Looks good! -Sam On Tue, Aug 4, 2015 at 3:23 AM, Loic Dachary l...@dachary.org wrote: Hi Sam, The backport for http://tracker.ceph.com/issues/12465 has been merged (https://github.com/ceph/ceph/pull/5405). Since http://tracker.ceph.com/issues/12410 and http://tracker.ceph.com/issues/12536

Re: [ceph-users] Is it safe to increase pg number in a production environment

2015-08-04 Thread Samuel Just
It will cause a large amount of data movement. Each new pg after the split will relocate. It might be ok if you do it slowly. Experiment on a test cluster. -Sam On Mon, Aug 3, 2015 at 12:57 AM, 乔建峰 scaleq...@gmail.com wrote: Hi Cephers, This is a greeting from Jevon. Currently, I'm
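Doing it slowly usually means raising `pg_num`, then `pgp_num`, in small steps and letting the cluster settle in between. The helper below only prints a step plan using the `ceph osd pool set <pool> pg_num|pgp_num <n>` commands; the step size is an arbitrary assumption to tune on a test cluster, and nothing here waits for the cluster to become healthy between steps.

```python
def pg_num_plan(pool, current, target, step):
    """Return commands that grow pool's pg_num from current to target in steps."""
    if step <= 0 or target < current:
        raise ValueError("need step > 0 and target >= current")
    cmds = []
    n = current
    while n < target:
        n = min(n + step, target)
        # In releases of this era, raising pg_num splits PGs in place and
        # raising pgp_num lets the new PGs relocate, which moves the data.
        cmds.append(f"ceph osd pool set {pool} pg_num {n}")
        cmds.append(f"ceph osd pool set {pool} pgp_num {n}")
    return cmds


for cmd in pg_num_plan("rbd", 256, 512, 64):
    print(cmd)
```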

Re: Per Object Scrub

2015-08-04 Thread Samuel Just
That's the idea. The blueprint has some information about the design. I do want to add per-object scrub as part of the scrub/repair improvements. -Sam On Tue, Aug 4, 2015 at 9:34 AM, Adam Manzanares adam.manzana...@hgst.com wrote: I noticed that there is a blueprint for osd: scrub and repair

Re: radosgw - stuck ops

2015-08-04 Thread Samuel Just
What if instead the request had a marker that would cause the OSD to reply with EAGAIN if the pg is unhealthy? -Sam On Tue, Aug 4, 2015 at 8:41 AM, Yehuda Sadeh-Weinraub ysade...@redhat.com wrote: On Mon, Aug 3, 2015 at 6:53 PM, GuangYang yguan...@outlook.com wrote: Hi Yehuda, Recently with
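The proposal can be sketched as an opt-in flag on the request: when it is set and the target PG is unhealthy, the OSD answers immediately with EAGAIN instead of queueing the op. `FLAG_FAIL_IF_UNHEALTHY`, `Request`, and `handle_op` are hypothetical names for illustration, not existing librados or OSD APIs.

```python
import errno
from dataclasses import dataclass

FLAG_FAIL_IF_UNHEALTHY = 0x1   # hypothetical request flag


@dataclass
class Request:
    pg: str
    flags: int = 0


def handle_op(req, pg_healthy, queue):
    """Return 0 if the op was queued, or -EAGAIN if the caller asked to fail fast."""
    if req.flags & FLAG_FAIL_IF_UNHEALTHY and not pg_healthy(req.pg):
        return -errno.EAGAIN   # caller (e.g. rgw) can retry later or take another path
    queue.append(req)          # default behaviour: wait until the PG can serve it
    return 0
```

Making the behaviour opt-in leaves existing clients unchanged while letting a caller like rgw avoid tying up its threads on a stuck PG.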

C++11 and librados C++

2015-08-03 Thread Samuel Just
It seems like it's about time for us to make the jump to C++11. This is probably going to have an impact on users of the librados C++ bindings. It seems like such users would have to recompile code using the librados C++ libraries after upgrading the librados library version. Is that

Re: [ceph-users] Inconsistent PGs that ceph pg repair does not fix

2015-08-03 Thread Samuel Just
Hrm, that's certainly supposed to work. Can you make a bug? Be sure to note what version you are running (output of ceph-osd -v). -Sam On Mon, Aug 3, 2015 at 12:34 PM, Andras Pataki apat...@simonsfoundation.org wrote: Summary: I am having problems with inconsistent PG's that the 'ceph pg

Re: hammer mon regression I think, probably shouldn't do a hammer release until it's sorted

2015-07-29 Thread Samuel Just
Could also be a next/master regression; I don't actually see anything recent in the mon in hammer. -Sam - Original Message - From: Samuel Just sj...@redhat.com To: Loic Dachary ldach...@redhat.com, Sage Weil sw...@redhat.com, Kefu Chai kc...@redhat.com, Joao Luis jl...@redhat.com Cc: Ceph

hammer mon regression I think, probably shouldn't do a hammer release until it's sorted

2015-07-29 Thread Samuel Just
http://tracker.ceph.com/issues/12410 Not sure what's going on yet, but the hammer mon doesn't seem to be sending the full map to the master osd properly. -Sam

Re: Strange issue with CRUSH

2015-07-09 Thread Samuel Just
I've seen some odd teuthology results in the last week or two which seem to be anomalous rjenkins hash behavior as well. http://tracker.ceph.com/issues/12231 -Sam - Original Message - From: Sage Weil sw...@redhat.com To: Gleb Borisov borisov.g...@gmail.com Cc: ceph-devel@vger.kernel.org Sent:

Re: ceph-objectstore-tool import failures

2015-07-07 Thread Samuel Just
the recovery temp objects at least aren't valuable to keep around. -Sam - Original Message - From: Sage Weil sw...@redhat.com To: Samuel Just sj...@redhat.com Cc: David Zafman dzaf...@redhat.com, ceph-devel@vger.kernel.org Sent: Tuesday, July 7, 2015 10:22:32 AM Subject: Re: ceph

Re: ceph-objectstore-tool import failures

2015-07-07 Thread Samuel Just
Sounds reasonable to me. -Sam - Original Message - From: David Zafman dzaf...@redhat.com To: Samuel Just sj...@redhat.com, Sage Weil sw...@redhat.com Cc: ceph-devel@vger.kernel.org Sent: Tuesday, July 7, 2015 1:56:36 PM Subject: Re: ceph-objectstore-tool import failures I'm going

librados clone_range

2015-06-23 Thread Samuel Just
ObjectWriteOperations currently allow you to perform a clone_range from another object with the same object locator. Years ago, rgw used this as part of multipart upload. Today, the implementation complicates the OSD considerably, and it doesn't appear to have any users left. Is there anyone

Re: firefly v0.80.10 QE validation status 6/15/2015

2015-06-15 Thread Samuel Just
I do not consider 11914 to be a blocker. -Sam - Original Message - From: Yuri Weinstein ywein...@redhat.com To: Ceph Development ceph-devel@vger.kernel.org Cc: Loic Dachary ldach...@redhat.com, Xinxin Shu xinxin@intel.com Sent: Monday, June 15, 2015 9:37:20 AM Subject: firefly

Rados multi-object transaction use cases

2015-06-12 Thread Samuel Just
In the Infernalis CDS, we had a session on RADOS multi-object transactions. I'd like to continue the discussion at the upcoming Jewel CDS. I thought I'd prime the discussion by asking: if librados supported multi-object read and write transactions, what would you use them for? Some idea of

Re: Teuthology error 'exception on parallel execution'

2015-06-04 Thread Samuel Just
If you look farther up in the logs, you'll probably find an earlier failure of something under the parallel task. -Sam - Original Message - From: Zhiqiang Wang zhiqiang.w...@intel.com To: ceph-devel@vger.kernel.org Sent: Wednesday, June 3, 2015 12:38:04 AM Subject: Teuthology error

Re: 'Racing read got wrong version' during proxy write testing

2015-06-04 Thread Samuel Just
log entries). -Sam - Original Message - From: Samuel Just sj...@redhat.com To: Zhiqiang Wang zhiqiang.w...@intel.com Cc: David Zafman dzaf...@redhat.com, Sage Weil sw...@redhat.com, ceph-devel@vger.kernel.org Sent: Thursday, June 4, 2015 11:12:14 AM Subject: Re: 'Racing read got wrong

Re: Adding chance_test_backfill_full thrasher in the ec tasks

2015-06-01 Thread Samuel Just
Yep, that should be included in the ec thrashing tests. -Sam - Original Message - From: shylesh kumar shylesh.mo...@gmail.com To: Samuel Just sj...@redhat.com, l...@dachary.org Cc: ceph-devel@vger.kernel.org Sent: Saturday, May 30, 2015 8:27:20 AM Subject: Adding chance_test_backfill_full

Discuss: New default recovery config settings

2015-05-29 Thread Samuel Just
Many people have reported that they need to lower the osd recovery config options to minimize the impact of recovery on client I/O. We are talking about changing the defaults as follows: osd_max_backfills to 1 (from 10), osd_recovery_max_active to 3 (from 15), osd_recovery_op_priority to 1 (from
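Until the defaults change, the proposed values can be set by hand in `ceph.conf`. A sketch of the values listed above; the previous default for `osd_recovery_op_priority` is cut off in this excerpt, so only the new values are shown.

```ini
[osd]
# proposed defaults for gentler recovery
# previous default: 10
osd_max_backfills = 1
# previous default: 15
osd_recovery_max_active = 3
osd_recovery_op_priority = 1
```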

Re: Release pg.lock in build scrub map for primary

2015-03-31 Thread Samuel Just
We stopped dropping the lock when chunky scrub was added. I would prefer to not drop the lock. -Sam - Original Message - From: Sage Weil sw...@redhat.com To: Jianpeng Ma jianpeng...@intel.com Cc: sj...@redhat.com, ceph-devel@vger.kernel.org Sent: Tuesday, March 31, 2015 6:44:39 AM

Re: Bounding OSD memory requirements during peering/recovery

2015-03-13 Thread Samuel Just
I've opened a bug for this (http://tracker.ceph.com/issues/0); I bet it's related to the new logic for allowing recovery below min_size. Exactly what sha1 was running on the osds during this time period? -Sam On 03/13/2015 08:36 AM, Dan van der Ster wrote: On Fri, Mar 13, 2015 at 1:52

Re: Bounding OSD memory requirements during peering/recovery

2015-03-13 Thread Samuel Just
Also, are you certain that all were running the same version? -Sam On 03/13/2015 01:42 PM, Samuel Just wrote: I've opened a bug for this (http://tracker.ceph.com/issues/0), I bet it's related to the new logic for allowing recovery below min_size. Exactly what sha1 was running on the osds

OSD Sessions at CDS

2015-03-05 Thread Samuel Just
There were several OSD sessions at CDS on Wednesday; I'll try to summarize some of the key points. ===EC Pool Overwrite Support=== https://wiki.ceph.com/Planning/Blueprints/Infernalis/osd%3A_erasure_coding_pool_overwrite_support One takeaway from the

Re: rados and the next firefly release

2015-03-02 Thread Samuel Just
Yeah, we don't need to worry about that one, good for QE to start. -Sam - Original Message - From: Loic Dachary l...@dachary.org To: Samuel Just sj...@redhat.com Cc: Ceph Development ceph-devel@vger.kernel.org Sent: Saturday, February 28, 2015 7:42:53 AM Subject: rados and the next

Re: Recovery question

2015-02-23 Thread Samuel Just
Message - From: Somnath Roy somnath@sandisk.com To: Sage Weil sw...@redhat.com Cc: Samuel Just (sam.j...@inktank.com) sam.j...@inktank.com, Ceph Development ceph-devel@vger.kernel.org Sent: Monday, February 23, 2015 2:00:04 PM Subject: RE: Recovery question Sage, I don't understand how osd

Re: dumpling integration branch for v0.67.12 ready for QE

2015-02-18 Thread Samuel Just
Yup, 10694 is a known bug in dumpling which we probably don't want to fix. The rados tests look ok to me I think. -Sam On Wed, Feb 18, 2015 at 9:38 AM, Yuri Weinstein ywein...@redhat.com wrote: Hi all I updated all issues in http://tracker.ceph.com/issues/10560 Based on what is listed

Re: dumpling integration branch for v0.67.12 ready for QE

2015-02-12 Thread Samuel Just
Yeah, the rados run had too much environmental noise to be useful. -Sam On Thu, Feb 12, 2015 at 2:06 PM, Yuri Weinstein ywein...@redhat.com wrote: I linked all issues related to this release testing to the ticket http://tracker.ceph.com/issues/10560 After the team leads make a call of those,

Re: Standardization of perf counters comments

2015-02-11 Thread Samuel Just
I agree with the non-optional part. -Sam On Wed, Feb 11, 2015 at 10:02 AM, Gregory Farnum g...@gregs42.com wrote: On Wed, Feb 11, 2015 at 9:33 AM, Alyona Kiseleva akisely...@mirantis.com wrote: Hi, I would like to propose something. There are a lot of perf counters in different places in

Re: K/V interface buffer transaction

2015-02-11 Thread Samuel Just
Well, the transaction is atomic, so if the key is set twice, you can certainly ignore the first one. -Sam On Wed, Feb 11, 2015 at 2:20 PM, Somnath Roy somnath@sandisk.com wrote: Hi, My code had a bug during printing log. I was using map to store the attribute keys in sorted order and that
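Because the transaction is applied atomically, only the final value of each key is observable, so repeated sets of the same key can collapse to the last one. A minimal sketch, assuming a transaction made only of point `set`/`rm` operations on individual keys; range operations such as removing a key prefix would invalidate it. The op tuple format is hypothetical.

```python
def coalesce(ops):
    """ops: list of ("set", key, value) or ("rm", key, None), in submission order.

    Returns an equivalent op list with at most one op per key, keeping the
    last op for each key at that op's original position.
    """
    last = {}
    for i, (_, key, _) in enumerate(ops):
        last[key] = i
    return [op for i, op in enumerate(ops) if last[op[1]] == i]
```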

Re: K/V interface buffer transaction

2015-02-10 Thread Samuel Just
In general, you do need to preserve the order. You are free to determine when re-ordering is safe though. -Sam On Tue, Feb 10, 2015 at 8:59 AM, Somnath Roy somnath@sandisk.com wrote: Hi Sage/Haomai, Our K/V store works best if the keys of the objects within a transaction are sorted. We
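A backend that prefers sorted keys can sort only when it can show the result is unchanged. One conservative test, sketched with hypothetical op names: if every op is a point operation and no key appears twice, the ops commute and sorting is safe; otherwise keep submission order.

```python
POINT_OPS = {"set", "rm"}


def maybe_sort(ops):
    """Return ops sorted by key if reordering is provably safe, else unchanged."""
    keys = [key for _, key, _ in ops]
    commute = all(kind in POINT_OPS for kind, _, _ in ops) and len(set(keys)) == len(keys)
    return sorted(ops, key=lambda op: op[1]) if commute else list(ops)
```

Falling back to the original order whenever the check fails keeps the default behaviour correct, at the cost of missing some reorderings that would in fact have been safe.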

Re: Confused about SnapMapper::get_prefix

2015-02-09 Thread Samuel Just
:00 Samuel Just sam.j...@inktank.com: Should probably be cast to long unsigned with lX conversion specifier? -Sam On Tue, Feb 3, 2015 at 9:21 AM, Samuel Just sam.j...@inktank.com wrote: It looks like snapid_t is a uint64_t, but snprintf expects an unsigned there. -Sam On Tue, Feb

Re: scrub scheduling

2015-02-09 Thread Samuel Just
I think most of the noise so far has been about the fairly narrow problem that all of the osds tend to like to scrub at the same time. I think we could get a lot of the way to fixing that by simply randomizing the per-pg scrub schedule time when the pg registers itself for the next scrub. -Sam On
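Randomizing the registration time spreads out PGs that were created or last scrubbed together, so they no longer all come due at once. A minimal sketch; `randomize_ratio` is a hypothetical parameter, not a claim about any option Ceph ships.

```python
import random


def next_scrub_time(now, interval, randomize_ratio=0.5, rng=random):
    """Pick the next scrub time in [now + interval, now + interval * (1 + ratio)]."""
    return now + interval * (1.0 + rng.uniform(0.0, randomize_ratio))


rng = random.Random(42)
times = [next_scrub_time(0, 86400, rng=rng) for _ in range(5)]
```

Only ever pushing the deadline later, never earlier, keeps every PG scrubbed at least once per `interval`.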

Re: Bucket index op - lock contention hang op threads

2015-02-05 Thread Samuel Just
Sure, a81f3e6e61abfc7eca7743a83bf4af810705b449. The intention was actually to allow writes on degraded objects for replicated pools (to avoid a 4k rbd write blocking on a 4mb recovery), but I think it solves this issue as well. -Sam On Thu, Feb 5, 2015 at 1:39 PM, GuangYang yguan...@outlook.com

Re: rados and the next dumpling release

2015-02-05 Thread Samuel Just
Yeah, that's probably good. -Sam On Thu, Feb 5, 2015 at 2:36 PM, Loic Dachary l...@dachary.org wrote: Hi Sam, The rados teuthology suite for the next dumpling release as found in https://github.com/ceph/ceph/commits/dumpling-backports came back green

Re: Bucket index op - lock contention hang op threads

2015-02-05 Thread Samuel Just
Recent changes already merged for hammer should prevent blocking the thread on the ondisk_read_lock by expanding the ObjectContext::rwstate lists mostly as you suggested. -Sam On Thu, Feb 5, 2015 at 1:36 AM, GuangYang yguan...@outlook.com wrote: Hi ceph-devel, In our ceph cluster (with rgw), we

Re: Confused about SnapMapper::get_prefix

2015-02-03 Thread Samuel Just
It looks like snapid_t is a uint64_t, but snprintf expects an unsigned there. -Sam On Tue, Feb 3, 2015 at 9:15 AM, Gregory Farnum g...@gregs42.com wrote: On Tue, Feb 3, 2015 at 4:12 AM, Ding Dinghua dingdinghu...@gmail.com wrote: Hi all: I don't understand why SnapMapper::get_prefix

Re: Confused about SnapMapper::get_prefix

2015-02-03 Thread Samuel Just
Should probably be cast to long unsigned with lX conversion specifier? -Sam On Tue, Feb 3, 2015 at 9:21 AM, Samuel Just sam.j...@inktank.com wrote: It looks like snapid_t is a uint64_t, but snprintf expects an unsigned there. -Sam On Tue, Feb 3, 2015 at 9:15 AM, Gregory Farnum g...@gregs42

Re: Supporting partial writes for EC backend

2015-01-30 Thread Samuel Just
Basically, it's an architectural choice: https://github.com/ceph/ceph/blob/master/doc/dev/osd_internals/erasure_coding/pgbackend.rst#client-writes If we wanted to support partial writes, we'd probably want to introduce a second EC pool type with different tradeoffs. Note, you can get slow
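To see why overwrites are awkward for EC, consider a single-parity code where parity is the XOR of the data chunks. This is a simplification: Ceph's EC plugins use codes such as Reed-Solomon with several parity chunks. Even so, overwriting part of one chunk requires reading the old data to recompute parity, and the parity chunk must be rewritten together with the data.

```python
def xor_bytes(*chunks):
    """XOR equal-length byte strings together."""
    out = bytearray(len(chunks[0]))
    for c in chunks:
        for i, b in enumerate(c):
            out[i] ^= b
    return bytes(out)


def encode(data_chunks):
    """Return the data chunks plus one XOR parity chunk (all equal length)."""
    return list(data_chunks) + [xor_bytes(*data_chunks)]


def partial_overwrite(stripe, idx, offset, new_bytes):
    """Read-modify-write of data chunk idx; returns the new stripe.

    parity' = parity ^ old_data ^ new_data, so the old chunk contents must be
    read before the new parity can be computed.
    """
    old = stripe[idx]
    new = old[:offset] + new_bytes + old[offset + len(new_bytes):]
    out = list(stripe)
    out[idx] = new
    out[-1] = xor_bytes(stripe[-1], old, new)
    return out
```

If a crash lands between updating the data chunk and updating parity, the stripe becomes inconsistent; avoiding that window is part of what makes overwrite support an architectural question rather than a small change.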

Re: dumpling integration : valgrind leak

2015-01-30 Thread Samuel Just
Yeah, that's probably new. -Sam On Fri, Jan 30, 2015 at 7:00 AM, Loic Dachary l...@dachary.org wrote: Hi Sam, I stumbled upon what seems to be a leak at http://pulpito.ceph.com/loic-2015-01-29_15:41:06-rados-dumpling-backports---basic-multi/730305/ and the valgrind xml file is at

Re: Supporting partial writes for EC backend

2015-01-30 Thread Samuel Just
easier. On Jan 30, 2015, at 8:59 AM, Samuel Just sam.j...@inktank.com wrote: Basically, it's an architectural choice: https://github.com/ceph/ceph/blob/master/doc/dev/osd_internals/erasure_coding/pgbackend.rst#client-writes If we wanted to support partial writes, we'd probably want

Re: idempotent op (esp delete)

2015-01-27 Thread Samuel Just
I think the O(n) scan is fine for now, and we can add an index to the most recent entry for each object + embedded pointers in the log entries allowing us to walk backwards through the entries for an object. -Sam On Tue, Jan 27, 2015 at 6:30 AM, Sage Weil sw...@redhat.com wrote: On Tue, 27 Jan
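The index described here can be sketched as a map from each object to its most recent log entry, plus a `prev` pointer in each entry to the previous entry for the same object, so finding an object's history no longer needs an O(n) scan of the whole log. The names are illustrative rather than `pg_log_t`'s actual fields, and log trimming is ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogEntry:
    version: int
    oid: str
    op: str
    prev: Optional[int] = None   # index of the previous entry for the same oid


class PGLogSketch:
    def __init__(self):
        self.entries = []
        self.latest = {}         # oid -> index of its most recent entry

    def append(self, version, oid, op):
        entry = LogEntry(version, oid, op, prev=self.latest.get(oid))
        self.latest[oid] = len(self.entries)
        self.entries.append(entry)

    def history(self, oid):
        """Walk backwards through oid's entries, newest first."""
        i = self.latest.get(oid)
        while i is not None:
            yield self.entries[i]
            i = self.entries[i].prev
```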

Re: idempotent op (esp delete)

2015-01-26 Thread Samuel Just
The pg_log_t variant does seem to be cleaner. -Sam On Mon, Jan 26, 2015 at 9:21 AM, Sage Weil sw...@redhat.com wrote: On Mon, 26 Jan 2015, Wang, Zhiqiang wrote: The downside of this approach is that we may need to search the pg_log for a specific object in every write io? Not quite.

Re: ./os/ObjectStore.h: 598: FAILED assert(op->oid < om.size())

2015-01-19 Thread Samuel Just
We actually found another problem with this series as well. http://tracker.ceph.com/issues/10534 Looks like d427ca35404a30e1f428859c3274e030f2f83ef6 reversed the order of localt (which contains the create_collection) and op_t (which contains all of the operations on the object in the temp
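Why the order matters can be shown with a toy store in which writing an object requires its collection to exist: applying `op_t` before `localt` fails because the temp collection has not been created yet. A hypothetical sketch; the collection name and op tuples are illustrative, and the real ObjectStore transaction format is far richer.

```python
class ToyStore:
    def __init__(self):
        self.collections = {}

    def apply(self, txn):
        for op, coll, *args in txn:
            if op == "create_collection":
                self.collections[coll] = {}
            elif op == "write":
                if coll not in self.collections:
                    raise KeyError(f"collection {coll} does not exist")
                oid, data = args
                self.collections[coll][oid] = data


localt = [("create_collection", "1.0_TEMP")]
op_t = [("write", "1.0_TEMP", "temp_obj", b"data")]
```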

Re: FAILED assert(peer_missing.count(fromshard))

2015-01-16 Thread Samuel Just
1) The part where you add the operator and change the debug output looks good. 2) The other part looks like it should be an assert? Or it should complain to the central log so that it causes the test to fail at least? 1 and 2 should be separate commits. -Sam On Fri, Jan 16, 2015 at 8:39 AM,

Re: upcoming release giant v0.87.1 : leads feedback needed

2015-01-16 Thread Samuel Just
Some of the rados errors look concerning, I'll have a look. -Sam On Fri, Jan 16, 2015 at 12:34 PM, Loic Dachary l...@dachary.org wrote: Hi Again, As it turns out we're going to have giant point releases after all. In a nutshell there are going to be one or two point releases for giant, until

Re: upcoming release giant v0.87.1 : leads feedback needed

2015-01-16 Thread Samuel Just
Nvm, the rados failures look normal. That set of test results seems fine to me from a rados point of view. -Sam On Fri, Jan 16, 2015 at 1:34 PM, Samuel Just sam.j...@inktank.com wrote: Some of the rados errors look concerning, I'll have a look. -Sam On Fri, Jan 16, 2015 at 12:34 PM, Loic

Re: ERROR: assert(last_e.version.version < e.version.version)

2015-01-14 Thread Samuel Just
, right? 2015-01-10 2:22 GMT+08:00 Samuel Just sam.j...@inktank.com: This should be handled by divergent log entry trimming. It looks more like the filestore became inconsistent after the flip and failed to record some transactions. You'll want to make sure your filestore/filesystem/disk

Re: ERROR: assert(last_e.version.version < e.version.version)

2015-01-09 Thread Samuel Just
This should be handled by divergent log entry trimming. It looks more like the filestore became inconsistent after the flip and failed to record some transactions. You'll want to make sure your filestore/filesystem/disk configuration isn't causing inconsistencies. -Sam On Tue, Jan 6, 2015 at

Re: Higher OSD disk util due to RBD snapshots from Dumpling to Firefly

2015-01-02 Thread Samuel Just
Odd, sounds like it might be rbd client side? -Sam On Thu, Jan 1, 2015 at 1:30 AM, Stefan Priebe s.pri...@profihost.ag wrote: hi, Am 31.12.2014 um 17:21 schrieb Wido den Hollander: Hi, Last week I upgraded a 250 OSD cluster from Dumpling 0.67.10 to Firefly 0.80.7 and after the upgrade

Re: why do we need SNAPDIR ?

2015-01-02 Thread Samuel Just
The SNAPDIR object only exists if 1) the head does not exist and 2) there are clones. The purpose of the object is to hold the snapset xattr which would normally be on the head object. -Sam On Wed, Dec 24, 2014 at 7:27 PM, Nicheal zay11...@gmail.com wrote: Dear developers, I find that SNAPDIR
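
The rule described in the reply can be written as a small decision function. This is a minimal sketch of the rule as stated in the message; `snapset_holder` is a hypothetical helper, not a Ceph function.

```python
from typing import Optional


def snapset_holder(head_exists: bool, num_clones: int) -> Optional[str]:
    """Return which object carries the snapset xattr (hypothetical helper).

    - head exists: the snapset lives on the head object as usual
    - no head, but clones remain: a SNAPDIR object holds it instead
    - no head and no clones: nothing needs a snapset
    """
    if head_exists:
        return "head"
    if num_clones > 0:
        return "snapdir"
    return None
```

The SNAPDIR case is exactly the one where the head has been deleted but clones still need their snapset metadata to live somewhere.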

Re: Higher OSD disk util due to RBD snapshots from Dumpling to Firefly

2015-01-02 Thread Samuel Just
That may not be related. -Sam On Fri, Jan 2, 2015 at 10:43 AM, Stefan Priebe s.pri...@profihost.ag wrote: Am 02.01.2015 um 17:49 schrieb Samuel Just: Odd, sounds like it might be rbd client side? -Sam That one was already on list: https://www.mail-archive.com/ceph-devel@vger.kernel.org

Re: Improving latency and ordering of the backfilling workload

2014-12-17 Thread Samuel Just
I think the concern is that the priority information might be obsolete by the time it gets the remote reservation. We might need to refresh the reservation periodically if, for example, the number of pgs requiring backfill on a particular osd is increasing. -Sam On Mon, Dec 15, 2014 at 10:13 AM,

Re: [ceph-users] fiemap bug on giant

2014-11-24 Thread Samuel Just
at 5:03 AM, Samuel Just sam.j...@inktank.com wrote: Bug #10166 (http://tracker.ceph.com/issues/10166) can cause recovery to result in incorrect object sizes on giant if the setting 'filestore fiemap' is set to true. This setting is disabled by default. This should be fixed in a future point

Re: watch/notify changes ready for review

2014-11-17 Thread Samuel Just
I think I agree that the failed notify api is superfluous. Also, would that not imply that the handle_error callback had already fired with an ETIMEDOUT? -Sam On Mon, Nov 17, 2014 at 4:42 PM, Sage Weil s...@newdream.net wrote: On Thu, 13 Nov 2014, Sage Weil wrote: My pile of watch/notify

Re: PG down

2014-11-13 Thread Samuel Just
It looks like the acting set went down to the min allowable size and went active with osd 8. At that point you needed every member of that acting set to go active later on to avoid losing writes. You can prevent this by setting a min_size above the number of data chunks. -Sam On Thu, Nov
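
The min_size advice can be checked with a little arithmetic. This is a simplified model, assuming an erasure-coded pool with k data chunks, where any k shards suffice to decode an object; `writes_survive_one_more_failure` is a hypothetical helper. If the PG goes active with only min_size shards, writes made then exist on just those shards, and losing one more before recovery leaves min_size - 1.

```python
def writes_survive_one_more_failure(k: int, min_size: int) -> bool:
    """Hypothetical model of the EC min_size argument.

    The PG may go active (and accept writes) with as few as min_size
    shards. A write made then exists only on those shards. If one of
    them is lost before recovery, min_size - 1 shards remain, and the
    object needs at least k shards to be decoded.
    """
    return min_size - 1 >= k
```

For example, with k=4 a min_size of 4 leaves such writes unrecoverable after one further failure, while a min_size of 5 (k+1) does not. The pool setting itself is changed with `ceph osd pool set <pool> min_size <n>`.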
