RE: Rbd map failure in 3.16.0-55

2015-12-12 Thread Somnath Roy
Ilya, If we map with 'nocrc' would that help ? Thanks & Regards Somnath -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Ilya Dryomov Sent: Saturday, December 12, 2015 3:12 AM To: Varada Kari Cc: ceph-devel@vger.kernel.org

Re: Rbd map failure in 3.16.0-55

2015-12-12 Thread Somnath Roy
that crc enabled is recommended..we will come back for your help if it is really hurting performance.. Thanks Somnath Sent from my iPhone > On Dec 12, 2015, at 10:56 AM, Ilya Dryomov <idryo...@gmail.com> wrote: > >> On Sat, Dec 12, 2015 at 6:42 PM, Somnath Roy <somnath..

RE: queue_transaction interface + unique_ptr + performance

2015-12-03 Thread Somnath Roy
emer...@redhat.com] Sent: Thursday, December 03, 2015 5:44 PM To: Somnath Roy Cc: Samuel Just; Casey Bodley; Sage Weil; Samuel Just (sam.j...@inktank.com); ceph-devel@vger.kernel.org Subject: Re: queue_transaction interface + unique_ptr + performance On 04/12/2015, Somnath Roy wrote: [snip] > # Test

RE: queue_transaction interface + unique_ptr + performance

2015-12-03 Thread Somnath Roy
-Original Message- From: Somnath Roy Sent: Thursday, December 03, 2015 10:16 PM To: 'Adam C. Emerson' Cc: Samuel Just; Casey Bodley; Sage Weil; Samuel Just (sam.j...@inktank.com); ceph-devel@vger.kernel.org Subject: RE: queue_transaction interface + unique_ptr + performance Adam, May be I am

RE: queue_transaction interface + unique_ptr + performance

2015-12-03 Thread Somnath Roy
on this mail chain in case you have missed) taking Transaction, any thought of that ? Should we reconsider having two queue_transaction interface ? Thanks & Regards Somnath -Original Message- From: Sage Weil [mailto:s...@newdream.net] Sent: Thursday, December 03, 2015 3:50 AM To: Som

RE: queue_transaction interface + unique_ptr + performance

2015-12-03 Thread Somnath Roy
I don't think make_shared / make_unique is part of c++11 (and ceph is using that). It is part of c++14 I guess.. Thanks & Regards Somnath -Original Message- From: Casey Bodley [mailto:cbod...@redhat.com] Sent: Thursday, December 03, 2015 7:17 AM To: Sage Weil Cc: Somnath Roy; Sa
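For reference, std::make_shared is in fact part of C++11; it is std::make_unique that only arrived in C++14, and a C++11-compatible shim is trivial. A minimal, illustrative sketch (not taken from the Ceph tree):

    // Illustrative only (not Ceph code): std::make_shared is C++11,
    // std::make_unique arrived in C++14, but a C++11 shim is simple.
    #include <memory>
    #include <utility>

    template <typename T, typename... Args>
    std::unique_ptr<T> make_unique_compat(Args&&... args) {
        return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
    }

    int main() {
        auto sp = std::make_shared<int>(42);    // available since C++11
        auto up = make_unique_compat<int>(42);  // C++11-friendly stand-in for make_unique
        return *sp - *up;                       // 0
    }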

RE: queue_transaction interface + unique_ptr + performance

2015-12-03 Thread Somnath Roy
:sj...@redhat.com] Sent: Thursday, December 03, 2015 3:24 PM To: Casey Bodley Cc: Sage Weil; Somnath Roy; Samuel Just (sam.j...@inktank.com); ceph-devel@vger.kernel.org Subject: Re: queue_transaction interface + unique_ptr + performance From a simplicity point of view, I'd rather just move a Tra

RE: queue_transaction interface + unique_ptr + performance

2015-12-03 Thread Somnath Roy
sage- From: Adam C. Emerson [mailto:aemer...@redhat.com] Sent: Thursday, December 03, 2015 9:17 AM To: Somnath Roy Cc: Casey Bodley; Sage Weil; Samuel Just (sam.j...@inktank.com); ceph-devel@vger.kernel.org Subject: Re: queue_transaction interface + unique_ptr + performance On 03/12/2015, Somnath

RE: queue_transaction interface + unique_ptr + performance

2015-12-03 Thread Somnath Roy
. Thanks & Regards Somnath -Original Message- From: Adam C. Emerson [mailto:aemer...@redhat.com] Sent: Thursday, December 03, 2015 9:25 AM To: Somnath Roy Cc: Sage Weil; Samuel Just (sam.j...@inktank.com); ceph-devel@vger.kernel.org Subject: Re: queue_transaction interface + unique

RE: queue_transaction interface + unique_ptr + performance

2015-12-03 Thread Somnath Roy
03, 2015 9:51 AM To: Somnath Roy Cc: Adam C. Emerson; Sage Weil; Samuel Just (sam.j...@inktank.com); ceph-devel@vger.kernel.org Subject: Re: queue_transaction interface + unique_ptr + performance As far as I know, there are no current users which want to use the Transaction later. You could also

queue_transaction interface + unique_ptr

2015-12-02 Thread Somnath Roy
Hi Sage/Sam, As discussed in today's performance meeting , I am planning to change the queue_transactions() interface to the following. int queue_transactions(Sequencer *osr, list& tls, Context *onreadable, Context *ondisk=0, Context
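The preview above is cut off mid-signature. A rough, compilable sketch of the general shape being discussed, carrying Transactions in a std::list plus completion Contexts (the stand-in types, parameter names, and defaults here are assumptions, not the exact proposal):

    #include <list>

    // Stand-in types so the sketch is self-contained; the real ones live in Ceph.
    struct Context     { virtual void finish(int r) = 0; virtual ~Context() {} };
    struct Sequencer   {};
    struct Transaction { /* encoded ops elided */ };

    // Shape of the interface under discussion: Transactions carried in the
    // container by value instead of as raw Transaction* (defaults are guesses).
    int queue_transactions(Sequencer *osr,
                           std::list<Transaction> &tls,
                           Context *onreadable,
                           Context *ondisk = nullptr,
                           Context *onreadable_sync = nullptr)
    {
        (void)osr; (void)tls; (void)onreadable; (void)ondisk; (void)onreadable_sync;
        return 0;  // stub
    }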

RE: queue_transaction interface + unique_ptr + performance

2015-12-02 Thread Somnath Roy
Thanks James for looking into this.. Shared_ptr used heavily in the OSD.cc/Replicated PG path.. Regards Somnath -Original Message- From: James (Fei) Liu-SSI [mailto:james@ssi.samsung.com] Sent: Wednesday, December 02, 2015 7:50 PM To: Somnath Roy; Sage Weil (s...@newdream.net

RE: queue_transaction interface + unique_ptr + performance

2015-12-02 Thread Somnath Roy
and for shared_ptr overhead is >2X. Thanks & Regards Somnath -Original Message- From: Somnath Roy Sent: Wednesday, December 02, 2015 7:59 PM To: 'James (Fei) Liu-SSI'; Sage Weil (s...@newdream.net); Samuel Just (sam.j...@inktank.com) Cc: ceph-devel@vger.kernel.org Subject: RE: queue_tran

RE: queue_transaction interface + unique_ptr + performance

2015-12-02 Thread Somnath Roy
d ptr: %d\n",micros_used); std::cout <<"Existing..\n"; return 0; } So, my guess is, the heavy use of these smart pointers in the Ceph IO path is bringing iops/core down substantially. My suggestion is *not to introduce* any smart pointers in the objectstore interface. T
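The program quoted above is truncated; a self-contained micro-benchmark in the same spirit (not the original test, and the iteration count is arbitrary) could look like:

    #include <chrono>
    #include <cstdio>
    #include <memory>

    struct Obj { int x = 0; };

    int main() {
        const int N = 1000000;
        using clk = std::chrono::steady_clock;
        auto us = [](clk::time_point a, clk::time_point b) {
            return (long long)std::chrono::duration_cast<std::chrono::microseconds>(b - a).count();
        };

        auto t0 = clk::now();
        for (int i = 0; i < N; ++i) { Obj *p = new Obj; p->x = i; delete p; }       // raw pointer
        auto t1 = clk::now();
        for (int i = 0; i < N; ++i) { auto p = std::make_shared<Obj>(); p->x = i; } // shared_ptr
        auto t2 = clk::now();

        std::printf("raw ptr   : %lld us\n", us(t0, t1));
        std::printf("shared ptr: %lld us\n", us(t1, t2));
        std::printf("Exiting..\n");
        return 0;
    }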

Write path changes

2015-11-20 Thread Somnath Roy
Hi Sage, FYI, I have sent out a new PR addressing your earlier comments and some more enhancement. Here it is.. https://github.com/ceph/ceph/pull/6670 Did some exhaustive comparison with ceph latest master code base and found up to 32 OSDs (4 OSD nodes , one per 8TB SAS SSD) , my changes are

Regarding op_t, local_t

2015-11-18 Thread Somnath Roy
Hi Sage, I saw we are now having single transaction in submit_transaction. But, in the replication path we are still having two transaction, can't we merge it to one there ? Thanks & Regards Somnath -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message

RE: Regarding op_t, local_t

2015-11-18 Thread Somnath Roy
one transaction would help there. Thanks & Regards Somnath -Original Message- From: 池信泽 [mailto:xmdx...@gmail.com] Sent: Wednesday, November 18, 2015 6:00 PM To: Somnath Roy Cc: ceph-devel@vger.kernel.org Subject: Re: Regarding op_t, local_t Good catch. I think it does make sense.

test

2015-11-11 Thread Somnath Roy
Sorry for the spam , having some issues with devl -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html

RE: Increasing # Shards vs multi-OSDs per device

2015-11-11 Thread Somnath Roy
2:57 PM To: ceph-devel@vger.kernel.org; Mark Nelson; Samuel Just; Kyle Bader; Somnath Roy Subject: Increasing # Shards vs multi-OSDs per device Sorry about the microphone issues in the performance meeting today. This is a followup to the 11/4 performance meeting where we discussed increa

RE: why we use two ObjectStore::Transaction in ReplicatedBackend::submit_transaction?

2015-11-01 Thread Somnath Roy
Sage, Is it possible that we can't reuse the op_t because it could be still there in the messenger queue before calling parent->log_operation() ? Thanks & Regards Somnath -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of

RE: why we use two ObjectStore::Transaction in ReplicatedBackend::submit_transaction?

2015-11-01 Thread Somnath Roy
Huh..It seems the op_t is already copied in generate_subop() -> ::encode(*op_t, wr->get_data());...So, this shouldn't be an issue.. -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy Sent: Sunday, Novem

RE: why we use two ObjectStore::Transaction in ReplicatedBackend::submit_transaction?

2015-10-31 Thread Somnath Roy
BTW, latest code base is already separating out 2 transaction. No more append call.. Thanks & Regards Somnath -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Ning Yao Sent: Saturday, October 31, 2015 8:35 AM To: Sage Weil

RE: why we should use two Mutex in OSD ShardData?

2015-10-30 Thread Somnath Roy
Sent: Friday, October 30, 2015 8:38 AM To: Somnath Roy Cc: ceph-devel@vger.kernel.org Subject: Re: why we should use two Mutex in OSD ShardData? I do not see any improvement by moving to single mutex. I just fell puzzle why we use two mutex. But I also do not see any improvement using two mutex i

RE: Lock contention in do_rule

2015-10-24 Thread Somnath Roy
Thanks Sage, I will test with this patch.. Regards Somnath -Original Message- From: Sage Weil [mailto:s...@newdream.net] Sent: Saturday, October 24, 2015 3:04 PM To: Somnath Roy Cc: ceph-devel@vger.kernel.org Subject: RE: Lock contention in do_rule On Sat, 24 Oct 2015, Somnath Roy

RE: Lock contention in do_rule

2015-10-23 Thread Somnath Roy
ead ? Thanks & Regards Somnath -Original Message- From: Sage Weil [mailto:s...@newdream.net] Sent: Friday, October 23, 2015 6:10 PM To: Somnath Roy Cc: ceph-devel@vger.kernel.org Subject: Re: Lock contention in do_rule On Sat, 24 Oct 2015, Somnath Roy wrote: > Hi Sage, >

RE: Lock contention in do_rule

2015-10-23 Thread Somnath Roy
l-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy Sent: Friday, October 23, 2015 7:02 PM To: Sage Weil Cc: ceph-devel@vger.kernel.org Subject: RE: Lock contention in do_rule Thanks for the clarification Sage.. I don't have much knowledge on this part , but,

Lock contention in do_rule

2015-10-23 Thread Somnath Roy
Hi Sage, We are seeing the following mapper_lock is heavily contended and commenting out this lock is improving performance ~10 % (in the short circuit path). This is called for every io from osd_is_valid_op_target(). I looked into the code ,but, couldn't understand the purpose of the lock , it

RE: newstore direction

2015-10-19 Thread Somnath Roy
Sage, I fully support that. If we want to saturate SSDs , we need to get rid of this filesystem overhead (which I am in process of measuring). Also, it will be good if we can eliminate the dependency on the k/v dbs (for storing allocators and all). The reason is the unknown write amps they

Re: XFS xattr limit and Ceph

2015-10-15 Thread Somnath Roy
d to limit the number of xattrs > >> On Thu, Oct 15, 2015 at 10:54 PM, Somnath Roy <somnath@sandisk.com> >> wrote: >> Sage, >> Why we are using XFS max inline xattr value as 10 only ? >> >> OPTION(filestore_max_inline_xattrs_xfs, OPT_U32, 10) &

XFS xattr limit and Ceph

2015-10-15 Thread Somnath Roy
Sage, Why we are using XFS max inline xattr value as 10 only ? OPTION(filestore_max_inline_xattrs_xfs, OPT_U32, 10) XFS is supporting 1k limit I guess. Is there any performance reason behind that ? Thanks & Regards Somnath PLEASE NOTE: The information

RE: [ceph-users] Initial performance cluster SimpleMessenger vs AsyncMessenger results

2015-10-13 Thread Somnath Roy
Wang [mailto:haomaiw...@gmail.com] Sent: Monday, October 12, 2015 11:35 PM To: Somnath Roy Cc: Mark Nelson; ceph-devel; ceph-us...@lists.ceph.com Subject: Re: [ceph-users] Initial performance cluster SimpleMessenger vs AsyncMessenger results On Tue, Oct 13, 2015 at 12:18 PM, Somnath Roy <somn

RE: throttles

2015-10-13 Thread Somnath Roy
BTW, you can completely turn off these throttles ( other than the filestore throttle ) by setting the value to 0. Thanks & Regards Somnath -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Deneau, Tom Sent: Tuesday, October
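A ceph.conf sketch of the kind of override being described; the option names below are the common Hammer-era queue throttles and the values are illustrative, so verify them against your build before relying on this:

    [osd]
        # Illustrative only: 0 means "unlimited" for these queue throttles.
        osd client message cap = 0
        osd client message size cap = 0
        journal queue max ops = 0
        journal queue max bytes = 0
        # The filestore queue throttle is deliberately left at its default,
        # matching the note above that it should not be turned off.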

RE: throttles

2015-10-12 Thread Somnath Roy

RE: perf counters from a performance discrepancy

2015-10-08 Thread Somnath Roy
If I remember correctly, Nick faced similar issue and we debugged down to the xattr access issue in the find_object_context(). I am not sure if it is resolved though for him or not. Thanks & Regards Somnath -Original Message- From: ceph-devel-ow...@vger.kernel.org

Pull request for FileStore write path optimization

2015-09-29 Thread Somnath Roy
Hi Mark, I have sent out the following pull request for my write path changes. https://github.com/ceph/ceph/pull/6112 Meanwhile, if you want to give it a spin to your SSD cluster , take the following branch. https://github.com/somnathr/ceph/tree/wip-write-path-optimization 1. Please use the

RE: Very slow recovery/peering with latest master

2015-09-28 Thread Somnath Roy
Input/output errors on accessing the drives which are not reserved for this host. This is an inefficiency part of blkid* calls (?) since calls like fdisk/lsscsi are not taking time. Regards Somnath -Original Message- From: Chen, Xiaoxi [mailto:xiaoxi.c...@intel.com] Sent: Monday, September 28

RE: Very slow recovery/peering with latest master

2015-09-24 Thread Somnath Roy
ux/x86_64/clone.S:111 Strace was not helpful much since other threads are not block and keep printing the futex traces.. Thanks & Regards Somnath -Original Message- From: Podoski, Igor [mailto:igor.podo...@ts.fujitsu.com] Sent: Wednesday, September 23, 2015 11:33 PM To: Somnath Roy C

RE: Very slow recovery/peering with latest master

2015-09-23 Thread Somnath Roy
Sent: Wednesday, September 23, 2015 4:07 PM To: Somnath Roy Cc: Samuel Just (sam.j...@inktank.com); Sage Weil (s...@newdream.net); ceph-devel Subject: Re: Very slow recovery/peering with latest master Wow. Why would that take so long? I think you are correct that it's only used for metadata, we could

RE: Very slow recovery/peering with latest master

2015-09-23 Thread Somnath Roy
wip-write-path-optimization/src# lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description:Ubuntu 14.04.2 LTS Release:14.04 Codename: trusty Thanks & Regards Somnath -Original Message- From: Somnath Roy Sent: Wednesday, September 16, 2015

RE: Very slow recovery/peering with latest master

2015-09-23 Thread Somnath Roy
[mailto:joseph.t.hand...@hpe.com] Sent: Wednesday, September 23, 2015 4:20 PM To: Samuel Just Cc: Somnath Roy; Samuel Just (sam.j...@inktank.com); Sage Weil (s...@newdream.net); ceph-devel Subject: Re: Very slow recovery/peering with latest master I added that, there is code up the st

Copyright header

2015-09-23 Thread Somnath Roy
Hi Sage, In the latest master, I am seeing a new Copyright header entry for HP in the file Filestore.cc. Is this incidental ? * Copyright (c) 2015 Hewlett-Packard Development Company, L.P. Thanks & Regards Somnath PLEASE NOTE: The information contained in

RE: Very slow recovery/peering with latest master

2015-09-16 Thread Somnath Roy
ion getting slower ? Let me know if more verbose logging is required and how should I share the log.. Thanks & Regards Somnath -Original Message- From: Gregory Farnum [mailto:gfar...@redhat.com] Sent: Wednesday, September 16, 2015 11:35 AM To: Somnath Roy Cc: ceph-devel Subject: Re: Ve

Very slow recovery/peering with latest master

2015-09-15 Thread Somnath Roy
Hi, I am seeing very slow recovery when I am adding OSDs with the latest master. Also, If I just restart all the OSDs (no IO is going on in the cluster) , cluster is taking a significant amount of time to reach in active+clean state (and even detecting all the up OSDs). I saw the

RE: Question about big EC pool.

2015-09-13 Thread Somnath Roy
[mailto:mike.almat...@gmail.com] Sent: Sunday, September 13, 2015 10:39 AM To: Somnath Roy; ceph-devel Subject: Re: Question about big EC pool. 13-Sep-15 01:12, Somnath Roy writes: > 12-Sep-15 19:34, Somnath Roy writes: >> >I don't think there is any limit from Ceph side.. >

RE: Question about big EC pool.

2015-09-12 Thread Somnath Roy
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Mike Almateia Sent: Saturday, September 12, 2015 12:13 PM To: ceph-devel Subject: Re: Question about big EC pool. 12-Sep-15 19:34, Somnath Roy writes: > I don't think there is any limit from Ceph side.. > We are testing with ~768 TB d

RE: Question about big EC pool.

2015-09-12 Thread Somnath Roy
I don't think there is any limit from Ceph side.. We are testing with ~768 TB deployment with 4:2 EC on Flash and it is working well so far.. Thanks & Regards Somnath -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Mike

RE: Regarding journal replay

2015-09-10 Thread Somnath Roy
Yeah, thanks Sage for confirming this. Regards Somnath -Original Message- From: Sage Weil [mailto:sw...@redhat.com] Sent: Thursday, September 10, 2015 3:04 PM To: Somnath Roy Cc: ceph-devel Subject: Re: Regarding journal replay On Thu, 10 Sep 2015, Somnath Roy wrote: > Sage et.

Regarding journal replay

2015-09-10 Thread Somnath Roy
Sage et. al, Could you please let me know what will happen during journal replay in this scenario ? 1. Say last committed seq is 3 and after that one more independent transaction with say 4 came. Transaction seq 4, has say delete xattr, delete object, create a new object, set xattr 2. Seq 4

RE: Ceph Write Path Improvement

2015-09-09 Thread Somnath Roy
load it is with QD = 8 and num_job= 1 and 10. Thanks & Regards Somnath -Original Message- From: Blinick, Stephen L [mailto:stephen.l.blin...@intel.com] Sent: Thursday, September 03, 2015 1:02 PM To: Somnath Roy Cc: ceph-devel Subject: RE: Ceph Write Path Improvement Somnath -

RE: Ceph Write Path Improvement

2015-09-03 Thread Somnath Roy
data with that config. That's why I have introduced a new throttling scheme that should benefit in all the scenarios. Thanks & Regards Somnath -Original Message- From: Mark Nelson [mailto:mnel...@redhat.com] Sent: Thursday, September 03, 2015 9:42 AM To: Robert LeBlanc; Somnath Ro

RE: Ceph Write Path Improvement

2015-09-03 Thread Somnath Roy
en L [mailto:stephen.l.blin...@intel.com] Sent: Thursday, September 03, 2015 1:02 PM To: Somnath Roy Cc: ceph-devel Subject: RE: Ceph Write Path Improvement Somnath -- thanks for publishing all the data, will be great to look at it offline. I didn't find this info: How many RBD volumes, and what size,

Ceph Write Path Improvement

2015-09-02 Thread Somnath Roy
Hi, Here is the link of the document I presented in today's performance meeting. https://docs.google.com/presentation/d/1lCoLpFRjD8t_YCeHyWDV7ddv7ZkwfETgyjUzXw0-ttU/edit?usp=sharing It has the benchmark result of the filestore changes I proposed earlier for the ceph write path optimization.

RE: Ceph Hackathon: More Memory Allocator Testing

2015-08-23 Thread Somnath Roy
please use the patch to verify this ? Did you build fio/rados bench also with tcmalloc/jemalloc ? If not, how/why it is improving ? Thanks Regards Somnath -Original Message- From: Alexandre DERUMIER [mailto:aderum...@odiso.com] Sent: Sunday, August 23, 2015 6:13 AM To: Somnath Roy Cc

RE: Ceph Hackathon: More Memory Allocator Testing

2015-08-22 Thread Somnath Roy
[mailto:aderum...@odiso.com] Sent: Saturday, August 22, 2015 9:57 AM To: Somnath Roy Cc: Sage Weil; Milosz Tanski; Shishir Gowda; Stefan Priebe; Mark Nelson; ceph-devel Subject: Re: Ceph Hackathon: More Memory Allocator Testing Wanted to know is there any reason we didn't link client libraries

RE: Ceph Hackathon: More Memory Allocator Testing

2015-08-22 Thread Somnath Roy
Somnath -Original Message- From: Sage Weil [mailto:s...@newdream.net] Sent: Saturday, August 22, 2015 6:56 AM To: Milosz Tanski Cc: Shishir Gowda; Somnath Roy; Stefan Priebe; Alexandre DERUMIER; Mark Nelson; ceph-devel Subject: Re: Ceph Hackathon: More Memory Allocator Testing On Fri, 21 Aug

RE: Ceph Hackathon: More Memory Allocator Testing

2015-08-19 Thread Somnath Roy
...@odiso.com] Sent: Wednesday, August 19, 2015 9:55 AM To: Somnath Roy Cc: Mark Nelson; ceph-devel Subject: Re: Ceph Hackathon: More Memory Allocator Testing I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. I think it is per

RE: Ceph Hackathon: More Memory Allocator Testing

2015-08-19 Thread Somnath Roy
I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box. Also, I think there is no point
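To make the worst-case arithmetic above concrete (the cache size and OSD count below are purely illustrative, not measured values):

    num_osds x num_tcmalloc_instances x TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
    e.g.  10 OSDs x 1 instance each x 128 MB  ~=  1.28 GB of thread-cache
          memory that tcmalloc may hold per box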

RE: Ceph Hackathon: More Memory Allocator Testing

2015-08-19 Thread Somnath Roy
environment to get this done How do I do that ? I am using Ubuntu and can't afford to remove libc* packages. Thanks Regards Somnath -Original Message- From: Stefan Priebe [mailto:s.pri...@profihost.ag] Sent: Wednesday, August 19, 2015 1:18 PM To: Somnath Roy; Alexandre DERUMIER; Mark Nelson Cc

RE: Ceph Hackathon: More Memory Allocator Testing

2015-08-19 Thread Somnath Roy
Yeah , I can see ceph-osd/ceph-mon built with jemalloc. Thanks Regards Somnath -Original Message- From: Stefan Priebe [mailto:s.pri...@profihost.ag] Sent: Wednesday, August 19, 2015 1:41 PM To: Somnath Roy; Alexandre DERUMIER; Mark Nelson Cc: ceph-devel Subject: Re: Ceph Hackathon

RE: Ceph Hackathon: More Memory Allocator Testing

2015-08-19 Thread Somnath Roy
: Wednesday, August 19, 2015 1:31 PM To: Somnath Roy; Alexandre DERUMIER; Mark Nelson Cc: ceph-devel Subject: Re: Ceph Hackathon: More Memory Allocator Testing Am 19.08.2015 um 22:29 schrieb Somnath Roy: Hmm...We need to fix that as part of configure/Makefile I guess (?).. Since we have done

RE: Ceph Hackathon: More Memory Allocator Testing

2015-08-18 Thread Somnath Roy
Mark, Thanks for verifying this. Nice report ! Since there is a big difference in memory consumption with jemalloc, I would say a recovery performance data or client performance data during recovery would be helpful. Thanks Regards Somnath -Original Message- From:

RE: Async reads, sync writes, op thread model discussion

2015-08-12 Thread Somnath Roy
Haomai, Yes, one of the goals is to make async read xattr.. IMO, this scheme should benefit in the following scenario.. Ops within a PG will not be serialized any more as long as it is not coming on the same object and this could be a big win. In our workload at least we are not seeing the
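A toy illustration of the ordering point above (not Ceph code): if ops are routed to worker shards by a hash of the object rather than by PG, two ops on different objects in the same PG can land on different shards and run in parallel, while ops on the same object always hash to the same shard and stay ordered.

    #include <cstdio>
    #include <functional>
    #include <string>

    constexpr std::size_t kNumShards = 8;   // assumption: fixed number of worker shards

    std::size_t shard_for(const std::string &object_id) {
        // Same object -> same shard (ordering kept); different objects -> usually
        // different shards, so they are no longer serialized behind one PG queue.
        return std::hash<std::string>{}(object_id) % kNumShards;
    }

    int main() {
        std::printf("obj A -> shard %zu\n", shard_for("rbd_data.1234.00000000000a"));
        std::printf("obj B -> shard %zu\n", shard_for("rbd_data.1234.00000000000b"));
        return 0;
    }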

RE: FileStore should not use syncfs(2)

2015-08-05 Thread Somnath Roy
earlier, in case of only fsync approach, we still need to do a db sync to make sure the leveldb stuff persisted, right ? Thanks Regards Somnath -Original Message- From: Sage Weil [mailto:sw...@redhat.com] Sent: Wednesday, August 05, 2015 2:27 PM To: Somnath Roy Cc: ceph-devel

RE: More ondisk_finisher thread?

2015-08-04 Thread Somnath Roy
Yes, it has to re-acquire pg_lock today.. But, between journal write and initiating the ondisk ack, there is one context switch in the code path. So, I guess the pg_lock is not the only one that is causing this 1 ms delay... Not sure increasing the finisher threads will help in the pg_lock case

RE: Ceph write path optimization

2015-07-29 Thread Somnath Roy
On Tue, 28 Jul 2015, Somnath Roy wrote: Hi, Eventually, I have a working prototype and able to gather some performance comparison data with the changes I was talking about in the last performance meeting. Mark's suggestion of a write up was long pending, so, trying to summarize what I am

RE: Ceph write path optimization

2015-07-29 Thread Somnath Roy
inline -Original Message- From: Christoph Hellwig [mailto:h...@infradead.org] Sent: Tuesday, July 28, 2015 11:57 PM To: Somnath Roy Cc: ceph-devel@vger.kernel.org Subject: Re: Ceph write path optimization On Tue, Jul 28, 2015 at 09:08:27PM +, Somnath Roy wrote: 2. Each filestore Op

RE: Ceph write path optimization

2015-07-29 Thread Somnath Roy
(or similar to existing one today). Thanks Regards Somnath -Original Message- From: Shu, Xinxin [mailto:xinxin@intel.com] Sent: Wednesday, July 29, 2015 12:50 AM To: Somnath Roy; ceph-devel@vger.kernel.org Subject: RE: Ceph write path optimization Hi Somnath, any performance data

RE: Ceph write path optimization

2015-07-28 Thread Somnath Roy
Hi Lukas, According to (http://linux.die.net/man/8/mkfs.xfs) lazy-count is by default set to 1 not 0 with newer kernel. I am using 3.16.0-41-generic, so, should be fine. Thanks Regards Somnath -Original Message- From: Somnath Roy Sent: Tuesday, July 28, 2015 3:04 PM To: 'Łukasz

RE: Ceph write path optimization

2015-07-28 Thread Somnath Roy
Haomai, in line with [Somnath].. Thanks Regards Somnath -Original Message- From: Haomai Wang [mailto:haomaiw...@gmail.com] Sent: Tuesday, July 28, 2015 7:18 PM To: Somnath Roy Cc: ceph-devel@vger.kernel.org Subject: Re: Ceph write path optimization On Wed, Jul 29, 2015 at 5:08 AM

Ceph write path optimization

2015-07-28 Thread Somnath Roy
Hi, Eventually, I have a working prototype and able to gather some performance comparison data with the changes I was talking about in the last performance meeting. Mark's suggestion of a write up was long pending, so, trying to summarize what I am trying to do. Objective: --- 1. Is

RE: Ceph write path optimization

2015-07-28 Thread Somnath Roy
...@gmail.com [mailto:mr.e...@gmail.com] On Behalf Of Lukasz Redynk Sent: Tuesday, July 28, 2015 2:46 PM To: Somnath Roy Cc: ceph-devel@vger.kernel.org Subject: Re: Ceph write path optimization Hi, Have you tried to tune XFS mkfs options? From mkfs.xfs(8) a) (log section, -l) lazy-count=value

RE: Probable memory leak in Hammer write path ?

2015-07-01 Thread Somnath Roy
. Thanks Greg for asking me to relook at tcmalloc otherwise I was kind of out of option :-).. Regards Somnath -Original Message- From: Somnath Roy Sent: Wednesday, July 01, 2015 4:58 PM To: 'Gregory Farnum' Cc: ceph-devel@vger.kernel.org Subject: RE: Probable memory leak in Hammer write path

RE: Probable memory leak in Hammer write path ?

2015-07-01 Thread Somnath Roy
Thanks Greg! Yeah, I will double check..But, I built the code without tcmalloc (with glibc) and it was also showing the similar behavior. Thanks Regards Somnath -Original Message- From: Gregory Farnum [mailto:g...@gregs42.com] Sent: Wednesday, July 01, 2015 9:07 AM To: Somnath Roy Cc

RE: Probable memory leak in Hammer write path ?

2015-06-29 Thread Somnath Roy
is to install ceph from ceph.com and see the behavior. Thanks Regards Somnath -Original Message- From: Gregory Farnum [mailto:g...@gregs42.com] Sent: Monday, June 29, 2015 3:53 AM To: Somnath Roy Cc: ceph-devel@vger.kernel.org Subject: Re: Probable memory leak in Hammer write path ? I'm confused

RE: CRC32 of messages

2015-06-29 Thread Somnath Roy
28, 2015 11:27 PM To: ceph-devel@vger.kernel.org Subject: RE: CRC32 of messages -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel- ow...@vger.kernel.org] On Behalf Of Somnath Roy Sent: Friday, June 26, 2015 7:52 PM ceph_crc32c_intel_fast is ~6 times faster

RE: Probable memory leak in Hammer write path ?

2015-06-28 Thread Somnath Roy
Some more data point.. 1. I am not seeing this in 3.13.0-24-generic 2. Seeing this in 3.16.0-23-generic , 3.19.0-21-generic Could this be related to gcc 4.9.* ? Thanks Regards Somnath -Original Message- From: Somnath Roy Sent: Saturday, June 27, 2015 5:57 PM To: ceph-devel

Probable memory leak in Hammer write path ?

2015-06-27 Thread Somnath Roy
Hi, I am chasing a substantial memory leak in latest Hammer code base in the write path since yesterday and wanted to know if anybody else is also observing this or not. This is as simple as running a fio-rbd random_write workload in my single OSD server with say block size 16K and num_jobs =

RE: CRC32 of messages

2015-06-26 Thread Somnath Roy
ceph_crc32c_intel_fast is ~6 times faster than ceph_crc32c_sctp. If you are not using intel cpus or you have older intel cpus where this sse4 instruction sets are not enabled , the performance will be badly impacted as you saw. If you are building ceph yourself, make sure you have 'yasm'

RE: [ceph-users] xattrs vs. omap with radosgw

2015-06-16 Thread Somnath Roy
Guang, Try to play around with the following conf attributes specially filestore_max_inline_xattr_size and filestore_max_inline_xattrs // Use omap for xattrs for attrs over // filestore_max_inline_xattr_size or OPTION(filestore_max_inline_xattr_size, OPT_U32, 0) //Override
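A ceph.conf sketch of the override being suggested; the values below are placeholders to show the mechanics, not tuning advice:

    [osd]
        # Placeholder values. Per the comment quoted above, attrs larger than
        # filestore_max_inline_xattr_size (bytes) spill to omap; 0 means
        # "use the filesystem-specific default".
        filestore max inline xattr size = 512
        filestore max inline xattrs = 10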

RE: Rados multi-object transaction use cases

2015-06-12 Thread Somnath Roy
Also, wouldn't this help in case of some kind of write coalescing for librbd/librados and sending one transaction down in case of multiple ? Thanks Regards Somnath -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Yehuda

RE: rbd_cache, limiting read on high iops around 40k

2015-06-10 Thread Somnath Roy
Hi Alexandre, Thanks for sharing the data. I need to try out the performance on qemu soon and may come back to you if I need some qemu setting trick :-) Regards Somnath -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Alexandre DERUMIER Sent:

Regarding hadoop over RGW blueprint

2015-06-10 Thread Somnath Roy
Hi Yuan/Jian I was going through your following blueprint. http://tracker.ceph.com/projects/ceph/wiki/Hadoop_over_Ceph_RGW_status_update This is very interesting. I have some query though. 1. Did you guys benchmark RGW + S3 interface integrated with Hadoop. This should work as is today. Are

RE: Regarding hadoop over RGW blueprint

2015-06-10 Thread Somnath Roy
Hadoop + S3 + RGWProxy + RGW ? Regards Somnath -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Zhang, Jian Sent: Wednesday, June 10, 2015 7:06 PM To: Somnath Roy; ceph-devel Cc: Zhang, Jian Subject: RE: Regarding hadoop over RGW

RE: Regarding hadoop over RGW blueprint

2015-06-10 Thread Somnath Roy
Thanks Yuan ! This is helpful. Regards Somnath -Original Message- From: Zhou, Yuan [mailto:yuan.z...@intel.com] Sent: Wednesday, June 10, 2015 8:44 PM To: Somnath Roy; Zhang, Jian; ceph-devel Subject: RE: Regarding hadoop over RGW blueprint Hi Somnath, The background was a bit

RE: Looking to improve small I/O performance

2015-06-07 Thread Somnath Roy
06, 2015 11:06 PM To: Somnath Roy Cc: Dałek, Piotr; ceph-devel Subject: Re: Looking to improve small I/O performance -BEGIN PGP SIGNED MESSAGE- Hash: SHA256 This is the test that we are running that simulates the workload size and ratios of our typical servers. Of course we are not doing

RE: Looking to improve small I/O performance

2015-06-06 Thread Somnath Roy
Robert, You can try the following config option to enable asyn messenger. ms_type = async enable_experimental_unrecoverable_data_corrupting_features = ms-type-async BTW, what kind of workload you are trying , random read or write ? Also, is this SSD or HDD cluster ? Thanks Regards Somnath
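As a ceph.conf fragment, the two options quoted above would look like this (the [global] section placement is an assumption):

    [global]
        ms type = async
        enable experimental unrecoverable data corrupting features = ms-type-async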

RE: Discuss: New default recovery config settings

2015-05-29 Thread Somnath Roy
Sam, We are seeing some good client IO results during recovery by using the following values.. osd recovery max active = 1 osd max backfills = 1 osd recovery threads = 1 osd recovery op priority = 1 It is all flash though. The recovery time in case of entire node (~120 TB) failure/a single
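The same values as a ceph.conf fragment (these are OSD-side options, so the usual place for them is the [osd] section):

    [osd]
        # Values as reported above for an all-flash cluster; tune for your hardware.
        osd recovery max active = 1
        osd max backfills = 1
        osd recovery threads = 1
        osd recovery op priority = 1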

RE: CephFS + Erasure coding

2015-05-05 Thread Somnath Roy
Thanks Wang ! But, is this supported right now or coming with object stub implementation in Infernalis ? Regards Somnath -Original Message- From: Wang, Zhiqiang [mailto:zhiqiang.w...@intel.com] Sent: Tuesday, May 05, 2015 7:42 PM To: Somnath Roy; Gregory Farnum Cc: ceph-devel Subject

RE: Hitting tcmalloc bug even with patch applied

2015-04-27 Thread Somnath Roy
) not to resolve the traces. Thanks Regards Somnath -Original Message- From: Milosz Tanski [mailto:mil...@adfin.com] Sent: Monday, April 27, 2015 7:53 AM To: Alexandre DERUMIER; ceph-devel; Somnath Roy Subject: Re: Hitting tcmalloc bug even with patch applied On 4/27/15 9:21 AM, Alexandre

RE: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

2015-04-27 Thread Somnath Roy
...@lists.ceph.com, ceph-devel ceph-devel@vger.kernel.org, Somnath Roy somnath@sandisk.com, Milosz Tanski mil...@adfin.com Sent: Friday, 24 April 2015 20:02:15 Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops We haven't done any

RE: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

2015-04-27 Thread Somnath Roy
Somnath -Original Message- From: Mark Nelson [mailto:mnel...@redhat.com] Sent: Monday, April 27, 2015 10:42 AM To: Somnath Roy; Alexandre DERUMIER Cc: ceph-users; ceph-devel; Milosz Tanski Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance

RE: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

2015-04-23 Thread Somnath Roy
Alexandre, You can configure with --with-jemalloc or ./do_autogen -J to build ceph with jemalloc. Thanks Regards Somnath -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Alexandre DERUMIER Sent: Thursday, April 23, 2015 4:56 AM To: Mark

RE: Regarding newstore performance

2015-04-16 Thread Somnath Roy
...@vger.kernel.org] On Behalf Of Somnath Roy Sent: Wednesday, April 15, 2015 9:22 PM To: Chen, Xiaoxi; Haomai Wang Cc: ceph-devel Subject: RE: Regarding newstore performance Thanks Xiaoxi.. But, I have already initiated test by making db/ a symbolic link to another SSD..Will share the result soon. Regards

RE: tcmalloc issue

2015-04-16 Thread Somnath Roy
Thanks James ! We will try this out. Regards Somnath -Original Message- From: James Page [mailto:james.p...@ubuntu.com] Sent: Thursday, April 16, 2015 4:48 AM To: Chaitanya Huilgol; Somnath Roy; Sage Weil; ceph-maintain...@ceph.com Cc: ceph-devel@vger.kernel.org Subject: Re: tcmalloc

RE: Regarding newstore performance

2015-04-15 Thread Somnath Roy
- From: Haomai Wang [mailto:haomaiw...@gmail.com] Sent: Wednesday, April 15, 2015 5:23 AM To: Somnath Roy Cc: ceph-devel Subject: Re: Regarding newstore performance On Wed, Apr 15, 2015 at 2:01 PM, Somnath Roy somnath@sandisk.com wrote: Hi Sage/Mark, I did some WA experiment with newstore

RE: Regarding newstore performance

2015-04-15 Thread Somnath Roy
Thanks Xiaoxi.. But, I have already initiated test by making db/ a symbolic link to another SSD..Will share the result soon. Regards Somnath -Original Message- From: Chen, Xiaoxi [mailto:xiaoxi.c...@intel.com] Sent: Wednesday, April 15, 2015 6:48 PM To: Somnath Roy; Haomai Wang Cc

RE: Regarding newstore performance

2015-04-15 Thread Somnath Roy
/suggestion is much appreciated. Thanks Regards Somnath -Original Message- From: Somnath Roy Sent: Monday, April 13, 2015 4:54 PM To: ceph-devel Subject: Regarding newstore performance Sage, I was doing some preliminary performance testing of newstore on a single OSD (SSD) , single replication

Regarding newstore performance

2015-04-13 Thread Somnath Roy
Sage, I was doing some preliminary performance testing of newstore on a single OSD (SSD) , single replication setup. Here is my findings so far. Test: - 64K random writes with QD= 64 using fio_rbd. Results : -- 1. With all default settings, I am seeing very spiky
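A minimal fio job file matching the test described (rbd ioengine, 64K random writes, queue depth 64); the pool and image names are placeholders:

    [global]
    ioengine=rbd
    clientname=admin
    pool=rbd              ; placeholder pool
    rbdname=testimg       ; placeholder image
    rw=randwrite
    bs=64k
    iodepth=64
    direct=1
    invalidate=0

    [newstore-64k-randwrite]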

Preliminary RDMA vs TCP numbers

2015-04-08 Thread Somnath Roy
Hi, Please find the preliminary performance numbers of TCP Vs RDMA (XIO) implementation (on top of SSDs) in the following link. http://www.slideshare.net/somnathroy7568/ceph-on-rdma The attachment didn't go through it seems, so, I had to use slideshare. Mark, If we have time, I can present it

RE: Preliminary RDMA vs TCP numbers

2015-04-08 Thread Somnath Roy
I used the default TCP setting in Ubuntu 14.04. -Original Message- From: Andrey Korolyov [mailto:and...@xdel.ru] Sent: Wednesday, April 08, 2015 1:28 AM To: Somnath Roy Cc: ceph-us...@lists.ceph.com; ceph-devel Subject: Re: Preliminary RDMA vs TCP numbers On Wed, Apr 8, 2015 at 11:17 AM
