puzzled with the design pattern of ceph journal, really ruining performance

2014-09-17 Thread 姚宁
... on pg_log and neglect the ceph journal? It would save a lot of bandwidth, and, based on the consistent pg_log epoch, we can always recover data from a peering OSD, right? But this will mean recovering more objects if the OSD crashes. Nicheal

RE: puzzled with the design pattern of ceph journal, really ruining performance

2014-09-17 Thread Somnath Roy
Hi Nicheal, not only recovery: IMHO the main purpose of the ceph journal is to support transaction semantics, since XFS doesn't have them. I guess that can't be achieved with pg_log/pg_info. Thanks & Regards, Somnath
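The transaction semantics in question are the classic write-ahead pattern: make the whole update durable with one synchronous journal append before touching the backing filesystem, so a crash mid-apply can be replayed from the journal. A minimal C++ sketch of the idea, not Ceph's actual FileJournal code (the entry layout and every name here are hypothetical):

    // Minimal write-ahead journal sketch (hypothetical, not FileJournal).
    // An entry is durable (O_DSYNC) before the caller applies the update to
    // the backing filesystem, so a crash mid-apply can be replayed.
    #include <cstdint>
    #include <string>
    #include <fcntl.h>
    #include <unistd.h>

    struct EntryHeader {
        uint64_t seq;   // monotonically increasing transaction id
        uint32_t len;   // payload length in bytes
    };

    class SimpleJournal {
        int fd_;
    public:
        explicit SimpleJournal(const char* path)
            : fd_(::open(path, O_WRONLY | O_APPEND | O_CREAT | O_DSYNC, 0644)) {}
        ~SimpleJournal() { if (fd_ >= 0) ::close(fd_); }

        // Returns only once the entry is on stable storage; the caller may
        // then apply the transaction to the backing filesystem (e.g. XFS).
        bool log(uint64_t seq, const std::string& payload) {
            EntryHeader h{seq, static_cast<uint32_t>(payload.size())};
            if (::write(fd_, &h, sizeof(h)) != (ssize_t)sizeof(h)) return false;
            return ::write(fd_, payload.data(), h.len) == (ssize_t)h.len;
        }
    };

On replay, a reader would walk the entries in seq order and reapply any transaction newer than what the filesystem already reflects.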

RE: puzzled with the design pattern of ceph journal, really ruining performance

2014-09-17 Thread Chen, Xiaoxi
> Hi Nicheal, not only recovery: IMHO the main purpose of the ceph journal is to support transaction semantics, since XFS doesn't have them. I guess that can't be achieved with pg_log/pg_info.

Re: puzzled with the design pattern of ceph journal, really ruining performance

2014-09-17 Thread Alexandre DERUMIER
- Original Message - From: Xiaoxi Chen xiaoxi.c...@intel.com To: Somnath Roy somnath@sandisk.com, 姚宁 zay11...@gmail.com, ceph-devel@vger.kernel.org Sent: Wednesday, 17 September 2014 09:59:37 Subject: RE: puzzled with the design pattern of ceph journal, really ruining performance. Hi Nicheal ...

Re: puzzled with the design pattern of ceph journal, really ruining performance

2014-09-17 Thread Mark Nelson
> Hi Nicheal, 1. The main purpose of the journal is to provide transaction semantics (prevent ...

Re: puzzled with the design pattern of ceph journal, really ruining performance

2014-09-17 Thread Alexandre DERUMIER
On 09/17/2014 09:20 AM, Alexandre DERUMIER wrote: > 2. Have you got any data to prove that O_DSYNC or fdatasync kills journal performance? In our previous test, the journal SSD (using a partition of an SSD as the journal for a particular OSD ...
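One way to get such data is a micro-benchmark that times synchronous 4k writes directly. A rough sketch (illustrative only; the target path and iteration count are assumptions, and it should point at a scratch file on the device under test):

    // Rough latency probe for O_DSYNC 4k writes (illustrative sketch).
    // Writes to the given path, so use a scratch file on the test device.
    #include <chrono>
    #include <cstdio>
    #include <cstdlib>
    #include <cstring>
    #include <fcntl.h>
    #include <unistd.h>

    int main(int argc, char** argv) {
        const char* path = argc > 1 ? argv[1] : "/tmp/journal-probe";
        int fd = open(path, O_WRONLY | O_CREAT | O_DSYNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        void* buf;                               // 4 KiB, page-aligned
        if (posix_memalign(&buf, 4096, 4096) != 0) return 1;
        memset(buf, 0xab, 4096);

        const int iters = 1000;
        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < iters; ++i)
            if (write(fd, buf, 4096) != 4096) { perror("write"); return 1; }
        auto t1 = std::chrono::steady_clock::now();

        double us = std::chrono::duration<double, std::micro>(t1 - t0).count();
        printf("avg O_DSYNC 4k write latency: %.1f us\n", us / iters);
        close(fd);
        free(buf);
        return 0;
    }

Running the same loop without O_DSYNC (or with an fdatasync() after each write) shows how much the sync itself costs on a given SSD.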

RE: puzzled with the design pattern of ceph journal, really ruining performance

2014-09-17 Thread Chen, Xiaoxi
From: ceph-devel-ow...@vger.kernel.org On Behalf Of Alexandre DERUMIER. Sent: Thursday, September 18, 2014 5:13 AM. To: Mark Nelson. Cc: Somnath Roy; 姚宁; ceph-devel@vger.kernel.org; Chen, Xiaoxi. Subject: Re: puzzled with the design pattern of ceph journal, really ruining performance

Re: puzzled with the design pattern of ceph journal, really ruining performance

2014-09-17 Thread Mark Nelson
FWIW, the journal will coalesce writes quickly when there are many concurrent 4k client writes. Once you hit around eight 4k IOs per OSD, the journal will start coalescing ...
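Coalescing means the journal's writer thread drains everything queued in one pass and issues it as a single larger sequential write, so eight concurrent 4k client writes can become one 32k journal write. A hypothetical sketch of the pattern (names and structure are illustrative, not FileJournal's actual code):

    // Sketch of journal write coalescing (hypothetical, illustrative).
    #include <condition_variable>
    #include <deque>
    #include <mutex>
    #include <string>

    class CoalescingJournal {
        std::mutex lock_;
        std::condition_variable cond_;
        std::deque<std::string> queue_;   // pending client writes (e.g. 4k each)
        bool stop_ = false;
    public:
        void submit(std::string entry) {
            std::lock_guard<std::mutex> l(lock_);
            queue_.push_back(std::move(entry));
            cond_.notify_one();
        }

        // Writer thread body: each pass grabs *everything* queued, so many
        // small concurrent writes leave as one contiguous journal write.
        void write_thread_entry() {
            for (;;) {
                std::deque<std::string> batch;
                {
                    std::unique_lock<std::mutex> l(lock_);
                    cond_.wait(l, [&]{ return stop_ || !queue_.empty(); });
                    if (stop_ && queue_.empty()) return;
                    batch.swap(queue_);
                }
                std::string buf;
                for (auto& e : batch) buf += e;  // one contiguous buffer
                // issue a single (a)io write of buf to the journal here
            }
        }

        void stop() {
            std::lock_guard<std::mutex> l(lock_);
            stop_ = true;
            cond_.notify_all();
        }
    };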

RE: Deadlock in ceph journal

2014-08-24 Thread Ma, Jianpeng
... don't use aio when closing the journal, and use fsync() instead; it makes the code simpler. How about this? Thanks! Jianpeng. On 23/08/14 10:22, Somnath Roy wrote: > I think it is using direct io for non-aio mode as well. Thanks

RE: Deadlock in ceph journal

2014-08-24 Thread Sage Weil
... () on write_thread_entry), and the result looks good. I want to revert the patch: don't use aio when closing the journal, and use fsync() instead; it makes the code simpler. How about this? Thanks! Jianpeng. On 23/08 ...

RE: Deadlock in ceph journal

2014-08-24 Thread Ma, Jianpeng
Sounds good. Can you send a patch? sage. On Mon, 25 Aug 2014, Ma, Jianpeng wrote: > Hi all, over the weekend I read the kernel code around aio. For close(), it doesn't wait for aio to complete, but for fsync ...
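That kernel behavior, as reported in the thread, is the rationale for the fsync() approach: close() does not wait for in-flight aio, while fsync() blocks until the data is stable. A minimal sketch of a shutdown path built on that assumption (hypothetical code, not the actual patch):

    // Journal close without aio (illustrative). The final write goes through
    // plain write(), and fsync() blocks until it is durable; close() alone
    // would give no such guarantee for previously submitted aio.
    #include <cstdio>
    #include <unistd.h>

    bool close_journal(int fd, const void* tail, size_t len) {
        if (write(fd, tail, len) != (ssize_t)len) { perror("write"); return false; }
        if (fsync(fd) != 0) { perror("fsync"); return false; }
        return close(fd) == 0;
    }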

Re: Deadlock in ceph journal

2014-08-22 Thread Mark Kirkwood
On 22/08/14 12:49, Sage Weil wrote: > On Fri, 22 Aug 2014, Mark Kirkwood wrote: >> On 22/08/14 03:23, Sage Weil wrote: >>> I've pushed the patch to wip-filejournal. Mark, can you test please? >> I've tested wip-filejournal and it looks good (25 test runs, good journal header each time). > Thanks!

RE: Deadlock in ceph journal

2014-08-22 Thread Somnath Roy
On 22/08/14 12:49, Sage Weil wrote: > I've pushed the patch to wip-filejournal. Mark, can you test please? I've tested wip-filejournal and it looks good (25 test runs ...

RE: Deadlock in ceph journal

2014-08-22 Thread Sage Weil
Sent: ..., August 22, 2014 3:19 PM. To: Sage Weil. Cc: Ma, Jianpeng; Somnath Roy; Samuel Just (sam.j...@inktank.com); ceph-devel@vger.kernel.org. Subject: Re: Deadlock in ceph journal. On 22/08/14 12:49, Sage Weil wrote: > On Fri, 22 Aug 2014, Mark Kirkwood wrote: >> On 22/08/14 03:23, Sage Weil wrote: >>> I've ...

Re: Deadlock in ceph journal

2014-08-22 Thread Mark Kirkwood
On 23/08/14 10:22, Somnath Roy wrote: > I think it is using direct io for non-aio mode as well. Thanks & Regards, Somnath. One thing that still concerns me, if I understand what is happening here correctly: we write to the journal using aio until we want to stop doing writes (presumably ...

Re: Deadlock in ceph journal

2014-08-21 Thread Mark Kirkwood
Will do. On 21/08/14 19:30, Ma, Jianpeng wrote: > Mark, after Sage merges this into wip-filejournal, can you test again? I think at present only you can do this work!

RE: Deadlock in ceph journal

2014-08-21 Thread Sage Weil
On Thu, 21 Aug 2014, Ma, Jianpeng wrote: > Yes, maybe for io_submit we must use io_getevents; otherwise the result is undefined. If stop_write == true, we don't use aio. How about this way? That seems reasonable, now that I understand why it doesn't work ...
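A sketch of that guard, assuming a FileJournal-style writer: once stop_write is set, entries bypass aio and go through a synchronous pwrite(), so nothing is left in flight for the close path to abandon (illustrative code built around the stop_write name from the mail, not the actual patch):

    // Illustrative "no aio once stop_write is set" guard; compile with -laio.
    #include <libaio.h>
    #include <unistd.h>
    #include <atomic>

    struct JournalWriter {
        io_context_t ioctx = 0;
        int fd = -1;
        std::atomic<bool> stop_write{false};

        // buf must stay valid until the aio op completes and is reaped.
        bool write_entry(void* buf, size_t len, off_t off) {
            if (!stop_write.load()) {
                // Normal path: queue the write via kernel aio.
                struct iocb cb;
                struct iocb* cbs[1] = { &cb };
                io_prep_pwrite(&cb, fd, buf, len, off);
                return io_submit(ioctx, 1, cbs) == 1;
            }
            // Shutdown path: synchronous write; nothing in flight that
            // closing the journal could abandon.
            return pwrite(fd, buf, len, off) == (ssize_t)len;
        }
    };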

Re: Deadlock in ceph journal

2014-08-21 Thread Mark Kirkwood
On 22/08/14 03:23, Sage Weil wrote: > I've pushed the patch to wip-filejournal. Mark, can you test please? I've tested wip-filejournal and it looks good (25 test runs, good journal header each time). Cheers, Mark

Re: Deadlock in ceph journal

2014-08-21 Thread Sage Weil
On Fri, 22 Aug 2014, Mark Kirkwood wrote: > On 22/08/14 03:23, Sage Weil wrote: >> I've pushed the patch to wip-filejournal. Mark, can you test please? > I've tested wip-filejournal and it looks good (25 test runs, good journal header each time). Thanks! Merged. sage

RE: Deadlock in ceph journal

2014-08-20 Thread Sage Weil
On Wed, 20 Aug 2014, Somnath Roy wrote: > Thanks Sage! So the latest master should have the fix, right? The original patch that caused the regression is reverted, but we'd like to reapply it if we sort out the issues. wip ...

RE: Deadlock in ceph journal

2014-08-20 Thread Ma, Jianpeng
> I suspect what is really needed is a drain_aio() function that will wait for all pending aio ops to complete on shutdown. What happens to those IOs if the process ...
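drain_aio() does not exist in the tree; it is what the thread is proposing. A hypothetical libaio sketch of it, blocking in io_getevents() until every submitted op has been reaped:

    // Hypothetical drain_aio(): wait out all pending kernel aio ops before
    // shutdown. 'pending' counts iocbs submitted but not yet reaped.
    #include <libaio.h>

    int drain_aio(io_context_t ioctx, int pending) {
        struct io_event events[16];
        while (pending > 0) {
            // min_nr = 1: block until at least one completion arrives.
            int n = io_getevents(ioctx, 1, 16, events, nullptr);
            if (n < 0) return n;   // negative errno from the kernel
            pending -= n;
            // a real implementation would also check each events[i].res
        }
        return 0;
    }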

RE: Deadlock in ceph journal

2014-08-20 Thread Sage Weil
I suspect what is really needed is a drain_aio() function that will wait for all pending aio ops to complete on shutdown. What ...

RE: Deadlock in ceph journal

2014-08-19 Thread Sage Weil
> Hi Sage/Sam, during our testing we found a potential deadlock scenario in the filestore journal code base. This is happening for two reasons. 1. The code is not signaling aio_cond from ...
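The shape being described is the classic lost wakeup: a thread waits on a condition variable that some completion path forgets to signal. A generic sketch of the pattern (aio_cond is the name from the report; the surrounding code is illustrative, not FileJournal's):

    // Lost-wakeup sketch around a condvar like aio_cond (illustrative).
    #include <condition_variable>
    #include <mutex>

    std::mutex aio_lock;
    std::condition_variable aio_cond;
    int aio_num = 0;   // in-flight aio ops

    void aio_completed() {
        std::lock_guard<std::mutex> l(aio_lock);
        --aio_num;
        aio_cond.notify_all();   // skipping this on *any* completion path
                                 // leaves the waiter blocked forever
    }

    void wait_for_aio_drain() {
        std::unique_lock<std::mutex> l(aio_lock);
        aio_cond.wait(l, []{ return aio_num == 0; });
    }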

RE: Deadlock in ceph journal

2014-08-19 Thread Somnath Roy
> [Copying ceph-devel, dropping ceph-users] Yeah, that looks like a bug. I pushed wip-filejournal, which reapplies Jianpeng's original patch and this one. I'm not certain about the other suggested fix, though, but I'm hoping that this fix ...

RE: Deadlock in ceph journal

2014-08-19 Thread Somnath Roy
Thanks Sage! So the latest master should have the fix, right? Regards, Somnath

RE: Deadlock in ceph journal

2014-08-19 Thread Sage Weil
[Copying ceph-devel, dropping ceph-users] Yeah, that looks like a bug. I pushed wip-filejournal, which reapplies Jianpeng's original patch and this one. I'm not certain about the other ...

RE: Deadlock in ceph journal

2014-08-19 Thread Somnath Roy
On Wed, 20 Aug 2014, Somnath Roy wrote: > Thanks Sage! So the latest master should have the fix, right? The original patch that caused the regression is reverted, but we'd like to reapply it if we sort out the issues. wip-filejournal has the offending ...

Re: Deadlock in ceph journal

2014-08-19 Thread Mark Kirkwood
Not yet. If you have to use master, either revert commit 4eb18dd487da4cb621dcbecfc475fc0871b356ac or apply the patch fixing the hang mentioned here: https://github.com/ceph/ceph/pull/2185. Otherwise you could use the wip-filejournal branch which Sage has just added! Cheers, Mark

Re: Deadlock in ceph journal

2014-08-19 Thread Mark Kirkwood
Sorry, I see that Sage has reverted it. On 20/08/14 16:58, Mark Kirkwood wrote: > Not yet. If you have to use master, either revert commit 4eb18dd487da4cb621dcbecfc475fc0871b356ac or apply the patch fixing the hang mentioned here: https://github.com/ceph/ceph/pull/2185

Re: Ceph journal

2012-11-03 Thread Gregory Farnum
On Thu, Nov 1, 2012 at 10:33 PM, Gandalf Corvotempesta gandalf.corvotempe...@gmail.com wrote: > 2012/11/1 Mark Nelson mark.nel...@inktank.com: >> It will do that for a while, based on how you've tweaked the flush intervals and the various journal settings that determine how much data ceph will allow to ...
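Those knobs live in ceph.conf; a hedged example using the FileStore-era option names (the values are illustrative, not recommendations):

    [osd]
        ; how long dirty data may sit before the filestore syncs it out
        filestore min sync interval = 0.01
        filestore max sync interval = 5
        ; caps on how much a single journal write may batch up
        journal max write bytes = 10485760
        journal max write entries = 100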

Re: Ceph journal

2012-11-01 Thread Mark Nelson
On 11/01/2012 04:18 PM, Gandalf Corvotempesta wrote: > 2012/10/31 Stefan Kleijkers ste...@kleijkers.nl: >> As far as I know, this is correct. You get an ACK (on the write) back after it has landed on ALL three journals (and/or OSDs, in the case of btrfs in parallel mode). So if you lose one node, you still ...

Re: Ceph journal

2012-10-31 Thread Tren Blackburn
On Wed, Oct 31, 2012 at 2:18 PM, Gandalf Corvotempesta gandalf.corvotempe...@gmail.com wrote: > In a multi-replica cluster (for example, replica = 3), is it safe to put the journal on tmpfs? As far as I understood, with the journal enabled all writes go to the journal first and are written to disk later.

Re: Ceph journal

2012-10-31 Thread Stefan Kleijkers
Hello. On 10/31/2012 10:24 PM, Tren Blackburn wrote: > On Wed, Oct 31, 2012 at 2:18 PM, Gandalf Corvotempesta gandalf.corvotempe...@gmail.com wrote: >> In a multi-replica cluster (for example, replica = 3), is it safe to put the journal on tmpfs? As far as I understood, with the journal enabled all writes are ...

Re: Ceph journal

2012-10-31 Thread Sage Weil
On Wed, 31 Oct 2012, Tren Blackburn wrote: > On Wed, Oct 31, 2012 at 2:18 PM, Gandalf Corvotempesta gandalf.corvotempe...@gmail.com wrote: >> In a multi-replica cluster (for example, replica = 3), is it safe to put the journal on tmpfs? As far as I understood, with the journal enabled all writes are written ...

Re: Ceph journal

2012-10-31 Thread Stefan Kleijkers
Hello. On 10/31/2012 10:58 PM, Gandalf Corvotempesta wrote: > 2012/10/31 Tren Blackburn t...@eotnetworks.com: >> Unless you're using btrfs, which writes to the journal and the OSD fs concurrently, if you lose the journal device (such as due to a reboot), you've lost the OSD device, requiring it to be ...

Re: Ceph journal

2012-10-31 Thread Sébastien Han
Hi, personally I wouldn't take the risk of losing transactions. If a client writes into a journal, assuming it's the first write, and the server crashes for whatever reason, you have a high risk of inconsistent data, because you have just lost what was in the journal. Tmpfs is the cheapest solution for ...