g security and moving orphan inode to reflink destination.
> Use the __tracker variant while taking inode lock to avoid recursive
> locking in the ocfs2_init_security_and_acl() call chain.
>
> Signed-off-by: Ashish Samant
Reviewed-by: Junxiao Bi
>
> V1->V2:
> Modify commit message
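A minimal sketch of the lock-tracker pattern being discussed. The names (my_lock_holder, inode_lock_tracker, inode_unlock_tracker, init_security_and_acl) are illustrative stand-ins, not the exact dlmglue/xattr API: the point is that the outer caller records itself as the lock holder, so the nested security/ACL init path can detect the already-held inode lock instead of deadlocking on it.

#include <linux/fs.h>
#include <linux/list.h>
#include <linux/sched.h>

/* Hypothetical stand-ins for the __tracker helpers and the init call chain. */
struct my_lock_holder {
        struct list_head list;
        struct task_struct *owner;
};

int  inode_lock_tracker(struct inode *inode, int ex, struct my_lock_holder *oh);
void inode_unlock_tracker(struct inode *inode, int ex,
                          struct my_lock_holder *oh, int had_lock);
int  init_security_and_acl(struct inode *dir, struct inode *inode);

static int reflink_target_init_sketch(struct inode *dir, struct inode *inode)
{
        struct my_lock_holder oh;
        int had_lock, ret;

        /* > 0: this task already held the lock; 0: freshly taken; < 0: error */
        had_lock = inode_lock_tracker(inode, 1 /* EX */, &oh);
        if (had_lock < 0)
                return had_lock;

        /* this call chain may ask for the same inode lock again; the tracker
         * sees the recorded holder instead of blocking on itself */
        ret = init_security_and_acl(dir, inode);

        inode_unlock_tracker(inode, 1 /* EX */, &oh, had_lock);
        return ret;
}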
On 12/27/2017 08:21 PM, Changwei Ge wrote:
> Hi Junxiao,
>
> On 2017/12/27 18:02, Junxiao Bi wrote:
>> Hi Changwei,
>>
>>
>> On 12/26/2017 03:55 PM, Changwei Ge wrote:
>>> A crash issue was reported by John.
>>> The call trace f
Hi Changwei,
On 12/26/2017 03:55 PM, Changwei Ge wrote:
> A crash issue was reported by John.
> The call trace follows:
> ocfs2_split_extent+0x1ad3/0x1b40 [ocfs2]
> ocfs2_change_extent_flag+0x33a/0x470 [ocfs2]
> ocfs2_mark_extent_written+0x172/0x220 [ocfs2]
> ocfs2_dio_end_io+0x62d/0x910 [ocfs2]
On 12/27/2017 03:46 PM, Changwei Ge wrote:
> Hi Junxiao,
>
> On 2017/12/27 15:35, Junxiao Bi wrote:
>> Hi Changwei,
>>
>> On 12/26/2017 05:20 PM, Changwei Ge wrote:
>>> Hi Alex
>>>
>>> On 2017/12/26 16:20, alex chen wrote:
>>>
Hi Changwei,
On 12/26/2017 05:20 PM, Changwei Ge wrote:
> Hi Alex
>
> On 2017/12/26 16:20, alex chen wrote:
>> Hi Changwei,
>>
>> On 2017/12/26 15:03, Changwei Ge wrote:
>>> The intention of this patch is to give ocfs2 users an option to choose whether
>>> to allocate disk space while doing dio write
On 12/19/2017 05:11 PM, Changwei Ge wrote:
> Hi Junxiao,
>
> On 2017/12/19 16:15, Junxiao Bi wrote:
>> Hi Changwei,
>>
>> On 12/19/2017 02:02 PM, Changwei Ge wrote:
>>> On 2017/12/19 11:41, piaojun wrote:
>>>> Hi Changwei,
Hi Changwei,
On 12/19/2017 02:02 PM, Changwei Ge wrote:
> On 2017/12/19 11:41, piaojun wrote:
>> Hi Changwei,
>>
>> On 2017/12/19 11:05, Changwei Ge wrote:
>>> Hi Jun,
>>>
>>> On 2017/12/19 9:48, piaojun wrote:
Hi Changwei,
On 2017/12/18 20:06, Changwei Ge wrote:
> Before ocfs2
Hi Dmitry,
Please wait for our new kernel; we will drop this issue and backport the
upstream commit c25a1e0671fb ("ocfs2: fix posix_acl_create deadlock") to
fix it.
Thanks,
Junxiao.
On 10/23/2017 11:57 PM, Zhen Ren wrote:
> Hi,
>
> From the backtrace below, it seems very much like the issue fixed by
On 10/23/2017 02:51 PM, Eric Ren wrote:
> Hi,
>
> On 10/18/2017 12:44 PM, Junxiao Bi wrote:
>> On 10/18/2017 12:41 PM, Gang He wrote:
>>> Hi Junxiao,
>>>
>>> The problem looks easy to reproduce?
>>> Could you share the trigger script/code for th
th_openat at 8121b112
>> #16 [88008e393df0] do_filp_open at 8121b53a
>> #17 [88008e393ed0] do_sys_open at 81209a5a
>> #18 [88008e393f40] sys_open at 81209bae
>> #19 [88008e393f50] system_call_fastpath at 816e902e
>>
at 81209bae
#19 [88008e393f50] system_call_fastpath at 816e902e
The inode lock is taken by ocfs2_mknod() before calling into posix_acl_create().
Signed-off-by: Junxiao Bi
Cc:
---
fs/ocfs2/namei.c | 14 --
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a
a few blocks after the desired start block and the range can cross
> over into the next cluster group and zero out the group descriptor there.
> This can cause filesystem corruption that cannot be fixed by fsck.
>
> Signed-off-by: Ashish Samant
> Cc: sta...@vger.kernel.org
Looks good.
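To make the corruption mode concrete, here is a hedged sketch (hypothetical helper, not the patch itself): when the start of a partial-cluster zeroing range gets rounded up by a few blocks, the length must be clamped so the range never crosses into the next cluster group and overwrites that group's descriptor.

#include <linux/types.h>

/* Illustration only: clamp a zeroing range to the cluster group that
 * contains start_blk, so (start_blk + len) can never reach the next
 * group's descriptor block. */
static u64 clamp_len_to_group_sketch(u64 start_blk, u64 len,
                                     u64 group_first_blk, u64 blocks_per_group)
{
        u64 group_end_blk = group_first_blk + blocks_per_group;

        if (start_blk + len > group_end_blk)
                len = group_end_blk - start_blk;
        return len;
}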
On 10/12/2017 02:37 PM, Gang He wrote:
> Hello list,
>
> We got an o2cb DLM problem from a customer who is using the o2cb stack for an
> OCFS2 file system on SLES12SP1 (3.12.49-11-default).
> The problem description is as below,
>
> Customer has three node oracle rack cluster
> gal7gblr2084
> gal7gb
On 08/10/2017 06:49 PM, Changwei Ge wrote:
> Hi Joseph,
>
>
> On 2017/8/10 17:53, Joseph Qi wrote:
>> Hi Changwei,
>>
>> On 17/8/9 23:24, ge changwei wrote:
>>> Hi
>>>
>>>
>>> On 2017/8/9 7:32 PM, Joseph Qi wrote:
Hi,
On 17/8/7 15:13, Changwei Ge wrote:
> Hi,
>
> In curr
On 03/29/2017 12:01 PM, Joseph Qi wrote:
>
>
> On 17/3/29 09:07, Junxiao Bi wrote:
>> On 03/29/2017 06:31 AM, Andrew Morton wrote:
>>> On Tue, 28 Mar 2017 09:40:45 +0800 Junxiao Bi
>>> wrote:
>>>
>>>> Configfs is the interface for ocfs2-t
Hi Andrew,
On 03/29/2017 11:31 AM, Andrew Morton wrote:
> On Wed, 29 Mar 2017 09:07:08 +0800 Junxiao Bi wrote:
>
>> On 03/29/2017 06:31 AM, Andrew Morton wrote:
>>> On Tue, 28 Mar 2017 09:40:45 +0800 Junxiao Bi wrote:
>>>
>>>> Configfs is the int
On 03/29/2017 06:31 AM, Andrew Morton wrote:
> On Tue, 28 Mar 2017 09:40:45 +0800 Junxiao Bi wrote:
>
>> Configfs is the interface ocfs2-tools uses to pass configuration to the
>> kernel. Changing the heartbeat dead threshold name in configfs will
>> cause a compatibility issue, s
Configfs is the interface ocfs2-tools uses to pass configuration to the
kernel. Changing the heartbeat dead threshold name in configfs causes a
compatibility issue, so revert it.
Fixes: 45b997737a80 ("ocfs2/cluster: use per-attribute show and store methods")
Signed-off-by: Junxiao Bi
---
fs/ocf
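As a reminder of why the name matters, a hedged illustration (simplified, not the real o2hb attribute plumbing): the .ca_name of a configfs attribute becomes the file name user space sees, so ocfs2-tools writing a path such as /sys/kernel/config/cluster/<name>/heartbeat/dead_threshold silently breaks if the kernel side renames the attribute.

#include <linux/configfs.h>
#include <linux/module.h>
#include <linux/stat.h>

/* Illustration only; the real code uses per-attribute show/store wrappers.
 * The attribute name is user-space ABI, so it must not change. */
static struct configfs_attribute dead_threshold_attr_sketch = {
        .ca_owner = THIS_MODULE,
        .ca_name  = "dead_threshold",   /* the configfs file name ocfs2-tools writes */
        .ca_mode  = S_IRUGO | S_IWUSR,
};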
On 12/13/2016 01:29 PM, Eric Ren wrote:
> Only check the kernel source if we specify the "buildkernel" test case.
> The original kernel source web link cannot be reached,
> so give a new link instead, but the md5sum check is missing
> now.
>
> Signed-off-by: Eric Ren
> ---
> programs/python_common/single
On 12/13/2016 01:29 PM, Eric Ren wrote:
> Signed-off-by: Eric Ren
Reviewed-by: Junxiao Bi
> ---
> programs/python_common/multiple_run.sh | 2 +-
> programs/python_common/single_run-WIP.sh | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/p
e, including permission check, (get|set)_(acl|attr), and
> the gfs2 code also does so.
>
> Changes since v1:
> - Let ocfs2_is_locked_by_me() just return true/false to indicate if the
> process gets the cluster lock - suggested by: Joseph Qi
> and Junxiao Bi .
>
> - Chang
he previous patch) for
> these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
> ocfs2_setattr().
>
> Changes since v1:
> - Let ocfs2_is_locked_by_me() just return true/false to indicate if the
> process gets the cluster lock - suggested by: Joseph Qi
> and Junxiao
he previous patch) for
> these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
> ocfs2_setattr().
>
> Changes since v1:
> 1. Let ocfs2_is_locked_by_me() just return true/false to indicate if the
> process gets the cluster lock - suggested by: Joseph Qi
> and Junx
On 01/16/2017 11:06 AM, Eric Ren wrote:
> Hi Junxiao,
>
> On 01/16/2017 10:46 AM, Junxiao Bi wrote:
>>>> If had_lock==true, is it a bug? I think we should BUG_ON it, as that
>>>> can help us catch the bug the first time it happens.
>>> Good idea! But I'm not
On 01/13/2017 02:19 PM, Eric Ren wrote:
> Hi!
>
> On 01/13/2017 12:22 PM, Junxiao Bi wrote:
>> On 01/05/2017 11:31 PM, Eric Ren wrote:
>>> Commit 743b5f1434f5 ("ocfs2: take inode lock in
>>> ocfs2_iop_set/get_acl()")
>>> results in a de
On 01/13/2017 02:12 PM, Eric Ren wrote:
> Hi Junxiao!
>
> On 01/13/2017 11:59 AM, Junxiao Bi wrote:
>> On 01/05/2017 11:31 PM, Eric Ren wrote:
>>> We are in the situation that we have to avoid recursive cluster locking,
>>> but there is no way to check if a
On 01/05/2017 11:31 PM, Eric Ren wrote:
> Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
> results in a deadlock, as the author "Tariq Saeed" realized shortly
> after the patch was merged. The discussion happened here
> (https://oss.oracle.com/pipermail/ocfs2-devel/2015-S
On 01/05/2017 11:31 PM, Eric Ren wrote:
> We are in the situation that we have to avoid recursive cluster locking,
> but there is no way to check if a cluster lock has been taken by a
> process already.
>
> Mostly, we can avoid recursive locking by writing code carefully.
> However, we found that
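The per-lockres holder idea discussed in this thread can be pictured with a rough sketch (names illustrative; real code would walk the list under the lockres spinlock): each task that takes a cluster lock registers a holder record, so a nested caller can ask "do I already hold this lockres?" instead of re-taking it and deadlocking.

#include <linux/list.h>
#include <linux/sched.h>

struct holder_sketch {
        struct list_head list;
        struct task_struct *owner;
};

/* Returns true if the current task already holds the lock resource whose
 * holder list is passed in (locking of the list itself omitted here). */
static bool locked_by_me_sketch(struct list_head *holders)
{
        struct holder_sketch *h;

        list_for_each_entry(h, holders, list)
                if (h->owner == current)
                        return true;
        return false;
}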
> On Dec 30, 2016, at 3:12 PM, Eric Ren wrote:
>
> Hi Junxiao,
>
> On 12/30/2016 10:44 AM, Junxiao Bi wrote:
>> Hi Guys,
>>
>> I just done ocfs2-test single/multiple/discontig test on linux
>> next-20161223, all test passed. Thank you for your effort to make th
Hi Guys,
I just done ocfs2-test single/multiple/discontig test on linux
next-20161223, all test passed. Thank you for your effort to make the
good quality.
Thanks,
Junxiao.
Hi Dan,
It will not cause a real issue. -EAGAIN can only be returned in the
__ocfs2_page_mkwrite() path where "locked_page" is NULL, so that
function will return VM_FAULT_NOPAGE before accessing "fsdata".
Thanks,
Junxiao.
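A hedged control-flow sketch of that argument (simplified, with hypothetical write_begin/write_end stand-ins rather than the real __ocfs2_page_mkwrite): the only path that can produce -EAGAIN is the locked_page == NULL one, and it returns VM_FAULT_NOPAGE before "fsdata" is ever read.

#include <linux/errno.h>
#include <linux/mm.h>

/* Hypothetical stand-ins for the write_begin/write_end steps. */
int sketch_write_begin(struct page *locked_page, void **fsdata);
int sketch_write_end(void *fsdata);

static int page_mkwrite_sketch(struct page *locked_page)
{
        void *fsdata;   /* only initialized when sketch_write_begin() succeeds */
        int err;

        err = sketch_write_begin(locked_page, &fsdata);
        if (err) {
                if (err == -EAGAIN) {
                        /* only hit when locked_page == NULL: retry the fault;
                         * fsdata is never dereferenced on this path */
                        return VM_FAULT_NOPAGE;
                }
                return VM_FAULT_SIGBUS;
        }
        return sketch_write_end(fsdata);
}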
On 11/17/2016 06:03 PM, Dan Carpenter wrote:
> On Thu, Nov 17, 2016 at 11:08:0
trace 91ac5312a6ee1288 ]---
[34377.618919] Kernel panic - not syncing: Fatal exception
[34377.619910] Kernel Offset: disabled
Signed-off-by: Junxiao Bi
---
fs/ocfs2/dir.c |2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
index ccd4dcfc3645..39f02b75aaf8 10064
Hi Eric,
> On Oct 19, 2016, at 1:19 PM, Eric Ren wrote:
>
> Hi all!
>
> Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
> results in another deadlock as we have discussed in the recent thread:
>https://oss.oracle.com/pipermail/ocfs2-devel/2016-October/012454.html
>
> Before
On 10/12/2016 06:54 PM, Eric Ren wrote:
> Hi,
>
> On 10/12/2016 05:45 PM, Junxiao Bi wrote:
>> On 10/12/2016 05:34 PM, Eric Ren wrote:
>>> Hi Junxiao,
>>>
>>> On 10/12/2016 02:47 PM, Junxiao Bi wrote:
>>>> On 10/12/2016 10:36 AM, Eric R
On 10/12/2016 05:34 PM, Eric Ren wrote:
> Hi Junxiao,
>
> On 10/12/2016 02:47 PM, Junxiao Bi wrote:
>> On 10/12/2016 10:36 AM, Eric Ren wrote:
>>> Hi,
>>>
>>> When backporting those patches, I find that they are already in our
>>> product kernel,
Hi all,
I just finished a full ocfs2 test(single/multiple/discontig) on
linux-next/next-20161006. All test case passed. That's a good sign of
quality. Thank you for your effort.
Thanks,
Junxiao.
On 10/12/2016 10:36 AM, Eric Ren wrote:
> Hi,
>
> When backporting those patches, I find that they are already in our
> product kernel, maybe
> via "stable kernel" policy, although our product kernel is 4.4 while the
> patches were merged
> into 4.6.
>
> Seems it's another deadlock that happens w
Hi Eric,
On 10/11/2016 10:42 AM, Eric Ren wrote:
> Hi Junxiao,
>
> As the subject, the testing hung there on a kernel without your patches:
>
> "ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang"
> and
> "ocfs2: fix posix_acl_create deadlock"
>
> The stack trace is:
> ```
> o
On 10/09/2016 04:47 PM, Gang He wrote:
> Hello Guys,
>
> If you use debugfs.ocfs2 to list the system files of an ocfs2 file system, you
> can find these two system files.
> sles12sp1-node1:/ # debugfs.ocfs2 /dev/sdb1
> debugfs.ocfs2 1.8.2
> debugfs: ls //
> 6 16 12 .
> 6
On 09/13/2016 10:04 AM, Joseph Qi wrote:
> Hi Junxiao,
>
> On 2016/9/12 18:03, Junxiao Bi wrote:
>> Every time, ocfs2_extend_trans() included a credit for the truncate log inode,
>> but as that inode had already been managed by the running jbd2 transaction the first time,
>> it will no
] [] ? syscall_trace_enter_phase1+0x153/0x180
[ 685.240467] [] SyS_unlinkat+0x22/0x40
[ 685.240468] [] system_call_fastpath+0x12/0x71
[ 685.240469] ---[ end trace a62437cb060baa71 ]---
[ 685.240470] JBD2: rm wants too many credits (149 > 128)
Signed-off-by: Junxiao Bi
---
fs/ocfs2/alloc.c |
128)
Signed-off-by: Junxiao Bi
---
fs/ocfs2/alloc.c | 29 ++---
1 file changed, 10 insertions(+), 19 deletions(-)
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 7dabbc31060e..51128789a661 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -5922,7 +5922,6 @@ b
On 08/30/2016 03:23 AM, Ashish Samant wrote:
> Hi Eric,
>
> The easiest way to reproduce this is :
>
> 1. Create a random file of say 10 MB
> xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
> 2. Reflink it
> reflink -f 10MBfile reflnktest
> 3. Punch a hole starting at a cluster boundary w
rated to the upgraded one. This
will cause an outage. Since the negotiate hb timeout behavior didn't
change without this commit, revert it.
Signed-off-by: Junxiao Bi
---
fs/ocfs2/cluster/tcp_internal.h |5 +
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/fs/ocf
On 07/09/2016 05:23 AM, Andrew Morton wrote:
> On Thu, 7 Jul 2016 10:24:48 +0800 Junxiao Bi wrote:
>
>> Journal replay will be run when doing recovery for a dead node;
>> to avoid the stale cache impact, all blocks of the dead node's
>> journal inode were reloaded from dis
doing recovery was
improved from 120s to 1s.
Signed-off-by: Junxiao Bi
---
fs/ocfs2/journal.c | 42 +++---
1 file changed, 23 insertions(+), 19 deletions(-)
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index e607419cdfa4..67179cf60525 100644
--- a/fs/
,
> Jiufei
>
> On 2016/6/17 17:28, Junxiao Bi wrote:
>> Journal replay will be run when doing recovery for a dead node;
>> to avoid the stale cache impact, all blocks of the dead node's
>> journal inode were reloaded from disk. This hurts performance;
>> check wheth
On 06/24/2016 06:13 AM, Andrew Morton wrote:
> On Thu, 23 Jun 2016 09:17:53 +0800 Junxiao Bi wrote:
>
>> Hi Andrew,
>>
>> Did you miss this patch to your tree?
>
> I would have seen it eventually. Explicitly cc'ing me on patches
> helps, please.
I see, w
Hi Andrew,
Did you miss this patch to your tree?
Thanks,
Junxiao.
On 06/17/2016 05:43 PM, Joseph Qi wrote:
> On 2016/6/17 17:28, Junxiao Bi wrote:
>> Journal replay will be run when doing recovery for a dead node;
>> to avoid the stale cache impact, all blocks of the dead node's
familiar with
> this part code,
> I want to know if there is any sync mechanism to make sure the block cache
> for another node journal file is really the latest data?
I don't see that being needed, because the stale info will not be used
except by journal replay.
Thanks,
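A hedged sketch of the optimization being debated (illustrative, not the actual journal.c change): skip the disk read for journal-inode blocks whose buffers are already cached and up to date, and fall back to a real read otherwise. The staleness worry above does not bite because these buffers are only consumed by journal replay.

#include <linux/buffer_head.h>
#include <linux/errno.h>
#include <linux/fs.h>

static int read_journal_block_sketch(struct super_block *sb, sector_t blkno,
                                     struct buffer_head **ret_bh)
{
        struct buffer_head *bh;

        /* reuse an already-cached, up-to-date buffer instead of re-reading */
        bh = __find_get_block(sb->s_bdev, blkno, sb->s_blocksize);
        if (bh && buffer_uptodate(bh)) {
                *ret_bh = bh;
                return 0;
        }
        brelse(bh);             /* brelse() tolerates NULL */

        bh = __bread(sb->s_bdev, blkno, sb->s_blocksize);       /* hit the disk */
        if (!bh)
                return -EIO;
        *ret_bh = bh;
        return 0;
}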
doing recovery was
improved from 120s to 1s.
Signed-off-by: Junxiao Bi
---
fs/ocfs2/journal.c | 41 ++---
1 file changed, 22 insertions(+), 19 deletions(-)
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index e607419cdfa4..bc0e21e8a674 100644
--- a/fs/
On 06/17/2016 04:32 PM, Joseph Qi wrote:
> On 2016/6/17 15:50, Junxiao Bi wrote:
>> Hi Joseph,
>>
>> On 06/17/2016 03:44 PM, Joseph Qi wrote:
>>> Hi Junxiao,
>>>
>>> On 2016/6/17 14:10, Junxiao Bi wrote:
>>>> Journal replay will be r
Hi Joseph,
On 06/17/2016 03:44 PM, Joseph Qi wrote:
> Hi Junxiao,
>
> On 2016/6/17 14:10, Junxiao Bi wrote:
>> Journal replay will be run when doing recovery for a dead node;
>> to avoid the stale cache impact, all blocks of the dead node's
>> journal inode were
doing recovery was
improved from 120s to 1s.
Signed-off-by: Junxiao Bi
---
fs/ocfs2/journal.c | 41 ++---
1 file changed, 22 insertions(+), 19 deletions(-)
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index e607419cdfa4..8b808afd5f82 100644
--- a/fs/
Two new messages are added to support negotiating the hb timeout. Stop
nodes talking the old protocol version from mounting, as they will cause the
negotiation to fail.
Signed-off-by: Junxiao Bi
---
fs/ocfs2/cluster/tcp_internal.h |5 -
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/ocfs2
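The compatibility gate can be pictured like this (constant names are illustrative, not the real tcp_internal.h values): the o2net handshake carries a protocol version, and bumping it makes nodes running the old, non-negotiating code fail the handshake at connect/mount time instead of joining a cluster they would break.

#include <linux/types.h>

/* Illustrative values only, not the real O2NET_PROTOCOL_VERSION. */
#define SKETCH_PROTO_VERSION_OLD        11ULL
#define SKETCH_PROTO_VERSION_NEGO       (SKETCH_PROTO_VERSION_OLD + 1)

/* o2net-style handshakes require strict equality, so nodes speaking the
 * old version are rejected before they can take part in negotiation. */
static bool handshake_version_ok_sketch(u64 remote_version)
{
        return remote_version == SKETCH_PROTO_VERSION_NEGO;
}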
On 05/20/2016 04:30 PM, Gang He wrote:
> Hello Joseph, Junxiao and All,
>
> I got some benchmark-related questions, but due to hardware limitations in our
> local lab, we do not have enough information to answer them.
> Could you help to look at the questions, if you ever did the related te
On 05/25/2016 06:35 AM, Mark Fasheh wrote:
> On Mon, May 23, 2016 at 02:50:28PM -0700, Andrew Morton wrote:
>> From: Junxiao Bi
>> Subject: ocfs2: o2hb: add negotiate timer
>
> Thank you for the well written patch description by the way.
>
>
>> This series of
On 05/10/2016 01:58 PM, Andrew Morton wrote:
> On Tue, 10 May 2016 12:53:41 +0800 Junxiao Bi wrote:
>
>> These two patches is to fix recursive locking deadlock issues. As we
>> talked with Mark before, recursive locking is not allowed in ocfs2,
>> so these two patches
Hi Tiger,
Did only those two processes report call traces from the two nodes? If so,
it looks a little different from my hang, which is a recursive cluster
lock. Anyway, I just posted the fix for my issue to the mailing
list; you can give it a try.
Thanks,
Junxiao.
On 05/09/2016 09:20 PM, 서정우 wro
verted. And the same deadlock happened in ocfs2_reflink.
This fix reverts back to using ocfs2_init_acl.
Fixes: 702e5bc68ad2 ("ocfs2: use generic posix ACL infrastructure")
Signed-off-by: Tariq Saeed
Signed-off-by: Junxiao Bi
---
fs/ocfs2/acl.c | 63 +++
e inode lock in ocfs2_iop_set/get_acl()")
Signed-off-by: Tariq Saeed
Signed-off-by: Junxiao Bi
---
fs/ocfs2/acl.c | 24
fs/ocfs2/acl.h |1 +
fs/ocfs2/file.c |4 ++--
3 files changed, 27 insertions(+), 2 deletions(-)
diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.
Hi,
These two patches fix recursive locking deadlock issues. As we
discussed with Mark before, recursive locking is not allowed in ocfs2,
so these two patches fix the deadlock issue by reverting the offending
patches to avoid recursive locking. Please review.
Thanks,
Junxiao.
"
>"heartbeart on region %s (%s)\n",
>config_item_name(&reg->hr_item),
>
> Thanks
> Changkuo
>
> From: ocfs2-devel-boun...@oss.oracle.com
> [mailto:ocfs2-devel-boun...@oss.oracle.com] on behalf of Junxiao Bi
> Sent: 2015
On 03/31/2016 10:56 AM, Gang He wrote:
> Hello Joseph and Junxiao,
>
> Did you encounter this issue in the past? I suspect this is possibly a race
> condition bug (rather than data inconsistency).
Never saw this. Did fsck report any corruption?
Thanks,
Junxiao.
>
> Thanks
> Gang
>
>
>> Hello G
Hi Yiwen,
On 03/26/2016 10:54 AM, jiangyiwen wrote:
> Hi, Junxiao
> This patch may have a problem. That is, the journal of every node becomes
> aborted when storage goes down, and then when storage comes back up, because
> the journal has been aborted, all metadata operations will fail. So how to
> restore the environment
This is the v1 version; I sent out the V2 patch set before to fix all the code style
issues.
On 03/24/2016 04:12 AM, a...@linux-foundation.org wrote:
> From: Junxiao Bi
> Subject: ocfs2: o2hb: add negotiate timer
>
> This series of patches is to fix the issue that when storage down, all
> n
0800c0
[ 254.792273] ---[ end trace 823969e602e4aaac ]---
Fixes: a4a1dfa4bb8b("ocfs2/cluster: fix memory leak in o2hb_region_release")
Signed-off-by: Junxiao Bi
---
fs/ocfs2/cluster/heartbeat.c |4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/ocfs2/cluster/heartb
On 03/15/2016 09:55 AM, Shichangkuo wrote:
> Hi all,
> When NodeA wants to unlock lock-res1, it sends a message to NodeB, but on
> NodeB none of the lock queues (granted, converting, blocked) can find this lock
> for some unknown reason, so NodeB replies DLM_IVLOCKID.
> In this situation, N
On 03/04/2016 08:47 AM, Shichangkuo wrote:
> Hi All,
>
> I have removed a file which was very important to me by mistake.
> Has anyone encountered a problem like this before, and could the file be
> undeleted?
Maybe you can rebuild it manually if the released clusters have not been reused.
Umount the volume
got
2. network between nodes down
3. nodes panic
---
Changes from V1:
- code style fix.
Junxiao Bi (6):
ocfs2: o2hb: add negotiate timer
ocfs2: o2hb: add NEGO_TIMEOUT message
ocfs2: o2hb: add NEGOTIATE_APPROVE message
ocfs2: o2hb: add some user/debug log
ocfs2: o2hb
hr_last_timeout_start should be set to the last time the hb was still OK.
When an hb write times out, the hung time will be (jiffies -
hr_last_timeout_start).
Signed-off-by: Junxiao Bi
Reviewed-by: Ryan Ding
Cc: Gang He
Cc: rwxybh
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Joseph Qi
Signed-off-by: Andrew
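The arithmetic in question, as a tiny sketch (field handling simplified, not the heartbeat.c change itself):

#include <linux/jiffies.h>

/* hr_last_timeout_start is refreshed after every successful hb write, so
 * when a write timeout fires, the hung time is measured from the last
 * moment the disk was known to be good. */
static unsigned int hb_hung_ms_sketch(unsigned long hr_last_timeout_start)
{
        return jiffies_to_msecs(jiffies - hr_last_timeout_start);
}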
Signed-off-by: Junxiao Bi
---
fs/ocfs2/cluster/heartbeat.c |8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 8ec85cac894e..023c72d35498 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster
. Any node that doesn't
receive this message, or hits some issue when handling it,
will be fenced. If storage comes back up at any time, o2hb_thread will run and
re-queue all the timers; nothing will be affected by these two steps.
Signed-off-by: Junxiao Bi
Reviewed-by: Ryan Ding
Cc:
This message is used to re-queue the write timeout timer and the negotiate timer
when all nodes suffer a write hang to storage; this keeps a node from fencing
itself if storage is down.
Signed-off-by: Junxiao Bi
Reviewed-by: Ryan Ding
Cc: Gang He
Cc: rwxybh
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Joseph Qi
Signed-off-by: Junxiao Bi
Reviewed-by: Ryan Ding
Cc: Gang He
Cc: rwxybh
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Joseph Qi
Signed-off-by: Andrew Morton
---
fs/ocfs2/cluster/heartbeat.c | 39 ---
1 file changed, 32 insertions(+), 7 deletions(-)
diff --git
This message is sent to the master node when a non-master node's negotiate
timer expires. The master node records these nodes in a bitmap, which is used
to make the write timeout timer re-queue decision.
Signed-off-by: Junxiao Bi
Reviewed-by: Ryan Ding
Cc: Gang He
Cc: rwxybh
Cc: Mark Fasheh
Cc: Joel B
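Roughly, the master-side bookkeeping looks like the following sketch (names and the node-count constant are illustrative, not the o2hb code): a bit is set for every node whose negotiate timer fired, and only when every live node has reported does the master approve re-queuing the write timeout timers.

#include <linux/bitmap.h>
#include <linux/types.h>

#define SKETCH_MAX_NODES        255     /* illustrative, not O2NM_MAX_NODES */

static DECLARE_BITMAP(nego_node_map_sketch, SKETCH_MAX_NODES);

/* Called on the master when a NEGO_TIMEOUT message arrives from node_num.
 * Returns true once every live node has reported, i.e. the whole cluster
 * is stuck on storage and the write timeout timers may be re-queued. */
static bool nego_timeout_sketch(u8 node_num, const unsigned long *live_node_map)
{
        set_bit(node_num, nego_node_map_sketch);
        return bitmap_subset(live_node_map, nego_node_map_sketch,
                             SKETCH_MAX_NODES);
}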
when
o2hb_do_disk_heartbeat() returns an error; this is the same behavior as
o2hb without the negotiate timer.
Signed-off-by: Junxiao Bi
Reviewed-by: Ryan Ding
Cc: Gang He
Cc: rwxybh
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Joseph Qi
Signed-off-by: Andrew Morton
---
fs/ocfs2/cluster/heartbeat.c | 10 ++
1
Hi,
The following panic is triggered when running the ocfs2 xattr test on
linux-next-20160225. Did anybody ever see this?
[ 254.604228] BUG: unable to handle kernel paging request at
0002000800c0
[ 254.605013] IP: [] kmem_cache_alloc+0x78/0x160
[ 254.605013] PGD 7bbe5067 PUD 0
[ 254.605013] Oops:
Hi Eric,
On 02/19/2016 11:01 AM, Eric Ren wrote:
> Hi Junxiao,
>
> On Wed, Feb 17, 2016 at 10:15:56AM +0800, Junxiao Bi wrote:
>> Hi Eric,
>>
>> I remember i described it before, please search it on ocfs2-devel. For
>> ocfs2 env setup, please refer to README
Hi Eric,
I remember I described it before; please search for it on ocfs2-devel. For
the ocfs2 env setup, please refer to the README in ocfs2-test.
Thanks,
Junxiao.
On 02/16/2016 05:54 PM, Eric Ren wrote:
> Hi Junxiao,
>
I have set up a test env to build and automatically run the ocfs2 tests. With it, Ocfs2
for m
This means the following patches have passed ocfs2-test. The first three
were merged by me to avoid that recursive deadlock issue.
=
-inode deadlock in ocfs2_mknode due to using posix_acl_create
-posix_acl_create unsuitable to use in ocfs2_reflink
-revert to using
On 01/26/2016 09:43 AM, xuejiufei wrote:
> Hi Junxiao,
>
> On 2016/1/21 15:34, Junxiao Bi wrote:
>> Hi Jiufei,
>>
>> I didn't find other solution for this issue. You can go with yours.
>> Looks like your second one is more straightforward, there deref work
On 01/25/2016 11:28 AM, Eric Ren wrote:
>> @@ -449,7 +470,11 @@ static int o2hb_nego_timeout_handler(struct o2net_msg
>> *msg, u32 len, void *data,
>> > static int o2hb_nego_approve_handler(struct o2net_msg *msg, u32 len, void
>> > *data,
>> >void **ret_data)
>> > {
On 01/25/2016 11:18 AM, Eric Ren wrote:
>>
>> > @@ -2039,13 +2086,30 @@ static struct config_item
>> > *o2hb_heartbeat_group_make_item(struct config_group *g
>> >
>> >config_item_init_type_name(&reg->hr_item, name, &o2hb_region_type);
>> >
>> > + /* this is the same way to generate msg key
On 01/22/2016 01:45 PM, Andrew Morton wrote:
> On Fri, 22 Jan 2016 13:12:26 +0800 Junxiao Bi wrote:
>
>> On 01/22/2016 07:47 AM, Andrew Morton wrote:
>>> On Wed, 20 Jan 2016 11:13:35 +0800 Junxiao Bi wrote:
>>>
>>>> This message is sent to master n
On 01/22/2016 07:47 AM, Andrew Morton wrote:
> On Wed, 20 Jan 2016 11:13:35 +0800 Junxiao Bi wrote:
>
>> This message is sent to master node when non-master nodes's
>> negotiate timer expired. Master node records these nodes in
>> a bitmap which is used to do
Hi Joseph,
On 01/22/2016 12:25 PM, Joseph Qi wrote:
> Hi Junxiao,
>
> On 2016/1/21 9:48, Junxiao Bi wrote:
>> On 01/21/2016 08:46 AM, Joseph Qi wrote:
>>> Hi Junxiao,
>>> So you mean the negotiation you added only happens if all nodes storage
>>> link d
Hi Andrew,
On 01/22/2016 07:42 AM, Andrew Morton wrote:
> On Wed, 20 Jan 2016 11:13:34 +0800 Junxiao Bi wrote:
>
>> When storage down, all nodes will fence self due to write timeout.
>> The negotiate timer is designed to avoid this, with it node will
>> wai
Hi Joseph,
On 01/22/2016 08:56 AM, Joseph Qi wrote:
> Hi Junxiao,
>
> On 2016/1/20 11:13, Junxiao Bi wrote:
>> When storage down, all nodes will fence self due to write timeout.
>> The negotiate timer is designed to avoid this, with it node will
>> wait until storag
> still expect that the other 30 disks from the other 3 remaining arrays
> will continue working.
> Of course, I will not have any access to the failed array disks.
>
> I hope this describes better the situation,
>
> Thanks,
>
> Guy
>
> On Wed, Jan 20, 2016
id
nodes fencing themselves if storage is down.
To get that log, I am afraid you need to configure a console, as the panic
follows that printk.
Thanks,
Junxiao.
>
> Or any way to find this output (without netconsole), thx?
>
> --------
On 01/21/2016 04:10 PM, Eric Ren wrote:
> Hi Junxiao,
>
> On Thu, Jan 21, 2016 at 03:10:20PM +0800, Junxiao Bi wrote:
>> Hi Eric,
>>
>> This patch should fix your issue.
>> "NFS hangs in __ocfs2_cluster_lock due to race with ocfs2_unblock_lock"
>
Hi Jiufei,
I didn't find another solution for this issue. You can go with yours.
Looks like your second one is more straightforward; there the deref work can
be removed.
Thanks,
Junxiao.
On 01/11/2016 10:46 AM, xuejiufei wrote:
> Hi all,
> We have found a race between refmap setting and clearing which
Hi Eric,
This patch should fix your issue.
"NFS hangs in __ocfs2_cluster_lock due to race with ocfs2_unblock_lock"
Thanks,
Junxiao.
On 01/20/2016 12:46 AM, Eric Ren wrote:
> This problem was introduced by commit
> a19128260107f951d1b4c421cf98b92f8092b069.
> OCFS2_LOCK_UPCONVERT_FINISHING is set
ke before.
Thanks,
Junxiao.
>
> Thanks,
> Joseph
>
> On 2016/1/20 21:27, Junxiao Bi wrote:
>> Hi Joseph,
>>
>>> On Jan 20, 2016, at 5:18 PM, Joseph Qi wrote:
>>>
>>> Hi Junxiao,
>>> Thanks for the patch set.
>>> In case only one node sto
tency.
> In your patch set, I cannot see any logic to handle this. Am I missing
> something?
No, there is no logic for this. But why didn't the node fence itself when storage was
down? What made the softirq timer unable to run, another bug?
Thanks,
Junxiao.
>
> On 2016/1/20 11:13, Junxiao Bi wrote
Hi Guy,
ocfs2 is a shared-disk fs; there is no way to do replication like a dfs,
and no volume manager is integrated in ocfs2. Ocfs2 depends on the underlying
storage stack to handle disk failure, so you can configure multipath,
raid or the storage array to handle the disk-removal issue. If an io error is still
reported t
e to write timeout.
>> With this patch set, all nodes will keep going until storage back
>> online, except if the following issue happens, then all nodes will
>> do as before to fence self.
>> 1. io error got
>> 2. network between nodes down
>> 3. nodes pan
this message, it will be fenced.
If storage comes back up at any time, o2hb_thread will run and re-queue all the
timers; nothing will be affected by these two steps.
Signed-off-by: Junxiao Bi
Reviewed-by: Ryan Ding
---
fs/ocfs2/cluster/heartbeat.c | 52 ++
1 file change
hr_last_timeout_start should be set to the last time the hb was still OK.
When an hb write times out, the hung time will be (jiffies - hr_last_timeout_start).
Signed-off-by: Junxiao Bi
Reviewed-by: Ryan Ding
---
fs/ocfs2/cluster/heartbeat.c |2 +-
1 file changed, 1 insertion(+), 1 deletion