[Cluster-devel] [PATCHv6 v5.13-rc1 dlm/next 16/16] fs: dlm: don't allow half transmitted messages

2021-05-21 Alexander Aring
This patch cleans a dirty page buffer if a reconnect occurs. If a page buffer was half transmitted, we cannot resume in the middle of a dlm message when a node connects again. I observed invalid length reception errors and guessed that this behaviour was the cause; after this patch I never saw a
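
A rough userspace sketch of the idea, assuming a page buffer that packs several messages back to back and records how many bytes were already sent. struct page_buf and page_buf_on_reconnect() are hypothetical names, not the dlm code, and the patch itself may simply discard the dirty buffer. Rather than resuming at the raw byte offset, the sketch advances to the next message boundary so a new connection never starts inside a message.

```c
#include <stddef.h>

/* Hypothetical page buffer holding messages packed back to back. */
struct page_buf {
	size_t msg_len[8]; /* length of each packed message */
	size_t nmsgs;      /* number of valid entries in msg_len */
	size_t sent;       /* bytes already handed to the old socket */
};

/*
 * Called when the connection is re-established. If the old socket stopped
 * in the middle of a message, skip the rest of that message so the new
 * socket starts on a message boundary. Returns the number of bytes skipped.
 */
static size_t page_buf_on_reconnect(struct page_buf *pb)
{
	size_t boundary = 0;
	size_t i = 0;

	/* Walk message boundaries until we reach or pass the sent offset. */
	while (i < pb->nmsgs && boundary < pb->sent)
		boundary += pb->msg_len[i++];

	size_t skipped = boundary - pb->sent;
	pb->sent = boundary;
	return skipped;
}
```

The skipped message never reaches the peer on the new connection, so a scheme like this only makes sense together with a retransmission layer such as the one this series introduces.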

[Cluster-devel] [PATCHv6 v5.13-rc1 dlm/next 15/16] fs: dlm: add midcomms debugfs functionality

2021-05-21 Alexander Aring
This patch adds functionality to debug the per-connection midcomms state inside a comms directory, similar to the dlm configfs layout. Currently two attributes can be read out: the send queue counter and the version of each midcomms node state. Signed-off-by: Alexander

[Cluster-devel] [PATCHv6 v5.13-rc1 dlm/next 11/16] fs: dlm: add functionality to re-transmit a message

2021-05-21 Alexander Aring
This patch introduces retransmit functionality for a lowcomms message handle. It just allocates a new buffer and transmits it again; the retransmission is not prioritized, because the byte stream must be kept in order. To avoid another connection lookup, some refactoring was done to make a new buffer allocati
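
The ordering argument can be sketched in userspace as a FIFO send queue where a retransmission is a fresh copy appended at the tail rather than pushed to the head. All names here (struct msg, struct sendq, msg_retransmit()) are hypothetical and only illustrate the technique the excerpt describes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical send queue entry owning a copy of one message. */
struct msg {
	struct msg *next;
	size_t len;
	unsigned char data[]; /* flexible array member holding the payload */
};

struct sendq {
	struct msg *head; /* next message to transmit */
	struct msg *tail; /* most recently queued message */
};

static struct msg *msg_alloc(const void *data, size_t len)
{
	struct msg *m = malloc(sizeof(*m) + len);

	if (!m)
		return NULL;
	m->next = NULL;
	m->len = len;
	memcpy(m->data, data, len);
	return m;
}

static void sendq_append(struct sendq *q, struct msg *m)
{
	if (q->tail)
		q->tail->next = m;
	else
		q->head = m;
	q->tail = m;
}

/*
 * Retransmit by copying the message into a new buffer and queueing it at
 * the tail, behind everything already queued, so the byte stream stays in
 * order. Returns 0 on success, -1 if allocation fails.
 */
static int msg_retransmit(struct sendq *q, const struct msg *orig)
{
	struct msg *copy = msg_alloc(orig->data, orig->len);

	if (!copy)
		return -1;
	sendq_append(q, copy);
	return 0;
}
```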

[Cluster-devel] [PATCHv6 v5.13-rc1 dlm/next 08/16] fs: dlm: public header in out utility

2021-05-21 Alexander Aring
This patch allows header_out() and header_in() to be used outside of the dlm util functionality. Signed-off-by: Alexander Aring --- fs/dlm/util.c | 4 ++-- fs/dlm/util.h | 2 ++ 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/dlm/util.c b/fs/dlm/util.c index cfd0d00b19ae..74a8c5bfe9b5
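
As far as we can tell, the kernel's header_out() and header_in() convert the dlm header fields between CPU and little-endian byte order in place. The userspace sketch below shows the same idea but serializes into a separate byte buffer so it runs on any host; struct hdr, hdr_out() and hdr_in() are hypothetical names, and the field layout is only loosely modelled on the dlm header.

```c
#include <stdint.h>

/* Hypothetical host-order header, modelled loosely on the dlm header. */
struct hdr {
	uint32_t version;
	uint32_t lockspace;
	uint32_t nodeid;
	uint16_t length;
	uint8_t cmd;
	uint8_t pad;
};

#define HDR_WIRE_LEN 16 /* 3 * 4 + 2 + 1 + 1 bytes */

static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void put_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
}

static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | p[1] << 8);
}

/* Host order -> little-endian wire bytes, independent of host endianness. */
static void hdr_out(const struct hdr *h, uint8_t wire[HDR_WIRE_LEN])
{
	put_le32(wire + 0, h->version);
	put_le32(wire + 4, h->lockspace);
	put_le32(wire + 8, h->nodeid);
	put_le16(wire + 12, h->length);
	wire[14] = h->cmd;
	wire[15] = h->pad;
}

/* Little-endian wire bytes -> host order. */
static void hdr_in(struct hdr *h, const uint8_t wire[HDR_WIRE_LEN])
{
	h->version = get_le32(wire + 0);
	h->lockspace = get_le32(wire + 4);
	h->nodeid = get_le32(wire + 8);
	h->length = get_le16(wire + 12);
	h->cmd = wire[14];
	h->pad = wire[15];
}
```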

[Cluster-devel] [PATCHv6 v5.13-rc1 dlm/next 14/16] fs: dlm: add reliable connection if reconnect

2021-05-21 Alexander Aring
This patch makes a TCP lowcomms connection reliable even when reconnects occur. This is done by application-layer retransmission handling and sequence numbers in the dlm protocols. There are three new dlm commands: DLM_OPTS: This will encapsulate an existing dlm message (and rcom messa
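
The excerpt is cut off before it describes the remaining commands, so the sketch below only illustrates the general technique it names: receiver-side sequence number checking with a cumulative acknowledgement. struct rx_state, rx_check_seq() and the acknowledgement semantics are hypothetical and are not the dlm wire protocol.

```c
#include <stdbool.h>
#include <stdint.h>

enum rx_verdict {
	RX_DELIVER,      /* expected sequence number: pass to upper layer */
	RX_DUPLICATE,    /* already delivered, e.g. retransmitted after reconnect */
	RX_OUT_OF_ORDER, /* gap: an earlier message is still missing */
};

struct rx_state {
	uint32_t seq_next; /* next sequence number we expect */
};

/* Wraparound-safe "a comes before b" for 32-bit sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/*
 * Classify an incoming message by sequence number. *ack is set to the next
 * expected sequence number, i.e. a cumulative acknowledgement the receiver
 * could send back so the sender can drop everything below it.
 */
static enum rx_verdict rx_check_seq(struct rx_state *st, uint32_t seq,
				    uint32_t *ack)
{
	enum rx_verdict v;

	if (seq == st->seq_next) {
		st->seq_next++;
		v = RX_DELIVER;
	} else if (seq_before(seq, st->seq_next)) {
		v = RX_DUPLICATE;
	} else {
		v = RX_OUT_OF_ORDER;
	}
	*ack = st->seq_next;
	return v;
}
```

Duplicates are the interesting case here: after a reconnect the sender may resend messages the receiver already delivered, and the sequence number lets the receiver drop them while still acknowledging.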

[Cluster-devel] [PATCHv6 v5.13-rc1 dlm/next 13/16] fs: dlm: add union in dlm header for lockspace id

2021-05-21 Alexander Aring
This patch puts the lockspace id into a union so that the same header field can also serve another use case in a different dlm command. Signed-off-by: Alexander Aring --- fs/dlm/dlm_internal.h | 5 - fs/dlm/lock.c | 8 fs/dlm/rcom.c | 4 ++-- fs/dlm/util.c | 6 -- 4 files chan
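
To show why a union fits here, a minimal sketch assuming the second use of the field is a sequence number; that would match the acknowledgement handling in 14/16, but the excerpt does not say so. struct hdr, the CMD_* values and the hdr_make_*() helpers are hypothetical. The point is that both meanings share one 32-bit slot, so the header size and wire layout do not change.

```c
#include <stdint.h>

enum { CMD_MSG = 1, CMD_ACK = 2 }; /* hypothetical command numbers */

/* Hypothetical header: one 32-bit slot, two meanings selected by cmd. */
struct hdr {
	uint32_t version;
	union {
		uint32_t lockspace; /* ordinary messages: target lockspace id */
		uint32_t seq;       /* assumed other use: a sequence number */
	} u;
	uint32_t nodeid;
	uint16_t length;
	uint8_t cmd;
	uint8_t pad;
};

/* The union keeps the layout unchanged: still 16 bytes. */
_Static_assert(sizeof(struct hdr) == 16, "header size must not change");

static struct hdr hdr_make_msg(uint32_t lockspace)
{
	struct hdr h = { .cmd = CMD_MSG, .length = sizeof(struct hdr) };

	h.u.lockspace = lockspace;
	return h;
}

static struct hdr hdr_make_ack(uint32_t seq)
{
	struct hdr h = { .cmd = CMD_ACK, .length = sizeof(struct hdr) };

	h.u.seq = seq;
	return h;
}
```

A receiver would check cmd before deciding whether to read u.lockspace or u.seq.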

[Cluster-devel] [PATCHv6 v5.13-rc1 dlm/next 12/16] fs: dlm: move out some hash functionality

2021-05-21 Alexander Aring
This patch moves some lowcomms hash functionality into the lowcomms header to make it available to other layers such as midcomms. Signed-off-by: Alexander Aring --- fs/dlm/lowcomms.c | 9 - fs/dlm/lowcomms.h | 10 ++ 2 files changed, 10 insertions(+), 9 deletions(-) diff --git
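
Exporting a hash helper from a header lets a second layer index its own per-node table the same way. A minimal sketch, assuming a mask-based nodeid_hash() and a table size of 32 (both are assumptions here); struct node, node_insert() and node_find() are hypothetical.

```c
#include <stddef.h>

/* Assumed table size; must be a power of two for the mask below to work. */
#define CONN_HASH_SIZE 32

/* Map a node id to a bucket by keeping only the low bits. */
static inline int nodeid_hash(int nodeid)
{
	return nodeid & (CONN_HASH_SIZE - 1);
}

/* Hypothetical per-node state kept in a chained hash table. */
struct node {
	int nodeid;
	struct node *next;
};

static struct node *buckets[CONN_HASH_SIZE];

static void node_insert(struct node *n)
{
	int b = nodeid_hash(n->nodeid);

	n->next = buckets[b];
	buckets[b] = n;
}

static struct node *node_find(int nodeid)
{
	for (struct node *n = buckets[nodeid_hash(nodeid)]; n; n = n->next)
		if (n->nodeid == nodeid)
			return n;
	return NULL;
}
```

Keeping the helper `static inline` in a header means each layer gets its own copy with no extra symbol to export.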

[Cluster-devel] [PATCHv6 v5.13-rc1 dlm/next 10/16] fs: dlm: make buffer handling per msg

2021-05-21 Alexander Aring
This patch makes the void pointer handle for lowcomms functionality per message instead of per page allocation entry. Refcount handling was added for the handle to keep the message alive until the user no longer needs it. There is now a per-message callback which will be called when alloca
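
Per-message lifetime with a reference count can be sketched as follows. This is a single-threaded illustration with hypothetical names; kernel code would normally use an atomic counter such as struct kref with kref_get()/kref_put() rather than a plain int, and the per-message callback mentioned in the excerpt is not modelled.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-message handle with a reference count. */
struct msg_handle {
	int ref;             /* number of users still holding the message */
	size_t len;
	unsigned char buf[]; /* message payload */
};

/* Allocate a message; the caller owns the first reference. */
static struct msg_handle *msg_new(size_t len)
{
	struct msg_handle *mh = malloc(sizeof(*mh) + len);

	if (!mh)
		return NULL;
	mh->ref = 1;
	mh->len = len;
	return mh;
}

/* Take an extra reference, e.g. while the message waits for an ack. */
static void msg_get(struct msg_handle *mh)
{
	mh->ref++;
}

/* Drop a reference; frees the message and returns true on the last put. */
static bool msg_put(struct msg_handle *mh)
{
	if (--mh->ref > 0)
		return false;
	free(mh);
	return true;
}
```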

[Cluster-devel] [PATCHv6 v5.13-rc1 dlm/next 09/16] fs: dlm: add more midcomms hooks

2021-05-21 Alexander Aring
This patch prepares hooks to redirect to the midcomms layer, which will be used by the midcomms retransmit handling. It introduces the new concept of stateless buffer allocation and commit, which can be used to bypass the midcomms retransmit handling. It is used by RCOM_STATUS and RCOM_NAMES mess

[Cluster-devel] [PATCHv6 v5.13-rc1 dlm/next 06/16] fs: dlm: cancel work sync othercon

2021-05-21 Alexander Aring
These rx and tx flag arguments tell close_connection() which worker it is called from. Obviously the receive worker cannot cancel itself, and the same applies to swork. For the othercon only the receive worker should be used; however, to avoid deadlocks we should pass the same flags as the or

[Cluster-devel] [PATCHv6 v5.13-rc1 dlm/next 04/16] fs: dlm: set is othercon flag

2021-05-21 Alexander Aring
There is an "is othercon" flag which is never used; this patch sets it and prints a warning if the othercon ever sends a dlm message, which should never happen. Signed-off-by: Alexander Aring --- fs/dlm/lowcomms.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/dlm/lowcomms.c b/

[Cluster-devel] [PATCHv6 v5.13-rc1 dlm/next 07/16] fs: dlm: fix connection tcp EOF handling

2021-05-21 Alexander Aring
This patch fixes the TCP EOF handling: if an EOF is received, we close the socket the next time the writequeue runs empty. This is half-closed socket functionality, which doesn't exist in SCTP. The midcomms layer will provide half-closed socket functionality on the DLM side to solve this proble
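
A minimal sketch of the deferred close described above, assuming a simple counter stands in for the writequeue. struct conn and the conn_* functions are hypothetical, and closing immediately when the queue is already empty at EOF time is an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical connection state for deferred close after a peer EOF. */
struct conn {
	bool eof_seen;       /* peer shut down its sending side */
	bool closed;         /* we closed our socket */
	unsigned int queued; /* messages still waiting in the writequeue */
};

static void conn_maybe_close(struct conn *c)
{
	if (c->eof_seen && c->queued == 0)
		c->closed = true; /* stand-in for actually closing the socket */
}

/* Receive path: EOF only marks the connection; pending sends still go out. */
static void conn_on_eof(struct conn *c)
{
	c->eof_seen = true;
	conn_maybe_close(c);
}

/* Send path: after each message leaves the writequeue, check for close. */
static void conn_on_sent(struct conn *c)
{
	if (c->queued > 0)
		c->queued--;
	conn_maybe_close(c);
}
```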

[Cluster-devel] [PATCHv6 v5.13-rc1 dlm/next 02/16] fs: dlm: add dlm macros for ratelimit log

2021-05-21 Alexander Aring
This patch adds a ratelimit macro to the dlm subsystem and ratelimits the connecting log message, which is otherwise printed a lot in non-blocking connect cases. Signed-off-by: Alexander Aring --- fs/dlm/dlm_internal.h | 2 ++ fs/dlm/lowcomms.c | 4 ++-- 2 files changed, 4 inserti
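
The kernel already provides ratelimited printing (for example printk_ratelimited(), which allows a burst of messages per time interval). The sketch below is a userspace stand-in for that interval/burst scheme, not the dlm macro itself; struct ratelimit and ratelimit_ok() are hypothetical, and time is passed in explicitly to keep the example deterministic.

```c
#include <stdbool.h>

/* Hypothetical interval/burst limiter; time is passed in explicitly. */
struct ratelimit {
	long interval; /* window length, in caller-defined time units */
	int burst;     /* messages allowed per window */
	bool started;  /* false until the first call opens a window */
	long begin;    /* start of the current window */
	int printed;   /* messages allowed in the current window */
	int missed;    /* messages suppressed so far */
};

/* Return true if a message may be printed at time 'now'. */
static bool ratelimit_ok(struct ratelimit *rs, long now)
{
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->started = true;
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

A noisy "connecting" message would then be printed only when ratelimit_ok() returns true, so a tight non-blocking connect loop produces at most `burst` lines per interval.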

[Cluster-devel] [PATCHv6 v5.13-rc1 dlm/next 05/16] fs: dlm: reconnect if socket error report occurs

2021-05-21 Alexander Aring
This patch changes the reconnect handling so that a reconnect is triggered when the socket error callback reports an error. This also handles reconnects in the non-blocking connect case, which is currently missing. If the error is ECONNREFUSED, the reconnect is delayed by one second. Signed-off-by: Alexan
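
The reconnect policy from the excerpt reduces to a small decision in the error-report path. In this sketch the one-second delay on ECONNREFUSED comes from the excerpt, while retrying other errors without delay is an assumption; struct conn and conn_on_socket_error() are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical connection state touched by the socket error callback. */
struct conn {
	bool reconnect_pending;
	unsigned int reconnect_delay_ms;
};

/*
 * Error-report path: any socket error schedules a reconnect. A refused
 * connection is retried after one second; other errors (an assumption of
 * this sketch) are retried without delay.
 */
static void conn_on_socket_error(struct conn *c, int err)
{
	c->reconnect_pending = true;
	c->reconnect_delay_ms = (err == ECONNREFUSED) ? 1000 : 0;
}
```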

[Cluster-devel] [PATCHv6 v5.13-rc1 dlm/next 00/16] fs: dlm: introduce dlm re-transmission layer

2021-05-21 Alexander Aring
Hi, this is the final patch series to make dlm reliable when reconnections occur. You can easily generate a couple of reconnections to test these patches by running tcpkill -9 -i $IFACE port 21064 yourself. At some point dlm will detect message drops and will re-transmit messages if necessa

[Cluster-devel] [PATCHv6 v5.13-rc1 dlm/next 03/16] fs: dlm: fix srcu read lock usage

2021-05-21 Alexander Aring
This patch holds the srcu connection read lock in cases where we look up connections and access them. We don't hold the srcu lock in worker functions where the scheduled worker is part of the connection itself. The connection should not be freed while any worker is scheduled or pending. Signed-of

[Cluster-devel] [PATCHv6 v5.13-rc1 dlm/next 01/16] fs: dlm: always run complete for possible waiters

2021-05-21 Alexander Aring
This patch changes the ping_members() result handling so that we always run complete() for possible waiters. We treat the -EINTR error code as success. This error code is returned if the recovery is stopped, which likely means that a new recovery is triggered with a new members configuration and ping_members()
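
The result handling described above can be sketched with a plain flag standing in for the kernel's struct completion. struct ping_waiter and ping_finish() are hypothetical; the point shown is that -EINTR is mapped to success and the waiter is signalled on every path.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel struct completion plus a result. */
struct ping_waiter {
	bool done;  /* set where the kernel code would call complete() */
	int result; /* error code handed to the waiter */
};

/*
 * Finish a ping round. -EINTR means recovery was stopped, likely because a
 * new recovery with a new member configuration follows, so it is reported
 * as success. The waiter is signalled on every path so nobody blocks forever.
 */
static void ping_finish(struct ping_waiter *w, int error)
{
	if (error == -EINTR)
		error = 0;
	w->result = error;
	w->done = true;
}
```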

Re: [Cluster-devel] [PATCH 6/6] gfs2: Fix mmap + page fault deadlocks (part 2)

2021-05-21 Andreas Gruenbacher
On Fri, May 21, 2021 at 5:23 PM Jan Kara wrote:
> On Thu 20-05-21 16:07:56, Andreas Gruenbacher wrote:
> > On Thu, May 20, 2021 at 3:30 PM Jan Kara wrote:
> > > On Thu 20-05-21 14:25:36, Andreas Gruenbacher wrote:
> > > > Now that we handle self-recursion on the inode glock in gfs2_fault and
> >

Re: [Cluster-devel] [PATCH 6/6] gfs2: Fix mmap + page fault deadlocks (part 2)

2021-05-21 Jan Kara
On Thu 20-05-21 16:07:56, Andreas Gruenbacher wrote:
> On Thu, May 20, 2021 at 3:30 PM Jan Kara wrote:
> > On Thu 20-05-21 14:25:36, Andreas Gruenbacher wrote:
> > > Now that we handle self-recursion on the inode glock in gfs2_fault and
> > > gfs2_page_mkwrite, we need to take care of more complex

Re: [Cluster-devel] [PATCH 02/15] fs: gfs2: glock: Fix some deficient kernel-doc headers and demote non-conformant ones

2021-05-21 Lee Jones
On Fri, 21 May 2021, Lee Jones wrote:
> On Fri, 21 May 2021, Andreas Gruenbacher wrote:
>
> > On Thu, May 20, 2021 at 2:00 PM Lee Jones wrote:
> > > Fixes the following W=1 kernel build warning(s):
> > >
> > > fs/gfs2/glock.c:365: warning: Function parameter or member 'gl' not
> > > described