Dear all, I have been struggling with this issue for many days but can't even guess the cause. John did me a favor and connected to my server to check what's going on, but he couldn't find it either. May someone please suggest an idea. I doubt there is a Wireguard bug. Rgds Hamed
حامد صابر <hsa...@gmail.com> در تاریخ شنبه ۴ سپتامبر ۲۰۲۱ ساعت ۲۱:۲۵ نوشت: > > UPDATE: > During the outage, I ran a simple test to check if both sides have > access to each other's specified UDP ports, then I found something > interesting: > Some 148 bytes length packets are transferred between parties which I > can't recognize what are them. > By the way here I copy my test results. > Anybody can spot what's going on here? > The middle-node host name which runs wg1 is ir-pp and the exit-note > (to free internet) which runs wg2 is sf-do: > (these two parts happened at the same time and are copied from 2 > different servers): > > ======================== > root@ir-pp:~# nc -u sf-do 50840 > 123456 > 1234 > 12 > ^C > root@ir-pp:~# tcpdump 'port 50841' > dropped privs to tcpdump > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode > listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes > 20:56:57.607342 IP ir-pp.50841 > sf-do.50840: UDP, length 148 > 20:57:01.042471 IP sf-do.43161 > ir-pp.50841: UDP, length 3 > 20:57:02.361827 IP ir-pp.50841 > sf-do.50840: UDP, length 148 > 20:57:03.635754 IP sf-do.43161 > ir-pp.50841: UDP, length 5 > 20:57:06.740922 IP sf-do.43161 > ir-pp.50841: UDP, length 7 > 20:57:07.552305 IP ir-pp.50841 > sf-do.50840: UDP, length 148 > ^C > 6 packets captured > 15 packets received by filter > 0 packets dropped by kernel > root@ir-pp:~# > > > ======================== > root@sf-do:~# tcpdump 'port 50840' > dropped privs to tcpdump > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode > listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes > 20:56:09.126958 IP sf-do.50840 > ir-pp.50841: UDP, length 148 > 20:56:14.758974 IP sf-do.50840 > ir-pp.50841: UDP, length 148 > 20:56:18.222814 IP ir-pp.38136 > sf-do.50840: UDP, length 7 > 20:56:20.391041 IP sf-do.50840 > ir-pp.50841: UDP, length 148 > 20:56:22.307488 IP ir-pp.38136 > sf-do.50840: UDP, length 5 > 20:56:24.702590 IP ir-pp.38136 > sf-do.50840: UDP, length 3 > 20:56:25.510985 IP sf-do.50840 > ir-pp.50841: UDP, length 148 > 20:56:30.630917 IP sf-do.50840 > ir-pp.50841: UDP, length 148 > 20:56:35.750965 IP sf-do.50840 > ir-pp.50841: UDP, length 148 > ^C > 9 packets captured > 9 packets received by filter > 0 packets dropped by kernel > root@sf-do:~# nc -u ir-pp 50841 > 12 > 1234 > 123456 > ^C > root@sf-do:~# > > حامد صابر <hsa...@gmail.com> در تاریخ جمعه ۳ سپتامبر ۲۰۲۱ ساعت ۸:۳۹ نوشت: > > > > Hi again, > > Something new happened which makes me more confused. > > I wrote a small shell-script to check the connection between wg1 and > > wg2, so whenever it drops, the script restarts the wg1 and everything > > comes back. > > Yes yes I don't like this way of addressing issues, but what could I > > do if no meaningful debug information exists, and I predict that it > > might be a bug of Wireguard itself? > > BTW this system worked fine and the anti-censorship VPN chain was up > > and running till this morning. > > The connection went down at 7:49 and didn't come back with > > auto-restart. The process of restarting continued for about 6 minutes, > > and at last at 7:55 it came back. > > During the outage I checked both sides and everything seemed fine. > > Sides could ping the public ip of each other, their wg udp ports were > > accessible to each other, and even handshake seemed to be finished > > correctly (using wg-show command) but peers couldn't ping each other. > > And the most confusing part is everything came back to life without > > any action from my side. Just after 6 minutes of continuously > > restarting the wg1 interface! > > Isn't there a bug? Somebody please help me to debug this problem and > > find the cause. > > > > Here I bring you my shell-script code, and then its related log: > > > > ------------------------- > > #!/bin/bash > > > > exec >>/var/log/wg-ping 2>&1 > > while true > > do > > connection=$(ping -c 1 10.10.10.1) > > time=$(date +%H:%M) > > seconds=$(date +%S) > > seconds=${seconds#0} > > if [[ "$connection" != *"icmp"* ]]; then > > echo " " > > date > > wg-quick down wg1 > > echo " " > > wg-quick up wg1 > > connection=$(ping -c 1 10.10.10.1) > > time=$(date +%T) > > if [[ "$connection" != *"icmp"* ]]; then > > echo "$time ERROR" > > else > > echo "$time OK" > > echo " " > > fi > > elif [[ $seconds -lt 5 ]]; then > > echo "$time OK" > > fi > > sleep 5 > > done > > =============== > > Sample log of simply restarting the wg1 which makes everything fine > > (and happens every few hours): > > ----------------------- > > 01:17 OK > > 01:18 OK > > 01:19 OK > > 01:20 OK > > 01:21 OK > > 01:22 OK > > 01:23 OK > > 01:24 OK > > 01:25 OK > > 01:26 OK > > > > Fri Sep 3 01:26:41 +0430 2021 > > [#] ip route del default dev wg1 table middle > > [#] ip rule del iif wg0 lookup middle > > [#] ip link delete dev wg1 > > > > [#] ip link add wg1 type wireguard > > [#] wg setconf wg1 /dev/fd/63 > > [#] ip -4 address add 10.10.10.2/32 dev wg1 > > [#] ip link set mtu 1420 up dev wg1 > > [#] ip -4 route add 10.10.10.1/32 dev wg1 > > [#] ip route add default dev wg1 table middle > > [#] ip rule add iif wg0 lookup middle > > [#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0 > > 01:26:42 OK > > > > 01:27 OK > > 01:28 OK > > 01:30 OK > > 01:31 OK > > 01:32 OK > > 01:33 OK > > 01:34 OK > > ===================================================== > > And the log for the confusing situation I explained: > > ----------------------- > > 07:45 OK > > 07:46 OK > > 07:47 OK > > 07:48 OK > > 07:49 OK > > > > Fri Sep 3 07:49:42 +0430 2021 > > [#] ip route del default dev wg1 table middle > > [#] ip rule del iif wg0 lookup middle > > [#] ip link delete dev wg1 > > > > [#] ip link add wg1 type wireguard > > [#] wg setconf wg1 /dev/fd/63 > > [#] ip -4 address add 10.10.10.2/32 dev wg1 > > [#] ip link set mtu 1420 up dev wg1 > > [#] ip -4 route add 10.10.10.1/32 dev wg1 > > [#] ip route add default dev wg1 table middle > > [#] ip rule add iif wg0 lookup middle > > [#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0 > > 07:49:53 ERROR > > > > Fri Sep 3 07:50:08 +0430 2021 > > [#] ip route del default dev wg1 table middle > > [#] ip rule del iif wg0 lookup middle > > [#] ip link delete dev wg1 > > > > [#] ip link add wg1 type wireguard > > [#] wg setconf wg1 /dev/fd/63 > > [#] ip -4 address add 10.10.10.2/32 dev wg1 > > [#] ip link set mtu 1420 up dev wg1 > > [#] ip -4 route add 10.10.10.1/32 dev wg1 > > [#] ip route add default dev wg1 table middle > > [#] ip rule add iif wg0 lookup middle > > [#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0 > > 07:50:18 ERROR > > > > ====== > > LOTS OF RETRY LOGS CROPPED > > ====== > > > > Fri Sep 3 07:55:13 +0430 2021 > > [#] ip route del default dev wg1 table middle > > [#] ip rule del iif wg0 lookup middle > > [#] ip link delete dev wg1 > > > > [#] ip link add wg1 type wireguard > > [#] wg setconf wg1 /dev/fd/63 > > [#] ip -4 address add 10.10.10.2/32 dev wg1 > > [#] ip link set mtu 1420 up dev wg1 > > [#] ip -4 route add 10.10.10.1/32 dev wg1 > > [#] ip route add default dev wg1 table middle > > [#] ip rule add iif wg0 lookup middle > > [#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0 > > 07:55:23 ERROR > > > > Fri Sep 3 07:55:38 +0430 2021 > > [#] ip route del default dev wg1 table middle > > [#] ip rule del iif wg0 lookup middle > > [#] ip link delete dev wg1 > > > > [#] ip link add wg1 type wireguard > > [#] wg setconf wg1 /dev/fd/63 > > [#] ip -4 address add 10.10.10.2/32 dev wg1 > > [#] ip link set mtu 1420 up dev wg1 > > [#] ip -4 route add 10.10.10.1/32 dev wg1 > > [#] ip route add default dev wg1 table middle > > [#] ip rule add iif wg0 lookup middle > > [#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0 > > 07:55:39 OK > > > > 07:56 OK > > 07:57 OK > > 07:58 OK > > 07:59 OK > > 08:00 OK > > --------------------------------------- > > > > > > > > حامد صابر <hsa...@gmail.com> در تاریخ پنجشنبه ۲ سپتامبر ۲۰۲۱ ساعت > > ۸:۴۷ نوشت: > > > > > > -Thanks for reply. > > > For test reasons, I turned the firewall off on my middle-node server. > > > But I can't understand how this issue may be related to firewall, > > > because most of the time it works (and to me it means firewall is Ok), > > > but sometime for some unknown reason it stops working, which when I > > > restart the wg1 interface with this command everything comes back to > > > life: > > > wg-quick down wg1 && wg-quick up wg1 > > > > > > BTW here it is the firewall status of middle-node: > > > ================== > > > ● firewalld.service - firewalld - dynamic firewall daemon > > > Loaded: loaded (/usr/lib/systemd/system/firewalld.service; > > > disabled; vendor preset: enabled) > > > Active: inactive (dead) > > > Docs: man:firewalld(1) > > > ----------------------------------- > > > Chain INPUT (policy ACCEPT) > > > target prot opt source destination > > > > > > Chain FORWARD (policy ACCEPT) > > > target prot opt source destination > > > > > > Chain OUTPUT (policy ACCEPT) > > > target prot opt source destination > > > ----------------------------------- > > > > > > And the firewall status of exit-node: > > > ==================== > > > Unit firewalld.service could not be found. > > > --------------------------- > > > Chain INPUT (policy ACCEPT) > > > target prot opt source destination > > > ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp > > > dpt:50842 > > > ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:53 > > > > > > Chain FORWARD (policy ACCEPT) > > > target prot opt source destination > > > ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 > > > ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 > > > > > > Chain OUTPUT (policy ACCEPT) > > > target prot opt source destination > > > --------------------- > > > Regards > > > > > > John Lauro <johnala...@gmail.com> در تاریخ چهارشنبه ۱ سپتامبر > > > ۲۰۲۱ ساعت ۲۱:۵۱ نوشت: > > > > > > > > Just a guess, but I would be suspicious about connection tracking > > > > causing the issue. What are your firewall rules? > > > > > > > > On Wed, Sep 1, 2021 at 9:51 AM حامد صابر <hsa...@gmail.com> wrote: > > > >> > > > >> Dear friends, > > > >> I have configured 3 wireguard interfaces on 2 servers to act as a > > > >> chained VPN for me (to bypass the internet censorship in my country), > > > >> with this schema: > > > >> > > > >> client -- wg0 on middle-node -- wg1 on middle node -- wg2 on exit node > > > >> (to free internet) > > > >> > > > >> Everything works fine, but after a while, the connection between wg1 > > > >> and wg2 drops and I can't find the reason. The connection comes back > > > >> to action by simply switching the wg1 down and up again using > > > >> wg-quick. And the amazing behaviour is that sometimes the connection > > > >> comes back to work automatically after some random time passes, > > > >> without any actions from my side (sometimes after a few tens of > > > >> minutes, sometimes after a few hours). > > > >> When the wg1-wg2 connection is not working, anything else between 2 > > > >> servers (middle-node and exit-node) works fine. I mean I can ping the > > > >> public IP of each server from another part, but the local wireguard ip > > > >> of none of them are accessible. > > > >> > > > >> I tried to monitor the situation and read the logs but couldn't find > > > >> what is happening here, so please help! > > > >> > > > >> The configuration: > > > >> ====================== > > > >> > > > >> client (my mobile phone): > > > >> ------------------------------------------- > > > >> [Interface] > > > >> Address = 10.10.20.2/32 > > > >> PrivateKey = <private key of client> > > > >> DNS = 10.10.10.1 > > > >> > > > >> ### Middle Node > > > >> [Peer] > > > >> PublicKey = <public key of wg0> > > > >> PresharedKey = <preshared key> > > > >> AllowedIPs = 0.0.0.0/0 > > > >> Endpoint = middle-node:50842 > > > >> ====================== > > > >> > > > >> wg0 (in middle-node server): > > > >> ------------------------------------------- > > > >> [Interface] > > > >> Address = 10.10.20.1/24 > > > >> ListenPort = 50842 > > > >> PrivateKey = <private key of wg0> > > > >> > > > >> ### Client > > > >> [Peer] > > > >> PublicKey = <public key of client> > > > >> PresharedKey = <preshared key> > > > >> AllowedIPs = 10.10.20.2/32 > > > >> ====================== > > > >> > > > >> wg1 (again in middle-node server): > > > >> ------------------------------------------- > > > >> [Interface] > > > >> Address = 10.10.10.2/32 > > > >> PrivateKey = <private key of wg1> > > > >> > > > >> PostUp = ip route add default dev wg1 table middle > > > >> PostUp = ip rule add iif wg0 lookup middle > > > >> PostUp = wg set wg1 peer <publickey of wg2 (in exit-node)> allowed-ips > > > >> 0.0.0.0/0 > > > >> > > > >> PreDown = ip route del default dev wg1 table middle > > > >> PreDown = ip rule del iif wg0 lookup middle > > > >> > > > >> ### Exit Node > > > >> [Peer] > > > >> PublicKey = <publickey of wg2 (in exit-node)> > > > >> PresharedKey = <preshared key> > > > >> AllowedIPs = 10.10.10.1/32 > > > >> Endpoint = exit-node:50842 > > > >> PersistentKeepalive = 25 > > > >> ====================== > > > >> > > > >> wg2 (in exit-node server): > > > >> ------------------------------------------- > > > >> [Interface] > > > >> Address = 10.10.10.1/24 > > > >> ListenPort = 50842 > > > >> PrivateKey = <private key of wg2> > > > >> > > > >> PostUp = iptables -A FORWARD -i eth0 -o wg2 -j ACCEPT > > > >> PostUp = iptables -A FORWARD -i wg2 -j ACCEPT > > > >> PostUp = iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE > > > >> > > > >> PostDown = iptables -D FORWARD -i eth0 -o wg2 -j ACCEPT > > > >> PostDown = iptables -D FORWARD -i wg2 -j ACCEPT > > > >> PostDown = iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE > > > >> > > > >> ### Middle Node > > > >> [Peer] > > > >> PublicKey = <publickey of wg1 (in middle-node)> > > > >> PresharedKey = <preshared key> > > > >> AllowedIPs = 10.0.0.0/8 > > > >> ====================== > > > >> ====================== > > > >> ====================== > > > >> > > > >> Sample log of dmesg when the wg1-wg2 connection is not working: > > > >> ------------------------------------------- > > > >> [Wed Sep 1 11:19:32 2021] wireguard: wg1: Sending keepalive packet to > > > >> peer 12 (~exit-node-ip~:50842) > > > >> [Wed Sep 1 11:19:44 2021] wireguard: wg0: Sending keepalive packet to > > > >> peer 8 (~client-ip~:65323) > > > >> [Wed Sep 1 11:19:44 2021] wireguard: wg1: Receiving keepalive packet > > > >> from peer 12 (~exit-node-ip~:50842) > > > >> [Wed Sep 1 11:20:09 2021] wireguard: wg0: Receiving handshake > > > >> initiation from peer 8 (~client-ip~:65323) > > > >> [Wed Sep 1 11:20:09 2021] wireguard: wg0: Sending handshake response > > > >> to peer 8 (~client-ip~:65323) > > > >> [Wed Sep 1 11:20:09 2021] wireguard: wg0: Keypair 2867 destroyed for > > > >> peer 8 > > > >> [Wed Sep 1 11:20:09 2021] wireguard: wg0: Keypair 2871 created for > > > >> peer 8 > > > >> [Wed Sep 1 11:20:09 2021] wireguard: wg0: Receiving keepalive packet > > > >> from peer 8 (~client-ip~:65323) > > > >> [Wed Sep 1 11:21:19 2021] wireguard: wg0: Sending keepalive packet to > > > >> peer 8 (~client-ip~:65323) > > > >> [Wed Sep 1 11:21:24 2021] wireguard: wg1: Retrying handshake with > > > >> peer 12 (~exit-node-ip~:50842) because we stopped hearing back after > > > >> 15 seconds > > > >> [Wed Sep 1 11:21:24 2021] wireguard: wg1: Sending handshake > > > >> initiation to peer 12 (~exit-node-ip~:50842) > > > >> [Wed Sep 1 11:21:30 2021] wireguard: wg1: Handshake for peer 12 > > > >> (~exit-node-ip~:50842) did not complete after 5 seconds, retrying (try > > > >> 2) > > > >> ====================== > > > >> > > > >> Sample log of dmesg when the wg1-wg2 connection is coming back using > > > >> manual restart: > > > >> ------------------------------------------- > > > >> [Wed Sep 1 11:45:52 2021] wireguard: wg1: Sending handshake > > > >> initiation to peer 12 (~exit-node-ip~:50842) > > > >> [Wed Sep 1 11:45:52 2021] wireguard: wg0: Sending keepalive packet to > > > >> peer 8 (~client-ip~:2335) > > > >> [Wed Sep 1 11:45:58 2021] wireguard: wg1: Handshake for peer 12 > > > >> (~exit-node-ip~:50842) did not complete after 5 seconds, retrying (try > > > >> 3) > > > >> [Wed Sep 1 11:45:58 2021] wireguard: wg1: Sending handshake > > > >> initiation to peer 12 (~exit-node-ip~:50842) > > > >> [Wed Sep 1 11:45:58 2021] wireguard: wg1: Keypair 2878 destroyed for > > > >> peer 12 > > > >> [Wed Sep 1 11:45:58 2021] wireguard: wg1: Peer 12 > > > >> (~exit-node-ip~:50842) destroyed > > > >> [Wed Sep 1 11:45:58 2021] wireguard: wg1: Interface destroyed > > > >> [Wed Sep 1 11:45:58 2021] wireguard: wg1: Interface created > > > >> [Wed Sep 1 11:45:58 2021] wireguard: wg1: Peer 13 created > > > >> [Wed Sep 1 11:45:58 2021] wireguard: wg1: Sending keepalive packet to > > > >> peer 13 (~exit-node-ip~:50842) > > > >> [Wed Sep 1 11:45:58 2021] wireguard: wg1: Sending handshake > > > >> initiation to peer 13 (~exit-node-ip~:50842) > > > >> [Wed Sep 1 11:45:58 2021] wireguard: wg1: Receiving handshake > > > >> response from peer 13 (~exit-node-ip~:50842) > > > >> [Wed Sep 1 11:45:58 2021] wireguard: wg1: Keypair 2881 created for > > > >> peer 13 > > > >> [Wed Sep 1 11:46:12 2021] wireguard: wg0: Receiving keepalive packet > > > >> from peer 8 (~client-ip~:2335) > > > >> [Wed Sep 1 11:46:14 2021] wireguard: wg1: Receiving keepalive packet > > > >> from peer 13 (~exit-node-ip~:50842) > > > >> [Wed Sep 1 11:46:27 2021] wireguard: wg0: Sending keepalive packet to > > > >> peer 8 (~client-ip~:2335) > > > >> [Wed Sep 1 11:46:28 2021] wireguard: wg1: Receiving keepalive packet > > > >> from peer 13 (~exit-node-ip~:50842) > > > >> [Wed Sep 1 11:46:52 2021] wireguard: wg1: Receiving keepalive packet > > > >> from peer 13 (~exit-node-ip~:50842) > > > >> > > > >> > > > >> Thanks in advance for your kind help