While debugging my 802.11n code I got tired of iwm randomly failing
to associate or even getting into a state where I had to reboot to
get it to work again.

In case it got hung, the newstate task was sleeping in "iwmauth" and
never woke up, which made me revisit how this driver transitions to
AUTH state. I couldn't figure out exactly how it got hung, but I
believe I found a good reason to remove this tsleep entirely.

What's going on here is that the firmware can hop channels in the
background whenever it feels like doing so. To prevent this from
happening during the association procedure, the driver schedules a
"time event" which makes the firmware stay on the current channel
for a specified amount of time. This approach is borrowed from the
Linux driver.

If all goes well, the following happens:

1) The driver's newstate task asks for the time event and goes into
   tsleep() to wait for the firmware to signal the beginning of the
   time event.
2) The firmware signals beginning of the time event.
3) The interrupt handler sets a flag in the softc which indicates
   "time event start" and wakes up the sleeping newstate task.
4) The newstate task is scheduled, checks the flag which says that
   the time event has started, and finishes moving to AUTH state.
5) Association continues and completes
6) The firmware signals the end of the time event.
7) The interrupt handler sets a flag in the softc which indicates
   "time event ended" and wakes up any task sleeping on this
   condition (there is none)

Note that steps 5 and 6/7 can happen "concurrently" if the "time-event
ended" interrupt triggers during step 5, which is no problem.

Sometimes, especially with IWM_DEBUG enabled (lots of printf), this
doesn't work out as planned and the order of events goes like this:

1) The driver's newstate task asks for the time event and goes into
   tsleep() to wait for the firmware to signal the beginning of the
   time event.
2) The firmware signals beginning of the time event.
3) The interrupt handler sets a flag in the softc which indicates
   "time event start" and wakes up the sleeping newstate task.
4) The firmware signals the end of the time event.
5) The interrupt handler sets a flag in the softc which indicates
   "time event ended" and wakes up the sleeping newstate task.
6) The newstate task is scheduled, checks the flag which says that
   the time event has started, but it is not set so association is
   aborted.

I have also observed other variants of these chains of events where
iwm does not manage to associate.

Leaving precise synchronization like this up to the scheduler in our
kernel is not a good idea. There is no guarantee that things will work
out as intended. Moreover, the numbers controlling timing were copied
from Linux which has different scheduling latency.

This diff does away with the unreliable extra checking. It schedules a
time event, busy-waits a short while for good measure, and life goes on.
The firmware might fail to schedule the event, and if so might wander off
the channel for a moment... in which case association is still more
likely to succeed than with these unreliable checks in place.
But generally the time event will be scheduled and things just work out.

While here, also copy some related docs in comments from the Linux driver
and remove an unused argument from iwm_mvm_protect_session().

OK?

Below the diff is a trace which shows how events play out now.
The lines saying "iwm0: TE notif status = 0x01 action = 0x01" indicate
time event start, and the same with "action = 0x02" indicates the end
of the time event -- well after association is complete.

Index: if_iwm.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_iwm.c,v
retrieving revision 1.68
diff -u -p -r1.68 if_iwm.c
--- if_iwm.c    25 Nov 2015 03:09:59 -0000      1.68
+++ if_iwm.c    7 Dec 2015 15:13:22 -0000
@@ -272,7 +272,7 @@ int iwm_mvm_send_time_event_cmd(struct i
 int    iwm_mvm_time_event_send_add(struct iwm_softc *, struct iwm_node *,
                                        void *, struct iwm_time_event_cmd_v2 *);
 void   iwm_mvm_protect_session(struct iwm_softc *, struct iwm_node *,
-                               uint32_t, uint32_t, uint32_t);
+                               uint32_t, uint32_t);
 int    iwm_nvm_read_chunk(struct iwm_softc *, uint16_t, uint16_t, uint16_t,
                                uint8_t *, uint16_t *);
 int    iwm_nvm_read_section(struct iwm_softc *, uint16_t, uint8_t *,
@@ -2216,7 +2216,7 @@ iwm_mvm_time_event_send_add(struct iwm_s
 
 void
 iwm_mvm_protect_session(struct iwm_softc *sc, struct iwm_node *in,
-       uint32_t duration, uint32_t min_duration, uint32_t max_delay)
+       uint32_t duration, uint32_t max_delay)
 {
        struct iwm_time_event_cmd_v2 time_cmd;
 
@@ -4988,7 +4988,6 @@ iwm_auth(struct iwm_softc *sc)
        struct ieee80211com *ic = &sc->sc_ic;
        struct iwm_node *in = (void *)ic->ic_bss;
        uint32_t duration;
-       uint32_t min_duration;
        int error;
 
        in->in_assoc = 0;
@@ -5014,32 +5013,14 @@ iwm_auth(struct iwm_softc *sc)
                return error;
        }
 
-       /* a bit superfluous? */
-       while (sc->sc_auth_prot)
-               tsleep(&sc->sc_auth_prot, 0, "iwmauth", 0);
-       sc->sc_auth_prot = 1;
-
-       duration = min(IWM_MVM_TE_SESSION_PROTECTION_MAX_TIME_MS,
-           200 + in->in_ni.ni_intval);
-       min_duration = min(IWM_MVM_TE_SESSION_PROTECTION_MIN_TIME_MS,
-           100 + in->in_ni.ni_intval);
-       iwm_mvm_protect_session(sc, in, duration, min_duration, 500);
-
-       while (sc->sc_auth_prot != 2) {
-               /*
-                * well, meh, but if the kernel is sleeping for half a
-                * second, we have bigger problems
-                */
-               if (sc->sc_auth_prot == 0) {
-                       DPRINTF(("%s: missed auth window!\n", DEVNAME(sc)));
-                       return ETIMEDOUT;
-               } else if (sc->sc_auth_prot == -1) {
-                       DPRINTF(("%s: no time event, denied!\n", DEVNAME(sc)));
-                       sc->sc_auth_prot = 0;
-                       return EAUTH;
-               }
-               tsleep(&sc->sc_auth_prot, 0, "iwmau2", 0);
-       }
+       /*
+        * Prevent the FW from wandering off channel during association
+        * by "protecting" the session with a time event.
+        */
+       /* XXX duration is in units of TU, not MS */
+       duration = IWM_MVM_TE_SESSION_PROTECTION_MAX_TIME_MS;
+       iwm_mvm_protect_session(sc, in, duration, 500 /* XXX magic number */);
+       DELAY(100);
 
        return 0;
 }
@@ -5642,7 +5623,6 @@ iwm_stop(struct ifnet *ifp, int disable)
        sc->sc_flags |= IWM_FLAG_STOPPED;
        sc->sc_generation++;
        sc->sc_scanband = 0;
-       sc->sc_auth_prot = 0;
        ic->ic_scan_lock = IEEE80211_SCAN_UNLOCKED;
        ifp->if_flags &= ~IFF_RUNNING;
        ifq_clr_oactive(&ifp->if_snd);
@@ -6082,16 +6062,8 @@ iwm_notif_intr(struct iwm_softc *sc)
                        struct iwm_time_event_notif *notif;
                        SYNC_RESP_STRUCT(notif, pkt);
 
-                       if (notif->status) {
-                               if (le32toh(notif->action) &
-                                   IWM_TE_V2_NOTIF_HOST_EVENT_START)
-                                       sc->sc_auth_prot = 2;
-                               else
-                                       sc->sc_auth_prot = 0;
-                       } else {
-                               sc->sc_auth_prot = -1;
-                       }
-                       wakeup(&sc->sc_auth_prot);
+                       DPRINTF(("%s: TE notif status = 0x%x action = 0x%x\n",
+                           DEVNAME(sc), notif->status, notif->action));
                        break; }
 
                case IWM_MCAST_FILTER_CMD:
Index: if_iwmreg.h
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_iwmreg.h,v
retrieving revision 1.6
diff -u -p -r1.6 if_iwmreg.h
--- if_iwmreg.h 15 Nov 2015 09:39:09 -0000      1.6
+++ if_iwmreg.h 7 Dec 2015 15:04:18 -0000
@@ -2008,6 +2008,43 @@ struct iwm_time_event_cmd_v1 {
 
 /* Time event - defines for command API v2 */
 
+/**
+ * DOC: Time Events - what is it?
+ *
+ * Time Events are a fw feature that allows the driver to control the presence
+ * of the device on the channel. Since the fw supports multiple channels
+ * concurrently, the fw may choose to jump to another channel at any time.
+ * In order to make sure that the fw is on a specific channel at a certain time
+ * and for a certain duration, the driver needs to issue a time event.
+ *
+ * The simplest example is for BSS association. The driver issues a time event,
+ * waits for it to start, and only then tells mac80211 that we can start the
+ * association. This way, we make sure that the association will be done
+ * smoothly and won't be interrupted by channel switch decided within the fw.
+ */
+
+ /**
+ * DOC: The flow against the fw
+ *
+ * When the driver needs to make sure we are in a certain channel, at a certain
+ * time and for a certain duration, it sends a Time Event. The flow against the
+ * fw goes like this:
+ *     1) Driver sends a TIME_EVENT_CMD to the fw
+ *     2) Driver gets the response for that command. This response contains the
+ *        Unique ID (UID) of the event.
+ *     3) The fw sends notification when the event starts.
+ *
+ * Of course the API provides various options that allow to cover parameters
+ * of the flow.
+ *     What is the duration of the event?
+ *     What is the start time of the event?
+ *     Is there an end-time for the event?
+ *     How much can the event be delayed?
+ *     Can the event be split?
+ *     If yes what is the maximal number of chunks?
+ *     etc...
+ */
+
 /*
  * @IWM_TE_V2_FRAG_NONE: fragmentation of the time event is NOT allowed.
  * @IWM_TE_V2_FRAG_SINGLE: fragmentation of the time event is allowed, but only
Index: if_iwmvar.h
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_iwmvar.h,v
retrieving revision 1.12
diff -u -p -r1.12 if_iwmvar.h
--- if_iwmvar.h 22 Oct 2015 11:51:28 -0000      1.12
+++ if_iwmvar.h 7 Dec 2015 15:04:33 -0000
@@ -305,7 +305,7 @@ struct iwm_ucode_status {
 
 #define IWM_OTP_LOW_IMAGE_SIZE 2048
 
-#define IWM_MVM_TE_SESSION_PROTECTION_MAX_TIME_MS 500
+#define IWM_MVM_TE_SESSION_PROTECTION_MAX_TIME_MS 1000
 #define IWM_MVM_TE_SESSION_PROTECTION_MIN_TIME_MS 400
 
 enum IWM_CMD_MODE {
@@ -440,8 +440,6 @@ struct iwm_softc {
        size_t sc_scan_cmd_len;
        int sc_scan_last_antenna;
        int sc_scanband;
-
-       int sc_auth_prot;
 
        int sc_fixed_ridx;
 

Dec  7 16:11:48 jessup /bsd: switching state SCAN->AUTH
Dec  7 16:11:48 jessup /bsd: sending command 0xd0 qid 9, idx 25
Dec  7 16:11:48 jessup /bsd: sending command 0x8 qid 9, idx 26
Dec  7 16:11:48 jessup /bsd: sending command 0x2b qid 9, idx 27
Dec  7 16:11:48 jessup /bsd: sending command 0x18 qid 9, idx 28
Dec  7 16:11:48 jessup /bsd: Add new TE, duration 1000 TU
Dec  7 16:11:48 jessup /bsd: sending command 0x29 qid 9, idx 29
Dec  7 16:11:48 jessup /bsd: iwm0: TE notif status = 0x1 action = 0x1
Dec  7 16:11:48 jessup /bsd: switching state AUTH->ASSOC
Dec  7 16:11:48 jessup /bsd: sending command 0x18 qid 9, idx 30
Dec  7 16:11:48 jessup /bsd: dtim 1 0
Dec  7 16:11:48 jessup /bsd: DTIM TBTT is 0x18279706686/0x3656b4, offset 0
Dec  7 16:11:48 jessup /bsd: sending command 0x28 qid 9, idx 31
Dec  7 16:11:48 jessup /bsd: switching state ASSOC->RUN
Dec  7 16:11:48 jessup /bsd: Sending power table command on mac id 0x0 for 
power level 1, flags = 0x0
Dec  7 16:11:48 jessup /bsd: Keep alive = 25 sec
Dec  7 16:11:48 jessup /bsd: Disable power management
Dec  7 16:11:48 jessup /bsd: sending command 0xa9 qid 9, idx 32
Dec  7 16:11:48 jessup /bsd: sending command 0xd2 qid 9, idx 33
Dec  7 16:11:48 jessup /bsd: ba_enable_beacon_abort is: 0
Dec  7 16:11:48 jessup /bsd: ba_escape_timer is: 6
Dec  7 16:11:48 jessup /bsd: bf_debug_flag is: 0
Dec  7 16:11:48 jessup /bsd: bf_enable_beacon_filter is: 1
Dec  7 16:11:48 jessup /bsd: bf_energy_delta is: 5
Dec  7 16:11:48 jessup /bsd: bf_escape_timer is: 50
Dec  7 16:11:48 jessup /bsd: bf_roaming_energy_delta is: 1
Dec  7 16:11:48 jessup /bsd: bf_roaming_state is: 72
Dec  7 16:11:48 jessup /bsd: bf_temp_threshold is: 112
Dec  7 16:11:48 jessup /bsd: bf_temp_fast_filter is: 1
Dec  7 16:11:48 jessup /bsd: bf_temp_slow_filter is: 5
Dec  7 16:11:48 jessup /bsd: sending command 0xd2 qid 9, idx 34
Dec  7 16:11:48 jessup /bsd: ba_enable_beacon_abort is: 0
Dec  7 16:11:48 jessup /bsd: ba_escape_timer is: 6
Dec  7 16:11:48 jessup /bsd: bf_debug_flag is: 0
Dec  7 16:11:48 jessup /bsd: bf_enable_beacon_filter is: 1
Dec  7 16:11:48 jessup /bsd: bf_energy_delta is: 5
Dec  7 16:11:48 jessup /bsd: bf_escape_timer is: 50
Dec  7 16:11:48 jessup /bsd: bf_roaming_energy_delta is: 1
Dec  7 16:11:48 jessup /bsd: bf_roaming_state is: 72
Dec  7 16:11:48 jessup /bsd: bf_temp_threshold is: 112
Dec  7 16:11:48 jessup /bsd: bf_temp_fast_filter is: 1
Dec  7 16:11:48 jessup /bsd: bf_temp_slow_filter is: 5
Dec  7 16:11:48 jessup /bsd: sending command 0x2c qid 9, idx 35
Dec  7 16:11:48 jessup /bsd: sending command 0x4e qid 9, idx 36
Dec  7 16:11:49 jessup /bsd: iwm0: TE notif status = 0x1 action = 0x2

Reply via email to