On Mon, Nov 12, 2018 at 06:46:26PM -0200, Martin Pieuchot wrote: > On 11/11/18(Sun) 13:10, Ben Pye wrote: > > Hey all, > > > > In my quest for better OpenBSD support on my Chromebook 13 I have found > > that sdmmc(4)'s current strategy for suspend/resume only really works > > for removable storage. Upon a suspend it marks the device as dying, and > > upon resume will detach the card and re-scan the bus. This makes sense > > for removable SD cards as they may be removed whilst the device is > > asleep, however for non-removable eMMC we can make the assumption that > > the device will not be removed during sleep. > > > > By assuming the device stays present over a sleep we can avoid the > > detach and re-scan behaviour that sdmmc(4) currently uses. This is > > important if you root storage device is an eMMC device /and/ you are > > using softraid(4) as softraid(4) will not handle the detach/attach. > > Despite this on my hardware at least suspend/resume works more reliably > > if we can avoid this detach behaviour too, on this device we would > > really have to detach before we enter sleep as the bus can break during > > suspend and then the detach/reattach fails after resume meaning that my > > sd0 device just disappears. > > > > Instead for non-removable devices upon sleep we can deselect the card > > but otherwise leave it configured. On resume we still perform a host > > reset in sdhc(4) and then we can re-do the device initialisation in > > sdmmc(4). During initialisation we use some of the values determined in > > the initial sdmmc(4) init as the goal is to put the device into the same > > state. We also hook up to the same sdmmc_function's etc so that the > > child devices (scsibus etc) do not have to do anything special on a > > resume. With this behaviour suspend/resume works reliably on my HP > > Chromebook 13 and softraid for encryption continues to work. > > > > There are a few unresolved questions however and that's why I'm posting > > this here now: > > > > 1. How do we know if a device is non-removable, I don't think an MMC > > device indicates this, maybe we have to determine this by the > > controller? > > If you find a way to know that would be interesting. However that > doesn't mean we want to go this road. > > I'd rephrase the question differently. How do we know if a device, > attached to a removable bus, has been removed or swapped while the > machine was suspended? > > Do you know how other OSes handle this case? What kind of changes > would be required to handle this in OpenBSD? At which layer? USB > currently follows the same logic, so it could be solved there as > well with the same stone.
Good point. As far as I can tell Linux attempts to keep the device initialised, removable card or not. They check the CID and if it has changed I think they are detaching it - so for example, a removable card. Perhaps taking this approach would be better - if we could keep removable devices attached if they haven't changed then it would be possible to use them as your root device for example. This would have a greater impact though and might have a more potential for breaking other devices. > > > 2. There are some error cases in sdmmc_mem_scan that are unhandled. They > > shouldn't ever be hit, if they were it would indicate that something to > > do with our non-removable card has changed over a suspend/resume cycle. > > What should happen if such error occurs? I think just propagating the error up to sdmmc_activate should be sufficient as I believe this should result in the device and it's children being detached and freed. In fact if we wanted to take this approach for removable devices then this is where I suppose we would notice that the device has changed. > > > 3. I appear to hit an issue with some commands failing after a suspend > > if I don't add a small delay to sdhc_wait_intr_cold, I haven't yet been > > able to work out why this is. Specifically, the second time we read > > EXT_CSD in sdmmc_mem_mmc_init fails. > > That would be nice to investigate. Such workarounds are a good start, > but not a nice solution. Understood. I just wanted to get it working for now so we could work out when we should even be going down this code path.
