Hi, On 18 April 2017 at 21:03, Andreas Färber <afaer...@suse.de> wrote: > Hi Simon, > > Am 19.04.2017 um 02:12 schrieb Simon Glass: >> On 18 April 2017 at 12:44, Heinrich Schuchardt <xypron.g...@gmx.de> wrote: >>> Set devp even if probing fails. >>> >>> Without the patch the following problem occurs: >>> If the first block device is not probed successfully no block >>> device is passed by bootefi to the EFI executable. >>> >>> The problem was reported by Andreas Färber in >>> https://lists.denx.de/pipermail/u-boot/2017-April/287432.html >>> >>> For testing I used an odroid-c2 with a dts including >>> &sd_emmc_a { >>> status = "okay"; >>> } >>> This device does not exist on the board and cannot be initialized. >> >> Thanks for finding this. Which function is calling this? Can you >> please explain the call sequence to hit this problem? I am worried >> that something else is wrong. > > It is lib/efi_loader/efi_disk.c:efi_disk_register() that is calling > uclass_first_device() and uclass_next_device(). Based on the output we > had concluded that uclass_first_device() fails already - therefore Alex > CC'ed you. > > Unfortunately I find the function interactions inside uclass.c rather > complex and unintuitive, so I am truely amazed at Heinrich's findings. > (Passing the ret code from one function to the next just to return?!) > > Based on his patch we can assume that uclass_find_first_device() set dev > to a non-NULL value, otherwise we wouldn't end up in > uclass_get_device_tail(). > > So if you're saying this patch is wrong, then that would leave > device_probe(). The actual meson_mmc_probe() function you had given a > late Reviewed-by, and it always returns 0. > > Jaehoon had asked about cfg->voltages in meson_mmc_probe(), but I don't > see how that would lead to probe failure here? Whether the values are > correct I have no idea. > > The error message "mmc_init: -95, time 1807", which appears to be a > result of this iteration/probe process, comes from > drivers/mmc/mmc.c:mmc_init(), which is called from > mmc-uclass.c:mmc_blk_probe(). mmc_init() calls mmc_start_init(), which > returns this -EOPNOTSUPP if sd_send_op_cond return -ETIMEDOUT and > mmc_send_op_cond() failed. However, our output does not show "Card did > not respond to voltage select!", which it should in that case. So the > error must be coming from elsewhere. sd_send_op_cond() may return > -EOPNOTSUPP through mmc_start_init(), and so does mmc_complete_op_cond() > through mmc_complete_init(). I would expect either to fail, as in my > case it's an SDIO, and in Heinrich's case it's not connected to > anything. So I don't think MMC is at fault here. > > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/arm64/boot/dts/amlogic/meson-gxbb-vega-s95.dtsi > > See sd_mmc_a, which uses an mmc-pwrseq that depends on a pwm-clock, > neither of which have any support in U-Boot, nor is it actually needed > for booting. > > In any case, failure to probe one device should not break the iteration > of other devices. I.e., if a device fails to probe then instead of > returning from uclass_{first,next}_device() we should look at the next > device until we find one that's okay or have reached the end of the > list. How to best implement that in uclass.c I'm less sure of. > > If I am right you could test this on any board with multiple, e.g., blk > devices where a first or non-last device returns an error code from its > probe, i.e. try changing return 0 in some probe function based on a > static variable not yet being initialized or something. Probably you > could even write some tiny sandbox based test, without actual hardware. > > IIUC this patch changes behavior from iterating over no devices to > iterating over all devices including ones not probed successfully? And > what you intended was to instead iterate over only the probed devices?
I think this is a genuine bug, but exactly where is less obvious. The uclass function don't return a device if they get an error. So at present you need to iterate through the uclass if the intention is to continue after error. That would be uclass_foreach_dev(). But before using the device, device_probe() needs to be called since the iteration does not probe it automatically. That is why people normally use uclass_first/next_device(), since it probes as it goes. However in this case I think it is reasonable to change uclass_first/next_device() to always return the device, even on error. That can best be done by changing those functions rather than uclass_get_device_tail(), which is shared by many functions. We should not change uclass_get_device_tail() since what is does is correct for all other uses (I think). In that case, the function comments for the first/next functions should be updated to indicate the new behaviour, and a test created, perhaps in test/dm/core.c. You were asking about uclass_get_device_tail(). This is common code and is put into a function to ensure consistent behaviour across all functions that return an error code and a device. Consistency is very important with driver model since things can get version confusing when something odd happens in the bowels of the system, as this bug has shown. Regards, Simon _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de https://lists.denx.de/listinfo/u-boot