Hi,

On 18 April 2017 at 21:03, Andreas Färber <afaer...@suse.de> wrote:
> Hi Simon,
>
> Am 19.04.2017 um 02:12 schrieb Simon Glass:
>> On 18 April 2017 at 12:44, Heinrich Schuchardt <xypron.g...@gmx.de> wrote:
>>> Set devp even if probing fails.
>>>
>>> Without the patch the following problem occurs:
>>> If the first block device is not probed successfully no block
>>> device is passed by bootefi to the EFI executable.
>>>
>>> The problem was reported by Andreas Färber in
>>> https://lists.denx.de/pipermail/u-boot/2017-April/287432.html
>>>
>>> For testing I used an odroid-c2 with a dts including
>>> &sd_emmc_a {
>>>         status = "okay";
>>> }
>>> This device does not exist on the board and cannot be initialized.
>>
>> Thanks for finding this. Which function is calling this? Can you
>> please explain the call sequence to hit this problem? I am worried
>> that something else is wrong.
>
> It is lib/efi_loader/efi_disk.c:efi_disk_register() that is calling
> uclass_first_device() and uclass_next_device(). Based on the output we
> had concluded that uclass_first_device() fails already - therefore Alex
> CC'ed you.
>
> Unfortunately I find the function interactions inside uclass.c rather
> complex and unintuitive, so I am truely amazed at Heinrich's findings.
> (Passing the ret code from one function to the next just to return?!)
>
> Based on his patch we can assume that uclass_find_first_device() set dev
> to a non-NULL value, otherwise we wouldn't end up in
> uclass_get_device_tail().
>
> So if you're saying this patch is wrong, then that would leave
> device_probe(). The actual meson_mmc_probe() function you had given a
> late Reviewed-by, and it always returns 0.
>
> Jaehoon had asked about cfg->voltages in meson_mmc_probe(), but I don't
> see how that would lead to probe failure here? Whether the values are
> correct I have no idea.
>
> The error message "mmc_init: -95, time 1807", which appears to be a
> result of this iteration/probe process, comes from
> drivers/mmc/mmc.c:mmc_init(), which is called from
> mmc-uclass.c:mmc_blk_probe(). mmc_init() calls mmc_start_init(), which
> returns this -EOPNOTSUPP if sd_send_op_cond return -ETIMEDOUT and
> mmc_send_op_cond() failed. However, our output does not show "Card did
> not respond to voltage select!", which it should in that case. So the
> error must be coming from elsewhere. sd_send_op_cond() may return
> -EOPNOTSUPP through mmc_start_init(), and so does mmc_complete_op_cond()
> through mmc_complete_init(). I would expect either to fail, as in my
> case it's an SDIO, and in Heinrich's case it's not connected to
> anything. So I don't think MMC is at fault here.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/arm64/boot/dts/amlogic/meson-gxbb-vega-s95.dtsi
>
> See sd_mmc_a, which uses an mmc-pwrseq that depends on a pwm-clock,
> neither of which have any support in U-Boot, nor is it actually needed
> for booting.
>
> In any case, failure to probe one device should not break the iteration
> of other devices. I.e., if a device fails to probe then instead of
> returning from uclass_{first,next}_device() we should look at the next
> device until we find one that's okay or have reached the end of the
> list. How to best implement that in uclass.c I'm less sure of.
>
> If I am right you could test this on any board with multiple, e.g., blk
> devices where a first or non-last device returns an error code from its
> probe, i.e. try changing return 0 in some probe function based on a
> static variable not yet being initialized or something. Probably you
> could even write some tiny sandbox based test, without actual hardware.
>
> IIUC this patch changes behavior from iterating over no devices to
> iterating over all devices including ones not probed successfully? And
> what you intended was to instead iterate over only the probed devices?

I think this is a genuine bug, but exactly where is less obvious.

The uclass function don't return a device if they get an error. So at
present you need to iterate through the uclass if the intention is to
continue after error. That would be uclass_foreach_dev(). But before
using the device, device_probe() needs to be called since the
iteration does not probe it automatically. That is why people normally
use uclass_first/next_device(), since it probes as it goes.

However in this case I think it is reasonable to change
uclass_first/next_device() to always return the device, even on error.
That can best be done by changing those functions rather than
uclass_get_device_tail(), which is shared by many functions. We should
not change uclass_get_device_tail() since what is does is correct for
all other uses (I think).

In that case, the function comments for the first/next functions
should be updated to indicate the new behaviour, and a test created,
perhaps in test/dm/core.c.

You were asking about uclass_get_device_tail(). This is common code
and is put into a function to ensure consistent behaviour across all
functions that return an error code and a device. Consistency is very
important with driver model since things can get version confusing
when something odd happens in the bowels of the system, as this bug
has shown.

Regards,
Simon
_______________________________________________
U-Boot mailing list
U-Boot@lists.denx.de
https://lists.denx.de/listinfo/u-boot

Reply via email to