Hi Tom,

On Fri, 12 Aug 2022 at 10:11, Tom Rini <tr...@konsulko.com> wrote:
>
> On Fri, Aug 12, 2022 at 09:11:11AM -0600, Simon Glass wrote:
> > Hi Tom,
> >
> > On Thu, 11 Aug 2022 at 19:03, Tom Rini <tr...@konsulko.com> wrote:
> > >
> > > On Thu, Aug 11, 2022 at 06:08:51PM -0600, Simon Glass wrote:
> > > > Hi,
> > > >
> > > > On Thu, 11 Aug 2022 at 11:44, Tom Rini <tr...@konsulko.com> wrote:
> > > > >
> > > > > On Thu, Aug 11, 2022 at 09:59:05AM -0700, Tim Harvey wrote:
> > > > > > On Thu, Aug 11, 2022 at 9:48 AM Tom Rini <tr...@konsulko.com> wrote:
> > > > > > >
> > > > > > > On Thu, Aug 11, 2022 at 09:27:44AM -0700, Tim Harvey wrote:
> > > > > > >
> > > > > > > > Greetings,
> > > > > > > >
> > > > > > > > After a couple of hours troubleshooting a bad boot image today I
> > > > > > > > realized the issue was that I had some 0 byte files for the 
> > > > > > > > lpddr4
> > > > > > > > training blobs that are part of the imx8mp binman created image.
> > > > > > > >
> > > > > > > > Digging in I found that if a blob referenced in the binman node 
> > > > > > > > is
> > > > > > > > missing a warning will be output but the missing files will be
> > > > > > > > 'created' as 0 byte files such that the next time you build you 
> > > > > > > > will
> > > > > > > > get no warning (but will have a non-working image). 
> > > > > > > > Additionally the
> > > > > > > > error does not cause a non-zero exit code.
> > > > > > > >
> > > > > > > > I'm not that fluent in python these days, and don't have the 
> > > > > > > > time for
> > > > > > > > a while to try and fix this but I figured I would at least send 
> > > > > > > > this
> > > > > > > > email in case someone else does.
> > > > > > > >
> > > > > > > > Example:
> > > > > > > > # rm *lpddr4*.bin # make sure lpddr4*.bin files referenced in 
> > > > > > > > binman
> > > > > > > > nodes are missing
> > > > > > > > # make distclean imx8mp_venice_defconfig flash.bin && echo 
> > > > > > > > "build ok"
> > > > > > > > ...
> > > > > > > >   BINMAN  flash.bin
> > > > > > > > Image 'main-section' is missing external blobs and is 
> > > > > > > > non-functional: ddr-1d-ime
> > > > > > > > m-fw ddr-1d-dmem-fw ddr-2d-imem-fw ddr-2d-dmem-fw
> > > > > > > > Image 'main-section' has faked external blobs and is 
> > > > > > > > non-functional: lpddr4_pmu_
> > > > > > > > train_1d_imem_202006.bin lpddr4_pmu_train_1d_dmem_202006.bin 
> > > > > > > > lpddr4_pmu_train_2d
> > > > > > > > _imem_202006.bin lpddr4_pmu_train_2d_dmem_202006.bin
> > > > > > > >
> > > > > > > > Some images are invalid
> > > > > > > > ^^^ excellent warning
> > > > > > > > build ok
> > > > > > > > ^^^ not so great that there is a successful exit code
> > > > > > > > # make flash.bin && echo "build ok"
> > > > > > > > ...
> > > > > > > >   BINMAN  flash.bin
> > > > > > > > build ok
> > > > > > > > ^^^ absolutely horrible that 0 byte files were created and thus
> > > > > > > > everything looks good this time around!
> > > > > > > > # stat -c "%s %n" lpddr4*.bin
> > > > > > > > 0 lpddr4_pmu_train_1d_dmem_202006.bin
> > > > > > > > 0 lpddr4_pmu_train_1d_imem_202006.bin
> > > > > > > > 0 lpddr4_pmu_train_2d_dmem_202006.bin
> > > > > > > > 0 lpddr4_pmu_train_2d_imem_202006.bin
> > > > > > >
> > > > > > > So, this isn't the first time someone has had this problem. On 
> > > > > > > the one
> > > > > > > hand, we need CI to pass, and not require fetching of arbitrary 
> > > > > > > further
> > > > > > > images to assemble the binary. On the other hand, we don't want 
> > > > > > > users
> > > > > > > spending a bunch of time because something didn't work and the 
> > > > > > > normal
> > > > > > > way of conveying THIS WON'T WORK is a non-zero exit status. Can we
> > > > > > > easily make some flag for buildman or binman that we do set in CI 
> > > > > > > but
> > > > > > > won't be set by users?
> > > > > > >
> > > > > >
> > > > > > Tom,
> > > > > >
> > > > > > Ok - that makes sense as far as the exit status goes. It would be 
> > > > > > MUCH
> > > > > > easier to catch an error like this if binman didn't create 0 byte
> > > > > > files for missing files so that you at least get the (ascii colored)
> > > > > > message indicating the image is wrong.
> > > >
> > > > Please see this:
> > > >
> > > > https://patchwork.ozlabs.org/project/uboot/patch/20220807154708.1418967-2-...@chromium.org/
> > > >
> > > > > >
> > > > > > I do love the idea of a flag for CI also!
> > > > >
> > > > > So, that's part of the root of the problem here.  We're passing
> > > > > --fake-ext-blobs to binman so that it will create those 0 byte files
> > > > > and it in turn complain. I think we didn't figure out a good way to
> > > > > tell buildman to pass that to binman tho?
> > > >
> > > > I hope the above patch will fix this part of the problem.
> > >
> > > I'm not sure. What I really want to see is something like buildman
> > > --fake-ext-blobs, which in turn does 'make BINMANFLAGS=--fake-ext-blobs'
> > > or 'make BINMAN_FAKE_EXT_BLOBS=1' and that is what sets --fake-ext-blobs
> > > to be passed inside of cmd_binman, so that we can tell CI to do that
> > > (just like we do -E today, to ensure -Werror) but regular users get
> > > build failures.
> >
> > IMO we should add my patch as well as what you are saying here. They
> > are handling slightly different issues.
>
> Yes, the above is useful, to fix a different issue (and should fix some
> issues in my before/after world builds).

OK good. Yes, a different issue as you say.

>
> > As to this one, I can make binman return an error result if any images
> > are invalid, controlled with a -W flag, perhaps, as with buildman.
> >
> > It is not just faked blobs - we need to raise a warning/error for
> > missing ones too.
> >
> > Then we could pass -W from CI using an env var picked up by the
> > Makefile as you say. It could control whether --allow-missing and
> > --fake-ext-blobs are passed to binman.
> >
> > Then a normal build will produce errors.
> >
> > We still need buildman to work locally though, for build testing. I
> > think returning exit code 101 could be good enough for that.
> >
> > Anyway, as I say, these are things for future improvement in addition
> > to my patch, IMO.
>
> I really think this needs to be a fails by default situation, and a new
> flag to buildman to say allow for CI and similar test builds to
> complete (that in turn passes the right flag(s) to binman), and complain
> without non-zero exit status. But the default should be visible /
> descriptive failure with a normal non-zero exit status.

I've been thinking about this. In practice there is no way that a
'buildman -b' build can have the right binary blobs in place. Where
would they come from? So it would always fail. I don't think that is
really what we want? I use buildman in this way all the time. I also
hate binary blobs and don't want them to interrupt my workflow. As you
know the LTO issue has more than doubled my incremental build times
for sandbox and some other boards and I've been in this situation for
a long time (please can you apply that fix? :-)

I agree that the default should be a visible / descriptive failure,
but perhaps we can start with sorting out Makefile, so that by default
it fails and you have to pass an arg to get it to handle missing or
fake blobs. I should have done that at the start. That should fix most
people. This is your env var idea, and I agree with that. I can do a
patch.

Then we get to the people who use buildman for their builds instead of
make. That has recently included me on some machines, now we have the
--ide and -w flags, but I'm not sure how common it is. We don't
document buildman as a way of doing development. Anyway, we need this
to fail too, by default. In this case it is normally building a single
board and using the -w flag.

So perhaps we could have a different rule for -b or multiple boards
than for a 'current source' or '-w' build? I am not sure how we could
do this cleanly, but we could perhaps have a flag like --blobs with
several options:

--blobs fail | allow | fake

where the default value (if no --blobs flag is provided) varies
depending on other flags?

As I read this I suppose you will think this is overkill. But I am
really thinking about how this would affect development and there are
so many things already in the way.

Regards,
Simon

Reply via email to