On Fri, Aug 12, 2022 at 08:21:05PM -0600, Simon Glass wrote:
> Hi Tom,
> 
> On Fri, 12 Aug 2022 at 10:11, Tom Rini <tr...@konsulko.com> wrote:
> >
> > On Fri, Aug 12, 2022 at 09:11:11AM -0600, Simon Glass wrote:
> > > Hi Tom,
> > >
> > > On Thu, 11 Aug 2022 at 19:03, Tom Rini <tr...@konsulko.com> wrote:
> > > >
> > > > On Thu, Aug 11, 2022 at 06:08:51PM -0600, Simon Glass wrote:
> > > > > Hi,
> > > > >
> > > > > On Thu, 11 Aug 2022 at 11:44, Tom Rini <tr...@konsulko.com> wrote:
> > > > > >
> > > > > > On Thu, Aug 11, 2022 at 09:59:05AM -0700, Tim Harvey wrote:
> > > > > > > On Thu, Aug 11, 2022 at 9:48 AM Tom Rini <tr...@konsulko.com> 
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Thu, Aug 11, 2022 at 09:27:44AM -0700, Tim Harvey wrote:
> > > > > > > >
> > > > > > > > > Greetings,
> > > > > > > > >
> > > > > > > > > After a couple of hours troubleshooting a bad boot image 
> > > > > > > > > today I
> > > > > > > > > realized the issue was that I had some 0 byte files for the 
> > > > > > > > > lpddr4
> > > > > > > > > training blobs that are part of the imx8mp binman created 
> > > > > > > > > image.
> > > > > > > > >
> > > > > > > > > Digging in I found that if a blob referenced in the binman 
> > > > > > > > > node is
> > > > > > > > > missing a warning will be output but the missing files will be
> > > > > > > > > 'created' as 0 byte files such that the next time you build 
> > > > > > > > > you will
> > > > > > > > > get no warning (but will have a non-working image). 
> > > > > > > > > Additionally the
> > > > > > > > > error does not cause a non-zero exit code.
> > > > > > > > >
> > > > > > > > > I'm not that fluent in python these days, and don't have the 
> > > > > > > > > time for
> > > > > > > > > a while to try and fix this but I figured I would at least 
> > > > > > > > > send this
> > > > > > > > > email in case someone else does.
> > > > > > > > >
> > > > > > > > > Example:
> > > > > > > > > # rm *lpddr4*.bin # make sure lpddr4*.bin files referenced in 
> > > > > > > > > binman
> > > > > > > > > nodes are missing
> > > > > > > > > # make distclean imx8mp_venice_defconfig flash.bin && echo 
> > > > > > > > > "build ok"
> > > > > > > > > ...
> > > > > > > > >   BINMAN  flash.bin
> > > > > > > > > Image 'main-section' is missing external blobs and is 
> > > > > > > > > non-functional: ddr-1d-ime
> > > > > > > > > m-fw ddr-1d-dmem-fw ddr-2d-imem-fw ddr-2d-dmem-fw
> > > > > > > > > Image 'main-section' has faked external blobs and is 
> > > > > > > > > non-functional: lpddr4_pmu_
> > > > > > > > > train_1d_imem_202006.bin lpddr4_pmu_train_1d_dmem_202006.bin 
> > > > > > > > > lpddr4_pmu_train_2d
> > > > > > > > > _imem_202006.bin lpddr4_pmu_train_2d_dmem_202006.bin
> > > > > > > > >
> > > > > > > > > Some images are invalid
> > > > > > > > > ^^^ excellent warning
> > > > > > > > > build ok
> > > > > > > > > ^^^ not so great that there is a successful exit code
> > > > > > > > > # make flash.bin && echo "build ok"
> > > > > > > > > ...
> > > > > > > > >   BINMAN  flash.bin
> > > > > > > > > build ok
> > > > > > > > > ^^^ absolutely horrible that 0 byte files were created and 
> > > > > > > > > thus
> > > > > > > > > everything looks good this time around!
> > > > > > > > > # stat -c "%s %n" lpddr4*.bin
> > > > > > > > > 0 lpddr4_pmu_train_1d_dmem_202006.bin
> > > > > > > > > 0 lpddr4_pmu_train_1d_imem_202006.bin
> > > > > > > > > 0 lpddr4_pmu_train_2d_dmem_202006.bin
> > > > > > > > > 0 lpddr4_pmu_train_2d_imem_202006.bin
> > > > > > > >
> > > > > > > > So, this isn't the first time someone has had this problem. On 
> > > > > > > > the one
> > > > > > > > hand, we need CI to pass, and not require fetching of arbitrary 
> > > > > > > > further
> > > > > > > > images to assemble the binary. On the other hand, we don't want 
> > > > > > > > users
> > > > > > > > spending a bunch of time because something didn't work and the 
> > > > > > > > normal
> > > > > > > > way of conveying THIS WON'T WORK is a non-zero exit status. Can 
> > > > > > > > we
> > > > > > > > easily make some flag for buildman or binman that we do set in 
> > > > > > > > CI but
> > > > > > > > won't be set by users?
> > > > > > > >
> > > > > > >
> > > > > > > Tom,
> > > > > > >
> > > > > > > Ok - that makes sense as far as the exit status goes. It would be 
> > > > > > > MUCH
> > > > > > > easier to catch an error like this if binman didn't create 0 byte
> > > > > > > files for missing files so that you at least get the (ascii 
> > > > > > > colored)
> > > > > > > message indicating the image is wrong.
> > > > >
> > > > > Please see this:
> > > > >
> > > > > https://patchwork.ozlabs.org/project/uboot/patch/20220807154708.1418967-2-...@chromium.org/
> > > > >
> > > > > > >
> > > > > > > I do love the idea of a flag for CI also!
> > > > > >
> > > > > > So, that's part of the root of the problem here.  We're passing
> > > > > > --fake-ext-blobs to binman so that it will create those 0 byte files
> > > > > > and it in turn complain. I think we didn't figure out a good way to
> > > > > > tell buildman to pass that to binman tho?
> > > > >
> > > > > I hope the above patch will fix this part of the problem.
> > > >
> > > > I'm not sure. What I really want to see is something like buildman
> > > > --fake-ext-blobs, which in turn does 'make BINMANFLAGS=--fake-ext-blobs'
> > > > or 'make BINMAN_FAKE_EXT_BLOBS=1' and that is what sets --fake-ext-blobs
> > > > to be passed inside of cmd_binman, so that we can tell CI to do that
> > > > (just like we do -E today, to ensure -Werror) but regular users get
> > > > build failures.
> > >
> > > IMO we should add my patch as well as what you are saying here. They
> > > are handling slightly different issues.
> >
> > Yes, the above is useful, to fix a different issue (and should fix some
> > issues in my before/after world builds).
> 
> OK good. Yes, a different issue as you say.
> 
> >
> > > As to this one, I can make binman return an error result if any images
> > > are invalid, controlled with a -W flag, perhaps, as with buildman.
> > >
> > > It is not just faked blobs - we need to raise a warning/error for
> > > missing ones too.
> > >
> > > Then we could pass -W from CI using an env var picked up by the
> > > Makefile as you say. It could control whether --allow-missing and
> > > --fake-ext-blobs are passed to binman.
> > >
> > > Then a normal build will produce errors.
> > >
> > > We still need buildman to work locally though, for build testing. I
> > > think returning exit code 101 could be good enough for that.
> > >
> > > Anyway, as I say, these are things for future improvement in addition
> > > to my patch, IMO.
> >
> > I really think this needs to be a fails by default situation, and a new
> > flag to buildman to say allow for CI and similar test builds to
> > complete (that in turn passes the right flag(s) to binman), and complain
> > without non-zero exit status. But the default should be visible /
> > descriptive failure with a normal non-zero exit status.
> 
> I've been thinking about this. In practice there is no way that a
> 'buildman -b' build can have the right binary blobs in place. Where
> would they come from? So it would always fail. I don't think that is
> really what we want? I use buildman in this way all the time. I also
> hate binary blobs and don't want them to interrupt my workflow. As you
> know the LTO issue has more than doubled my incremental build times
> for sandbox and some other boards and I've been in this situation for
> a long time (please can you apply that fix? :-)
> 
> I agree that the default should be a visible / descriptive failure,
> but perhaps we can start with sorting out Makefile, so that by default
> it fails and you have to pass an arg to get it to handle missing or
> fake blobs. I should have done that at the start. That should fix most
> people. This is your env var idea, and I agree with that. I can do a
> patch.
> 
> Then we get to the people who use buildman for their builds instead of
> make. That has recently included me on some machines, now we have the
> --ide and -w flags, but I'm not sure how common it is. We don't
> document buildman as a way of doing development. Anyway, we need this
> to fail too, by default. In this case it is normally building a single
> board and using the -w flag.
> 
> So perhaps we could have a different rule for -b or multiple boards
> than for a 'current source' or '-w' build? I am not sure how we could
> do this cleanly, but we could perhaps have a flag like --blobs with
> several options:
> 
> --blobs fail | allow | fake
> 
> where the default value (if no --blobs flag is provided) varies
> depending on other flags?
> 
> As I read this I suppose you will think this is overkill. But I am
> really thinking about how this would affect development and there are
> so many things already in the way.

I think we're almost on the same page here. What I want is for:
tools/buildman/buildman --fake-ext-blobs
to do 'make BINMAN_FAKE_BLOBS=1' and so this then works:
diff --git a/Makefile b/Makefile
index 1a66f69a4b14..6c406735564c 100644
--- a/Makefile
+++ b/Makefile
@@ -1346,8 +1346,8 @@ cmd_binman = $(srctree)/tools/binman/binman $(if 
$(BINMAN_DEBUG),-D) \
                $(foreach f,$(BINMAN_TOOLPATHS),--toolpath $(f)) \
                 --toolpath $(objtree)/tools \
                $(if $(BINMAN_VERBOSE),-v$(BINMAN_VERBOSE)) \
-               build -u -d u-boot.dtb -O . -m --allow-missing \
-               --fake-ext-blobs \
+               build -u -d u-boot.dtb -O . -m \
+               $(if $(BINMAN_FAKE_BLOBS),--allow-missing --fake-ext-blobs) \
                -I . -I $(srctree) -I $(srctree)/board/$(BOARDDIR) \
                -I arch/$(ARCH)/dts -a of-list=$(CONFIG_OF_LIST) \
                $(foreach f,$(BINMAN_INDIRS),-I $(f)) \


So that then you're either using buildman and can pass the new flag to
get binman to act like before, or you're going to see a more visible
failure when using make and not do what needs to be done (which isn't
always consistent, sigh) to say where to find what blobs.

-- 
Tom

Attachment: signature.asc
Description: PGP signature

Reply via email to