On Thu, 5 Sept 2024 at 14:07, Michal Simek <michal.si...@amd.com> wrote: > > > > On 9/5/24 10:12, Sughosh Ganu wrote: > > On Thu, 5 Sept 2024 at 13:20, Michal Simek <michal.si...@amd.com> wrote: > >> > >> > >> > >> On 9/5/24 09:43, Sughosh Ganu wrote: > >>> On Thu, 5 Sept 2024 at 13:09, Michal Simek <michal.si...@amd.com> wrote: > >>>> > >>>> > >>>> > >>>> On 9/5/24 09:38, Sughosh Ganu wrote: > >>>>> On Thu, 5 Sept 2024 at 13:03, Michal Simek <michal.si...@amd.com> wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> On 9/5/24 08:24, Sughosh Ganu wrote: > >>>>>>> On Wed, 4 Sept 2024 at 17:30, Michal Simek <michal.si...@amd.com> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On 8/30/24 13:40, Sughosh Ganu wrote: > >>>>>>>>> > >>>>>>>>> The following set of patches are miscellaneous fixes and some > >>>>>>>>> hardening of the FWU update logic. > >>>>>>>>> > >>>>>>>>> Sughosh Ganu (6): > >>>>>>>>> fwu: v2: perform some checks before reading metadata > >>>>>>>>> fwu: v2: try reading both copies of metadata > >>>>>>>>> fwu: v1: do a version check for the metadata > >>>>>>>>> fwu: check all images for transitioning out of Trial State > >>>>>>>>> fwu: add dependency checks for selecting FWU metadata version > >>>>>>>>> fwu: do not allow capsule processing on exceeding Trial > >>>>>>>>> Counter > >>>>>>>>> threshold > >>>>>>>>> > >>>>>>>>> include/fwu.h | 11 ++++++ > >>>>>>>>> lib/fwu_updates/Kconfig | 1 + > >>>>>>>>> lib/fwu_updates/fwu.c | 31 +++++++++++++++- > >>>>>>>>> lib/fwu_updates/fwu_v1.c | 18 +++++++-- > >>>>>>>>> lib/fwu_updates/fwu_v2.c | 80 > >>>>>>>>> ++++++++++++++++++++++------------------ > >>>>>>>>> 5 files changed, 99 insertions(+), 42 deletions(-) > >>>>>>>>> > >>>>>>>> > >>>>>>>> I found one more thing. > >>>>>>>> > >>>>>>>> I did this change > >>>>>>>> > >>>>>>>> diff --git a/tools/mkfwumdata.c b/tools/mkfwumdata.c > >>>>>>>> index fbc2067bc12d..dab9530e499c 100644 > >>>>>>>> --- a/tools/mkfwumdata.c > >>>>>>>> +++ b/tools/mkfwumdata.c > >>>>>>>> @@ -293,7 +293,7 @@ static void > >>>>>>>> fwu_fill_version_specific_mdata(struct > >>>>>>>> fwu_mdata_object *mobj) > >>>>>>>> struct fwu_mdata *mdata = mobj->mdata; > >>>>>>>> > >>>>>>>> mdata->metadata_size = mobj->size; > >>>>>>>> - mdata->desc_offset = sizeof(struct fwu_mdata); > >>>>>>>> + mdata->desc_offset = 0x21; sizeof(struct fwu_mdata); > >>>>>>>> > >>>>>>>> for (i = 0; i < MAX_BANKS_V2; i++) > >>>>>>>> mdata->bank_state[i] = i < mobj->banks ? > >>>>>>>> > >>>>>>>> > >>>>>>>> to break desc_offset location. I generated mdata structure and write > >>>>>>>> it to > >>>>>>>> primary mdata partition. > >>>>>>>> > >>>>>>>> This has been spotted by (my debug) which is the reason why I did it. > >>>>>>>> > >>>>>>>> if (g_mdata.desc_offset != FWU_IMG_DESC_OFFSET) { > >>>>>>>> printf("mdata offset is not 0x20\n"); > >>>>>>>> return false; > >>>>>>>> } > >>>>>>>> > >>>>>>>> But I got one more message below which > >>>>>>>> > >>>>>>>> mdata offset is not 0x20 > >>>>>>>> Both FWU metadata copies are valid but do not match. Restoring the > >>>>>>>> secondary > >>>>>>>> partition from the primary > >>>>>>> > >>>>>>> The reason why this logic is in place is to handle the scenario > >>>>>>> whereby, during the metadata update, the primary copy gets updated > >>>>>>> correctly, but due to some reason (like a power cut), the secondary > >>>>>>> copy does not -- this is because the primary copy gets updated first. > >>>>>>> In such a scenario, the secondary copy of metadata can be then > >>>>>>> restored from the primary copy. In the scenario where the primary copy > >>>>>>> is corrupted, it will show up in the crc check, and will get restored > >>>>>>> from the secondary copy. > >>>>>> > >>>>>> But in this case - CRC of primary is correct but data inside is wrong > >>>>>> because > >>>>>> offset is at 0x21 not at 0x20. > >>>>> > >>>>> How will this event play out ? Am I missing something ? When the > >>>>> metadata is getting updated, the offset will be written as 0x20, and > >>>>> not 0x21. And in case the metadata gets corrupted, the CRC check would > >>>>> fail and the primary partition would get restored from the secondary > >>>>> (assuming that the secondary copy is correct). > >>>> > >>>> It is not after update. It is initial state with incorrect mdata > >>>> structure. > >>> > >>> Well, if the metadata partition is being written with incorrect data, > >>> not sure how we combat that (or even if we should). After all, the > >>> spec clearly states that the metadata cannot be protected against > >>> malicious writes. The logic that you pointed out earlier is to handle > >>> the scenario where the primary partition got updated, but the > >>> secondary did not. > >> > >> I don't disagree with you. > >> > >> Scenario here is that primary partition is updated with incorrect data > >> content > >> (but still correct CRC). This is correctly detected that data is not right. > >> It means primary partition is wrong and shouldn't be used and it is not > >> used. > >> > >> But because it has correct CRC syncup code logic is doing syncup. > >> > >> That's why I think we have two codes which have pretty much independent > >> logic > >> what to do. If primary is wrong sync should be from secondary to primary > >> but > >> that's not what code is doing. > > > > The scenario of "primary is wrong" will only play out in case of > > corruption in the metadata. The scenario that you highlight, that of > > the metadata being wrong, but the CRC check succeeds will only happen > > when the metadata has been maliciously written -- > > yes. And I did it on purpose to check your code and all error condition you > are > checking to make sure that code traps them.
The code should detect and fix any genuine error conditions that might come up in the field. Even the check for the case where the two metadata copies are valid but different is a scenario that can actually happen when updating the metadata copies, and there is a reason why we are assuming that the primary is the correct copy and should be used to restore the secondary. If there are any valid scenarios that are not being handled, they should be identified and fixed. > > > if not, then the > > secondary partition should also be written with the wrong metadata, > > and the checks would then fail. In the case of the primary metadata > > getting corrupted, it does get restored from the secondary partition, > > assuming that the secondary partition is intact. > > When primary is cleared/has incorrect CRC there is no problem and recovery > happens but if CRC is right but data inside wrong code doesn't check it. This is where I think that this is not a genuine "error" case. So, if someone has written a metadata copy with wrong data but right CRC, and only written this bad copy to the primary partition, how do we trust that the secondary copy is actually correct ? One can generate a secondary metadata copy where the primary checks pass, but the image information in the metadata is incorrect. -sughosh > > Thanks, > Michal >