On 06/07/2021 13:03, Jan Beulich wrote: > On 06.07.2021 13:23, Andrew Cooper wrote: >> 'count * sizeof(*pfns)' can in principle overflow, but is implausible in >> practice as the time between checkpoints is typically sub-second. >> Nevertheless, simplify the code and remove the risk. >> >> There is no need to loop over the bitmap to calculate count. The number of >> set bits is returned in xc_shadow_op_stats_t which is already collected (and >> ignored). >> >> Bounds check the count against what will fit in REC_LENGTH_MAX. At the time >> of writing, this allows up to 0xffffff pfns. > Well, okay, this then means that an overflow in the reporting of > dirty_count doesn't matter for now, because the limit is lower > anyway. > >> Rearrange the pfns loop to check >> for errors both ways, not simply that there were more pfns than expected. > Hmm, "both ways" to me would mean ... > >> @@ -459,24 +462,20 @@ static int send_checkpoint_dirty_pfn_list(struct >> xc_sr_context *ctx) >> goto err; >> } >> >> - for ( i = 0, written = 0; i < ctx->restore.p2m_size; ++i ) >> + for ( i = 0, written = 0; count && i < ctx->restore.p2m_size; ++i, >> --count ) >> { >> if ( !test_bit(i, dirty_bitmap) ) >> continue; >> >> - if ( written > count ) >> - { >> - ERROR("Dirty pfn list exceed"); >> - goto err; >> - } >> - >> pfns[written++] = i; >> } >> >> - rec.length = count * sizeof(*pfns); >> - >> - iov[1].iov_base = pfns; >> - iov[1].iov_len = rec.length; >> + if ( written != stats.dirty_count ) >> + { >> + ERROR("Mismatch between dirty bitmap bits (%u), and dirty_count >> (%u)", >> + written, stats.dirty_count); >> + goto err; >> + } > ... you then also check that there are no further bit set in the > bitmap. As said elsewhere, I'm not convinced using statistics as > a basis for actual operation (rather than just reporting) is > appropriate.
I'm not interested in inference based on the name of the structure. > I'm unaware of there being any spelled out guarantee > that the numbers reported back from the hypercall are accurate. The live loop uses this information already for this purpose. If it is wrong, we've got bigger problems that this. ~Andrew