On 20.02.2023 17:51, Matias Ezequiel Vara Larsen wrote:
> On Thu, Feb 16, 2023 at 04:15:29PM +0100, Jan Beulich wrote:
>> On 16.02.2023 16:07, Matias Ezequiel Vara Larsen wrote:
>>> On Wed, Dec 14, 2022 at 08:29:53AM +0100, Jan Beulich wrote:
>>>> On 07.10.2022 14:39, Matias Ezequiel Vara Larsen wrote:
>>>>> @@ -287,6 +289,20 @@ static inline void vcpu_runstate_change(
>>>>>      }
>>>>>  
>>>>>      v->runstate.state = new_state;
>>>>> +
>>>>> +    vcpustats_va = (shared_vcpustatspage_t*)d->vcpustats_page.va;
>>>>> +    if ( vcpustats_va )
>>>>> +    {
>>>>> + vcpustats_va->vcpu_info[v->vcpu_id].version =
>>>>> +     version_update_begin(vcpustats_va->vcpu_info[v->vcpu_id].version);
>>>>> +        smp_wmb();
>>>>> +        
>>>>> memcpy(&vcpustats_va->vcpu_info[v->vcpu_id].runstate_running_time,
>>>>> +               &v->runstate.time[RUNSTATE_running],
>>>>> +               sizeof(v->runstate.time[RUNSTATE_running]));
>>>>> +        smp_wmb();
>>>>> +        vcpustats_va->vcpu_info[v->vcpu_id].version =
>>>>> +            
>>>>> version_update_end(vcpustats_va->vcpu_info[v->vcpu_id].version);
>>>>> +    }
>>>>
>>>> A further aspect to consider here is cache line ping-pong. I think the
>>>> per-vCPU elements of the array want to be big enough to not share a
>>>> cache line. The interface being generic this presents some challenge
>>>> in determining what the supposed size is to be. However, taking into
>>>> account the extensibility question, maybe the route to take is to
>>>> simply settle on a power-of-2 value somewhere between x86'es and Arm's
>>>> cache line sizes and the pretty common page size of 4k, e.g. 512 bytes
>>>> or 1k?
>>>>
>>>
>>> I do not now how to address this. I was thinking to align each vcpu_stats
>>> instance to a multiple of the cache-line. I would pick up the first multiple
>>> that is bigger to the size of the vcpu_stats structure. For example, 
>>> currently
>>> the structure is 16 bytes so I would align each instance in a frame to 64
>>> bytes. Would it make sense? 
>>
>> Well, 64 may be an option, but I gave higher numbers for a reason. One thing
>> I don't know is what common cache line sizes are on Arm or e.g. RISC-V.
> 
> Thanks. I found that structures that require cache-aligment are defined with
> "__cacheline_aligned" that uses L1_CACHE_BYTES. For example, in x86, this
> aligns to 128 bytes. What is the reason to use a higher value like 512 bytes 
> or
> 1k?.

You cannot bake an x86 property (which may even change: at some point we may
choose to drop the 128-byte special for the very few CPUs actually using
such, when the majority uses 64-byte cache lines) into the public interface.
You also don't want to make an aspect of the public interface arch-dependent
when not really needed. My suggestion for a higher value was in the hope that
we may never see a port to an architecture with cache lines wider than, say,
512 bytes. What exactly the value should be is of course up for discussion,
but I think it wants to include some slack on top of what we currently
support (arch-wise).

Jan

Reply via email to