Re: [Xen-devel] [PATCH v4 4/5] xentrace: enable per-VCPU extratime flag for RTDS

2017-11-16 Thread Meng Xu
Hi all,


On Tue, Oct 17, 2017 at 4:10 AM, Dario Faggioli  wrote:
>
> On Wed, 2017-10-11 at 14:02 -0400, Meng Xu wrote:
> > Change repl_budget event output for xentrace formats and xenalyze
> >
> > Signed-off-by: Meng Xu 
> >
> I'd say:
>
> Reviewed-by: Dario Faggioli 
>


Just a friendly reminder:
This patch has not been pushed into either the staging or master
branch of xen.git.

This is an essential patch for the new version of the RTDS scheduler
that Dario and I are maintaining.
This patch won't affect other features.

It has been a while, and we have heard no complaints from the tools
maintainers.

Is it ok to push it?



>
> However...
>
> > diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
> > index 79bdba7..19e050f 100644
> > --- a/tools/xentrace/xenalyze.c
> > +++ b/tools/xentrace/xenalyze.c
> > @@ -7935,23 +7935,29 @@ void sched_process(struct pcpu_info *p)
> >  unsigned int vcpuid:16, domid:16;
> >  uint64_t cur_bg;
> >  int delta;
> > +unsigned priority_level;
> > +unsigned has_extratime;
> >
> ...this last field is 'bool' in Xen.
>
> I appreciate that xenalyze does not build if you just make this bool as
> well. But it does build for me, if you do that, and also include
> stdbool.h, which I think is a fine thing to do.
>
> Anyway, I'll leave this to George and tools' maintainers.


If it turns out bool is preferred, I can change it and send out a new version.
But please just let me know so that we can have a complete toolstack
for the new version of RTDS scheduler.
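
For reference, here is a minimal sketch of what the bool variant would
look like in xenalyze.c (an editor's illustration, assuming Xen emits
the field as a one-byte bool, as sched_rt.c declares it; untested):

    #include <stdbool.h>

    /* The layout must match the trace record written by Xen's sched_rt.c. */
    struct {
        unsigned int vcpuid:16, domid:16;
        uint64_t cur_bg;
        int delta;
        unsigned int priority_level;
        bool has_extratime;
    } __attribute__((packed)) *r = (typeof(r))ri->d;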


Thanks,

Meng



Re: [Xen-devel] [PATCH v4 4/5] xentrace: enable per-VCPU extratime flag for RTDS

2017-11-02 Thread Meng Xu
Hi George,

On Wed, Oct 25, 2017 at 10:31 AM, Wei Liu  wrote:
>
> On Mon, Oct 23, 2017 at 02:50:31PM -0400, Meng Xu wrote:
> > On Tue, Oct 17, 2017 at 4:10 AM, Dario Faggioli  wrote:
> > > On Wed, 2017-10-11 at 14:02 -0400, Meng Xu wrote:
> > >> Change repl_budget event output for xentrace formats and xenalyze
> > >>
> > >> Signed-off-by: Meng Xu 
> > >>
> > > I'd say:
> > >
> > > Reviewed-by: Dario Faggioli 
> >
> > Hi guys,
> >
> > Just a reminder, we may need this patch for the work-conserving RTDS
> > scheduler in Xen 4.10.
> >
> > I saw that Julien sent out rc2 today, which does not include this patch.
> >
> > Thanks and best regards,
> >
>
> I'm waiting for George's ack.


Just a friendly reminder:
Do you have any comments on this patch?

Thanks,

Meng



Re: [Xen-devel] [PATCH v4 4/5] xentrace: enable per-VCPU extratime flag for RTDS

2017-10-23 Thread Meng Xu
On Tue, Oct 17, 2017 at 4:10 AM, Dario Faggioli  wrote:
> On Wed, 2017-10-11 at 14:02 -0400, Meng Xu wrote:
>> Change repl_budget event output for xentrace formats and xenalyze
>>
>> Signed-off-by: Meng Xu 
>>
> I'd say:
>
> Reviewed-by: Dario Faggioli 

Hi guys,

Just a reminder, we may need this patch for the work-conserving RTDS
scheduler in Xen 4.10.

I saw that Julien sent out rc2 today, which does not include this patch.

Thanks and best regards,

Meng

---
Meng Xu
Ph.D. Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] VPMU interrupt unreliability

2017-10-22 Thread Meng Xu
On Fri, Oct 20, 2017 at 3:07 AM, Jan Beulich  wrote:
>
> >>> On 19.10.17 at 20:20,  wrote:
> > Is there any document about the possible attack via the vPMU? The
> > document I found (such as [1] and XSA-163) just briefly say that the
> > vPMU should be disabled due to security concern.
>
> Besides the other responses you've already got, I also recall there
> being at least some CPU models that would live lock upon the
> debug store being placed into virtual space not mapped by present
> pages.


Thank you very much for your explanation! :)


Best Regards,

Meng

---
Meng Xu
Ph.D. Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] VPMU interrupt unreliability

2017-10-19 Thread Meng Xu
On Thu, Oct 19, 2017 at 11:40 AM, Andrew Cooper
 wrote:
>
> On 19/10/17 16:09, Kyle Huey wrote:
> > On Wed, Oct 11, 2017 at 7:09 AM, Boris Ostrovsky
> >  wrote:
> >> On 10/10/2017 12:54 PM, Kyle Huey wrote:
> >>> On Mon, Jul 24, 2017 at 9:54 AM, Kyle Huey  wrote:
> >>>> On Mon, Jul 24, 2017 at 8:07 AM, Boris Ostrovsky
> >>>>  wrote:
> >>>>>>> One thing I noticed is that the workaround doesn't appear to be
> >>>>>>> complete: it is only checking PMC0 status and not other counters 
> >>>>>>> (fixed
> >>>>>>> or architectural). Of course, without knowing what the actual problem
> >>>>>>> was it's hard to say whether this was intentional.
> >>>>>> handle_pmc_quirk appears to loop through all the counters ...
> >>>>> Right, I didn't notice that it is shifting MSR_CORE_PERF_GLOBAL_STATUS
> >>>>> value one by one and so it is looking at all bits.
> >>>>>
> >>>>>>>> 2. Intercepting MSR loads for counters that have the workaround
> >>>>>>>> applied and giving the guest the correct counter value.
> >>>>>>> We'd have to keep track of whether the counter has been reset (by the
> >>>>>>> quirk) since the last MSR write.
> >>>>>> Yes.
> >>>>>>
> >>>>>>>> 3. Or perhaps even changing the workaround to disable the PMI on that
> >>>>>>>> counter until the guest acks via GLOBAL_OVF_CTRL, assuming that works
> >>>>>>>> on the relevant hardware.
> >>>>>>> MSR_CORE_PERF_GLOBAL_OVF_CTRL is written immediately after the quirk
> >>>>>>> runs (in core2_vpmu_do_interrupt()) so we already do this, don't we?
> >>>>>> I'm suggesting waiting until the *guest* writes to the (virtualized)
> >>>>>> GLOBAL_OVF_CTRL.
> >>>>> Wouldn't it be better to wait until the counter is reloaded?
> >>>> Maybe!  I haven't thought through it a lot.  It's still not clear to
> >>>> me whether MSR_CORE_PERF_GLOBAL_OVF_CTRL actually controls the
> >>>> interrupt in any way or whether it just resets the bits in
> >>>> MSR_CORE_PERF_GLOBAL_STATUS and acking the interrupt on the APIC is
> >>>> all that's required to reenable it.
> >>>>
> >>>> - Kyle
> >>> I wonder if it would be reasonable to just remove the workaround
> >>> entirely at some point.  The set of people using 1) several year old
> >>> hardware, 2) an up to date Xen, and 3) the off-by-default performance
> >>> counters is probably rather small.
> >> We'd probably want to only enable this for affected processors, not
> >> remove it outright. But the problem is that we still don't know for sure
> >> whether this issue affects NHM only, do we?
> >>
> >> (https://lists.xenproject.org/archives/html/xen-devel/2017-07/msg02242.html
> >> is the original message)
> > Yes, the basic problem is that we don't know where to draw the line.
>
> vPMU is disabled by default for security reasons,


Is there any document about the possible attack via the vPMU? The
document I found (such as [1] and XSA-163) just briefly say that the
vPMU should be disabled due to security concern.


[1] https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html

>
> and also broken, in a
> way which demonstrates that vPMU isn't getting much real-world use.

I also noticed that AWS seems to support part of the vPMU
functionality, which Netflix used to optimize their applications'
performance, according to
http://www.brendangregg.com/blog/2017-05-04/the-pmcs-of-ec2.html .

I guess AWS must have addressed the security issue somehow? However,
without knowing how the attack could be conducted, I'm not sure how
AWS avoids the vPMU attack concern.

>
> As far as I'm concerned, all options (including rm -rf and start from
> scratch) are acceptable, especially if this ends up giving us a better
> overall subsystem.
>
> Do we know how other hypervisors work around this issue?

Maybe AWS's approach is an option? I'm not sure; I'm just thinking aloud. :)

Thanks,

Meng

-- 
Meng Xu
Ph.D. Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] [PATCH v4 4/5] xentrace: enable per-VCPU extratime flag for RTDS

2017-10-17 Thread Meng Xu
On Tue, Oct 17, 2017 at 4:10 AM, Dario Faggioli  wrote:
> On Wed, 2017-10-11 at 14:02 -0400, Meng Xu wrote:
>> Change repl_budget event output for xentrace formats and xenalyze
>>
>> Signed-off-by: Meng Xu 
>>
> I'd say:
>
> Reviewed-by: Dario Faggioli 
>
> However...
>
>> diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
>> index 79bdba7..19e050f 100644
>> --- a/tools/xentrace/xenalyze.c
>> +++ b/tools/xentrace/xenalyze.c
>> @@ -7935,23 +7935,29 @@ void sched_process(struct pcpu_info *p)
>>  unsigned int vcpuid:16, domid:16;
>>  uint64_t cur_bg;
>>  int delta;
>> +unsigned priority_level;
>> +unsigned has_extratime;
>>
> ...this last field is 'bool' in Xen.
>
> I appreciate that xenalyze does not build if you just make this bool as
> well. But it does build for me, if you do that, and also include
> stdbool.h, which I think is a fine thing to do.

Right. I'm not sure about this. If including the stdbool.h is
preferred, I can resend this one with that change.

>
> Anyway, I'll leave this to George and tools' maintainers.

Sure!

Thanks,

Meng



-- 
Meng Xu
Ph.D. Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] [PATCH v4 0/5] Towards work-conserving RTDS

2017-10-17 Thread Meng Xu
On Tue, Oct 17, 2017 at 3:29 AM, Dario Faggioli  wrote:

> On Tue, 2017-10-17 at 09:26 +0200, Dario Faggioli wrote:
> > On Thu, 2017-10-12 at 10:34 -0400, Meng Xu wrote:
> > > On Thu, Oct 12, 2017 at 5:02 AM, Wei Liu 
> > > wrote:
> > > >
> > > > FYI all patches except the xentrace one were committed yesterday.
> > >
> > > Thank you very much, Wei!
> > >
> >
> > Hey Meng,
> >
> > Any update on that missing patch, though?
> >
> No, wait... Posted on Wednesday, mmmhh... Ah, so "this" is you posting
> the missing patch!
>

Yes. :) I didn't repost the patch. I made the changes and tested them
once I got the feedback.

>
> Ok, my bad, sorry. I was fooled by the fact that you resent the whole
> series, and that I did not get a copy of it (extra-list, I mean) as
> you're still using my old email address.
>
> Lemme have a look...
>

Ah, I neglected the email address. I was also wondering whether you
were busy with something else, so I didn't send a reminder.

Thanks!

Best Regards,

Meng

>
> Regards,
> Dario
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
>



-- 
Meng Xu
Ph.D. Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/


Re: [Xen-devel] [PATCH v4 0/5] Towards work-conserving RTDS

2017-10-12 Thread Meng Xu
On Thu, Oct 12, 2017 at 5:02 AM, Wei Liu  wrote:
>
> FYI all patches except the xentrace one were committed yesterday.


Thank you very much, Wei!

Best,

Meng

-- 
Meng Xu
Ph.D. Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



[Xen-devel] [PATCH v4 3/5] xl: enable per-VCPU extratime flag for RTDS

2017-10-11 Thread Meng Xu
Change main_sched_rtds and related output functions to support
per-VCPU extratime flag.

Signed-off-by: Meng Xu 
Reviewed-by: Dario Faggioli 
Acked-by: Wei Liu 

---
Changes from v2
Validate the -e option input that can only be 0 or 1
Update docs/man/xl.pod.1.in
Change EXTRATIME to Extratime

Changes from v1
No change because we agree on using -e 0/1 option to
set if a vcpu will get extra time or not

Changes from RFC v1
Changes work_conserving flag to extratime flag
---
 docs/man/xl.pod.1.in   | 59 +--
 tools/xl/xl_cmdtable.c |  3 ++-
 tools/xl/xl_sched.c| 62 +++---
 3 files changed, 78 insertions(+), 46 deletions(-)

diff --git a/docs/man/xl.pod.1.in b/docs/man/xl.pod.1.in
index cd8bb1c..486a24f 100644
--- a/docs/man/xl.pod.1.in
+++ b/docs/man/xl.pod.1.in
@@ -1117,11 +1117,11 @@ as B<--ratelimit_us> in B
 Set or get rtds (Real Time Deferrable Server) scheduler parameters.
 This rt scheduler applies Preemptive Global Earliest Deadline First
 real-time scheduling algorithm to schedule VCPUs in the system.
-Each VCPU has a dedicated period and budget.
-VCPUs in the same domain have the same period and budget.
+Each VCPU has a dedicated period, budget and extratime.
 While scheduled, a VCPU burns its budget.
 A VCPU has its budget replenished at the beginning of each period;
 Unused budget is discarded at the end of each period.
+A VCPU with extratime set gets extra time from the unreserved system resource.
 
 B
 
@@ -1145,6 +1145,11 @@ Period of time, in microseconds, over which to replenish 
the budget.
 Amount of time, in microseconds, that the VCPU will be allowed
 to run every period.
 
+=item B<-e Extratime>, B<--extratime=Extratime>
+
+Binary flag to decide if the VCPU will be allowed to get extra time from
+the unreserved system resource.
+
 =item B<-c CPUPOOL>, B<--cpupool=CPUPOOL>
 
 Restrict output to domains in the specified cpupool.
@@ -1160,57 +1165,57 @@ all the domains:
 
 xl sched-rtds -v all
 Cpupool Pool-0: sched=RTDS
-Name                        ID VCPU    Period    Budget
-Domain-0                     0    0     10000      4000
-vm1                          1    0       300       150
-vm1                          1    1       400       200
-vm1                          1    2     10000      4000
-vm1                          1    3      1000       500
-vm2                          2    0     10000      4000
-vm2                          2    1     10000      4000
+Name                        ID VCPU    Period    Budget  Extratime
+Domain-0                     0    0     10000      4000        yes
+vm1                          2    0       300       150        yes
+vm1                          2    1       400       200        yes
+vm1                          2    2     10000      4000        yes
+vm1                          2    3      1000       500        yes
+vm2                          4    0     10000      4000        yes
+vm2                          4    1     10000      4000        yes
 
 Without any arguments, it will output the default scheduling
 parameters for each domain:
 
 xl sched-rtds
 Cpupool Pool-0: sched=RTDS
-Name                        ID    Period    Budget
-Domain-0                     0     10000      4000
-vm1                          1     10000      4000
-vm2                          2     10000      4000
+Name                        ID    Period    Budget  Extratime
+Domain-0                     0     10000      4000        yes
+vm1                          2     10000      4000        yes
+vm2                          4     10000      4000        yes
 
 
-2) Use, for instancei, B<-d vm1, -v all> to see the budget and
+2) Use, for instance, B<-d vm1, -v all> to see the budget and
 period of all VCPUs of a specific domain (B):
 
 xl sched-rtds -d vm1 -v all
-Name                        ID VCPU    Period    Budget
-vm1                          1    0       300       150
-vm1                          1    1       400       200
-vm1                          1    2     10000      4000
-vm1                          1    3      1000       500
+Name                        ID VCPU    Period    Budget  Extratime
+vm1                          2    0       300       150        yes
+vm1                          2    1       400       200        yes
+vm1                          2    2     10000      4000        yes
+vm1                          2    3      1000       500        yes
 
 To see the parameters of a subset of the VCPUs of a domain, use:
 
 xl sched-rtds -d vm1 -v 0 -v 3
-Name                        ID VCPU    Period    Budget
-vm1                          1    0       300       150
-vm1                          1    3      1000       500
+Name                        ID VCPU    Period    Budget  Extratime
+vm1                          2    0       300       150        yes
+vm1                          2    3      1000       500        yes

[Xen-devel] [PATCH v4 0/5] Towards work-conserving RTDS

2017-10-11 Thread Meng Xu
This series of patches make RTDS scheduler work-conserving
without breaking real-time guarantees.
VCPUs with extratime flag set can get extra time
from the unreserved system resource.
System administrators can decide which VCPUs have extratime flag set.

Example:
Set the extratime bit of all VCPUs of domain 1:
# xl sched-rtds -d 1 -v all -p 10000 -b 2000 -e 1
Each VCPU of domain 1 will be guaranteed 2000us of budget every
10000us period (if the system is schedulable).
If there is a CPU with no work to do,
domain 1's VCPUs will be scheduled onto that CPU,
even if the VCPUs have already used up their 2000us in the current
10000us period.

Clear the extratime bit of all VCPUs of domain 1:
# xl sched-rtds -d 1 -v all -p 10000 -b 2000 -e 0

Set/Clear the extratime bit of one specific VCPU of domain 1:
# xl sched-rtds -d 1 -v 1 -p 10000 -b 2000 -e 1
# xl sched-rtds -d 1 -v 1 -p 10000 -b 2000 -e 0
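
For toolstacks that link against libxl directly instead of going
through xl, setting the flag looks roughly like the sketch below.
This is an editor's illustration of the per-VCPU API added in patch
2/5; ctx/domid setup, cleanup and error handling are assumed:

    libxl_vcpu_sched_params scinfo;

    libxl_vcpu_sched_params_init(&scinfo);
    scinfo.sched = LIBXL_SCHEDULER_RTDS;
    scinfo.num_vcpus = 1;
    scinfo.vcpus = calloc(1, sizeof(*scinfo.vcpus));
    libxl_sched_params_init(&scinfo.vcpus[0]);
    scinfo.vcpus[0].vcpuid = 1;       /* VCPU 1 of the domain */
    scinfo.vcpus[0].period = 10000;   /* us */
    scinfo.vcpus[0].budget = 2000;    /* us */
    scinfo.vcpus[0].extratime = 1;    /* may use unreserved time */
    libxl_vcpu_sched_params_set(ctx, domid, &scinfo);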


The original design of the work-conserving RTDS was discussed at
https://www.mail-archive.com/xen-devel@lists.xen.org/msg77150.html

The first version was discussed at
https://www.mail-archive.com/xen-devel@lists.xen.org/msg117361.html

The second version was discussed at
https://www.mail-archive.com/xen-devel@lists.xen.org/msg120618.html

The third version has been mostly reviewed by Dario Faggioli and
acked by Wei Liu, except
[PATCH v4 4/5] xentrace: enable per-VCPU extratime flag for RTDS

The series of patch can be found at github:
https://github.com/PennPanda/RT-Xen
under the branch:
xenbits/rtds/work-conserving-v4

Changes from v3
Handle the burn_budget event in xentrace and xenalyze.
Tested the change with three VMs.

Changes from v2
Sanity check the input of -e option which can only be 0 or 1
Set -e to 1 by default if 3rd party library does not set -e option
Set vcpu extratime in the sched_rtds_vcpu_get function, which
fixes a bug in the previous version.
Change EXTRATIME to Extratime in the xl output

Changes from v1
Change XEN_DOMCTL_SCHED_RTDS_extratime to XEN_DOMCTL_SCHEDRT_extra
Revise xentrace, xenalyze, and docs
Add LIBXL_HAVE_SCHED_RTDS_VCPU_EXTRA symbol in libxl.h

Changes from RFC v1
Merge changes in sched_rt.c into one patch;
Minor change in variable name and comments.

Signed-off-by: Meng Xu 

[PATCH v4 1/5] xen:rtds: towards work conserving RTDS
[PATCH v4 2/5] libxl: enable per-VCPU extratime flag for RTDS
[PATCH v4 3/5] xl: enable per-VCPU extratime flag for RTDS
[PATCH v4 4/5] xentrace: enable per-VCPU extratime flag for RTDS
[PATCH v4 5/5] docs: enable per-VCPU extratime flag for RTDS




[Xen-devel] [PATCH v4 2/5] libxl: enable per-VCPU extratime flag for RTDS

2017-10-11 Thread Meng Xu
Modify libxl_vcpu_sched_params_get/set and sched_rtds_vcpu_get/set
functions to support per-VCPU extratime flag

Signed-off-by: Meng Xu 
Reviewed-by: Dario Faggioli 
Acked-by: Wei Liu 

---
Changes from v2
1) Move extratime out of the section
   that is marked as deprecated in libxl_domain_sched_params.
2) Set vcpu extratime in the sched_rtds_vcpu_get function;
   this fixes a bug in the previous version, where running the
   command "xl sched-rtds -d 0 -v 1" output the vcpu extratime
   value incorrectly.

Changes from v1
1) Add LIBXL_HAVE_SCHED_RTDS_VCPU_EXTRA to indicate if extratime flag is
supported
2) Change flag name in domctl.h from XEN_DOMCTL_SCHED_RTDS_extratime to
XEN_DOMCTL_SCHEDRT_extra

Changes from RFC v1
Change work_conserving flag to extratime flag
---
 tools/libxl/libxl.h |  6 ++
 tools/libxl/libxl_sched.c   | 17 +
 tools/libxl/libxl_types.idl |  8 
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index f82b91e..5e9aed7 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -257,6 +257,12 @@
 #define LIBXL_HAVE_SCHED_RTDS_VCPU_PARAMS 1
 
 /*
+ * LIBXL_HAVE_SCHED_RTDS_VCPU_EXTRA indicates RTDS scheduler
+ * now supports per-vcpu extratime settings.
+ */
+#define LIBXL_HAVE_SCHED_RTDS_VCPU_EXTRA 1
+
+/*
  * libxl_domain_build_info has the arm.gic_version field.
  */
 #define LIBXL_HAVE_BUILDINFO_ARM_GIC_VERSION 1
diff --git a/tools/libxl/libxl_sched.c b/tools/libxl/libxl_sched.c
index 7d144d0..512788f 100644
--- a/tools/libxl/libxl_sched.c
+++ b/tools/libxl/libxl_sched.c
@@ -532,6 +532,8 @@ static int sched_rtds_vcpu_get(libxl__gc *gc, uint32_t domid,
     for (i = 0; i < num_vcpus; i++) {
         scinfo->vcpus[i].period = vcpus[i].u.rtds.period;
         scinfo->vcpus[i].budget = vcpus[i].u.rtds.budget;
+        scinfo->vcpus[i].extratime =
+            !!(vcpus[i].u.rtds.flags & XEN_DOMCTL_SCHEDRT_extra);
         scinfo->vcpus[i].vcpuid = vcpus[i].vcpuid;
     }
     rc = 0;
@@ -579,6 +581,8 @@ static int sched_rtds_vcpu_get_all(libxl__gc *gc, uint32_t domid,
     for (i = 0; i < num_vcpus; i++) {
         scinfo->vcpus[i].period = vcpus[i].u.rtds.period;
         scinfo->vcpus[i].budget = vcpus[i].u.rtds.budget;
+        scinfo->vcpus[i].extratime =
+            !!(vcpus[i].u.rtds.flags & XEN_DOMCTL_SCHEDRT_extra);
         scinfo->vcpus[i].vcpuid = vcpus[i].vcpuid;
     }
     rc = 0;
@@ -628,6 +632,10 @@ static int sched_rtds_vcpu_set(libxl__gc *gc, uint32_t domid,
         vcpus[i].vcpuid = scinfo->vcpus[i].vcpuid;
         vcpus[i].u.rtds.period = scinfo->vcpus[i].period;
         vcpus[i].u.rtds.budget = scinfo->vcpus[i].budget;
+        if (scinfo->vcpus[i].extratime)
+            vcpus[i].u.rtds.flags |= XEN_DOMCTL_SCHEDRT_extra;
+        else
+            vcpus[i].u.rtds.flags &= ~XEN_DOMCTL_SCHEDRT_extra;
     }
 
 r = xc_sched_rtds_vcpu_set(CTX->xch, domid,
@@ -676,6 +684,10 @@ static int sched_rtds_vcpu_set_all(libxl__gc *gc, uint32_t domid,
         vcpus[i].vcpuid = i;
         vcpus[i].u.rtds.period = scinfo->vcpus[0].period;
         vcpus[i].u.rtds.budget = scinfo->vcpus[0].budget;
+        if (scinfo->vcpus[0].extratime)
+            vcpus[i].u.rtds.flags |= XEN_DOMCTL_SCHEDRT_extra;
+        else
+            vcpus[i].u.rtds.flags &= ~XEN_DOMCTL_SCHEDRT_extra;
     }
 
 r = xc_sched_rtds_vcpu_set(CTX->xch, domid,
@@ -726,6 +738,11 @@ static int sched_rtds_domain_set(libxl__gc *gc, uint32_t domid,
         sdom.period = scinfo->period;
     if (scinfo->budget != LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT)
         sdom.budget = scinfo->budget;
+    /* Set extratime by default */
+    if (scinfo->extratime)
+        sdom.flags |= XEN_DOMCTL_SCHEDRT_extra;
+    else
+        sdom.flags &= ~XEN_DOMCTL_SCHEDRT_extra;
     if (sched_rtds_validate_params(gc, sdom.period, sdom.budget))
         return ERROR_INVAL;
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 2d0bb8a..dd7d364 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -421,14 +421,14 @@ libxl_domain_sched_params = Struct("domain_sched_params",[
 ("cap",  integer, {'init_val': 
'LIBXL_DOMAIN_SCHED_PARAM_CAP_DEFAULT'}),
 ("period",   integer, {'init_val': 
'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}),
 ("budget",   integer, {'init_val': 
'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
+("extratime",integer, {'init_val': 
'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}),
 
-# The following three parameters ('slice', 'latency' and 'extratime') are 
deprecated,
+# The following three parameters ('slice' and 'latency') are deprecated,
 

[Xen-devel] [PATCH v4 1/5] xen:rtds: towards work conserving RTDS

2017-10-11 Thread Meng Xu
Make RTDS scheduler work conserving without breaking the real-time guarantees.

VCPU model:
Each real-time VCPU is extended to have an extratime flag
and a priority_level field.
When a VCPU's budget is depleted in the current period,
if it has the extratime flag set,
its priority_level will increase by 1 and its budget will be refilled;
otherwise, the VCPU will be moved to the depletedq.

Scheduling policy is modified global EDF:
A VCPU v1 has higher priority than another VCPU v2 if
(i) v1 has a smaller priority_level; or
(ii) v1 has the same priority_level but a smaller deadline

Queue management:
Run queue holds VCPUs with extratime flag set and VCPUs with
remaining budget. Run queue is sorted in increasing order of VCPUs priorities.
Depleted queue holds VCPUs which have extratime flag cleared and depleted 
budget.
Replenished queue is not modified.

Distribution of spare bandwidth:
Spare bandwidth is distributed among all VCPUs with the extratime flag set,
in proportion to these VCPUs' utilizations.

Signed-off-by: Meng Xu 
Reviewed-by: Dario Faggioli 

---
Changes from v2
Explain how to distribute spare bandwidth in commit log
Minor change in has_extratime function without functionality change.

Changes from v1
Change XEN_DOMCTL_SCHED_RTDS_extratime to XEN_DOMCTL_SCHEDRT_extra as
suggested by Dario

Changes from RFC v1
Rewording comments and commit message
Remove is_work_conserving field from rt_vcpu structure
Use one bit in VCPU's flag to indicate if a VCPU will have extra time
Correct comments style
---
 xen/common/sched_rt.c   | 90 ++---
 xen/include/public/domctl.h |  4 ++
 2 files changed, 80 insertions(+), 14 deletions(-)

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 5c51cd9..b770287 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -49,13 +49,15 @@
  * A PCPU is feasible if the VCPU can run on this PCPU and (the PCPU is idle or
  * has a lower-priority VCPU running on it.)
  *
- * Each VCPU has a dedicated period and budget.
+ * Each VCPU has a dedicated period, budget and an extratime flag.
  * The deadline of a VCPU is at the end of each period;
  * A VCPU has its budget replenished at the beginning of each period;
  * While scheduled, a VCPU burns its budget.
  * The VCPU needs to finish its budget before its deadline in each period;
  * The VCPU discards its unused budget at the end of each period.
- * If a VCPU runs out of budget in a period, it has to wait until next period.
+ * When a VCPU runs out of budget in a period, if its extratime flag is set,
+ * the VCPU increases its priority_level by 1 and refills its budget;
+ * otherwise, it has to wait until next period.
  *
  * Each VCPU is implemented as a deferable server.
  * When a VCPU has a task running on it, its budget is continuously burned;
@@ -63,7 +65,8 @@
  *
  * Queue scheme:
  * A global runqueue and a global depletedqueue for each CPU pool.
- * The runqueue holds all runnable VCPUs with budget, sorted by deadline;
+ * The runqueue holds all runnable VCPUs with budget,
+ * sorted by priority_level and deadline;
  * The depletedqueue holds all VCPUs without budget, unsorted;
  *
  * Note: cpumask and cpupool is supported.
@@ -151,6 +154,14 @@
 #define RTDS_depleted (1<<__RTDS_depleted)
 
 /*
+ * RTDS_extratime: Can the vcpu run in the time that is
+ * not part of any real-time reservation, and would therefore
+ * be otherwise left idle?
+ */
+#define __RTDS_extratime    4
+#define RTDS_extratime (1<<__RTDS_extratime)
+
+/*
  * rt tracing events ("only" 512 available!). Check
  * include/public/trace.h for more details.
  */
@@ -201,6 +212,8 @@ struct rt_vcpu {
     struct rt_dom *sdom;
     struct vcpu *vcpu;
 
+    unsigned priority_level;
+
     unsigned flags;  /* mark __RTDS_scheduled, etc.. */
 };
 
@@ -245,6 +258,11 @@ static inline struct list_head *rt_replq(const struct scheduler *ops)
     return &rt_priv(ops)->replq;
 }
 
+static inline bool has_extratime(const struct rt_vcpu *svc)
+{
+    return svc->flags & RTDS_extratime;
+}
+
 /*
  * Helper functions for manipulating the runqueue, the depleted queue,
  * and the replenishment events queue.
@@ -274,6 +292,21 @@ vcpu_on_replq(const struct rt_vcpu *svc)
 }
 
 /*
+ * If v1 priority >= v2 priority, return value > 0
+ * Otherwise, return value < 0
+ */
+static s_time_t
+compare_vcpu_priority(const struct rt_vcpu *v1, const struct rt_vcpu *v2)
+{
+    int prio = v2->priority_level - v1->priority_level;
+
+    if ( prio == 0 )
+        return v2->cur_deadline - v1->cur_deadline;
+
+    return prio;
+}
+
+/*
  * Debug related code, dump vcpu/cpu information
  */
 static void
@@ -303,6 +336,7 @@ rt_dump_vcpu(const struct scheduler *ops, const struct 
rt_vcpu *svc)
 cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch), mask);
 printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_

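To make the modified-EDF comparison concrete, here is a small
standalone harness (an editor's sketch reusing the same logic as
compare_vcpu_priority() above; it is not part of the patch):

    #include <stdio.h>
    #include <stdint.h>

    typedef int64_t s_time_t;

    struct rt_vcpu { unsigned priority_level; s_time_t cur_deadline; };

    /* Positive result => v1 runs before v2, as in the patch. */
    static s_time_t compare(const struct rt_vcpu *v1, const struct rt_vcpu *v2)
    {
        int prio = v2->priority_level - v1->priority_level;
        return prio == 0 ? v2->cur_deadline - v1->cur_deadline : prio;
    }

    int main(void)
    {
        struct rt_vcpu a = { .priority_level = 0, .cur_deadline = 50000 };
        struct rt_vcpu b = { .priority_level = 1, .cur_deadline = 5000 };

        /* a wins despite its later deadline: levels compare first,
         * deadlines only break ties within the same level. */
        printf("compare(a, b) = %lld\n", (long long)compare(&a, &b));
        return 0;
    }
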
[Xen-devel] [PATCH v4 5/5] docs: enable per-VCPU extratime flag for RTDS

2017-10-11 Thread Meng Xu
Revise xl tool use case by adding -e option
Remove work-conserving from TODO list

Signed-off-by: Meng Xu 
Reviewed-by: Dario Faggioli 
Acked-by: Wei Liu 

---
No change from v2

Changes from v1
Revise rtds docs
---
 docs/features/sched_rtds.pandoc | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/features/sched_rtds.pandoc b/docs/features/sched_rtds.pandoc
index 354097b..d51b499 100644
--- a/docs/features/sched_rtds.pandoc
+++ b/docs/features/sched_rtds.pandoc
@@ -40,7 +40,7 @@ as follows:
 
 It is possible, for a multiple vCPUs VM, to change the parameters of
 each vCPU individually:
-* `xl sched-rtds -d vm-rt -v 0 -p 20000 -b 10000 -v 1 -p 45000 -b 12000`
+* `xl sched-rtds -d vm-rt -v 0 -p 20000 -b 10000 -e 1 -v 1 -p 45000 -b 12000 -e 0`
 
 # Technical details
 
@@ -53,7 +53,8 @@ the presence of the LIBXL\_HAVE\_SCHED\_RTDS symbol. The 
ability of
 specifying different scheduling parameters for each vcpu has been
 introduced later, and is available if the following symbols are defined:
 * `LIBXL\_HAVE\_VCPU\_SCHED\_PARAMS`,
-* `LIBXL\_HAVE\_SCHED\_RTDS\_VCPU\_PARAMS`.
+* `LIBXL\_HAVE\_SCHED\_RTDS\_VCPU\_PARAMS`,
+* `LIBXL\_HAVE\_SCHED\_RTDS\_VCPU\_EXTRA`.
 
 # Limitations
 
@@ -95,7 +96,6 @@ at a macroscopic level), the following should be done:
 
 # Areas for improvement
 
-* Work-conserving mode to be added;
 * performance assessment, especially focusing on what level of real-time
   behavior the scheduler enables.
 
@@ -118,4 +118,5 @@ at a macroscopic level), the following should be done:
 Date       Revision Version  Notes
 ---------- -------- -------- -------------------------------------------
 2016-10-14 1        Xen 4.8  Document written
+2017-08-31 2        Xen 4.10 Revise for work conserving feature
 ---------- -------- -------- -------------------------------------------
-- 
1.9.1




[Xen-devel] [PATCH v4 4/5] xentrace: enable per-VCPU extratime flag for RTDS

2017-10-11 Thread Meng Xu
Change repl_budget event output for xentrace formats and xenalyze

Signed-off-by: Meng Xu 

---
Changes from v3
Handle burn_budget event

No changes from v2

Changes from v1
Add this changes from v1
---
 tools/xentrace/formats|  4 ++--
 tools/xentrace/xenalyze.c | 16 +++-
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/tools/xentrace/formats b/tools/xentrace/formats
index d6e7e3f..8b286c3 100644
--- a/tools/xentrace/formats
+++ b/tools/xentrace/formats
@@ -74,8 +74,8 @@
 
 0x00022801  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:tickle        [ cpu = %(1)d ]
 0x00022802  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:runq_pick     [ dom:vcpu = 0x%(1)08x, cur_deadline = 0x%(3)08x%(2)08x, cur_budget = 0x%(5)08x%(4)08x ]
-0x00022803  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:burn_budget   [ dom:vcpu = 0x%(1)08x, cur_budget = 0x%(3)08x%(2)08x, delta = %(4)d ]
-0x00022804  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:repl_budget   [ dom:vcpu = 0x%(1)08x, cur_deadline = 0x%(3)08x%(2)08x, cur_budget = 0x%(5)08x%(4)08x ]
+0x00022803  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:burn_budget   [ dom:vcpu = 0x%(1)08x, cur_budget = 0x%(3)08x%(2)08x, delta = %(4)d, priority_level = %(5)d, has_extratime = %(6)x ]
+0x00022804  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:repl_budget   [ dom:vcpu = 0x%(1)08x, priority_level = 0x%(2)08d, cur_deadline = 0x%(4)08x%(3)08x, cur_budget = 0x%(6)08x%(5)08x ]
 0x00022805  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:sched_tasklet
 0x00022806  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:schedule      [ cpu[16]:tasklet[8]:idle[4]:tickled[4] = %(1)08x ]
 
diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
index 79bdba7..19e050f 100644
--- a/tools/xentrace/xenalyze.c
+++ b/tools/xentrace/xenalyze.c
@@ -7935,23 +7935,29 @@ void sched_process(struct pcpu_info *p)
                 unsigned int vcpuid:16, domid:16;
                 uint64_t cur_bg;
                 int delta;
+                unsigned priority_level;
+                unsigned has_extratime;
             } __attribute__((packed)) *r = (typeof(r))ri->d;
 
             printf(" %s rtds:burn_budget d%uv%u, budget = %"PRIu64", "
-                   "delta = %d\n", ri->dump_header, r->domid,
-                   r->vcpuid, r->cur_bg, r->delta);
+                   "delta = %d, priority_level = %d, has_extratime = %d\n",
+                   ri->dump_header, r->domid,
+                   r->vcpuid, r->cur_bg, r->delta,
+                   r->priority_level, !!r->has_extratime);
         }
         break;
 case TRC_SCHED_CLASS_EVT(RTDS, 4): /* BUDGET_REPLENISH */
         if(opt.dump_all) {
             struct {
                 unsigned int vcpuid:16, domid:16;
+                unsigned int priority_level;
                 uint64_t cur_dl, cur_bg;
             } __attribute__((packed)) *r = (typeof(r))ri->d;
 
-            printf(" %s rtds:repl_budget d%uv%u, deadline = %"PRIu64", "
-                   "budget = %"PRIu64"\n", ri->dump_header,
-                   r->domid, r->vcpuid, r->cur_dl, r->cur_bg);
+            printf(" %s rtds:repl_budget d%uv%u, priority_level = %u, "
+                   "deadline = %"PRIu64", budget = %"PRIu64"\n",
+                   ri->dump_header, r->domid, r->vcpuid,
+                   r->priority_level, r->cur_dl, r->cur_bg);
         }
         break;
 case TRC_SCHED_CLASS_EVT(RTDS, 5): /* SCHED_TASKLET*/
-- 
1.9.1
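
As an editor's illustration (made-up numbers, not a captured trace),
the updated burn_budget line above would render a record for d1v1
roughly as:

    CPU0  205511238916 (+    4852)  rtds:burn_budget   [ dom:vcpu = 0x00010001, cur_budget = 0x0000000000030d40, delta = 1000, priority_level = 1, has_extratime = 1 ]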




Re: [Xen-devel] [PATCH v3 4/5] xentrace: enable per-VCPU extratime flag for RTDS

2017-10-11 Thread Meng Xu
On Wed, Oct 11, 2017 at 6:57 AM, Dario Faggioli  wrote:
> On Tue, 2017-10-10 at 19:17 -0400, Meng Xu wrote:
>> --- a/tools/xentrace/formats
>> +++ b/tools/xentrace/formats
>> @@ -75,7 +75,7 @@
>>  0x00022801  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:tickle[
>> cpu = %(1)d ]
>>  0x00022802  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:runq_pick [
>> dom:vcpu = 0x%(1)08x, cur_deadline = 0x%(3)08x%(2)08x, cur_budget =
>> 0x%(5)08x%(4)08x ]
>>  0x00022803  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:burn_budget   [
>> dom:vcpu = 0x%(1)08x, cur_budget = 0x%(3)08x%(2)08x, delta = %(4)d ]
>> -0x00022804  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:repl_budget   [
>> dom:vcpu = 0x%(1)08x, cur_deadline = 0x%(3)08x%(2)08x, cur_budget =
>> 0x%(5)08x%(4)08x ]
>> +0x00022804  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:repl_budget   [
>> dom:vcpu = 0x%(1)08x, priority_level = 0x%(2)08d cur_deadline =
>> 0x%(4)08x%(3)08x, cur_budget = 0x%(6)08x%(5)08x ]
>>  0x00022805  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:sched_tasklet
>>  0x00022806  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:schedule  [
>> cpu[16]:tasklet[8]:idle[4]:tickled[4] = %(1)08x ]
>>
> But, both in case of this file and below in xenalyze.c, you update 1
> record (the one of REPL_BUDGET). However, in patch 1, you added the
> priority_level field to two records: REPL_BUDGET and BURN_BUDGET.
>
> Or am I missing something?

OMG, my fault. I forgot to check this. I will add this and double
check it by running some tests.

Best,

Meng
-- 
Meng Xu
Ph.D. Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



[Xen-devel] [PATCH v3 1/5] xen:rtds: towards work conserving RTDS

2017-10-10 Thread Meng Xu
Make RTDS scheduler work conserving without breaking the real-time guarantees.

VCPU model:
Each real-time VCPU is extended to have an extratime flag
and a priority_level field.
When a VCPU's budget is depleted in the current period,
if it has the extratime flag set,
its priority_level will increase by 1 and its budget will be refilled;
otherwise, the VCPU will be moved to the depletedq.

Scheduling policy is modified global EDF:
A VCPU v1 has higher priority than another VCPU v2 if
(i) v1 has a smaller priority_level; or
(ii) v1 has the same priority_level but a smaller deadline

Queue management:
Run queue holds VCPUs with extratime flag set and VCPUs with
remaining budget. Run queue is sorted in increasing order of VCPUs priorities.
Depleted queue holds VCPUs which have extratime flag cleared and depleted 
budget.
Replenished queue is not modified.

Distribution of spare bandwidth:
Spare bandwidth is distributed among all VCPUs with the extratime flag set,
in proportion to these VCPUs' utilizations.

Signed-off-by: Meng Xu 

---
Changes from v2
Explain how to distribute spare bandwidth in commit log
Minor change in has_extratime function without functionality change.

Changes from v1
Change XEN_DOMCTL_SCHED_RTDS_extratime to XEN_DOMCTL_SCHEDRT_extra as
suggested by Dario

Changes from RFC v1
Rewording comments and commit message
Remove is_work_conserving field from rt_vcpu structure
Use one bit in VCPU's flag to indicate if a VCPU will have extra time
Correct comments style
---
 xen/common/sched_rt.c   | 90 ++---
 xen/include/public/domctl.h |  4 ++
 2 files changed, 80 insertions(+), 14 deletions(-)

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 5c51cd9..b770287 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -49,13 +49,15 @@
  * A PCPU is feasible if the VCPU can run on this PCPU and (the PCPU is idle or
  * has a lower-priority VCPU running on it.)
  *
- * Each VCPU has a dedicated period and budget.
+ * Each VCPU has a dedicated period, budget and an extratime flag.
  * The deadline of a VCPU is at the end of each period;
  * A VCPU has its budget replenished at the beginning of each period;
  * While scheduled, a VCPU burns its budget.
  * The VCPU needs to finish its budget before its deadline in each period;
  * The VCPU discards its unused budget at the end of each period.
- * If a VCPU runs out of budget in a period, it has to wait until next period.
+ * When a VCPU runs out of budget in a period, if its extratime flag is set,
+ * the VCPU increases its priority_level by 1 and refills its budget;
+ * otherwise, it has to wait until next period.
  *
  * Each VCPU is implemented as a deferable server.
  * When a VCPU has a task running on it, its budget is continuously burned;
@@ -63,7 +65,8 @@
  *
  * Queue scheme:
  * A global runqueue and a global depletedqueue for each CPU pool.
- * The runqueue holds all runnable VCPUs with budget, sorted by deadline;
+ * The runqueue holds all runnable VCPUs with budget,
+ * sorted by priority_level and deadline;
  * The depletedqueue holds all VCPUs without budget, unsorted;
  *
  * Note: cpumask and cpupool is supported.
@@ -151,6 +154,14 @@
 #define RTDS_depleted (1<<__RTDS_depleted)
 
 /*
+ * RTDS_extratime: Can the vcpu run in the time that is
+ * not part of any real-time reservation, and would therefore
+ * be otherwise left idle?
+ */
+#define __RTDS_extratime    4
+#define RTDS_extratime (1<<__RTDS_extratime)
+
+/*
  * rt tracing events ("only" 512 available!). Check
  * include/public/trace.h for more details.
  */
@@ -201,6 +212,8 @@ struct rt_vcpu {
     struct rt_dom *sdom;
     struct vcpu *vcpu;
 
+    unsigned priority_level;
+
     unsigned flags;  /* mark __RTDS_scheduled, etc.. */
 };
 
@@ -245,6 +258,11 @@ static inline struct list_head *rt_replq(const struct scheduler *ops)
     return &rt_priv(ops)->replq;
 }
 
+static inline bool has_extratime(const struct rt_vcpu *svc)
+{
+    return svc->flags & RTDS_extratime;
+}
+
 /*
  * Helper functions for manipulating the runqueue, the depleted queue,
  * and the replenishment events queue.
@@ -274,6 +292,21 @@ vcpu_on_replq(const struct rt_vcpu *svc)
 }
 
 /*
+ * If v1 priority >= v2 priority, return value > 0
+ * Otherwise, return value < 0
+ */
+static s_time_t
+compare_vcpu_priority(const struct rt_vcpu *v1, const struct rt_vcpu *v2)
+{
+    int prio = v2->priority_level - v1->priority_level;
+
+    if ( prio == 0 )
+        return v2->cur_deadline - v1->cur_deadline;
+
+    return prio;
+}
+
+/*
  * Debug related code, dump vcpu/cpu information
  */
 static void
@@ -303,6 +336,7 @@ rt_dump_vcpu(const struct scheduler *ops, const struct 
rt_vcpu *svc)
 cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch), mask);
 printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime"),"
"

[Xen-devel] [PATCH v3 0/5] Towards work-conserving RTDS

2017-10-10 Thread Meng Xu
This series of patches make RTDS scheduler work-conserving
without breaking real-time guarantees.
VCPUs with extratime flag set can get extra time
from the unreserved system resource.
System administrators can decide which VCPUs have extratime flag set.

Example:
Set the extratime bit of all VCPUs of domain 1:
# xl sched-rtds -d 1 -v all -p 10000 -b 2000 -e 1
Each VCPU of domain 1 will be guaranteed 2000us of budget every
10000us period (if the system is schedulable).
If there is a CPU with no work to do,
domain 1's VCPUs will be scheduled onto that CPU,
even if the VCPUs have already used up their 2000us in the current
10000us period.

Clear the extratime bit of all VCPUs of domain 1:
# xl sched-rtds -d 1 -v all -p 10000 -b 2000 -e 0

Set/Clear the extratime bit of one specific VCPU of domain 1:
# xl sched-rtds -d 1 -v 1 -p 10000 -b 2000 -e 1
# xl sched-rtds -d 1 -v 1 -p 10000 -b 2000 -e 0


The original design of the work-conserving RTDS was discussed at
https://www.mail-archive.com/xen-devel@lists.xen.org/msg77150.html

The first version was discussed at
https://www.mail-archive.com/xen-devel@lists.xen.org/msg117361.html

The second version was discussed at
https://www.mail-archive.com/xen-devel@lists.xen.org/msg120618.html

The series of patch can be found at github:
https://github.com/PennPanda/RT-Xen
under the branch:
xenbits/rtds/work-conserving-v3.1

Changes from v2
Sanity check the input of -e option which can only be 0 or 1
Set -e to 1 by default if 3rd party library does not set -e option
Set vcpu extratime in sched_rtds_vcpu_get function function, which
fixes a bug in previous version.
Change EXTRATIME to Extratime in the xl output

Changes from v1
Change XEN_DOMCTL_SCHED_RTDS_extratime to XEN_DOMCTL_SCHEDRT_extra
Revise xentrace, xenalyze, and docs
Add LIBXL_HAVE_SCHED_RTDS_VCPU_EXTRA symbol in libxl.h

Changes from RFC v1
Merge changes in sched_rt.c into one patch;
Minor change in variable name and comments.

Signed-off-by: Meng Xu 

[PATCH v3 1/5] xen:rtds: towards work conserving RTDS
[PATCH v3 2/5] libxl: enable per-VCPU extratime flag for RTDS
[PATCH v3 3/5] xl: enable per-VCPU extratime flag for RTDS
[PATCH v3 4/5] xentrace: enable per-VCPU extratime flag for RTDS
[PATCH v3 5/5] docs: enable per-VCPU extratime flag for RTDS



[Xen-devel] [PATCH v3 3/5] xl: enable per-VCPU extratime flag for RTDS

2017-10-10 Thread Meng Xu
Change main_sched_rtds and related output functions to support
per-VCPU extratime flag.

Signed-off-by: Meng Xu 

---
Changes from v2
Validate the -e option input that can only be 0 or 1
Update docs/man/xl.pod.1.in
Change EXTRATIME to Extratime

Changes from v1
No change because we agree on using -e 0/1 option to
set if a vcpu will get extra time or not

Changes from RFC v1
Changes work_conserving flag to extratime flag
---
 docs/man/xl.pod.1.in   | 59 +--
 tools/xl/xl_cmdtable.c |  3 ++-
 tools/xl/xl_sched.c| 62 +++---
 3 files changed, 78 insertions(+), 46 deletions(-)

diff --git a/docs/man/xl.pod.1.in b/docs/man/xl.pod.1.in
index cd8bb1c..486a24f 100644
--- a/docs/man/xl.pod.1.in
+++ b/docs/man/xl.pod.1.in
@@ -1117,11 +1117,11 @@ as B<--ratelimit_us> in B
 Set or get rtds (Real Time Deferrable Server) scheduler parameters.
 This rt scheduler applies Preemptive Global Earliest Deadline First
 real-time scheduling algorithm to schedule VCPUs in the system.
-Each VCPU has a dedicated period and budget.
-VCPUs in the same domain have the same period and budget.
+Each VCPU has a dedicated period, budget and extratime.
 While scheduled, a VCPU burns its budget.
 A VCPU has its budget replenished at the beginning of each period;
 Unused budget is discarded at the end of each period.
+A VCPU with extratime set gets extra time from the unreserved system resource.
 
 B
 
@@ -1145,6 +1145,11 @@ Period of time, in microseconds, over which to replenish 
the budget.
 Amount of time, in microseconds, that the VCPU will be allowed
 to run every period.
 
+=item B<-e Extratime>, B<--extratime=Extratime>
+
+Binary flag to decide if the VCPU will be allowed to get extra time from
+the unreserved system resource.
+
 =item B<-c CPUPOOL>, B<--cpupool=CPUPOOL>
 
 Restrict output to domains in the specified cpupool.
@@ -1160,57 +1165,57 @@ all the domains:
 
 xl sched-rtds -v all
 Cpupool Pool-0: sched=RTDS
-Name                        ID VCPU    Period    Budget
-Domain-0                     0    0     10000      4000
-vm1                          1    0       300       150
-vm1                          1    1       400       200
-vm1                          1    2     10000      4000
-vm1                          1    3      1000       500
-vm2                          2    0     10000      4000
-vm2                          2    1     10000      4000
+Name                        ID VCPU    Period    Budget  Extratime
+Domain-0                     0    0     10000      4000        yes
+vm1                          2    0       300       150        yes
+vm1                          2    1       400       200        yes
+vm1                          2    2     10000      4000        yes
+vm1                          2    3      1000       500        yes
+vm2                          4    0     10000      4000        yes
+vm2                          4    1     10000      4000        yes
 
 Without any arguments, it will output the default scheduling
 parameters for each domain:
 
 xl sched-rtds
 Cpupool Pool-0: sched=RTDS
-Name                        ID    Period    Budget
-Domain-0                     0     10000      4000
-vm1                          1     10000      4000
-vm2                          2     10000      4000
+Name                        ID    Period    Budget  Extratime
+Domain-0                     0     10000      4000        yes
+vm1                          2     10000      4000        yes
+vm2                          4     10000      4000        yes
 
 
-2) Use, for instancei, B<-d vm1, -v all> to see the budget and
+2) Use, for instance, B<-d vm1, -v all> to see the budget and
 period of all VCPUs of a specific domain (B):
 
 xl sched-rtds -d vm1 -v all
-Name                        ID VCPU    Period    Budget
-vm1                          1    0       300       150
-vm1                          1    1       400       200
-vm1                          1    2     10000      4000
-vm1                          1    3      1000       500
+Name                        ID VCPU    Period    Budget  Extratime
+vm1                          2    0       300       150        yes
+vm1                          2    1       400       200        yes
+vm1                          2    2     10000      4000        yes
+vm1                          2    3      1000       500        yes
 
 To see the parameters of a subset of the VCPUs of a domain, use:
 
 xl sched-rtds -d vm1 -v 0 -v 3
-Name                        ID VCPU    Period    Budget
-vm1                          1    0       300       150
-vm1                          1    3      1000       500
+Name                        ID VCPU    Period    Budget  Extratime
+vm1                          2    0       300       150        yes
+vm1                          2    3      1000       500        yes

[Xen-devel] [PATCH v3 5/5] docs: enable per-VCPU extratime flag for RTDS

2017-10-10 Thread Meng Xu
Revise xl tool use case by adding -e option
Remove work-conserving from TODO list

Signed-off-by: Meng Xu 

---
No change from v2

Changes from v1
Revise rtds docs
---
 docs/features/sched_rtds.pandoc | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/features/sched_rtds.pandoc b/docs/features/sched_rtds.pandoc
index 354097b..d51b499 100644
--- a/docs/features/sched_rtds.pandoc
+++ b/docs/features/sched_rtds.pandoc
@@ -40,7 +40,7 @@ as follows:
 
 It is possible, for a multiple vCPUs VM, to change the parameters of
 each vCPU individually:
-* `xl sched-rtds -d vm-rt -v 0 -p 20000 -b 10000 -v 1 -p 45000 -b 12000`
+* `xl sched-rtds -d vm-rt -v 0 -p 20000 -b 10000 -e 1 -v 1 -p 45000 -b 12000 -e 0`
 
 # Technical details
 
@@ -53,7 +53,8 @@ the presence of the LIBXL\_HAVE\_SCHED\_RTDS symbol. The 
ability of
 specifying different scheduling parameters for each vcpu has been
 introduced later, and is available if the following symbols are defined:
 * `LIBXL\_HAVE\_VCPU\_SCHED\_PARAMS`,
-* `LIBXL\_HAVE\_SCHED\_RTDS\_VCPU\_PARAMS`.
+* `LIBXL\_HAVE\_SCHED\_RTDS\_VCPU\_PARAMS`,
+* `LIBXL\_HAVE\_SCHED\_RTDS\_VCPU\_EXTRA`.
 
 # Limitations
 
@@ -95,7 +96,6 @@ at a macroscopic level), the following should be done:
 
 # Areas for improvement
 
-* Work-conserving mode to be added;
 * performance assessment, especially focusing on what level of real-time
   behavior the scheduler enables.
 
@@ -118,4 +118,5 @@ at a macroscopic level), the following should be done:
 Date       Revision Version  Notes
 ---------- -------- -------- -------------------------------------------
 2016-10-14 1        Xen 4.8  Document written
+2017-08-31 2        Xen 4.10 Revise for work conserving feature
 ---------- -------- -------- -------------------------------------------
-- 
1.9.1




[Xen-devel] [PATCH v3 2/5] libxl: enable per-VCPU extratime flag for RTDS

2017-10-10 Thread Meng Xu
Modify libxl_vcpu_sched_params_get/set and sched_rtds_vcpu_get/set
functions to support per-VCPU extratime flag

Signed-off-by: Meng Xu 

---
Changes from v2
1) Move extratime out of the section
   that is marked as deprecated in libxl_domain_sched_params.
2) Set vcpu extratime in the sched_rtds_vcpu_get function;
   this fixes a bug in the previous version, where running the
   command "xl sched-rtds -d 0 -v 1" output the vcpu extratime
   value incorrectly.

Changes from v1
1) Add LIBXL_HAVE_SCHED_RTDS_VCPU_EXTRA to indicate if extratime flag is
supported
2) Change flag name in domctl.h from XEN_DOMCTL_SCHED_RTDS_extratime to
XEN_DOMCTL_SCHEDRT_extra

Changes from RFC v1
Change work_conserving flag to extratime flag
---
 tools/libxl/libxl.h |  6 ++
 tools/libxl/libxl_sched.c   | 17 +
 tools/libxl/libxl_types.idl |  8 
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index f82b91e..5e9aed7 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -257,6 +257,12 @@
 #define LIBXL_HAVE_SCHED_RTDS_VCPU_PARAMS 1
 
 /*
+ * LIBXL_HAVE_SCHED_RTDS_VCPU_EXTRA indicates RTDS scheduler
+ * now supports per-vcpu extratime settings.
+ */
+#define LIBXL_HAVE_SCHED_RTDS_VCPU_EXTRA 1
+
+/*
  * libxl_domain_build_info has the arm.gic_version field.
  */
 #define LIBXL_HAVE_BUILDINFO_ARM_GIC_VERSION 1
diff --git a/tools/libxl/libxl_sched.c b/tools/libxl/libxl_sched.c
index 7d144d0..512788f 100644
--- a/tools/libxl/libxl_sched.c
+++ b/tools/libxl/libxl_sched.c
@@ -532,6 +532,8 @@ static int sched_rtds_vcpu_get(libxl__gc *gc, uint32_t domid,
     for (i = 0; i < num_vcpus; i++) {
         scinfo->vcpus[i].period = vcpus[i].u.rtds.period;
         scinfo->vcpus[i].budget = vcpus[i].u.rtds.budget;
+        scinfo->vcpus[i].extratime =
+            !!(vcpus[i].u.rtds.flags & XEN_DOMCTL_SCHEDRT_extra);
         scinfo->vcpus[i].vcpuid = vcpus[i].vcpuid;
     }
     rc = 0;
@@ -579,6 +581,8 @@ static int sched_rtds_vcpu_get_all(libxl__gc *gc, uint32_t domid,
     for (i = 0; i < num_vcpus; i++) {
         scinfo->vcpus[i].period = vcpus[i].u.rtds.period;
         scinfo->vcpus[i].budget = vcpus[i].u.rtds.budget;
+        scinfo->vcpus[i].extratime =
+            !!(vcpus[i].u.rtds.flags & XEN_DOMCTL_SCHEDRT_extra);
         scinfo->vcpus[i].vcpuid = vcpus[i].vcpuid;
     }
     rc = 0;
@@ -628,6 +632,10 @@ static int sched_rtds_vcpu_set(libxl__gc *gc, uint32_t domid,
         vcpus[i].vcpuid = scinfo->vcpus[i].vcpuid;
         vcpus[i].u.rtds.period = scinfo->vcpus[i].period;
         vcpus[i].u.rtds.budget = scinfo->vcpus[i].budget;
+        if (scinfo->vcpus[i].extratime)
+            vcpus[i].u.rtds.flags |= XEN_DOMCTL_SCHEDRT_extra;
+        else
+            vcpus[i].u.rtds.flags &= ~XEN_DOMCTL_SCHEDRT_extra;
     }
 
 r = xc_sched_rtds_vcpu_set(CTX->xch, domid,
@@ -676,6 +684,10 @@ static int sched_rtds_vcpu_set_all(libxl__gc *gc, uint32_t domid,
         vcpus[i].vcpuid = i;
         vcpus[i].u.rtds.period = scinfo->vcpus[0].period;
         vcpus[i].u.rtds.budget = scinfo->vcpus[0].budget;
+        if (scinfo->vcpus[0].extratime)
+            vcpus[i].u.rtds.flags |= XEN_DOMCTL_SCHEDRT_extra;
+        else
+            vcpus[i].u.rtds.flags &= ~XEN_DOMCTL_SCHEDRT_extra;
     }
 
 r = xc_sched_rtds_vcpu_set(CTX->xch, domid,
@@ -726,6 +738,11 @@ static int sched_rtds_domain_set(libxl__gc *gc, uint32_t domid,
         sdom.period = scinfo->period;
     if (scinfo->budget != LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT)
         sdom.budget = scinfo->budget;
+    /* Set extratime by default */
+    if (scinfo->extratime)
+        sdom.flags |= XEN_DOMCTL_SCHEDRT_extra;
+    else
+        sdom.flags &= ~XEN_DOMCTL_SCHEDRT_extra;
     if (sched_rtds_validate_params(gc, sdom.period, sdom.budget))
         return ERROR_INVAL;
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 2d0bb8a..dd7d364 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -421,14 +421,14 @@ libxl_domain_sched_params = Struct("domain_sched_params",[
 ("cap",  integer, {'init_val': 
'LIBXL_DOMAIN_SCHED_PARAM_CAP_DEFAULT'}),
 ("period",   integer, {'init_val': 
'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}),
 ("budget",   integer, {'init_val': 
'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
+("extratime",integer, {'init_val': 
'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}),
 
-# The following three parameters ('slice', 'latency' and 'extratime') are 
deprecated,
+# The following three parameters ('slice' and 'latency') are deprecated,
 # and will have no effect if used, since the S

[Xen-devel] [PATCH v3 4/5] xentrace: enable per-VCPU extratime flag for RTDS

2017-10-10 Thread Meng Xu
Change repl_budget event output for xentrace formats and xenalyze

Signed-off-by: Meng Xu 

---
No changes from v2

Changes from v1
Add this changes from v1
---
 tools/xentrace/formats| 2 +-
 tools/xentrace/xenalyze.c | 8 +---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/tools/xentrace/formats b/tools/xentrace/formats
index d6e7e3f..7d3a209 100644
--- a/tools/xentrace/formats
+++ b/tools/xentrace/formats
@@ -75,7 +75,7 @@
 0x00022801  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:tickle        [ cpu = %(1)d ]
 0x00022802  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:runq_pick     [ dom:vcpu = 0x%(1)08x, cur_deadline = 0x%(3)08x%(2)08x, cur_budget = 0x%(5)08x%(4)08x ]
 0x00022803  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:burn_budget   [ dom:vcpu = 0x%(1)08x, cur_budget = 0x%(3)08x%(2)08x, delta = %(4)d ]
-0x00022804  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:repl_budget   [ dom:vcpu = 0x%(1)08x, cur_deadline = 0x%(3)08x%(2)08x, cur_budget = 0x%(5)08x%(4)08x ]
+0x00022804  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:repl_budget   [ dom:vcpu = 0x%(1)08x, priority_level = 0x%(2)08d, cur_deadline = 0x%(4)08x%(3)08x, cur_budget = 0x%(6)08x%(5)08x ]
 0x00022805  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:sched_tasklet
 0x00022806  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:schedule      [ cpu[16]:tasklet[8]:idle[4]:tickled[4] = %(1)08x ]
 
diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
index 79bdba7..2783204 100644
--- a/tools/xentrace/xenalyze.c
+++ b/tools/xentrace/xenalyze.c
@@ -7946,12 +7946,14 @@ void sched_process(struct pcpu_info *p)
         if(opt.dump_all) {
             struct {
                 unsigned int vcpuid:16, domid:16;
+                unsigned int priority_level;
                 uint64_t cur_dl, cur_bg;
             } __attribute__((packed)) *r = (typeof(r))ri->d;
 
-            printf(" %s rtds:repl_budget d%uv%u, deadline = %"PRIu64", "
-                   "budget = %"PRIu64"\n", ri->dump_header,
-                   r->domid, r->vcpuid, r->cur_dl, r->cur_bg);
+            printf(" %s rtds:repl_budget d%uv%u, priority_level = %u, "
+                   "deadline = %"PRIu64", budget = %"PRIu64"\n",
+                   ri->dump_header, r->domid, r->vcpuid,
+                   r->priority_level, r->cur_dl, r->cur_bg);
         }
         break;
 case TRC_SCHED_CLASS_EVT(RTDS, 5): /* SCHED_TASKLET*/
-- 
1.9.1




Re: [Xen-devel] [PATCH v2 3/5] xl: enable per-VCPU extratime flag for RTDS

2017-10-09 Thread Meng Xu
On Wed, Sep 13, 2017 at 8:51 PM, Dario Faggioli
 wrote:
>
> On Fri, 2017-09-01 at 11:58 -0400, Meng Xu wrote:
> > diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
> > index ba0159d..1b03d44 100644
> > --- a/tools/xl/xl_cmdtable.c
> > +++ b/tools/xl/xl_cmdtable.c
> > @@ -272,12 +272,13 @@ struct cmd_spec cmd_table[] = {
> >  { "sched-rtds",
> >&main_sched_rtds, 0, 1,
> >"Get/set rtds scheduler parameters",
> > -  "[-d  [-v[=VCPUID/all]] [-p[=PERIOD]] [-b[=BUDGET]]]",
> > +  "[-d  [-v[=VCPUID/all]] [-p[=PERIOD]] [-b[=BUDGET]] [-
> > e[=EXTRATIME]]]",
> >"-d DOMAIN, --domain=DOMAIN Domain to modify\n"
> >"-v VCPUID/all, --vcpuid=VCPUID/allVCPU to modify or
> > output;\n"
> >"   Using '-v all' to modify/output all vcpus\n"
> >"-p PERIOD, --period=PERIOD Period (us)\n"
> >"-b BUDGET, --budget=BUDGET Budget (us)\n"
> > +  "-e EXTRATIME, --extratime=EXTRATIME EXTRATIME (1=yes, 0=no)\n"
>   Extratime
> ?

We need to provide the option to configure the extratime flag for each
vcpu, right?

>
> >  },
> >  { "domid",
> >&main_domid, 0, 0,
> > diff --git a/tools/xl/xl_sched.c b/tools/xl/xl_sched.c
> > index 85722fe..5138012 100644
> > --- a/tools/xl/xl_sched.c
> > +++ b/tools/xl/xl_sched.c
> > @@ -251,7 +251,7 @@ static int sched_rtds_domain_output(
> >  libxl_domain_sched_params scinfo;
> >
> >  if (domid < 0) {
> > -printf("%-33s %4s %9s %9s\n", "Name", "ID", "Period",
> > "Budget");
> > +printf("%-33s %4s %9s %9s %10s\n", "Name", "ID", "Period",
> > "Budget", "Extra time");
> >  return 0;
> >  }
> >
> Can you paste the output of:
>

Sure

> xl sched-rtds

Cpupool Pool-0: sched=RTDS
Name                        ID    Period    Budget Extra time
Domain-0                     0     10000      4000        yes

> xl sched-rtds -d 0

Name                        ID    Period    Budget Extra time
Domain-0                     0     10000      4000        yes

> xl sched-rtds -d 0 -v 1

Name                        ID VCPU    Period    Budget Extra time
Domain-0                     0    1     10000      4000        yes


> xl sched-rtds -d 0 -v all

Name                        ID VCPU    Period    Budget Extra time
Domain-0                     0    0     10000      4000        yes
Domain-0                     0    1     10000      4000        yes
Domain-0                     0    2     10000      4000        yes
Domain-0                     0    3     10000      4000        yes
Domain-0                     0    4     10000      4000        yes
Domain-0                     0    5     10000      4000        yes
Domain-0                     0    6     10000      4000        yes
Domain-0                     0    7     10000      4000        yes
Domain-0                     0    8     10000      4000        yes
Domain-0                     0    9     10000      4000        yes
Domain-0                     0   10     10000      4000        yes
Domain-0                     0   11     10000      4000        yes

>
> with the series applied?
>
> > @@ -785,8 +801,9 @@ int main_sched_rtds(int argc, char **argv)
> >  goto out;
> >  }
> >  if (((v_index > b_index) && opt_b) || ((v_index > p_index) &&
> > opt_p)
> > -|| p_index != b_index) {
> > -fprintf(stderr, "Incorrect number of period and budget\n");
> > + || ((v_index > e_index) && opt_e) || p_index != b_index
> > + || p_index != e_index || b_index != e_index ) {
> >
> I don't think you need the `b_index != e_index` part. If p==b and p==e,
> it's automatically true that b==e.

Right.

>
> > @@ -820,7 +837,7 @@ int main_sched_rtds(int argc, char **argv)
> >  r = EXIT_FAILURE;
> >  goto out;
> >  }
> > -} else if (!opt_p && !opt_b) {
> > +} else if (!opt_p && !opt_b && !opt_e) {
> >  /* get per-vcpu rtds scheduling parameters */
> >  libxl_vcpu_sched_params scinfo;
> >  libx

Re: [Xen-devel] [PATCH v2 2/5] libxl: enable per-VCPU extratime flag for RTDS

2017-10-09 Thread Meng Xu
On Tue, Sep 19, 2017 at 5:23 AM, Dario Faggioli
 wrote:
>
> On Fri, 2017-09-15 at 12:01 -0400, Meng Xu wrote:
> > On Wed, Sep 13, 2017 at 8:16 PM, Dario Faggioli
> >  wrote:
> > >
> > > > I'm ok with what it is in this patch, although I feel that we can
> > > > kill the
> > > >  if (scinfo->extratime !=
> > > > LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT)
> > > > because LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT is -1.
> > > >
> > >
> > > No, sorry, I don't understand what you mean here...
> >
> > I was thinking about the following code:
> >
> > if (scinfo->extratime !=
> > LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT) {
> > if (scinfo->extratime)
> > sdom.flags |= XEN_DOMCTL_SCHEDRT_extra;
> > else
> > sdom.flags &= ~XEN_DOMCTL_SCHEDRT_extra;
> > }
> >
> > This code can be changed to
> > if (scinfo->extratime)
> > sdom.flags |= XEN_DOMCTL_SCHEDRT_extra;
> > else
> > sdom.flags &= ~XEN_DOMCTL_SCHEDRT_extra;
> >
> > If the extratime uses default value (-1), we still set the extratime
> > flag.
> >
> > That's why I feel we may kill the
> >  if (scinfo->extratime != LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT)
> >
> Mmm... Ok, I see it now. Well, this is of course all up to the tools'
> maintainers.
>
> What I think it would be valuable to ask ourselves here is: can, at this
> point, scinfo->extratime be equal to
> XL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT?
>
> And if it is, what does it mean, and what do we want to do?
>
> I mean, if extratime is -1, it means that we've been called, without it
> being touched by xl (although, remember that, as a library, libxl can
> be linked to and called by other programs too, e.g., libvirt).
>
> If you think that this is a serious programming bug, you can use
> XL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT to check that, and raise an
> assert.
>
> If you think it's an API misuse, you can use it to check for that, and
> return an error.
>
> If you think it's just fine, you can do whatever you want to do as
> default (which, AFAIUI, it's set the flag). In this case, it's probably
> fine to ignore XL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT in actual code.
> Although, I'd still put a reference to it in a comment, to explain
> what's going on, and why we're doing things differently from budget and
> period (since _their_ *_DEFAULT are checked).


I think it should be fine for API callers to call the function without setting
the extratime parameter. We set the extratime flag by default.

I will go with the following code for the next version.
> if (scinfo->extratime)
> sdom.flags |= XEN_DOMCTL_SCHEDRT_extra;
> else
> sdom.flags &= ~XEN_DOMCTL_SCHEDRT_extra;
>

Thank you very much!

Best,

Meng

-- 
Meng Xu
Ph.D. Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] RT-Xen on ARM

2017-10-09 Thread Meng Xu
Hi Andrii,

I'm sorry for replying to this thread late. I was busy with a paper
deadline until last Saturday morning.

I saw Dario's thorough answer which explains the high-level idea of
the real-time analysis that is the theoretical foundation of the
analysis tool, e.g., CARTs.
Hopefully, he answered your question.
If not, please feel free to ask.

I just added some very quick comment about your questions/comments below.

On Thu, Sep 28, 2017 at 5:18 AM, Andrii Anisov  wrote:
> Hello,
>
>
> On 27.09.17 22:57, Meng Xu wrote:
>>
>> Note that:
>> When you use gEDF scheduler in VM or VMM (i.e., Xen), you should use
>> MPR2 model
>
> I guess you mean DMPR in CARTS terms.
>
>>   to compute the resource interface (i.e., VCPU parameters).
>> When you use pEDF scheduler, you should use PRM model to compute.
>>>
>>>  - Could you please provide an example input xml for CARTS describing a
>>> system with 2 RT domains with 2 VCPUs each, running on 2 PCPUs, with gEDF
>>> scheduling at VMM level (for a XEN-based setup).
>>
>> Hmm, if you use the gEDF scheduling algorithm, this may not be
>> possible. Let me explain why.
>> In the MPR2 model, it computes the interface with the minimum number
>> of cores. To get 2 VCPUs for a VM, the total utilization (i.e., budget
>> / period) of these two VCPUs must be larger than 1.0. Since you ask
>> for 2 domains, the total utilization of these 4 VCPUs will be larger
>> than 2.0, which are definitely not schedulable on two cores.
>
> Well, if we are speaking about test-cases similar to those described in [1],
> where the whole real-time task set utilization is taken from 1.1...(PCPU*1)-0.1,
> there is no problem with having more VCPUs than PCPUs. For sure, if
> we take a number of domains greater than 1.

The number of VCPUs can be larger than the number of PCPUs.

>
>> If you are considering VCPUs with very low utilization, you may use
>> PRM model to compute each VCPU's parameters; after that, you can treat
>> these VCPUs as tasks, create another xml file, and ask CARTS to
>> compute the resource interface for these VCPUs.
>
> Sounds terrible for getting it scripted :(

If you use Python to parse the XML file, it should not be very
difficult. Python has APIs to parse XML. :)

>>
>> (Unfortunately, the current CARTS implementation does not support
>> mixing MPR model in one XML file, although it is supported in theory.
>> This can be worked around by using the above approach.)
>>
>>> For pEDF at both VMM and
>>> domain level, my understanding is that the os_scheduler represents XEN,
>>> and
>>> VCPUs are represented by components with tasks running on them.
>>
>> Yes, if you analyze the entire system that uses one type of scheduler
>> with only one type of model (i.e., PRM or MPR2).
>>
>> If you mixed the scheduling algorithm or the interface model, you can
>> compute each VM or VCPU's parameters first. Then you treat VCPUs as
>> tasks and create another XML which will be used to compute the number
>> of cores to schedule all these VCPUs.
>>
>>>  - I did not get a concept of min_period/max_period for a
>>> component/os_scheduler in CARTS description files. If I have them
>>> different,
>>> CARTS gives calculation for all periods in between, but did not provide
>>> the
>>> best period to get system schedulable.
>>
>> You should set them to the same value.
>
> Ok, how to choose the value for some taskset in a component?

Tasks' periods and execution times depend on the tasks' requirements.
As Dario mentioned, if a sensor needs to be processed every 100ms, the
sensor task's period is 100ms. Its execution time is the worst-case
execution time of the sensor task.

As to the component's (or VM's) period, it's better for it to be smaller than
its tasks' periods. Usually, I would set it to a value that evenly divides
its tasks' periods.
You may try different values for a component's period, because the
VCPU's bandwidth (budget/period) will differ for different
component periods.
You can choose the component period that produces the smallest VCPU
bandwidth, which may help make the VCPUs easier to schedule on PCPUs.
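
As a made-up illustration of that last point (the numbers are mine, not from
CARTS): for two tasks with periods 60ms and 90ms, candidate component periods
of 15ms or 30ms both divide the task periods evenly. Running CARTS once per
candidate period and keeping the one whose computed budget gives the smallest
budget/period ratio is a reasonable way to script the search.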

Best,

Meng


-- 
Meng Xu
Ph.D. Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] [PATCH] Changing my email address

2017-10-05 Thread Meng Xu
Hi Dario,

On Thu, Oct 5, 2017 at 10:28 AM, Dario Faggioli  wrote:
>
> Hello,
>
> Soon I won't have access to dario.faggi...@citrix.com email address.

It's sad to hear this. :(

>
> Therefore, replace it, in my entries in MAINTAINERS, with an email address 
> that
> I actually can, and will actually read.
>
> One thing about RTDS. Meng, which one of the following two sentences, better
> describes your situation?
>
>  a) Supported:   Someone is actually paid to look after this.
>  b) Maintained:  Someone actually looks after it.
>
> If it's a (you're currently paid to look after RTDS) then we're fine.

I'm paid to look after RTDS at least before I graduate. :)

Best regards,

Meng

-- 
Meng Xu
Ph.D. Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] [PATCH] MAINTAINERS: update entries to new email address.

2017-10-05 Thread Meng Xu
On Thu, Oct 5, 2017 at 10:28 AM, Dario Faggioli  wrote:

> Replace, in the 'M:' fields of the components I co-maintain
> ('CPU POOLS', 'SCHEDULING' and 'RTDS SCHEDULER'), the Citrix
> email, to which I don't have access any longer, with my
> personal email.
>
> Signed-off-by: Dario Faggioli 
> ---
> Cc: Andrew Cooper 
> Cc: George Dunlap 
> Cc: Ian Jackson 
> Cc: Jan Beulich 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Stefano Stabellini 
> Cc: Tim Deegan 
> Cc: Wei Liu 
> Cc: Juergen Gross 
> Cc: Meng Xu 
>

​Acked-by: Meng Xu ​

Meng
-- 
Meng Xu
Ph.D. Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/


Re: [Xen-devel] [PATCH v2 0/5] Towards work-conserving RTDS

2017-10-02 Thread Meng Xu
On Mon, Oct 2, 2017 at 1:04 PM, Dario Faggioli 
wrote:

> On Mon, 2017-10-02 at 17:38 +0300, Andrii Anisov wrote:
> > Hello Meng Xu and Dario,
> >
> Hi,
>
> > On 01.09.17 18:58, Meng Xu wrote:
> > > This series of patches make RTDS scheduler work-conserving
> > > without breaking real-time guarantees.
> > > VCPUs with extratime flag set can get extra time
> > > from the unreserved system resource.
> > > System administrators can decide which VCPUs have extratime flag
> > > set.
> >
> > As I understand from threads and the code, the work conserving
> > algorithm
> > is quite simplistic and will prefer a vcpu with greater utilization.
> >
> >  From our side we are looking for a bit different solution, i.e., in the
> > same cpupool, running vcpus eager for RT characteristics under EDF
> > conditions, and sharing the rest of the resources between non-RT vcpus
> > (i.e., in a credit manner).
> > A possible use-case could be a system with a domain hungry for resources,
> > but not critical (some infotainment system), and an RT domain utilizing
> > at most 20% of a single CPU core. On a SoC with 4 cores,
> > partitioning would be a significant waste of resources for the described
> > scenario.
> >
> IMO, this is interesting, but I think the proper way to achieve
> something like this is not modify RTDS to also contain something like
> Credit, nor to modify Credit to also contain something like RTDS.
>
> The idea I have in mind to serve the use case you're describing is as
> follows. Right now, a cpupool can only have one scheduler. If it's RTDS,
> all the domains are scheduled with RTDS; if it's Credit, all the
> domains are scheduled with Credit, etc.
>
> My idea would be to allow a stack of schedulers in a cpupool.
> Basically, you'd configure a cpupool with sched="rtds,credit2" and then
> you specify, for each domain, what scheduler you want it to use.
>
> The end result would be that, in the example above, domains scheduled
> with Credit2 would run in the time left free by the domains scheduled
> by RTDS. E.g., if you have a cpupool with only 1 CPU, an RTDS domain
> with P=100,B=20, an RTDS domain with P=1000,B=400, and two Credit2
> domains, one with weight 256 and the other with weight 512. Then, the
> two RTDS domains will get 20% and 40% of the CPU, while the two Credit2
> domains will share the remaining 40% (the one with w=512 getting twice
> as much as the one with w=256).
>
> This is kind of similar with what Linux does with scheduling classes,
> but even more flexible.
>

​I was thinking about Linux scheduling class as well. :)
I think this is a great idea. :)​
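Just to sanity-check the arithmetic in that example (my own back-of-the-
envelope numbers): the two RTDS reservations take 20% + 40% = 60% of the CPU,
leaving 40%. Credit2 then splits that 40% by weight, so the w=512 domain gets
2/3 * 40% ~= 26.7% and the w=256 domain gets 1/3 * 40% ~= 13.3%.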


> I am not working on implementing this right now, because I'm busy with
> other things, but I would like to do that at some point. And if you're
> up for helping, that would be great! :-)
>

Right now, I'm busy with a deadline. I will take care of the
work-conserving RTDS next week.
As to supporting different scheduling classes in the same cpupool, I'm not
yet sure when I'll be available for this. :(

Best,

Meng
-- 
Meng Xu
Ph.D. Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/


Re: [Xen-devel] RT-Xen on ARM

2017-09-27 Thread Meng Xu
On Wed, Sep 27, 2017 at 10:37 AM, Andrii Anisov  wrote:
> Hello,
>
>
> On 27.09.17 16:57, Meng Xu wrote:
>>
>> The command is:
>> java -jar carts.jar inputfile outputfile
>
> From the next example, I would say the command is:
> java -jar carts.jar inputfile interface_type outputfile
>
>> An example command is:
>> java -jar carts.jar 1-1.10-in.xml MPR2 1-1.10-out.xml
>
> Thanks a lot. It does work.
>
> Could you please clarify a bit more points to me:
> - As I understand the upstreamed rtds employs gEDF only. Is it correct?

RTDS can support gEDF and partitioned EDF.
To support pEDF, you can set each VCPU's affinity (hard and soft
affinity) to only one core using the "xl vcpu-pin" command. The VCPUs on
each core will then be scheduled by the pEDF scheduling algorithm.

Note that:
When you use the gEDF scheduler in a VM or in the VMM (i.e., Xen), you should
use the MPR2 model to compute the resource interface (i.e., VCPU parameters).
When you use the pEDF scheduler, you should use the PRM model to compute it.
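
For instance (domain and CPU numbers invented for illustration), pinning
each of domain 1's two VCPUs to its own PCPU turns the global scheduler
into a partitioned one:

# xl vcpu-pin 1 0 2
# xl vcpu-pin 1 1 3

After that, VCPU 0 only runs on PCPU 2 and VCPU 1 only on PCPU 3, so each
PCPU is effectively scheduled by pEDF.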


> - Could you please provide an example input xml for CARTS describing a
> system with 2 RT domains with 2 VCPUs each, running on 2 PCPUs, with gEDF
> scheduling at VMM level (for a XEN-based setup).

Hmm, if you use the gEDF scheduling algorithm, this may not be
possible. Let me explain why.
In the MPR2 model, it computes the interface with the minimum number
of cores. To get 2 VCPUs for a VM, the total utilization (i.e., budget
/ period) of these two VCPUs must be larger than 1.0. Since you ask
for 2 domains, the total utilization of these 4 VCPUs will be larger
than 2.0, which are definitely not schedulable on two cores.

If you are considering VCPUs with very low utilization, you may use
PRM model to compute each VCPU's parameters; after that, you can treat
these VCPUs as tasks, create another xml file, and ask CARTS to
compute the resource interface for these VCPUs.

(Unfortunately, the current CARTS implementation does not support
mixing MPR model in one XML file, although it is supported in theory.
This can be worked around by using the above approach.)

> For pEDF at both VMM and
> domain level, my understanding is that the os_scheduler represents XEN, and
> VCPUs are represented by components with tasks running on them.

Yes, if you analyze the entire system that uses one type of scheduler
with only one type of model (i.e., PRM or MPR2).

If you mixed the scheduling algorithm or the interface model, you can
compute each VM or VCPU's parameters first. Then you treat VCPUs as
tasks and create another XML which will be used to compute the number
of cores to schedule all these VCPUs.

> - I did not get the concept of min_period/max_period for a
> component/os_scheduler in CARTS description files. If I set them differently,
> CARTS gives calculations for all periods in between, but does not provide the
> best period to make the system schedulable.

You should set them to the same value.
The min_period/max_period range is used for other models; I never used it.

Best,

Meng

-- 
Meng Xu
Ph.D. Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] RT-Xen on ARM

2017-09-27 Thread Meng Xu
Hi Andrii,

On Wed, Sep 27, 2017 at 8:37 AM, Andrii Anisov  wrote:
>
> Dear Meng Xu,
>
>
> On 22.08.17 05:02, Meng Xu wrote:
>>
>> Given the set of tasks in each VM, we compute the VCPUs' periods and
>> budgets, using the CARTS tool [1]. Note that each task has a period
>> and a worst-case execution time (wcet).
>>
>> [1] https://rtg.cis.upenn.edu/carts/
>
>
> In a CARTS tool documentation [1] it is said that:
> "At the same time, it is also accompanied by  a lightweight command‐line  
> option  that  enables  our  tool  to  be integrated with other existing 
> toolchains."
>
> But there is no CLI usage description in the document. Could you please 
> provide such a description?


The command is:
java -jar carts.jar inputfile outputfile

An example command is:
java -jar carts.jar 1-1.10-in.xml MPR2 1-1.10-out.xml

Best,

Meng


-- 
Meng Xu
Ph.D. Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] [PATCH v2 2/5] libxl: enable per-VCPU extratime flag for RTDS

2017-09-15 Thread Meng Xu
On Wed, Sep 13, 2017 at 8:16 PM, Dario Faggioli
 wrote:
> On Fri, 2017-09-01 at 12:03 -0400, Meng Xu wrote:
>> On Fri, Sep 1, 2017 at 11:58 AM, Meng Xu 
>> wrote:
>> > @@ -705,6 +717,12 @@ static int sched_rtds_domain_set(libxl__gc
>> > *gc, uint32_t domid,
>> >  sdom.period = scinfo->period;
>> >  if (scinfo->budget != LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT)
>> >  sdom.budget = scinfo->budget;
>> > +if (scinfo->extratime !=
>> > LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT) {
>> > +if (scinfo->extratime)
>> > +sdom.flags |= XEN_DOMCTL_SCHEDRT_extra;
>> > +else
>> > +sdom.flags &= ~XEN_DOMCTL_SCHEDRT_extra;
>> > +}
>> >  if (sched_rtds_validate_params(gc, sdom.period, sdom.budget))
>> >  return ERROR_INVAL;
>>
>>
>> As you mentioned in the comment to the xl patch v1, I used
>> LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT for extratime flag as what
>> we did for period and budget. But the way we handle flags is exactly
>> the same with the way we handle period and budget.
>>
> Mmm... and (since you say 'But') is that a problem?

Sorry, the sentence should be "But the way we handle flags is *not* exactly
the same with the way we handle period and budget".
I missed "not" in the previous sentence.

>
>> I'm ok with what it is in this patch, although I feel that we can
>> kill the
>>  if (scinfo->extratime != LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT)
>> because LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT is -1.
>>
> No, sorry, I don't understand what you mean here...

I was thinking about the following code:

if (scinfo->extratime != LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT) {
if (scinfo->extratime)
sdom.flags |= XEN_DOMCTL_SCHEDRT_extra;
else
sdom.flags &= ~XEN_DOMCTL_SCHEDRT_extra;
}

This code can be changed to
if (scinfo->extratime)
sdom.flags |= XEN_DOMCTL_SCHEDRT_extra;
else
sdom.flags &= ~XEN_DOMCTL_SCHEDRT_extra;

If the extratime uses default value (-1), we still set the extratime flag.

That's why I feel we may kill the
 if (scinfo->extratime != LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT)

Please correct me if I'm wrong.

For the next version, I plan to keep what it is right now. That is:
if (scinfo->extratime != LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT) {
if (scinfo->extratime)
sdom.flags |= XEN_DOMCTL_SCHEDRT_extra;
else
sdom.flags &= ~XEN_DOMCTL_SCHEDRT_extra;
}


Best,

Meng

-- 
Meng Xu
Ph.D. Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] [PATCH v2 1/5] xen:rtds: towards work conserving RTDS

2017-09-15 Thread Meng Xu
Hi Dario,

On Wed, Sep 13, 2017 at 9:06 PM, Dario Faggioli
 wrote:
>
> On Fri, 2017-09-01 at 11:58 -0400, Meng Xu wrote:
> > Make RTDS scheduler work conserving without breaking the real-time
> > guarantees.
> >
> > VCPU model:
> > Each real-time VCPU is extended to have an extratime flag
> > and a priority_level field.
> > When a VCPU's budget is depleted in the current period,
> > if it has extratime flag set,
> > its priority_level will increase by 1 and its budget will be
> > refilled;
>> > > otherwise, the VCPU will be moved to the depletedq.
> >
> > Scheduling policy is modified global EDF:
> > A VCPU v1 has higher priority than another VCPU v2 if
>> > > (i) v1 has smaller priority_level; or
> > (ii) v1 has the same priority_level but has a smaller deadline
> >
> > Queue management:
> > Run queue holds VCPUs with extratime flag set and VCPUs with
> > remaining budget. Run queue is sorted in increasing order of VCPUs
> > priorities.
> > Depleted queue holds VCPUs which have extratime flag cleared and
> > depleted budget.
> > Replenished queue is not modified.
> >
> Mmm.. didn't we agree about putting a word of explanation of how the
> spare bandwidth ends up being distributed (i.e., in a way which is
> proportional to the utilization)?


I didn't recall that. I should have double-checked that I had
resolved every comment on the previous patch.
Anyway, I added the comment in the next version, which is coming soon.

>
> Or is it there and it's me that am
> not finding it?
>
> > --- a/xen/common/sched_rt.c
> > +++ b/xen/common/sched_rt.c
> > @@ -245,6 +258,11 @@ static inline struct list_head *rt_replq(const
> > struct scheduler *ops)
> >  return &rt_priv(ops)->replq;
> >  }
> >
> > +static inline bool has_extratime(const struct rt_vcpu *svc)
> > +{
> > +return (svc->flags & RTDS_extratime) ? 1 : 0;
> > +}
> > +
> 'true' and 'false'. But I think
>
>  return svc->flags & RTDS_extratime
>
> is just fine already, without any need for the ?: operator.


OK. Either works for me. I will go with
return svc->flags & RTDS_extratime.
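
For completeness, a minimal standalone sketch (my own reduction, not the
actual sched_rt.c context) of why the ?: operator is redundant once the
return type is bool and stdbool.h is included:

#include <stdbool.h>

#define RTDS_extratime (1U << 4)

/* C99 converts any nonzero value to true (1) on return through bool,
 * so the mask alone is enough. */
static bool has_extratime_sketch(unsigned int flags)
{
    return flags & RTDS_extratime;
}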
>
>
> The rest of the patch looks fine to me.


Thanks,

Meng

-- 
Meng Xu
Ph.D. Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] [PATCH 1/2] public/domctl: drop unnecessary typedefs and handles

2017-09-12 Thread Meng Xu
On Tue, Sep 12, 2017 at 11:08 AM, Jan Beulich  wrote:
>
> By virtue of the struct xen_domctl container structure, most of them
> are really just cluttering the name space.
>
> While doing so,
> - convert an enum typed (pt_irq_type_t) structure field to a fixed
>   width type,
> - make x86's paging_domctl() and descendants take a properly typed
>   handle,
> - add const in a few places.
>
> Signed-off-by: Jan Beulich 


Acked-by: Meng Xu 

Thanks,

Meng

-- 
Meng Xu
Ph.D. Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] [PATCH v2 2/5] libxl: enable per-VCPU extratime flag for RTDS

2017-09-01 Thread Meng Xu
Dario,

I didn't include your Reviewed-by tag because I made one small change.


On Fri, Sep 1, 2017 at 11:58 AM, Meng Xu  wrote:
>
> Modify libxl_vcpu_sched_params_get/set and sched_rtds_vcpu_get/set
> functions to support per-VCPU extratime flag
>
> Signed-off-by: Meng Xu 
>
> ---
> Changes from v1
> 1) Add LIBXL_HAVE_SCHED_RTDS_VCPU_EXTRA to indicate if extratime flag is
> supported
> 2) Change flag name in domctl.h from XEN_DOMCTL_SCHED_RTDS_extratime to
> XEN_DOMCTL_SCHEDRT_extra
>
> Changes from RFC v1
> Change work_conserving flag to extratime flag
> ---
 tools/libxl/libxl_sched.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> ---
>  tools/libxl/libxl.h   |  6 ++
>  tools/libxl/libxl_sched.c | 18 ++
>  2 files changed, 24 insertions(+)
>
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 1704525..ead300f 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -257,6 +257,12 @@
>  #define LIBXL_HAVE_SCHED_RTDS_VCPU_PARAMS 1
>
>  /*
> + * LIBXL_HAVE_SCHED_RTDS_VCPU_EXTRA indicates RTDS scheduler
> + * now supports per-vcpu extratime settings.
> + */
> +#define LIBXL_HAVE_SCHED_RTDS_VCPU_EXTRA 1
> +
> +/*
>   * libxl_domain_build_info has the arm.gic_version field.
>   */
>  #define LIBXL_HAVE_BUILDINFO_ARM_GIC_VERSION 1
> diff --git a/tools/libxl/libxl_sched.c b/tools/libxl/libxl_sched.c
> index faa604e..b76a29a 100644
> --- a/tools/libxl/libxl_sched.c
> +++ b/tools/libxl/libxl_sched.c
> @@ -558,6 +558,10 @@ static int sched_rtds_vcpu_get_all(libxl__gc *gc, 
> uint32_t domid,
>  for (i = 0; i < num_vcpus; i++) {
>  scinfo->vcpus[i].period = vcpus[i].u.rtds.period;
>  scinfo->vcpus[i].budget = vcpus[i].u.rtds.budget;
> +if (vcpus[i].u.rtds.flags & XEN_DOMCTL_SCHEDRT_extra)
> +   scinfo->vcpus[i].extratime = 1;
> +else
> +   scinfo->vcpus[i].extratime = 0;
>  scinfo->vcpus[i].vcpuid = vcpus[i].vcpuid;
>  }
>  rc = 0;
> @@ -607,6 +611,10 @@ static int sched_rtds_vcpu_set(libxl__gc *gc, uint32_t 
> domid,
>  vcpus[i].vcpuid = scinfo->vcpus[i].vcpuid;
>  vcpus[i].u.rtds.period = scinfo->vcpus[i].period;
>  vcpus[i].u.rtds.budget = scinfo->vcpus[i].budget;
> +if (scinfo->vcpus[i].extratime)
> +vcpus[i].u.rtds.flags |= XEN_DOMCTL_SCHEDRT_extra;
> +else
> +vcpus[i].u.rtds.flags &= ~XEN_DOMCTL_SCHEDRT_extra;
>  }
>
>  r = xc_sched_rtds_vcpu_set(CTX->xch, domid,
> @@ -655,6 +663,10 @@ static int sched_rtds_vcpu_set_all(libxl__gc *gc, 
> uint32_t domid,
>  vcpus[i].vcpuid = i;
>  vcpus[i].u.rtds.period = scinfo->vcpus[0].period;
>  vcpus[i].u.rtds.budget = scinfo->vcpus[0].budget;
> +if (scinfo->vcpus[0].extratime)
> +vcpus[i].u.rtds.flags |= XEN_DOMCTL_SCHEDRT_extra;
> +else
> +vcpus[i].u.rtds.flags &= ~XEN_DOMCTL_SCHEDRT_extra;
>  }
>
>  r = xc_sched_rtds_vcpu_set(CTX->xch, domid,
> @@ -705,6 +717,12 @@ static int sched_rtds_domain_set(libxl__gc *gc, uint32_t 
> domid,
>  sdom.period = scinfo->period;
>  if (scinfo->budget != LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT)
>  sdom.budget = scinfo->budget;
> +if (scinfo->extratime != LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT) {
> +if (scinfo->extratime)
> +sdom.flags |= XEN_DOMCTL_SCHEDRT_extra;
> +else
> +sdom.flags &= ~XEN_DOMCTL_SCHEDRT_extra;
> +}
>  if (sched_rtds_validate_params(gc, sdom.period, sdom.budget))
>  return ERROR_INVAL;


As you mentioned in the comment to the xl patch v1, I used
LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT for extratime flag as what
we did for period and budget. But the way we handle flags is exactly
the same with the way we handle period and budget.

I'm ok with what it is in this patch, although I feel that we can kill the
 if (scinfo->extratime != LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT)
because LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT is -1.


What do you think?

Thanks,

Meng


-- 
---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



[Xen-devel] [PATCH v2 5/5] docs: enable per-VCPU extratime flag for RTDS

2017-09-01 Thread Meng Xu
Revise xl tool use case by adding -e option
Remove work-conserving from TODO list

Signed-off-by: Meng Xu 

---
Changes from v1
Revise rtds docs
---
 docs/features/sched_rtds.pandoc | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/features/sched_rtds.pandoc b/docs/features/sched_rtds.pandoc
index 354097b..d51b499 100644
--- a/docs/features/sched_rtds.pandoc
+++ b/docs/features/sched_rtds.pandoc
@@ -40,7 +40,7 @@ as follows:
 
 It is possible, for a multiple vCPUs VM, to change the parameters of
 each vCPU individually:
-* `xl sched-rtds -d vm-rt -v 0 -p 20000 -b 10000 -v 1 -p 45000 -b 12000`
+* `xl sched-rtds -d vm-rt -v 0 -p 20000 -b 10000 -e 1 -v 1 -p 45000 -b 12000 -e 0`
 
 # Technical details
 
@@ -53,7 +53,8 @@ the presence of the LIBXL\_HAVE\_SCHED\_RTDS symbol. The 
ability of
 specifying different scheduling parameters for each vcpu has been
 introduced later, and is available if the following symbols are defined:
 * `LIBXL\_HAVE\_VCPU\_SCHED\_PARAMS`,
-* `LIBXL\_HAVE\_SCHED\_RTDS\_VCPU\_PARAMS`.
+* `LIBXL\_HAVE\_SCHED\_RTDS\_VCPU\_PARAMS`,
+* `LIBXL\_HAVE\_SCHED\_RTDS\_VCPU\_EXTRA`.
 
 # Limitations
 
@@ -95,7 +96,6 @@ at a macroscopic level), the following should be done:
 
 # Areas for improvement
 
-* Work-conserving mode to be added;
 * performance assessment, especially focusing on what level of real-time
   behavior the scheduler enables.
 
@@ -118,4 +118,5 @@ at a macroscopic level), the following should be done:
 Date   Revision Version  Notes
 --   ---
 2016-10-14 1Xen 4.8  Document written
+2017-08-31 2Xen 4.10 Revise for work conserving feature
 --   ---
-- 
1.9.1




[Xen-devel] [PATCH v2 0/5] Towards work-conserving RTDS

2017-09-01 Thread Meng Xu
This series of patches make RTDS scheduler work-conserving
without breaking real-time guarantees.
VCPUs with extratime flag set can get extra time
from the unreserved system resource.
System administrators can decide which VCPUs have extratime flag set.

Example:
Set the extratime bit of all VCPUs of domain 1:
# xl sched-rtds -d 1 -v all -p 10000 -b 2000 -e 1
Each VCPU of domain 1 will be guaranteed to have 2000us every 10000us
(if the system is schedulable).
If there is a CPU having no work to do,
domain 1's VCPUs will be scheduled onto the CPU,
even though the VCPUs have got 2000us in 10000us.

Clear the extratime bit of all VCPUs of domain 1:
# xl sched-rtds -d 1 -v all -p 10000 -b 2000 -e 0

Set/Clear the extratime bit of one specific VCPU of domain 1:
# xl sched-rtds -d 1 -v 1 -p 10000 -b 2000 -e 1
# xl sched-rtds -d 1 -v 1 -p 10000 -b 2000 -e 0
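
(Either way, the resulting per-VCPU settings can be checked with the query
form of the same command, i.e. "xl sched-rtds -d 1 -v all" with no
-p/-b/-e options.)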


The original design of the work-conserving RTDS was discussed at
https://www.mail-archive.com/xen-devel@lists.xen.org/msg77150.html

The first version was discussed at
https://www.mail-archive.com/xen-devel@lists.xen.org/msg117361.html

The series of patch can be found at github:
https://github.com/PennPanda/RT-Xen
under the branch:
xenbits/rtds/work-conserving-v2

Changes from v1
Change XEN_DOMCTL_SCHED_RTDS_extratime to XEN_DOMCTL_SCHEDRT_extra
Revise xentrace, xenalyze, and docs
Add LIBXL_HAVE_SCHED_RTDS_VCPU_EXTRA symbol in libxl.h

Changes from RFC v1
Merge changes in sched_rt.c into one patch;
Minor change in variable name and comments.

Signed-off-by: Meng Xu 

[PATCH v2 1/5] xen:rtds: towards work conserving RTDS
[PATCH v2 2/5] libxl: enable per-VCPU extratime flag for RTDS
[PATCH v2 3/5] xl: enable per-VCPU extratime flag for RTDS
[PATCH v2 4/5] xentrace: enable per-VCPU extratime flag for RTDS
[PATCH v2 5/5] docs: enable per-VCPU extratime flag for RTDS




[Xen-devel] [PATCH v2 2/5] libxl: enable per-VCPU extratime flag for RTDS

2017-09-01 Thread Meng Xu
Modify libxl_vcpu_sched_params_get/set and sched_rtds_vcpu_get/set
functions to support per-VCPU extratime flag

Signed-off-by: Meng Xu 

---
Changes from v1
1) Add LIBXL_HAVE_SCHED_RTDS_VCPU_EXTRA to indicate if extratime flag is
supported
2) Change flag name in domctl.h from XEN_DOMCTL_SCHED_RTDS_extratime to
XEN_DOMCTL_SCHEDRT_extra

Changes from RFC v1
Change work_conserving flag to extratime flag
---
 tools/libxl/libxl_sched.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
---
 tools/libxl/libxl.h   |  6 ++
 tools/libxl/libxl_sched.c | 18 ++
 2 files changed, 24 insertions(+)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 1704525..ead300f 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -257,6 +257,12 @@
 #define LIBXL_HAVE_SCHED_RTDS_VCPU_PARAMS 1
 
 /*
+ * LIBXL_HAVE_SCHED_RTDS_VCPU_EXTRA indicates RTDS scheduler
+ * now supports per-vcpu extratime settings.
+ */
+#define LIBXL_HAVE_SCHED_RTDS_VCPU_EXTRA 1
+
+/*
  * libxl_domain_build_info has the arm.gic_version field.
  */
 #define LIBXL_HAVE_BUILDINFO_ARM_GIC_VERSION 1
diff --git a/tools/libxl/libxl_sched.c b/tools/libxl/libxl_sched.c
index faa604e..b76a29a 100644
--- a/tools/libxl/libxl_sched.c
+++ b/tools/libxl/libxl_sched.c
@@ -558,6 +558,10 @@ static int sched_rtds_vcpu_get_all(libxl__gc *gc, uint32_t 
domid,
 for (i = 0; i < num_vcpus; i++) {
 scinfo->vcpus[i].period = vcpus[i].u.rtds.period;
 scinfo->vcpus[i].budget = vcpus[i].u.rtds.budget;
+if (vcpus[i].u.rtds.flags & XEN_DOMCTL_SCHEDRT_extra)
+   scinfo->vcpus[i].extratime = 1;
+else
+   scinfo->vcpus[i].extratime = 0;
 scinfo->vcpus[i].vcpuid = vcpus[i].vcpuid;
 }
 rc = 0;
@@ -607,6 +611,10 @@ static int sched_rtds_vcpu_set(libxl__gc *gc, uint32_t 
domid,
 vcpus[i].vcpuid = scinfo->vcpus[i].vcpuid;
 vcpus[i].u.rtds.period = scinfo->vcpus[i].period;
 vcpus[i].u.rtds.budget = scinfo->vcpus[i].budget;
+if (scinfo->vcpus[i].extratime)
+vcpus[i].u.rtds.flags |= XEN_DOMCTL_SCHEDRT_extra;
+else
+vcpus[i].u.rtds.flags &= ~XEN_DOMCTL_SCHEDRT_extra;
 }
 
 r = xc_sched_rtds_vcpu_set(CTX->xch, domid,
@@ -655,6 +663,10 @@ static int sched_rtds_vcpu_set_all(libxl__gc *gc, uint32_t 
domid,
 vcpus[i].vcpuid = i;
 vcpus[i].u.rtds.period = scinfo->vcpus[0].period;
 vcpus[i].u.rtds.budget = scinfo->vcpus[0].budget;
+if (scinfo->vcpus[0].extratime)
+vcpus[i].u.rtds.flags |= XEN_DOMCTL_SCHEDRT_extra;
+else
+vcpus[i].u.rtds.flags &= ~XEN_DOMCTL_SCHEDRT_extra;
 }
 
 r = xc_sched_rtds_vcpu_set(CTX->xch, domid,
@@ -705,6 +717,12 @@ static int sched_rtds_domain_set(libxl__gc *gc, uint32_t 
domid,
 sdom.period = scinfo->period;
 if (scinfo->budget != LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT)
 sdom.budget = scinfo->budget;
+if (scinfo->extratime != LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT) {
+if (scinfo->extratime)
+sdom.flags |= XEN_DOMCTL_SCHEDRT_extra;
+else
+sdom.flags &= ~XEN_DOMCTL_SCHEDRT_extra;
+}
 if (sched_rtds_validate_params(gc, sdom.period, sdom.budget))
 return ERROR_INVAL;
 
-- 
1.9.1




[Xen-devel] [PATCH v2 3/5] xl: enable per-VCPU extratime flag for RTDS

2017-09-01 Thread Meng Xu
Change main_sched_rtds and related output functions to support
per-VCPU extratime flag.

Signed-off-by: Meng Xu 

---
Changes from v1
No change because we agree on using -e 0/1 option to
set if a vcpu will get extra time or not

Changes from RFC v1
Changes work_conserving flag to extratime flag
---
 tools/xl/xl_cmdtable.c |  3 ++-
 tools/xl/xl_sched.c| 56 ++
 2 files changed, 40 insertions(+), 19 deletions(-)
---
 tools/xl/xl_cmdtable.c |  3 ++-
 tools/xl/xl_sched.c| 56 ++
 2 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index ba0159d..1b03d44 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -272,12 +272,13 @@ struct cmd_spec cmd_table[] = {
 { "sched-rtds",
   &main_sched_rtds, 0, 1,
   "Get/set rtds scheduler parameters",
-  "[-d  [-v[=VCPUID/all]] [-p[=PERIOD]] [-b[=BUDGET]]]",
+  "[-d  [-v[=VCPUID/all]] [-p[=PERIOD]] [-b[=BUDGET]] 
[-e[=EXTRATIME]]]",
   "-d DOMAIN, --domain=DOMAIN Domain to modify\n"
   "-v VCPUID/all, --vcpuid=VCPUID/allVCPU to modify or output;\n"
   "   Using '-v all' to modify/output all vcpus\n"
   "-p PERIOD, --period=PERIOD Period (us)\n"
   "-b BUDGET, --budget=BUDGET Budget (us)\n"
+  "-e EXTRATIME, --extratime=EXTRATIME EXTRATIME (1=yes, 0=no)\n"
 },
 { "domid",
   &main_domid, 0, 0,
diff --git a/tools/xl/xl_sched.c b/tools/xl/xl_sched.c
index 85722fe..5138012 100644
--- a/tools/xl/xl_sched.c
+++ b/tools/xl/xl_sched.c
@@ -251,7 +251,7 @@ static int sched_rtds_domain_output(
 libxl_domain_sched_params scinfo;
 
 if (domid < 0) {
-printf("%-33s %4s %9s %9s\n", "Name", "ID", "Period", "Budget");
+printf("%-33s %4s %9s %9s %10s\n", "Name", "ID", "Period", "Budget", 
"Extra time");
 return 0;
 }
 
@@ -262,11 +262,12 @@ static int sched_rtds_domain_output(
 }
 
 domname = libxl_domid_to_name(ctx, domid);
-printf("%-33s %4d %9d %9d\n",
+printf("%-33s %4d %9d %9d %10s\n",
 domname,
 domid,
 scinfo.period,
-scinfo.budget);
+scinfo.budget,
+scinfo.extratime ? "yes" : "no");
 free(domname);
 libxl_domain_sched_params_dispose(&scinfo);
 return 0;
@@ -279,8 +280,8 @@ static int sched_rtds_vcpu_output(int domid, 
libxl_vcpu_sched_params *scinfo)
 int i;
 
 if (domid < 0) {
-printf("%-33s %4s %4s %9s %9s\n", "Name", "ID",
-   "VCPU", "Period", "Budget");
+printf("%-33s %4s %4s %9s %9s %10s\n", "Name", "ID",
+   "VCPU", "Period", "Budget", "Extra time");
 return 0;
 }
 
@@ -290,12 +291,13 @@ static int sched_rtds_vcpu_output(int domid, 
libxl_vcpu_sched_params *scinfo)
 
 domname = libxl_domid_to_name(ctx, domid);
 for ( i = 0; i < scinfo->num_vcpus; i++ ) {
-printf("%-33s %4d %4d %9"PRIu32" %9"PRIu32"\n",
+printf("%-33s %4d %4d %9"PRIu32" %9"PRIu32" %10s\n",
domname,
domid,
scinfo->vcpus[i].vcpuid,
scinfo->vcpus[i].period,
-   scinfo->vcpus[i].budget);
+   scinfo->vcpus[i].budget,
+   scinfo->vcpus[i].extratime ? "yes" : "no");
 }
 free(domname);
 return 0;
@@ -309,8 +311,8 @@ static int sched_rtds_vcpu_output_all(int domid,
 int i;
 
 if (domid < 0) {
-printf("%-33s %4s %4s %9s %9s\n", "Name", "ID",
-   "VCPU", "Period", "Budget");
+printf("%-33s %4s %4s %9s %9s %10s\n", "Name", "ID",
+   "VCPU", "Period", "Budget", "Extra time");
 return 0;
 }
 
@@ -321,12 +323,13 @@ static int sched_rtds_vcpu_output_all(int domid,
 
 domname = libxl_domid_to_name(ctx, domid);
 for ( i = 0; i < scinfo->num_vcpus; i++ ) {
-printf("%-33s %4d %4d %9"PRIu32" %9"PRIu32"\n",
+printf("%-33s %4d %4d %9"PRIu32" %9"PRIu32" %10s\n",
domname,
domid,
scinfo->vcpus[i].vcpuid,
scinfo->vcpus[i].period,
-   scinfo->vcpus[i].budget);
+   scinfo->vcpus[i].budg

[Xen-devel] [PATCH v2 4/5] xentrace: enable per-VCPU extratime flag for RTDS

2017-09-01 Thread Meng Xu
Change repl_budget event output for xentrace formats and xenalyze

Signed-off-by: Meng Xu 

---
Changes from v1
Add this changes from v1
---
 tools/xentrace/formats| 2 +-
 tools/xentrace/xenalyze.c | 8 +++++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/tools/xentrace/formats b/tools/xentrace/formats
index f39182a..470ac5c 100644
--- a/tools/xentrace/formats
+++ b/tools/xentrace/formats
@@ -75,7 +75,7 @@
 0x00022801  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:tickle        [ cpu = %(1)d ]
 0x00022802  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:runq_pick     [ dom:vcpu = 0x%(1)08x, cur_deadline = 0x%(3)08x%(2)08x, cur_budget = 0x%(5)08x%(4)08x ]
 0x00022803  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:burn_budget   [ dom:vcpu = 0x%(1)08x, cur_budget = 0x%(3)08x%(2)08x, delta = %(4)d ]
-0x00022804  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:repl_budget   [ dom:vcpu = 0x%(1)08x, cur_deadline = 0x%(3)08x%(2)08x, cur_budget = 0x%(5)08x%(4)08x ]
+0x00022804  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:repl_budget   [ dom:vcpu = 0x%(1)08x, priority_level = 0x%(2)08d cur_deadline = 0x%(4)08x%(3)08x, cur_budget = 0x%(6)08x%(5)08x ]
 0x00022805  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:sched_tasklet
 0x00022806  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:schedule      [ cpu[16]:tasklet[8]:idle[4]:tickled[4] = %(1)08x ]
 
diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
index 39fc35f..6fb952c 100644
--- a/tools/xentrace/xenalyze.c
+++ b/tools/xentrace/xenalyze.c
@@ -7944,12 +7944,14 @@ void sched_process(struct pcpu_info *p)
 if(opt.dump_all) {
 struct {
 unsigned int vcpuid:16, domid:16;
+unsigned int priority_level;
 uint64_t cur_dl, cur_bg;
 } __attribute__((packed)) *r = (typeof(r))ri->d;
 
-printf(" %s rtds:repl_budget d%uv%u, deadline = %"PRIu64", "
-   "budget = %"PRIu64"\n", ri->dump_header,
-   r->domid, r->vcpuid, r->cur_dl, r->cur_bg);
+printf(" %s rtds:repl_budget d%uv%u, priority_level = %u,"
+   "deadline = %"PRIu64", budget = %"PRIu64"\n",
+   ri->dump_header, r->domid, r->vcpuid,
+   r->priority_level, r->cur_dl, r->cur_bg);
 }
 break;
 case TRC_SCHED_CLASS_EVT(RTDS, 5): /* SCHED_TASKLET*/
-- 
1.9.1




[Xen-devel] [PATCH v2 1/5] xen:rtds: towards work conserving RTDS

2017-09-01 Thread Meng Xu
Make RTDS scheduler work conserving without breaking the real-time guarantees.

VCPU model:
Each real-time VCPU is extended to have an extratime flag
and a priority_level field.
When a VCPU's budget is depleted in the current period,
if it has extratime flag set,
its priority_level will increase by 1 and its budget will be refilled;
otherwise, the VCPU will be moved to the depletedq.

Scheduling policy is modified global EDF:
A VCPU v1 has higher priority than another VCPU v2 if
(i) v1 has smaller priority_level; or
(ii) v1 has the same priority_level but has a smaller deadline

Queue management:
Run queue holds VCPUs with extratime flag set and VCPUs with
remaining budget. Run queue is sorted in increasing order of VCPUs priorities.
Depleted queue holds VCPUs which have extratime flag cleared and depleted 
budget.
Replenished queue is not modified.

Signed-off-by: Meng Xu 

---
Changes from v1
Change XEN_DOMCTL_SCHED_RTDS_extratime to XEN_DOMCTL_SCHEDRT_extra as
suggested by Dario

Changes from RFC v1
Rewording comments and commit message
Remove is_work_conserving field from rt_vcpu structure
Use one bit in VCPU's flag to indicate if a VCPU will have extra time
Correct comments style
---
 xen/common/sched_rt.c   | 90 ++---
 xen/include/public/domctl.h |  4 ++
 2 files changed, 80 insertions(+), 14 deletions(-)

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 0ac5816..fab6f49 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -49,13 +49,15 @@
  * A PCPU is feasible if the VCPU can run on this PCPU and (the PCPU is idle or
  * has a lower-priority VCPU running on it.)
  *
- * Each VCPU has a dedicated period and budget.
+ * Each VCPU has a dedicated period, budget and a extratime flag
  * The deadline of a VCPU is at the end of each period;
  * A VCPU has its budget replenished at the beginning of each period;
  * While scheduled, a VCPU burns its budget.
  * The VCPU needs to finish its budget before its deadline in each period;
  * The VCPU discards its unused budget at the end of each period.
- * If a VCPU runs out of budget in a period, it has to wait until next period.
+ * When a VCPU runs out of budget in a period, if its extratime flag is set,
+ * the VCPU increases its priority_level by 1 and refills its budget; otherwise,
+ * it has to wait until next period.
  *
  * Each VCPU is implemented as a deferable server.
  * When a VCPU has a task running on it, its budget is continuously burned;
@@ -63,7 +65,8 @@
  *
  * Queue scheme:
  * A global runqueue and a global depletedqueue for each CPU pool.
- * The runqueue holds all runnable VCPUs with budget, sorted by deadline;
+ * The runqueue holds all runnable VCPUs with budget,
+ * sorted by priority_level and deadline;
  * The depletedqueue holds all VCPUs without budget, unsorted;
  *
  * Note: cpumask and cpupool is supported.
@@ -151,6 +154,14 @@
 #define RTDS_depleted (1<<__RTDS_depleted)
 
 /*
+ * RTDS_extratime: Can the vcpu run in the time that is
+ * not part of any real-time reservation, and would therefore
+ * be otherwise left idle?
+ */
+#define __RTDS_extratime    4
+#define RTDS_extratime (1<<__RTDS_extratime)
+
+/*
  * rt tracing events ("only" 512 available!). Check
  * include/public/trace.h for more details.
  */
@@ -201,6 +212,8 @@ struct rt_vcpu {
 struct rt_dom *sdom;
 struct vcpu *vcpu;
 
+unsigned priority_level;
+
 unsigned flags;  /* mark __RTDS_scheduled, etc.. */
 };
 
@@ -245,6 +258,11 @@ static inline struct list_head *rt_replq(const struct 
scheduler *ops)
 return &rt_priv(ops)->replq;
 }
 
+static inline bool has_extratime(const struct rt_vcpu *svc)
+{
+return (svc->flags & RTDS_extratime) ? 1 : 0;
+}
+
 /*
  * Helper functions for manipulating the runqueue, the depleted queue,
  * and the replenishment events queue.
@@ -274,6 +292,21 @@ vcpu_on_replq(const struct rt_vcpu *svc)
 }
 
 /*
+ * If v1 priority >= v2 priority, return value > 0
+ * Otherwise, return value < 0
+ */
+static s_time_t
+compare_vcpu_priority(const struct rt_vcpu *v1, const struct rt_vcpu *v2)
+{
+int prio = v2->priority_level - v1->priority_level;
+
+if ( prio == 0 )
+return v2->cur_deadline - v1->cur_deadline;
+
+return prio;
+}
+
+/*
  * Debug related code, dump vcpu/cpu information
  */
 static void
@@ -303,6 +336,7 @@ rt_dump_vcpu(const struct scheduler *ops, const struct 
rt_vcpu *svc)
 cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch), mask);
 printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime"),"
" cur_b=%"PRI_stime" cur_d=%"PRI_stime" last_start=%"PRI_stime"\n"
+   " \t\t priority_level=%d has_extratime=%d\n"
" \t\t onQ=%d runnable=%d flags=%x effective hard_affinity=%s\n",
 svc->vcpu

Re: [Xen-devel] [RFC PATCH 0/5] Extend resources to support more vcpus in single VM

2017-08-25 Thread Meng Xu
Hi Tianyu,

On Thu, Aug 24, 2017 at 10:52 PM, Lan Tianyu  wrote:
>
> This patchset is to extend some resources (i.e., event channel,
> hap and so on) to support more vcpus for a single VM.
>
>
> Chao Gao (1):
>   xl/libacpi: extend lapic_id() to uint32_t
>
> Lan Tianyu (4):
>   xen/hap: Increase hap size for more vcpus support
>   XL: Increase event channels to support more vcpus
>   Tool/ACPI: DSDT extension to support more vcpus
>   hvmload: Add x2apic entry support in the MADT build
>
>  tools/firmware/hvmloader/util.c |  2 +-
>  tools/libacpi/acpi2_0.h | 10 +++
>  tools/libacpi/build.c   | 61 
> +
>  tools/libacpi/libacpi.h |  2 +-
>  tools/libacpi/mk_dsdt.c | 11 
>  tools/libxl/libxl_create.c  |  2 +-
>  tools/libxl/libxl_x86_acpi.c|  2 +-
>  xen/arch/x86/mm/hap/hap.c   |  2 +-
>  8 files changed, 63 insertions(+), 29 deletions(-)


How many VCPUs for a single VM do you want to support with this patch set?

Thanks,

Meng
-- 
---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] Xen 4.10 Development Update

2017-08-22 Thread Meng Xu
On Mon, Aug 21, 2017 at 6:07 AM, Julien Grall  wrote:
> This email only tracks big items for the xen.git tree. Please reply for items you
> would like to see in 4.10 so that people have an idea what is going on and
> prioritise accordingly.
>
> You're welcome to provide description and use cases of the feature you're
> working on.
>
> = Timeline =
>
> We now adopt a fixed cut-off date scheme. We will release twice a
> year. The upcoming 4.10 timeline are as followed:
>
> * Last posting date: September 15th, 2017
> * Hard code freeze: September 29th, 2017
> * RC1: TBD
> * Release: December 2, 2017
>
> Note that we don't have freeze exception scheme anymore. All patches
> that wish to go into 4.10 must be posted no later than the last posting
> date. All patches posted after that date will be automatically queued
> into next release.
>
> RCs will be arranged immediately after freeze.
>
> We recently introduced a jira instance to track all the tasks (not only big)
> for the project. See: https://xenproject.atlassian.net/projects/XEN/issues.
>
> Most of the tasks tracked by this e-mail also have a corresponding jira task
> referred by XEN-N.
>
> I have started to include the version number of series associated to each
> feature. Can each owner send an update on the version number if the series
> was posted upstream?
>
> = Projects =
>
> == Hypervisor ==
>
> *  Per-cpu tasklet
>   -  XEN-28
>   -  Konrad Rzeszutek Wilk
>
> *  Add support of rcu_idle_{enter,exit}
>   -  XEN-27
>   -  Dario Faggioli

I'm turning the RTDS scheduler into a work-conserving scheduler.
The first version of the patch series has been posted at
https://www.mail-archive.com/xen-devel@lists.xen.org/msg117062.html,
after we discussed the RFC patch.

Thanks,

Meng

-- 
---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] RT-Xen on ARM

2017-08-21 Thread Meng Xu
On Mon, Aug 21, 2017 at 4:38 AM, Andrii Anisov  wrote:
>
> On 18.08.17 23:43, Meng Xu wrote:
>>
>> Sure. The workload we used in the paper is mainly a CPU-intensive task.
>> We first calibrate a busy-loop of multiplications that runs for 1ms.
>> Then, for a task that executes for exe ms, we simply let the task
>> execute the 1ms busy loop exe times.
>
> I'm a bit confused: why didn't you run the system with rtspin from
> LITMUS-RT? Any issues with it?

The task we are using should do the same amount of calculation for the
same amount of execution time. For example, suppose it takes 1ms to run the
following piece of code:
for (i = 0; i < 1000000; i++)
    sum += i;
This piece of code can be viewed as the "payload" of a realistic workload.

Suppose the task is scheduled to run at t0, preempted at t1, resumes
at t2, and finishes at t3. We have (t1 - t0) + (t3 - t2) = 1ms and we
are sure the task did the addition for 1million times.

However, if we use rtspin, rtspin will check if (t2-t0) > 1ms.
If so, it will claim it has finished its workload although it hasn't,
i.e., it hasn't done the addition 1 million times.

Since we want to compare if tasks can finish their "workload" by their
deadline under different scheduling algorithms, we should fix the
"amount of workload" a task does under different scheduling policies.
rtspin() does not achieve our purpose. That's why we don't use it.

Note that rtspin() was initially designed to test the scheduling
overhead of LITMUS-RT. It does not perform the same amount of workload
for the same assigned wcet.

> BTW, I've found a set of experimental patches (scripts and functional changes) on
> your github: https://github.com/PennPanda/liblitmus .
> Are they related to the mentioned document [1]?

Not really. The liblitmus repo under my GitHub account is for another project.
It is not for [1]'s purpose.

The idea of creating the real-time task is similar, though.
The real-time task is based on the bin/base_task.c in liblitmus.
It needs to fill out the job() function as follows:

static long iterations;   /* calibrated per machine, see the sketch below */
static volatile long result;

static void loop_for_one_1ms(void)
{
    long j;
    /* iterations value differs across machines */
    for (j = 0; j < iterations; j++)
        result = result + j * j;
}

static int job(int wcet)
{
    int i;
    /* burn roughly wcet milliseconds of CPU time */
    for (i = 0; i < wcet; i++)
        loop_for_one_1ms();
    return 0;
}
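
For completeness, here is one way the iterations value could be calibrated
at startup. This is my own sketch under stated assumptions (CLOCK_MONOTONIC
and a doubling search), not the exact calibration code we used:

#include <time.h>

static void calibrate_1ms_loop(void)
{
    struct timespec t0, t1;
    long elapsed_ns = 0;
    long j;

    iterations = 1000;
    while (elapsed_ns < 1000000L) {    /* grow until one pass takes >= 1ms */
        iterations *= 2;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (j = 0; j < iterations; j++)
            result = result + j * j;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        elapsed_ns = (t1.tv_sec - t0.tv_sec) * 1000000000L
                   + (t1.tv_nsec - t0.tv_nsec);
    }
    /* scale down linearly so one loop_for_one_1ms() call lands near 1ms */
    iterations = (long)((double)iterations * 1000000.0 / elapsed_ns);
}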


>
>> [1] https://www.cis.upenn.edu/~linhphan/papers/emsoft14-rt-xen.pdf
>
>
> --

Hope it helps clear the confusion.

Thanks,

Meng



-- 
---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] RT-Xen on ARM

2017-08-21 Thread Meng Xu
On Mon, Aug 21, 2017 at 4:16 AM, Andrii Anisov  wrote:
>
> On 21.08.17 11:07, Andrii Anisov wrote:
>>
>> Hello Meng Xu,
>>
>>
>> On 18.08.17 23:43, Meng Xu wrote:
>>>
>>> Sections 4.1 and 4.2 in [1] explain the whole experiment steps.
>>> If you have any question or confusion on a specific step, please feel
>>> free to let me know.
>>
>> From the document it is not really clear if you ran one guest RT domain or
>> several simultaneously for your experiments.
>
> Also it is not described XEN RT scheduler setup like vcpus period/budget
> configuration for each guest domain.
> It is not obvious if the configured set of vcpus in the experiment setup
> utilized all the pcpus bandwidth.
>

Given the set of tasks in each VM, we compute the VCPUs' periods and
budgets, using the CARTS tool [1]. Note that each task has a period
and a worst-case execution time (wcet).

The configured set of vcpus in the experiment setup may not use all of
the pcpus' bandwidth. For example, if we have one task (period = 10ms, wcet
= 2ms) on a VCPU, the VCPU hosting the task will not be configured with
100% bandwidth. If that VCPU is the only VCPU on a pcpu, the pcpu's
bandwidth won't be fully used, because there is not enough workload to
fully use it.
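
To put a rough number on it (my arithmetic; CARTS may add some analysis
overhead on top): that VCPU's required bandwidth is about wcet/period =
2ms/10ms = 20%, so a pcpu dedicated to it would sit idle most of the time.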

[1] https://rtg.cis.upenn.edu/carts/

Best,

Meng
---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] RT-Xen on ARM

2017-08-21 Thread Meng Xu
On Mon, Aug 21, 2017 at 4:07 AM, Andrii Anisov  wrote:
>
> Hello Meng Xu,
>
>
> On 18.08.17 23:43, Meng Xu wrote:
>>
>> Sections 4.1 and 4.2 in [1] explain the whole experiment steps.
>> If you have any question or confusion on a specific step, please feel
>> free to let me know.
>
> From the document it is not really clear if you ran one guest RT domain or 
> several simultaneously for your experiments.
>

We ran 4 VMs simultaneously.


Meng


-- 
---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] RT-Xen on ARM

2017-08-18 Thread Meng Xu
Hi Andrii,

On Tue, Aug 1, 2017 at 4:02 AM, Andrii Anisov  wrote:
> Hello Meng Xu,
>
> I've get back to this stuff.

Sorry for the late response. I'm not sure if you have already solved this.

>
>
> On 03.07.17 17:58, Andrii Anisov wrote:
>>
>> That's why we are going to keep configuration (of guests and workloads)
>> close to [1] for evaluation, but on our target SoC.
>> I'm wondering if there are known issues or specifics for ARM.
>>
>> [1] https://www.cis.upenn.edu/~linhphan/papers/emsoft14-rt-xen.pdf
>
> Currently I have a setup with dom0 and domU's with Litmus-RT.

Great!

> Following the
> document I need workload tasks.
> Maybe you have the mentioned workload tasks' sources you can share; that would
> shorten my steps.

Sure. The workload we used in the paper is mainly a CPU-intensive task.
We first calibrate a busy-loop of multiplications that runs for 1ms.
Then, for a task that executes for exe ms, we simply let the task
execute the 1ms busy loop exe times.
It is also good to run the same task several times to make sure
the task's execution time is stable across different runs.

Sections 4.1 and 4.2 in [1] explain the whole experiment steps.
If you have any question or confusion on a specific step, please feel
free to let me know.
We may schedule a meeting to clarify all the questions or confusions
you may have.

[1] https://www.cis.upenn.edu/~linhphan/papers/emsoft14-rt-xen.pdf

Best regards,

Meng




>
> --
>
> *Andrii Anisov*
>
>



-- 
---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] [PATCH v1 3/3] xl: enable per-VCPU extratime flag for RTDS

2017-08-09 Thread Meng Xu
On Wed, Aug 9, 2017 at 3:32 AM, Dario Faggioli
 wrote:
> On Tue, 2017-08-08 at 15:55 -0700, Meng Xu wrote:
>> On Tue, Aug 8, 2017 at 3:24 PM, Dario Faggioli
>>  wrote:
>> >
>> > Therefore, I think I would set extratime as on by default in both
>> > Xen
>> > an xl. What do you think?
>> >
>>
>> Right now, the domain is created with its VCPUs' extratime flag on.
>> So
>> by default, extratime is set on in Xen.
>>
>> I'm not sure what you suggest by setting the extratime flag on by
>> default in xl?
>> Did you mean if users do not input -e option, the extratime flag will
>> be set as on?
>>
> No, as I said, I'm ok with the requirement of -e 0/1 always having to
> be present, when changing the vCPU(s) parameters with xl.
>
> I'm talking about what happens at domain creation time.
>
> If the default in Xen is already 'extratime on', I think we're mostly
> fine.

Yes. It is.

>
> As for xl/libxl, I think it would probably be good to take care of
> extratime, e.g., in sched_rtds_domain_set() (in a similar way to how we
> deal with period and budget, i.e., taking advantage of
> LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT).
>

OK.

Thanks,

Meng

---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] [PATCH v1 1/3] xen:rtds: towards work conserving RTDS

2017-08-08 Thread Meng Xu
On Tue, Aug 8, 2017 at 3:52 PM, Dario Faggioli
 wrote:
> On Tue, 2017-08-08 at 12:06 -0700, Meng Xu wrote:
>> On Tue, Aug 8, 2017 at 10:57 AM, Dario Faggioli
>>  wrote:
>> > On Sun, 2017-08-06 at 12:22 -0400, Meng Xu wrote:
>> > >
>> > > diff --git a/xen/include/public/domctl.h
>> > > b/xen/include/public/domctl.h
>> > > index 0669c31..ba5daa9 100644
>> > > --- a/xen/include/public/domctl.h
>> > > +++ b/xen/include/public/domctl.h
>> > > @@ -360,6 +360,9 @@ typedef struct xen_domctl_sched_credit2 {
>> > >  typedef struct xen_domctl_sched_rtds {
>> > >  uint32_t period;
>> > >  uint32_t budget;
>> > > +#define _XEN_DOMCTL_SCHED_RTDS_extratime 0
>> > > +#define XEN_DOMCTL_SCHED_RTDS_extratime  (1U<<_XEN_DOMCTL_SCHED_RTDS_extratime)
>> > > +uint32_t flags;
>> > >
>> >
>> > I'd add a one liner comment above the flag definition, as, for
>> > instance, how things are done in createdomain:
>>
>> Sure.
>>
>> How about comment:
>> /* Does this VCPU get extratime beyond reserved time? */
>>
> 'Can this vCPU execute beyond its reserved amount of time?'
>
>> >
>> > struct xen_domctl_createdomain {
>> > /* IN parameters */
>> > uint32_t ssidref;
>> > xen_domain_handle_t handle;
>> >  /* Is this an HVM guest (as opposed to a PVH or PV guest)? */
>> > #define _XEN_DOMCTL_CDF_hvm_guest 0
>> > #define XEN_DOMCTL_CDF_hvm_guest  (1U<<_XEN_DOMCTL_CDF_hvm_guest)
>> >  /* Use hardware-assisted paging if available? */
>> > #define _XEN_DOMCTL_CDF_hap   1
>> > #define XEN_DOMCTL_CDF_hap    (1U<<_XEN_DOMCTL_CDF_hap)
>> >
>> > Also, consider shortening the name (e.g., by contracting the
>> > SCHED_RTDS
>> > part; it does not matter if it's not 100% equal to what's in
>> > sched_rt.c, I think).
>>
>>
>> How about shortening it to XEN_DOMCTL_RTDS_extra or
>> XEN_DOMCTL_RTDS_extratime?
>>
> Personally, I'd go for XEN_DOMCTL_SCHEDRT_extra (or _extratime, or
> _extrat).

OK. I can go with  XEN_DOMCTL_SCHEDRT_extra.

Thanks,

Meng

---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1 3/3] xl: enable per-VCPU extratime flag for RTDS

2017-08-08 Thread Meng Xu
On Tue, Aug 8, 2017 at 3:24 PM, Dario Faggioli
 wrote:
> On Tue, 2017-08-08 at 12:16 -0700, Meng Xu wrote:
>> On Tue, Aug 8, 2017 at 9:09 AM, Dario Faggioli
>>  wrote:
>> > On Sun, 2017-08-06 at 22:43 -0400, Meng Xu wrote:
>> > >
>> > > As to (1), if users want to set some VCPUs with extratime flag
>> > > set
>> > > and
>> > > some with extratime flag clear, there are two types of input:
>> > > (a) xl sched-rtds -d 1 -v 1 -p 10000 -b 4000 -e 0 -v 2 -p 10000 -b 4000 -e 1 -v 5 -p 10000 -b 4000 -e 0
>> > > (b) xl sched-rtds -d 1 -v 1 -p 10000 -b 4000 -v 2 -p 10000 -b 4000 -e 1 -v 5 -p 10000 -b 4000
>> > > I felt that the style (a) is more intuitive and the input
>> > > commands
>> > > have very static pattern, i.e., each vcpu must have -v -p -b -e
>> > > options set.
>> > >
>> >
>> > Exactly, I do think that (b) is indeed a better user interface.
>> >
>> With the approach (b), what I have in my mind was: if users do not
>> use
>> -e option for a vcpu index, the vcpu will have its extratime flag
>> cleared.
>> If not-setting -e option for a VCPU means using the current extratime
>> value for the VCPU, then how should users clear the extratime flag
>> for
>> a VCPU?
>>
> Yeah, I know... Well, it's an hard interface to get right.
>
> So, I think, considering how things currently work for budget and
> period, I guess I'm fine with the -e switch taking a 0/1 value.
>
> I've checked how it was in SEDF, and it was like that in there too
> (see, e.g. commit 1583cdd1fdded49698503a699c5868643051e391).
>
>> If you look at the -p and -b option for the xl sched-rtds, we will
>> find that users will have to first read both parameters of a VCPU
>> even
>> if they only want to change the value for one parameter, either -p or
>> -b. We don't allow users to specify -p or -b without an input value.
>>
> Yes. Which I now remember as something I've never really liked. But
> again, it's an interface which is a bit hard to get right. And it's
> certainly not this patch series' job to change it.
>
> So, let's stick with it. Thanks for bearing with me. :-)

No problem at all. :-)
I also checked the SEDF scheduler's commands while I was working on
this patch version.
I felt that keeping the same format for the -p, -b and -e options is a
better idea.

>
>
> I now want to bring something new on the table, though: what should the
> default be?
>
> I mean, what do we expect most people to want, e.g., at domain creation
> time, if they don't include an 'extratime=1' in their config file
> (actually, I don't think it's even possible to do that! :-O) ?
>
> I believe that --kind of unlikely wrt what happens in the real-time
> research and papers-- most users would expect a work conserving
> scheduler, unless they specify otherwise.
>
> As in, they hopefully will enjoy being able to reserve some CPU
> bandwidth in a very precise and deterministic way, for their vCPUs. But
> I don't think they see as a good thing the fact that those vCPUs stops
> running at some point, even if the system is idle.
>
> Also, I think we really should set dom0 to be in extratime mode.
>
> Therefore, I think I would set extratime as on by default in both Xen
> an xl. What do you think?
>

Right now, the domain is created with its VCPUs' extratime flag on. So
by default, extratime is set on in Xen.

I'm not sure what you are suggesting by setting the extratime flag on by
default in xl.
Did you mean that if users do not give the -e option, the extratime flag
will be set as on?
If so, users may get confused IMHO. Some users may think that not
setting the -e option means clearing the extratime flag, while only
those who carefully read the command's documentation will know that xl
sets the extratime flag by default when the -e option is not provided.

Thanks,

Meng

---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1 3/3] xl: enable per-VCPU extratime flag for RTDS

2017-08-08 Thread Meng Xu
On Tue, Aug 8, 2017 at 9:09 AM, Dario Faggioli
 wrote:
> On Sun, 2017-08-06 at 22:43 -0400, Meng Xu wrote:
>> On Sun, Aug 6, 2017 at 12:22 PM, Meng Xu 
>> wrote:
>> >
>> > diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
>> > index 2c71a9f..88933a4 100644
>> > --- a/tools/xl/xl_cmdtable.c
>> > +++ b/tools/xl/xl_cmdtable.c
>> > @@ -272,12 +272,13 @@ struct cmd_spec cmd_table[] = {
>> >  { "sched-rtds",
>> >&main_sched_rtds, 0, 1,
>> >"Get/set rtds scheduler parameters",
>> > -  "[-d  [-v[=VCPUID/all]] [-p[=PERIOD]] [-
>> > b[=BUDGET]]]",
>> > +  "[-d  [-v[=VCPUID/all]] [-p[=PERIOD]] [-b[=BUDGET]]
>> > [-e[=EXTRATIME]]]",
>> >"-d DOMAIN, --domain=DOMAIN Domain to modify\n"
>> >"-v VCPUID/all, --vcpuid=VCPUID/allVCPU to modify or
>> > output;\n"
>> >"   Using '-v all' to modify/output all vcpus\n"
>> >"-p PERIOD, --period=PERIOD Period (us)\n"
>> >"-b BUDGET, --budget=BUDGET Budget (us)\n"
>> > +  "-e EXTRATIME, --extratime=EXTRATIME EXTRATIME (1=yes,
>> > 0=no)\n"
>>
>> Hi Dario,
>>
>> I kept the EXTRATIME value for -e option because: (1) it may be more
>> intuitive for users; (2) it needs much less code change than the
>> input
>> style that does not need EXTRATIME value.
>>
> I'm of the opposite view.
>
> But again, it's tools' people views' that count. :-D
>
>> As to (1), if users want to set some VCPUs with extratime flag set
>> and
>> some with extratime flag clear, there are two types of input:
>> (a) xl sched-rtds -d 1 -v 1 -p 10000 -b 4000 -e 0 -v 2 -p 10000 -b 4000 -e 1 -v 5 -p 10000 -b 4000 -e 0
>> (b) xl sched-rtds -d 1 -v 1 -p 10000 -b 4000 -v 2 -p 10000 -b 4000 -e 1 -v 5 -p 10000 -b 4000
>> I felt that the style (a) is more intuitive and the input commands
>> have very static pattern, i.e., each vcpu must have -v -p -b -e
>> options set.
>>
> Exactly, I do think that (b) is indeed a better user interface.
>
> For instance, what if I want to change period and budget of vcpu 1 of
> domain 3, _without_ changing whether or not it can use the extra time.

With the approach (b), what I have in my mind was: if users do not use
-e option for a vcpu index, the vcpu will have its extratime flag
cleared.
If not-setting -e option for a VCPU means using the current extratime
value for the VCPU, then how should users clear the extratime flag for
a VCPU? Are you indicating the -e option has three meanings:
If -e option is set without value, keep the extratime value unchanged;
If -e option is set with value 0, clear the extratime value;
If -e option is set with value 1, set the extratime value.


If you look at the -p and -b option for the xl sched-rtds, we will
find that users will have to first read both parameters of a VCPU even
if they only want to change the value for one parameter, either -p or
-b. We don't allow users to specify -p or -b without an input value.

By looking at how -p and -b options are handled, I leaned to
approach (a): users must input a value for the -e option, similar to
how the -p and -b options are handled.
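
Concretely, parsing a required-value -e the same way as -p and -b would
look roughly like this (standalone sketch only, with made-up names; not
the actual xl_sched.c code):

#include <getopt.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int opt, period = 0, budget = 0, extratime = 0;
    static const struct option longopts[] = {
        { "period",    required_argument, NULL, 'p' },
        { "budget",    required_argument, NULL, 'b' },
        { "extratime", required_argument, NULL, 'e' },
        { NULL, 0, NULL, 0 },
    };

    /* "p:b:e:" makes all three options require an explicit value,
     * i.e. approach (a): -e must always be given 0 or 1. */
    while ( (opt = getopt_long(argc, argv, "p:b:e:", longopts, NULL)) != -1 )
    {
        switch ( opt )
        {
        case 'p': period = atoi(optarg); break;
        case 'b': budget = atoi(optarg); break;
        case 'e': extratime = !!atoi(optarg); break; /* 1=yes, 0=no */
        }
    }

    printf("period=%d budget=%d extratime=%d\n", period, budget, extratime);
    return 0;
}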

>
> With (a), I don't think I can do that. Or at least, I'd have to either
> remember or check what extratime is right now, and pass that same value
> explicitly to `xl sched-rtds -d 3 -v 1 ...'.
>
> That does not look super user-friendly to me.
>
>> As to (2), if we go with -e without EXTRATIME, we will have to keep
>> track of the vcpu that has no -e option. I thought about this option,
>> we can pre-set the extratime value to false when -v option is
>> assigned:
>> case 'v':
>> ...
>> extratimes[v_index]  = 0;
>>
>> and set the extratimes[v_index] = 1 when -e is set.
>>
> Yes, something like that. Or, even better, use its current value.
>
> That would require calling libxl_vcpu_sched_params_get() (or, at times,
> libxl_vcpu_sched_params_get_all()), which I assumed you were doing
> already, while you apparently don't. Mmm...
>
>> This approach is not very neat in the code: we have to reallocate
>> memory for extratimes array when its size is not enough; we also have
>> to deal with the special case when -e is set before -v, such as the
>> command "xl sched-rtds -p 1 -b 4000 -e -v 0"
>>
> Err... sorry, there's code for reallocations in this patch already,
> isn't this the case?
>
> Also, it may be me, but I don't understand how this is different from
> how -b and -p are dealt with.
>
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Thanks,

Meng

-- 
---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1 1/3] xen:rtds: towards work conserving RTDS

2017-08-08 Thread Meng Xu
On Tue, Aug 8, 2017 at 10:57 AM, Dario Faggioli
 wrote:
> On Sun, 2017-08-06 at 12:22 -0400, Meng Xu wrote:
>> Make RTDS scheduler work conserving without breaking the real-time
>> guarantees.
>>
>> VCPU model:
>> Each real-time VCPU is extended to have an extratime flag
>> and a priority_level field.
>> When a VCPU's budget is depleted in the current period,
>> if it has extratime flag set,
>> its priority_level will increase by 1 and its budget will be
>> refilled;
>> otherwise, the VCPU will be moved to the depletedq.
>>
>> Scheduling policy is modified global EDF:
>> A VCPU v1 has higher priority than another VCPU v2 if
>> (i) v1 has smaller priority_level; or
>> (ii) v1 has the same priority_level but has a smaller deadline
>>
>> Queue management:
>> Run queue holds VCPUs with extratime flag set and VCPUs with
>> remaining budget. Run queue is sorted in increasing order of VCPUs
>> priorities.
>> Depleted queue holds VCPUs which have extratime flag cleared and
>> depleted budget.
>> Replenished queue is not modified.
>>
>> Signed-off-by: Meng Xu 
>>
> This looks mostly good to me.
>
> There are only a couple of things left, in addition to the
> changlog+comment mention to how the 'spare bandwidth' distribution
> works, that we agreed upon in the other thread.
>
>> --- a/xen/common/sched_rt.c
>> +++ b/xen/common/sched_rt.c
>> @@ -245,6 +258,11 @@ static inline struct list_head *rt_replq(const
>> struct scheduler *ops)
>>  return &rt_priv(ops)->replq;
>>  }
>>
>> +static inline bool has_extratime(const struct rt_vcpu *svc)
>> +{
>> +return (svc->flags & RTDS_extratime) ? 1 : 0;
>> +}
>> +
>>
> Cool... I like 'has_extratime()' soo much better as a name than what it
> was before! Thanks. :-)
>

:-)

>>  /*
>>   * Helper functions for manipulating the runqueue, the depleted
>> queue,
>>   * and the replenishment events queue.
>> @@ -274,6 +292,21 @@ vcpu_on_replq(const struct rt_vcpu *svc)
>>  }
>>
>>  /*
>> + * If v1 priority >= v2 priority, return value > 0
>> + * Otherwise, return value < 0
>> + */
>> +static s_time_t
>> +compare_vcpu_priority(const struct rt_vcpu *v1, const struct rt_vcpu *v2)
>> +{
>> +int prio = v2->priority_level - v1->priority_level;
>> +
>> +if ( prio == 0 )
>> +return v2->cur_deadline - v1->cur_deadline;
>> +
> Indentation.

Oh, sorry. Will correct it.

>
>> @@ -423,15 +459,18 @@ rt_update_deadline(s_time_t now, struct rt_vcpu *svc)
>>   */
>>  svc->last_start = now;
>>  svc->cur_budget = svc->budget;
>> +svc->priority_level = 0;
>>
>>  /* TRACE */
>>  {
>>  struct __packed {
>>  unsigned vcpu:16, dom:16;
>> +unsigned priority_level;
>>  uint64_t cur_deadline, cur_budget;
>>  } d;
>>
> Can you please, and in this very comment, update
> tools/xentrace/xenalyze.c and tools/xentrace/formats as well, to take
> into account this new field?

Sure. Will do in the next version.

>
>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>> index 0669c31..ba5daa9 100644
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -360,6 +360,9 @@ typedef struct xen_domctl_sched_credit2 {
>>  typedef struct xen_domctl_sched_rtds {
>>  uint32_t period;
>>  uint32_t budget;
>> +#define _XEN_DOMCTL_SCHED_RTDS_extratime 0
>> +#define XEN_DOMCTL_SCHED_RTDS_extratime  (1U<<_XEN_DOMCTL_SCHED_RTDS_extratime)
>> +uint32_t flags;
>>
> I'd add a one liner comment above the flag definition, as, for
> instance, how things are done in createdomain:

Sure.

How about comment:
/* Does this VCPU get extratime beyond reserved time? */
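
With that comment in place, the hunk would read something like this
(sketch only; the flag name is still under discussion below):

typedef struct xen_domctl_sched_rtds {
    uint32_t period;
    uint32_t budget;
/* Does this VCPU get extratime beyond reserved time? */
#define _XEN_DOMCTL_SCHED_RTDS_extratime 0
#define XEN_DOMCTL_SCHED_RTDS_extratime  (1U<<_XEN_DOMCTL_SCHED_RTDS_extratime)
    uint32_t flags;
} xen_domctl_sched_rtds_t;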

>
> struct xen_domctl_createdomain {
> /* IN parameters */
> uint32_t ssidref;
> xen_domain_handle_t handle;
>  /* Is this an HVM guest (as opposed to a PVH or PV guest)? */
> #define _XEN_DOMCTL_CDF_hvm_guest 0
> #define XEN_DOMCTL_CDF_hvm_guest  (1U<<_XEN_DOMCTL_CDF_hvm_guest)
>  /* Use hardware-assisted paging if available? */
> #define _XEN_DOMCTL_CDF_hap   1
> #define XEN_DOMCTL_CDF_hap(1U<<_XEN_DOMCTL_CDF_hap)
>
> Also, consider shortening the name (e.g., by contracting the SCHED_RTDS
> part; it does not matter if it's not 100% equal to what's in
> sched_rt.c, I think).


How about shortening it to XEN_DOMCTL_RTDS_extra or XEN_DOMCTL_RTDS_extratime?

>
> This, of course, is just my opinion, and final say belongs to
> maintainers of this public interface, which I think means 'THE REST',
> and most of them are not Cc-ed. Let me do that...

Thank you very much!

Best,

Meng

-- 
---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC v1] xen:rtds: towards work conserving RTDS

2017-08-07 Thread Meng Xu
On Mon, Aug 7, 2017 at 3:14 PM, Dario Faggioli
 wrote:
> On Mon, 2017-08-07 at 14:27 -0400, Meng Xu wrote:
>> On Mon, Aug 7, 2017 at 1:35 PM, Dario Faggioli
>>
>> > Is this wanted or expected?
>>
>> It is wanted.
>>
>> A VCPU i that has already got budget_i * priority_level_i time has
>> higher priority than another VCPU j that got budget_j *
>> priority_level_j time, where priority_level_j > priority_level_i.
>>
>> For the unreserved resource, a VCPU gets roughly budget/period
>> proportional unreserved CPU time.
>>
>>
>> > Basically, if I'm not wrong, this means that the actual priority,
>> > during the extratime phase, is some combination of deadline and
>> > budget
>> > (which would make me think to utilization)... is this the case?
>>
>> Yes.
>> The higher utilization a VCPU has, the more extra time it will get in
>> the extratime phase.
>>
>> >
>> > I don't care much about the actual schedule during the extratime
>> > phase,
>> > in the sense that it doesn't have to be anything too complicated or
>> > super advanced... but I at least would like:
>> > - to know how it works, and hence what to expect,
>> > - for it to be roughly fair.
>>
>> The unreserved resource is proportionally allocated to VCPUs roughly
>> based on VCPU's budget/period.
>>
> Right. Then this deserves both:
> - a quick mention in the changelog
> - a little bit more detailed explanation in a comment close to one of
>   the place where the policy is enacted (or at the top of the file,
>   or, well, somewhere :-) )
>

Sure. I can do that in the next version.
Hopefully we can reach agreement on the code based on this version,
so that the next version can be the final version for this patch
series. Hopefully. :)

Best,

Meng

---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC v1] xen:rtds: towards work conserving RTDS

2017-08-07 Thread Meng Xu
On Mon, Aug 7, 2017 at 1:35 PM, Dario Faggioli
 wrote:
> On Sat, 2017-08-05 at 17:35 -0400, Meng Xu wrote:
>> >
>> > > @@ -966,8 +1001,16 @@ burn_budget(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now)
>> > >
>> > >  if ( svc->cur_budget <= 0 )
>> > >  {
>> > > -svc->cur_budget = 0;
>> > > -__set_bit(__RTDS_depleted, &svc->flags);
>> > > +if ( is_work_conserving(svc) )
>> > > +{
>> > > +svc->priority_level++;
>> > >
>> >
>> >ASSERT(svc->priority_level <= 1);
>>
>> I'm sorry I didn't see this suggestion in previous email. I don't
>> think this assert makes sense.
>>
>> A vcpu that has extratime can have priority_level > 1.
>> For example, a VCPU (period = 100ms, budget = 10ms) runs alone on a
>> core. The VCPU may get its budget replenished 9 times in a
>> period, so the vcpu's priority_level may reach 9.
>>
> Ah, ok. Yes, I missed this, while I see this now.
>
> But doesn't this mean that, at a certain time t, between both CPUs that
> are both in 'etratime mode' (i.e., they've run out of budget, but
> they're running because they have extratime set), the one that has
> received less replenishments gets priority?

Yes.

>
> Is this wanted or expected?

It is wanted.

A VCPU i that has already got budget_i * priority_level_i time has
higher priority than another VCPU j that got budget_j *
priority_level_j time, where priority_level_j > priority_level_i.

For the unreserved resource, a VCPU gets roughly budget/period
proportional unreserved CPU time.


> Basically, if I'm not wrong, this means that the actual priority,
> during the extratime phase, is some combination of deadline and budget
> (which would make me think to utilization)... is this the case?

Yes.
The higher utilization a VCPU has, the more extra time it will get in
the extratime phase.

>
> I don't care much about the actual schedule during the extratime phase,
> in the sense that it doesn't have to be anything too complicated or
> super advanced... but I at least would like:
> - to know how it works, and hence what to expect,
> - for it to be roughly fair.

The unreserved resource is proportionally allocated to VCPUs roughly
based on VCPU's budget/period.
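
As a concrete example (made-up numbers): take v1 (period = 100ms,
budget = 10ms) and v2 (period = 100ms, budget = 20ms), both with
extratime set, sharing one otherwise idle CPU. While both are in the
extratime phase, the vcpu with the lower priority_level always wins,
so their levels stay within 1 of each other. After k extra refills
each:

extra(v1) ~= k * 10ms,  extra(v2) ~= k * 20ms  =>  a 1 : 2 split,

which matches their budget/period ratios, 0.1 : 0.2.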

Best,

Meng


---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1 3/3] xl: enable per-VCPU extratime flag for RTDS

2017-08-06 Thread Meng Xu
On Sun, Aug 6, 2017 at 12:22 PM, Meng Xu  wrote:
> Change main_sched_rtds and related output functions to support
> per-VCPU extratime flag.
>
> Signed-off-by: Meng Xu 
>
> ---
> Changes from RFC v1
> Changes work_conserving flag to extratime flag
> ---
>  tools/xl/xl_cmdtable.c |  3 ++-
>  tools/xl/xl_sched.c| 56 ++
>  2 files changed, 40 insertions(+), 19 deletions(-)
>
> diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
> index 2c71a9f..88933a4 100644
> --- a/tools/xl/xl_cmdtable.c
> +++ b/tools/xl/xl_cmdtable.c
> @@ -272,12 +272,13 @@ struct cmd_spec cmd_table[] = {
>  { "sched-rtds",
>&main_sched_rtds, 0, 1,
>"Get/set rtds scheduler parameters",
> -  "[-d  [-v[=VCPUID/all]] [-p[=PERIOD]] [-b[=BUDGET]]]",
> +  "[-d  [-v[=VCPUID/all]] [-p[=PERIOD]] [-b[=BUDGET]] 
> [-e[=EXTRATIME]]]",
>"-d DOMAIN, --domain=DOMAIN Domain to modify\n"
>"-v VCPUID/all, --vcpuid=VCPUID/allVCPU to modify or output;\n"
>"   Using '-v all' to modify/output all vcpus\n"
>"-p PERIOD, --period=PERIOD Period (us)\n"
>"-b BUDGET, --budget=BUDGET Budget (us)\n"
> +  "-e EXTRATIME, --extratime=EXTRATIME EXTRATIME (1=yes, 0=no)\n"

Hi Dario,

I kept the EXTRATIME value for -e option because: (1) it may be more
intuitive for users; (2) it needs much less code change than the input
style that does not need EXTRATIME value.

As to (1), if users want to set some VCPUs with extratime flag set and
some with extratime flag clear, there are two types of input:
(a) xl sched-rtds -d 1 -v 1 -p 10000 -b 4000 -e 0 -v 2 -p 10000 -b 4000 -e 1 -v 5 -p 10000 -b 4000 -e 0
(b) xl sched-rtds -d 1 -v 1 -p 10000 -b 4000 -v 2 -p 10000 -b 4000 -e 1 -v 5 -p 10000 -b 4000
I felt that the style (a) is more intuitive and the input commands
have very static pattern, i.e., each vcpu must have -v -p -b -e
options set.

As to (2), if we go with -e without EXTRATIME, we will have to keep
track of the vcpu that has no -e option. I thought about this option,
we can pre-set the extratime value to false when -v option is
assigned:
case 'v':
...
extratimes[v_index]  = 0;

and set the extratimes[v_index] = 1 when -e is set.

This approach is not very neat in the code: we have to reallocate
memory for extratimes array when its size is not enough; we also have
to deal with the special case when -e is set before -v, such as the
command "xl sched-rtds -p 1 -b 4000 -e -v 0"

Best,

Meng

---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v1 0/3] Towards work-conserving RTDS

2017-08-06 Thread Meng Xu
This series of patches make RTDS scheduler work-conserving
without breaking real-time guarantees.
VCPUs with extratime flag set can get extra time
from the unreserved system resource.
System administrators can decide which VCPUs have extratime flag set.

Example:
Set the extratime bit of all VCPUs of domain 1:
# xl sched-rtds -d 1 -v all -p 10000 -b 2000 -e 1
Each VCPU of domain 1 will be guaranteed to have 2000us every 10000us
(if the system is schedulable).
If there is a CPU having no work to do,
domain 1's VCPUs will be scheduled onto the CPU,
even though the VCPUs have got their 2000us in the 10000us period.

Clear the extratime bit of all VCPUs of domain 1:
# xl sched-rtds -d 1 -v all -p 10000 -b 2000 -e 0

Set/Clear the extratime bit of one specific VCPU of domain 1:
# xl sched-rtds -d 1 -v 1 -p 10000 -b 2000 -e 1
# xl sched-rtds -d 1 -v 1 -p 10000 -b 2000 -e 0
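
With the xl patch applied, the per-VCPU listing would look along these
lines (illustrative output only; column widths approximate):

# xl sched-rtds -d 1 -v all
Name                                ID VCPU    Period    Budget Extra time
vm1                                  1    0     10000      2000        yes
vm1                                  1    1     10000      2000        yes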


The original design of the work-conserving RTDS was discussed in
https://www.mail-archive.com/xen-devel@lists.xen.org/msg77150.html

The series of patch can be found at github:
https://github.com/PennPanda/RT-Xen
under the branch:
xenbits/rtds/work-conserving-v1

Changes from RFC v1
Merge changes in sched_rt.c into one patch;
Minor change in variable name and comments.

Signed-off-by: Meng Xu 

[PATCH v1 1/3] xen:rtds: towards work conserving RTDS
[PATCH v1 2/3] libxl: enable per-VCPU extratime flag for RTDS
[PATCH v1 3/3] xl: enable per-VCPU extratime flag for RTDS

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v1 2/3] libxl: enable per-VCPU extratime flag for RTDS

2017-08-06 Thread Meng Xu
Modify libxl_vcpu_sched_params_get/set and sched_rtds_vcpu_get/set
functions to support per-VCPU extratime flag

Signed-off-by: Meng Xu 

---
Changes from RFC v1
Change work_conserving flag to extratime flag
---
 tools/libxl/libxl_sched.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/tools/libxl/libxl_sched.c b/tools/libxl/libxl_sched.c
index faa604e..4ebed96 100644
--- a/tools/libxl/libxl_sched.c
+++ b/tools/libxl/libxl_sched.c
@@ -558,6 +558,10 @@ static int sched_rtds_vcpu_get_all(libxl__gc *gc, uint32_t domid,
 for (i = 0; i < num_vcpus; i++) {
 scinfo->vcpus[i].period = vcpus[i].u.rtds.period;
 scinfo->vcpus[i].budget = vcpus[i].u.rtds.budget;
+if ( vcpus[i].u.rtds.flags & XEN_DOMCTL_SCHED_RTDS_extratime )
+   scinfo->vcpus[i].extratime = 1;
+else
+   scinfo->vcpus[i].extratime = 0;
 scinfo->vcpus[i].vcpuid = vcpus[i].vcpuid;
 }
 rc = 0;
@@ -607,6 +611,10 @@ static int sched_rtds_vcpu_set(libxl__gc *gc, uint32_t domid,
 vcpus[i].vcpuid = scinfo->vcpus[i].vcpuid;
 vcpus[i].u.rtds.period = scinfo->vcpus[i].period;
 vcpus[i].u.rtds.budget = scinfo->vcpus[i].budget;
+if ( scinfo->vcpus[i].extratime )
+vcpus[i].u.rtds.flags |= XEN_DOMCTL_SCHED_RTDS_extratime;
+else
+vcpus[i].u.rtds.flags &= ~XEN_DOMCTL_SCHED_RTDS_extratime;
 }
 
 r = xc_sched_rtds_vcpu_set(CTX->xch, domid,
@@ -655,6 +663,10 @@ static int sched_rtds_vcpu_set_all(libxl__gc *gc, uint32_t domid,
 vcpus[i].vcpuid = i;
 vcpus[i].u.rtds.period = scinfo->vcpus[0].period;
 vcpus[i].u.rtds.budget = scinfo->vcpus[0].budget;
+if ( scinfo->vcpus[0].extratime )
+vcpus[i].u.rtds.flags |= XEN_DOMCTL_SCHED_RTDS_extratime;
+else
+vcpus[i].u.rtds.flags &= ~XEN_DOMCTL_SCHED_RTDS_extratime;
 }
 
 r = xc_sched_rtds_vcpu_set(CTX->xch, domid,
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v1 3/3] xl: enable per-VCPU extratime flag for RTDS

2017-08-06 Thread Meng Xu
Change main_sched_rtds and related output functions to support
per-VCPU extratime flag.

Signed-off-by: Meng Xu 

---
Changes from RFC v1
Changes work_conserving flag to extratime flag
---
 tools/xl/xl_cmdtable.c |  3 ++-
 tools/xl/xl_sched.c| 56 ++
 2 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 2c71a9f..88933a4 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -272,12 +272,13 @@ struct cmd_spec cmd_table[] = {
 { "sched-rtds",
   &main_sched_rtds, 0, 1,
   "Get/set rtds scheduler parameters",
-  "[-d  [-v[=VCPUID/all]] [-p[=PERIOD]] [-b[=BUDGET]]]",
+  "[-d  [-v[=VCPUID/all]] [-p[=PERIOD]] [-b[=BUDGET]] 
[-e[=EXTRATIME]]]",
   "-d DOMAIN, --domain=DOMAIN Domain to modify\n"
   "-v VCPUID/all, --vcpuid=VCPUID/allVCPU to modify or output;\n"
   "   Using '-v all' to modify/output all vcpus\n"
   "-p PERIOD, --period=PERIOD Period (us)\n"
   "-b BUDGET, --budget=BUDGET Budget (us)\n"
+  "-e EXTRATIME, --extratime=EXTRATIME EXTRATIME (1=yes, 0=no)\n"
 },
 { "domid",
   &main_domid, 0, 0,
diff --git a/tools/xl/xl_sched.c b/tools/xl/xl_sched.c
index 85722fe..5138012 100644
--- a/tools/xl/xl_sched.c
+++ b/tools/xl/xl_sched.c
@@ -251,7 +251,7 @@ static int sched_rtds_domain_output(
 libxl_domain_sched_params scinfo;
 
 if (domid < 0) {
-printf("%-33s %4s %9s %9s\n", "Name", "ID", "Period", "Budget");
+printf("%-33s %4s %9s %9s %10s\n", "Name", "ID", "Period", "Budget", 
"Extra time");
 return 0;
 }
 
@@ -262,11 +262,12 @@ static int sched_rtds_domain_output(
 }
 
 domname = libxl_domid_to_name(ctx, domid);
-printf("%-33s %4d %9d %9d\n",
+printf("%-33s %4d %9d %9d %10s\n",
 domname,
 domid,
 scinfo.period,
-scinfo.budget);
+scinfo.budget,
+scinfo.extratime ? "yes" : "no");
 free(domname);
 libxl_domain_sched_params_dispose(&scinfo);
 return 0;
@@ -279,8 +280,8 @@ static int sched_rtds_vcpu_output(int domid, libxl_vcpu_sched_params *scinfo)
 int i;
 
 if (domid < 0) {
-printf("%-33s %4s %4s %9s %9s\n", "Name", "ID",
-   "VCPU", "Period", "Budget");
+printf("%-33s %4s %4s %9s %9s %10s\n", "Name", "ID",
+   "VCPU", "Period", "Budget", "Extra time");
 return 0;
 }
 
@@ -290,12 +291,13 @@ static int sched_rtds_vcpu_output(int domid, libxl_vcpu_sched_params *scinfo)
 
 domname = libxl_domid_to_name(ctx, domid);
 for ( i = 0; i < scinfo->num_vcpus; i++ ) {
-printf("%-33s %4d %4d %9"PRIu32" %9"PRIu32"\n",
+printf("%-33s %4d %4d %9"PRIu32" %9"PRIu32" %10s\n",
domname,
domid,
scinfo->vcpus[i].vcpuid,
scinfo->vcpus[i].period,
-   scinfo->vcpus[i].budget);
+   scinfo->vcpus[i].budget,
+   scinfo->vcpus[i].extratime ? "yes" : "no");
 }
 free(domname);
 return 0;
@@ -309,8 +311,8 @@ static int sched_rtds_vcpu_output_all(int domid,
 int i;
 
 if (domid < 0) {
-printf("%-33s %4s %4s %9s %9s\n", "Name", "ID",
-   "VCPU", "Period", "Budget");
+printf("%-33s %4s %4s %9s %9s %10s\n", "Name", "ID",
+   "VCPU", "Period", "Budget", "Extra time");
 return 0;
 }
 
@@ -321,12 +323,13 @@ static int sched_rtds_vcpu_output_all(int domid,
 
 domname = libxl_domid_to_name(ctx, domid);
 for ( i = 0; i < scinfo->num_vcpus; i++ ) {
-printf("%-33s %4d %4d %9"PRIu32" %9"PRIu32"\n",
+printf("%-33s %4d %4d %9"PRIu32" %9"PRIu32" %10s\n",
domname,
domid,
scinfo->vcpus[i].vcpuid,
scinfo->vcpus[i].period,
-   scinfo->vcpus[i].budget);
+   scinfo->vcpus[i].budget,
+   scinfo->vcpus[i].extratime ? "yes" : "no");
 }
 free(domname);
 return 0;
@@ -702,14 +705,18 @@ int main_sched_rtds(int argc, char **argv)
 int *vcpus = (int *)xmalloc(sizeof(int)); /* IDs of VCPUs that change */
 int *perio

[Xen-devel] [PATCH v1 1/3] xen:rtds: towards work conserving RTDS

2017-08-06 Thread Meng Xu
Make RTDS scheduler work conserving without breaking the real-time guarantees.

VCPU model:
Each real-time VCPU is extended to have an extratime flag
and a priority_level field.
When a VCPU's budget is depleted in the current period,
if it has extratime flag set,
its priority_level will increase by 1 and its budget will be refilled;
otherwise, the VCPU will be moved to the depletedq.

Scheduling policy is modified global EDF:
A VCPU v1 has higher priority than another VCPU v2 if
(i) v1 has smaller priority_level; or
(ii) v1 has the same priority_level but has a smaller deadline

Queue management:
Run queue holds VCPUs with extratime flag set and VCPUs with
remaining budget. Run queue is sorted in increasing order of VCPUs' priorities.
Depleted queue holds VCPUs which have extratime flag cleared and depleted budget.
Replenished queue is not modified.

Signed-off-by: Meng Xu 

---
Changes from RFC v1
Rewording comments and commit message
Remove is_work_conserving field from rt_vcpu structure
Use one bit in VCPU's flag to indicate if a VCPU will have extra time
Correct comments style
---
 xen/common/sched_rt.c   | 90 ++---
 xen/include/public/domctl.h |  3 ++
 2 files changed, 79 insertions(+), 14 deletions(-)

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 39f6bee..4e048b9 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -49,13 +49,15 @@
  * A PCPU is feasible if the VCPU can run on this PCPU and (the PCPU is idle or
  * has a lower-priority VCPU running on it.)
  *
- * Each VCPU has a dedicated period and budget.
+ * Each VCPU has a dedicated period, budget and an extratime flag
  * The deadline of a VCPU is at the end of each period;
  * A VCPU has its budget replenished at the beginning of each period;
  * While scheduled, a VCPU burns its budget.
  * The VCPU needs to finish its budget before its deadline in each period;
  * The VCPU discards its unused budget at the end of each period.
- * If a VCPU runs out of budget in a period, it has to wait until next period.
+ * When a VCPU runs out of budget in a period, if its extratime flag is set,
+ * the VCPU increases its priority_level by 1 and refills its budget; otherwise,
+ * it has to wait until next period.
  *
 * Each VCPU is implemented as a deferrable server.
  * When a VCPU has a task running on it, its budget is continuously burned;
@@ -63,7 +65,8 @@
  *
  * Queue scheme:
  * A global runqueue and a global depletedqueue for each CPU pool.
- * The runqueue holds all runnable VCPUs with budget, sorted by deadline;
+ * The runqueue holds all runnable VCPUs with budget,
+ * sorted by priority_level and deadline;
  * The depletedqueue holds all VCPUs without budget, unsorted;
  *
  * Note: cpumask and cpupool is supported.
@@ -151,6 +154,14 @@
 #define RTDS_depleted (1<<__RTDS_depleted)
 
 /*
+ * RTDS_extratime: Can the vcpu run in the time that is
+ * not part of any real-time reservation, and would therefore
+ * be otherwise left idle?
+ */
+#define __RTDS_extratime 4
+#define RTDS_extratime (1<<__RTDS_extratime)
+
+/*
  * rt tracing events ("only" 512 available!). Check
  * include/public/trace.h for more details.
  */
@@ -201,6 +212,8 @@ struct rt_vcpu {
 struct rt_dom *sdom;
 struct vcpu *vcpu;
 
+unsigned priority_level;
+
 unsigned flags;  /* mark __RTDS_scheduled, etc.. */
 };
 
@@ -245,6 +258,11 @@ static inline struct list_head *rt_replq(const struct scheduler *ops)
 return &rt_priv(ops)->replq;
 }
 
+static inline bool has_extratime(const struct rt_vcpu *svc)
+{
+return (svc->flags & RTDS_extratime) ? 1 : 0;
+}
+
 /*
  * Helper functions for manipulating the runqueue, the depleted queue,
  * and the replenishment events queue.
@@ -274,6 +292,21 @@ vcpu_on_replq(const struct rt_vcpu *svc)
 }
 
 /*
+ * If v1 priority >= v2 priority, return value > 0
+ * Otherwise, return value < 0
+ */
+static s_time_t
+compare_vcpu_priority(const struct rt_vcpu *v1, const struct rt_vcpu *v2)
+{
+int prio = v2->priority_level - v1->priority_level;
+
+if ( prio == 0 )
+return v2->cur_deadline - v1->cur_deadline;
+
+return prio;
+}
+
+/*
  * Debug related code, dump vcpu/cpu information
  */
 static void
@@ -303,6 +336,7 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
 cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch), mask);
 printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime"),"
" cur_b=%"PRI_stime" cur_d=%"PRI_stime" last_start=%"PRI_stime"\n"
+   " \t\t priority_level=%d has_extratime=%d\n"
" \t\t onQ=%d runnable=%d flags=%x effective hard_affinity=%s\n",
 svc->vcpu->domain->domain_id,
 svc->vcpu->vcpu_id,
@@ -312,6 +346,8 @@ rt_dump_vcpu(const str

Re: [Xen-devel] [PATCH RFC v1] xen:rtds: towards work conserving RTDS

2017-08-05 Thread Meng Xu
>
>> @@ -966,8 +1001,16 @@ burn_budget(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now)
>>
>>  if ( svc->cur_budget <= 0 )
>>  {
>> -svc->cur_budget = 0;
>> -__set_bit(__RTDS_depleted, &svc->flags);
>> +if ( is_work_conserving(svc) )
>> +{
>> +svc->priority_level++;
>>
>ASSERT(svc->priority_level <= 1);

I'm sorry I didn't see this suggestion in previous email. I don't
think this assert makes sense.

A vcpu that has extratime can have priority_level > 1.
For example, a VCPU (period = 100ms, budget = 10ms) runs alone on a
core. The VCPU may get its budget replenished 9 times in a
period, so the vcpu's priority_level may reach 9 (the 10ms initial
budget plus 9 refills x 10ms fills the whole 100ms period).

The priority_level here also indicates how many times the VCPU gets
the extra budget in the current period.

>
>> +svc->cur_budget = svc->budget;
>> +}
>> +else
>> +    {
>> +svc->cur_budget = 0;
>> +__set_bit(__RTDS_depleted, &svc->flags);
>> +}
>>  }

Thanks,

Meng

---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC v1 2/3] libxl: enable per-VCPU work conserving flag for RTDS

2017-08-04 Thread Meng Xu
On Fri, Aug 4, 2017 at 10:34 AM, Wei Liu  wrote:
> On Fri, Aug 04, 2017 at 02:53:51PM +0200, Dario Faggioli wrote:
>> On Fri, 2017-08-04 at 13:10 +0100, Wei Liu wrote:
>> > On Fri, Aug 04, 2017 at 10:13:18AM +0200, Dario Faggioli wrote:
>> > > On Thu, 2017-08-03 at 17:39 -0400, Meng Xu wrote:
>> > > >
>> > > *HOWEVER*, in this case, we do have that 'extratime' field already,
>> > > as
>> > > a leftover from SEDF, which is there taking space and cluttering
>> > > the
>> > > interface, so why don't make good use of it. Especially considering
>> > > it
>> > > was used for _exactly_ the same thing, and with _exactly_ the same
>> > > meaning, and even for a very similar (i.e., SEDF was also real-
>> > > time)
>> > > kind of scheduler.
>> >
>> > Correct me if I'm wrong:
>> >
>> > 1. extratime is ever only used in SEDF
>> > 2. SEDF is removed
>> >
>> > That means we do have extratime to use in all other schedulers.
>> >
>> I'm not sure what you mean with this last line.
>>
>> IAC, this is how our the related data structures looks like, right now:
>>
>> libxl_sched_params = Struct("sched_params",[
>> ("vcpuid",   integer, {'init_val': 
>> 'LIBXL_SCHED_PARAM_VCPU_INDEX_DEFAULT'}),
>> ("weight",   integer, {'init_val': 
>> 'LIBXL_DOMAIN_SCHED_PARAM_WEIGHT_DEFAULT'}),
>> ("cap",  integer, {'init_val': 
>> 'LIBXL_DOMAIN_SCHED_PARAM_CAP_DEFAULT'}),
>> ("period",   integer, {'init_val': 
>> 'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}),
>> ("extratime",integer, {'init_val': 
>> 'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}),
>> ("budget",   integer, {'init_val': 
>> 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
>> ])
>>
>> The extratime field is there. Any scheduler can use it, if it wants
>> (and in the way it wants). Currently, no one of them does that.
>
> Right, that's what I wanted to know.
>
>>
>> libxl_domain_sched_params = Struct("domain_sched_params",[
>> ("sched",libxl_scheduler),
>> ("weight",   integer, {'init_val': 
>> 'LIBXL_DOMAIN_SCHED_PARAM_WEIGHT_DEFAULT'}),
>> ("cap",  integer, {'init_val': 
>> 'LIBXL_DOMAIN_SCHED_PARAM_CAP_DEFAULT'}),
>> ("period",   integer, {'init_val': 
>> 'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}),
>> ("budget",   integer, {'init_val': 
>> 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
>>
>> # The following three parameters ('slice', 'latency' and 'extratime') 
>> are deprecated,
>> # and will have no effect if used, since the SEDF scheduler has been 
>> removed.
>> # Note that 'period' was an SEDF parameter too, but it is still effective 
>> as it is
>> # now used (together with 'budget') by the RTDS scheduler.
>> ("slice",integer, {'init_val': 
>> 'LIBXL_DOMAIN_SCHED_PARAM_SLICE_DEFAULT'}),
>> ("latency",  integer, {'init_val': 
>> 'LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT'}),
>> ("extratime",integer, {'init_val': 
>> 'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}),
>> ])
>>
>> Same here. 'slice', 'latency' and 'extratime' are there because we
>> deprecate, but don't remove stuff. They're not used in any way. [*]
>>
>> If, at some point, I'd decide to develop a feature for, say Credit2,
>> that controll the latency (whatever that would mean, it's just an
>> example! :-D) of domains, I think I'll use this 'latency' field, for
>> its interface, instead of adding some other stuff.
>>
>> > However, please consider the possibility of reintroducing SEDF in the
>> > future. Suppose that would happen, does extratime still has the same
>> > semantics?
>> >
>> Well, I guess yes. But how does this matter? Each scheduler can, if it
>> wants, use all these parameters in the way it actuallly prefers. So,
>> the fact that RTDS will be using 'extratime' for letting vCPUs execute
>> past their own real-time reservation, does not prevent the reintroduced
>> SEDF --nor any other already existing or new scheduler-- to also use
>> it, for similar (or maybe even not so similar) purposes.
>>
>> Or am I missing something?
>
> If extratime means different things to different schedulers, it's going
> to be confusing. As a layperson I can't tell what extratime is or how it
> is supposed to be used. I would like to have the field to have only one
> meaning.

Right now, extratime is not used by any scheduler. It was used in SEDF only.

Since RTDS is the first scheduler to use the extratime after SEDF was
deprecated, if we use it, it only has one meaning: if extratime
is non-zero, it indicates the VCPU will get extra time.

I guess I lean to use extratime in the RTDS now.

Best,

Meng
---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC v1 3/3] xl: enable per-VCPU work conserving flag for RTDS

2017-08-04 Thread Meng Xu
On Fri, Aug 4, 2017 at 5:01 AM, Dario Faggioli
 wrote:
> On Thu, 2017-08-03 at 18:02 -0400, Meng Xu wrote:
>> On Thu, Aug 3, 2017 at 12:03 PM, Dario Faggioli
>>  wrote:
>> >
>> > > @@ -702,14 +705,18 @@ int main_sched_rtds(int argc, char **argv)
>> > >  int *vcpus = (int *)xmalloc(sizeof(int)); /* IDs of VCPUs that change */
>> > >  int *periods = (int *)xmalloc(sizeof(int)); /* period is in microsecond */
>> > >  int *budgets = (int *)xmalloc(sizeof(int)); /* budget is in microsecond */
>> > > +int *workconservings = (int *)xmalloc(sizeof(int)); /* budget is in microsecond */
>> > >
>> >
>> > Yeah, budget is in microseconds. But this is not budget! :-P
>>
>> Ah, my bad..
>>
>> >
>> > In fact (jokes apart), it can be just a bool, can't it?
>>
>> Yes, bool is enough.
>> Is "workconserving" too long here?
>>
> So, I don't want to turn this into a discussion about what colour we
> should paint the infamous bikeshed... but, yeah, I don't especially
> like this name! :-P
>
> An I mean, not only here, but everywhere you've used it (changelogs,
> other patches, etc.).
>
> There are two reasons for that:
>  - it's indeed very long;
>  - being work conserving is (or at least, I've always heard it used
>and used it myself) a characteristic of a scheduling algorithm (or
>of its implementation), *not* of a task/vcpu/schedulable entity.

Fair enough. I agree work conserving  is not a good name.

>
>It is the scheduler that is work conserving, iff it never let CPUs
>sit idle, when there is work to do. In our case here, the scheduler
>is work conserving if all the vCPUs has this flag set. It's not,
>if even just one has it clear.
>
>And by putting workconserving-ness at the vCPU level, it looks to
>me that we're doing something terminologically wrong, and
>potentially confusing.
>
> I didn't bring this up before, because I'm a bit afraid that it's just
> be being picky... but since you mentioned this yourself.
>
>> I thought about alternative names, such as "wc", "workc", and
>> "extratime". None of them is good enough.
>>
> Yep, I agree that contractions like 'wc' or 'workc' are pretty bad.
> 'extratime', I'd actually like it better, TBH.
>
>> The ideal one should be much
>> shorter and easy to link to "work conserving". :(
>> If we use "extratime", it may cause confusion with the "extratime" in
>> the deprecated SEDF. (That is my concern of reusing the EXTRATIME in
>> the libxl_type.idl.)
>>
> Well, but SEDF being gone (and for quite some time now), and the fact
> that RTDS and SEDF have never really been there together, does
> leave very little room for confusion, I think.
>
> While in academia (e.g., in the GRUB == Gready Reclaming of Unused
> Bandwidth papers), what you're trying to achieved, I've heard it called
> 'reclaiming' (as I'm sure you have as well :-)), and my friends that
> are still working on Linux, are actually using it in there:
>
> https://lkml.org/lkml/2017/5/18/1128
> https://lkml.org/lkml/2017/5/18/1137 <-- SCHED_FLAG_RECLAIM
>
> I'm not so sure about it... As I'm not sure the meaning would appear
> obvious, to people not into RT scheduling research.
>
> And even from this point of view, 'extratime' seems a lot better to me.
> And if it were me doing this, I'd probably use it, both in the
> internals and in the interface.
>

I'm deciding between reclaim and extratime.
I will use extratime since extratime is already in libxl.
extratime means the VCPU will have extra time. It's up to the scheduler
to determine how much extra time it will get.

Thanks,

Meng

---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC v1 3/3] xl: enable per-VCPU work conserving flag for RTDS

2017-08-03 Thread Meng Xu
On Thu, Aug 3, 2017 at 12:03 PM, Dario Faggioli
 wrote:
> On Tue, 2017-08-01 at 14:33 -0400, Meng Xu wrote:
>> --- a/tools/xl/xl_cmdtable.c
>> +++ b/tools/xl/xl_cmdtable.c
>> @@ -272,12 +272,13 @@ struct cmd_spec cmd_table[] = {
>>  { "sched-rtds",
>>&main_sched_rtds, 0, 1,
>>"Get/set rtds scheduler parameters",
>> -  "[-d  [-v[=VCPUID/all]] [-p[=PERIOD]] [-b[=BUDGET]]]",
>> +  "[-d  [-v[=VCPUID/all]] [-p[=PERIOD]] [-b[=BUDGET]]]
>> [-w[=WORKCONSERVING]]",
>>"-d DOMAIN, --domain=DOMAIN Domain to modify\n"
>>"-v VCPUID/all, --vcpuid=VCPUID/allVCPU to modify or
>> output;\n"
>>"   Using '-v all' to modify/output all vcpus\n"
>>"-p PERIOD, --period=PERIOD Period (us)\n"
>>"-b BUDGET, --budget=BUDGET Budget (us)\n"
>> +  "-w WORKCONSERVING, --
>> workconserving=WORKCONSERVINGWORKCONSERVING (1=yes,0=no)\n"
>>
> Does this really need to accept a 1 or 0 parameter? Can't it be that,
> if -w is provided, the vCPU is marked as work-conserving, if it's not,
> it's considered reservation only.
>
>> --- a/tools/xl/xl_sched.c
>> +++ b/tools/xl/xl_sched.c
>>
>> @@ -279,8 +280,8 @@ static int sched_rtds_vcpu_output(int domid,
>> libxl_vcpu_sched_params *scinfo)
>>  int i;
>>
>>  if (domid < 0) {
>> -printf("%-33s %4s %4s %9s %9s\n", "Name", "ID",
>> -   "VCPU", "Period", "Budget");
>> +printf("%-33s %4s %4s %9s %9s %15s\n", "Name", "ID",
>> +   "VCPU", "Period", "Budget", "Work conserving");
>>  return 0;
>>  }
>>
>> @@ -290,12 +291,13 @@ static int sched_rtds_vcpu_output(int domid,
>> libxl_vcpu_sched_params *scinfo)
>>
>>  domname = libxl_domid_to_name(ctx, domid);
>>  for ( i = 0; i < scinfo->num_vcpus; i++ ) {
>> -printf("%-33s %4d %4d %9"PRIu32" %9"PRIu32"\n",
>> +printf("%-33s %4d %4d %9"PRIu32" %9"PRIu32" %15d\n",
>>
> As far as printing it goes, OTOH, I would indeed print a string, i.e.,
> "yes", if the field is found to be 1 (true), or "no", if the field is
> found to be 0 (false).
>
>> @@ -702,14 +705,18 @@ int main_sched_rtds(int argc, char **argv)
>>  int *vcpus = (int *)xmalloc(sizeof(int)); /* IDs of VCPUs that change */
>>  int *periods = (int *)xmalloc(sizeof(int)); /* period is in microsecond */
>>  int *budgets = (int *)xmalloc(sizeof(int)); /* budget is in microsecond */
>> +int *workconservings = (int *)xmalloc(sizeof(int)); /* budget is in microsecond */
>>
> Yeah, budget is in microseconds. But this is not budget! :-P

Ah, my bad..

>
> In fact (jokes apart), it can be just a bool, can't it?

Yes, bool is enough.
Is "workconserving" too long here?

I thought about alternative names, such as "wc", "workc", and
"extratime". None of them is good enough. The ideal one should be much
shorter and easy to link to "work conserving". :(
If we use "extratime", it may cause confusion with the "extratime" in
the deprecated SEDF. (That is my concern of reusing the EXTRATIME in
the libxl_type.idl.)

Maybe "workc" is better than "workconserving"?

Thanks,

Meng

---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC v1 2/3] libxl: enable per-VCPU work conserving flag for RTDS

2017-08-03 Thread Meng Xu
On Thu, Aug 3, 2017 at 11:53 AM, Dario Faggioli
 wrote:
> On Tue, 2017-08-01 at 14:33 -0400, Meng Xu wrote:
>> diff --git a/tools/libxl/libxl_types.idl
>> b/tools/libxl/libxl_types.idl
>> index 8a9849c..f6c3ead 100644
>> --- a/tools/libxl/libxl_types.idl
>> +++ b/tools/libxl/libxl_types.idl
>> @@ -401,6 +401,7 @@ libxl_sched_params = Struct("sched_params",[
>>  ("period",   integer, {'init_val':
>> 'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}),
>>  ("extratime",integer, {'init_val':
>> 'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}),
>>  ("budget",   integer, {'init_val':
>> 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
>> +("is_work_conserving", integer, {'init_val':
>> 'LIBXL_DOMAIN_SCHED_PARAM_IS_WORK_CONSERVING_DEFAULT'}),
>>  ])
>>
> How about, here at libxl level, we use the "extratime" field that we
> have as a leftover from SEDF (and which had, in that scheduler, a
> similar meaning)?
>
> If we don't want to use that one, and we want a new field, I suggest
> thinking to a shorter name.

How about 'LIBXL_DOMAIN_SCHED_PARAM_FLAG'?
We use a bit in the flag field in sched_rt.c to indicate if a VCPU
is work-conserving. The flag field is also extensible for adding other
VCPU properties in the future, if necessary.

Thanks,

Meng

---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC v1 1/3] xen:rtds: enable XL to set and get vcpu work conserving flag

2017-08-03 Thread Meng Xu
On Thu, Aug 3, 2017 at 11:47 AM, Dario Faggioli
 wrote:
> On Tue, 2017-08-01 at 14:33 -0400, Meng Xu wrote:
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -360,6 +360,7 @@ typedef struct xen_domctl_sched_credit2 {
>>  typedef struct xen_domctl_sched_rtds {
>>  uint32_t period;
>>  uint32_t budget;
>> +bool is_work_conserving;
>>
> I wonder whether it wouldn't be better (e.g., more future proof) to
> have a 'uint32_T flags' field here too.
>
> That way, if/when, in future, we want to introduce some other way of
> tweaking the scheduler's behavior for this vCPU, we already have space
> for specifying it...
>

uint32_t flag sounds reasonable to me.
I can do it in the next version.

Meng

---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC v1 0/3] Enable XL to set and get per-VCPU work conserving flag for RTDS scheduler

2017-08-02 Thread Meng Xu
On Wed, Aug 2, 2017 at 1:49 PM, Dario Faggioli
 wrote:
> On Tue, 2017-08-01 at 14:33 -0400, Meng Xu wrote:
>> This series of patches enable the toolstack to
>> set and get per-VCPU work-conserving flag.
>> With the toolstack, system administrators can decide
>> which VCPUs will be made work-conserving.
>>
> Thanks for this series as well, Meng. I'll look at it in the next
> couple of days.
>>
>> We plan to perform two steps in making RTDS scheduler work-
>> conserving:
>> (1) First make all VCPUs work-conserving by default,
>> which was sent as a separate patch. This work aims for Xen 4.10
>> release.
>> (2) After that, we enable the XL to set and get per-VCPU work-
>> conserving flag,
>> which is this series of patches.
>>
> I think it's better if you merge the "xen:rtds: towards work conserving
> RTDS" as patch 1 of this series.
>
> In fact, sending them as separate series, you make people think that
> they're independent, while they're not (as this series is pretty
> useless, without that patch :-P).

Sure. I can do that. :)

Thanks,

Meng


Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC v1] xen:rtds: towards work conserving RTDS

2017-08-02 Thread Meng Xu
On Wed, Aug 2, 2017 at 1:46 PM, Dario Faggioli
 wrote:
> Hey, Meng!
>
> It's really cool to see progress on this... There was quite a bit of
> interest in scheduling in general at the Summit in Budapest, and one
> important thing for making sure RTDS will be really useful, is for it
> to have a work conserving mode! :-)

Glad to hear that. :-)

>
> On Tue, 2017-08-01 at 14:13 -0400, Meng Xu wrote:
>> Make RTDS scheduler work conserving to utilize the idle resource,
>> without breaking the real-time guarantees.
>
> Just kill the "to utilize the idle resource". We can expect that people
>  that are interested in this commit, also know what 'work conserving'
> means. :-)

Got it. Will do.

>
>> VCPU model:
>> Each real-time VCPU is extended to have a work conserving flag
>> and a priority_level field.
>> When a VCPU's budget is depleted in the current period,
>> if it has work conserving flag set,
>> its priority_level will increase by 1 and its budget will be
>> refilled;
>> otherwise, the VCPU will be moved to the depletedq.
>>
> Mmm... Ok. But is the budget burned, while the vCPU executes at
> priority_level 1? If yes, doesn't this mean we risk having less budget
> when we get back to priority_lvevel 0?
>
> Oh, wait, maybe it's the case that, when we get back to priority_level
> 0, we also get another replenishment, is that the case? If yes, I
> actually think it's fine...

It's the latter case: the vcpu will get another replenishment when it
gets back to priority_level 0.
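
To illustrate with made-up numbers (period P = 10ms, budget B = 4ms,
vcpu alone on an otherwise idle CPU):

t =  0ms : period starts, priority_level = 0, cur_budget = 4ms
t =  4ms : budget depleted -> priority_level = 1, cur_budget refilled to 4ms
t =  8ms : depleted again  -> priority_level = 2, cur_budget refilled to 4ms
t = 10ms : new period (rt_update_deadline) -> priority_level reset to 0,
           and a fresh 4ms budget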

>
>> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
>> index 39f6bee..740a712 100644
>> --- a/xen/common/sched_rt.c
>> +++ b/xen/common/sched_rt.c
>> @@ -191,6 +195,7 @@ struct rt_vcpu {
>>  /* VCPU parameters, in nanoseconds */
>>  s_time_t period;
>>  s_time_t budget;
>> +bool_t is_work_conserving;   /* is vcpu work conserving */
>>
>>  /* VCPU current infomation in nanosecond */
>>  s_time_t cur_budget; /* current budget */
>> @@ -201,6 +206,8 @@ struct rt_vcpu {
>>  struct rt_dom *sdom;
>>  struct vcpu *vcpu;
>>
>> +unsigned priority_level;
>> +
>>  unsigned flags;  /* mark __RTDS_scheduled, etc.. */
>>
> So, since we've got a 'flags' field already, can the flag be one of its
> bit, instead of adding a new bool in the struct:
>
> /*
>  * RTDS_work_conserving: Can the vcpu run in the time that is
>  * not part of any real-time reservation, and would therefore
>  * be otherwise left idle?
>  */
> #define __RTDS_work_conserving   4
> #define RTDS_work_conserving (1<<__RTDS_work_conserving)

Thank you very much for the suggestion! I will modify based on your suggestion.

Actually, I was not very comfortable with the is_work_conserving field either.
It makes the structure verbose and messes up the struct's cache-line
alignment.
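
With the flag approach, the helper then reduces to a one-liner; this is
indeed what the later v1 patch does (after the rename from 'work
conserving' to 'extratime'):

#define __RTDS_extratime 4
#define RTDS_extratime (1<<__RTDS_extratime)

static inline bool has_extratime(const struct rt_vcpu *svc)
{
    return (svc->flags & RTDS_extratime) ? 1 : 0;
}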

>
>> @@ -245,6 +252,11 @@ static inline struct list_head *rt_replq(const
>> struct scheduler *ops)
>>  return &rt_priv(ops)->replq;
>>  }
>>
>> +static inline bool_t is_work_conserving(const struct rt_vcpu *svc)
>> +{
>>
> Use bool.

OK.

>
>> @@ -273,6 +285,20 @@ vcpu_on_replq(const struct rt_vcpu *svc)
>>  return !list_empty(&svc->replq_elem);
>>  }
>>
>> +/* If v1 priority >= v2 priority, return value > 0
>> + * Otherwise, return value < 0
>> + */
>>
> Comment style.

Got it. Will make it as:
/*
 * If v1 priority >= v2 priority, return value > 0
 * Otherwise, return value < 0
 */

>
> Apart from that, do you want this to return >0 if v1 should have
> priority over v2, and <0 if vice-versa, right? If yes...

Yes.

>
>> +static int
>> +compare_vcpu_priority(const struct rt_vcpu *v1, const struct rt_vcpu
>> *v2)
>> +{
>> +if ( v1->priority_level < v2->priority_level ||
>> + ( v1->priority_level == v2->priority_level &&
>> + v1->cur_deadline <= v2->cur_deadline ) )
>> +return 1;
>> +else
>> +return -1;
>>
>   int prio = v2->priority_level - v1->priority_level;
>
>   if ( prio == 0 )
> return v2->cur_deadline - v1->cur_deadline;
>
>   return prio;
>
> Return type has to become s_time_t, and there's a chance that it'll
> return 0, if they are at the same level, and have the same absolute
> deadline. But I think you can deal with this in the caller.

OK. Will do.
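
For reference, here is a standalone version of the suggested comparator
with a caller that treats a zero result as 'not strictly higher
priority' (types simplified so it compiles outside Xen):

#include <stdio.h>
#include <stdint.h>

typedef int64_t s_time_t;

struct rt_vcpu {
    int priority_level;          /* 'unsigned' in the patch; int keeps
                                  * the subtraction well-defined here */
    s_time_t cur_deadline;
};

static s_time_t
compare_vcpu_priority(const struct rt_vcpu *v1, const struct rt_vcpu *v2)
{
    int prio = v2->priority_level - v1->priority_level;

    if ( prio == 0 )
        return v2->cur_deadline - v1->cur_deadline;

    return prio;
}

int main(void)
{
    struct rt_vcpu a = { .priority_level = 0, .cur_deadline = 100 };
    struct rt_vcpu b = { .priority_level = 1, .cur_deadline =  50 };

    /* a is at a lower level, so it wins despite its later deadline */
    printf("a beats b: %s\n",
           compare_vcpu_priority(&a, &b) > 0 ? "yes" : "no");
    return 0;
}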

>
>> @@ -966,8 +1001,16 @@ burn_budget(const struct scheduler *

[Xen-devel] [PATCH v5] xen: rtds: only tickle non-already tickled CPUs

2017-08-02 Thread Meng Xu
When more than one idle VCPUs that have the same PCPU as their
previous running core invoke runq_tickle(), they will tickle the same
PCPU. The tickled PCPU will only pick at most one VCPU, i.e., the
highest-priority one, to execute. The other VCPUs will not be
scheduled for a period, even when there is an idle core, making these
VCPUs unnecessarily starve for one period.

Therefore, always make sure that we only tickle PCPUs that have not
been tickled already.

Signed-off-by: Haoran Li 
Signed-off-by: Meng Xu 
Reviewed-by: Dario Faggioli 

---
The initial discussion of this patch can be found at
https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg02857.html

Changes in v5:
Revise comments as Dario suggested

Changes in v4:
1) Take Dario's suggestions:
   Search the new->cpu first for the cpu to tickle.
   This get rid of the if statement in previous versions.
2) Reword the comments and commit messages.
3) Rebased on staging branch.

Issues in v2 and v3:
Did not rebase on the latest staging branch.
Did not solve the comments/issues in v1.
Please ignore the v2 and v3.
---
 xen/common/sched_rt.c | 29 ++---
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 39f6bee..0ac5816 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -1147,9 +1147,9 @@ rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
  * Called by wake() and context_saved()
  * We have a running candidate here, the kick logic is:
  * Among all the cpus that are within the cpu affinity
- * 1) if the new->cpu is idle, kick it. This could benefit cache hit
- * 2) if there are any idle vcpu, kick it.
- * 3) now all pcpus are busy;
+ * 1) if there are any idle CPUs, kick one.
+  For cache benefit, we check new->cpu as first
+ * 2) now all pcpus are busy;
  *among all the running vcpus, pick lowest priority one
  *if snext has higher priority, kick it.
  *
@@ -1177,17 +1177,13 @@ runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
 cpumask_and(&not_tickled, online, new->vcpu->cpu_hard_affinity);
 cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
 
-/* 1) if new's previous cpu is idle, kick it for cache benefit */
-if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) )
-{
-SCHED_STAT_CRANK(tickled_idle_cpu);
-cpu_to_tickle = new->vcpu->processor;
-goto out;
-}
-
-/* 2) if there are any idle pcpu, kick it */
-/* The same loop also find the one with lowest priority */
-for_each_cpu(cpu, ¬_tickled)
+/*
+ * 1) If there are any idle CPUs, kick one.
+ *For cache benefit,we first search new->cpu.
+ *The same loop also find the one with lowest priority.
+ */
+cpu = cpumask_test_or_cycle(new->vcpu->processor, ¬_tickled);
+while ( cpu!= nr_cpu_ids )
 {
 iter_vc = curr_on_cpu(cpu);
 if ( is_idle_vcpu(iter_vc) )
@@ -1200,9 +1196,12 @@ runq_tickle(const struct scheduler *ops, struct rt_vcpu 
*new)
 if ( latest_deadline_vcpu == NULL ||
  iter_svc->cur_deadline > latest_deadline_vcpu->cur_deadline )
 latest_deadline_vcpu = iter_svc;
+
+cpumask_clear_cpu(cpu, ¬_tickled);
+cpu = cpumask_cycle(cpu, ¬_tickled);
 }
 
-/* 3) candicate has higher priority, kick out lowest priority vcpu */
+/* 2) candicate has higher priority, kick out lowest priority vcpu */
 if ( latest_deadline_vcpu != NULL &&
  new->cur_deadline < latest_deadline_vcpu->cur_deadline )
 {
-- 
1.9.1
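
For readers unfamiliar with the cpumask helpers used above, the search
idiom in isolation is (a sketch, not the patch itself):

    /* Visit every CPU in 'mask', starting the search at 'start';
     * clearing each visited bit guarantees the loop terminates. */
    cpu = cpumask_test_or_cycle(start, &mask); /* 'start' if set, else next */
    while ( cpu != nr_cpu_ids )
    {
        /* ... examine curr_on_cpu(cpu) here ... */
        cpumask_clear_cpu(cpu, &mask);
        cpu = cpumask_cycle(cpu, &mask);       /* next set bit after 'cpu' */
    }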




[Xen-devel] [PATCH v4] xen: rtds: only tickle non-already tickled CPUs

2017-08-01 Thread Meng Xu
When two or more idle VCPUs that have the same PCPU as their
previous running core invoke runq_tickle(), they will tickle the same
PCPU. The tickled PCPU will only pick at most one VCPU, i.e., the
highest-priority one, to execute. The other VCPUs will not be
scheduled for a period, even when there is an idle core, making these
VCPUs unnecessarily starve for one period.

Therefore, always make sure that we only tickle PCPUs that have not
been tickled already.

Signed-off-by: Haoran Li 
Signed-off-by: Meng Xu 

---
The initial discussion of this patch can be found at
https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg02857.html

Changes in v4:
1) Take Dario's suggestions:
   Search the new->cpu first for the cpu to tickle.
   This gets rid of the if statement in previous versions.
2) Reword the comments and commit messages.
3) Rebased on staging branch.

Issues in v2 and v3:
Did not rebase on the latest staging branch.
Did not solve the comments/issues in v1.
Please ignore the v2 and v3.
---
 xen/common/sched_rt.c | 29 ++---
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 39f6bee..5fec95f 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -1147,9 +1147,9 @@ rt_vcpu_sleep(const struct scheduler *ops, struct vcpu 
*vc)
  * Called by wake() and context_saved()
  * We have a running candidate here, the kick logic is:
  * Among all the cpus that are within the cpu affinity
- * 1) if the new->cpu is idle, kick it. This could benefit cache hit
- * 2) if there are any idle vcpu, kick it.
- * 3) now all pcpus are busy;
+ * 1) if there are any idle vcpu, kick it.
+  For cache benefit,we first search new->cpu.
+ * 2) now all pcpus are busy;
  *among all the running vcpus, pick lowest priority one
  *if snext has higher priority, kick it.
  *
@@ -1177,17 +1177,13 @@ runq_tickle(const struct scheduler *ops, struct rt_vcpu 
*new)
 cpumask_and(¬_tickled, online, new->vcpu->cpu_hard_affinity);
 cpumask_andnot(¬_tickled, ¬_tickled, &prv->tickled);
 
-/* 1) if new's previous cpu is idle, kick it for cache benefit */
-if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) )
-{
-SCHED_STAT_CRANK(tickled_idle_cpu);
-cpu_to_tickle = new->vcpu->processor;
-goto out;
-}
-
-/* 2) if there are any idle pcpu, kick it */
-/* The same loop also find the one with lowest priority */
-for_each_cpu(cpu, ¬_tickled)
+/*
+ * 1) If there are any idle vcpu, kick it.
+ *For cache benefit,we first search new->cpu.
+ *The same loop also find the one with lowest priority.
+ */
+cpu = cpumask_test_or_cycle(new->vcpu->processor, ¬_tickled);
+while ( cpu!= nr_cpu_ids )
 {
 iter_vc = curr_on_cpu(cpu);
 if ( is_idle_vcpu(iter_vc) )
@@ -1200,9 +1196,12 @@ runq_tickle(const struct scheduler *ops, struct rt_vcpu 
*new)
 if ( latest_deadline_vcpu == NULL ||
  iter_svc->cur_deadline > latest_deadline_vcpu->cur_deadline )
 latest_deadline_vcpu = iter_svc;
+
+cpumask_clear_cpu(cpu, ¬_tickled);
+cpu = cpumask_cycle(cpu, ¬_tickled);
 }
 
-/* 3) candicate has higher priority, kick out lowest priority vcpu */
+/* 2) candicate has higher priority, kick out lowest priority vcpu */
 if ( latest_deadline_vcpu != NULL &&
  new->cur_deadline < latest_deadline_vcpu->cur_deadline )
 {
-- 
1.9.1




Re: [Xen-devel] [PATCH RFC v1 0/3] Enable XL to set and get per-VCPU work conserving flag for RTDS scheduler

2017-08-01 Thread Meng Xu
On Tue, Aug 1, 2017 at 2:33 PM, Meng Xu  wrote:
>
> This series of patches enable the toolstack to
> set and get per-VCPU work-conserving flag.
> With the toolstack, system administrators can decide
> which VCPUs will be made work-conserving.
>
> The design of the work-conserving RTDS was discussed in
> https://www.mail-archive.com/xen-devel@lists.xen.org/msg77150.html
>
> We plan to perform two steps in making RTDS scheduler work-conserving:
> (1) First make all VCPUs work-conserving by default,
> which was sent as a separate patch. This work aims for Xen 4.10 release.
> (2) After that, we enable the XL to set and get per-VCPU work-conserving flag,
> which is this series of patches.


The series of patches that have both steps done can be found at the
following repo: https://github.com/PennPanda/RT-Xen
under the branch xenbits/rtds/work-conserving-RFCv1.

Thanks,

Meng


Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



[Xen-devel] [PATCH RFC v1 0/3] Enable XL to set and get per-VCPU work conserving flag for RTDS scheduler

2017-08-01 Thread Meng Xu
This series of patches enable the toolstack to
set and get per-VCPU work-conserving flag.
With the toolstack, system administrators can decide
which VCPUs will be made work-conserving.

The design of the work-conserving RTDS was discussed in
https://www.mail-archive.com/xen-devel@lists.xen.org/msg77150.html

We plan to perform two steps in making RTDS scheduler work-conserving:
(1) First make all VCPUs work-conserving by default,
which was sent as a separate patch. This work aims for Xen 4.10 release.
(2) After that, we enable the XL to set and get per-VCPU work-conserving flag,
which is this series of patches.

Signed-off-by: Meng Xu 
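
For illustration, with the whole series applied, setting and then
inspecting the flag from xl would look like this (illustrative values
and output; the option name comes from patch 3/3):

    # xl sched-rtds -d vm1 -v 0 -p 10000 -b 4000 -w 1
    # xl sched-rtds -d vm1 -v 0
    Name                                ID VCPU    Period    Budget Work conserving
    vm1                                  1    0     10000      4000               1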




[Xen-devel] [PATCH RFC v1 3/3] xl: enable per-VCPU work conserving flag for RTDS

2017-08-01 Thread Meng Xu
Change main_sched_rtds and related output functions to support
per-VCPU work conserving flag.

Signed-off-by: Meng Xu 
---
 tools/xl/xl_cmdtable.c |  3 ++-
 tools/xl/xl_sched.c| 56 ++
 2 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 30eb93c..95997e1 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -272,12 +272,13 @@ struct cmd_spec cmd_table[] = {
 { "sched-rtds",
   &main_sched_rtds, 0, 1,
   "Get/set rtds scheduler parameters",
-  "[-d  [-v[=VCPUID/all]] [-p[=PERIOD]] [-b[=BUDGET]]]",
+  "[-d  [-v[=VCPUID/all]] [-p[=PERIOD]] [-b[=BUDGET]]] 
[-w[=WORKCONSERVING]]",
   "-d DOMAIN, --domain=DOMAIN Domain to modify\n"
   "-v VCPUID/all, --vcpuid=VCPUID/allVCPU to modify or output;\n"
   "   Using '-v all' to modify/output all vcpus\n"
   "-p PERIOD, --period=PERIOD Period (us)\n"
   "-b BUDGET, --budget=BUDGET Budget (us)\n"
+  "-w WORKCONSERVING, --workconserving=WORKCONSERVINGWORKCONSERVING 
(1=yes,0=no)\n"
 },
 { "domid",
   &main_domid, 0, 0,
diff --git a/tools/xl/xl_sched.c b/tools/xl/xl_sched.c
index 85722fe..35a64e1 100644
--- a/tools/xl/xl_sched.c
+++ b/tools/xl/xl_sched.c
@@ -251,7 +251,7 @@ static int sched_rtds_domain_output(
 libxl_domain_sched_params scinfo;
 
 if (domid < 0) {
-printf("%-33s %4s %9s %9s\n", "Name", "ID", "Period", "Budget");
+printf("%-33s %4s %9s %9s %15s\n", "Name", "ID", "Period", "Budget", 
"Work conserving");
 return 0;
 }
 
@@ -262,11 +262,12 @@ static int sched_rtds_domain_output(
 }
 
 domname = libxl_domid_to_name(ctx, domid);
-printf("%-33s %4d %9d %9d\n",
+printf("%-33s %4d %9d %9d %15d\n",
 domname,
 domid,
 scinfo.period,
-scinfo.budget);
+scinfo.budget,
+scinfo.is_work_conserving);
 free(domname);
 libxl_domain_sched_params_dispose(&scinfo);
 return 0;
@@ -279,8 +280,8 @@ static int sched_rtds_vcpu_output(int domid, 
libxl_vcpu_sched_params *scinfo)
 int i;
 
 if (domid < 0) {
-printf("%-33s %4s %4s %9s %9s\n", "Name", "ID",
-   "VCPU", "Period", "Budget");
+printf("%-33s %4s %4s %9s %9s %15s\n", "Name", "ID",
+   "VCPU", "Period", "Budget", "Work conserving");
 return 0;
 }
 
@@ -290,12 +291,13 @@ static int sched_rtds_vcpu_output(int domid, 
libxl_vcpu_sched_params *scinfo)
 
 domname = libxl_domid_to_name(ctx, domid);
 for ( i = 0; i < scinfo->num_vcpus; i++ ) {
-printf("%-33s %4d %4d %9"PRIu32" %9"PRIu32"\n",
+printf("%-33s %4d %4d %9"PRIu32" %9"PRIu32" %15d\n",
domname,
domid,
scinfo->vcpus[i].vcpuid,
scinfo->vcpus[i].period,
-   scinfo->vcpus[i].budget);
+   scinfo->vcpus[i].budget,
+   scinfo->vcpus[i].is_work_conserving );
 }
 free(domname);
 return 0;
@@ -309,8 +311,8 @@ static int sched_rtds_vcpu_output_all(int domid,
 int i;
 
 if (domid < 0) {
-printf("%-33s %4s %4s %9s %9s\n", "Name", "ID",
-   "VCPU", "Period", "Budget");
+printf("%-33s %4s %4s %9s %9s %15s\n", "Name", "ID",
+   "VCPU", "Period", "Budget", "Work conserving");
 return 0;
 }
 
@@ -321,12 +323,13 @@ static int sched_rtds_vcpu_output_all(int domid,
 
 domname = libxl_domid_to_name(ctx, domid);
 for ( i = 0; i < scinfo->num_vcpus; i++ ) {
-printf("%-33s %4d %4d %9"PRIu32" %9"PRIu32"\n",
+printf("%-33s %4d %4d %9"PRIu32" %9"PRIu32" %15d\n",
domname,
domid,
scinfo->vcpus[i].vcpuid,
scinfo->vcpus[i].period,
-   scinfo->vcpus[i].budget);
+   scinfo->vcpus[i].budget,
+   scinfo->vcpus[i].is_work_conserving);
 }
 free(domname);
 return 0;
@@ -702,14 +705,18 @@ int main_sched_rtds(int argc, char **argv)
 int *vcpus = (int *)xmalloc(sizeof(int)); /* IDs of VCPUs that change */
 int *periods = (int *)xmalloc(sizeof(int)); /* period is in microsecond */
 int *budgets = (int *)xmalloc(si

[Xen-devel] [PATCH RFC v1 1/3] xen:rtds: enable XL to set and get vcpu work conserving flag

2017-08-01 Thread Meng Xu
Extend the hypercalls(XEN_DOMCTL_SCHEDOP_getvcpuinfo/putvcpuinfo) to
get/set a domain's per-VCPU work conserving parameters.

Signed-off-by: Meng Xu 
---
 xen/common/sched_rt.c   | 2 ++
 xen/include/public/domctl.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 740a712..76ed4cb 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -1442,6 +1442,7 @@ rt_dom_cntl(
 svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
 local_sched.u.rtds.budget = svc->budget / MICROSECS(1);
 local_sched.u.rtds.period = svc->period / MICROSECS(1);
+local_sched.u.rtds.is_work_conserving = 
svc->is_work_conserving;
 spin_unlock_irqrestore(&prv->lock, flags);
 
 if ( copy_to_guest_offset(op->u.v.vcpus, index,
@@ -1466,6 +1467,7 @@ rt_dom_cntl(
 svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
 svc->period = period;
 svc->budget = budget;
+svc->is_work_conserving = 
local_sched.u.rtds.is_work_conserving;
 spin_unlock_irqrestore(&prv->lock, flags);
 }
 /* Process a most 64 vCPUs without checking for preemptions. */
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index ff39762..e67cd9e 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -360,6 +360,7 @@ typedef struct xen_domctl_sched_credit2 {
 typedef struct xen_domctl_sched_rtds {
 uint32_t period;
 uint32_t budget;
+bool is_work_conserving;
 } xen_domctl_sched_rtds_t;
 
 typedef struct xen_domctl_schedparam_vcpu {
-- 
1.9.1




[Xen-devel] [PATCH RFC v1 2/3] libxl: enable per-VCPU work conserving flag for RTDS

2017-08-01 Thread Meng Xu
Modify libxl_vcpu_sched_params_get/set and sched_rtds_vcpu_get/set
functions to support the per-VCPU work conserving flag.

Signed-off-by: Meng Xu 
---
 tools/libxl/libxl.h | 1 +
 tools/libxl/libxl_sched.c   | 3 +++
 tools/libxl/libxl_types.idl | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 7cf0f31..dd9c926 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -2058,6 +2058,7 @@ int libxl_sched_credit2_params_set(libxl_ctx *ctx, 
uint32_t poolid,
 #define LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT   -1
 #define LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT -1
 #define LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT-1
+#define LIBXL_DOMAIN_SCHED_PARAM_IS_WORK_CONSERVING_DEFAULT-1
 
 /* Per-VCPU parameters */
 #define LIBXL_SCHED_PARAM_VCPU_INDEX_DEFAULT   -1
diff --git a/tools/libxl/libxl_sched.c b/tools/libxl/libxl_sched.c
index faa604e..fe92747 100644
--- a/tools/libxl/libxl_sched.c
+++ b/tools/libxl/libxl_sched.c
@@ -558,6 +558,7 @@ static int sched_rtds_vcpu_get_all(libxl__gc *gc, uint32_t 
domid,
 for (i = 0; i < num_vcpus; i++) {
 scinfo->vcpus[i].period = vcpus[i].u.rtds.period;
 scinfo->vcpus[i].budget = vcpus[i].u.rtds.budget;
+scinfo->vcpus[i].is_work_conserving = 
vcpus[i].u.rtds.is_work_conserving;
 scinfo->vcpus[i].vcpuid = vcpus[i].vcpuid;
 }
 rc = 0;
@@ -607,6 +608,7 @@ static int sched_rtds_vcpu_set(libxl__gc *gc, uint32_t 
domid,
 vcpus[i].vcpuid = scinfo->vcpus[i].vcpuid;
 vcpus[i].u.rtds.period = scinfo->vcpus[i].period;
 vcpus[i].u.rtds.budget = scinfo->vcpus[i].budget;
+vcpus[i].u.rtds.is_work_conserving = 
scinfo->vcpus[i].is_work_conserving;
 }
 
 r = xc_sched_rtds_vcpu_set(CTX->xch, domid,
@@ -655,6 +657,7 @@ static int sched_rtds_vcpu_set_all(libxl__gc *gc, uint32_t 
domid,
 vcpus[i].vcpuid = i;
 vcpus[i].u.rtds.period = scinfo->vcpus[0].period;
 vcpus[i].u.rtds.budget = scinfo->vcpus[0].budget;
+vcpus[i].u.rtds.is_work_conserving = 
scinfo->vcpus[0].is_work_conserving;
 }
 
 r = xc_sched_rtds_vcpu_set(CTX->xch, domid,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 8a9849c..f6c3ead 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -401,6 +401,7 @@ libxl_sched_params = Struct("sched_params",[
 ("period",   integer, {'init_val': 
'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}),
 ("extratime",integer, {'init_val': 
'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}),
 ("budget",   integer, {'init_val': 
'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
+("is_work_conserving", integer, {'init_val': 
'LIBXL_DOMAIN_SCHED_PARAM_IS_WORK_CONSERVING_DEFAULT'}),
 ])
 
 libxl_vcpu_sched_params = Struct("vcpu_sched_params",[
@@ -414,6 +415,7 @@ libxl_domain_sched_params = Struct("domain_sched_params",[
 ("cap",  integer, {'init_val': 
'LIBXL_DOMAIN_SCHED_PARAM_CAP_DEFAULT'}),
 ("period",   integer, {'init_val': 
'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}),
 ("budget",   integer, {'init_val': 
'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
+("is_work_conserving", integer, {'init_val': 
'LIBXL_DOMAIN_SCHED_PARAM_IS_WORK_CONSERVING_DEFAULT'}),
 
 # The following three parameters ('slice', 'latency' and 'extratime') are 
deprecated,
 # and will have no effect if used, since the SEDF scheduler has been 
removed.
-- 
1.9.1
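
A minimal sketch of how a toolstack caller would drive the extended API
(error handling omitted; assumes the whole series is applied):

    libxl_vcpu_sched_params scinfo;

    libxl_vcpu_sched_params_init(&scinfo);
    scinfo.sched = LIBXL_SCHEDULER_RTDS;
    scinfo.num_vcpus = 1;
    scinfo.vcpus = calloc(1, sizeof(*scinfo.vcpus));
    scinfo.vcpus[0].vcpuid = 0;
    scinfo.vcpus[0].period = 10000;         /* microseconds */
    scinfo.vcpus[0].budget = 4000;          /* microseconds */
    scinfo.vcpus[0].is_work_conserving = 1; /* the new per-VCPU flag */
    libxl_vcpu_sched_params_set(ctx, domid, &scinfo);
    libxl_vcpu_sched_params_dispose(&scinfo);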




[Xen-devel] [PATCH RFC v1] xen:rtds: towards work conserving RTDS

2017-08-01 Thread Meng Xu
Make the RTDS scheduler work-conserving to utilize idle resources,
without breaking the real-time guarantees.

VCPU model:
Each real-time VCPU is extended to have a work conserving flag
and a priority_level field.
When a VCPU's budget is depleted in the current period,
if it has the work conserving flag set,
its priority_level will increase by 1 and its budget will be refilled;
otherwise, the VCPU will be moved to the depletedq.

Scheduling policy: modified global EDF:
A VCPU v1 has higher priority than another VCPU v2 if
(i) v1 has a smaller priority_level; or
(ii) v1 has the same priority_level but has a smaller deadline

Signed-off-by: Meng Xu 
---
 xen/common/sched_rt.c | 71 ++-
 1 file changed, 59 insertions(+), 12 deletions(-)

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 39f6bee..740a712 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -49,13 +49,16 @@
  * A PCPU is feasible if the VCPU can run on this PCPU and (the PCPU is idle or
  * has a lower-priority VCPU running on it.)
  *
- * Each VCPU has a dedicated period and budget.
+ * Each VCPU has a dedicated period, budget and is_work_conserving flag
  * The deadline of a VCPU is at the end of each period;
  * A VCPU has its budget replenished at the beginning of each period;
  * While scheduled, a VCPU burns its budget.
  * The VCPU needs to finish its budget before its deadline in each period;
  * The VCPU discards its unused budget at the end of each period.
- * If a VCPU runs out of budget in a period, it has to wait until next period.
+ * A work conserving VCPU has is_work_conserving flag set to true;
+ * When a VCPU runs out of budget in a period, if it is work conserving,
+ * it increases its priority_level by 1 and refill its budget; otherwise,
+ * it has to wait until next period.
  *
  * Each VCPU is implemented as a deferable server.
  * When a VCPU has a task running on it, its budget is continuously burned;
@@ -63,7 +66,8 @@
  *
  * Queue scheme:
  * A global runqueue and a global depletedqueue for each CPU pool.
- * The runqueue holds all runnable VCPUs with budget, sorted by deadline;
+ * The runqueue holds all runnable VCPUs with budget,
+ * sorted by priority_level and deadline;
  * The depletedqueue holds all VCPUs without budget, unsorted;
  *
  * Note: cpumask and cpupool is supported.
@@ -191,6 +195,7 @@ struct rt_vcpu {
 /* VCPU parameters, in nanoseconds */
 s_time_t period;
 s_time_t budget;
+bool_t is_work_conserving;   /* is vcpu work conserving */
 
 /* VCPU current infomation in nanosecond */
 s_time_t cur_budget; /* current budget */
@@ -201,6 +206,8 @@ struct rt_vcpu {
 struct rt_dom *sdom;
 struct vcpu *vcpu;
 
+unsigned priority_level;
+
 unsigned flags;  /* mark __RTDS_scheduled, etc.. */
 };
 
@@ -245,6 +252,11 @@ static inline struct list_head *rt_replq(const struct 
scheduler *ops)
 return &rt_priv(ops)->replq;
 }
 
+static inline bool_t is_work_conserving(const struct rt_vcpu *svc)
+{
+return svc->is_work_conserving;
+}
+
 /*
  * Helper functions for manipulating the runqueue, the depleted queue,
  * and the replenishment events queue.
@@ -273,6 +285,20 @@ vcpu_on_replq(const struct rt_vcpu *svc)
 return !list_empty(&svc->replq_elem);
 }
 
+/* If v1 priority >= v2 priority, return value > 0
+ * Otherwise, return value < 0
+ */
+static int
+compare_vcpu_priority(const struct rt_vcpu *v1, const struct rt_vcpu *v2)
+{
+if ( v1->priority_level < v2->priority_level ||
+ ( v1->priority_level == v2->priority_level && 
+ v1->cur_deadline <= v2->cur_deadline ) )
+return 1;
+else
+return -1;
+}
+
 /*
  * Debug related code, dump vcpu/cpu information
  */
@@ -303,6 +329,7 @@ rt_dump_vcpu(const struct scheduler *ops, const struct 
rt_vcpu *svc)
 cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch), mask);
 printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime"),"
" cur_b=%"PRI_stime" cur_d=%"PRI_stime" last_start=%"PRI_stime"\n"
+   " \t\t priority_level=%d work_conserving=%d\n"
" \t\t onQ=%d runnable=%d flags=%x effective hard_affinity=%s\n",
 svc->vcpu->domain->domain_id,
 svc->vcpu->vcpu_id,
@@ -312,6 +339,8 @@ rt_dump_vcpu(const struct scheduler *ops, const struct 
rt_vcpu *svc)
 svc->cur_budget,
 svc->cur_deadline,
 svc->last_start,
+svc->priority_level,
+is_work_conserving(svc),
 vcpu_on_q(svc),
 vcpu_runnable(svc->vcpu),
 svc->flags,
@@ -423,15 +452,18 @@ rt_update_deadline(s_time_t now, struct rt_vcpu *svc)
  */
 svc->last_start = now;
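
The depletion rule from the changelog, as pseudocode (a sketch with the
behaviour spelled out, not one of the hunks above; priority_level is
assumed to be reset to 0 when rt_update_deadline() starts a new period):

    /* In burn_budget(), once cur_budget reaches zero: */
    if ( is_work_conserving(svc) )
    {
        /* Demote within the runq ordering and refill at once. */
        svc->priority_level++;
        svc->cur_budget = svc->budget;
    }
    else
    {
        /*
         * Conventional deferrable-server behaviour: the VCPU waits on
         * the depletedq for its replenishment at the next period.
         */
    }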
 

Re: [Xen-devel] [PATCH 5/6] xen: RTDS: rearrange members of control structures

2017-07-21 Thread Meng Xu
On Fri, Jun 23, 2017 at 6:55 AM, Dario Faggioli
 wrote:
>
> Nothing changed in `pahole` output, in terms of holes
> and padding, but some fields have been moved, to put
> related members in same cache line.
>
> Signed-off-by: Dario Faggioli 
> ---
> Cc: Meng Xu 
> Cc: George Dunlap 
> ---
>  xen/common/sched_rt.c |   13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 1b30014..39f6bee 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -171,11 +171,14 @@ static void repl_timer_handler(void *data);
>  struct rt_private {
>  spinlock_t lock;/* the global coarse-grained lock */
>  struct list_head sdom;  /* list of availalbe domains, used for dump 
> */
> +
>  struct list_head runq;  /* ordered list of runnable vcpus */
>  struct list_head depletedq; /* unordered list of depleted vcpus */
> +
> +struct timer *repl_timer;   /* replenishment timer */
>  struct list_head replq; /* ordered list of vcpus that need 
> replenishment */
> +
>  cpumask_t tickled;  /* cpus been tickled */
> -struct timer *repl_timer;   /* replenishment timer */
>  };
>
>  /*
> @@ -185,10 +188,6 @@ struct rt_vcpu {
>  struct list_head q_elem; /* on the runq/depletedq list */
>  struct list_head replq_elem; /* on the replenishment events list */
>
> -/* Up-pointers */
> -struct rt_dom *sdom;
> -struct vcpu *vcpu;
> -
>  /* VCPU parameters, in nanoseconds */
>  s_time_t period;
>  s_time_t budget;
> @@ -198,6 +197,10 @@ struct rt_vcpu {
>  s_time_t last_start; /* last start time */
>  s_time_t cur_deadline;   /* current deadline for EDF */
>
> +/* Up-pointers */
> +struct rt_dom *sdom;
> +struct vcpu *vcpu;
> +
>  unsigned flags;  /* mark __RTDS_scheduled, etc.. */
>  };
>

Reviewed-by: Meng Xu 

BTW, Dario, I'm wondering if you used any tool to give hints about how
to arrange the fields in a structure, or whether you just did it manually?
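
(For reference, pahole itself can show the holes and cacheline
boundaries directly, e.g.:

    pahole -C rt_vcpu xen/common/sched_rt.o

on an object file built with debug info.)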

Thanks,

Meng

---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] [PATCH v13 01/23] docs: create Cache Allocation Technology (CAT) and Code and Data Prioritization (CDP) feature document

2017-07-10 Thread Meng Xu
On Mon, Jul 10, 2017 at 1:25 AM, Yi Sun  wrote:
> On 17-07-07 12:37:28, Meng Xu wrote:
>> > +  Sample cache capacity bitmasks for a bitlength of 8 are shown below. Please
>> > +  note that all (and only) contiguous '1' combinations are allowed (e.g. FFFFH,
>> > +  0FF0H, 003CH, etc.).
>>
>> IIRC, the number of contiguous '1's in the CBM should be at least 2,
>> at least on my machine (Intel(R) Xeon(R) CPU E5-2618L v3).
>> I'm unsure if this constraint exists for all CAT-capable processors.
>> For those processors that have such constraint, the system may crash
>> when the user sets only 1 bit to the CBM.
>>
> It seems your machine does not officially support CAT. Per my info, some
> machines, e.g. Haswell, do not officially support CAT but you can enable
> CAT through some actions. On these machines, you may encounter such issue.
>
> Per SDM, we do not have such limitation. Per my test on SKL, 1 bit setting
> works well.

I see.
IIRC, Xen CAT won't be enabled on my haswell machine. As long as all
those machines that does not support 1 bit setting are not enabled by
default, that is fine. Otherwise, this may cause a problem.

>
>>
>> > +  - Member `dom_ids`
>> > +
>> > +`dom_ids` is a bitmap, every bit corresponds to a domain. Index is
>> > +domain_id. It is used to help restore domain_id to 0 when a 
>> > socket is
>> > +offline and then online again.
>>
>> Did you mean "it is used to help restore domain_id to COS0, which has
>> all 1s in CBM, when a socket is offline and then online again."
>>
> Sorry, a typo here, should be:
> "It is used to help restore 'd->arch.psr_cos_ids[socket]' to 0 when a socket 
> is
> offline and then online again."

I think this is more clear.
Another statement could be:
It is used to help restore the cos_id of the domain_id to 0 when a
socket is offline and then online again.

>
> If you think it is still not clear, I may add your explanation:
> ", which has all 1s in CBM, "

If you need to send another version, I think it's better to correct it.

Thanks,

Meng

---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] RT-Xen on ARM

2017-07-07 Thread Meng Xu
[sorry, my finger slipped. Let me rephrase my last sentence.]

>
>> For example, if you have a particular task in the VM that must
>> absolutely execute for at least 10ms every 100ms, you can:
>> - inside the VM, pin the task to vCPU 0, and give it top priority;
>> - at the Xen level, give (with RTDS) to vCPU 0 budget=10ms and
>>   period=100ms (or maybe budget of 12ms, to allow for some overhead
>>   :-P).
>
> This assumes that the start of each period of the task is
> synchronized with the start of each period of the VCPU.
> If this assumption does not hold, the VCPU needs a larger budget to
> guarantee the task will always have 10ms in *any* 100ms time interval.
> The budget can be computed by CARTS
> (https://rtg.cis.upenn.edu/carts/).
>
>>
>> This is something that no other scheduler allows you to do. :-)
>
> Exactly.
> If you want to reason about the real-time timing properties required
> by some safety-critical systems' standard, such as the ISO-26262 for
> automotive systems, the above computation and analysis will be
> required.

I mean:
If you want to argue about the real-time timing properties required
by some safety-critical systems' standard, such as the ISO-26262 for
automotive systems, the system support (such as the RTDS scheduler),
the correct configuration of the system, and the corresponding
computation and analysis will be required.

Thanks,

Meng
---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] RT-Xen on ARM

2017-07-07 Thread Meng Xu
On Wed, Jul 5, 2017 at 4:51 AM, Dario Faggioli
 wrote:
> On Mon, 2017-07-03 at 14:42 -0400, Meng Xu wrote:
>> On Mon, Jul 3, 2017 at 10:58 AM, Andrii Anisov wrote:
>> >
>> Once the scheduling policy is determined, you will need to configure
>> the VCPUs' parameters based on the systems' workload.
>> This requires the workload's timing parameters for the CARTS tool to
>> compute the VCPUs' parameters.
>>
> Yes, this is an interesting thing that Meng is mentioning.
>
> RTDS allows you to specify the parameters (budget and period, or,
> depending on how you prefer to see things, utilization and latency) on
> a per-vCPU basis.
>
> This may look cumbersome and overly complicated (and, in fact, if you
> don't need it, you can ignore it :-D), but it may actually be really
> important in a truly RT scenario.
>
> Whether or not it is useful, almost entirely depends on what the VM is
> doing, and how you decide to control and configure things inside it.

Exactly.

Andrii,
If you encountered any question/difficulty in choosing the proper VCPU
parameters for your workload, please don't hesitate to ping me and
Dario.

I'm also trying to make it *easier* for users to *correctly* configure
the VCPUs on RTDS.
The more we understand the real use cases, the more we can help
improve the scheduler and its related tools.

> For example, if you have a particular task in the VM that must
> absolutely execute for at least 10ms every 100ms, you can:
> - inside the VM, pin the task to vCPU 0, and give it top priority;
> - at the Xen level, give (with RTDS) to vCPU 0 budget=10ms and
>   period=100ms (or maybe budget of 12ms, to allow for some overhead
>   :-P).

This assumes that the start of each period of the task is
synchronized with the start of each period of the VCPU.
If this assumption does not hold, the VCPU needs a larger budget to
guarantee the task will always have 10ms in *any* 100ms time interval.
The budget can be computed by CARTS
(https://rtg.cis.upenn.edu/carts/).

>
> This is something that no other scheduler allows you to do. :-)

Exactly.
If you want to reason about the real-time timing properties required
by some safety-critical systems' standard, such as the ISO-26262 for
automotive systems, the above computation and analysis will be
required.


---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] RT-Xen on ARM

2017-07-07 Thread Meng Xu
On Wed, Jul 5, 2017 at 4:29 AM, Dario Faggioli
 wrote:
> On Tue, 2017-07-04 at 11:12 -0400, Meng Xu wrote:
>> On Tue, Jul 4, 2017 at 8:28 AM, Andrii Anisov wrote:
>> >
>> > So you are suggesting to introduce more RT schedulers with
>> > different algorithms. Did I get you right?
>>
>> The EDF scheduling cares about the overall system's RT performance.
>> If
>> you want to guarantee the *soft* real-time performance of the IVI
>> domains and allow the IVI domain to delay the two RT domains in some
>> scheduling periods, the EDF scheduling is better than the RM
>> scheduling. Note that we need to reserve enough CPU resources to make
>> sure the delay from the IVI domain to the two RT domains won't cause
>> the deadline miss of the two RT domains.
>>
> This is technically correct, but, at the same time, I don't think it is
> the best way to describe why and how one should use the RTDS scheduler.

Thank you very much, Dario, for giving a better explanation in the
user's perspectives.

>
> In fact, what scheduling and prioritization strategy is used,
> internally in the scheduler, is (for now) not exposed to the user, and
> it hence should not have an impact in deciding whether or not to adopt
> the scheduler... Unless we've done things in a very wrong way! :-P
>
> What I'd say, as a description of what RTDS can give, to people
> interested in using it, would be as follows.
>
> RTDS gives you the chance to provide your VMs, guarantees of CPU
> utilization that is precise, and has a well defined and strictly
> enforced granularity. In fact, by using RTDS, it's possible to specify
> two things:
> - that a VM should at least be able to execute for a certain U% of
>   total CPU time
> - that a VM will be able to exploit this 'reservation' with a time
>   granularity of P milliseconds.
>
> U, in fact, is expressed as U=B/P, P (called period) is how frequently
> a VM is given a chance to run, while B (called budget) is for how long
> it will be able to run, on every time interval of length P.
>
> So, if, as an example, a VM has a budget of 10 milliseconds and a
> period of 100 milliseconds, this means:
> - the VM will be granted 10% CPU execution time;
> - if an event for the VM arrives at time t1, the VM itself will be
>   able to start processing it no later than t2=t1+2*P-2*B
>
> That's why, IMO, the period matters (a lot!). If one "just" knows that
> a VM will roughly need, say, 40% CPU time, then it does not matter if
> the scheduling parameters are B/P=4/10, or B/P=40/100, or
> B/P=400/1000.
> OTOH, if one also cares about the latency, one should do the math and
> set the period properly.
>
> In fact, this capability of specifying the granularity of a
> reservation, is one of the main differences between RTDS (and, in
> general, or real time scheduling algorithms) and other general purpose
> algorithm. In fact, it is possible with general purpose algorithms too
> (for example, using weights, in Credit1 and Credit2, or using `nice' in
> Linux's CFS) to specify a certain utilization of a VM (task). But, in
> those algorithms, it's impossible to specify precisely, and on a per-VM
> basis, the granularity of such reservation.
>
> The caveat is that, unfortunately, the guarantee does not extend to
> letting you exploit the full capacity. What I mean is that, while on
> uniprocessor systems all that I have said above stays true, with the
> only constraint of not giving, to the various VMs cumulatively, more
> than 100% utilization, on multiprocessors, that is not true. Therefore,
> if you have 4 pCPUs, and you assign the parameters to the various VMs
> in such a way that the sum of B/P of all of them is <= 400%, it's not
> guaranteed that _all_ of them will actually get their B, in every
> interval of length P.
>
> Knowing what the upper bound is, for a given number of pCPU, is not
> easy. A necessary and sufficient limit has (to the best of my
> knowledge, which may not be updated to the current state of the art of
> RT academic literature) yet to be found. There are various limits, and
> various ways of computing them, none of which is suitable to be
> implemented inside an hypervisor... so Xen won't tell you whether or
> not your overall set of parameters is feasible or not. :-(
>
> (Perhaps we could, at least, keep track of the total utilization and at
> least warn the user when we overcome full capacity. Say, if with 4
> pCPUs, we go over 400%, we can well print a warning saying that
> deadlines will be missed. Meng?)
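
To make the latency bound above concrete (a worked example using the
same numbers and the 2*P-2*B bound quoted above): with B=10ms and
P=100ms,

    t2 - t1 = 2*P - 2*B = 2*100ms - 2*10ms = 180ms

i.e. the VM is guaranteed 10% of a CPU, but an unluckily-timed event
may wait up to 180ms before the VM can start processing it.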

The total utilization can help answer if the VCPU parameters are
feasible or not.
But I'm 

Re: [Xen-devel] [PATCH v13 01/23] docs: create Cache Allocation Technology (CAT) and Code and Data Prioritization (CDP) feature document

2017-07-07 Thread Meng Xu
[I just added some of my thoughts while reading this document.]

> +## Hardware perspective
> +
> +  CAT/CDP defines a range of MSRs to assign different cache access patterns
> +  which are known as CBMs, each CBM is associated with a COS.
> +
> +  ```
> +  E.g. L2 CAT:
> +                          +--------------------+----------------+
> +    IA32_PQR_ASSOC        | MSR (per socket)   | Address        |
> +    +-----+-----+-----+   +--------------------+----------------+
> +    |     | COS |     |   | IA32_L2_QOS_MASK_0 | 0xD10          |
> +    +-----+--+--+-----+   +--------------------+----------------+
> +             └----------> | ...                | ...            |
> +                          +--------------------+----------------+
> +                          | IA32_L2_QOS_MASK_n | 0xD10+n (n<64) |
> +                          +--------------------+----------------+
> +  ```
> +
> +  L3 CAT/CDP uses a range of MSRs from 0xC90 ~ 0xC90+n (n<128).
> +
> +  L2 CAT uses a range of MSRs from 0xD10 ~ 0xD10+n (n<64), following the L3
> +  CAT/CDP MSRs, setting different L2 cache accessing patterns from L3 cache 
> is
> +  supported.
> +
> +  Every MSR stores a CBM value. A capacity bitmask (CBM) provides a hint to 
> the
> +  hardware indicating the cache space a domain should be limited to as well 
> as
> +  providing an indication of overlap and isolation in the CAT-capable cache 
> from
> +  other domains contending for the cache.
> +
> +  Sample cache capacity bitmasks for a bitlength of 8 are shown below. Please
> +  note that all (and only) contiguous '1' combinations are allowed (e.g. FFFFH,
> +  0FF0H, 003CH, etc.).

IIRC, the number of contiguous '1's in the CBM should be at least 2,
at least on my machine (Intel(R) Xeon(R) CPU E5-2618L v3).
I'm unsure if this constraint exists for all CAT-capable processors.
For those processors that have such constraint, the system may crash
when the user sets only 1 bit to the CBM.


> +   3. Per-socket PSR features information structure
> +
> +  ```
> +  struct psr_socket_info {
> +  bool feat_init;
> +  struct feat_node *features[PSR_SOCKET_FEAT_NUM];
> +  spinlock_t ref_lock;
> +  unsigned int cos_ref[MAX_COS_REG_CNT];
> +  DECLARE_BITMAP(dom_ids, DOMID_IDLE + 1);
> +  };
> +  ```
> +
> +  We collect all PSR allocation features information of a socket in this
> +  `struct psr_socket_info`.
> +
> +  - Member `feat_init`
> +
> +`feat_init` is a flag, to indicate whether the CPU init on a socket
> +has been done.
> +
> +  - Member `features`
> +
> +`features` is a pointer array to save all enabled features' pointers
> +according to feature position defined in `enum psr_feat_type`.
> +
> +  - Member `ref_lock`
> +
> +`ref_lock` is a spin lock to protect `cos_ref`.
> +
> +  - Member `cos_ref`
> +
> +`cos_ref` is an array which maintains the reference of one COS. It 
> maps
> +to cos_reg_val[MAX_COS_REG_NUM] in `struct feat_node`. If one COS is
> +used by one domain, the corresponding reference will increase by 
> one. If
> +a domain releases the COS, the reference will decrease by one. The 
> array
> +is indexed by COS ID.
> +
> +  - Member `dom_ids`
> +
> +`dom_ids` is a bitmap, every bit corresponds to a domain. Index is
> +    domain_id. It is used to help restore domain_id to 0 when a socket is
> +offline and then online again.

Did you mean "it is used to help restore domain_id to COS0, which has
all 1s in CBM, when a socket is offline and then online again."

Best,

Meng

---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] RT-Xen on ARM

2017-07-04 Thread Meng Xu
On Tue, Jul 4, 2017 at 8:28 AM, Andrii Anisov  wrote:
>
>
> On 03.07.17 21:42, Meng Xu wrote:
>>
>> As far as I know, there is no known issue for ARM as long as that
>> version of Xen runs on the ARM board.
>
>  That's good.
>>
>> I assume you have your own workloads to run, which are periodically
>> activated task.
>> The workloads in [1] are independent periodic CPU-intensive tasks: the
>> task does some computation for every period.
>> If your workloads are similar to the tasks, it should be ok.
>
> Actually now we have just a high-level use case without any specific 
> parameters defined.
> I.e. in an automotive system there should be a domain dedicated to 
> the instrument cluster beside the IVI domain. IC domain should be RT.
> So we are just evaluating and experimenting with an existing functionality.
>
>> One thing in my mind that may affect your evaluations for your real
>> workload is what you want to achieve.
>>
>> The RTDS uses the EDF scheduling, under which the priorities of the
>> VCPUs (or VMs) are dynamically changed based on their (absolute)
>> deadlines. This provides better real-time performance for the
>> *overall* system.
>
> In case we would have a driver domain and IC domain would draw to pv display 
> backed by a backend in a driver domain. Driver domain should be RT capable as
> well.
> So it seems two domains should be RT beside non-RT IVI domain.
>
>> If you want to make one VM highest priority and let that VM preempt
>> other VMs whenever the highest priority VM is active, it's better to
>> use the RM or FP scheduling, instead of the EDF scheduling.
>
> So you are suggesting to introduce more RT schedulers with different 
> algorithms. Did I get you right?

The EDF scheduling cares about the overall system's RT performance. If
you want to guarantee the *soft* real-time performance of the IVI
domains and allow the IVI domain to delay the two RT domains in some
scheduling periods, the EDF scheduling is better than the RM
scheduling. Note that we need to reserve enough CPU resources to make
sure the delay from the IVI domain to the two RT domains won't cause
the deadline miss of the two RT domains.

The RM scheduling guarantees that a domain always has a higher
priority than another domain. If you want to eliminate the CPU delay
from the IVI domain to the other two RT domains, can tolerate some
deadline misses of the IVI domain, and want to consolidate the three
domains to *fewer* cores, the RM scheduling should be a better choice,
IMO.
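
In scheduler terms the difference boils down to the priority
comparison; a sketch (hypothetical, following the RT-Xen design in
which the rest of the scheduler is shared between policies):

    /* EDF: dynamic priority, earlier absolute deadline wins. */
    static bool edf_higher_prio(const struct rt_vcpu *a,
                                const struct rt_vcpu *b)
    {
        return a->cur_deadline < b->cur_deadline;
    }

    /* RM: static priority, shorter period (higher rate) wins. */
    static bool rm_higher_prio(const struct rt_vcpu *a,
                               const struct rt_vcpu *b)
    {
        return a->period < b->period;
    }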

Supporting the RM scheduling policy in the RTDS scheduler is not
difficult. Actually, the RTDS scheduler was designed to be able to
extend to other scheduling policies, such as RM scheduling. In the
RT-Xen project[1], it supports both RM and EDF scheduling policy. We
just choose to upstream the EDF first.

Currently, we are working on synchronizing the RT-Xen with the latest
Xen: we want to implement the RM scheduling policy in the latest Xen.
I'm also teaching/training a master student how to implement the
scheduling policies in the RTDS scheduler so that we can have more
contributors.

I personally am very interested in the realistic use case, especially
the automotive use cases, for the RTDS scheduler. If you have any use
case that we can help to test, please don't hesitate to ask.

[1] https://github.com/PennPanda/RT-Xen

Best,

Meng

---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] RT-Xen on ARM

2017-07-03 Thread Meng Xu
On Mon, Jul 3, 2017 at 10:58 AM, Andrii Anisov  wrote:
> Hello Meng Xu,
>
>
> On 03.07.17 16:35, Meng Xu wrote:
>>>
>>> Do you have any recommendations or suggestions?
>>
>> Which experiment/use case do you plan to run?
>> What are the requirements (or performance guarantees) you want to have
>> from RTDS?
>
> Currently we have no defined target use-cases.
> That's why we are going to keep configuration (of guests and workloads)
> close to [1] for evaluation, but on our target SoC.
> I'm wondering if there are known issues or specifics for ARM.

As far as I know, there is no known issue for ARM as long as that
version of Xen runs on the ARM board.

I assume you have your own workloads to run, which are periodically
activated tasks.
The workloads in [1] are independent periodic CPU-intensive tasks: the
task does some computation for every period.
If your workloads are similar to the tasks, it should be ok.

One thing in my mind that may affect your evaluations for your real
workload is what you want to achieve.

The RTDS uses the EDF scheduling, under which the priorities of the
VCPUs (or VMs) are dynamically changed based on their (absolute)
deadlines. This provides better real-time performance for the
*overall* system.
If you want to make one VM highest priority and let that VM preempt
other VMs whenever the highest priority VM is active, it's better to
use the RM or FP scheduling, instead of the EDF scheduling.

Once the scheduling policy is determined, you will need to configure
the VCPUs' parameters based on the systems' workload.
This requires the workload's timing parameters for the CARTS tool to
compute the VCPUs' parameters.


Best,

Meng


---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] [RTDS Patch v3 for Xen4.8]

2017-07-03 Thread Meng Xu
Hi Haoran,

On Mon, Jul 3, 2017 at 12:17 PM, Haoran Li  wrote:
>
> From: naroahlee 
>
>  When two or more idle VCPUs that have
>  the same PCPU as their previous running core invoke runq_tickle(), they will
>  tickle the same PCPU. The tickled PCPU will only pick at most one VCPU, i.e.,
>  the highest-priority one, to execute. The other VCPUs will not be scheduled
>  for a period, even when there is an idle core, making these VCPUs
>  unnecessarily starve for one period. Therefore, always make sure that we only
>  tickle PCPUs that have not been tickled already.
>
> Signed-off-by: Haoran Li 
> Reviewed-by:   Meng Xu   


As Dario mentioned in the email, the title should be changed and the
email should be a new email thread, instead of a forwarded email.

A reference to the format of sending a newer version of patch can be
found at https://www.mail-archive.com/xen-devel@lists.xen.org/msg60115.html

In the commit message, you can add
---
Changes to the v1
---
to state the changes made from the previous versions.
You can also refer to the previous discussion with a link in that section.
This makes the reviewers' life easier.
The change log won't be committed.

Could you please send another version after resolving the concerns
raised by Dario and me?

Don't hesitate to ping me if you have any question.

Thanks,

Meng

-- 
---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] RT-Xen on ARM

2017-07-03 Thread Meng Xu
On Mon, Jul 3, 2017 at 7:03 AM, Andrii Anisov  wrote:
>
> Dear Meng Xu,


Hi Andrii,
>
>
> We are going to evaluate an RTDS scheduler on ARM.
>
>
> Basically I'm going to repeat use-cases described in 
> https://www.cis.upenn.edu/~linhphan/papers/emsoft14-rt-xen.pdf in some amount.


I see. Please don't hesitate to ask if you have any question about the results.

>
> Do you have any recommendations or suggestions?


Which experiment/use case do you plan to run?
What are the requirements (or performance guarantees) you want to have
from RTDS?


The configuration for the VCPUs depends on the tasks and the OS
scheduler running on the VCPUs.
The VCPU's utilization (budget/period) is usually larger than the
tasks' utilizations (\sum e_i / p_i), where e_i is the task's
worst-case execution time, and p_i is the task's period.
The VCPU's parameters can be calculated by the CARTS tool [1],

[1] https://rtg.cis.upenn.edu/carts/index.php
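
As a toy illustration of the "usually larger" point (numbers invented,
not CARTS output): two tasks tau_1 = (e_1=2ms, p_1=10ms) and
tau_2 = (e_2=3ms, p_2=15ms) on one VCPU give

    \sum e_i / p_i = 2/10 + 3/15 = 0.2 + 0.2 = 0.4

so the VCPU serving them needs budget/period strictly above 0.4 (e.g.
5ms/10ms = 0.5); how much above depends on the guest scheduler, which
is exactly what CARTS computes.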


>
>
> BTW, even following 
> https://xenbits.xen.org/docs/unstable/features/sched_rtds.html I've faced 
> several issues, not rtds one, but nasty:
>
> - no xentop for ARM


Did you try "sudo xl top"?
IIRC, sudo xl top should work.
>
>
> - root@salvator-x-h3-xt:/scripts# xl sched-rtds -d DomU -v all -p 1 -b 
> 2500
>   (XEN) FLASK: Denying unknown domctl_scheduler_op: 2.
>   libxl: error: libxl_sched.c:663:sched_rtds_vcpu_set_all: Domain 2:Setting 
> vcpu sched rtds: Operation not permitted
>   libxl_vcpu_sched_params_set_all failed.


Which version of Xen or commit point did you use?

Thanks,

Meng



-- 
---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] Question about the general performance counter overflow interrupt handling

2017-03-31 Thread Meng Xu
Hi Boris,

On Fri, Mar 31, 2017 at 12:01 PM, Boris Ostrovsky
 wrote:
>
>>> When I program the general performance counter to trigger an overflow
>>> interrupt, I set the following bits for the event selector register
>>> and run a task to generate the L3 cache cache miss.
>>> FLAG_ENABLE: 0x40UL
>>> FLAG_INT:0x10UL
>>> FLAG_USR: 0x01UL
>>> L3_ALLMISS_EVENT0x2E
>>> L3_ALLMISS_MESI 0x41
>>>
>>> I'm sure the performance counter does overflow, but I didn't see any
>>> interrupt was triggered. Maybe I missed something?
>
> Did you program global registrers (MSR_CORE_PERF_GLOBAL_CTRL,
> MSR_CORE_PERF_GLOBAL_OVF_CTRL)?

I tried two scenarios:
Scenario 1)
MSR_CORE_PERF_GLOBAL_CTRL (0x38F) = 0x0
MSR_CORE_PERF_GLOBAL_OVF_CTRL (0x390) = 0x0
The function pmu_apic_interrupt() is not called.
Scenario 2)
MSR_CORE_PERF_GLOBAL_CTRL (0x38F) = 0xff
MSR_CORE_PERF_GLOBAL_OVF_CTRL (0x390) = 0x0
The function pmu_apic_interrupt() is not called either.

In both scenarios, the IA32_PERF_GLOBAL_STATUS (0x38E) is 0xf.

I tried to set MSR_CORE_PERF_GLOBAL_OVF_CTRL to 0xf, but the
register's content is not changed. :(

Maybe I should set the MSR_CORE_PERF_GLOBAL_OVF_CTRL to 0xF to enable
the overflow interrupt?

Thank you very much for your time and help!

Meng

---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] Question about the general performance counter overflow interrupt handling

2017-03-31 Thread Meng Xu
[Sorry, I cc.ed Quan's previous email at Intel. Change to his current email.]

On Fri, Mar 31, 2017 at 11:41 AM, Meng Xu  wrote:
> Hi Jan and Boris,
>
> I'm Meng Xu from the University of Pennsylvania.
>
> I'm wondering:
> How does Xen (vpmu) handle the general performance counter's overflow 
> interrupt?
> Could you point me to the function handler, if Xen does handle it?
>
> ---What I want to achieve---
> I'm looking at the real-time performance in Xen.
> I want to profile the system's status for every K L3 cache misses from
> a specific core.
> I plan to program the general performance counter to -K to trigger an
> overflow interrupt. In the interrupt handler, I plan to check the
> system's status and give hints to the scheduler.
>
> --- What I have tried ---
> I want to find the interrupt handler and plug in another function.
> 1) I checked Xen's vpmu command option, it does not say vpmu handles
> the general performance counter's overflow interrupt.
>
> 2) I also added a function inside pmu_apic_interrupt() in apic.c.
> However, it seems that the pmu_apic_interrupt() is not triggered when
> the general performance counter overflows.
>
> When I program the general performance counter to trigger an overflow
> interrupt, I set the following bits for the event selector register
> and run a task to generate L3 cache misses.
> FLAG_ENABLE: 0x40UL
> FLAG_INT: 0x10UL
> FLAG_USR: 0x01UL
> L3_ALLMISS_EVENT 0x2E
> L3_ALLMISS_MESI 0x41
>
> I'm sure the performance counter does overflow, but I didn't see any
> interrupt being triggered. Maybe I missed something?
>
> Thank you very much for your help and time!
>
> Best regards,
>
> Meng
> ---
> Meng Xu
> PhD Candidate in Computer and Information Science
> University of Pennsylvania
> http://www.cis.upenn.edu/~mengxu/

-- 
---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



[Xen-devel] Question about the general performance counter overflow interrupt handling

2017-03-31 Thread Meng Xu
Hi Jan and Boris,

I'm Meng Xu from the University of Pennsylvania.

I'm wondering:
How does Xen (vpmu) handle the general performance counter's overflow interrupt?
Could you point me to the function handler, if Xen does handle it?

---What I want to achieve---
I'm looking at the real-time performance in Xen.
I want to profile the system's status for every K L3 cache misses from
a specific core.
I plan to program the general performance counter to -K to trigger an
overflow interrupt. In the interrupt handler, I plan to check the
system's status and give hints to the scheduler.

--- What I have tried ---
I want to find the interrupt handler and plug in another function.
1) I checked Xen's vpmu command option, it does not say vpmu handles
the general performance counter's overflow interrupt.

2) I also added a function inside pmu_apic_interrupt() in apic.c.
However, it seems that the pmu_apic_interrupt() is not triggered when
the general performance counter overflows.

When I program the general performance counter to trigger an overflow
interrupt, I set the following bits for the event selector register
and run a task to generate L3 cache misses.
FLAG_ENABLE: 0x40UL
FLAG_INT: 0x10UL
FLAG_USR: 0x01UL
L3_ALLMISS_EVENT 0x2E
L3_ALLMISS_MESI 0x41

I'm sure the performance counter does overflow, but I didn't see any
interrupt being triggered. Maybe I missed something?
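
For reference, the programming sequence being attempted would look
roughly like the sketch below (raw MSR addresses and bit positions per
the Intel SDM; whether the masked LVTPC entry is the actual culprit is
only a guess here, not something confirmed in this thread):

    /* IA32_PERFEVTSELx bit positions, per the SDM: */
    #define EVTSEL_EN   (1ULL << 22)  /* enable the counter */
    #define EVTSEL_INT  (1ULL << 20)  /* APIC interrupt on overflow */
    #define EVTSEL_USR  (1ULL << 16)  /* count user-level events */

    uint64_t K = 100000;                  /* sample every K L3 misses */

    wrmsrl(0xc1, (uint64_t)-K);           /* IA32_PMC0 = -K: overflow after K */
    wrmsrl(0x186, EVTSEL_EN | EVTSEL_INT | EVTSEL_USR |
                  (0x41ULL << 8) | 0x2eULL); /* PERFEVTSEL0: event 0x2E/umask 0x41 */
    wrmsrl(0x38f, 0x1);                   /* IA32_PERF_GLOBAL_CTRL: enable PMC0 */
    apic_write(APIC_LVTPC, PMU_APIC_VECTOR); /* unmask the local APIC PMI entry */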

Thank you very much for your help and time!

Best regards,

Meng
---
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] [RTDS Patch v2 for Xen4.8] xen: rtds: only tickle non-already tickled CPUs

2017-02-25 Thread Meng Xu
On Fri, Feb 24, 2017 at 4:54 PM, Haoran Li  wrote:
> From: naroahlee 
>
> Bug Analysis:
> When two or more idle VCPUs that have the same PCPU as their
> previous running core invoke runq_tickle(), they will tickle the same
> PCPU. The tickled PCPU will only pick at most one VCPU, i.e., the
> highest-priority one, to execute. The other VCPUs will not be
> scheduled for a period, even when there is an idle core, making these
> VCPUs unnecessarily starve for one period.
> Therefore, always make sure that we only tickle PCPUs that have not
> been tickled already.
> ---
>  xen/common/sched_rt.c | 26 --
>  1 file changed, 12 insertions(+), 14 deletions(-)
>
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 1b30014..012975c 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -1144,9 +1144,10 @@ rt_vcpu_sleep(const struct scheduler *ops, struct vcpu 
> *vc)
>   * Called by wake() and context_saved()
>   * We have a running candidate here, the kick logic is:
>   * Among all the cpus that are within the cpu affinity
> - * 1) if the new->cpu is idle, kick it. This could benefit cache hit
> - * 2) if there are any idle vcpu, kick it.
> - * 3) now all pcpus are busy;
> + * 1) if there are any idle vcpu, kick it.
> + *For cache benefit, we first search new->cpu.
> + *

Please ditch this empty line.

> + * 2) now all pcpus are busy;
>   *among all the running vcpus, pick lowest priority one
>   *if snext has higher priority, kick it.
>   *
> @@ -1174,17 +1175,11 @@ runq_tickle(const struct scheduler *ops, struct 
> rt_vcpu *new)
>  cpumask_and(¬_tickled, online, new->vcpu->cpu_hard_affinity);
>  cpumask_andnot(¬_tickled, ¬_tickled, &prv->tickled);
>
> -/* 1) if new's previous cpu is idle, kick it for cache benefit */
> -if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) )
> -{
> -SCHED_STAT_CRANK(tickled_idle_cpu);
> -cpu_to_tickle = new->vcpu->processor;
> -goto out;
> -}
> -
> -/* 2) if there are any idle pcpu, kick it */
> +/* 1) if there are any idle pcpu, kick it */
>  /* The same loop also find the one with lowest priority */
> -for_each_cpu(cpu, ¬_tickled)
> +   /* For cache benefit, we search new->cpu first */
> +cpu = cpumask_test_or_cycle(new->vcpu->processor, ¬_tickled);
> +while ( cpu != nr_cpu_ids )
>  {
>  iter_vc = curr_on_cpu(cpu);
>  if ( is_idle_vcpu(iter_vc) )
> @@ -1197,9 +1192,12 @@ runq_tickle(const struct scheduler *ops, struct 
> rt_vcpu *new)
>  if ( latest_deadline_vcpu == NULL ||
>   iter_svc->cur_deadline > latest_deadline_vcpu->cur_deadline )
>  latest_deadline_vcpu = iter_svc;
> +
> +cpumask_clear_cpu(cpu, ¬_tickled);
> +cpu = cpumask_cycle(cpu, ¬_tickled);
>  }
>
> -/* 3) candicate has higher priority, kick out lowest priority vcpu */
> +/* 2) candicate has higher priority, kick out lowest priority vcpu */
>  if ( latest_deadline_vcpu != NULL &&
>   new->cur_deadline < latest_deadline_vcpu->cur_deadline )
>  {
> --
> 1.9.1
>
> ---
> CC: 
> CC: 

The code looks good to me.

The empty line in the comment may not be a big deal.

Reviewed-by: Meng Xu 

Thanks,

Meng
---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/



Re: [Xen-devel] [RFC 01/16] docs: create Memory Bandwidth Allocation (MBA) feature document.

2017-02-24 Thread Meng Xu
>> > +  System administrator can change PSR allocation policy at runtime by
>> > +  tool stack. Since MBA shares COS with CAT/CDP, a COS corresponds to a
>> > +  2-tuple, like [CBM, Thrtl] with only-CAT enabled; when CDP is enabled,
>> > +  the COS corresponds to a 3-tuple, like [Code_CBM, Data_CBM, Thrtl]. If
>> > +  neither CAT nor CDP is enabled, things would be easier, one COS
>> > +  corresponds to one Thrtl.
>>
>> How many bits in Thrtl field?
>> Is it decided by the hardware type?
>>
> This is defined in SDM.
> "The definition for the MBA delay value MSRs is provided in Figure 17.39. The
> lower 16 bits are used for MBA delay values, and values from zero to the 
> maximum
> from the CPUID MBA_MAX-1 value are supported."
>
> Please note, the MBA value is different from the CBM. You do not need to care about the bits.
>
>> > +# References
>> > +
>> > +"INTEL® RESOURCE DIRECTOR TECHNOLOGY (INTEL® RDT) ALLOCATION FEATURES" 
>> > [Intel® 64 and IA-32 Architectures Software Developer Manuals, 
>> > vol3](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html)
>> > +
>>
>> I checked the document. The CAT is in Chapter 17.17. However, there is
>> no description about the MBA? ;-)
> Have you downloaded the latest SDM? 17.18.7 is for MBA.

Ah-ha, I saw it now. I guess I downloaded the old version. :-)

I find this MBA feature interesting. Is there any processor on the
market that we can purchase?
We'd like to evaluate this feature. ;-)

Thanks,

Meng

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC 01/16] docs: create Memory Bandwidth Allocation (MBA) feature document.

2017-02-23 Thread Meng Xu
Hi Yi,

I have some quick comments about this document. Some minor points are
not very clear, IMHO.

> +
> +  2. `psr-mba-set [OPTIONS] domain-id throttling`:
> +
> + Set memory bandwidth throttling for domain.
> +
> + Options:
> + '-s': Specify the socket to process, otherwise all sockets are 
> processed.
> +
> + Throttling value set in register implies memory bandwidth blocked, i.e.
> + higher throttling value results in lower bandwidth. The max throttling
> + value can be got through CPUID.
> +
> + The response of the throttling value could be linear mode or non-linear
> + mode.
> +
> + Linear mode: the input precision is defined as 100-(MBA_MAX). For 
> instance,
> + if the MBA_MAX value is 90, the input precision is 10%. Values not an 
> even
> + multiple of the precision (e.g., 12%) will be rounded down (e.g., to 10%
> + delay applied) by HW automatically.

So MBA has a minimum allocation unit. What is the minimum bandwidth
allocation unit?
From the above example, I had the impression that the allocation unit is 10%.
As mentioned later in the document, the throttle value is set in the
COS register's Thrtl bit field, as shown in [Code_CBM, Data_CBM,
Thrtl]. I had the impression that the maximum number of bandwidth
units we can allocate is 2^number_of_bits_in_Thrtl.
Only one of these impressions can be true, right? ;-)

In addition, since the hardware will round down a partial bandwidth
value, why not restrict system operators to the "valid" bandwidth
values supported by the hardware?
For example, if the hardware only supports bandwidth throttle
values in 10% units, then we should not allow users to input a
bandwidth throttle value such as 12% or 13%. Otherwise, as a system
operator, I would be confused about why, after I increased the
bandwidth throttle value from 11% to 19%, I still see the same
bandwidth guarantee.
> +
> + Non-linear mode: input delay values are powers-of-two from zero to the
> + MBA_MAX value from CPUID. In this case any values not a power of two will
> + be rounded down to the next nearest power of two by HW automatically.

First question: Why is it the delay value instead of bandwidth value
in the non-linear mode? Does MBA really control memory access latency?

Second question: Does the hardware provide any guaranteed bandwidth in
the non-linear mode?
I saw the document patch in Linux at
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1307176.html:
[Quote]
In nonlinear scale currently SDM specifies
+throttle values in 2^n values. However the h/w does not guarantee a
+specific curve for the amount of memory b/w that is actually throttled.
+But for any thrtl_by value x > y, its guaranteed that x would throttle
+more b/w than y.  The info directory specifies the max thrtl_by value
+and thrtl_by granularity.
[/Quote]

It seems that the non-linear mode simply provides some throttling
relations but doesn't guarantee the actual throttle value.
Maybe it would be good to clearly state the capabilities and
limitations of the hardware.
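
Similarly, my understanding of the non-linear rounding is something
like (again, just a sketch for discussion):

    /*
     * Hypothetical sketch: non-linear mode rounds the requested delay
     * value down to the nearest power of two (bounded by MBA_MAX).
     */
    unsigned int mba_nonlinear_round(unsigned int requested)
    {
        unsigned int v = 1;

        if ( requested == 0 )
            return 0;
        while ( v * 2 <= requested )
            v *= 2;

        return v; /* e.g., 12 -> 8, 100 -> 64 */
    }

But, per the Linux document quoted above, even then the hardware only
guarantees the relative ordering of throttling, not the actual amount.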

> +  System administrator can change PSR allocation policy at runtime by
> +  tool stack. Since MBA shares COS with CAT/CDP, a COS corresponds to a
> +  2-tuple, like [CBM, Thrtl] with only-CAT enabled; when CDP is enabled,
> +  the COS corresponds to a 3-tuple, like [Code_CBM, Data_CBM, Thrtl]. If
> +  neither CAT nor CDP is enabled, things would be easier, one COS
> +  corresponds to one Thrtl.
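
If I parse this correctly, each COS entry would conceptually look
something like this (purely illustrative; these are not actual Xen
structures):

    struct cos_entry {
        uint64_t code_cbm; /* with CDP enabled; otherwise a single CBM */
        uint64_t data_cbm; /* with CDP enabled */
        uint64_t thrtl;    /* MBA throttle value; width unclear to me */
    };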

How many bits are in the Thrtl field?
Is it decided by the hardware type?

> +# References
> +
> +"INTEL® RESOURCE DIRECTOR TECHNOLOGY (INTEL® RDT) ALLOCATION FEATURES" 
> [Intel® 64 and IA-32 Architectures Software Developer Manuals, 
> vol3](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html)
> +

I checked the document. The CAT is in Chapter 17.17. However, there is
no description about the MBA? ;-)

Thanks,

Meng

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RTDS Patch for Xen4.8] xen: sched_rt.c Check not_tickled Mask

2017-02-22 Thread Meng Xu
Hi Haoran,

Thank you for sending this patch out quickly! :-)

The title can be
[PATCH] xen: rtds: only tickle the same cpu once

On Wed, Feb 22, 2017 at 5:16 PM, Haoran Li  wrote:
> From: naroahlee 
>
>Modify: xen/common/sched_rt.c runq_tickle(): Check not_tickled Mask
> for a Cache-Preferenced-PCPU

No need for this.

>
> The bug is introduced in Xen 4.7 when we converted RTDS scheduler from
> quantum-driven model to event-driven model. We assumed whenever
> runq_tickle() is invoked, we will find a PCPU via a NOT-tickled mask.
> However, in runq_tickle(): Case1: Pick Cache Preference
> IDLE-PCPU is NOT masked by the not-tickled CPU mask.
>
> Buggy behavior:
> When two VCPUs tried to tickle a IDLE-VCPU, which is now on their
> cache-preference PCPU, these two VCPU will tickle the same PCPU in a row.
> However, only one VCPU is guranteed to be scheduled, because runq_pick()
> would be executed only once in rt_schedule().
> That means, another VCPU will lost (be descheduled) a Period.
>
> Bug Analysis:
> We need to exclude tickled VCPUs when trying to evaluate runq_tickle() case 1

Change the description to the following:

When multiple idle VCPUs that have the same PCPU as their
previous running core invoke runq_tickle(), they will tickle the same
PCPU. The tickled PCPU will only pick at most one VCPU, i.e., the
highest-priority one, to execute. The other VCPUs will not be
scheduled for a period, even when there is an idle core, making these
VCPUs unnecessarily starve for one period.

To fix this issue, we should always tickle the non-tickled PCPU in the
runq_tickle().

Meng

>
> Signed-off-by: Haoran Li 
> ---
>  xen/common/sched_rt.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 1b30014..777192f 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -1175,7 +1175,8 @@ runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
>  cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
>
>  /* 1) if new's previous cpu is idle, kick it for cache benefit */
> -if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) )
> +if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) &&
> + cpumask_test_cpu(new->vcpu->processor, &not_tickled))

You should have a space before the last ).
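
i.e., the condition would then read:

    if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) &&
         cpumask_test_cpu(new->vcpu->processor, &not_tickled) )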

Can you resend the patch with the comments resolved?

Thanks,

Meng

-- 

Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen: sched: improve debug dump output.

2017-01-27 Thread Meng Xu
On Thu, Jan 26, 2017 at 11:52 AM, Dario Faggioli
 wrote:
> Scheduling information debug dump for Credit2 is hard
> to read as it contains the same information repeated
> multiple times in different ways.
>
> In fact, in Credit2, CPUs are grouped in runqueues.
> Here's the current debug output:
>
>  CPU[00]  sibling=,0003, core=,00ff
> run: [32767.0] flags=0 cpu=0 credit=-1073741824 [w=0] load=0 (~0%)
>   1: [0.3] flags=0 cpu=2 credit=3273410 [w=256] load=262144 (~100%)
>   2: [0.4] flags=0 cpu=2 credit=2974954 [w=256] load=262144 (~100%)
>  CPU[01]  sibling=,0003, core=,00ff
> run: [32767.1] flags=0 cpu=1 credit=-1073741824 [w=0] load=0 (~0%)
>   1: [0.3] flags=0 cpu=2 credit=3273410 [w=256] load=262144 (~100%)
>   2: [0.4] flags=0 cpu=2 credit=2974954 [w=256] load=262144 (~100%)
>  CPU[02]  sibling=,000c, core=,00ff
> run: [0.2] flags=2 cpu=2 credit=3556909 [w=256] load=262144 (~100%)
>   1: [0.3] flags=0 cpu=2 credit=3273410 [w=256] load=262144 (~100%)
>   2: [0.4] flags=0 cpu=2 credit=2974954 [w=256] load=262144 (~100%)
>
> Here, CPUs 0, 1 and 2, are all part of runqueue 0,
> the content of which (which, BTW, is d0v3 and d0v4)
> is printed 3 times! It is also not very useful to
> see the details of the idle vcpus, as they're always
> the same (except for the vCPU ids).
>
> With this change, we print:
>  - pCPUs details and, for non idle ones, what vCPU
>they're running;
>  - the runqueue content, once and for all.
>
>  Runqueue 0:
>  CPU[00] runq=0, sibling=,0003, core=,00ff
> run: [0.15] flags=2 cpu=0 credit=5804742 [w=256] load=3655 (~1%)
>  CPU[01] runq=0, sibling=,0003, core=,00ff
>  CPU[02] runq=0, sibling=,000c, core=,00ff
> run: [0.3] flags=2 cpu=2 credit=6674856 [w=256] load=262144 (~100%)
>  CPU[03] runq=0, sibling=,000c, core=,00ff
>  RUNQ:
>   0: [0.1] flags=0 cpu=2 credit=6561215 [w=256] load=262144 (~100%)
>   1: [0.2] flags=0 cpu=2 credit=5812356 [w=256] load=262144 (~100%)
>
> Stop printing details of idle vCPUs also in Credit1
> and RTDS (they're pretty useless in there too).
>
> Signed-off-by: Dario Faggioli 
> ---
> Cc: George Dunlap 
> Cc: Anshul Makkar 
> Cc: Meng Xu 
> ---
>  xen/common/sched_credit.c  |6 ++--
>  xen/common/sched_credit2.c |   72 +-------
>  xen/common/sched_rt.c  |9 +-
>  xen/common/schedule.c  |7 ++--
>  4 files changed, 49 insertions(+), 45 deletions(-)

As to the sched_rt.c,
Acked-by: Meng Xu 

Thanks,

Meng

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen: sched: improve debug dump output.

2017-01-27 Thread Meng Xu
>
> > > TBH, what I don't especially like in the output above is, within
> > > the
> > > vCPU info being printed:
> > >  - the spaces inside '[' ']';
> > >  - the big numbers;
> > >  - the fact that last_start is rather useless (it's more tracing
> > > info
> > >than debug dump info, IMO);
> >
> > I feel the last_start is useful, at least to identify the previous
> > subtle bug in budget accounting. It tells us when the running VCPU
> > was
> > scheduled and indicates how the budget will be burned later.
> > When I saw the last_start is in the previous period and the
> > burn_budget() still use the old last_start to burn budget for the
> > current period, I figured out the bug.
> >
> That's ok, we're all different... That's what makes the World so
> beautiful. :-)
>
> > >  - the fact that the various queues info and CPUs info are not
> > >displaed closer, and they even have "Domain info:" in between
> > > them
> > >(which is because of schedule.c, not sched_rt.c, I know);
> > >  - the word "info" after "Global RunQueue", "Global DepletedQueue",
> > >"Global Replenishment Events";
> > >  - the word "Global", in the names above;
> > >  - the onQ and runnable flags being printed, which I don't is
> > > really
> > >necessary or useful;
> > >  - the lack of scheduler wide information (e.g., the tickled mask,
> > > the
> > >next timeout of the replenishment timer, etc);
> > >
> > > But this is material for another patch. :-)
> >
> > I agree with all of the above output improvements, except for killing
> > the last_start info. :-)
> >
> Ok. I'm not yet working on a patch that does remove it. If/when I will
> and send it, you're more than entitled, and have the necessary power,
> to Nack it. ;-P


OK. Once this series of patches gets in, I can send a patch to fix this output.
>
>
> > > Going back to printing "idle" or not, also remember that this is
> > > debug
> > > output, meant at being mostly useful for developers, or with help
> > > from
> > > developers. And developers can easily check in the code what having
> > > just the CPU ID printed means (in case it's not obvious, which I
> > > think
> > > it is, or they don't remember).
> > >
> > > That being said, it's not that I can't live with the added "idle"
> > > indication. But I like it less and would prefer not to add it.
> >
> > Sure! I was thinking if we should even avoid printing out the idle
> > CPU
> > number to make the output more concise on an idle systems.
> >
> Yeah, that was also my first doing. But, then, looking at the output, I
> found it a little bit too obfuscated the info about what pCPUs are
> actually in use within the scheduler (think of the case where it's not
> default, and is in a cpupool).
>
> So I ended up deciding to leave it there, all of them, idle or not.
>
> > After seeing the complete output, I think the current output is fine.
> >
> Great! So, you'll send your Acked-by soon?


Yep. I'm replying to your original email.

Meng
---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen: sched: improve debug dump output.

2017-01-27 Thread Meng Xu
On Thu, Jan 26, 2017 at 5:08 PM, Dario Faggioli
 wrote:
> On Thu, 2017-01-26 at 13:59 -0500, Meng Xu wrote:
>> Hi Dario,
>>
> Hi,
>
>> On Thu, Jan 26, 2017 at 11:52 AM, Dario Faggioli
>>  wrote:
>> >
>> >  Runqueue 0:
>> >  CPU[00] runq=0, sibling=,0003, core=,00ff
>> > run: [0.15] flags=2 cpu=0 credit=5804742 [w=256] load=3655
>> > (~1%)
>> >  CPU[01] runq=0, sibling=,0003, core=,00ff
>> >  CPU[02] runq=0, sibling=,000c, core=,00ff
>> > run: [0.3] flags=2 cpu=2 credit=6674856 [w=256] load=262144
>> > (~100%)
>> >  CPU[03] runq=0, sibling=,000c, core=,00ff
>> >  RUNQ:
>>
>> What is the difference between RUNQ and Runqueue 0 in the message?
>>
> Right. So, this is more comprehensive output:
>
> (XEN) [ 2797.156864] Cpupool 0:
> (XEN) [ 2797.156866] Cpus: 0-5,10-15
> (XEN) [ 2797.156868] Scheduler: SMP Credit Scheduler rev2 (credit2)
> (XEN) [ 2797.156871] Active queues: 2
> (XEN) [ 2797.156873]default-weight = 256
> (XEN) [ 2797.156876] Runqueue 0:
> (XEN) [ 2797.156878]ncpus  = 6
> (XEN) [ 2797.156879]cpus   = 0-5
> (XEN) [ 2797.156881]max_weight = 256
> (XEN) [ 2797.156882]instload   = 5
> (XEN) [ 2797.156884]aveload= 1052984 (~401%)
> (XEN) [ 2797.156887]idlers: ,002a
> (XEN) [ 2797.156889]tickled: ,
> (XEN) [ 2797.156891]fully idle cores: ,
> (XEN) [ 2797.156894] Runqueue 1:
> (XEN) [ 2797.156896]ncpus  = 6
> (XEN) [ 2797.156897]cpus   = 10-15
> (XEN) [ 2797.156899]max_weight = 256
> (XEN) [ 2797.156900]instload   = 4
> (XEN) [ 2797.156902]aveload= 1061305 (~404%)
> (XEN) [ 2797.156904]idlers: ,e800
> (XEN) [ 2797.156906]tickled: ,
> (XEN) [ 2797.156908]fully idle cores: ,c000
> (XEN) [ 2797.156910] Domain info:
> (XEN) [ 2797.156912]Domain: 0 w 256 v 16
> (XEN) [ 2797.156914]  1: [0.0] flags=2 cpu=4 credit=-476120314 [w=256] 
> load=85800 (~32%)
> (XEN) [ 2797.156919]  2: [0.1] flags=0 cpu=2 credit=5630728 [w=256] 
> load=262144 (~100%)
> (XEN) [ 2797.156923]  3: [0.2] flags=0 cpu=2 credit=4719251 [w=256] 
> load=262144 (~100%)
> (XEN) [ 2797.156928]  4: [0.3] flags=2 cpu=2 credit=5648202 [w=256] 
> load=262144 (~100%)
> (XEN) [ 2797.156933]  5: [0.4] flags=2 cpu=12 credit=2735243 [w=256] 
> load=262144 (~100%)
> (XEN) [ 2797.156939]  6: [0.5] flags=2 cpu=12 credit=2721770 [w=256] 
> load=262144 (~100%)
> (XEN) [ 2797.156945]  7: [0.6] flags=0 cpu=12 credit=2150753 [w=256] 
> load=262144 (~100%)
> (XEN) [ 2797.156950]  8: [0.7] flags=0 cpu=14 credit=10424341 [w=256] 
> load=2836 (~1%)
> (XEN) [ 2797.156986]  9: [0.8] flags=0 cpu=4 credit=1050 [w=256] 
> load=14 (~0%)
> (XEN) [ 2797.156991] 10: [0.9] flags=0 cpu=14 credit=1050 [w=256] 
> load=12 (~0%)
> (XEN) [ 2797.156995] 11: [0.10] flags=0 cpu=5 credit=9204778 [w=256] 
> load=7692 (~2%)
> (XEN) [ 2797.156999] 12: [0.11] flags=0 cpu=1 credit=10501097 [w=256] 
> load=2791 (~1%)
> (XEN) [ 2797.157004] 13: [0.12] flags=0 cpu=4 credit=1050 [w=256] 
> load=28 (~0%)
> (XEN) [ 2797.157008] 14: [0.13] flags=0 cpu=11 credit=1050 [w=256] 
> load=19 (~0%)
> (XEN) [ 2797.157013] 15: [0.14] flags=0 cpu=14 credit=1050 [w=256] 
> load=388 (~0%)
> (XEN) [ 2797.157017] 16: [0.15] flags=0 cpu=3 credit=9832716 [w=256] 
> load=7326 (~2%)
> (XEN) [ 2797.157022]Domain: 1 w 256 v 2
> (XEN) [ 2797.157024] 17: [1.0] flags=2 cpu=10 credit=-1085114190 [w=256] 
> load=261922 (~99%)
> (XEN) [ 2797.157029] 18: [1.1] flags=0 cpu=14 credit=1050 [w=256] 
> load=0 (~0%)
> (XEN) [ 2797.157033]Domain: 2 w 256 v 2
> (XEN) [ 2797.157035] 19: [2.0] flags=2 cpu=0 credit=-593239186 [w=256] 
> load=47389 (~18%)
> (XEN) [ 2797.157040] 20: [2.1] flags=0 cpu=11 credit=1050 [w=256] 
> load=0 (~0%)
> (XEN) [ 2797.157044] Runqueue 0:
> (XEN) [ 2797.157047] CPU[00] runq=0, sibling=,0003, 
> core=,00ff
> (XEN) [ 2797.157050]run: [2.0] flags=2 cpu=0 credit=-593239186 [w=256] 
> load=47389 (~18%)
> (XEN) [ 2797.157055] CPU[01] runq=0, sibling=,0003, 
> core=,00ff
> (XEN) [ 2797.157058] CPU[02] runq=0, sibling=,000c, 
> core=,00ff
> (XEN) [ 2797.157061]run: [0.3] flags=2 cpu=2 credit=5648202 [w=256] 
> load=262144 (~100%)
> (XEN) [ 2797.157066] CPU[03] runq=0, 

Re: [Xen-devel] [PATCH] xen: sched: improve debug dump output.

2017-01-26 Thread Meng Xu
Hi Dario,

I'm commenting on the rtds part.

On Thu, Jan 26, 2017 at 11:52 AM, Dario Faggioli
 wrote:
> Scheduling information debug dump for Credit2 is hard
> to read as it contains the same information repeated
> multiple times in different ways.
>
> In fact, in Credit2, CPUs are grouped in runqueues.
> Here's the current debug output:
>
>  CPU[00]  sibling=,0003, core=,00ff
> run: [32767.0] flags=0 cpu=0 credit=-1073741824 [w=0] load=0 (~0%)
>   1: [0.3] flags=0 cpu=2 credit=3273410 [w=256] load=262144 (~100%)
>   2: [0.4] flags=0 cpu=2 credit=2974954 [w=256] load=262144 (~100%)
>  CPU[01]  sibling=,0003, core=,00ff
> run: [32767.1] flags=0 cpu=1 credit=-1073741824 [w=0] load=0 (~0%)
>   1: [0.3] flags=0 cpu=2 credit=3273410 [w=256] load=262144 (~100%)
>   2: [0.4] flags=0 cpu=2 credit=2974954 [w=256] load=262144 (~100%)
>  CPU[02]  sibling=,000c, core=,00ff
> run: [0.2] flags=2 cpu=2 credit=3556909 [w=256] load=262144 (~100%)
>   1: [0.3] flags=0 cpu=2 credit=3273410 [w=256] load=262144 (~100%)
>   2: [0.4] flags=0 cpu=2 credit=2974954 [w=256] load=262144 (~100%)
>
> Here, CPUs 0, 1 and 2, are all part of runqueue 0,
> the content of which (which, BTW, is d0v3 and d0v4)
> is printed 3 times! It is also not very useful to
> see the details of the idle vcpus, as they're always
> the same (except for the vCPU ids).
>
> With this change, we print:
>  - pCPUs details and, for non idle ones, what vCPU
>they're running;
>  - the runqueue content, once and for all.
>
>  Runqueue 0:
>  CPU[00] runq=0, sibling=,0003, core=,00ff
> run: [0.15] flags=2 cpu=0 credit=5804742 [w=256] load=3655 (~1%)
>  CPU[01] runq=0, sibling=,0003, core=,00ff
>  CPU[02] runq=0, sibling=,000c, core=,00ff
> run: [0.3] flags=2 cpu=2 credit=6674856 [w=256] load=262144 (~100%)
>  CPU[03] runq=0, sibling=,000c, core=,00ff
>  RUNQ:

What is the difference between RUNQ and Runqueue 0 in the message?

>   0: [0.1] flags=0 cpu=2 credit=6561215 [w=256] load=262144 (~100%)
>   1: [0.2] flags=0 cpu=2 credit=5812356 [w=256] load=262144 (~100%)
>
> Stop printing details of idle vCPUs also in Credit1
> and RTDS (they're pretty useless in there too).
>
> Signed-off-by: Dario Faggioli 
> ---
> Cc: George Dunlap 
> Cc: Anshul Makkar 
> Cc: Meng Xu 
> ---
>  xen/common/sched_credit.c  |6 ++--
>  xen/common/sched_credit2.c |   72 +---
>  xen/common/sched_rt.c  |9 +-
>  xen/common/schedule.c  |7 ++--
>  4 files changed, 49 insertions(+), 45 deletions(-)
>

> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 24b4b22..f2d979c 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -320,10 +320,17 @@ static void
>  rt_dump_pcpu(const struct scheduler *ops, int cpu)
>  {
>  struct rt_private *prv = rt_priv(ops);
> +struct rt_vcpu *svc;
>  unsigned long flags;
>
>  spin_lock_irqsave(&prv->lock, flags);
> -rt_dump_vcpu(ops, rt_vcpu(curr_on_cpu(cpu)));
> +printk("CPU[%02d]\n", cpu);

We probably do not need this printk().
In rt_dump_vcpu(), we will print out the CPU number.

> +/* current VCPU (nothing to say if that's the idle vcpu). */
> +svc = rt_vcpu(curr_on_cpu(cpu));
> +if ( svc && !is_idle_vcpu(svc->vcpu) )
> +    {
> +rt_dump_vcpu(ops, svc);
> +}

Maybe it is better to print the CPU number if the CPU is running an idle VCPU.
The printk info could be:
 printk("CPU[%02d]: idle\n", cpu);

>  spin_unlock_irqrestore(&prv->lock, flags);
>  }

Thanks,

Meng


Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Granularity of Credit and RTDS Scheduler

2017-01-10 Thread Meng Xu
On Tue, Jan 10, 2017 at 4:32 PM, wy11  wrote:
> Thank you very much for your explanation.
>
> To me, it seems to be an accounting problem because even if the time is
> accounted immediately when a VCPU wakes up or sleeps, the problem remains if
> the time is recorded in microseconds or milliseconds, because a VCPU can run
> for less than a unit so that the time accounted is 0.
>
> If the time granularity of RTDS is nanosecond, then it is no longer a
> problem.

Yep.

> Can you please help me to know where I can find it in the source
> code?

It's in the burn_budget() function in sched_rt.c.
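
Roughly, and from memory (a simplification, not the exact code), the
accounting there is:

    /* burn_budget(), simplified: budget is burned in nanoseconds */
    delta = now - svc->last_start;   /* how long the VCPU actually ran */
    svc->cur_budget -= delta;
    if ( svc->cur_budget <= 0 )
        svc->cur_budget = 0;         /* depleted until next replenishment */

Since delta is an s_time_t in nanoseconds, there is no coarser unit
within which a VCPU could run "for free".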

>
> Again, thanks a lot for your help.

:-)

Meng

>
>
> Quoting Meng Xu :
>
>> [cc. Dario and George]
>>
>> On Fri, Jan 6, 2017 at 1:34 PM, wy11  wrote:
>>>
>>> Dear Xen developers,
>>
>>
>> Hi,
>>
>>>
>>> Recently I read a paper about possible theft of service attacks in Xen
>>> hypervisor.
>>>
>>> https://arxiv.org/pdf/1103.0759.pdf
>>
>>
>> I quickly read it. It is interesting to see that EC2 suffers from such
>> an issue.
>> According to 4.1, it seems to me that this is more like a scheduler
>> "bug" in the budget accounting logic.
>> When the attack VCPU wakes up, the scheduler should start counting
>> all time consumed from then on for the attack VM, instead of the victim
>> VM. When the attack VCPU sleeps, the scheduler should account the
>> budget consumed for the attack VM.
>>
>> In the event-driven RTDS scheduler, this issue should not happen. The
>> scheduler does account the budget to the correct VMs, IIRC.
>> Is there any experiment showing that the RTDS scheduler suffers from this issue?
>>
>>>
>>> Due to the 10 ms intervals between sampling points, a malicious VM is
>>> able to run less than an interval and sleep to avoid being accounted.
>>
>>
>> I don't think the scheduling interval is the issue. It is more like a
>> budget accounting issue for me.
>> Dario and George may have a better answer for this.
>>
>>>
>>> According to the info page of RTDS, it seems that after V4.7, a RTDS
>>> based
>>> scheduler achieves a granularity of microsecond.
>>
>>
>> This is just the time granularity for users to specify the VCPU
>> parameters.
>> In the scheduler, it is in nanoseconds.
>>
>>> However, is it possible that a
>>> VM runs for less than a microsecond and relinquishes the host actively
>>> so as to keep its budget?
>>
>>
>> Nope. I don't think the attack model in the paper will succeed for the
>> RTDS scheduler.
>> If I understand the attack model correctly, it is a budget
>> accounting issue instead of a timing granularity issue. (Please correct
>> me if I'm wrong.)
>>
>> If you have a script to show the attack on RTDS scheduler, I would be
>> happy to reproduce it on my machine and help fix it.
>>
>>>
>>> A similar problem occurs in earlier Linux kernel, and it is fixed in
>>> today's
>>> Linux on x86 machines by utilizing a clock source TSC with a granularity
>>> of
>>> nanoseconds. I'd like to know if there is any reason that the Xen
>>> hypervisor
>>> does not choose a nanosecond scheduler?
>>
>>
>> What do you mean by a nanosecond scheduler? In the kernel, the scheduler
>> accounts the budget in nanoseconds.
>>
>> Meng
>>
>>
>> --
>> ---
>> Meng Xu
>> PhD Student in Computer and Information Science
>> University of Pennsylvania
>> http://www.cis.upenn.edu/~mengxu/
>>
>
>
>
>



-- 
---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Granularity of Credit and RTDS Scheduler

2017-01-07 Thread Meng Xu
[cc. Dario and George]

On Fri, Jan 6, 2017 at 1:34 PM, wy11  wrote:
> Dear Xen developers,

Hi,

>
> Recently I read a paper about possible theft of service attacks in Xen
> hypervisor.
>
> https://arxiv.org/pdf/1103.0759.pdf

I quickly read it. It is interesting to see that EC2 suffers from such an issue.
According to 4.1, it seems to me that this is more like a scheduler
"bug" in the budget accounting logic.
When the attack VCPU wakes up, the scheduler should start counting
all time consumed from then on for the attack VM, instead of the victim
VM. When the attack VCPU sleeps, the scheduler should account the
budget consumed for the attack VM.
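
In pseudo-C, the accounting I would expect is roughly (illustrative
only, not the exact code):

    /* when a VCPU starts running: remember the start time */
    svc->last_start = NOW();

    /* when it stops running (sleeps or is preempted): charge it */
    svc->cur_budget -= NOW() - svc->last_start;

so a VCPU that runs for any amount of time, however short, is charged
for it.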

In the event-driven RTDS scheduler, this issue should not happen. The
scheduler does account the budget to the correct VMs, IIRC.
Is there any experiment showing that the RTDS scheduler suffers from this issue?

>
> Due to the 10 ms intervals between sampling points, a malicious VM is able
> to run less than an interval and sleep to avoid being accounted.

I don't think the scheduling interval is the issue. It is more like a
budget accounting issue for me.
Dario and George may have a better answer for this.

>
> According to the info page of RTDS, it seems that after V4.7, a RTDS based
> scheduler achieves a granularity of microsecond.

This is just the time granularity for users to specify the VCPU parameters.
In the scheduler, it is in nanoseconds.
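
For example, something like (the exact syntax may differ across
versions):

    xl sched-rtds -d vm1 -p 10000 -b 4000

asks for a 10ms period and a 4ms budget, both expressed in
microseconds, while the budget burning inside the hypervisor is still
tracked in nanoseconds.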

> However, is it possible that a
> VM runs for less than a microsecond and relinquishes the host actively so as
> to keep its budget?

Nope. I don't think the attack model in the paper will succeed for the
RTDS scheduler.
If I understand the attack model correctly, it is a budget
accounting issue instead of a timing granularity issue. (Please correct
me if I'm wrong.)

If you have a script to show the attack on RTDS scheduler, I would be
happy to reproduce it on my machine and help fix it.

>
> A similar problem occurs in earlier Linux kernel, and it is fixed in today's
> Linux on x86 machines by utilizing a clock source TSC with a granularity of
> nanoseconds. I'd like to know if there is any reason that the Xen hypervisor
> does not choose a nanosecond scheduler?

What do you mean by a nanosecond scheduler? In the kernel, the scheduler
accounts the budget in nanoseconds.

Meng


-- 
---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Xen On Nvidia Jetson TX1

2016-12-30 Thread Meng Xu
Hi Methuku,

On Fri, Dec 30, 2016 at 4:38 PM, Methuku Karthik  wrote:
> Hello Everyone,
>
> I am trying to run Xen on Jetson TX1.
>
> I have compiled u-boot and changed the mode to nonsec, but I am not seeing
> any message about CPU mode in dmesg.
>
> How can I check in which mode the CPUs have started?

Can you try out this repo: https://github.com/xenbedded/hyp-mode-checks

It may serve your purpose.
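
FWIW, a recent enough kernel also logs the boot mode itself, so
something like:

    dmesg | grep -i started

should show "CPU: All CPU(s) started at EL2" (arm64) or "CPU: All
CPU(s) started in HYP mode." (arm32) when u-boot has really left the
CPUs in HYP/non-secure mode.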

>
> Please suggest any kernel source for arm64 which can generate zImage  with
> Xen support?

I cc'ed Kyle, who showed a demo of Xen on TX1 (or TK1) in 2016. Maybe
he has a repo for the kernel.

Meng


---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Xen ARM community call

2016-11-08 Thread Meng Xu
Hi Julien,

On Tue, Nov 8, 2016 at 7:19 AM, Julien Grall  wrote:
> Hi all,
>
> I would like to start organizing a recurring community call to discuss and
> sync-up on upcoming features for Xen ARM.
>
> Example of features that could be discussed:
> - Sharing co-processor between guests
> - PCI passthrough
>
> I would suggest to start with a 1 hour meeting on the Wednesday 23rd
> November. I know that people are spread across different timezones, so I
> would like to gather thought before choosing a time.

I'm interested in joining. I'm at EST (GMT-5).

Thank you very much!

Meng

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 01/15] docs: L2 Cache Allocation Technology (CAT) feature document.

2016-10-30 Thread Meng Xu
Hi Yi,

On Mon, Oct 24, 2016 at 11:40 PM, Yi Sun  wrote:
> Signed-off-by: Yi Sun 
> ---
>  docs/features/l2_cat.pandoc | 314 
> 
>  1 file changed, 314 insertions(+)
>  create mode 100644 docs/features/l2_cat.pandoc
>
> diff --git a/docs/features/l2_cat.pandoc b/docs/features/l2_cat.pandoc
> new file mode 100644
> index 000..8544510
> --- /dev/null
> +++ b/docs/features/l2_cat.pandoc
> @@ -0,0 +1,314 @@
> +% Intel L2 Cache Allocation Technology (L2 CAT) Feature
> +% Revision 2.0
> +
> +\clearpage
> +
> +# Basics
> +
> + 
> + Status: **Tech Preview**
> +
> +Architecture(s): Intel x86
> +
> +   Component(s): Hypervisor, toolstack
> +
> +   Hardware: Atom codename Goldmont and beyond
> + 

I'm interested in trying out your code.
I'm planning to purchase a SoC with an Atom Goldmont processor.
Do you have any suggestions about which SoC I should purchase?
I would prefer to use the same SoC as you have. :-)

Thank you very much!

Meng

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [Apply for Xen Outreach Prj] QEMU analysis

2016-10-30 Thread Meng Xu
Hi Haoran,

On Fri, Oct 28, 2016 at 11:28 PM, Lee Naroah  wrote:
>
> Hi Mr. Monné,
>
> My name is Haoran Li.  I am a second year Ph. D student in Computer 
> Science.  My research interest is Real-Time Virtualization.  I wonder to know 
> some more details of the project,  "QEMU xen-blkback performance analysis and 
> improvements", as shown in 
> https://wiki.xenproject.org/wiki/Outreach_Program_Projects.


You can use  to highlight the sentence.

People use plain text in the ML. :-)



Meng

-- 
---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


  1   2   3   4   >