On Wed, 2018-04-11 at 14:45 +0200, Olaf Hering wrote:
> On Wed, Apr 11, Dario Faggioli wrote:
>
> > If you're interested in figuring out, I'd like to see:
> > - full output of `xl info -n'
> > - output of `xl debug-key u'
> > - xl vcpu-list
> > - xl list -n
>
> Logs for this .cfg attached:
>
> n
On Fri, 2018-04-13 at 11:29 +, George Dunlap wrote:
> I think as far as backports go, my current RFC would be
> fine. Another possibility, though, would be to simply add a
> migrate() callback to remove the vcpu from the runqueue before
> switching v->processor, *without* removing any of the c
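The migrate()-callback idea above can be sketched roughly like this (a toy model, not actual Xen code; the struct layout, CSCHED_VCPU(), runq_remove() and the callback name are simplified stand-ins for the real Credit1 ones):

```c
#include <stddef.h>

struct vcpu;

/* Simplified stand-ins for the Credit1 per-vcpu data and helpers. */
struct csched_vcpu {
    struct vcpu *vcpu;
    int on_runq;                      /* models the runq list linkage */
};

struct vcpu {
    int processor;
    struct csched_vcpu *sched_priv;
};

#define CSCHED_VCPU(v) ((v)->sched_priv)

static int __vcpu_on_runq(const struct csched_vcpu *svc)
{
    return svc->on_runq;
}

static void runq_remove(struct csched_vcpu *svc)
{
    svc->on_runq = 0;
}

/* Hypothetical migrate() callback: take the vcpu off the old CPU's
 * runqueue *before* v->processor is switched, so load balancing on the
 * old CPU can no longer find a vcpu that claims to run elsewhere. */
static void csched_vcpu_migrate(struct vcpu *v, unsigned int new_cpu)
{
    struct csched_vcpu *svc = CSCHED_VCPU(v);

    if ( __vcpu_on_runq(svc) )
        runq_remove(svc);
    v->processor = (int)new_cpu;
}
```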
> On Apr 13, 2018, at 10:25 AM, Dario Faggioli wrote:
>
> On Fri, 2018-04-13 at 09:03 +, George Dunlap wrote:
>>> On Apr 12, 2018, at 6:25 PM, Dario Faggioli
>>> wrote:
>>>
>> I think the bottom line is, for this test to be valid, then at this
>> point test_bit(VPF_migrating) *must* imply
On Fri, 2018-04-13 at 09:03 +, George Dunlap wrote:
> > On Apr 12, 2018, at 6:25 PM, Dario Faggioli
> > wrote:
> >
> > On the "other CPU", we might be around here [**]:
> >
> > static void vcpu_migrate(struct vcpu *v)
> > {
> >...
> >if ( v->is_running ||
> > !test_and_clear_
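The check quoted above relies on test_and_clear_bit() being atomic: of two CPUs racing into vcpu_migrate(), only the one that actually clears _VPF_migrating proceeds with the migration. A minimal (non-atomic) stand-in showing the return-old-value-and-clear semantics:

```c
/* Non-atomic sketch of test_and_clear_bit(): return the old value of
 * bit `nr` in *addr and clear it. The real Xen primitive does this
 * atomically, which is what guarantees at most one racing caller sees
 * the bit as still set. */
static int test_and_clear_bit(unsigned int nr, unsigned long *addr)
{
    unsigned long mask = 1UL << nr;
    int old = (*addr & mask) != 0;

    *addr &= ~mask;
    return old;
}
```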
On Fri, Apr 13, Dario Faggioli wrote:
> Yes. In fact, Olaf, I still think that doing a run with George's RFC
> applied, would be useful, if only as a data point.
First tests indicate that this series fixes the bug.
Olaf
On Fri, 2018-04-13 at 09:03 +, George Dunlap wrote:
> > On Apr 12, 2018, at 6:25 PM, Dario Faggioli
> > wrote:
> >
> I think the bottom line is, for this test to be valid, then at this
> point test_bit(VPF_migrating) *must* imply !vcpu_on_runqueue(v), but
> at this point it doesn’t: If someon
> On Apr 12, 2018, at 6:25 PM, Dario Faggioli wrote:
>
> On Thu, 2018-04-12 at 17:38 +0200, Dario Faggioli wrote:
>> On Thu, 2018-04-12 at 15:15 +0200, Dario Faggioli wrote:
>>> On Thu, 2018-04-12 at 14:45 +0200, Olaf Hering wrote:
dies after the first iteration.
BU
On Fri, 2018-04-13 at 08:23 +0200, Olaf Hering wrote:
> On Thu, 12 Apr 2018 19:25:43 +0200, Dario Faggioli wrote:
>
> > Olaf, new patch! :-)
>
> BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
>
Thanks!
> (XEN) CPU 36: d10v1 isr=0 runnbl=1 proc=36 pf=0 orq=0 csf=4
>
So, FTR:
- CPU is smp_proce
On Thu, 12 Apr 2018 19:25:43 +0200, Dario Faggioli wrote:
> Olaf, new patch! :-)
BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
(XEN) CPU 36: d10v1 isr=0 runnbl=1 proc=36 pf=0 orq=0 csf=4
(XEN) CPU 33: d10v2 isr=0 runnbl=0 proc=33 pf=1 orq=0 csf=4
(XEN) CPU 20: d10v2 isr=0 runnbl=1 proc=20 pf=0
On Thu, 2018-04-12 at 17:38 +0200, Dario Faggioli wrote:
> On Thu, 2018-04-12 at 15:15 +0200, Dario Faggioli wrote:
> > On Thu, 2018-04-12 at 14:45 +0200, Olaf Hering wrote:
> > >
> > > dies after the first iteration.
> > >
> > > BUG_ON(!test_bit(_VPF_migrating, &prev->pause_flags));
> >
On Thu, 2018-04-12 at 15:15 +0200, Dario Faggioli wrote:
> On Thu, 2018-04-12 at 14:45 +0200, Olaf Hering wrote:
> >
> > dies after the first iteration.
> >
> > BUG_ON(!test_bit(_VPF_migrating, &prev->pause_flags));
> >
>
Update. I replaced this:
+BUG_ON(vcpu_runnable(prev));
+
On Thu, 2018-04-12 at 14:45 +0200, Olaf Hering wrote:
> On Thu, 12 Apr 2018 12:16:34 +0200, Dario Faggioli wrote:
>
> > Olaf, new patch. Please, remove _everything_ and apply _only_ this
> > one.
>
> dies after the first iteration.
>
> BUG_ON(!test_bit(_VPF_migrating, &prev->pause_fl
On Thu, 12 Apr 2018 12:16:34 +0200, Dario Faggioli wrote:
> Olaf, new patch. Please, remove _everything_ and apply _only_ this one.
dies after the first iteration.
BUG_ON(!test_bit(_VPF_migrating, &prev->pause_flags));
(XEN) Xen BUG at schedule.c:1570
(XEN) [ Xen-4.11.20180411T100
On Thu, 2018-04-12 at 09:38 +, George Dunlap wrote:
> > On Apr 11, 2018, at 10:31 PM, Dario Faggioli
> > wrote:
> > (XEN) Xen BUG at sched_credit.c:876
> > (XEN) [ Xen-4.11.20180410T125709.50f8ba84a5-
> > 7.bug1087289_411 x86_64 debug=y Not tainted ]
> > (XEN) CPU:108
> > (XEN)
> On Apr 11, 2018, at 10:31 PM, Dario Faggioli wrote:
>
> On Wed, 11 Apr 2018 at 22:48, Olaf Hering wrote:
> On Wed, Apr 11, Dario Faggioli wrote:
>
> > It will crash, again, possibly with the same stack trace, but I think
> > it's worth a try.
>
> BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc))
>>> On 11.04.18 at 23:31, wrote:
> On Wed, 11 Apr 2018 at 22:48, Olaf Hering wrote:
>
>> On Wed, Apr 11, Dario Faggioli wrote:
>>
>> > It will crash, again, possibly with the same stack trace, but I think
>> > it's worth a try.
>>
>> BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
>>
>> (XEN) Xen
On Wed, 11 Apr 2018 at 22:48, Olaf Hering wrote:
> On Wed, Apr 11, Dario Faggioli wrote:
>
> > It will crash, again, possibly with the same stack trace, but I think
> > it's worth a try.
>
> BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
>
> (XEN) Xen BUG at sched_credit.c:876
> (XEN) [ Xen-4.
On Wed, Apr 11, Dario Faggioli wrote:
> It will crash, again, possibly with the same stack trace, but I think
> it's worth a try.
BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
(XEN) grant_table.c:1769:d15v18 Expanding d15 grant table from 12 to 13 frames
(XEN) grant_table.c:1769:d15v20 Expanding
On Wed, 2018-04-11 at 17:27 +0200, Olaf Hering wrote:
> On Wed, Apr 11, Olaf Hering wrote:
>
> > That was with sched=credit2, sorry for that.
> > Now with just that second patch ...
>
> Still BUG in csched_load_balance.
>
> (XEN) Xen BUG at sched_credit.c:1694
> (XEN) [ Xen-4.11.20180410T125
On Wed, 11 Apr 2018 09:38:59 -0600, "Jan Beulich" wrote:
> And till now I had assumed we've taken care of them with earlier
> fixes (all 4.7 reports were with old packages, like 4.7.2 based
> ones). Can you repro this with a debug hypervisor (so we can
> both trust the stack trace and know wheth
>>> On 11.04.18 at 17:03, wrote:
> On Wed, Apr 11, Olaf Hering wrote:
>
>> On Wed, Apr 11, Dario Faggioli wrote:
>>
>> > Olaf, can you give it a try? It should be fine to run it on top of the
>> > last debug patch (the one that produced this crash).
>>
>> Yes, with both changes it did >4k itera
On Wed, Apr 11, Olaf Hering wrote:
> On Wed, Apr 11, Olaf Hering wrote:
> > On Wed, Apr 11, Dario Faggioli wrote:
> > > Olaf, can you give it a try? It should be fine to run it on top of the
> > > last debug patch (the one that produced this crash).
> > Yes, with both changes it did >4k iterations
On Wed, Apr 11, Olaf Hering wrote:
> On Wed, Apr 11, Dario Faggioli wrote:
>
> > Olaf, can you give it a try? It should be fine to run it on top of the
> > last debug patch (the one that produced this crash).
>
> Yes, with both changes it did >4k iterations already. Thanks.
That was with sched=
On Wed, Apr 11, Dario Faggioli wrote:
> If you're interested in figuring out, I'd like to see:
> - full output of `xl info -n'
> - output of `xl debug-key u'
> - xl vcpu-list
> - xl list -n
Logs for this .cfg attached:
name='fv_sles12sp1.0'
vif=[ 'mac=00:18:3e:58:00:c1,bridge=br0' ]
memory=
>>> On 11.04.18 at 13:02, wrote:
> On 04/11/2018 11:17 AM, Dario Faggioli wrote:
>> On Wed, 2018-04-11 at 12:00 +0200, Olaf Hering wrote:
>>> On Wed, Apr 11, Dario Faggioli wrote:
>>>
Olaf, can you give it a try? It should be fine to run it on top of
the
last debug patch (the one th
On 04/11/2018 11:17 AM, Dario Faggioli wrote:
> On Wed, 2018-04-11 at 12:00 +0200, Olaf Hering wrote:
>> On Wed, Apr 11, Dario Faggioli wrote:
>>
>>> Olaf, can you give it a try? It should be fine to run it on top of
>>> the
>>> last debug patch (the one that produced this crash).
>>
>> Yes, with b
On Wed, 2018-04-11 at 11:37 +0100, George Dunlap wrote:
> On 04/10/2018 11:59 PM, Dario Faggioli wrote:
> >
> > So, basically, the race is between context_saved() and
> > vcpu_set_affinity(). Basically, vcpu_set_affinity() sets the
> > VPF_migrating pause flags on a vcpu in a runqueue, with the in
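The race described above can be made concrete with a deterministic toy model (plain C, not hypervisor code; the names mirror Xen's but the structures are simplified): vcpu_set_affinity() only raises VPF_migrating, so there is a window in which the flag is set while the vcpu is still on a runqueue.

```c
#define VPF_migrating (1u << 0)

struct vcpu {
    unsigned int pause_flags;
    int processor;
    int on_runq;            /* models the scheduler-private runq linkage */
};

/* vcpu_set_affinity() path: only flags the vcpu for migration; it may
 * still be sitting on a runqueue at this point. */
static void set_migrating_flag(struct vcpu *v)
{
    v->pause_flags |= VPF_migrating;
}

/* Later vcpu_migrate()/context_saved() path: dequeue, clear the flag,
 * and only then switch the vcpu to its new CPU. */
static void complete_migration(struct vcpu *v, int new_cpu)
{
    if ( !(v->pause_flags & VPF_migrating) )
        return;
    v->pause_flags &= ~VPF_migrating;
    v->on_runq = 0;
    v->processor = new_cpu;
}
```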
On Wed, 2018-04-11 at 12:00 +0200, Olaf Hering wrote:
> On Wed, Apr 11, Dario Faggioli wrote:
>
> > Olaf, can you give it a try? It should be fine to run it on top of
> > the
> > last debug patch (the one that produced this crash).
>
> Yes, with both changes it did >4k iterations already. Thanks.
On 04/10/2018 11:59 PM, Dario Faggioli wrote:
> [Adding Andrew, not because I expect anything, but just because we've
> chatted about this issue on IRC :-) ]
>
> On Tue, 2018-04-10 at 22:37 +0200, Olaf Hering wrote:
>> On Tue, Apr 10, Dario Faggioli wrote:
>>
>> BUG_ON(__vcpu_on_runq(CSCHED_
On Wed, 2018-04-11 at 10:48 +0200, Olaf Hering wrote:
> On Wed, Apr 11, Dario Faggioli wrote:
> > So, now, when you say 'does not work', do you mean 'domain creation
> > is
> > aborted with errors' or 'domain is created, but memory is not where
> > it
> > should be'.
>
> domU can not be created du
On Wed, Apr 11, Dario Faggioli wrote:
> Olaf, can you give it a try? It should be fine to run it on top of the
> last debug patch (the one that produced this crash).
Yes, with both changes it did >4k iterations already. Thanks.
Olaf
On Wed, Apr 11, Dario Faggioli wrote:
> So, now, when you say 'does not work', do you mean 'domain creation is
> aborted with errors' or 'domain is created, but memory is not where it
> should be'.
domU can not be created due to "libxl__set_vcpuaffinity: setting vcpu
affinity: Invalid argument".
On Wed, 2018-04-11 at 08:23 +0200, Olaf Hering wrote:
> It turned out that I had a typo all the time in my template, it used
> 'cpu=' rather than 'cpus='. On this system none of this works:
> #pus="node:${node}"
> cpus="nodes:${node}"
> #pus="nodes:${node},^node:0"
> #pus_soft="nodes:${node},^node:
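For reference, a config fragment with the spelling the thread settles on (the keys are cpus= and cpus_soft=, with the trailing 's'; the node number is chosen for illustration):

```
vcpus = 36
cpus = "node:1"         # hard affinity: the pCPUs of NUMA node 1
cpus_soft = "node:1"    # optional when equal to the hard affinity
```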
On Wed, 2018-04-11 at 09:39 +0200, Juergen Gross wrote:
> On 11/04/18 09:31, Dario Faggioli wrote:
> > > On Tue, 2018-04-10 at 22:37 +0200, Olaf Hering wrote:
> > > > On Tue, Apr 10, Dario Faggioli wrote:
> > > >
> > > > BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
> > > >
> >
> > ... patch atta
On 11/04/18 09:31, Dario Faggioli wrote:
> On Wed, 2018-04-11 at 00:59 +0200, Dario Faggioli wrote:
>> [Adding Andrew, not because I expect anything, but just because
>> we've chatted about this issue on IRC :-) ]
>>
> Except, I did not add it. :-P
>
> Anyway...
>
>> On Tue, 2018-04-10 at 22:37 +
On Wed, 2018-04-11 at 00:59 +0200, Dario Faggioli wrote:
> [Adding Andrew, not because I expect anything, but just because
> we've chatted about this issue on IRC :-) ]
>
Except, I did not add it. :-P
Anyway...
> On Tue, 2018-04-10 at 22:37 +0200, Olaf Hering wrote:
> > On Tue, Apr 10, Dario Fag
On Tue, Apr 10, Dario Faggioli wrote:
> I remember specifically wanting it to support not only "nodes:", but also
> "node:", because I thought that, e.g., "nodes:3" would have sounded weird
> to users.
It turned out that I had a typo all the time in my template, it used
'cpu=' rather than 'cpus
[Adding Andrew, not because I expect anything, but just because we've
chatted about this issue on IRC :-) ]
On Tue, 2018-04-10 at 22:37 +0200, Olaf Hering wrote:
> On Tue, Apr 10, Dario Faggioli wrote:
>
> BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
>
> (XEN) Xen BUG at sched_credit.c:876
> (X
On Tue, 10 Apr 2018 at 22:16, Olaf Hering wrote:
> On Tue, Apr 10, Olaf Hering wrote:
>
> > On Tue, Apr 10, Dario Faggioli wrote:
> >
> > > In the meanwhile --let me repeat myself-- just go ahead with "node:2",
> > > "node:3", etc. :-D
> >
> > I did, and that fails.
>
> I think the man page is n
On Tue, Apr 10, Dario Faggioli wrote:
> So, Olaf, if you're fancy giving this a tray anyway, well, go ahead.
BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
(XEN) Xen BUG at sched_credit.c:876
(XEN) [ Xen-4.11.20180410T125709.50f8ba84a5-3.bug1087289_411 x86_64
debug=y Not tainted ]
(XE
On Tue, Apr 10, Olaf Hering wrote:
> On Tue, Apr 10, Dario Faggioli wrote:
>
> > In the meanwhile --let me repeat myself-- just go ahead with "node:2",
> > "node:3", etc. :-D
>
> I did, and that fails.
I think the man page is not that clear to me. If there is a difference
between 'node' vs. 'n
On Tue, Apr 10, Dario Faggioli wrote:
> In the meanwhile --let me repeat myself-- just go ahead with "node:2",
> "node:3", etc. :-D
I did, and that fails.
Olaf
On Tue, 2018-04-10 at 21:03 +0200, Olaf Hering wrote:
> On Tue, Apr 10, Dario Faggioli wrote:
>
> > As said, its cpus= and cpus_soft=, and you probably just need
> > cpus="node:1"
> > cpus_soft="node:1"
> > Or, even just:
> > cpus="node:1"
> > as, if soft-affinity is set to be equal to hard, it is
On Tue, Apr 10, Dario Faggioli wrote:
> On Tue, 2018-04-10 at 17:59 +0200, Olaf Hering wrote:
> > memory=
> > vcpus=36
> > cpu="nodes:1,^node:0"
> > cpu_soft="nodes:1,^node:0"
> As said, its cpus= and cpus_soft=, and you probably just need
> cpus="node:1"
> cpus_soft="node:1"
> Or, even just:
On Tue, 2018-04-10 at 16:25 +0100, George Dunlap wrote:
> On 04/10/2018 12:29 PM, Dario Faggioli wrote:
> >
> One thing we might consider doing is implementing the migrate()
> callback
> for the Credit scheduler, and just have it make a bunch of sanity
> checks
> (v->processor lock held, new_cpu l
On Tue, 2018-04-10 at 17:59 +0200, Olaf Hering wrote:
> On Tue, Apr 10, Olaf Hering wrote:
>
> > (XEN) Xen BUG at sched_credit.c:1694
>
> And another one with debug=y and this config:
>
Wow...
> memory=
> vcpus=36
> cpu="nodes:1,^node:0"
> cpu_soft="nodes:1,^node:0"
>
As said, its cpus= and
On Tue, Apr 10, Olaf Hering wrote:
> (XEN) Xen BUG at sched_credit.c:1694
And another one with debug=y and this config:
memory=
vcpus=36
cpu="nodes:1,^node:0"
cpu_soft="nodes:1,^node:0"
(nodes=1 cycles between 1-3 for each following domU).
(XEN) Assertion 'CSCHED_PCPU(cpu)->nr_runnable >= 1'
On Tue, 2018-04-10 at 16:25 +0100, George Dunlap wrote:
> On 04/10/2018 12:29 PM, Dario Faggioli wrote:
> >
> whenever that is. (Possibly at the end of the current call to
> vcpu_migrate(), possibly at the end of a vcpu_migrate() triggered in
> context_saved() due to VPF_migrating.)
>
> vcpu_mig
On 04/10/2018 04:18 PM, Olaf Hering wrote:
> On Tue, Apr 10, Olaf Hering wrote:
>
>> (XEN) Xen BUG at sched_credit.c:1694
>
> Another variant:
>
> This time the domUs had just vcpus=36 and
> cpus=nodes:N,node:^0/cpus_soft=nodes:N,node:^0
>
> (XEN) Xen BUG at sched_credit.c:280
> (XEN) [ Xe
On 04/10/2018 12:29 PM, Dario Faggioli wrote:
> On Tue, 2018-04-10 at 11:59 +0100, George Dunlap wrote:
>> On 04/10/2018 11:33 AM, Dario Faggioli wrote:
>>> On Tue, 2018-04-10 at 09:34 +, George Dunlap wrote:
Assuming the bug is this one:
BUG_ON( cpu != snext->vcpu->processor );
On Tue, Apr 10, Olaf Hering wrote:
> (XEN) Xen BUG at sched_credit.c:1694
Another variant:
This time the domUs had just vcpus=36 and
cpus=nodes:N,node:^0/cpus_soft=nodes:N,node:^0
(XEN) Xen BUG at sched_credit.c:280
(XEN) [ Xen-4.11.20180407T144959.e62e140daa-2.bug1087289_411 x86_64
deb
On Tue, 2018-04-10 at 11:59 +0100, George Dunlap wrote:
> On 04/10/2018 11:33 AM, Dario Faggioli wrote:
> > On Tue, 2018-04-10 at 09:34 +, George Dunlap wrote:
> > > Assuming the bug is this one:
> > >
> > > BUG_ON( cpu != snext->vcpu->processor );
> > >
> >
> > Yes, it is that one.
> >
> >
On 04/10/2018 11:33 AM, Dario Faggioli wrote:
> On Tue, 2018-04-10 at 09:34 +, George Dunlap wrote:
>> Assuming the bug is this one:
>>
>> BUG_ON( cpu != snext->vcpu->processor );
>>
> Yes, it is that one.
>
> Another stack trace, this time from a debug=y built hypervisor, of what
> we are thi
On Tue, 2018-04-10 at 09:34 +, George Dunlap wrote:
> Assuming the bug is this one:
>
> BUG_ON( cpu != snext->vcpu->processor );
>
Yes, it is that one.
Another stack trace, this time from a debug=y built hypervisor, of what
we are thinking it is the same bug (although reproduced in a slightl
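Stated as a predicate, the invariant behind that BUG_ON is simply that a vcpu just taken off CPU N's runqueue must still report N as its processor (toy model, not Xen code):

```c
#include <stddef.h>

struct picked_vcpu { int processor; };

/* The condition csched_load_balance() asserts about the vcpu `snext`
 * it picked from CPU `cpu`'s runqueue. A vcpu whose processor field was
 * switched while it was left on the old runqueue violates exactly this. */
static int load_balance_invariant(unsigned int cpu,
                                  const struct picked_vcpu *snext)
{
    return snext != NULL && snext->processor == (int)cpu;
}
```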
> On Apr 10, 2018, at 9:57 AM, Olaf Hering wrote:
>
> While hunting some other bug we ran into the single BUG in
> sched_credit.c:csched_load_balance(). This happens with all versions
> since 4.7; staging is also affected. The test system is a Haswell model 63
> system with 4 NUMA nodes and 144 thre
While hunting some other bug we ran into the single BUG in
sched_credit.c:csched_load_balance(). This happens with all versions
since 4.7; staging is also affected. The test system is a Haswell model 63
system with 4 NUMA nodes and 144 threads.
(XEN) Xen BUG at sched_credit.c:1694
(XEN) [ Xen-4.11.