Re: [Xen-devel] xenstored memory leak

2016-07-13 Thread Juergen Gross
On 13/07/16 16:28, Ian Jackson wrote:
> David Vrabel writes ("Re: [Xen-devel] xenstored memory leak"):
>> On 13/07/16 14:32, Juergen Gross wrote:
>>> On 13/07/16 15:17, David Vrabel wrote:
>>>> The Linux driver already cleans up open transactions and removes watches
>>>> when the file handle is released.
>>>
>>> Hmm, does this work reliably? I observed a memory leak of about 5kB per
>>> create/destroy domain pair with xenstored _and_ with a xenstore domain.
>>
>> Don't know.
>>
>> I only took a quick look at the xenbus_file_release() function and it
>> looked like it ought to be doing the right thing...
> 
> If it didn't, it would leak with oxenstored too.

That I can't tell: I've got no oxenstored domain running.


Juergen



Re: [Xen-devel] xenstored memory leak

2016-07-13 Thread Ian Jackson
David Vrabel writes ("Re: [Xen-devel] xenstored memory leak"):
> On 13/07/16 14:32, Juergen Gross wrote:
> > On 13/07/16 15:17, David Vrabel wrote:
> >> The Linux driver already cleans up open transactions and removes watches
> >> when the file handle is released.
> > 
> > Hmm, does this work reliably? I observed a memory leak of about 5kB per
> > create/destroy domain pair with xenstored _and_ with a xenstore domain.
> 
> Don't know.
> 
> I only took a quick look at the xenbus_file_release() function and it
> looked like it ought to be doing the right thing...

If it didn't, it would leak with oxenstored too.

Ian.



Re: [Xen-devel] xenstored memory leak

2016-07-13 Thread Wei Liu
On Wed, Jul 13, 2016 at 04:09:26PM +0200, Juergen Gross wrote:
> On 13/07/16 15:52, Wei Liu wrote:
> > On Wed, Jul 13, 2016 at 03:25:45PM +0200, Juergen Gross wrote:
> >> On 13/07/16 15:07, Wei Liu wrote:
> >>> On Wed, Jul 13, 2016 at 02:21:38PM +0200, Juergen Gross wrote:
> >>>> On 06/07/16 09:31, Juergen Gross wrote:
> >>>>> While testing some patches for ballooning support in Mini-OS using
> >>>>> the xenstore domain I realized that each xl create/destroy pair would
> >>>>> increase memory consumption in Mini-OS by about 5kB. Wondering whether
> >>>>> this is a xenstore-domain-only effect, I did the same test with the
> >>>>> xenstored and oxenstored daemons.
> >>>>>
> >>>>> xenstored showed the same behavior: the "referenced" size shown by the
> >>>>> pmap command grew by about 5kB for each create/destroy pair.
> >>>>>
> >>>>> oxenstored seemed to be even worse in the beginning (about 6kB for
> >>>>> each pair), but after about 100 create/destroys the value seemed to be
> >>>>> rather stable.
> >>>>>
> >>>>> Did anyone notice this memory leak before?
> >>>>
> >>>> I think I've found the problem:
> >>>>
> >>>> qemu as the device model sets up a xenstore watch for each backend
> >>>> type it supports. Unfortunately those watches are never removed again.
> >>>> This adds up to the observed memory leak.
> >>>>
> >>>> I'm not sure how oxenstored avoids the problem; maybe it tests whether
> >>>> socket connections are still alive and so detects that qemu has gone.
> >>>> OTOH this won't help for oxenstored running in a different domain than
> >>>> the device model (either with oxenstore-stubdom, or with a driver
> >>>> domain hosting a qemu-based device model).
> >>>>
> >>>
> >>> How unfortunate.
> >>>
> >>> My gut feeling is that xenstored shouldn't have the knowledge to
> >>> associate a watch with a "process". The concept of a process is only
> >>> meaningful to the OS, which wouldn't work in a cross-domain xenstored setup.
> >>
> >> Right.
> >>
> >>> Maybe the OS xenbus driver should reap all watches on behalf of the dead
> >>> process. This would also avoid a crashed QEMU leaking resources.
> >>>
> >>> And xenstored should have proper quota support so that a domain can't
> >>> set up an excessive number of watches.
> >>
> >> This would be dom0 unless you arrange for the device model to be
> >> accounted against the domid it is running for. But this is problematic
> >> with a xenstore domain again.
> >>
> > 
> > The quota could be based on a "connection" (ring or socket) and
> > counted per connection? Just throwing ideas around, not necessarily
> > saying this is the way to go.
> 
> Sure. But with a xenstore domain all xenstore access from dom0 goes via
> one ring. And how would you want to apply any quota here that solves our
> problem of one qemu process in dom0 creating stale watches? You

That's a job for the kernel driver.

The quota inside xenstored is to protect itself from abuse by a single
connection and to punish the bad actors.

> could open a new connection for the device model, of course, but this
> would require some xenbus rework.
> 

I wouldn't go down that route unless absolutely necessary, because that
seems to require a xenbus protocol extension.

Wei.

> 
> Juergen



Re: [Xen-devel] xenstored memory leak

2016-07-13 Thread Juergen Gross
On 13/07/16 15:52, Wei Liu wrote:
> On Wed, Jul 13, 2016 at 03:25:45PM +0200, Juergen Gross wrote:
>> On 13/07/16 15:07, Wei Liu wrote:
>>> On Wed, Jul 13, 2016 at 02:21:38PM +0200, Juergen Gross wrote:
>>>> On 06/07/16 09:31, Juergen Gross wrote:
>>>>> While testing some patches for ballooning support in Mini-OS using
>>>>> the xenstore domain I realized that each xl create/destroy pair would
>>>>> increase memory consumption in Mini-OS by about 5kB. Wondering whether
>>>>> this is a xenstore-domain-only effect, I did the same test with the
>>>>> xenstored and oxenstored daemons.
>>>>>
>>>>> xenstored showed the same behavior: the "referenced" size shown by the
>>>>> pmap command grew by about 5kB for each create/destroy pair.
>>>>>
>>>>> oxenstored seemed to be even worse in the beginning (about 6kB for
>>>>> each pair), but after about 100 create/destroys the value seemed to be
>>>>> rather stable.
>>>>>
>>>>> Did anyone notice this memory leak before?
>>>>
>>>> I think I've found the problem:
>>>>
>>>> qemu as the device model sets up a xenstore watch for each backend
>>>> type it supports. Unfortunately those watches are never removed again.
>>>> This adds up to the observed memory leak.
>>>>
>>>> I'm not sure how oxenstored avoids the problem; maybe it tests whether
>>>> socket connections are still alive and so detects that qemu has gone.
>>>> OTOH this won't help for oxenstored running in a different domain than
>>>> the device model (either with oxenstore-stubdom, or with a driver
>>>> domain hosting a qemu-based device model).
>>>>
>>>
>>> How unfortunate.
>>>
>>> My gut feeling is that xenstored shouldn't have the knowledge to
>>> associate a watch with a "process". The concept of a process is only
>>> meaningful to the OS, which wouldn't work in a cross-domain xenstored setup.
>>
>> Right.
>>
>>> Maybe the OS xenbus driver should reap all watches on behalf of the dead
>>> process. This would also avoid a crashed QEMU leaking resources.
>>>
>>> And xenstored should have proper quota support so that a domain can't
>>> set up an excessive number of watches.
>>
>> This would be dom0 unless you arrange for the device model to be
>> accounted against the domid it is running for. But this is problematic
>> with a xenstore domain again.
>>
> 
> The quota could be based on a "connection" (ring or socket) and
> counted per connection? Just throwing ideas around, not necessarily
> saying this is the way to go.

Sure. But with a xenstore domain all xenstore access from dom0 goes via
one ring. And how would you want to apply any quota here that solves our
problem of one qemu process in dom0 creating stale watches? You could
open a new connection for the device model, of course, but this would
require some xenbus rework.


Juergen



Re: [Xen-devel] xenstored memory leak

2016-07-13 Thread Wei Liu
On Wed, Jul 13, 2016 at 03:25:45PM +0200, Juergen Gross wrote:
> On 13/07/16 15:07, Wei Liu wrote:
> > On Wed, Jul 13, 2016 at 02:21:38PM +0200, Juergen Gross wrote:
> >> On 06/07/16 09:31, Juergen Gross wrote:
> >>> While testing some patches for ballooning support in Mini-OS using
> >>> the xenstore domain I realized that each xl create/destroy pair would
> >>> increase memory consumption in Mini-OS by about 5kB. Wondering whether
> >>> this is a xenstore-domain-only effect, I did the same test with the
> >>> xenstored and oxenstored daemons.
> >>>
> >>> xenstored showed the same behavior: the "referenced" size shown by the
> >>> pmap command grew by about 5kB for each create/destroy pair.
> >>>
> >>> oxenstored seemed to be even worse in the beginning (about 6kB for
> >>> each pair), but after about 100 create/destroys the value seemed to be
> >>> rather stable.
> >>>
> >>> Did anyone notice this memory leak before?
> >>
> >> I think I've found the problem:
> >>
> >> qemu as the device model sets up a xenstore watch for each backend
> >> type it supports. Unfortunately those watches are never removed again.
> >> This adds up to the observed memory leak.
> >>
> >> I'm not sure how oxenstored avoids the problem; maybe it tests whether
> >> socket connections are still alive and so detects that qemu has gone.
> >> OTOH this won't help for oxenstored running in a different domain than
> >> the device model (either with oxenstore-stubdom, or with a driver
> >> domain hosting a qemu-based device model).
> >>
> > 
> > How unfortunate.
> > 
> > My gut feeling is that xenstored shouldn't have the knowledge to
> > associate a watch with a "process". The concept of a process is only
> > meaningful to the OS, which wouldn't work in a cross-domain xenstored setup.
> 
> Right.
> 
> > Maybe the OS xenbus driver should reap all watches on behalf of the dead
> > process. This would also avoid a crashed QEMU leaking resources.
> > 
> > And xenstored should have proper quota support so that a domain can't
> > set up an excessive number of watches.
> 
> This would be dom0 unless you arrange for the device model to be
> accounted against the domid it is running for. But this is problematic
> with a xenstore domain again.
> 

The quota could be based on a "connection" (ring or socket) and
counted per connection? Just throwing ideas around, not necessarily
saying this is the way to go.
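
To make the idea concrete, a minimal sketch of a per-connection watch
quota in C (the struct layout and the limit are invented for
illustration; this is not C xenstored's actual code):

/* Hypothetical per-connection watch quota; all names illustrative. */
#include <errno.h>

#define MAX_WATCHES_PER_CONN 128        /* made-up limit */

struct connection {
    unsigned int nr_watches;    /* watches held by this ring/socket */
    /* ... other per-connection state ... */
};

/* Called when a WATCH request arrives on a connection. */
static int watch_quota_get(struct connection *conn)
{
    if (conn->nr_watches >= MAX_WATCHES_PER_CONN)
        return -E2BIG;          /* this connection hit its quota */
    conn->nr_watches++;
    return 0;
}

/* Called on UNWATCH and when the connection is torn down. */
static void watch_quota_put(struct connection *conn)
{
    conn->nr_watches--;
}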

Wei.

> 
> Juergen



Re: [Xen-devel] xenstored memory leak

2016-07-13 Thread Juergen Gross
On 13/07/16 15:17, David Vrabel wrote:
> On 13/07/16 14:07, Wei Liu wrote:
>>
>> My gut feeling is that xenstored shouldn't have the knowledge to
>> associate a watch with a "process". The concept of a process is only
>> meaningful to the OS, which wouldn't work in a cross-domain xenstored setup.
>> Maybe the OS xenbus driver should reap all watches on behalf of the dead
>> process. This would also avoid a crashed QEMU leaking resources.
> 
> The Linux driver already cleans up open transactions and removes watches
> when the file handle is released.

Hmm, does this work reliably? I observed a memory leak of about 5kB per
create/destroy domain pair with xenstored _and_ with a xenstore domain.


Juergen




Re: [Xen-devel] xenstored memory leak

2016-07-13 Thread Wei Liu
On Wed, Jul 13, 2016 at 02:20:28PM +0100, Ian Jackson wrote:
> Wei Liu writes ("Re: [Xen-devel] xenstored memory leak"):
> > On Wed, Jul 13, 2016 at 02:21:38PM +0200, Juergen Gross wrote:
> > > qemu as the device model sets up a xenstore watch for each backend
> > > type it supports. Unfortunately those watches are never removed
> > > again. This adds up to the observed memory leak.
> 
> I think this must be a bug in C xenstored.
> 
> > > I'm not sure how oxenstored avoids the problem; maybe it tests whether
> > > socket connections are still alive and so detects that qemu has gone.
> > > OTOH this won't help for oxenstored running in a different domain than
> > > the device model (either with oxenstore-stubdom, or with a driver
> > > domain hosting a qemu-based device model).
> > 
> > How unfortunate.
> > 
> > My gut feeling is that xenstored shouldn't have the knowledge to
> > associate a watch with a "process".
> 
> xenstored needs to associate watches with connections.  If a
> connection is terminated, the watches need to be cleaned up, along
> with whatever other things "belong" to that connection (notably
> transactions, and replies in flight).
> 
> Here a `connection' might be a socket, or a ring.
> 

Agreed.

> C xenstored does have code which tries to do this.  It's a bit
> impenetrable, though, because it's done through destructors provided
> to the reference-counting memory allocator (!)
> 
> > The concept of a process is only meaningful to the OS, which wouldn't
> > work in a cross-domain xenstored setup.  Maybe the OS xenbus driver
> > should reap all watches on behalf of the dead process. This would also
> > avoid a crashed QEMU leaking resources.
> 
> The OS xenbus driver needs to mediate everything, so that it can
> direct replies to the right places etc.  It needs to (and does)
> maintain a list of watches.  When a process closes the xenbus device,
> it destroys the watches by issuing commands to the actual xenstored
> via its ring connection.
> 
> I guess that the qemu in this case is making a socket connection to
> xenstored.
> 

Yeah, I suspect that as well.

Wei.

> Ian.



Re: [Xen-devel] xenstored memory leak

2016-07-13 Thread David Vrabel
On 13/07/16 14:32, Juergen Gross wrote:
> On 13/07/16 15:17, David Vrabel wrote:
>> On 13/07/16 14:07, Wei Liu wrote:
>>>
>>> My gut feeling is that xenstored shouldn't have the knowledge to
>>> associate a watch with a "process". The concept of a process is only
>>> meaningful to the OS, which wouldn't work in a cross-domain xenstored setup.
>>> Maybe the OS xenbus driver should reap all watches on behalf of the dead
>>> process. This would also avoid a crashed QEMU leaking resources.
>>
>> The Linux driver already cleans up open transactions and removes watches
>> when the file handle is released.
> 
> Hmm, does this work reliably? I observed a memory leak of about 5kB per
> create/destroy domain pair with xenstored _and_ with a xenstore domain.

Don't know.

I only took a quick look at the xenbus_file_release() function and it
looked like it ought to be doing the right thing...

David



Re: [Xen-devel] xenstored memory leak

2016-07-13 Thread Ian Jackson
Juergen Gross writes ("Re: [Xen-devel] xenstored memory leak"):
> On 13/07/16 14:40, Andrew Cooper wrote:
> > On 13/07/16 13:21, Juergen Gross wrote:
> >> I'll post a qemu patch to remove those watches on exit soon.

I don't think this is right.  qemu should not explicitly remove watches
when it is exiting.  The cleanup should be handled by xenstored
directly (if qemu is connecting via a socket) or by the OS xenbus
driver (otherwise).  Each of those entities keeps a list of which of
its clients owns which watches.

> > That is good, but there needs to be further consideration as to how to
> > clean up after a crashed qemu/etc.
> 
> Hmm, I see two possibilities:
> 
> a) Add a possibility to do xs_unwatch() with a path and a token with
>    wildcards (e.g.: xs_unwatch(xs, "/local/domain/0/backend/qdisk",
>    "be:*:97:*") for removing all qdisk watches for domid 97) and use
>    this when doing the xenstore cleanup from libxl on domain
>    destruction.
> 
> b) Instead of watching /local/domain/<domid>/backend/<type> let qemu
>    create /local/domain/<domid>/backend/<type>/<domid> and watch
>    this. When this directory is being removed by libxl all the watches
>    will be gone automatically.

This is going in entirely the wrong direction.

Ian.



Re: [Xen-devel] xenstored memory leak

2016-07-13 Thread Juergen Gross
On 13/07/16 15:07, Wei Liu wrote:
> On Wed, Jul 13, 2016 at 02:21:38PM +0200, Juergen Gross wrote:
>> On 06/07/16 09:31, Juergen Gross wrote:
>>> While testing some patches for ballooning support in Mini-OS using
>>> the xenstore domain I realized that each xl create/destroy pair would
>>> increase memory consumption in Mini-OS by about 5kB. Wondering whether
>>> this is a xenstore-domain-only effect, I did the same test with the
>>> xenstored and oxenstored daemons.
>>>
>>> xenstored showed the same behavior: the "referenced" size shown by the
>>> pmap command grew by about 5kB for each create/destroy pair.
>>>
>>> oxenstored seemed to be even worse in the beginning (about 6kB for
>>> each pair), but after about 100 create/destroys the value seemed to be
>>> rather stable.
>>>
>>> Did anyone notice this memory leak before?
>>
>> I think I've found the problem:
>>
>> qemu as the device model sets up a xenstore watch for each backend
>> type it supports. Unfortunately those watches are never removed again.
>> This adds up to the observed memory leak.
>>
>> I'm not sure how oxenstored avoids the problem; maybe it tests whether
>> socket connections are still alive and so detects that qemu has gone.
>> OTOH this won't help for oxenstored running in a different domain than
>> the device model (either with oxenstore-stubdom, or with a driver
>> domain hosting a qemu-based device model).
>>
> 
> How unfortunate.
> 
> My gut feeling is that xenstored shouldn't have the knowledge to
> associate a watch with a "process". The concept of a process is only
> meaningful to the OS, which wouldn't work in a cross-domain xenstored setup.

Right.

> Maybe the OS xenbus driver should reap all watches on behalf of the dead
> process. This would also avoid a crashed QEMU leaking resources.
> 
> And xenstored should have proper quota support so that a domain can't
> set up an excessive number of watches.

This would be dom0 unless you arrange for the device model to be
accounted against the domid it is running for. But this is problematic
with a xenstore domain again.


Juergen



Re: [Xen-devel] xenstored memory leak

2016-07-13 Thread Juergen Gross
On 13/07/16 14:40, Andrew Cooper wrote:
> On 13/07/16 13:21, Juergen Gross wrote:
>> On 06/07/16 09:31, Juergen Gross wrote:
>>> While testing some patches for ballooning support in Mini-OS using
>>> the xenstore domain I realized that each xl create/destroy pair would
>>> increase memory consumption in Mini-OS by about 5kB. Wondering whether
>>> this is a xenstore-domain-only effect, I did the same test with the
>>> xenstored and oxenstored daemons.
>>>
>>> xenstored showed the same behavior: the "referenced" size shown by the
>>> pmap command grew by about 5kB for each create/destroy pair.
>>>
>>> oxenstored seemed to be even worse in the beginning (about 6kB for
>>> each pair), but after about 100 create/destroys the value seemed to be
>>> rather stable.
>>>
>>> Did anyone notice this memory leak before?
>> I think I've found the problem:
>>
>> qemu as the device model sets up a xenstore watch for each backend
>> type it supports. Unfortunately those watches are never removed again.
>> This adds up to the observed memory leak.
>>
>> I'm not sure how oxenstored avoids the problem; maybe it tests whether
>> socket connections are still alive and so detects that qemu has gone.
>> OTOH this won't help for oxenstored running in a different domain than
>> the device model (either with oxenstore-stubdom, or with a driver
>> domain hosting a qemu-based device model).
> 
> I do seem to remember something along those lines.
> 
>>
>> I'll post a qemu patch to remove those watches on exit soon.
> 
> That is good, but there needs to be further consideration as to how to
> clean up after a crashed qemu/etc.

Hmm, I see two possibilities:

a) Add a possibility to do xs_unwatch() with a path and a token with
   wildcards (e.g.: xs_unwatch(xs, "/local/domain/0/backend/qdisk",
   "be:*:97:*") for removing all qdisk watches for domid 97) and use
   this when doing the xenstore cleanup from libxl on domain
   destruction (a rough sketch of the idea follows below).

b) Instead of watching /local/domain/<domid>/backend/<type> let qemu
   create /local/domain/<domid>/backend/<type>/<domid> and watch
   this. When this directory is being removed by libxl all the watches
   will be gone automatically.
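
For option a), a rough sketch of what the wildcard matching on the
xenstored side could look like, using fnmatch() for the glob semantics.
The extended interface and the watch list below are hypothetical; no
such extension exists today:

/* Hypothetical helper: drop all watches on a connection whose path and
 * token match glob patterns, e.g. ("/local/domain/0/backend/qdisk",
 * "be:*:97:*"). */
#include <fnmatch.h>
#include <stdlib.h>

struct watch {
    struct watch *next;
    char *path;
    char *token;
};

static void unwatch_glob(struct watch **head,
                         const char *path_pat, const char *token_pat)
{
    struct watch **pw = head;

    while (*pw) {
        struct watch *w = *pw;

        if (fnmatch(path_pat, w->path, 0) == 0 &&
            fnmatch(token_pat, w->token, 0) == 0) {
            *pw = w->next;              /* unlink the matching watch */
            free(w->path);
            free(w->token);
            free(w);
        } else {
            pw = &w->next;
        }
    }
}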


Juergen



Re: [Xen-devel] xenstored memory leak

2016-07-13 Thread Ian Jackson
Wei Liu writes ("Re: [Xen-devel] xenstored memory leak"):
> On Wed, Jul 13, 2016 at 02:21:38PM +0200, Juergen Gross wrote:
> > qemu as the device model sets up a xenstore watch for each backend
> > type it supports. Unfortunately those watches are never removed
> > again. This adds up to the observed memory leak.

I think this must be a bug in C xenstored.

> > I'm not sure how oxenstored avoids the problem; maybe it tests whether
> > socket connections are still alive and so detects that qemu has gone.
> > OTOH this won't help for oxenstored running in a different domain than
> > the device model (either with oxenstore-stubdom, or with a driver
> > domain hosting a qemu-based device model).
> 
> How unfortunate.
> 
> My gut feeling is that xenstored shouldn't have the knowledge to
> associate a watch with a "process".

xenstored needs to associate watches with connections.  If a
connection is terminated, the watches need to be cleaned up, along
with whatever other things "belong" to that connection (notably
transactions, and replies in flight).

Here a `connection' might be a socket, or a ring.

C xenstored does have code which tries to do this.  It's a bit
impenetrable, though, because it's done through destructors provided
to the reference-counting memory allocator (!)
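
The pattern in question looks roughly like this; a simplified sketch of
the talloc destructor idiom, not the actual xenstored source, with
illustrative names:

/* A connection is a talloc context; watches, transactions and pending
 * replies are allocated as children of it, so freeing the connection
 * reaps them, and the destructor can do any extra cleanup. */
#include <talloc.h>

struct connection {
    unsigned int domid;         /* illustrative member */
};

static int destroy_conn(struct connection *conn)
{
    /* Runs automatically from talloc_free(conn); real code would also
     * unlink the connection from global lists, end transactions, etc. */
    return 0;
}

static struct connection *new_connection(TALLOC_CTX *ctx)
{
    struct connection *conn = talloc_zero(ctx, struct connection);

    if (conn)
        talloc_set_destructor(conn, destroy_conn);
    return conn;
}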

> The concept of a process is only meaningful to the OS, which wouldn't
> work in a cross-domain xenstored setup.  Maybe the OS xenbus driver
> should reap all watches on behalf of the dead process. This would also
> avoid a crashed QEMU leaking resources.

The OS xenbus driver needs to mediate everything, so that it can
direct replies to the right places etc.  It needs to (and does)
maintain a list of watches.  When a process closes the xenbus device,
it destroys the watches by issuing commands to the actual xenstored
via its ring connection.

I guess that the qemu in this case is making a socket connection to
xenstored.

Ian.



Re: [Xen-devel] xenstored memory leak

2016-07-13 Thread David Vrabel
On 13/07/16 14:07, Wei Liu wrote:
> 
> My gut feeling is that xenstored shouldn't have the knowledge to
> associate a watch with a "process". The concept of a process is only
> meaningful to the OS, which wouldn't work in a cross-domain xenstored setup.
> Maybe the OS xenbus driver should reap all watches on behalf of the dead
> process. This would also avoid a crashed QEMU leaking resources.

The Linux driver already cleans up open transactions and removes watches
when the file handle is released.
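
For reference, that release path looks roughly like this; a condensed
sketch of xenbus_file_release() from the Linux xenbus frontend driver
of that era, abridged, so field and helper names may not match the
exact source:

/* Cleanup when a process closes /dev/xen/xenbus (condensed sketch). */
static int xenbus_file_release(struct inode *inode, struct file *filp)
{
    struct xenbus_file_priv *u = filp->private_data;
    struct xenbus_transaction_holder *trans, *tmp;
    struct watch_adapter *watch, *tmp_watch;

    /* Abort any transactions the process left open. */
    list_for_each_entry_safe(trans, tmp, &u->transactions, list) {
        xenbus_transaction_end(trans->handle, 1);
        list_del(&trans->list);
        kfree(trans);
    }

    /* Remove every watch registered through this file handle. */
    list_for_each_entry_safe(watch, tmp_watch, &u->watches, list) {
        unregister_xenbus_watch(&watch->watch);
        list_del(&watch->list);
        free_watch_adapter(watch);
    }

    kfree(u);
    return 0;
}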

David



Re: [Xen-devel] xenstored memory leak

2016-07-13 Thread Wei Liu
On Wed, Jul 13, 2016 at 02:21:38PM +0200, Juergen Gross wrote:
> On 06/07/16 09:31, Juergen Gross wrote:
> > While testing some patches for ballooning support in Mini-OS using
> > the xenstore domain I realized that each xl create/destroy pair would
> > increase memory consumption in Mini-OS by about 5kB. Wondering whether
> > this is a xenstore-domain-only effect, I did the same test with the
> > xenstored and oxenstored daemons.
> > 
> > xenstored showed the same behavior: the "referenced" size shown by the
> > pmap command grew by about 5kB for each create/destroy pair.
> > 
> > oxenstored seemed to be even worse in the beginning (about 6kB for
> > each pair), but after about 100 create/destroys the value seemed to be
> > rather stable.
> > 
> > Did anyone notice this memory leak before?
> 
> I think I've found the problem:
> 
> qemu as the device model sets up a xenstore watch for each backend
> type it supports. Unfortunately those watches are never removed again.
> This adds up to the observed memory leak.
> 
> I'm not sure how oxenstored avoids the problem; maybe it tests whether
> socket connections are still alive and so detects that qemu has gone.
> OTOH this won't help for oxenstored running in a different domain than
> the device model (either with oxenstore-stubdom, or with a driver
> domain hosting a qemu-based device model).
> 

How unfortunate.

My gut feeling is that xenstored shouldn't have the knowledge to
associate a watch with a "process". The concept of a process is only
meaningful to the OS, which wouldn't work in a cross-domain xenstored setup.
Maybe the OS xenbus driver should reap all watches on behalf of the dead
process. This would also avoid a crashed QEMU leaking resources.

And xenstored should have proper quota support so that a domain can't
set up an excessive number of watches.

> I'll post a qemu patch to remove those watches on exit soon.
> 
> To find the problem I've added a debugging aid to xenstored: when
> a special parameter is specified on invocation it will dump its memory
> allocation structure via talloc_report_full() to a file whenever it
> receives a SIGUSR1 signal. Anybody interested in this patch?
> 

Wouldn't hurt to post it to the list. If we take it, we take it; if we
don't, it would still be useful for people who want a custom debug
patch later.

Wei.

> 
> Juergen
> 


Re: [Xen-devel] xenstored memory leak

2016-07-13 Thread Andrew Cooper
On 13/07/16 13:21, Juergen Gross wrote:
> On 06/07/16 09:31, Juergen Gross wrote:
>> While testing some patches for ballooning support in Mini-OS using
>> the xenstore domain I realized that each xl create/destroy pair would
>> increase memory consumption in Mini-OS by about 5kB. Wondering whether
>> this is a xenstore-domain-only effect, I did the same test with the
>> xenstored and oxenstored daemons.
>>
>> xenstored showed the same behavior: the "referenced" size shown by the
>> pmap command grew by about 5kB for each create/destroy pair.
>>
>> oxenstored seemed to be even worse in the beginning (about 6kB for
>> each pair), but after about 100 create/destroys the value seemed to be
>> rather stable.
>>
>> Did anyone notice this memory leak before?
> I think I've found the problem:
>
> qemu as the device model sets up a xenstore watch for each backend
> type it supports. Unfortunately those watches are never removed again.
> This adds up to the observed memory leak.
>
> I'm not sure how oxenstored avoids the problem; maybe it tests whether
> socket connections are still alive and so detects that qemu has gone.
> OTOH this won't help for oxenstored running in a different domain than
> the device model (either with oxenstore-stubdom, or with a driver
> domain hosting a qemu-based device model).

I do seem to remember something along those lines.

>
> I'll post a qemu patch to remove those watches on exit soon.

That is good, but there needs to be further consideration as to how to
clean up after a crashed qemu/etc.

>
> To find the problem I've added a debugging aid to xenstored: when
> a special parameter is specified on invocation it will dump its memory
> allocation structure via talloc_report_full() to a file whenever it
> receives a SIGUSR1 signal. Anybody interested in this patch?

Sounds useful.  +1

~Andrew



Re: [Xen-devel] xenstored memory leak

2016-07-13 Thread Juergen Gross
On 06/07/16 09:31, Juergen Gross wrote:
> While testing some patches for ballooning support in Mini-OS using
> the xenstore domain I realized that each xl create/destroy pair would
> increase memory consumption in Mini-OS by about 5kB. Wondering whether
> this is a xenstore-domain-only effect, I did the same test with the
> xenstored and oxenstored daemons.
> 
> xenstored showed the same behavior: the "referenced" size shown by the
> pmap command grew by about 5kB for each create/destroy pair.
> 
> oxenstored seemed to be even worse in the beginning (about 6kB for
> each pair), but after about 100 create/destroys the value seemed to be
> rather stable.
> 
> Did anyone notice this memory leak before?

I think I've found the problem:

qemu as the device model sets up a xenstore watch for each backend
type it supports. Unfortunately those watches are never removed again.
This adds up to the observed memory leak.

I'm not sure how oxenstored avoids the problem; maybe it tests whether
socket connections are still alive and so detects that qemu has gone.
OTOH this won't help for oxenstored running in a different domain than
the device model (either with oxenstore-stubdom, or with a driver
domain hosting a qemu-based device model).

I'll post a qemu patch to remove those watches on exit soon.
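
(Not the actual patch, but the shape of the fix is simple: remember each
watch that gets registered and remove it again on exit. A minimal sketch
against the libxenstore API; only xs_watch()/xs_unwatch() are the real
API, the bookkeeping is invented for illustration.)

#include <stdbool.h>
#include <xenstore.h>

#define MAX_BE_WATCHES 16

static struct { const char *path; const char *token; }
    be_watches[MAX_BE_WATCHES];
static unsigned int nr_be_watches;

/* Register a backend-type watch and remember it for later removal. */
static bool backend_watch(struct xs_handle *xsh,
                          const char *path, const char *token)
{
    if (nr_be_watches >= MAX_BE_WATCHES || !xs_watch(xsh, path, token))
        return false;
    be_watches[nr_be_watches].path = path;
    be_watches[nr_be_watches].token = token;
    nr_be_watches++;
    return true;
}

/* Call this from the exit path so no stale watches stay in xenstored. */
static void backend_unwatch_all(struct xs_handle *xsh)
{
    unsigned int i;

    for (i = 0; i < nr_be_watches; i++)
        xs_unwatch(xsh, be_watches[i].path, be_watches[i].token);
    nr_be_watches = 0;
}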

To find the problem I've added a debugging aid to xenstored: when
a special parameter is specified on invocation it will dump its memory
allocation structure via talloc_report_full() to a file whenever it
receives a SIGUSR1 signal. Anybody interested in this patch?
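
The debugging aid could look something like this; a sketch of the idea
only, where talloc_report_full() is the real talloc API and the file
name and signal plumbing are invented:

#include <signal.h>
#include <stdio.h>
#include <talloc.h>

extern TALLOC_CTX *talloc_ctx;          /* xenstored's root context */

static volatile sig_atomic_t report_requested;

static void sigusr1_handler(int sig)
{
    report_requested = 1;       /* defer the real work to the main loop */
}

static void setup_report_signal(void)
{
    signal(SIGUSR1, sigusr1_handler);
}

/* Called from the main loop whenever report_requested is set. */
static void dump_talloc_report(void)
{
    FILE *f = fopen("/var/run/xenstored/talloc-report", "w");

    if (f) {
        talloc_report_full(talloc_ctx, f);
        fclose(f);
    }
    report_requested = 0;
}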


Juergen



Re: [Xen-devel] xenstored memory leak

2016-07-07 Thread Wei Liu
On Wed, Jul 06, 2016 at 09:31:38AM +0200, Juergen Gross wrote:
> While testing some patches for ballooning support in Mini-OS using
> the xenstore domain I realized that each xl create/destroy pair would
> increase memory consumption in Mini-OS by about 5kB. Wondering whether
> this is a xenstore-domain-only effect, I did the same test with the
> xenstored and oxenstored daemons.
> 
> xenstored showed the same behavior: the "referenced" size shown by the
> pmap command grew by about 5kB for each create/destroy pair.
> 
> oxenstored seemed to be even worse in the beginning (about 6kB for
> each pair), but after about 100 create/destroys the value seemed to be
> rather stable.
> 
> Did anyone notice this memory leak before?
> 

No. But I don't normally look at xenstored memory consumption anyway...

> David, it seems the OCaml-based xenstore doesn't have such a problem. How
> mature is the Mirage-OS xenstore variant? Could it be easily integrated
> into the Xen build?
> 
> 
> Juergen



Re: [Xen-devel] xenstored memory leak

2016-07-06 Thread Andrew Cooper
On 06/07/16 14:55, Juergen Gross wrote:
> On 06/07/16 15:48, Andrew Cooper wrote:
>> On 06/07/16 08:31, Juergen Gross wrote:
>>> While testing some patches for ballooning support in Mini-OS using
>>> the xenstore domain I realized that each xl create/destroy pair would
>>> increase memory consumption in Mini-OS by about 5kB. Wondering whether
>>> this is a xenstore-domain-only effect, I did the same test with the
>>> xenstored and oxenstored daemons.
>>>
>>> xenstored showed the same behavior: the "referenced" size shown by the
>>> pmap command grew by about 5kB for each create/destroy pair.
>>>
>>> oxenstored seemed to be even worse in the beginning (about 6kB for
>>> each pair), but after about 100 create/destroys the value seemed to be
>>> rather stable.
>> Do you mean that after a while, you see oxenstored not leaking any
>> further memory, even with new domains being created?
> In my test: yes. I did:
>
> while true
> do
>   xl create minios.xl
>   sleep 3
>   xl shutdown minios
>   sleep 2
> done
>
> After about 200 iterations memory usage with oxenstored was stable. I
> stopped the loop after more than 1000 iterations.
>
>> OCaml is a garbage-collected language, so you would expect the process
>> to get larger until the GC decides to kick in.
> Okay. This explains the pattern.
>
>>> Did anyone notice this memory leak before?
>> We have not encountered this in XenServer stress scenarios.
> You are using oxenstored, right? The real leak is in xenstored only.

Correct.

>
>> (It is entirely possible that this is specific to something xl does
>> which Xapi doesn't.)
> I doubt that. I'm seeing the leak with the C-variant of xenstore, both
> as daemon and as stubdom.

Right, and we haven't used C xenstored in the last decade.

~Andrew



Re: [Xen-devel] xenstored memory leak

2016-07-06 Thread Juergen Gross
On 06/07/16 15:48, Andrew Cooper wrote:
> On 06/07/16 08:31, Juergen Gross wrote:
>> While testing some patches for ballooning support in Mini-OS using
>> the xenstore domain I realized that each xl create/destroy pair would
>> increase memory consumption in Mini-OS by about 5kB. Wondering whether
>> this is a xenstore-domain-only effect, I did the same test with the
>> xenstored and oxenstored daemons.
>>
>> xenstored showed the same behavior: the "referenced" size shown by the
>> pmap command grew by about 5kB for each create/destroy pair.
>>
>> oxenstored seemed to be even worse in the beginning (about 6kB for
>> each pair), but after about 100 create/destroys the value seemed to be
>> rather stable.
> 
> Do you mean that after a while, you see oxenstored not leaking any
> further memory, even with new domains being created?

In my test: yes. I did:

while true
do
  xl create minios.xl
  sleep 3
  xl shutdown minios
  sleep 2
done

After about 200 iterations memory usage with oxenstored was stable. I
stopped the loop after more than 1000 iterations.

> OCaml is a garbage-collected language, so you would expect the process
> to get larger until the GC decides to kick in.

Okay. This explains the pattern.

>> Did anyone notice this memory leak before?
> 
> We have not encountered this in XenServer stress scenarios.

You are using oxenstored, right? The real leak is in xenstored only.

> (It is entirely possible that this is specific to something xl does
> which Xapi doesn't.)

I doubt that. I'm seeing the leak with the C variant of xenstore, both
as a daemon and as a stubdom.


Juergen




Re: [Xen-devel] xenstored memory leak

2016-07-06 Thread Andrew Cooper
On 06/07/16 08:31, Juergen Gross wrote:
> While testing some patches for ballooning support in Mini-OS using
> the xenstore domain I realized that each xl create/destroy pair would
> increase memory consumption in Mini-OS by about 5kB. Wondering whether
> this is a xenstore-domain-only effect, I did the same test with the
> xenstored and oxenstored daemons.
>
> xenstored showed the same behavior: the "referenced" size shown by the
> pmap command grew by about 5kB for each create/destroy pair.
>
> oxenstored seemed to be even worse in the beginning (about 6kB for
> each pair), but after about 100 create/destroys the value seemed to be
> rather stable.

Do you mean that after a while, you see oxenstored not leaking any
further memory, even with new domains being created?

OCaml is a garbage-collected language, so you would expect the process
to get larger until the GC decides to kick in.

>
> Did anyone notice this memory leak before?

We have not encountered this in XenServer stress scenarios.

(It is entirely possible that this is specific to something xl does
which Xapi doesn't.)

~Andrew



[Xen-devel] xenstored memory leak

2016-07-06 Thread Juergen Gross
While testing some patches for ballooning support in Mini-OS using
the xenstore domain I realized that each xl create/destroy pair would
increase memory consumption in Mini-OS by about 5kB. Wondering whether
this is a xenstore-domain-only effect, I did the same test with the
xenstored and oxenstored daemons.

xenstored showed the same behavior: the "referenced" size shown by the
pmap command grew by about 5kB for each create/destroy pair.

oxenstored seemed to be even worse in the beginning (about 6kB for
each pair), but after about 100 create/destroys the value seemed to be
rather stable.

Did anyone notice this memory leak before?

David, it seems the OCaml-based xenstore doesn't have such a problem. How
mature is the Mirage-OS xenstore variant? Could it be easily integrated
into the Xen build?


Juergen
