Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct

2015-03-17 Thread Bob Liu
Hi Felipe,

On 03/06/2015 06:30 PM, Felipe Franciosi wrote:
 -Original Message-
 From: Bob Liu [mailto:bob@oracle.com]
 Sent: 05 March 2015 00:47
 To: Konrad Rzeszutek Wilk
 Cc: Roger Pau Monne; Felipe Franciosi; David Vrabel; xen-devel@lists.xen.org;
 linux-ker...@vger.kernel.org; ax...@fb.com; h...@infradead.org;
 avanzini.aria...@gmail.com; cheg...@amazon.de
 Subject: Re: [PATCH 04/10] xen/blkfront: separate ring information to an new
 struct


 ...snip...

 Meaning you weren't able to do the same test?


 I can if there are more details about how to set up the 5- and 10-guest
 environments and what test pattern was used.
 I just thought it might save time if somebody still has a similar
 environment at hand.
 Roger and Felipe, if you still have the environment, could you please do a
 quick comparison of feature-persistent performance with the patch series
 [PATCH v5 0/2] gnttab: Improve scaleability?
 
 I've been meaning to do that. I don't have the environment up, but it isn't 
 too hard to put it back together. A bit swamped at the moment, but will try 
 (very hard) to do it next week.
 

Have you gotten any testing results?

-- 
Regards,
-Bob



Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct

2015-03-17 Thread Bob Liu

On 03/17/2015 10:52 PM, Felipe Franciosi wrote:
 Hi Bob,
 
 I've put the hardware back together and am sorting out the software for 
 testing. Things are not moving as fast as I wanted due to other commitments. 
 I'll keep this thread updated as I progress. Malcolm is OOO and I'm trying to 
 get his patches to work on a newer Xen.
 

Thank you!

 The evaluation will compare:
 1) bare metal i/o (for baseline)
 2) tapdisk3 (currently using grant copy, which is what scales best in my 
 experience)
 3) blkback w/ persistent grants
 4) blkback w/o persistent grants (I will just comment out the handshake bits 
 in blkback/blkfront)
 5) blkback w/o persistent grants + Malcolm's grant map patches
 

I think you also need to add the patches from Christoph Egger titled
[PATCH v5 0/2] gnttab: Improve scaleability:
http://lists.xen.org/archives/html/xen-devel/2015-02/msg01188.html


 To my knowledge, blkback (w/ or w/o persistent grants) is always faster than 
 user space alternatives (e.g. tapdisk, qemu-qdisk) as latency is much lower. 
 However, tapdisk with grant copy has been shown to produce (much) better 
 aggregate throughput figures as it avoids any issues with grant (un)mapping.
 
 I'm hoping to show that (5) above scales better than (3) and (4) in a 
 representative scenario. If it does, I will recommend that we get rid of 
 persistent grants in favour of a better and more scalable grant (un)mapping 
 implementation.
 

Right, but even if 5) has better performance, we have to make sure that
new Linux kernels running on older hypervisors won't be affected after we
get rid of persistent grants.

-- 
Regards,
-Bob



Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct

2015-03-17 Thread Felipe Franciosi
Hi Bob,

 -Original Message-
 From: Bob Liu [mailto:bob@oracle.com]
 Sent: 17 March 2015 07:00
 To: Felipe Franciosi
 Cc: Konrad Rzeszutek Wilk; Roger Pau Monne; David Vrabel; xen-
 de...@lists.xen.org; linux-ker...@vger.kernel.org; ax...@fb.com;
 h...@infradead.org; avanzini.aria...@gmail.com; cheg...@amazon.de
 Subject: Re: [PATCH 04/10] xen/blkfront: separate ring information to an new
 struct
 
 Hi Felipe,
 
 On 03/06/2015 06:30 PM, Felipe Franciosi wrote:
  -Original Message-
  From: Bob Liu [mailto:bob@oracle.com]
  Sent: 05 March 2015 00:47
  To: Konrad Rzeszutek Wilk
  Cc: Roger Pau Monne; Felipe Franciosi; David Vrabel;
  xen-devel@lists.xen.org; linux-ker...@vger.kernel.org; ax...@fb.com;
  h...@infradead.org; avanzini.aria...@gmail.com; cheg...@amazon.de
  Subject: Re: [PATCH 04/10] xen/blkfront: separate ring information to
  an new struct
 
 
  ...snip...
 
  Meaning you weren't able to do the same test?
 
 
  I can if there are more details about how to set up this 5 and 10
  guests environment and test pattern have been used.
  Just think it might be save time if somebody still have the similar
  environment by hand.
  Roger and Felipe, if you still have the environment could you please
  have a quick compare about feature-persistent performance with patch
  [PATCH v5 0/2]
  gnttab: Improve scaleability?
 
  I've been meaning to do that. I don't have the environment up, but it isn't 
  too
 hard to put it back together. A bit swamped at the moment, but will try (very
 hard) to do it next week.
 
 
 Do you have gotten any testing result?

I've put the hardware back together and am sorting out the software for 
testing. Things are not moving as fast as I wanted due to other commitments. 
I'll keep this thread updated as I progress. Malcolm is OOO and I'm trying to 
get his patches to work on a newer Xen.

The evaluation will compare:
1) bare metal i/o (for baseline)
2) tapdisk3 (currently using grant copy, which is what scales best in my 
experience)
3) blkback w/ persistent grants
4) blkback w/o persistent grants (I will just comment out the handshake bits
in blkback/blkfront; a sketch of the frontend side of that handshake follows
this list)
5) blkback w/o persistent grants + Malcolm's grant map patches
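
For reference, this is roughly the frontend side of the handshake I mean
(a minimal sketch assuming the upstream feature-persistent xenbus
negotiation; the helper name and exact call sites are illustrative, not
the actual blkfront code):

#include <xen/xenbus.h>

/* Sketch: blkfront advertises persistent-grant support and reads the
 * backend's answer. "Commenting out the handshake" for test (4) means
 * skipping the write below and forcing feature_persistent to 0. */
static void blkfront_negotiate_persistent(struct blkfront_info *info,
					   struct xenbus_transaction xbt)
{
	unsigned int persistent = 0;
	int err;

	/* Advertise support to the backend. */
	err = xenbus_printf(xbt, info->xbdev->nodename,
			    "feature-persistent", "%u", 1);
	if (err)
		dev_warn(&info->xbdev->dev,
			 "writing feature-persistent failed\n");

	/* Read back whether the backend supports it too. */
	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
			   "feature-persistent", "%u", &persistent);
	info->feature_persistent = (err == 1) ? persistent : 0;
}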

To my knowledge, blkback (w/ or w/o persistent grants) is always faster than 
user space alternatives (e.g. tapdisk, qemu-qdisk) as latency is much lower. 
However, tapdisk with grant copy has been shown to produce (much) better 
aggregate throughput figures as it avoids any issues with grant (un)mapping.

I'm hoping to show that (5) above scales better than (3) and (4) in a 
representative scenario. If it does, I will recommend that we get rid of 
persistent grants in favour of a better and more scalable grant (un)mapping 
implementation.

Comments welcome.

Cheers,
F.

 
 --
 Regards,
 -Bob



Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct

2015-03-06 Thread Felipe Franciosi
 -Original Message-
 From: Bob Liu [mailto:bob@oracle.com]
 Sent: 05 March 2015 00:47
 To: Konrad Rzeszutek Wilk
 Cc: Roger Pau Monne; Felipe Franciosi; David Vrabel; xen-devel@lists.xen.org;
 linux-ker...@vger.kernel.org; ax...@fb.com; h...@infradead.org;
 avanzini.aria...@gmail.com; cheg...@amazon.de
 Subject: Re: [PATCH 04/10] xen/blkfront: separate ring information to an new
 struct
 
 
 ...snip...
 
  Meaning you weren't able to do the same test?
 
 
 I can if there are more details about how to set up this 5 and 10 guests
 environment and test pattern have been used.
 Just think it might be save time if somebody still have the similar 
 environment
 by hand.
 Roger and Felipe, if you still have the environment could you please have a
 quick compare about feature-persistent performance with patch [PATCH v5 0/2]
 gnttab: Improve scaleability?

I've been meaning to do that. I don't have the environment up, but it isn't too 
hard to put it back together. A bit swamped at the moment, but will try (very 
hard) to do it next week.

Cheers,
Felipe

 
 Thanks,
 -Bob



Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct

2015-03-04 Thread Konrad Rzeszutek Wilk
  David assertion that better performance and scalbility can be gained
  with grant table locking and TLB flush avoidance is interesting - as
  1). The grant locking is going in Xen 4.6 but not earlier - so when running
  on older hypervisors this gives an performance benefit.
  
  2). I have not seen any prototype TLB flush avoidance code so not know
  when that would be available.
  
  Perhaps a better choice is to do the removal of the persistence support
  when the changes in Xen hypervisor are known?
  
 
 With patch: [PATCH v5 0/2] gnttab: Improve scaleability, I can get
 nearly the same performance as without persistence support.
 
 But I'm not sure about the benchmark described here:
 https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/block/xen-blkfront.c?id=0a8704a51f386cab7394e38ff1d66eef924d8ab8

Meaning you weren't able to do the same test?

 
 -- 
 Regards,
 -Bob



Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct

2015-03-04 Thread Bob Liu

On 03/05/2015 05:21 AM, Konrad Rzeszutek Wilk wrote:
 David assertion that better performance and scalbility can be gained
 with grant table locking and TLB flush avoidance is interesting - as
 1). The grant locking is going in Xen 4.6 but not earlier - so when running
 on older hypervisors this gives an performance benefit.

 2). I have not seen any prototype TLB flush avoidance code so not know
 when that would be available.

 Perhaps a better choice is to do the removal of the persistence support
 when the changes in Xen hypervisor are known?


 With patch: [PATCH v5 0/2] gnttab: Improve scaleability, I can get
 nearly the same performance as without persistence support.

 But I'm not sure about the benchmark described here:
 https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/block/xen-blkfront.c?id=0a8704a51f386cab7394e38ff1d66eef924d8ab8
 
 Meaning you weren't able to do the same test?
 

I can if there are more details about how to set up the 5- and 10-guest
environments and what test pattern was used.
I just thought it might save time if somebody still has a similar
environment at hand.
Roger and Felipe, if you still have the environment, could you please
do a quick comparison of feature-persistent performance with the patch
series [PATCH v5 0/2] gnttab: Improve scaleability?

Thanks,
-Bob



Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct

2015-02-20 Thread Konrad Rzeszutek Wilk
 
  Agree, Life would be easier if we can remove the persistent feature.

..snip..
 
  If Konrad/Bob agree I would like to send a patch to remove persistent
  grants and then have the multiqueue series rebased on top of that.

..snip..
 
  I agree with this.
 
  I think we can get better  performance/scalability gains of with 
  improvements
  to grant table locking and TLB flush avoidance.
 
  David
  
  It doesn't change the fact that persistent grants (as well as the grant 
  copy implementation we did for tapdisk3) were alternatives that allowed 
  aggregate storage performance to increase drastically. Before committing to 
  removing something that allow Xen users to scale their deployments, I think 
  we need to revisit whether the recent improvements to the whole grant 
  mechanisms (grant table locking, TLB flushing, batched calls, etc) are 
  performing as we would (now) expect.
 
 The fact that this extension improved performance doesn't mean it's
 right or desirable. So IMHO we should just remove it and take the
 performance hit. Then we can figure ways to deal with the limitations

.. snip..

Removing code without a clear forward plan might lead to re-instating
said code later - if no forward plan ever materialises.

If the matter here is purely code complication, I would stress that
cleanups can address this - as in, the code could do with moving the
'grant' ops (persistent or not) into a different file.

That ought to remove, in the short term, the problem with the
'if (persistent_grant)' branches.

David's assertion that better performance and scalability can be gained
with grant table locking and TLB flush avoidance is interesting - as:
1). The grant locking is going into Xen 4.6 but not earlier - so when running
on older hypervisors persistent grants still give a performance benefit.

2). I have not seen any prototype TLB flush avoidance code, so I do not know
when that would be available.

Perhaps a better choice is to do the removal of the persistence support
once the changes in the Xen hypervisor are known?



Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct

2015-02-19 Thread David Vrabel
On 19/02/15 11:08, Roger Pau Monné wrote:
 El 19/02/15 a les 3.05, Bob Liu ha escrit:


 On 02/19/2015 02:08 AM, Felipe Franciosi wrote:
 -Original Message-
 From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com]
 Sent: 18 February 2015 17:38
 To: Roger Pau Monne
 Cc: Bob Liu; xen-devel@lists.xen.org; David Vrabel; linux-
 ker...@vger.kernel.org; Felipe Franciosi; ax...@fb.com; h...@infradead.org;
 avanzini.aria...@gmail.com
 Subject: Re: [PATCH 04/10] xen/blkfront: separate ring information to an 
 new
 struct

 On Wed, Feb 18, 2015 at 06:28:49PM +0100, Roger Pau Monné wrote:
 El 15/02/15 a les 9.18, Bob Liu ha escrit:
 AFAICT you seem to have a list of persistent grants, indirect pages
 and a grant table callback for each ring, isn't this supposed to be
 shared between all rings?

 I don't think we should be going down that route, or else we can hoard
 a large amount of memory and grants.

 It does remove the lock that would have to be accessed by each ring thread 
 to
 access those. Those values (grants) can be limited to be a smaller value 
 such
 that the overall number is the same as it was with the previous version. 
 As in:
 each ring has = MAX_GRANTS / nr_online_cpus().


 We should definitely be concerned with the amount of memory consumed on the 
 backend for each plugged virtual disk. We have faced several problems in 
 XenServer around this area before; it drastically affects VBD scalability 
 per host.


 Right, so we have to keep both the lock and the amount of memory
 consumed in mind.

 This makes me think that all the persistent grants work was done as a 
 workaround while we were facing several performance problems around 
 concurrent grant un/mapping operations. Given all the recent submissions 
 made around this (grant ops) area, is this something we should perhaps 
 revisit and discuss whether we want to continue offering persistent grants 
 as a feature?


 Agree, Life would be easier if we can remove the persistent feature.
 
 I was thinking about this yesterday, and IMHO I think we should remove
 persistent grants now while it's not too entangled, leaving it for later
 will just make our life more miserable.
 
 While it's true that persistent grants provide a throughput increase by
 preventing grant table operations and TLB flushes, it has several
 problems that cannot by avoided:
 
  - Memory/grants hoarding, we need to reserve the same amount of memory
 as the amount of data that we want to have in-flight. While this is not
 so critical for memory, it is for grants, since using too many grants
 can basically deadlock other PV interfaces. There's no way to avoid this
 since it's the design behind persistent grants.
 
  - Memcopy: guest needs to perform a memcopy of all data that goes
 through blkfront. While not so critical, Felipe found systems were
 memcopy was more expensive than grant map/unmap in the backend (IIRC
 those were AMD systems).
 
  - Complexity/interactions: when persistent grants was designed number
 of requests was limited to 32 and each request could only contain 11
 pages. This means we had to use 352 pages/grants which was fine. Now
 that we have indirect IO and multiqueue in the horizon this number has
 gone up by orders of magnitude, I don't think this is viable/useful any
 more.
 
 If Konrad/Bob agree I would like to send a patch to remove persistent
 grants and then have the multiqueue series rebased on top of that.

I agree with this.

I think we can get better performance/scalability gains with
improvements to grant table locking and TLB flush avoidance.

David



Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct

2015-02-19 Thread Malcolm Crossley
On 19/02/15 11:08, Roger Pau Monné wrote:
 El 19/02/15 a les 3.05, Bob Liu ha escrit:


 On 02/19/2015 02:08 AM, Felipe Franciosi wrote:
 -Original Message-
 From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com]
 Sent: 18 February 2015 17:38
 To: Roger Pau Monne
 Cc: Bob Liu; xen-devel@lists.xen.org; David Vrabel; linux-
 ker...@vger.kernel.org; Felipe Franciosi; ax...@fb.com; h...@infradead.org;
 avanzini.aria...@gmail.com
 Subject: Re: [PATCH 04/10] xen/blkfront: separate ring information to an 
 new
 struct

 On Wed, Feb 18, 2015 at 06:28:49PM +0100, Roger Pau Monné wrote:
 El 15/02/15 a les 9.18, Bob Liu ha escrit:
 AFAICT you seem to have a list of persistent grants, indirect pages
 and a grant table callback for each ring, isn't this supposed to be
 shared between all rings?

 I don't think we should be going down that route, or else we can hoard
 a large amount of memory and grants.

 It does remove the lock that would have to be accessed by each ring thread 
 to
 access those. Those values (grants) can be limited to be a smaller value 
 such
 that the overall number is the same as it was with the previous version. 
 As in:
 each ring has = MAX_GRANTS / nr_online_cpus().


 We should definitely be concerned with the amount of memory consumed on the 
 backend for each plugged virtual disk. We have faced several problems in 
 XenServer around this area before; it drastically affects VBD scalability 
 per host.


 Right, so we have to keep both the lock and the amount of memory
 consumed in mind.

 This makes me think that all the persistent grants work was done as a 
 workaround while we were facing several performance problems around 
 concurrent grant un/mapping operations. Given all the recent submissions 
 made around this (grant ops) area, is this something we should perhaps 
 revisit and discuss whether we want to continue offering persistent grants 
 as a feature?


 Agree, Life would be easier if we can remove the persistent feature.
 
 I was thinking about this yesterday, and IMHO I think we should remove
 persistent grants now while it's not too entangled, leaving it for later
 will just make our life more miserable.
 
 While it's true that persistent grants provide a throughput increase by
 preventing grant table operations and TLB flushes, it has several
 problems that cannot by avoided:
 
  - Memory/grants hoarding, we need to reserve the same amount of memory
 as the amount of data that we want to have in-flight. While this is not
 so critical for memory, it is for grants, since using too many grants
 can basically deadlock other PV interfaces. There's no way to avoid this
 since it's the design behind persistent grants.
 
  - Memcopy: guest needs to perform a memcopy of all data that goes
 through blkfront. While not so critical, Felipe found systems were
 memcopy was more expensive than grant map/unmap in the backend (IIRC
 those were AMD systems).
 
  - Complexity/interactions: when persistent grants was designed number
 of requests was limited to 32 and each request could only contain 11
 pages. This means we had to use 352 pages/grants which was fine. Now
 that we have indirect IO and multiqueue in the horizon this number has
 gone up by orders of magnitude, I don't think this is viable/useful any
 more.
 
 If Konrad/Bob agree I would like to send a patch to remove persistent
 grants and then have the multiqueue series rebased on top of that.
 

I agree that persistent grants should be removed.

Modern PCI Express SSD cards are so fast (over 4 gigabytes/s, and probably
14 gigabytes/s soon) that a memory copy in the frontend will be the
limiting factor.
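
As a rough back-of-envelope (my arithmetic, using the throughput figures
above): a frontend memcpy reads and writes every byte, so it adds roughly
twice the device data rate in guest memory traffic on top of the DMA itself.

#include <stdio.h>

int main(void)
{
	/* device throughput figures quoted above, in GB/s */
	double dev_rate[] = { 4.0, 14.0 };

	for (int i = 0; i < 2; i++) {
		/* memcpy touches each byte twice: one read, one write */
		double extra = 2.0 * dev_rate[i];
		printf("%2.0f GB/s device -> ~%2.0f GB/s extra memory traffic\n",
		       dev_rate[i], extra);
	}
	return 0;
}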

Using grant maps allows zero-touch movement of data and lets us focus on
optimising that interface.

Malcolm





Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct

2015-02-19 Thread Roger Pau Monné
El 19/02/15 a les 3.05, Bob Liu ha escrit:
 
 
 On 02/19/2015 02:08 AM, Felipe Franciosi wrote:
 -Original Message-
 From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com]
 Sent: 18 February 2015 17:38
 To: Roger Pau Monne
 Cc: Bob Liu; xen-devel@lists.xen.org; David Vrabel; linux-
 ker...@vger.kernel.org; Felipe Franciosi; ax...@fb.com; h...@infradead.org;
 avanzini.aria...@gmail.com
 Subject: Re: [PATCH 04/10] xen/blkfront: separate ring information to an new
 struct

 On Wed, Feb 18, 2015 at 06:28:49PM +0100, Roger Pau Monné wrote:
 El 15/02/15 a les 9.18, Bob Liu ha escrit:
 AFAICT you seem to have a list of persistent grants, indirect pages
 and a grant table callback for each ring, isn't this supposed to be
 shared between all rings?

 I don't think we should be going down that route, or else we can hoard
 a large amount of memory and grants.

 It does remove the lock that would have to be accessed by each ring thread 
 to
 access those. Those values (grants) can be limited to be a smaller value 
 such
 that the overall number is the same as it was with the previous version. As 
 in:
 each ring has = MAX_GRANTS / nr_online_cpus().


 We should definitely be concerned with the amount of memory consumed on the 
 backend for each plugged virtual disk. We have faced several problems in 
 XenServer around this area before; it drastically affects VBD scalability 
 per host.

 
 Right, so we have to keep both the lock and the amount of memory
 consumed in mind.
 
 This makes me think that all the persistent grants work was done as a 
 workaround while we were facing several performance problems around 
 concurrent grant un/mapping operations. Given all the recent submissions 
 made around this (grant ops) area, is this something we should perhaps 
 revisit and discuss whether we want to continue offering persistent grants 
 as a feature?

 
 Agree, Life would be easier if we can remove the persistent feature.

I was thinking about this yesterday, and IMHO we should remove
persistent grants now while it's not too entangled; leaving it for later
will just make our lives more miserable.

While it's true that persistent grants provide a throughput increase by
preventing grant table operations and TLB flushes, they have several
problems that cannot be avoided:

 - Memory/grant hoarding: we need to reserve the same amount of memory
as the amount of data that we want to have in flight. While this is not
so critical for memory, it is for grants, since using too many grants
can basically deadlock other PV interfaces. There's no way to avoid this,
since it's the design behind persistent grants.

 - Memcpy: the guest needs to perform a memcpy of all data that goes
through blkfront. While not so critical, Felipe found systems where
the memcpy was more expensive than grant map/unmap in the backend (IIRC
those were AMD systems).

 - Complexity/interactions: when persistent grants were designed, the
number of requests was limited to 32 and each request could only contain
11 pages. This meant we had to use 352 pages/grants, which was fine. Now
that we have indirect IO and multiqueue on the horizon this number has
gone up by orders of magnitude (a rough count is sketched below); I don't
think this is viable/useful any more.
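
For a sense of scale, a rough count (the 32x11 figure is from the text
above; the indirect-segment and queue counts are illustrative assumptions,
as is the per-domain grant limit mentioned in the comment):

#include <stdio.h>

int main(void)
{
	unsigned int ring_reqs     = 32;   /* classic blkif ring depth          */
	unsigned int segs_per_req  = 11;   /* BLKIF_MAX_SEGMENTS_PER_REQUEST    */
	unsigned int indirect_segs = 256;  /* assumed max_indirect_segments     */
	unsigned int queues        = 8;    /* assumed number of hardware queues */

	printf("classic ring:         %u grants\n", ring_reqs * segs_per_req);
	printf("with indirect IO:     %u grants\n", ring_reqs * indirect_segs);
	printf("indirect + %u queues:  %u grants\n", queues,
	       queues * ring_reqs * indirect_segs);

	/* For comparison, a domain typically has on the order of 16k grant
	 * references in total with default v1 grant-table settings (32 frames
	 * of 512 entries) - an assumption, not a number from this thread. */
	return 0;
}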

If Konrad/Bob agree I would like to send a patch to remove persistent
grants and then have the multiqueue series rebased on top of that.

Roger.




Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct

2015-02-19 Thread Roger Pau Monné
El 19/02/15 a les 13.06, Felipe Franciosi ha escrit:
 
 
 -Original Message-
 From: David Vrabel
 Sent: 19 February 2015 11:15
 To: Roger Pau Monne; Bob Liu; Felipe Franciosi
 Cc: 'Konrad Rzeszutek Wilk'; xen-devel@lists.xen.org; linux-
 ker...@vger.kernel.org; ax...@fb.com; h...@infradead.org;
 avanzini.aria...@gmail.com
 Subject: Re: [PATCH 04/10] xen/blkfront: separate ring information to an new
 struct

 On 19/02/15 11:08, Roger Pau Monné wrote:
 El 19/02/15 a les 3.05, Bob Liu ha escrit:


 On 02/19/2015 02:08 AM, Felipe Franciosi wrote:
 -Original Message-
 From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com]
 Sent: 18 February 2015 17:38
 To: Roger Pau Monne
 Cc: Bob Liu; xen-devel@lists.xen.org; David Vrabel; linux-
 ker...@vger.kernel.org; Felipe Franciosi; ax...@fb.com;
 h...@infradead.org; avanzini.aria...@gmail.com
 Subject: Re: [PATCH 04/10] xen/blkfront: separate ring information
 to an new struct

 On Wed, Feb 18, 2015 at 06:28:49PM +0100, Roger Pau Monné wrote:
 El 15/02/15 a les 9.18, Bob Liu ha escrit:
 AFAICT you seem to have a list of persistent grants, indirect
 pages and a grant table callback for each ring, isn't this
 supposed to be shared between all rings?

 I don't think we should be going down that route, or else we can
 hoard a large amount of memory and grants.

 It does remove the lock that would have to be accessed by each ring
 thread to access those. Those values (grants) can be limited to be
 a smaller value such that the overall number is the same as it was with
 the previous version. As in:
 each ring has = MAX_GRANTS / nr_online_cpus().


 We should definitely be concerned with the amount of memory consumed
 on the backend for each plugged virtual disk. We have faced several problems
 in XenServer around this area before; it drastically affects VBD scalability 
 per
 host.


 Right, so we have to keep both the lock and the amount of memory
 consumed in mind.

 This makes me think that all the persistent grants work was done as a
 workaround while we were facing several performance problems around
 concurrent grant un/mapping operations. Given all the recent submissions
 made around this (grant ops) area, is this something we should perhaps 
 revisit
 and discuss whether we want to continue offering persistent grants as a 
 feature?


 Agree, Life would be easier if we can remove the persistent feature.

 I was thinking about this yesterday, and IMHO I think we should remove
 persistent grants now while it's not too entangled, leaving it for
 later will just make our life more miserable.

 While it's true that persistent grants provide a throughput increase
 by preventing grant table operations and TLB flushes, it has several
 problems that cannot by avoided:

  - Memory/grants hoarding, we need to reserve the same amount of
 memory as the amount of data that we want to have in-flight. While
 this is not so critical for memory, it is for grants, since using too
 many grants can basically deadlock other PV interfaces. There's no way
 to avoid this since it's the design behind persistent grants.

  - Memcopy: guest needs to perform a memcopy of all data that goes
 through blkfront. While not so critical, Felipe found systems were
 memcopy was more expensive than grant map/unmap in the backend (IIRC
 those were AMD systems).

  - Complexity/interactions: when persistent grants was designed number
 of requests was limited to 32 and each request could only contain 11
 pages. This means we had to use 352 pages/grants which was fine. Now
 that we have indirect IO and multiqueue in the horizon this number has
 gone up by orders of magnitude, I don't think this is viable/useful
 any more.

 If Konrad/Bob agree I would like to send a patch to remove persistent
 grants and then have the multiqueue series rebased on top of that.

 I agree with this.

 I think we can get better  performance/scalability gains of with improvements
 to grant table locking and TLB flush avoidance.

 David
 
 It doesn't change the fact that persistent grants (as well as the grant copy 
 implementation we did for tapdisk3) were alternatives that allowed aggregate 
 storage performance to increase drastically. Before committing to removing 
 something that allow Xen users to scale their deployments, I think we need to 
 revisit whether the recent improvements to the whole grant mechanisms (grant 
 table locking, TLB flushing, batched calls, etc) are performing as we would 
 (now) expect.

The fact that this extension improved performance doesn't mean it's
right or desirable. So IMHO we should just remove it and take the
performance hit. Then we can figure out ways to deal with the limitations
properly instead of resorting to this kind of hack, because such hacks
prevent us from moving forward.

Roger.




Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct

2015-02-18 Thread Felipe Franciosi
 -Original Message-
 From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com]
 Sent: 18 February 2015 17:38
 To: Roger Pau Monne
 Cc: Bob Liu; xen-devel@lists.xen.org; David Vrabel; linux-
 ker...@vger.kernel.org; Felipe Franciosi; ax...@fb.com; h...@infradead.org;
 avanzini.aria...@gmail.com
 Subject: Re: [PATCH 04/10] xen/blkfront: separate ring information to an new
 struct
 
 On Wed, Feb 18, 2015 at 06:28:49PM +0100, Roger Pau Monné wrote:
  El 15/02/15 a les 9.18, Bob Liu ha escrit:
   A ring is the representation of a hardware queue, this patch
   separate ring information from blkfront_info to an new struct
   blkfront_ring_info to make preparation for real multi hardware queues
 supporting.
  
   Signed-off-by: Arianna Avanzini avanzini.aria...@gmail.com
   Signed-off-by: Bob Liu bob@oracle.com
   ---
drivers/block/xen-blkfront.c | 403
   +++
1 file changed, 218 insertions(+), 185 deletions(-)
  
   diff --git a/drivers/block/xen-blkfront.c
   b/drivers/block/xen-blkfront.c index 5a90a51..aaa4a0e 100644
   --- a/drivers/block/xen-blkfront.c
   +++ b/drivers/block/xen-blkfront.c
   @@ -102,23 +102,15 @@ MODULE_PARM_DESC(max, Maximum amount
 of
   segments in indirect requests (default  #define BLK_RING_SIZE
   __CONST_RING_SIZE(blkif, PAGE_SIZE)
  
/*
   - * We have one of these per vbd, whether ide, scsi or 'other'.
   They
   - * hang in private_data off the gendisk structure. We may end up
   - * putting all kinds of interesting stuff here :-)
   + *  Per-ring info.
   + *  A blkfront_info structure can associate with one or more
   + blkfront_ring_info,
   + *  depending on how many hardware queues supported.
 */
   -struct blkfront_info
   -{
   +struct blkfront_ring_info {
 spinlock_t io_lock;
   - struct mutex mutex;
   - struct xenbus_device *xbdev;
   - struct gendisk *gd;
   - int vdevice;
   - blkif_vdev_t handle;
   - enum blkif_state connected;
 int ring_ref;
 struct blkif_front_ring ring;
 unsigned int evtchn, irq;
   - struct request_queue *rq;
 struct work_struct work;
 struct gnttab_free_callback callback;
 struct blk_shadow shadow[BLK_RING_SIZE]; @@ -126,6 +118,22 @@
   struct blkfront_info
 struct list_head indirect_pages;
 unsigned int persistent_gnts_c;
 unsigned long shadow_free;
   + struct blkfront_info *info;
 
  AFAICT you seem to have a list of persistent grants, indirect pages
  and a grant table callback for each ring, isn't this supposed to be
  shared between all rings?
 
  I don't think we should be going down that route, or else we can hoard
  a large amount of memory and grants.
 
 It does remove the lock that would have to be accessed by each ring thread to
 access those. Those values (grants) can be limited to be a smaller value such
 that the overall number is the same as it was with the previous version. As 
 in:
 each ring has = MAX_GRANTS / nr_online_cpus().
 

We should definitely be concerned with the amount of memory consumed on the 
backend for each plugged virtual disk. We have faced several problems in 
XenServer around this area before; it drastically affects VBD scalability per 
host.

This makes me think that all the persistent grants work was done as a 
workaround while we were facing several performance problems around concurrent 
grant un/mapping operations. Given all the recent submissions made around this 
(grant ops) area, is this something we should perhaps revisit and discuss 
whether we want to continue offering persistent grants as a feature?

Thanks,
Felipe



Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct

2015-02-18 Thread Roger Pau Monné
El 15/02/15 a les 9.18, Bob Liu ha escrit:
 A ring is the representation of a hardware queue, this patch separate ring
 information from blkfront_info to an new struct blkfront_ring_info to make
 preparation for real multi hardware queues supporting.
 
 Signed-off-by: Arianna Avanzini avanzini.aria...@gmail.com
 Signed-off-by: Bob Liu bob@oracle.com
 ---
  drivers/block/xen-blkfront.c | 403 
 +++
  1 file changed, 218 insertions(+), 185 deletions(-)
 
 diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
 index 5a90a51..aaa4a0e 100644
 --- a/drivers/block/xen-blkfront.c
 +++ b/drivers/block/xen-blkfront.c
 @@ -102,23 +102,15 @@ MODULE_PARM_DESC(max, Maximum amount of segments in 
 indirect requests (default
  #define BLK_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE)
  
  /*
 - * We have one of these per vbd, whether ide, scsi or 'other'.  They
 - * hang in private_data off the gendisk structure. We may end up
 - * putting all kinds of interesting stuff here :-)
 + *  Per-ring info.
 + *  A blkfront_info structure can associate with one or more 
 blkfront_ring_info,
 + *  depending on how many hardware queues supported.
   */
 -struct blkfront_info
 -{
 +struct blkfront_ring_info {
   spinlock_t io_lock;
 - struct mutex mutex;
 - struct xenbus_device *xbdev;
 - struct gendisk *gd;
 - int vdevice;
 - blkif_vdev_t handle;
 - enum blkif_state connected;
   int ring_ref;
   struct blkif_front_ring ring;
   unsigned int evtchn, irq;
 - struct request_queue *rq;
   struct work_struct work;
   struct gnttab_free_callback callback;
   struct blk_shadow shadow[BLK_RING_SIZE];
 @@ -126,6 +118,22 @@ struct blkfront_info
   struct list_head indirect_pages;
   unsigned int persistent_gnts_c;
   unsigned long shadow_free;
 + struct blkfront_info *info;

AFAICT you seem to have a list of persistent grants, indirect pages and
a grant table callback for each ring; isn't this supposed to be shared
between all rings?

I don't think we should be going down that route, or else we can hoard a
large amount of memory and grants.




Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct

2015-02-18 Thread Konrad Rzeszutek Wilk
On Wed, Feb 18, 2015 at 06:28:49PM +0100, Roger Pau Monné wrote:
 El 15/02/15 a les 9.18, Bob Liu ha escrit:
  A ring is the representation of a hardware queue, this patch separate ring
  information from blkfront_info to an new struct blkfront_ring_info to make
  preparation for real multi hardware queues supporting.
  
  Signed-off-by: Arianna Avanzini avanzini.aria...@gmail.com
  Signed-off-by: Bob Liu bob@oracle.com
  ---
   drivers/block/xen-blkfront.c | 403 
  +++
   1 file changed, 218 insertions(+), 185 deletions(-)
  
  diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
  index 5a90a51..aaa4a0e 100644
  --- a/drivers/block/xen-blkfront.c
  +++ b/drivers/block/xen-blkfront.c
  @@ -102,23 +102,15 @@ MODULE_PARM_DESC(max, Maximum amount of segments in 
  indirect requests (default
   #define BLK_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE)
   
   /*
  - * We have one of these per vbd, whether ide, scsi or 'other'.  They
  - * hang in private_data off the gendisk structure. We may end up
  - * putting all kinds of interesting stuff here :-)
  + *  Per-ring info.
  + *  A blkfront_info structure can associate with one or more 
  blkfront_ring_info,
  + *  depending on how many hardware queues supported.
*/
  -struct blkfront_info
  -{
  +struct blkfront_ring_info {
  spinlock_t io_lock;
  -   struct mutex mutex;
  -   struct xenbus_device *xbdev;
  -   struct gendisk *gd;
  -   int vdevice;
  -   blkif_vdev_t handle;
  -   enum blkif_state connected;
  int ring_ref;
  struct blkif_front_ring ring;
  unsigned int evtchn, irq;
  -   struct request_queue *rq;
  struct work_struct work;
  struct gnttab_free_callback callback;
  struct blk_shadow shadow[BLK_RING_SIZE];
  @@ -126,6 +118,22 @@ struct blkfront_info
  struct list_head indirect_pages;
  unsigned int persistent_gnts_c;
  unsigned long shadow_free;
  +   struct blkfront_info *info;
 
 AFAICT you seem to have a list of persistent grants, indirect pages and
 a grant table callback for each ring, isn't this supposed to be shared
 between all rings?
 
 I don't think we should be going down that route, or else we can hoard a
 large amount of memory and grants.

It does remove the lock that would have to be accessed by each
ring thread to access those. Those values (grants) can be limited to a
smaller value such that the overall number is the same as it was with the
previous version. As in: each ring has <= MAX_GRANTS / nr_online_cpus().
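
A minimal sketch of that idea (MAX_GRANTS and the helper name are
hypothetical, for illustration only; they are not from the patch):

#include <linux/cpumask.h>	/* num_online_cpus() */

/* Hypothetical per-vbd grant budget, e.g. whatever a single ring was
 * allowed to pin before multiqueue. The value is an assumption. */
#define MAX_GRANTS	1056

/* Cap each ring so the per-vbd total stays at the old single-ring level:
 * every ring gets at most MAX_GRANTS / nr_rings grants. */
static unsigned int per_ring_grant_limit(unsigned int nr_rings)
{
	if (!nr_rings)
		nr_rings = num_online_cpus();	/* one ring per online CPU */

	return MAX_GRANTS / nr_rings;
}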
 



Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct

2015-02-18 Thread Konrad Rzeszutek Wilk
   AFAICT you seem to have a list of persistent grants, indirect pages
   and a grant table callback for each ring, isn't this supposed to be
   shared between all rings?
  
   I don't think we should be going down that route, or else we can hoard
   a large amount of memory and grants.
  
  It does remove the lock that would have to be accessed by each ring thread 
  to
  access those. Those values (grants) can be limited to be a smaller value 
  such
  that the overall number is the same as it was with the previous version. As 
  in:
  each ring has = MAX_GRANTS / nr_online_cpus().
  
 
 We should definitely be concerned with the amount of memory consumed on the 
 backend for each plugged virtual disk. We have faced several problems in 
 XenServer around this area before; it drastically affects VBD scalability per 
 host.
 
 This makes me think that all the persistent grants work was done as a 
 workaround while we were facing several performance problems around 
 concurrent grant un/mapping operations. Given all the recent submissions made 
 around this (grant ops) area, is this something we should perhaps revisit and 
 discuss whether we want to continue offering persistent grants as a feature?
 

Certainly. Perhaps as a talking point at XenHackathon?

 Thanks,
 Felipe



Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct

2015-02-18 Thread Bob Liu


On 02/19/2015 02:08 AM, Felipe Franciosi wrote:
 -Original Message-
 From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com]
 Sent: 18 February 2015 17:38
 To: Roger Pau Monne
 Cc: Bob Liu; xen-devel@lists.xen.org; David Vrabel; linux-
 ker...@vger.kernel.org; Felipe Franciosi; ax...@fb.com; h...@infradead.org;
 avanzini.aria...@gmail.com
 Subject: Re: [PATCH 04/10] xen/blkfront: separate ring information to an new
 struct

 On Wed, Feb 18, 2015 at 06:28:49PM +0100, Roger Pau Monné wrote:
 El 15/02/15 a les 9.18, Bob Liu ha escrit:
 A ring is the representation of a hardware queue, this patch
 separate ring information from blkfront_info to an new struct
 blkfront_ring_info to make preparation for real multi hardware queues
 supporting.

 Signed-off-by: Arianna Avanzini avanzini.aria...@gmail.com
 Signed-off-by: Bob Liu bob@oracle.com
 ---
  drivers/block/xen-blkfront.c | 403
 +++
  1 file changed, 218 insertions(+), 185 deletions(-)

 diff --git a/drivers/block/xen-blkfront.c
 b/drivers/block/xen-blkfront.c index 5a90a51..aaa4a0e 100644
 --- a/drivers/block/xen-blkfront.c
 +++ b/drivers/block/xen-blkfront.c
 @@ -102,23 +102,15 @@ MODULE_PARM_DESC(max, Maximum amount
 of
 segments in indirect requests (default  #define BLK_RING_SIZE
 __CONST_RING_SIZE(blkif, PAGE_SIZE)

  /*
 - * We have one of these per vbd, whether ide, scsi or 'other'.
 They
 - * hang in private_data off the gendisk structure. We may end up
 - * putting all kinds of interesting stuff here :-)
 + *  Per-ring info.
 + *  A blkfront_info structure can associate with one or more
 + blkfront_ring_info,
 + *  depending on how many hardware queues supported.
   */
 -struct blkfront_info
 -{
 +struct blkfront_ring_info {
spinlock_t io_lock;
 -  struct mutex mutex;
 -  struct xenbus_device *xbdev;
 -  struct gendisk *gd;
 -  int vdevice;
 -  blkif_vdev_t handle;
 -  enum blkif_state connected;
int ring_ref;
struct blkif_front_ring ring;
unsigned int evtchn, irq;
 -  struct request_queue *rq;
struct work_struct work;
struct gnttab_free_callback callback;
struct blk_shadow shadow[BLK_RING_SIZE]; @@ -126,6 +118,22 @@
 struct blkfront_info
struct list_head indirect_pages;
unsigned int persistent_gnts_c;
unsigned long shadow_free;
 +  struct blkfront_info *info;

 AFAICT you seem to have a list of persistent grants, indirect pages
 and a grant table callback for each ring, isn't this supposed to be
 shared between all rings?

 I don't think we should be going down that route, or else we can hoard
 a large amount of memory and grants.

 It does remove the lock that would have to be accessed by each ring thread to
 access those. Those values (grants) can be limited to be a smaller value such
 that the overall number is the same as it was with the previous version. As 
 in:
 each ring has = MAX_GRANTS / nr_online_cpus().

 
 We should definitely be concerned with the amount of memory consumed on the 
 backend for each plugged virtual disk. We have faced several problems in 
 XenServer around this area before; it drastically affects VBD scalability per 
 host.
 

Right, so we have to keep both the lock and the amount of memory
consumed in mind.

 This makes me think that all the persistent grants work was done as a 
 workaround while we were facing several performance problems around 
 concurrent grant un/mapping operations. Given all the recent submissions made 
 around this (grant ops) area, is this something we should perhaps revisit and 
 discuss whether we want to continue offering persistent grants as a feature?
 

Agreed, life would be easier if we could remove the persistent grant feature.

-- 
Regards,
-Bob



[Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct

2015-02-15 Thread Bob Liu
A ring is the representation of a hardware queue. This patch separates the
ring information out of blkfront_info into a new struct, blkfront_ring_info,
in preparation for supporting real multiple hardware queues.

Signed-off-by: Arianna Avanzini avanzini.aria...@gmail.com
Signed-off-by: Bob Liu bob@oracle.com
---
 drivers/block/xen-blkfront.c | 403 +++
 1 file changed, 218 insertions(+), 185 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 5a90a51..aaa4a0e 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -102,23 +102,15 @@ MODULE_PARM_DESC(max, "Maximum amount of segments in indirect requests (default
 #define BLK_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE)
 
 /*
- * We have one of these per vbd, whether ide, scsi or 'other'.  They
- * hang in private_data off the gendisk structure. We may end up
- * putting all kinds of interesting stuff here :-)
+ *  Per-ring info.
+ *  A blkfront_info structure can associate with one or more blkfront_ring_info,
+ *  depending on how many hardware queues supported.
  */
-struct blkfront_info
-{
+struct blkfront_ring_info {
spinlock_t io_lock;
-   struct mutex mutex;
-   struct xenbus_device *xbdev;
-   struct gendisk *gd;
-   int vdevice;
-   blkif_vdev_t handle;
-   enum blkif_state connected;
int ring_ref;
struct blkif_front_ring ring;
unsigned int evtchn, irq;
-   struct request_queue *rq;
struct work_struct work;
struct gnttab_free_callback callback;
struct blk_shadow shadow[BLK_RING_SIZE];
@@ -126,6 +118,22 @@ struct blkfront_info
struct list_head indirect_pages;
unsigned int persistent_gnts_c;
unsigned long shadow_free;
+   struct blkfront_info *info;
+};
+
+/*
+ * We have one of these per vbd, whether ide, scsi or 'other'.  They
+ * hang in private_data off the gendisk structure. We may end up
+ * putting all kinds of interesting stuff here :-)
+ */
+struct blkfront_info {
+   struct mutex mutex;
+   struct xenbus_device *xbdev;
+   struct gendisk *gd;
+   int vdevice;
+   blkif_vdev_t handle;
+   enum blkif_state connected;
+   struct request_queue *rq;
unsigned int feature_flush;
unsigned int feature_discard:1;
unsigned int feature_secdiscard:1;
@@ -135,6 +143,7 @@ struct blkfront_info
unsigned int max_indirect_segments;
int is_ready;
struct blk_mq_tag_set tag_set;
+   struct blkfront_ring_info rinfo;
 };
 
 static unsigned int nr_minors;
@@ -167,34 +176,35 @@ static DEFINE_SPINLOCK(minor_lock);
 #define INDIRECT_GREFS(_segs) \
((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
 
-static int blkfront_setup_indirect(struct blkfront_info *info);
+static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo);
 
-static int get_id_from_freelist(struct blkfront_info *info)
+static int get_id_from_freelist(struct blkfront_ring_info *rinfo)
 {
-	unsigned long free = info->shadow_free;
+	unsigned long free = rinfo->shadow_free;
 	BUG_ON(free >= BLK_RING_SIZE);
-	info->shadow_free = info->shadow[free].req.u.rw.id;
-	info->shadow[free].req.u.rw.id = 0x0fee; /* debug */
+	rinfo->shadow_free = rinfo->shadow[free].req.u.rw.id;
+	rinfo->shadow[free].req.u.rw.id = 0x0fee; /* debug */
 	return free;
 }
 
-static int add_id_to_freelist(struct blkfront_info *info,
+static int add_id_to_freelist(struct blkfront_ring_info *rinfo,
 			      unsigned long id)
 {
-	if (info->shadow[id].req.u.rw.id != id)
+	if (rinfo->shadow[id].req.u.rw.id != id)
 		return -EINVAL;
-	if (info->shadow[id].request == NULL)
+	if (rinfo->shadow[id].request == NULL)
 		return -EINVAL;
-	info->shadow[id].req.u.rw.id  = info->shadow_free;
-	info->shadow[id].request = NULL;
-	info->shadow_free = id;
+	rinfo->shadow[id].req.u.rw.id  = rinfo->shadow_free;
+	rinfo->shadow[id].request = NULL;
+	rinfo->shadow_free = id;
 	return 0;
 }
 
-static int fill_grant_buffer(struct blkfront_info *info, int num)
+static int fill_grant_buffer(struct blkfront_ring_info *rinfo, int num)
 {
 	struct page *granted_page;
 	struct grant *gnt_list_entry, *n;
+	struct blkfront_info *info = rinfo->info;
 	int i = 0;
 
 	while(i < num) {
@@ -212,7 +222,7 @@ static int fill_grant_buffer(struct blkfront_info *info, int num)
 	}
 
 	gnt_list_entry->gref = GRANT_INVALID_REF;
-	list_add(&gnt_list_entry->node, &info->grants);
+	list_add(&gnt_list_entry->node, &rinfo->grants);
 	i++;
 	}
 
@@ -220,7 +230,7 @@ static int fill_grant_buffer(struct blkfront_info *info, int num)
 
 out_of_memory:
list_for_each_entry_safe(gnt_list_entry, n,
-