Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct
Hi Felipe,

On 03/06/2015 06:30 PM, Felipe Franciosi wrote:
>> ...snip...
>>> Meaning you weren't able to do the same test?
>>
>> I can if there are more details about how to set up this 5- and 10-guest
>> environment and which test pattern was used. I just thought it might
>> save time if somebody still had a similar environment at hand.
>>
>> Roger and Felipe, if you still have the environment, could you please
>> run a quick comparison of feature-persistent performance with the patch
>> series "[PATCH v5 0/2] gnttab: Improve scaleability" applied?
>
> I've been meaning to do that. I don't have the environment up, but it
> isn't too hard to put it back together. A bit swamped at the moment, but
> will try (very hard) to do it next week.

Have you gotten any testing results yet?

-- 
Regards,
-Bob
Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct
On 03/17/2015 10:52 PM, Felipe Franciosi wrote:
> Hi Bob,
>
> I've put the hardware back together and am sorting out the software for
> testing. Things are not moving as fast as I wanted due to other
> commitments. I'll keep this thread updated as I progress. Malcolm is OOO
> and I'm trying to get his patches to work on a newer Xen.

Thank you!

> The evaluation will compare:
> 1) bare metal i/o (for baseline)
> 2) tapdisk3 (currently using grant copy, which is what scales best in my
>    experience)
> 3) blkback w/ persistent grants
> 4) blkback w/o persistent grants (I will just comment out the handshake
>    bits in blkback/blkfront)
> 5) blkback w/o persistent grants + Malcolm's grant map patches

I think you also need to add the patches from Christoph Egger titled
"[PATCH v5 0/2] gnttab: Improve scaleability" here:
http://lists.xen.org/archives/html/xen-devel/2015-02/msg01188.html

> To my knowledge, blkback (w/ or w/o persistent grants) is always faster
> than user space alternatives (e.g. tapdisk, qemu-qdisk) as latency is
> much lower. However, tapdisk with grant copy has been shown to produce
> (much) better aggregate throughput figures as it avoids any issues with
> grant (un)mapping.
>
> I'm hoping to show that (5) above scales better than (3) and (4) in a
> representative scenario. If it does, I will recommend that we get rid of
> persistent grants in favour of a better and more scalable grant
> (un)mapping implementation.

Right, but even if (5) has better performance, we have to make sure that
new Linux kernels running on older hypervisors won't be hurt after we get
rid of persistent grants.

-- 
Regards,
-Bob
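For option (4), "commenting out the handshake bits" amounts to not
advertising feature-persistent in xenstore. A minimal sketch of the
frontend half, assuming the 2015-era talk_to_blkback() code path (exact
context and error handling differ by kernel version; this is not a patch
from the thread):

	/* Advertise 0 instead of 1 so the backend never switches to
	 * persistent grants (sketch only). */
	err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u", 0);
	if (err)
		dev_warn(&dev->dev, "writing feature-persistent failed\n");

The backend half is symmetric: blkback only enables the feature when the
frontend has advertised it.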
Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct
Hi Bob,

On 17 March 2015 07:00, Bob Liu wrote:
> Hi Felipe,
>
> ...snip...
>
> Have you gotten any testing results yet?

I've put the hardware back together and am sorting out the software for
testing. Things are not moving as fast as I wanted due to other
commitments. I'll keep this thread updated as I progress. Malcolm is OOO
and I'm trying to get his patches to work on a newer Xen.

The evaluation will compare:
1) bare metal i/o (for baseline)
2) tapdisk3 (currently using grant copy, which is what scales best in my
   experience)
3) blkback w/ persistent grants
4) blkback w/o persistent grants (I will just comment out the handshake
   bits in blkback/blkfront)
5) blkback w/o persistent grants + Malcolm's grant map patches

To my knowledge, blkback (w/ or w/o persistent grants) is always faster
than user space alternatives (e.g. tapdisk, qemu-qdisk) as latency is much
lower. However, tapdisk with grant copy has been shown to produce (much)
better aggregate throughput figures as it avoids any issues with grant
(un)mapping.

I'm hoping to show that (5) above scales better than (3) and (4) in a
representative scenario. If it does, I will recommend that we get rid of
persistent grants in favour of a better and more scalable grant
(un)mapping implementation.

Comments welcome.

Cheers,
F.
Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct
On 05 March 2015 00:47, Bob Liu wrote:
> ...snip...
>> Meaning you weren't able to do the same test?
>
> I can if there are more details about how to set up this 5- and 10-guest
> environment and which test pattern was used. I just thought it might save
> time if somebody still had a similar environment at hand.
>
> Roger and Felipe, if you still have the environment, could you please run
> a quick comparison of feature-persistent performance with the patch
> series "[PATCH v5 0/2] gnttab: Improve scaleability" applied?

I've been meaning to do that. I don't have the environment up, but it
isn't too hard to put it back together. A bit swamped at the moment, but
will try (very hard) to do it next week.

Cheers,
Felipe
Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct
>> David's assertion that better performance and scalability can be gained
>> with grant table locking and TLB flush avoidance is interesting, as:
>> 1) The grant locking is going into Xen 4.6 but not earlier, so when
>>    running on older hypervisors persistent grants still provide a
>>    performance benefit.
>> 2) I have not seen any prototype TLB flush avoidance code, so I do not
>>    know when that would be available.
>> Perhaps a better choice is to do the removal of the persistence support
>> once the changes in the Xen hypervisor are known?
>
> With the patch "[PATCH v5 0/2] gnttab: Improve scaleability" applied, I
> can get nearly the same performance as without persistent grants. But I'm
> not sure about the benchmark described here:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/block/xen-blkfront.c?id=0a8704a51f386cab7394e38ff1d66eef924d8ab8
>
> -- 
> Regards,
> -Bob

Meaning you weren't able to do the same test?
Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct
On 03/05/2015 05:21 AM, Konrad Rzeszutek Wilk wrote:
>>> David's assertion that better performance and scalability can be gained
>>> with grant table locking and TLB flush avoidance is interesting ...
>>> ...snip...
>>
>> With the patch "[PATCH v5 0/2] gnttab: Improve scaleability" applied, I
>> can get nearly the same performance as without persistent grants. But
>> I'm not sure about the benchmark described here:
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/block/xen-blkfront.c?id=0a8704a51f386cab7394e38ff1d66eef924d8ab8
>
> Meaning you weren't able to do the same test?

I can if there are more details about how to set up this 5- and 10-guest
environment and which test pattern was used. I just thought it might save
time if somebody still had a similar environment at hand.

Roger and Felipe, if you still have the environment, could you please run
a quick comparison of feature-persistent performance with the patch series
"[PATCH v5 0/2] gnttab: Improve scaleability" applied?

Thanks,
-Bob
Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct
>>>>> Agree. Life would be easier if we can remove the persistent feature.
..snip..
>>>> If Konrad/Bob agree I would like to send a patch to remove persistent
>>>> grants and then have the multiqueue series rebased on top of that.
..snip..
>>> I agree with this. I think we can get better performance/scalability
>>> gains with improvements to grant table locking and TLB flush avoidance.
>>>
>>> David
>>
>> It doesn't change the fact that persistent grants (as well as the grant
>> copy implementation we did for tapdisk3) were alternatives that allowed
>> aggregate storage performance to increase drastically. Before committing
>> to removing something that allows Xen users to scale their deployments,
>> I think we need to revisit whether the recent improvements to the whole
>> grant mechanism (grant table locking, TLB flushing, batched calls, etc.)
>> are performing as we would (now) expect.
>
> The fact that this extension improved performance doesn't mean it's right
> or desirable. So IMHO we should just remove it and take the performance
> hit. Then we can figure ways to deal with the limitations
.. snip..

Removing code without a clear forward plan might lead to re-instating said
code later, if no forward plan is ever achieved. If the matter here is
purely code complication, I would stress that cleanups can simplify this:
the 'grant' ops (persistent or not) could move to a different file, which
ought to remove the short-term problem of the scattered
'if (persistent_grant)' checks.

David's assertion that better performance and scalability can be gained
with grant table locking and TLB flush avoidance is interesting, as:
1) The grant locking is going into Xen 4.6 but not earlier, so when
   running on older hypervisors persistent grants still provide a
   performance benefit.
2) I have not seen any prototype TLB flush avoidance code, so I do not
   know when that would be available.

Perhaps a better choice is to do the removal of the persistence support
once the changes in the Xen hypervisor are known?
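A minimal sketch of that cleanup direction (all names below are
hypothetical, not code from the series): collect the grant handling behind
an ops table, chosen once at connect time, so the hot path stops branching
on the feature flag.

	/* Hypothetical refactoring sketch, e.g. in a new
	 * xen-blkfront-grant.c; none of these names exist in the driver. */
	struct blkfront_grant_ops {
		/* Obtain a granted page for a request segment. */
		struct grant *(*get_grant)(struct blkfront_ring_info *rinfo,
					   unsigned long pfn);
		/* Return a grant once the request completes. */
		void (*put_grant)(struct blkfront_ring_info *rinfo,
				  struct grant *gnt);
	};

	/* One instance per strategy, selected during the xenbus handshake. */
	static const struct blkfront_grant_ops persistent_grant_ops;
	static const struct blkfront_grant_ops regular_grant_ops;

Callers would then go through rinfo->grant_ops regardless of which mode was
negotiated.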
Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct
On 19/02/15 11:08, Roger Pau Monné wrote:
> On 19/02/15 at 3:05, Bob Liu wrote:
>> ...snip...
>> Agree. Life would be easier if we can remove the persistent feature.
>
> I was thinking about this yesterday, and IMHO we should remove persistent
> grants now while it's not too entangled; leaving it for later will just
> make our lives more miserable. While it's true that persistent grants
> provide a throughput increase by preventing grant table operations and
> TLB flushes, they have several problems that cannot be avoided:
>
> ...snip...
>
> If Konrad/Bob agree I would like to send a patch to remove persistent
> grants and then have the multiqueue series rebased on top of that.

I agree with this. I think we can get better performance/scalability gains
with improvements to grant table locking and TLB flush avoidance.

David
Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct
On 19/02/15 11:08, Roger Pau Monné wrote:
> ...snip...
>
> If Konrad/Bob agree I would like to send a patch to remove persistent
> grants and then have the multiqueue series rebased on top of that.

I agree that persistent grants should be removed. Modern PCI Express SSD
cards are so fast (> 4 gigabytes/s, and probably 14 gigabytes/s soon) that
a memory copy in the frontend will be the limiting factor.

Using grant maps allows for zero-touch movement of data and allows us to
focus on optimising that interface.

Malcolm
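For scale: a single-core memcpy bandwidth probe along the lines below
(illustrative user-space code, not from this thread) typically reports
something in the 5-15 GB/s range on hardware of this era, the same order
of magnitude as the device speeds quoted above, so copying every byte once
in the frontend plausibly becomes the bottleneck.

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <time.h>

	/* Rough single-core memcpy bandwidth probe: the copy a persistent-
	 * grant frontend performs on every request competes with this. */
	int main(void)
	{
		size_t len = 64 << 20;            /* 64 MiB working set */
		int iters = 32;
		char *src = malloc(len), *dst = malloc(len);
		if (!src || !dst)
			return 1;
		memset(src, 1, len);              /* fault the pages in */
		memset(dst, 0, len);

		struct timespec t0, t1;
		clock_gettime(CLOCK_MONOTONIC, &t0);
		for (int i = 0; i < iters; i++)
			memcpy(dst, src, len);
		clock_gettime(CLOCK_MONOTONIC, &t1);

		double secs = (t1.tv_sec - t0.tv_sec) +
			      (t1.tv_nsec - t0.tv_nsec) / 1e9;
		printf("memcpy: %.1f GB/s on one core\n",
		       (double)len * iters / secs / 1e9);
		free(src);
		free(dst);
		return 0;
	}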
Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct
On 19/02/15 at 3:05, Bob Liu wrote:
> On 02/19/2015 02:08 AM, Felipe Franciosi wrote:
>> ...snip...
>>
>> We should definitely be concerned with the amount of memory consumed on
>> the backend for each plugged virtual disk. We have faced several
>> problems in XenServer around this area before; it drastically affects
>> VBD scalability per host.
>
> Right, so we have to keep both the lock and the amount of memory consumed
> in mind.
>
>> This makes me think that all the persistent grants work was done as a
>> workaround while we were facing several performance problems around
>> concurrent grant un/mapping operations. Given all the recent submissions
>> made around this (grant ops) area, is this something we should perhaps
>> revisit and discuss whether we want to continue offering persistent
>> grants as a feature?
>
> Agree. Life would be easier if we can remove the persistent feature.

I was thinking about this yesterday, and IMHO we should remove persistent
grants now while it's not too entangled; leaving it for later will just
make our lives more miserable. While it's true that persistent grants
provide a throughput increase by preventing grant table operations and TLB
flushes, they have several problems that cannot be avoided:

 - Memory/grant hoarding: we need to reserve the same amount of memory as
   the amount of data that we want to have in-flight. While this is not so
   critical for memory, it is for grants, since using too many grants can
   basically deadlock other PV interfaces. There's no way to avoid this,
   since it's the design behind persistent grants.

 - Memcopy: the guest needs to perform a memcopy of all data that goes
   through blkfront. While not so critical, Felipe found systems where
   memcopy was more expensive than grant map/unmap in the backend (IIRC
   those were AMD systems).

 - Complexity/interactions: when persistent grants were designed, the
   number of requests was limited to 32 and each request could only
   contain 11 pages. That meant we had to use 352 pages/grants, which was
   fine. Now that we have indirect I/O, and multiqueue on the horizon,
   this number has gone up by orders of magnitude; I don't think this is
   viable/useful any more.

If Konrad/Bob agree I would like to send a patch to remove persistent
grants and then have the multiqueue series rebased on top of that.

Roger.
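To put rough numbers on that last point (only the 32-request and 11-segment
figures come from the message above; the indirect and multiqueue values
below are illustrative assumptions, not limits taken from this series):

	#include <stdio.h>

	/* Worked example of the grant-count growth argued above. */
	int main(void)
	{
		int ring_size = 32;      /* requests in flight per ring */
		int segs_per_req = 11;   /* classic segments per request */
		int indirect_segs = 256; /* assumed max w/ indirect descriptors */
		int queues = 8;          /* assumed rings with multiqueue */

		printf("classic:     %6d grants/vbd\n",
		       ring_size * segs_per_req);                /*    352 */
		printf("indirect:    %6d grants/vbd\n",
		       ring_size * indirect_segs);               /*   8192 */
		printf("indirect+mq: %6d grants/vbd\n",
		       ring_size * indirect_segs * queues);      /*  65536 */
		return 0;
	}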
Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct
On 19/02/15 at 13:06, Felipe Franciosi wrote:
> On 19 February 2015 11:15, David Vrabel wrote:
>> ...snip...
>>
>> I agree with this. I think we can get better performance/scalability
>> gains with improvements to grant table locking and TLB flush avoidance.
>>
>> David
>
> It doesn't change the fact that persistent grants (as well as the grant
> copy implementation we did for tapdisk3) were alternatives that allowed
> aggregate storage performance to increase drastically.
>
> Before committing to removing something that allows Xen users to scale
> their deployments, I think we need to revisit whether the recent
> improvements to the whole grant mechanism (grant table locking, TLB
> flushing, batched calls, etc.) are performing as we would (now) expect.

The fact that this extension improved performance doesn't mean it's right
or desirable. So IMHO we should just remove it and take the performance
hit. Then we can figure out ways to deal with the limitations properly,
instead of doing this kind of hack, which prevents us from moving forward.

Roger.
Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct
On 18 February 2015 17:38, Konrad Rzeszutek Wilk wrote:
> On Wed, Feb 18, 2015 at 06:28:49PM +0100, Roger Pau Monné wrote:
>> On 15/02/15 at 9:18, Bob Liu wrote:
>>> ...snip...
>>
>> AFAICT you seem to have a list of persistent grants, indirect pages and
>> a grant table callback for each ring; isn't this supposed to be shared
>> between all rings? I don't think we should be going down that route, or
>> else we can hoard a large amount of memory and grants.
>
> It does remove the lock that would have to be accessed by each ring
> thread to access those. Those values (grants) can be limited to a smaller
> value such that the overall number is the same as it was with the
> previous version. As in: each ring has <= MAX_GRANTS / nr_online_cpus().

We should definitely be concerned with the amount of memory consumed on the
backend for each plugged virtual disk. We have faced several problems in
XenServer around this area before; it drastically affects VBD scalability
per host.

This makes me think that all the persistent grants work was done as a
workaround while we were facing several performance problems around
concurrent grant un/mapping operations. Given all the recent submissions
made around this (grant ops) area, is this something we should perhaps
revisit and discuss whether we want to continue offering persistent grants
as a feature?

Thanks,
Felipe
Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct
On 15/02/15 at 9:18, Bob Liu wrote:
> A ring is the representation of a hardware queue. This patch separates
> the ring information from blkfront_info into a new struct,
> blkfront_ring_info, in preparation for supporting real multi hardware
> queues.
>
> Signed-off-by: Arianna Avanzini <avanzini.aria...@gmail.com>
> Signed-off-by: Bob Liu <bob@oracle.com>
> ---
> ...snip...
> @@ -126,6 +118,22 @@ struct blkfront_info
>  	struct list_head indirect_pages;
>  	unsigned int persistent_gnts_c;
>  	unsigned long shadow_free;
> +	struct blkfront_info *info;
> ...snip...

AFAICT you seem to have a list of persistent grants, indirect pages and a
grant table callback for each ring; isn't this supposed to be shared
between all rings? I don't think we should be going down that route, or
else we can hoard a large amount of memory and grants.
Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct
On Wed, Feb 18, 2015 at 06:28:49PM +0100, Roger Pau Monné wrote:
> On 15/02/15 at 9:18, Bob Liu wrote:
>> A ring is the representation of a hardware queue. This patch separates
>> the ring information from blkfront_info into a new struct,
>> blkfront_ring_info, in preparation for supporting real multi hardware
>> queues.
>>
>> ...snip...
>
> AFAICT you seem to have a list of persistent grants, indirect pages and a
> grant table callback for each ring; isn't this supposed to be shared
> between all rings? I don't think we should be going down that route, or
> else we can hoard a large amount of memory and grants.

It does remove the lock that would have to be accessed by each ring thread
to access those. Those values (grants) can be limited to a smaller value
such that the overall number is the same as it was with the previous
version. As in: each ring has <= MAX_GRANTS / nr_online_cpus().
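A minimal sketch of that cap (MAX_GRANTS and the helper are hypothetical
names, not symbols from this series), assuming one ring per online CPU:

	/* Split a fixed per-vbd grant budget evenly across rings so the
	 * device-wide total matches the single-ring driver (sketch only). */
	#define MAX_GRANTS 352	/* assumed budget: 32 requests * 11 segments */

	static unsigned int ring_grant_quota(unsigned int nr_rings)
	{
		unsigned int quota = MAX_GRANTS / nr_rings;

		return quota ? quota : 1; /* never give a ring a zero budget */
	}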
Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct
>>> AFAICT you seem to have a list of persistent grants, indirect pages and
>>> a grant table callback for each ring; isn't this supposed to be shared
>>> between all rings? I don't think we should be going down that route, or
>>> else we can hoard a large amount of memory and grants.
>>
>> It does remove the lock that would have to be accessed by each ring
>> thread to access those. Those values (grants) can be limited to a
>> smaller value such that the overall number is the same as it was with
>> the previous version. As in: each ring has <= MAX_GRANTS /
>> nr_online_cpus().
>
> We should definitely be concerned with the amount of memory consumed on
> the backend for each plugged virtual disk. We have faced several problems
> in XenServer around this area before; it drastically affects VBD
> scalability per host.
>
> This makes me think that all the persistent grants work was done as a
> workaround while we were facing several performance problems around
> concurrent grant un/mapping operations. Given all the recent submissions
> made around this (grant ops) area, is this something we should perhaps
> revisit and discuss whether we want to continue offering persistent
> grants as a feature?

Certainly. Perhaps as a talking point at XenHackathon?

Thanks,
Felipe
Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct
On 02/19/2015 02:08 AM, Felipe Franciosi wrote:
> On 18 February 2015 17:38, Konrad Rzeszutek Wilk wrote:
>> ...snip...
>>
>> It does remove the lock that would have to be accessed by each ring
>> thread to access those. Those values (grants) can be limited to a
>> smaller value such that the overall number is the same as it was with
>> the previous version. As in: each ring has <= MAX_GRANTS /
>> nr_online_cpus().
>
> We should definitely be concerned with the amount of memory consumed on
> the backend for each plugged virtual disk. We have faced several problems
> in XenServer around this area before; it drastically affects VBD
> scalability per host.

Right, so we have to keep both the lock and the amount of memory consumed
in mind.

> This makes me think that all the persistent grants work was done as a
> workaround while we were facing several performance problems around
> concurrent grant un/mapping operations. Given all the recent submissions
> made around this (grant ops) area, is this something we should perhaps
> revisit and discuss whether we want to continue offering persistent
> grants as a feature?

Agree. Life would be easier if we can remove the persistent feature.

-- 
Regards,
-Bob
[Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct
A ring is the representation of a hardware queue. This patch separates the
ring information from blkfront_info into a new struct, blkfront_ring_info,
in preparation for supporting real multi hardware queues.

Signed-off-by: Arianna Avanzini <avanzini.aria...@gmail.com>
Signed-off-by: Bob Liu <bob@oracle.com>
---
 drivers/block/xen-blkfront.c | 403 ++++++++++++++++++++++-----------------
 1 file changed, 218 insertions(+), 185 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 5a90a51..aaa4a0e 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -102,23 +102,15 @@ MODULE_PARM_DESC(max, "Maximum amount of segments in indirect requests (default
 #define BLK_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE)
 
 /*
- * We have one of these per vbd, whether ide, scsi or 'other'.  They
- * hang in private_data off the gendisk structure. We may end up
- * putting all kinds of interesting stuff here :-)
+ * Per-ring info.
+ * A blkfront_info structure can associate with one or more blkfront_ring_info,
+ * depending on how many hardware queues supported.
  */
-struct blkfront_info
-{
+struct blkfront_ring_info {
 	spinlock_t io_lock;
-	struct mutex mutex;
-	struct xenbus_device *xbdev;
-	struct gendisk *gd;
-	int vdevice;
-	blkif_vdev_t handle;
-	enum blkif_state connected;
 	int ring_ref;
 	struct blkif_front_ring ring;
 	unsigned int evtchn, irq;
-	struct request_queue *rq;
 	struct work_struct work;
 	struct gnttab_free_callback callback;
 	struct blk_shadow shadow[BLK_RING_SIZE];
@@ -126,6 +118,22 @@ struct blkfront_info
 	struct list_head indirect_pages;
 	unsigned int persistent_gnts_c;
 	unsigned long shadow_free;
+	struct blkfront_info *info;
+};
+
+/*
+ * We have one of these per vbd, whether ide, scsi or 'other'.  They
+ * hang in private_data off the gendisk structure. We may end up
+ * putting all kinds of interesting stuff here :-)
+ */
+struct blkfront_info {
+	struct mutex mutex;
+	struct xenbus_device *xbdev;
+	struct gendisk *gd;
+	int vdevice;
+	blkif_vdev_t handle;
+	enum blkif_state connected;
+	struct request_queue *rq;
 	unsigned int feature_flush;
 	unsigned int feature_discard:1;
 	unsigned int feature_secdiscard:1;
@@ -135,6 +143,7 @@ struct blkfront_info
 	unsigned int max_indirect_segments;
 	int is_ready;
 	struct blk_mq_tag_set tag_set;
+	struct blkfront_ring_info rinfo;
 };
 
 static unsigned int nr_minors;
@@ -167,34 +176,35 @@ static DEFINE_SPINLOCK(minor_lock);
 #define INDIRECT_GREFS(_segs) \
 	((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
 
-static int blkfront_setup_indirect(struct blkfront_info *info);
+static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo);
 
-static int get_id_from_freelist(struct blkfront_info *info)
+static int get_id_from_freelist(struct blkfront_ring_info *rinfo)
 {
-	unsigned long free = info->shadow_free;
+	unsigned long free = rinfo->shadow_free;
 	BUG_ON(free >= BLK_RING_SIZE);
-	info->shadow_free = info->shadow[free].req.u.rw.id;
-	info->shadow[free].req.u.rw.id = 0x0fee; /* debug */
+	rinfo->shadow_free = rinfo->shadow[free].req.u.rw.id;
+	rinfo->shadow[free].req.u.rw.id = 0x0fee; /* debug */
 	return free;
 }
 
-static int add_id_to_freelist(struct blkfront_info *info,
+static int add_id_to_freelist(struct blkfront_ring_info *rinfo,
 			       unsigned long id)
 {
-	if (info->shadow[id].req.u.rw.id != id)
+	if (rinfo->shadow[id].req.u.rw.id != id)
 		return -EINVAL;
-	if (info->shadow[id].request == NULL)
+	if (rinfo->shadow[id].request == NULL)
 		return -EINVAL;
-	info->shadow[id].req.u.rw.id = info->shadow_free;
-	info->shadow[id].request = NULL;
-	info->shadow_free = id;
+	rinfo->shadow[id].req.u.rw.id = rinfo->shadow_free;
+	rinfo->shadow[id].request = NULL;
+	rinfo->shadow_free = id;
 	return 0;
 }
 
-static int fill_grant_buffer(struct blkfront_info *info, int num)
+static int fill_grant_buffer(struct blkfront_ring_info *rinfo, int num)
 {
 	struct page *granted_page;
 	struct grant *gnt_list_entry, *n;
+	struct blkfront_info *info = rinfo->info;
 	int i = 0;
 
 	while (i < num) {
@@ -212,7 +222,7 @@ static int fill_grant_buffer(struct blkfront_info *info, int num)
 		}
 
 		gnt_list_entry->gref = GRANT_INVALID_REF;
-		list_add(&gnt_list_entry->node, &info->grants);
+		list_add(&gnt_list_entry->node, &rinfo->grants);
 		i++;
 	}
 
@@ -220,7 +230,7 @@ static int fill_grant_buffer(struct blkfront_info *info, int num)
 
 out_of_memory:
 	list_for_each_entry_safe(gnt_list_entry, n,
-