Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-18 Denis V. Lunev
On 04/18/2017 02:52 PM, Alberto Garcia wrote: > On Thu 13 Apr 2017 05:17:21 PM CEST, Denis V. Lunev wrote: >> On 04/13/2017 06:04 PM, Alberto Garcia wrote: >>> On Thu 13 Apr 2017 03:30:43 PM CEST, Denis V. Lunev wrote: Yes, block size should be increased. I perfectly in agreement with

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-18 Alberto Garcia
On Thu 13 Apr 2017 05:17:21 PM CEST, Denis V. Lunev wrote: > On 04/13/2017 06:04 PM, Alberto Garcia wrote: >> On Thu 13 Apr 2017 03:30:43 PM CEST, Denis V. Lunev wrote: >>> Yes, block size should be increased. I perfectly in agreement with >>> your. But I think that we could do that by plain

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-13 Denis V. Lunev
On 04/13/2017 04:05 PM, Kevin Wolf wrote: > On 13.04.2017 at 14:44, Denis V. Lunev wrote: >> On 04/13/2017 02:58 PM, Alberto Garcia wrote: >>> On Wed 12 Apr 2017 06:54:50 PM CEST, Denis V. Lunev wrote: My opinion about this approach is very negative as the problem could be

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-13 Denis V. Lunev
On 04/13/2017 12:44 PM, Kevin Wolf wrote: > On 12.04.2017 at 21:02, Denis V. Lunev wrote: >> On 04/12/2017 09:20 PM, Eric Blake wrote: >>> On 04/12/2017 12:55 PM, Denis V. Lunev wrote: Let me rephrase a bit. The proposal is looking very close to the following case: -

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-13 Denis V. Lunev
On 04/13/2017 04:21 PM, Alberto Garcia wrote: > On Thu 13 Apr 2017 02:44:51 PM CEST, Denis V. Lunev wrote: 1) current L2 cache management seems very wrong to me. Each cache miss means that we have to read entire L2 cache block. This means that in the worst case (when dataset
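The cost of a cache miss described here can be put in numbers. A minimal sketch, assuming (as the message describes) that every miss loads one whole L2 table, i.e. one cluster, and that each standard L2 entry is 8 bytes; `l2_table_stats` is a hypothetical helper, not a QEMU function.

```python
KIB = 1024
MIB = 1024 * KIB

L2_ENTRY_BYTES = 8  # a standard qcow2 L2 entry is 64 bits


def l2_table_stats(cluster_size):
    """Return (bytes read on a miss, entries per table, guest bytes mapped),
    assuming one cache entry holds one complete L2 table (one cluster)."""
    entries = cluster_size // L2_ENTRY_BYTES
    mapped = entries * cluster_size
    return cluster_size, entries, mapped


for cs in (64 * KIB, 1 * MIB):
    read, entries, mapped = l2_table_stats(cs)
    print(f"{cs // KIB} KiB clusters: a miss reads {read // KIB} KiB "
          f"({entries} entries) mapping {mapped // MIB} MiB of guest data")
```

With 64 KiB clusters a single miss therefore reads 64 KiB of metadata even when the guest request is only 4 KiB, which is the worst case raised here when the test's working set does not fit in the cache.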

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-13 Denis V. Lunev
On 04/13/2017 06:04 PM, Alberto Garcia wrote: > On Thu 13 Apr 2017 03:30:43 PM CEST, Denis V. Lunev wrote: >> Yes, block size should be increased. I perfectly in agreement with >> your. But I think that we could do that by plain increase of the >> cluster size without any further dances.

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-13 Alberto Garcia
On Thu 13 Apr 2017 03:30:43 PM CEST, Denis V. Lunev wrote: > Yes, block size should be increased. I perfectly in agreement with > your. But I think that we could do that by plain increase of the > cluster size without any further dances. Sub-clusters as sub-clusters > will help if we are able to
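The trade-off between a plain cluster-size increase and subclusters is largely about copy-on-write. A minimal sketch, assuming a small write into space that is unallocated in the image but present in a backing file, so everything in the allocation unit outside the written range must be copied; `cow_bytes` is a hypothetical helper.

```python
KIB = 1024
MIB = 1024 * KIB


def cow_bytes(offset, length, unit):
    """Bytes copied from the backing file when a write of `length` bytes at
    guest `offset` allocates fresh space in units of `unit` bytes."""
    start = (offset // unit) * unit              # round start down
    end = -(-(offset + length) // unit) * unit   # round end up
    return (end - start) - length


# A 4 KiB guest write at offset 0:
print(cow_bytes(0, 4 * KIB, 2 * MIB) // KIB)   # 2 MiB clusters -> 2044
print(cow_bytes(0, 4 * KIB, 64 * KIB) // KIB)  # 64 KiB subclusters -> 60
```

Larger clusters shrink the metadata, but without subclusters every first write to a cluster pays for copying the rest of the cluster.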

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-13 Denis V. Lunev
On 04/13/2017 04:51 PM, Kevin Wolf wrote: > On 13.04.2017 at 15:21, Alberto Garcia wrote: >> This invariant is already broken by the very design of the qcow2 format, >> subclusters don't really add anything new there. For any given cluster >> size you can write 4k in every odd cluster,

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-13 Denis V. Lunev
On 04/13/2017 04:36 PM, Alberto Garcia wrote: > On Thu 13 Apr 2017 03:09:53 PM CEST, Denis V. Lunev wrote: For nowadays SSD we are facing problems somewhere else. Right now I can achieve only 100k IOPSes on SSD capable of 350-550k. 1 Mb block with preallocation and fragmented L2

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-13 Kevin Wolf
On 13.04.2017 at 16:15, Alberto Garcia wrote: > On Thu 13 Apr 2017 03:51:55 PM CEST, Kevin Wolf wrote: > >> This invariant is already broken by the very design of the qcow2 > >> format, subclusters don't really add anything new there. For any > >> given cluster size you can write 4k in

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-13 Denis V. Lunev
On 04/13/2017 02:58 PM, Alberto Garcia wrote: > On Wed 12 Apr 2017 06:54:50 PM CEST, Denis V. Lunev wrote: >> My opinion about this approach is very negative as the problem could >> be (partially) solved in a much better way. > Hmm... it seems to me that (some of) the problems you are describing

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-13 Alberto Garcia
On Thu 13 Apr 2017 03:51:55 PM CEST, Kevin Wolf wrote: >> This invariant is already broken by the very design of the qcow2 >> format, subclusters don't really add anything new there. For any >> given cluster size you can write 4k in every odd cluster, then do the >> same in every even cluster, and

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-13 Kevin Wolf
On 13.04.2017 at 15:30, Denis V. Lunev wrote: > On 04/13/2017 04:21 PM, Alberto Garcia wrote: > > On Thu 13 Apr 2017 02:44:51 PM CEST, Denis V. Lunev wrote: > 1) current L2 cache management seems very wrong to me. Each cache > miss means that we have to read entire L2 cache

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-13 Kevin Wolf
On 13.04.2017 at 15:21, Alberto Garcia wrote: > This invariant is already broken by the very design of the qcow2 format, > subclusters don't really add anything new there. For any given cluster > size you can write 4k in every odd cluster, then do the same in every > even cluster, and
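The odd/even write pattern can be simulated in a few lines. A sketch, assuming a simplified allocator that appends a new host cluster at the end of the image on the first write to each guest cluster (real qcow2 allocation also places metadata and may reuse freed clusters); `allocate_in_write_order` is hypothetical.

```python
def allocate_in_write_order(write_order):
    """Map guest cluster index -> host cluster index, appending a new host
    cluster the first time each guest cluster is written."""
    mapping = {}
    next_host = 0
    for guest in write_order:
        if guest not in mapping:
            mapping[guest] = next_host
            next_host += 1
    return mapping


n = 8
odd_then_even = list(range(1, n, 2)) + list(range(0, n, 2))
mapping = allocate_in_write_order(odd_then_even)
print([mapping[g] for g in range(n)])  # -> [4, 0, 5, 1, 6, 2, 7, 3]
```

Adjacent guest clusters end up non-adjacent on the host regardless of cluster size, which is why subclusters do not add a new kind of fragmentation here.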

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-13 Alberto Garcia
On Thu 13 Apr 2017 03:09:53 PM CEST, Denis V. Lunev wrote: >>> For nowadays SSD we are facing problems somewhere else. Right now I >>> can achieve only 100k IOPSes on SSD capable of 350-550k. 1 Mb block >>> with preallocation and fragmented L2 cache gives same 100k. Tests >>> for initially empty

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-13 Alberto Garcia
On Thu 13 Apr 2017 02:44:51 PM CEST, Denis V. Lunev wrote: >>> 1) current L2 cache management seems very wrong to me. Each cache >>> miss means that we have to read entire L2 cache block. This means >>> that in the worst case (when dataset of the test does not fit L2 >>> cache size we

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-13 Kevin Wolf
On 13.04.2017 at 14:44, Denis V. Lunev wrote: > On 04/13/2017 02:58 PM, Alberto Garcia wrote: > > On Wed 12 Apr 2017 06:54:50 PM CEST, Denis V. Lunev wrote: > >> My opinion about this approach is very negative as the problem could > >> be (partially) solved in a much better way. > >

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-13 Alberto Garcia
On Wed 12 Apr 2017 06:54:50 PM CEST, Denis V. Lunev wrote: > My opinion about this approach is very negative as the problem could > be (partially) solved in a much better way. Hmm... it seems to me that (some of) the problems you are describing are different from the ones this proposal tries to

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-13 Kevin Wolf
On 12.04.2017 at 21:02, Denis V. Lunev wrote: > On 04/12/2017 09:20 PM, Eric Blake wrote: > > On 04/12/2017 12:55 PM, Denis V. Lunev wrote: > >> Let me rephrase a bit. > >> > >> The proposal is looking very close to the following case: > >> - raw sparse file > >> > >> In this case all

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-12 Denis V. Lunev
On 04/12/2017 09:20 PM, Eric Blake wrote: > On 04/12/2017 12:55 PM, Denis V. Lunev wrote: >> Let me rephrase a bit. >> >> The proposal is looking very close to the following case: >> - raw sparse file >> >> In this case all writes are very-very-very fast and from the >> guest point of view all is

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-12 Eric Blake
On 04/12/2017 12:55 PM, Denis V. Lunev wrote: > Let me rephrase a bit. > > The proposal is looking very close to the following case: > - raw sparse file > > In this case all writes are very-very-very fast and from the > guest point of view all is OK. Sequential data is really sequential. >

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-12 Denis V. Lunev
On 04/06/2017 06:01 PM, Alberto Garcia wrote: > Hi all, > > over the past couple of months I discussed with some of you the > possibility to extend the qcow2 format in order to improve its > performance and reduce its memory requirements (particularly with very > large images). > > After some

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-12 Alberto Garcia
On Tue 11 Apr 2017 04:49:21 PM CEST, Kevin Wolf wrote: >> >>> (We could even get one more bit if we had a subcluster-flag, because I >> >>> guess we can always assume subclustered clusters to have OFLAG_COPIED >> >>> and be uncompressed. But still, three bits missing.) >> >> >> >> Why can we

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-11 Max Reitz
On 11.04.2017 17:30, Eric Blake wrote: > On 04/11/2017 10:18 AM, Max Reitz wrote: > >> Hm, yeah, although you have to keep in mind that the padding is almost >> pretty much the same as the the data bits we need, effectively doubling >> the size of the L2 tables: >> >> padding = 2^{n+2} - 2^{n+1}

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-11 Eric Blake
On 04/11/2017 10:18 AM, Max Reitz wrote: > Hm, yeah, although you have to keep in mind that the padding is almost > pretty much the same as the the data bits we need, effectively doubling > the size of the L2 tables: > > padding = 2^{n+2} - 2^{n+1} - 64 (=2^6) > = 2^{n+1} - 64 > > So
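The padding formula can be checked numerically. Read back, it assumes 2^n subclusters with two bits of state each on top of the existing 64-bit entry, padded up to the next power of two. For n >= 5 the 2^{n+1} + 64 data bits fit in 2^{n+2} bits, so the padding is 2^{n+2} - 2^{n+1} - 64 = 2^{n+1} - 64. The helper names below are hypothetical.

```python
def entry_bits(n):
    """Bits of real data per L2 entry with 2**n subclusters."""
    return 64 + 2 * 2**n


def next_pow2(x):
    """Smallest power of two >= x (for x >= 1)."""
    return 1 << (x - 1).bit_length()


def padding(n):
    return next_pow2(entry_bits(n)) - entry_bits(n)


for n in range(5, 10):
    print(f"2^{n} subclusters: {entry_bits(n)} data bits, "
          f"padded to {next_pow2(entry_bits(n))}, padding {padding(n)}")
```

For n = 6 (64 subclusters), 192 data bits are padded to 256. As n grows the padding approaches the number of data bits, hence the near doubling of the L2 tables.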

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-11 Max Reitz
On 11.04.2017 17:29, Kevin Wolf wrote: > On 11.04.2017 at 17:18, Max Reitz wrote: >> On 11.04.2017 17:08, Eric Blake wrote: >>> On 04/11/2017 09:59 AM, Max Reitz wrote: >>> Good point, but that also means that (with (2)) you can only use subcluster configurations where the

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-11 Kevin Wolf
On 11.04.2017 at 17:18, Max Reitz wrote: > On 11.04.2017 17:08, Eric Blake wrote: > > On 04/11/2017 09:59 AM, Max Reitz wrote: > > > >> > >> Good point, but that also means that (with (2)) you can only use > >> subcluster configurations where the L2 entry size increases by a power > >>

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-11 Max Reitz
On 11.04.2017 17:08, Eric Blake wrote: > On 04/11/2017 09:59 AM, Max Reitz wrote: > >> >> Good point, but that also means that (with (2)) you can only use >> subcluster configurations where the L2 entry size increases by a power >> of two. Unfortunately, only one of those configurations itself is

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-11 Eric Blake
On 04/11/2017 09:59 AM, Max Reitz wrote: > > Good point, but that also means that (with (2)) you can only use > subcluster configurations where the L2 entry size increases by a power > of two. Unfortunately, only one of those configurations itself is a > power of two, and that is 32. > > (With
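The claim that only 32 works can be verified by brute force. A sketch, assuming (as elsewhere in the thread) two bits per subcluster on top of the 64-bit entry: for each power-of-two entry size, compute how many subclusters it can describe and keep the counts that are themselves powers of two.

```python
def is_pow2(x):
    return x > 0 and (x & (x - 1)) == 0


def subclusters_for_entry(entry_bits):
    # (entry_bits - 64) state bits, 2 bits per subcluster
    return (entry_bits - 64) // 2


# Entry sizes 128, 256, ..., 4096 bits
configs = {2**k: subclusters_for_entry(2**k) for k in range(7, 13)}
print(configs)
print([n for n in configs.values() if is_pow2(n)])  # -> [32]
```

A 256-bit entry would describe 96 subclusters and a 512-bit entry 224, neither of which divides a power-of-two cluster evenly.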

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-11 Max Reitz
On 11.04.2017 16:49, Kevin Wolf wrote: [...] > By the way, if you'd only allow multiple of 1s overhead > (i.e. multiples of 32 subclusters), I think (3) would be pretty much > the same as (2) if you just always write the subcluster information > adjacent to the L2 table. Should

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-11 Eric Blake
On 04/11/2017 09:49 AM, Kevin Wolf wrote: Then (3) is effectively the same as (2), just that the subcluster bitmaps are at the end of the L2 cluster, and not next to each entry. >>> >>> Exactly. But it's a difference in implementation, as you won't have to >>> worry about having changed
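The difference between (2) and (3) is only where the bitmap for a given entry lives inside the L2 cluster. A hypothetical sketch, assuming a 64-bit descriptor plus a 64-bit bitmap per cluster in both options, with (3) storing all descriptors first and all bitmaps in the second half of the L2 cluster; the function names are illustrative only.

```python
def option2_offsets(i):
    """Option (2): 128-bit entries, bitmap right after each descriptor.
    Returns (descriptor offset, bitmap offset) within the L2 cluster."""
    return 16 * i, 16 * i + 8


def option3_offsets(i, cluster_size):
    """Option (3): descriptors packed in the first half of the L2 cluster,
    bitmaps packed in the second half."""
    entries = cluster_size // 16  # same capacity as option (2)
    return 8 * i, 8 * entries + 8 * i


print(option2_offsets(3))
print(option3_offsets(3, 64 * 1024))
```

Both layouts hold cluster_size / 16 entries per L2 cluster; in (3) the first half keeps the familiar array of 64-bit entries.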

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-11 Kevin Wolf
On 11.04.2017 at 16:31, Alberto Garcia wrote: > On Tue 11 Apr 2017 04:04:53 PM CEST, Max Reitz wrote: > >>> (We could even get one more bit if we had a subcluster-flag, because I > >>> guess we can always assume subclustered clusters to have OFLAG_COPIED > >>> and be uncompressed. But

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-11 Alberto Garcia
On Tue 11 Apr 2017 04:04:53 PM CEST, Max Reitz wrote: >>> (We could even get one more bit if we had a subcluster-flag, because I >>> guess we can always assume subclustered clusters to have OFLAG_COPIED >>> and be uncompressed. But still, three bits missing.) >> >> Why can we always assume

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-11 Max Reitz
On 11.04.2017 14:56, Alberto Garcia wrote: > On Fri 07 Apr 2017 07:10:46 PM CEST, Max Reitz wrote: >>> === Changes to the on-disk format === >>> >>> The qcow2 on-disk format needs to change so each L2 entry has a bitmap >>> indicating the allocation status of each subcluster. There are three >>>

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-11 Alberto Garcia
On Fri 07 Apr 2017 07:10:46 PM CEST, Max Reitz wrote: >> === Changes to the on-disk format === >> >> The qcow2 on-disk format needs to change so each L2 entry has a bitmap >> indicating the allocation status of each subcluster. There are three >> possible states (unallocated, allocated, all
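Three states per subcluster need at least two bits each. A hypothetical packing, assuming two bits per subcluster stored consecutively in one integer with the fourth bit pattern left unused; this illustrates the idea and is not necessarily the encoding the proposal would use.

```python
from enum import IntEnum


class SubclusterState(IntEnum):
    UNALLOCATED = 0
    ALLOCATED = 1
    ZERO = 2  # reads as all zeroes; pattern 3 is unused


BITS_PER_SUBCLUSTER = 2


def get_state(bitmap, idx):
    shift = BITS_PER_SUBCLUSTER * idx
    return SubclusterState((bitmap >> shift) & 0b11)


def set_state(bitmap, idx, state):
    shift = BITS_PER_SUBCLUSTER * idx
    bitmap &= ~(0b11 << shift)  # clear the old 2-bit field
    return bitmap | (int(state) << shift)


bm = 0
bm = set_state(bm, 0, SubclusterState.ALLOCATED)
bm = set_state(bm, 5, SubclusterState.ZERO)
print([get_state(bm, i).name for i in range(8)])
```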

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-10 Stefan Hajnoczi
On Fri, Apr 07, 2017 at 03:01:29PM +0200, Kevin Wolf wrote: > Am 07.04.2017 um 14:20 hat Stefan Hajnoczi geschrieben: > > On Thu, Apr 06, 2017 at 06:01:48PM +0300, Alberto Garcia wrote: > > > Here are the results (subcluster size in brackets): > > > > > >

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-10 Max Reitz
On 10.04.2017 10:42, Kevin Wolf wrote: > On 07.04.2017 at 19:10, Max Reitz wrote: >> One case I'd be especially interested in are of course 4 kB subclusters >> for 64 kB clusters (because 4 kB is a usual page size and can be >> configured to be the block size of a guest device; and

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-10 Kevin Wolf
On 07.04.2017 at 19:10, Max Reitz wrote: > One case I'd be especially interested in are of course 4 kB subclusters > for 64 kB clusters (because 4 kB is a usual page size and can be > configured to be the block size of a guest device; and because 64 kB > simply is the standard cluster
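With 4 kB subclusters in 64 kB clusters each cluster has 16 subclusters, and a guest offset splits into three parts. A minimal sketch; `locate` is a hypothetical helper.

```python
KIB = 1024


def locate(guest_offset, cluster_size=64 * KIB, subcluster_size=4 * KIB):
    """Split a guest offset into (cluster index, subcluster index,
    offset within the subcluster)."""
    if cluster_size % subcluster_size:
        raise ValueError("subcluster size must divide the cluster size")
    cluster_idx, in_cluster = divmod(guest_offset, cluster_size)
    sub_idx, in_sub = divmod(in_cluster, subcluster_size)
    return cluster_idx, sub_idx, in_sub


print(64 * KIB // (4 * KIB))              # -> 16
print(locate(64 * KIB + 12 * KIB + 100))  # -> (1, 3, 100)
```

A 4 kB-aligned 4 kB guest write therefore touches exactly one subcluster.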

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-07 Max Reitz
On 06.04.2017 17:01, Alberto Garcia wrote: > Hi all, > > over the past couple of months I discussed with some of you the > possibility to extend the qcow2 format in order to improve its > performance and reduce its memory requirements (particularly with very > large images). > > After some

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-07 Alberto Garcia
On Fri 07 Apr 2017 02:41:21 PM CEST, Kevin Wolf wrote: >> [Bit-layout diagram: bit positions 63-56, 55-48, 47-40, 39-32, 31-24, 23-16, 15-8, 7-0]
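For reference, the standard 64-bit L2 entry defined by the qcow2 specification can be decoded as follows. The sketch uses the flag names from QEMU's qcow2.h and counts the bits a standard (uncompressed) entry leaves unused, which is the budget discussed in the thread when trying to fit subcluster state into 64 bits.

```python
# Bit layout of a standard (uncompressed) qcow2 L2 entry; names follow
# QEMU's qcow2.h.
QCOW_OFLAG_COPIED = 1 << 63           # refcount is exactly one
QCOW_OFLAG_COMPRESSED = 1 << 62       # compressed cluster (different layout)
QCOW_OFLAG_ZERO = 1 << 0              # cluster reads as zeroes (version 3)
L2E_OFFSET_MASK = 0x00FFFFFFFFFFFE00  # bits 9-55: host cluster offset

USED = QCOW_OFLAG_COPIED | QCOW_OFLAG_COMPRESSED | QCOW_OFLAG_ZERO | L2E_OFFSET_MASK
SPARE = ~USED & ((1 << 64) - 1)


def spare_bits():
    """Bit positions not used by a standard L2 entry."""
    return [b for b in range(64) if (SPARE >> b) & 1]


def decode_standard(entry):
    """Decode an uncompressed L2 entry (not valid for compressed ones)."""
    return {
        "host_offset": entry & L2E_OFFSET_MASK,
        "copied": bool(entry & QCOW_OFLAG_COPIED),
        "zero": bool(entry & QCOW_OFLAG_ZERO),
    }


print(spare_bits())       # bits 1-8 and 56-61
print(len(spare_bits()))  # -> 14
```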

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-07 Kevin Wolf
On 07.04.2017 at 14:20, Stefan Hajnoczi wrote: > On Thu, Apr 06, 2017 at 06:01:48PM +0300, Alberto Garcia wrote: > > Here are the results (subcluster size in brackets): > > > > | cluster size |

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-07 Kevin Wolf
On 06.04.2017 at 18:40, Eric Blake wrote: > On 04/06/2017 10:01 AM, Alberto Garcia wrote: > > I thought of three alternatives for storing the subcluster bitmaps. I > > haven't made my mind completely about which one is the best one, so > > I'd like to present all three for discussion.

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-07 Alberto Garcia
On Fri 07 Apr 2017 02:20:21 PM CEST, Stefan Hajnoczi wrote: >> Here are the results when writing to an empty 40GB qcow2 image with no >> backing file. The numbers are of course different but as you can see >> the patterns are similar: >> >>

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-07 Stefan Hajnoczi
On Thu, Apr 06, 2017 at 06:01:48PM +0300, Alberto Garcia wrote: > Here are the results (subcluster size in brackets): > > | cluster size | subclusters=on | subclusters=off | Max L2 cache size | >

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-07 Alberto Garcia
On Thu 06 Apr 2017 06:40:41 PM CEST, Eric Blake wrote: >> This e-mail is the formal presentation of my proposal to extend the >> on-disk qcow2 format. As you can see this is still an RFC. Due to the >> nature of the changes I would like to get as much feedback as >> possible before going forward.

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-06 Eric Blake
On 04/06/2017 10:01 AM, Alberto Garcia wrote: > Hi all, > > over the past couple of months I discussed with some of you the > possibility to extend the qcow2 format in order to improve its > performance and reduce its memory requirements (particularly with very > large images). > > After some

[Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-06 Alberto Garcia
Hi all, over the past couple of months I discussed with some of you the possibility to extend the qcow2 format in order to improve its performance and reduce its memory requirements (particularly with very large images). After some discussion in the mailing list and the #qemu IRC channel I
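The memory argument can be made concrete: mapping a whole image takes one L2 entry per cluster. A sketch comparing 64 KiB clusters with 8-byte entries against 2 MiB clusters split into 32 subclusters of 64 KiB, assuming (hypothetically) 16-byte entries for the latter, i.e. a 64-bit bitmap stored next to each entry as in one of the options discussed in the thread; `l2_metadata_bytes` is a hypothetical helper.

```python
KIB, MIB, GIB = 1024, 1024**2, 1024**3


def l2_metadata_bytes(disk_size, cluster_size, entry_bytes):
    """L2 table space needed to map the whole disk (one entry per cluster)."""
    clusters = -(-disk_size // cluster_size)  # round up
    return clusters * entry_bytes


disk = 1024 * GIB  # a 1 TiB image
plain = l2_metadata_bytes(disk, 64 * KIB, 8)  # 64 KiB clusters, 64-bit entries
sub = l2_metadata_bytes(disk, 2 * MIB, 16)    # 2 MiB clusters, 128-bit entries
print(plain // MIB, sub // MIB)  # -> 128 8
```

Both configurations have a 64 KiB copy-on-write granularity, but a cache covering the whole image needs sixteen times less memory in the subcluster case.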