Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 02:12, Jens Axboe wrote: > > It is a system wide problem. Every block device needs throttling, > > otherwise queues expand without limit. Currently, block devices > > that use the standard request library get a slipshod form of > > throttling for free in the form of

Re: Block device throttling [Re: Distributed storage.]

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 05:18, Evgeniy Polyakov wrote: > > Say you have a device mapper device with some physical device > > sitting underneath, the classic use case for this throttle code. > > Say 8,000 threads each submit an IO in parallel. The device mapper > > mapping function will be

Re: Block device throttling [Re: Distributed storage.]

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 05:04, Evgeniy Polyakov wrote: > On Mon, Aug 13, 2007 at 04:04:26AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > > On Monday 13 August 2007 01:14, Evgeniy Polyakov wrote: > > > > Oops, and there is also:

Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 04:03, Evgeniy Polyakov wrote: > On Mon, Aug 13, 2007 at 03:12:33AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > > > This is not a very good solution, since it requires all users of > > > the bios to know how to free it. > > > > No

Re: Block device throttling [Re: Distributed storage.]

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 01:23, Evgeniy Polyakov wrote: > On Sun, Aug 12, 2007 at 10:36:23PM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > > (previous incomplete message sent accidentally) > > > > On Wednesday 08 August 2007 02:54, Evgeniy Polyakov wrote: > > >

Re: Block device throttling [Re: Distributed storage.]

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 01:14, Evgeniy Polyakov wrote: > > Oops, and there is also: > > > > 3) The bio throttle, which is supposed to prevent deadlock, can > > itself deadlock. Let me see if I can remember how it goes. > > > > * generic_make_request puts a bio in flight > > * the bio gets

Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 03:22, Jens Axboe wrote: > I never compared the bio to struct page, I'd obviously agree that > shrinking struct page was a worthy goal and that it'd be ok to uglify > some code to do that. The same isn't true for struct bio. I thought I just said that. Regards, Daniel

Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 03:06, Jens Axboe wrote: > On Mon, Aug 13 2007, Daniel Phillips wrote: > > Of course not. Nothing I said stops endio from being called in the > > usual way as well. For this to work, endio just needs to know that > > one call means "end" and the other means "destroy"

Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 02:18, Evgeniy Polyakov wrote: > On Mon, Aug 13, 2007 at 02:08:57AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: > > > But that idea fails as well, since reference counts and IO > > > completion are two completely separate entities. So unless end IO just happens

Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 02:13, Jens Axboe wrote: > On Mon, Aug 13 2007, Daniel Phillips wrote: > > On Monday 13 August 2007 00:45, Jens Axboe wrote: > > > On Mon, Aug 13 2007, Jens Axboe wrote: > > > > > You did not comment on the one about putting the bio destructor in the ->endio handler

Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 00:45, Jens Axboe wrote: > On Mon, Aug 13 2007, Jens Axboe wrote: > > > You did not comment on the one about putting the bio destructor > > > in the ->endio handler, which looks dead simple. The majority of > > > cases just use the default endio handler and the default >

Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 00:28, Jens Axboe wrote: > On Sun, Aug 12 2007, Daniel Phillips wrote: > > Right, that is done by bi_vcnt. I meant bi_max_vecs, which you can > > derive efficiently from BIO_POOL_IDX() provided the bio was > > allocated in the standard way.

Re: [PATCH 02/10] mm: system wide ALLOC_NO_WATERMARK

2007-08-13 Thread Daniel Phillips
On Friday 10 August 2007 10:46, Christoph Lameter wrote: > On Fri, 10 Aug 2007, Daniel Phillips wrote: > > It is quite clear what is in your patch. Instead of just grabbing > > a page off the buddy free lists in a critical allocation situation > > you go invoke shrink_caches. Why oh why?

Re: Block device throttling [Re: Distributed storage.]

2007-08-13 Thread Daniel Phillips
On Sunday 12 August 2007 22:36, I wrote: > Note! There are two more issues I forgot to mention earlier. Oops, and there is also: 3) The bio throttle, which is supposed to prevent deadlock, can itself deadlock. Let me see if I can remember how it goes. * generic_make_request puts a bio in

Re: Block device throttling [Re: Distributed storage.]

2007-08-12 Thread Daniel Phillips
(previous incomplete message sent accidentally) On Wednesday 08 August 2007 02:54, Evgeniy Polyakov wrote: > On Tue, Aug 07, 2007 at 10:55:38PM +0200, Jens Axboe wrote: > > So, what did we decide? To bloat bio a bit (add a queue pointer) or > to use physical device limits? The latter requires to

Re: Block device throttling [Re: Distributed storage.]

2007-08-12 Thread Daniel Phillips
On Wednesday 08 August 2007 02:54, Evgeniy Polyakov wrote: > On Tue, Aug 07, 2007 at 10:55:38PM +0200, Jens Axboe ([EMAIL PROTECTED]) wrote: > > So, what did we decide? To bloat bio a bit (add a queue pointer) or > to use physical device limits? The latter requires to replace all > occurence of

Re: Distributed storage.

2007-08-12 Thread Daniel Phillips
On Tuesday 07 August 2007 13:55, Jens Axboe wrote: > I don't like structure bloat, but I do like nice design. Overloading > is a necessary evil sometimes, though. Even today, there isn't enough > room to hold bi_rw and bi_flags in the same variable on 32-bit archs, > so that concern can be

Re: [1/1] Block device throttling [Re: Distributed storage.]

2007-08-12 Thread Daniel Phillips
Hi Evgeniy, Sorry for not getting back to you right away, I was on the road with limited email access. Incidentally, the reason my mails to you keep bouncing is, your MTA is picky about my mailer's IP reversing to a real hostname. I will take care of that pretty soon, but for now my direct

Re: [PATCH 02/10] mm: system wide ALLOC_NO_WATERMARK

2007-08-10 Thread Daniel Phillips
On 8/10/07, Christoph Lameter <[EMAIL PROTECTED]> wrote: > The idea of adding code to deal with "I have no memory" situations > in a kernel that is based on having as much memory as possible in use at all > times is plainly the wrong approach. No. It is you who have read the patches wrongly, because

Re: [PATCH 02/10] mm: system wide ALLOC_NO_WATERMARK

2007-08-10 Thread Daniel Phillips
On 8/9/07, Christoph Lameter <[EMAIL PROTECTED]> wrote: > > If you believe that the deadlock problems we address here can be > > better fixed by making reclaim more intelligent then please post a > > patch and we will test it. I am highly skeptical, but the proof is in > > the patch. > > Then

Re: [PATCH 02/10] mm: system wide ALLOC_NO_WATERMARK

2007-08-09 Thread Daniel Phillips
On 8/9/07, Christoph Lameter <[EMAIL PROTECTED]> wrote: > The allocations problems that this patch addresses can be fixed by making > reclaim > more intelligent. If you believe that the deadlock problems we address here can be better fixed by making reclaim more intelligent then please post a

Re: [PATCH 04/10] mm: slub: add knowledge of reserve pages

2007-08-09 Thread Daniel Phillips
On 8/8/07, Andrew Morton <[EMAIL PROTECTED]> wrote: > On Wed, 8 Aug 2007 10:57:13 -0700 (PDT) > Christoph Lameter <[EMAIL PROTECTED]> wrote: > > > I think in general irq context reclaim is doable. Cannot see obvious > > issues on a first superficial pass through rmap.c. The irq holdoff would > >

Re: [PATCH 02/10] mm: system wide ALLOC_NO_WATERMARK

2007-08-09 Thread Daniel Phillips
On 8/9/07, Christoph Lameter <[EMAIL PROTECTED]> wrote: > On Thu, 9 Aug 2007, Daniel Phillips wrote: > > On 8/8/07, Christoph Lameter <[EMAIL PROTECTED]> wrote: > > > On Wed, 8 Aug 2007, Daniel Phillips wrote: > > > Maybe we need to kill PF_MEMALLOC

Re: [PATCH 02/10] mm: system wide ALLOC_NO_WATERMARK

2007-08-09 Thread Daniel Phillips
On 8/8/07, Christoph Lameter <[EMAIL PROTECTED]> wrote: > On Wed, 8 Aug 2007, Daniel Phillips wrote: > Maybe we need to kill PF_MEMALLOC Shrink_caches needs to be able to recurse into filesystems at least, and for the duration of the recursion the filesystem must have privileged access

Re: [PATCH 02/10] mm: system wide ALLOC_NO_WATERMARK

2007-08-08 Thread Daniel Phillips
On 8/7/07, Christoph Lameter <[EMAIL PROTECTED]> wrote: > > > AFAICT: This patchset is not throttling processes but failing > > > allocations. > > > > Failing allocations? Where do you see that? As far as I can see, > > Peter's patch set allows allocations to fail exactly where the user has > >

Re: Distributed storage.

2007-08-07 Thread Daniel Phillips
On Tuesday 07 August 2007 05:05, Jens Axboe wrote: > On Sun, Aug 05 2007, Daniel Phillips wrote: > > A simple way to solve the stable accounting field issue is to add a > > new pointer to struct bio that is owned by the top level submitter > > (normally generic_make_request but not always

Re: [ck] Re: Linus 2.6.23-rc1

2007-08-07 Thread Daniel Phillips
On Saturday 28 July 2007 14:06, Diego Calleja wrote: > On Sat, 28 Jul 2007 13:07:05 -0700, Bill Huey (hui) wrote: > The main problem is clearly that no scheduler was clearly better than > the other. This reminds me of the LVM2/MD vs EVMS in the 2.5 days - > both of them were good enough, but

Re: [PATCH 00/10] foundations for reserve-based allocation

2007-08-06 Thread Daniel Phillips
Hi Matt, On Monday 06 August 2007 13:23, Matt Mackall wrote: > On Mon, Aug 06, 2007 at 12:29:22PM +0200, Peter Zijlstra wrote: > > In the interrest of getting swap over network working and posting > > in smaller series, here is the first series. > > > > This series lays the foundations needed to

Re: [PATCH 02/10] mm: system wide ALLOC_NO_WATERMARK

2007-08-06 Thread Daniel Phillips
On Monday 06 August 2007 16:14, Christoph Lameter wrote: > On Mon, 6 Aug 2007, Daniel Phillips wrote: > > Correct. That is what the throttling part of these patches is > > about. > > Where are those patches? Here is one user: http://zumastor.googlecode.com/svn/trunk/ddsnap/kernel/dm-ddsnap.c

Re: [PATCH 02/10] mm: system wide ALLOC_NO_WATERMARK

2007-08-06 Thread Daniel Phillips
On Monday 06 August 2007 13:27, Andrew Morton wrote: > On Mon, 6 Aug 2007 13:19:26 -0700 (PDT) Christoph Lameter wrote: > > The solution may be as simple as configuring the reserves right and > > avoid the unbounded memory allocations. That is possible if one > > would make sure that the network

Re: [PATCH 02/10] mm: system wide ALLOC_NO_WATERMARK

2007-08-06 Thread Daniel Phillips
On Monday 06 August 2007 14:05, Christoph Lameter wrote: > > > That is possible if one > > > would make sure that the network layer triggers reclaim once in a > > > while. > > > > This does not make sense, we cannot reclaim from reclaim. > > But we should limit the amounts of allocation we do

Re: [PATCH 02/10] mm: system wide ALLOC_NO_WATERMARK

2007-08-06 Thread Daniel Phillips
(What Peter already wrote, but in different words) On Monday 06 August 2007 13:19, Christoph Lameter wrote: > The solution may be as simple as configuring the reserves right and > avoid the unbounded memory allocations. Exactly. That is what this patch set is about. This is the part that

Re: [PATCH 00/10] foundations for reserve-based allocation

2007-08-06 Thread Daniel Phillips
On Monday 06 August 2007 12:36, Peter Zijlstra wrote: > On Mon, 2007-08-06 at 12:31 -0700, Daniel Phillips wrote: > > On Monday 06 August 2007 11:17, Peter Zijlstra wrote: > > > And how do we know a page was taken out of the reserves? > > > > Why not return that in the low bit of the page address?

Re: [PATCH 00/10] foundations for reserve-based allocation

2007-08-06 Thread Daniel Phillips
On Monday 06 August 2007 11:17, Peter Zijlstra wrote: > And how do we know a page was taken out of the reserves? Why not return that in the low bit of the page address? This is a little more cache efficient, does not leave that odd footprint in the page union and forces the caller to examine

Re: [PATCH 02/10] mm: system wide ALLOC_NO_WATERMARK

2007-08-06 Thread Daniel Phillips
On Monday 06 August 2007 11:51, Christoph Lameter wrote: > On Mon, 6 Aug 2007, Daniel Phillips wrote: > > On Monday 06 August 2007 11:42, Christoph Lameter wrote: > > > On Mon, 6 Aug 2007, Daniel Phillips wrote: > > > > Currently your system likely would have died here

Re: [PATCH 02/10] mm: system wide ALLOC_NO_WATERMARK

2007-08-06 Thread Daniel Phillips
On Monday 06 August 2007 11:42, Christoph Lameter wrote: > On Mon, 6 Aug 2007, Daniel Phillips wrote: > > Currently your system likely would have died here, so ending up > > with a reserve page temporarily on the wrong node is already an > > improvement. > > The

Re: [PATCH 02/10] mm: system wide ALLOC_NO_WATERMARK

2007-08-06 Thread Daniel Phillips
On Monday 06 August 2007 11:31, Peter Zijlstra wrote: > > I agree that the reserve pool should be per-node in the end, but I > > do not think that serves the interest of simplifying the initial > > patch set. How about a numa performance patch that adds onto the > > end of Peter's series? > >

Re: [PATCH 00/10] foundations for reserve-based allocation

2007-08-06 Thread Daniel Phillips
On Monday 06 August 2007 11:17, Peter Zijlstra wrote: > lim_{n -> inf} (2^(n+1)/((2^n)+1)) = 2^lim_{n -> inf} ((n+1)-n) = 2^1 = 2 Glad I asked :-) > > Patch [3/10] adds a new field to struct page. > > No it doesn't. True. It is not immediately obvious from the declaration that the overloaded
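For readers skimming the archive, Peter's compressed one-liner can be expanded; the intermediate step below is an editorial reconstruction of the argument, not his notation:

```latex
\lim_{n \to \infty} \frac{2^{\,n+1}}{2^{\,n} + 1}
  = \lim_{n \to \infty} \frac{2}{1 + 2^{-n}}
  = 2
```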

Re: [PATCH 02/10] mm: system wide ALLOC_NO_WATERMARK

2007-08-06 Thread Daniel Phillips
On Monday 06 August 2007 11:11, Christoph Lameter wrote: > On Mon, 6 Aug 2007, Peter Zijlstra wrote: > > Change ALLOC_NO_WATERMARK page allocation such that dipping into > > the reserves becomes a system wide event. > > Shudder. That can just be a disaster for NUMA. Both performance wise > and

Re: [PATCH 03/10] mm: tag reseve pages

2007-08-06 Thread Daniel Phillips
On Monday 06 August 2007 11:11, Christoph Lameter wrote: > On Mon, 6 Aug 2007, Peter Zijlstra wrote: > > === > > --- linux-2.6-2.orig/include/linux/mm_types.h > > +++ linux-2.6-2/include/linux/mm_types.h > > @@ -60,6 +60,7 @@ struct

Re: [PATCH 00/10] foundations for reserve-based allocation

2007-08-06 Thread Daniel Phillips
On Monday 06 August 2007 03:29, Peter Zijlstra wrote: > In the interrest of getting swap over network working and posting in > smaller series, here is the first series. > > This series lays the foundations needed to do reserve based > allocation. Traditionally we have used mempools (and others

Re: Distributed storage.

2007-08-05 Thread Daniel Phillips
On Sunday 05 August 2007 08:01, Evgeniy Polyakov wrote: > On Sun, Aug 05, 2007 at 01:06:58AM -0700, Daniel Phillips wrote: > > > DST original code worked as device mapper plugin too, but its two > > > additional allocations (io and clone) per block request ended up > > > for me as a show stopper.

Re: Distributed storage.

2007-08-05 Thread Daniel Phillips
On Sunday 05 August 2007 08:08, Evgeniy Polyakov wrote: > If we are sleeping in memory pool, then we already do not have memory > to complete previous requests, so we are in trouble. Not at all. Any requests in flight are guaranteed to get the resources they need to complete. This is

Re: Distributed storage.

2007-08-05 Thread Daniel Phillips
On Saturday 04 August 2007 09:44, Evgeniy Polyakov wrote: > > On Tuesday 31 July 2007 10:13, Evgeniy Polyakov wrote: > > > * storage can be formed on top of remote nodes and be > > > exported simultaneously (iSCSI is peer-to-peer only, NBD requires > > > device mapper and is synchronous) > > >

Re: Distributed storage.

2007-08-05 Thread Daniel Phillips
On Saturday 04 August 2007 09:37, Evgeniy Polyakov wrote: > On Fri, Aug 03, 2007 at 06:19:16PM -0700, I wrote: > > To be sure, I am not very proud of this throttling mechanism for > > various reasons, but the thing is, _any_ throttling mechanism no > > matter how sucky solves the deadlock problem.

Re: [ck] Re: Linus 2.6.23-rc1 -- It does not matter whose code gets merged!

2007-08-04 Thread Daniel Phillips
On Thursday 02 August 2007 13:03, Frank Ch. Eigler wrote: > Arjan van de Ven <[EMAIL PROTECTED]> writes: > > [...] > > It does not matter [whose] code gets merged. > > What matters is that the problem gets solved and that the Linux > > kernel innovates forward. > > [...] > > This attitude has

Re: Distributed storage.

2007-08-03 Thread Daniel Phillips
On Friday 03 August 2007 03:26, Evgeniy Polyakov wrote: > On Thu, Aug 02, 2007 at 02:08:24PM -0700, I wrote: > > I see bits that worry me, e.g.: > > > > + req = mempool_alloc(st->w->req_pool, GFP_NOIO); > > > > which seems to be callable in response to a local request, just the > > case

Re: Distributed storage.

2007-08-03 Thread Daniel Phillips
Hi Mike, On Thursday 02 August 2007 21:09, Mike Snitzer wrote: > But NBD's synchronous nature is actually an asset when coupled with > MD raid1 as it provides guarantees that the data has _really_ been > mirrored remotely. And bio completion doesn't? Regards, Daniel - To unsubscribe from this

Re: Distributed storage.

2007-08-03 Thread Daniel Phillips
Hi Evgeniy, Nit alert: On Tuesday 31 July 2007 10:13, Evgeniy Polyakov wrote: > * storage can be formed on top of remote nodes and be exported > simultaneously (iSCSI is peer-to-peer only, NBD requires device > mapper and is synchronous) In fact, NBD has nothing to do with

Re: Distributed storage.

2007-08-03 Thread Daniel Phillips
On Friday 03 August 2007 07:53, Peter Zijlstra wrote: > On Fri, 2007-08-03 at 17:49 +0400, Evgeniy Polyakov wrote: > > On Fri, Aug 03, 2007 at 02:27:52PM +0200, Peter Zijlstra wrote: > > ...my main position is to > > allocate per socket reserve from socket's queue, and copy data > > there from

Re: Distributed storage.

2007-08-03 Thread Daniel Phillips
On Friday 03 August 2007 06:49, Evgeniy Polyakov wrote: > ...rx has global reserve (always allocated on > startup or sometime way before reclaim/oom) where data is originally > received (including skb, shared info and whatever is needed, page is > just an example), then it is copied into per-socket
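
The rx scheme being proposed — receive into a global preallocated reserve, then copy into a per-socket reserve so the global buffer is immediately reusable — can be modeled in a few lines. This is a userspace illustration of the data flow only; the struct and function names are invented, and the real scheme involves skbs and per-socket accounting:

```c
#include <string.h>

#define BUF 256

struct sock_reserve {
    char data[BUF];       /* per-socket reserve, charged to this socket */
    int len;
};

static char global_rx[BUF];  /* global reserve, allocated at startup,
                                well before reclaim/oom can interfere */

/* Receive path: the driver fills the global reserve, the data is
 * copied into the destination socket's own reserve, and the global
 * buffer is free for the next packet as soon as we return. */
int rx_deliver(struct sock_reserve *s, const char *wire, int len)
{
    if (len > BUF)
        return -1;
    memcpy(global_rx, wire, len);    /* arrival into global reserve */
    memcpy(s->data, global_rx, len); /* copy to per-socket reserve */
    s->len = len;
    return 0;                        /* global_rx reusable now */
}
```

The copy is the cost; the benefit argued in the thread is that the global reserve stays a fixed, reclaim-proof size regardless of how many sockets are backed up.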
