Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-22 Thread can you guess?
> OK, I'll bite; it's not like I'm getting an answer to my other question.

Did I miss one somewhere?

> Bill, please explain why deciding what to do about sequential scan
> performance in ZFS is urgent?

It's not so much that it's 'urgent' (anyone affected by it simply won't use ZFS) as t

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-21 Thread James Cone
OK, I'll bite; it's not like I'm getting an answer to my other question. Bill, please explain why deciding what to do about sequential scan performance in ZFS is urgent? i.e. why it's urgent rather than important (I agree that if it's bad then it's going to be important eventually); i.e. why

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-21 Thread can you guess?
...
> This needs to be proven with a reproducible, real-world workload before
> it makes sense to try to solve it. After all, if we cannot measure where
> we are, how can we prove that we've improved?

Ah - Tests & Measurements types: you've just gotta love 'em. Wife: "Darling, is there

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-21 Thread Moore, Joe
BillTodd wrote:
> In order to be reasonably representative of a real-world situation,
> I'd suggest the following additions:

Your suggestions (make the benchmark big enough so seek times are really noticed) are good. I'm hoping that over the holidays, I'll get to play with an extra server...

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-21 Thread can you guess?
In order to be reasonably representative of a real-world situation, I'd suggest the following additions:

> 1) create a large file (bigger than main memory) on an empty ZFS pool.

1a. The pool should include entire disks, not small partitions (else seeks will be artificially short).

1b. The
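
For concreteness, a rough sketch of such a benchmark in Python. The mount point /tank, the 32 GB file size and the update count are illustrative assumptions, not part of the suggestion above; the idea is simply to lay a big file down sequentially, scatter random 8 KB rewrites across it, then time a sequential scan (defeating the cache between steps, e.g. by exporting and re-importing the pool):

    import os, random, time

    PATH = "/tank/bigfile"      # assumed ZFS mount point (whole-disk pool)
    RECORD = 8 * 1024           # 8 KB records, as discussed in the thread
    SIZE = 32 * (1024 ** 3)     # must exceed main memory
    UPDATES = 1000000           # number of random in-place rewrites

    # 1) lay the file down sequentially on an empty pool
    with open(PATH, "wb") as f:
        chunk = os.urandom(1024 * 1024)
        for _ in range(SIZE // len(chunk)):
            f.write(chunk)

    # 2) random 8 KB overwrites (what a database's random updates look like)
    with open(PATH, "r+b") as f:
        for _ in range(UPDATES):
            f.seek(random.randrange(SIZE // RECORD) * RECORD)
            f.write(os.urandom(RECORD))
            f.flush()
            os.fsync(f.fileno())

    # 3) time a full sequential read of the (now COW-scattered) file
    start = time.time()
    with open(PATH, "rb") as f:
        while f.read(1024 * 1024):
            pass
    print("sequential read: %.1f MB/s" % (SIZE / (time.time() - start) / 1e6))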

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-21 Thread Roch - PAE
Moore, Joe writes:
> Louwtjie Burger wrote:
>> Richard Elling wrote:
>>>> - COW probably makes that conflict worse
>>>
>>> This needs to be proven with a reproducible, real-world workload
>>> before it makes sense to try to solve it. After all, if

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-20 Thread can you guess?
...
> just rearrange your blocks sensibly - and to at least some degree you
> could do that while they're still cache-resident

Lots of discussion has passed under the bridge since that observation above, but it may have contained the core of a virtually free solution: let your table become fr

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-20 Thread Al Hopper
On Tue, 20 Nov 2007, Ross wrote:
>>> doing these writes now sounds like a lot of work. I'm guessing that
>>> needing two full-path updates to achieve this means you're talking
>>> about a much greater write penalty.
>>
>> Not all that much. Each full-path update is still only a single wri

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-20 Thread Will Murnane
On Nov 20, 2007 5:33 PM, can you guess? <[EMAIL PROTECTED]> wrote:
>> But the whole point of snapshots is that they don't take up extra space
>> on the disk. If a file (and hence a block) is in every snapshot it
>> doesn't mean you've got multiple copies of it. You only have one copy o

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-20 Thread can you guess?
> But the whole point of snapshots is that they don't take up extra space
> on the disk. If a file (and hence a block) is in every snapshot it
> doesn't mean you've got multiple copies of it. You only have one copy
> of that block, it's just referenced by many snapshots.

I used the wording "

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-20 Thread Ross
But the whole point of snapshots is that they don't take up extra space on the disk. If a file (and hence a block) is in every snapshot it doesn't mean you've got multiple copies of it. You only have one copy of that block, it's just referenced by many snapshots. The thing is, the location of

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-20 Thread Chris Csanady
On Nov 19, 2007 10:08 PM, Richard Elling <[EMAIL PROTECTED]> wrote:
> James Cone wrote:
>> Hello All,
>>
>> Here's a possibly-silly proposal from a non-expert.
>>
>> Summarising the problem:
>> - there's a conflict between small ZFS record size, for good random
>>   update performance, and

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-20 Thread can you guess?
Rats - I was right the first time: there's a messy problem with snapshots. The problem is that the parent of the child that you're about to update in place may *already* be in one or more snapshots because one or more of its *other* children was updated since each snapshot was created. If so,
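
A toy sketch of the "upside down tree" mechanics being argued about may make this concrete. This is illustrative Python, not ZFS's actual on-disk format: because each parent holds the pointer and checksum of its children, rewriting one leaf forces a fresh copy of every ancestor up to the root, and a snapshot that still references the old root keeps that whole old path alive.

    import hashlib

    class Block(object):
        def __init__(self, data=b"", children=None):
            self.data = data
            self.children = children or []     # list of (child, checksum)

        def checksum(self):
            h = hashlib.sha256(self.data)
            for _, csum in self.children:
                h.update(csum)
            return h.digest()

    def cow_update(path, new_data):
        """COW-rewrite the leaf at path[-1]; `path` runs root -> leaf.
        Returns a brand-new root -- every ancestor gets copied."""
        new_child, old_child = Block(new_data), path[-1]
        for parent in reversed(path[:-1]):
            children = [(new_child, new_child.checksum()) if c is old_child
                        else (c, s) for c, s in parent.children]
            new_child, old_child = Block(parent.data, children), parent
        return new_child

    # A snapshot is just a saved reference to an old root: all the ancestor
    # copies made above stay live in it, which is why quietly moving or
    # overwriting a block can't simply tweak a parent that a snapshot holds.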

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-20 Thread Ross
>> doing these writes now sounds like a lot of work. I'm guessing that
>> needing two full-path updates to achieve this means you're talking
>> about a much greater write penalty.
>
> Not all that much. Each full-path update is still only a single write
> request to the disk, since all

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-20 Thread Moore, Joe
Louwtjie Burger wrote:
> Richard Elling wrote:
>>> - COW probably makes that conflict worse
>>
>> This needs to be proven with a reproducible, real-world workload
>> before it makes sense to try to solve it. After all, if we cannot
>> measure where we are, how ca

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-20 Thread can you guess?
...
> With regards sharing the disk resources with other programs, obviously
> it's down to the individual admins how they would configure this,

Only if they have an unconstrained budget.

> but I would suggest that if you have a database with heavy enough
> requirements to be suffering notica

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-20 Thread Ross
Hmm... that's a pain if updating the parent also means updating the parent's checksum. I guess the functionality is there for moving bad blocks, but since that's likely to be a rare occurrence, it wasn't something that would need to be particularly efficient. With regards sharing the disk r

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-20 Thread can you guess?
...
> My understanding of ZFS (in short: an upside down tree) is that each
> block is referenced by its parent. So regardless of how many snapshots
> you take, each block is only ever referenced by one other, and I'm
> guessing that the pointer and checksum are both stored there.
>
> If that

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-20 Thread Ross
In that case, this may be a much tougher nut to crack than I thought. I'll be the first to admit that other than having seen a few presentations I don't have a clue about the details of how ZFS works under the hood, however... You mention that moving the old block means updating all its ancesto

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-20 Thread can you guess?
...
> - Nathan appears to have suggested a good workaround. Could ZFS be
> updated to have a 'contiguous' setting where blocks are kept together.
> This sacrifices write performance for read.

I had originally thought that this would be incompatible with ZFS's snapshot mechanism, but with a m

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-20 Thread Ross
My initial thought was that this whole thread may be irrelevant - anybody wanting to run such a database is likely to use a specialised filesystem optimised for it. But then I realised that for a database admin the integrity checking and other benefits of ZFS would be very tempting, but only if

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-19 Thread can you guess?
Regardless of the merit of the rest of your proposal, I think you have put your finger on the core of the problem: aside from some apparent reluctance on the part of some of the ZFS developers to believe that any problem exists here at all (and leaving aside the additional monkey wrench that us

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-19 Thread Louwtjie Burger
> Poor sequential read performance has not been quantified.
>
>> - COW probably makes that conflict worse
>
> This needs to be proven with a reproducible, real-world workload before
> it makes sense to try to solve it. After all, if we cannot measure where
> we are, how can we prov

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-19 Thread Richard Elling
James Cone wrote:
> Hello All,
>
> Here's a possibly-silly proposal from a non-expert.
>
> Summarising the problem:
> - there's a conflict between small ZFS record size, for good random
>   update performance, and large ZFS record size for good sequential
>   read performance

Poor sequential

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-19 Thread James Cone
Hello All,

Here's a possibly-silly proposal from a non-expert.

Summarising the problem:
- there's a conflict between small ZFS record size, for good random update performance, and large ZFS record size for good sequential read performance
- COW probably makes that conflict worse
- re

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-19 Thread can you guess?
...
> currently what it does is to maintain files subject to small random
> writes contiguous to the level of the zfs recordsize. Now after a
> significant run of random writes the files end up with a scattered
> on-disk layout. This should work well for the transaction

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-19 Thread Roch - PAE
Anton B. Rang writes:
>> When you have a striped storage device under a file system, then the
>> database or file system's view of contiguous data is not contiguous
>> on the media.
>
> Right. That's a good reason to use fairly large stripes. (The primary
> limiting factor for stripe s

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-16 Thread can you guess?
...
> I personally believe that since most people will have hardware LUN's
> (with underlying RAID) and cache, it will be difficult to notice
> anything. Given that those hardware LUN's might be busy with their own
> wizardry ;) You will also have to minimize the effect of the database
> c

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-15 Thread Richard Elling
Anton B. Rang wrote:
>> There are many different ways to place the data on the media and we
>> would typically strive for a diverse stochastic spread.
>
> Err ... why?
>
> A random distribution makes reasonable sense if you assume that future
> read requests are independent, or that th

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-15 Thread can you guess?
...
> For modern disks, media bandwidths are now getting to be > 100 MBytes/s.
> If you need 500 MBytes/s of sequential read, you'll never get it from
> one disk.

And no one here even came remotely close to suggesting that you should try to.

> You can get it from multiple disks, so the ques

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-15 Thread can you guess?
Richard Elling wrote:
...
>>> there are really two very different configurations used to address
>>> different performance requirements: cheap and fast. It seems that
>>> when most people first consider this problem, they do so from the
>>> cheap perspective: single disk view. A

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-15 Thread can you guess?
> can you guess? wrote:
>>> For very read intensive and position sensitive applications, I guess
>>> this sort of capability might make a difference?
>>
>> No question about it. And sequential table scans in databases
>> are among the most significant examples, because (unlike thi

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-15 Thread Louwtjie Burger
> We are all anxiously awaiting data...
> -- richard

Would it be worthwhile to build a test case:
- Build a postgresql database and import 1 000 000 (or more) lines of data.
- Run single and multiple large table scan queries ... and watch the system then,
- Update a column of each row in th
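
A rough sketch of that test in Python, using psycopg2 against a database whose data directory sits on the ZFS pool under test. The DSN, row count and column names are assumptions for illustration, and caches would need flushing between the timed runs for the numbers to mean anything:

    import time
    import psycopg2

    conn = psycopg2.connect("dbname=zfstest")      # hypothetical database
    cur = conn.cursor()

    # 1) load ~1,000,000 rows
    cur.execute("CREATE TABLE t (id int PRIMARY KEY, val int, pad text)")
    cur.execute("""INSERT INTO t
                   SELECT g, g, repeat(md5(g::text), 8)
                   FROM generate_series(1, 1000000) g""")
    conn.commit()

    def scan():
        start = time.time()
        cur.execute("SELECT count(*), sum(val) FROM t")   # full table scan
        cur.fetchone()
        return time.time() - start

    print("scan after clean load:  %.2fs" % scan())

    # 2) update a column of every row, then see what the scan costs now
    cur.execute("UPDATE t SET val = val + 1")
    conn.commit()
    print("scan after full update: %.2fs" % scan())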

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-15 Thread Anton B. Rang
> When you have a striped storage device under a file system, then the
> database or file system's view of contiguous data is not contiguous
> on the media.

Right. That's a good reason to use fairly large stripes. (The primary limiting factor for stripe size is efficient parallel access; using
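
Some back-of-envelope arithmetic behind "fairly large stripes" (the 8 ms access time and 100 MB/s media rate are assumed round numbers, not measurements):

    SEEK_S = 0.008      # assumed average seek + rotational latency
    BW = 100e6          # assumed media bandwidth, bytes/s

    for chunk in (8 * 1024, 128 * 1024, 1024 ** 2, 16 * (1024 ** 2)):
        effective = chunk / (SEEK_S + chunk / BW)
        print("%6d KB per access -> %5.1f MB/s effective"
              % (chunk // 1024, effective / 1e6))

    # 8 KB per access yields roughly 1 MB/s, 16 MB per access roughly
    # 95 MB/s: contiguous multi-megabyte extents amortize the seek, while
    # recordsize-sized fragments leave the disk seeking most of the time.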

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-14 Thread can you guess?
...
> But every block so rearranged (and every tree ancestor of each such
> block) would then leave an equal-sized residue in the most recent
> snapshot if one existed, which gets expensive fast in terms of snapshot
> space overhead (which then is proportional to the amount of reorganization

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-14 Thread can you guess?
> Nathan Kroenert wrote:
...
>> What if it did a double update: One to a staged area, and another
>> immediately after that to the 'old' data blocks. Still always have
>> on-disk consistency etc, at a cost of double the I/O's...
>
> This is a non-starter. Two I/Os is worse than one.

We

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-14 Thread Richard Elling
can you guess? wrote:
>> can you guess? wrote:
>>>> For very read intensive and position sensitive applications, I guess
>>>> this sort of capability might make a difference?
>>>
>>> No question about it. And sequential table scans in databases ar

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-14 Thread Richard Elling
can you guess? wrote:
>> For very read intensive and position sensitive applications, I guess
>> this sort of capability might make a difference?
>
> No question about it. And sequential table scans in databases
> are among the most significant examples, because (unlike things
> like stream

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-14 Thread can you guess?
> This question triggered some silly questions in my mind:

Actually, they're not silly at all.

> Lots of folks are determined that the whole COW to different locations
> are a Bad Thing(tm), and in some cases, I guess it might actually be...
>
> What if ZFS had a pool / filesystem prop

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-13 Thread Richard Elling
Nathan Kroenert wrote:
> This question triggered some silly questions in my mind:
>
> Lots of folks are determined that the whole COW to different locations
> are a Bad Thing(tm), and in some cases, I guess it might actually be...

There is a lot of speculation about this, but no real data. I've

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-13 Thread Nathan Kroenert
This question triggered some silly questions in my mind:

Lots of folks are determined that the whole COW to different locations are a Bad Thing(tm), and in some cases, I guess it might actually be...

What if ZFS had a pool / filesystem property that caused zfs to do a journaled, but non-COW upd
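
For what it's worth, the "journaled, but non-COW update" being floated here is essentially classic write-ahead journaling. A minimal sketch in Python, purely illustrative (nothing like this exists as a ZFS property):

    import os, struct

    def journaled_overwrite(data_path, journal_path, offset, record):
        # 1) make the intent durable first: offset, length, new bytes
        with open(journal_path, "ab") as j:
            j.write(struct.pack(">QI", offset, len(record)) + record)
            j.flush()
            os.fsync(j.fileno())
        # 2) overwrite the data block in place -- it keeps its position,
        #    so a later sequential scan still reads contiguously
        with open(data_path, "r+b") as f:
            f.seek(offset)
            f.write(record)
            f.flush()
            os.fsync(f.fileno())
        # 3) retire the journal entry; replaying it after a crash would
        #    just re-apply the same overwrite, so the data file stays
        #    consistent at the cost of writing everything twice
        open(journal_path, "wb").close()

The doubled write in steps 1 and 2 is exactly the cost being debated elsewhere in the thread.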

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-13 Thread Roch - PAE
Louwtjie Burger writes:
> Hi
>
> After a clean database load a database would (should?) look like this,
> if a random stab at the data is taken...
>
> [8KB-m][8KB-n][8KB-o][8KB-p]...
>
> The data should be fairly (100%) sequential in layout ... after some
> days though that same spot (

[zfs-discuss] ZFS + DB + "fragments"

2007-11-12 Thread Louwtjie Burger
Hi

After a clean database load a database would (should?) look like this, if a random stab at the data is taken...

[8KB-m][8KB-n][8KB-o][8KB-p]...

The data should be fairly (100%) sequential in layout ... after some days though that same spot (using ZFS) would probably look like:

[8KB-m][ ][
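
A toy illustration of that drift (plain Python, not a ZFS simulation): blocks start out contiguous, and each COW-style rewrite relocates one of them to the current end of free space, so the table's read order no longer matches the on-disk order.

    import random

    NBLOCKS = 16
    location = list(range(NBLOCKS))   # logical block i starts at disk slot i
    next_free = NBLOCKS

    for _ in range(8):                # a handful of random 8 KB-style rewrites
        victim = random.randrange(NBLOCKS)
        location[victim] = next_free  # COW: the new copy lands somewhere else
        next_free += 1

    print("disk slot of each logical block, in table-scan order:")
    print(location)
    # A sequential scan now has to jump around the platter -- the extra
    # seeks this thread is worried about.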