> OK, I'll bite; it's not like I'm getting an answer to
> my other question.
Did I miss one somewhere?
>
> Bill, please explain why deciding what to do about
> sequential scan
> performance in ZFS is urgent?
It's not so much that it's 'urgent' (anyone affected by it simply won't use
ZFS) as t
OK, I'll bite; it's not like I'm getting an answer to my other question.
Bill, please explain why deciding what to do about sequential scan
performance in ZFS is urgent?
i.e. why it's urgent rather than important (I agree that if it's bad
then it's going to be important eventually).
i.e. why
...
> This needs to be proven with a reproducible,
> real-world workload before it
> makes sense to try to solve it. After all, if we
> cannot measure where
> we are,
> how can we prove that we've improved?
Ah - Tests & Measurements types: you've just gotta love 'em.
Wife: "Darling, is there
BillTodd wrote:
> In order to be reasonably representative of a real-world
> situation, I'd suggest the following additions:
>
Your suggestions (make the benchmark big enough so seek times are really
noticed) are good. I'm hoping that over the holidays, I'll get to play
with an extra server...
In order to be reasonably representative of a real-world situation, I'd suggest
the following additions:
> 1) create a large file (bigger than main memory) on
> an empty ZFS pool.
1a. The pool should include entire disks, not small partitions (else seeks
will be artificially short).
1b. The
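For what it's worth, a minimal sketch of the kind of measurement being proposed above (Python; the file path, sizes and record size are placeholder assumptions, the test file must already exist and be larger than RAM, and a real run would have to defeat the ARC between passes, e.g. by exporting and re-importing the pool):

    import os
    import random
    import time

    PATH = "/tank/bench/bigfile"     # placeholder: pre-created file on the pool under test
    FILE_SIZE = 64 * 1024**3         # must exceed main memory (assumption: 64 GB)
    RECORD = 8 * 1024                # database-style record size
    CHUNK = 1024 * 1024              # read size for the sequential pass

    def sequential_read_mbps(path):
        """Time one full sequential pass over the file and return MB/s."""
        start = time.time()
        with open(path, "rb") as f:
            while f.read(CHUNK):
                pass
        return (FILE_SIZE / 1024**2) / (time.time() - start)

    def random_overwrites(path, count=200_000):
        """Scatter small in-place rewrites, which COW will relocate on disk."""
        with open(path, "r+b") as f:
            for _ in range(count):
                f.seek(random.randrange(FILE_SIZE // RECORD) * RECORD)
                f.write(os.urandom(RECORD))

    if __name__ == "__main__":
        print("sequential read before updates: %.1f MB/s" % sequential_read_mbps(PATH))
        random_overwrites(PATH)
        # Export/import the pool (or reboot) here so the second pass
        # measures the disks rather than the ARC.
        print("sequential read after updates:  %.1f MB/s" % sequential_read_mbps(PATH))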
Moore, Joe writes:
> Louwtjie Burger wrote:
> > Richard Elling wrote:
> > >
> > > >- COW probably makes that conflict worse
> > > >
> > > >
> > >
> > > This needs to be proven with a reproducible, real-world
> > workload before it
> > > makes sense to try to solve it. After all, if
...
> just rearrange your blocks sensibly -
> and to at least some degree you could do that while
> they're still cache-resident
Lots of discussion has passed under the bridge since that observation above,
but it may have contained the core of a virtually free solution: let your
table become fr
On Tue, 20 Nov 2007, Ross wrote:
>>> doing these writes now sounds like a
>>> lot of work. I'm guessing that needing two full-path
>>> updates to achieve this means you're talking about a
>>> much greater write penalty.
>>
>> Not all that much. Each full-path update is still
>> only a single wri
On Nov 20, 2007 5:33 PM, can you guess? <[EMAIL PROTECTED]> wrote:
> > But the whole point of snapshots is that they don't
> > take up extra space on the disk. If a file (and
> > hence a block) is in every snapshot it doesn't mean
> > you've got multiple copies of it. You only have one
> > copy o
> But the whole point of snapshots is that they don't
> take up extra space on the disk. If a file (and
> hence a block) is in every snapshot it doesn't mean
> you've got multiple copies of it. You only have one
> copy of that block, it's just referenced by many
> snapshots.
I used the wording "
But the whole point of snapshots is that they don't take up extra space on the
disk. If a file (and hence a block) is in every snapshot it doesn't mean
you've got multiple copies of it. You only have one copy of that block, it's
just referenced by many snapshots.
The thing is, the location of
On Nov 19, 2007 10:08 PM, Richard Elling <[EMAIL PROTECTED]> wrote:
> James Cone wrote:
> > Hello All,
> >
> > Here's a possibly-silly proposal from a non-expert.
> >
> > Summarising the problem:
> >- there's a conflict between small ZFS record size, for good random
> > update performance, and
Rats - I was right the first time: there's a messy problem with snapshots.
The problem is that the parent of the child that you're about to update in
place may *already* be in one or more snapshots because one or more of its
*other* children were updated since each snapshot was created. If so,
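To make that concrete, here is a toy path-copying model (plain Python, not ZFS code): blocks are immutable nodes, a snapshot is just a saved root pointer, and a COW update of a leaf rewrites every ancestor up to the root. Any block still reachable from an older snapshot's root stays allocated, which is where the snapshot space overhead discussed in this thread comes from.

    import itertools

    _ids = itertools.count()

    class Block:
        """Immutable tree node: either a leaf payload or an indirect block."""
        def __init__(self, data=None, children=()):
            self.id = next(_ids)
            self.data = data
            self.children = tuple(children)

    def cow_update(root, path, new_data):
        """COW update: rewrite the leaf at 'path' plus every ancestor above it."""
        if not path:
            return Block(data=new_data)
        kids = list(root.children)
        kids[path[0]] = cow_update(kids[path[0]], path[1:], new_data)
        return Block(children=kids)          # a fresh copy of this ancestor too

    def reachable(root):
        seen, stack = set(), [root]
        while stack:
            b = stack.pop()
            if b.id not in seen:
                seen.add(b.id)
                stack.extend(b.children)
        return seen

    # Two leaves under one parent, under one root.
    parent = Block(children=[Block(data="A"), Block(data="B")])
    live = Block(children=[parent])

    snap = live                              # snapshot = saved root pointer
    live = cow_update(live, [0, 0], "A'")    # update leaf A after the snapshot...
    live = cow_update(live, [0, 1], "B'")    # ...then its sibling B as well

    # The old root, the old parent and both old leaves are now pinned
    # solely by the snapshot.
    print(len(reachable(snap) - reachable(live)))   # prints 4

The same toy also shows the "full-path update" cost mentioned elsewhere in the thread: each leaf rewrite allocates one new block per level of the tree.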
> > doing these writes now sounds like a
> > lot of work. I'm guessing that needing two full-path
> > updates to achieve this means you're talking about a
> > much greater write penalty.
>
> Not all that much. Each full-path update is still
> only a single write request to the disk, since all
>
Louwtjie Burger wrote:
> Richard Elling wrote:
> >
> > >- COW probably makes that conflict worse
> > >
> > >
> >
> > This needs to be proven with a reproducible, real-world
> workload before it
> > makes sense to try to solve it. After all, if we cannot
> measure where
> > we are,
> > how ca
...
> With regards sharing the disk resources with other
> programs, obviously it's down to the individual
> admins how they would configure this,
Only if they have an unconstrained budget.
> but I would
> suggest that if you have a database with heavy enough
> requirements to be suffering notica
Hmm... that's a pain if updating the parent also means updating the parent's
checksum. I guess the functionality is there for moving bad blocks, but
since that's likely to be a rare occurrence, it wasn't something that would need
to be particularly efficient.
With regards sharing the disk r
...
> My understanding of ZFS (in short: an upside down
> tree) is that each block is referenced by its
> parent. So regardless of how many snapshots you take,
> each block is only ever referenced by one other, and
> I'm guessing that the pointer and checksum are both
> stored there.
>
> If that
In that case, this may be a much tougher nut to crack than I thought.
I'll be the first to admit that other than having seen a few presentations I
don't have a clue about the details of how ZFS works under the hood, however...
You mention that moving the old block means updating all its ancesto
...
> - Nathan appears to have suggested a good workaround.
> Could ZFS be updated to have a 'contiguous' setting
> where blocks are kept together? This sacrifices
> write performance for read.
I had originally thought that this would be incompatible with ZFS's snapshot
mechanism, but with a m
My initial thought was that this whole thread may be irrelevant - anybody
wanting to run such a database is likely to use a specialised filesystem
optimised for it. But then I realised that for a database admin the integrity
checking and other benefits of ZFS would be very tempting, but only if
Regardless of the merit of the rest of your proposal, I think you have put your
finger on the core of the problem: aside from some apparent reluctance on the
part of some of the ZFS developers to believe that any problem exists here at
all (and leaving aside the additional monkey wrench that us
>
> Poor sequential read performance has not been quantified.
>
> >- COW probably makes that conflict worse
> >
> >
>
> This needs to be proven with a reproducible, real-world workload before it
> makes sense to try to solve it. After all, if we cannot measure where
> we are,
> how can we prov
James Cone wrote:
> Hello All,
>
> Here's a possibly-silly proposal from a non-expert.
>
> Summarising the problem:
>- there's a conflict between small ZFS record size, for good random
> update performance, and large ZFS record size for good sequential read
> performance
>
Poor sequential
Hello All,
Here's a possibly-silly proposal from a non-expert.
Summarising the problem:
- there's a conflict between small ZFS record size, for good random
update performance, and large ZFS record size for good sequential read
performance
- COW probably makes that conflict worse
- re
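Some back-of-the-envelope numbers for that record-size conflict, assuming an illustrative disk with roughly 8 ms of seek-plus-rotation per discontiguity and a 100 MB/s media rate (both figures are assumptions, not measurements):

    SEEK_MS = 8.0        # assumed average seek + rotational latency
    MEDIA_MBPS = 100.0   # assumed media transfer rate

    def scan_mbps(chunk_kb):
        """Throughput of a scan that pays one seek per contiguous chunk."""
        chunk_mb = chunk_kb / 1024.0
        transfer_ms = chunk_mb / MEDIA_MBPS * 1000.0
        return chunk_mb / ((SEEK_MS + transfer_ms) / 1000.0)

    for kb in (8, 128, 1024, 8192):
        print("%5d KB contiguous -> about %5.1f MB/s" % (kb, scan_mbps(kb)))

On those assumptions an 8 KB-at-a-time scan runs at about 1 MB/s, 128 KB chunks at about 13 MB/s, and only multi-megabyte extents get anywhere near the media rate, which is the whole tension between small records for random updates and large contiguous runs for sequential reads.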
...
> currently what it
> does is to
> maintain files subject to small random writes
> contiguous to
> the level of the zfs recordsize. Now after a
> significant
> run of random writes the files end up with a
> scattered
> on-disk layout. This should work well for the
> transaction
>
Anton B. Rang writes:
> > When you have a striped storage device under a
> > file system, then the database or file system's view
> > of contiguous data is not contiguous on the media.
>
> Right. That's a good reason to use fairly large stripes. (The
> primary limiting factor for stripe s
...
> I personally believe that since most people will have
> hardware LUNs
> (with underlying RAID) and cache, it will be
> difficult to notice
> anything. Given that those hardware LUNs might be
> busy with their own
> wizardry ;) You will also have to minimize the effect
> of the database
> c
Anton B. Rang wrote:
>> There are many different ways to place the data on the media and we would
>> typically
>> strive for a diverse stochastic spread.
>>
>
> Err ... why?
>
> A random distribution makes reasonable sense if you assume that future read
> requests are independent, or that th
...
> For modern disks, media bandwidths are now getting to
> be > 100 MBytes/s.
> If you need 500 MBytes/s of sequential read, you'll
> never get it from
> one disk.
And no one here even came remotely close to suggesting that you should try to.
> You can get it from multiple disks, so the ques
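Putting rough numbers on the aggregate-bandwidth point (the per-disk rate and the scan-efficiency factor are both assumptions; the efficiency loss is the kind produced by the chunk-size arithmetic sketched earlier in this digest):

    import math

    TARGET_MBPS = 500.0
    MEDIA_MBPS = 100.0       # assumed per-disk media rate
    SCAN_EFFICIENCY = 0.5    # assumed fraction of the media rate a real scan sees

    print(math.ceil(TARGET_MBPS / MEDIA_MBPS))                      # 5 disks, pure streaming
    print(math.ceil(TARGET_MBPS / (MEDIA_MBPS * SCAN_EFFICIENCY)))  # 10 disks with seek overhead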
Richard Elling wrote:
...
>>> there are
>>> really two very different configurations used to
>>> address different
>>> performance requirements: cheap and fast. It seems
>>> that when most
>>> people first consider this problem, they do so from
>>> the cheap
>>> perspective: single disk view. A
> can you guess? wrote:
> >> For very read intensive and position sensitive
> >> applications, I guess
> >> this sort of capability might make a difference?
> >
> > No question about it. And sequential table scans
> > in databases
> > are among the most significant examples, because
> > (unlike thi
> We are all anxiously awaiting data...
> -- richard
Would it be worthwhile to build a test case:
- Build a PostgreSQL database and import 1 000 000 (or more) rows of data.
- Run a single large table scan query, and then several at once ... and watch the system, then
- Update a column of each row in th
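A sketch of that test case in Python with psycopg2, assuming a scratch PostgreSQL instance whose data directory lives on the ZFS pool under test; the DSN, table shape and row count are placeholders, and as noted elsewhere in the thread the ARC (and any array cache) has to be defeated between runs for the timings to mean anything:

    import time
    import psycopg2   # assumes a reachable scratch PostgreSQL instance

    DSN = "dbname=zfstest"   # placeholder connection string
    ROWS = 1_000_000

    def timed_scan(cur, label, sql):
        start = time.time()
        cur.execute(sql)
        cur.fetchall()
        print("%-20s %6.1f s" % (label, time.time() - start))

    conn = psycopg2.connect(DSN)
    conn.autocommit = True
    cur = conn.cursor()

    cur.execute("DROP TABLE IF EXISTS scan_test")
    cur.execute("CREATE TABLE scan_test (id int PRIMARY KEY, payload text, counter int)")

    # Bulk load: freshly written, so the table starts out laid down more or less sequentially.
    cur.execute("INSERT INTO scan_test "
                "SELECT g, repeat('x', 200), 0 FROM generate_series(1, %s) AS g", (ROWS,))

    timed_scan(cur, "scan after load", "SELECT count(*), sum(counter) FROM scan_test")

    # Small updates to every row: under COW these relocate the touched blocks.
    cur.execute("UPDATE scan_test SET counter = counter + 1")

    timed_scan(cur, "scan after updates", "SELECT count(*), sum(counter) FROM scan_test")

    cur.close()
    conn.close()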
> When you have a striped storage device under a
> file system, then the database or file system's view
> of contiguous data is not contiguous on the media.
Right. That's a good reason to use fairly large stripes. (The primary
limiting factor for stripe size is efficient parallel access; using
...
> But every block so rearranged
> (and every tree ancestor of each such block) would
> then leave an equal-sized residue in the most recent
> snapshot if one existed, which gets expensive fast in
> terms of snapshot space overhead (which then is
> proportional to the amount of reorganization
>
> Nathan Kroenert wrote:
...
> > What if it did a double update: One to a
> > staged area, and another
> > immediately after that to the 'old' data blocks.
> > Still always have
> > on-disk consistency etc, at a cost of double the
> > I/O's...
>
> This is a non-starter. Two I/Os is worse than one.
We
can you guess? wrote:
>> can you guess? wrote:
>>>> For very read intensive and position sensitive
>>>> applications, I guess
>>>> this sort of capability might make a difference?
>>> No question about it. And sequential table scans
>>> in databases
>>> ar
can you guess? wrote:
>> For very read intensive and position sensitive
>> applications, I guess
>> this sort of capability might make a difference?
>
> No question about it. And sequential table scans in databases
> are among the most significant examples, because (unlike things
> like stream
> This question triggered some silly questions in my
> mind:
Actually, they're not silly at all.
>
> Lots of folks are determined that the whole COW to
> different locations
> is a Bad Thing(tm), and in some cases, I guess it
> might actually be...
>
> What if ZFS had a pool / filesystem prop
Nathan Kroenert wrote:
> This question triggered some silly questions in my mind:
>
> Lots of folks are determined that the whole COW to different locations
> is a Bad Thing(tm), and in some cases, I guess it might actually be...
There is a lot of speculation about this, but no real data.
I've
This question triggered some silly questions in my mind:
Lots of folks are determined that the whole COW to different locations
is a Bad Thing(tm), and in some cases, I guess it might actually be...
What if ZFS had a pool / filesystem property that caused zfs to do a
journaled, but non-COW upd
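For what it's worth, here is the shape of that journaled, non-COW update written as ordinary user-level Python rather than anything ZFS actually exposes (the log file name and record format are made up for illustration):

    import os
    import struct
    import zlib

    JOURNAL = "intent.log"   # hypothetical intent log

    def journaled_overwrite(data_path, offset, payload):
        """Log the intended write durably, then overwrite the old blocks in place."""
        record = struct.pack(">QII", offset, len(payload), zlib.crc32(payload)) + payload

        jfd = os.open(JOURNAL, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        try:
            os.write(jfd, record)
            os.fsync(jfd)            # intent is durable before the data is touched
        finally:
            os.close(jfd)

        dfd = os.open(data_path, os.O_WRONLY)
        try:
            os.pwrite(dfd, payload, offset)   # in-place: the block keeps its old address
            os.fsync(dfd)
        finally:
            os.close(dfd)

        # Once the in-place write is durable the log entry can be retired;
        # truncation stands in for a real log-reclaim policy.
        os.truncate(JOURNAL, 0)

On crash recovery you would replay any journal record whose checksum verifies; the price is exactly the second write that the "two I/Os is worse than one" objection elsewhere in the thread is about, though the log write is sequential and can be batched.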
Louwtjie Burger writes:
> Hi
>
> After a clean database load a database would (should?) look like this,
> if a random stab at the data is taken...
>
> [8KB-m][8KB-n][8KB-o][8KB-p]...
>
> The data should be fairly (100%) sequential in layout ... after some
> days though that same spot (
Hi
After a clean database load a database would (should?) look like this,
if a random stab at the data is taken...
[8KB-m][8KB-n][8KB-o][8KB-p]...
The data should be fairly (100%) sequential in layout ... after some
days though that same spot (using ZFS) would probably look like:
[8KB-m][ ][
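One crude way to put a number on that decay, under the stand-in assumption that every rewritten 8 KB record is simply relocated to the current end of the allocated region (not how ZFS actually allocates, but enough to show the drift away from the clean layout):

    import random

    RECORDS = 100_000                  # 8 KB records in the freshly loaded table

    # After a clean load, logical record i sits at physical slot i.
    location = list(range(RECORDS))
    next_free = RECORDS

    def contiguity():
        """Fraction of logical neighbours that are still physical neighbours."""
        still = sum(1 for i in range(RECORDS - 1)
                    if location[i + 1] - location[i] == 1)
        return still / (RECORDS - 1)

    print("after load:            %5.1f%% contiguous" % (100 * contiguity()))

    total = 0
    for batch in (10_000, 50_000, 200_000):
        for _ in range(batch):
            rec = random.randrange(RECORDS)
            location[rec] = next_free      # COW: the rewritten record lands somewhere new
            next_free += 1
        total += batch
        print("after %7d updates: %5.1f%% contiguous" % (total, 100 * contiguity()))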