So are whole pages stored in rollback segments, or just the modified data?
This is implementation dependent. Storing whole pages is much easier to do, but obviously it's better to store just the modified data.
I am not sure it is necessarily better. Seems to be a tradeoff here.
You mean it is restored in the session that is running the transaction?
Depends on what you mean by restored. It first reads the heap page, sees that it needs an older version, and thus reads it from the rollback segment.
I guess that it could be slower than our current way of doing it.
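To make that read path concrete, here is a minimal C sketch of the lookup just described, assuming a per-tuple pointer into the rollback segment. All names and structures are invented for illustration; this is not PostgreSQL code.

/* Read a tuple version via the rollback segment (toy model). */
#include <stdio.h>

typedef struct UndoRecord {
    int xmin;                      /* xid that created this version */
    int value;                     /* tuple data of that version */
    struct UndoRecord *older;      /* next-older version, or NULL */
} UndoRecord;

typedef struct {
    int xmin;
    int value;
    UndoRecord *undo;              /* chain into the rollback segment */
} HeapTuple;

/* A snapshot here is just "xids below this are visible". */
static int tuple_read(const HeapTuple *t, int snapshot_xid)
{
    if (t->xmin < snapshot_xid)
        return t->value;           /* current version is visible */

    /* Walk back through the rollback segment for an older version. */
    for (UndoRecord *u = t->undo; u != NULL; u = u->older)
        if (u->xmin < snapshot_xid)
            return u->value;

    return -1;                     /* no visible version */
}

int main(void)
{
    UndoRecord v1 = { .xmin = 10, .value = 100, .older = NULL };
    HeapTuple  t  = { .xmin = 20, .value = 200, .undo = &v1 };

    /* A snapshot taken before xid 20 must see the old value. */
    printf("old snapshot sees %d\n", tuple_read(&t, 15));  /* 100 */
    printf("new snapshot sees %d\n", tuple_read(&t, 25));  /* 200 */
    return 0;
}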
Impractical? Oracle does it.
Oracle has MVCC?
With restrictions, yes.
What restrictions? Rollback segment size?
No, that is not the whole story. The problem with their rollback segment approach is that they do not guard against overwriting a tuple version in the rollback segment that an open snapshot may still need; hence the infamous "snapshot too old" error.
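A toy C model of that failure mode, with the rollback segment reduced to a tiny ring buffer. Names are invented and nothing here is Oracle-specific; it only shows how wraparound can destroy a version an old snapshot still needs.

#include <stdio.h>

#define RBS_SLOTS 4

typedef struct { int xmin; int value; } UndoRec;

static UndoRec rbs[RBS_SLOTS];
static int next_slot = 0;

static void rbs_push(int xmin, int value)
{
    rbs[next_slot] = (UndoRec){ xmin, value };   /* may clobber old data */
    next_slot = (next_slot + 1) % RBS_SLOTS;
}

static const char *rbs_lookup(int snapshot_xid)
{
    for (int i = 0; i < RBS_SLOTS; i++)
        if (rbs[i].xmin != 0 && rbs[i].xmin < snapshot_xid)
            return "found an old enough version";
    return "ERROR: snapshot too old";
}

int main(void)
{
    rbs_push(10, 100);            /* version a snapshot at xid 15 needs */
    for (int xid = 20; xid < 20 + RBS_SLOTS; xid++)
        rbs_push(xid, xid);       /* enough new undo to wrap the ring */

    /* The version created by xid 10 has been overwritten. */
    printf("%s\n", rbs_lookup(15));
    return 0;
}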
- A simple typo in psql can currently cause a forced
rollback of the entire TX. UNDO should avoid this.
Yes, I forgot to mention this very big advantage, but undo is
not the only possible way to implement savepoints. Solutions
using CommandCounter have been discussed.
The only downside would be that long-running txn's cannot [easily] roll back to a savepoint.
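For reference, a minimal C sketch of the CommandCounter idea mentioned above: stamp each row with the command id (cid) that created it, let a savepoint simply remember the current counter, and make rows with a newer cid invisible on rollback. All names are invented.

#include <stdio.h>
#include <stdbool.h>

typedef struct { int cid; int value; bool dead; } Row;

static Row rows[16];
static int nrows = 0;
static int current_cid = 0;

static void insert_row(int value)
{
    rows[nrows++] = (Row){ ++current_cid, value, false };
}

static int savepoint(void) { return current_cid; }

static void rollback_to(int saved_cid)
{
    /* Rows created by commands after the savepoint become invisible. */
    for (int i = 0; i < nrows; i++)
        if (rows[i].cid > saved_cid)
            rows[i].dead = true;
    current_cid = saved_cid;
}

int main(void)
{
    insert_row(1);
    int sp = savepoint();
    insert_row(2);                /* this command "fails"; undo it */
    rollback_to(sp);

    for (int i = 0; i < nrows; i++)
        if (!rows[i].dead)
            printf("visible row: %d\n", rows[i].value);  /* only 1 */
    return 0;
}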
We should implement savepoints for all transactions or none, no?
We should not limit transaction size to the online available disk space for WAL. Imho that is much more important. With guaranteed undo, every open transaction has to keep all of its WAL around, so a gigabyte-sized bulk load needs a gigabyte of online WAL space.
If the community does not like UNDO then I'll probably try to implement a dead space collector which will read log files and so on.
Imho UNDO would be great under the following circumstances:
1. The undo is only registered for some background work process
and not done in the client's backend (or only if it is a small txn).
2.
People also have referred to an overwriting smgr all too easily. Please tell me how to introduce an overwriting smgr without UNDO.
There is no way, although undo for an overwriting smgr would involve a very different approach than with a non-overwriting one. See Vadim's post about what info suffices.
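A tiny sketch of why that is so: with in-place updates the old bytes are destroyed at update time, so only a logged before-image can restore them on abort. Invented structures, for illustration only.

#include <stdio.h>
#include <string.h>

typedef struct { char data[16]; } Page;

typedef struct { Page before; } UndoEntry;

static void update_in_place(Page *p, const char *new_data, UndoEntry *undo)
{
    undo->before = *p;                    /* log the before-image first */
    strncpy(p->data, new_data, sizeof p->data - 1);  /* old bytes gone */
}

static void abort_txn(Page *p, const UndoEntry *undo)
{
    *p = undo->before;                    /* only the undo log can restore */
}

int main(void)
{
    Page p = { "old value" };
    UndoEntry u;

    update_in_place(&p, "new value", &u);
    printf("after update: %s\n", p.data);
    abort_txn(&p, &u);
    printf("after abort:  %s\n", p.data); /* "old value" again */
    return 0;
}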
If the community does not like UNDO then I'll probably try to implement a dead space collector which will read log files and so on.
I'd vote for UNDO; in terms of user friendliness it's a big win.
Could you please be a little more verbose? I am very interested in the advantages you see.
Todo:
1. Compact log files after checkpoint (save records of uncommitted
transactions and remove/archive others).
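A minimal C sketch of what that compaction step could look like, assuming a record format that carries the owning xid. All names and the record layout are invented for illustration.

#include <stdio.h>
#include <stdbool.h>

typedef struct { int xid; const char *payload; } WalRecord;

static bool xid_is_open(int xid, const int *open, int nopen)
{
    for (int i = 0; i < nopen; i++)
        if (open[i] == xid) return true;
    return false;
}

static int compact(const WalRecord *in, int n,
                   WalRecord *out, const int *open, int nopen)
{
    int kept = 0;
    for (int i = 0; i < n; i++)
        if (xid_is_open(in[i].xid, open, nopen))
            out[kept++] = in[i];    /* keep: might still need undo */
    return kept;                    /* committed/aborted records dropped */
}

int main(void)
{
    WalRecord seg[] = { {7, "insert"}, {8, "update"}, {7, "update"}, {9, "insert"} };
    WalRecord compacted[4];
    int open_xids[] = { 8 };        /* only xid 8 is still running */

    int kept = compact(seg, 4, compacted, open_xids, 1);
    printf("kept %d of 4 records\n", kept);   /* kept 1 of 4 */
    return 0;
}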
On the grounds that undo is not guaranteed anyway (concurrent heap access), why not simply forget it, since the compaction above sounds rather expensive?
The downside would only be that long-running txn's cannot [easily] roll back to a savepoint.
As a rule of thumb, online applications that hold open
transactions during user interaction are considered to be
Broken By Design (tm). So I'd slap the programmer/design
team with - let's use the server box since it doesn't contain
anything useful.
Correct me if I am wrong, but both cases do present a problem currently in 7.1. The WAL log will not remove any WAL files for transactions that are still open (even after a checkpoint occurs). Thus if you do a bulk insert of gigabyte size you will require a gigabyte-sized WAL directory.
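A back-of-the-envelope illustration of that retention rule, using the 16Mb segment size that comes up later in the thread; all the numbers are made up.

#include <stdio.h>

#define SEG_MB 16   /* one WAL segment */

int main(void)
{
    int last_segment = 80;              /* current write position */
    int checkpoint_segment = 75;        /* segment of the last checkpoint */
    int oldest_open_txn_segment = 10;   /* bulk insert started long ago */

    /* Recycling may not pass the start of the oldest open transaction. */
    int keep_from = oldest_open_txn_segment < checkpoint_segment
                  ? oldest_open_txn_segment : checkpoint_segment;

    printf("must keep segments %d..%d: %d MB of WAL\n",
           keep_from, last_segment,
           (last_segment - keep_from + 1) * SEG_MB);
    return 0;
}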
UNDO in Oracle is done by something known as a "rollback segment".
You are not seriously saying that you like the rollback segments in Oracle. They only cause trouble:
1. configuration (for every different workload you need a different config)
2. snapshot too old
3. tx abort because a rollback segment cannot be extended
Vadim, can you remind me what UNDO is used for?
4. Split pg_log into small files with ability to remove old ones (which
do not hold statuses for any running transactions).
They are already small (16Mb). Or do you mean even smaller?
This imposes one huge risk, that is already a pain in
Sorry for the above little confusion of pg_log with WAL.
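A sketch of what item 4 might look like: pg_log stores two status bits per transaction, so it splits naturally into fixed-size segment files, and any segment lying entirely below the oldest running xid can be removed or archived. The sizes and names are invented.

#include <stdio.h>

#define XIDS_PER_SEGMENT 4096   /* 1 KB segment: 4096 xids at 2 bits each */

static int segment_of(int xid) { return xid / XIDS_PER_SEGMENT; }

int main(void)
{
    int oldest_running_xid = 9000;

    /* Every segment strictly below this one is removable. */
    int first_needed = segment_of(oldest_running_xid);
    for (int seg = 0; seg < 4; seg++)
        printf("segment %d: %s\n", seg,
               seg < first_needed ? "removable" : "must keep");
    return 0;
}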
Really?! Once again: WAL records give you the *physical* address of the tuples (both heap and index ones!) to be removed, and the size of the log to read records from is not comparable with the size of the data files.
So how about a background vacuum-like process that reads the WAL and does the cleanup? Seems that this would be exactly the "dead space collector" mentioned above.
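A sketch of such a collector, assuming WAL records that carry the block and offset of each touched tuple. The layout is invented; real WAL records are more involved. The point is that the collector frees exactly those slots without scanning the table or the indexes.

#include <stdio.h>
#include <stdbool.h>

typedef struct { int block; int offset; } Tid;
typedef struct { int xid; Tid tid; } WalRecord;

#define BLOCKS 4
#define SLOTS  8
static bool slot_in_use[BLOCKS][SLOTS];

static void collect(const WalRecord *wal, int n, int aborted_xid)
{
    for (int i = 0; i < n; i++)
        if (wal[i].xid == aborted_xid) {
            /* Direct physical access: no heap or index scan needed. */
            slot_in_use[wal[i].tid.block][wal[i].tid.offset] = false;
            printf("freed (%d,%d)\n", wal[i].tid.block, wal[i].tid.offset);
        }
}

int main(void)
{
    slot_in_use[1][3] = slot_in_use[2][5] = true;
    WalRecord wal[] = { {42, {1, 3}}, {43, {0, 1}}, {42, {2, 5}} };

    collect(wal, 3, 42);   /* frees (1,3) and (2,5) only */
    return 0;
}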
Would it be possible to split the WAL traffic into two sets of files, one for the physical page images and one for the logical records?
Sure, downside is two fsyncs :-( When I first suggested physical log
I had a separate file in mind, but that is imho only a small issue.
Of course people with more than 3 disks could benefit from a split.
Zeugswetter Andreas SB writes:
Tom: if your ratio of physical pages vs. WAL records is so bad, the config should simply be changed to do fewer checkpoints (say every 20 min, like a typical Informix setup).
I was using the default configuration. What caused the problem was probably not so much the standard 5-minute checkpoint interval as the flood of full page images written right after each checkpoint.
My point is that we'll need dynamic cleanup anyway, and UNDO is what should be implemented for dynamic cleanup of aborted changes.
I do not yet understand why you want to handle aborts differently from outdated tuples. The ratio in a well-tuned system should well favor outdated tuples.
Isn't the current implementation bulk delete?
No, the index AM is called separately for each index tuple to be
deleted; more to the point, the search for deletable index tuples
should be moved inside the index AM for performance reasons.
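A sketch of the TID-based bulk delete being argued for: one sequential pass over the whole index, testing each entry's heap TID against the set of dead tuples. Structures are invented; a real implementation would sort the dead-TID list and binary-search it rather than use the inner loop below.

#include <stdio.h>
#include <stdbool.h>

typedef struct { int block; int offset; } Tid;
typedef struct { int key; Tid heap_tid; bool dead; } IndexEntry;

static bool tid_eq(Tid a, Tid b)
{
    return a.block == b.block && a.offset == b.offset;
}

/* One sequential pass over the index, O(index size), regardless of
 * how many keys the dead tuples share. */
static int index_bulk_delete(IndexEntry *idx, int n,
                             const Tid *dead, int ndead)
{
    int removed = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < ndead; j++)
            if (tid_eq(idx[i].heap_tid, dead[j])) {
                idx[i].dead = true;
                removed++;
                break;
            }
    return removed;
}

int main(void)
{
    IndexEntry idx[] = { {1, {0, 1}, false}, {1, {0, 2}, false}, {2, {0, 3}, false} };
    Tid dead[] = { {0, 2}, {0, 3} };

    printf("removed %d entries\n", index_bulk_delete(idx, 3, dead, 2));
    return 0;
}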
Wouldn't a sequential scan on the heap table be faster?
Zeugswetter Andreas SB writes:
foreach tuple in heap that can be deleted do:
    foreach index
        call the current index delete with constructed key and xtid
See discussion with Hiroshi. This is much more complex than TID-based delete and would be faster only in limited cases.
A particular point worth making is that in the common case where you've updated the same row N times (without changing its index key), the above approach has O(N^2) runtime. The indexscan will find all N index tuples matching the key ... only one of which is the one you are looking for on any given iteration of the outer loop.
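A tiny counting experiment that reproduces the argument (no real index involved): with N entries sharing one key, deleting them one at a time by key touches about N*N/2 entries, while one TID-based pass is linear.

#include <stdio.h>

#define N 1000

int main(void)
{
    long key_based = 0, tid_based = 0;

    /* Key-based: for each of N dead tuples, scan the N matching
     * entries until the one with the right TID is found. */
    for (int victim = 0; victim < N; victim++)
        for (int entry = 0; entry < N; entry++) {
            key_based++;
            if (entry == victim) break;   /* found the matching TID */
        }

    /* TID-based bulk pass: each entry is visited once. */
    for (int entry = 0; entry < N; entry++)
        tid_based++;

    printf("key-based comparisons: %ld\n", key_based);  /* ~N*N/2 */
    printf("tid-based comparisons: %ld\n", tid_based);  /* N */
    return 0;
}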
Zeugswetter Andreas SB writes:
It was my understanding that the heap xtid is part of the key now.
It is not.
There was some discussion of doing that, but it fell down on the little
problem that in normal index-search cases you *don't* know the heap tid
you are looking for.