Re: [HACKERS] cheaper snapshots redux

2011-09-13 Thread Robert Haas
On Tue, Sep 13, 2011 at 7:49 AM, Amit Kapila amit.kap...@huawei.com wrote: Yep, that's pretty much what it does, although xmax is actually defined as the XID *following* the last one that ended, and I think xmin needs to also be in xip, so in this case you'd actually end up with xmin = 15, xmax =
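The snapshot bookkeeping discussed in this part of the thread can be illustrated with a small sketch. It is not the code Robert refers to; it only assumes what the snippets state: xmax is the XID following the last one that ended, and xmin is the oldest XID still in progress. The names `Snapshot`, `derive_snapshot`, `summary_running`, `summary_xmax` and `ended_since` are hypothetical, and XID wraparound and subtransactions are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    xmin: int     # oldest XID still in progress (equals xmax if none are)
    xmax: int     # XID following the last one that ended
    xip: tuple    # XIDs in progress, sorted ascending


def derive_snapshot(summary_running, summary_xmax, ended_since):
    """Rebuild a snapshot from the last summary plus the XIDs ended since.

    Assumption: every XID in summary_running is below summary_xmax, and
    XIDs are plain increasing integers (no wraparound handling).
    """
    ended = set(ended_since)
    # xmax is the XID *following* the last one that ended.
    xmax = max([summary_xmax] + [xid + 1 for xid in ended])
    # Running at summary time and not ended since: still in progress.
    still_running = set(summary_running) - ended
    # XIDs assigned after the summary, below the new xmax and not ended,
    # have to be treated as in progress too.
    started_later = set(range(summary_xmax, xmax)) - ended
    xip = tuple(sorted(still_running | started_later))
    xmin = xip[0] if xip else xmax
    return Snapshot(xmin=xmin, xmax=xmax, xip=xip)


snap = derive_snapshot(summary_running=[100, 102], summary_xmax=104,
                       ended_since=[102, 105])
print(snap)
```

In this made-up example, XID 104 was assigned after the summary and has not ended, so it shows up in xip even though the summary never listed it.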

Re: [HACKERS] cheaper snapshots redux

2011-09-13 Thread Amit Kapila
: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] cheaper snapshots redux On Mon, Sep 12, 2011 at 11:07 AM, Amit Kapila amit.kap...@huawei.com wrote: If you know what transactions were running the last time a snapshot summary was written and what transactions have ended since then, you can work

Re: [HACKERS] cheaper snapshots redux

2011-09-12 Thread Robert Haas
On Sun, Sep 11, 2011 at 11:08 PM, Amit Kapila amit.kap...@huawei.com wrote: In the approach you described, once a snapshot has been taken, only the committed XIDs are updated, and sometimes the snapshot itself. So when will xmin be updated according to your idea, as

Re: [HACKERS] cheaper snapshots redux

2011-09-12 Thread Robert Haas
On Mon, Sep 12, 2011 at 11:07 AM, Amit Kapila amit.kap...@huawei.com wrote: If you know what transactions were running the last time a snapshot summary was written and what transactions have ended since then, you can work out the new xmin on the fly.  I have working code for this and it's

Re: [HACKERS] cheaper snapshots redux

2011-09-12 Thread Amit Kapila
- From: Robert Haas [mailto:robertmh...@gmail.com] Sent: Monday, September 12, 2011 7:39 PM To: Amit Kapila Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] cheaper snapshots redux On Sun, Sep 11, 2011 at 11:08 PM, Amit Kapila amit.kap...@huawei.com wrote: In the approach mentioned

Re: [HACKERS] cheaper snapshots redux

2011-09-12 Thread Amit Kapila
it! -Original Message- From: Robert Haas [mailto:robertmh...@gmail.com] Sent: Thursday, September 08, 2011 7:50 PM To: Amit Kapila Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] cheaper snapshots redux On Tue, Sep 6, 2011 at 11:06 PM, Amit Kapila amit.kap...@huawei.com wrote: 1

Re: [HACKERS] cheaper snapshots redux

2011-09-12 Thread Amit kapila
4. Won't it have an effect if we don't update xmin every time and just note the committed XIDs? The reason I am asking is that xmin is used in the tuple visibility check, so with the new idea, in some cases, instead of returning right at the beginning after checking xmin, it has to go through the committed XID

Re: [HACKERS] cheaper snapshots redux

2011-09-08 Thread Robert Haas
On Tue, Sep 6, 2011 at 11:06 PM, Amit Kapila amit.kap...@huawei.com wrote: 1. With the above, you want to reduce/remove the concurrency issue between GetSnapshotData() [used at the beginning of SQL command execution] and ProcArrayEndTransaction() [used at end of transaction]. The concurrency issue

Re: [HACKERS] cheaper snapshots redux

2011-09-07 Thread Amit Kapila
, August 28, 2011 7:17 AM To: Gokulakannan Somasundaram Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] cheaper snapshots redux On Sat, Aug 27, 2011 at 1:38 AM, Gokulakannan Somasundaram gokul...@gmail.com wrote: First, I respectfully disagree with you on the point of 80MB. I would say

Re: [HACKERS] cheaper snapshots redux

2011-08-28 Thread Gokulakannan Somasundaram
No, I don't think it will all be in memory - but that's part of the performance calculation. If you need to check on the status of an XID and find that you need to read a page of data in from disk, that's going to be many orders of magnitude slower than anything we do with a snapshot now.

Re: [HACKERS] cheaper snapshots redux

2011-08-28 Thread Robert Haas
On Sun, Aug 28, 2011 at 4:33 AM, Gokulakannan Somasundaram gokul...@gmail.com wrote: No, I don't think it will all be in memory - but that's part of the performance calculation.  If you need to check on the status of an XID and find that you need to read a page of data in from disk, that's

Re: [HACKERS] cheaper snapshots redux

2011-08-27 Thread Robert Haas
On Sat, Aug 27, 2011 at 1:38 AM, Gokulakannan Somasundaram gokul...@gmail.com wrote: First, I respectfully disagree with you on the point of 80MB. I would say that it's very rare that a small system (with 1 GB RAM) might have a long running transaction sitting idle, while 10 million transactions

Re: [HACKERS] cheaper snapshots redux

2011-08-26 Thread Robert Haas
On Thu, Aug 25, 2011 at 6:24 PM, Jim Nasby j...@nasby.net wrote: On Aug 25, 2011, at 8:24 AM, Robert Haas wrote: My hope (and it might turn out that I'm an optimist) is that even with a reasonably small buffer it will be very rare for a backend to experience a wraparound condition.  For

Re: [HACKERS] cheaper snapshots redux

2011-08-26 Thread Robert Haas
On Thu, Aug 25, 2011 at 6:29 PM, Jim Nasby j...@nasby.net wrote: Actually, I wasn't thinking about the system dynamically sizing shared memory on its own... I was only thinking of providing the ability for a user to change something like shared_buffers and allow that change to take effect

Re: [HACKERS] cheaper snapshots redux

2011-08-26 Thread Gokulakannan Somasundaram
On Tue, Aug 23, 2011 at 5:25 AM, Robert Haas robertmh...@gmail.com wrote: I've been giving this quite a bit more thought, and have decided to abandon the scheme described above, at least for now. It has the advantage of avoiding virtually all locking, but it's extremely inefficient in its

Re: [HACKERS] cheaper snapshots redux

2011-08-25 Thread Robert Haas
On Thu, Aug 25, 2011 at 1:55 AM, Markus Wanner mar...@bluegap.ch wrote: One difference with snapshots is that only the latest snapshot is of any interest. Theoretically, yes.  But as far as I understood, you proposed the backends copy that snapshot to local memory.  And copying takes some

Re: [HACKERS] cheaper snapshots redux

2011-08-25 Thread Markus Wanner
Robert, On 08/25/2011 03:24 PM, Robert Haas wrote: My hope (and it might turn out that I'm an optimist) is that even with a reasonably small buffer it will be very rare for a backend to experience a wraparound condition. It certainly seems less likely than with the ring-buffer for imessages,

Re: [HACKERS] cheaper snapshots redux

2011-08-25 Thread Robert Haas
On Thu, Aug 25, 2011 at 10:19 AM, Markus Wanner mar...@bluegap.ch wrote: Note, however, that for imessages, I've also had the policy in place that a backend *must* consume its message before sending any.  And that I took great care for all receivers to consume their messages as early as

Re: [HACKERS] cheaper snapshots redux

2011-08-25 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: Well, one long-running transaction that only has a single XID is not really a problem: the snapshot is still small. But one very old transaction that also happens to have a large number of subtransactions all of which have XIDs assigned might be a

Re: [HACKERS] cheaper snapshots redux

2011-08-25 Thread Markus Wanner
Robert, On 08/25/2011 04:48 PM, Robert Haas wrote: What's a typical message size for imessages? Most message types in Postgres-R are just a couple bytes in size. Others, especially change sets, can be up to 8k. However, I think you'll have an easier job guaranteeing that backends consume their

Re: [HACKERS] cheaper snapshots redux

2011-08-25 Thread Markus Wanner
Tom, On 08/25/2011 04:59 PM, Tom Lane wrote: That's a good point. If the ring buffer size creates a constraint on the maximum number of sub-XIDs per transaction, you're going to need a fallback path of some sort. I think Robert envisions the same fallback path we already have:

Re: [HACKERS] cheaper snapshots redux

2011-08-25 Thread Robert Haas
On Thu, Aug 25, 2011 at 11:15 AM, Markus Wanner mar...@bluegap.ch wrote: On 08/25/2011 04:59 PM, Tom Lane wrote: That's a good point.  If the ring buffer size creates a constraint on the maximum number of sub-XIDs per transaction, you're going to need a fallback path of some sort. I think

Re: [HACKERS] cheaper snapshots redux

2011-08-25 Thread Jim Nasby
On Aug 25, 2011, at 8:24 AM, Robert Haas wrote: My hope (and it might turn out that I'm an optimist) is that even with a reasonably small buffer it will be very rare for a backend to experience a wraparound condition. For example, consider a buffer with ~6500 entries, approximately 64 *

Re: [HACKERS] cheaper snapshots redux

2011-08-25 Thread Jim Nasby
On Aug 22, 2011, at 6:22 PM, Robert Haas wrote: With respect to a general-purpose shared memory allocator, I think that there are cases where that would be useful to have, but I don't think there are as many of them as many people seem to think. I wouldn't choose to implement this using a

Re: [HACKERS] cheaper snapshots redux

2011-08-24 Thread Markus Wanner
Hello Dimitri, On 08/23/2011 06:39 PM, Dimitri Fontaine wrote: I'm far from familiar with the detailed concepts here, but allow me to comment. I have two open questions: - is it possible to use a distributed algorithm to produce XIDs, something like Vector Clocks? Then each

Re: [HACKERS] cheaper snapshots redux

2011-08-24 Thread Markus Wanner
Robert, Jim, thanks for thinking out loud about dynamic allocation of shared memory. Very much appreciated. On 08/23/2011 01:22 AM, Robert Haas wrote: With respect to a general-purpose shared memory allocator, I think that there are cases where that would be useful to have, but I don't think

Re: [HACKERS] cheaper snapshots redux

2011-08-24 Thread Robert Haas
On Wed, Aug 24, 2011 at 4:30 AM, Markus Wanner mar...@bluegap.ch wrote: I'm in respectful disagreement regarding the ring-buffer approach and think that dynamic allocation can actually be more efficient if done properly, because there doesn't need to be head and tail pointers, which might turn

Re: [HACKERS] cheaper snapshots redux

2011-08-24 Thread Markus Wanner
Robert, On 08/25/2011 04:59 AM, Robert Haas wrote: True; although there are some other complications. With a sufficiently sophisticated allocator you can avoid mutex contention when allocating chunks, but then you have to store a pointer to the chunk somewhere or other, and that then

Re: [HACKERS] cheaper snapshots redux

2011-08-23 Thread Simon Riggs
On Mon, Aug 22, 2011 at 10:25 PM, Robert Haas robertmh...@gmail.com wrote: I've been giving this quite a bit more thought, and have decided to abandon the scheme described above, at least for now. I liked your goal of O(1) snapshots and think you should go for that. I didn't realise you were

Re: [HACKERS] cheaper snapshots redux

2011-08-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: With respect to the first problem, what I'm imagining is that we not do a complete rewrite of the snapshot in shared memory on every commit. Instead, when a transaction ends, we'll decide whether to (a) write a new snapshot or (b) just record the XIDs

Re: [HACKERS] cheaper snapshots redux

2011-08-23 Thread Robert Haas
On Tue, Aug 23, 2011 at 12:13 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'm a bit concerned that this approach is trying to optimize the heavy contention situation at the cost of actually making things worse anytime that you're not bottlenecked by contention for access to this shared data

Re: [HACKERS] cheaper snapshots redux

2011-08-23 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes: I think the real trick is figuring out a design that can improve concurrency. I'm far from familiar with the detailed concepts here, but allow me to comment. I have two open questions: - is it possible to use a distributed algorithm to produce XIDs,

Re: [HACKERS] cheaper snapshots redux

2011-08-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: That's certainly a fair concern, and it might even be worse than O(n^2). On the other hand, the current approach involves scanning the entire ProcArray for every snapshot, even if nothing has changed and 90% of the backends are sitting around playing

[HACKERS] cheaper snapshots redux

2011-08-22 Thread Robert Haas
On Wed, Jul 27, 2011 at 10:51 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Oct 20, 2010 at 10:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wonder whether we could do something involving WAL properties --- the current tuple visibility logic was designed before WAL existed, so it's not

Re: [HACKERS] cheaper snapshots redux

2011-08-22 Thread Jim Nasby
On Aug 22, 2011, at 4:25 PM, Robert Haas wrote: What I'm thinking about instead is using a ring buffer with three pointers: a start pointer, a stop pointer, and a write pointer. When a transaction ends, we advance the write pointer, write the XIDs or a whole new snapshot into the buffer, and
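The three-pointer ring buffer mentioned above can be pictured with a toy, single-threaded model. The snippet is cut off before it says what happens after the write, so the ordering below is an assumption: the writer reserves space by advancing the write pointer, stores the record, then advances the stop pointer to publish it, and a full snapshot moves the start pointer to where that snapshot begins. `SnapshotRing` and its method names are hypothetical, and nothing here models locking or memory barriers.

```python
class SnapshotRing:
    """Toy ring of records; offsets only grow and are reduced modulo size."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.start = 0   # offset of the most recent full snapshot
        self.stop = 0    # end of the data readers may consume
        self.write = 0   # end of the data the writer has reserved

    def _append(self, record):
        self.write += 1                                  # reserve the slot
        self.slots[(self.write - 1) % self.size] = record
        self.stop = self.write                           # publish to readers

    def end_transaction(self, ended_xids):
        """Record only the XIDs that just ended."""
        self._append(("ended", tuple(ended_xids)))

    def write_snapshot(self, running_xids):
        """Record a whole new snapshot and let readers start from it."""
        begin = self.write
        self._append(("snapshot", tuple(sorted(running_xids))))
        self.start = begin

    def read(self):
        """Return the records in [start, stop), or None after wraparound."""
        start, stop = self.start, self.stop
        if stop - start > self.size:
            return None          # data was overwritten; a fallback is needed
        return [self.slots[i % self.size] for i in range(start, stop)]


ring = SnapshotRing(size=4)
ring.write_snapshot([5, 7])
ring.end_transaction([5])
print(ring.read())
```

A reader that gets `None` back has hit the wraparound condition discussed elsewhere in the thread and would need some other way to obtain a snapshot.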

Re: [HACKERS] cheaper snapshots redux

2011-08-22 Thread Robert Haas
On Mon, Aug 22, 2011 at 6:45 PM, Jim Nasby j...@nasby.net wrote: Something that would be really nice to fix is our reliance on a fixed size of shared memory, and I'm wondering if this could be an opportunity to start in a new direction. My thought is that we could maintain two distinct shared

Re: [HACKERS] cheaper snapshots

2011-07-30 Thread Simon Riggs
On Thu, Jul 28, 2011 at 8:32 PM, Hannu Krosing ha...@2ndquadrant.com wrote: Maybe this is why other databases don't offer per backend async commit ? Oracle has async commit but very few people know about it. --  Simon Riggs   http://www.2ndQuadrant.com/  PostgreSQL

Re: [HACKERS] cheaper snapshots

2011-07-29 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote: (4) We communicate acceptable snapshots to the replica to make the order of visibility match the master even when that doesn't match the order that transactions returned from commit. I (predictably) like (4) -- even though it's a lot

Re: [HACKERS] cheaper snapshots

2011-07-29 Thread Hannu Krosing
On Thu, 2011-07-28 at 20:14 -0400, Robert Haas wrote: On Thu, Jul 28, 2011 at 7:54 PM, Ants Aasma ants.aa...@eesti.ee wrote: On Thu, Jul 28, 2011 at 11:54 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: (4) We communicate acceptable snapshots to the replica to make the order of

Re: [HACKERS] cheaper snapshots

2011-07-29 Thread Robert Haas
On Fri, Jul 29, 2011 at 10:20 AM, Hannu Krosing ha...@2ndquadrant.com wrote: An additional point to think about: if we were willing to insist on streaming replication, we could send the commit sequence numbers via a side channel rather than writing them to WAL, which would be a lot cheaper.

Re: [HACKERS] cheaper snapshots

2011-07-29 Thread Hannu Krosing
On Fri, 2011-07-29 at 10:23 -0400, Robert Haas wrote: On Fri, Jul 29, 2011 at 10:20 AM, Hannu Krosing ha...@2ndquadrant.com wrote: An additional point to think about: if we were willing to insist on streaming replication, we could send the commit sequence numbers via a side channel rather

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Simon Riggs
On Thu, Jul 28, 2011 at 3:51 AM, Robert Haas robertmh...@gmail.com wrote: All that having been said, even if I haven't made any severe conceptual errors in the above, I'm not sure how well it will work in practice.  On the plus side, taking a snapshot becomes O(1) rather than O(MaxBackends) -

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Florian Pflug
On Jul28, 2011, at 04:51 , Robert Haas wrote: One fly in the ointment is that 8-byte stores are apparently done as two 4-byte stores on some platforms. But if the counter runs backward, I think even that is OK. If you happen to read an 8 byte value as it's being written, you'll get 4 bytes

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Hannu Krosing
On Wed, Oct 20, 2010 at 10:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wonder whether we could do something involving WAL properties --- the current tuple visibility logic was designed before WAL existed, so it's not exploiting that resource at all. I'm imagining that the kernel of a

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Robert Haas
On Thu, Jul 28, 2011 at 3:46 AM, Simon Riggs si...@2ndquadrant.com wrote: Sounds like the right set of thoughts to be having. Thanks. If you do this, you must cover subtransactions and Hot Standby. Work in this area takes longer than you think when you take the complexities into account, as

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Robert Haas
On Thu, Jul 28, 2011 at 4:16 AM, Florian Pflug f...@phlo.org wrote: On Jul28, 2011, at 04:51 , Robert Haas wrote: One fly in the ointment is that 8-byte stores are apparently done as two 4-byte stores on some platforms. But if the counter runs backward, I think even that is OK.  If you happen

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Robert Haas
On Thu, Jul 28, 2011 at 6:50 AM, Hannu Krosing ha...@2ndquadrant.com wrote: On Wed, Oct 20, 2010 at 10:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wonder whether we could do something involving WAL properties --- the current tuple visibility logic was designed before WAL existed, so it's

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Hannu Krosing
On Thu, 2011-07-28 at 09:38 -0400, Robert Haas wrote: On Thu, Jul 28, 2011 at 6:50 AM, Hannu Krosing ha...@2ndquadrant.com wrote: On Wed, Oct 20, 2010 at 10:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wonder whether we could do something involving WAL properties --- the current tuple

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Robert Haas
On Thu, Jul 28, 2011 at 10:17 AM, Hannu Krosing ha...@2ndquadrant.com wrote: My hope was that this contention would be the same as simply writing the WAL buffers currently, and thus largely hidden by the current WAL writing sync mechanism. It really covers just the part which writes

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Thu, Jul 28, 2011 at 10:17 AM, Hannu Krosing ha...@2ndquadrant.com wrote: My hope was that this contention would be the same as simply writing the WAL buffers currently, and thus largely hidden by the current WAL writing sync mechanism. It

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Hannu Krosing
On Thu, 2011-07-28 at 10:23 -0400, Robert Haas wrote: On Thu, Jul 28, 2011 at 10:17 AM, Hannu Krosing ha...@2ndquadrant.com wrote: My hope was that this contention would be the same as simply writing the WAL buffers currently, and thus largely hidden by the current WAL writing sync

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Tom Lane
Hannu Krosing ha...@2ndquadrant.com writes: On Thu, 2011-07-28 at 10:23 -0400, Robert Haas wrote: I'm confused by this, because I don't think any of this can be done when we insert the commit record into the WAL stream. The update to stored snapshot needs to happen at the moment when the WAL

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Robert Haas
On Thu, Jul 28, 2011 at 10:33 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Jul 28, 2011 at 10:17 AM, Hannu Krosing ha...@2ndquadrant.com wrote: My hope was that this contention would be the same as simply writing the WAL buffers currently, and

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Hannu Krosing
On Thu, 2011-07-28 at 10:45 -0400, Tom Lane wrote: Hannu Krosing ha...@2ndquadrant.com writes: On Thu, 2011-07-28 at 10:23 -0400, Robert Haas wrote: I'm confused by this, because I don't think any of this can be done when we insert the commit record into the WAL stream. The update to

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Robert Haas
On Thu, Jul 28, 2011 at 11:10 AM, Hannu Krosing ha...@2ndquadrant.com wrote: My main point was that we already do synchronization when writing WAL, so why not piggyback on this to also update the latest snapshot? Well, one problem is that it would break sync rep. Another problem is that pretty much

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Hannu Krosing
On Thu, 2011-07-28 at 17:10 +0200, Hannu Krosing wrote: On Thu, 2011-07-28 at 10:45 -0400, Tom Lane wrote: Hannu Krosing ha...@2ndquadrant.com writes: On Thu, 2011-07-28 at 10:23 -0400, Robert Haas wrote: I'm confused by this, because I don't think any of this can be done when we

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Hannu Krosing
On Thu, 2011-07-28 at 11:15 -0400, Robert Haas wrote: On Thu, Jul 28, 2011 at 11:10 AM, Hannu Krosing ha...@2ndquadrant.com wrote: My main point was that we already do synchronization when writing WAL, so why not piggyback on this to also update the latest snapshot? Well, one problem is that it

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Thu, Jul 28, 2011 at 10:33 AM, Tom Lane t...@sss.pgh.pa.us wrote: But should we rethink that? Your point that hot standby transactions on a slave could see snapshots that were impossible on the parent was disturbing. Should we look for a way to

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Hannu Krosing
On Wed, 2011-07-27 at 22:51 -0400, Robert Haas wrote: On Wed, Oct 20, 2010 at 10:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wonder whether we could do something involving WAL properties --- the current tuple visibility logic was designed before WAL existed, so it's not exploiting that

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Hannu Krosing
On Thu, 2011-07-28 at 11:57 -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Jul 28, 2011 at 10:33 AM, Tom Lane t...@sss.pgh.pa.us wrote: But should we rethink that? Your point that hot standby transactions on a slave could see snapshots that were impossible on the

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Hannu Krosing
On Thu, 2011-07-28 at 18:05 +0200, Hannu Krosing wrote: But it is also possible that you can get logically consistent snapshots by protecting only some ops. For example, if you protect only insert and get snapshot, then the worst that can happen is that you get a snapshot that is a few

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Hannu Krosing
On Thu, 2011-07-28 at 18:48 +0200, Hannu Krosing wrote: On Thu, 2011-07-28 at 18:05 +0200, Hannu Krosing wrote: But it is also possible that you can get logically consistent snapshots by protecting only some ops. For example, if you protect only insert and get snapshot, then the worst

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Robert Haas
On Thu, Jul 28, 2011 at 11:57 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Jul 28, 2011 at 10:33 AM, Tom Lane t...@sss.pgh.pa.us wrote: But should we rethink that?  Your point that hot standby transactions on a slave could see snapshots that were

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Robert Haas
On Thu, Jul 28, 2011 at 11:36 AM, Hannu Krosing ha...@krosing.net wrote: On Thu, 2011-07-28 at 11:15 -0400, Robert Haas wrote: On Thu, Jul 28, 2011 at 11:10 AM, Hannu Krosing ha...@2ndquadrant.com wrote: My main point was that we already do synchronization when writing WAL, so why not

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Hannu Krosing
On Thu, 2011-07-28 at 14:27 -0400, Robert Haas wrote: On Thu, Jul 28, 2011 at 11:57 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Jul 28, 2011 at 10:33 AM, Tom Lane t...@sss.pgh.pa.us wrote: But should we rethink that? Your point that hot standby

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Tom Lane
Hannu Krosing ha...@krosing.net writes: So the basic design could be a sparse snapshot, consisting of 'xmin, xmax, running_txids[numbackends]', where each backend manages its own slot in running_txids - it sets a txid when acquiring one and nulls it at commit, possibly advancing xmin if
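A minimal single-process sketch of the sparse snapshot described here, under several assumptions: txids are plain increasing integers, xmax is taken to be the next unassigned txid, and xmin advances at commit only when the committing transaction was the oldest one running (the snippet is cut off before it states the actual condition). `SparseSnapshot`, `begin`, `commit` and `take_snapshot` are hypothetical names, and the concurrency questions raised in the thread are not modelled at all.

```python
class SparseSnapshot:
    """Shared xmin/xmax plus one running-txid slot per backend."""

    def __init__(self, num_backends):
        self.running_txids = [None] * num_backends
        self.xmax = 1            # assumed: next txid to hand out
        self.xmin = self.xmax    # equals xmax while nothing is running

    def begin(self, backend_id):
        """Assign a txid and store it in the backend's own slot."""
        txid = self.xmax
        self.xmax += 1
        self.running_txids[backend_id] = txid
        return txid

    def commit(self, backend_id):
        """Null the backend's slot, advancing xmin if it was the oldest."""
        txid = self.running_txids[backend_id]
        self.running_txids[backend_id] = None
        if txid == self.xmin:
            remaining = [t for t in self.running_txids if t is not None]
            self.xmin = min(remaining) if remaining else self.xmax

    def take_snapshot(self):
        """Copy out xmin, xmax and the non-empty slots."""
        running = sorted(t for t in self.running_txids if t is not None)
        return self.xmin, self.xmax, running


shared = SparseSnapshot(num_backends=3)
shared.begin(0)
shared.begin(1)
shared.commit(0)
print(shared.take_snapshot())
```

Note that taking a snapshot in this sketch still scans every slot, so it shows only the data layout and not a cost improvement.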

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Hannu Krosing
On Thu, 2011-07-28 at 21:32 +0200, Hannu Krosing wrote: On Thu, 2011-07-28 at 14:27 -0400, Robert Haas wrote: Hmm, interesting idea. However, consider the scenario where some transactions are using synchronous_commit or synchronous replication, and others are not. If a transaction that

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Tom Lane
Hannu Krosing ha...@2ndquadrant.com writes: On Thu, 2011-07-28 at 14:27 -0400, Robert Haas wrote: We can't make either transaction visible without making both visible, and we certainly can't acknowledge the second transaction to the client until we've made it visible. I'm not going to say

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Hannu Krosing
On Thu, 2011-07-28 at 15:42 -0400, Tom Lane wrote: Hannu Krosing ha...@2ndquadrant.com writes: On Thu, 2011-07-28 at 14:27 -0400, Robert Haas wrote: We can't make either transaction visible without making both visible, and we certainly can't acknowledge the second transaction to the

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Hannu Krosing
On Thu, 2011-07-28 at 15:38 -0400, Tom Lane wrote: Hannu Krosing ha...@krosing.net writes: So the basic design could be a sparse snapshot, consisting of 'xmin, xmax, running_txids[numbackends]', where each backend manages its own slot in running_txids - it sets a txid when acquiring one and nulls

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Kevin Grittner
Hannu Krosing ha...@2ndquadrant.com wrote: but I still think that it is the right semantics to make your commit visible to others, even before you have gotten back the confirmation yourself. Possibly. That combined with building snapshots based on the order of WAL entries of commit records

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Robert Haas
On Thu, Jul 28, 2011 at 3:32 PM, Hannu Krosing ha...@2ndquadrant.com wrote: Hmm, interesting idea.  However, consider the scenario where some transactions are using synchronous_commit or synchronous replication, and others are not.  If a transaction that needs to wait (either just for WAL

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Robert Haas
On Thu, Jul 28, 2011 at 3:40 PM, Hannu Krosing ha...@2ndquadrant.com wrote: On Thu, 2011-07-28 at 21:32 +0200, Hannu Krosing wrote: On Thu, 2011-07-28 at 14:27 -0400, Robert Haas wrote: Hmm, interesting idea.  However, consider the scenario where some transactions are using

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Robert Haas
On Thu, Jul 28, 2011 at 4:12 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Hannu Krosing ha...@2ndquadrant.com wrote: but I still think that it is the right semantics to make your commit visible to others, even before you have gotten back the confirmation yourself. Possibly. That

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Hannu Krosing
On Thu, 2011-07-28 at 16:20 -0400, Robert Haas wrote: On Thu, Jul 28, 2011 at 3:40 PM, Hannu Krosing ha...@2ndquadrant.com wrote: On Thu, 2011-07-28 at 21:32 +0200, Hannu Krosing wrote: On Thu, 2011-07-28 at 14:27 -0400, Robert Haas wrote: Hmm, interesting idea. However, consider the

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Robert Haas
On Thu, Jul 28, 2011 at 4:36 PM, Hannu Krosing ha...@krosing.net wrote: So in the case of a stuck slave, the sync rep transaction is committed after a crash, but is not committed before the crash happens? Yep. Now will the replay process get stuck again during recovery? Nope. -- Robert Haas

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote: Having transactions become visible in the same order on the master and the standby is very appealing, but I'm pretty well convinced that allowing commits to become visible before they've been durably committed is throwing the D in ACID out the window.

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Jeff Davis
On Thu, 2011-07-28 at 14:27 -0400, Robert Haas wrote: Right, but if the visibility order were *defined* as the order in which commit records appear in WAL, that problem neatly goes away. It's only because we have the implementation artifact that set my xid to 0 in the ProcArray is

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Kevin Grittner
Jeff Davis pg...@j-davis.com wrote: Wouldn't the same issue exist if one transaction is waiting for sync rep (synchronous_commit=on), and another is waiting for just a WAL flush (synchronous_commit=local)? I don't think that a synchronous_commit=off is required. I think you're right --

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Kevin Grittner
Kevin Grittner kevin.gritt...@wicourts.gov wrote: to make visibility atomic with commit I meant: to make visibility atomic with WAL-write of the commit record -Kevin

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread karavelov
- Quote from Hannu Krosing (ha...@2ndquadrant.com), on 28.07.2011 at 22:40 - Maybe this is why other databases don't offer per backend async commit? Isn't Oracle's COMMIT WRITE NOWAIT; basically the same - ad hoc async commit? Though their idea of a backend does not map exactly to

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Robert Haas
On Thu, Jul 28, 2011 at 4:54 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Robert Haas robertmh...@gmail.com wrote: Having transactions become visible in the same order on the master and the standby is very appealing, but I'm pretty well convinced that allowing commits to become

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Hannu Krosing
On Thu, 2011-07-28 at 16:42 -0400, Robert Haas wrote: On Thu, Jul 28, 2011 at 4:36 PM, Hannu Krosing ha...@krosing.net wrote: So in the case of a stuck slave, the sync rep transaction is committed after a crash, but is not committed before the crash happens? Yep. Now will the replay process get

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Ants Aasma
On Thu, Jul 28, 2011 at 11:54 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: (4) We communicate acceptable snapshots to the replica to make the order of visibility match the master even when that doesn't match the order that transactions returned from commit. I wonder if

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Ants Aasma
On Fri, Jul 29, 2011 at 2:20 AM, Robert Haas robertmh...@gmail.com wrote: Well, again, there are three levels: (A) synchronous_commit=off.  No waiting! (B) synchronous_commit=local transactions, and synchronous_commit=on transactions when sync rep is not in use.  Wait for xlog flush. (C)

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Robert Haas
On Thu, Jul 28, 2011 at 7:54 PM, Ants Aasma ants.aa...@eesti.ee wrote: On Thu, Jul 28, 2011 at 11:54 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: (4) We communicate acceptable snapshots to the replica to make the order of visibility match the master even when that doesn't

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Robert Haas
On Thu, Jul 28, 2011 at 8:12 PM, Ants Aasma ants.aa...@eesti.ee wrote: On Fri, Jul 29, 2011 at 2:20 AM, Robert Haas robertmh...@gmail.com wrote: Well, again, there are three levels: (A) synchronous_commit=off.  No waiting! (B) synchronous_commit=local transactions, and synchronous_commit=on

[HACKERS] cheaper snapshots

2011-07-27 Thread Robert Haas
On Wed, Oct 20, 2010 at 10:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wonder whether we could do something involving WAL properties --- the current tuple visibility logic was designed before WAL existed, so it's not exploiting that resource at all.  I'm imagining that the kernel of a snapshot