Re: [HACKERS] Free Space Map data structure

2008-04-12 Thread Heikki Linnakangas
Hannu Krosing wrote: index tree node binary mask mask bits 00 0 10-1? 1 21 1 30-3000?? 2 42 00010 52-30001?

Re: [HACKERS] Free Space Map data structure

2008-04-11 Thread Heikki Linnakangas
Hannu Krosing wrote: BTW, I'm pretty sure I have figured out the method for storing minimal required binary tree as an array, where adding each page adds exactly one upper node. The node added is often not the immediate parent, but is always the one needed for covering all nodes. I just have to

Re: [HACKERS] Free Space Map data structure

2008-04-10 Thread Hannu Krosing
> BTW, I'm pretty sure I have figured out the method for storing minimal > required binary tree as an array, where adding each page adds exactly > one upper node. The node added is often not the immediate parent, but is > always the one needed for covering all nodes. > > I just have to write it u

Re: [HACKERS] Free Space Map data structure

2008-04-10 Thread Heikki Linnakangas
Hannu Krosing wrote: On Wed, 2008-04-09 at 21:09 +0300, Heikki Linnakangas wrote: Hannu Krosing wrote: Saving 1 byte is an atomic op Unfortunately, it's not. Most if not all modern CPUs will perform one byte modification as "load word" + "modify word" + "save word". Hmm, maybe we I should ch

Re: [HACKERS] Free Space Map data structure

2008-04-10 Thread PFC
PFC wrote: About the FSM : Would it be possible to add a flag marking pages where all tuples are visible to all transactions ? (kinda like frozen I think) Ah, the visibility map. That's another line of discussion. The current plan is to not tie that to the FSM, but implement it se

Re: [HACKERS] Free Space Map data structure

2008-04-09 Thread Heikki Linnakangas
PFC wrote: About the FSM : Would it be possible to add a flag marking pages where all tuples are visible to all transactions ? (kinda like frozen I think) Ah, the visibility map. That's another line of discussion. The current plan is to not tie that to the FSM, but implement it sepa

Re: [HACKERS] Free Space Map data structure

2008-04-09 Thread PFC
About the FSM : Would it be possible to add a flag marking pages where all tuples are visible to all transactions ? (kinda like frozen I think) This could be useful to implement index-only scans, for count(), or to quickly skip rows when OFFSET is used, or to use only the index whe

Re: [HACKERS] Free Space Map data structure

2008-04-09 Thread Hannu Krosing
On Wed, 2008-04-09 at 21:09 +0300, Heikki Linnakangas wrote: > Hannu Krosing wrote: > > Saving 1 byte is an atomic op > > Unfortunately, it's not. Most if not all modern CPUs will perform one > byte modification as "load word" + "modify word" + "save word". Hmm, maybe we I should change my desi

Re: [HACKERS] Free Space Map data structure

2008-04-09 Thread Heikki Linnakangas
Tom Lane wrote: Heikki Linnakangas <[EMAIL PROTECTED]> writes: ... what I really wanted to discuss is the data structure needed for the Free Space Map. The FSM data structure needs to support two basic operations: 1. Fast lookup of page with >= X bytes of free space 2. Update of arbitrary, in

Re: [HACKERS] Free Space Map data structure

2008-04-09 Thread Heikki Linnakangas
Hannu Krosing wrote: Sabing 1 byte is an atomic op Unfortunately, it's not. Most if not all modern CPUs will perform one byte modification as "load word" + "modify word" + "save word". -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing lis

Re: [HACKERS] Free Space Map data structure

2008-04-09 Thread Hannu Krosing
On Wed, 2008-04-09 at 10:17 +0530, Pavan Deolasee wrote: > On Wed, Apr 9, 2008 at 12:22 AM, Heikki Linnakangas > <[EMAIL PROTECTED]> wrote: > > > > > Well, if you add the higher levels, we're no longer talking about O(1), but > > O(log n) (for a pretty steep logarithm), just like my proposal. >

Re: [HACKERS] Free Space Map data structure

2008-04-08 Thread Pavan Deolasee
On Wed, Apr 9, 2008 at 12:22 AM, Heikki Linnakangas <[EMAIL PROTECTED]> wrote: > > Well, if you add the higher levels, we're no longer talking about O(1), but > O(log n) (for a pretty steep logarithm), just like my proposal. > For updates, I would still call it O(1) because the number of levels

Re: [HACKERS] Free Space Map data structure

2008-04-08 Thread Heikki Linnakangas
Pavan Deolasee wrote: Can we not use bitmaps to track approximate rather than exact free space ? For example, we can have 8 or 16 buckets of free space. A page with 1-32 bytes free, goes into bucket 1, a page with at 33-64 bytes free, goes into bucket 2 and so on. If you want a page with X bytes

Re: [HACKERS] Free Space Map data structure

2008-04-08 Thread Heikki Linnakangas
Hannu Krosing wrote: On Tue, 2008-04-08 at 12:26 +0300, Hannu Krosing wrote: Probably we could do without sparse files, if we find an efficient way to compute the "add order" of leaf and parent pages for above algorithm. if we always add only the minimal needed set of parents then the order w

Re: [HACKERS] Free Space Map data structure

2008-04-08 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > ... what I really wanted to discuss is the data structure needed > for the Free Space Map. > The FSM data structure needs to support two basic operations: > 1. Fast lookup of page with >= X bytes of free space > 2. Update of arbitrary, individual p

Re: [HACKERS] Free Space Map data structure

2008-04-08 Thread Hannu Krosing
On Tue, 2008-04-08 at 13:38 +0300, Hannu Krosing wrote: > On Tue, 2008-04-08 at 12:26 +0300, Hannu Krosing wrote: > > > Probably we could do without sparse files, if we find an efficient way > > to compute the "add order" of leaf and parent pages for above algorithm. > > if we always add only th

Re: [HACKERS] Free Space Map data structure

2008-04-08 Thread Gregory Stark
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes: > For example: > > 9 > 4 9 > 2 4 0 9 It occurs to me now that this it actually wouldn't be so easy to ripple up changes due to the amount amount of space *decreasing*. Consider if you reduce the leaf 9 to an 8. You want to decrease its p

Re: [HACKERS] Free Space Map data structure

2008-04-08 Thread Hannu Krosing
On Tue, 2008-04-08 at 12:26 +0300, Hannu Krosing wrote: > Probably we could do without sparse files, if we find an efficient way > to compute the "add order" of leaf and parent pages for above algorithm. if we always add only the minimal needed set of parents then the order will look something l

Re: [HACKERS] Free Space Map data structure

2008-04-08 Thread Pavan Deolasee
On Tue, Apr 8, 2008 at 2:03 PM, Heikki Linnakangas <[EMAIL PROTECTED]> wrote: > Our current code doesn't support 2, as we always update the FSM in bulk > after vacuum, but we will need that capability to be able to do partial > vacuums in the future. > +1 for that. This will be extremely useful

Re: [HACKERS] Free Space Map data structure

2008-04-08 Thread Hannu Krosing
On Tue, 2008-04-08 at 09:33 +0100, Heikki Linnakangas wrote: > The last thread about Free Space Map evolved into discussion about > whether the FSM and other kinds of auxiliary data should be stored > within the heap pages, in "map forks", or auxiliary relfilenodes > attached to the relation. I

Re: [HACKERS] Free Space Map data structure

2008-04-08 Thread Dave Page
On Tue, Apr 8, 2008 at 9:54 AM, Brendan Jurd <[EMAIL PROTECTED]> wrote: > If I understand your design correctly, this claim isn't true. If the > topmost node reports 9 bytes free, could you not have seven pages each > with 1 byte free, and an eighth page with 2 bytes free? > > 9 >

Re: [HACKERS] Free Space Map data structure

2008-04-08 Thread Brendan Jurd
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Tue, Apr 8, 2008 at 6:33 PM, Heikki Linnakangas wrote: > For example: > > 9 > 4 9 > 2 4 0 9 > > The leaf nodes correspond the heap pages, so page #0 has 2 units of free > space, page #1 has 4, page #1 is full and page has 9. > > Let

[HACKERS] Free Space Map data structure

2008-04-08 Thread Heikki Linnakangas
The last thread about Free Space Map evolved into discussion about whether the FSM and other kinds of auxiliary data should be stored within the heap pages, in "map forks", or auxiliary relfilenodes attached to the relation. It seems the consensus was to go with the map forks, but what I really