Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Leonardo Francalanci
Simon Riggs wrote >> Can you CLUSTER >> against a minmax index? > > Not in this release, at least in my understanding. It's not yet > possible to do an ordered fetch, so the cluster scan probably won't > work. As per the patch I helped writing, CLUSTER should use the sequential heap scan+sort whe

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Leonardo Francalanci
Simon Riggs wrote > From our discussions here, IMHO there is a strong case for avoiding > btrees completely for larger historical data tables. That isn't > something I had even considered as desirable before this conversation > but ISTM now that taking that approach will be more fruitful than > att

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Leonardo Francalanci
Jeremy Harris wrote > Surely there's good correlation between IMSI & IMEI, so have a separate > table to translate one to (a group of) the others, and > halve the indexes on your main table? Yes; unfortunately both are not always available; but it's something we are thinking about (it requires log

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Leonardo Francalanci
Simon Riggs wrote > So in the use case you describe, the min max index would require a > scan of only 25% of the table, not the 80% described earlier for > random inserts. In my experience, people wish to keep data for much > longer periods and so the percentage of scan required would drop lower >

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Leonardo Francalanci
Simon Riggs wrote > On 5 November 2013 14:28, Leonardo Francalanci <m_lists@...> wrote: > >> Either my SQL is not correct (likely), or my understanding of the minmax >> index is >> not correct (even more likely), or the minmax index is not usab

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Jeff Janes wrote > Some experiments I did a few years ago showed that applying sorts to the > data to be inserted could be helpful even when the sort batch size was as > small as one tuple per 5 pages of existing index. Maybe even less. Cool!!! Do you have any idea/hint on how I could try and rep
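A rough way to see why sorting insert batches can help even at small batch sizes is to count how often consecutive insertions hop to a different leaf page. The sketch below is a toy model, not PostgreSQL code; the page capacity, key space, and batch size are invented for illustration:

```python
import random

PAGE_CAPACITY = 256    # hypothetical number of keys per leaf page
KEY_SPACE = 1_000_000  # key range covered by the existing index

def page_switches(batch):
    # Map each key to the leaf page it would land on, then count how often
    # consecutive insertions move to a *different* page (a rough proxy for
    # buffer churn during index insertion).
    pages = [k // PAGE_CAPACITY for k in batch]
    return 1 + sum(1 for a, b in zip(pages, pages[1:]) if a != b)

random.seed(0)
batch = [random.randrange(KEY_SPACE) for _ in range(500)]
unsorted_cost = page_switches(batch)
sorted_cost = page_switches(sorted(batch))
print(unsorted_cost, sorted_cost)
```

With random keys the unsorted batch changes pages on almost every insertion, while the sorted batch groups keys that share a page, so the same pages stay hot in the buffer between touches.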

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Claudio Freire wrote > Well, of course, they're not magic pixie dust. Of course they aren't. I think they can make a difference in a sequential-input scenario. But I'm not the one who said that they are fit to solve the problems that other people and I are talking about in this thread. Claudio Frei

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Claudio Freire wrote > real data isn't truly random Well, let's try normal_rand??? create table t1 as select trunc(normal_rand(100, 50, 3)) as n, generate_series(1, 100) as i; with cte as (select min(n) as minn, max(n) as maxn, i/100 from t1 group by i/100),

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Claudio Freire wrote > Min-max indexes always require a sequential scan of the min-max index > itself when querying. I'm worried about the number of heap pages that will be scanned. My understanding is that given the random input, the index will not be selective enough, and will end up requiring
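The selectivity worry can be illustrated with a toy simulation of block-range min/max summaries (an invented model, not the actual minmax index code; block size and table shape are made up): with a key correlated to insertion order most block ranges can be pruned for an equality probe, while with randomly distributed keys almost every block's [min, max] interval covers the probe value and must be scanned.

```python
import random

ROWS_PER_BLOCK = 1000

def blocks_to_scan(values, probe):
    # Summarize each block by its (min, max); a block is skipped only
    # when the probe falls outside that range.
    blocks = [values[i:i + ROWS_PER_BLOCK]
              for i in range(0, len(values), ROWS_PER_BLOCK)]
    return sum(1 for b in blocks if min(b) <= probe <= max(b))

random.seed(42)
n = 100_000
correlated = list(range(n))                        # e.g. an always-increasing timestamp
uniform = [random.randrange(n) for _ in range(n)]  # e.g. a random caller id

probe = n // 2
print(blocks_to_scan(correlated, probe))  # 1 block
print(blocks_to_scan(uniform, probe))     # nearly all 100 blocks
```

The min/max summary itself stays tiny in both cases; what changes is the fraction of heap blocks it can exclude.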

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Simon Riggs wrote > On 5 November 2013 09:57, Leonardo Francalanci <m_lists@...> wrote: >> While I do believe in testing (since "In theory there is no difference >> between theory and practice. In practice there is"), I would like to know the

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Simon Riggs wrote > Minmax indexes seem to surprise many people, so broad generalisations > aren't likely to be useful. > > I think the best thing to do is to publish some SQL requests that > demonstrate in detail what you are trying to achieve and test them > against minmax indexes. That way we c

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Simon Riggs wrote > Everybody on this thread is advised to look closely at Min Max indexes > before starting any further work. > > MinMax will give us access to many new kinds of plan, plus they are > about as close to perfectly efficient, by which I mean almost zero > overhead, with regard to ins

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Andres Freund wrote > On 2013-11-04 11:27:33 -0500, Robert Haas wrote: >> On Mon, Nov 4, 2013 at 11:24 AM, Claudio Freire <klaussfreire@...> wrote: >> > Such a thing would help COPY, so maybe it's worth a look >> >> I have little doubt that a deferred insertion buffer of some kind >> could

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-31 Thread Leonardo Francalanci
Gavin Flower wrote > How about being able to mark indexes: > 'MEMORY ONLY' to make them not go to disk > and > 'PERSISTENT | TRANSIENT' to mark whether they should be recreated on > machine bootup? I would love that. But: 1) I'd like to run some tests with a "memory drive", and confirm t

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-31 Thread Leonardo Francalanci
Jeff Janes wrote > True, but that is also true of indexes created in bulk. It all has to > reach disk eventually-- > [...] > If the checkpoint interval is as long as the partitioning period, then > hopefully the active index buffers get re-dirtied while protected in > shared_buffers, and only get

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
Jeff Janes wrote > You could periodically merge older partitions into larger tables, index > those aggregated tables, then transactionally disinherit the old > partitions > and inherit the new aggregated one. This would keep the value of K down, > at the expense of re-writing data multiple times (

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
Jeff Janes wrote > Are partitions read-only once time has moved on, or can stragglers show up > that need to be inserted into older partitions? > > You could periodically merge older partitions into larger tables, index > those aggregated tables, then transactionally disinherit the old > partition

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
> Point being: hardware is marching along pretty fast (after 20+ years > of stagnation) and it's dangerous (IMO) to make big software > investments based on the situation on the ground *today*. Yes, that's a good point. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To m

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
Jeff Janes wrote > The index insertions should be fast until the size of the active part of > the indexes being inserted into exceeds shared_buffers by some amount > (what > that amount is would depend on how much dirty data the kernel is willing > to > allow in the page cache before it starts suff

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
> LSM-trees seem patent free I'm no expert, and I only gave it a quick look some time ago: it looked very complicated to get right... and as far as I remember you don't get that much gain unless you go multi-level, which would complicate things further > Please somebody advise patent status of

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
> What is the reason for needing such fast access to individual groups > of records? Sure sounds like the NSA or similar ;-) Users need to search all calls originated from/to a user or from/to a specific mobile phone to answer/analyze customers' probl... ok, I give up: I work for the NSA ;) >

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
> Presumably the data you are inserting isn't actually random. Please > describe the use case you are considering in more detail and some view > on how frequent that is, with some examples. Once we understand the > use case and agree it is important, we might solve problems. Collecting calls data

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
> Hmm, you realise Alvaro is working on MinMax indexes in this release? > They are very efficient with regard to index inserts and specially > designed for use on large tables. > > Prior work by Heikki on Grouped Item Tuples was a way of reducing the > size of indexes, yet still allowing uniquenes

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
> Hmm, you realise Alvaro is working on MinMax indexes in this release? > They are very efficient with regard to index inserts and specially > designed for use on large tables. > > Prior work by Heikki on Grouped Item Tuples was a way of reducing the > size of indexes, yet still allowing uniquene

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
> I bet you've mis-diagnosed the problem. Btrees don't have a problem > keeping up with 50m records; your problem is that after a certain > point your page cache can't keep up with the pseudo-random i/o > patterns and you start seeing faults to storage. > [...] This has nothing to do with the btree

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
> They should, in theory, be faster than btrees -- O(1) not O(log N) page > fetches per lookup. In practice they don't seem to be faster, and > nobody's bothered to find out exactly why. Again, this isn't a terribly > encouraging precedent for implementing some other index type that's > supposed

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
> Another point to add: I don't really see btree as a barrier to > performance for most of the problems I face. The real barriers to > database performance are storage, contention, and query planning. Ehm, that's true for regular OLTP stuff, which I understand is what most (95%?) of people use/ne

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
> Before getting too excited about some new academic index type, it's worth > noting the sad state in which hash indexes have languished for years. > Nobody's bothered to add WAL support, let alone do any other real work > on them. The non-btree index types that have been getting love are the > on

[HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
Hi, I don't see much interest in insert-efficient indexes. These are the ones I've found: - LSM-tree (used by Cassandra and SQLite4?) - Y-Tree (http://www.bossconsulting.com/oracle_dba/white_papers/DW%20in%20oracle/P23%20(ytree%20index%20structure%20for%20DWs).pdf ) - Fractal indexes (TokuDB,
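For readers unfamiliar with the first family listed above, here is a minimal sketch of the LSM idea (an illustrative toy, far simpler than Cassandra's or SQLite4's implementations): writes go to a small in-memory buffer that is periodically flushed as an immutable sorted run, and lookups check the buffer first and then each run from newest to oldest.

```python
import bisect

class ToyLSM:
    """Toy log-structured merge index: in-memory buffer plus sorted runs."""

    def __init__(self, flush_threshold=4):
        self.buffer = {}              # recent writes, cheap to insert into
        self.runs = []                # older data as immutable sorted runs
        self.flush_threshold = flush_threshold

    def insert(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.flush_threshold:
            # Sort once and write out a run: sequential I/O in a real system.
            self.runs.append(sorted(self.buffer.items()))
            self.buffer = {}

    def lookup(self, key):
        if key in self.buffer:
            return self.buffer[key]
        for run in reversed(self.runs):  # newest run wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

idx = ToyLSM()
for k in [42, 7, 99, 3, 15]:
    idx.insert(k, f"row-{k}")
print(idx.lookup(7))   # found in a flushed run
print(idx.lookup(8))   # None
```

The insertion cost is amortized into sequential run writes; the price is that a lookup may have to probe several runs, which is why real systems add background merging (compaction) and per-run Bloom filters.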

Re: [HACKERS] Page Checksums

2011-12-22 Thread Leonardo Francalanci
Agreed. I do agree with Heikki that it really ought to be the OS problem, but then we thought that about dtrace and we're still waiting for that or similar to be usable on all platforms (+/- 4 years). My point is that it looks like this is going to take 1-2 years in postgresql, so it looks li

Re: [HACKERS] Page Checksums

2011-12-21 Thread Leonardo Francalanci
I think what I meant was: isn't this going to be useless in a couple of years (if, say, btrfs becomes available)? Or does it actually give something that the FS will never be able to give? Yes, it will help you find/address bugs in the filesystem. These things are not unheard of... It sounds to me li

Re: [HACKERS] Page Checksums

2011-12-21 Thread Leonardo Francalanci
On 21/12/2011 16.19, Stephen Frost wrote: * Leonardo Francalanci (m_li...@yahoo.it) wrote: I can't help in this discussion, but I have a question: how different would this feature be from filesystem-level CRC, such as the one available in ZFS and btrfs? Depends on how much you trus

Re: [HACKERS] Page Checksums

2011-12-21 Thread Leonardo Francalanci
I can't help in this discussion, but I have a question: how different would this feature be from filesystem-level CRC, such as the one available in ZFS and btrfs?

Re: [HACKERS] WIP: Collecting statistics on CSV file data

2011-10-18 Thread Leonardo Francalanci
> New API AnalyzeForeignTable I didn't look at the patch, but I'm using CSV foreign tables with named pipes to get near-realtime KPI calculated by postgresql. Of course, pipes can be read just once, so I wouldn't want an "automatic analyze" of foreign tables...

Re: [HACKERS] Is there any plan to add unsigned integer types?

2011-09-26 Thread Leonardo Francalanci
> compression is an interesting topic: the guys over at tokudb are > making some wild claims... i'm curious if they are real, and what the > real tradeoffs are. I don't know how much of the performance they claim comes from compression and how much from the different indexing technique they use

Re: [HACKERS] Your Postgresql 9.2 patch

2011-06-28 Thread Leonardo Francalanci
> I've nearly finished editing prior to commit, so no worries. Thank you, let me know if I can help. Leonardo

Re: [HACKERS] Your Postgresql 9.2 patch

2011-06-28 Thread Leonardo Francalanci
> Leonardo, > > Your patch: > > use less space in xl_xact_commit > > ... has been waiting on an updated version from you for 10 days now. Do > you think you're likely to complete it for this CommitFest? I sent an email on the subject: http://postgresql.1045698.n5.nabble.com/use-less-space-

Re: [HACKERS] use less space in xl_xact_commit patch

2011-06-16 Thread Leonardo Francalanci
> The important thing is that we retain backwards compatibility with > current XLOG_XACT_COMMIT. I'm not worried what we call the other one. Ok, let me see if I got it right: #define XLOG_XACT_COMMIT 0x00 should become: #define XLOG_XACT_COMMIT_WITH_INFO 0x00 and I'll add a

Re: [HACKERS] use less space in xl_xact_commit patch

2011-06-16 Thread Leonardo Francalanci
> With regards to the naming, I think it would be better if we kept > XLOG_XACT_COMMIT record exactly as it is now, and make the second > record an entirely new record called XLOG_XACT_COMMIT_FASTPATH. That > way we retain backwards compatibility. > > If you'd like to rework like that please,

Re: [HACKERS] use less space in xl_xact_commit patch

2011-06-15 Thread Leonardo Francalanci
> > On Wed, May 25, 2011 at 3:05 PM, Simon Riggs wrote: > > Leonardo, can you submit an updated version of this patch today that > > incorporates Simon's suggestion? Mmmh, maybe it was simpler than I thought; I must be missing something... patch attached How can I test it with "weird" s

Re: [HACKERS] use less space in xl_xact_commit patch

2011-06-15 Thread Leonardo Francalanci
> On Tue, Jun 14, 2011 at 2:31 PM, Simon Riggs wrote: > Well, we certainly have the option to review and commit the patch any > time up until feature freeze. However, I don't want the CommitFest > application to be full of entries for patches that are not actually > being worked on, because it

Re: [HACKERS] use less space in xl_xact_commit patch

2011-06-15 Thread Leonardo Francalanci
> We don't need to be in a hurry here. As the reviewer I'm happy to give > Leonardo some time, obviously no more than the end of the commit fest. > > If he doesn't respond at all, I'll do it, but I'd like to give him the > chance and the experience if possible. Sorry I couldn't update the pat

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-31 Thread Leonardo Francalanci
> Well, I sort of assumed the design was OK, too, but the more we talk > about this WAL-logging stuff, the less convinced I am that I really > understand the problem. :-( I see. In fact, I think nobody thought about restart points... To sum up: 1) everything seems ok when in the wal_level =

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-31 Thread Leonardo Francalanci
> I think > we need a detailed design document for how this is all going to work. > We need to not only handle the master properly but also handle the > slave properly. Consider, for example, the case where the slave > begins to replay the transaction, reaches a restartpoint after > replaying

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-30 Thread Leonardo Francalanci
> Why is it necessary to replay the operation only on the slave? Can we > just use XLOG_HEAP_NEWPAGE? Uh, I don't know why but I thought I shouldn't log a page on the master, since all the pages are already there and fsync-ed. But if it makes no harm, I can easily use XLOG_HEAP_NEWPAGE (of co

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-27 Thread Leonardo Francalanci
> From: Noah Misch > > - the patch is missing the "send all table pages to the > > standby" part; is there some code I can use as base? > > Nothing comes to mind as especially similar. > > > I guess I have to generate some special log type that > > is only "played" by standby servers. > > W

Re: [HACKERS] use less space in xl_xact_commit patch

2011-05-25 Thread Leonardo Francalanci
Sorry, email sent without body. Fixed some English mistakes. commitlog_lessbytes02.patch Description: Binary data

Re: [HACKERS] use less space in xl_xact_commit patch

2011-05-25 Thread Leonardo Francalanci
commitlog_lessbytes02.patch Description: Binary data

Re: [HACKERS] use less space in xl_xact_commit patch

2011-05-25 Thread Leonardo Francalanci
> From: Simon Riggs > I can't find a clear discussion of what you are trying to do, and how, > just a URL back to a complex discussion on another topic. While trying to write a patch to allow changing an unlogged table into a logged one, I had to add another int field to xl_xact_commit. Robert H

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-20 Thread Leonardo Francalanci
> - the patch is missing the "send all table pages to the > standby" part; is there some code I can use as base? > I guess I have to generate some special log type that > is only "played" by standby servers. Maybe I could use log_newpage, but instead of XLOG_HEAP_NEWPAGE I could use something

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-20 Thread Leonardo Francalanci
I'll try to sum up what I understood: 1) the standby keeps the lock, so no problem with stray files coming from the unlogged->logged log replay, as the table can't be read during the operation 2) calling ResetUnloggedRelations before ProcArrayApplyRecoveryInfo would remove the problem of the stra

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-19 Thread Leonardo Francalanci
> I'd guess some WAL record arising from the post-crash master restart makes the > standby do so. When a crash isn't involved, the commit or abort record is that > signal. You could test and find out how it happens after a master crash with a > procedure like this: > > 1. Start a mas

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-19 Thread Leonardo Francalanci
> On Wed, May 18, 2011 at 04:02:59PM +0100, Leonardo Francalanci wrote: > > > By the time the startup process > > > releases the AccessExclusiveLock acquired by the proposed > > > UNLOGGED -> normal conversion process, that relfilenode > > > needs t

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-18 Thread Leonardo Francalanci
> By the time the startup process > releases the AccessExclusiveLock acquired by the proposed > UNLOGGED -> normal conversion process, that relfilenode > needs to be either fully copied or unlinked all over again. > (Alternately, find some other way to make sure queries don't > read the half-c

Re: [HACKERS] use less space in xl_xact_commit patch

2011-05-18 Thread Leonardo Francalanci
this is a second version: now using int counts[1]; /* variable-length array of counts */ in xl_xact_commit to keep track of the number of different arrays at the end of the struct. Waiting for feedback... Leonardo commitlog_lessbytes00.patch Description: Binary data

Re: [HACKERS] use less space in xl_xact_commit patch

2011-05-18 Thread Leonardo Francalanci
> int counts[1]; /* variable-length array of counts, xinfo flags define > length of array and meaning of counts */ Damn, that's much cleaner than what I did. I don't know why I stuck with the idea that it had to be: int array int array ... instead of: int int ... array array ... which mak

[HACKERS] use less space in xl_xact_commit patch

2011-05-16 Thread Leonardo Francalanci
Hi, following the conversation at http://postgresql.1045698.n5.nabble.com/switch-UNLOGGED-to-LOGGED-tp4290461p4382333.html I tried to remove some bytes from xl_xact_commit. The way I did it needs palloc+memcpy. I guess it could be done reusing the memory for smgrGetPendingDeletes. But I don

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-10 Thread Leonardo Francalanci
> I don't think making xinfo shorter will save anything, because > whatever follows it is going to be a 4-byte quantity and therefore > 4-byte aligned. Oops, didn't notice it. I'll split xinfo into: uint16 xinfo; uint16 presentFlags; I guess it helps with the reading? I mean, instead

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-10 Thread Leonardo Francalanci
> Yes, that seems like a very appealing approach. There is plenty of > bit-space available in xinfo, and we could reserve a bit each for > nrels, nsubxacts, and nmsgs, with set meaning that an integer count of > that item is present and clear meaning that the count is omitted from > the struct
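The scheme being discussed — bits in xinfo marking which optional counts are present, so absent items cost zero bytes — can be sketched as follows. This is a Python mock-up of the encoding idea only, not the actual xl_xact_commit layout; the flag values and field widths are invented:

```python
import struct

# Hypothetical flag bits: a set bit means "an integer count for this item
# is present in the record"; a clear bit means the count is omitted entirely.
XACT_HAS_RELS     = 0x01
XACT_HAS_SUBXACTS = 0x02
XACT_HAS_MSGS     = 0x04

FLAGS = ((XACT_HAS_RELS, "nrels"),
         (XACT_HAS_SUBXACTS, "nsubxacts"),
         (XACT_HAS_MSGS, "nmsgs"))

def encode(nrels=0, nsubxacts=0, nmsgs=0):
    given = {"nrels": nrels, "nsubxacts": nsubxacts, "nmsgs": nmsgs}
    xinfo, counts = 0, []
    for flag, name in FLAGS:
        if given[name]:          # only non-zero counts are stored
            xinfo |= flag
            counts.append(given[name])
    return struct.pack("<I", xinfo) + struct.pack(f"<{len(counts)}I", *counts)

def decode(buf):
    (xinfo,) = struct.unpack_from("<I", buf, 0)
    out, offset = {}, 4
    for flag, name in FLAGS:
        if xinfo & flag:
            (out[name],) = struct.unpack_from("<I", buf, offset)
            offset += 4
        else:
            out[name] = 0        # omitted count means zero items
    return out

plain = encode(nrels=2)                       # common case: one count stored
full = encode(nrels=2, nsubxacts=1, nmsgs=4)  # all three counts stored
print(len(plain), len(full))
print(decode(plain))
```

The win is exactly the one described in the thread: a typical commit record, which has no subxacts and no invalidation messages, stops paying for the fields it does not use.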

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-07 Thread Leonardo Francalanci
> On Fri, Apr 22, 2011 at 4:13 AM, Leonardo Francalanci wrote: > >> Maybe you should change xl_xact_commit to have a separate list of rels to >> drop the init fork for (instead of mixing those with the list of files to >> drop as a whole). >

Re: [HACKERS] Unlogged tables, persistent kind

2011-04-26 Thread Leonardo Francalanci
> > If that 1% is random (not time/transaction related), usually you'd rather > have an empty table. > > Why do you think it would be random? "Heap blocks would be zeroed if they were found to be damaged, following a crash." If you erase full blocks, you have no idea what data you erased; it

Re: [HACKERS] Unlogged tables, persistent kind

2011-04-25 Thread Leonardo Francalanci
> The amount of data loss on a big > table will be <1% of the data loss > caused by truncating the whole table. If that 1% is random (not time/transaction related), usually you'd rather have an empty table. In other words: is a table that is not consistent with anything else in the db useful?

Re: [HACKERS] Unlogged tables, persistent kind

2011-04-25 Thread Leonardo Francalanci
The only data we can't rebuild it's the heap. So what about an option for UNlogged indexes on a LOGged table? It would always preserve data, and it would 'only' cost a rebuilding of the indexes in case of an unclean shutdown. I think it would give a boost in performance for all those cases where

Re: [HACKERS] What Index Access Method Functions are really needed?

2011-04-22 Thread Leonardo Francalanci
> > another question regarding indexes. Sadly I can't find enough info in > > the documentation. Which of the functions are needed in order for an > > index to work? > > All of them. Maybe I completely misunderstood the question, but some functions are "optional", such as amgetbitmap, right?

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-04-22 Thread Leonardo Francalanci
> Maybe you should change xl_xact_commit to have a separate list of rels to > drop the init fork for (instead of mixing those with the list of files to > drop as a whole). I tried to follow your suggestion, thank you very much. Here's a first attempt at the patch. I "tested" it with: crea

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-04-18 Thread Leonardo Francalanci
I think I coded a very basic version of the UNLOGGED to LOGGED patch (only wal_level=minimal case for the moment). To remove the INIT fork, I changed somehow PendingRelDelete to have a flag "bool onlyInitFork" so that the delete would remove only the INIT fork at commit. Everything "works" (note

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-04-16 Thread Leonardo Francalanci
> > If the master crashes while a transaction that used CREATE TABLE is unfinished, > > both the master and the standby will indefinitely retain identical, stray (not > > referenced by pg_class) files. The catalogs do reference the relfilenode of > > each unlogged relation; currently, tha

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-04-11 Thread Leonardo Francalanci
> > But re-reading it, I don't understand: what's the difference in creating > > a new "regular" table and crashing before emitting the abort record, > > and converting an unlogged table to logged and crashing before > > emitting the abort record? How do the standby servers handle a > > "CREATE

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-04-09 Thread Leonardo Francalanci
> I'm pretty sure we wouldn't accept a patch for a feature that would > only work with wal_level=minimal, but it might be a useful starting > point for someone else to keep hacking on. I understand. Reading your post at http://archives.postgresql.org/pgsql-hackers/2011-01/msg00315.php I thoug

[HACKERS] switch UNLOGGED to LOGGED

2011-04-08 Thread Leonardo Francalanci
Hi, I read the discussion at http://archives.postgresql.org/pgsql-hackers/2011-01/msg00315.php From what I can understand, going from/to unlogged to/from logged in the wal_level == minimal case is not too complicated. Suppose I try to write a patch that allows ALTER TABLE tablename SET L

Re: [HACKERS] Fwd: What do these terms mean in the SOURCE CODE?

2010-11-21 Thread Leonardo Francalanci
> Here's my single biggest tip for newcomers to the Postgres source: > if you don't use ctags, glimpse, or some other tool that can quickly > show you all references to a given identifier, go out and get one. > It's one of the easiest ways to learn about things. I guess Eclipse is the best to

Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-10-29 Thread Leonardo Francalanci
> Cases with lots of irrelevant indexes. Zoltan's example had 4 indexes > per child table, only one of which was relevant to the query. In your > test case there are no irrelevant indexes, which is why the runtime > didn't change. Mmh... I must be doing something wrong. It looks to me it's

Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-10-29 Thread Leonardo Francalanci
> Hmm, maybe I spoke too soon. With 9000 child tables I get a profile > like this: Well, the 9000-table-test-case was meant to check the difference in performance with/without the patch... I don't see the reason for trying to optimize such an unrealistic case. BTW can someone explain to me whi

Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-10-29 Thread Leonardo Francalanci
> This is going to be dominated by constraint exclusion checking. There's > basically no fix for that except a more explicit representation of the > partitioning rules. Damn, I knew that was going to be more complicated :) So in which case does this patch help? I guess in a multi-index scena

Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-10-29 Thread Leonardo Francalanci
> but I don't get any gain from the patch... explain time is still around 250 ms. > Tried with 9000 partitions, time is still 2 secs. Small correction: I tried with 3000 partitions (FOR i IN 0..3000 ...) and got 250ms with both versions, and with 9000 partitions 2 secs (again no gain from the pat

Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-10-29 Thread Leonardo Francalanci
> On the other hand, if I use a similar test case to my original one > (i.e. the tables are much wider) then the query planning takes > 1.42 seconds in 9.1 with this patch instead of about 4.7 seconds > as we observed it using PostgreSQL 9.0.0. The beginning of the gprof > output now looks like

Re: [HACKERS] Custom aggregation function that creates an array

2010-10-23 Thread Leonardo Francalanci
> Before you start coding, have you looked over what's in > contrib/intarray ? There's nothing that fulfills my needs there, but I guess it would be the perfect place to watch for code examples! Thank you (I think I made my custom aggregation function work, but I'll look into intarray code to

[HACKERS] Custom aggregation function that creates an array

2010-10-22 Thread Leonardo Francalanci
Hi, I want to write a custom agg function that, given an "int4 index", increments the element at "index" of an array and, at the end, returns the array. The array will always be int4[]. I need it in C, since plpgsql is way slower (and I need to use it in 5M+ rows). I did it, but I also need to c
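The aggregate's state-transition logic — take the running array, grow it if needed, and increment the slot at the given index — can be prototyped like this. This is a Python sketch of the semantics only; the real thing would be a C function registered via CREATE AGGREGATE, and the 1-based indexing mirrors SQL arrays:

```python
from functools import reduce

def increment_at(state, index):
    """Transition function: bump the counter at `index`, growing the array
    with zeros when the index lies past the current end (1-based, like SQL)."""
    if index > len(state):
        state = state + [0] * (index - len(state))
    state[index - 1] += 1
    return state

# Aggregating the index values 3, 1, 3, 5 yields a histogram-like array.
result = reduce(increment_at, [3, 1, 3, 5], [])
print(result)  # [1, 0, 2, 0, 1]
```

In the C version the equivalent of `reduce` is driven by the executor calling the transition function once per row, and the performance concern from the post is exactly about avoiding a copy of the state array on every call.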

Re: [HACKERS] On the usefulness of hint bits

2010-10-11 Thread Leonardo Francalanci
> I wonder if we could improve this with some sort of process-local > cache - not to get rid of hint bits, just to reduce pg_clog > contention. We might easily end up testing the same XID many times > during the same table scan. I guess that's my scenario... not that many transactions, so ev

Re: [HACKERS] On the usefulness of hint bits

2010-10-11 Thread Leonardo Francalanci
> Reduction of contention for pg_clog access, for one thing. If you read > the archives, you'll find that pg_clog access contention has been shown > to be one cause of "context swap storms". Having to go to clog for > every single tuple access would make that orders of magnitude worse. Ok;

[HACKERS] On the usefulness of hint bits

2010-10-11 Thread Leonardo Francalanci
Hi, I was wondering what the advantage of having hint bits is for OLAP-style workloads, that is, when the number of transactions is not that high. If I got it right, in 10 pg_clog pages we can store the status for more than 32 transactions. That's a lot, in a very small space (80KB?). So I
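The space claim is easy to check with back-of-the-envelope arithmetic, assuming the standard clog layout of 2 status bits per transaction and the default 8 KB block size (both taken from PostgreSQL's defaults; adjust if your build differs):

```python
BITS_PER_XACT = 2        # in-progress / committed / aborted / sub-committed
PAGE_SIZE_BYTES = 8192   # default BLCKSZ
N_PAGES = 10

xacts_per_page = PAGE_SIZE_BYTES * 8 // BITS_PER_XACT
total_xacts = xacts_per_page * N_PAGES
total_kb = N_PAGES * PAGE_SIZE_BYTES // 1024

print(xacts_per_page)  # 32768 transactions per clog page
print(total_xacts)     # 327680 transactions in 10 pages
print(total_kb)        # 80 KB, matching the figure above
```

So under these assumptions 10 clog pages cover a few hundred thousand transactions, which is why a low-transaction-rate workload keeps its whole recent clog range trivially cached.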

Re: I: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-10-08 Thread Leonardo Francalanci
> Applied with some significant editorialization. The biggest problem I > found was that the code for expression indexes didn't really work, and > would leak memory like there's no tomorrow even when it did work. Sorry I couldn't write it the way it was supposed to be... I'll look at the difference

Re: I: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-10-04 Thread Leonardo Francalanci
> It sounds like the costing model might need a bit more work before we commit > this. I tried again the simple SQL tests I posted a while ago, and I still get the same ratios. I've tested the applied patch on a dual opteron + disk array Solaris machine. I really don't get how a laptop hard dr

Re: I: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-10-01 Thread Leonardo Francalanci
> I ran a few more performance tests on this patch. Here's what I got > for the tests Leonardo posted originally: > * 2M rows: 22 seconds for seq. scan, 24 seconds for index scan > * 5M rows: 139 seconds for seq. scan, 97 seconds for index scan > * 10M rows: 256 seconds seq. scan, 61

Re: I: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-09-29 Thread Leonardo Francalanci
> > Here's my post with a (very simple) performance test: > > http://archives.postgresql.org/pgsql-hackers/2010-02/msg00766.php > I think the 10M rows test is more in line with what we want (83s vs. 646s). Can someone else test the patch to see if what I found is still valid? I don't think it ma

Re: I: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-09-29 Thread Leonardo Francalanci
> > 10% is nothing. I was expecting this patch would give an order of > > magnitude of improvement or somethine like that in the worst cases of > > the current code (highly unsorted input) > > Yes. It should be x10 faster than ordinary method in the worst cases. Here's my post with a (very

Re: I: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-07-21 Thread Leonardo Francalanci
> I think writetup_rawheap() and readtup_rawheap() are a little complex, > but should work as long as there are no padding between t_len and t_self > in HeapTupleData struct. > > - It might be cleaner if you write the total item length > and tuple data separately. > - "(char *) tuple + sizeo