[HACKERS] Table and Index compression

2009-08-06 Thread PFC
With the talk about adding compression to pg_dump lately, I've been wondering if tables and indexes could be compressed too. So I've implemented a quick on-the-fly compression patch for postgres Sorry for the long email, but I hope you find this interesting. Why compress ? 1- To sa

Re: [HACKERS] More thoughts on sorting

2009-08-01 Thread PFC
PFC writes: - for short strings (average 12 bytes), sort is CPU-bound in strcoll() - for longer strings (average 120 bytes), sort is even more CPU-bound in strcoll() No news there. If you are limited by the speed of text comparisons, consider using C locale. regards

[HACKERS] More thoughts on sorting

2009-07-31 Thread PFC
There was a thread some time ago about sorting... it kind of died... I did some tests on a desktop (Postgres 8.3.7, kubuntu, Core 2 dual core, 4GB RAM, RAID1 of 2 SATA disks) Quick conclusions : - grabbing the stuff to sort can be IO bound of course (not here) - for short strings (average 12

Re: [HACKERS] Protection from SQL injection

2008-05-01 Thread PFC
Sure, modifying the WHERE clause is still possible, but the attacker is a lot more limited in what he can do if he can't tack on a whole new command. I hacked into a site like that some day to show a guy that you shouldn't trust magicquotes (especially when you switch hosting providers and

Re: [HACKERS] Protection from SQL injection

2008-04-30 Thread PFC
Could we also get a mode, where PREPARE would only be allowed for queries of the form "SELECT * FROM func(?,?,?,?,?); :) Actually, that is similar to the concept of "global prepared statements" that I proposed some time ago, but I will not have time to write the patch, alas... Idea was t

Re: [HACKERS] Protection from SQL injection

2008-04-29 Thread PFC
For example, some applications need to replace whole phrases: $criteria = "WHERE $var1 = '$var2'" This is a very common approach for dynamic search screens, and really not covered by placeholder approaches. Python, again : params = { 'column1': 10, 'column2': "a st'rin

Re: [HACKERS] Protection from SQL injection

2008-04-29 Thread PFC
zero developer pain Actually it's not zero pain, but the main problem is: there is no way to enforce using it. Sure, there is no way to enforce it (apart from grepping the source for pg_query() and flogging someone if it is found), but is it really necessary when the right solution is eas

Re: [HACKERS] Protection from SQL injection

2008-04-29 Thread PFC
On Tue, 29 Apr 2008 01:03:33 +0200, Brendan Jurd <[EMAIL PROTECTED]> wrote: On Tue, Apr 29, 2008 at 7:00 AM, PFC <[EMAIL PROTECTED]> wrote: I have found that the little bit of code posted afterwards did eliminate SQL holes in my PHP applications with zero developer pain, actu

Re: [HACKERS] Protection from SQL injection

2008-04-28 Thread PFC
As you know, "SQL injection" is the main security problem of databases today. I think I found a solution: 'disabling literals'. Or you may call it 'enforcing the use of parameterized statements'. This means that SQL statements with embedded user input are rejected at runtime. My solution goes

Re: [HACKERS] Per-table random_page_cost for tables that we know are always cached

2008-04-23 Thread PFC
Example : let's imagine a "cache priority" setting. Which we can presume the DBA will set incorrectly because the tools needed to set that right aren't easy to use. LOL, yes. Jim threw out that you can just look at the page hit percentages instead. That's not completely true. I

[HACKERS] Per-table random_page_cost for tables that we know are always cached

2008-04-22 Thread PFC
It started with this query : EXPLAIN ANALYZE SELECT * FROM relations r JOIN nodes n ON (n.id=r.child_id) WHERE r.parent_id=16330; QUERY PLAN

Re: [HACKERS] Plan targetlists in EXPLAIN output

2008-04-17 Thread PFC
On Thu, 17 Apr 2008 20:42:49 +0200, Simon Riggs <[EMAIL PROTECTED]> wrote: On Thu, 2008-04-17 at 12:34 -0400, Tom Lane wrote: I'm tempted to propose redefining the currently-nearly-useless EXPLAIN VERBOSE option as doing this. Yes please. Sounds like a good home for other useful things al

Re: [HACKERS] count(*) performance improvement ideas

2008-04-17 Thread PFC
My wife has a snake phobia, besides, I've just started learning Scala. Just had a look at Scala, it looks nice. Slightly Lispish (like all good languages)... txid_current() No... hold on, it is per session, and a session can't have two or more transactions active at once can it?

Re: [HACKERS] count(*) performance improvement ideas

2008-04-17 Thread PFC
On Thu, 17 Apr 2008 02:48:37 +0200, Stephen Denne <[EMAIL PROTECTED]> wrote: PFC wrote: Let's try this quick & dirty implementation of a local count-delta cache using a local in-memory hashtable (ie. {}). CREATE OR REPLACE FUNCTION update_count( key TEXT, delta INTEG

Re: [HACKERS] count(*) performance improvement ideas

2008-04-16 Thread PFC
The whole thing is a bit of an abuse of what the mechanism was intended for, and so I'm not sure we should rejigger GUC's behavior to make it more pleasant, but on the other hand if we're not ready to provide a better substitute ... In my experiments with materialized views, I identified these

Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-14 Thread PFC
On Mon, 14 Apr 2008 16:17:18 +0200, Csaba Nagy <[EMAIL PROTECTED]> wrote: On Mon, 2008-04-14 at 16:10 +0200, Csaba Nagy wrote: ... or plan the query with the actual parameter value you get, and also record the range of the parameter values you expect the plan to be valid for. If at execution ti

Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-14 Thread PFC
Bind message behaviour was modified : - If the user asks for execution of a named prepared statement, and the named statement does not exist in PG's prepared statements cache, instead of issuing an error and borking the transaction, it Binds to an empty statement, that takes no parameters, an

Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-14 Thread PFC
If cached plans would be implemented, the dependence on parameter values could be solved too: use special "fork" nodes in the plan which execute different sub-plans depending on special parameter values/ranges, possibly looking up the stats at runtime, so that the plan is in a compiled state wit

Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-13 Thread PFC
Why limit ourselves with Oracle? How all major proprietary RDBMSs do it. Thanks for the links. Very interesting. The DB2 document especially mentions an important point : in order to make their planner/optimizer smarter, they had to make it slower, hence it became crucial to cache the plans

Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-13 Thread PFC
On Fri, Apr 11, 2008 at 12:34 PM, PFC <[EMAIL PROTECTED]> wrote: Well, I realized the idea of global prepared statements actually sucked, so I set on another approach thanks to ideas from this list, this is caching query plans. Well, that's a blatantly bad realizatio

Re: [HACKERS] Cached Query Plans

2008-04-12 Thread PFC
Well if you're caching per-connection then it doesn't really matter whether you do it on the client side or the server side, it's pretty much exactly the same problem. Actually I thought about doing it on the server since it would then also work with connection pooling. Doi

Re: [HACKERS] Cached Query Plans

2008-04-11 Thread PFC
I think what he's referring to is persistently caching plans so that new connections can use them. That makes a lot more sense if you have lots of short-lived connections like a stock php server without persistent connections turned on or a connection pooler. You can prepare queries but they o

[HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-11 Thread PFC
Well, I realized the idea of global prepared statements actually sucked, so I set on another approach thanks to ideas from this list, this is caching query plans. First, let's see if there is low hanging fruit with the typical small, often-executed queries that are so frequent on website

[HACKERS] Dumb Micro-Optimization

2008-04-10 Thread PFC
* Dumb Optimization #1: - Add executorFunc function pointer to struct PlanState - in ExecProcNode.c -> ExecProcNode() : - upon first execution, set executorFunc to the function corresponding to node type - next calls use function pointer Effect : removes a switch (nodeTag(no

Re: [HACKERS] Free Space Map data structure

2008-04-10 Thread PFC
PFC wrote: About the FSM : Would it be possible to add a flag marking pages where all tuples are visible to all transactions ? (kinda like frozen I think) Ah, the visibility map. That's another line of discussion. The current plan is to not tie that to the FSM, but impleme

Re: [HACKERS] Free Space Map data structure

2008-04-09 Thread PFC
About the FSM : Would it be possible to add a flag marking pages where all tuples are visible to all transactions ? (kinda like frozen I think) This could be useful to implement index-only scans, for count(), or to quickly skip rows when OFFSET is used, or to use only the index whe

Re: [HACKERS] modules

2008-04-05 Thread PFC
On Sat, 05 Apr 2008 02:17:10 +0100 Gregory Stark <[EMAIL PROTECTED]> wrote: I was inclined to dismiss it myself but I think the point that's come up here is interesting. The ISP has to not just install an RPM or type make install in some source tree -- but actually log into each customer's dat

Re: [HACKERS] COPY Transform support

2008-04-03 Thread PFC
INSERT INTO mytable (id, date, ...) SELECT id, NULLIF( date, '-00-00' ), ... FROM mydump WHERE (FKs check and drop the borken records); What do we gain against current way of doing it, which is: COPY loadtable FROM 'dump.txt' WITH ... INSERT INTO destination_table(...) SELECT .

Re: [HACKERS] COPY Transform support

2008-04-03 Thread PFC
On Thu, 03 Apr 2008 16:57:53 +0200, Csaba Nagy <[EMAIL PROTECTED]> wrote: On Thu, 2008-04-03 at 16:44 +0200, PFC wrote: CREATE FLATFILE READER mydump ( id INTEGER, dateTEXT, ... ) FROM file 'dump.txt' (followed by delimiter specification syn

Re: [HACKERS] COPY Transform support

2008-04-03 Thread PFC
Data transformation while doing a data load is a requirement now and then. Considering that users will have to do mass updates *after* the load completes to mend the data to their liking should be reason enough to do this while the loading is happening. I think to go about it the right way w

Re: [HACKERS] first time hacker ;) messing with prepared statements

2008-04-02 Thread PFC
The MAJOR benefit of Microsoft's approach is that it works on existing application, Yes, that is a nice benefit ! Is there a way to turn it on/off ? Or is it smart enough to only cache plans for cases where it is relevant ? For instance, I absolutely want some queries to be planne

Re: [JDBC] [HACKERS] How embarrassing: optimization of a one-shot query doesn't work

2008-04-01 Thread PFC
On Tue, 01 Apr 2008 16:06:01 +0200, Tom Lane <[EMAIL PROTECTED]> wrote: Dave Cramer <[EMAIL PROTECTED]> writes: Was the driver ever changed to take advantage of the above strategy? Well, it's automatic as long as you use the unnamed statement. About all that might need to be done on the clie

Re: [HACKERS] first time hacker ;) messing with prepared statements

2008-03-31 Thread PFC
* Server restart and assorted like failover (you need to redo a global prepare). Hmm? He's proposing storing the info in a system catalog. That hardly seems "volatile"; it'll certainly survive a server restart. Yes, it's in a system catalog. I agree with the point that th

[HACKERS] Patch : Global Prepared Statements

2008-03-30 Thread PFC
Here is the result of my little experiment, for your viewing pleasure, lol. Now it works. Well, it hasn't crashed yet... so I guess I can show it to people ;) - Purpose : Allow PHP (or other languages) users to use prepared statements (pg_exec()) together with persistent connections, w

Re: [HACKERS] first time hacker ;) messing with prepared statements

2008-03-30 Thread PFC
PFC wrote: Hello, So, I embarked (yesterday) on a weekend project to add a new feature to Postgres... I use PHP with persistent connections and always have been bothered that those very small AJAX queries (usually simple selects returning 1 row) take more CPU in postgres to

[HACKERS] first time hacker ;) messing with prepared statements

2008-03-30 Thread PFC
Hello, So, I embarked (yesterday) on a weekend project to add a new feature to Postgres... I use PHP with persistent connections and always have been bothered that those very small AJAX queries (usually simple selects returning 1 row) take more CPU in postgres to parse & plan than

Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread PFC
On Fri, 22 Jun 2007 16:43:00 +0200, Bruce Momjian <[EMAIL PROTECTED]> wrote: Simon Riggs wrote: On Fri, 2007-06-22 at 14:29 +0100, Gregory Stark wrote: > "Joshua D. Drake" <[EMAIL PROTECTED]> writes: > > > Tom Lane wrote: > > > >> untrustworthy disk hardware, for instance. I'd much rather use

Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread PFC
So now we're poking a hole in that but we certainly have to ensure that any transactions that do see the results of our deferred commit themselves don't record any visible effects until both their commit and ours hit WAL. The essential point in Simon's approach that guarantees that is that w

Re: [HACKERS] Load Distributed Checkpoints test results

2007-06-15 Thread PFC
On Fri, 15 Jun 2007 22:28:34 +0200, Gregory Maxwell <[EMAIL PROTECTED]> wrote: On 6/15/07, Gregory Stark <[EMAIL PROTECTED]> wrote: While in theory spreading out the writes could have a detrimental effect I think we should wait until we see actual numbers. I have a pretty strong suspicion t

Re: [HACKERS] Controlling Load Distributed Checkpoints

2007-06-13 Thread PFC
>If we extended relations by more than 8k at a time, we would know a lot >more about disk layout, at least on filesystems with a decent amount of >free space. I doubt it makes that much difference. If there was a significant amount of fragmentation, we'd hear more complaints about seq scan perfor

Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread PFC
My idea is that if an UPDATE places the new tuple on the same page as the old tuple, it will not create new index entries for any indexes where the key doesn't change. Basically the idea behind preventing index bloat by updates is to have one index tuple point to several actual tuples havin

Re: [HACKERS] vacuum, performance, and MVCC, and compression

2006-06-26 Thread PFC
What about increasing the size of an existing index entry? Can that be done easily when a new row is added? I'd say it looks pretty much like inserting a new index tuple... Say "value" is the indexed column. Find first page in the index featuring "value". 1 If t

Re: [HACKERS] vacuum, performance, and MVCC, and compression

2006-06-26 Thread PFC
There were some talks lately about compression. With a bit of lateral thinking I guess this can be used to contain the bloat induced by updates. Of course this is just my hypothesis. Compression in indexes : Instead of storing (value, tuple identifier) keys in the i

Re: [HACKERS] vacuum, performance, and MVCC

2006-06-24 Thread PFC
What I see in this discussion is a huge amount of "the grass must be greener on the other side" syndrome, and hardly any recognition that every technique has its downsides and complications. Sure ;) MVCC generates dead rows, by its very nature ; however I see two trends in this :

Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread PFC
Well, then please help me find a better design cause I can't see one... what we have here is a big "membership" table of email lists. When there's a sendout then the memberships of the affected group are heavily read/updated, otherwise they are idle. None of the memberships is archive data, they

Re: [HACKERS] vacuum, performance, and MVCC

2006-06-22 Thread PFC
What you seem not to grasp at this point is a large web-farm, about 10 or more servers running PHP, Java, ASP, or even perl. The database is usually the most convenient and, aside from the particular issue we are talking about, best suited. The answer is sticky sessions : each user is assi

Re: [HACKERS] vacuum, performance, and MVCC

2006-06-22 Thread PFC
As you can see, in about a minute at high load, this very simple table lost about 10% of its performance, and I've seen worse based on update frequency. Before you say this is an obscure problem, I can tell you it isn't. I have worked with more than a few projects that had to switch away fro

Re: [HACKERS] Rethinking stats communication mechanisms

2006-06-19 Thread PFC
Great minds think alike ;-) ... I just committed exactly that protocol. I believe it is correct, because AFAICS there are only four possible risk cases: Congrats ! For general culture you might be interested in reading this : http://en.wikipedia.org/wiki/Software_tran

Re: [HACKERS] Rethinking stats communication mechanisms

2006-06-17 Thread PFC
It strikes me that we are using a single communication mechanism to handle what are really two distinct kinds of data: Interesting. I recently read a paper on how to get rid of locks for this kind of pattern. * For the Command String - Problem : need to display the currently ex

Re: [HACKERS] CSV mode option for pg_dump

2006-06-13 Thread PFC
From what I gather, the CSV format dump would only contain data. I think pg_dump is the friend of pg_restore. It dumps everything including user defined functions, types, schemas etc. CSV does not fit with this. Besides, people will probably want to dump into CSV the result of any

[HACKERS] Faster Updates

2006-06-03 Thread PFC
Hello, Sometimes people complain that UPDATE is slow in postgres. UPDATE... - generates dead tuples which must be vacuumed. - needs to hit all indexes even if only one column was modified. From what I know UPDATE creates a new copy of the old row with the rele

Re: [HACKERS] COPY (query) TO file

2006-06-03 Thread PFC
MySQL already does this for INSERT : INSERT INTO x (a,b) VALUES (1,2), (3,4), (5,6)...; Does MySQL really let you stream that? Trying to do syntax like that in Postgres wouldn't work because the parser would try to build up a parse tree for the whole statement before runnin

Re: [HACKERS] COPY (query) TO file

2006-06-03 Thread PFC
I was also vaguely pondering whether all the DDL commands could be generalized to receive or send COPY formatted data for repeated execution. It would be neat to be able to prepare an UPDATE with placeholders and stream data in COPY format as parameters to the UPDATE to execute it thousand

Re: [HACKERS] Generalized concept of modules

2006-06-03 Thread PFC
Think about version API compatibility. Suppose you have a working database on server A which uses module foo version 1. Some time passes, you buy another server B and install postgres on it. Meanwhile the module foo has evolved into version 2 which is cooler, but has some minor A

Re: [HACKERS] pg_proc probin misuse

2006-05-29 Thread PFC
Hm, thinking again, I guess Tom Lane is right Surely the initialization code would have to be run anyway ... and if the function does import a pile of modules, do you really want to cache all that in its pg_proc entry? What happens if some of the modules get updated later? Besides,

Re: [HACKERS] pg_proc probin misuse

2006-05-28 Thread PFC
If it were really expensive to derive bytecode from source text then maybe it'd make sense to do what you're doing, but surely that's not all that expensive. Everyone else manages to parse prosrc on the fly and cache the result in memory; why isn't plpython doing that? It depends on the numb

Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal

2006-05-11 Thread PFC
> 0.450 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC > LIMIT 20 > 0.443 ms ANALYZE tmp > 0.365 ms SELECT * FROM tmp > 0.310 ms DROP TABLE tmp > 32.918 ms COMMIT The 32 seconds for commit can hardly be catalog related. It seems the file is fsynced before it is dropped.

Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal

2006-05-10 Thread PFC
On Tue, May 09, 2006 at 06:29:31PM +0200, PFC wrote: You mean the cursors'storage is in fact the same internal machinery as a temporary table ? Use the source, Luke... LOL, yeah, I should have, sorry. See tuplestore_begin_heap in backend/utils/sort/tuplest

Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-10 Thread PFC
Have you tried getting a profile of what exactly PostgreSQL is doing that takes so long when creating a temp table? Nope, I'm not proficient in the use of these tools (I stopped using C some time ago). BTW, I suspect catalogs might be the answer, Probably, because : -

Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-10 Thread PFC
Speaking of which, if a temp table is defined as ON COMMIT DROP or DELETE ROWS, there shouldn't be any need to store xmin/xmax, only cmin/cmax, correct? Yes, that's that type of table I was thinking about... You can't ROLLBACK a transaction on such a table. You can however rol

Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-10 Thread PFC
The problem is that you need a set-returning function to retrieve the values. SRFs don't have rowcount estimates, so the plans suck. What about adding some way of rowcount estimation to SRFs, in the way of: CREATE FUNCTION foo (para, meters) RETURNS SETOF bar AS $$ ... function code ...

Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-09 Thread PFC
> SELECT * FROM somewhere WHERE id IN (SELECT id FROM result) Well, you can either SELECT * FROM somewhere JOIN (SELECT id FROM result GROUP BY id) AS a USING (id); It's the same thing (and postgres knows it) You might want to use PL to store values, say PLperl, or even C

Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-09 Thread PFC
Does the time for commit change much if you leave out the analyze? Yes, when I don't ANALYZE the temp table, commit time changes from 30 ms to about 15 ms ; but the queries get horrible plans (see below) : Fun thing is, the rowcount from a temp table (which is the problem here) should

Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal

2006-05-09 Thread PFC
Creating cursors for a simple plan like a single sequential scan is fast because it's using the original data from the table. I used the following query : SELECT * FROM bookmarks ORDER BY annonce_id DESC LIMIT 20 It's a backward index scan + limit... not a seq scan. And it's

Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-09 Thread PFC
It would be interesting to know what the bottleneck is for temp tables for you. They do not go via the buffer-cache, they are stored in private memory in the backend, they are not xlogged. Nor flushed to disk on backend exit. They're about as close to in-memory tables as you're going to get...

Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-09 Thread PFC
Additionally to your query you are already transferring the whole result set multiple times. First you copy it to the result table. Then you read it again. Your subsequent queries will also have to read over all the unneeded tuples just to get your primary key. Considering that the resul

Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-09 Thread PFC
You might consider just selecting your primary key or a set of primary keys to involved relations in your search query. If you currently use "select *" this can make your result set very large. Copying all the result set to the temp. costs you additional IO that you propably dont need. It i

[HACKERS] Big IN() clauses etc : feature proposal

2006-05-09 Thread PFC
The moral of the story is that you're probably better off running a bunch of small selects than in trying to optimize things with one gargantuan select. Ever experiment with loading the parameters into a temp table and joining to that? Also, it might be worth re-testing that conclusion with

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-09-30 Thread PFC
Bulk loading speed is irrelevant here - that is dominated by parsing, which we have covered copiously (har har) previously and have sped up by 500%, which still makes Postgres < 1/2 the loading speed of MySQL. Let's ask MySQL 4.0 LOAD DATA INFILE blah 0 errors, 666 warnings SHOW

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-09-30 Thread PFC
Just to add a little anarchy in your nice debate... Who really needs all the results of a sort on your terabyte table ? I guess not many people do a SELECT from such a table and want all the results. So, this leaves : - Really wanting all the results, to fetch using

Re: [HACKERS] [PERFORM] Avoiding tuple construction/deconstruction during joining

2005-03-20 Thread PFC
On my machine (Laptop with Pentium-M 1.6 GHz and 512MB DDR333) I get the following timings : Big Joins Query will all the fields and no order by (I just put a SELECT * in the first table) yielding about 6k rows : => 12136.338 ms Replacing the SELECT * from the table with many fields by ju

Re: [HACKERS] [PERFORM] Avoiding tuple construction/deconstruction during joining

2005-03-20 Thread PFC
I have asked him for the data and played with his queries, and obtained massive speedups with the following queries : http://boutiquenumerique.com/pf/miroslav/query.sql http://boutiquenumerique.com/pf/miroslav/query2.sql http://boutiquenumerique.com/pf/miroslav/materialize.sql Note that my opt