Re: [HACKERS] Running PostGre on DVD

2005-11-15 Thread Zeugswetter Andreas DAZ SD
This has been proposed before, and always rejected. While you're always welcome to provide a patch, I'm very doubtful it would be accepted into the main product. The example given in this thread certainly isn't going to change anybody's mind. Hi, I propose reducing everybody's

Re: [HACKERS] [GENERAL] aix build question re: duplicate symbol warning

2005-10-28 Thread Zeugswetter Andreas DAZ SD
Hmm. pqStrerror is defined in libpgport (which is linked into the backend) as well as libpq. ISTM that libpq should not be linked with -Wl,-bI:../../../src/backend/postgres.imp, since it's not intended to be loaded into the backend. Without having looked at the code, I'm

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-06 Thread Zeugswetter Andreas DAZ SD
Now I've asked for the quickest path to detailed understanding of the pg IO subsystem. The goal being to get more up to speed on its coding details. Certainly not to annoy you or anyone else. Basically pg does random 8k (compile time blocksize) reads/writes only. Bitmap and sequential

Re: [HACKERS] Vacuum and Transactions

2005-10-05 Thread Zeugswetter Andreas DAZ SD
Is it reasonable to cancel and restart the vacuum process periodically (say every 12 hours) until it manages to complete the work? It takes about 2 hours to do the table scan, and should get in about 10 hours of index work each round. If we started the vacuum with the indexes,

Re: [HACKERS] Inherited indexes.

2005-10-04 Thread Zeugswetter Andreas DAZ SD
Another possibility is optimizing for the special case of indexing on a partitioning key. In this case, index values would be very localized to one table, so just storing the table info on each index page (or something similar) would work well. If you have the partitioning key in the

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-09-29 Thread Zeugswetter Andreas DAZ SD
In my original example, a sequential scan of the 1TB of 2KB or 4KB records, = 250M or 500M records of data, being sorted on a binary value key will take ~1000x more time than reading in the ~1GB Btree I described that used a Key+RID (plus node pointers) representation of the data. Imho

Re: [HACKERS] Call for 7.5 feature completion

2005-09-02 Thread Zeugswetter Andreas DAZ SD
I think the main problem with switching to visual studio project files is maintainabilty. (It's not easy to get all I think the target should be a way to auto create those files with gmake (maybe with mingw for configure). The format of VS6 project and workspace files is pretty simple. It

Re: [HACKERS] Pre-allocated free space for row

2005-09-01 Thread Zeugswetter Andreas DAZ SD
My wild guess is that deleting all index pointers for a removed index is more-or-less the same cost as creating new ones for inserted/updated page. Only if you are willing to make the removal process recalculate the index keys from looking at the deleted tuple. The bgwriter could

Re: [HACKERS] Pre-allocated free space for row

2005-09-01 Thread Zeugswetter Andreas DAZ SD
The bgwriter could update all columns of dead heap tuples in heap pages to NULL and thus also gain free space without the need to touch the indexes. The slot would stay used but it would need less space. Not unless it's running a transaction (consider TOAST updates). Ok, you could

Re: [HACKERS] wal_buffer tests in

2005-07-28 Thread Zeugswetter Andreas DAZ SD
I ran a wal_buffer test series. It appears that increasing the wal_buffers is indeed very important for OLTP applications, potentially resulting in as much as a 15% average increase in transaction processing. What's interesting is that this is not just true for 8.1, it's true for

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-08 Thread Zeugswetter Andreas DAZ SD
The point here is that fsync-off is only realistic for development or playpen installations. You don't turn it off in a production machine, and I can't see that you'd turn off the full-page-write option either. So we have not solved anyone's performance problem. Yes, this is basically

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Zeugswetter Andreas DAZ SD
Are you sure about that? That would probably be the normal case, but are you promised that the hardware will write all of the sectors of a block in order? I don't think you can possibly assume that. If the block crosses a cylinder boundary then it's certainly an unsafe assumption,

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Zeugswetter Andreas DAZ SD
Here's an idea: We read the page that we would have backed up, calc the CRC and write a short WAL record with just the CRC, not the block. When we recover we re-read the database page, calc its CRC and compare it with the CRC from the transaction log. If they differ, we know that the

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Zeugswetter Andreas DAZ SD
Only workable solution would imho be to write the LSN to each 512 byte block (not that I am propagating that idea). Only workable was a stupid formulation, I meant a solution that works with a LSN. We're not doing anything like that, as it would create an impossible space-management

Re: [HACKERS] NOLOGGING option, or ?

2005-06-02 Thread Zeugswetter Andreas DAZ SD
Escape processing would proceed as before, but the semantics would change to allow the use of different characters as the escape character, in addition to the special characters for delimiter and newline. If you mean syntax to specify escape and delimiter (and newline ?), that is a great

Re: [HACKERS] Tablespace-level Block Size Definitions

2005-06-01 Thread Zeugswetter Andreas DAZ SD
The problem I see with this proposal is that the buffer manager knows how to handle only a equally-sized pages. And the shared memory stuff gets sized according to size * num_pages. So what happens if a certain tablespace A with pagesize=X gets to have a lot of its pages cached, evicting

Re: [HACKERS] compiling postgres with Visual Age compiler on OpenPower5 / Linux

2005-05-30 Thread Zeugswetter Andreas DAZ SD
I have not researched any deeper,but currently it fails with [EMAIL PROTECTED] postgresql-8.0.3]$ CC=/opt/ibmcmp/vac/7.0/bin/xlc ./configure ..A.. checking for int timezone... yes checking types of arguments for accept()... configure: error: could not determine argument types The odds

Re: [HACKERS] Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order)

2005-05-19 Thread Zeugswetter Andreas DAZ SD
Incrementing random_page_cost from 4 (the default) to 5 causes the planner to make a better decision. We have such a low default random_page_cost primarily to mask other problems in the optimizer, two of which are . multi-column index correlation . interpolation between min_IO_Cost

Re: [HACKERS] Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order)

2005-05-19 Thread Zeugswetter Andreas DAZ SD
But to get the estimated cost ratio to match up with the actual cost ratio, we'd have to raise random_page_cost to nearly 70, which is a bit hard to credit. What was the platform being tested here? Why ? Numbers for modern single disks are 1-2Mb/s 8k random and 50-120 Mb/s sequential.

Re: [HACKERS] DO INSTEAD and conditional rules

2005-04-27 Thread Zeugswetter Andreas DAZ SD
If we did not define it that way, I think your example would have to error out --- how would you choose which INSTEAD rule wins? The documentation says that they evaluate in alphabetical order by name. So I would expect that the first one to have its WHERE statement evaluate to true

Re: [HACKERS] locks in CREATE TRIGGER, ADD FK

2005-03-23 Thread Zeugswetter Andreas DAZ SD
It would keep the old table around while building the new, then grab an exclusive lock to swap the two. Lock upgrading is right out. It would need a whole of new family of intent locks, with different rules. Andreas ---(end of

Re: [HACKERS] We are not following the spec for HAVING without GROUP BY

2005-03-10 Thread Zeugswetter Andreas DAZ SD
What that means is that neither the HAVING clause nor the targetlist can use any ungrouped columns except within aggregate calls; that is, select col from tab having 21 Informix: select tabname from systables having 2 1; 294: The column (tabname) must be in the GROUP BY list.

Re: [pgsql-hackers-win32] [HACKERS] win32 performance - fsync question

2005-02-25 Thread Zeugswetter Andreas DAZ SD
Are you verifying that all the data that was committed was actually stored? Or just verifying that the database works properly after rebooting? I verified the data. Does pg startup increase the xid by some amount (say 1000 xids) after crash ? Else I think you would also need to rollback a

Re: [HACKERS] win32 performance - fsync question

2005-02-21 Thread Zeugswetter Andreas DAZ SD
One point that I no longer recall the reasoning behind is that xlog.c doesn't think O_SYNC is a preferable default over fsync. For larger (8k) transactions O_SYNC|O_DIRECT is only good with the recent pending patch to group WAL writes together. The fsync method gives the OS a chance

Re: [HACKERS] win32 performance - fsync question

2005-02-18 Thread Zeugswetter Andreas DAZ SD
One point that I no longer recall the reasoning behind is that xlog.c doesn't think O_SYNC is a preferable default over fsync. For larger (8k) transactions O_SYNC|O_DIRECT is only good with the recent pending patch to group WAL writes together. The fsync method gives the OS a chance to do

Re: [HACKERS] Design notes for BufMgrLock rewrite

2005-02-17 Thread Zeugswetter Andreas DAZ SD
Would there be any value in incrementing by 2 for index accesses and 1 for seq-scans/vacuums? Actually, it should probably be a ratio based on random_page_cost shouldn't it? What happens with very small hot tables that are only a few pages and thus have no index defined. I think it would

Re: [HACKERS] enforcing a plan (in brief)

2005-02-15 Thread Zeugswetter Andreas DAZ SD
And the user maintenance of updating those hints for every release of PostgreSQL as we improve the database engine. I don't think so. Basically an optimizer hint simply raises or lowers the cost of an index, mandates a certain join order, allows or disallows a seq scan ... Imho it is not so

Re: [HACKERS] New form of index persistent reference

2005-02-11 Thread Zeugswetter Andreas DAZ SD
I asked the question how do you get a record without going through an index, the answer was CTID, which unfortunately changes when the row is updated. The ctid is a physical location of the row. On update a new tuple is written in a new location, that is why the ctid changes. The old tuple

Re: [HACKERS] ARC patent

2005-01-19 Thread Zeugswetter Andreas DAZ SD
There's a very recent paper at http://carmen.cs.uiuc.edu/~zchen9/paper/TPDS-final.ps on an alternative to ARC which claims superior performance ... From a quick glance, this doesn't look applicable. The authors are discussing buffer replacement strategies for a multi-level cache

Re: [HACKERS] ARC patent

2005-01-17 Thread Zeugswetter Andreas DAZ SD
FYI, IBM has applied for a patent on ARC (AFAICS the patent application is still pending, although the USPTO site is a little hard to grok):

Re: [HACKERS] Shared row locking

2004-12-21 Thread Zeugswetter Andreas DAZ SD
In general, I agree with Tom: I haven't seen many programs that use extended SELECT FOR UPDATE logic. However, the ones I have seen have been batch style programs written using a whole-table cursor - these latter ones have been designed for the cursor stability approach. I think if we add

Re: [HACKERS] RC2 and open issues

2004-12-21 Thread Zeugswetter Andreas DAZ SD
If we don't start where we left off, I am thinking if you do a lot of writes then do nothing, the next checkpoint would be huge because a lot of the LRU will be dirty because the bgwriter never got to it. I think the problem is, that we don't see wether a read hot page is also write hot. We

Re: [HACKERS] [Testperf-general] BufferSync and bgwriter

2004-12-16 Thread Zeugswetter Andreas DAZ SD
Hmmm, I've not seen this. For example, with people who are having trouble with checkpoint spikes on Linux, I've taken to recommending that they call sync() (via cron) every 5-10 seconds (thanks, Bruce, for suggestion!). Believe it or not, this does help smooth out the spikes and give

Re: [HACKERS] bgwriter changes

2004-12-16 Thread Zeugswetter Andreas DAZ SD
Only if you redefine the meaning of bgwriter_percent. At present it's defined by reference to the total number of dirty pages, and that can't be known without collecting them all. If it were, say, a percentage of the total length of the T1/T2 lists, then we'd have some chance of

Re: [HACKERS] bgwriter changes

2004-12-15 Thread Zeugswetter Andreas DAZ SD
and stops early when eighter maxpages is reached or bgwriter_percent pages are scanned ? Only if you redefine the meaning of bgwriter_percent. At present it's defined by reference to the total number of dirty pages, and that can't be known without collecting them all. If it

Re: [HACKERS] bgwriter changes

2004-12-15 Thread Zeugswetter Andreas DAZ SD
The two alternative algorithms are similar, but have these differences: The former (option (2)) finds a constant number of dirty pages, though has varying search time. This has the disadvantage of converging against 0 dirty pages. A system that has less than maxpages dirty will write every

Re: [HACKERS] bgwriter changes

2004-12-14 Thread Zeugswetter Andreas DAZ SD
(2) Remove bgwriter_percent. I have yet to hear anyone argue that there's an actual need for bgwriter_percent in tuning bgwriter behavior, One argument for it is to avoid writing very hot pages. (3) Change the meaning of bgwriter_percent, per Simon's proposal. Make it mean the percentage

Re: [HACKERS] V8 Beta 5 on AIX

2004-12-10 Thread Zeugswetter Andreas DAZ SD
However, one thing we can do is to try this in Makefile.aix: # AIX needs threads for everything that uses libpq LIBS += $(PTHREAD_LIBS) That is going to enable thread libs for all linking including the backend, but it might work. That is certainly wrong. The correct thing is to

Re: [HACKERS] V8 Beta 5 on AIX

2004-12-07 Thread Zeugswetter Andreas DAZ SD
OK, so does someone want to suggest why a library used to link libpq would also be needed to link binaries that use libpq? And with no cc_r it seems I have no idea how to get this working. AIX does not pull in any libraries that a shared lib depends upon, not even libc. You only specify

Re: [PATCHES] [HACKERS] Open Items

2004-12-02 Thread Zeugswetter Andreas DAZ SD
I am going to discard these emails. We haven't solve the Win32 terminal server problem and I think it needs to be moved to the TODO list instead. Yes, please do that. I do not think there is a problem on TS other than some missing permissions. The patch was only intended to avoid starting 2

Re: [HACKERS] Stopgap solution for table-size-estimate updatingproblem

2004-11-30 Thread Zeugswetter Andreas DAZ SD
I think I recall that lseek may have a negative effect on some OS's readahead calculations (probably only systems that cannot handle an lseek to the next page eighter) ? Do you think we should cache the last value to avoid the syscall ? We really can't, since the point of doing it is to

Re: [HACKERS] Stopgap solution for table-size-estimate updatingproblem

2004-11-29 Thread Zeugswetter Andreas DAZ SD
One possibility: vacuum already knows how many tuples it removed. We could set reltuples equal to, say, the mean of the number-of-tuples- after-vacuuming and the number-of-tuples-before. In a steady state situation this would represent a fairly reasonable choice. In cases where the table

Re: [HACKERS] Stopgap solution for table-size-estimate updatingproblem

2004-11-29 Thread Zeugswetter Andreas DAZ SD
This is not true in my case, since I only update statistics/analyze when the tables have representative content (i.e. not empty). I'm unsure why you feel you need a knob to defeat this. The only time when the plan would change from what you think of as the hand-tuned case is when the

Re: [HACKERS] Stopgap solution for table-size-estimate updating problem

2004-11-26 Thread Zeugswetter Andreas DAZ SD
rel-pages = RelationGetNumberOfBlocks(relation); Is RelationGetNumberOfBlocks cheap enough that you can easily use it for the optimizer ? I myself have always preferred more stable estimates that only change when told to. I never liked that vacuum (without analyze) and create index

Re: [HACKERS] Can postgresql accept mutliple connections in the same instance?

2004-11-19 Thread Zeugswetter Andreas DAZ SD
I am running of postgresql database servers with generally 30-50 users at a time per server. I have noticed one thing for web based databases that they fail to initialse a pg_connection connection every now and again and return no error message at all. I am thinking of the PG_SOMAXCONN

Re: [HACKERS] pg_resetxlog options

2004-11-18 Thread Zeugswetter Andreas DAZ SD
Some other time maybe. Meanwhile, this patch ought to make it compile more cleanly on Windows - not sure why I get errors there but not Linux. Because getopt() is normally declared in unistd.h, not getopt.h (Windows being an exception?). getopt is not in any standard Windows headers. The

Re: [HACKERS] psql \e broken again

2004-11-16 Thread Zeugswetter Andreas DAZ SD
So I thought I'd try a few scenarios, since I have it installed: [ none of which work ] So to answer your question, at least in part, there is no current good behavior to emulate. At least on this version of CVS: I think this is fairly graphic proof that (1) a straight port without doing

Re: [HACKERS] psql \e broken again

2004-11-16 Thread Zeugswetter Andreas DAZ SD
I like Kevin Brown's suggestion of writing out a temporary .txt file and 'executing' it. It will follow the principle of least suprise for Windows users. Note that the current default behavior (assuming you've not set EDITOR) is vi foo.txt which is quite unlikely to be helpful to a Windows

Re: [HACKERS] psql \e broken again

2004-11-15 Thread Zeugswetter Andreas DAZ SD
The EDITOR variable seems to have a fairly standard meaning on Unix systems. I've been using that EDITOR value for years without problems, only when I use psql's \e once in a while it breaks. I don't think we should deviate from what seems to be a standard practice. Agreed, no quotes

Re: [HACKERS] psql \e broken again

2004-11-15 Thread Zeugswetter Andreas DAZ SD
We could maybe compromise with (again, for Windows only) a policy like double-quote unless the value already contains at least one double-quote. This should work conveniently for everybody except the user who wants EDITOR='gvim.exe -y' on Windows; he'll have to add some useless-looking

Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)

2004-11-11 Thread Zeugswetter Andreas DAZ SD
How are you planning to represent the association between MIN/MAX and particular index orderings in the system catalogs? Don't we already have that info to decide whether an index handles an ORDER BY without a sort node ? Andreas ---(end of

Re: [PATCHES] [HACKERS] Open Items

2004-11-02 Thread Zeugswetter Andreas DAZ SD
It makes no difference on any of my systems, so at least it doesn't completely solve the problem. I haven't heard any confirmation on wether it partially solves it. It certainly does not solve any part of your problem. I think your problem is a permissions problem. It does however make

Re: [PATCHES] [HACKERS] Open Items

2004-11-02 Thread Zeugswetter Andreas DAZ SD
o fix shared memory on Win2k terminal server We might be able to just mark this as not supported. The shmem code works in a terminal server session with or without the patch. Magnus had a different problem, probably permissions. Since I do not have a non admin user (on a TS server)

Re: [PATCHES] [HACKERS] Open Items

2004-11-02 Thread Zeugswetter Andreas DAZ SD
Just one question about the actual implementation of the patch - why are you setting the OS version *before* you call GetVersionEx()? The Microsoft Example did a memset on the structure before calling void GetVersionEx(). Setting it to a version that needs the Global\ is only a safeguard

Re: [HACKERS] rmtree() failure on Windows

2004-10-27 Thread Zeugswetter Andreas DAZ SD
300 secs (!) fs timeout is really broken. Looks more like a locking or network timeout issue. What error codes does unlink(3) return? success. Oops! 5min timeout for success is certainly problematic. You misunderstood. The 300 secs is not in waiting for unlink() to

Re: [PATCHES] [HACKERS] Open Items

2004-10-19 Thread Zeugswetter Andreas DAZ SD
o fix shared memory on Win2k terminal server We might be able to just mark this as not supported. I have attached a patch that I think fixes this. The problem I saw and fixed is, that the shmem created in a terminal services client is not visible to the console (or

Re: [HACKERS] Open Items

2004-10-18 Thread Zeugswetter Andreas DAZ SD
o fix shared memory on Win2k terminal server We might be able to just mark this as not supported. I have attached a patch that I think fixes this. The problem I saw and fixed is, that the shmem created in a terminal services client is not visible to the console (or

Re: [HACKERS] plans for bitmap indexes?

2004-10-14 Thread Zeugswetter Andreas DAZ SD
create index people_male_gay_ix on people (city) where gender = 'male' and orientation = 'gay'; You've forgotten part of my premise (based on a real case I discussed on IRC) that there are EIGHTEEN criteria columns. That is why I said maybe :-) Whether it helps depends on the number of

Re: [HACKERS] plans for bitmap indexes?

2004-10-13 Thread Zeugswetter Andreas DAZ SD
The most nearly comparable thing is be the notion of partial indexes, where, supposing you had 60 region codes (e.g. - 50 US states, 10 Canadian provinces), you might set up indices thus: For example, imagine you have a table on a dating website with 18 columns representing 18 different

Re: [HACKERS] Security implications of config-file-location patch

2004-10-08 Thread Zeugswetter Andreas DAZ SD
If they are using tablespaces is it OK that anyone can see their location? Good point. Should we obscure pg_tablespace similarly to what we do for pg_shadow? Hmm, I can not see how a person with file access could not easily find the file for a specific table without pg_tablespace

Re: [HACKERS] Two-phase commit

2004-10-07 Thread Zeugswetter Andreas DAZ SD
Well, the question is how long must the individual databases retain state with which to answer recover requests. I don't like forever, so I'm proposing that there should be an explicit command to say you can forget about this gid. I think it would be ok to forget the gid after: 1.