This has been proposed before, and always rejected. While you're
always welcome to provide a patch, I'm very doubtful it would be
accepted into the main product.
The example given in this thread certainly isn't going to change
anybody's mind.
Hi, I propose reducing everybody's
Hmm. pqStrerror is defined in libpgport (which is linked into the
backend) as well as libpq. ISTM that libpq should not be linked with
-Wl,-bI:../../../src/backend/postgres.imp, since it's not intended to
be loaded into the backend. Without having looked at the code, I'm
Now I've asked for the quickest path to a detailed
understanding of the pg IO subsystem. The goal is to get
more up to speed on its coding details, certainly not to
annoy you or anyone else.
Basically pg does random 8k (compile-time block size) reads/writes only.
Bitmap and sequential
Is it reasonable to cancel and restart the vacuum process
periodically
(say every 12 hours) until it manages to complete the work? It takes
about 2 hours to do the table scan, and should get in about 10 hours
of index work each round.
If we started the vacuum with the indexes,
Another possibility is optimizing for the special case of
indexing on a partitioning key. In this case, index values
would be very localized to one table, so just storing the
table info on each index page (or something similar) would work well.
If you have the partitioning key in the
In my original example, a sequential scan of the 1TB of 2KB
or 4KB records, = 250M or 500M records of data, being sorted
on a binary value key will take ~1000x more time than reading
in the ~1GB Btree I described that used a Key+RID (plus node
pointers) representation of the data.
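As a rough arithmetic check of the ~1000x claim above (a sketch; the equal-throughput assumption is mine):

```python
# Rough check of the ~1000x figure from the example above.
heap_bytes = 1 * 2**40    # 1 TB of 2KB-4KB records to be sorted
index_bytes = 1 * 2**30   # ~1 GB Key+RID (plus node pointer) btree image

# Assuming both are streamed at the same sequential rate, the I/O volume
# alone differs by a factor of ~1000; an external sort of the heap data
# additionally re-reads and re-writes it across merge passes.
io_ratio = heap_bytes / index_bytes
print(io_ratio)  # 1024.0
```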
Imho
I think the main problem with switching to Visual Studio
project files is maintainability. (It's not easy to get all
I think the target should be a way to auto create those files with gmake
(maybe with mingw for configure).
The format of VS6 project and workspace files is pretty simple.
It
My wild guess is that deleting all index pointers for a removed
index is more-or-less the same cost as creating new ones for an
inserted/updated page.
Only if you are willing to make the removal process
recalculate the index keys from looking at the deleted tuple.
The bgwriter could
The bgwriter could update all columns of dead heap tuples in heap
pages to NULL and thus also gain free space without the need to
touch
the indexes.
The slot would stay used but it would need less space.
Not unless it's running a transaction (consider TOAST updates).
Ok, you could
I ran a wal_buffer test series. It appears that increasing the
wal_buffers is indeed very important for OLTP applications,
potentially resulting in as much as a 15% average increase in
transaction processing.
What's interesting is that this is not just true for 8.1, it's true
for
The point here is that fsync-off is only realistic for development
or
playpen installations. You don't turn it off in a production
machine, and I can't see that you'd turn off the full-page-write
option either. So we have not solved anyone's performance problem.
Yes, this is basically
Are you sure about that? That would probably be the normal case, but
are you promised that the hardware will write all of the sectors of a
block in order?
I don't think you can possibly assume that. If the block
crosses a cylinder boundary then it's certainly an unsafe
assumption,
Here's an idea:
We read the page that we would have backed up, calc the CRC and
write a short WAL record with just the CRC, not the block. When
we recover we re-read the database page, calc its CRC and
compare it with the CRC from the transaction log. If they
differ, we know that the
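A minimal sketch of that scheme, with zlib.crc32 standing in for whatever CRC the WAL code would actually use (function names are mine):

```python
import zlib

BLCKSZ = 8192  # PostgreSQL's compile-time block size

def wal_crc_record(page: bytes) -> int:
    # Instead of backing up the whole page, log only its CRC
    # (a short WAL record).
    assert len(page) == BLCKSZ
    return zlib.crc32(page)

def page_is_torn(page: bytes, logged_crc: int) -> bool:
    # At recovery: re-read the database page and compare against the
    # logged CRC. A mismatch indicates a partial (torn) write; note this
    # detects the damage but, unlike a full-page image, cannot repair it.
    return zlib.crc32(page) != logged_crc
```

That trade-off is the crux: the short record saves WAL volume, but on a mismatch there is no backup block to restore from.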
The only workable solution would imho be to write the LSN to each 512-byte
block (not that I am advocating that idea).
"Only workable" was a stupid formulation; I meant a solution that works
with an LSN.
We're not doing anything like that, as it would create an
impossible space-management
Escape processing would proceed as before, but the semantics would change to
allow the use of different characters as the escape character, in addition
to the special characters for delimiter and newline.
If you mean syntax to specify escape and delimiter (and newline ?),
that is a great
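A hypothetical sketch of such escape processing (not psql's actual COPY parser; the function name and defaults are mine):

```python
def split_copy_line(line: str, delim: str = "\t", esc: str = "\\") -> list:
    # Split one COPY-style data line into fields, honoring a configurable
    # escape character: an escaped delimiter (or escaped escape) is taken
    # literally instead of acting as a separator.
    fields, cur, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == esc and i + 1 < len(line):
            cur.append(line[i + 1])  # take the next character literally
            i += 2
        elif ch == delim:
            fields.append("".join(cur))
            cur = []
            i += 1
        else:
            cur.append(ch)
            i += 1
    fields.append("".join(cur))
    return fields
```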
The problem I see with this proposal is that the buffer manager knows
how to handle only equally-sized pages. And the shared memory stuff
gets sized according to size * num_pages. So what happens if a certain
tablespace A with pagesize=X gets to have a lot of its pages cached,
evicting
I have not researched any deeper, but currently it fails with
[EMAIL PROTECTED] postgresql-8.0.3]$
CC=/opt/ibmcmp/vac/7.0/bin/xlc ./configure
..A..
checking for int timezone... yes
checking types of arguments for accept()... configure: error:
could not determine argument types
The odds
Incrementing random_page_cost from 4 (the default) to 5 causes the
planner to make a better decision.
We have such a low default random_page_cost primarily to mask other
problems in the optimizer, two of which are
. multi-column index correlation
. interpolation between min_IO_Cost
But to get the estimated cost ratio to match up with the actual cost
ratio, we'd have to raise random_page_cost to nearly 70, which is a bit
hard to credit. What was the platform being tested here?
Why? Numbers for modern single disks are 1-2 MB/s for 8k random I/O and
50-120 MB/s sequential.
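Working those disk figures through (mid-range values; a back-of-envelope sketch):

```python
BLCKSZ = 8192
seq_mb_s = 80.0   # mid-range of the 50-120 MB/s sequential figure
rand_mb_s = 1.5   # mid-range of the 1-2 MB/s 8k random figure

seq_page_ms = BLCKSZ / (seq_mb_s * 1e6) * 1000    # ~0.1 ms per page
rand_page_ms = BLCKSZ / (rand_mb_s * 1e6) * 1000  # ~5.5 ms per page

# A physical random:sequential cost ratio around 53 -- the same order of
# magnitude as the ~70 observed above, and far from the default of 4.
ratio = rand_page_ms / seq_page_ms
print(round(ratio))  # 53
```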
If we did not define
it that way, I think your example would have to error out --- how
would you choose which INSTEAD rule wins?
The documentation says that they evaluate in alphabetical order by
name. So I would expect that the first one to have its WHERE clause
evaluate to true
It would keep the old table around while building the new, then grab
an exclusive lock to swap the two.
Lock upgrading is right out.
It would need a whole of new family of intent locks, with different rules.
Andreas
What that means is that neither the HAVING clause nor the targetlist
can use any ungrouped columns except within aggregate calls; that is,
select col from tab having 2 < 1
Informix:
select tabname from systables having 2 < 1;
294: The column (tabname) must be in the GROUP BY list.
Are you verifying that all the data that was committed was actually stored?
Or just verifying that the database works properly after rebooting?
I verified the data.
Does pg startup increase the xid by some amount (say 1000 xids) after a crash?
Else I think you would also need to roll back a
One point that I no longer recall the reasoning behind is that xlog.c
doesn't think O_SYNC is a preferable default over fsync.
For larger (8k) transactions O_SYNC|O_DIRECT is only good with the recent
pending patch to group WAL writes together. The fsync method gives the OS a
chance
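The two sync methods being contrasted can be sketched as follows (Python stand-ins; names are mine, and the getattr fallback covers platforms where O_SYNC is absent, e.g. Windows):

```python
import os

def wal_write_fsync(path, records):
    # fsync method: plain writes let the OS absorb several pending WAL
    # writes, then one fsync forces them all out together.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for rec in records:
            os.write(fd, rec)
        os.fsync(fd)
    finally:
        os.close(fd)

def wal_write_osync(path, records):
    # O_SYNC method: every single write blocks until it reaches disk,
    # so there is no after-the-fact grouping of writes.
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o600)
    try:
        for rec in records:
            os.write(fd, rec)
    finally:
        os.close(fd)
```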
Would there be any value in incrementing by 2 for index accesses and 1
for seq-scans/vacuums? Actually, it should probably be a ratio based on
random_page_cost shouldn't it?
What happens with very small hot tables that are only a few pages and thus
have no index defined?
I think it would
And the user maintenance of updating those hints for every release of
PostgreSQL as we improve the database engine.
I don't think so. Basically an optimizer hint simply raises or lowers the cost
of an index, mandates a certain join order, allows or disallows a seq scan ...
Imho it is not so
I asked how you get a record without going through an
index; the answer was CTID, which unfortunately changes when the row is
updated.
The ctid is the physical location of the row. On update, a new tuple is written
in a new location; that is why the ctid changes. The old tuple
There's a very recent paper at
http://carmen.cs.uiuc.edu/~zchen9/paper/TPDS-final.ps on an alternative
to ARC which claims superior performance ...
From a quick glance, this doesn't look applicable. The authors are
discussing buffer replacement strategies for a multi-level cache
FYI, IBM has applied for a patent on ARC (AFAICS the patent application
is still pending, although the USPTO site is a little hard to grok):
In general, I agree with Tom: I haven't seen many programs that use
extended SELECT FOR UPDATE logic. However, the ones I have seen have
been batch style programs written using a whole-table cursor - these
latter ones have been designed for the cursor stability approach.
I think if we add
If we don't start where we left off, I am thinking that if you do a lot of
writes and then do nothing, the next checkpoint would be huge, because a lot
of the LRU will be dirty and the bgwriter never got to it.
I think the problem is that we don't see whether a read-hot
page is also write-hot. We
Hmmm, I've not seen this. For example, with people who are having trouble
with checkpoint spikes on Linux, I've taken to recommending that they call
sync() (via cron) every 5-10 seconds (thanks, Bruce, for suggestion!).
Believe it or not, this does help smooth out the spikes and give
Only if you redefine the meaning of bgwriter_percent. At present it's
defined by reference to the total number of dirty pages, and that can't
be known without collecting them all.
If it were, say, a percentage of the total length of the T1/T2 lists,
then we'd have some chance of
and stops early when either maxpages is reached or bgwriter_percent
pages are scanned?
The two alternative algorithms are similar, but have these
differences:
The former (option (2)) finds a constant number of dirty pages, but has
a varying search time.
This has the disadvantage of converging toward 0 dirty pages.
A system that has fewer than maxpages dirty will write every
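The two scan policies being compared can be sketched as one function (a simplification; names and structure are mine):

```python
def bgwriter_scan(dirty, maxpages, percent=None):
    # percent=None models option (2): keep scanning until maxpages dirty
    # buffers are found (fixed amount written, variable search time).
    # percent=N models the alternative: scan at most N% of the buffer
    # list, stopping early once maxpages dirty buffers are found.
    limit = len(dirty) if percent is None else len(dirty) * percent // 100
    to_write = []
    for i in range(limit):
        if dirty[i]:
            to_write.append(i)
            if len(to_write) >= maxpages:
                break
    return to_write
```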
(2) Remove bgwriter_percent. I have yet to hear anyone argue that
there's an actual need for bgwriter_percent in tuning
bgwriter behavior,
One argument for it is to avoid writing very hot pages.
(3) Change the meaning of bgwriter_percent, per Simon's proposal. Make
it mean the percentage
However, one thing we can do is to try this in Makefile.aix:
# AIX needs threads for everything that uses libpq
LIBS += $(PTHREAD_LIBS)
That is going to enable thread libs for all linking including the
backend, but it might work.
That is certainly wrong. The correct thing is to
OK, so does someone want to suggest why a library used to link libpq
would also be needed to link binaries that use libpq? And with no cc_r
it seems I have no idea how to get this working.
AIX does not pull in any libraries that a shared lib depends upon, not even
libc.
You only specify
I am going to discard these emails. We haven't solved the Win32 terminal
server problem and I think it needs to be moved to the TODO list instead.
Yes, please do that. I do not think there is a problem on TS other than some
missing permissions. The patch was only intended to avoid starting 2
I think I recall that lseek may have a negative effect on some OSes'
readahead calculations (probably only systems that cannot handle an
lseek to the next page either)? Do you think we should cache the
last value to avoid the syscall?
We really can't, since the point of doing it is to
One possibility: vacuum already knows how many tuples it removed. We
could set reltuples equal to, say, the mean of the number-of-tuples-
after-vacuuming and the number-of-tuples-before. In a steady state
situation this would represent a fairly reasonable choice. In cases
where the table
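The averaging rule described above in one line (a sketch; the function name is mine):

```python
def new_reltuples(tuples_before, tuples_after_vacuum):
    # Mean of the number-of-tuples before and after vacuuming: in a
    # steady state where the table refills between vacuums, this is a
    # reasonable mid-cycle estimate for the planner.
    return (tuples_before + tuples_after_vacuum) / 2.0
```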
This is not true in my case, since I only update statistics/analyze
when the tables have representative content (i.e. not empty).
I'm unsure why you feel you need a knob to defeat this. The only time
when the plan would change from what you think of as the hand-tuned
case is when the
rel-pages = RelationGetNumberOfBlocks(relation);
Is RelationGetNumberOfBlocks cheap enough that you can easily use it for
the optimizer?
I myself have always preferred more stable estimates that only change
when told to. I never liked that vacuum (without analyze) and create index
I am running postgresql database servers with generally 30-50 users
at a time per server. I have noticed one thing for web based databases:
they fail to initialise a pg_connection every now and
again and return no error message at all.
I am thinking of the PG_SOMAXCONN
Some other time maybe. Meanwhile, this patch ought to make it compile
more cleanly on Windows - not sure why I get errors there but not on
Linux.
Because getopt() is normally declared in unistd.h, not getopt.h (Windows
being an exception?).
getopt is not in any standard Windows headers. The
So I thought I'd try a few scenarios, since I have it installed:
[ none of which work ]
So to answer your question, at least in part, there is no current good
behavior to emulate. At least on this version of CVS:
I think this is fairly graphic proof that (1) a straight port without
doing
I like Kevin Brown's suggestion of writing out a temporary .txt file and
'executing' it. It will follow the principle of least surprise for Windows
users.
Note that the current default behavior (assuming you've not set EDITOR)
is vi foo.txt which is quite unlikely to be helpful to a Windows
The EDITOR variable seems to have a fairly standard meaning on Unix
systems. I've been using that EDITOR value for years without problems,
only when I use psql's \e once in a while it breaks. I don't think we
should deviate from what seems to be a standard practice.
Agreed, no quotes
We could maybe compromise with (again, for Windows only) a policy like
double-quote unless the value already contains at least one
double-quote. This should work conveniently for everybody except the
user who wants EDITOR='gvim.exe -y' on Windows; he'll have to add
some useless-looking
How are you planning to represent the association between MIN/MAX and
particular index orderings in the system catalogs?
Don't we already have that info to decide whether an index handles
an ORDER BY without a sort node ?
Andreas
It makes no difference on any of my systems, so at least it doesn't
completely solve the problem. I haven't heard any confirmation on whether
it partially solves it.
It certainly does not solve any part of your problem. I think your problem
is a permissions problem.
It does however make
o fix shared memory on Win2k terminal server
We might be able to just mark this as not supported.
The shmem code works in a terminal server session with or without the patch.
Magnus had a different problem, probably permissions. Since I do not have a
non admin user (on a TS server)
Just one question about the actual implementation of the patch - why are
you setting the OS version *before* you call GetVersionEx()?
The Microsoft example did a memset on the structure before calling GetVersionEx().
Setting it to a version that needs the Global\ prefix is only a safeguard
300 secs (!) fs timeout is really broken.
Looks more like a locking or network timeout issue.
What error codes does unlink(3) return?
success.
Oops! 5min timeout for success is certainly problematic.
You misunderstood. The 300 secs is not in waiting for unlink() to
o fix shared memory on Win2k terminal server
We might be able to just mark this as not supported.
I have attached a patch that I think fixes this. The problem I saw
and fixed is that the shmem created in a terminal services client is not
visible to the console (or
create index people_male_gay_ix on people (city) where gender = 'male' and
orientation = 'gay';
You've forgotten part of my premise (based on a real case I discussed on IRC)
that there are EIGHTEEN criteria columns.
That is why I said maybe :-) Whether it helps depends on the number of
The most nearly comparable thing is the notion of partial
indexes, where, supposing you had 60 region codes (e.g. - 50 US
states, 10 Canadian provinces), you might set up indices thus:
For example, imagine you have a table on a dating website with 18 columns
representing 18 different
If they are using tablespaces is it OK that anyone can see their
location?
Good point. Should we obscure pg_tablespace similarly to
what we do for pg_shadow?
Hmm, I cannot see how a person with file access could not easily find the
file for a specific table even without pg_tablespace
Well, the question is how long must the individual databases retain
state with which to answer recover requests. I don't like forever,
so I'm proposing that there should be an explicit command to say you
can forget about this gid.
I think it would be ok to forget the gid after:
1.