Tom Lane wrote:
Well, yeah, because the first thing it does is pg_detoast_datum.
Just as a cross-check, try changing it to copy the value without
forcibly detoasting --- I'll bet it's still slow then.
Yeah, that appears to be exactly the case. After some grepping through
various header files
Obviously we run into problems when
a) we have a poor estimate for ndistinct - but then we have
worse problems
b) our length measure doesn't correspond well with ndistinct
in an interval
One more problem with low ndistinct values is that the condition might very well
hit no rows at all. But
Tom Lane wrote:
So you are saying it is de-toasted 32880 times, in this case? If not,
where are the repeated de-toastings happening?
Inside the index support functions. I'm thinking we could fix this by
forcibly detoasting values passed as index scan keys, but it's not quite
clear where's
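The cost difference being described can be sketched as a toy Python model (not PostgreSQL code; the counter and function names are invented for illustration, with the 32880 figure taken from the report above): detoasting inside the index support function happens once per tuple examined, while detoasting the scan key up front happens once per scan.

```python
calls = {"n": 0}

def pg_detoast_datum(value):
    # Stand-in for the real detoaster: just count invocations.
    calls["n"] += 1
    return value

scan_key = "toasted-scan-key"

# Naive: the index support function detoasts on every tuple it examines.
for _ in range(32880):
    pg_detoast_datum(scan_key)
assert calls["n"] == 32880

# Proposed fix: forcibly detoast the scan key once, before the scan.
calls["n"] = 0
detoasted_key = pg_detoast_datum(scan_key)
for _ in range(32880):
    pass  # comparisons reuse detoasted_key
assert calls["n"] == 1
```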
Greg Smith [EMAIL PROTECTED] writes:
On Mon, 9 Jun 2008, Tom Lane wrote:
It should also be pointed out that the whole thing becomes uninteresting
if we get real-time log shipping implemented. So I see absolutely no
point in spending time integrating pg_clearxlogtail now.
There are remote
Gregory Stark wrote:
Instead of zeroing bytes and depending on compression why not just pass an
extra parameter to the archive command with the offset to the logical end of
data.
Because the archiver process doesn't have that information.
--
Heikki Linnakangas
EnterpriseDB
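The zeroing trick under discussion (what pg_clearxlogtail does) can be sketched like this; the segment size is scaled down from the 16MB WAL segment, and zlib stands in for whatever compressor the archive_command pipes through:

```python
import os
import zlib

SEGMENT_SIZE = 1024 * 1024   # scaled down from the 16MB WAL segment
LOGICAL_END = 64 * 1024      # hypothetical logical end of real data

def clear_tail(segment: bytes, logical_end: int) -> bytes:
    # Zero the stale bytes past the logical end of data so that a generic
    # compressor collapses the tail almost to nothing.
    return segment[:logical_end] + b"\x00" * (len(segment) - logical_end)

segment = os.urandom(SEGMENT_SIZE)   # stale garbage fills the tail
cleared = clear_tail(segment, LOGICAL_END)

raw = len(zlib.compress(segment))
zeroed = len(zlib.compress(cleared))
assert zeroed < raw // 4   # the zeroed tail compresses dramatically better
```

The point raised above still stands: the archiver would need the logical-end offset to do this, which is exactly the information it does not have.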
Josh Berkus [EMAIL PROTECTED] writes:
Greg,
Speak to the statisticians. Our sample size is calculated using the same
theory behind polls which sample 600 people to learn what 250 million
people are going to do on election day. You do NOT need (significantly)
larger samples for larger
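The polling analogy can be made concrete with Cochran's sample-size formula (a standard statistics result, not anything PostgreSQL-specific; the 95%/4% parameters are chosen to reproduce the "600 people" figure):

```python
import math
from typing import Optional

def sample_size(z: float = 1.96, margin: float = 0.04, p: float = 0.5,
                population: Optional[int] = None) -> int:
    # Cochran's formula: n0 = z^2 * p * (1 - p) / margin^2.
    # The required sample barely depends on the population size.
    n0 = z * z * p * (1 - p) / (margin * margin)
    if population is None:
        return math.ceil(n0)
    # Finite-population correction.
    return math.ceil(n0 / (1 + (n0 - 1) / population))

assert sample_size() == 601                        # ~600 respondents
assert sample_size(population=250_000_000) == 601  # essentially unchanged
```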
Zeugswetter Andreas OSB sIT [EMAIL PROTECTED] writes:
I think for low ndistinct values we will want to know the exact
value + counts and not a bin. So I think we will want additional stats rows
that represent value 'a1' stats.
Isn't that what our most frequent values list does?
--
Gregory
I think for low ndistinct values we will want to know the exact
value + counts and not a bin. So I think we will want
additional stats rows
that represent value 'a1' stats.
Isn't that what our most frequent values list does?
Maybe? Do we have the relevant stats for each?
But the trick
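The "exact value + counts" idea above is essentially what MCV-based estimation gives; here is a hypothetical sketch (the function and its inputs are invented for illustration): values in the most-common-values list get their measured frequency, everything else shares the leftover mass evenly over the remaining distinct values.

```python
def selectivity(value, mcv_freqs, ndistinct):
    # MCV hit: we know this value's frequency exactly ("value 'a1' stats").
    if value in mcv_freqs:
        return mcv_freqs[value]
    # Otherwise: spread the leftover mass over the non-MCV distinct values.
    leftover = 1.0 - sum(mcv_freqs.values())
    return leftover / max(ndistinct - len(mcv_freqs), 1)

stats = {"a1": 0.50, "a2": 0.30}   # per-value frequencies, as proposed
assert selectivity("a1", stats, 5) == 0.50
assert abs(selectivity("zz", stats, 5) - 0.2 / 3) < 1e-9
```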
Hello!
Currently the TOASTing code does its magic when the whole tuple is
larger than TOAST_TUPLE_TARGET, which happens to be around 2KB.
There are times, though, when one is willing to trade using (fast) CPU to
reduce the amount of (slow) I/O. Data warehousing types of workloads,
most notably. Rarely
I would like to consider adding constraints to GiST indexes. I think it
is possible to add constraints that are more sophisticated than just
UNIQUE. My use case is a non-overlapping constraint, but I think it's
possible for this to be more general.
Sounds good
The idea is to make an array in
This can be solved by my proposal, but I just don't know how it would
apply to something like GIN, for instance. It could replace the unique
Hmm, your proposal isn't applicable to GIN, because GIN stores a lot of keys for
only one value to be indexed.
being inserted by other concurrent
Dawid Kuroczko wrote:
Hello!
Currently the TOASTing code does its magic when the whole tuple is
larger than TOAST_TUPLE_TARGET, which happens to be around 2KB.
There are times, though, when one is willing to trade using (fast) CPU to
reduce the amount of (slow) I/O. Data warehousing types of
One of our guys in Pakistan noticed a problem with Slony that seems to
have manifested itself since the last zic update. Slony uses
timeofday() as the default value for a timestamp column:
-- Executing query:
SET timezone='Asia/Karachi';
SELECT timeofday()::timestamp with time zone;
ERROR:
hello, everyone:
Version 8.3.0, in function plpgsql_exec_trigger:
in a trigger, if NEW is returned as the result and we do nothing to NEW.
for example, we have a table like this:
create table test (a int);
insert into test values(1);
and a
On 6/8/08, Andrew Dunstan [EMAIL PROTECTED] wrote:
Tom Lane wrote:
Andrew Dunstan [EMAIL PROTECTED] writes:
One complicating factor I see is that there is no protocol level support
for anything other than simple objects - each data value is simply a stream
of bytes of a known length. We
Merlin Moncure [EMAIL PROTECTED] writes:
On 6/8/08, Andrew Dunstan [EMAIL PROTECTED] wrote:
Tom Lane wrote:
Are you intending that these operations support both text and binary
results?
I'm a bit open on that.
IMO, support for binary is critical. Because of the interplay of the
array
Dawid Kuroczko [EMAIL PROTECTED] writes:
As we already have four types of ALTER COLUMN .. SET STORAGE
{ PLAIN | EXTERNAL | EXTENDED | MAIN }, I would like to add
COMPRESSED, which would force column compression (if the column is
smaller than some minimum, I guess somewhere between 16 and 32 bytes).
Dawid Kuroczko wrote:
Currently the TOASTing code does its magic when the whole tuple is
larger than TOAST_TUPLE_TARGET, which happens to be around 2KB.
There are times, though, when one is willing to trade using (fast) CPU to
reduce the amount of (slow) I/O. Data warehousing types of workloads
On 6/10/08, Tom Lane [EMAIL PROTECTED] wrote:
Merlin Moncure [EMAIL PROTECTED] writes:
IMO, support for binary is critical. Because of the interplay of the
array and composite out formats, the number of backslashes grows
exponentially (!) with nesting levels. This makes text format arrays
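The backslash growth can be demonstrated directly: each level of text-format quoting escapes every backslash (and double quote) with another backslash, so the backslash count doubles per nesting level. This is a minimal sketch of that quoting rule, not the server's actual output routine:

```python
def quote_once(s: str) -> str:
    # One level of text-format quoting: every backslash and double quote
    # gains an escaping backslash. (Backslashes first, to avoid
    # re-escaping the ones we just added.)
    return s.replace("\\", "\\\\").replace('"', '\\"')

s = "\\"   # a single literal backslash
counts = []
for _ in range(5):
    counts.append(s.count("\\"))
    s = quote_once(s)
assert counts == [1, 2, 4, 8, 16]   # doubling per level: exponential
```

Binary format sidesteps this entirely, since values are length-prefixed byte streams with no quoting at all.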
Greg,
The analogous case in our situation is not having 300 million distinct
values, since we're not gathering info on specific values, only the
buckets. We need, for example, 600 samples *for each bucket*. Each bucket
is chosen to have the same number of samples in it. So that means that we
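The equal-population buckets described above are an equi-depth histogram; a minimal sketch of how its boundaries are chosen (the function is illustrative, not ANALYZE's actual code):

```python
def equi_depth_bounds(sample, nbuckets):
    # Equi-depth histogram: boundaries chosen so every bucket covers the
    # same number of sampled values. This is why the sample requirement
    # is "600 per bucket" rather than 600 overall.
    s = sorted(sample)
    return [s[min(i * len(s) // nbuckets, len(s) - 1)]
            for i in range(nbuckets + 1)]

assert equi_depth_bounds(range(100), 4) == [0, 25, 50, 75, 99]
```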
Dave Page [EMAIL PROTECTED] writes:
It seems like a bug that we happily output PKST as a timezone (in a
'timestamp with time zone'), but won't accept it back in.
[ shrug... ] The set of timezone abbrevs recognized on input is
user-configurable, so that situation will always be possible.
It
On Tue, Jun 10, 2008 at 4:37 PM, Tom Lane [EMAIL PROTECTED] wrote:
Dave Page [EMAIL PROTECTED] writes:
It seems like a bug that we happily output PKST as a timezone (in a
'timestamp with time zone'), but won't accept it back in.
[ shrug... ] The set of timezone abbrevs recognized on input is
Dave Page [EMAIL PROTECTED] writes:
On Tue, Jun 10, 2008 at 4:37 PM, Tom Lane [EMAIL PROTECTED] wrote:
[ shrug... ] The set of timezone abbrevs recognized on input is
user-configurable, so that situation will always be possible.
Right, but shouldn't we always output something we know we can
One more problem with low ndistinct values is that the condition might
very well hit no rows at all. But Idea 1 will largely overestimate the
number of hits.
That's a good point, but I don't see a clear solution. Maybe we could
look at past queries and keep track of how often they return
Dave Page wrote:
On Tue, Jun 10, 2008 at 4:37 PM, Tom Lane [EMAIL PROTECTED] wrote:
Dave Page [EMAIL PROTECTED] writes:
It seems like a bug that we happily output PKST as a timezone (in a
'timestamp with time zone'), but won't accept it back in.
[ shrug... ] The set of timezone abbrevs
On Tue, Jun 10, 2008 at 4:51 PM, Alvaro Herrera
[EMAIL PROTECTED] wrote:
Dave Page wrote:
Right, but shouldn't we always output something we know we can read
back in (unambiguously), assuming a server with no user defined
abbreviations?
That makes no sense because it amounts to saying that
All,
For the slave to not interfere with the master at all, we would need to
delay application of WAL files on each slave until visibility on that
slave allows the WAL to be applied, but in that case we would have
long-running transactions delay data visibility of all slave sessions.
Alvaro Herrera [EMAIL PROTECTED] writes:
... A more useful restriction would be to only output
those that are in the set of input-acceptable abbreviations, but perhaps
this is not easy to implement.
I think that is actually what Dave is suggesting, but I don't really
agree with it. To me it's
Dave Page [EMAIL PROTECTED] writes:
Or just output offsets in every case :-p
Which is what the default ISO datestyle does ... I believe pg_dump
is careful to force ISO style for exactly this reason.
regards, tom lane
--
Sent via pgsql-hackers mailing list
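The reason numeric offsets are safe can be shown with an analogous round-trip in Python (illustrative only; PostgreSQL's ISO datestyle, not Python, is what pg_dump relies on): an offset needs no abbreviation table, whereas "PKST" is only meaningful relative to some user-configurable lookup.

```python
from datetime import datetime

# A numeric UTC offset (ISO style) round-trips with no abbreviation
# table at all:
ts = "2008-06-10 16:37:00+05:00"
dt = datetime.fromisoformat(ts)
assert dt.isoformat(sep=" ") == ts
assert dt.utcoffset().total_seconds() == 5 * 3600

# An abbreviation like "PKST" has no parser-independent meaning, so
# there is no guarantee it can be read back in.
```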
Ron,
I wonder if the fastest way to generate the configurator
would be to simply ask everyone to post their tuned
postgresql.conf files along with a brief description of
the use case for that file. Then we could group the
use-cases into various classes; and average the values
of the
Robert,
shared_buffers
effective_cache_size
default_statistics_target
work_mem
maintenance_work_mem
listen_addresses
max_connections
the fsm parameters
checkpoint_segments
random_page_cost
My list is very similar, except that I drop random_page_cost and add
synchronous_commit, autovacuum
On Tue, 2008-06-10 at 16:59 +0400, Teodor Sigaev wrote:
This can be solved by my proposal, but I just don't know how it would
apply to something like GIN, for instance. It could replace the unique
Hmm, your proposal isn't applicable to GIN, because GIN stores a lot of
keys for only one
On Tuesday 10 June 2008 09:37, Josh Berkus wrote:
Robert,
shared_buffers
effective_cache_size
default_statistics_target
work_mem
maintenance_work_mem
listen_addresses
max_connections
the fsm parameters
checkpoint_segments
random_page_cost
My list is very similar, except that I
Marc G. Fournier [EMAIL PROTECTED] writes:
Tom Lane wrote:
What I was thinking was just to have the script
print out something like
Tagged tree as 8.3.4
Don't forget to run autoconf 2.59 before committing
I like that one ...
I've checked in a script to do this --- executing
Josh Berkus [EMAIL PROTECTED] writes:
Oh, and wal_buffers, the default for which we should just change if it
weren't for SHMMAX.
Uh, why? On a workload of mostly small transactions, what value is
there in lots of wal_buffers?
regards, tom lane
On Tue, 2008-06-10 at 08:51 -0700, Nathan Boley wrote:
One more problem with low ndistinct values is that the condition might
very well hit no rows at all. But Idea 1 will largely overestimate the
number of hits.
That's a good point, but I don't see a clear solution. Maybe we could
Gregory Stark wrote:
Joshua D. Drake [EMAIL PROTECTED] writes:
...default_statistics_target?...Uhh 10.
Ah, but we only ever hear about the cases where it's wrong of course. In other
words even if we raised it to some optimal value we would still have precisely
the same experience of seeing
One more problem with low ndistinct values is that the condition might
very well hit no rows at all. But Idea 1 will largely overestimate the
number of hits.
That's a good point, but I don't see a clear solution. Maybe we could
I think that MCVs are the solution, right?
Only if
Nathan Boley [EMAIL PROTECTED] writes:
If we query on values that aren't in the table, the planner will
always overestimate the expected number of returned rows because it (
implicitly ) assumes that every query will return at least 1 record.
That's intentional and should not be changed. I
If we query on values that aren't in the table, the planner will
always overestimate the expected number of returned rows because it (
implicitly ) assumes that every query will return at least 1 record.
That's intentional and should not be changed.
Why? What if ( somehow ) we knew that
Nathan Boley [EMAIL PROTECTED] writes:
If we query on values that aren't in the table, the planner will
always overestimate the expected number of returned rows because it (
implicitly ) assumes that every query will return at least 1 record.
That's intentional and should not be changed.
On Monday, 9 June 2008, Tom Lane wrote:
So while tagging the upcoming releases, I got annoyed once again about
what a tedious, error-prone bit of donkeywork it is.
Could you explain what the problem is? Your script sounds like an ad hoc
workaround for some problem, but I haven't seen the
Peter Eisentraut [EMAIL PROTECTED] writes:
On Monday, 9 June 2008, Tom Lane wrote:
So while tagging the upcoming releases, I got annoyed once again about
what a tedious, error-prone bit of donkeywork it is.
Could you explain what the problem is? Your script sounds like an ad hoc
In theory, any indexed value (for GiST, after compression) should at
least fit into a page.
So are you saying we should dedicate one page multiplied by
max_connections in shared memory? It's possible to do it that way, but
Yes, we could. Storing index keys in shared memory allows
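A hypothetical sketch of the shared-memory idea for a non-overlap constraint (plain Python dicts and lists stand in for shared memory and the index; all names here are invented): an inserter publishes its key, then checks for conflicts against both committed entries and other backends' in-flight keys.

```python
pending = {}    # backend_id -> interval currently being inserted
committed = []  # intervals already in the index

def overlaps(a, b):
    # Half-open intervals [lo, hi): touching endpoints do not conflict.
    return a[0] < b[1] and b[0] < a[1]

def try_insert(backend_id, interval):
    pending[backend_id] = interval   # publish before checking
    conflict = any(
        other != backend_id and overlaps(iv, interval)
        for other, iv in pending.items()
    ) or any(overlaps(iv, interval) for iv in committed)
    if conflict:
        del pending[backend_id]
        return False
    committed.append(interval)
    del pending[backend_id]
    return True

assert try_insert(1, (0, 10))
assert not try_insert(2, (5, 15))   # overlaps the committed (0, 10)
assert try_insert(2, (10, 20))      # touching endpoint is fine
```

Publishing before checking is what closes the race against keys being inserted by other concurrent backends that have not yet committed.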
Why? What if ( somehow ) we knew that there was a 90% chance that
query would return an empty result set on a big table with 20 non-mcv
distinct values. Currently the planner would always choose a seq scan,
where an index scan might be better.
(1) On what grounds do you assert the above?
Nathan Boley [EMAIL PROTECTED] writes:
(1) On what grounds do you assert the above?
For a table with 100 non-mcv rows, the planner estimates a result
set of cardinality 100/20 = 5, not 1.
The real problem in that situation is that you need another twenty slots
in the MCV list.
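The arithmetic in this exchange can be written out as a toy version of the equality-selectivity rule (illustrative, not eqsel's actual code): rows not covered by the MCV list are spread evenly over the remaining distinct values, and the estimate is clamped to at least 1 row.

```python
def eqsel_estimate(ntuples, ndistinct, mcv_freqs, value):
    # MCV hit: frequency is known directly.
    if value in mcv_freqs:
        return max(1, round(ntuples * mcv_freqs[value]))
    # Non-MCV: leftover rows spread over the remaining distinct values,
    # clamped to >= 1 so joins don't see zero-row inputs.
    leftover = ntuples * (1.0 - sum(mcv_freqs.values()))
    return max(1, round(leftover / max(ndistinct - len(mcv_freqs), 1)))

# 100 non-MCV rows over 20 non-MCV distinct values -> 5, not 1:
assert eqsel_estimate(100, 20, {}, "absent-value") == 5
# ...and a value that matches nothing still gets the 1-row clamp:
assert eqsel_estimate(100, 200, {}, "absent-value") == 1
```

As Tom notes, giving those twenty values their own MCV slots would replace the 100/20 guess with exact per-value counts.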
Tom Lane [EMAIL PROTECTED] writes:
(In fact, I don't think the plan would change, in this case. The reason
for the clamp to 1 row is to avoid foolish results for join situations.)
The screw case I've seen is when you have a large partitioned table where
constraint_exclusion fails to exclude
Gregory Stark [EMAIL PROTECTED] writes:
The screw case I've seen is when you have a large partitioned table where
constraint_exclusion fails to exclude the irrelevant partitions. You're going
to get 0 rows from all but the one partition which contains the 1 row you're
looking for. But since
Tom Lane [EMAIL PROTECTED] writes:
Gregory Stark [EMAIL PROTECTED] writes:
The screw case I've seen is when you have a large partitioned table where
constraint_exclusion fails to exclude the irrelevant partitions. You're going
to get 0 rows from all but the one partition which contains the 1
Gregory Stark [EMAIL PROTECTED] writes:
It's possible that the second option I described -- teaching Append when to
use something other than sum() -- would only work in the cases where
constraint exclusion could be fixed though. In which case having fractional
row counts might actually be
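The partitioning screw case can be sketched numerically (a toy model of the behavior described, not the planner's code): each child path is clamped to at least 1 row before Append sums them, so 100 partitions that pruning failed to exclude still "return" 100 rows.

```python
def append_rows(child_estimates, clamp=True):
    # Today: each child is clamped to >= 1 row, then Append sums.
    # The unclamped variant models the fractional-row-count idea.
    return sum(max(1, int(e)) if clamp else int(e)
               for e in child_estimates)

# 100 partitions, only one of which can really contain the row:
children = [0] * 99 + [1]
assert append_rows(children) == 100              # what gets summed today
assert append_rows(children, clamp=False) == 1   # the desired answer
```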
On Tue, Jun 10, 2008 at 5:25 PM, Tom Lane [EMAIL PROTECTED] wrote:
Dawid Kuroczko [EMAIL PROTECTED] writes:
As we already have four types of ALTER COLUMN .. SET STORAGE
{ PLAIN | EXTERNAL | EXTENDED | MAIN } I would like to add
COMPRESSED which would force column compression (if column is
Josh Berkus [EMAIL PROTECTED] wrote:
I still see having 2 different settings:
Synchronous: XID visibility is pushed to the master. Maintains synchronous
failover, and users are expected to run *1* master to *1* slave for most
installations.
Asynchronous: replication stops on the
Tom Lane wrote:
This gets back to the discussions at PGCon about needing to have a more
explicit representation of partitioning. Right now, for a
many-partition table we spend huge amounts of time deriving the expected
behavior from first principles, each time we make a plan. And even then
Hi pghackers:
Version 8.3.0, in function plpgsql_exec_trigger:
in a trigger, if NEW is returned as the result and we do nothing to NEW.
for example, we have a table like this:
create table test (a int);
insert into test values(1);
and a trigger like:
billy [EMAIL PROTECTED] writes:
I think we can add some judgment conditions in function
plpgsql_exec_trigger() to avoid this problem.
I don't especially see the point of adding extra complexity here.
AFAICS you are talking about avoiding one or two palloc/pfree
cycles, which is surely down
Tom Lane,
er, your explanation is reasonable.
But at least the comment
if (newtuple != tuple)   /* modified by Trigger(s) */
{
    ...
is likely to misdirect us.
It took me a few hours to figure it out. :-(