On Tue, 2005-05-10 at 11:03 +0100, Alex Stapleton wrote:
So, when/is PG meant to be getting a decent partitioning system?
ISTM that your question confuses where code comes from. Without
meaning to pick on you, or reply rudely, I'd like to explore that
question. Perhaps it should be a
For an interesting look at scalability, clustering, caching, etc for a
large site have a look at how livejournal did it.
http://www.danga.com/words/2004_lisa/lisa04.pdf
They have 2.6 Million active users, posting 200 new blog entries per
minute, plus many comments and countless page views.
On 11 May 2005, at 08:57, David Roussel wrote:
For an interesting look at scalability, clustering, caching, etc for a
large site have a look at how livejournal did it.
http://www.danga.com/words/2004_lisa/lisa04.pdf
I have implemented similar systems in the past; it's a pretty good technique,
On 11 May 2005, at 09:50, Alex Stapleton wrote:
On 11 May 2005, at 08:57, David Roussel wrote:
For an interesting look at scalability, clustering, caching, etc for a large site have a look at how livejournal did it.
http://www.danga.com/words/2004_lisa/lisa04.pdf
I have implemented similar
Acceptable Answers to 'So, when/is PG meant to be getting a decent
partitioning system?':
1. Person X is working on it, I believe.
2. It's on the list, but nobody has done anything about it yet.
3. You're welcome to take a stab at it; I expect the community would support your efforts.
Hi Dan,
I tried to understand your query, but I couldn't get my understanding of the query and your description in sync.
Why do you use subselects? Wouldn't a simple recordtext LIKE '%RED%' do the trick too?
You combine all your WHERE conditions with AND. To me this looks like you get only
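The suggestion above could be sketched like this (the original query isn't shown in the mail, so the table name "events" and column "id" are hypothetical; only recordtext appears in the original):

```sql
-- Hypothetical sketch: "events" and "id" are assumed names.

-- Instead of filtering through a subselect:
SELECT id
FROM events
WHERE id IN (SELECT id FROM events WHERE recordtext LIKE '%RED%');

-- a single LIKE predicate may do the trick:
SELECT id
FROM events
WHERE recordtext LIKE '%RED%';
```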
Mischa Sandberg [EMAIL PROTECTED] writes:
So, simplicity dictates something like:
table pg_remote(schemaname text, connectby text, remoteschema text)
Previous discussion of this sort of thing concluded that we wanted to
follow the SQL-MED standard.
regards, tom lane
On Wed, 2005-05-11 at 12:53 +0800, Christopher Kings-Lynne wrote:
Another trick you can use with large data sets like this when you want results back in seconds is to have regularly updated tables that aggregate the data along each column normally aggregated against the main data set.
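A minimal sketch of that technique (table and column names are invented for illustration; on a 7.x-era server there are no materialized views, so the summary is refreshed by hand or from cron):

```sql
-- Summary table aggregating along one commonly-grouped column.
CREATE TABLE sales_summary_by_region AS
SELECT region, count(*) AS n_sales, sum(amount) AS total_amount
FROM sales
GROUP BY region;

-- Periodic refresh (e.g. nightly from cron):
BEGIN;
DELETE FROM sales_summary_by_region;
INSERT INTO sales_summary_by_region
SELECT region, count(*), sum(amount)
FROM sales
GROUP BY region;
COMMIT;

-- Queries that only need the aggregate hit the small table:
SELECT total_amount FROM sales_summary_by_region WHERE region = 'EU';
```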
Edin Kadribasic wrote:
Hi,
I have a query that is giving the optimizer (and me) a great headache. When it's in a good mood the optimizer chooses Hash Left Join and the query executes in 13ms or so, but sometimes (more and more often) it chooses Nested Loop Left Join and the execution time goes up
Edin Kadribasic [EMAIL PROTECTED] writes:
I have a query that is giving the optimizer (and me) great headache.
The main problem seems to be that the rowcount estimates for
axp_temp_order_match and axp_dayschedule are way off:
- Index Scan using axp_temp_order_match_idx1 on
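When rowcount estimates are that far off, one standard remedy (not spelled out in the mail itself) is to raise the per-column statistics target and re-ANALYZE; the column name below is a guess, standing in for whichever column the planner misestimates:

```sql
-- "match_date" is a hypothetical column; substitute the one whose
-- estimates are wrong.
ALTER TABLE axp_dayschedule ALTER COLUMN match_date SET STATISTICS 100;
ANALYZE axp_dayschedule;
```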
Christopher Kings-Lynne wrote:
Another trick you can use with large data sets like this when you want results back in seconds is to have regularly updated tables that aggregate the data along each column normally aggregated against the main data set.
Maybe some bright person will prove me wrong
Neil Conway [EMAIL PROTECTED] writes:
Greg Stark wrote:
What if the hash index stored *only* the hash code?
Attached is a WIP patch that implements this.
Performance?
I'm posting mainly because I wasn't sure what to do to avoid false
positives in the case of hash collisions. In the hash
Neil Conway [EMAIL PROTECTED] writes:
I'm posting mainly because I wasn't sure what to do to avoid false positives in the case of hash collisions. In the hash AM code it is somewhat awkward to fetch the pointed-to heap tuple and recheck the scankey.[1] I just did the first thing that came
Tom Lane wrote:
Performance?
I'll run some benchmarks tomorrow, as it's rather late in my time zone.
If anyone wants to post some benchmark results, they are welcome to.
I disagree completely with the idea of forcing this behavior for all
datatypes. It could only be sensible for fairly wide
John A Meinel [EMAIL PROTECTED] writes:
Unfortunately, because Hash Join doesn't report the number of rows
(rows=0 always), it's hard to tell how good the estimator is.
This is only a cosmetic problem because you can just look at the number
of rows actually emitted by the Hash node's child;
On Sun, 24 Apr 2005 17:01:46 -0500, Jim C. Nasby [EMAIL PROTECTED]
wrote:
Feel free to propose better cost equations.
I did. More than once.
estimated index scan cost for (project_id, id, date) is
0.00..100117429.34 while the estimate for work_units is
0.00..103168408.62; almost no
David,
It's interesting that the solution livejournal has arrived at is quite similar in some ways to the way Google is set up.
Yes, although again, they're using memcached as pseudo-clustering software,
and as a result are limited to what fits in RAM (RAM on 27 machines, but it's
still RAM).
Hi,
We have a performance problem with a particular query.
We reproduced the problem on a 7.4.5 and on a 7.4.7 server.
* we load the dump in a new database
* query: it's fast (< 1ms)
* VACUUM FULL ANALYZE;
* query: it's really slow (130ms) and it's another plan
* set enable_seqscan=off;
* query:
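The reproduction steps above amount to comparing two plans side by side, which can be done explicitly (the SELECT is a placeholder for the query in question, which isn't quoted here):

```sql
EXPLAIN ANALYZE SELECT ...;   -- the slow plan chosen after VACUUM FULL ANALYZE

SET enable_seqscan = off;     -- diagnostic only, not for production
EXPLAIN ANALYZE SELECT ...;   -- the fast plan the planner passed over

SET enable_seqscan = on;
```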
Guillaume,
We reproduced the problem on a 7.4.5 and on a 7.4.7 server.
* we load the dump in a new database
* query: it's fast (< 1ms)
* VACUUM FULL ANALYZE;
* query: it's really slow (130ms) and it's another plan
* set enable_seqscan=off;
* query: it's fast (< 1ms): it uses the best plan
Josh Berkus josh@agliodbs.com writes:
-> Merge Right Join (cost=8.92..9.26 rows=1 width=529) (actual time=129.100..129.103 rows=1 loops=1)
   Merge Cond: ("outer".object_id = "inner".parent_application_id)
   -> Index Scan using acs_objects_object_id_p_hhkb1 on
Quoting Christopher Kings-Lynne [EMAIL PROTECTED]:
Another trick you can use with large data sets like this when you want results back in seconds is to have regularly updated tables that aggregate the data along each column normally aggregated against the main data set.
Maybe some
Tom,
So, the usual questions: have these two tables been ANALYZEd lately?
Yes, of course.
As I wrote in my previous mail, here is how I reproduce the problem:
- we load the dump in a new database (to be sure there is no problem on an index or something like that)
- query: it's fast (< 1ms)
-
Guillaume Smet [EMAIL PROTECTED] writes:
If so, can we see the pg_stats rows for the object_id and
parent_application_id columns?
See attached file.
Well, those stats certainly appear to justify the planner's belief that
the indexscan needn't run very far: the one value of
Well, those stats certainly appear to justify the planner's belief that
the indexscan needn't run very far: the one value of
parent_application_id is 1031 and this is below the smallest value of
object_id seen by analyze.
Yes, it seems rather logical but why does it cost so much if it should
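For reference, the statistics under discussion can be pulled straight from the pg_stats system view (a standard view; no table name needs to be guessed if you filter by column name):

```sql
SELECT tablename, attname, null_frac, n_distinct,
       most_common_vals, histogram_bounds
FROM pg_stats
WHERE attname IN ('object_id', 'parent_application_id');
```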
On Wed, 2005-05-11 at 17:13 +0800, Christopher Kings-Lynne wrote:
Alex Stapleton wrote
Be more helpful, and less arrogant please.
Simon told you all the reasons clearly and politely.
Thanks Chris for your comments.
PostgreSQL can always do with one more developer and my sole intent was
Ah-ha, I can replicate the problem. This example uses tenk1 from the
regression database, which has a column unique2 containing just the
integers 0...
regression=# create table t1(f1 int);
CREATE TABLE
regression=# insert into t1 values(5);
INSERT 154632 1
regression=# insert into t1
Josh, Tom,
Thanks for your explanations.
In the meantime it seems like the quickest answer for Guillaume might
be to try to avoid keeping any NULLs in parent_application_id.
I can't do that as the majority of the applications don't have any
parent one. Moreover, we use a third party application
First, I've got some updated numbers up at
http://stats.distributed.net/~decibel/
timing2.log shows that the planner actually under-estimates an index
scan by several orders of magnitude. Granted, random_page_cost is set to
an unrealistic 1.1 (otherwise I can't force the index scan), but that
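The random_page_cost experiment described above can be reproduced per-session (the query itself is elided here, as in the original):

```sql
SET random_page_cost = 1.1;   -- unrealistically low; favors index scans
EXPLAIN ANALYZE SELECT ...;

SET random_page_cost = 4;     -- the default of that era, for comparison
EXPLAIN ANALYZE SELECT ...;
```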
Was curious why you pointed out SQL-MED as a SQL-standard approach to
federated servers. Always thought of it as covering access to non-SQL
data, the way the lo_* interface works; as opposed to meshing compatible
(to say nothing of identical) SQL servers. Just checked Jim Melton's
last word on
Quoting Guillaume Smet [EMAIL PROTECTED]:
Hi,
We have a performance problem with a particular query.
...
I have to say it: this was the best laid-out set of details behind a problem I've ever seen on this list; I'm going to try to live up to it the next time I have a problem of my own.
On Wed, May 11, 2005 at 08:57:57AM +0100, David Roussel wrote:
For an interesting look at scalability, clustering, caching, etc for a
large site have a look at how livejournal did it.
http://www.danga.com/words/2004_lisa/lisa04.pdf
They have 2.6 Million active users, posting 200 new blog
However, memcached (and for us, pg_memcached) is an excellent way to improve horizontal scalability by taking disposable data (like session information) out of the database and putting it in protected RAM.
So, what is the advantage of such a system versus, say, a sticky sessions system
From: Tom Lane [EMAIL PROTECTED]
Edin Kadribasic [EMAIL PROTECTED] writes:
I have a query that is giving the optimizer (and me) great headache.
The main problem seems to be that the rowcount estimates for
axp_temp_order_match and axp_dayschedule are way off:
-> Index Scan using
* [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
Hi,
My next question is about BLOBs in my web application (using Tomcat 5 and JDBC, integrated into the J2EE app server JBoss).
Filesystems with many filesystem objects can slow down performance when opening and reading data.
As others
[EMAIL PROTECTED] wrote:
Greg,
I posted this link under a different thread (the $7k server thread). It is a very good read on why SCSI is better for servers than ATA. I didn't note bias, though it is from a drive manufacturer. YMMV. There is an interesting, though dated, appendix on