not to say that pgpool couldn't make a good guess in the majority
of cases!
M
Joshua D. Drake wrote:
Matt Clark wrote:
Presumably it can't _ever_ know without being explicitly told, because
even for a plain SELECT there might be triggers involved that update
tables, or it might be a select of a stored proc, etc. So in the
general case, you can't assume that a select doesn't cause an update,
and you can't be su
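A concrete illustration of the stored-proc case (sketch only — the table and function names are invented, using 7.x-era quoted function bodies):

```sql
-- A function that writes, called from a plain SELECT.
-- hit_counter and bump_counter are invented names for illustration.
CREATE TABLE hit_counter (n integer);
INSERT INTO hit_counter VALUES (0);

CREATE FUNCTION bump_counter() RETURNS integer AS '
    UPDATE hit_counter SET n = n + 1;
    SELECT n FROM hit_counter;
' LANGUAGE sql;

-- Looks read-only to a pooler, but performs an UPDATE:
SELECT bump_counter();
```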
This page may be of use:
http://www.serverworldmagazine.com/monthly/2003/02/solaris.shtml
From personal experience, for god's sake don't think Solaris' VM/swap
implementation is easy - it's damn good, but it ain't easy!
Matt
Kevin Schroeder wrote:
I think it's probably just reserving them. I can
Another man working to the bitter end this Christmas!
There could be many reasons, but maybe first you should look at the amount
of RAM available? If the tables fit in RAM on the production server but not
on the dev server, then that will easily defeat the improvement due to using
the native DB v
> Am I right to assume that "writeback" is both fastest and at
> the same time as safe to use as ordered? Maybe any of you
> did some benchmarks?
It should be fastest because it is the least overhead, and safe because
postgres does its own write-order guaranteeing through fsync(). You should
So there are really only 9000 rows out of 76 million starting with 'I'? How
about combining some techniques - you could create an index on the first two
chars of the field (should be selective enough to give an index scan),
select from that, and select the actual data with the like clause.
I
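That combined technique might look like this (sketch only — table, column, and index names are invented for illustration):

```sql
-- Index the first two characters of the field, then filter the
-- matching rows with the original LIKE clause.
CREATE INDEX bigtable_field_prefix2_idx
    ON bigtable (substr(field, 1, 2));

SELECT *
FROM bigtable
WHERE substr(field, 1, 2) = 'IA'   -- selective enough for an index scan
  AND field LIKE 'IA%';            -- exact filter on the full value
```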
> With that many rows, and a normal index on the field,
> postgres figures the best option for say "I%" is not an index
> scan, but a sequential scan on the table, with a filter --
> quite obviously this is slow as heck, and yes, I've run
> analyze several times and in fact have the vacuum anal
> For some reason it's a requirement that partial wildcard
> searches are done on this field, such as "SELECT ... WHERE
> field LIKE 'A%'"
>
> I thought an interesting way to do this would be to simply
> create partial indexes for each letter on that field, and it
> works when the query matche
Pierre-Frédéric Caillaud wrote:
check this marvelous piece of 5 minutes of work :
http://boutiquenumerique.com/test/iframe_feed.html
that made me smile :-)
(apologies for bad french)
M
---(end of broadcast)---
TIP 7: don't forget to increase yo
A note though : you'll have to turn off HTTP persistent
connections in your server (not in your proxy) or you're back to
square one.
I hadn't considered that. On the client side it would seem to be up to
the client whether to use a persistent connection or not. If it does,
then yeah, a
Javascript is too powerful to turn on for any random web page. It is only
essential for web pages because people write their web pages to only
work with javascript.
Hmm... I respectfully disagree. It is so powerful that it is impossible
to ignore when implementing a sophisticated app. And it is
All 3 plans have crappy estimates.
Run ANALYZE in production, then send another explain analyze (as an
attachment please, to avoid linewrap).
Er, no other possible answer except Rod's :-)
[EMAIL PROTECTED] wrote:
Citando Rod Taylor <[EMAIL PROTECTED]>:
Please send an explain analyze from both.
I'm sending three explains. In the first the Dell machine didn't use existing
indexes, so I turned enable_seqscan off (this is the second explain). The total
cost dec
- ITEM table will grow, grow, grow (sold items are not deleted)
WHERE PRODUCT.SECTION_USED_FK IS NOT NULL AND ITEM.STATUS=1 and
(ITEM.KIND=2 or ITEM.KIND=3)
Partial index on item.status ?
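A sketch of that suggestion, using the column names from the query above (the index name is invented):

```sql
-- Partial index covering only the rows the query can return;
-- stays small even as sold items pile up in ITEM.
CREATE INDEX item_status1_kind_idx ON item (kind) WHERE status = 1;
```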
In your webpage include an iframe with a Javascript to refresh it
every five seconds. The iframe fetches a page from the server which
brings in the new data in form of generated JavaScript which writes
in the parent window. Thus, you get a very short request every 5
seconds to fetch new
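A minimal sketch of that setup (element names invented; this is not the code from the linked page):

```html
<!-- In the main page: a hidden iframe that polls the server -->
<iframe src="/feed" width="0" height="0" frameborder="0"></iframe>

<!-- /feed returns a page whose script writes into the parent window
     and schedules its own reload in 5 seconds -->
<script type="text/javascript">
  parent.document.getElementById('newsbox').innerHTML = 'latest data here';
  setTimeout(function () { window.location.reload(); }, 5000);
</script>
```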
These are CGI scripts at the lowest level, nothing more and nothing
less. While I could probably embed a small webserver directly into
the perl scripts and run that as a daemon, it would take away the
portability that the scripts currently offer.
If they're CGI *scripts* then they just use the
1- You have a query that runs for half an hour and you spoon feed
the results to the client ?
(argh)
2- Your script looks for new data every few seconds, sends a
packet, then sleeps, and loops ?
If it's 2 I have a readymade solution for you, just ask.
I'm guessing (2) - PG do
> Correct the 75% of all hits are on a script that can take
> anywhere from
> a few seconds to a half an hour to complete. The script
> essentially
> auto-flushes to the browser so they get new information as it arrives
> creating the illusion of on demand generation.
This is more like a s
> Case in point: A first time visitor hits your home page. A
> dynamic page is generated (in about 1 second) and served
> (taking 2 more seconds) which contains links to 20 additional
The gain from an accelerator is actually even more that that, as it takes
essentially zero seconds for Apache
> Apache::DBI overall works better to what I require, even if
> it is not a
> pool per se. Now if pgpool supported variable rate pooling like
> Apache does with its children, it might help to even things
> out. That
> and you'd still get the spike if you have to start the webserver and
>
> I have a dual processor system that can support over 150 concurrent
> connections handling normal traffic and load. Now suppose I setup
> Apache to spawn all of its children instantly, what will
...
> This will spawn 150 children in a short order of time and as
> this takes
"Doctor, it h
The best way to get all the stuff needed by a query into
RAM is to run the query. Is it more that you want to 'pin' the data in RAM
so it doesn't get overwritten by other
queries?
and certainly anyone who's been around a computer more than a week or
two knows which direction "in" and "out" are customarily seen from.
regards, tom lane
Apparently not whoever wrote the man page that everyone copied ;-)
Interesting. I checked this on several machines. They actually
I don't have iostat on that machine, but vmstat shows a lot of writes to
the drives, and the runnable processes are more than 1:
6 1 0 3617652 292936 279192800 0 52430 1347 4681 25
19 20 37
Assuming that's the output of 'vmstat 1' and not some other delay,
50MB/second of sus
> Looking at that list, I got the feeling that you'd want to
> push that PG-awareness down into the block-io layer as well,
> then, so as to be able to optimise for (perhaps) conflicting
> goals depending on what the app does; for the IO system to be
> able to read the apps mind it needs to hav
, I'm not an FS guru...):
* great append performance for the WAL?
* optimised scattered writes for checkpointing?
* Knowledge that FSYNC is being used for preserving ordering a lot of the
time, rather than requiring actual writes to disk (so long as the writes
eventually happen in order...)?
Mat
Hyperthreading is actually an excellent architectural feature that
can give significant performance gains when implemented well and used
for an appropriate workload under a decently HT aware OS.
IMO, typical RDBMS streams are not an obviously appropriate workload,
Intel didn't implement it partic
The real
performance differences between unices are so small as to be ignorable
in this context.
Well, at least the difference between Linux and BSD. There are substantial
tradeoffs should you choose to use Solaris or UnixWare.
Yes, quite right, I should have said 'po
How would I turn that off? In the kernel config? Not too familiar with
that. I have a 2 proc xeon with 4 gigs of mem on the way for postgres,
so I hope HT isn't a problem. If HT is turned off, does it just not
use the other "half" of the processor? Or does the processor just work
as one unit?
Y
You are asking the wrong question. The best OS is the OS you (and/or
the customer) knows and can administer competently. The real
performance differences between unices are so small as to be ignorable
in this context. The context switching bug is not OS-dependent, but
varies in severity acro
> SELECT cmp.WELL_INDEX, cmp.COMPOUND, con.CONCENTRATION
> FROM SCR_WELL_COMPOUND cmp, SCR_WELL_CONCENTRATION con
> WHERE cmp.BARCODE=con.BARCODE
> AND cmp.WELL_INDEX=con.WELL_INDEX
> AND cmp.MAT_ID=con.MAT_ID
> AND cmp.MAT_ID = 3
>
> >>trainwreck... If you're going through IBM, then they won't want to
> >>respond to any issues if you're not running a
> "bog-standard" RHAS/RHES
> >>release from Red Hat.
...> To be fair, we keep on actually running into things that
> _can't_ be backported, like fibrechannel drivers that
In the MySQL manual it says that MySQL performs best with Linux 2.4 with
ReiserFS on x86. Can anyone official, or in the know, give similar
information regarding PostgreSQL?
I'm neither official, nor in the know, but I do have a spare moment! I
can tell you that any *NIX variant on any modern
As for "vendor support" for Opteron, that sure looks like a
trainwreck... If you're going through IBM, then they won't want to
respond to any issues if you're not running a "bog-standard" RHAS/RHES
release from Red Hat. And that, on Opteron, is preposterous, because
there's plenty of the bits of
> I don't know what you are exactly referring to in above URL
> when you are talking about "potential pitfalls of pooling".
> Please explain more.
Sorry, I wasn't implying that pgpool doesn't deal with the issues, just that
some people aren't necessarily aware of them up front. For instance, p
> OK, that'd work too... the point is if you're re-connecting
> all the time it doesn't really matter what else you do for
> performance.
Yeah, although there is the chap who was asking questions on the list
recently who had some very long-running code on his app servers, so was best
off closing
If you're not using a connection pool of some kind then you might as
well forget query plan caching, because your connect overhead will swamp
the planning cost. This does not mean you have to use something like
pgpool (which makes some rather questionable claims IMO); any decent web
application la
Any competently written application where caching results would be a
suitable performance boost can already implement application or
middleware caching fairly easily, and increase performance much more
than putting result caching into the database would.
I guess the performance increase is that
More to the point though, I think this is a feature that really really
should be in the DB, because then it's trivial for people to use.
How does putting it into PGPool make it any less trivial for people to
use?
The answers are at
http://www2b.biglobe.ne.jp/~caco/pgpoo
It might be easiest to shove the caching logic into pgpool instead.
...
When pg_pool is told to cache a query, it can get a table list and
monitor for changes. When it gets changes, simply dumps the cache.
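The invalidation idea described there — cache a result keyed by the query, remember which tables it reads, and dump the entry when any of them changes — can be sketched in a few lines. This is purely illustrative Python, not pgpool's actual implementation:

```python
# Illustrative sketch of table-based query-cache invalidation.
# Not pgpool code; all names are invented.

class QueryCache:
    def __init__(self):
        self._results = {}   # query text -> cached result
        self._tables = {}    # query text -> set of tables it reads

    def put(self, query, tables, result):
        self._results[query] = result
        self._tables[query] = set(tables)

    def get(self, query):
        return self._results.get(query)

    def invalidate_table(self, table):
        # A write hit `table`: dump every cached query that reads it.
        stale = [q for q, ts in self._tables.items() if table in ts]
        for q in stale:
            del self._results[q]
            del self._tables[q]

cache = QueryCache()
cache.put("SELECT * FROM item", ["item"], [(1, "sold")])
print(cache.get("SELECT * FROM item"))   # -> [(1, 'sold')]
cache.invalidate_table("item")           # an UPDATE on item arrives
print(cache.get("SELECT * FROM item"))   # -> None, cache was dumped
```

The hard part in practice is the "get a table list" step (parsing the query) and seeing the changes, which is exactly what a proxy like pgpool is positioned to do.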
It's certainly the case that the typical web app (which, along with
warehouses, seems to
Basically you set a default in seconds for the HTML results to be
cached, and then have triggers set that force the cache to regenerate
(whenever CRUD happens to the content, for example).
Can't speak for Perl/Python/Ruby/.Net/Java, but Cache_Lite sure made a
believer out of me!
Nice to have it
> I've looked at PREPARE, but apparently it only lasts
> per-session - that's worthless in our case (web based
> service, one connection per data-requiring connection).
That's a non-sequitur. Most 'normal' high volume web apps have persistent
DB connections, one per http server process. Are yo
> >That looks like poor database normalization, really. Are you
> sure you
> >don't want to split this into multiple tables instead of having 62
> >columns?
> >
> No, it is properly normalized. The data in this table is stock
> fundamentals, stuff like 52 week high, ex-dividend date, etc, etc.
> Immediately after performing a vacuum, updates take up to 50
> milliseconds to occur, however the update performance
> degrades over time, such that after a few hours of continuous
> updates, each update takes about half a second. Regular
> vacuuming improves the performance temporarily, but d
> > It is likely that you are missing an index on one of those foreign
> > key'd items.
>
> I don't think that is too likely as a foreign key reference
> must be a unique key which would have an index.
I think you must be thinking of primary keys, not foreign keys. All
one-to-many relationshi
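In other words (table names invented for illustration): declaring the foreign key gives you an index on the referenced side only; the referencing column needs its own index:

```sql
CREATE TABLE parent (id integer PRIMARY KEY);    -- PK gets an index
CREATE TABLE child (
    parent_id integer REFERENCES parent (id)     -- no index created here
);
-- Add this yourself, or FK checks and joins on child seqscan:
CREATE INDEX child_parent_id_idx ON child (parent_id);
```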
> And this is exactly where the pgpool advantage lies.
> Especially with the
> TPC-W, the Apache is serving a mix of PHP (or whatever CGI
> technique is
> used) and static content like images. Since the 200+ Apache
> kids serve
> any of that content by random and the emulated browsers very mu
> This is normal. My personal workstation has been up for 16
> days, and it shows 65 megs used for swap. The linux kernel
> looks for things that haven't been accessed in quite a while
> and tosses them into swap to free up the memory for other uses.
>
> This isn't PostgreSQL's fault, or anyt
How about iSCSI? This is exactly what it's for - presenting a bunch of
remote SCSI hardware as if it were local.
There are several reference implementations on SourceForge from Intel, Cisco
& others.
I've never tried it myself, but I would if I had the need. And let's face
it there are some v
As a cross-ref to all the 7.4.x tests people have sent in, here's 7.2.3 (Redhat 7.3),
Quad Xeon 700MHz/1MB L2 cache, 3GB RAM.
Idle-ish (it's a production server) cs/sec ~5000
3 test queries running:
procs  memory  swap  io  system  cpu
r b w swpd
> > Now if these vendors could somehow eliminate downtime due to human error
> > we'd be talking *serious* reliability.
>
> You mean making the OS smart enough to know when clearing the arp
> cache is a bonehead operation, or just making the hardware smart
> enough to realise that the keyswitch real
If it's going to be write intensive then the RAID controller will be the most
important thing. A dual p3/500 with a write-back
cache will smoke either of the boxes you mention using software RAID on write
performance.
As for the compute intensive side (complex joins & sorts etc), the Dell will
> On a variety of hardware with Redhat, and versions of postgres, we're
> not getting much better than 50 inserts per second. This is prior to
> moving WAL to another disk, and fsync is on.
>
> However, with postgres 7.4 on Mac OSX 10.2.3, we're getting an amazing
> 500 inserts per second.
>
> We c
If you have 3 1.5GB tables then you might as well go for 4GB while you're at
it. Make sure you've got a bigmem kernel either running or available, and
boost effective_cache_size by whatever amount you increase the RAM by. We
run a Quad Xeon/4GB server on Redhat 7.3 and it's solid as a rock.
Ther
> I ended up going back to a default postgresql.conf and reapplying the
> various tunings one-by-one. Turns out that while setting fsync = false
> had little effect on the slow IDE box, it had a drastic effect on this
> faster SCSI box and performance is quite acceptable now (aside from the
> expec
> Mainly 'cause it doesn't know where the dead tuples are till it's
> looked.
At this point I feel very stupid...
> Also, VACUUM is the data collector for the free space map,
> and so it is also charged with finding out how much free space exists
> on every page.
Ah, now I just feel enlightened!
> The point is that a big seqscan (either VACUUM or a plain table scan)
> hits a lot of pages, and thereby tends to fill your cache with pages
> that aren't actually likely to get hit again soon, perhaps pushing out
> pages that will be needed again soon. This happens at both the
> shared-buffer a
> On Sun, Oct 05, 2003 at 12:14:24PM +0100, Matt Clark wrote:
> > more info on how it clobbers shared_buffers?
>
> Vacuum is like a seqscan. It touches everything on a table. So it
> doesn't clobber them, but that's the latest data. It's unlikely your
>
> On Sat, Oct 04, 2003 at 12:29:55AM +0100, Matt Clark wrote:
> > My real world experience on a *very* heavily updated OLTP type
> DB, following
> > advice from this list (thanks guys!), is that there is
> essentially zero cost
> > to going ahead and vacuuming as oft
> > Also, if you find that you need to run VACUUM FULL often, then
> > you need to
> > raise your max_fsm_pages.
>
> Yes and no. If it's run often enough then the number of tracked pages
> shouldn't need to be raised, but then again...
Oops, sorry, didn't pay attention and missed the mention of F
> Also, if you find that you need to run VACUUM FULL often, then
> you need to
> raise your max_fsm_pages.
Yes and no. If it's run often enough then the number of tracked pages
shouldn't need to be raised, but then again...
...max_fsm_pages should be raised anyway. I'm about to reclaim a Pentiu
> > In summary, I suspect that it is better from a UI perspective to
> > bring down the app on Sat at 3 a.m and reimport with a fixed time
> > period than to live through reindexing/vacuuming which may deadlock.
> > Am I missing something?
>
> Consider running pg_autovacuum, and thereby do a little
> > vacuum full does require exclusive lock, plain vacuum does not.
>
> I think I need full, because there are updates on the table. As I
> understand it, an update in pg is an insert/delete, so it needs
> to be garbage collected.
Yes and no. You only need a plain VACUUM that is run often enough
d well be wrong. And there could well be strategies exploitable
with the knowledge that a write-back cache exists that aren't currently
implemented - though intuitively I doubt it.
M
> -Original Message-
> From: Palle Girgensohn [mailto:[EMAIL PROTECTED]
> Sent: 29 Septembe
As others have mentioned, you really ought to get battery-backed cache if
you're doing any volume of writes. The ability to do safe write-back
caching makes an *insane* difference to write performance.
The site you link to also has that for only 15% more money:
http://uk.azzurri.com/product/produ
> There are about 2500 rows in that table.
>
> 1st query explain analyze: Seq Scan on PRIORITY_STATISTICS
> (cost=0.00..491.44 rows=127 width=12) (actual time=98.58..98.58 rows=0
> loops=1)
> Total runtime: 98.74 msec
>
> 2nd query explain analyze: NOTICE: QUERY PLAN:
>
> Index Scan using PRIORITY
> What causes this behaviour? is there any workaround? Suggestions?
>
How many rows are there in the table, and can you post the 'explain analyze' for both
queries after doing a 'vacuum verbose analyze
[tablename]'?
Cheers
Matt
> I put the EXPLAIN ANALYZE output at:
> http://developer.osdl.org/~jenny/large_explain_analyze
> http://developer.osdl.org/~jenny/small_explain_analyze
> The actual execution time is 37 seconds(large) vs 5 seconds (small).
>
There's an obvious row count misestimation in the 'large' plan:
-> So
> We thought the large effective_cache_size should lead us to better
> plans. But we found the opposite.
Maybe it's inappropriate for little old me to jump in here, but the plan
isn't usually that important compared to the actual runtime. The links you
give show the output of 'explain' but not 'e
> 2) Are you sure that ANALYZE is needed? Vacuum is required
> whenever lots of
> rows are updated, but analyze is needed only when the *distribution* of
> values changes significantly.
You are right. I have a related qn in this thread about random vs. monotonic
values in indexed fields.
> 3) u
ven though the DB server is under no more
apparent load. I can only assume some kind of locking issue. Is that fair?
M
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of
> scott.marlowe
> Sent: 17 September 2003 20:55
> To: Matt
I'm running a load of stress scripts against my staging environment to
simulate user interactions, and watching the various boxen as time goes by.
I noticed that the CPU utilisation on the DB server (PG 7.2.3, RH7.3, Dual
PII 550MHz, 1GB RAM, 1GB database on disk, Single 10k SCSI drive) was
increa
...
> #effective_cache_size = 1000# typically 8KB each
That's horribly wrong. It's telling PG that your OS is only likely to cache
8MB of the DB in RAM. If you've got 1GB of memory it should be between
64000 and 96000
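For a 7.x-era postgresql.conf, where the unit is 8KB pages, that advice works out to something like the following (the exact value depends on what else the box runs — this is a sketch, not a tuned figure):

```
# 1GB RAM box: assume roughly 512-768MB of OS disk cache
# 64000 * 8KB = 512MB; 96000 * 8KB = 768MB
effective_cache_size = 80000
```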
> the machine will be dealing with lots of inserts, basically as many as we can
> throw at it
If you mean lots of _transactions_ with few inserts per transaction you should get a
RAID controller w/ battery backed write-back
cache. Nothing else will improve your write performance by nearly as muc
> > Know what we (OK, I) need? An explicitly non-aggregate max() and min(),
> > implemented differently, so they can be optimised.
>
> Not per se. The way I've been visualizing this is that we add to
> pg_aggregate a column named, say, aggsortop, with the definition:
...snip of cunning potentiall
> "Matt Clark" <[EMAIL PROTECTED]> writes:
> > Actually, referring down to later parts of this thread, why can't this
> > optimisation be performed internally for built-in types? I
> understand the
> > issue with aggregates over user-defined types, but
> Actually, referring down to later parts of this thread, why can't this
> optimisation be performed internally for built-in types? I understand the
> issue with aggregates over user-defined types, but surely optimising max()
> for int4, text, etc is safe and easy?
Sorry, missed the bit about use
>This is a Frequently asked question about something that isn't likely to
>change any time soon.
You're right, it is in the FAQ, but pretty well buried. It is entirely
non-obvious to most people that min() and max() don't/can't use indices.
Something so counterintuitive should be explicitly and
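For anyone landing here from that FAQ entry, the standard workaround is to rewrite the aggregate as ORDER BY ... LIMIT 1, which the planner can satisfy with an ordinary index scan (table and column names illustrative):

```sql
-- Cannot use an index; aggregates force a full scan:
SELECT max(id) FROM bigtable;

-- Equivalent result, and can use the index on id:
SELECT id FROM bigtable ORDER BY id DESC LIMIT 1;
```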
> Just a data point, but on my Dual Xeon 2.4Gig machine with a 10k SCSI
> drive I can do 4k inserts/second if I turn fsync off. If you have a
> battery-backed controller, you should be able to do the same. (You will
> not need to turn fsync off --- fsync will just be fast because of the
> disk dr
> SELECT
> IF
> BEGIN
> INSERT
>or
> UPDATE
> COMMIT;
>
> He says his current h/w peaks at 1/10th that rate.
>
> My question is: is that current peak rate ("300 inserts/updates
> *or* 2500 selects") based upon 1 connection, or many c
> Ok.. I would be surprised if you needed much more actual CPU power. I
> suspect they're mostly idle waiting on data -- especially with a Quad
> Xeon (shared memory bus is it not?).
In reality the CPUs get pegged: about 65% PG and 35% system. But I agree that memory
throughput and latency is an
> Just how big do you expect your DB to grow? For a 1GB disk-space
> database, I'd probably just splurge for an SSD hooked up either via
> SCSI or FibreChannel. Heck, up to about 5Gb or so it is not that
> expensive (about $25k) and adding another 5Gb should set you back
> probably another $20k.
> > 2. If you want to search for a sequence you'll need to deal with the case
> > where it starts in one chunk and ends in another.
>
> I forgot about searching--I suspect that application is why I faced
> opposition for shredding in my schema development group. Maybe I should
> push that off to t