Greg Smith wrote:
There are also some severe query plan stability issues with this idea
beyond this. The idea that your plan might vary based on execution
latency, that the system load going up can make query plans alter with
it, is terrifying for a production server.
I thought I was clear t
Robert Haas wrote:
The server can and does measure hit rates for the PG buffer pool, but to my
knowledge there is no clear-cut way for PG to know whether read() is satisfied
from the OS cache or a drive cache or the platter.
Does the server know which IO it thinks is sequential, and which i
Tom Lane wrote:
Uh, no, it is not. The difference is that we can update a byte in a
shared buffer, and know that it *isn't* getting written out before we
Well, I don't know where you got the idea I was referring to that sort of
thing - it's
the same as writing to a buffer before copying to the
Tom Lane wrote:
The other and probably worse problem is that there's no application
control over how soon changes to mmap'd pages get to disk. An msync
will flush them out, but the kernel is free to write dirty pages sooner.
So if they're depending for consistency on writes not happening until
m
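Tom's point can be illustrated with Python's mmap module - a minimal sketch with hypothetical file contents: the kernel may write the dirtied page out at any time before the explicit flush, and flush() (msync underneath) only guarantees the data has reached disk by the time it returns.

```python
import mmap
import os
import tempfile

# Create a small file and map it for writing.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello world")

with mmap.mmap(fd, 11) as m:
    m[0:5] = b"HELLO"   # dirty the mapped page in memory
    # The kernel is free to write this page out *before* the next line;
    # flush() (msync) only guarantees it has hit disk *by* its return.
    m.flush()

with open(path, "rb") as f:
    data = f.read()

os.close(fd)
os.unlink(path)
```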
Kevin Grittner wrote:
On what do you base that assumption? I assume that we send a full
8K to the OS cache, and the file system writes disk sectors
according to its own algorithm. With either platters or BBU cache,
the data is persisted on fsync; why do you see a risk with one but
not the other
Hannu Krosing wrote:
Pulling the plug should not corrupt a postgreSQL database, unless it was
using disks which lie about write caching.
Didn't we recently put the old wives' 'the disks lied' tale to bed in
favour of actually admitting that some well known filesystems and
software raid system
Matthew Wakeling wrote:
Just a heads up - apparently the more recent Dell RAID controllers
will no longer recognise hard discs that weren't sold through Dell.
http://www.channelregister.co.uk/2010/02/10/dell_perc_11th_gen_qualified_hdds_only/
As one of the comments points out, that kind o
Ivan Voras wrote:
I wish that, when people got the idea to run a simplistic benchmark
like this, they would at least have the common sense to put the
database on a RAM drive to avoid problems with different cylinder
speeds of rotational media and fragmentation from multiple runs.
Huh?
It's to
Kevin Grittner wrote:
Sure, but the architecture of those products is based around all the
work being done by "engines" which try to establish affinity to
different CPUs, and loop through the various tasks to be done. You
don't get a context switch storm because you normally have the number
of e
Greg Smith wrote:
3500 active connections across them. That doesn't work, and what
happens
is exactly the sort of context switch storm you're showing data for.
Think about it for a minute: how many of those can really be doing
work at any time? 32, that's how many. Now, you need some multip
Stephen Frost wrote:
You're re-hashing things I've already said. The big win is batching the
inserts, however that's done, into fewer transactions. Sure, multi-row
inserts could be used to do that, but so could dropping begin/commits in
right now which probably takes even less effort.
Well,
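The batching win Stephen describes can be sketched by grouping rows into fixed-size chunks, each of which would then go inside a single BEGIN/COMMIT instead of one transaction per row (a generic sketch; the chunk size and row source are placeholders):

```python
def chunked(rows, size):
    """Yield successive batches of at most `size` rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch   # final partial batch

# Each batch would be wrapped in one transaction rather than
# issuing a begin/commit pair per row.
batches = list(chunked(range(10), 4))
```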
Stephen Frost wrote:
apart again. That's where the performance is going to be improved by
going that route, not so much in eliminating the planning.
Fine. But like I said, I'd suggest measuring the fractional improvement
for this
when sending multi-row inserts before writing something compl
da...@lang.hm wrote:
2. insert into table values (),(),(),()
Using this structure would be more database agnostic, but won't perform
as well as the COPY options I don't believe. It might be interesting to
do a large "insert into table values (),(),()" as a prepared statement,
but then you'd ha
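A sketch of building such a multi-row statement (hypothetical table and column names; real code should use bind parameters - literals are inlined here purely for illustration):

```python
def multirow_insert(table, columns, rows):
    """Build a single INSERT ... VALUES (),(),() statement."""
    def literal(v):
        # Minimal quoting for the sketch: escape single quotes in strings.
        if isinstance(v, str):
            return "'%s'" % v.replace("'", "''")
        return str(v)

    values = ", ".join(
        "(%s)" % ", ".join(literal(v) for v in row) for row in rows
    )
    return "INSERT INTO %s (%s) VALUES %s" % (table, ", ".join(columns), values)

sql = multirow_insert("t", ["id", "name"], [(1, "a"), (2, "b")])
```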
Greg Smith wrote:
OK, that's clearly cached writes where the drive is lying about fsync.
The claim is that since my drive supports both the flush calls, I just
need to turn on barrier support, right?
That's a big pointy finger you are aiming at that drive - are you sure
it was sent the flush
The driver will use unnamed statements for all statements until it
sees the same statement N times where N is 5 I believe, after that it
uses a named statement.
Shame there's no syntax for it to pass a table of the parameters to
the server when it creates the named statement as planne
Craig Ringer wrote:
These devices would be interesting for a few uses, IMO. One is temp
table space and sort space in Pg. Another is scratch space for apps
(like Photoshop) that do their own VM management. There's also potential
Surely temp tables and sort space aren't subject to fsync and won'
Ron wrote:
I think the idea is that with SSDs or a RAID with a battery backed
cache you
can leave fsync on and not have any significant performance hit since
the seek
times are very fast for SSD. They have limited bandwidth but
bandwidth to the
WAL is rarely an issue -- just latency.
Yes, Greg
Scott Marlowe wrote:
involves tiny bits of data scattered throughout the database. Our
current database is about 20-25 Gig, which means it's quickly reaching
the point where it will not fit in our 32G of ram, and it's likely to
grow too big for 64Gig before a year or two is out.
...
I wonde
less) than their default and
NOT return the data to the controller?
I've never heard of such a thing, but I'm no expert in the command sets
for any of this stuff.
James
On Thu, Sep 11, 2008 at 12:54 PM, James Mansion
<[EMAIL PROTECTED]>
wr
Greg Smith wrote:
The point I was trying to make there is that even under impossibly
optimal circumstances, you'd be hard pressed to blow out the disk's
read cache with seek-dominated data even if you read a lot at each
seek point. That idea didn't make it from my head into writing very
well
Greg Smith wrote:
Average seek time: 4ms
Seeks/second: 250
Data read/seek: 1MB (read-ahead number goes here)
Total read bandwidth: 250MB/s
Most spinning disks now are nearer to 100MB/s streaming. You've talked
yourself into twice that, random access!
James
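Greg's idealised numbers can be checked with trivial arithmetic - this is the model James is questioning, not a measurement:

```python
# 4 ms average seek => 250 seeks/second; assume 1MB read per seek point.
seek_time_s = 0.004
seeks_per_second = 1 / seek_time_s            # 250
data_per_seek_mb = 1.0                        # the "read-ahead number"
total_bandwidth_mb_s = seeks_per_second * data_per_seek_mb
# => 250 MB/s, roughly double the ~100-150 MB/s a modern spinning disk
# manages even when streaming sequentially - hence the scepticism.
```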
[EMAIL PROTECTED] wrote:
for example if you have a process that uses 1G of ram (say firefox)
and it needs to start a new process (say acroread to handle a pdf
file), what it does is it forks the firefox process (each of which
have 1G of ram allocated), and then does an exec of the acroread
pro
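The fork-then-exec pattern being described can be sketched directly with the POSIX calls Python exposes (assumes a Unix system with `true` on the PATH):

```python
import os

# fork() duplicates the parent's address space (copy-on-write on Linux),
# so the "1G of ram" is never physically copied; exec*() then replaces
# the child's image entirely with the new program.
pid = os.fork()
if pid == 0:
    try:
        os.execvp("true", ["true"])   # child becomes the new program
    finally:
        os._exit(127)                 # reached only if exec failed

_, status = os.waitpid(pid, 0)
ok = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```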
andrew klassen wrote:
I'll try adding more threads to update the table as you suggest.
You could try materially increasing the update batch size too. As an
exercise you could
see what the performance of COPY is by backing out the data and
reloading it from
a suitable file.
Matthew Wakeling wrote:
If you're running a "work queue" architecture, that probably means you
only have one thread doing all the updates/inserts? It might be worth
going multi-threaded, and issuing inserts and updates through more
than one connection. Postgres is designed pretty well to scale
Alvaro Herrera wrote:
Hint bits are used to mark tuples as created and/or deleted by
transactions that are known committed or aborted. To determine the
visibility of a tuple without such bits set, you need to consult pg_clog
and possibly pg_subtrans, so it is an expensive check. On the other
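Alvaro's description can be caricatured in a few lines (grossly simplified, hypothetical structures: a set stands in for pg_clog, a dict for the tuple header):

```python
committed_xids = {100, 101}   # stands in for pg_clog

def tuple_visible(tup, clog_lookups):
    """Check visibility, caching the expensive clog answer as a hint bit."""
    if tup["hint_committed"] is None:
        clog_lookups.append(tup["xmin"])   # the expensive pg_clog consult
        tup["hint_committed"] = tup["xmin"] in committed_xids
    return tup["hint_committed"]

lookups = []
t = {"xmin": 100, "hint_committed": None}
first = tuple_visible(t, lookups)    # consults "pg_clog", sets the hint
second = tuple_visible(t, lookups)   # hint already set: no second lookup
```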
PFC wrote:
PCI limits you to 133 MB/s (theoretical), actual speed being
around 100-110 MB/s.
Many servers do have more than one bus. You have to process that data
too so it's not going to be as much of a limit as you are suggesting. It
may be possible to stream a compressed data file to th
Greg Smith wrote:
If you write a giant block of writes, those tend to be sorted by the
OS and possibly the controller to reduce total seeks. That's a pretty
efficient write and it can clear relatively fast.
But if you've been trickling writes in an unstructured form and in low
volume, there
Greg Smith wrote:
Using the background writer more assures that the cache on the
controller is going to be written to aggressively, so it may be
somewhat filled already come checkpoint time. If you leave the writer
off, when the checkpoint comes you're much more likely to have the
full 2GB av
Has there ever been any analysis regarding the redundant write overhead
of full page writes?
I'm wondering if one could regard an 8k page as being 64 off 128-byte
paragraphs or
32 off 256-byte paragraphs, each represented by a bit in a word. And,
when a page is dirtied
by changes some record i
Robins Tharakan wrote:
I think James was talking about Sybase. Postgresql on the other hand
has a slightly better way to do this.
SELECT ... FOR UPDATE allows you to lock a given row (based on the
SELECT ... WHERE clause) and update it... without worrying about a
concurrent modification. Of
Greg Smith wrote:
You turn on direct I/O differently under Solaris then everywhere else,
and nobody has bothered to write the patch (trivial) and OS-specific
code to turn it on only when appropriate (slightly trickier) to handle
this case. There's not a lot of pressure on PostgreSQL to handle th
Andreas Kostyrka wrote:
takes over. The thing you worry about is if all data has made it to the
replication servers, not if some data might get lost in the hardware
cache of a controller. (Actually, talk to your local computer forensics
guru, there are a number of ways to keep the current to elect
Stephen Denne wrote:
A third option is to update, if not found, insert.
I find myself having to do this in Sybase, but it sucks because there's
a race - if there's no row updated then there's no lock and you race
another thread doing the same thing. So you grab a row lock on a
sacrificial ro
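The race, and the loop commonly used to close it instead of a sacrificial lock, can be sketched with an in-memory stand-in for the table (hypothetical helpers; DuplicateKey plays the unique-constraint violation):

```python
class DuplicateKey(Exception):
    """Stands in for a unique-constraint violation from the database."""

def update_row(table, key, value):
    """UPDATE ... WHERE key = ?; returns the number of rows matched."""
    if key not in table:
        return 0
    table[key] = value
    return 1

def insert_row(table, key, value):
    """INSERT; raises if another writer got the key in first."""
    if key in table:
        raise DuplicateKey(key)
    table[key] = value

def upsert(table, key, value):
    """Update-if-not-found-insert, looping to close the race where
    another thread inserts between our failed UPDATE and our INSERT."""
    while True:
        if update_row(table, key, value):
            return "updated"
        try:
            insert_row(table, key, value)
            return "inserted"
        except DuplicateKey:
            continue  # lost the race: the row exists now, retry the UPDATE
```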
Greg Smith wrote:
"After a write() to a regular file has successfully returned, any
successful read() from each byte position in the file that was
modified by that write() will return the data that was written by the
write()...a similar requirement applies to multiple write operations
to the s
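That POSIX guarantee is easy to demonstrate: once write() has returned, a read() of those byte positions - even through a second descriptor - must see the data, regardless of whether it has physically reached the platter (a minimal sketch with a temp file):

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"abcdef")          # write() has returned...

# ...so a read() of those bytes - here via a second descriptor - must
# return the written data, whether or not it has reached the disk yet.
fd2 = os.open(path, os.O_RDONLY)
data = os.read(fd2, 6)

os.close(fd2)
os.close(fd)
os.unlink(path)
```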
Tried harder to find info on the write cycles: found some CFs that claim
2 million
cycles, and found the Mtron SSDs which claim to have very advanced wear
levelling and a suitably long lifetime as a result even with an
assumption that
the underlying flash can do 100k writes only.
The 'consumer'
Mark Mielke wrote:
Is there anything in POSIX that seems to suggest this? :-) (i.e. why
are you going under the assumption that the answer is yes - did you
read something?)
Perhaps it was just this:
http://kevinclosson.wordpress.com/2007/01/18/yes-direct-io-means-concurrent-writes-oracle-doe
I don't believe POSIX has any restriction such as you describe - or if
it does, and I don't know about it, then most UNIX file systems (if
not most file systems on any platform) are not POSIX compliant.
I suspect that indeed there are two different issues here in that the
file mutex relates
Mark Mielke wrote:
Is there anything in POSIX that seems to suggest this? :-) (i.e. why
are you going under the assumption that the answer is yes - did you
read something?)
It was something somewhere on the Sun web site, relating to tuning Solaris
filesystems. Or databases. Or ZFS. :-(
Need
(Declaration of interest: I'm researching for a publication
on OLTP system design)
I have a question about file writes, particularly on POSIX.
This arose while considering the extent to which cache memory
and command queueing on disk
drives can help improve performance.
Is it correct that POSIX
PFC wrote:
Why do you claim that 'More platters also means slower seeks
and generally slower performance.'?
More platters -> more heads -> heavier head assembly -> slower
seek time
Not sure I've seen a lot of evidence of that in drive specifications!
Gigabyte should revamp their i-
Greg Smith wrote:
As for SCSI vs. SATA, I collected up the usual arguments on both sides
at http://www.postgresqldocs.org/index.php/SCSI_vs._IDE/SATA_Disks
Why do you claim that 'More platters also means slower seeks
and generally slower performance.'?
On the face of it, it should mean that t
[EMAIL PROTECTED] wrote:
WAL is on a RAID 0 drive along with the OS
Isn't that just as unsafe as having the whole lot on RAID0?
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-perform
Do CREATE TEMP TABLE tables have any special treatment regarding eliding
sync operations
or deferring creation of disk files in the case where memory pressure
does not require a spill?
James
Tom Lane wrote:
Hardly --- how's that going to pass a notify name? Also, a lot of
people want some payload data in a notify, not just a condition name;
any reimplementation that doesn't address that desire probably won't
get accepted.
Ah - forgot about the name. At least there need be just o
Tom Lane wrote:
read-modify-write type of operation it uses an exclusive lock, so only
one can clear its pg_listener entry at a time. The 'waiting' ones you
are seeing are stacked up behind whichever one has the lock at the
moment. They shouldn't be waiting for long.
I certainly hadn't expec
Jakub Ouhrabka wrote:
How can we diagnose what is happening during the peaks?
Can you try forcing a core from a bunch of the busy processes? (Hmm -
does Linux have an equivalent to the useful Solaris pstack?)
James
Craig James wrote:
Don't confuse thread-friendly with a threaded implementation of
Postgres itself. These are two separate questions. Thread-friendly
involves compile/link options that don't affect the Postgres source
code at all.
Indeed. I'm specifically not suggesting that Postgres should o
Tom Lane wrote:
Yes.
1) It's of no value to us
2) On many platforms there is a nonzero performance penalty
I think you have your head in the ground, but it's your prerogative. *You*
might not care, but anyone wanting to use thread-aware libraries (and I'm
*not* talking about threading in an
Gregory Stark wrote:
1) No Postgres function is guaranteed to be thread-safe so you better protect
against concurrent calls to Postgres API functions. Also Postgres functions
use longjmp which can restore the stack pointer to a value which may have
been set earlier, possibly by another t
Mark Mielke wrote:
PostgreSQL or the kernel should already have the hottest pages in
memory, so the value of doing async I/O is very likely the cooler
pages that are unique to the query. We don't know what the cooler
pages are until we follow the tree down.
I'm assuming that at the time we
Mark Mielke wrote:
At a minimum, this breaks your query into: 1) Preload all the index
pages you will need
Isn't this fairly predictable - the planner has chosen the index so it
will be operating
on a bounded subset.
, 2) Scan the index pages you needed
Yes, and AIO helps when you can scan the
Mark Mielke wrote:
This assumes that you can know which pages to fetch ahead of time -
which you do not except for sequential read of a single table.
Why doesn't it help to issue ahead-of-time IO requests when you are
scanning an index? You can read ahead
in index pages, and submit requests f
[EMAIL PROTECTED] wrote:
So, if you hand requests one by one to the disc, it will almost always be
faster to order them. On the other hand, if you hand a huge long list of
requests to a decent SCSI or SATA-NCQ disc in one go, it will reorder the
reads itself, and it will do it much better than yo
Carlo Stonebanks wrote:
Isn't it just easier to assume that Windows Server can't do anything right?
;-)
Well, avoiding the ;-) - people do, and it's remarkably foolish of them. It's
a long-standing whinge that many people with a UNIX-background seem to
just assume that Windows sucks, but you
Scott Marlowe wrote:
Where unixes generally outperform windows is in starting up new
backends, better file systems, and handling very large shared_buffer
settings.
Why do you think that UNIX systems are better at handling large shared
buffers than Windows?
32 bit Windows systems can suffer f
Scott Marlowe wrote:
And there's the issue that with windows / NTFS that when one process
opens a file for read, it locks it for all other users. This means
that things like virus scanners can cause odd, unpredictable failures
of your database.
Can you provide some justification for this?
[EMAIL PROTECTED] wrote:
What is a real life example where an intelligent and researched
database application would issue a like or ilike query as their
primary condition in a situation where they expected very high
selectivity?
In my case the canonical example is to search against textual key
If Sybase is still like SQL Server (or the other way around), it *may*
end up scanning the index *IFF* the index is a clustered index. If it's
a normal index, it will do a sequential scan on the table.
Are you sure it's not covered? Have to check at work - but I'm off next
week so it'll hav
Alexander Staubo wrote:
On 5/23/07, Andy <[EMAIL PROTECTED]> wrote:
An example would be:
SELECT * FROM table
WHERE name like '%john%' or street like
'%srt%'
Anyway, the query planner always does seq scan on the whole table and
that
takes some time. How can this
>Logic?
Foul! That's NOT evidence.
>
>Mechanical devices have decreasing MTBF when run in hotter environments,
>often at non-linear rates.
I agree that this seems intuitive. But I think taking it as a cast-iron
truth is dangerous.
>Server class drives are designed with a longer lifespan in mi
>Server drives are generally more tolerant of higher temperatures. I.e.
>the failure rate for consumer and server class HDs may be about the same
>at 40 degrees C, but by the time the internal case temps get up to 60-70
>degrees C, the consumer grade drives will likely be failing at a much
>higher
>Right --- the point is not the interface, but whether the drive is built
>for reliability or to hit a low price point.
Personally I take the marketing mumblings about the enterprise drives
with a pinch of salt. The low-price drives HAVE TO be reliable too,
because a non-negligible failure rate wi
>sure but for any serious usage one either wants to disable that
>cache(and rely on tagged command queuing or how that is called in SATAII
>world) or rely on the OS/raidcontroller implementing some sort of
>FUA/write barrier feature(which linux for example only does in pretty
>recent kernels)
Does
>On sequential read speed HDs outperform flash disks... only on random
>access the flash disks are better. So if your application is a DW one,
>you're very likely better off using HDs.
This looks likely to be a non-issue shortly, see here:
http://www.reghardware.co.uk/2007/03/27/sams_doubles_ssd_
Isn't it likely that a single stream (or perhaps one that can be partitioned
across spindles) will tend to be fastest, since it has a nice localised
stream that a) allows for compression of reasonable blocks and b) fits with
commit aggregation?
RAM capacity on servers is going up and up, but the s
>WARNING: modern TOtL flash RAMs are only good for ~1.2M writes per
>memory cell. and that's the =good= ones.
>Using flash RAM for write heavy applications like OLTP, or for WAL,
>etc can be very dangerous
Well, that's why I suggested that the WAL would stream to a hard disk
array, where the lar
I see that one can now get compact flash to SATA connectors.
If I were to use a filesystem with noatime etc and little non-sql traffic,
does the physical update pattern tend to have hot sectors that will tend to
wear out CF?
I'm wondering about a RAID5 with data on CF drives and RAID1 for the WAL