Sounds like you need some way to match a subset of the data first,
rather than try indices that are bigger than the data. Can you add
expression indices, perhaps on the first 10 bytes of the keys in both
tables or on an integer hash of all of the strings? If so you could
join on the exact set difference
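Something like this, as a sketch - table and column names are made up:

CREATE INDEX t1_key_prefix ON t1 (substr(key, 1, 10));
CREATE INDEX t2_key_prefix ON t2 (substr(key, 1, 10));

-- join on the cheap indexed prefix first, then confirm the full key
SELECT t1.*, t2.*
FROM t1
JOIN t2 ON substr(t1.key, 1, 10) = substr(t2.key, 1, 10)
       AND t1.key = t2.key;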
The intuitive thing would be to put pg into a file system.
/Aaron
On Thu, 21 Oct 2004 12:44:10 +0200, Leeuw van der, Tim
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> I guess the difference is in 'severe hacking inside PG' vs. 'some unknown amount of
> hacking that doesn't touch PG code'.
>
> Hacking
I'm driving from Tenafly NJ and going to both sessions. If you're able
to get to the George Washington Bridge (A train to 178th Street [Port
Authority North] and a bus over the bridge), I can drive you down. I'm
not sure right now about the return because I have confused plans to
meet someone.
/Aa
Right - if you split a table into a set of more selective tables, it
can often dramatically change the plan options (e.g., in a single
table, selectivity for a query may be 1% and require an expensive
nested loop, while in the more restrictive table it may match 14% of
the data and use a cheaper scan).
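For illustration only (names made up) - carving out the hot subset:

CREATE TABLE orders_open AS
SELECT * FROM orders WHERE status = 'open';

CREATE INDEX orders_open_cust ON orders_open (customer_id);

-- a predicate matching ~1% of orders may match ~14% of orders_open,
-- and a plain scan of the much smaller table is the cheaper plan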
pg to my mind is unique in not trying to avoid OS buffering. Other
dbmses spend substantial effort to create a virtual OS (task
management, I/O drivers, etc.), both in code and in support. Choosing
mmap seems such a limiting option - it adds OS dependency and limits
kernel developer options (2G li
Makes sense. See the DB2 8.2 info on their new implementation of
cross-column statistics. If this is common and you're willing to
change code, you can fake that by adding an expression index on some
hash function of both columns, and searching for both columns and the
hash.
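A sketch of that trick - names made up; hashtext() is PostgreSQL's
built-in (undocumented) string hash function:

CREATE INDEX t_c1_c2_hash ON t (hashtext(c1 || '#' || c2));

-- search on both columns plus the combined hash, so the planner can
-- use the index and sees the cross-column selectivity
SELECT *
FROM t
WHERE c1 = 'abc'
  AND c2 = 'xyz'
  AND hashtext(c1 || '#' || c2) = hashtext('abc' || '#' || 'xyz');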
- Original Message -
From: "Gabriele Bartolini" <[EMAIL PROTECTED]>
To: "Aaron Werman" <[EMAIL PROTECTED]>;
<[EMAIL PROTECTED]>
Sent: Thursday, October 07, 2004 1:07 PM
Subject: Re: [PERFORM] Data warehousing requirements
> At 13.30 07/10/2004
Consider how the fact table is going to be used, and review hacking it up
based on usage. Fact tables should be fairly narrow, so if there are extra
columns beyond keys and dimension keys consider breaking it into parallel
tables (vertical partitioning).
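A sketch of the vertical split - all names made up:

-- narrow fact table: keys, dimension keys, and measures only
CREATE TABLE sales_fact (
    date_key    int NOT NULL,
    product_key int NOT NULL,
    store_key   int NOT NULL,
    qty         int,
    amount      numeric
);

-- wide, rarely-read columns go to a parallel table on the same keys
CREATE TABLE sales_fact_detail (
    date_key    int NOT NULL,
    product_key int NOT NULL,
    store_key   int NOT NULL,
    raw_record  text
);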
Horizontal partitioning is your friend; esp
I'm not sure I understand your requirement fully. If the same request
is repeatedly done with the same parameters, you could implement a
proxy web server with a cron'ed script to purge stale pages. If
substantially the same data is being summarized, doing your own
summary tables works; if accessed enough,
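A minimal summary-table sketch (names made up), rebuilt by the cron
job instead of summarizing per request:

CREATE TABLE hits_by_day AS
SELECT page_id,
       date_trunc('day', hit_time) AS day,
       count(*) AS hits
FROM page_hits
GROUP BY page_id, date_trunc('day', hit_time);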
Some quick notes:
- Using a side effect of a function to update the
database feels bad to me
- how long does the SELECT into varQueryRecord
md5(upc.keyp
function take / what does its explain look
like?
- There are a lot of non-indexed columns on that
delta master table, such as key
or a big table loaded monthly.)
- Original Message -
From: "Jeff" <[EMAIL PROTECTED]>
To: "Mitch Pirtle" <[EMAIL PROTECTED]>
Cc: "Aaron Werman" <[EMAIL PROTECTED]>; "Scott Kirkwood"
<[EMAIL PROTECTED]>; "Neil Conway" <[EMAI
- Original Message -
From: "Tom Lane" <[EMAIL PROTECTED]>
To: "Aaron Werman" <[EMAIL PROTECTED]>
Cc: "Iain" <[EMAIL PROTECTED]>; "Jim C. Nasby" <[EMAIL PROTECTED]>;
<[EMAIL PROTECTED]>
Sent: Tuesday, September 28, 20
- Original Message -
From: "Tom Lane" <[EMAIL PROTECTED]>
To: "Iain" <[EMAIL PROTECTED]>
Cc: "Jim C. Nasby" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Monday, September 27, 2004 11:17 PM
Subject: Re: [PERFORM] Caching of Queries
> "Iain" <[EMAIL PROTECTED]> writes:
> > I can only te
There is a difference between MySQL and Oracle here.
Oracle, to reduce parse/planner costs, hashes statements to see if it
can match an existing optimizer plan. This is optional, and there are
a few flavors that range from a character-to-character match through
parse tree matches through replacing
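The nearest explicit equivalent in pg is a prepared statement - the
plan is cached per session and looked up by name rather than by
hashing the statement text (names made up):

PREPARE get_order (int) AS
    SELECT * FROM orders WHERE order_id = $1;

EXECUTE get_order(42);  -- reuses the stored plan
EXECUTE get_order(43);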
From: "Harald Lau (Sector-X)" <[EMAIL PROTECTED]>
...
> From: "Mischa Sandberg" <[EMAIL PROTECTED]>
>
> > If your company is currently happy with MySQL, there probably are
> > other (nontechnical) reasons to stick with it. I'm impressed that
> > you'd consider reconsidering PG.
>
> I'd like to seco
> Mark Cotner wrote:
> > The time has come to reevaluate/rearchitect an
> > application which I built about 3 years ago. There
> > are no performance concerns with MySQL, but it would
> > benefit greatly from stored procedures, views, etc.
>
From: "Mischa Sandberg" <[EMAIL PROTECTED]>
> If your
The comparison is actually dead on. If you have lots of write-through /
read-behind cache, RAID 5 can run very quickly, until the write rate
overwhelms the cache - at which point the 4 I/Os per write (read old
data and old parity, write new data and new parity) / 2 per read stops
it. This means that RAID 5 works, except when stressed, which is a bad
paradigm.
Your second server has queuing (load averages are highish), only 2 processes
running, and almost all cycles are idle. You need to track down your
bottleneck. Have you looked at iostat/vmstat? I think it would be useful to
post these, ideally both before and after full vacuum analyze.
/Aaron
By definition, it is equivalent to:
SELECT t1.id, t2.url
  FROM referral_temp t2
  LEFT /*OUTER*/ JOIN d_referral t1 ON t2.url = t1.referral_raw_url
UNION ALL
SELECT null, url
  FROM referral_temp
 WHERE url IS NULL
 ORDER BY 1;
/Aaron
- Original Message -
From: "Joe Conway" <[EMAIL PROTECTED
I agree on not linking and adding non-SAN disk dependency to your DB.
I'm trying to understand your FS reasoning. I have never seen XFS run
faster than ReiserFS in any situation (or for that matter beat any FS
in performance except JFS). XFS has some nifty very-large-file
features, but we're
There are a few things that you can do to help force yourself to be I/O
bound. These include:
- RAID 5 for write-intensive applications, since multiple writes per synch
write is good. (There is a special case for logging or other streaming
sequential writes on RAID 5)
- Data journaling file syste
With fsync, I'm thinking 50 inserts, if autocommitting, is 50 TPS =
~100 I/Os per second (50 WAL writes + checkpoint I/O) = roughly the
I/O rate of a single drive. Huang - are you using a single drive for
pg? If so, there is a safety problem with both the data and the logs
used for recovery on the same drive. If the drive
- Original Message -
From: "Josh Berkus" <[EMAIL PROTECTED]>
To: "Aaron Werman" <[EMAIL PROTECTED]>; "Qing Zhao" <[EMAIL PROTECTED]>;
"Tom Lane" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Tuesday, April 06, 2
- Original Message -
From: "Tom Lane" <[EMAIL PROTECTED]>
To: "Qing Zhao" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Tuesday, April 06, 2004 1:47 AM
Subject: Re: [PERFORM] possible improvement between G4 and G5
> Qing Zhao <[EMAIL PROTECTED]> writes:
> > We have got a G5 64-bit p
CTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, April 04, 2004 5:06 PM
Subject: Re: [PERFORM] single index on more than two coulumns a bad thing?
Hi Aaron,
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of
> Aaron Werman
> Sent: v
Almost any cross dbms migration shows a drop in performance. The engine
effectively trains developers and administrators in what works and what
doesn't. The initial migration thus compares a tuned to an untuned version.
/Aaron
- Original Message -
From: "Josh Berkus" <[EMAIL PROTECTED]>
another thing that I have all over the place is a hierarchy:
index on grandfather_table(grandfather)
index on father_table(grandfather, father)
index on son_table(grandfather, father, son)
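In DDL terms, that hierarchy is:

CREATE INDEX grandfather_ix ON grandfather_table (grandfather);
CREATE INDEX father_ix ON father_table (grandfather, father);
CREATE INDEX son_ix ON son_table (grandfather, father, son);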
Almost all of my indices are composite. Are you thinking about
composite indices with low cardinality leading
Are you talking about
http://www.potentialtech.com/wmoran/postgresql.php#conclusion
- Original Message -
From: "Subbiah, Stalin" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; "Matt Clark" <[EMAIL PROTECTED]>; "Subbiah, Stalin"
<[EMAIL PROTECTED]>; "'Andrew Sullivan'" <[EMAIL PROTECTED]>;
<[
Quick observations:
1. We have an explanation for what's going on, based on the message being
exactly 666 lines long :-)
2. I'm clueless on the output, but perhaps Tom can see something. A quick
glance shows that the strace seemed to run 27 seconds, during which it did:
count | call
------|-----
The original point was about a very slow update of an entire table
with a plan that looped, and over a dozen conditional indices - vs. a
'create as' in a CPU-starved environment. I stand by my statement
about observing the orders-of-magnitude difference. In theory I agree
that the update should be
Bulk updates are generally dogs (not just in pg), so I avoid doing
them by doing faster selects and inserts. You can create a new table
using 'create table as' to produce your target results. This is
really fast - avoiding the row iteration in insert, allowing the
select optimizer to run, and no index
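A sketch of that pattern - names and the transformation are made up:

-- one set-oriented pass instead of a row-at-a-time bulk UPDATE
CREATE TABLE target_new AS
SELECT key,
       CASE WHEN val < 0 THEN 0 ELSE val END AS val
FROM target;

-- index the result, then rename target_new into place
CREATE INDEX target_new_key ON target_new (key);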
Sorry about not chiming in before - I've been too swamped to think. I agree
with most of the points, but a lot of these posts are interesting and seem
to describe systems from an SA perspective to my DBA-centric view.
- Original Message -
From: "Marty Scholes" <[EMAIL PROTECTED]>
To: <[EM