Tom,
Sure ... but you'll find that it's not large enough to be useful.
Once you remove all the interesting consistency checks such as
unique indexes and foreign keys, the COPY will tend to go through
just fine, and then you're still stuck trying to weed out bad data
without very good tools
2007/12/16, Tom Lane [EMAIL PROTECTED]:
Hannu Krosing [EMAIL PROTECTED] writes:
But can't we _define_ such a subset, where we can do a transactionless
load?
Sure ... but you'll find that it's not large enough to be useful.
Once you remove all the interesting consistency checks such as
On Saturday 2007-12-15 02:14, Simon Riggs wrote:
On Fri, 2007-12-14 at 18:22 -0500, Tom Lane wrote:
Neil Conway [EMAIL PROTECTED] writes:
By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
to drop (and log) rows that contain malformed data. That is, rows with too
On Sat, 2007-12-15 at 01:12, Tom Lane wrote:
Josh Berkus [EMAIL PROTECTED] writes:
There's no way we can do a transactionless load, then? I'm thinking of the
load-into-new-partition which is a single pass/fail operation. Would
ignoring individual row errors for
Hannu Krosing [EMAIL PROTECTED] writes:
But can't we _define_ such a subset, where we can do a transactionless
load?
Sure ... but you'll find that it's not large enough to be useful.
Once you remove all the interesting consistency checks such as
unique indexes and foreign keys, the COPY will
Hi,
On Dec 15, 2007 1:14 PM, Tom Lane [EMAIL PROTECTED] wrote:
NikhilS [EMAIL PROTECTED] writes:
Any errors which occur before doing the heap_insert should not require
any recovery, in my opinion.
A sufficient (though far from all-encompassing) rejoinder to that is that
triggers and CHECK
On Fri, 2007-12-14 at 18:22 -0500, Tom Lane wrote:
Neil Conway [EMAIL PROTECTED] writes:
By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
to drop (and log) rows that contain malformed data. That is, rows with
too many or too few columns, rows that result in constraint
On Tue, 2007-12-11 at 19:11 -0500, Greg Smith wrote:
I'm curious what you feel is missing that pgloader doesn't fill that
requirement: http://pgfoundry.org/projects/pgloader/
For complicated ETL, I agree that using an external tool makes the most
sense. But I think there is still merit in
On 16/12/2007, Neil Conway [EMAIL PROTECTED] wrote:
On Tue, 2007-12-11 at 19:11 -0500, Greg Smith wrote:
I'm curious what you feel is missing that pgloader doesn't fill that
requirement: http://pgfoundry.org/projects/pgloader/
For complicated ETL, I agree that using an external tool makes
On Tue, 2007-12-11 at 15:41, Neil Conway wrote:
On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote:
Just so you don't lose sight of it, one of the biggest VLDB features we're
missing is fault-tolerant bulk load.
I actually had to cook up a version of this for Truviso
On Fri, 2007-12-14 at 14:48 +0200, Hannu Krosing wrote:
How did you do it?
Did you enhance the COPY command or was it something completely new?
By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
to drop (and log) rows that contain malformed data. That is, rows with too many
Neil Conway wrote:
On Fri, 2007-12-14 at 14:48 +0200, Hannu Krosing wrote:
How did you do it?
Did you enhance the COPY command or was it something completely new?
By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
to drop (and log) rows that contain malformed
Neil Conway [EMAIL PROTECTED] writes:
By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
to drop (and log) rows that contain malformed data. That is, rows with
too many or too few columns, rows that result in constraint violations,
and rows containing columns where the data
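For illustration only, the proposed invocation might look like the sketch below. Neither IGNORE ERRORS nor the ERRORS TO variant mentioned later in the thread existed in PostgreSQL at the time, and the table and file names are invented:

    -- Hypothetical syntax, per the proposals in this thread:
    -- drop (and log) malformed rows instead of aborting the load.
    COPY sales FROM '/data/sales.csv' WITH CSV IGNORE ERRORS;

    -- Variant that writes rejected rows to a side file:
    COPY sales FROM '/data/sales.csv' WITH CSV ERRORS TO '/data/sales.rej';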
On Fri, 2007-12-14 at 18:22 -0500, Tom Lane wrote:
If we could somehow only do a subtransaction per failure, things would
be much better, but I don't see how.
One approach would be to essentially implement the pg_bulkloader
approach inside the backend. That is, begin by doing a subtransaction
Neil Conway [EMAIL PROTECTED] writes:
One approach would be to essentially implement the pg_bulkloader
approach inside the backend. That is, begin by doing a subtransaction
for every k rows (with k = 1000, say). If you get any errors, then
either repeat the process with k/2 until you locate
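The control flow Neil describes can be sketched at the SQL level with PL/pgSQL, since each PL/pgSQL EXCEPTION block runs inside a subtransaction. This is only an illustration of the bisecting idea, not the proposed backend implementation; the staging, target, and load_errors tables and the load_range function are invented for the example:

    -- Assumes: staging(id bigint, a text, b text), target(a int, b date),
    -- and load_errors(id bigint, a text, b text, errmsg text).
    CREATE FUNCTION load_range(lo bigint, hi bigint) RETURNS void AS $$
    BEGIN
        BEGIN   -- this EXCEPTION block is one subtransaction
            INSERT INTO target
                SELECT a::int, b::date FROM staging
                WHERE id BETWEEN lo AND hi;
        EXCEPTION WHEN others THEN
            IF lo >= hi THEN    -- a single bad row: log it and continue
                INSERT INTO load_errors
                    SELECT id, a, b, SQLERRM FROM staging WHERE id = lo;
            ELSE                -- otherwise retry each half of the range
                PERFORM load_range(lo, (lo + hi) / 2);
                PERFORM load_range((lo + hi) / 2 + 1, hi);
            END IF;
        END;
    END;
    $$ LANGUAGE plpgsql;

SELECT load_range(1, 1000000) then loads every good row and leaves one load_errors entry per bad row, paying for extra subtransactions only around the failures.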
Tom,
I think such an approach is doomed to hopeless unreliability. There is
no concept of an error that doesn't require a transaction abort in the
system now, and that doesn't seem to me like something that can be
successfully bolted on after the fact. Also, there's a lot of
bookkeeping
On Friday 2007-12-14 16:22, Tom Lane wrote:
Neil Conway [EMAIL PROTECTED] writes:
By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
to drop (and log) rows that contain malformed data. That is, rows with
too many or too few columns, rows that result in constraint
Josh Berkus [EMAIL PROTECTED] writes:
There's no way we can do a transactionless load, then? I'm thinking of the
load-into-new-partition which is a single pass/fail operation. Would
ignoring individual row errors for this case still cause these kinds of
problems?
Given that COPY fires
Hi,
Another approach would be to distinguish between errors that require a
subtransaction to recover to a consistent state, and less serious errors
that don't have this requirement (e.g. invalid input to a data type
input function). If all the errors that we want to tolerate during a
bulk
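To make that distinction concrete, compare the two failures below (illustrative snippets; noisy_check and t are invented). The first is rejected by the int4 input function before any tuple is formed; the second surfaces only from a CHECK constraint, which, as Tom notes below, can run arbitrary user code:

    -- Class 1: bad input data, caught in the datatype input function:
    SELECT 'not-a-number'::int;

    -- Class 2: a CHECK constraint that calls user code; by the time it
    -- fails, arbitrary side effects may already need to be rolled back.
    CREATE FUNCTION noisy_check(x int) RETURNS boolean AS $$
    BEGIN
        -- arbitrary code could go here, e.g. writes to another table
        RETURN x > 0;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TABLE t (x int CHECK (noisy_check(x)));
    INSERT INTO t VALUES (-1);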
NikhilS [EMAIL PROTECTED] writes:
Any errors which occur before doing the heap_insert should not require
any recovery, in my opinion.
A sufficient (though far from all-encompassing) rejoinder to that is that
triggers and CHECK constraints can do anything.
The overhead of having a subtransaction
Hello Gregory,
Gregory Stark wrote:
Oracle is using Direct I/O, so they need the reader and writer threads to avoid
blocking on I/O all the time. We count on the OS doing readahead and buffering
our writes so we don't have to. Direct I/O and needing some way to do
asynchronous writes and reads
Hi,
On Wednesday, December 12, 2007, Josh Berkus wrote:
I'm curious what you feel is missing that pgloader doesn't fill that
requirement: http://pgfoundry.org/projects/pgloader/
Because pgloader is implemented in middleware, it carries a very high
overhead if you have bad rows. As little
On Tue, 2007-12-11 at 15:31 -0800, Josh Berkus wrote:
Simon, we should start a VLDB-Postgres developer wiki page.
http://developer.postgresql.org/index.php/DataWarehousing
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Hi,
Josh Berkus wrote:
Here's the other VLDB features we're missing:
Parallel Query
Uh.. this only makes sense in a distributed database, no? I've thought
about parallel querying on top of Postgres-R. Does it make sense
implementing some form of parallel querying apart from the
Markus,
Parallel Query
Uh.. this only makes sense in a distributed database, no? I've thought
about parallel querying on top of Postgres-R. Does it make sense
implementing some form of parallel querying apart from the distribution
or replication engine?
Sure. Imagine you have a 5TB
Hi Josh,
Josh Berkus wrote:
Sure. Imagine you have a 5TB database on a machine with 8 cores and only one
concurrent user. You'd like to have 1 core doing I/O, and say 4-5 cores
dividing the scan and join processing into 4-5 chunks.
Ah, right, thanks for the enlightenment. Heck, I'm definitely
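Until the executor can do that itself, the effect can only be approximated by hand: one session per core, each scanning its own slice, with the client combining the partial results. A sketch, assuming sales is partitioned by quarter (names invented):

    -- Session on core 1:
    SELECT sum(amount) FROM sales_2007q1;
    -- Session on core 2:
    SELECT sum(amount) FROM sales_2007q2;
    -- ...the client adds up the per-partition sums itself, which is
    -- exactly the coordination a parallel query executor would automate.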
On Wed, Dec 12, 2007 at 08:26:16PM +0100, Markus Schiltknecht wrote:
Isn't Gavin Sherry working on this? Haven't read anything from him
lately...
Me neither. Swallowed by Greenplum and France.
Hm.. good for him, I guess!
Yes, I'm around -- just extremely busy with a big release at
Greenplum as well as other Real Life stuff.
For those of us here who have no idea what you are talking about, can
you define what Real Life is like?
Joshua D. Drake
--
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/
Josh Berkus [EMAIL PROTECTED] writes:
Markus,
Parallel Query
Uh.. this only makes sense in a distributed database, no? I've thought
about parallel querying on top of Postgres-R. Does it make sense
implementing some form of parallel querying apart from the distribution
or replication
I'm starting work on my next projects for 8.4.
Many applications need to store very large data volumes for
both archival and analysis. The analytic databases are commonly known as
Data Warehouses, though there isn't a common term for large archival
data stores. The use cases for those can
Simon,
VLDB Features I'm expecting to work on are
- Read Only Tables/WORM tables
- Advanced Partitioning
- Compression
plus related performance features
Just so you don't lose sight of it, one of the biggest VLDB features we're
missing is fault-tolerant bulk load. Unfortunately, I don't
On Tue, 2007-12-11 at 10:53, Josh Berkus wrote:
Simon,
VLDB Features I'm expecting to work on are
- Read Only Tables/WORM tables
- Advanced Partitioning
- Compression
plus related performance features
Just so you don't lose sight of it, one of the biggest VLDB
On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote:
Simon,
VLDB Features I'm expecting to work on are
- Read Only Tables/WORM tables
- Advanced Partitioning
- Compression
plus related performance features
Just so you don't lose sight of it, one of the biggest VLDB features we're
Hannu,
COPY ... WITH ERRORS TO ...
Yeah, that's a start.
or something more advanced, like a bulk load which can be continued after
a crash?
Well, we could also use a loader which automatically parallelizes, but that
functionality can be done at the middleware level. WITH ERRORS is the
most
On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote:
Just so you don't lose sight of it, one of the biggest VLDB features we're
missing is fault-tolerant bulk load.
I actually had to cook up a version of this for Truviso recently. I'll
take a look at submitting a cleaned-up implementation for
On Tue, 2007-12-11 at 15:31 -0800, Josh Berkus wrote:
Here's the other VLDB features we're missing:
Parallel Query
Windowing Functions
Parallel Index Build (not sure how this works exactly, but it speeds Oracle
up considerably)
On-disk Bitmap Index (anyone game to finish GP patch?)
I
On Tue, 11 Dec 2007, Josh Berkus wrote:
Just so you don't lose sight of it, one of the biggest VLDB features we're
missing is fault-tolerant bulk load. Unfortunately, I don't know anyone
who's working on it.
I'm curious what you feel is missing that pgloader doesn't fill that
requirement:
Greg,
I'm curious what you feel is missing that pgloader doesn't fill that
requirement: http://pgfoundry.org/projects/pgloader/
Because pgloader is implemented in middleware, it carries a very high overhead
if you have bad rows. As little as 1% bad rows will slow down loading by 20%
due to