Re: [HACKERS] VLDB Features

2007-12-20 Thread Josh Berkus
Tom, Sure ... but you'll find that it's not large enough to be useful. Once you remove all the interesting consistency checks such as unique indexes and foreign keys, the COPY will tend to go through just fine, and then you're still stuck trying to weed out bad data without very good tools

Re: [HACKERS] VLDB Features

2007-12-18 Thread Michał Zaborowski
2007/12/16, Tom Lane [EMAIL PROTECTED]: Hannu Krosing [EMAIL PROTECTED] writes: But can't we _define_ such a subset, where we can do a transactionless load ? Sure ... but you'll find that it's not large enough to be useful. Once you remove all the interesting consistency checks such as

Re: [HACKERS] VLDB Features

2007-12-16 Thread Trent Shipley
On Saturday 2007-12-15 02:14, Simon Riggs wrote: On Fri, 2007-12-14 at 18:22 -0500, Tom Lane wrote: Neil Conway [EMAIL PROTECTED] writes: By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY to drop (and log) rows that contain malformed data. That is, rows with too

Re: [HACKERS] VLDB Features

2007-12-16 Thread Hannu Krosing
One fine day, Sat, 2007-12-15 at 01:12, Tom Lane wrote: Josh Berkus [EMAIL PROTECTED] writes: There's no way we can do a transactionless load, then? I'm thinking of the load-into-new-partition which is a single pass/fail operation. Would ignoring individual row errors in

Re: [HACKERS] VLDB Features

2007-12-16 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes: But can't we _define_ such a subset, where we can do a transactionless load ? Sure ... but you'll find that it's not large enough to be useful. Once you remove all the interesting consistency checks such as unique indexes and foreign keys, the COPY will

Re: [HACKERS] VLDB Features

2007-12-16 Thread NikhilS
Hi, On Dec 15, 2007 1:14 PM, Tom Lane [EMAIL PROTECTED] wrote: NikhilS [EMAIL PROTECTED] writes: Any errors which occur before doing the heap_insert should not require any recovery according to me. A sufficient (though far from all-encompassing) rejoinder to that is triggers and CHECK

Re: [HACKERS] VLDB Features

2007-12-15 Thread Simon Riggs
On Fri, 2007-12-14 at 18:22 -0500, Tom Lane wrote: Neil Conway [EMAIL PROTECTED] writes: By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY to drop (and log) rows that contain malformed data. That is, rows with too many or too few columns, rows that result in constraint

Re: [HACKERS] VLDB Features

2007-12-15 Thread Neil Conway
On Tue, 2007-12-11 at 19:11 -0500, Greg Smith wrote: I'm curious what you feel is missing that pgloader doesn't fill that requirement: http://pgfoundry.org/projects/pgloader/ For complicated ETL, I agree that using an external tool makes the most sense. But I think there is still merit in

Re: [HACKERS] VLDB Features

2007-12-15 Thread Pavel Stehule
On 16/12/2007, Neil Conway [EMAIL PROTECTED] wrote: On Tue, 2007-12-11 at 19:11 -0500, Greg Smith wrote: I'm curious what you feel is missing that pgloader doesn't fill that requirement: http://pgfoundry.org/projects/pgloader/ For complicated ETL, I agree that using an external tool makes

Re: [HACKERS] VLDB Features

2007-12-14 Thread Hannu Krosing
One fine day, Tue, 2007-12-11 at 15:41, Neil Conway wrote: On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote: Just so you don't lose sight of it, one of the biggest VLDB features we're missing is fault-tolerant bulk load. I actually had to cook up a version of this for Truviso

Re: [HACKERS] VLDB Features

2007-12-14 Thread Neil Conway
On Fri, 2007-12-14 at 14:48 +0200, Hannu Krosing wrote: How did you do it? Did you enhance the COPY command or was it something completely new? By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY to drop (and log) rows that contain malformed data. That is, rows with too many

Re: [HACKERS] VLDB Features

2007-12-14 Thread Andrew Dunstan
Neil Conway wrote: On Fri, 2007-12-14 at 14:48 +0200, Hannu Krosing wrote: How did you do it? Did you enhance the COPY command or was it something completely new? By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY to drop (and log) rows that contain malformed

Re: [HACKERS] VLDB Features

2007-12-14 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes: By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY to drop (and log) rows that contain malformed data. That is, rows with too many or too few columns, rows that result in constraint violations, and rows containing columns where the data
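The "drop (and log)" behaviour quoted above can be illustrated, for the column-count case only, with a small client-side sketch in Python. This is not the backend feature being proposed in the thread and says nothing about how COPY itself would implement it; the function name `split_malformed`, the tab delimiter, and the simple `str.split` parsing (which ignores COPY's escaping rules) are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def split_malformed(
    lines: Iterable[str],
    expected_columns: int,
    delimiter: str = "\t",
) -> Tuple[List[List[str]], List[Tuple[int, str]]]:
    """Separate well-formed rows from rows with too many or too few columns.

    Hypothetical helper: naive delimiter splitting only, no escape handling.
    Returns (good_rows, rejected), where rejected holds (line_number, raw_line)
    so the dropped rows can be logged.
    """
    good: List[List[str]] = []
    rejected: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        raw = line.rstrip("\n")
        fields = raw.split(delimiter)
        if len(fields) == expected_columns:
            good.append(fields)
        else:
            # Too many or too few columns: drop the row but remember it.
            rejected.append((lineno, raw))
    return good, rejected


if __name__ == "__main__":
    sample = ["1\ta\n", "2\n", "3\tc\td\n", "4\te\n"]
    good_rows, bad_rows = split_malformed(sample, expected_columns=2)
    print("loadable:", good_rows)
    print("rejected:", bad_rows)
```

A filter like this only catches structural problems; constraint violations and data-type input errors, which the quoted message also mentions, can only be detected by the server.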

Re: [HACKERS] VLDB Features

2007-12-14 Thread Neil Conway
On Fri, 2007-12-14 at 18:22 -0500, Tom Lane wrote: If we could somehow only do a subtransaction per failure, things would be much better, but I don't see how. One approach would be to essentially implement the pg_bulkloader approach inside the backend. That is, begin by doing a subtransaction

Re: [HACKERS] VLDB Features

2007-12-14 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes: One approach would be to essentially implement the pg_bulkloader approach inside the backend. That is, begin by doing a subtransaction for every k rows (with k = 1000, say). If you get any errors, then either repeat the process with k/2 until you locate
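The batch-and-halve idea quoted above (a subtransaction for every k rows, retrying with k/2 on error until the bad row is located) might be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the pg_bulkloader code or any backend implementation: `insert_batch` is a hypothetical callable standing in for "run this chunk inside one subtransaction", assumed to be all-or-nothing, and `make_fake_table` is an in-memory stand-in with a unique key on the first column so the sketch runs without a database.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple  # hypothetical row type: any tuple


def load_with_bisection(
    rows: Sequence[Row],
    insert_batch: Callable[[Sequence[Row]], None],
    k: int = 1000,
) -> List[Row]:
    """Load rows in chunks of k; on failure, halve the chunk until the
    offending rows are isolated. Returns the rejected rows.

    Assumes insert_batch either inserts the whole chunk or raises and
    inserts nothing (the role a subtransaction would play).
    """
    rejected: List[Row] = []

    def load(chunk: Sequence[Row]) -> None:
        if not chunk:
            return
        try:
            insert_batch(chunk)
        except Exception:
            if len(chunk) == 1:
                rejected.append(chunk[0])  # located a bad row
                return
            mid = len(chunk) // 2
            load(chunk[:mid])   # retry each half separately
            load(chunk[mid:])

    for start in range(0, len(rows), k):
        load(rows[start:start + k])
    return rejected


def make_fake_table():
    """In-memory stand-in for a table with a unique key on column 0."""
    table: List[Row] = []
    seen = set()

    def insert_batch(chunk: Sequence[Row]) -> None:
        staged = set()
        for row in chunk:
            if row[0] in seen or row[0] in staged:
                raise ValueError(f"duplicate key {row[0]!r}")
            staged.add(row[0])
        # Nothing is committed unless the whole chunk passed the check.
        seen.update(staged)
        table.extend(chunk)

    return table, insert_batch


if __name__ == "__main__":
    table, insert_batch = make_fake_table()
    bad = load_with_bisection(
        [(1, "a"), (2, "b"), (1, "dup"), (3, "c")], insert_batch, k=4
    )
    print("rejected:", bad)
```

With few bad rows the cost stays close to one batch per k rows; each bad row adds roughly log2(k) extra attempts, which is the overhead trade-off the thread is debating.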

Re: [HACKERS] VLDB Features

2007-12-14 Thread Josh Berkus
Tom, I think such an approach is doomed to hopeless unreliability. There is no concept of an error that doesn't require a transaction abort in the system now, and that doesn't seem to me like something that can be successfully bolted on after the fact. Also, there's a lot of bookkeeping

Re: [HACKERS] VLDB Features

2007-12-14 Thread Trent Shipley
On Friday 2007-12-14 16:22, Tom Lane wrote: Neil Conway [EMAIL PROTECTED] writes: By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY to drop (and log) rows that contain malformed data. That is, rows with too many or too few columns, rows that result in constraint

Re: [HACKERS] VLDB Features

2007-12-14 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes: There's no way we can do a transactionless load, then? I'm thinking of the load-into-new-partition which is a single pass/fail operation. Would ignoring individual row errors in this case still cause these kinds of problems? Given that COPY fires

Re: [HACKERS] VLDB Features

2007-12-14 Thread NikhilS
Hi, Another approach would be to distinguish between errors that require a subtransaction to recover to a consistent state, and less serious errors that don't have this requirement (e.g. invalid input to a data type input function). If all the errors that we want to tolerate during a bulk

Re: [HACKERS] VLDB Features

2007-12-14 Thread Tom Lane
NikhilS [EMAIL PROTECTED] writes: Any errors which occur before doing the heap_insert should not require any recovery according to me. A sufficient (though far from all-encompassing) rejoinder to that is triggers and CHECK constraints can do anything. The overhead of having a subtransaction

Re: [HACKERS] VLDB Features

2007-12-13 Thread Markus Schiltknecht
Hello Gregory, Gregory Stark wrote: Oracle is using Direct I/O so they need the reader and writer threads to avoid blocking on i/o all the time. We count on the OS doing readahead and buffering our writes so we don't have to. Direct I/O and needing some way to do asynchronous writes and reads

Re: [HACKERS] VLDB Features

2007-12-12 Thread Dimitri Fontaine
Hi, On Wednesday 12 December 2007, Josh Berkus wrote: I'm curious what you feel is missing that pgloader doesn't fill that requirement: http://pgfoundry.org/projects/pgloader/ Because pgloader is implemented in middleware, it carries a very high overhead if you have bad rows. As little

Re: [HACKERS] VLDB Features

2007-12-12 Thread Simon Riggs
On Tue, 2007-12-11 at 15:31 -0800, Josh Berkus wrote: Simon, we should start a VLDB-Postgres developer wiki page. http://developer.postgresql.org/index.php/DataWarehousing -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com

Re: [HACKERS] VLDB Features

2007-12-12 Thread Markus Schiltknecht
Hi, Josh Berkus wrote: Here's the other VLDB features we're missing: Parallel Query Uh.. this only makes sense in a distributed database, no? I've thought about parallel querying on top of Postgres-R. Does it make sense implementing some form of parallel querying apart from the

Re: [HACKERS] VLDB Features

2007-12-12 Thread Josh Berkus
Markus, Parallel Query Uh.. this only makes sense in a distributed database, no? I've thought about parallel querying on top of Postgres-R. Does it make sense implementing some form of parallel querying apart from the distribution or replication engine? Sure. Imagine you have a 5TB

Re: [HACKERS] VLDB Features

2007-12-12 Thread Markus Schiltknecht
Hi Josh, Josh Berkus wrote: Sure. Imagine you have a 5TB database on a machine with 8 cores and only one concurrent user. You'd like to have 1 core doing I/O, and say 4-5 cores dividing the scan and join processing into 4-5 chunks. Ah, right, thanks for the enlightenment. Heck, I'm definitely

Re: [HACKERS] VLDB Features

2007-12-12 Thread Gavin Sherry
On Wed, Dec 12, 2007 at 08:26:16PM +0100, Markus Schiltknecht wrote: Isn't Gavin Sherry working on this? Haven't read anything from him lately... Me neither. Swallowed by Greenplum and France. Hm.. good for him, I guess! Yes, I'm around -- just extremely busy with a big release at

Re: [HACKERS] VLDB Features

2007-12-12 Thread Joshua D. Drake
Greenplum as well as other Real Life stuff. For those of us here who have no idea what you are talking about can you define what Real Life is like? Joshua D. Drake -- The PostgreSQL Company: Since 1997, http://www.commandprompt.com/

Re: [HACKERS] VLDB Features

2007-12-12 Thread Gregory Stark
Josh Berkus [EMAIL PROTECTED] writes: Markus, Parallel Query Uh.. this only makes sense in a distributed database, no? I've thought about parallel querying on top of Postgres-R. Does it make sense implementing some form of parallel querying apart from the distribution or replication

[HACKERS] VLDB Features

2007-12-11 Thread Simon Riggs
I'm starting work on next projects for 8.4. Many applications have the need to store very large data volumes for both archival and analysis. The analytic databases are commonly known as Data Warehouses, though there isn't a common term for large archival data stores. The use cases for those can

Re: [HACKERS] VLDB Features

2007-12-11 Thread Josh Berkus
Simon. VLDB Features I'm expecting to work on are - Read Only Tables/WORM tables - Advanced Partitioning - Compression plus related performance features Just so you don't lose sight of it, one of the biggest VLDB features we're missing is fault-tolerant bulk load. Unfortunately, I don't

Re: [HACKERS] VLDB Features

2007-12-11 Thread Hannu Krosing
One fine day, Tue, 2007-12-11 at 10:53, Josh Berkus wrote: Simon. VLDB Features I'm expecting to work on are - Read Only Tables/WORM tables - Advanced Partitioning - Compression plus related performance features Just so you don't lose sight of it, one of the biggest VLDB

Re: [HACKERS] VLDB Features

2007-12-11 Thread Simon Riggs
On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote: Simon. VLDB Features I'm expecting to work on are - Read Only Tables/WORM tables - Advanced Partitioning - Compression plus related performance features Just so you don't lose sight of it, one of the biggest VLDB features we're

Re: [HACKERS] VLDB Features

2007-12-11 Thread Josh Berkus
Hannu, COPY ... WITH ERRORS TO ... Yeah, that's a start. or something more advanced, like bulkload which can be continued after crash ? Well, we could also use a loader which automatically parallelized, but that functionality can be done at the middleware level. WITH ERRORS is the most

Re: [HACKERS] VLDB Features

2007-12-11 Thread Neil Conway
On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote: Just so you don't lose sight of it, one of the biggest VLDB features we're missing is fault-tolerant bulk load. I actually had to cook up a version of this for Truviso recently. I'll take a look at submitting a cleaned-up implementation for

Re: [HACKERS] VLDB Features

2007-12-11 Thread Simon Riggs
On Tue, 2007-12-11 at 15:31 -0800, Josh Berkus wrote: Here's the other VLDB features we're missing: Parallel Query Windowing Functions Parallel Index Build (not sure how this works exactly, but it speeds Oracle up considerably) On-disk Bitmap Index (anyone game to finish GP patch?) I

Re: [HACKERS] VLDB Features

2007-12-11 Thread Greg Smith
On Tue, 11 Dec 2007, Josh Berkus wrote: Just so you don't lose sight of it, one of the biggest VLDB features we're missing is fault-tolerant bulk load. Unfortunately, I don't know anyone who's working on it. I'm curious what you feel is missing that pgloader doesn't fill that requirement:

Re: [HACKERS] VLDB Features

2007-12-11 Thread Josh Berkus
Greg, I'm curious what you feel is missing that pgloader doesn't fill that requirement: http://pgfoundry.org/projects/pgloader/ Because pgloader is implemented in middleware, it carries a very high overhead if you have bad rows. As little as 1% bad rows will slow down loading by 20% due to