On Sun, Jul 27, 2008 at 10:37:34AM +0100, Simon Riggs wrote: > > On Sat, 2008-07-26 at 11:03 -0700, Joshua D. Drake wrote: > > > 2. We have no concurrency which means, anyone with any database over 50G > > has unacceptable restore times. > > Agreed. > > Also the core reason for wanting -w > > > 3. We have to continue develop hacks to define custom utilization. Why > > am I passing pre-data anything? It should be automatic. For example:
[adding hackers for discussion] On Sat, Jul 26, 2008 at 01:56:14PM -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > I want to dump tables separately for performance reasons. There are > > documented tests showing 100% gains using this method. There is no gain > > adding this to pg_restore. There is a gain to be had - parallelising > > index creation, but this patch doesn't provide parallelisation. > > Right, but the parallelization is going to happen sometime, and it is > going to happen in the context of pg_restore. So I think it's pretty > silly to argue that no one will ever want this feature to work in > pg_restore. > > To extend the example I just gave to Stephen, I think a fairly probable > scenario is where you only need to tweak some "before" object > definitions, and then you could do > > pg_restore --schema-before-data whole.dump >before.sql > edit before.sql > psql -f before.sql target_db > pg_restore --data-only --schema-after-data -d target_db whole.dump > > which (given a parallelizing pg_restore) would do all the time-consuming > steps in a fully parallelized fashion. A few thoughts about pg_restore performance: To take advantage of non-logged copy, the table create and data load should be in the same transaction. To take advantage of file and buffer cache, it would be better to create indexes immediately after loading table data. Many tables will be small enough to fit in cache on and this will avoid re-reading them for index builds. This is more advantagious with more indexes on one table. There may also be some filesytem placement benefits to building the indexes for a table immediately after loading the data. Creating constraints immediately after loading data also would benefit from warm buffer and file caches. Doing this this is complicated by the need for indexes and data in the referenced tables to exist first. It seems that a high performance restore will want to procede in a different order than the current sort order or that proposed by the before/data/after patch. - The simplest unit of work for parallelism may be the table and its "decorations", eg indexes and relational constraints. - Sort tables by foreign key dependency so that referenced tables are loaded before referencing tables. - Do table creation and data load together in one transaction to use non-logged copy. Index builds, and constraint creation should follow immediately, either as part of the same transaction, or possibly parallelized themselves. Table creation, data load, index builds, and constraint creation could be packaged up as the unit of work to be done in a subprocess which either completes or fails as a unit. The worker process would be called with connection info, a file pointer to the data, and the DDL for the table. pg_restore would keep a work queue of tables to be restored in FK dependency order and also do the other schema operations such as functions and types. -dg -- David Gould [EMAIL PROTECTED] 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers