I pasted this out of our wiki, so the formatting may be off. Comments,
as always, are appreciated.

Andrew


Current Approach

A common problem in the database world is handling revisions to the
database that go with revisions in the software running against this
database. Currently our method is to include upgrade.sql and
downgrade.sql scripts with each software release. These are
automatically applied by the packaging system.

Problem Statement

This will fail when we start using slony since we need to handle DML
differently from DDL and DCL. We also need a way to apply slonik
scripts. Ordering matters in the application of these scripts.

After talking about it for a while, we agreed that developers want a
way to apply their updates without stepping on each other's toes while
in the process of developing and testing their work.

Design

Interface

uply [-f target] [-i] [-U pguser] [-h pghost] [-p pgport] [-d
pgdatabase] [--cluster clustername]
dply [-f target] [-i] [-U pguser] [-h pghost] [-p pgport] [-d
pgdatabase] [--cluster clustername]

uply forces an upgrade application of plys. dply forces a downgrade
application of plys.

-f    Optional. Defaults to the current working directory. Specifies
the target to be upgraded or downgraded to. This may be either a full
or relative path, and may be either a directory or a file.

-i    Optional. Interactive mode. If set, ask questions; otherwise
assume that the user "knows what she is doing".

-U -h -p -d    Optional. As for psql and other PostgreSQL command-line
utilities.

--cluster    Optional. Defaults to the database name. Specifies the
name of the slony cluster to work with. This should have a one-letter
short form that conforms with other similar tools. Gotta figure out
what those are, though...

--trust-me-i-know-what-im-doing    Optional, and not recommended.
When applying DDL plys, skip the per-node BEGIN; ... ROLLBACK
pre-EXECUTE validation test.
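A minimal sketch of this interface using Python's argparse (assuming
the tool is implemented in Python, as the libpq-based connector below
suggests; the function name and dest names are mine):

```python
import argparse
import os

def build_parser(prog):
    # prog is "uply" or "dply"; which name the binary is invoked by
    # decides upgrade vs. downgrade. add_help=False frees -h for pghost,
    # matching the psql convention.
    parser = argparse.ArgumentParser(prog=prog, add_help=False)
    parser.add_argument("-f", dest="target", default=os.getcwd(),
                        help="directory or ply to upgrade/downgrade to")
    parser.add_argument("-i", dest="interactive", action="store_true")
    parser.add_argument("-U", dest="pguser")
    parser.add_argument("-h", dest="pghost")
    parser.add_argument("-p", dest="pgport")
    parser.add_argument("-d", dest="pgdatabase")
    parser.add_argument("--cluster", dest="cluster",
                        help="slony cluster name; defaults to the database name")
    parser.add_argument("--trust-me-i-know-what-im-doing",
                        dest="trust_me", action="store_true")
    return parser
```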

Limitations

   * When using this to manage a replicated database, it is assumed
that the connection information presented is for the origin database.
We should probably verify this upon connection.
   * Since we're using a Python connector which is based on libpq,
we will auto-magically respect the standard PostgreSQL environment
variables, including the .pgpass file for handling passwords.
   * We are not trying to deal cleverly with databases with more than
one slony replication cluster in them.
   * We are not going to deal with the case where various sets have
different origins.
   * We assume that this is run on the same machine that is
currently running the slons, and that we can connect to every
database in the cluster.
   * Aside from generating the slonik preamble, we are not going to
try and auto-generate slonik scripts that do anything more complicated
than EXECUTE SCRIPT. At least not initially. Maybe we can get more
clever later?
   * We will not (yet) try to be clever about detecting changes to
plys. Alfred floated the idea of using the SVN id tag to detect if a
file had been changed since it was last applied and then forcing a
downgrade/upgrade cycle. That seems like a lot of code for a corner
case. Alfred and Long agreed that it's probably a good idea to create
a convention instead. Do not edit files after they're committed unless
it will cause inefficiencies in the application to the production
database. Instead, create a new file. If you are forced to edit a
committed file, then email the dev list.
   * Along the lines of not being clever, we assume there is only
one set and that's the one we're supposed to work with. If there is
more than one set floating around and there's a set 1, we pay
attention to that; otherwise, we throw an error?
   * We will not assume the existence of a node 1. The whole point of
increasing availability by replicating is that we don't have to rely
on the existence of a given database.

Data Structure

Each release will include a directory that has the same name as the
full release tag. Every file in this directory is a ply (we'd call it
a patch, but patches already have a very clear definition that
doesn't quite match; similarly, script and file are just too vague).
Each
directory must contain all the plys to be applied for the version it
represents. The release may include directories of plys from prior
releases in the same parent directory. The plys may have an arbitrary
name, but must end with a suffix of .dml.sql, .ddl.sql, .dcl.sql, or
.slonik. Ply names should incorporate the bug number they're
addressing and should briefly describe what they do.

   * /my/base/directory
         o 3.10.0
               + create_foo_23451.ddl.sql
               + populate_foo_23451.dml.sql
               + alter_bar_add_column_reference_foo_23451.ddl.sql
               + update_bar_reference_foo_23451.dml.sql
               + alter_bar_column_not_null_23451.ddl.sql
               + subscribe_foo_23451.slonik
               + cleanup_some_data_migration_stuff_23451.ddl.sql
               + fix_bug_24341.ddl.sql -- these are poorly chosen
names, but hey, it's an example...
               + fix_bug_24341.dml.sql
               + fix_bug_24341.slonik
               + drop_broken_node_30031.slonik
         o 3.10.1
               + another_table_29341.ddl.sql

Inside the script, we add some semantics to what are usually comments.
An example is probably the best way to show this.

-- alter_bar_column_not_null_23451.ddl.sql
-- Witty comment about why this column needs to be not null.
--dep update_bar_reference_foo_23451.dml.sql
ALTER TABLE bar ALTER COLUMN foo_id DROP NOT NULL;
--upgrade
ALTER TABLE bar ALTER COLUMN foo_id SET NOT NULL;

At the top of the script, before any line that isn't either a comment
or whitespace, you can have zero or more comments of the form --dep
<filename> in SQL or #dep <filename> in slonik scripts. This is how
you define which other files the current file depends on. All files
for a given version depend on all files of all previous versions.
Filenames are all characters following the space after dep until the
end of the line, not including any whitespace at the tail of the
line. Don't use filenames that end with whitespace; it's annoying.

I don't see any need to get even more restrictive and disallow
whitespace in filenames. This would allow brief comments on the same
line. More involved comments will take more than just a single line
anyway. Thoughtful selection of filenames should eliminate the need
for brief comments, and tab completion eliminates the annoyance of
typing them.

The other additional semantic is the --upgrade or #upgrade tag. This
defines when the downgrade section ends and the upgrade begins.
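A sketch of how these header semantics might be parsed (the function
name and return shape are my own; the comment_prefix argument covers
the # style used in slonik scripts):

```python
def parse_ply(text, comment_prefix="--"):
    """Split a ply into (deps, downgrade_sql, upgrade_sql).

    Deps are only recognized before the first line that is neither a
    comment nor whitespace; the --upgrade tag separates the downgrade
    section (which comes first) from the upgrade section.
    """
    deps = []
    downgrade, upgrade = [], []
    section = downgrade
    in_header = True
    dep_tag = comment_prefix + "dep "
    upgrade_tag = comment_prefix + "upgrade"
    for line in text.splitlines():
        stripped = line.strip()
        if in_header:
            if stripped.startswith(dep_tag):
                # The filename is everything after the tag, minus any
                # trailing whitespace, per the rule above.
                deps.append(stripped[len(dep_tag):].rstrip())
                continue
            if stripped == "" or stripped.startswith(comment_prefix):
                if stripped == upgrade_tag:
                    in_header = False
                    section = upgrade
                continue
            in_header = False  # first real statement ends the header
        if stripped == upgrade_tag:
            section = upgrade
            continue
        section.append(line)
    return deps, "\n".join(downgrade), "\n".join(upgrade)
```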

Finally, we need to add a table in the database which lists files applied.

CREATE SCHEMA _plyers;
ALTER SCHEMA _plyers OWNER TO pgsql;

CREATE TABLE _plyers.ply
(   release text  -- reference dbinfo.version
,   name text
,   uplied_on timestamptz default now()
,   primary key (release, name)
);
ALTER TABLE _plyers.ply OWNER TO pgsql;

Algorithm

Determining the Target

The parameter passed to -f must be either a relative or absolute path
to either a ply or a directory. Obviously, it's an error to pass -f
something that doesn't exist.

A target must consist of a version tag and may include a ply. If there
is no -f parameter, then the cwd is assumed to be the parameter. If
the parameter is a directory, then the name of that directory is
inspected. If it matches \d+\.\d+\.\d+ (e.g. 3.10.0 or 3.10.1 or
3.10.2) then this is the target version tag. If it doesn't match this
pattern, then look for sub-directories within this directory which do
match this pattern. If no sub-directories that match the pattern are
found, then fail. In the event of an uply, the highest version found
(sorted by dotted numeric, not alphabetically) is the target version
tag.
For dply, the target will be the version immediately preceding the
current version (as obtained from the database). This does not support
alpha/beta style tags since there is no reasonable way of defining the
application order.
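The dotted-numeric sort described above could look like this (helper
names are mine):

```python
import re

VERSION_RE = re.compile(r"^\d+\.\d+\.\d+$")

def version_key(tag):
    # "3.10.0" sorts after "3.9.1" numerically, though not alphabetically.
    return tuple(int(part) for part in tag.split("."))

def highest_version(candidates):
    """Pick the uply target version tag from candidate directory names."""
    tags = [c for c in candidates if VERSION_RE.match(c)]
    if not tags:
        raise ValueError("no directories matching \\d+\\.\\d+\\.\\d+ found")
    return max(tags, key=version_key)
```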

If the -f parameter is a file then the version tag must be the name of
the directory in which that file resides.

Deciding Between Upgrade and Downgrade

If the binary is called by the uply name, we are upgrading. If the
binary is called by the dply name, we are downgrading.

The current version and file set are obtained from the database. The
current version is the largest _plyers.ply.release (sorted by dotted
numeric, not alphabetically). If the target version is higher than our
current version, we had better be uplying. If the target is a lower
version than our current version, we had better be dplying.
agh: hmmm, it looks like our schema kind of sucks. Maybe we need a
column each for major_version, minor_version and patch_level?

If we're upgrading and there is no filename involved then we need to
uply all the plys necessary to achieve the target version (including
all the plys for that version). If we're downgrading, then we need to
dply all the plys necessary to achieve the target version (do not
downgrade any of the patches for that version).

When file names are involved then an upgrade means to apply all the
plys necessary to achieve the version prior to the target version,
plus any files from the target version upon which the target ply
depends (recursively) followed by the target ply itself, of course.
For a downgrade the opposite effect is desired: downgrade all the
versions greater than the target version, then recursively dply any
ply which depends on the target ply followed by dplying the target
ply.

Neither upgrading nor downgrading a ply will have any effect on other
plys unless there is an explicit dependency defined between that ply
and the ply being upgraded or downgraded. Dependencies are processed
recursively.
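A sketch of the recursive dependency walk, assuming the --dep lines
for each ply have already been parsed into a mapping (names are mine;
for dply the same walk runs over the reversed "depended on by"
mapping):

```python
def deps_closure(ply, deps, _seen=None):
    """Return the set of plys that `ply` depends on, recursively.

    `deps` maps each ply name to the list of names from its --dep
    lines.
    """
    if _seen is None:
        _seen = set()
    for dep in deps.get(ply, []):
        if dep not in _seen:
            _seen.add(dep)
            deps_closure(dep, deps, _seen)  # recurse into the dep's deps
    return _seen
```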

The process of upgrading or downgrading is incremental by patch
number, then by minor version number, finally by major version number.

For example, to upgrade from an existing version to a new version, the
update tool checks to see what the current version of the database is
as well as seeing what plys have been applied. If there are more plys
to be applied for the current version, it applies them. It then looks
for the next revision up by incrementing the patch number. If that
doesn't exist, it resets the patch number to zero and increments the
minor version and checks again. If that doesn't exist, it sets the
minor version and patch number to zero and increments the major
version. If that doesn't exist, then we're done. Downgrades are the
same except that they decrement instead of increment.
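The upgrade half of that walk could be sketched like this; `existing`
is the set of version directories shipped with the release (the
function name is mine):

```python
def next_version(current, existing):
    """Step from `current` to the next version present in `existing`.

    Tries patch + 1, then minor + 1 (patch reset to 0), then major + 1
    (minor and patch reset to 0). Returns None when no candidate
    exists, i.e. the upgrade is done.
    """
    major, minor, patch = (int(p) for p in current.split("."))
    for candidate in ("%d.%d.%d" % (major, minor, patch + 1),
                      "%d.%d.0" % (major, minor + 1),
                      "%d.0.0" % (major + 1,)):
        if candidate in existing:
            return candidate
    return None
```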

As each upgrade file is applied, the file name is inserted into the
_plyers.ply table. Downgrades delete the file name out of the
_plyers.ply table once the downgrade has been applied.

agh: hmm, we could instead have a dply_on timestamptz column that's
null by default. Then we could track both uplys and dplys over time.
Dunno if there's much real value in this though. YAGNI?

Applying Different Types of Changes

The method of application for updates varies depending on whether the
database is replicated or not. It also varies depending on the nature
of the update, as determined by its suffix. A database is assumed to
be replicated if it has a _cluster schema.

dml
   This is the easiest category of changes. The update tool needs
only to connect to the origin, issue a BEGIN statement, send all the
DML statements from the ply and a COMMIT if they succeed. Otherwise it
should issue a ROLLBACK and abort the upgrade/downgrade process with
an appropriate error message.
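A sketch of the DML case using the Python DB-API (psycopg or similar;
with autocommit off, BEGIN is implicit). Demonstrated here with a stub
connection rather than a real libpq connection; the stub classes are
purely illustrative:

```python
def apply_dml(conn, statements):
    """Run all DML statements from a ply in one transaction on the origin.

    Commits only if every statement succeeds; otherwise rolls back and
    re-raises so the caller can abort the uply/dply with an error.
    """
    cur = conn.cursor()
    try:
        for stmt in statements:
            cur.execute(stmt)
    except Exception:
        conn.rollback()
        raise
    conn.commit()


class StubCursor:
    # Records executed SQL instead of talking to a server.
    def __init__(self, log):
        self.log = log

    def execute(self, stmt):
        if "boom" in stmt:
            raise RuntimeError("statement failed")
        self.log.append(stmt)


class StubConnection:
    def __init__(self):
        self.executed, self.state = [], None

    def cursor(self):
        return StubCursor(self.executed)

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"
```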

ddl / dcl
   These changes need to be applied globally to the cluster. The
update tool must connect to all the databases in the cluster, issue a
BEGIN statement, run the DDL / DCL script and see if it completes,
finally issue a ROLLBACK. This is to verify that it can reasonably be
expected to succeed on all the nodes in the cluster. Abort with a
suitable error message if the script doesn't apply to all members in
the cluster. We might want to be able to skip the verification step by
having a --trust-me-i-know-what-im-doing parameter. Once the scripts
are verified, they are applied via slonik EXECUTE SCRIPT. The slonik
preamble (cluster name, and connection info for all nodes) should be
drawn from the Slony cluster schema in the origin database.
   Long said DDL has two cases: 1. altering subscribed tables, which
needs to run via "execute script"; 2. creating new tables and
modifying/creating stored procedures, which need to be executed
directly on all databases. We need to distinguish those two cases
with different suffixes.
     agh: Why? I agree that ALTERs need to be handled by EXECUTE
SCRIPT, and obviously so do TRUNCATEs, but I don't see any reason not
to use EXECUTE SCRIPT to handle other DDL statements as well.
     Long: See http://cbbrowne.com/info/addthings.html for details. I
tried to add a new table via execute script; it never adds the new
table to the slaves.
     agh: Yeah, the locking actually is a pretty good reason not to
do it this way when we don't have to. I guess we need to have three
categories of SQL. DML (applies only to master server), DDL (that is
applied to each server directly) and DDL (that gets applied through
slonik EXECUTE SCRIPT). That suggests to me that ddl / dml tags aren't
quite what we're looking for. Any suggestions for an alternative
convention?
     agh: I think we might have to pay some locking tax. Consider if
we create a new table that REFERENCEs a table which is already under
replication. This will create triggers on the replicated table,
however slony does some [clever stuff] with triggers. We certainly
don't want to step on slony's toes with regards to trigger management.
We could specify that REFERENCE constraints must always be added by
ALTER scripts and that CREATEs are never allowed to contain
REFERENCEs. However this still seems kinda awkward. The more I think
about it, the less happy I am with trying to make a distinction here.
Since we should only really be uplying or dplying during a maintenance
window, I think that we should just pay the lock tax. If at some time
we actually need a way to avoid locking tax, that's when we should add
that capability. YAGNI.
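The per-node validation pass described above, sketched with the same
stub-connection approach used for the DML case (a real implementation
would open a libpq connection to every node; the stub classes are
illustrative only):

```python
def validate_ddl(connections, script):
    """Dry-run a DDL/DCL script on every node, rolling back each time.

    Returns a list of (node, error) pairs; an empty list means the
    script can reasonably be expected to apply everywhere.
    """
    failures = []
    for node, conn in connections.items():
        cur = conn.cursor()
        try:
            cur.execute(script)
        except Exception as exc:
            failures.append((node, str(exc)))
        finally:
            # Always roll back: this pass only verifies, never applies.
            conn.rollback()
    return failures


class DryRunCursor:
    def __init__(self, fail):
        self.fail = fail

    def execute(self, script):
        if self.fail:
            raise RuntimeError("does not apply on this node")


class DryRunConnection:
    def __init__(self, fail=False):
        self.fail = fail
        self.rolled_back = False

    def cursor(self):
        return DryRunCursor(self.fail)

    def rollback(self):
        self.rolled_back = True
```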

slonik
   These are intended to manipulate the cluster directly. For example,
creating a new set, adding some tables and sequences, and then
subscribing a bunch of nodes to it. The update tool will generate the
preamble for this.

Implementation protocols/rules

  1. The main set number is 1.
  2. The temporary file name will be
/tmp/ddl-<pid>-<timestamp>-<plyname> (including extension). This file
must be removed after successful execution; if execution fails, we
want to leave it around to facilitate debugging.
  3. Given that that file is not intended for concurrency control, do
we want a lock? Seems to me there's no reason to allow two of these
processes to run at the same time.
     agh: Unless of course they're running against different
databases / clusters / whatever. If we're gonna have a lock file, then
it should be specific to the cluster we're working on. Since this can
be run over the network from any arbitrary client, the Right Way to
lock it might be inside the database. Do we want to bother?

  4. Temporary set (to hold new tables) number is arbitrary, but
should probably be based on the PID.
_______________________________________________
Slony1-general mailing list
[email protected]
http://lists.slony.info/mailman/listinfo/slony1-general
