[Slony1-general] 2d grid partitioned geographical database with local replication: practical with slony?

Andy Ballingall Thu, 24 Nov 2005 00:08:11 -0800

Hello,

I’m new to Slony, but have been evaluating it to see whether a national database could feasibly be split up so that it runs on a farm of simple servers.

Below I briefly describe the scheme and my tests so far, and then ask my questions:

------------------------------------------------

THE SCHEME

----------

a) The entire country is subdivided into a grid of cells (the size is still to be decided; let’s say 2km square for now).

b) All the objects in the database are located by grid reference.

c) All queries are designed to return results which are geographically close (let's say no more than 2km away from a particular point). No queries need ever return data further than one cell away.

d) Each cell maintains its own database.

e) A cell's database is the 'master' database for all data located within the cell's boundaries

f) The tables in each cell are slaved to the adjoining neighbours (with Slony-I)

g) In order to provide cross border searches (to prevent the problem of people living near the edge of a cell only seeing half the stuff nearby), queries served by the cell use a union of the master tables and the sets slaved from the adjoining cells.

h) Cells are distributed across a server farm. The number of cells on each server depends upon the activity in each cell and the capability of the server. The worst case scenario is that a single cell occupies its own server. To start with, many cells (20 or so) may occupy a single server, but will be migrated to new servers as they become busier.
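
The cell lookup implied by points (a) to (g) can be sketched in a few lines of Python. This is only an illustration under assumptions: grid references are taken to be eastings and northings in metres, cells are identified by a (column, row) index, and the names `cell_for` and `cells_to_search` are hypothetical. Cells on the edge of the country, which have fewer neighbours, are not handled here.

```python
from typing import List, Tuple

CELL_SIZE_M = 2000  # 2km square cells, the provisional size from point (a)

Cell = Tuple[int, int]  # hypothetical cell id: (column, row) in the national grid


def cell_for(easting_m: float, northing_m: float,
             cell_size_m: int = CELL_SIZE_M) -> Cell:
    """Return the cell whose database is master for an object at this position."""
    return (int(easting_m // cell_size_m), int(northing_m // cell_size_m))


def cells_to_search(cell: Cell) -> List[Cell]:
    """Home cell plus its 8 neighbours: the data a query in point (g) unions over."""
    col, row = cell
    return [(col + dc, row + dr) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


home = cell_for(451_250, 207_900)
print(home)                        # (225, 103)
print(len(cells_to_search(home)))  # 9
```

Because every query stays within one cell of the starting point, the home cell's master tables plus the eight slaved sets are all a query ever needs to read.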

What this means in practice is best shown with a diagram:

----------------------------
|        |        |        |
| Cell A | Cell B | Cell C |
|        |        |        |
----------------------------
|        |        |        |
| Cell D | Cell E | Cell F |
|        |        |        |
----------------------------
|        |        |        |
| Cell G | Cell H | Cell I |
|        |        |        |
----------------------------

Considering Cell E:

a) Cell E's database is the master database for information located geographically within cell E.

b) The 8 adjoining cells slave this data

c) Cell E slaves data from all 8 adjoining cells.
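
The subscription pattern for the example grid can be worked out mechanically. The sketch below is only an illustration: the `GRID` layout mirrors the diagram, and the function name `neighbours` is hypothetical. It lists, for each cell, the adjoining cells that would both slave its data and supply slaved data to it.

```python
from typing import Dict, List

GRID = ["ABC", "DEF", "GHI"]  # the 3x3 example grid from the diagram


def neighbours(grid: List[str]) -> Dict[str, List[str]]:
    """Map each cell name to its adjoining cells, diagonals included."""
    result: Dict[str, List[str]] = {}
    for r, row in enumerate(grid):
        for c, name in enumerate(row):
            result[name] = [
                grid[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)           # skip the cell itself
                and 0 <= r + dr < len(grid)     # stay inside the grid rows
                and 0 <= c + dc < len(row)      # stay inside the grid columns
            ]
    return result


adjacent = neighbours(GRID)
print(adjacent["E"])  # ['A', 'B', 'C', 'D', 'F', 'G', 'H', 'I']
print(adjacent["A"])  # ['B', 'D', 'E']
```

Only interior cells such as E have the full 8 neighbours; corner and edge cells have 3 and 5 respectively. Passing a single row, `neighbours(["ABC"])`, gives A with [B], B with [A, C] and C with [B].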

---------------------------------------------------------

MY TESTS SO FAR

---------------

So far, I've manually set up a test case with just the top row of the example above (3 cells: A, B and C in a row).

It works. Add something to A, it appears in B. Add something to B, it appears in A and C. Add something to C, it appears in B.

So far, so good.

---------------------------------------------------------

MY CONCERNS

-----------

1. For cell E, the number of slon threads would be 18 (it seems to be two for each node). Is this within acceptable parameters, or is it a really bad idea? What are the system overheads? In my example of a single server running 20 (not very active) cells, this would inflate to 360 processes.

2. In order to provide 'cross border' local searches, all 'cross border data' is effectively slaved 8 times. Should I be concerned by this?

The aim is that both the web serving and the database for a particular cell are managed by the same server, and that because the cells are small, the local data will easily fit within RAM (allowing for apache and other services) - even with the local slaved copies of adjacent cells' data.

3. Has this been tried before with disastrous consequences?!
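
The arithmetic behind concern 1 can be checked with a short sketch. It takes the "two for each node" figure observed above as an assumption rather than as a documented Slony property, and the function name `slon_processes` is hypothetical.

```python
PROCESSES_PER_NODE = 2  # figure observed in concern 1; an assumption, not a guarantee


def slon_processes(neighbour_count: int,
                   per_node: int = PROCESSES_PER_NODE) -> int:
    """Processes for one cell: the cell itself plus each neighbour, times per_node."""
    return (1 + neighbour_count) * per_node


interior = slon_processes(8)   # a cell like E, with 8 neighbours
print(interior)                # 18
print(20 * interior)           # 360 for 20 interior cells on one server
print(slon_processes(3))       # 8 for a corner cell such as A
```

The figure of 360 is the worst case for 20 cells on one server, since it treats every cell as an interior cell with 8 neighbours.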

Many thanks,

Andy Ballingall

