On 11/25/09 10:50 , "Simon Slavin" <slav...@bigfraud.org> wrote:

>  The message is that if you are short of
> space it is already too late for any software to cope with the problem.
> 

I disagree. It all depends on where you set the threshold for "short of
space". To give you a trivial example, if I set the threshold to 2GB on *my*
machine for *my* application, then I will never be at risk.

But the correct threshold depends on several factors:

- your host platform, mainly its OS, but also any services running on it.
- your application usage patterns, which I cannot speculate on.
- the internal needs of SQLite during space scavenging, which are knowable
though I don't know them. They themselves might depend on your application
usage patterns.

The threshold might be difficult to determine, and might not even be a
constant, but a function of your current dataset size. For example, setting
the threshold to triple your current database size might be enough - or
complete overkill.

Here is how I would tackle this issue, through experimentation. I would start
by determining what kind of function of the dataset size the threshold is.
Whatever it is, it can be approximated by an affine function for small
dataset sizes:

T(N) = a+b*N

Where T is the threshold, and N is the dataset size. We first need an upper
bound for the constant a. Note that all we need is an upper bound.

More generally speaking, you can always write:

T(N) = T(0) + T'(N), where T'(0) = 0, so that a = T(0).

1- To assess a = T(0), first build a small -but not empty- dataset, and let
your system and your application run*. Then artificially deplete the
available disk space to zero, for example by storing a dummy file on the
volume. Then run your scavenging procedure. If it runs OK, then you can use
a = 0 (though I'd still use some value > 0). But it's possible (likely?) that
it will fail because it doesn't have enough disk space. So free increasing
amounts of disk space until your scavenging procedure succeeds (you can use a
dichotomy, i.e. a binary search, for example). When it succeeds, the initial
free disk space you had to set aside can be used as T0, an upper bound for
T(0). A rough sketch of this probing loop follows the footnote below.

(*) this procedure assumes your application is the only one depleting disk
space. If that's not the case, then you need to take the other consumers
into consideration.
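
To make step 1 concrete, here is a rough Python sketch of the probing loop.
It rests on assumptions that are mine, not necessarily yours: the "scavenging
procedure" is taken to be a plain VACUUM, a filler file on the same volume is
an acceptable way to simulate disk exhaustion, and the file names and the
16 KB probing granularity are arbitrary placeholders.

import os
import shutil
import sqlite3

DB_PATH = "test.db"          # hypothetical path: point this at a copy of your database
FILLER_PATH = "filler.bin"   # dummy file used to soak up free space
STEP = 16 * 1024             # probing granularity, an arbitrary choice

def fill_disk_except(reserve_bytes, chunk=1 << 20):
    """Consume all free space on the volume except roughly reserve_bytes."""
    volume = os.path.dirname(os.path.abspath(DB_PATH))
    free = shutil.disk_usage(volume).free
    to_write = max(0, free - reserve_bytes)
    block = b"\0" * chunk
    with open(FILLER_PATH, "wb") as f:
        written = 0
        while written < to_write:
            n = min(chunk, to_write - written)
            f.write(block[:n])
            written += n

def scavenge_succeeds():
    """Run the scavenging procedure (here assumed to be a VACUUM) and report success."""
    con = sqlite3.connect(DB_PATH)
    try:
        con.execute("VACUUM")
        return True
    except sqlite3.OperationalError:   # typically "database or disk is full"
        return False
    finally:
        con.close()
        if os.path.exists(FILLER_PATH):
            os.remove(FILLER_PATH)     # give the space back before the next probe

def upper_bound_for_T0(max_reserve=1 << 30):
    """Bisect the smallest reserve of free space at which scavenging still succeeds."""
    lo, hi = 0, max_reserve            # assumes scavenging succeeds with max_reserve free
    while hi - lo > STEP:
        mid = (lo + hi) // 2
        fill_disk_except(mid)
        if scavenge_succeeds():
            hi = mid                   # worked with mid bytes free: try with less
        else:
            lo = mid                   # failed: it needs more free space than mid
    return hi                          # an upper bound for T(0)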

2- Of course, experimenting with a small dataset is not good enough, since
you want to handle a situation that by definition occurs with large datasets!
So repeat step 1 with a series of datasets of increasing sizes: N1, N2, N3,
N4... (I would typically double the size at each step). A sketch of this
measurement loop follows below.

Using N1, you will get an upper bound T1 for T(N1).
Using N2, you will get an upper bound T2 for T(N2).

And so on.
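
Reusing the step 1 probe, the step 2 measurement loop might look like the
sketch below. build_dataset() is a stand-in I made up (1 KB random blobs in a
single table); you would replace it with your application's real workload.

def build_dataset(n_rows):
    """Hypothetical data generator: replace with your application's real workload."""
    con = sqlite3.connect(DB_PATH)
    con.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, payload BLOB)")
    con.execute("DELETE FROM t")
    con.executemany("INSERT INTO t (payload) VALUES (?)",
                    ((os.urandom(1024),) for _ in range(n_rows)))
    con.commit()
    con.close()

def measure_thresholds(start_rows=1000, steps=5):
    """Double the dataset size at each step and record (N, T) points."""
    points = []
    n_rows = start_rows
    for _ in range(steps):
        build_dataset(n_rows)
        dataset_bytes = os.path.getsize(DB_PATH)   # N, measured on disk
        threshold = upper_bound_for_T0()           # T, via the step 1 probe
        points.append((dataset_bytes, threshold))
        n_rows *= 2
    return points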

Eventually, you'll have a set of points for a working T(N). You can fit a
function to that set of points (a small fitting sketch follows the list
below):

- T1 will let you determine whether the threshold can be constant or whether
it needs to include some multiple of the dataset size.

- T2 will let you determine whether an affine function is enough to
represent T(N) or whether you need something more sophisticated.

- T3 and the others will let you see whether a low-order polynomial function
is enough or whether you need to go to an exponential function. I refuse to
consider the possibility that an exponential function would not even be
enough :-)
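
One way to do the fitting with nothing but the standard library is a plain
least-squares line plus a residual check; the 10% tolerance below is an
arbitrary figure for illustration.

def fit_affine(points):
    """Least-squares fit of T = a + b*N over the measured (N, T) points."""
    n = len(points)
    sx = sum(N for N, _ in points)
    sy = sum(T for _, T in points)
    sxx = sum(N * N for N, _ in points)
    sxy = sum(N * T for N, T in points)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

def affine_is_enough(points, tolerance=0.10):
    """True if every measured threshold lies within tolerance of the affine fit."""
    a, b = fit_affine(points)
    return all(abs((a + b * N) - T) <= tolerance * T for N, T in points)

If affine_is_enough() comes back False, I would try a low-order polynomial
fit (numpy.polyfit does it in one call) before even thinking about an
exponential.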

Of course, how far you want to push all this depends on how certain you want
to be of your system's resilience under low-disk-space conditions.

This is how I would do it. Did I miss anything?

Jean-Denis
