Re: [Wikitech-l] not all tables need to be backed up

2009-03-16 Thread Roan Kattouw
Aryeh Gregor schreef:
> On Sun, Mar 15, 2009 at 3:02 PM, Gerard Meijssen
>  wrote:
>> The one thing missing in this discussion is a risk assessment and the
>> importance given to maintaing our infrastructure availability.
> 
> What's "our"?  Jidanni is talking about backing up his own wiki, not
> any Wikimedia wiki.
> 
On top of that, restoring from dumps with these tables empty doesn't 
break anything. While the rebuild is in progress, features relying on 
them (search, whatlinkshere, category views) will show incomplete 
results, but if that means restoring is faster, I guess that's acceptable.

Roan Kattouw (Catrope)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] not all tables need to be backed up

2009-03-15 Thread Daniel Kinzler
Aryeh Gregor schrieb:
> On Sun, Mar 15, 2009 at 2:43 PM,   wrote:
>> OK, but then the structure of those tables are gone from the dump too,
>> not just their contents. We cannot thus recover from scratch straight
>> from the SQL dump. But perhaps we then can immediately run a
>> maintenance script to recreate the missing tables' structure?
> 
> Then you could just do
> 
> $ mysqldump --ignore-table=my_database.wiki_objectcache
> --ignore-table=my_database.wiki_searchindex my_database > backup.sql
> $ mysqldump -d my_database wiki_objectcache wiki_searchindex >> backup.sql

Or you can make an extra backup with just the structure:

mysqldump -d yourdatabase > structure.sql

or use maintenance/tables.sql to re-create the structure (if you use a table
prefix, you need to run an appropriate search&replace first).

-- daniel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] not all tables need to be backed up

2009-03-15 Thread Aryeh Gregor
On Sun, Mar 15, 2009 at 3:02 PM, Gerard Meijssen
 wrote:
> The one thing missing in this discussion is a risk assessment and the
> importance given to maintaing our infrastructure availability.

What's "our"?  Jidanni is talking about backing up his own wiki, not
any Wikimedia wiki.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] not all tables need to be backed up

2009-03-15 Thread Gerard Meijssen
Hoi,
With all respect, what is the purpose of this theoretical excercise ? You
_may_ be able to ommit selected files when you make a backup. This comes at
a cost. The cost is the time that it will take to recreate the data pr the
loss of that data. This is fairly obvious. It is equally obvious that a
recovery can take a considerable amount of time. When the system is
essentially available for users, this may be not that much of a problem.
When the system is not available for users for an extended period of time,
it may be considered problematic.

The one thing missing in this discussion is a risk assessment and the
importance given to maintaing our infrastructure availability. When downtime
is considered to be acceptable, it can be considered to omit files from a
backup. In all scenarios it makes sense to define a backup and recovery
procedure and test it. I know first hand of backup procedures that were done
regularly and in the end proved to be of no value.
Thanks,
  GerardM

2009/3/15 

> AG> Why don't you just do:
>
> AG> $ mysqldump --ignore-table=my_database.wiki_objectcache
> AG> --ignore-table=my_database.wiki_searchindex my_database
>
> OK, but then the structure of those tables are gone from the dump too,
> not just their contents. We cannot thus recover from scratch straight
> from the SQL dump. But perhaps we then can immediately run a
> maintenance script to recreate the missing tables' structure?
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] not all tables need to be backed up

2009-03-15 Thread Aryeh Gregor
On Sun, Mar 15, 2009 at 2:43 PM,   wrote:
> OK, but then the structure of those tables are gone from the dump too,
> not just their contents. We cannot thus recover from scratch straight
> from the SQL dump. But perhaps we then can immediately run a
> maintenance script to recreate the missing tables' structure?

Then you could just do

$ mysqldump --ignore-table=my_database.wiki_objectcache
--ignore-table=my_database.wiki_searchindex my_database > backup.sql
$ mysqldump -d my_database wiki_objectcache wiki_searchindex >> backup.sql

or something similar to that.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] not all tables need to be backed up

2009-03-15 Thread jidanni
AG> Why don't you just do:

AG> $ mysqldump --ignore-table=my_database.wiki_objectcache
AG> --ignore-table=my_database.wiki_searchindex my_database

OK, but then the structure of those tables are gone from the dump too,
not just their contents. We cannot thus recover from scratch straight
from the SQL dump. But perhaps we then can immediately run a
maintenance script to recreate the missing tables' structure?

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] not all tables need to be backed up

2009-03-14 Thread Aryeh Gregor
On Sat, Mar 14, 2009 at 1:33 AM,   wrote:
> Looking at the mysqldump man page, one finds no perfect options to do
> so, so instead makes one's own script:
>
> $ mysqldump my_database|
> perl -nwle 'BEGIN{$dontdump="wiki_(objectcache|searchindex)"}
> s/(^-- )(Dumping data for table `$dontdump`$)/$1NOT $2/;
> next if /^LOCK TABLES `$dontdump` WRITE;$/../^UNLOCK TABLES;$/;
> print;'

Why don't you just do:

$ mysqldump --ignore-table=my_database.wiki_objectcache
--ignore-table=my_database.wiki_searchindex my_database

Certainly you can skip objectcache.  searchindex can be rebuilt.  So
can all the *links tables, and the redirect table, and probably some
others.  Of course, rebuilding all these tables on backup restore
might take an awfully long time, which you need to weigh against the
convenience of not having to back them up.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] not all tables need to be backed up

2009-03-14 Thread Daniel Kinzler
Chad schrieb:
> On Sat, Mar 14, 2009 at 1:33 AM,   wrote:
>> Gentlemen, it occurred to me that under close examination one finds
>> that when making a backup of one's wiki's database, some of the tables
>> dumped have various degrees of temporariness, and thus though needing
>> to be present in a proper dump, could perhaps be emptied of their
>> values, saving much space in the SQL.bz2 etc. file produced.
>>
>> Looking at the mysqldump man page, one finds no perfect options to do
>> so, so instead makes one's own script:
>>
>> $ mysqldump my_database|
>> perl -nwle 'BEGIN{$dontdump="wiki_(objectcache|searchindex)"}
>> s/(^-- )(Dumping data for table `$dontdump`$)/$1NOT $2/;
>> next if /^LOCK TABLES `$dontdump` WRITE;$/../^UNLOCK TABLES;$/;
>> print;'
>>
>> Though not myself daring to make any recommendations on
>> http://www.mediawiki.org/wiki/Manual:Backing_up_a_wiki#Tables
>> I am still curious which tables can be emptied always,
>> which can be emptied if one is willing to remember to run a
>> maintenance script to resurrect their contents, etc.
>>
>> ___
>> Wikitech-l mailing list
>> Wikitech-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>
> 
> Really, the only 3 that are needed are page, revision and text.
> Of course, to keep old versions of stuff you'll need archive, oldimage
> and filearchive too.

That would be sufficient to keep page content. You would however also want to
keep the user and user_groups tables, probably. and the interwiki table, for it
can not be restored and determins the interpretation of link prefixes. The log,
too, can't be restored, but if you need it is another question. The image table
can generally be restored by looking at the files, though the result may not be
exactly the same as the original.

I think it's better to put it this way: tables with "cache" in their name can
safely be truncated, the same is true for the profiling table (if used at all).
Tables with "link" in their name can always be rebuild, though it may take a
while, same for the searchindex.

-- daniel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] not all tables need to be backed up

2009-03-13 Thread Chad
On Sat, Mar 14, 2009 at 1:33 AM,   wrote:
> Gentlemen, it occurred to me that under close examination one finds
> that when making a backup of one's wiki's database, some of the tables
> dumped have various degrees of temporariness, and thus though needing
> to be present in a proper dump, could perhaps be emptied of their
> values, saving much space in the SQL.bz2 etc. file produced.
>
> Looking at the mysqldump man page, one finds no perfect options to do
> so, so instead makes one's own script:
>
> $ mysqldump my_database|
> perl -nwle 'BEGIN{$dontdump="wiki_(objectcache|searchindex)"}
> s/(^-- )(Dumping data for table `$dontdump`$)/$1NOT $2/;
> next if /^LOCK TABLES `$dontdump` WRITE;$/../^UNLOCK TABLES;$/;
> print;'
>
> Though not myself daring to make any recommendations on
> http://www.mediawiki.org/wiki/Manual:Backing_up_a_wiki#Tables
> I am still curious which tables can be emptied always,
> which can be emptied if one is willing to remember to run a
> maintenance script to resurrect their contents, etc.
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>

Really, the only 3 that are needed are page, revision and text.
Of course, to keep old versions of stuff you'll need archive, oldimage
and filearchive too.

For each table you remove from a dump, that's data you're losing. With
some tables (recentchanges, *links) you can repopulate the data. Others
you can't. I guess it comes down to deciding which data is important for
you to backup.

-Chad

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] not all tables need to be backed up

2009-03-13 Thread jidanni
Gentlemen, it occurred to me that under close examination one finds
that when making a backup of one's wiki's database, some of the tables
dumped have various degrees of temporariness, and thus though needing
to be present in a proper dump, could perhaps be emptied of their
values, saving much space in the SQL.bz2 etc. file produced.

Looking at the mysqldump man page, one finds no perfect options to do
so, so instead makes one's own script:

$ mysqldump my_database|
perl -nwle 'BEGIN{$dontdump="wiki_(objectcache|searchindex)"}
s/(^-- )(Dumping data for table `$dontdump`$)/$1NOT $2/;
next if /^LOCK TABLES `$dontdump` WRITE;$/../^UNLOCK TABLES;$/;
print;'

Though not myself daring to make any recommendations on
http://www.mediawiki.org/wiki/Manual:Backing_up_a_wiki#Tables
I am still curious which tables can be emptied always,
which can be emptied if one is willing to remember to run a
maintenance script to resurrect their contents, etc.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l