DaB. wrote:

I am following up on a discussion here in October:

>> I run a script that downloads 1200 MB every night
> 
> if you do this, please save the data at 
> 
> /mnt/user-store/
> 
> (create a directory there). So every user can use the data and it 
> only has to be downloaded once.

Since I am becoming involved with statistics too, I have set up such a 
scheme in /mnt/user-store/stats.  Data files from 1 October 2008 
onward are currently available (emijrp asked whether I could also get 
older files, which should be doable, but I haven't looked into it yet). 
I still have to fine-tune the update process, but basically a cron task 
will take care of it at least once a day (probably more often, but I 
first have to see when the original files are actually updated).
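For illustration, the cron task could look roughly like the entry below; the script name, schedule, and log path are assumptions for the sketch, not the actual setup:

```shell
# Hypothetical crontab entry: run the update script every night at 03:30
# and append its output to a log in the same directory.
# m  h  dom mon dow  command
30 3 * * * /mnt/user-store/stats/update-stats.sh >> /mnt/user-store/stats/update.log 2>&1
```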

Let me know if anyone else is interested in using this data.

> Perhaps there is a better way (rsync or something) to get the data from the 
> source.

I use wget; it will not download a file twice unless it has been 
modified (which should not happen). Also, the files are already 
gzipped, so compression would not be of much use here. Even though 
rsync is a better solution on paper, all in all I don't think it would 
improve the situation much here.
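Concretely, the wget behaviour described above comes from timestamping mode; a sketch, with a placeholder URL since the actual source is not named in this post:

```shell
# With -N (--timestamping), wget compares the remote Last-Modified
# time against the local file's mtime and skips the transfer when the
# local copy is already up to date. The URL below is hypothetical.
wget -N "http://example.org/stats/pagecounts-20081001-000000.gz"
# Running the same command again transfers nothing unless the remote
# file has changed, which is why nightly re-runs stay cheap.
```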

Currently, the directory contains 112 GB, growing by about 1.2 GB 
every day. So far this is not a problem (2.5 TB are currently available 
in user-store), but I'd like to know at what point it would start to be 
considered "too big". What do the admins think?

On the main statistics server of the WMF, Erik Zachte is developing 
scripts to compact these individual hourly files into daily files, 
reducing the size of the data by about half; this could also be used here.
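Such hourly-to-daily compaction could be sketched as below, assuming a whitespace-separated "project page count bytes" line format (an assumption; the real files may differ), with sample data standing in for the actual hourly dumps:

```shell
# Work in a scratch directory and fabricate two hourly files that
# share a page, as the real dumps would across the hours of a day.
cd "$(mktemp -d)"
printf 'en Main_Page 3 100\nen Foo 1 10\n' | gzip > pagecounts-20081001-000000.gz
printf 'en Main_Page 2 50\n'               | gzip > pagecounts-20081001-010000.gz

# Sum counts and bytes per (project, page) across all hours of the day,
# then write a single compressed daily file.
zcat pagecounts-20081001-*.gz \
  | awk '{ k = $1 " " $2; c[k] += $3; b[k] += $4 }
         END { for (k in c) print k, c[k], b[k] }' \
  | sort | gzip > pagecounts-20081001-daily.gz

zcat pagecounts-20081001-daily.gz
# en Foo 1 10
# en Main_Page 5 150
```

Merging the up-to-24 hourly lines for each page into one daily line is what would yield the roughly twofold size reduction mentioned above.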

Frédéric

_______________________________________________
Toolserver-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
