If we make a cron job, could we also have it purge all SVG thumbnails older
than, say, 5 years?

Ryan Kaldari
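A minimal sketch of such a purge job in Python. It assumes thumbnails live in a local directory tree where each source file gets its own directory named after it (e.g. `.../f/f3/Flag_of_Switzerland.svg/220px-Flag_of_Switzerland.svg.png`), and it uses file mtime as a stand-in for a thumbnail's age. The function names `find_stale_svg_thumbs` and `purge` are hypothetical, and the real thumbnail store may not be a plain filesystem at all.

```python
import os
import time

# Assumption: thumbnails sit in <root>/<a>/<ab>/<Source.svg>/<W>px-<Source.svg>.png
# and a thumbnail's mtime approximates when it was rendered.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def find_stale_svg_thumbs(root, max_age_years=5, now=None):
    """Return thumbnail paths rendered from SVG sources whose mtime is
    older than max_age_years. Nothing is deleted here."""
    now = time.time() if now is None else now
    cutoff = now - max_age_years * SECONDS_PER_YEAR
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Only directories named after an SVG source file hold SVG thumbs.
        if not os.path.basename(dirpath).lower().endswith(".svg"):
            continue
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    return stale


def purge(paths, dry_run=True):
    """Delete the given paths unless dry_run is set.
    Returns how many files were (or would be) removed."""
    for path in paths:
        if not dry_run:
            os.remove(path)
    return len(paths)
```

Running it with `dry_run=True` first gives a count before anything is touched; a daily cron entry could then call the same code with `dry_run=False`.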

On Aug 31, 2012, at 5:36 AM, "Ariel T. Glenn" <ar...@wikimedia.org> wrote:

> So it's time to have this discussion again.  At least, I think we're
> having it again, though I could not find previous threads on this list
> about the subject.
> 
> In short, scaled media is currently generated on the fly for any size
> and for any user.  The resulting files are kept around forever or until
> we run perilously short of space, at which point we make some guesses
> about what we can toss and then do a mass purge. Last time we did so, we
> had the rotation bug going at the same time, which made for a real fine
> mess.
> 
> A little bit of crunching shows me that we have about 6 million images
> in use on the projects, and yet we manage to have around 130 million
> thumbnails.  Just for fun I checked to see how many thumbs each image
> has, what sizes we are looking at, etc.  Here are the results.
> 
> Some "standard" sizes are the most popular, with between 200K and 640K media
> files having thumbs scaled to each of these widths:
> 75, 120, 150, 180, 200, 220, 320, 640, 800, 1024, and 1280 pixels
> 
> But there are plenty of "odd" sizes with lots of thumbs too. For example,
> over 65K files have a thumb 181px wide, and 20K have one 138px wide.
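Per-width counts like these can be reproduced from a listing of thumbnail paths. This sketch assumes thumbnail names of the form `<W>px-<Source>` inside a directory named after the source file; `files_per_width` is a hypothetical helper, and where the path listing comes from is left open.

```python
import re
from collections import Counter, defaultdict

# Assumed layout: <...>/thumb/<a>/<ab>/<Source>/<W>px-<Source>[.<ext>]
WIDTH_RE = re.compile(r"^(?P<source>.+)/(?P<width>\d+)px-[^/]+$")


def files_per_width(thumb_paths):
    """Map width (px) -> number of distinct source files that have
    at least one thumb at that width."""
    sources_by_width = defaultdict(set)
    for path in thumb_paths:
        m = WIDTH_RE.match(path)
        if m:  # silently skip anything that doesn't look like a thumb
            sources_by_width[int(m.group("width"))].add(m.group("source"))
    return Counter({w: len(s) for w, s in sources_by_width.items()})
```

Calling `.most_common()` on the result lists widths from most to least widely used, which separates the "standard" sizes from the odd ones.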
> 
> As an experiment and before having this data, I purged from ms5 (no
> longer in use for thumbs) 1/16 of the thumbs that were greater than
> 100px wide but not one of these widths:
> 120px, 200px, 220px, 250px, 320px, 640px, 800px
> We got back over 300GB of space.
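The selection rule for that experiment can be written as a predicate over thumbnail paths. Reading "1/16" as one of the sixteen first-level hash directories (here `f`) is an assumption, as is the path layout; `is_purge_candidate` and its parameters are hypothetical.

```python
import re

# Widths that were kept in the ms5 experiment.
KEEP_WIDTHS = frozenset({120, 200, 220, 250, 320, 640, 800})

# Assumed layout: <...>/thumb/<a>/<ab>/<Source>/<W>px-<Source>[.<ext>],
# where <a> is one of 16 hex digits (hence "1/16" = one shard, an assumption).
CANDIDATE_RE = re.compile(
    r"/thumb/(?P<shard>[0-9a-f])/[0-9a-f]{2}/[^/]+/(?P<width>\d+)px-[^/]+$"
)


def is_purge_candidate(path, shard="f", min_width=100, keep=KEEP_WIDTHS):
    """True if the thumb lies in the chosen hash shard, is wider than
    min_width, and is not one of the kept widths."""
    m = CANDIDATE_RE.search(path)
    if not m or m.group("shard") != shard:
        return False
    width = int(m.group("width"))
    return width > min_width and width not in keep
```

Keeping the rule as a pure function of the path makes it easy to count candidates from a listing before deleting anything.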
> 
> The other thing about delivering any scaled version on demand is that some
> media files end up with several hundred different thumb sizes. Here are a
> few of the top offenders for your entertainment:
> 
> 2514  wikipedia/commons/thumb/f/f9/Orange_and_cross_section.jpg
> 2285  wikipedia/commons/thumb/f/fb/Thrermal_grease.jpg
> 2218  wikipedia/commons/thumb/f/fc/Blue_sport.jpg
> 2071  wikipedia/commons/thumb/f/f3/Flag_of_Switzerland.svg
> 2062  wikipedia/commons/thumb/f/f2/Flag_of_Costa_Rica.svg
> 2034  wikipedia/commons/thumb/f/f8/Wiktionary-logo-en.svg
> 1915  wikipedia/commons/thumb/f/f6/VeulesLesRoses.JPG
> 1689  wikipedia/commons/thumb/f/fa/Wikibooks-logo.svg
> 1447  wikipedia/commons/thumb/f/fa/Wikiquote-logo.svg
> 1371  wikipedia/commons/thumb/f/f0/Mori_Uncanny_Valley.svg
> 1249  wikipedia/commons/thumb/f/f5/Grand_prismatic_spring.jpg
> 1246  wikipedia/commons/thumb/f/f3/Mature.jpg
> 1191  wikipedia/commons/thumb/f/f7/Kirchdorf_in_Tirol.JPG
> 1187  wikipedia/commons/thumb/f/f8/Camille_Cabral_pour_les_Trans.JPG
> 1143  wikipedia/commons/thumb/f/f7/Profanity.svg
> 1079  wikipedia/commons/thumb/f/f2/HSV_color_solid_cone.png
> 1040  wikipedia/commons/thumb/f/f2/Carmen_Electra.jpg
> 1032  wikipedia/commons/thumb/f/f1/Pink_eye.jpg
> 1001  wikipedia/commons/thumb/f/f6/USNS_Medgar_Evers_announcement.jpg
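A ranking like the one above can be produced by grouping thumbnail paths by their parent (source) directory and counting. A minimal sketch, assuming one file per scaled version inside each source directory; `top_offenders` is a hypothetical name.

```python
from collections import Counter


def top_offenders(thumb_paths, n=20):
    """Count thumbnail files per source directory and return the n
    largest as (count, source_dir) pairs, biggest first."""
    counts = Counter(
        path.rsplit("/", 1)[0]  # parent directory = the source file
        for path in thumb_paths
        if "/" in path
    )
    return [(count, source) for source, count in counts.most_common(n)]
```

Printing each pair as `f"{count:<5} {source}"` gives output in the same shape as the list above.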
> 
> I'd comment on some of those but I'd be too snarky.
> 
> So there are some things we could change:
> 
> 1.  We could generate and keep only certain sizes, tossing the rest.
> 2.  We could keep *nothing*, scaling all media as required.
> 3.  We could have a cron job that is clever about tossing thumbs every
> day (not sure how easy it would be to be clever).
> 4.  ??
> 
> In any of these cases, the squids will have copies of recently requested
> scaled media, so we won't be scaling the same file to the same size over
> and over in a short time frame.
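To illustrate why a cache in front of the scaler bounds repeated work, here is a toy least-recently-used cache keyed on (source, width). It is a deliberate simplification: real squid caching involves expiry, many hosts and its own replacement policy, none of which is modeled here; `ScaledMediaCache` and the `scaler` callable are hypothetical.

```python
from collections import OrderedDict


class ScaledMediaCache:
    """Toy LRU cache keyed on (source, width), standing in for the
    caching layer in front of an on-demand scaler."""

    def __init__(self, scaler, capacity):
        self.scaler = scaler        # hypothetical callable(source, width) -> bytes
        self.capacity = capacity
        self.entries = OrderedDict()
        self.misses = 0

    def get(self, source, width):
        key = (source, width)
        if key in self.entries:
            self.entries.move_to_end(key)      # mark as most recently used
            return self.entries[key]
        self.misses += 1
        data = self.scaler(source, width)      # scale only on a miss
        self.entries[key] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
        return data
```

Repeated requests for the same size within the cache's lifetime cost one scaling operation; only sizes that fall out of the cache get scaled again.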
> 
> What do folks think about how to proceed?
> 
> Ariel
> 
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
