If we make a cron job, could we also have it purge all SVG thumbnails older than, say, 5 years?
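Here is a rough sketch of what such a cron job could look like, in Python. It assumes the thumbs sit on a filesystem laid out like the paths quoted below, with one directory per original under thumb/ (e.g. thumb/f/f3/Flag_of_Switzerland.svg/). It also uses file mtime as a stand-in for "when the thumb was generated". The names stale_svg_thumbs and purge are hypothetical, not anything in MediaWiki, and purge defaults to a dry run.

```python
import os
import time
from typing import Iterable, Iterator, Optional

# Assumption: "5 years" approximated as 5 * 365 days, in seconds.
FIVE_YEARS_SECONDS = 5 * 365 * 24 * 60 * 60


def stale_svg_thumbs(thumb_root: str,
                     max_age_seconds: float = FIVE_YEARS_SECONDS,
                     now: Optional[float] = None) -> Iterator[str]:
    """Yield thumb paths whose original is an SVG and whose mtime is older
    than max_age_seconds. Assumes one directory per original file, named
    after it (e.g. .../Flag_of_Switzerland.svg/120px-Flag_of_Switzerland.svg.png),
    and treats mtime as the time the thumb was rendered."""
    if now is None:
        now = time.time()
    cutoff = now - max_age_seconds
    for dirpath, _dirnames, filenames in os.walk(thumb_root):
        # Only look inside per-original directories named after an SVG.
        if not os.path.basename(dirpath).lower().endswith(".svg"):
            continue
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                yield path


def purge(paths: Iterable[str], dry_run: bool = True) -> int:
    """Delete the given paths, or only count them when dry_run is True."""
    count = 0
    for path in paths:
        if not dry_run:
            os.remove(path)
        count += 1
    return count
```

From cron this would be something like purge(stale_svg_thumbs("/path/to/thumb"), dry_run=False), where the path is a placeholder. Doing a dry run first gives a count to compare against the space you expect to get back.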
Ryan Kaldari

On Aug 31, 2012, at 5:36 AM, "Ariel T. Glenn" <ar...@wikimedia.org> wrote:

> So it's time to have this discussion again. At least, I think we're
> having it again, though I could not find previous threads on this list
> about the subject.
>
> In short, scaled media is currently generated on the fly for any size
> and for any user. The resulting files are kept around forever or until
> we run perilously short of space, at which point we make some guesses
> about what we can toss and then do a mass purge. Last time we did so, we
> had the rotation bug going at the same time, which made for a real fine
> mess.
>
> A little bit of crunching shows me that we have about 6 million images
> in use on the projects, and yet we manage to have around 130 million
> thumbnails. Just for fun I checked to see how many thumbs each image
> has, what sizes we are looking at, etc. Here are the results.
>
> Some "standard" sizes are most popular, with between 200K and 640K media
> files having thumbs scaled to each of these widths:
> 75, 120, 150, 180, 200, 220, 320, 640, 800, 1024, and 1280 pixels
>
> But there are plenty of "odd" sizes with lots of thumbs too. For example,
> over 65K files with width 181px, 20K with width 138px.
>
> As an experiment and before having this data, I purged from ms5 (no
> longer in use for thumbs) 1/16 of the thumbs that were greater than
> 100px wide but not one of these widths:
> 120px, 200px, 220px, 250px, 320px, 640px, 800px
> We got back over 300GB of space.
>
> The other thing about delivering any scaled version on demand is that we
> have some media files with several hundred different thumb sizes in
> there.
> Here are a few of the top offenders for your entertainment:
>
> 2514 wikipedia/commons/thumb/f/f9/Orange_and_cross_section.jpg
> 2285 wikipedia/commons/thumb/f/fb/Thrermal_grease.jpg
> 2218 wikipedia/commons/thumb/f/fc/Blue_sport.jpg
> 2071 wikipedia/commons/thumb/f/f3/Flag_of_Switzerland.svg
> 2062 wikipedia/commons/thumb/f/f2/Flag_of_Costa_Rica.svg
> 2034 wikipedia/commons/thumb/f/f8/Wiktionary-logo-en.svg
> 1915 wikipedia/commons/thumb/f/f6/VeulesLesRoses.JPG
> 1689 wikipedia/commons/thumb/f/fa/Wikibooks-logo.svg
> 1447 wikipedia/commons/thumb/f/fa/Wikiquote-logo.svg
> 1371 wikipedia/commons/thumb/f/f0/Mori_Uncanny_Valley.svg
> 1249 wikipedia/commons/thumb/f/f5/Grand_prismatic_spring.jpg
> 1246 wikipedia/commons/thumb/f/f3/Mature.jpg
> 1191 wikipedia/commons/thumb/f/f7/Kirchdorf_in_Tirol.JPG
> 1187 wikipedia/commons/thumb/f/f8/Camille_Cabral_pour_les_Trans.JPG
> 1143 wikipedia/commons/thumb/f/f7/Profanity.svg
> 1079 wikipedia/commons/thumb/f/f2/HSV_color_solid_cone.png
> 1040 wikipedia/commons/thumb/f/f2/Carmen_Electra.jpg
> 1032 wikipedia/commons/thumb/f/f1/Pink_eye.jpg
> 1001 wikipedia/commons/thumb/f/f6/USNS_Medgar_Evers_announcement.jpg
>
> I'd comment on some of those but I'd be too snarky.
>
> So there are some things we could change:
>
> 1. We could generate and keep only certain sizes, tossing the rest.
> 2. We could keep *nothing*, scaling all media as required.
> 3. We could have a cron job that was clever about tossing thumbs every
>    day (not sure how easy it would be to be clever).
> 4. ??
>
> In any of these cases, the squids will have copies of recently requested
> scaled media, so we won't be scaling the same file to the same size over
> and over in a short time frame.
>
> What do folks think about how to proceed?
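Option 1, like the ms5 experiment, boils down to a filter on the width encoded in each thumb's file name, and the per-image counts come from grouping thumbs by their original. A minimal Python sketch, assuming MediaWiki-style thumb names of the form <width>px-<name> (e.g. 181px-Flag_of_Switzerland.svg.png) stored in a directory named after the original. KEEP_WIDTHS is the keep list from the experiment; the function names are made up for illustration. It does not try to reproduce the 1/16 sampling used on ms5.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Widths kept in the ms5 experiment described in the message.
KEEP_WIDTHS = frozenset({120, 200, 220, 250, 320, 640, 800})
# Assumed thumb naming: "<width>px-<original name>[.<extra ext>]".
THUMB_NAME = re.compile(r"^(\d+)px-")


def thumb_width(name: str) -> Optional[int]:
    """Return the pixel width encoded in a thumb file name, or None."""
    m = THUMB_NAME.match(name)
    return int(m.group(1)) if m else None


def should_purge(name: str, keep: frozenset = KEEP_WIDTHS,
                 min_width: int = 100) -> bool:
    """True for thumbs wider than min_width whose width is not in keep."""
    width = thumb_width(name)
    if width is None:
        return False  # not a recognisable thumb name; leave it alone
    return width > min_width and width not in keep


def top_offenders(paths: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Count thumbs per original (the parent directory of each thumb path)
    and return the n originals with the most thumbs."""
    counts = Counter(p.rsplit("/", 1)[0] for p in paths if "/" in p)
    return counts.most_common(n)
```

Keeping more of the "standard" widths (1024 and 1280, say) is just a matter of passing a larger keep set.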
>
> Ariel
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l