https://bugzilla.wikimedia.org/show_bug.cgi?id=32478
Platonides <platoni...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |platoni...@gmail.com

--- Comment #3 from Platonides <platoni...@gmail.com> 2011-11-18 22:31:05 UTC ---
(In reply to comment #2)
> Ah, the 'gzip,external/simple pointer' is stuff not marked as UTF-8... that
> might be a bit worrying actually. :)
>
> Shouldn't occur on new entries unless there's some config special case off the
> top of my head. Whether those entries are problematic or not depends on what
> the $wgLegacyEncoding setting is on the sites those blobs belong to.

This is interesting. I went to inquire about a just-produced one (eswiki):

+----------+---------------------+
| old_id   | old_flags           |
+----------+---------------------+
| 51956028 | utf-8,gzip,external |
| 51956027 | utf-8,gzip,external |
| 51956026 | utf-8,gzip,external |
| 51956025 | utf-8,gzip,external |
| 51956024 | utf-8,gzip,external |
| 51956023 | gzip,external       |
| 51956022 | utf-8,gzip,external |
| 51956021 | utf-8,gzip,external |
| 51956020 | utf-8,gzip,external |
| 51956019 | utf-8,gzip,external |
+----------+---------------------+

It turns out that it (apparently) doesn't have a revision:

select rev_id, rev_text_id, old_flags from revision join text on
(rev_text_id=old_id) where rev_id <= 51506304 order by rev_id desc limit 10;

+----------+-------------+---------------------+
| rev_id   | rev_text_id | old_flags           |
+----------+-------------+---------------------+
| 51506304 |    51956026 | utf-8,gzip,external |
| 51506303 |    51956025 | utf-8,gzip,external |
| 51506302 |    51956024 | utf-8,gzip,external | <--
| 51506301 |    51956022 | utf-8,gzip,external | <--
| 51506300 |    51956021 | utf-8,gzip,external |
| 51506299 |    51956020 | utf-8,gzip,external |
| 51506298 |    51956019 | utf-8,gzip,external |
| 51506297 |    51956018 | utf-8,gzip,external |
| 51506296 |    51956016 | utf-8,gzip,external |
| 51506295 |    51956015 | utf-8,gzip,external |
+----------+-------------+---------------------+

The two marked rows are consecutive revisions, yet their rev_text_id values
skip 51956023, the gzip,external entry from the first table.

Special:Recentchanges doesn't show anything suspicious around those two
entries. There were three AbuseFilter hits at that time,
Especial:AbuseLog/1077805-1077807, and that more or less correlates with the
number of gzip,external entries.

AbuseFilter is indeed storing items in the text table, *and not setting the
utf-8 flag for them*.

So I think all of them will be AbuseFilter hits, which are utf-8 but not marked
as such, rather than content in a legacy encoding.
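The check above can be turned into one query: list text rows that have no revision pointing at them and that lack the utf-8 flag. What follows is only a minimal sketch, not something run against the production database. It uses Python's sqlite3 with an in-memory database as a stand-in for MySQL, the sample rows are copied from the two result tables in this comment, and the helper names (build_sample_db, find_unflagged_orphans) are made up for illustration. The `||` concatenation is SQLite syntax; on MySQL the flag test would need something like FIND_IN_SET instead.

```python
import sqlite3

# Sample rows copied from the two result tables in this comment (eswiki).
TEXT_ROWS = [
    (51956026, "utf-8,gzip,external"),
    (51956025, "utf-8,gzip,external"),
    (51956024, "utf-8,gzip,external"),
    (51956023, "gzip,external"),
    (51956022, "utf-8,gzip,external"),
    (51956021, "utf-8,gzip,external"),
]
REVISION_ROWS = [
    (51506304, 51956026),
    (51506303, 51956025),
    (51506302, 51956024),
    (51506301, 51956022),
    (51506300, 51956021),
]


def build_sample_db():
    """Create an in-memory database with cut-down text and revision tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "text" (old_id INTEGER PRIMARY KEY, old_flags TEXT)')
    conn.execute(
        "CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_text_id INTEGER)"
    )
    conn.executemany('INSERT INTO "text" VALUES (?, ?)', TEXT_ROWS)
    conn.executemany("INSERT INTO revision VALUES (?, ?)", REVISION_ROWS)
    return conn


def find_unflagged_orphans(conn):
    """Return (old_id, old_flags) for text rows with no revision and no utf-8 flag."""
    cur = conn.execute(
        """
        SELECT old_id, old_flags
        FROM "text"
        LEFT JOIN revision ON rev_text_id = old_id
        WHERE rev_id IS NULL
          -- wrap in commas so 'utf-8' only matches as a whole flag
          AND (',' || old_flags || ',') NOT LIKE '%,utf-8,%'
        ORDER BY old_id
        """
    )
    return cur.fetchall()


if __name__ == "__main__":
    print(find_unflagged_orphans(build_sample_db()))
    # prints [(51956023, 'gzip,external')]
```

On the sample rows this returns only 51956023, the entry skipped between the two marked revisions. Whether every such row on the real wiki is an AbuseFilter hit would still have to be confirmed against the abuse log.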