https://bugzilla.wikimedia.org/show_bug.cgi?id=32478

Platonides <platoni...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |platoni...@gmail.com

--- Comment #3 from Platonides <platoni...@gmail.com> 2011-11-18 22:31:05 UTC ---
(In reply to comment #2)
> Ah, the 'gzip,external/simple pointer' is stuff not marked as UTF-8... that
> might be a bit worrying actually. :)
> 
> Shouldn't occur on new entries unless there's some config special case off the
> top of my head. Whether those entries are problematic or not depends on what
> the $wgLegacyEncoding setting is on the sites those blobs belong to.

This is interesting. I went looking for a freshly created one (on eswiki):

+----------+---------------------+
| old_id   | old_flags           |
+----------+---------------------+
| 51956028 | utf-8,gzip,external |
| 51956027 | utf-8,gzip,external |
| 51956026 | utf-8,gzip,external |
| 51956025 | utf-8,gzip,external |
| 51956024 | utf-8,gzip,external |
| 51956023 | gzip,external       |
| 51956022 | utf-8,gzip,external |
| 51956021 | utf-8,gzip,external |
| 51956020 | utf-8,gzip,external |
| 51956019 | utf-8,gzip,external |
+----------+---------------------+


It turns out that text row 51956023 (apparently) has no revision pointing at it:

select rev_id, rev_text_id, old_flags  from revision join text on
(rev_text_id=old_id) where rev_id <= 51506304 order by rev_id desc limit 10;

+----------+-------------+---------------------+
| rev_id   | rev_text_id | old_flags           |
+----------+-------------+---------------------+
| 51506304 |    51956026 | utf-8,gzip,external |
| 51506303 |    51956025 | utf-8,gzip,external |
| 51506302 |    51956024 | utf-8,gzip,external | <--
| 51506301 |    51956022 | utf-8,gzip,external | <-- 
| 51506300 |    51956021 | utf-8,gzip,external |
| 51506299 |    51956020 | utf-8,gzip,external |
| 51506298 |    51956019 | utf-8,gzip,external |
| 51506297 |    51956018 | utf-8,gzip,external |
| 51506296 |    51956016 | utf-8,gzip,external |
| 51506295 |    51956015 | utf-8,gzip,external |
+----------+-------------+---------------------+

Special:Recentchanges doesn't show anything suspicious around those two
entries.
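
The skipped rev_text_id (51956023) can also be found mechanically with an
anti-join instead of by eye. What follows is only a minimal sketch: it assumes
an in-memory SQLite database standing in for the real tables, loaded with
nothing but the rows shown in the two result sets above. The check is limited
to the old_id range covered by both samples, since the newest text rows
belong to revisions beyond the rev_id cutoff used in the query.

```python
import sqlite3

# Rows copied from the two result sets above (eswiki); no other data is assumed.
text_rows = [
    (51956028, "utf-8,gzip,external"),
    (51956027, "utf-8,gzip,external"),
    (51956026, "utf-8,gzip,external"),
    (51956025, "utf-8,gzip,external"),
    (51956024, "utf-8,gzip,external"),
    (51956023, "gzip,external"),
    (51956022, "utf-8,gzip,external"),
    (51956021, "utf-8,gzip,external"),
    (51956020, "utf-8,gzip,external"),
    (51956019, "utf-8,gzip,external"),
]
revision_rows = [
    (51506304, 51956026),
    (51506303, 51956025),
    (51506302, 51956024),
    (51506301, 51956022),
    (51506300, 51956021),
    (51506299, 51956020),
    (51506298, 51956019),
    (51506297, 51956018),
    (51506296, 51956016),
    (51506295, 51956015),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "text" (old_id INTEGER PRIMARY KEY, old_flags TEXT)')
conn.execute("CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_text_id INTEGER)")
conn.executemany('INSERT INTO "text" VALUES (?, ?)', text_rows)
conn.executemany("INSERT INTO revision VALUES (?, ?)", revision_rows)

# Anti-join: keep text rows for which the LEFT JOIN found no revision.
# The range covers only old_ids present in both samples (51956019..51956026).
orphans = conn.execute(
    """
    SELECT old_id, old_flags
    FROM "text"
    LEFT JOIN revision ON rev_text_id = old_id
    WHERE rev_id IS NULL
      AND old_id BETWEEN ? AND ?
    ORDER BY old_id
    """,
    (51956019, 51956026),
).fetchall()

print(orphans)
```

On this sample the only row returned is the one without the utf-8 flag.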

There were three AbuseFilter hits around that time
(Especial:AbuseLog/1077805-1077807), which more or less correlates with the
number of gzip,external entries.

AbuseFilter is indeed storing items in the text table, *and not setting the
utf-8 flag for them*.
So I think all of them will turn out to be AbuseFilter hits, which are UTF-8
but not marked as such, rather than content in a legacy encoding.
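
To illustrate why the missing flag matters only where $wgLegacyEncoding is
set, here is a simplified sketch of a reader that treats unflagged rows as
legacy-encoded. The function decode_text_row and the cp1252 legacy encoding
are hypothetical, chosen for illustration; this is not MediaWiki's actual
code, and it assumes the blob has already been fetched from external storage
and decompressed.

```python
from typing import Optional


def decode_text_row(raw: bytes, flags: str, legacy_encoding: Optional[str]) -> str:
    """Hypothetical reader: rows lacking the utf-8 flag are assumed to be
    in the site's legacy encoding, if one is configured."""
    flag_set = set(flags.split(","))
    if "utf-8" in flag_set or legacy_encoding is None:
        return raw.decode("utf-8")
    # No utf-8 flag and a legacy encoding is configured: reinterpret the bytes.
    return raw.decode(legacy_encoding)


# The stored bytes are really UTF-8 in all three cases.
raw = "año".encode("utf-8")

marked = decode_text_row(raw, "utf-8,gzip,external", "cp1252")
unmarked = decode_text_row(raw, "gzip,external", "cp1252")
unmarked_no_legacy = decode_text_row(raw, "gzip,external", None)

print(marked)              # año
print(unmarked)            # aÃ±o
print(unmarked_no_legacy)  # año
```

Under these assumptions, UTF-8 data stored without the flag is only mangled
on a site with a legacy encoding configured; elsewhere it reads back intact.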
