On Thu, Jan 14, 2010 at 12:22 PM, Conrad Irwin
<conrad.ir...@googlemail.com> wrote:
> Wiktionary is case-sensitive and so case-folding there may not be
> appropriate; I personally would be interested in seeing these logs
> before even the NFC normalizers get to them (given a lack of any other
> source to find out how people type fun characters in the wild) though I
> can appreciate this is somewhat sadistic, and probably the logs are
> taken too late for this.

The logs are taken from the Squids, long before MediaWiki touches
them, so they shouldn't be normalized at all.
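
For illustration only, here is a minimal Python sketch of what NFC normalization and case-folding would do to a raw query. The helper name `describe` and the sample string are made up for this example; this is not how the Squid logs or MediaWiki actually process anything.

```python
import unicodedata

def describe(query: str) -> dict:
    """Compare a raw query with its NFC and case-folded forms."""
    nfc = unicodedata.normalize("NFC", query)
    return {
        "raw_codepoints": [f"U+{ord(c):04X}" for c in query],
        "nfc_codepoints": [f"U+{ord(c):04X}" for c in nfc],
        "changed_by_nfc": nfc != query,
        # Case-folding would merge entries that Wiktionary keeps distinct.
        "casefolded": nfc.casefold(),
    }

# Hypothetical input: "e" followed by a combining acute accent (U+0301),
# i.e. the decomposed way of typing the final letter.
info = describe("Cafe\u0301")
```

After NFC the last two code points collapse into one, which is the kind of detail about how people type that is only visible in un-normalized logs.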

> I don't think the IP addresses should come into the analysis at all,
> though possibly a cut-off at 5 or 10 searches might be useful to prevent
> a huge tail-end of probably useless information (it also might exclude
> cases where people have typed things into the search box by accident -
> maybe they got distracted while logging in)

Some people might search for their own name more than five times in a
week, possibly together with other embarrassing or incriminating
search terms.
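
A rough Python sketch of the problem, assuming the cut-off is a plain count of how often a query string appears (the function name `frequent_queries` and the sample data are invented for this example):

```python
from collections import Counter

def frequent_queries(queries, min_count=5):
    """Keep only queries that occur at least min_count times."""
    counts = Counter(queries)
    return {q: n for q, n in counts.items() if n >= min_count}

# Hypothetical week of logs: one popular term, one person repeatedly
# searching for their own name, and one accidental entry.
week = ["wiktionary"] * 12 + ["jane q. example"] * 6 + ["asdf"]
kept = frequent_queries(week, min_count=5)
```

The accidental "asdf" is dropped, but the name searched six times by a single person passes the cut-off just like the popular term does.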

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
