On Thu, Jan 14, 2010 at 12:22 PM, Conrad Irwin <conrad.ir...@googlemail.com> wrote:
> Wiktionary is case-sensitive and so case-folding there may not be
> appropriate; I personally would be interested in seeing these logs
> before even the NFC normalizers get to them (given a lack of any other
> source to find out how people type fun characters in the wild) though I
> can appreciate this is somewhat sadistic, and probably the logs are
> taken too late for this.
The logs are taken from the Squids, long before MediaWiki touches them, so they shouldn't be normalized at all.

> I don't think the IP addresses should come into the analysis at all,
> though possibly a cut-off at 5 or 10 searches might be useful to prevent
> a huge tail-end of probably useless information (it also might exclude
> cases where people have typed things into the search box by accident -
> maybe they got distracted while logging in)

Some people might search for their own name more than five times in a week, possibly together with other embarrassing or incriminating search terms.

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l