On Tue, 2009-06-30 at 13:14 +0200, Jan P. Kessler wrote:
> Martin Gregorie wrote:
> >> ... go to WWW EVIL ORG for new meds ...
> >>
> >> and
> >>
> >> ... digging through the WWW HE SAW this link ...
> >>
> > Both IMO should be caught and given a positive score. I've never seen
> > legitimate mail containing URLs written this way.
>
> Maybe I was not clear: The last one is NOT an url. Do you really want to
> use the whole bunch of SA's URI tests against sentences like:
>
What makes you think I'm using URI tests or that any of these would be
recognised as a URI? My tests are simple body tests with {1,n} limits on
repetitions to keep things under control.
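
For illustration, here is a minimal sketch of a bounded body test of that
kind, written in Python rather than SpamAssassin rule syntax. It is not the
actual rule set discussed in this thread: the TLD list, the label length
limit, the upper bound of 3 and the name find_munged_url are all assumptions
made for the example. The TLD list in particular makes the sketch narrower
than a rule that would also hit the second quoted example.

```python
import re

# Hypothetical TLD list, chosen only for this illustration.
TLDS = r"(?:com|net|org|info|biz)"

# Separator: space, tab or full stop, repeated 1 to 3 times.
# The upper bound plays the role of the {1,n} limit on repetitions.
SEP = r"[ \t.]{1,3}"

MUNGED_URL = re.compile(
    r"\bwww" + SEP + r"[a-z0-9-]{2,30}" + SEP + TLDS + r"\b",
    re.IGNORECASE,
)


def find_munged_url(body):
    """Return the first munged 'www ... tld' run in body, or None.

    A candidate counts as munged only if it contains a space or tab,
    so an ordinary address such as www.example.org is ignored.
    """
    for match in MUNGED_URL.finditer(body):
        text = match.group(0)
        if " " in text or "\t" in text:
            return text
    return None
```

With these assumptions, find_munged_url("go to WWW EVIL ORG for new meds")
returns "WWW EVIL ORG", while an unobfuscated www.evil.org and the
"WWW HE SAW" sentence both return None.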

> And again: What about urls that do not start with www?
>
So far, all the munged URLs I've seen have started with www. If that
changes, the rules can easily be extended, but IMO it's unlikely to
change, since the punters are being invited to 'repair' something they
are intended to recognise as a web address.

> Which characters
> should be examined for obfuscation ([ ,;:|?!=])?
>
So far, only space, tab and full stop have been used. On the face of it,
no more are likely. The target audience must be pretty thick if they
actually 'repair' these URLs before cutting and pasting them into the
browser's search box, so my guess is that said target audience would
either not recognise further obfuscation as a URL, or they would retain
any other non-whitespace characters and then wonder why their browser
won't do what they want. What's the betting they'd even call their help
desk to complain?

Martin