On Wed, Feb 18, 2004 at 05:13:34PM -0700, Darryl Bleau wrote:
> > URIs should be HTML decoded.  I've run into a number of spammers who
> > write things like:
> > 
> >     ht&#0x00054;p://
> 
> This is a good idea. So if you wanted to match an encoded URI, you would
> match on Full, otherwise, you could match against URI.

Just an fyi...  In 3.0.0, the URI list will contain the "raw" (as listed
in the message) and "cooked" (decode things that don't need encoding,
encode things that need encoding, etc) ...  So you can easily catch stuff
like the above. :)  I don't think we currently support all the encoding
methods above though...  Definitely the %## version, and &####; ...

Is &#0x####; valid?
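
To make the raw/cooked idea above concrete, here is a minimal sketch in
Python.  It is not SpamAssassin's code (SA is Perl); uri_variants is a
hypothetical helper, and it only handles the decoding half (HTML character
references and %## escapes), not the "encode things that need encoding"
half.

```python
import html
from urllib.parse import unquote

def uri_variants(raw_uri):
    """Return the raw URI plus a decoded ("cooked") form, without duplicates.

    Hypothetical helper for illustration only; not SpamAssassin's logic.
    """
    # Decode HTML character references first, then %## escapes.
    cooked = unquote(html.unescape(raw_uri))
    variants = [raw_uri]
    if cooked != raw_uri:
        variants.append(cooked)
    return variants

print(uri_variants("ht&#116;p://example.com/%76iagra"))
```

A rule can then be checked against every entry in the list, so it fires
whether it was written for the obfuscated form or the decoded one.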

> Full -> Raw (as this really is the 'raw' message)
> Rawbody -> Decoded or RawDecoded (if this were just like the 'raw' message, 
> but with decoded parts)
> Body -> Text (just to make it more explicit that this is the text from the 
> message)

In 3.0.0:

body -> Text -- fully decoded and HTML rendered into text
rawbody -> Text -- fully decoded, but not rendered
full -> Raw -- the pristine message as passed to SA

:)
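
The three views listed above can be sketched roughly like this in Python.
Again an illustration under assumptions, not what SA does internally:
message_views and _TextExtractor are made-up names, and "rendering" here
just means dropping the HTML tags.

```python
import email
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML document (hypothetical helper)."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def message_views(raw_message):
    """Return the full, rawbody-like and body-like views of a message string."""
    msg = email.message_from_string(raw_message)
    decoded_parts = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            # decode=True undoes base64 / quoted-printable transfer encoding.
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            decoded_parts.append(payload.decode(charset, errors="replace"))
    decoded = "\n".join(decoded_parts)

    # Crude "rendering": keep the text, throw away the markup.
    extractor = _TextExtractor()
    extractor.feed(decoded)
    extractor.close()
    rendered = "".join(extractor.chunks)

    return {"full": raw_message, "rawbody": decoded, "body": rendered}
```

Here "full" is the untouched input, "rawbody" is the text after transfer
decoding, and "body" is the same text with the HTML stripped.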

-- 
Randomly Generated Tagline:
OK, enough hype.
              -- Larry Wall in the perl man page
