> Given any sizeable chunk of text, it ought to be possible to estimate
> the statistical likelihood of its being in a certain
> encoding/[language] even if it's in an unspecified 8859-* encoding.
> It would be quite an interesting exercise, but I'd be surprised if
> someone hasn't done it before. Perhaps someone here knows.
http://www.let.rug.nl/~vannoord/TextCat/ has a paper on the subject and an implementation in Perl. http://mnogosearch.org has an alternate implementation in compiled code (called mguesser).
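For a feel of how this kind of guesser can work, here is a rough Python sketch of rank-ordered character n-gram profiles compared with an "out-of-place" distance, which is my understanding of the general approach behind TextCat. It is not taken from either tool: the function names, the profile size of 300 and the toy training sentences are all assumptions made for illustration.

```python
from collections import Counter


def ngram_profile(text, max_n=3, top_k=300):
    """Return the top_k character n-grams (1..max_n), most frequent first.

    max_n and top_k are illustrative defaults, not values from any tool.
    """
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile cost a fixed penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else penalty
        for i, gram in enumerate(doc_profile)
    )


def guess(text, profiles):
    """Return the name of the profile with the smallest distance to text."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda name: out_of_place(doc, profiles[name]))


# Toy training data (hypothetical); a usable guesser needs far larger samples.
samples = {
    "english": "the quick brown fox jumps over the lazy dog and then runs away",
    "german": "der schnelle braune fuchs springt über den faulen hund und läuft weg",
}
profiles = {name: ngram_profile(text) for name, text in samples.items()}
print(guess("the dog runs over the fox", profiles))
```

To cover the encoding half of the question, one could presumably build the profiles from raw bytes instead of decoded text and keep one profile per language/encoding pair, so that the best match names both at once.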

