RE: Unicode and end users

Yves Arrouye Sat, 16 Feb 2002 20:45:17 -0800

> If "foo" is a US-ASCII string, "grep foo file" will work fine with any
> US-ASCII-superset charset for which non-ASCII characters do not use
> bytes < 0x80, including the hypothetical one I described, with no
> possibility of a false match. However "grep fóó file" will work only
> if the current shell charset (i.e. of argv[1]) matches the encoding of
> "file".


Not necessarily. It will work as long as the sequence of 3 bytes fóó is the
representation of the string you are looking for in the file, in that file's
encoding. grep does not validate anything, nor should it IMHO. If you want
to guarantee the encoding, use a converter like ICU's uconv(1) or iconv(1).
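
To make the point concrete, here is a rough Python sketch of a purely
byte-level search. It is not how grep is implemented; byte_match and the
sample file contents are made up for illustration, and the shell charset is
assumed to be Latin-1, where fóó is 3 bytes (in UTF-8 it would be 5).

```python
# Illustrative only: a byte-level search that, like the behaviour described
# above, never validates or converts encodings.

PATTERN = "f\u00f3\u00f3"  # the string "fóó", written with escapes


def byte_match(pattern: str, shell_charset: str, file_bytes: bytes) -> bool:
    """Hypothetical helper: encode the pattern the way the shell would
    pass it in argv, then look for those exact bytes in the file."""
    needle = pattern.encode(shell_charset)
    return needle in file_bytes


# The same text stored in two different encodings (made-up sample data).
latin1_file = ("un " + PATTERN + " ici\n").encode("latin-1")
utf8_file = ("un " + PATTERN + " ici\n").encode("utf-8")

# Length of the pattern in bytes under each encoding.
print(len(PATTERN.encode("latin-1")), len(PATTERN.encode("utf-8")))

# A Latin-1 shell finds the pattern in the Latin-1 file only.
print(byte_match(PATTERN, "latin-1", latin1_file))
print(byte_match(PATTERN, "latin-1", utf8_file))

# A plain ASCII pattern is found in both, since both are ASCII supersets.
print(byte_match("ici", "latin-1", latin1_file))
print(byte_match("ici", "latin-1", utf8_file))
```

The match depends only on whether the bytes typed at the shell happen to be
the bytes stored in the file, which is why converting the file first is the
way to get a guarantee.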

YA
