On Mon, 17 Feb 2003 08:13:51 -0500 (EST),
Jungshik Shin <[EMAIL PROTECTED]> wrote:

> Incidentally, it just occurred to me that ftp/ssh clients may offer a
> user-configurable option for the automatic removal of the 'UTF-8 BOM' at
> the beginning of a text file in UTF-8 when moving files from Windows to
> non-Windows platforms (Unix/Unix-like OS and MacOS). The same is true
> of Kermit (Frank, are you here?).
>
Yes, Kermit does this in both the Kermit and FTP protocols (for those who
hadn't heard, Kermit is now also a Unicode-aware FTP client):

  http://www.columbia.edu/kermit/ftpclient.html

> All those tools can be configured
> to translate between three (and nowadays even more?) EOL conventions,
> CR/LF/CRLF for text files.
>
Kermit on a particular platform understands the text record format of
that platform (CR, LF, CRLF, 80-column card images, length fields, etc.)
and converts between it and the standard transfer format, i.e. lines
terminated by CRLF.  Thus it converts between all combinations of record
formats among all the platforms where it runs as a fundamental aspect of
text-mode file transfer.  This applies to both Kermit and FTP transfers.
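
As a rough illustration of the "local format -> CRLF -> remote format" idea
described above, here is a minimal Python sketch. It is not Kermit's code; the
function names are hypothetical, and it only handles the stream-style
terminators (CR, LF, CRLF), not card images or length-field records.

```python
CRLF = b"\r\n"  # the standard transfer format: lines terminated by CRLF

def to_transfer(data: bytes, local_eol: bytes) -> bytes:
    """Convert the sender's local line terminator to CRLF."""
    if local_eol == CRLF:
        return data  # already in transfer format
    return data.replace(local_eol, CRLF)

def from_transfer(data: bytes, local_eol: bytes) -> bytes:
    """Convert CRLF-terminated lines to the receiver's local terminator."""
    if local_eol == CRLF:
        return data
    return data.replace(CRLF, local_eol)

def convert(data: bytes, src_eol: bytes, dst_eol: bytes) -> bytes:
    """Go through the common CRLF form instead of pairing every format."""
    return from_transfer(to_transfer(data, src_eol), dst_eol)

# LF-terminated text arriving on a platform that uses CR
print(convert(b"one\ntwo\n", b"\n", b"\r"))
```

Because each side only converts to and from the one transfer format, adding a
platform means writing one pair of conversions rather than one per peer.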

> Then, the automatic removal (and addition if
> that's regarded as necessary) of UTF-8 BOM at platform boundaries
> would be as useful.
> 
Kermit's BOM removal occurs (or not, as desired) on a per-file basis
(not per record).  Kermit's Unicode features are described here:

  http://www.columbia.edu/kermit/ckermit70.html#x6.6

and (for the FTP client) here:

  http://www.columbia.edu/kermit/ckermit80.html#x3.7.1
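
To make "per file, not per record" concrete, here is a small Python sketch of
optional BOM removal and addition. This only illustrates the idea and is not
how Kermit implements it; the function names are hypothetical.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a UTF-8 BOM only if it is the first thing in the file."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def add_utf8_bom(data: bytes) -> bytes:
    """Prepend a UTF-8 BOM unless the file already starts with one."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data

# The check is made once against the whole file contents, so an
# EF BB BF sequence later in the file (e.g. at the start of a
# later line) is left untouched.
contents = codecs.BOM_UTF8 + b"first line\n" + codecs.BOM_UTF8 + b"second\n"
print(strip_utf8_bom(contents))
```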

For those who don't know, Kermit converts not only record format but also
character sets using the same technique: local set -> standard intermediate
set -> remote set.  The repertoire of character sets it knows about
depends on the platform, but is likely to include PC and Windows code
pages, various other corporate sets such as HP-Roman8, ISO 646 and 8859
sets, the many KOI and JIS variations, and different forms of Unicode.
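
The two-step conversion can be sketched in Python as follows. This is an
illustration under assumptions, not Kermit's implementation: the function name
is hypothetical, UTF-8 is assumed as the intermediate set, and the codec names
are Python's, not Kermit's.

```python
def convert_charset(data: bytes, local_set: str, remote_set: str,
                    intermediate: str = "utf-8") -> bytes:
    """local set -> intermediate set -> remote set."""
    # Sender side: translate from the local set to the intermediate set.
    on_the_wire = data.decode(local_set).encode(intermediate)
    # Receiver side: translate from the intermediate set to the remote set.
    # Raises UnicodeEncodeError if the remote set lacks a character.
    return on_the_wire.decode(intermediate).encode(remote_set)

# "caf" + e-acute: byte 0x82 in PC code page 437, 0xE9 in ISO 8859-1
print(convert_charset(b"caf\x82", "cp437", "iso8859_1"))
```

As with record formats, each side needs only its own tables to and from the
intermediate set, not a table for every possible peer.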

This occurs during text-mode file transfer as well as online terminal
connections (serial, modem, telnet, ssh, etc.).

Kermit also has a TRANSLATE command for converting the character sets
of local text files, and this too can add or remove BOMs at the user's
discretion.

- Frank
