Jamie Wilkinson wrote:
 Jeremy Portzer wrote:

A common confusion/misconception is that \n refers to LF only. This is not always the case - usually it refers to the portable "newline" that gets expanded to the proper characters depending on platform or context.

Over the wire, it is just newline, no carriage return.

No, using your definition of "newline = LF", this is incorrect. Standard TCP/IP application protocols like SMTP and HTTP use CR+LF!

The \r component of the CRLF bog is only a problem when you're doing file IO. For a wire protocol it's the \n that counts.

You are still confusing the issue by using the semi-portable notation of "\n" (newline) to refer interchangeably to the LF (linefeed) character, ASCII 10. This is imprecise - but you're not the first to be tripped up by this.

As I stated, \n (or "newline") and LF (ASCII 10) are NOT precisely the same thing. They are equivalent when dealing with Unix files, yes - so many people with a Unix/Linux background tend to think of them interchangeably, but this isn't the case on other platforms.

Yes, if you write a C or Perl program on Unix and want to make a DOS-compatible file, you can use \r\n and it will work - because on Unix, \n becomes ASCII 10, so the combined sequence is chr(13) then chr(10), or CR+LF. But if you compile that same program on Windows and write to a file opened in text mode, you will end up with the sequence CR+CR+LF, since \n written in text mode on Windows is expanded to CR+LF. Does this make sense? Again, \n is supposed to mean "the newline character sequence on the relevant platform" - and this ONLY equates to LF on some platforms, like Unix.
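To see the CR+CR+LF effect without a Windows machine, here is a minimal Python sketch. It assumes that the newline argument of Python's open() is a fair stand-in for the text-mode translation a C runtime does on output; the helper name written_bytes is made up for this illustration.

```python
import os
import tempfile


def written_bytes(text, newline):
    """Write text through a text-mode file and return the raw bytes on disk.

    newline="\n"   : no translation on output (Unix-like behaviour)
    newline="\r\n" : every "\n" written becomes "\r\n" (DOS/Windows-like)
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", newline=newline) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# The same "portable" string, written under both conventions.
unix_style = written_bytes("line\r\n", "\n")
dos_style = written_bytes("line\r\n", "\r\n")

print(unix_style)  # b'line\r\n'    -> CR+LF, as intended
print(dos_style)   # b'line\r\r\n'  -> CR+CR+LF, the doubled carriage return
```

Opening the file in binary mode, or with newline="\n", is the usual way to keep control of the exact bytes.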

When dealing with TCP/IP application protocols, the line terminator is typically CRLF, just like in Windows/DOS files. I have not done C socket programming so I don't know whether \n automatically expands to CR+LF in standard socket libraries, or whether it is the responsibility of the programmer. Some user-land utilities like "telnet" take care of this translation for you; with others (some "netcat" versions need an option for it), and with sockets in scripting languages like Perl and PHP, you may have to write the \r\n yourself.
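On the open question about socket libraries: a rough Python sketch (Python's socket module is a fairly thin wrapper over the BSD socket calls) suggests the socket layer passes bytes through untouched, so producing CRLF is the programmer's job. This is only an illustration under that assumption, not a statement about every library; the helper names and the User-Agent value are invented for the example.

```python
import socket

CRLF = b"\r\n"  # CR (ASCII 13) followed by LF (ASCII 10), written explicitly


def build_request(lines):
    """Join protocol lines with explicit CRLF and end with a blank line."""
    return CRLF.join(lines) + CRLF + CRLF


def echo_through_socket(payload):
    """Send payload over a connected socket pair and return what arrives."""
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(payload)
        sender.close()  # signal end of data so the read loop below terminates
        chunks = []
        while True:
            chunk = receiver.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        receiver.close()


# A bare LF goes over the wire as a bare LF; nothing adds a CR for us.
print(echo_through_socket(b"HELO\n"))  # b'HELO\n'

# So the CRLF has to be in the bytes we hand to the socket.
request = build_request([b"GET / HTTP/1.0", b"User-Agent: sketch"])
print(echo_through_socket(request))
```

Spelling the terminator as the bytes 13 and 10 avoids relying on whatever \n happens to mean on the platform doing the sending.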

References:

http://en.wikipedia.org/wiki/Newline talks about the general problem of the definition of "newline"

http://www.faqs.org/rfcs/rfc822.html - clearly defines the line separator as "CRLF" for Internet messages (email)

http://www.faqs.org/rfcs/rfc1945.html (HTTP 1.0) uses the same definition for protocol elements


Background for those who aren't familiar: CR is "carriage return" - which, on an old teletype/typewriter, means to move the carriage back to the start of the line. LF, or line feed, advances the paper one line. You need both of these to start a new line, so the DOS/Windows or TCP/IP interpretation is more technically correct for a teletype system. I guess Unix tried to simplify things by only using LF, trying to get away from the physical aspects of the device. The Wikipedia article has more on this esoterica.

Hope this helps,
Jeremy Portzer
newline pedant


--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
