Jamie Wilkinson wrote:
Jeremy Portzer wrote:
Common confusion/misconception is that \n refers to LF only. This is not
always the case - usually it refers to the portable "newline" that gets
expanded to the proper characters depending on platform or context.
Over the wire, it is just newline, no carriage return.
No, using your definition of "newline = LF", this is incorrect.
Standard TCP/IP protocols like SMTP and HTTP use CR+LF!
The \r component of
the CRLF bog is only a problem when you're doing file IO. For a wire
protocol it's the \n that counts.
You are still confusing the issue by using the semi-portable notation of
"\n" (newline) to refer interchangeably to the LF (linefeed) character,
ASCII 10. This is imprecise - but you're not the first to be tripped up
by this.
As I stated, \n (or "newline") and LF (ASCII 10) are NOT precisely the
same thing. They are equivalent when dealing with Unix files, yes - so
many people with Unix/Linux background tend to think of them
interchangeably, but this isn't the case on other platforms. Yes, if
you write a C or Perl program on Unix and want to make a DOS compatible
file, you can use \r\n and it will work - because in Unix, \n becomes
the ASCII 10, so the combined sequence is chr(13) then chr(10), or
CR+LF.  But if you compile that same program on Windows and write to a
text-mode file, you will end up with the sequence CR+CR+LF, since \n on
Windows is written out as CR+LF.  Does this make sense?  Again, \n is
supposed to mean "the newline character sequence on the relevant
platform" - and this ONLY equates to LF on some platforms, like Unix.
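A minimal sketch of the CR+CR+LF effect, in Python rather than C.  It
assumes that io.TextIOWrapper with newline="\r\n" is a fair stand-in for
a Windows text-mode stream, and newline="\n" for a Unix one; the helper
name write_text_mode is made up for this example.

```python
import io


def write_text_mode(text: str, platform_newline: str) -> bytes:
    """Write text through a text layer that expands "\n" to platform_newline.

    platform_newline="\n" leaves "\n" alone (Unix-like behaviour);
    platform_newline="\r\n" expands every "\n" to CR+LF (Windows-like).
    """
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="ascii", newline=platform_newline)
    wrapper.write(text)
    wrapper.flush()
    data = raw.getvalue()
    wrapper.detach()  # keep the wrapper from closing the buffer later
    return data


# The program "thinks" it is writing a DOS line ending by hand.
unix_bytes = write_text_mode("hello\r\n", "\n")
dos_bytes = write_text_mode("hello\r\n", "\r\n")

print(unix_bytes)  # b'hello\r\n'    -> CR+LF, as intended
print(dos_bytes)   # b'hello\r\r\n'  -> CR+CR+LF, one CR too many
```

Opening the output in binary mode, or writing bytes directly, sidesteps
the translation altogether.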
When dealing with TCP/IP protocols, the line terminator is typically
CRLF, just like in Windows/DOS files.  I have not done C socket
programming, so I don't know for certain whether \n is ever expanded to
CR+LF by standard socket libraries; the safe assumption is that it is
the responsibility of the programmer.  Some user-land utilities,
"telnet" for example, take care of this translation for you, but I
would not rely on every tool or scripting language doing so.
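To avoid depending on any implicit translation, a program can spell out
the terminator as bytes.  A minimal sketch in Python, assuming an
HTTP/1.0-style request; build_http_request and the host name are made up
for illustration, and nothing is actually sent over a socket here.

```python
CRLF = b"\r\n"  # chr(13) + chr(10), written explicitly as bytes


def build_http_request(host: str, path: str = "/") -> bytes:
    """Return a request whose lines all end in an explicit CR+LF."""
    lines = [
        f"GET {path} HTTP/1.0".encode("ascii"),
        f"Host: {host}".encode("ascii"),
    ]
    # Each header line ends in CRLF; an empty line (a second CRLF)
    # marks the end of the header block.
    return CRLF.join(lines) + CRLF + CRLF


request = build_http_request("example.org")
print(request)  # b'GET / HTTP/1.0\r\nHost: example.org\r\n\r\n'
```

Because the request is built as bytes, it comes out the same on Unix and
Windows, whatever "\n" means to the local text-mode file layer.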
References:
http://en.wikipedia.org/wiki/Newline talks about the general problem of
the definition of "newline"
http://www.faqs.org/rfcs/rfc822.html - clearly defines the line
separator as "CRLF" for Internet messages (email)
http://www.faqs.org/rfcs/rfc1945.html (HTTP 1.0) uses the same
definition for protocol elements
Background for those who aren't familiar: CR is "carriage return",
which on an old teletype/typewriter means moving the carriage back to
the start of the line.  LF, or line feed, advances the paper one line.
You need both of these to start a new line, so the DOS/Windows or TCP/IP
interpretation is more technically correct for a teletype system.  I
guess Unix tried to simplify things by only using LF, trying to get away
from physical aspects of the device.  The Wikipedia article has more on
this esoterica.
Hope this helps,
Jeremy Portzer
newline pedant
--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html