[SLUG] Newlines - was Re: Comsec behaving badly

Jeremy Portzer Mon, 08 Oct 2007 04:13:17 -0700

Jamie Wilkinson wrote:

> Jeremy Portzer wrote:
>> Common confusion/misconception is that \n refers to LF only. This is not always the case - usually it refers to the portable "newline" that gets expanded to the proper characters depending on platform or context.
> Over the wire, it is just newline, no carriage return.

No, using your definition of "newline = LF", this is incorrect. Standard TCP/IP protocols like SMTP and HTTP use CR+LF!

> The \r component of
> the CRLF bog is only a problem when you're doing file IO.  For a wire
> protocol it's the \n that counts.

You are still confusing the issue by using the semi-portable notation of "\n" (newline) to refer interchangeably to the LF (linefeed) character, ASCII 10. This is imprecise - but you're not the first to be tripped up by this.

As I stated, \n (or "newline") and LF (ASCII 10) are NOT precisely the same thing. They are equivalent when dealing with Unix files, yes - so many people with a Unix/Linux background tend to think of them interchangeably, but this isn't the case on other platforms. Yes, if you write a C or Perl program on Unix and want to make a DOS-compatible file, you can use \r\n and it will work - because in Unix, \n becomes ASCII 10, so the combined sequence is chr(13) then chr(10), or CR+LF. But if you compile that same program on Windows and write the file in text mode, you will end up with the sequence CR+CR+LF, since \n on Windows means CR+LF. Does this make sense? Again, \n is supposed to mean "the newline character sequence on the relevant platform" - and this ONLY equates to LF on some platforms, like Unix.
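
The CR+CR+LF effect can be reproduced on any platform with a small sketch. Python is used here purely for illustration (the discussion above is about C and Perl), and it only simulates the two platforms: io.TextIOWrapper with newline="\r\n" stands in for a Windows text-mode stream, and newline="\n" for a Unix one. The helper name write_text is hypothetical.

```python
import io

def write_text(text, platform_newline):
    # Write `text` through a text-mode layer and return the raw bytes produced.
    # platform_newline simulates the platform's line terminator:
    # LF alone behaves like Unix, CR+LF behaves like Windows text mode.
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii", newline=platform_newline)
    stream.write(text)
    stream.flush()
    return raw.getvalue()

# The program writes an explicit "\r\n", hoping for a DOS-compatible file.
unix_bytes = write_text("hello\r\n", "\n")       # \n stays as LF
windows_bytes = write_text("hello\r\n", "\r\n")  # \n is expanded to CR+LF

print(unix_bytes)     # b'hello\r\n'
print(windows_bytes)  # b'hello\r\r\n'
```

On the simulated Unix stream the output ends in CR+LF as intended; on the simulated Windows stream the \n is expanded a second time, giving CR+CR+LF.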

When dealing with TCP/IP protocols, the newline (\n) sequence is typically expanded to CRLF, just like Windows/DOS files. I have not done C socket programming so I don't know whether \n automatically expands to CR+LF in standard socket libraries, or whether it is the responsibility of the programmer. But certainly, user-land utilities like "netcat" or "telnet" take care of this translation for you, as do Perl, PHP, and similar scripting languages.
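
Where it is unclear whether a given library or tool will expand \n for you, one defensive option is to spell out CR and LF as explicit bytes when building protocol lines, so the result does not depend on any translation layer. A minimal sketch in Python, assuming an HTTP/1.0-style request; the helper name build_request and the host example.org are hypothetical, and nothing is actually sent over a network:

```python
CRLF = b"\r\n"  # chr(13) followed by chr(10), the same bytes on every platform

def build_request(path, host):
    # Build the raw bytes of a simple HTTP/1.0-style GET request.
    lines = [
        b"GET " + path.encode("ascii") + b" HTTP/1.0",
        b"Host: " + host.encode("ascii"),
    ]
    # Each line ends with CRLF; one extra CRLF (an empty line) ends the headers.
    return CRLF.join(lines) + CRLF + CRLF

request = build_request("/", "example.org")
print(request)  # b'GET / HTTP/1.0\r\nHost: example.org\r\n\r\n'
```

Because the request is assembled as bytes rather than text, no newline expansion is applied to it along the way.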


References:

http://en.wikipedia.org/wiki/Newline talks about the general problem of the definition of "newline"

http://www.faqs.org/rfcs/rfc822.html - clearly defines the line separator as "CRLF" for Internet messages (email)

http://www.faqs.org/rfcs/rfc1945.html (HTTP 1.0) uses the same definition for protocol elements

Background for those who aren't familiar: CR is "carriage return" - which on an old teletype/typewriter, means to move the carriage head back to the start. LF, or line feed, advances the paper one line. You need both of these to start a new line, so the DOS/Windows or TCP/IP interpretation is more technically correct for a teletype system. I guess Unix tried to simplify things by only using LF, trying to get away from physical aspects of the device. The Wikipedia article has more on this esoterica.


Hope this helps,
Jeremy Portzer
newline pedant


--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
