Re: Seeking clarification of Desktop Entry Specification

2020-04-24 Thread Bollinger, John C

(1)  #basic-format says "A file is interpreted as a series of lines that
are separated by linefeed characters."  #value-types says "The escape
sequences \s, \n, \t, \r, and \\ are supported for values of type string
and localestring, meaning ASCII space, newline, tab, carriage return,
and backslash, respectively."  Even though the former is about the
structure of the file itself while the latter is about the encoded
payload, it is confusing that one talks about "linefeed" while the other
talks about "newline" and "carriage return".  Should "newline" read
"linefeed" (meaning U+000A LINE FEED) instead?

When referring to ASCII characters, the names "newline" and "line feed" are 
synonymous.  Both names refer to the character with ASCII code 10 (decimal).  
Additionally, both names convey the same idea for the action of a line-printer 
type output device: the print head is advanced to the next line (or, more 
accurately, the paper is scrolled one line forward).  I would be inclined to 
say that no, the escape sequence '\n' should not be described as referring to a 
"line feed" instead of a "newline", because '\n' is mnemonic for "newline", and 
that is the prevailing terminology in C and C-influenced languages, whence 
"\n" comes.

On the other hand, I take the words "linefeed character" in the description of 
the format to be chosen intentionally to avoid ambiguity about line 
termination.  Desktop files are specified to consist of lines separated by 
exactly one linefeed character, regardless of whether that matches the standard 
line-termination semantics for text files on the host system.  This word choice 
thus minimizes confusion that might otherwise arise around the fact that C and 
C++, when reading or writing a file in text mode, automatically translate between 
newline (linefeed) characters internally and whatever line termination is 
locally appropriate externally.  The design choice results in the 
interpretation of .desktop files being insensitive to the conventions of the 
host environment.
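
To illustrate the point about host independence, here is a minimal Python 
sketch (the function name split_desktop_lines is my own, not from the spec). 
It assumes the file was read in binary mode, e.g. open(path, "rb"), so that no 
text-mode newline translation happens, and it splits on U+000A only:

```python
def split_desktop_lines(data: bytes) -> list[str]:
    """Split raw file contents on U+000A only, ignoring host conventions."""
    text = data.decode("utf-8")
    # str.split("\n") splits on LF alone; str.splitlines() would also split
    # on CR, CRLF and several other characters, which is not wanted here.
    return text.split("\n")


# Hypothetical sample: the second line ends in CR LF, the others in LF.
sample = b"[Desktop Entry]\nName=Example\r\nType=Application\n"
lines = split_desktop_lines(sample)
```

Under this reading, a CR before the linefeed stays part of the line's content 
rather than being treated as part of the separator.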

Whereas I agree that the disparity in word choice is potentially confusing, I 
do not think that the wording should be changed in either place.  Possibly, 
however, there is room for a clarifying comment.


(2)  #entries says "Space before and after the equals sign should be
ignored".  Does that mean just U+0020 SPACE, or also other kinds of
white space, like U+0009 CHARACTER TABULATION?

Inasmuch as the wording says simply "space", and not "space characters", I take 
it to be inclusive of any sequences of U+0020 and / or U+0009 characters.  
Neither of these may appear in keys, in any case, so the alternative to 
accepting them both as constituents of "space" is to reject files that use any 
tabs between key and "=".  But I do agree that this is ambiguous and should be 
cleared up.  In particular, since desktop files are encoded in UTF-8, they can 
also contain any of the relatively many other characters that Unicode 
categorizes as space characters, and it is unclear whether it is intended that 
the "space" around '=' signs be inclusive of all these.  I suspect that 
implementations generally accept only U+0020 and U+0009 as "space" in this 
sense, and maybe U+000D, but it is hard to justify that specific choice from 
the wording.
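
As a sketch of the interpretation I describe above (only U+0020 and U+0009 
count as "space" around '='), something like the following Python would do. 
The function name parse_entry is hypothetical, and the choice of exactly these 
two characters is my assumption, not something the spec states:

```python
def parse_entry(line: str) -> tuple[str, str]:
    """Split a 'key=value' line, dropping space/tab around the first '='."""
    key, sep, value = line.partition("=")
    if not sep:
        raise ValueError(f"not a key=value entry: {line!r}")
    blanks = " \t"  # assumption: only U+0020 and U+0009 are ignored
    # Only the space adjacent to '=' is removed; trailing space in the
    # value is left alone.
    return key.rstrip(blanks), value.lstrip(blanks)


entry = parse_entry("Name \t=\t Example app ")
```

Other Unicode space characters, such as U+00A0, are left in place by this 
sketch, which is exactly the behavior the spec wording leaves open.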


(3)  It is unclear exactly when the escape sequences mentioned in (1)
need to be used in string/localestring values:

*  "\\" apparently needs to be used at least whenever the following
character is one of "s", "n", "t", "r", or "\".  But what about
sequences like "\a", does it render the file ill-formed, or is it an
accepted shortcut for the fully escaped "\\a"?

That is a fair question, and one whose answer I agree is ambiguous.  I would be 
be inclined to guess that there is a diversity of implementation.  It is clear 
that "\a" is not a recognized escape sequence, as it is not included in the 
enumeration of those, so what is it?  Absent an update to the spec, I would be 
inclined to say that authors should avoid writing such combinations, and 
processors should interpret any they encounter as if they were "\\a".

Note also that there are special rules for an additional level of quoting for 
values of "Exec" keys.
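
A minimal Python sketch of the lenient reading suggested above, in which an 
unrecognized sequence such as "\a" is treated as if it were "\\a". The names 
ESCAPES and unescape_value are hypothetical, the handling of a lone trailing 
backslash is my own assumption, and the extra quoting rules for "Exec" are not 
handled here:

```python
# The five escape sequences the spec lists for string/localestring values.
ESCAPES = {"s": " ", "n": "\n", "t": "\t", "r": "\r", "\\": "\\"}


def unescape_value(raw: str) -> str:
    """Decode the escapes in a raw string/localestring value."""
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            if nxt in ESCAPES:
                out.append(ESCAPES[nxt])
            else:
                # Unrecognized sequence such as "\a": keep both characters,
                # i.e. treat it as if it had been written "\\a".
                out.append("\\" + nxt)
            i += 2
        else:
            # Ordinary character, or a lone trailing backslash kept as-is
            # (assumption: the spec does not cover that case).
            out.append(ch)
            i += 1
    return "".join(out)
```

A stricter processor could instead reject the file at the else branch; the 
spec as written does not seem to settle which is right.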

*  "\n" apparently needs to always be used (at least with the "newline"
vs. "linefeed" clarification from (1)).

If you want a newline (linefeed) character in a value then you must represent 
it as "\n", because a literal newline would terminate the value.
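
For the writing side, here is a hedged Python sketch (escape_value is a 
hypothetical name). It always escapes backslash and newline; escaping every 
carriage return and tab, and only the leading run of spaces, is my assumption 
about a safe policy rather than anything the spec requires:

```python
def escape_value(value: str) -> str:
    """Encode a string so it can be written as a single-line value."""
    # Escape backslashes first so later replacements are not re-escaped.
    out = value.replace("\\", "\\\\")
    out = out.replace("\n", "\\n")  # a literal newline would end the value
    out = out.replace("\r", "\\r")  # assumption: always escape CR
    out = out.replace("\t", "\\t")  # assumption: always escape tab
    # Leading spaces would be swallowed as space after '=', so write each
    # one as \s; spaces elsewhere are left alone.
    stripped = out.lstrip(" ")
    leading = len(out) - len(stripped)
    return "\\s" * leading + stripped


escaped = escape_value(" two\nlines")
```

Whether trailing spaces also need "\s" is one of the open questions here, so 
this sketch leaves them untouched.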


*  "\s" (and maybe also "\t" and "\r"?) apparently needs to be used at
the very start of a string/localestring value (see (2)).  But does it
also need to be used e.g. at the very end of such a value?  (From common
practice, it appears that it at least doesn't need to be used for a
space somewhere in the middle of such a value.)

*  What about "\t" and "\r"?

"\s" is definitely needed, and I would argue "\t", too, if spaces or tabs are 
wanted in a value before the first non-"space" character.  On the other 

Re: Seeking clarification of Desktop Entry Specification

2020-04-24 Thread rhkramer
On Friday, April 24, 2020 06:37:59 AM Stephan Bergmann wrote:
> I have three questions regarding
>  ec-1.1.html>:
> 
> (1)  #basic-format says "A file is interpreted as a series of lines that
> are separated by linefeed characters."  #value-types says "The escape
> sequences \s, \n, \t, \r, and \\ are supported for values of type string
> and localestring, meaning ASCII space, newline, tab, carriage return,
> and backslash, respectively."  Even though the former is about the
> structure of the file itself while the latter is about the encoded
> payload, it is confusing that one talks about "linefeed" while the other
> talks about "newline" and "carriage return".  Should "newline" read
> "linefeed" (meaning U+000A LINE FEED) instead?

Replying only to (1) re linefeed / newline characters:

Some of the ambiguity / confusion no doubt is because of differences among 
Windows / Linux / Mac usage to indicate line ends.

I won't get these details correct, but you'll get the idea:

Linux uses \n to indicate the end of a line

Windows uses \r\n (2 characters: carriage return, then linefeed) to indicate the 
end of a line

Mac (the older, pre-OS X versions; the newer versions, based on BSD, use 
\n like Linux) uses \r to indicate the end of a line

(When I talk about things like this, I typically point out that, in (wrapped) 
text files, those line endings indicate the end of a paragraph, not a single 
line -- maybe I'm being a little ambiguous here, so I'll think about 
clarifying that.)

___
xdg mailing list
xdg@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/xdg


Re: Seeking clarification of Desktop Entry Specification

2020-04-24 Thread Stefan Blachmann
Regarding (2):
I have found both kinds of whitespace in desktop entry files. So, as a
practical matter, both SPACEs and TABs need to be filtered out when
reading these files.

On 4/24/20, Stephan Bergmann  wrote:
> I have three questions regarding
> :
>
> (1)  #basic-format says "A file is interpreted as a series of lines that
> are separated by linefeed characters."  #value-types says "The escape
> sequences \s, \n, \t, \r, and \\ are supported for values of type string
> and localestring, meaning ASCII space, newline, tab, carriage return,
> and backslash, respectively."  Even though the former is about the
> structure of the file itself while the latter is about the encoded
> payload, it is confusing that one talks about "linefeed" while the other
> talks about "newline" and "carriage return".  Should "newline" read
> "linefeed" (meaning U+000A LINE FEED) instead?
>
> (2)  #entries says "Space before and after the equals sign should be
> ignored".  Does that mean just U+0020 SPACE, or also other kinds of
> white space, like U+0009 CHARACTER TABULATION?
>
> (3)  It is unclear exactly when the escape sequences mentioned in (1)
> need to be used in string/localestring values:
>
> *  "\\" apparently needs to be used at least whenever the following
> character is one of "s", "n", "t", "r", or "\".  But what about
> sequences like "\a", does it render the file ill-formed, or is it an
> accepted shortcut for the fully escaped "\\a"?
>
> *  "\n" apparently needs to always be used (at least with the "newline"
> vs. "linefeed" clarification from (1)).
>
> *  "\s" (and maybe also "\t" and "\r"?) apparently needs to be used at
> the very start of a string/localestring value (see (2)).  But does it
> also need to be used e.g. at the very end of such a value?  (From common
> practice, it appears that it at least doesn't need to be used for a
> space somewhere in the middle of such a value.)
>
> *  What about "\t" and "\r"?
>
> (These questions occurred to me when doing
> 
>
> "Properly escape desktop file string values".)