RE: Nested lists in SPDX XML files.

Kris.re Mon, 09 May 2016 10:16:38 -0700

For what it’s worth, pretty much everything is legal structure-wise; the 
structure of the final form of the XML will definitely be guided by any 
problems we encounter and need to solve during this review process. One 
notable change that isn’t in the current version is dropping the <body> tag 
entirely and simply allowing nesting of the other entities (copyright, title, 
optional) as “one-or-none” matches.

From: Brad Edmondson [mailto:brad.edmond...@gmail.com]
Sent: Monday, May 09, 2016 11:33
To: Kris.re <kris...@bbhmedia.com>
Cc: Philippe Ombredanne <pombreda...@nexb.com>; Sam Ellis <sam.el...@arm.com>; 
SPDX-legal (spdx-legal@lists.spdx.org) <spdx-legal@lists.spdx.org>
Subject: Re: Nested lists in SPDX XML files.

Thanks Kris for the good recap of where we are on error-correcting the XML 
representation of required, optional, and replaceable text. Of course it is no 
fun to manually edit XML, but I think I recall correctly that the group decided 
at the outset of this github project that since we can programmatically 
validate (at least the structural legality of) our edits, and eventually adapt 
existing XML-parsing and editing code into SPDX-specific tooling, this process 
will be worth it. Especially if Sam continues to pull everyone else's weight on 
github as well as his own.  :-)
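
To make the "programmatically validate" point concrete, here is a minimal 
sketch in Python of what a structural check could look like. It only tests 
well-formedness and whether each tag is one of the tags named in this thread; 
the tag set is an assumption, not the project's actual schema, and 
check_structure and KNOWN_TAGS are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Assumption: only the tags mentioned in this thread; the real schema may differ.
KNOWN_TAGS = {"body", "copyright", "title", "optional", "p", "br", "list", "li", "b"}


def check_structure(xml_text, known_tags=KNOWN_TAGS):
    """Return a list of problems; an empty list means the text is
    well-formed XML and uses only tags from known_tags."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Unbalanced or mismatched tags end up here.
        return [f"not well-formed: {exc}"]
    return [
        f"unknown tag <{el.tag}>"
        for el in root.iter()
        if el.tag not in known_tags
    ]


if __name__ == "__main__":
    print(check_structure("<body><p>some text</p></body>"))
    print(check_structure("<body><p>some text</body>"))
```

A check like this says nothing about whether the matchable sections are 
legally right; it only catches edits that break the markup.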

Thanks also for confirming the possibility of nesting entities. I have made a 
few edits assuming that nesting is legal, but I made that assumption silently 
-- props to Sam for specifically raising the question.

Best,
Brad

--
Brad Edmondson, Esq.
512-673-8782 | brad.edmond...@gmail.com

On Mon, May 9, 2016 at 10:39 AM, Kris.re <kris...@bbhmedia.com> wrote:
Sam:

They are definitely supposed to be nested. If you are seeing one like this, then 
either the original template's spacing was equally flat (I could see it for one 
or two, but not many), or the correction pass I made to remove the redundant 
list items wasn't quite correct. There were about 3-4 types of failures that I 
thought I was able to automatically correct, but obviously I didn't look quite 
deep enough. If you spot these, feel free to label them with a 'bug' label and 
I will address them in batch. I can roll back the git history and extract the 
correctly nested version (or at least part of it), or convert it by hand.
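
Spotting the flattened cases does not have to be done by eye. As a rough 
sketch (not the correction pass described above), the Python below flags any 
element whose children include two or more <list> elements directly next to 
each other, which is the shape of the flat example in Sam's message. The 
<license> wrapper and the name find_flat_list_runs are hypothetical, and a 
flagged run is only a candidate for review, since some licenses may really 
have consecutive top-level lists.

```python
import xml.etree.ElementTree as ET


def find_flat_list_runs(root):
    """Return (parent, [list, ...]) pairs for every run of two or more
    <list> elements that are adjacent siblings under the same parent."""
    runs = []
    for parent in root.iter():
        current = []
        for child in parent:
            if child.tag == "list":
                current.append(child)
                continue
            if len(current) > 1:
                runs.append((parent, current))
            current = []
        if len(current) > 1:
            runs.append((parent, current))
    return runs


# Hypothetical wrapper element around the flat example from this thread.
sample = """<license>
<list><li><b>1)</b><p>some text</p></li></list>
<list><li><b>a)</b><p>some text</p></li></list>
</license>"""

if __name__ == "__main__":
    for parent, lists in find_flat_list_runs(ET.fromstring(sample)):
        print(parent.tag, [lst.findtext("li/b") for lst in lists])
```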

Philippe:

The initial proposal tried to minimize the use of XML tags at the cost of 
making whitespace significant. After some discussion, this turned out (besides 
being a bit of a messy solution) not to meet our needs, so I revised it so 
that whitespace was NOT significant, which required more structural tags to 
identify paragraphs and the like.

Regarding "mixed format and structure", if you're referring to the <b> tag, it 
is not a format tag but was intended to be "b for bullet". Many of the tags are 
likely getting renamed and this is probably one of them. The original choice 
was geared towards having as little visual space taken up by tags as possible, 
but as you can see we've moved past the point where that's a useful compromise 
here. If you're referring to <p>, <br/>, <list> or <li>, these are indeed 
structural tags, though the line is a bit fuzzy because it's often easiest to 
think about them in formatting terms. We might think of "<br/>" as a new line, 
which is presentation, as opposed to a structural break between two sections of 
text, or "<p>" as two new lines as opposed to a grouping of text into a 
paragraph, for example. Their primary use IS for presentation: we do need to 
render HTML files as one of our outputs, so we need enough information about 
the *structure* of the document to *format* it usefully.
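
As an illustration of going from structure to format, here is a small Python 
sketch that turns the structural tags into HTML by renaming them. The mapping 
(list to ul, b to a span with a "bullet" class, anything unknown to div) is 
purely an assumption for the example and not how the real HTML output is or 
will be generated; to_html and TAG_MAP are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from structural tags to HTML tags, for illustration only.
TAG_MAP = {"list": "ul", "li": "li", "p": "p", "br": "br"}


def to_html(elem):
    """Rename structural tags to HTML tags (modifying elem in place)
    and return the result serialized as an HTML string."""
    for el in elem.iter():
        if el.tag == "b":
            # "b for bullet": keep the marker, but as a styled span.
            el.tag = "span"
            el.set("class", "bullet")
        else:
            el.tag = TAG_MAP.get(el.tag, "div")
    return ET.tostring(elem, encoding="unicode", method="html")


if __name__ == "__main__":
    source = "<list><li><b>1)</b><p>some text</p></li></list>"
    print(to_html(ET.fromstring(source)))
```

The structural tags carry enough information that the presentation can be 
decided later, in one place, rather than in each license file.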

> But I think I lost track of the value and purpose of this editing in the 
> first place... can someone refresh me?
Our purpose right now is not primarily editing, though I welcome Sam's 
contributions on this count, since otherwise it'll be me doing this work ;) The 
primary work right now is verifying the selection of "matchable sections" of 
text, which was done partly by an automated process, and partly by myself 
without full legal understanding and context of these licenses.

> I am questioning the use of XML in the first place, which may be a format 
> that is barely OK for saving data files, but is quite terrible for editing IMHO.
I have to agree in most cases, but luckily for us, editing is not something we 
should have to do a lot of and when we do it will be pretty targeted. The bulk 
of the effort is the initial construction of the XML file, which can be eased 
by way of tooling. It does happen that XML is perfect for our particular use 
case, in my opinion, since what we need to do is mark up some source text 
somewhat arbitrarily to add information about it. That's something that a 
data-only format like JSON just doesn't have the capability to elegantly 
express.
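
A short Python sketch of what "marking up source text" buys us: the markup 
sits inside the running text, so the same fragment yields both the full 
wording and the wording with the optional part left out. The <optional> tag 
is one of the entities mentioned in this thread, but the sentence is made up 
for the example and text_without_optional is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up sentence; only the <p> and <optional> tag names come from the thread.
fragment = (
    "<p>Permission is hereby granted"
    "<optional>, free of charge,</optional>"
    " to any person.</p>"
)


def text_without_optional(elem):
    """Concatenate the text of elem, skipping the content of any
    <optional> child but keeping the text that follows it."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != "optional":
            parts.append(text_without_optional(child))
        parts.append(child.tail or "")
    return "".join(parts)


if __name__ == "__main__":
    p = ET.fromstring(fragment)
    print("".join(p.itertext()))
    print(text_without_optional(p))
```

Doing the same in JSON would mean splitting the sentence into an array of 
strings and objects, which is much harder to read and edit next to the 
original license text.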

> At least why not use plain HTML if you need to mix format and structure?
> You could then use some of the decent HTML WYSIWYG editing tools available 
> and not have to spend more time on the form than the substance?
I don't know about you, but in the past when I've used such tools, the output 
HTML is far from minimal and often a mess. Different tools will produce 
different HTML, which is a problem when working in a shared space (git 
repository). A WYSIWYG editor doesn't necessarily have the tools to represent 
the information we are working with, either: replaceable text, you could 
perhaps hack as a hyperlink but that would be kind of nasty... and we still 
have other information to bind, such as the name of the variable to store the 
match in. Our lists are sometimes ordered, sometimes unordered; when they're 
ordered, they are often formatted various ways. Adjusting the format to be 
correct (letters, numbers, roman numerals, whatever) is doable to some extent 
with css, but not in a way a WYSIWYG editor would necessarily support (what do 
you do about "2.a)"?) How do you represent sections of text that should be 
optional?

I like the idea of using an existing such editor naively, but when it comes 
down to the details we'd essentially be hacking or working around almost every 
part of the structure of the files except, maybe, list formatting.

I do think that in the future some work to make actually editing the files and 
creating them easier should be done, and I'll probably be volunteering for 
that, but I'd like to wrap up by coming back to the first point: the current 
effort is not primarily intended to be focused on editing massive swathes of 
XML, and certainly not creating whole files. After we've vetted the structure 
of the documents from a legal perspective with regard to matching, there is a 
lot of programmatic work we can do without worry, so I wouldn't get too wrapped 
up in the details of the names of tags or stuff like that just yet.

Kris

-----Original Message-----
From: spdx-legal-boun...@lists.spdx.org 
[mailto:spdx-legal-boun...@lists.spdx.org] On Behalf Of Philippe Ombredanne
Sent: Monday, May 09, 2016 09:55
To: Sam Ellis <sam.el...@arm.com>
Cc: SPDX-legal (spdx-legal@lists.spdx.org) <spdx-legal@lists.spdx.org>
Subject: Re: Nested lists in SPDX XML files.

On Mon, May 9, 2016 at 9:25 AM, Sam Ellis <sam.el...@arm.com> wrote:
> Hi,
>
> When reviewing the SPDX XML files, I see many licenses that contain
> nested lists such as:
>
> 1) some text…
>    a) some text…
>
> And these are converted to XML like this:
>
> <list>
>   <li>
>   <b>1)</b><p>some text…</p>
>   </li>
> </list>
> <list>
>   <li>
>   <b>a)</b><p>some text…</p>
>   </li>
> </list>
>
> Note that the XML above places the bullets sequentially rather than
> being nested. I would like to check, does the XML syntax support
> nesting, and if so, should we be using it to represent cases such as this?
>
> To make it clearer, this is how the nested equivalent of the above
> might look, with the a) list inside the 1) list:
>
> <list>
>   <li>
>   <b>1)</b><p>some text…</p>
>   </li>
>   <list>
>     <li>
>     <b>a)</b><p>some text…</p>
>     </li>
>   </list>
> </list>
>
> My view is that by representing nested lists sequentially then we are
> losing some of the structure of the original text. On the other hand,
> if the main purpose of the lists is to allow for identification of
> bullets then the sequential representation is just fine for this.

I have a lot of respect for what you are embarking on:
I would not dare edit 100 such XML files by hand: I find this rather 
confusing and error prone.
And I am a programmer...

But I think I lost track of the value and purpose of this editing in the first 
place... can someone refresh me?

Now, if there is a purpose, you raise a good point in this post (and your 
previous post about XML entities escaping).
Why is this format mixing structure and formatting together, in a pseudo-HTML 
format?
Is this meant to become the reference text for SPDX licenses?

I am questioning the use of XML in the first place, which may be a format that 
is barely OK for saving data files, but is quite terrible for editing IMHO.

At least why not use plain HTML if you need to mix format and structure?
You could then use some of the decent HTML WYSIWYG editing tools available and 
not have to spend more time on the form than the substance?
--
Cordially
Philippe Ombredanne
_______________________________________________
Spdx-legal mailing list
Spdx-legal@lists.spdx.org
https://lists.spdx.org/mailman/listinfo/spdx-legal
