Thanks, Kris, for the good recap of where we are on error-correcting the XML
representation of required, optional, and replaceable text. Of course it is
no fun to manually edit XML, but I think I recall correctly that the group
decided at the outset of this GitHub project that since we can
programmatically validate (at least the structural legality of) our edits,
and eventually adapt existing XML-parsing and editing code into
SPDX-specific tooling, this process will be worth it. Especially if Sam
continues to pull everyone else's weight on GitHub as well as his own.  :-)
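
To illustrate what I mean by validating "structural legality", here is a
minimal sketch using Python's standard library. It only checks that an edit
is still well-formed XML; the function name is my own invention and this is
not existing SPDX tooling.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML, else False.

    Hypothetical helper for illustration: it checks tag balance and
    syntax only, not whether the tags make sense for SPDX licenses.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True
```

A check like this would catch an unclosed <li> after a hand edit, but it says
nothing about whether the right text was marked optional or replaceable.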

Thanks also for confirming the possibility of nesting entities. I have made
a few edits assuming that nesting is legal, but I made that assumption
silently -- props to Sam for specifically raising the question.
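
For spotting files where nesting was flattened, something like the sketch
below might help. It assumes the <list> tag name from Sam's example and a
hypothetical <license> wrapper element around the text; neither is a
statement about the final schema.

```python
import xml.etree.ElementTree as ET


def max_list_depth(element, depth=0):
    """Return the deepest nesting level of <list> elements at or below element."""
    if element.tag == "list":
        depth += 1
    # Recurse into children, carrying the current <list> depth along.
    return max([depth] + [max_list_depth(child, depth) for child in element])


# Sam's two examples, wrapped in a hypothetical <license> root element.
FLAT = (
    "<license>"
    "<list><li><b>1)</b><p>some text</p></li></list>"
    "<list><li><b>a)</b><p>some text</p></li></list>"
    "</license>"
)
NESTED = (
    "<license>"
    "<list><li><b>1)</b><p>some text</p></li>"
    "<list><li><b>a)</b><p>some text</p></li></list>"
    "</list>"
    "</license>"
)

flat_depth = max_list_depth(ET.fromstring(FLAT))
nested_depth = max_list_depth(ET.fromstring(NESTED))
```

A license whose original text clearly has sub-bullets but whose XML reports a
depth of 1 would be a candidate for the 'bug' label Kris mentions.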

Best,
Brad

--
Brad Edmondson, *Esq.*
512-673-8782 | brad.edmond...@gmail.com

On Mon, May 9, 2016 at 10:39 AM, Kris.re <kris...@bbhmedia.com> wrote:

> Sam:
>
> They are definitely supposed to be nested. If you are seeing one like this
> then either the original template's spacing was equally flat (I could see
> it for one or two, but not many), or the correction pass I made to remove
> the redundant list items wasn't quite correct. There were about 3-4 types
> of failures that I thought I was able to automatically correct, but
> obviously I didn't look quite deep enough. If you spot these, feel free to
> label them with a 'bug' label and I will address them in batch. I can roll
> back the git history and extract the correctly nested version, or at least
> part of it, or convert it by hand or something.
>
> Philippe:
>
> The initial proposal tried to minimize the use of XML tags at the cost of
> making whitespace significant. After some discussion, this seemed (besides
> being a bit of a messy solution) to not meet our needs, and so I revised it
> so that whitespace was NOT significant, which required more structural tags
> to identify paragraphs and the like.
>
> Regarding "mixed format and structure", if you're referring to the <b>
> tag, it is not a format tag but was intended to be "b for bullet". Many of
> the tags are likely getting renamed and this is probably one of them. The
> original choice was geared towards having as little visual space taken up
> by tags as possible, but as you can see we've moved past the point where
> that's a useful compromise here. If you're referring to <p>, <br/>, <list>
> or <li>, these are indeed structural tags, though the line is a bit fuzzy
> because it's often easiest to think about them in formatting terms. We
> might think of "<br/>" as a new line, which is presentation, as opposed to
> a structural break between two sections of text, or "<p>" as two new lines
> as opposed to a grouping of text into a paragraph, for example. Their
> primary use IS for presentation: we do need to render HTML files as one of
> our outputs, so we need enough information about the *structure* of the
> document to *format* it usefully.
>
> > But I think I lost track of the value and purpose of this editing in the
> > first place... can someone refresh me?
> Our purpose right now is not primarily editing, though I welcome Sam's
> contributions on this count, since otherwise it'll be me doing this work ;)
> The primary work right now is verifying the selection of "matchable
> sections" of text, which was done partly by an automated process, and
> partly by myself without full legal understanding and context of these
> licenses.
>
> > I am questioning the use of XML in the first place, which may be a format
> > that is barely OK for saving data files, but is quite terrible for editing
> > IMHO.
> I have to agree in most cases, but luckily for us, editing is not
> something we should have to do a lot of and when we do it will be pretty
> targeted. The bulk of the effort is the initial construction of the XML
> file, which can be eased by way of tooling. It does happen that XML is
> perfect for our particular use case, in my opinion, since what we need to
> do is mark up some source text somewhat arbitrarily to add information
> about it. That's something that a data-only format like JSON just doesn't
> have the capability to elegantly express.
>
> > At least why not use plain HTML if you need to mix format and structure?
> > You could then use some of the decent HTML WYSIWYG editing tools
> > available and not have to spend more time on the form than the substance?
> I don't know about you, but in the past when I've used such tools, the
> output HTML has been far from minimal and often a mess. Different tools
> will produce different HTML, which is a problem when working in a shared
> space (git repository). A WYSIWYG editor doesn't necessarily have the tools
> to represent the information we are working with, either: replaceable text
> could perhaps be hacked in as a hyperlink, but that would be kind of
> nasty... and we still have other information to bind, such as the name of
> the variable to store the match in. Our lists are sometimes ordered,
> sometimes unordered; when they're ordered, they are formatted in various
> ways. Adjusting the format to be correct (letters, numbers, roman numerals,
> whatever) is doable to some extent with CSS, but not in a way a WYSIWYG
> editor would necessarily support (what do you do about "2.a)"?). How do you
> represent sections of text that should be optional?
>
> I like the idea of using an existing such editor naively, but when it
> comes down to the details we'd essentially be hacking or working around
> almost every part of the structure of the files except, maybe, list
> formatting.
>
> I do think that in the future some work should be done to make editing and
> creating the files easier, and I'll probably be volunteering for that, but
> I'd like to wrap up by bringing it back to the first point: the current
> effort is not primarily intended to be focused on editing massive swathes
> of XML, and certainly not creating whole files. After we've vetted the
> structure of the documents from a legal perspective with regard to
> matching, there is a lot of programmatic work we can do without worry, so I
> wouldn't get too wrapped up in the details of the names of tags or stuff
> like that just yet.
>
> Kris
>
> -----Original Message-----
> From: spdx-legal-boun...@lists.spdx.org [mailto:
> spdx-legal-boun...@lists.spdx.org] On Behalf Of Philippe Ombredanne
> Sent: Monday, May 09, 2016 09:55
> To: Sam Ellis <sam.el...@arm.com>
> Cc: SPDX-legal (spdx-legal@lists.spdx.org) <spdx-legal@lists.spdx.org>
> Subject: Re: Nested lists in SPDX XML files.
>
> On Mon, May 9, 2016 at 9:25 AM, Sam Ellis <sam.el...@arm.com> wrote:
> > Hi,
> >
> > When reviewing the SPDX XML files, I see many licenses that contain
> > nested lists such as:
> >
> > 1) some text…
> >    a) some text…
> >
> > And these are converted to XML like this:
> >
> > <list>
> >   <li>
> >   <b>1)</b><p>some text…</p>
> >   </li>
> > </list>
> > <list>
> >   <li>
> >   <b>a)</b><p>some text…</p>
> >   </li>
> > </list>
> >
> > Note that the XML above places the bullets sequentially rather than
> > being nested. I would like to check, does the XML syntax support
> > nesting, and if so, should we be using it to represent cases such as
> > this?
> >
> > To make it clearer, this is how the nested equivalent of the above
> > might look, with the a) list inside the 1) list:
> >
> > <list>
> >   <li>
> >   <b>1)</b><p>some text…</p>
> >   </li>
> >   <list>
> >     <li>
> >     <b>a)</b><p>some text…</p>
> >     </li>
> >   </list>
> > </list>
> >
> > My view is that by representing nested lists sequentially we are
> > losing some of the structure of the original text. On the other hand,
> > if the main purpose of the lists is to allow for identification of
> > bullets then the sequential representation is just fine for this.
>
> I have a lot of respect for what you are embarking on:
> I would not dare edit 100 such XML files by hand: I find this rather
> confusing and error-prone.
> And I am a programmer...
>
> But I think I lost track of the value and purpose of this editing in the
> first place... can someone refresh me?
>
> Now, if there is a purpose, you raise a good point in this post (and your
> previous post about XML entities escaping).
> Why is this format mixing structure and formatting together, in a
> pseudo-HTML format?
> Is this meant to become the reference text for SPDX licenses?
>
> I am questioning the use of XML in the first place, which may be a format
> that is barely OK for saving data files, but is quite terrible for editing
> IMHO.
>
> At least why not use plain HTML if you need to mix format and structure?
> You could then use some of the decent HTML WYSIWYG editing tools available
> and not have to spend more time on the form than the substance?
> --
> Cordially
> Philippe Ombredanne
> _______________________________________________
> Spdx-legal mailing list
> Spdx-legal@lists.spdx.org
> https://lists.spdx.org/mailman/listinfo/spdx-legal