Thanks David, I really like this visualization and how it frames the problem. As 
an aside, we've been using the terms "logical model" for the information model 
and "physical model" or "serialization" for how this is represented as "bytes 
that go over the wire".

We have had some requirements that blur the lines between the physical and 
logical models, and I think that those might not be fully captured by what you 
have here. For example, I don't think "context is invisible and irrelevant at 
the information level": if you deserialized a JSON SPDXv3 document to the 
information model and then serialized the information model to a YAML SPDXv3 
document, you would not expect the context to be lost, yet it would be lost if 
it were not in the information model. This is one of the reasons we added 
NamespaceMap to the information model: without it, the namespace mappings 
wouldn't round-trip.
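The round-trip concern can be illustrated with prefixed identifiers. This is a minimal sketch, not SPDX's actual NamespaceMap: the "acme" prefix, its IRI, and the expand/compact/serialize/deserialize helpers are all hypothetical, and JSON stands in for both the JSON and YAML formats because YAML is not in Python's standard library. It shows that if the namespace map is dropped when deserializing, the prefixes cannot be recovered on re-serialization.

```python
import json

# Hypothetical helpers; this is NOT the SPDX NamespaceMap definition.
def expand(compact_id, namespace_map):
    """Turn 'prefix:local' into a full IRI using the namespace map."""
    prefix, _, local = compact_id.partition(":")
    return namespace_map[prefix] + local

def compact(iri, namespace_map):
    """Turn a full IRI back into 'prefix:local' if a matching namespace is known."""
    for prefix, base in namespace_map.items():
        if iri.startswith(base):
            return f"{prefix}:{iri[len(base):]}"
    return iri  # no mapping available: emit the full IRI

def deserialize(text, keep_namespaces):
    doc = json.loads(text)
    ns = doc["namespaces"]
    model = {"elements": [expand(e, ns) for e in doc["elements"]]}
    if keep_namespaces:
        model["namespaceMap"] = ns  # carried in the information model
    return model

def serialize(model):
    ns = model.get("namespaceMap", {})
    return json.dumps({"namespaces": ns,
                       "elements": [compact(e, ns) for e in model["elements"]]})

# Hypothetical source document using an illustrative prefix and IRI.
source = json.dumps({"namespaces": {"acme": "https://acme.example/spdx/"},
                     "elements": ["acme:file1", "acme:id1"]})

with_map = serialize(deserialize(source, keep_namespaces=True))
without_map = serialize(deserialize(source, keep_namespaces=False))
print(json.loads(with_map)["elements"])     # ['acme:file1', 'acme:id1']
print(json.loads(without_map)["elements"])  # ['https://acme.example/spdx/file1', 'https://acme.example/spdx/id1']
```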

I agree that we don't need ContextualCollection; it was added to communicate 
that the elements are contextually related in some way (described by the 
context). We could absolutely remove it and make Collection a concrete class 
(instead of abstract), but we would lose the ability to communicate that 
elements have a contextual relationship that's not described by Relationship 
elements. Maybe we don't need that; I think that's the question on the table.


Regards,

William Bartholomew (he/him)
Principal Security Strategist
Global Cybersecurity Policy - Microsoft

My working day may not be your working day. Please don't feel obliged to reply 
to this e-mail outside of your normal working hours.

From: Spdx-tech@lists.spdx.org <Spdx-tech@lists.spdx.org> On Behalf Of David 
Kemp via lists.spdx.org
Sent: Monday, November 29, 2021 1:24 PM
To: SPDX-list <Spdx-tech@lists.spdx.org>
Subject: [EXTERNAL] Re: [spdx-tech] SPDX v3 Serialization, with examples

Here is a simple example that should clarify what serialization is intended to 
accomplish.  Information modeling is information-centric - it starts with 
"information" (variable values used by applications), and defines how those 
values are serialized into "data" (bytes that go over the wire).  The objective 
is to preserve those information values from end to end, and any data format 
that accomplishes the objective is a candidate solution.
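To make the information/data distinction concrete, here is a minimal sketch in which one hypothetical Element value (the property names and URNs are illustrative, not the SPDX v3 model) is encoded as two different byte sequences. The bytes differ, but both decode to the same information, so both encodings would be acceptable candidates under the objective above.

```python
import json

# A hypothetical Element value as an application sees it ("information").
element = {"id": "urn:example:file-1", "type": "File",
           "name": "acme-tool.tar.gz", "createdBy": "urn:example:acme"}

# Two different byte encodings ("data") of the same information.
data_a = json.dumps(element, separators=(",", ":")).encode("utf-8")
data_b = json.dumps(element, indent=2, sort_keys=True).encode("utf-8")

assert data_a != data_b  # the bytes on the wire differ...
# ...but both decode to exactly the same information value.
assert json.loads(data_a) == json.loads(data_b) == element
print("both encodings preserve the Element")
```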

The first slide shows three Element values: two Identities and one File.  Those 
three values start on the left, and they have to arrive unchanged on the right. 
There is no "right" information model, just correct models that accomplish the 
goal, and incorrect ones that do not.  Three alternative models are shown:
Alternative 1) Just serialize a "set" variable containing those three variables 
unchanged.  This is a "Bundle" type; it is the simplest to understand but 
doesn't support optimization.
Alternative 2) Serialize an array of [Element, Context] where Context is shared 
data that can be factored out, along with the remaining serialized or 
referenced Elements. Only the Elements are kept by the receiver; Context is 
discarded.
Alternative 3) Same as 2, except that Context is serialized as a 
pseudo-property of Element, created at serialization and removed at 
deserialization.
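The three alternatives can be sketched side by side. This is an illustration under stated assumptions, not the template tool's implementation: the property names are hypothetical, "Context" is taken to be the set of properties whose values are identical across all Elements, and Alternative 3 is read as attaching that Context as a "_context" pseudo-property on the first serialized Element. Each alternative must satisfy the same test: the receiver ends up with exactly the Elements that were sent.

```python
import json

# Hypothetical Element values; property names are illustrative, not the SPDX v3 model.
ELEMENTS = [
    {"id": "acme", "type": "Identity", "name": "Acme", "created": "2021-01-05",
     "createdBy": "acme", "specVersion": "3.0"},
    {"id": "steve", "type": "Identity", "name": "Steve", "created": "2021-06-30",
     "createdBy": "steve", "specVersion": "3.0"},
    {"id": "file1", "type": "File", "name": "tool.bin", "created": "2021-11-20",
     "createdBy": "acme", "specVersion": "3.0"},
]

def shared_context(elements):
    """Properties whose value is identical in every element (candidates to factor out)."""
    first = elements[0]
    return {k: v for k, v in first.items()
            if k != "id" and all(e.get(k) == v for e in elements)}

def strip(element, context):
    """Drop the properties that the context will supply."""
    return {k: v for k, v in element.items() if k not in context}

# Alternative 1: a "bundle" carrying every Element unchanged.
def ser_alt1(elements):
    return json.dumps(elements)

def deser_alt1(text):
    return json.loads(text)

# Alternative 2: [Context, Element...]; the receiver applies, then discards, Context.
def ser_alt2(elements):
    ctx = shared_context(elements)
    return json.dumps([ctx] + [strip(e, ctx) for e in elements])

def deser_alt2(text):
    ctx, *rest = json.loads(text)
    return [{**ctx, **e} for e in rest]

# Alternative 3 (one possible reading): Context rides along as a pseudo-property
# "_context" on the first serialized Element and is removed on deserialization.
def ser_alt3(elements):
    ctx = shared_context(elements)
    out = [strip(e, ctx) for e in elements]
    out[0]["_context"] = ctx
    return json.dumps(out)

def deser_alt3(text):
    items = json.loads(text)
    ctx = items[0].pop("_context")
    return [{**ctx, **e} for e in items]

for ser, deser in [(ser_alt1, deser_alt1), (ser_alt2, deser_alt2), (ser_alt3, deser_alt3)]:
    assert deser(ser(ELEMENTS)) == ELEMENTS  # information arrives unchanged
print("all three alternatives round-trip")
```

Here only specVersion is common to all three Elements, so the factored Context is small; the sender's choice of alternative is invisible to the receiver.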

To avoid the impression that this serialization looks like SPDX v2, the example 
Elements are not created at the same time as the serialized data. The File 
element is created by "Acme", a software creator; an Identity Element is also 
created by Acme but at an earlier time, and the other Identity ("Steve") has 
nothing at all to do with the File; its sole purpose is to show that a 
serialized document can contain any arbitrary Elements.  The only requirement 
is that the Steve Element was created before it is serialized in the same 
document as File.  The example also illustrates that "ContextualCollection" is 
meaningless and confusing: none of the three Elements is a "Collection" type, 
and the set of three "unrelated" elements is not a "non-contextual" set because 
it includes Context (when using alternative 2 or 3).  "Context" in the sense of 
"real-life relationships" is invisible and irrelevant at the Information level. 
It can be represented using the values of Relationship Elements.
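As a sketch of that last point, assume a Relationship Element with hypothetical "from", "to" and "relationshipType" properties (the names and the "SUPPLIED_BY" value are illustrative, not taken from the SPDX v3 model). The real-life link between the File and Acme is then ordinary Element data, while Steve, who merely shares the serialized document, has no relationships at all.

```python
# Hypothetical Elements; "from", "to", "relationshipType" and "SUPPLIED_BY" are illustrative.
elements = {
    "acme": {"type": "Identity", "name": "Acme"},
    "steve": {"type": "Identity", "name": "Steve"},
    "file1": {"type": "File", "name": "tool.bin"},
    "rel1": {"type": "Relationship", "from": "file1", "to": "acme",
             "relationshipType": "SUPPLIED_BY"},
}

def related(element_id, elements):
    """Return (relationshipType, other_id) pairs for Relationships touching element_id."""
    pairs = []
    for e in elements.values():
        if e["type"] != "Relationship":
            continue
        if e["from"] == element_id:
            pairs.append((e["relationshipType"], e["to"]))
        elif e["to"] == element_id:
            pairs.append((e["relationshipType"], e["from"]))
    return pairs

print(related("file1", elements))  # [('SUPPLIED_BY', 'acme')]
print(related("steve", elements))  # [] because sharing a document creates no relationship
```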

The second slide shows the Information Model for Element and Context, and the 
serialized document that can be used to reconstruct Elements a, b, and c.  All 
Elements are independent, and the serialization can reconstruct them no matter 
what values they have.  My previous example showed all Elements created by the 
same creator at the same time, which allows the greatest optimization of the 
serialized data but is not a requirement.  This example shows more independence 
and therefore less optimization.
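The trade-off between independence and optimization can be sketched by counting how many properties can be factored into shared context. The creation-info property names and values below are hypothetical; the point is only that Elements created by the same creator at the same time share more factorable values than independently created ones.

```python
# Hypothetical creation-info properties; names and values are illustrative only.
def factorable(elements):
    """Return the properties shared (same key and value) by every element."""
    common = dict(elements[0])
    for e in elements[1:]:
        common = {k: v for k, v in common.items() if e.get(k) == v}
    common.pop("id", None)  # identifiers are never shared
    return common

same_creator = [
    {"id": "a", "createdBy": "acme", "created": "2021-11-22T12:00Z", "specVersion": "3.0"},
    {"id": "b", "createdBy": "acme", "created": "2021-11-22T12:00Z", "specVersion": "3.0"},
    {"id": "c", "createdBy": "acme", "created": "2021-11-22T12:00Z", "specVersion": "3.0"},
]
independent = [
    {"id": "a", "createdBy": "acme", "created": "2021-01-05T09:00Z", "specVersion": "3.0"},
    {"id": "b", "createdBy": "steve", "created": "2021-06-30T15:00Z", "specVersion": "3.0"},
    {"id": "c", "createdBy": "acme", "created": "2021-11-20T10:00Z", "specVersion": "3.0"},
]

print(sorted(factorable(same_creator)))  # ['created', 'createdBy', 'specVersion']
print(sorted(factorable(independent)))   # ['specVersion']
```

Either set serializes correctly; the independent set simply leaves more values in each Element.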

The complete information model and example data are included in 
https://github.com/davaya/spdxv3-template-tool, folders Schemas and Data3. 
make_artifacts checks and translates the information model; check_sboms lists 
the elements contained in each document.

Dave


On Mon, Nov 22, 2021 at 3:26 PM David Kemp via lists.spdx.org 
<dk190a=gmail....@lists.spdx.org> wrote:
All,

Several people have asked for concrete examples of SPDX v3 data to accompany 
and inform the logical model discussions.  Attached is a v3 Information Model 
(spdx-v3.jidl, a text file) based on the logical model but modified to reflect 
the difference between logical Elements and a serialized file (unit of 
transfer).

The Serialization paper discusses the difference in detail, but can be 
summarized as:
1) Elements are serialized directly -- the unit of transfer is one or more 
Elements, a special Document type is not used.
2) Context is used in serialization but does not exist in Elements.
3) Collection means composition. Some files (e.g., SBOM 1) serialize a 
Collection element, but others (e.g., Package2) serialize a Set of Elements that 
may have shared context but are not components of any parent Collection.
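Points 1 and 3 can be sketched as two serialized files that are both plain arrays of Elements, with no wrapping Document type. In one, a Collection Element lists its components; the other is just a Set with no parent. The "components" property and all ids are hypothetical, and these are not the contents of the SBOM 1 or Package2 files.

```python
import json

# Hypothetical serialized files: each is simply an array of Elements.
file_with_collection = json.dumps([
    {"id": "sbom", "type": "Collection", "components": ["pkg", "lib"]},
    {"id": "pkg", "type": "Package"},
    {"id": "lib", "type": "File"},
])
file_as_set = json.dumps([
    {"id": "pkg", "type": "Package"},
    {"id": "other", "type": "File"},
])

def deserialize(text):
    """Both shapes deserialize to the same thing: a mapping of id -> Element."""
    return {e["id"]: e for e in json.loads(text)}

def parents(elements):
    """Map each component id to the Collection(s) that compose it."""
    out = {}
    for e in elements.values():
        for child in e.get("components", []):
            out.setdefault(child, []).append(e["id"])
    return out

print(parents(deserialize(file_with_collection)))  # {'pkg': ['sbom'], 'lib': ['sbom']}
print(parents(deserialize(file_as_set)))           # {}
```

Composition is expressed only by the Collection Element's own values; the Set file has no parent to report.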

My previous example (Elements describing Tolstoy, Puget Sound, and Neil 
deGrasse Tyson) showed the use case for serializing a Set of completely 
unrelated elements using shared context, and the separate use case for 
serializing a Container element (Dave's Art) that is a composition of the other 
three. The information model was designed specifically to support both Set and 
Collection use cases.

The slide shows two serialized files containing 9 and 11 Elements respectively. 
The files are available from 
https://github.com/davaya/spdxv3-template-tool/tree/main/Data3. 
The individual Elements after deserializing are shown in sbom1.txt.  This 
illustrates that serializing multiple Elements together using shared context 
can be much more efficient than serializing each Element individually.
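The efficiency claim can be checked with a rough size comparison. This sketch assumes ten hypothetical Elements that share the same creation info (the property names and URNs are illustrative); it compares the total size of each Element serialized on its own with one serialization that writes the shared values once as context, and confirms the packed form still restores the original Elements.

```python
import json

# Hypothetical Elements that share creation info; property names are illustrative.
CREATION = {"createdBy": "urn:example:identity:acme",
            "created": "2021-11-22T15:26:00Z", "specVersion": "3.0"}
elements = [{"id": f"urn:example:file:{i}", "type": "File", **CREATION}
            for i in range(10)]

# Each Element serialized on its own carries its full creation info.
individual = sum(len(json.dumps(e).encode()) for e in elements)

# Serialized together: shared values are written once as context.
context = dict(CREATION)
packed_doc = [context] + [{k: v for k, v in e.items() if k not in context}
                          for e in elements]
packed = len(json.dumps(packed_doc).encode())

# The receiver re-applies the context, so no information is lost.
restored = [{**packed_doc[0], **e} for e in packed_doc[1:]]
assert restored == elements
assert packed < individual
print(individual, packed)
```

The saving grows with the number of Elements that share values, and shrinks toward zero when they share nothing.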

TODO: use the same information model to serialize other data formats, allowing 
lossless conversion among them and format-agnostic Element verification.  All 
formats, after deserializing, must yield the same Element application values.
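A format-agnostic check could look like the following sketch. It assumes flat, string-valued Elements and a hypothetical XML mapping (one element per Element, properties as attributes) built with Python's xml.etree.ElementTree; after deserializing each format, it hashes a canonical JSON form of the values, so the verification does not depend on which wire format was used.

```python
import hashlib
import json
import xml.etree.ElementTree as ET

# Hypothetical flat Elements with string-valued properties only.
ELEMENTS = [{"id": "acme", "type": "Identity", "name": "Acme"},
            {"id": "file1", "type": "File", "name": "tool.bin"}]

def to_json(elements):
    return json.dumps(elements)

def from_json(text):
    return json.loads(text)

def to_xml(elements):
    root = ET.Element("elements")
    for e in elements:
        ET.SubElement(root, "element", attrib=e)  # each property becomes an attribute
    return ET.tostring(root, encoding="unicode")

def from_xml(text):
    return [dict(node.attrib) for node in ET.fromstring(text)]

def digest(elements):
    """Hash a canonical form of the deserialized values, independent of the wire format."""
    canonical = json.dumps(sorted(elements, key=lambda e: e["id"]), sort_keys=True,
                           separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

json_values = from_json(to_json(ELEMENTS))
xml_values = from_xml(to_xml(ELEMENTS))
assert json_values == xml_values == ELEMENTS
assert digest(json_values) == digest(xml_values)
print("JSON and XML yield the same Element values")
```

Nested or non-string properties would need a richer XML mapping; the canonical-hash step would stay the same.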

Dave


