Thanks David, I really like this visualization and how you have framed the problem. As an aside, we've been using the term "logical model" for the information model and "physical model" or "serialization" for how this is represented as "bytes that go over the wire".
We have had some requirements that blur the line between physical and logical, and I think those might not be fully captured by what you have here. For example, I don't think "context is invisible and irrelevant at the information level": if you deserialized a JSON SPDXv3 document into the information model and then serialized that model to a YAML SPDXv3 document, you would not expect the context to be lost, but it would be if it is not part of the information model. This is one of the reasons we added NamespaceMap to the information model; without it the namespace mappings wouldn't round-trip.

I agree that we don't need ContextualCollection. It was added to communicate that the elements are contextually related in some way (described by the context). We could absolutely remove it and make Collection a concrete class (instead of abstract), but we would lose the ability to communicate that elements have a contextual relationship that is not described by Relationship elements. Maybe we don't need that; I think that's the question on the table.

Regards,
William Bartholomew (he/him)
Principal Security Strategist
Global Cybersecurity Policy - Microsoft

From: Spdx-tech@lists.spdx.org <Spdx-tech@lists.spdx.org> On Behalf Of David Kemp via lists.spdx.org
Sent: Monday, November 29, 2021 1:24 PM
To: SPDX-list <Spdx-tech@lists.spdx.org>
Subject: [EXTERNAL] Re: [spdx-tech] SPDX v3 Serialization, with examples

Here is a simple example that should clarify what serialization is intended to accomplish. Information modeling is information-centric: it starts with "information" (variable values used by applications) and defines how those values are serialized into "data" (bytes that go over the wire).
The objective is to preserve those information values from end to end, and any data format that accomplishes the objective is a candidate solution.

The first slide shows three Element values: two Identities and one File. Those three values start on the left, and they have to arrive unchanged on the right. There is no "right" information model, just correct models that accomplish the goal and incorrect ones that do not. Three alternative models are shown:

Alternative 1) Just serialize a "set" variable containing those three variables unchanged. This is a "Bundle" type; it is the simplest to understand but doesn't support optimization.
Alternative 2) Serialize an array of [Element, Context], where Context is shared data that can be factored out, along with the remaining serialized or referenced Elements. Only the Element is kept by the receiver; Context is discarded.
Alternative 3) Same as 2, except that Context is serialized as a pseudo-property of Element, created at serialization and removed at deserialization.

To avoid the impression that this serialization looks like SPDX v2, the example Elements are not created at the same time as the serialized data. The File element is created by "Acme", a software creator; an Identity Element is also created by Acme but at an earlier time; and the other Identity ("Steve") has nothing at all to do with the File. Its sole purpose is to show that a serialized document can contain any arbitrary Elements. The only requirement is that the Steve Element was created before it gets serialized in the same document as File.

The example also illustrates that "ContextualCollection" is meaningless and confusing: none of the three Elements is a "Collection" type, and the set of three "unrelated" elements is not a "non-contextual" set because it includes Context (when using alternative 2 or 3). "Context" in the sense of "real-life relationships" is invisible and irrelevant at the Information level.
It can be represented using the values of Relationship Elements.

The second slide shows the Information Model for Element and Context, and the serialized document that can be used to reconstruct Elements a, b, and c. All Elements are independent, and the serialization can reconstruct them no matter what values they have. My previous example showed all Elements created by the same creator at the same time, which allows the greatest optimization of the serialized data but is not a requirement. This example shows more independence and therefore less optimization.

The complete information model and example data are included in https://github.com/davaya/spdxv3-template-tool, folders Schemas and Data3. make_artifacts checks and translates the information model; check_sboms lists the elements contained in each document.

Dave

On Mon, Nov 22, 2021 at 3:26 PM David Kemp via lists.spdx.org <dk190a=gmail....@lists.spdx.org> wrote:

All,

Several people have asked for concrete examples of SPDX v3 data to accompany and inform the logical model discussions.
Attached is a v3 Information Model (spdx-v3.jidl, a text file) based on the logical model but modified to reflect the difference between logical Elements and a serialized file (unit of transfer). The Serialization paper discusses the difference in detail, but it can be summarized as:

1) Elements are serialized directly -- the unit of transfer is one or more Elements; a special Document type is not used.
2) Context is used in serialization but does not exist in Elements.
3) Collection means composition. Some files (e.g., SBOM 1) serialize a Collection element, but others (Package2) serialize a Set of Elements that may have shared context but are not components of any parent Collection.

My previous example (Elements describing Tolstoy, Puget Sound, and Neil deGrasse Tyson) showed the use case for serializing a Set of completely unrelated elements using shared context, and the separate use case for serializing a Container element (Dave's Art) that is a composition of the other three. The information model was designed specifically to support both Set and Collection use cases.

The slide shows two serialized files containing 9 and 11 Elements respectively. The files are available from https://github.com/davaya/spdxv3-template-tool/tree/main/Data3. The individual Elements after deserializing are shown in sbom1.txt. This illustrates that serializing multiple Elements together using shared context can be much more efficient than serializing each Element individually.
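
As a rough illustration of the shared-context idea, here is a minimal Python sketch. It is not the spdx-v3.jidl model or the code in the spdxv3-template-tool repository: the property names (id, type, name, created, createdBy), the urn:example identifiers, and the serialize/deserialize helpers are all assumptions made up for this example. Properties that every Element shares are factored into one context on the wire, and the receiver folds them back in and discards the context.

```python
import json


def serialize(elements):
    """Factor properties shared by every Element into a single context."""
    if not elements:
        return json.dumps({"context": {}, "elements": []})
    first = elements[0]
    # A property is "shared" when every Element has it with the same value.
    # "id" is never shared, since it is what distinguishes the Elements.
    context = {
        key: value
        for key, value in first.items()
        if key != "id" and all(key in e and e[key] == value for e in elements)
    }
    # Each Element keeps only the properties not covered by the context.
    stripped = [
        {k: v for k, v in e.items() if k not in context} for e in elements
    ]
    return json.dumps({"context": context, "elements": stripped}, sort_keys=True)


def deserialize(data):
    """Rebuild independent Elements; the context is not kept afterwards."""
    doc = json.loads(data)
    return [{**doc["context"], **e} for e in doc["elements"]]


# Hypothetical Elements (illustrative property names, not the SPDX v3 schema).
CREATED = "2021-11-29T13:24:00Z"
elements = [
    {"id": "urn:example:acme", "type": "Identity", "name": "Acme",
     "created": CREATED, "createdBy": "urn:example:acme"},
    {"id": "urn:example:steve", "type": "Identity", "name": "Steve",
     "created": CREATED, "createdBy": "urn:example:acme"},
    {"id": "urn:example:file1", "type": "File", "name": "app.bin",
     "created": CREATED, "createdBy": "urn:example:acme"},
]

wire = serialize(elements)
restored = deserialize(wire)
assert restored == elements  # the Element values arrive unchanged
```

In this sketch the created and createdBy values appear once on the wire instead of three times, and the receiver ends up with three independent Elements and no context. Elements with nothing in common would simply produce an empty context, which corresponds to the "more independence, less optimization" case.
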
TODO: use the same information model to serialize other data formats, allowing lossless conversion among them and format-agnostic Element verification. All formats, after deserializing, must yield the same Element application values.

Dave

-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#4272): https://lists.spdx.org/g/Spdx-tech/message/4272
-=-=-=-=-=-=-=-=-=-=-=-