From: Spdx-tech@lists.spdx.org <Spdx-tech@lists.spdx.org> On Behalf Of David Kemp via lists.spdx.org
Sent: Tuesday, December 14, 2021 1:51 PM
To: SPDX-list <Spdx-tech@lists.spdx.org>
Subject: [EXTERNAL] [spdx-tech] Do serialized documents contain the bytes of a Document Element?

William has been gathering punch list questions during the meetings, and we've tried to avoid taking up meeting time on engineering the solutions. This is my perspective on the subject question.

* A document is a sequence of bytes. This is normally a file, but as Nisha pointed out it could also be the response to an SQL query, or to any API call. A File Element has an artifactUri, a media type, and a filePurpose - if this is sufficient to describe a byte sequence returned from an API at the artifactUri address, great. If not, the logical model could define a new Artifact sub-type for byte sequences returned from API calls.
* An Artifact element (File or a new "ByteSequence" type) describes a sequence of bytes.
* A sequence of bytes can be signed or hashed. (The details of what to hash, i.e., how to canonicalize a byte sequence, can be worked out later.)

If we have a File Element referring to a JPG file, the bytes of the File Element are not included in the bytes of the JPG file. If we have a File Element referring to an SPDXv2 file, the bytes of the File Element are not included in the SPDXv2 file. So if we have a File Element referring to an SPDXv3 file, the bytes of the File Element are not required to be included in the SPDXv3 file.

The problem with answering the question is:

SPDXv2.2 Document contains:
* Document Creation Information
* Package Information
* ...
* Annotation Information

But the SPDXv3 logical model currently says:

SPDXv3 Document contains:
* Document Element (NOT Document Creation Information)

[William] It’s technically both the description of the document and the document creation information. To enable elements to stand alone, any element can have creation information, but the document is also an element, so its creation information lives there rather than in a standalone class.
The document has metadata about the document, from SPDX 2.2: SPDX Version, Data License (now on all elements including Document), SPDX Identifier (on all elements including Document), Document Name (now on all elements including Document, renamed to “name”), Document Namespace (this was removed as a standalone concept, however, the document’s SPDX Identifier does include a namespace portion), Creator (now on all elements including Document), Created (now on all elements including Document), Creator Comment (now on all elements including Document, renamed to “comment”), Document Comment (merged with “Creator Comment”).

* SBOM Element(s)
* Package Element(s)
* ...
* Annotation Element(s)

In v2 Document means "The SBOM" and document means "the bytes of the serialized SBOM". There is no "SBOM Information" analogous to Package/File/Annotation Information in v2.

[William] This isn’t a correct comparison. v2 did not have a concept of SBOM, as Gary mentioned most people had a root Package in v2 documents which you could equate to the SBOM. Document was the Document Creation Information and anything else at the root. If I was sharing a list of Annotations in SPDX v2 that would not be an SBOM.

In v3 SBOM means "The SBOM" and document means "the bytes of the serialized SBOM".

[William] An SPDX document does not need to carry an SBOM, it could be a list of annotations, or a list of licenses, those are not an SBOM. The Document element represents the metadata about the document (who created it, when it was created, gives it an identity that can be used in relationships to relate the document to other documents), this must live inside the serialized document otherwise you would always need two documents (in fact you may need infinite SPDX documents 😊).

The Document Element in v3 is the Artifact referring to "the bytes of the serialized SBOM".

[William] You could have an artifact that describes an SPDX document, but that doesn’t negate the need for a document to be self-describing which is what the Document element intends to do.

(In v3 any Element can be serialized; I'm referring to implementing the v2 use case of serializing an SBOM in v3.)

[William] While any Element can be serialized there’s an open question of whether the serialization requires a Document or not, I can see arguments in both directions. The creation information on the element tells you who created the element, but it doesn’t tell you who created the serialized file, do we need that? In some cases who created the “anthology” by bringing all the elements together into a document is interesting and creating that “anthology” may have legal meaning (I’ll defer to the legal team on that) and capturing who did it, when, and what license they apply to that anthology may be important.

The SBOM Element (like all Elements) has its own creation information, so when it is serialized the document creation information can be (but is not constrained to be) that of the SBOM Element. Document creation information is serialized in DocumentContext (along with NamespaceMap), and if the document creator wishes, SBOM creation information can override document creation information.

[William] In this model I think “DocumentContext” is really just “Document”. It would be very complex to have overrides from SBOM to Document, a Document could “contain” multiple SBOMs which would be ambiguous and overriding will likely make integrity even more difficult.

So requiring a Document Element in addition to an SBOM Element in the serialized bytes is a departure from v2, not consistent with it. The logical model should allow serialization of a single SBOM Element plus context, assorted annotations, relationships, licenses, identities, etc., just as v2 allows.

[William] I don’t think this conclusion follows naturally since v2 doesn’t have an SBOM element, what you refer to as “context” at the root of a v2 document is “Document”. In RDF it’s the SpdxDocument element<https://github.com/spdx/spdx-spec/blob/development/v2.2.2/examples/SPDXRdfExample-v2.2.spdx.rdf.xml#L1411>, in XML it’s the root Document element<https://github.com/spdx/spdx-spec/blob/development/v2.2.2/examples/SPDXXMLExample-v2.2.spdx.xml#L2> (including some of its child elements such as “creationInfo”), in YAML it’s the set of root properties that aren’t collections<https://github.com/spdx/spdx-spec/blob/development/v2.2.2/examples/SPDXYAMLExample-2.2.spdx.yaml#L2> (such as “SPDXID”, “documentNamespace”, “creationInfo”), in tag/value it’s the set of properties that occur before the first object<https://github.com/spdx/spdx-spec/blob/development/v2.2.2/examples/SPDXTagExample-v2.2.spdx#L3>, in Excel it’s the “Document Info” sheet<https://view.officeapps.live.com/op/view.aspx?src=https%3A%2F%2Fraw.githubusercontent.com%2Fspdx%2Fspdx-spec%2Fdevelopment%2Fv2.2.2%2Fexamples%2FSPDXSpreadsheetExample-v2.2.xlsx&wdOrigin=BROWSELINK>.

Dave

View/Reply Online (#4296): https://lists.spdx.org/g/Spdx-tech/message/4296
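
As a rough illustration of the two positions in this thread, here is a minimal Python sketch. It is not the SPDX v3 model or any SPDX library API. The field names artifact_uri, media_type and file_purpose mirror the File Element properties mentioned above; everything else (the spdx_id and sha256 fields, the "OTHER" purpose value, the JSON layout, the urn:example identifiers and the example.org URL) is a hypothetical assumption made only for this example. It shows an external File Element describing the bytes of a serialized document without being part of those bytes, and a Document element carrying creation information inside the serialization so that the document is self-describing.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class FileElement:
    """Hypothetical, simplified stand-in for a File Element (not the real model)."""
    spdx_id: str
    artifact_uri: str
    media_type: str
    file_purpose: str
    sha256: str  # digest of the byte sequence this element describes


def serialize_document(document_element: dict, elements: list) -> bytes:
    """Serialize a Document element together with the elements it carries.

    The JSON layout is an assumption for illustration only.
    """
    payload = {"document": document_element, "elements": elements}
    return json.dumps(payload, sort_keys=True).encode("utf-8")


# Self-describing document: who created it and when lives inside the bytes.
document_element = {
    "spdx_id": "urn:example:doc-1",
    "name": "example document",
    "creator": "Example Tool",
    "created": "2021-12-14T00:00:00Z",
}
elements = [{"spdx_id": "urn:example:pkg-1", "type": "Package", "name": "demo"}]
doc_bytes = serialize_document(document_element, elements)

# External File Element: describes doc_bytes but is not contained in them.
external = FileElement(
    spdx_id="urn:example:file-1",
    artifact_uri="https://example.org/sbom.json",
    media_type="application/json",
    file_purpose="OTHER",
    sha256=hashlib.sha256(doc_bytes).hexdigest(),
)

# Embedding that File Element in the document changes the bytes, so the
# digest it records no longer matches the document that now contains it.
embedded = serialize_document(document_element, elements + [asdict(external)])
digest_still_valid = hashlib.sha256(embedded).hexdigest() == external.sha256
print(digest_still_valid)
```

Under these assumptions the hash-bearing File Element has to live outside the bytes it describes, while the Document element can live inside them because it records identity and creation information rather than a digest of its own serialization.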