
From: Spdx-tech@lists.spdx.org <Spdx-tech@lists.spdx.org> On Behalf Of David 
Kemp via lists.spdx.org
Sent: Tuesday, December 14, 2021 1:51 PM
To: SPDX-list <Spdx-tech@lists.spdx.org>
Subject: [EXTERNAL] [spdx-tech] Do serialized documents contain the bytes of a 
Document Element?

William has been gathering punch list questions during the meetings, and we've 
tried to avoid taking up meeting time on engineering the solutions.  This is my 
perspective on the subject question.

* A document is a sequence of bytes.  This is normally a file, but as Nisha 
pointed out it could also be the response to an SQL query, or to any API call.  
A File Element has an artifactUri, a media type, and a filePurpose - if this is 
sufficient to describe a byte sequence returned from an API at the artifactUri 
address, great.  If not, the logical model could define a new Artifact sub-type 
for byte sequences returned from API calls.
* An Artifact element (File or a new "ByteSequence" type) describes a sequence 
of bytes.
* A sequence of bytes can be signed or hashed.  (The details of what to hash, 
i.e., how to canonicalize a byte sequence, can be worked out later.)

If we have a File Element referring to a JPG file, the bytes of the File 
Element are not included in the bytes of the JPG file.  If we have a File 
Element referring to an SPDXv2 file, the bytes of the File Element are not 
included in the SPDXv2 file.
So if we have a File Element referring to an SPDXv3 file, the bytes of the 
File Element are not required to be included in the SPDXv3 file.
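
The point above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `FileElement` class, its field names, the example URI, and the choice to hash the raw bytes without any canonicalization are all hypothetical and are not taken from the SPDXv3 logical model.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass
class FileElement:
    # Hypothetical field names, loosely following the properties named above.
    artifact_uri: str
    media_type: str
    file_purpose: str
    sha256: str  # digest of the described bytes, not of this element


def describe(artifact_bytes: bytes, uri: str, media_type: str, purpose: str) -> FileElement:
    # Assumption: hash the raw bytes as-is; canonicalization is left open in the thread.
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    return FileElement(uri, media_type, purpose, digest)


def serialize(element: FileElement) -> bytes:
    # One possible serialization of the element itself (JSON, sorted keys).
    return json.dumps(asdict(element), sort_keys=True).encode("utf-8")


# Stand-in for the contents of a JPG file.
artifact = b"\xff\xd8\xff\xe0 stand-in for JPG bytes"
element = describe(artifact, "https://example.com/logo.jpg", "image/jpeg", "other")
element_bytes = serialize(element)

# The element describes the artifact from the outside: its own serialized
# bytes are not part of the byte sequence it refers to.
described_separately = element_bytes not in artifact
```

The same reasoning applies whether the described bytes are a JPG, an SPDXv2 file, or an SPDXv3 file.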

The problem with answering the question is:

SPDXv2.2 Document contains:
* Document Creation Information
* Package Information
* ...
* Annotation Information

But the SPDXv3 logical model currently says:

SPDXv3 Document contains:
* Document Element (NOT Document Creation Information)
[William] It’s technically both the description of the document and the 
document creation information. To enable elements to stand alone, any element 
can have creation information, but the document is also an element, so its 
creation information lives there rather than in a standalone class. The document has 
metadata about the document, from SPDX 2.2: SPDX Version, Data License (now on 
all elements including Document), SPDX Identifier (on all elements including 
Document), Document Name (now on all elements including Document – renamed to 
“name”), Document Namespace (this was removed as a standalone concept, however, 
the document’s SPDX Identifier does include a namespace portion), Creator (now 
on all elements including Document), Created (now on all elements including 
Document), Creator Comment (now on all elements including Document – renamed to 
“comment”), Document Comment (merged with “Creator Comment”).

* SBOM Element(s)
* Package Element(s)
* ...
* Annotation Element(s)

In v2 Document means "The SBOM" and document means "the bytes of the serialized 
SBOM".  There is no "SBOM Information" analogous to Package/File/Annotation 
Information in v2.
[William] This isn’t a correct comparison. v2 did not have a concept of SBOM; 
as Gary mentioned, most people had a root Package in v2 documents, which you 
could equate to the SBOM. Document was the Document Creation Information and 
anything else at the root. If I were sharing a list of Annotations in SPDX v2, 
that would not be an SBOM.

In v3 SBOM means "The SBOM" and document means "the bytes of the serialized 
SBOM".
[William] An SPDX document does not need to carry an SBOM. It could be a list 
of annotations or a list of licenses, and those are not an SBOM. The Document 
element represents the metadata about the document (who created it, when it was 
created, and an identity that can be used in relationships to relate the 
document to other documents). This must live inside the serialized document; 
otherwise you would always need two documents (in fact you may need infinite 
SPDX documents 😊).

The Document Element in v3 is the Artifact referring to "the bytes of the 
serialized SBOM".
[William] You could have an artifact that describes an SPDX document, but that 
doesn’t negate the need for a document to be self-describing which is what the 
Document element intends to do.

(In v3 any Element can be serialized; I'm referring to implementing the v2 use 
case of serializing an SBOM in v3.)
[William] While any Element can be serialized, there’s an open question of 
whether the serialization requires a Document or not. I can see arguments in 
both directions. The creation information on the element tells you who created 
the element, but it doesn’t tell you who created the serialized file. Do we 
need that? In some cases who created the “anthology” by bringing all the 
elements together into a document is interesting, and creating that “anthology” 
may have legal meaning (I’ll defer to the legal team on that), so capturing who 
did it, when, and what license they apply to that anthology may be important.

The SBOM Element (like all Elements) has its own creation information, so when 
it is serialized the document creation information can be (but is not 
constrained to be) that of the SBOM Element.  Document creation information is 
serialized in DocumentContext (along with NamespaceMap), and if the document 
creator wishes, SBOM creation information can override document creation 
information.
[William] In this model I think “DocumentContext” is really just “Document”. It 
would be very complex to have overrides from SBOM to Document: a Document could 
“contain” multiple SBOMs, which would make the override ambiguous, and 
overriding will likely make integrity even more difficult.
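
A rough sketch of the override idea and the ambiguity objection follows. Everything here is hypothetical: `CreationInfo`, `Element`, `effective_document_creation_info`, and the `allow_override` flag are illustrative names only, not part of any SPDX model or library.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CreationInfo:
    creator: str
    created: str


@dataclass
class Element:
    spdx_id: str
    kind: str  # e.g. "SBOM", "Package", "Annotation"
    creation_info: CreationInfo


class AmbiguousOverride(Exception):
    """Raised when more than one SBOM could override the document's creation info."""


def effective_document_creation_info(
    document_info: CreationInfo, elements: List[Element], allow_override: bool
) -> CreationInfo:
    # Without the override, the document's own creation information is used.
    if not allow_override:
        return document_info
    sboms = [e for e in elements if e.kind == "SBOM"]
    if len(sboms) == 1:
        # Exactly one SBOM: its creation information can stand in for the document's.
        return sboms[0].creation_info
    if len(sboms) > 1:
        # Several SBOMs in one document: no single obvious source for the override.
        raise AmbiguousOverride(f"{len(sboms)} SBOM elements present")
    return document_info


doc_info = CreationInfo("Person: Document Author", "2021-12-14T13:51:00Z")
sbom_a = Element("SPDXRef-SbomA", "SBOM", CreationInfo("Person: SBOM Author A", "2021-12-01T00:00:00Z"))
sbom_b = Element("SPDXRef-SbomB", "SBOM", CreationInfo("Person: SBOM Author B", "2021-12-02T00:00:00Z"))

single = effective_document_creation_info(doc_info, [sbom_a], allow_override=True)
```

With one SBOM the override is well defined; passing both `sbom_a` and `sbom_b` raises `AmbiguousOverride`, which is the case the reply above is worried about.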

So requiring a Document Element in addition to an SBOM Element in the 
serialized bytes is a departure from v2, not consistent with it.  The logical 
model should allow serialization of a single SBOM Element plus context, 
assorted annotations, relationships, licenses, identities, etc., just as v2 
allows.
[William] I don’t think this conclusion follows naturally, since v2 doesn’t have 
an SBOM element. What you refer to as “context” at the root of a v2 document is 
“Document”. In RDF it’s the SpdxDocument 
element<https://github.com/spdx/spdx-spec/blob/development/v2.2.2/examples/SPDXRdfExample-v2.2.spdx.rdf.xml#L1411>,
 in XML it’s the root Document 
element<https://github.com/spdx/spdx-spec/blob/development/v2.2.2/examples/SPDXXMLExample-v2.2.spdx.xml#L2>
 (including some of its child elements such as “creationInfo”), in YAML it’s the 
set of root properties that aren’t 
collections<https://github.com/spdx/spdx-spec/blob/development/v2.2.2/examples/SPDXYAMLExample-2.2.spdx.yaml#L2>
 (such as “SPDXID”, “documentNamespace”, “creationInfo”), in tag/value it’s the 
set of properties that occur before the first 
object<https://github.com/spdx/spdx-spec/blob/development/v2.2.2/examples/SPDXTagExample-v2.2.spdx#L3>,
 in Excel it’s the “Document Info” 
sheet<https://view.officeapps.live.com/op/view.aspx?src=https%3A%2F%2Fraw.githubusercontent.com%2Fspdx%2Fspdx-spec%2Fdevelopment%2Fv2.2.2%2Fexamples%2FSPDXSpreadsheetExample-v2.2.xlsx&wdOrigin=BROWSELINK>.
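
For the YAML (or JSON) case, the "root properties that aren’t collections" reading can be sketched as below. This assumes a v2 document already parsed into a Python dict and treats "collection" as "list-valued property", which is an interpretation rather than anything stated in the spec. The function name and the sample values are made up for illustration.

```python
def document_level_properties(root: dict) -> dict:
    # Keep root properties whose values are not lists; nested mappings such as
    # "creationInfo" are kept, since they describe the document itself.
    return {key: value for key, value in root.items() if not isinstance(value, list)}


# Hypothetical, heavily trimmed v2-style root object.
v2_root = {
    "SPDXID": "SPDXRef-DOCUMENT",
    "spdxVersion": "SPDX-2.2",
    "name": "example",
    "documentNamespace": "https://example.com/spdxdocs/example",
    "creationInfo": {"created": "2021-12-14T00:00:00Z", "creators": ["Tool: example"]},
    "packages": [{"SPDXID": "SPDXRef-Package", "name": "pkg"}],
    "relationships": [],
}

document_part = document_level_properties(v2_root)
```

Under that reading, `document_part` holds the properties that play the role of “Document” in v2, while `packages` and `relationships` are the contained elements.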


Dave


