Hi David,

One point of clarification: relationships are 1:many, so you can create a 
relationship from one package to 100 files using a single relationship element.
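To make the 1:many point concrete, here is a minimal sketch using a simplified, hypothetical element model. The class `Relationship`, its fields `from_element`, `relationship_type`, and `to`, and the `urn:example:` IRIs are illustrative assumptions, not the SPDX specification's actual classes or serialization.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    """Hypothetical, simplified relationship element: one source, many targets."""

    iri: str
    from_element: str
    relationship_type: str
    to: list[str] = field(default_factory=list)


# One package and 100 files (made-up IRIs for illustration only).
package_iri = "urn:example:pkg"
file_iris = [f"urn:example:file{i}" for i in range(100)]

# A single relationship element covers all 100 CONTAINS links.
contains = Relationship(
    iri="urn:example:rel1",
    from_element=package_iri,
    relationship_type="CONTAINS",
    to=file_iris,
)

print(len(contains.to))  # prints 100
```

The point of the sketch is only the shape: the number of relationship elements does not have to grow with the number of files.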

Regards,

William Bartholomew (he/him) - Let's chat <https://outlook.office.com/findtime/vote?book=will...@microsoft.com&anonymous&ep=plink>
Principal Security Strategist
Global Cybersecurity Policy - Microsoft

My working day may not be your working day. Please don't feel obliged to reply 
to this e-mail outside of your normal working hours.

From: Spdx-tech@lists.spdx.org <Spdx-tech@lists.spdx.org> On Behalf Of David 
Kemp via lists.spdx.org
Sent: Thursday, January 27, 2022 6:17 AM
To: William Bartholomew (CELA) <will...@microsoft.com>
Cc: SPDX-list <Spdx-tech@lists.spdx.org>
Subject: Re: [EXTERNAL] [spdx-tech] Is "contains" special?

The amount of data in the graph is determined by the use cases in either 
representation.  A Package with an elements property can list zero, some, or a 
whole bunch of File and Package IRIs in that property.  An empty list is 
equivalent to creating zero CONTAINS relationship elements.  In either 
representation, the individual File and Package elements have to be created the 
first time they are needed.

If you later look inside the package artifact and find 100 files, you then 
create 100 File elements.  Then you either create 100 new Relationship elements 
or you create one new Package element with 100 IRIs in its "elements" property.
So, weighing pros and cons, both the initial-creation and the bulk-update use 
cases are more efficient with the property.

Now assume the artifact hasn't changed, but you discover that you made a 
mistake and it really contains 102 files.  You have to create two new File 
elements, then either two new Relationship elements or one new Package element 
with 102 IRIs.  Depending on how broad and deep the artifact is (thousands of 
files?) and how many times you make mistakes and have to create new Package 
elements for a single unchanged artifact, the combined size of the repeated IRI 
lists could be bigger than the size of the extra relationship elements.  The 
property approach is like having two manifests, "Package_version_rev0" and 
"Package_version_rev1", while the relationship approach is like a single 
manifest, "Package_version", plus a separate "patch 1".  The pros of the 
property approach include having a name ("rev1") and a unique IRI for the 
latest description of the package.  I think that's a significant advantage.
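The size trade-off in the 100-file and 102-file example can be put into rough numbers. The sketch below is a hypothetical cost model, not anything defined by SPDX: it counts the elements each approach creates and the IRI references those elements carry. It assumes one-to-one CONTAINS relationships (one "from" and one "to" IRI each), that corrections only ever add files, and it leaves out the File elements because both approaches need the same ones.

```python
from dataclasses import dataclass


@dataclass
class Cost:
    """Hypothetical size proxy: elements created and IRI references they carry."""

    elements: int
    iri_refs: int


def property_approach(revisions: list[int]) -> Cost:
    """Each revision is a new Package element that lists every file IRI."""
    # One Package element per revision, each repeating the full file list.
    return Cost(elements=len(revisions), iri_refs=sum(revisions))


def relationship_approach(revisions: list[int]) -> Cost:
    """One Package element plus one CONTAINS Relationship per file ever added."""
    # Assumes one-to-one relationships (a "from" and a "to" IRI each) and that
    # corrections only add files, so the total added equals the final count.
    added = revisions[-1]
    return Cost(elements=1 + added, iri_refs=2 * added)


# 100 files, then a correction to 102 files.
print(property_approach([100, 102]))      # Cost(elements=2, iri_refs=202)
print(relationship_approach([100, 102]))  # Cost(elements=103, iri_refs=204)

# A large artifact corrected three times.
print(property_approach([1000, 1001, 1002, 1003]))      # Cost(elements=4, iri_refs=4006)
print(relationship_approach([1000, 1001, 1002, 1003]))  # Cost(elements=1004, iri_refs=2006)
```

Under this model the property approach always creates far fewer elements, but once an artifact is large and corrected several times, its repeated IRI lists overtake the relationship approach in total references. If relationships are 1:many, as noted at the top of this thread, the relationship approach's element count would shrink further.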

So if the poll alternatives are:
1) CONTAINS relationship, no elements property
2) elements property, deprecate CONTAINS relationship
3) allow both elements and CONTAINS

I vote for 2.  I strongly object to alternative 1 because the initial creation 
of a large package would require hundreds or thousands of relationship elements 
in addition to the package element.

There are two reasons to vote for 3:
a) there are some use cases where patching a package list with "contains" 
relationships saves bytes in the element graph.  This is compatible with the 
syntactic sugar concept, where the elements property can be converted to 
CONTAINS relationships and vice versa.
b) there are some use cases where patching a package list accomplishes 
something that cannot be expressed by creating a new revision of the package 
list.  Such a use case would mean the syntactic sugar concept is invalid.

My concern with 3 is that I believe there are no use cases where 3b is true 
(i.e., where the use-case graph can't be represented using just the elements 
property), yet it is easy to mistakenly create incorrect or inconsistent 
CONTAINS relationships.  With option 2, tools could always expand the property 
into relationships for processing or viewing graphs, without being exposed to 
the mistakes that option 3 enables.
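The "syntactic sugar" idea, where a tool expands the elements property into CONTAINS relationships and can fold them back, can be sketched as a pair of functions. This is a hypothetical illustration, not an SPDX tool or API: packages are plain dicts from a package IRI to a list of member IRIs, relationships are `(from, type, to)` tuples, and membership is treated as an unordered set, so list order and duplicates are not preserved.

```python
from collections.abc import Iterable


def expand(packages: dict[str, list[str]]) -> set[tuple[str, str, str]]:
    """Expand each package's 'elements' list into (from, type, to) CONTAINS triples."""
    # An empty list expands to zero relationships.
    return {
        (pkg, "CONTAINS", member)
        for pkg, members in packages.items()
        for member in members
    }


def collapse(
    relationships: set[tuple[str, str, str]], package_iris: Iterable[str]
) -> dict[str, list[str]]:
    """Fold CONTAINS triples back into per-package 'elements' lists."""
    # Seed every known package so packages with no CONTAINS triples keep an
    # empty list instead of disappearing.
    packages: dict[str, list[str]] = {iri: [] for iri in package_iris}
    for source, rel_type, target in relationships:
        if rel_type == "CONTAINS":
            packages.setdefault(source, []).append(target)
    # Sort so the output does not depend on set iteration order.
    return {pkg: sorted(members) for pkg, members in packages.items()}


# Made-up IRIs for illustration only.
original = {
    "urn:example:pkgA": ["urn:example:f1", "urn:example:f2"],
    "urn:example:pkgB": [],
}
triples = expand(original)
assert collapse(triples, original.keys()) == original  # round trip holds
```

Under option 2 a tool only ever needs `expand` for processing or viewing; under option 3 it would also have to reconcile independently authored CONTAINS triples with the property, which is where incorrect or inconsistent relationships could creep in.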

Dave


On Tue, Jan 25, 2022 at 10:32 AM William Bartholomew (CELA) 
<will...@microsoft.com> wrote:
The challenge is that SPDX doesn't require you to describe the contents of the 
package unless it's needed for your use cases. I've worked in several scenarios 
where the package-level information is sufficient and calculating, knowing, and 
transporting package content information around would be unnecessary. When you 
have broad and deep dependency trees (*cough* npm *cough*), forcing the package 
contents to be part of the package element pulls in an immense amount of 
information that may be completely unnecessary. The NTIA's minimum SBOM 
elements don't even require file-level information, only package-level 
information.

Additionally, we need to separate the metadata about the package from the 
package itself in this discussion. Yes, if a package's contents change, it is a 
new package. But if we learn new metadata about a package's contents, does that 
require new package metadata (as opposed to new package contents)? I could make 
arguments either way, but given the amount of information that we expect will 
be attached to element IDs, I lean towards those elements not being versioned 
when relationship metadata (including contains) changes. Your comment about 
dependencies focuses on incoming dependencies; outgoing dependencies are very 
similar to files, they are just files with "delayed" resolution.

Regards,

William Bartholomew (he/him)

From: Spdx-tech@lists.spdx.org <Spdx-tech@lists.spdx.org> On Behalf Of David 
Kemp via lists.spdx.org
Sent: Tuesday, January 25, 2022 3:31 AM
To: SPDX-list <Spdx-tech@lists.spdx.org>
Subject: [EXTERNAL] [spdx-tech] Is "contains" special?

The difference between "contains" and every other type of relationship is that 
it is the minimum essential requirement for some element types to exist.  A 
package cannot be a package without having contents.  Its "packageness" is 
defined by the fact that it has contents.  The same cannot be said for any of 
the other relationship types: a Package and a BOM can exist without patches, 
variants, ancestors, dependencies, examples, etc.

If any of those other relationship types were essential for a Package or BOM to 
exist, then the model would include "dependency_element" and "patch_element" 
properties in addition to the contents ("element") property, and the version of 
the Package would change whenever those properties changed.  Dependency is not 
a property because a Package and its version don't change every time some other 
Package references, uses, or becomes dependent on it.

Contains is special and different from all other relationships because if the 
content of a Package changes, it is a different version of the Package.

Dave



View/Reply Online (#4346): https://lists.spdx.org/g/Spdx-tech/message/4346