Hi David,

One point of clarification: relationships are 1:many, so you can create a relationship from 1 package to 100 files using a single relationship element.
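A minimal sketch of the 1:many point above, assuming a hypothetical Relationship shape. The field names from_element, relationship_type and to_elements and the urn:example IRIs are illustrative only and are not taken from the SPDX schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Relationship:
    # Hypothetical shape: one source element, one type, many targets.
    from_element: str
    relationship_type: str
    to_elements: List[str] = field(default_factory=list)


# Made-up IRIs for one package and 100 files.
package_iri = "urn:example:package-1"
file_iris = [f"urn:example:file-{n}" for n in range(1, 101)]

# A single relationship element covers all 100 files.
contains = Relationship(package_iri, "CONTAINS", file_iris)
print(len(contains.to_elements))
```

Under this assumption the number of relationship elements stays at one regardless of how many files the package contains; only the target list grows.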
Regards,
William Bartholomew (he/him) - Let's chat<https://outlook.office.com/findtime/vote?book=will...@microsoft.com&anonymous&ep=plink>
Principal Security Strategist
Global Cybersecurity Policy - Microsoft
My working day may not be your working day. Please don't feel obliged to reply to this e-mail outside of your normal working hours.

From: Spdx-tech@lists.spdx.org <Spdx-tech@lists.spdx.org> On Behalf Of David Kemp via lists.spdx.org
Sent: Thursday, January 27, 2022 6:17 AM
To: William Bartholomew (CELA) <will...@microsoft.com>
Cc: SPDX-list <Spdx-tech@lists.spdx.org>
Subject: Re: [EXTERNAL] [spdx-tech] Is "contains" special?

The amount of data in the graph is determined by use cases in either representation. A Package with an elements property can list zero, some, or a whole bunch of file and package IRIs in that property. An empty list is equivalent to creating zero CONTAINS relationship elements.

In either representation the individual File and Package elements have to be created the first time they are needed. If later you look inside the package artifact and find 100 files, you then create 100 File elements. Then you either create 100 new Relationship elements or you create 1 new Package element with 100 IRIs in the "elements" property. So for pros and cons, both the initial creation and the bulk update use case are more efficient using the property.

Now assume the artifact hasn't changed but you discover that you made a mistake and it really contains 102 files. You have to create two new File elements, then either two new Relationship elements or one new Package element with 102 IRIs. Depending on how broad and deep the artifact is (thousands of files?) and how many times you make mistakes and have to create new Package elements for a single unchanged artifact, the size of the IRI list could be bigger than the size of the extra Relationship elements.

The two approaches are like having two manifests "Package_version_rev0" and "Package_version_rev1", vs.
a single manifest "Package_version" plus a separate "patch 1". The pros of the property approach include having a name ('rev1') and unique IRI for the latest description of the package. I think that's a significant advantage.

So if the poll alternatives are:
1) CONTAINS relationship, no elements property
2) elements property, deprecate CONTAINS relationship
3) allow both elements and CONTAINS
I vote for 2. I strongly object to alternative 1 because the initial creation of a large package would require hundreds or thousands of relationship elements in addition to the package element.

There are two reasons to vote for 3:
a) there are some use cases where patching a package list with "contains" relationships saves bytes in the element graph. This allows the syntactic sugar concept where the elements property can be converted to CONTAINS relationships and vice versa.
b) there are some use cases where patching a package list accomplishes something that cannot be expressed by creating a new revision of the package list. This use case means the syntactic sugar concept is invalid.

My concern with 3 is that I believe there are no use cases where 3b is true (can't represent the use case graph using just the elements property). But it is easy to mistakenly create incorrect or inconsistent CONTAINS relationships. With option 2, tools could always expand the property into relationships for processing or viewing graphs, while not being exposed to mistakes enabled by option 3.

Dave

On Tue, Jan 25, 2022 at 10:32 AM William Bartholomew (CELA) <will...@microsoft.com> wrote:

The challenge is that SPDX doesn't require you to describe the contents of the package unless it's needed for your use cases. I've worked in several scenarios where the package-level information is sufficient and calculating, knowing, and transporting around package content information would be unnecessary.
When you have broad and deep dependency trees (*cough* npm *cough*), forcing the package contents to be part of the package element pulls in an immense amount of information which may be completely unnecessary; the NTIA's minimum SBOM elements do not even require file-level information, only package-level information.

Additionally, we need to separate the metadata about the package from the package itself in this discussion. Yes, if a package's contents change it is a new package; but if we learn new metadata about a package's contents, does that require a new package (not package contents) metadata? I could make arguments either way, but given the amount of information that we expect will be attached to element ids, I lean towards them not being versioned if relationship metadata (including contains) changes.

Your comment about dependencies focuses on incoming dependencies; outgoing dependencies are very similar to files, they are just "delayed" resolution files.

Regards,
William Bartholomew (he/him)

From: Spdx-tech@lists.spdx.org <Spdx-tech@lists.spdx.org> On Behalf Of David Kemp via lists.spdx.org
Sent: Tuesday, January 25, 2022 3:31 AM
To: SPDX-list <Spdx-tech@lists.spdx.org>
Subject: [EXTERNAL] [spdx-tech] Is "contains" special?

The difference between "contains" and every other type of relationship is that it is the minimum essential requirement for some types to exist. A package cannot be a package without having contents. Its "packageness" is defined by the fact that it has contents. The same cannot be said for all of the other relationship types - a Package and a BOM can exist without patches, variants, ancestors, dependencies, examples, etc.

If any of those other relationship types were essential for a Package or BOM to exist, then the model would include "dependency_element" and "patch_element" properties in addition to the contents ("element") property, and the version of the Package would change whenever the properties change. The reason dependency is not a property is that a Package and its version don't change every time some other Package references / uses / becomes dependent on it.

Contains is special and different from all other relationships because if the content of a Package changes, it is a different version of the Package.

Dave

View/Reply Online (#4346): https://lists.spdx.org/g/Spdx-tech/message/4346
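
The "syntactic sugar" idea raised in the January 27 message, where an elements property can be expanded into CONTAINS relationships and collapsed back, might be sketched as follows. This is only an illustration under stated assumptions: the Package and Relationship classes, their field names, the expand and collapse helpers, and the urn:example IRIs are all hypothetical and not part of any SPDX schema or tool.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Package:
    # Hypothetical: a package whose contents are listed in an elements property.
    iri: str
    elements: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Relationship:
    # Hypothetical: one source element, one type, many targets.
    from_element: str
    relationship_type: str
    to_elements: Tuple[str, ...]


def expand(package: Package) -> List[Relationship]:
    """Turn an elements property into CONTAINS relationships."""
    if not package.elements:
        # An empty list is equivalent to zero CONTAINS relationships.
        return []
    return [Relationship(package.iri, "CONTAINS", package.elements)]


def collapse(iri: str, relationships: Sequence[Relationship]) -> Package:
    """Rebuild the elements property from the CONTAINS relationships of one package."""
    contained: List[str] = []
    for rel in relationships:
        if rel.from_element == iri and rel.relationship_type == "CONTAINS":
            for target in rel.to_elements:
                if target not in contained:  # skip duplicates, keep first-seen order
                    contained.append(target)
    return Package(iri, tuple(contained))


# Initial description: 100 files listed in the property.
rev0 = Package("urn:example:pkg", tuple(f"urn:example:file-{n}" for n in range(1, 101)))
graph = expand(rev0)

# Later correction: two files were missed, added here as a "patch" relationship.
graph.append(
    Relationship(rev0.iri, "CONTAINS", ("urn:example:file-101", "urn:example:file-102"))
)

# Collapsing the patched graph gives the equivalent new revision of the package.
rev1 = collapse(rev0.iri, graph)
print(len(graph), len(rev1.elements))
```

In this sketch the patched graph and the rev1 package describe the same contents, which is the case where the two representations are interchangeable; it does not attempt to model a use case of type 3b.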