Thanks Simon for the review and comments.

 

Responses inline below.

 

From: Spdx-tech@lists.spdx.org <Spdx-tech@lists.spdx.org> On Behalf Of Simon 
Avery via lists.spdx.org
Sent: Sunday, December 12, 2021 10:50 PM
To: Spdx-tech@lists.spdx.org
Subject: Re: [spdx-tech] SPDX Java tools update related to hasFiles and 
Contains property

 

Gary,

 

A few comments:

 

I agree with your assertion that a CONTAINS (or a CONTAINED_BY) relationship is 
semantically identical to an RDF/JSON/YAML/XML package that has the hasFile(s) 
property or a tag/value file where the files follow the package.

 

Option C looks good but I’m confused as to why you are excluding RDF. That’s 
the only format where hasFile is defined in the spec. If you're going to always 
use hasFile for JSON/YAML/XML then why not do the same for RDF? 

[G.O.] There are two reasons I went with this approach in RDF – there was a 
comment in the SPDX 2.2 OWL Ontology for the CONTAINS relationship type 
indicating the relationship is preferred over the hasFile property.  The other 
reason was in the 3.0 model, we seemed to be aligning on using the 
relationships rather than the properties for the model itself.  That being 
said, I struggle with this decision since, as you stated, it is inconsistent 
with the other serialization formats and it is also easier to write SPARQL 
queries for a property rather than relationships.  I’m quite tempted to change 
this to keep the hasFiles and remove any duplicated CONTAINS relationships – 
this is the one decision I’m most interested in getting feedback on.
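
A minimal sketch of the query-complexity point, written in Python over plain tuples rather than in SPARQL against a real graph. The predicate names are meant to mirror the SPDX 2.2 RDF terms (hasFile, relationship, relationshipType, relatedSpdxElement); the node names and both helper functions are hypothetical and are not part of any SPDX tool.

```python
# Toy triple store: (subject, predicate, object). Node names are hypothetical.
TRIPLES = [
    # Property form: one triple links the package to the file.
    ("PackageA", "spdx:hasFile", "File1"),
    # Relationship form: an intermediate node carries the type and the target.
    ("PackageA", "spdx:relationship", "_:rel1"),
    ("_:rel1", "spdx:relationshipType", "spdx:relationshipType_contains"),
    ("_:rel1", "spdx:relatedSpdxElement", "File2"),
]


def objects(subject, predicate):
    """Return all objects for a given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def files_via_property(package):
    # One pattern: package -> file.
    return objects(package, "spdx:hasFile")


def files_via_relationship(package):
    # Three patterns: package -> relationship node, type check, then target.
    found = []
    for rel in objects(package, "spdx:relationship"):
        if "spdx:relationshipType_contains" in objects(rel, "spdx:relationshipType"):
            found.extend(objects(rel, "spdx:relatedSpdxElement"))
    return found
```

The property lookup matches a single triple pattern, while the relationship lookup has to join three, which is the extra work a SPARQL query would also carry.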

 

Converting both ways of expressing the package/files relationship to be stored 
in a single consistent manner when serialized and deserialized is definitely A 
Good Thing to prevent duplication.

 

I think that you should have the check for duplication happen for all file 
types, not just Tag/Value.

[G.O.] Agreed – I’ll go back through the implementations and double check that 
I’ve not included any duplication.

 

The same issue exists for JSON files that could have both the documentDescribes 
property set and/or a relationship stating which packages are DESCRIBED_BY or 
the document DESCRIBES certain packages.  How are your tools handling that 
situation?

[G.O.] As far as I know, this is already handled for JSON/YAML/XML formats.  I 
went back in the RDF and found a related issue from pre-2.0 RDF versions that 
use a property describesPackage instead of a DESCRIBES relationship.  I doubt 
we would run into this situation very often, but I added an Issue 
<https://github.com/spdx/spdx-java-rdf-store/issues/27>  to track.

 

On a more general note, I’m only handling/translating the “to” relationships 
(CONTAINS, DESCRIBES) and not the “from” relationships (CONTAINED_BY, 
DESCRIBED_BY).  The reason for this approach is that the “from” relationships 
may not be contained in the same document and are not used as often in practice.  
It also simplifies the code, since it only has to deal with one direction.

 

Simon Avery

 

 

On Fri, Dec 10, 2021 at 1:52 PM Steve Winslow <swins...@gmail.com> wrote:

Hi Gary, a couple of quick initial reactions / thoughts:

 

For the Golang tools, I believe it currently handles things similarly to the 
way you described the Java tools. The in-memory representation of a Package has 
a "Files" property 
<https://github.com/spdx/tools-golang/blob/9813e3e9ab9528c405c798c153e2da336b37cec9/spdx/package.go#L251>, 
which maps SPDX IDs to File objects. A File has a similar "Snippets" property 
<https://github.com/spdx/tools-golang/blob/9813e3e9ab9528c405c798c153e2da336b37cec9/spdx/file.go#L156>.

 

When parsing a tag-value SPDX 2.2 or 2.1 document, Files and Snippets are added 
into those maps based on positioning in the document, as described in 5.2.3 
<https://spdx.github.io/spdx-spec/composition-of-an-SPDX-document/#523-file-information-section>
  for the tag-value format. There is no assumed equivalence of CONTAINS 
Relationships; none are auto-generated, and such Relationships are only created 
if the document explicitly includes them. I'm not saying this is the "correct" 
approach -- just describing how the Golang tools work today.
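
The positional rule can be sketched as follows. This is a simplified Python illustration, not the Golang tools' code: it assumes each PackageName or FileName tag is followed by that element's SPDXID, handles only the few tags shown, and the function name parse_tag_value is made up for this example.

```python
def parse_tag_value(lines):
    """Group files under the package that most recently preceded them."""
    packages = {}        # package SPDX ID -> list of file SPDX IDs
    relationships = []   # only what the document states explicitly
    current_package = None
    pending = None       # "package" or "file": what the next SPDXID names

    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        tag, _, value = line.partition(":")
        tag, value = tag.strip(), value.strip()
        if tag == "PackageName":
            pending = "package"
        elif tag == "FileName":
            pending = "file"
        elif tag == "SPDXID" and pending == "package":
            current_package = value
            packages[current_package] = []
            pending = None
        elif tag == "SPDXID" and pending == "file":
            # Files seen before any package belong to no package.
            if current_package is not None:
                packages[current_package].append(value)
            pending = None
        elif tag == "Relationship":
            source, rel_type, target = value.split()
            relationships.append((source, rel_type, target))
    return packages, relationships


DOC = """
SPDXID: SPDXRef-DOCUMENT
PackageName: example
SPDXID: SPDXRef-PackageB
FileName: ./b1.c
SPDXID: SPDXRef-B1
FileName: ./b2.c
SPDXID: SPDXRef-B2
""".splitlines()

packages, relationships = parse_tag_value(DOC)
```

Here both files land in the package's file list purely by position, and the relationships list stays empty because the document states none.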

 

One other thing I'd highlight is the fact that CONTAINS Relationships can 
reference SPDX elements from separate SPDX documents. So, at least 
syntactically, I think you can have a situation like the following:

*       Document A:

        *       defines a File SPDXRef-FileA1

*       Tag-value Document B:

        *       defines a Package SPDXRef-PackageB
        *       defines Files -B1 and -B2 immediately afterward (so PackageB 
                implicitly CONTAINS them)
        *       also states that PackageB CONTAINS FileA1 from Document A (so 
                PackageB explicitly CONTAINS it, but from a different document)

I don't know why you _would_ do this, but I think you _could_ and it would be 
syntactically valid.

 

The reason I'm mentioning this is just that, at least for tag-value documents, 
we might think of the "list of Files following a Package" as being equivalent 
to "the Package CONTAINS the Files". But it's possible to contrive an example 
where some Files can be expressed as the latter but not as the former.
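
A small Python sketch of that distinction, assuming the external reference is written in the DocumentRef-...:SPDXRef-... form and that Document B names Document A as DocumentRef-A. That name and the helper function are illustrative only.

```python
def split_contains_targets(package_id, relationships):
    """Separate CONTAINS targets into same-document and external-document IDs."""
    local, external = [], []
    for source, rel_type, target in relationships:
        if source != package_id or rel_type != "CONTAINS":
            continue
        # External element references are prefixed with a DocumentRef- ID.
        if target.startswith("DocumentRef-"):
            external.append(target)
        else:
            local.append(target)
    return local, external


RELATIONSHIPS = [
    ("SPDXRef-PackageB", "CONTAINS", "SPDXRef-B1"),
    ("SPDXRef-PackageB", "CONTAINS", "SPDXRef-B2"),
    ("SPDXRef-PackageB", "CONTAINS", "DocumentRef-A:SPDXRef-FileA1"),
]

local, external = split_contains_targets("SPDXRef-PackageB", RELATIONSHIPS)
```

Only the targets in local could be rewritten as "Files following the Package" in Document B; the external one has no File section in that document to position.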

 

This might be a rabbit hole that isn't helpful or applicable, so feel free to 
disregard if so  :)

 

Steve

 

On Fri, Dec 10, 2021 at 4:20 PM Gary O'Neall <g...@sourceauditor.com> wrote:

I would like to get some feedback from the community on some changes I’m making 
to the SPDX Java tools related to the hasFiles property in JSON and the 
CONTAINS relationship.

 

If you’re a user of the SPDX Java tools, please review the following since it 
may introduce an incompatibility with prior versions.

 

If you’re an implementer of tools that read or write SPDX, you may also want to 
review this and let us know if you agree with the approach.

 

If you’re working on the SPDX 3.0 spec, you may find this issue relevant to 
some upcoming topics related to serialization/deserialization.

 

I’d like to get feedback over the next week or two before I update the tools 
with the changes.

 

Problem statement: The SPDX Java tools are currently representing the 
relationships between the Package and the files contained in the Package in two 
possibly inconsistent ways – using a hasFile property and using the CONTAINS 
relationship between the Package and the File.  This could lead to inconsistent 
results depending on how the SPDX file was serialized.

 

Current state of the SPDX Spec:

*       The relationship CONTAINS is documented and can be used to describe a 
package CONTAINing a file in all supported serialization formats
*       Section 5.2.3 
<https://spdx.github.io/spdx-spec/composition-of-an-SPDX-document/#523-file-information-section>
  describes how the position of file and package declarations is used to 
denote which files belong to which package
*       Section 5.2.3 
<https://spdx.github.io/spdx-spec/composition-of-an-SPDX-document/#523-file-information-section>
  states “When implementing file information in RDF, the spdx:hasFile property 
is used to associate the package with the file.”
*       The RDF OWL property hasFile is defined as “Indicates that a particular 
file belongs to a package.”
*       The RDF OWL documentation for the CONTAINS relationship includes the 
comment “A Relationship of relationshipType_contains expresses that an 
SPDXElement contains the relatedSPDXElement. For example, a Package contains a 
File. (relationshipType_contains introduced in SPDX 2.0 deprecates property 
'hasFile' from SPDX 1.2)”

        *       Note that the comment in parentheses is inconsistent with the 
                hasFile documentation in the OWL document (it is not deprecated) 
                and also inconsistent with section 5.2.3

*       The JSON schema defines a hasFiles property in the JSON Schema file 
with the same definition as RDF

 

Current state of the Tools-Java version 1.0.3:

*       The Model object SpdxPackage has a property “files” which is a 
collection based on a hasFile property in the underlying object store.
*       When deserialized, the Tag/Value, JSON, YAML, XML, and spreadsheet formats 
store any files contained by a package as a hasFile property in the underlying 
store and not as a CONTAINS relationship
*       If a document states a CONTAINS relationship between a package and a 
file, it will be stored as a relationship (possibly duplicating information in 
hasFile)

 

I would assert that a Package with a File listed in the hasFiles property is 
semantically the same as a Package that has a CONTAINS relationship with that 
File.  Storing the two separately leads to the inconsistency described in the 
problem statement.

 

There are 3 alternatives I’ve looked at to resolve the inconsistency:

A.      Leave the tools as is and live with the inconsistency.
B.      Translate all CONTAINS relationships to a hasFiles property in the 
model store when deserializing.
C.      Translate all hasFiles properties into CONTAINS relationships when 
deserializing and translate them back to the hasFiles property when serializing 
to the JSON/YAML/XML formats (not to the Tag/Value or RDF formats)
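
As a rough illustration of alternative C for JSON, here is a Python sketch over plain dictionaries. It is not the Java tools' implementation. The field names (packages, files, relationships, hasFiles, SPDXID, spdxElementId, relationshipType, relatedSpdxElement) are assumed to follow the SPDX 2.2 JSON schema, and both function names are hypothetical.

```python
import copy


def has_files_to_contains(doc):
    """Deserialize step: turn each hasFiles entry into a CONTAINS relationship."""
    doc = copy.deepcopy(doc)
    relationships = doc.setdefault("relationships", [])
    seen = {
        (r["spdxElementId"], r["relationshipType"], r["relatedSpdxElement"])
        for r in relationships
    }
    for package in doc.get("packages", []):
        for file_id in package.pop("hasFiles", []):
            key = (package["SPDXID"], "CONTAINS", file_id)
            if key not in seen:  # skip relationships the document already states
                seen.add(key)
                relationships.append({
                    "spdxElementId": key[0],
                    "relationshipType": key[1],
                    "relatedSpdxElement": key[2],
                })
    return doc


def contains_to_has_files(doc):
    """Serialize step: fold package-to-file CONTAINS relationships into hasFiles."""
    doc = copy.deepcopy(doc)
    packages = {p["SPDXID"]: p for p in doc.get("packages", [])}
    file_ids = {f["SPDXID"] for f in doc.get("files", [])}
    kept = []
    for rel in doc.get("relationships", []):
        package = packages.get(rel["spdxElementId"])
        is_file = rel["relatedSpdxElement"] in file_ids
        if rel["relationshipType"] == "CONTAINS" and package is not None and is_file:
            files = package.setdefault("hasFiles", [])
            if rel["relatedSpdxElement"] not in files:
                files.append(rel["relatedSpdxElement"])
        else:
            kept.append(rel)  # everything else stays a relationship
    doc["relationships"] = kept
    return doc
```

A document that lists a file in hasFiles and also states the matching CONTAINS relationship ends up with exactly one relationship in memory and exactly one hasFiles entry when written back out.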

 

I’ve taken approach C in large part due to the SPDX 3.0 discussions where we 
plan to allow more compact serializations and convert to Relationships when 
deserializing.  If nothing else, this would be a good experiment to see how 
this approach works in practice.

 

Approach C has the following implications on the Java-Tools:

*       Runtime model:

        *       In the runtime model, any addition to the files collection for a 
                package will automatically create a CONTAINS relationship between 
                the package and the file
        *       In the runtime model, any modification to the CONTAINS 
                relationships between a package and file will be reflected in the 
                files collection
        *       There is no longer any possibility of duplication or 
                inconsistencies between the CONTAINS relationship and the files 
                collection for a package.

*       Tag/Value:

        *       When deserializing, a CONTAINS relationship between the package 
                and the file will be created based on the position of the files 
                and packages per the spec

                *       A check will be made to make sure we don’t add any 
                        duplicate CONTAINS relationships

        *       Files serialized will always include the CONTAINS relationships 
                in addition to maintaining the proper relative positions of the 
                packages and files

                *       Note: I could remove these relationships in the 
                        serialization since they are redundant with the position; 
                        however, I personally think the resultant tag/value is 
                        clearer having the additional relationships.  Feedback is 
                        welcome on this point.

*       JSON/XML/YAML:

        *       When deserializing, a CONTAINS relationship between the package 
                and the file will be created for every element of the hasFiles 
                list.
        *       Files serialized will always use the hasFiles property for any 
                CONTAINS relationship and not include the CONTAINS relationships.

*       RDF/XML:

        *       When deserializing, a CONTAINS relationship between the package 
                and the file will be created for every <Package,hasFile,File> 
                triple
        *       When serializing, the CONTAINS relationships will be serialized.

                *       Note: I’m quite interested in feedback if this translation 
                        to a Relationship makes it harder for semantic reasoners 
                        or other implementations using RDF
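
The runtime-model behavior described above can be sketched in Python as follows. This is not the Java tools' SpdxPackage class: the class and method names are hypothetical, and for brevity the sketch treats every CONTAINS target as a file.

```python
class Package:
    """Package whose files collection is derived from its CONTAINS relationships."""

    def __init__(self, spdx_id):
        self.spdx_id = spdx_id
        self._relationships = []  # (relationship_type, related_element_id)

    def add_relationship(self, rel_type, related_id):
        entry = (rel_type, related_id)
        if entry not in self._relationships:  # never store a duplicate
            self._relationships.append(entry)

    def remove_relationship(self, rel_type, related_id):
        entry = (rel_type, related_id)
        if entry in self._relationships:
            self._relationships.remove(entry)

    def add_file(self, file_id):
        # Adding to the files collection is the same as adding CONTAINS.
        self.add_relationship("CONTAINS", file_id)

    @property
    def files(self):
        # Computed on each access, so it cannot drift from the relationships.
        return [rid for rtype, rid in self._relationships if rtype == "CONTAINS"]
```

Because the files collection is only a view over the relationships, there is a single stored source of truth, which is what rules out duplication between the two.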

 

Thanks for reading through all this!  Let me know any concerns, thoughts, 
questions.

 

Gary

 

-------------------------------------------------

Gary O'Neall

Principal Consultant

Source Auditor Inc.

Mobile: 408.805.0586

Email: g...@sourceauditor.com <mailto:g...@sourceauditor.com> 


 




