On 06/09/2024 13:32, John Walker wrote:
Hi Andy,

-----Original Message-----
From: Andy Seaborne <[email protected]>
Sent: Friday, 6 September 2024 10:54
To: [email protected]
Subject: Re: rdfdiff shows graphs are unequal, but does not list the differences



On 05/09/2024 19:12, John Walker wrote:
Hi,

I am working on a project where we cleanse/normalize some RDF data, and I
am using the rdfdiff utility to compare the input and output.
The utility tells me the models are unequal, but it does not list any
statements.

Through a process of trial and error, I could isolate a couple of literals that are
changed, but it’s unclear why the utility does not detect them.
* "1756"^^xsd:int --> "1756"^^xsd:integer
* "2024-03-13T12:52:06.227Z"^^xsd:dateTime -->
"2024-03-13T12:52:06.227000+00:00"^^xsd:dateTime

See attached minimal examples.

$ rdfdiff original.ttl modified.ttl TTL TTL
models are unequal

Does Jena normalize literals when parsing the input files?

Not unless you configure the parser to do that or it goes into TDB.

Are the literal values different, or not?

xsd:dateTime: They are different RDF terms, but they represent the same value.

Am I correct to say the variant with the "Z" time zone is the canonical lexical
representation?

Yes.
https://www.w3.org/TR/xmlschema11-2/#f-tzCanFragMap
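The "different terms, same value" point can be checked with plain java.time, no Jena involved. This is a minimal sketch, assuming the lexical forms always carry a timezone. xsd:dateTime also allows none, and OffsetDateTime.parse would reject that. The class name DateTimeTerms is hypothetical:

```java
import java.time.Instant;
import java.time.OffsetDateTime;

public class DateTimeTerms {
    /** Maps an xsd:dateTime lexical form (assumed to carry a timezone) to its point on the time line. */
    static Instant value(String lexical) {
        return OffsetDateTime.parse(lexical).toInstant();
    }

    public static void main(String[] args) {
        String original = "2024-03-13T12:52:06.227Z";
        String modified = "2024-03-13T12:52:06.227000+00:00";

        // Different lexical forms, hence different RDF terms.
        System.out.println(original.equals(modified));               // false
        // Same point on the time line, hence the same value.
        System.out.println(value(original).equals(value(modified))); // true
        // Instant.toString() writes the instant in UTC with a "Z" suffix.
        System.out.println(value(modified));                         // 2024-03-13T12:52:06.227Z
    }
}
```

Instant.toString() pads fractional seconds to 3, 6 or 9 digits. The XSD canonical mapping instead drops trailing zeros, so this is not a general canonicalizer. It only happens to produce the canonical form for this value.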

xsd:int/xsd:integer:
TDB1 blurs the difference; TDB2 retains the datatype.

Reading the SPARQL 1.1 Recommendation, is it correct to say:

"1756"^^xsd:int = "1756"^^xsd:integer produces a type error

That is not a type error. "=" compares values, and they have the same value space (numbers), so they can be compared.

https://www.w3.org/TR/sparql11-query/#OperatorMapping

sameTerm("1756"^^xsd:int, "1756"^^xsd:integer) = false

Correct.

There can be multiple terms for the same value.

also false --

sameTerm("+1756"^^xsd:integer, "1756"^^xsd:integer)
sameTerm("01756"^^xsd:integer, "1756"^^xsd:integer)


This is because RDF 1.1 Concepts and Abstract Syntax literal term equality
requires the datatype IRIs to compare equal, character by character.

And also, RDF is not dependent on XSD datatypes. They are suggested, but there is no requirement to handle XSD. SPARQL does require it for a limited set of datatypes, and pragmatically, many triple stores support a lot more than the minimum.
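The difference between term equality (sameTerm) and value equality ("=") for the integer literals above can be illustrated without Jena. This is a minimal sketch in plain Java (16+ for records). NumericTerms, Literal and valueEqual are hypothetical names. Only a few integer-derived datatypes are handled, and xsd:int range limits are not checked:

```java
import java.math.BigInteger;
import java.util.Set;

public class NumericTerms {
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    /** A literal term; record equality compares lexical form and datatype IRI as strings, like sameTerm. */
    record Literal(String lexical, String datatype) {}

    // Illustrative subset of the integer-derived XSD datatypes.
    static final Set<String> INTEGER_TYPES = Set.of(XSD + "integer", XSD + "int", XSD + "long");

    static boolean sameTerm(Literal a, Literal b) {
        return a.equals(b);
    }

    /** Value equality for integer literals, in the spirit of SPARQL "=" on numerics. */
    static boolean valueEqual(Literal a, Literal b) {
        if (!INTEGER_TYPES.contains(a.datatype()) || !INTEGER_TYPES.contains(b.datatype())) {
            throw new IllegalArgumentException("sketch only handles integer-derived datatypes");
        }
        // BigInteger accepts an optional leading sign and leading zeros.
        return new BigInteger(a.lexical()).equals(new BigInteger(b.lexical()));
    }

    public static void main(String[] args) {
        Literal integer = new Literal("1756", XSD + "integer");
        Literal[] others = {
            new Literal("1756", XSD + "int"),
            new Literal("+1756", XSD + "integer"),
            new Literal("01756", XSD + "integer"),
        };
        // For all three: sameTerm=false, valueEqual=true.
        for (Literal other : others) {
            System.out.println(other + " sameTerm=" + sameTerm(other, integer)
                    + " valueEqual=" + valueEqual(other, integer));
        }
    }
}
```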


I'm a bit puzzled by the example for
"2004-12-31T19:00:00-05:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> and
xsd:dateTime("2005-01-01T00:00:00Z").
Those are not character-by-character equal, but RDFterm-equal returns true.

They are not term equals (sameTerm).
They are value equals (the same point on the time line)

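RDFterm-equal is a fallback. Two terms are value-equal if they are sameTerm, regardless of whether the datatype is understood: a datatype is a function from lexical forms to values, so the same lexical form with the same datatype always maps to the same value.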

Two dateTimes will be dispatched by a row further up the table:

A = B   xsd:dateTime    xsd:dateTime    op:dateTime-equal(A, B)
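One way to picture the dispatch is to try the more specific rows of the operator table first, and fall back to RDFterm-equal only when none applies. This simplified sketch covers just two rows (numeric and xsd:dateTime) plus the fallback. The names EqualsDispatch and TypeError and the datatype http://example.org/myType are hypothetical. Other rows (strings, booleans and so on) are left out, so for those datatypes this is not a faithful "=":

```java
import java.math.BigDecimal;
import java.time.OffsetDateTime;
import java.util.Set;

public class EqualsDispatch {
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    record Literal(String lexical, String datatype) {}

    /** Stands in for a SPARQL type error (hypothetical). */
    static class TypeError extends RuntimeException {
        TypeError(String message) { super(message); }
    }

    // Illustrative subset of the numeric datatypes.
    static final Set<String> NUMERIC = Set.of(XSD + "integer", XSD + "int", XSD + "decimal");

    static boolean equal(Literal a, Literal b) {
        // A = B   numeric   numeric   op:numeric-equal(A, B)
        if (NUMERIC.contains(a.datatype()) && NUMERIC.contains(b.datatype())) {
            return new BigDecimal(a.lexical()).compareTo(new BigDecimal(b.lexical())) == 0;
        }
        // A = B   xsd:dateTime   xsd:dateTime   op:dateTime-equal(A, B)
        String dateTime = XSD + "dateTime";
        if (a.datatype().equals(dateTime) && b.datatype().equals(dateTime)) {
            return OffsetDateTime.parse(a.lexical()).isEqual(OffsetDateTime.parse(b.lexical()));
        }
        // Fallback, RDFterm-equal: the same term means the same value, whatever the datatype;
        // two different literals that reach this point cannot be compared, which is a type error.
        if (a.equals(b)) {
            return true;
        }
        throw new TypeError("cannot compare " + a + " and " + b);
    }

    public static void main(String[] args) {
        System.out.println(equal(new Literal("2004-12-31T19:00:00-05:00", XSD + "dateTime"),
                                 new Literal("2005-01-01T00:00:00Z", XSD + "dateTime"))); // true
        System.out.println(equal(new Literal("1756", XSD + "int"),
                                 new Literal("1756", XSD + "integer")));                  // true
        Literal custom = new Literal("abc", "http://example.org/myType");
        System.out.println(equal(custom, custom));                                        // true
    }
}
```

In this sketch, comparing "abc" and "xyz" with the same unknown datatype throws TypeError rather than returning false. This mirrors the RDFterm-equal rule that two literals which are not the same RDF term produce a type error.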






Is this a bug?

Which Jena version are you running?

I'm running 4.10.0 locally.


Jena 5 changed to "term equality" everywhere for in-memory graphs, with TDB still
storing values.
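The original symptom ("models are unequal", but nothing listed) is what you would see if the whole-graph check compares terms while the per-triple listing matches by value. That is an assumption for illustration, not a description of rdfdiff's code. The toy model below shows the two notions disagreeing on the "1756"^^xsd:int change. It uses hypothetical names (DiffSketch, missingByTerm, missingByValue), subjects and predicates are plain strings, and there are no blank nodes, so no isomorphism check is needed:

```java
import java.math.BigInteger;
import java.util.List;
import java.util.Set;

public class DiffSketch {
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    record Literal(String lexical, String datatype) {}

    /** Simplified triple: subject and predicate as plain strings, object as a literal. */
    record Triple(String subject, String predicate, Literal object) {}

    // Illustrative: only these datatypes are treated as having integer values.
    static final Set<String> INTEGER_TYPES = Set.of(XSD + "integer", XSD + "int");

    static boolean sameValue(Literal a, Literal b) {
        if (INTEGER_TYPES.contains(a.datatype()) && INTEGER_TYPES.contains(b.datatype())) {
            return new BigInteger(a.lexical()).equals(new BigInteger(b.lexical()));
        }
        return a.equals(b);
    }

    /** Triples of 'left' with no term-equal counterpart in 'right'. */
    static List<Triple> missingByTerm(Set<Triple> left, Set<Triple> right) {
        return left.stream().filter(t -> !right.contains(t)).toList();
    }

    /** Triples of 'left' with no value-equal counterpart in 'right'. */
    static List<Triple> missingByValue(Set<Triple> left, Set<Triple> right) {
        return left.stream()
                .filter(t -> right.stream().noneMatch(u ->
                        u.subject().equals(t.subject())
                        && u.predicate().equals(t.predicate())
                        && sameValue(u.object(), t.object())))
                .toList();
    }

    public static void main(String[] args) {
        Set<Triple> original = Set.of(new Triple("ex:s", "ex:p", new Literal("1756", XSD + "int")));
        Set<Triple> modified = Set.of(new Triple("ex:s", "ex:p", new Literal("1756", XSD + "integer")));

        System.out.println(original.equals(modified));                 // false: the graphs differ as terms
        System.out.println(missingByTerm(original, modified).size());  // 1: the xsd:int triple is reported
        System.out.println(missingByValue(original, modified).size()); // 0: nothing to report
    }
}
```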

I'll try with the latest release.

John


      Andy


Regards,

John Walker
Principal Consultant & co-founder

Semaku B.V. | Torenallee 20 (SFJ 3D) | 5617 BC Eindhoven | T +31 6 42590072 | https://semaku.com/
KvK: 58031405 | BTW: NL852842156B01 | IBAN: NL94 INGB 0008 3219 95

