On 06/09/2024 13:32, John Walker wrote:
Hi Andy,
-----Original Message-----
From: Andy Seaborne <[email protected]>
Sent: Friday, 6 September 2024 10:54
To: [email protected]
Subject: Re: rdfdiff shows graphs are unequal, but does not list the differences
On 05/09/2024 19:12, John Walker wrote:
Hi,
I am working on a project where we cleanse/normalize some RDF data, and I
am using the rdfdiff utility to compare the input and output.
The utility tells me the models are unequal, but it does not list any
statements.
Through a process of trial and error, I could isolate a couple of literals that
are changed, but it’s unclear why the utility does not detect them.
* "1756"^^xsd:int --> "1756"^^xsd:integer
* "2024-03-13T12:52:06.227Z"^^xsd:dateTime --> "2024-03-13T12:52:06.227000+00:00"^^xsd:dateTime
See attached minimal examples.
$ rdfdiff original.ttl modified.ttl TTL TTL
models are unequal
Does Jena normalize literals when parsing the input files?
Not unless you configure the parser to do that or it goes into TDB.
Are the literal values different, or not?
xsd:dateTime: They are different RDF terms, but they represent the same value.
Am I correct to say the variant with "Z" time zone is the canonical lexical
representation?
Yes.
https://www.w3.org/TR/xmlschema11-2/#f-tzCanFragMap
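A minimal Java sketch of that distinction, using only the standard library (not Jena). The class name `DateTimeForms` is hypothetical, and `java.time`'s ISO-8601 parser stands in for an XSD parser: it requires a timezone offset and is not a full XSD validator. The two lexical forms from the original message are different strings, and therefore different RDF terms, yet they parse to the same instant.

```java
import java.time.Instant;
import java.time.OffsetDateTime;

// Hypothetical sketch (standard library only, not Jena): compare two
// xsd:dateTime lexical forms as strings (terms) and as instants (values).
class DateTimeForms {

    // Assumption: java.time's ISO-8601 parser stands in for an XSD parser.
    // It requires a timezone offset and is not a full XSD validator.
    static Instant toInstant(String lexical) {
        return OffsetDateTime.parse(lexical).toInstant();
    }

    public static void main(String[] args) {
        String original = "2024-03-13T12:52:06.227Z";
        String modified = "2024-03-13T12:52:06.227000+00:00";

        // Different lexical forms, so different RDF terms.
        System.out.println(original.equals(modified)); // false

        // Same point on the time line, so the same value.
        System.out.println(toInstant(original).equals(toInstant(modified))); // true

        // For this value, Instant.toString() prints the "Z" form.
        System.out.println(toInstant(modified)); // 2024-03-13T12:52:06.227Z
    }
}
```

`Instant.toString()` only happens to match the "Z" form for this value; it is not an implementation of the XSD canonical mapping in general (for example, it prints 0.22 seconds as `.220`).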
xsd:int/xsd:integer:
TDB1 blurs the difference; TDB2 retains the datatype.
Reading SPARQL 1.1 recommendation, is it correct to say:
"1756"^^xsd:int = "1756"^^xsd:integer produces a type error
That is not a type error. "=" compares values, and they have the same
value space (numbers), so they can be compared.
https://www.w3.org/TR/sparql11-query/#OperatorMapping
sameTerm("1756"^^xsd:int, "1756"^^xsd:integer) = false
Correct.
There can be multiple terms for the same value.
also false --
sameTerm("+1756"^^xsd:integer, "1756"^^xsd:integer)
sameTerm("01756"^^xsd:integer, "1756"^^xsd:integer)
This is because RDF 1.1 Concepts and Abstract Syntax literal term equality
requires the datatype IRIs to compare equal, character by character.
Also, RDF is not dependent on XSD datatypes. They are suggested, but
there is no requirement to handle XSD. There is in SPARQL, for a limited
set of datatypes, and pragmatically many triple stores support a lot more
than the minimum.
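The term-equality rule for typed literals can be sketched in a few lines of Java. This is a hypothetical model, not Jena's API: `TermEquality`, the `Literal` record (Java 16+), and the `lit` helper are illustrative names, and language tags are left out. No knowledge of what a datatype means is used, which is why all three pairs from the examples above come out false.

```java
// Hypothetical sketch, not Jena's API: typed literals compared as RDF terms.
class TermEquality {

    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    // A typed literal: lexical form plus datatype IRI (language tags omitted).
    record Literal(String lexical, String datatypeIri) {}

    static Literal lit(String lexical, String xsdLocalName) {
        return new Literal(lexical, XSD + xsdLocalName);
    }

    // Term equality: lexical forms and datatype IRIs must match character
    // by character. No knowledge of what the datatype means is used.
    static boolean sameTerm(Literal a, Literal b) {
        return a.lexical().equals(b.lexical())
            && a.datatypeIri().equals(b.datatypeIri());
    }

    public static void main(String[] args) {
        System.out.println(sameTerm(lit("1756", "int"), lit("1756", "integer")));      // false
        System.out.println(sameTerm(lit("+1756", "integer"), lit("1756", "integer"))); // false
        System.out.println(sameTerm(lit("01756", "integer"), lit("1756", "integer"))); // false
        System.out.println(sameTerm(lit("1756", "integer"), lit("1756", "integer")));  // true
    }
}
```

The record's generated `equals` compares both components with `String.equals`, so `a.equals(b)` gives the same answer; the explicit method just makes the rule visible.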
I'm a bit puzzled by the example for
"2004-12-31T19:00:00-05:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> and
xsd:dateTime("2005-01-01T00:00:00Z").
Those are not character-by-character equal, but the RDFterm-equal example treats them as equal.
They are not term-equal (sameTerm).
They are value-equal (the same point on the time line).
RDFterm-equal is a fallback. Two terms that are sameTerm are always
value-equal, regardless of whether the datatype is understood (a datatype
maps each lexical form to one value; it is a function).
Two dateTimes will be dispatched further up the table by:

    A = B    xsd:dateTime    xsd:dateTime    op:dateTime-equal(A, B)
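A heavily simplified Java sketch of that dispatch, assuming just three cases: an integer row, an xsd:dateTime row, and the RDFterm-equal fallback. The class `EqualsDispatch`, the `TypeError` exception standing in for a SPARQL type error, and the restriction to xsd:int/xsd:integer are assumptions for illustration; a real engine covers many more datatypes and numeric type promotion.

```java
import java.math.BigInteger;
import java.time.OffsetDateTime;
import java.util.Set;

// Hypothetical, heavily simplified sketch of dispatching "=" on two literals.
class EqualsDispatch {

    static final String XSD = "http://www.w3.org/2001/XMLSchema#";
    static final String DATE_TIME = XSD + "dateTime";

    // Assumption: only these two integer datatypes count as numeric here.
    static final Set<String> INTEGER_TYPES = Set.of(XSD + "integer", XSD + "int");

    record Literal(String lexical, String datatypeIri) {}

    // Stands in for a SPARQL type error.
    static class TypeError extends RuntimeException {
        TypeError(String message) { super(message); }
    }

    static boolean equal(Literal a, Literal b) {
        // Numeric row: compare numbers, whichever integer datatype each uses.
        if (INTEGER_TYPES.contains(a.datatypeIri()) && INTEGER_TYPES.contains(b.datatypeIri())) {
            return new BigInteger(a.lexical()).equals(new BigInteger(b.lexical()));
        }
        // xsd:dateTime row: compare points on the time line
        // (assumes both values carry a timezone offset).
        if (a.datatypeIri().equals(DATE_TIME) && b.datatypeIri().equals(DATE_TIME)) {
            return OffsetDateTime.parse(a.lexical()).isEqual(OffsetDateTime.parse(b.lexical()));
        }
        // Fallback, RDFterm-equal: the same term is equal to itself; two
        // different literals that no row above understands give a type error.
        if (a.equals(b)) {
            return true;
        }
        throw new TypeError("cannot compare " + a + " and " + b);
    }

    public static void main(String[] args) {
        Literal int1756 = new Literal("1756", XSD + "int");
        Literal integer1756 = new Literal("1756", XSD + "integer");
        System.out.println(equal(int1756, integer1756)); // true

        Literal eve = new Literal("2004-12-31T19:00:00-05:00", DATE_TIME);
        Literal newYear = new Literal("2005-01-01T00:00:00Z", DATE_TIME);
        System.out.println(equal(eve, newYear)); // true
    }
}
```

The order of the checks mirrors the idea that more specific rows win and RDFterm-equal is only reached when nothing above applies.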
Is this a bug?
Which Jena version are you running?
I'm running 4.10.0 locally.
Jena 5 changed to "term equality" everywhere for in-memory graphs, with
TDB still storing values.
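To illustrate why the choice of equality matters for a diff, here is a hypothetical Java sketch (not how rdfdiff or Jena is implemented) that computes "literals in left but not in right" twice: once by exact term, and once by a made-up value key under which "1756"^^xsd:int and "1756"^^xsd:integer collapse to the same number. The names `TermVsValueDiff`, `valueKey`, `diffByTerm` and `diffByValue` are illustrative only.

```java
import java.math.BigInteger;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not how rdfdiff or Jena works: "left minus right"
// computed once with term equality and once with value equality.
class TermVsValueDiff {

    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    record Literal(String lexical, String datatypeIri) {}

    // Value key: integer-typed literals collapse to their number; any other
    // literal is keyed by the term itself.
    static Object valueKey(Literal l) {
        String dt = l.datatypeIri();
        if (dt.equals(XSD + "int") || dt.equals(XSD + "integer")) {
            return new BigInteger(l.lexical());
        }
        return l;
    }

    // Literals in left with no identical term in right.
    static Set<Literal> diffByTerm(List<Literal> left, List<Literal> right) {
        Set<Literal> result = new LinkedHashSet<>(left);
        result.removeAll(right);
        return result;
    }

    // Literals in left whose value key matches nothing in right.
    static Set<Literal> diffByValue(List<Literal> left, List<Literal> right) {
        Set<Object> rightKeys = new HashSet<>();
        for (Literal l : right) {
            rightKeys.add(valueKey(l));
        }
        Set<Literal> result = new LinkedHashSet<>();
        for (Literal l : left) {
            if (!rightKeys.contains(valueKey(l))) {
                result.add(l);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Literal> original = List.of(new Literal("1756", XSD + "int"));
        List<Literal> modified = List.of(new Literal("1756", XSD + "integer"));
        System.out.println(diffByTerm(original, modified));  // one entry: the xsd:int literal
        System.out.println(diffByValue(original, modified)); // []
    }
}
```

Under the value key the change disappears from the diff even though the two sets of terms are not the same.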
I'll try with the latest release.
John
Andy
Regards,
John Walker
Principal Consultant & co-founder
Semaku B.V. | Torenallee 20 (SFJ 3D) | 5617 BC Eindhoven | T +31 6 42590072 | https://semaku.com/
KvK: 58031405 | BTW: NL852842156B01 | IBAN: NL94 INGB 0008 3219 95