Hi Andy

> -----Original Message-----
> From: Andy Seaborne <a...@apache.org>
> Sent: Monday, 9 September 2024 22:07
> To: users@jena.apache.org
> Subject: Re: rdfdiff shows graphs are unequal, but does not list the differences
>
> John,
>
> So they differ only in the object and the terms are "same value" but not "same term"?

Yes, exactly.

I made a gist with the examples: https://gist.github.com/jaw111/f4a9fb64904c442e36d64fea327e8a81

So, you would need to diff a_10.ttl with b_10.ttl and a_2.ttl with b_2.ttl to see the different behaviour.

John

> Seems to work with the current Jena development codebase, but not in Jena 5.1.0.
>
> "Fixed in the next release".
>
> Andy
>
> On 09/09/2024 10:40, John Walker wrote:
> > Hi Andy,
> >
> > Thanks for the quick reply!
> >
> >> -----Original Message-----
> >> From: Andy Seaborne <a...@apache.org>
> >> Sent: Friday, 6 September 2024 16:40
> >> To: users@jena.apache.org
> >> Subject: Re: rdfdiff shows graphs are unequal, but does not list the differences
> >>
> >> On 06/09/2024 13:32, John Walker wrote:
> >>> Hi Andy,
> >>>
> >>>> -----Original Message-----
> >>>> From: Andy Seaborne <a...@apache.org>
> >>>> Sent: Friday, 6 September 2024 10:54
> >>>> To: users@jena.apache.org
> >>>> Subject: Re: rdfdiff shows graphs are unequal, but does not list the differences
> >>>>
> >>>> On 05/09/2024 19:12, John Walker wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I am working on a project where we cleanse/normalize some RDF data, and I am using the rdfdiff utility to compare the input and output.
> >>>>> The utility tells me the models are unequal, but it does not list any statements.
> >>>>>
> >>>>> Through a process of trial and error, I could isolate a couple of literals that are changed, but it’s unclear why the utility does not detect them.
> >>>>> * "1756"^^xsd:int --> "1756"^^xsd:integer
> >>>>> * "2024-03-13T12:52:06.227Z"^^xsd:dateTime --> "2024-03-13T12:52:06.227000+00:00"^^xsd:dateTime
> >>>>>
> >>>>> See attached minimal examples.
> >>>>>
> >>>>> $ rdfdiff original.ttl modified.ttl TTL TTL
> >>>>> models are unequal
> >>>>>
> >>>>> Does Jena normalize literals when parsing the input files?
> >>>>
> >>>> Not unless you configure the parser to do that or it goes into TDB.
> >>>>
> >>>>> Are the literal values different, or not?
> >>>>
> >>>> xsd:dateTime: They are different RDF terms, they represent the same value.
> >>>
> >>> Am I correct to say the variant with "Z" time zone is the canonical lexical representation?
> >>
> >> Yes.
> >> https://www.w3.org/TR/xmlschema11-2/#f-tzCanFragMap
> >>
> >>>> xsd:int/xsd:integer:
> >>>> TDB1 blurs the difference, TDB2 retains the datatype.
> >>>
> >>> Reading SPARQL 1.1 recommendation, is it correct to say:
> >>>
> >>> "1756"^^xsd:int = "1756"^^xsd:integer produces a type error
> >>
> >> That is not a type error. "=" compares values and they have the same value space (numbers) so they can be compared.
> >>
> >> https://www.w3.org/TR/sparql11-query/#OperatorMapping
> >>
> >>> sameTerm("1756"^^xsd:int, "1756"^^xsd:integer) = false
> >>
> >> Correct.
> >>
> >> There can be multiple terms for the same value.
> >>
> >> also false --
> >>
> >> sameTerm("+1756"^^xsd:integer, "1756"^^xsd:integer)
> >> sameTerm("01756"^^xsd:integer, "1756"^^xsd:integer)
> >>
> >>> This is because RDF 1.1 Concepts and Abstract Syntax literal term equality requires the datatype IRIs to compare equal, character by character.
> >>
> >> and also RDF is not dependent on XSD datatypes. They are suggested but there is no requirement to handle XSD. There is in SPARQL for a limited set of datatypes and pragmatically, many triple stores support a lot more than the minimum.
> >>
> >>> I'm a bit puzzled by the example for "2004-12-31T19:00:00-05:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> and xsd:dateTime("2005-01-01T00:00:00Z").
> >>> Those are not character by character equal, but the RDFterm-equal returns.
> >>
> >> They are not term equals (sameTerm).
> >> They are value equals (the same point on the time line)
> >>
> >> RDFTerm-Equal is a fallback. Two terms are value-equal if the terms are the sameTerm regardless of understanding the datatype (datatypes are a function).
> >
> > OK, clear.
> >
> >> Two dateTimes will be dispatched further up the table by:
> >>
> >> A = B    xsd:dateTime    xsd:dateTime    op:dateTime-equal(A, B)
> >>
> >>>>> Is this a bug?
> >>>>
> >>>> Which Jena version are you running?
> >>>
> >>> I'm running 4.10.0 locally.
> >>>
> >>>> Jena5 changed to "term equality" everywhere for in-memory, with TDB still storing values.
> >>>
> >>> I'll try with the latest release.
> >
> > Using 5.1.0 the rdfdiff does output the statements with different terms when I try it with my initial files.
> > However, when I try with the smaller examples from my earlier mail, then no diff is shown.
> >
> > $ rdfdiff original.ttl modified.ttl TTL TTL
> > models are unequal
> >
> > It seems strange, but if I add more statements to both files, then it does output the diff when both graphs contain at least 10 statements:
> >
> > $ rdfdiff original.ttl modified.ttl TTL TTL
> > models are unequal
> >
> > < [http://example.com/this, http://purl.org/dc/terms/modified, "2024-03-13T12:52:06.227Z"^^xsd:dateTime]
> > < [http://example.com/this, http://open-services.net/ns/core#shortId, "1756"^^xsd:int]
> > > [http://example.com/this, http://purl.org/dc/terms/modified, "2024-03-13T12:52:06.227000+00:00"^^xsd:dateTime]
> > > [http://example.com/this, http://open-services.net/ns/core#shortId, "1756"^^xsd:integer]
> >
> > Seems like odd behaviour.
> >
> > John
> >
> >>> John
> >>>
> >>>> Andy
> >>>>
> >>>>> Regards,
> >>>>>
> >>>>> John Walker
> >>>>> Principal Consultant & co-founder
> >>>>>
> >>>>> Semaku B.V. | Torenallee 20 (SFJ 3D) | 5617 BC Eindhoven | T +31 6 42590072 | https://semaku.com/
> >>>>> KvK: 58031405 | BTW: NL852842156B01 | IBAN: NL94 INGB 0008 3219 95
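
For anyone reading this thread later, the difference between "same term" and "same value" discussed above can be illustrated with a small standalone sketch. This is plain Python, not Jena code: a literal is modelled as a (lexical form, datatype IRI) pair, and the names same_term, same_value, parse_value and NUMERIC are hypothetical helpers invented for this illustration. Only a few XSD datatypes are handled, and treating all of them as one numeric value space is a simplification of the SPARQL operator mapping.

```python
from datetime import datetime
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumption for this sketch: these datatypes share one numeric value space.
NUMERIC = {XSD + "int", XSD + "integer", XSD + "decimal"}


def same_term(a, b):
    """Term equality: lexical form and datatype IRI must both match exactly."""
    return a == b


def parse_value(literal):
    """Map a (lexical, datatype) pair to a comparable Python value."""
    lexical, datatype = literal
    if datatype in NUMERIC:
        return ("number", Decimal(lexical))
    if datatype == XSD + "dateTime":
        # Older Python versions reject a trailing "Z", so spell out the offset.
        return ("dateTime", datetime.fromisoformat(lexical.replace("Z", "+00:00")))
    raise ValueError("datatype not handled in this sketch: " + datatype)


def same_value(a, b):
    """Value equality: identical terms are equal, otherwise compare parsed values."""
    if same_term(a, b):
        return True
    return parse_value(a) == parse_value(b)


# The two literal pairs from the thread.
int_a = ("1756", XSD + "int")
int_b = ("1756", XSD + "integer")
dt_a = ("2024-03-13T12:52:06.227Z", XSD + "dateTime")
dt_b = ("2024-03-13T12:52:06.227000+00:00", XSD + "dateTime")

for left, right in [(int_a, int_b), (dt_a, dt_b)]:
    print(left[0], "vs", right[0],
          "| same term:", same_term(left, right),
          "| same value:", same_value(left, right))
```

Both pairs come out as different terms with the same value, which matches the sameTerm and "=" answers given earlier in the thread. A term-based graph comparison would therefore report the graphs as unequal, while a value-based one would not.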