On 11/10/2020 03:20, Zalan Kemenczy wrote:
Hi there,

I'm running into some issues with Jena datatypes and canonicalization. The
following doc:

https://jena.apache.org/documentation/tdb/value_canonicalization.html

explains that TDB understands XML derived types, and that types derived
from each other should all match their canonicalized value. I'm not sure
this is what I'm seeing.

TDB1 and TDB2 do different things here. TDB2 preserves the datatypes of integer subtypes (xsd:long, xsd:int, ...) whereas TDB1 does not.

I have some data in a TDB2 instance:

<#example> <http://xmlns.com/foaf/0.1/age> "70"^^<http://www.w3.org/2001/XMLSchema#long> .

where the following query will match:

(bgp (triple ?e <http://xmlns.com/foaf/0.1/age> "70"^^<http://www.w3.org/2001/XMLSchema#long>))

but neither of the following do:

(bgp (triple ?e <http://xmlns.com/foaf/0.1/age> "70"^^<http://www.w3.org/2001/XMLSchema#int>))
(bgp (triple ?e <http://xmlns.com/foaf/0.1/age> "70"^^<http://www.w3.org/2001/XMLSchema#integer>))
>
I interpret the canonicalization doc to mean these three queries should be
functionally equivalent. Have I misunderstood something?

It's the lexical form that is canonicalized - not the datatype.

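The distinction between "070", "70", "+70", etc. is lost; they all become "70". The datatype is left as it is, so "70"^^xsd:long and "70"^^xsd:int remain different terms and a basic graph pattern only matches the one that is stored.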
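
A rough way to picture this, as a Python sketch rather than TDB's actual storage code (the function names are made up for illustration, and a literal is modelled as a plain (lexical form, datatype) pair):

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"


def canonical_integer_lexical(lexical: str) -> str:
    """Return the canonical lexical form of an integer literal.

    Drops a leading '+' and leading zeros: "070" and "+70" both give "70".
    """
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        raise ValueError(f"not an integer lexical form: {lexical!r}")
    return str(int(lexical))


def canonicalize(literal):
    """Canonicalize the lexical form only; the datatype is kept unchanged."""
    lexical, datatype = literal
    return (canonical_integer_lexical(lexical), datatype)


# Different spellings of the same xsd:long value collapse to one term ...
assert canonicalize(("070", XSD + "long")) == canonicalize(("+70", XSD + "long"))

# ... but the same value with a different datatype is still a different term,
# which is why the xsd:int and xsd:integer patterns do not match the stored triple.
assert canonicalize(("70", XSD + "int")) != canonicalize(("70", XSD + "long"))
```

Under this model, matching in a basic graph pattern is plain term equality on the (lexical form, datatype) pair after canonicalization.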

My other issue is that when I query for longs in the db, certain arithmetic
operators will coerce results to `XSDDatatype/XSDinteger`, even if both
operands are `XSDDatatype/XSDlong`:
>
(extend ((?age-new (+ ?age "1"^^<http://www.w3.org/2001/XMLSchema#long>)))
   (bgp (triple ?e <http://xmlns.com/foaf/0.1/age> ?age)))

Is this expected? I would have thought the Datatype would be preserved if
possible.

Expected - yes. Jena has always done that - it's not related to TDB or canonicalization.

This is about XPath/XQuery Functions and Operators (F&O).

The text there is a bit murky and it has changed across F&O versions. Strictly, SPARQL is defined for 2.0.

When that spec says "the same type" at that point, does it mean the same basic numeric type (one of xs:integer, xs:decimal, xs:float and xs:double), or the exact datatype? (The later 3.1 adds "primitive datatype" but that post-dates SPARQL.)

Generally, the interpretation seems to be the former - so long+long is xs:integer+xs:integer (and can't overflow except for the implementation limit on xs:integer) which is what Jena does.

I can't find any examples on the web that show long + long -> long.
All examples are "integer + integer" suggesting arguments are raised to one of the four xs:numeric types.
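
To make the "former" reading concrete, here is a small Python model of it. This is only a sketch of that interpretation, not Jena's implementation; the function names are invented for the example and the list of integer-derived types is deliberately partial.

```python
# The four basic numeric types, in promotion order (lowest first).
BASE_ORDER = ["xsd:integer", "xsd:decimal", "xsd:float", "xsd:double"]

# A partial list of datatypes derived from xsd:integer (illustrative only).
INTEGER_DERIVED = {
    "xsd:long", "xsd:int", "xsd:short", "xsd:byte",
    "xsd:nonNegativeInteger", "xsd:positiveInteger",
    "xsd:unsignedLong", "xsd:unsignedInt",
}


def base_numeric_type(datatype: str) -> str:
    """Raise a datatype to one of the four basic numeric types."""
    if datatype in BASE_ORDER:
        return datatype
    if datatype in INTEGER_DERIVED:
        return "xsd:integer"
    raise ValueError(f"not a numeric datatype in this model: {datatype}")


def addition_result_type(left: str, right: str) -> str:
    """Result datatype of left + right: the higher of the two base types."""
    candidates = (base_numeric_type(left), base_numeric_type(right))
    return max(candidates, key=BASE_ORDER.index)


# long + long is evaluated as integer + integer, so the result is xsd:integer.
assert addition_result_type("xsd:long", "xsd:long") == "xsd:integer"
# Mixed operands are promoted to the higher basic type.
assert addition_result_type("xsd:int", "xsd:decimal") == "xsd:decimal"
```

In this model the exact subtype of the operands never survives an arithmetic operator, which matches the xsd:integer result seen for ?age + "1"^^xsd:long.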


The safest approach if you want xs:long is to cast:

xsd:long(?age + "1"^^xsd:long)
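
What the cast adds on top of the addition can be pictured with a short Python sketch. It is a model only, with made-up function names and typed values represented as (datatype, value) pairs; the assumption is that the sum is computed as an unbounded xs:integer and the cast then checks the 64-bit signed range of xsd:long, with a failed cast modelled here as a Python exception.

```python
LONG_MIN = -(2 ** 63)
LONG_MAX = 2 ** 63 - 1


def add_as_integer(left: int, right: int):
    """Model of long + long: the sum is typed as xsd:integer."""
    return ("xsd:integer", left + right)


def cast_to_long(typed_value):
    """Model of xsd:long(...): retag as xsd:long if the value is in range."""
    _datatype, value = typed_value
    if not LONG_MIN <= value <= LONG_MAX:
        raise ValueError(f"{value} is outside the xsd:long value space")
    return ("xsd:long", value)


# 70 + 1 is an xsd:integer; the cast brings it back to xsd:long.
assert add_as_integer(70, 1) == ("xsd:integer", 71)
assert cast_to_long(add_as_integer(70, 1)) == ("xsd:long", 71)
```

The range check is the reason the cast is explicit: the sum of two xsd:long values can fall outside xsd:long, whereas it always fits in xs:integer (up to the implementation limit).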

The text in the latest spec 3.1 is:
https://www.w3.org/TR/xpath-functions/#op.numeric

I'd be interested in knowing what other systems do here.

    Andy


Thanks in advance!

Zalan
