Question about basic vs extended language ranges

Håvard Ottestad Sat, 12 Sep 2020 03:56:10 -0700

Hi,

I’ve been trying to get basic language ranges working for the SHACL engine in 
RDF4J and I’ve stumbled upon some differences between how RDF4J and Jena 
implement basic language ranges.


The SPARQL spec points to: https://www.ietf.org/rfc/rfc4647.txt 
Specifically, these sections:
 - 2.1. Basic Language Range
 - 3.3.1. Basic Filtering

Looking at the ABNF in section 2.1:

   language-range   = (1*8ALPHA *("-" 1*8alphanum)) / "*"
   alphanum         = ALPHA / DIGIT

So “*”, “en” and “en-gb” are all legal (and so is “a-ab-abc-12345678-a”), but
“*-gb” and “en-*” are not: the wildcard can only appear on its own, as the whole range.
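As a quick sanity check of that reading, here is a minimal Java sketch that translates the 2.1 ABNF directly into a regex (assuming the RFC 5234 core rules ALPHA = A-Z / a-z and DIGIT = 0-9). The class and method names are made up for illustration; this is not RDF4J or Jena code.

```java
import java.util.regex.Pattern;

final class BasicLanguageRange {
    // language-range = (1*8ALPHA *("-" 1*8alphanum)) / "*"
    // alphanum       = ALPHA / DIGIT
    private static final Pattern BASIC_RANGE =
            Pattern.compile("\\*|[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*");

    // Hypothetical helper: true if the whole string is a basic language range.
    static boolean isBasicLanguageRange(String range) {
        return BASIC_RANGE.matcher(range).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"*", "en", "en-gb", "a-ab-abc-12345678-a", "*-gb", "en-*"};
        for (String s : samples) {
            System.out.println(s + " -> " + isBasicLanguageRange(s));
        }
    }
}
```

With this grammar the first four samples are accepted and “*-gb” and “en-*” are rejected, because “*” is only an alternative for the entire range, never for a single subtag.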

If I read 3.3.1 correctly, the range “en” would match both the tag “en” and the tag “en-gb”.
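A minimal Java sketch of Basic Filtering as I understand 3.3.1: a case-insensitive match where the range either equals the tag or is a prefix of it that ends right before a “-”. One assumption beyond the RFC: the “*” case follows SPARQL's langMatches, where “*” matches any non-empty tag. Names are hypothetical.

```java
import java.util.Locale;

final class BasicFiltering {
    // Basic Filtering (RFC 4647, section 3.3.1), case-insensitive.
    static boolean basicMatch(String tag, String range) {
        if (range.equals("*")) {
            // SPARQL langMatches: "*" matches any non-empty language tag
            return !tag.isEmpty();
        }
        String t = tag.toLowerCase(Locale.ROOT);
        String r = range.toLowerCase(Locale.ROOT);
        // Exact match, or the range is a prefix that ends on a subtag boundary
        return t.equals(r) || t.startsWith(r + "-");
    }

    public static void main(String[] args) {
        System.out.println(basicMatch("en-gb", "en"));
        System.out.println(basicMatch("en", "en"));
        System.out.println(basicMatch("eng", "en"));
    }
}
```

The boundary check is what keeps “en” from matching “eng”: only a full subtag prefix counts.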

I took a deep dive into the langMatch code in Jena, and it seems to support “*”
at any position in the range.

Is Jena implementing part of the extended range specification, or am I missing
something? (I have been missing a lot of things lately :P so I wouldn’t be
surprised.)

Cheers,
Håvard



PS: From 2.2.  Extended Language Range

   extended-language-range = (1*8ALPHA / "*") *("-" (1*8alphanum / "*"))
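For contrast, here is a rough Java sketch of Extended Filtering (RFC 4647, section 3.3.2), which is where “*” may appear in any position. It follows the numbered steps of that section as I read them; the names are made up and it is not taken from Jena's implementation.

```java
final class ExtendedFiltering {
    // Extended Filtering (RFC 4647, section 3.3.2).
    static boolean extendedMatch(String tag, String range) {
        // Step 1: split both on "-"
        String[] r = range.split("-");
        String[] t = tag.split("-");
        // Step 2: first subtags must match (case-insensitive, or range is "*")
        if (!r[0].equals("*") && !r[0].equalsIgnoreCase(t[0])) {
            return false;
        }
        int ri = 1;
        int ti = 1;
        // Step 3: walk the remaining range subtags
        while (ri < r.length) {
            if (r[ri].equals("*")) {                 // 3A: wildcard, skip it
                ri++;
            } else if (ti >= t.length) {             // 3B: tag exhausted
                return false;
            } else if (r[ri].equalsIgnoreCase(t[ti])) { // 3C: subtags match
                ri++;
                ti++;
            } else if (t[ti].length() == 1) {        // 3D: singleton, fail
                return false;
            } else {                                 // 3E: skip tag subtag
                ti++;
            }
        }
        // Step 4: range exhausted, match succeeds
        return true;
    }

    public static void main(String[] args) {
        System.out.println(extendedMatch("de-Latn-DE", "de-*-DE"));
        System.out.println(extendedMatch("en-gb", "*-gb"));
        System.out.println(extendedMatch("de-x-DE", "de-*-DE"));
    }
}
```

Under this algorithm “*-gb” matches “en-gb” and “en-*” matches both “en” and “en-gb”, which looks like the behaviour described above for Jena.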
