Well, the RDF specifications fundamentally define an RDF literal as consisting 
of, amongst other things:

  *   a lexical form, being a Unicode 
[UNICODE<https://www.w3.org/TR/rdf11-concepts/#bib-UNICODE>] string, which 
should be in Normal Form C [NFC<https://www.w3.org/TR/rdf11-concepts/#bib-NFC>],
https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal

So, you are effectively forced to use some sort of string-based encoding of the 
binary data to represent any literal, even when the underlying datatype is truly 
binary data.
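As a rough illustration in plain Java (JDK 17+, no Jena involved), this is what that string encoding looks like for the two binary datatypes XSD already offers, xsd:hexBinary and xsd:base64Binary. The class and method names are made up for this sketch; the point is only that the bytes must become a lexical form string before they can sit in a literal.

```java
import java.util.Base64;
import java.util.HexFormat;

/**
 * Sketch: converting raw bytes to and from the lexical forms used by
 * xsd:hexBinary and xsd:base64Binary literals. Names are hypothetical.
 */
final class BinaryLexicalForms {
    // Upper-case hex digits, matching the XSD canonical form for hexBinary
    private static final HexFormat HEX = HexFormat.of().withUpperCase();

    /** bytes -> xsd:hexBinary lexical form (2 characters per byte). */
    static String toHexBinary(byte[] data) {
        return HEX.formatHex(data);
    }

    /** xsd:hexBinary lexical form -> bytes (parsing ignores case). */
    static byte[] fromHexBinary(String lexical) {
        return HEX.parseHex(lexical);
    }

    /** bytes -> xsd:base64Binary lexical form (4 characters per 3 bytes, padded). */
    static String toBase64Binary(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** xsd:base64Binary lexical form -> bytes. */
    static byte[] fromBase64Binary(String lexical) {
        return Base64.getDecoder().decode(lexical);
    }

    private BinaryLexicalForms() {}
}
```

For example, the four bytes `DE AD BE EF` become `"DEADBEEF"` as hexBinary and `"3q2+7w=="` as base64Binary. Hex doubles the size of the data, while Base64 adds roughly a third, which is the overhead any binary-as-text literal carries.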

Now in principle you could define a custom implementation of the LiteralLabel 
interface that stores the value as true binary, i.e. byte[], and only 
materialises it into a string encoding when absolutely necessary.  This could 
then be used to create instances via NodeFactory.create(LiteralLabel).
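To make the "store bytes, materialise the string lazily" idea concrete, here is a hypothetical sketch in plain Java (JDK 17+). `LazyBinaryValue` is an invented name and it does NOT implement Jena's LiteralLabel; a real implementation would have to satisfy that interface's full contract. It only shows the caching pattern: the bytes are kept as-is, and the hex lexical form is computed the first time something asks for it.

```java
import java.util.Arrays;
import java.util.HexFormat;

/**
 * Hypothetical holder for a binary literal value. Keeps raw bytes and only
 * produces a string lexical form (xsd:hexBinary style) on first request.
 * Not a Jena LiteralLabel implementation.
 */
final class LazyBinaryValue {
    private final byte[] bytes;
    private String lexicalForm;       // null until first requested, then cached
    private int materialisations = 0; // how many times the string was built (for illustration)

    LazyBinaryValue(byte[] bytes) {
        this.bytes = bytes.clone(); // defensive copy so callers can't mutate our state
    }

    /** Binary access path: returns a copy of the bytes, no string encoding involved. */
    byte[] bytes() {
        return bytes.clone();
    }

    /** Lexical form, built at most once and only when actually needed. */
    synchronized String lexicalForm() {
        if (lexicalForm == null) {
            lexicalForm = HexFormat.of().withUpperCase().formatHex(bytes);
            materialisations++;
        }
        return lexicalForm;
    }

    synchronized int materialisations() {
        return materialisations;
    }

    // Value equality is defined on the bytes, not on any string encoding
    @Override
    public boolean equals(Object o) {
        return o instanceof LazyBinaryValue other && Arrays.equals(bytes, other.bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }
}
```

Code that only ever calls `bytes()` never pays for the string encoding. As soon as the value is written out through a serialiser, though, `lexicalForm()` has to be called anyway.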

However, data coming into and out of the system is generally going to be via an 
RDF serialisation, which again will require string encoding or decoding as 
appropriate.  And the parsers don’t really care about datatypes, so they would 
create ordinary literals and your custom implementation wouldn’t get used.  
Thus, whether a custom LiteralLabel would actually gain you anything would 
depend on how the data is coming into the system and how you consume it.  If the 
data is coming in via some programmatic means that isn’t parsing serialised RDF 
then maybe, but I don’t think it would gain you much.

For spatial indexing, the general approach of a GeoSPARQL implementation is to 
build the spatial index up-front, so you’d only pay the cost of the string to 
binary decoding once, when the index is first built from the RDF data.  The 
spatial index is going to convert the incoming geo-data into its own internal 
index structures, which are efficient to access, at which point whether the 
binary data was originally string encoded is irrelevant.
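A minimal sketch of that "decode once at build time" pattern, in plain Java (JDK 17+). This is not how any particular GeoSPARQL implementation works; it assumes hex-encoded WKB literals containing only 2D Points, and uses a linear scan as a stand-in for a real spatial structure such as an R-tree. `PointIndex` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.HexFormat;
import java.util.List;

/**
 * Hypothetical index: WKB hex text is decoded once in add(); queries
 * only touch the decoded doubles. Handles 2D WKB Points only.
 */
final class PointIndex {
    record Entry(String id, double x, double y) {}

    private final List<Entry> entries = new ArrayList<>();

    /** Build-time cost, paid once per geometry: hex text -> bytes -> coordinates. */
    void add(String id, String wkbHex) {
        ByteBuffer buf = ByteBuffer.wrap(HexFormat.of().parseHex(wkbHex));
        // WKB byte order flag: 0 = big-endian (XDR), 1 = little-endian (NDR)
        byte orderFlag = buf.get();
        if (orderFlag != 0 && orderFlag != 1) {
            throw new IllegalArgumentException("bad WKB byte order flag: " + orderFlag);
        }
        buf.order(orderFlag == 1 ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN);
        // Geometry type code; 1 is a 2D Point
        int geometryType = buf.getInt();
        if (geometryType != 1) {
            throw new IllegalArgumentException("only 2D WKB Point (type 1) is handled here");
        }
        double x = buf.getDouble();
        double y = buf.getDouble();
        entries.add(new Entry(id, x, y));
    }

    /** Query time: plain comparisons on stored doubles, no string decoding. */
    List<String> within(double minX, double minY, double maxX, double maxY) {
        List<String> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (e.x() >= minX && e.x() <= maxX && e.y() >= minY && e.y() <= maxY) {
                hits.add(e.id());
            }
        }
        return hits;
    }
}
```

For example, adding `"0101000000000000000000F03F0000000000000040"` (a little-endian WKB `POINT(1 2)`) decodes the hex once; every later `within(...)` call works on the two stored doubles. Whether the literal was hex, Base64 or some hypothetical true-binary form only affects the cost of `add`.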

Regards,

Rob Vesse

From: Nicholas Car <n...@kurrawong.net>
Date: Wednesday, 3 May 2023 at 23:22
To: users@jena.apache.org <users@jena.apache.org>
Subject: Re: Binary literals
I see Base64 is an XSD option too, but I’m most interested in “true” binary, as 
opposed to binary-as-text options, and whether any exist!

Nick

On Thu, May 4, 2023 at 8:13 am, Nicholas Car <n...@kurrawong.net> wrote:

> Dear Jena users,
>
> How can I store binary literals in RDF and in Jena/Fuseki?
>
> There is xsd:hexBinary for arbitrary binary data but is there a better/more 
> efficient/another way to store binary literals in Jena?
>
> The reason I ask is that a future version of GeoSPARQL might want to include 
> WKB - Well-Known Binary - as a geometry format option. We would hope this can 
> be efficiently accessed by a spatial index so we want to know how to handle 
> perhaps a custom data type, perhaps geo:wkbLiteral, and how best to store 
> this in Jena, perhaps not as hex text.
>
> Thanks, Nick
