If it's ^^geo:wkbLiteral, then the lexical form isn't restricted to
xsd:hexBinary or xsd:base64Binary encodings.
It can be anything whose lexical form is a string of Unicode characters (that
is the restriction that rules out raw binary). The datatype defines how to go
from lexical form to value.
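As a rough illustration of a datatype defining its own lexical-to-value mapping, here is a minimal Python sketch for a hypothetical hex-encoded WKB literal. The function name `wkb_point_from_lexical` is invented for this example, and only 2D WKB Points (geometry type 1) are handled; a real geo:wkbLiteral mapping would have to cover every WKB geometry type.

```python
import struct


def wkb_point_from_lexical(lexical: str) -> tuple[float, float]:
    """Hypothetical mapping: hex lexical form -> (x, y) of a 2D WKB Point."""
    data = bytes.fromhex(lexical)  # lexical form (text) -> raw WKB bytes
    if len(data) != 21:
        raise ValueError("expected 21 bytes for a 2D WKB Point")
    # Byte 0 is the byte-order flag: 0 = big-endian (XDR), 1 = little-endian (NDR)
    if data[0] == 0:
        order = ">"
    elif data[0] == 1:
        order = "<"
    else:
        raise ValueError("invalid WKB byte-order flag")
    (geom_type,) = struct.unpack(order + "I", data[1:5])
    if geom_type != 1:
        raise ValueError(f"unsupported WKB geometry type {geom_type}")
    x, y = struct.unpack(order + "dd", data[5:21])
    return x, y
```

For example, the little-endian hex string `0101000000000000000000F03F0000000000000040` maps to the point (1.0, 2.0).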
Space efficiency isn't the only factor: whether there are visually
similar characters might be important, or whether a few characters of
checksum (e.g. an ISBN check digit) are useful.
Base85 (RFC 1924) is denser; Base58 avoids ambiguous-looking characters and
avoids + (an HTML form-encoding issue).
Or a custom one, if you want some characters to add structure to the value.
Base64 is common and quite dense (75%: 3 bytes per 4 characters).
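A rough Python comparison of the encodings mentioned above. Hex, Base64 and Base85 come from the standard library (Python's `base64.b85encode` is the git-style variant, which uses the same 85-character alphabet as RFC 1924); Base58 is a small hand-written version using the Bitcoin alphabet. The function names are illustrative only, not part of any Jena or GeoSPARQL API.

```python
import base64

# Bitcoin-style Base58 alphabet: omits 0, O, I and l, and has no '+' or '/'
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    digits = []
    while n > 0:
        n, rem = divmod(n, 58)
        digits.append(B58_ALPHABET[rem])
    # Leading zero bytes carry no numeric value, so each one becomes a leading '1'
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * zeros + "".join(reversed(digits))


def base58_decode(text: str) -> bytes:
    n = 0
    for ch in text:
        n = n * 58 + B58_ALPHABET.index(ch)  # ValueError for characters outside the alphabet
    ones = len(text) - len(text.lstrip("1"))
    return b"\x00" * ones + n.to_bytes((n.bit_length() + 7) // 8, "big")


def encoded_lengths(data: bytes) -> dict:
    """Number of characters needed to carry `data` as text under each encoding."""
    return {
        "hex": len(data.hex()),                 # 2 chars per byte: 50% dense
        "base64": len(base64.b64encode(data)),  # 4 chars per 3 bytes, padded: ~75%
        "base85": len(base64.b85encode(data)),  # 5 chars per 4 bytes: ~80%
        "base58": len(base58_encode(data)),     # ~73%, no look-alike characters
    }
```

For a 21-byte 2D WKB Point this gives 42 characters of hex, 28 of Base64 and 27 of Base85; Base58 trades a little density for an alphabet without look-alike characters.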
Andy
https://en.wikipedia.org/wiki/Binary-to-text_encoding
On 04/05/2023 13:41, Rob @ DNR wrote:
My 2 cents: Base64 might be preferable to hex encoding since it is inherently
more compact (4 characters per 3 bytes, versus 2 characters per byte).
Rob
From: Nicholas Car
Date: Thursday, 4 May 2023 at 10:58
Subject: Re: Binary literals
Hi Rob,
Thanks for this: it is pretty much as I thought!
I think we will then be able to cater for WKB in GeoSPARQL 1.3 with just hex
encoding of the value and ^^geo:wkbLiteral; then, as you say, implementers
like Jena-geosparql can read the hex into their spatial indexes once.
I see little value in this other than meeting an allowed data type in the
Simple Features standard; then again, I see little value in KML and other
existing allowed formats too!
Cheers, Nick
------- Original Message -------
On Thursday, May 4th, 2023 at 18:30, Rob @ DNR wrote:
Well, the RDF specifications fundamentally define RDF literals to be the
following:
* a lexical form, being a Unicode [UNICODE] string, which should be in
  Normal Form C [NFC]
[UNICODE] https://www.w3.org/TR/rdf11-concepts/#bib-UNICODE
[NFC] https://www.w3.org/TR/rdf11-concepts/#bib-NFC
https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal
So, you are effectively forced to use some sort of string-based encoding of the
binary data to represent any literal, even when the underlying datatype is truly
binary data.
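To make that concrete, here is a small Python sketch that writes raw bytes as an N-Triples typed literal using the two XSD binary datatypes, then reads them back. The helper names `to_ntriples_literal` and `from_lexical` are hypothetical; the datatype IRIs are the standard XSD ones. Both encodings produce plain ASCII text, which is always in Normal Form C.

```python
import base64
import unicodedata

XSD = "http://www.w3.org/2001/XMLSchema#"


def to_ntriples_literal(data: bytes, encoding: str = "hex") -> str:
    """Render raw bytes as an N-Triples typed literal; the lexical form must be text."""
    if encoding == "hex":
        # Upper-case digits are the canonical lexical form of xsd:hexBinary
        lexical, datatype = data.hex().upper(), XSD + "hexBinary"
    elif encoding == "base64":
        lexical, datatype = base64.b64encode(data).decode("ascii"), XSD + "base64Binary"
    else:
        raise ValueError(f"unknown encoding {encoding!r}")
    # Sanity check: RDF 1.1 says lexical forms should be NFC; ASCII text always is
    assert unicodedata.is_normalized("NFC", lexical)
    return f'"{lexical}"^^<{datatype}>'


def from_lexical(lexical: str, datatype: str) -> bytes:
    """Map a lexical form back to the bytes it encodes."""
    if datatype == XSD + "hexBinary":
        return bytes.fromhex(lexical)
    if datatype == XSD + "base64Binary":
        return base64.b64decode(lexical, validate=True)
    raise ValueError(f"unsupported datatype {datatype}")
```

Whichever datatype is chosen, what travels through a serialisation is the quoted string, never the raw bytes.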
Now in principle you could define a custom implementation of the LiteralLabel
interface that stores the value as true binary, i.e. byte[], and only
materialises it into a string encoding when absolutely necessary. This could
then be used to create instances via NodeFactory.create(LiteralLabel).
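Jena's LiteralLabel is a Java interface, so the following is only a Python analogue of the idea, not Jena's API. The hypothetical class `BinaryLiteral` keeps the value as bytes and builds the hex lexical form lazily, counting how often it actually does so.

```python
from functools import cached_property


class BinaryLiteral:
    """Hypothetical analogue of a binary-backed literal (not Jena's LiteralLabel)."""

    def __init__(self, value: bytes, datatype: str):
        self.value = bytes(value)   # the "true binary" value
        self.datatype = datatype
        self.materialisations = 0   # how many times the string form was built

    @cached_property
    def lexical_form(self) -> str:
        # Built on first access only, then cached on the instance
        self.materialisations += 1
        return self.value.hex().upper()  # canonical xsd:hexBinary uses upper case

    @classmethod
    def from_lexical(cls, lexical: str, datatype: str) -> "BinaryLiteral":
        # The parser path: the data already arrives as a string
        return cls(bytes.fromhex(lexical), datatype)
```

Note that `from_lexical` still starts from a string: any data arriving as serialised RDF has already paid the text cost, so the lazy form only saves work when bytes arrive programmatically.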
However, data into and out of the system is generally going to be via an RDF
serialisation, which again will require string encoding or decoding as
appropriate. And the parsers don’t really care about datatypes, so your custom
implementation wouldn’t get used. Thus, whether a custom LiteralLabel would
actually gain you anything would depend on how the data is coming into the
system and how you consume it. If the data is coming in via some programmatic
means other than parsing serialised RDF, then maybe, but I don’t think it would
gain you much.
For spatial indexing, the general approach of a GeoSPARQL implementation is to
build the spatial index up-front, so you’d only pay the cost of the
string-to-binary decoding once, when the index is first built from the RDF data.
The spatial index is going to convert the incoming geo-data into its own internal
index structures that will be very efficient to access, at which point whether
the binary data was originally string-encoded is irrelevant.
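A minimal Python sketch of that build-once pattern, under deliberately simplified assumptions: every geometry is a hex-encoded 2D WKB Point, and a uniform grid (the hypothetical `GridIndex`) stands in for the R-tree or similar structure a real implementation would more likely use. Each lexical form is decoded exactly once, inside `build_index`; queries then only touch the in-memory cells.

```python
import math
import struct
from collections import defaultdict


def decode_point(lexical: str) -> tuple[float, float]:
    """Decode a hex-encoded 2D WKB Point (an assumption of this sketch)."""
    data = bytes.fromhex(lexical)
    order = "<" if data[0] == 1 else ">"  # byte-order flag: 1 = little-endian
    geom_type, x, y = struct.unpack(order + "Idd", data[1:21])
    if geom_type != 1:
        raise ValueError(f"not a 2D Point: geometry type {geom_type}")
    return x, y


class GridIndex:
    """Uniform grid of square cells; a simple stand-in for an R-tree."""

    def __init__(self, cell_size: float = 1.0):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (col, row) -> [(item_id, x, y), ...]

    def _cell(self, x: float, y: float) -> tuple[int, int]:
        return math.floor(x / self.cell_size), math.floor(y / self.cell_size)

    def insert(self, item_id: str, x: float, y: float) -> None:
        self.cells[self._cell(x, y)].append((item_id, x, y))

    def query(self, min_x: float, min_y: float, max_x: float, max_y: float) -> list[str]:
        """Ids of points inside the closed box [min_x, max_x] x [min_y, max_y]."""
        c0, r0 = self._cell(min_x, min_y)
        c1, r1 = self._cell(max_x, max_y)
        hits = []
        for col in range(c0, c1 + 1):
            for row in range(r0, r1 + 1):
                for item_id, x, y in self.cells.get((col, row), []):
                    if min_x <= x <= max_x and min_y <= y <= max_y:
                        hits.append(item_id)
        return sorted(hits)


def build_index(literals: dict[str, str], cell_size: float = 1.0) -> GridIndex:
    """Decode each lexical form exactly once while building the index."""
    index = GridIndex(cell_size)
    for item_id, lexical in literals.items():
        index.insert(item_id, *decode_point(lexical))
    return index
```

After `build_index` returns, the hex strings are no longer needed to answer queries, which is why the original text encoding stops mattering.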
Regards,
Rob Vesse
From: Nicholas Car
Date: Wednesday, 3 May 2023 at 23:22
Subject: Re: Binary literals
I see Base64 is an XSD option too, but I’m most interested in “true” binary, as
opposed to binary-as-text options, and whether any exist!
Nick
On Thu, May 4, 2023 at 8:13 am, Nicholas Car wrote:
Dear Jena users,
How can I store binary literals in RDF and in Jena/Fuseki?
There is xsd:hexBinary for arbitrary binary data but is there a better/more
efficient/another way to store binary literals in Jena?
The reason I ask is that a future version of GeoSPARQL might want to include
WKB (Well-Known Binary) as a geometry format option. We would hope this can
be accessed efficiently by a spatial index, so we want to know how to handle
a custom datatype, perhaps geo:wkbLiteral, and how best to store it
in Jena, perhaps not as hex text.
Thanks, Nick