My 2 cents: Base64 might be preferable to hex encoding since it is inherently more compact (4 characters per 3 bytes, versus 2 characters per byte for hex).
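To put rough numbers on the compactness point, here is a minimal sketch using only the Python standard library. The helper `wkb_point` and the coordinates are made up for illustration; it builds a little-endian WKB point by hand and is not a Jena or GeoSPARQL API.

```python
import base64
import struct


def wkb_point(x: float, y: float) -> bytes:
    """Hypothetical helper: encode a 2D point as little-endian WKB.

    Layout: 1 byte order flag (1 = little-endian), a 4-byte geometry
    type (1 = Point), then x and y as 8-byte doubles.
    """
    return struct.pack("<BIdd", 1, 1, x, y)


wkb = wkb_point(153.02, -27.47)                   # 21 bytes of raw WKB
hex_form = wkb.hex().upper()                      # xsd:hexBinary-style lexical form
b64_form = base64.b64encode(wkb).decode("ascii")  # xsd:base64Binary-style lexical form

print(len(wkb), len(hex_form), len(b64_form))     # prints: 21 42 28
```

Hex always doubles the byte count, while Base64 adds roughly a third (plus up to two padding characters), so the gap grows with the size of the geometry.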
Rob

From: Nicholas Car <n...@kurrawong.net>
Date: Thursday, 4 May 2023 at 10:58
To: users@jena.apache.org <users@jena.apache.org>
Subject: Re: Binary literals

Hi Rob,

Thanks for this: it is pretty much as I thought!

I think we will be able to cater for WKB then in GeoSPARQL 1.3 with just hex encoding of the value and ^^geo:wkbLiteral and then, as you say, implementers, like Jena-geosparql, can just read the hex into their spatial indexes one-time.

I see little value in this other than meeting an allowed data type in the Simple Features standard, then again, I see little value in KML and other existing, allowed, formats too!

Cheers,

Nick

------- Original Message -------
On Thursday, May 4th, 2023 at 18:30, Rob @ DNR <rve...@dotnetrdf.org> wrote:

> Well, the RDF specifications fundamentally define RDF literals to be the
> following:
>
> * a lexical form, being a Unicode [UNICODE] string, which should be in
>   Normal Form C [NFC],
>
>   [UNICODE] https://www.w3.org/TR/rdf11-concepts/#bib-UNICODE
>   [NFC] https://www.w3.org/TR/rdf11-concepts/#bib-NFC
>
> https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal
>
> So, you are effectively forced to use some sort of string-based encoding of
> the binary data to represent any literal, whether that underlying datatype is
> truly binary data.
>
> Now in principle you could define a custom implementation of the LiteralLabel
> interface that stores the value as true binary, i.e. byte[], and only
> materialises it into a string encoding when absolutely necessary. This could
> then be used to create instances via NodeFactory.create(LiteralLabel).
>
> However, data into and out of the system is generally going to be via a RDF
> serialisation, which again will require string encoding or decoding as
> appropriate. And the parsers don’t really care about datatypes so your custom
> implementation wouldn’t get used. Thus, whether a custom LiteralLabel would
> actually gain you anything would depend on how the data is coming into the
> system and how you consume it. If the data is coming in via some programmatic
> means that isn’t parsing serialised RDF then maybe but I don’t think it would
> gain you much.
>
> For spatial indexing generally the approach of a GeoSPARQL implementation is
> to build the spatial index up-front so you’d only pay the cost of the string
> to binary decoding once when the index was first built from the RDF data. The
> spatial index is going to convert the incoming geo-data into its own internal
> index structures that will be very efficient to access, at which point
> whether the binary data was originally string encoded is irrelevant.
>
> Regards,
>
> Rob Vesse
>
> From: Nicholas Car n...@kurrawong.net
> Date: Wednesday, 3 May 2023 at 23:22
> To: users@jena.apache.org users@jena.apache.org
> Subject: Re: Binary literals
>
> I see Base64 is an XSD option too, but I’m most interested in “true” binary,
> as opposed to binary-as-text options, and whether any exist!
>
> Nick
>
> On Thu, May 4, 2023 at 8:13 am, Nicholas Car <n...@kurrawong.net> wrote:
>
> > Dear Jena users,
> >
> > How can I store binary literals in RDF and in Jena/Fuseki?
> >
> > There is xsd:hexBinary for arbitrary binary data but is there a better/more
> > efficient/another way to store binary literals in Jena?
> >
> > The reason I ask is that a future version of GeoSPARQL might want to
> > include WKB - Well-Known Binary - as a geometry format option. We would
> > hope this can be efficiently accessed by a spatial index so we want to know
> > how to handle perhaps a custom data type, perhaps geo:wkbLiteral, and how
> > best to store this in Jena, perhaps not as hex text.
> >
> > Thanks, Nick
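
The approach the thread converges on (hex text as the lexical form, decoded to binary once when the spatial index is built) can be sketched roughly as follows. This is an illustration in plain Python, not Jena or jena-geosparql code: `parse_wkb_point_hex`, `build_index`, and the `ex:` subjects are hypothetical names, only WKB points are handled, and a real index would use a proper spatial structure rather than a dict.

```python
import struct


def parse_wkb_point_hex(lexical: str) -> tuple[float, float]:
    """Decode the hex lexical form of a WKB point into (x, y).

    Hypothetical helper: assumes a plain 2D Point of exactly 21 bytes.
    """
    raw = bytes.fromhex(lexical)
    if len(raw) != 21:
        raise ValueError("expected a 21-byte WKB point")
    if raw[0] not in (0, 1):
        raise ValueError("invalid WKB byte-order flag")
    endian = "<" if raw[0] == 1 else ">"  # 1 = little-endian, 0 = big-endian
    (geom_type,) = struct.unpack(endian + "I", raw[1:5])
    if geom_type != 1:
        raise ValueError("only WKB Point (type 1) is handled in this sketch")
    x, y = struct.unpack(endian + "dd", raw[5:21])
    return x, y


def build_index(literals: dict[str, str]) -> dict[str, tuple[float, float]]:
    """Pay the hex-to-binary decoding cost once, up front, for every literal."""
    return {subject: parse_wkb_point_hex(lex) for subject, lex in literals.items()}


# Made-up data: subject -> hex lexical form, as it might arrive from parsed RDF.
literals = {
    "ex:a": struct.pack("<BIdd", 1, 1, 1.0, 2.0).hex(),  # little-endian point
    "ex:b": struct.pack(">BIdd", 0, 1, 3.5, -4.5).hex(),  # big-endian point
}
index = build_index(literals)
print(index["ex:a"], index["ex:b"])  # prints: (1.0, 2.0) (3.5, -4.5)
```

After `build_index` has run, lookups work on the decoded coordinates only, so whether the literal was originally hex or Base64 text no longer affects query-time cost.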