On Thu, 11 Jul 2024 at 10:10, Holger Knublauch <hol...@topquadrant.com>
wrote:

> Yes, using shapes to describe the closure is one option. I wonder if
> anyone has a similar algorithm that takes a JSON-LD frame and generates
> SPARQL queries for the triples that are visited by the frame, i.e. directly
> working on the frame JSON only?
>

We have something that generates SPARQL queries, not based on the frame but
based on the shapes [1].
We have also thought about deriving the frame + the JSON schema from the
shapes. The additional information needed in the shapes for that is: 1/ which
NodeShapes are the roots, and 2/ what the hierarchical embedding between
shapes is.

Thomas

[1] : https://shacl-play.sparna.fr/play/sparql#documentation-1
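As a rough illustration of the shapes-to-SPARQL idea (this is a hand-rolled sketch, not the actual SHACL-Play algorithm; the class and property IRIs are invented for the example): from a root class and the property IRIs its PropertyShapes declare, one can build a CONSTRUCT query that copies exactly those triples.

```python
# Hypothetical sketch: derive a SPARQL CONSTRUCT from shape-like input.
# root_class stands in for a NodeShape's sh:targetClass; properties stands
# in for the sh:path values of its PropertyShapes.
def shapes_to_construct(root_class, properties):
    construct = "\n".join(
        f"  ?s <{p}> ?o{i} ." for i, p in enumerate(properties))
    where = "".join(
        f"  OPTIONAL {{ ?s <{p}> ?o{i} . }}\n"
        for i, p in enumerate(properties))
    return (
        "CONSTRUCT {\n" + construct + "\n}\nWHERE {\n"
        f"  ?s a <{root_class}> .\n" + where + "}"
    )

query = shapes_to_construct(
    "http://example.org/Person",
    ["http://xmlns.com/foaf/0.1/name", "http://xmlns.com/foaf/0.1/knows"],
)
```

Each property becomes an OPTIONAL pattern so that instances missing a value are still extracted; the real generator would also have to handle nested shapes, sh:node links and property paths.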


>
> Holger
>
>
> > On 11 Jul 2024, at 7:18 AM, Nicholas Car <n...@kurrawong.net> wrote:
> >
> > Hi Holger and all,
> >
> > We do something similar to what I think you want done here with RDFrame,
> a prototype tool we use so our APIs can extract triples from a store
> according to a (SHACL or CQL) frame:
> >
> > https://rdframe.dev.kurrawong.ai/
> >
> > I suspect what we are doing is early days/simple stuff compared to what
> you need but the principle of frame -> SPARQL seems relevant.
> >
> > Cheers, Nick
> >
> >
> >
> >
> > On Thursday, 11 July 2024 at 15:03, Holger Knublauch <
> hol...@topquadrant.com> wrote:
> >
> >> Hi Andy,
> >>
> >> thanks for your response. To clarify, it would be a scenario such as a
> TDB with 1 million triples and the request is to produce a JSON-LD document
> from the "closure" around a given resource (in TopBraid's Source Code panel
> when the user navigates to a resource or through API calls). In other
> words: input is a Jena Graph, a start node and a JSON-LD frame document,
> and the output should be a JSON-LD document describing the node and all
> reachable triples described by the frame.
> >>
> >> So it sounds like Titanium cannot really be used for this as its
> algorithms can only operate on their own in-memory copy of a graph, and we
> cannot copy all 1 million triples into memory each time.
> >>
> >> Holger
> >>
> >>> On 10 Jul 2024, at 5:53 PM, Andy Seaborne a...@apache.org wrote:
> >>>
> >>> Hi Holger,
> >>>
> >>> How big is the database?
> >>> What sort of framing are you aiming to do?
> >>> Using framing to select a subset of a large database doesn't feel like
> the right way to extract triples, as you've discovered. Framing can touch
> anywhere in the JSON document.
> >>>
> >>> This recent thread is relevant --
> >>> https://lists.apache.org/thread/3mrcyf1ccry78rkxxb6vqsm4okfffzfl
> >>>
> >>> That JSON-LD file is 280 million triples.
> >>>
> >>> Its structure is
> >>>
> >>> [{"@context": <url> , ... }
> >>> ,{"@context": <url> , ... }
> >>> ,{"@context": <url> , ... }
> >>> ...
> >>> ,{"@context": <url> , ... }
> >>> ]
> >>>
> >>> 9 million array entries.
> >>>
> >>> It looks to me like it has been produced by text manipulation: taking
> each entity, writing a separate, self-contained JSON-LD object, then, by
> text, concatenating them into one big array. That, or a tool designed
> specially to write large JSON-LD (e.g. the outer array).
> >>>
> >>> It's the same context URL every time, which would amount to a
> denial-of-service attack on the server hosting it, except that Titanium
> reads the whole file as JSON first and runs out of space.
> >>>
> >>> The JSON-LD algorithms do assume the whole document is available.
> Titanium is a faithful implementation of the spec.
> >>>
> >>> It is hard to work with.
> >>>
> >>> In JSON the whole object needs to be seen - repeated member names (de
> facto, the last duplicate wins) and "@context" appearing at the end are
> both possible, cases that don't occur in XML. Streaming JSON or JSON-LD is
> going to have to relax that strictness somehow.
> >>>
> >>> JSON-LD is designed around the assumption of small/medium sized data.
> >>>
> >>> And this affects writing: that large file looks like it was written
> specially, or at least with a tool designed to write large JSON-LD (e.g.
> the outer array).
> >>>
> >>> Jena could do with some RDFFormats + writers for JSON-LD at scale. One
> obvious one extends WriterStreamRDFBatched, where a batch is a subject and
> its immediate triples, writing similarly to the case above except with one
> shared context and the array under "@graph".
> >>>
> >>>
> https://www.w3.org/TR/json-ld11/#example-163-same-description-in-json-ld-context-shared-among-node-objects
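The batching idea above can be sketched in plain Python (hypothetical and simplified; the real Jena writer would extend WriterStreamRDFBatched and use Jena's stream APIs, and `write_graph_stream` plus the example vocabulary are invented here): triples arrive grouped by subject, each batch becomes one node object, and the whole output shares a single "@context" with the node objects in a top-level "@graph" array, so memory use stays bounded by one batch.

```python
import io
import json
from itertools import groupby

def write_graph_stream(out, context, triples):
    """Stream (s, p, o) string triples, pre-grouped by subject, as one
    JSON-LD document with a single shared @context and a @graph array."""
    out.write('{"@context": ' + json.dumps(context) + ', "@graph": [')
    first = True
    for subject, batch in groupby(triples, key=lambda t: t[0]):
        node = {"@id": subject}
        for _, p, o in batch:
            node.setdefault(p, []).append(o)   # multi-valued properties
        out.write(("" if first else ",") + json.dumps(node))
        first = False
    out.write("]}")

buf = io.StringIO()
write_graph_stream(
    buf,
    {"@vocab": "http://example.org/"},
    [("urn:a", "name", "Alice"), ("urn:a", "knows", "urn:b"),
     ("urn:b", "name", "Bob")],
)
```

Nothing is buffered beyond the current subject's triples, which is what makes the context-once / @graph layout attractive for database-backed graphs.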
> >>>
> >>> That doesn't solve the reading side - a companion reader would be
> needed that stream-reads JSON.
> >>>
> >>> Contributions welcome!
> >>>
> >>> Andy
> >>>
> >>> On 10/07/2024 12:36, Holger Knublauch wrote:
> >>>
> >>>> I am working on serializing partial RDF graphs to JSON-LD using the
> Jena-Titanium bridge.
> >>>> Problem: For Titanium to "see" the triples it needs to have a
> complete copy. See JenaTitanium.convert, which copies all Jena triples into
> a corresponding RdfDataset. This cannot scale if the graph is backed by a
> database and we only want to export certain triples (esp. for framing).
> Titanium's RdfGraph does not provide an incremental function similar to
> Graph.find() but only returns a complete Java List of all triples.
> >>>> Has anyone here run into the same problem and what would be a
> solution?
> >>>> I guess one solution would be an incremental algorithm that "walks" a
> @context and JSON-LD frame document to collect all required Jena triples,
> producing a sub-graph that can then be sent to Titanium. But the complexity
> of such an algorithm approaches implementing my own JSON-LD engine, which
> feels like overkill.
> >>>> Holger
>
>

-- 

*Thomas Francart* - *SPARNA*
linked *data* | domain *ontologies* | *knowledge* graphs
blog : blog.sparna.fr, site : sparna.fr, linkedin :
fr.linkedin.com/in/thomasfrancart
tel :  +33 (0)6.71.11.25.97
