On Thu, 11 Jul 2024 at 10:10, Holger Knublauch <hol...@topquadrant.com> wrote:
> Yes, using shapes to describe the closure is one option. I wonder if
> anyone has a similar algorithm that takes a JSON-LD frame and generates
> SPARQL queries for the triples that are visited by the frame, i.e. directly
> working on the frame JSON only?

We have something that generates SPARQL queries not based on the frame, but based on the shapes [1].

We also thought about deriving the frame + the JSON schema from the shapes. The additional information needed for that in the shapes is:
1/ which NodeShapes are the roots
2/ what the hierarchical embedding between shapes is.

Thomas

[1] : https://shacl-play.sparna.fr/play/sparql#documentation-1

> Holger
>
>> On 11 Jul 2024, at 7:18 AM, Nicholas Car <n...@kurrawong.net> wrote:
>>
>> Hi Holger and all,
>>
>> We do something similar to what I think you want done here with RDFrame,
>> a prototype tool we use so our APIs can extract triples from a store
>> according to a (SHACL or CQL) frame:
>>
>> https://rdframe.dev.kurrawong.ai/
>>
>> I suspect what we are doing is early days/simple stuff compared to what
>> you need, but the principle of frame -> SPARQL seems relevant.
>>
>> Cheers, Nick
>>
>> On Thursday, 11 July 2024 at 15:03, Holger Knublauch <hol...@topquadrant.com> wrote:
>>
>>> Hi Andy,
>>>
>>> Thanks for your response. To clarify, it would be a scenario such as a
>>> TDB with 1 million triples, where the request is to produce a JSON-LD
>>> document from the "closure" around a given resource (in TopBraid's Source
>>> Code panel, when the user navigates to a resource, or through API calls).
>>> In other words: the input is a Jena Graph, a start node and a JSON-LD frame
>>> document, and the output should be a JSON-LD document describing the node
>>> and all reachable triples described by the frame.
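The frame -> SPARQL principle discussed in this thread could be sketched roughly as follows. This is an illustration only, not RDFrame's or SHACL-Play's actual algorithm: the frame is reduced to a plain nesting of property IRIs (ignoring `@context`, `@type` and the rest of the framing vocabulary), and each nesting level becomes an OPTIONAL block that also collects the immediate triples of the nested node. All class and method names here are made up.

```java
import java.util.Map;

// Rough sketch of frame -> SPARQL: a frame reduced to nested property IRIs
// (Map<String, Object> whose values are sub-frame Maps) is turned into one
// CONSTRUCT query that collects the triples the frame would visit.
public class FrameToSparqlSketch {

    private int varCount = 0;                        // fresh variable counter
    private final StringBuilder construct = new StringBuilder();
    private final StringBuilder where = new StringBuilder();

    public static String frameToConstruct(Map<String, Object> frame) {
        FrameToSparqlSketch s = new FrameToSparqlSketch();
        s.visit(frame);
        return "CONSTRUCT {\n" + s.construct + "} WHERE {\n" + s.where + "}";
    }

    @SuppressWarnings("unchecked")
    private void visit(Map<String, Object> frame) {
        String node = "?n" + varCount;
        String triple = node + " ?p" + varCount + " ?o" + varCount + " .\n";
        varCount++;
        construct.append("  ").append(triple);       // each matched node contributes
        where.append("  ").append(triple);           // its immediate triples
        for (Map.Entry<String, Object> e : frame.entrySet()) {
            // link to the nested node, then recurse into its sub-frame
            where.append("  OPTIONAL {\n    ")
                 .append(node).append(" <").append(e.getKey())
                 .append("> ?n").append(varCount).append(" .\n");
            visit((Map<String, Object>) e.getValue());
            where.append("  }\n");
        }
    }
}
```

For the made-up frame `{"http://schema.org/knows": {}}` this emits a CONSTRUCT over `?n0` plus an OPTIONAL block binding `?n1` via that property. A real version would anchor `?n0` at the given start node and handle `@type` filters, lists, cycles and named graphs.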
>>>
>>> So it sounds like Titanium cannot really be used for this, as its
>>> algorithms can only operate on their own in-memory copy of a graph, and we
>>> cannot copy all 1 million triples into memory each time.
>>>
>>> Holger
>>>
>>>> On 10 Jul 2024, at 5:53 PM, Andy Seaborne <a...@apache.org> wrote:
>>>>
>>>> Hi Holger,
>>>>
>>>> How big is the database?
>>>> What sort of framing are you aiming to do?
>>>> Using framing to select some triples from a large database doesn't feel
>>>> like the way to extract triples, as you've discovered. Framing can touch
>>>> anywhere in the JSON document.
>>>>
>>>> This recent thread is relevant --
>>>> https://lists.apache.org/thread/3mrcyf1ccry78rkxxb6vqsm4okfffzfl
>>>>
>>>> That JSON-LD file is 280 million triples.
>>>>
>>>> Its structure is
>>>>
>>>> [{"@context": <url> , ... }
>>>> ,{"@context": <url> , ... }
>>>> ,{"@context": <url> , ... }
>>>> ...
>>>> ,{"@context": <url> , ... }
>>>> ]
>>>>
>>>> with 9 million array entries.
>>>>
>>>> It looks to me like it has been produced by text manipulation: taking
>>>> each entity, writing a separate, self-contained JSON-LD object and then,
>>>> by text, concatenating them into a big array. That, or a tool that is
>>>> designed specially to write large JSON-LD, e.g. the outer array.
>>>>
>>>> Every entry has the same context URL, which would amount to a
>>>> denial-of-service attack, except that Titanium reads the whole file as
>>>> JSON and runs out of space first.
>>>>
>>>> The JSON-LD algorithms do assume the whole document is available.
>>>> Titanium is a faithful implementation of the spec.
>>>>
>>>> It is hard to work with.
>>>>
>>>> In JSON the whole object needs to be seen - repeated member names (where,
>>>> de facto, the last duplicate wins) and "@context" appearing at the end are
>>>> both possible. These cases don't occur in XML. Streaming JSON or JSON-LD
>>>> is going to have to relax the strictness somehow.
>>>>
>>>> JSON-LD is designed around the assumption of small/medium-sized data.
>>>>
>>>> And this affects writing.
>>>> That large file looks like it was specially written, or at least written
>>>> with a tool designed specially to write large JSON-LD, e.g. the outer array.
>>>>
>>>> Jena could do with some RDFFormats + writers for JSON-LD at scale. One
>>>> obvious one extends WriterStreamRDFBatched, where a batch is a subject and
>>>> its immediate triples, and writes similarly to the case above, except with
>>>> a single context and the array under "@graph".
>>>>
>>>> https://www.w3.org/TR/json-ld11/#example-163-same-description-in-json-ld-context-shared-among-node-objects
>>>>
>>>> That doesn't solve the reading side - a companion reader would be needed
>>>> that stream-reads the JSON.
>>>>
>>>> Contributions welcome!
>>>>
>>>> Andy
>>>>
>>>> On 10/07/2024 12:36, Holger Knublauch wrote:
>>>>
>>>>> I am working on serializing partial RDF graphs to JSON-LD using the
>>>>> Jena-Titanium bridge.
>>>>> Problem: for Titanium to "see" the triples, it needs to have a complete
>>>>> copy. See JenaTitanium.convert, which copies all Jena triples into a
>>>>> corresponding Titanium RdfDataset. This cannot scale if the graph is
>>>>> backed by a database and we only want to export certain triples
>>>>> (especially for framing). Titanium's RdfGraph does not provide an
>>>>> incremental function similar to Graph.find(); it only returns a complete
>>>>> Java List of all triples.
>>>>> Has anyone here run into the same problem, and what would be a solution?
>>>>> I guess one solution would be an incremental algorithm that "walks" a
>>>>> @context and JSON-LD frame document to collect all required Jena triples,
>>>>> producing a sub-graph that can then be sent to Titanium. But the
>>>>> complexity of such an algorithm is similar to having to implement my own
>>>>> JSON-LD engine, which feels like overkill.
>>>>> Holger

--
*Thomas Francart* - *SPARNA*
linked *data* | domain *ontologies* | *knowledge* graphs
blog : blog.sparna.fr, site : sparna.fr, linkedin : fr.linkedin.com/in/thomasfrancart
tel : +33 (0)6.71.11.25.97
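The batched-writer output Andy describes (a single shared "@context", one node object per subject batch under "@graph") can be sketched without any Jena or Titanium APIs. This only illustrates the target document shape, not a proposed implementation: a real writer would extend WriterStreamRDFBatched, serialize RDF terms properly, escape strings, and merge repeated predicates into arrays. All names below are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the document shape only. Terms are pre-formatted strings,
// nothing is escaped, and repeated predicates on one subject would
// produce duplicate JSON keys (a real writer would merge them).
public class StreamJsonLdSketch {

    public record Triple(String s, String p, String o) {}

    public static String write(String contextUrl, List<Triple> triples) {
        // batch triples by subject, keeping first-seen order
        Map<String, List<Triple>> bySubject = new LinkedHashMap<>();
        for (Triple t : triples) {
            bySubject.computeIfAbsent(t.s(), k -> new ArrayList<>()).add(t);
        }
        StringBuilder out = new StringBuilder();
        out.append("{\"@context\": \"").append(contextUrl).append("\", \"@graph\": [");
        boolean first = true;
        for (Map.Entry<String, List<Triple>> e : bySubject.entrySet()) {
            if (!first) out.append(", ");
            first = false;
            // one node object per subject, carrying its immediate triples
            out.append("{\"@id\": \"").append(e.getKey()).append("\"");
            for (Triple t : e.getValue()) {
                out.append(", \"").append(t.p()).append("\": \"").append(t.o()).append("\"");
            }
            out.append("}");
        }
        return out.append("]}").toString();
    }
}
```

In a true streaming setting each node object would be flushed as soon as its subject batch is complete; here everything is assembled in memory purely to show the shape. The matching streaming reader remains the open problem Andy mentions.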