Yes, using shapes to describe the closure is one option. I wonder if anyone has a similar algorithm that takes a JSON-LD frame and generates SPARQL queries for the triples visited by the frame, i.e. one working directly on the frame JSON only? A rough sketch of what I mean is below.
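
Something along these lines - an untested sketch, assuming the frame has
already been expanded against its @context (so keys are full predicate
IRIs); @type matching, @default, arrays etc. are all ignored, and the
names are made up:

import jakarta.json.JsonObject;
import jakarta.json.JsonValue;

public class FrameToSparql {

    private int varCounter = 0;
    private final StringBuilder template = new StringBuilder();

    /** One CONSTRUCT query covering the triples the frame visits. */
    public String toConstruct(JsonObject expandedFrame, String startNodeIri) {
        String where = walk(expandedFrame, "<" + startNodeIri + ">", "  ");
        return "CONSTRUCT {\n" + template + "}\nWHERE {\n" + where + "}";
    }

    private String walk(JsonObject frameNode, String subject, String indent) {
        StringBuilder where = new StringBuilder();
        for (String key : frameNode.keySet()) {
            if (key.startsWith("@"))
                continue;                        // skip @id, @embed, ...
            String obj = "?v" + (varCounter++);
            template.append("  ").append(subject).append(" <").append(key)
                    .append("> ").append(obj).append(" .\n");
            // OPTIONAL so partially matching resources still frame
            where.append(indent).append("OPTIONAL { ").append(subject)
                 .append(" <").append(key).append("> ").append(obj).append(" .\n");
            JsonValue value = frameNode.get(key);
            if (value.getValueType() == JsonValue.ValueType.OBJECT)
                // nested frame object: keep its patterns in this OPTIONAL
                where.append(walk(value.asJsonObject(), obj, indent + "  "));
            where.append(indent).append("}\n");
        }
        return where.toString();
    }
}
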
Holger

> On 11 Jul 2024, at 7:18 AM, Nicholas Car <n...@kurrawong.net> wrote:
>
> Hi Holger and all,
>
> We do something similar to what I think you want done here with RDFrame,
> a prototype tool we use so our APIs can extract triples from a store
> according to a (SHACL or CQL) frame:
>
> https://rdframe.dev.kurrawong.ai/
>
> I suspect what we are doing is early days/simple stuff compared to what
> you need, but the principle of frame -> SPARQL seems relevant.
>
> Cheers, Nick
>
>> On Thursday, 11 July 2024 at 15:03, Holger Knublauch
>> <hol...@topquadrant.com> wrote:
>>
>> Hi Andy,
>>
>> thanks for your response. To clarify, the scenario is a TDB with, say,
>> 1 million triples, and a request to produce a JSON-LD document from the
>> "closure" around a given resource (in TopBraid's Source Code panel when
>> the user navigates to a resource, or through API calls). In other words:
>> the input is a Jena Graph, a start node and a JSON-LD frame document,
>> and the output should be a JSON-LD document describing the node and all
>> reachable triples covered by the frame.
>>
>> So it sounds like Titanium cannot really be used for this, as its
>> algorithms can only operate on their own in-memory copy of a graph, and
>> we cannot copy all 1 million triples into memory each time.
>>
>> Holger
>>
>>> On 10 Jul 2024, at 5:53 PM, Andy Seaborne <a...@apache.org> wrote:
>>>
>>> Hi Holger,
>>>
>>> How big is the database?
>>> What sort of framing are you aiming to do?
>>>
>>> Using framing to select some triples from a large database doesn't feel
>>> like the way to extract them, as you've discovered: framing can touch
>>> anywhere in the JSON document.
>>>
>>> This recent thread is relevant --
>>> https://lists.apache.org/thread/3mrcyf1ccry78rkxxb6vqsm4okfffzfl
>>>
>>> That JSON-LD file is 280 million triples. Its structure is
>>>
>>> [{"@context": <url> , ... }
>>> ,{"@context": <url> , ... }
>>> ,{"@context": <url> , ... }
>>> ...
>>> ,{"@context": <url> , ... }
>>> ]
>>>
>>> i.e. 9 million array entries.
>>>
>>> It looks to me like it was produced by text manipulation: taking each
>>> entity, writing a separate, self-contained JSON-LD object, then, by
>>> text, making one big array. That, or a tool designed specially to write
>>> large JSON-LD, e.g. the outer array.
>>>
>>> It is the same context URL every time, which would amount to a
>>> denial-of-service attack, except that Titanium reads the whole file as
>>> JSON first and runs out of space.
>>>
>>> The JSON-LD algorithms do assume the whole document is available, and
>>> Titanium is a faithful implementation of the spec. That makes such
>>> files hard to work with.
>>>
>>> In JSON the whole object needs to be seen: repeated member names (de
>>> facto, the last duplicate wins) and "@context" appearing at the end are
>>> both possible - cases that don't occur in XML. Streaming JSON or
>>> JSON-LD is going to have to relax that strictness somehow.
>>>
>>> JSON-LD is designed around the assumption of small/medium-sized data,
>>> and this affects writing as well.
>>>
>>> Jena could do with some RDFFormats + writers for JSON-LD at scale. One
>>> obvious candidate extends WriterStreamRDFBatched, where a batch is a
>>> subject and its immediate triples, and writes similarly to the file
>>> above except with one shared context and the node objects in a
>>> "@graph" array:
>>>
>>> https://www.w3.org/TR/json-ld11/#example-163-same-description-in-json-ld-context-shared-among-node-objects
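>>>
>>> Very roughly, something like this - an untested sketch, all names
>>> invented; repeated predicates, datatypes and language tags would need
>>> more care (a real writer would collect repeats into arrays):
>>>
>>> import java.io.PrintStream;
>>> import org.apache.jena.graph.Node;
>>> import org.apache.jena.graph.Triple;
>>> import org.apache.jena.riot.system.StreamRDFBase;
>>> import org.apache.jena.sparql.core.Quad;
>>>
>>> /** Sketch: one shared context, one node object per subject batch. */
>>> public class StreamingJsonLDWriter extends StreamRDFBase {
>>>     private final PrintStream out;
>>>     private final String contextUrl;
>>>     private Node current = null;     // subject of the open node object
>>>
>>>     public StreamingJsonLDWriter(PrintStream out, String contextUrl) {
>>>         this.out = out;
>>>         this.contextUrl = contextUrl;
>>>     }
>>>
>>>     @Override public void start() {
>>>         out.println("{ \"@context\": \"" + contextUrl + "\",");
>>>         out.println("  \"@graph\": [");
>>>     }
>>>
>>>     @Override public void triple(Triple t) {
>>>         // assumes triples arrive grouped by subject, as in
>>>         // WriterStreamRDFBatched, and that subjects are IRIs
>>>         if (!t.getSubject().equals(current)) {
>>>             if (current != null) out.println(" },");
>>>             current = t.getSubject();
>>>             out.print("    { \"@id\": \"" + current.getURI() + "\"");
>>>         }
>>>         out.print(", \"" + t.getPredicate().getURI() + "\": "
>>>                 + value(t.getObject()));
>>>     }
>>>
>>>     private String value(Node o) {   // minimal object rendering
>>>         if (o.isURI())
>>>             return "{ \"@id\": \"" + o.getURI() + "\" }";
>>>         if (o.isBlank())
>>>             return "{ \"@id\": \"_:" + o.getBlankNodeLabel() + "\" }";
>>>         return "\"" + o.getLiteralLexicalForm().replace("\"", "\\\"") + "\"";
>>>     }
>>>
>>>     @Override public void quad(Quad q) { triple(q.asTriple()); }
>>>
>>>     @Override public void finish() {
>>>         if (current != null) out.println(" }");
>>>         out.println("  ] }");
>>>     }
>>> }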
>>>
>>> That doesn't solve the reading side - a companion reader would be
>>> needed that stream-reads the JSON.
>>>
>>> Contributions welcome!
>>>
>>> Andy
>>>
>>> On 10/07/2024 12:36, Holger Knublauch wrote:
>>>
>>>> I am working on serializing partial RDF graphs to JSON-LD using the
>>>> Jena-Titanium bridge.
>>>>
>>>> Problem: for Titanium to "see" the triples, it needs a complete copy.
>>>> See JenaTitanium.convert, which copies all Jena triples into a
>>>> corresponding RdfDataset. This cannot scale if the graph is backed by
>>>> a database and we only want to export certain triples (especially for
>>>> framing). Titanium's RdfGraph does not provide an incremental function
>>>> similar to Graph.find(), but only returns a complete Java List of all
>>>> triples.
>>>>
>>>> Has anyone here run into the same problem, and what would be a
>>>> solution? I guess one option would be an incremental algorithm that
>>>> "walks" a @context and JSON-LD frame document to collect all required
>>>> Jena triples, producing a sub-graph that can then be sent to Titanium.
>>>> But the complexity of such an algorithm is similar to implementing my
>>>> own JSON-LD engine, which feels like overkill.
>>>>
>>>> Holger
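
P.S. For completeness, the fallback we are considering in the meantime:
extract a bounded subgraph first and only hand that to Titanium. Roughly
like this (untested; the "any predicate, any depth" path in step 1 is
just for illustration - in practice the walk would be bounded by shapes
or by the frame itself - and it assumes the JenaTitanium bridge and
Titanium's JsonLd API, so signatures may need adjusting per version):

import java.io.StringReader;
import com.apicatalog.jsonld.JsonLd;
import com.apicatalog.jsonld.document.JsonDocument;
import com.apicatalog.jsonld.document.RdfDocument;
import com.apicatalog.rdf.RdfDataset;
import jakarta.json.JsonArray;
import jakarta.json.JsonObject;
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.system.JenaTitanium;

public class FrameAroundNode {

    public static JsonObject frame(Dataset tdb, String startNode,
                                   String frameJson) throws Exception {
        // 1. Copy only the reachable triples out of the database;
        //    (<urn:x:p>|!<urn:x:p>)* matches any predicate, any depth.
        String q = "CONSTRUCT { ?s ?p ?o } WHERE { <" + startNode + ">"
                + " (<urn:x:p>|!<urn:x:p>)* ?s . ?s ?p ?o }";
        Model closure;
        try (QueryExecution qe = QueryExecutionFactory.create(q, tdb)) {
            closure = qe.execConstruct();
        }
        // 2. The closure is small, so Titanium's in-memory copy is fine.
        RdfDataset rdf = JenaTitanium.convert(
                DatasetFactory.create(closure).asDatasetGraph());
        JsonArray expanded = JsonLd.fromRdf(RdfDocument.of(rdf)).get();
        // 3. Let Titanium do the framing on the expanded document.
        JsonDocument frameDoc = JsonDocument.of(new StringReader(frameJson));
        return JsonLd.frame(JsonDocument.of(expanded), frameDoc).get();
    }
}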