Hi Andy, thanks for your response. To clarify, the scenario is a TDB with, say, 1 million triples, and the request is to produce a JSON-LD document for the "closure" around a given resource (in TopBraid's Source Code panel, when the user navigates to a resource, or through API calls). In other words: the input is a Jena Graph, a start node and a JSON-LD frame document, and the output should be a JSON-LD document describing that node and all reachable triples selected by the frame.
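One workaround we have been considering is to walk the graph ourselves, copy only the reachable closure into a small in-memory graph, and hand just that to Titanium. A rough, untested sketch follows (the class and method names are made up for illustration; the traversal ignores the frame and simply follows every URI and blank-node object, so on a densely connected graph it would still copy far too much - a real version would have to prune using the frame):

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashSet;
    import java.util.Set;

    import org.apache.jena.graph.Graph;
    import org.apache.jena.graph.Node;
    import org.apache.jena.riot.system.JenaTitanium;
    import org.apache.jena.sparql.core.DatasetGraph;
    import org.apache.jena.sparql.core.DatasetGraphFactory;
    import org.apache.jena.sparql.graph.GraphFactory;

    import com.apicatalog.jsonld.JsonLd;
    import com.apicatalog.jsonld.document.Document;
    import com.apicatalog.jsonld.document.JsonDocument;
    import com.apicatalog.jsonld.document.RdfDocument;

    import jakarta.json.JsonObject;

    public class ClosureFraming {

        // Copy every triple reachable from 'start' into an in-memory graph.
        static Graph reachableClosure(Graph source, Node start) {
            Graph result = GraphFactory.createDefaultGraph();
            Set<Node> visited = new HashSet<>();
            Deque<Node> queue = new ArrayDeque<>();
            queue.add(start);
            while (!queue.isEmpty()) {
                Node s = queue.pop();
                if (!visited.add(s))
                    continue;
                source.find(s, Node.ANY, Node.ANY).forEachRemaining(t -> {
                    result.add(t);
                    Node o = t.getObject();
                    if (o.isURI() || o.isBlank())   // recurse into resources only
                        queue.add(o);
                });
            }
            return result;
        }

        // Frame only the (small) closure, not the whole database.
        static JsonObject frameClosure(Graph source, Node start, Document frame) throws Exception {
            DatasetGraph dsg = DatasetGraphFactory.wrap(reachableClosure(source, start));
            var expanded = JsonLd.fromRdf(RdfDocument.of(JenaTitanium.convert(dsg))).get();
            return JsonLd.frame(JsonDocument.of(expanded), frame).get();
        }
    }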
So it sounds like Titanium cannot really be used for this, because its algorithms only operate on their own complete in-memory copy of a graph, and we cannot copy all 1 million triples into memory each time.

Holger

> On 10 Jul 2024, at 5:53 PM, Andy Seaborne <a...@apache.org> wrote:
>
> Hi Holger,
>
> How big is the database?
> What sort of framing are you aiming to do?
>
> Using framing to select triples from a large database doesn't feel like the
> way to extract them, as you've discovered. Framing can touch anywhere in the
> JSON document.
>
> This recent thread is relevant --
> https://lists.apache.org/thread/3mrcyf1ccry78rkxxb6vqsm4okfffzfl
>
> That JSON-LD file is 280 million triples.
>
> Its structure is
>
> [{"@context": <url> , ... }
> ,{"@context": <url> , ... }
> ,{"@context": <url> , ... }
> ...
> ,{"@context": <url> , ... }
> ]
>
> 9 million array entries.
>
> It looks to me like it has been produced by text manipulation: taking each
> entity, writing a separate, self-contained JSON-LD object, then, by text,
> concatenating them into one big array. That, or a tool designed specifically
> to write large JSON-LD, e.g. the outer array.
>
> It is the same context URL every time, which would amount to a
> denial-of-service attack, except that Titanium reads the whole file as JSON
> first and runs out of space.
>
> The JSON-LD algorithms do assume the whole document is available. Titanium
> is a faithful implementation of the spec.
>
> It is hard to work with.
>
> In JSON the whole object needs to be seen: repeated member names (de facto,
> last duplicate wins) and "@context" appearing at the end are both possible.
> These cases don't occur in XML. Streaming JSON or JSON-LD is going to have
> to relax the strictness somehow.
>
> JSON-LD is designed around the assumption of small/medium sized data.
>
> And this affects writing: that large file looks like it was specially
> written, or at least written with a tool designed specifically for large
> JSON-LD output.
>
> Jena could do with some RDFFormats + writers for JSON-LD at scale. One
> obvious candidate extends WriterStreamRDFBatched, where a batch is a subject
> and its immediate triples, and then writes similarly to the case above,
> except with a single context and the array under "@graph".
>
> https://www.w3.org/TR/json-ld11/#example-163-same-description-in-json-ld-context-shared-among-node-objects
>
> That doesn't solve the reading side - a companion reader would be needed
> that stream-reads JSON.
>
> Contributions welcome!
>
>     Andy
>
> On 10/07/2024 12:36, Holger Knublauch wrote:
>> I am working on serializing partial RDF graphs to JSON-LD using the
>> Jena-Titanium bridge.
>>
>> Problem: For Titanium to "see" the triples, it needs a complete copy. See
>> JenaTitanium.convert, which copies all Jena triples into a corresponding
>> RdfDataset. This cannot scale if the graph is backed by a database and we
>> only want to export certain triples (esp. for framing). Titanium's RdfGraph
>> does not provide an incremental function similar to Graph.find() but only
>> returns a complete Java List of all triples.
>>
>> Has anyone here run into the same problem, and what would be a solution?
>>
>> I guess one solution would be an incremental algorithm that "walks" a
>> @context and a JSON-LD frame document to collect all required Jena triples,
>> producing a sub-graph that can then be sent to Titanium. But the complexity
>> of such an algorithm is similar to implementing my own JSON-LD engine,
>> which feels like overkill.
>>
>> Holger
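P.S. If anyone wants to experiment with the writer Andy describes, here is a rough, untested sketch of the output shape (one shared "@context", then a "@graph" array with one node object per subject, as in the spec example linked above). It consumes a triple iterator that is already grouped by subject rather than extending WriterStreamRDFBatched, and it handles only the simplest cases; all names are invented for illustration:

    import java.io.Writer;
    import java.util.Iterator;

    import org.apache.jena.graph.Node;
    import org.apache.jena.graph.Triple;

    import jakarta.json.Json;
    import jakarta.json.stream.JsonGenerator;

    public class StreamingJsonLdGraphWriter {

        // 'triples' must deliver all triples of one subject consecutively;
        // subjects are assumed to be IRIs.
        public static void write(Writer out, String contextUrl, Iterator<Triple> triples) {
            try (JsonGenerator g = Json.createGenerator(out)) {
                g.writeStartObject();
                g.write("@context", contextUrl);   // single shared context
                g.writeStartArray("@graph");
                Node current = null;
                while (triples.hasNext()) {
                    Triple t = triples.next();
                    if (!t.getSubject().equals(current)) {
                        if (current != null)
                            g.writeEnd();          // close previous node object
                        current = t.getSubject();
                        g.writeStartObject();
                        g.write("@id", current.getURI());
                    }
                    writeProperty(g, t);
                }
                if (current != null)
                    g.writeEnd();                  // close last node object
                g.writeEnd();                      // close "@graph"
                g.writeEnd();                      // close outer object
            }
        }

        // Simplified: assumes one value per predicate and only IRI or plain
        // string literal objects. A real writer would batch repeated
        // predicates into arrays and handle datatypes, language tags and
        // blank nodes.
        private static void writeProperty(JsonGenerator g, Triple t) {
            String p = t.getPredicate().getURI();
            Node o = t.getObject();
            if (o.isURI()) {
                g.writeStartObject(p);
                g.write("@id", o.getURI());
                g.writeEnd();
            } else if (o.isLiteral()) {
                g.write(p, o.getLiteralLexicalForm());
            }
        }
    }

As Andy notes, the reading side would still need a companion stream-reader that can rely on this fixed shape.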