Hi,

As a community resource, I have collected weekly Wikidata truthy dumps for nearly a year, sorted them and computed diffs between those sorted dumps.

The diffs are bzip2-compressed RDF Patch text files, sorted by their payload.

This way, one can simply merge-sort them using `sort -m -s -k2`.

Explanation: -m = merge, -s = stable (the order of patch application matters), -k2 = sort key starting at the 2nd field, which is the quad payload; the first field is the A/D flag for added/deleted.
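
To illustrate what the stable merge buys you, here is a minimal Python sketch of the same idea. It is not how nqpatch-posix is implemented; the function names and the sample quads are made up for illustration, and it assumes each patch line is an A or D flag, one space, then the payload, with both inputs already sorted by payload in code point order (comparable to running sort with LC_ALL=C).

```python
import heapq

def payload(line: str) -> str:
    # Everything after the first field (the A/D flag), similar to sort's -k2.
    return line.partition(" ")[2]

def merge_patches(*patches):
    """Stable merge of patches that are each already sorted by payload.

    heapq.merge breaks ties by input position, so for equal payloads the
    lines of an earlier patch come before those of a later patch.
    """
    return list(heapq.merge(*patches, key=payload))

# Hypothetical sample data: two consecutive weekly patches.
older = ["A <s1> <p> <o1> .", "D <s2> <p> <o2> ."]
newer = ["D <s1> <p> <o1> .", "A <s3> <p> <o3> ."]

for line in merge_patches(older, newer):
    print(line)
```

For <s1>, the A from the older patch is emitted before the D from the newer one because `older` is passed first; swapping the arguments would reverse that, which is why stability matters.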


The simple formula is: Sorted dump + sorted patch → a sorted dump that can be patched again.
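
The formula above can be sketched as a single streaming pass over both inputs. This is only an illustration under assumptions, not the actual nqpatch-posix logic: the names `final_ops` and `apply_patch` are hypothetical, dump lines are assumed to be bare quads, patch lines are assumed to be an A or D flag, one space, then the quad, and when a quad occurs several times in the patch the last operation is assumed to win.

```python
from itertools import groupby

def final_ops(patch_lines):
    """Reduce a payload-sorted patch to one (quad, flag) pair per quad."""
    parsed = (line.partition(" ") for line in patch_lines)
    for quad, group in groupby(parsed, key=lambda parts: parts[2]):
        *_, last = group          # assumption: the last operation wins
        yield quad, last[0]

def apply_patch(dump_lines, patch_lines):
    """Yield the patched dump; the output is sorted again."""
    dump = iter(dump_lines)
    current = next(dump, None)
    for quad, flag in final_ops(patch_lines):
        # Copy through every dump line that sorts before this patch entry.
        while current is not None and current < quad:
            yield current
            current = next(dump, None)
        # Drop the existing copy, if any; the flag decides whether it returns.
        if current == quad:
            current = next(dump, None)
        if flag == "A":
            yield quad
    # Copy the remainder of the dump unchanged.
    while current is not None:
        yield current
        current = next(dump, None)

# Hypothetical sample data.
dump = ["<s1> <p> <o1> .", "<s2> <p> <o2> ."]
patch = ["A <s0> <p> <o0> .", "D <s1> <p> <o1> .", "A <s3> <p> <o3> ."]
patched = list(apply_patch(dump, patch))
```

Because the result is itself sorted, it can be fed straight back in as the dump for the next patch.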

A truthy dump is 40GB; half a year of patches is less than 4GB, i.e. under 10% of that.


Datasets: https://huggingface.co/datasets/Aklakan/wikidata-sorted-nquads-and-diffs/tree/main/truthy-BETA/2026/diffs

Posix-based patch tool: https://github.com/Scaseco/nqpatch-posix


I also just saw https://github.com/apache/jena/issues/3835, which seems to indicate there is an effort in a similar direction.

I will upload SHA-1 checksum metadata for these files in the coming days. That should make them even more useful, because it lets you build a dependency graph of the dumps and diffs in hash space.


Cheers,

Claus



On 1/15/26 01:56, [email protected] wrote:
We use it with the RocksDB datastore to keep two Fuseki instances in sync for
high availability.

We have it set up so that RDF updates flow into rdf-delta via a message broker
(Azure Service Bus).

rdf-delta acts as the source of truth, synchronizing updates to both Fuseki instances.


regards,



Lawson Lewis
------------

Infrastructure and Development

KurrawongAI

[email protected]

https://kurrawong.ai


On Tuesday, January 13th, 2026 at 23:37, Andy Seaborne <[email protected]> wrote:

On 11/01/2026 23:25, [email protected] wrote:

Hi Jena community,

With RDF Delta https://github.com/afs/rdf-delta soon to be archived, my team 
and I are trying to understand what the future holds for the RDF patch format 
and the functionality it enables.

Lawson - what is the Kurrawong use case?

Andy

i.e.

- are lots of people using RDF patch logs,
- is no one using it,
- will RDF patch functionality be mainlined into Jena,
- is there some equivalent format/feature coming in Jena v6 that performs the 
same role,
- is there just no interest, and the functionality is not being developed?

Thanks in advance to anyone who can help to answer.

regards,

Lawson Lewis

Infrastructure and Development

KurrawongAI

[email protected]

https://kurrawong.ai
