Hi,
As a community resource, I have collected weekly Wikidata truthy dumps
for nearly a year, sorted them and computed diffs between those sorted
dumps.
The diffs are bzip2-compressed RDF Patch text files sorted by their payload.
This way, one can simply merge-sort them using `sort -m -s -k2`.
Explanation: -m=merge, -s=stable (order of patch application matters),
-k2=2nd field - which is the quad payload; first field is the A/D flag
for added/deleted.
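For illustration, here is a minimal Python sketch of the same stable merge. It is not part of nqpatch-posix; the names `payload` and `merge_patches` are made up for this example, and it assumes each patch line has the form `<flag> <payload>` with a single space after the flag and that the inputs are passed oldest first.

```python
import heapq


def payload(line):
    """Return everything after the A/D flag, i.e. the sort key used by -k2."""
    return line.split(" ", 1)[1]


def merge_patches(*patches):
    """Merge already-sorted patch streams by payload.

    heapq.merge is stable: entries with equal payloads come out in the
    order of the input streams, so older patches must be passed first.
    """
    return heapq.merge(*patches, key=payload)


older = ["A <s1> <p> <o> .", "A <s2> <p> <o> ."]
newer = ["D <s1> <p> <o> ."]
merged = list(merge_patches(older, newer))
```

In `merged`, the add of `<s1> <p> <o> .` from the older patch comes before its delete from the newer patch, which is why the stable flag matters.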
The simple formula is: Sorted dump + sorted patch → a sorted dump that
can be patched again.
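A rough sketch of that formula in Python, as a single streaming pass. This is only an illustration under stated assumptions, not the nqpatch-posix implementation: `apply_sorted_patch` is a hypothetical name, the dump is assumed to contain no duplicate lines, both inputs are assumed sorted in the same (code point) order, and patch entries are given as `(flag, payload)` pairs.

```python
from itertools import groupby


def apply_sorted_patch(dump, patch):
    """Yield the patched dump, still sorted.

    dump:  iterable of payload lines, sorted, without duplicates.
    patch: iterable of (flag, payload) pairs, sorted by payload; entries
           for the same payload are in application order (oldest first).
    """
    dump_iter = iter(dump)
    current = next(dump_iter, None)
    for key, ops in groupby(patch, key=lambda op: op[1]):
        # Copy through every dump line that sorts before this payload.
        while current is not None and current < key:
            yield current
            current = next(dump_iter, None)
        # Is the payload currently in the dump?
        present = current == key
        if present:
            current = next(dump_iter, None)
        # Replay the operations in order; the last one decides.
        for flag, _ in ops:
            present = flag == "A"
        if present:
            yield key
    # Copy through whatever is left of the dump.
    while current is not None:
        yield current
        current = next(dump_iter, None)


dump = ["<a> <p> <o> .", "<b> <p> <o> .", "<d> <p> <o> ."]
patch = [("A", "<c> <p> <o> ."), ("D", "<d> <p> <o> ."), ("A", "<e> <p> <o> .")]
patched = list(apply_sorted_patch(dump, patch))
```

Because the output is produced in sorted order, `patched` can be fed straight back in as the dump for the next patch.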
A truthy dump is 40 GB; half a year of patches is less than 4 GB, i.e. under 10% of that.
Datasets:
https://huggingface.co/datasets/Aklakan/wikidata-sorted-nquads-and-diffs/tree/main/truthy-BETA/2026/diffs
Posix-based patch tool: https://github.com/Scaseco/nqpatch-posix
I also just saw https://github.com/apache/jena/issues/3835,
which seems to indicate there is effort in a similar direction.
I will upload SHA-1 checksum metadata for these files in the coming days.
That should make them even more useful, because it
lets you build a dependency graph of the dumps and diffs in hash space.
Cheers,
Claus
On 1/15/26 01:56, [email protected] wrote:
We use it with the RocksDB datastore to keep two Fuseki instances in sync for
high availability.
We have it set up so that RDF updates flow into rdf-delta via a message broker
(Azure Service Bus).
rdf-delta acts as the source of truth, synchronizing updates to both Fuseki instances.
regards,
Lawson Lewis
------------
Infrastructure and Development
KurrawongAI
[email protected]
https://kurrawong.ai
On Tuesday, January 13th, 2026 at 23:37, Andy Seaborne <[email protected]> wrote:
On 11/01/2026 23:25, [email protected] wrote:
Hi Jena community,
With RDF Delta https://github.com/afs/rdf-delta soon to be archived, my team
and I are trying to understand what the future holds for the RDF patch format
and the functionality it enables.
Lawson - what is the Kurrawong use case?
Andy
i.e.
- are lots of people using RDF patch logs,
- is no one using it,
- will RDF patch functionality be mainlined into Jena,
- is there some equivalent format/feature coming in Jena v6 that performs the
same role,
- is there just no interest, and the functionality is not being developed?
Thanks in advance to anyone who can help to answer.
regards,
Lawson Lewis
Infrastructure and Development
KurrawongAI
[email protected]
https://kurrawong.ai