On 13/01/2022 06:33, Aayush Yadav wrote:
Hi,
I have a question about how Jena handles deleted nodes in TDB2. I read
in the documentation and even saw while implementing, that the nodes that
are deleted are not removed from storage until compact is run. So how are
these nodes handled exactly? By handled I mean, how does compact know which
nodes to delete from storage, and how does running select all skip these
triples?
Any insights on this, or a pointer to which file in the Jena code to
look into, would help.
Thanks,
Aayush.
Deleted triples/quads become unreachable from the current roots of the
indexes, so queries never reach them. The only nodes to keep are those
referenced by triples that are reachable from the current roots.
Compaction is performed by copying the current view of the database and
(optionally) throwing away the old one.
The copy is of the current state of the database. If a node isn't
reached when copying, it isn't in the new node table. (Same for
triples/quads.)
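
As a rough illustration of "not reached, so not copied", here is a minimal
sketch in plain Java. It is a toy model, not Jena code: the class
CompactSketch, the method copyReachableNodes, and the representation of a
node table as a map from numeric ids to strings are all assumptions made
for the example. Real TDB2 storage and the real compaction code are more
involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: names and data layout are hypothetical, not Jena's.
public class CompactSketch {

    /**
     * Build a new node table containing only the nodes mentioned by live triples.
     * Each triple is an array of three node ids: {subject, predicate, object}.
     */
    static Map<Long, String> copyReachableNodes(Map<Long, String> oldNodeTable,
                                                List<long[]> liveTriples) {
        Map<Long, String> newNodeTable = new HashMap<>();
        for (long[] triple : liveTriples) {
            for (long id : triple) {
                // A node is copied the first time a live triple mentions it.
                newNodeTable.putIfAbsent(id, oldNodeTable.get(id));
            }
        }
        // Nodes used only by deleted triples were never visited, so they are absent.
        return newNodeTable;
    }

    public static void main(String[] args) {
        Map<Long, String> nodes = new HashMap<>();
        nodes.put(1L, "<http://example/s>");
        nodes.put(2L, "<http://example/p>");
        nodes.put(3L, "\"still in use\"");
        nodes.put(4L, "\"only used by a deleted triple\"");

        // Only one triple is still reachable from the current index roots.
        List<long[]> live = new ArrayList<>();
        live.add(new long[] {1L, 2L, 3L});

        Map<Long, String> compacted = copyReachableNodes(nodes, live);
        System.out.println("nodes kept: " + compacted.keySet());
    }
}
```

Nothing in the sketch marks node 4 as deleted. It is simply never visited
during the copy, which is the same reason no reference counting is needed.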
There is no reference counting of nodes: it would be too expensive, and
it is not simple because transactions have different views of the
database and may abort rather than commit.
In TDB2, there are subdirectories "Data-0001" etc. The highest-numbered
Data* subdirectory is the active one. The rest are no longer used: you
can zip them and move them elsewhere as a record of the database at a
point in time, or just delete them.
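
If you want to check from outside Jena which subdirectory is the active
one, a small sketch along these lines may help. It uses only the Java
standard library and assumes the naming pattern "Data-" followed by
digits, as described above. The class ActiveDataDir and its methods are
made up for this example and are not part of the Jena API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper, not Jena code: picks the highest-numbered Data-NNNN directory.
public class ActiveDataDir {

    // Assumed naming pattern: "Data-" followed by digits, e.g. Data-0001.
    private static final Pattern DATA_DIR = Pattern.compile("Data-(\\d+)");

    /** Return the number in a Data-NNNN name, or -1 if the name does not match. */
    static int generation(Path dir) {
        Matcher m = DATA_DIR.matcher(dir.getFileName().toString());
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Find the highest-numbered Data-NNNN subdirectory of the database directory. */
    static Optional<Path> findActive(Path dbRoot) throws IOException {
        try (Stream<Path> entries = Files.list(dbRoot)) {
            return entries
                    .filter(Files::isDirectory)
                    .filter(p -> generation(p) >= 0)
                    .max(Comparator.comparingInt(ActiveDataDir::generation));
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the database directory as the first argument; defaults to the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(findActive(root)
                .map(Path::toString)
                .orElse("no Data-NNNN directory found"));
    }
}
```

Any other Data-NNNN directory it finds is an older generation, which can
be archived or removed as described above.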
Andy