Hi Tim,
Some context for our readers: gTDB and xTDB are different ways of using
TDB. Last I heard, it was TDB1, not that it makes much difference here.
gTDB - one graph stored in the default graph of a TDB database.
Many graphs, many databases.
xTDB - single, shared TDB database with graphs stored as named graphs.
On 08/06/2020 14:33, Tim Flicker wrote:
Hi Jena Community,
I'm working on an auto data backup and restore feature for our platform
which uses Jena for data access (gTDB and xTDB). The requirement is to
have the application up during the backup operation although it can be
taken down for restore. I've been looking into the tdbbackup and
tdbloader scripts that come packaged with Jena.
Only one process can access a TDB database at a time so when the server
is running, only it can use the database.
Live backup of databases has to be done by the server - that's what is
done by the backup servlet [1].
The backup could be written to local disk or delivered over HTTP.
curl -v -XPOST 'http://localhost:8080/tbl/backup?storage=xdb' --output test.trig
writes the entire database as a single TriG file.
This backup is a single transactional snapshot of the database so the
data is consistent even if changes are also being made.
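If the backup is to be scripted rather than run by hand, the same request can be made from code. The following is a minimal Python sketch, assuming the servlet URL from the curl example above and no authentication (a real installation may need credentials). The names backup_url and fetch_backup are illustrative only, not part of any product API.

```python
import shutil
import urllib.parse
import urllib.request


def backup_url(base: str, storage: str) -> str:
    """Build the backup servlet URL, e.g. <base>/backup?storage=xdb."""
    query = urllib.parse.urlencode({"storage": storage})
    return f"{base.rstrip('/')}/backup?{query}"


def fetch_backup(url: str, dest: str, timeout: float = 3600.0) -> None:
    """POST to the servlet and stream the TriG response to `dest`.

    Assumption: the server needs no login; add headers if yours does.
    """
    request = urllib.request.Request(url, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response, \
            open(dest, "wb") as out:
        # Stream in chunks so a large database is never held in memory.
        shutil.copyfileobj(response, out)
```

Usage would be along the lines of fetch_backup(backup_url("http://localhost:8080/tbl", "xdb"), "test.trig"), run while the server is up.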
gTDB is harder because the graphs are in many databases. There isn't an
easy way to back up all the graphs at once without additional code to take
a lock or transaction on each database - that's something outside of TDB.
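To illustrate the kind of additional code meant here, this is a rough Python sketch of the pattern only: take a lock on every database, back them all up, then release. The locks and the backup_one callback are hypothetical stand-ins supplied by the application; TDB does not provide them, and the sketch assumes every writer in the application takes the same per-database lock before writing.

```python
import threading
from contextlib import ExitStack
from typing import Callable, Dict, List


def backup_all(locks: Dict[str, threading.Lock],
               backup_one: Callable[[str], None]) -> List[str]:
    """Hold every per-database lock, then back up each database.

    `locks` and `backup_one` are hypothetical application-level
    objects; nothing here is TDB API.
    """
    done: List[str] = []
    with ExitStack() as stack:
        # Acquire in a fixed (sorted) order so that two concurrent
        # callers cannot deadlock against each other.
        for name in sorted(locks):
            stack.enter_context(locks[name])
        # Every lock is now held, so cooperating writers are blocked
        # and the set of backups is mutually consistent.
        for name in sorted(locks):
            backup_one(name)
            done.append(name)
    # Leaving the `with` block releases all locks, even on error.
    return done
```

Holding locks for the whole run blocks writers for the duration; using a read transaction per database instead would let writers continue, at the cost of more bookkeeping.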
Restore is one of two things: build the database separately, stop the
server and install it; or write to the database from inside a running
server.
That is for the TDB databases - in your situation, there is also the
configuration in disk files (connector files) that goes with the graphs -
those files aren't in TDB so these backup procedures aren't going to be
enough on their own. It is a problem if graphs have been added or deleted
from the system between backup and restore.
Is it safe to run the tdbbackup script while graphs are being accessed
either read or write?
No.
In fact, it should refuse to do it.
An alternative approach would be to
programmatically place lock files which seems like it would put the
system in "read only" mode.
The TDB lock files control exclusive access, not a read/write mode.
Once the lock files are in place, I could do
a backup at the file system level then remove the locks once complete.
Parallel operation with the server is possible with transactions,
including having writers while a TDB backup is taken - the backup will
not see the changes, only the data in the database as it was when the
backup started.
Any advice on how to proceed with this operation safely is greatly
appreciated.
Regards,
Tim
Hope that helps,
Andy
[1] https://doc.topquadrant.com/6.3/backup-and-restore/#Live_Data_Backup_of_a_Shared_Graph_TDB