OK, well, compacting by using a dump is not a problem.

Another backup-related question: when backing up a disk image with all the Jena data files as-is, any data corruption is backed up too, without warning. But when exporting the data with Fuseki's built-in backup and saving that, does Fuseki report an error if the database is corrupted? In that case the previously backed-up data dump could be restored. I guess an error message when trying to dump a corrupted database is the requirement that would make this more useful than a standard image backup.



On 14.6.2018 19:48, Andy Seaborne wrote:
Inside a TDB2 directory, you'll see "Data-0001". That's the first database. TDB2 has a "compact" operation which would create "Data-0002" etc., and after that Data-0001 is not used and never touched. Delete or archive it as you choose.

It's simple at the moment - a copy of the database, so much like backup-restore, except that it can happen to a running database (writers are locked out, readers can continue until the switchover point). There is plenty of scope to make it more efficient.

Compaction is not available from Fuseki yet.
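
For anyone scripting the "delete or archive" step, here is a minimal sketch in Python. It assumes the highest-numbered "Data-NNNN" directory is the one in use after compaction; the function names are made up for illustration and are not part of Jena. Check it against your own database directory before deleting anything.

```python
import re
from pathlib import Path

# Matches TDB2 generation directories such as "Data-0001".
_GENERATION = re.compile(r"^Data-(\d+)$")


def tdb2_generations(db_dir):
    """Return the Data-NNNN subdirectories of db_dir, oldest first."""
    found = []
    for child in Path(db_dir).iterdir():
        match = _GENERATION.match(child.name)
        if match and child.is_dir():
            found.append((int(match.group(1)), child))
    return [path for _, path in sorted(found)]


def split_current_and_old(db_dir):
    """Return (current, older) generations.

    Assumption: the highest-numbered generation is the live one.
    """
    generations = tdb2_generations(db_dir)
    if not generations:
        raise FileNotFoundError(f"no Data-NNNN directory in {db_dir}")
    return generations[-1], generations[:-1]
```

The "older" list is only a set of candidates for archiving; nothing is removed by this sketch.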

    Andy

On 14/06/18 15:11, ajs6f wrote:
Yes, there is some truth in that.

TDB1 uses a dictionary that maps node IDs to node labels (so that, e.g., a literal that is used as an object doesn't need to be recorded inline in the indexes, which could quickly bloat them). That dictionary isn't "garbage collected", so part of what you are seeing may be the absence of mappings that are no longer in use. Andy can say more about what might be happening with the indexes themselves or how this does or doesn't apply to TDB2.
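
To illustrate the idea only, here is a toy model in Python. It is not how TDB stores anything on disk; the class and function names are invented for this sketch. It shows a dictionary that keeps entries after the triples using them are deleted, and how a dump and reload ends up with a smaller dictionary.

```python
class ToyStore:
    """Toy triple store: a node dictionary plus a set of ID triples."""

    def __init__(self):
        self.node_to_id = {}   # label -> node ID
        self.id_to_node = []   # node ID -> label
        self.triples = set()   # (s_id, p_id, o_id)

    def _intern(self, label):
        # Allocate a new ID the first time a label is seen.
        if label not in self.node_to_id:
            self.node_to_id[label] = len(self.id_to_node)
            self.id_to_node.append(label)
        return self.node_to_id[label]

    def add(self, s, p, o):
        self.triples.add((self._intern(s), self._intern(p), self._intern(o)))

    def delete(self, s, p, o):
        ids = tuple(self.node_to_id.get(x) for x in (s, p, o))
        self.triples.discard(ids)
        # The dictionary entries are deliberately left in place:
        # nothing "garbage collects" labels that are no longer used.

    def dump(self):
        """Export the live triples as labels, like a backup dump."""
        return [tuple(self.id_to_node[i] for i in t) for t in self.triples]


def reload(store):
    """Build a fresh store from a dump; unused labels are not carried over."""
    fresh = ToyStore()
    for s, p, o in store.dump():
        fresh.add(s, p, o)
    return fresh
```

After deleting a triple, len(store.id_to_node) stays the same, while reload(store) only contains labels that are still referenced.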

ajs6f

On Jun 14, 2018, at 10:00 AM, Mikael Pesonen <mikael.peso...@lingsoft.fi> wrote:


Just managed to load it using tdbloader2; it even reads the gz file directly. I noticed that the new database size on disk is quite a bit smaller:

payload size: 2.8 GB
old size on disk: 21 GB
new size on disk: 3 GB

So it seems that it's good to clean up the db every now and then using the backup?



On 14.6.2018 16:55, ajs6f wrote:
That dataset is just an NQuads file. You can stick it into Fuseki as you would do with any other NQuads file. You can certainly use tdbloader2, or you can script individual graph loads using GSP. tdbloader2 will produce an optimal set of indexes.
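
If you go the GSP route, a per-graph load could be sketched like this in Python. The "?graph=" and "?default" parameters come from the SPARQL 1.1 Graph Store HTTP Protocol; the endpoint URL, the graph URI and the function name are assumptions made up for the example, so adjust them to your own Fuseki setup.

```python
from urllib.parse import urlencode
from urllib.request import Request


def gsp_request(endpoint, data, graph_uri=None,
                content_type="application/n-triples"):
    """Build a GSP POST request that adds data to one graph.

    POST merges into the target graph; use method="PUT" to replace it.
    With graph_uri=None the default graph is targeted.
    """
    query = urlencode({"graph": graph_uri}) if graph_uri else "default"
    return Request(
        f"{endpoint}?{query}",
        data=data,
        method="POST",
        headers={"Content-Type": content_type},
    )


# Hypothetical usage (endpoint and graph URI are placeholders):
# from urllib.request import urlopen
# triples = b'<http://example.org/s> <http://example.org/p> "o" .\n'
# req = gsp_request("http://localhost:3030/ds/data", triples,
#                   graph_uri="http://example.org/g1")
# urlopen(req)
```

This only builds the request; nothing is sent until it is passed to urlopen.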

ajs6f

On Jun 14, 2018, at 7:55 AM, Mikael Pesonen <mikael.peso...@lingsoft.fi> wrote:

Hi,

I made a backup using the Fuseki HTTP Administration Protocol: ds_2018-06-14_14-43-32.nq.gz
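
For reference, the backup was triggered with a request along these lines. This is a rough Python sketch: the "/$/backup/{name}" path is the one described in the Fuseki admin protocol documentation, while the server URL, the dataset name "ds" and the function name are placeholders for this example.

```python
from urllib.request import Request


def backup_request(server, dataset):
    """Build the POST request that asks Fuseki to back up one dataset."""
    name = dataset.strip("/")  # accept "ds" or "/ds"
    return Request(f"{server.rstrip('/')}/$/backup/{name}", method="POST")


# Hypothetical usage (server URL is a placeholder):
# from urllib.request import urlopen
# urlopen(backup_request("http://localhost:3030", "ds"))
```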

How do I restore it on Linux? Empty the existing data and use tdbloader2? How exactly?

Thank you

--
Lingsoft - 30 years of Leading Language Management

www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books

Mikael Pesonen
System Engineer

e-mail: mikael.peso...@lingsoft.fi
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND


