On 10/12/2018 04:24, Lee, Seokju | Daniel | TPDD wrote:
Hi Andy,

Thanks for the reply.

The in-memory dataset described above is fully transactional
Interesting — I didn't know it is different from TDB. I used to use it only for test purposes because I thought it was the same as TDB.

TDB in-memory is TDB with a location of "--mem--" (assembler) or TDBFactory.createDataset().

The in-memory dataset is DatasetFactory.createTxnMem().
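As a minimal sketch of the transactional in-memory dataset (TIM) mentioned above, using the Jena 3.x API (the example resource URIs are illustrative):

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.ReadWrite;

public class InMemoryExample {
    public static void main(String[] args) {
        // Fully transactional in-memory dataset; storage is on the heap.
        Dataset ds = DatasetFactory.createTxnMem();

        ds.begin(ReadWrite.WRITE);
        try {
            ds.getDefaultModel()
              .createResource("http://example/s")
              .addProperty(ds.getDefaultModel().createProperty("http://example/p"), "o");
            ds.commit();
        } finally {
            ds.end();
        }

        ds.begin(ReadWrite.READ);
        try {
            System.out.println(ds.getDefaultModel().size());
        } finally {
            ds.end();
        }
    }
}
```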


I have another question: how can you keep this persistent? Since it is in-memory, if the application crashes for whatever reason the data would be lost.
Am I right?

Yes.

Personally, I reload the data for each test or test suite or run Fuseki (in-process or separately).
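Reloading the data into a fresh transactional in-memory dataset at the start of each test might look like this sketch (the file name and helper are illustrative, not part of the Jena API):

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.riot.RDFDataMgr;

public class TestSetup {
    // Build a fresh dataset and load test data inside a write transaction.
    static Dataset freshDataset(String ttlFile) {
        Dataset ds = DatasetFactory.createTxnMem();
        ds.begin(ReadWrite.WRITE);
        try {
            RDFDataMgr.read(ds, ttlFile);  // format chosen by file extension (.ttl, .trig, ...)
            ds.commit();
        } finally {
            ds.end();
        }
        return ds;
    }
}
```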

(Just for your understanding: the ramdisk is only for the dev and staging environments, for functional tests, not for production. For production we will use SSD.)

For staging? I'd use the same setup as prod.

    Andy


Thanks
Daniel




-----Original Message-----
From: Andy Seaborne <a...@apache.org>
Sent: Friday, December 7, 2018 8:09 PM
To: users@jena.apache.org
Subject: Re: Is there any way to keep same size between real data and TDB



On 07/12/2018 01:03, Lee, Seokju | Daniel | TPDD wrote:
Greetings,

I am using Apache Jena 3.7.0 and have encountered the following issue; I would like to know how to solve it.

Background:

    *   We created our own SPARQL endpoint using Apache Jena.
    *   Sometimes we need to clear the data store and restore it from a new TTL file.
    *   For performance, we are using a RAM disk for TDB instead of an SSD, for our own reasons.
    *   We thought we had enough memory for TDB.

Issue

    *   Our application went down because the RAM disk was full.
    *   Before that, we had been repeatedly restoring from new TTL files.
    *   There are about 1.5 million triples and the TTL file size is around 250 MB.
    *   The ramdisk size is 4 GB (the first time we restored, the RAM disk used under 1 GB).

Have you considered using an in-memory Jena graph?

@prefix ja:  <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<#dataset> rdf:type ja:MemoryDataset;
     ja:data "data.trig";
.


Investigating

    *   I think nodes.dat holds the real data, and it looks like SPO.dat, POS.dat and OSP.dat did not release the old data that I removed in my application.

Question

    *   Is there any way to keep the size of TDB in line with the size of the real data?

ja:MemoryDataset

    *   We are removing data with "Model.removeAll()" and "TDB.sync()".

TDB.sync() is not necessary when using transactions.  And not using 
transactions is not a good idea for a SPARQL endpoint. TDB.sync is legacy and 
for older single-threaded applications.
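As a sketch of the transactional version of the removal (the TDB location is illustrative; no TDB.sync() is needed):

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.tdb.TDBFactory;

public class ClearStore {
    public static void main(String[] args) {
        Dataset ds = TDBFactory.createDataset("/path/to/tdb");  // location is illustrative

        ds.begin(ReadWrite.WRITE);
        try {
            ds.getDefaultModel().removeAll();  // delete everything in the default graph
            ds.commit();                       // durable on commit; no TDB.sync() needed
        } finally {
            ds.end();
        }
    }
}
```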

The in-memory dataset described above is fully transactional (serializable isolation), uses the heap for storage so it only uses what is needed, and deleted data gets garbage collected.

(TDB2 has a compaction operation, but it does mean there are times when there are two copies of the database.)

      Andy


Thanks
Daniel
