Re: Storing a lot of strings in TDB store

Andy Seaborne Sat, 16 Feb 2019 15:19:24 -0800



On 15/02/2019 13:56, Ekaterina Danilova wrote:

I have a dataset describing IT infrastructure. It consists of many
lightweight named graphs (about 15 statements each) describing different
components.
I understand that there is little sense in using RDF if the store is used
simply as a key-value database, but I have 2 reasons for using RDF:
1) It is nice and easy to visualize and see the connections (for this part
the URI approach is definitely correct)
2) I am interested in inference and use GenericRuleReasoner with rules to
draw different conclusions based on the data
So, the 2 most-used parts are the Graph Store Protocol to access the named graphs
and the reasoner to reason over the data. The graphs are not supposed to be read
very often, so some loss of performance is acceptable.
I hope this added some clarity.
Right now I am actually using the URI approach but I wanted to find out if
it is the right way. Looks like it is.

The only thing I have to add is that adding triples one by one, or a few
at a time, over HTTP to Fuseki and TDB2 is going to incur a lot of
overhead. Doing all the additions in one single operation is going to
be significantly faster per triple.
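
A minimal sketch of the "one single operation" idea, assuming a Fuseki
server at http://localhost:3030 with a dataset named "ds" that exposes the
Graph Store Protocol under the default "data" service name. The dataset
name, the example.org namespace, and the helper functions are all
hypothetical and not taken from the thread. The point is only that every
statement for a named graph is serialized into one N-Triples body and sent
in a single POST, rather than one request per triple.

```python
import urllib.parse
import urllib.request


def iri(value):
    """Format an IRI as an N-Triples term."""
    return f"<{value}>"


def literal(text):
    """Format a plain string literal, escaping what N-Triples requires."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) term tuples, one per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


def build_request(endpoint, graph_uri, triples):
    """Build ONE Graph Store Protocol POST carrying all triples of a graph."""
    url = endpoint + "?" + urllib.parse.urlencode({"graph": graph_uri})
    body = to_ntriples(triples).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/n-triples"},
        method="POST",
    )


def send(request):
    """Send the request; needs a running server, so it is not called here."""
    with urllib.request.urlopen(request) as response:
        return response.status


EX = "http://example.org/"  # hypothetical namespace
component = [
    (iri(EX + "server1"), iri(EX + "hostname"), literal("db-01")),
    (iri(EX + "server1"), iri(EX + "runs"), iri(EX + "postgres")),
    (iri(EX + "server1"), iri(EX + "note"), literal('rack "A"\nrow 3')),
]
# Assumed endpoint: dataset "ds" with the default "data" service name.
request = build_request(
    "http://localhost:3030/ds/data", EX + "graph/server1", component
)
```

Calling send(request) against a running server would add all three
statements to the named graph in one HTTP round trip. A POST merges into
the graph, while a PUT would replace its contents.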


    Andy



On Fri, 15 Feb 2019 at 15:29, ajs6f <aj...@apache.org> wrote:

You are conflating several things here. Jean-Marc is quite right to advise
you to use identifiers and not labels for the entities in your data, up to
some limit that will depend on your resourcing and purposes. If you don't
do that, there is no purpose to using Jena (or RDF at all), because in that
case, you are using it as a kind of very low-performance key-value store.
On the other hand, if you have specific questions about performance, it
would be wise to tell us a great deal more about what you are doing and how
you are doing it.

What is your data like? What pieces of Jena are you using and how? What
queries are you running and how? There are lots of opportunities for
optimization when using a complex framework like Jena.

ajs6f

On Feb 15, 2019, at 8:06 AM, Ekaterina Danilova <katja.danilov...@gmail.com> wrote:

No, both better in performance, and in the spirit of Sem Web

Hm, the performance when using the value as a string or as a URI to a resource was
quite the same. On 10 000 examples it was 4.46ms vs 4.44ms. I didn't notice any
difference even when I tested a string 1000 characters long.

But I understood your idea; my performance issue is just caused by
retrieving more than one named graph and reasoning over them in order to get
the actual string value.
So, in the end it follows the Semantic Web logic, but the extra actions
cost an almost twofold drop in speed unless I come up with a better way of
organizing the dataset.

So, to make it clear - the preferred way is replacing the repeated value
with resource URIs and avoiding the strings?

Thanks for the advice

On Fri, 15 Feb 2019 at 14:52, Jean-Marc Vanel <jeanmarc.va...@gmail.com> wrote:

On Fri, 15 Feb 2019 at 13:45, Ekaterina Danilova <katja.danilov...@gmail.com> wrote:

....

I understand that you have a database of vCard stuff, but one must keep in
mind that the Semantic Web is all about creating links; filling in strings is
secondary.

So, does it mean that creating a resource is the better approach in the sense
of the Semantic Web but worse in the sense of performance?

No, both better in performance, and in the spirit of Sem Web.
