Re: Storing a lot of strings in TDB store

Ekaterina Danilova Fri, 15 Feb 2019 04:46:53 -0800

Thanks for pointing out the issue with New York. However, this is just
test data that I made up as an example; vCard was just an easy choice. My
actual database is not about vCard and consists of self-made properties
created with something like this:
public PropertyImpl( String uri )


The idea of my application is to store data and use the reasoning features
over it. It will have a lot of smaller graphs which might contain quite a lot
of repeated string data, so I am really interested in what is good for
performance.
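
Whether repeated strings cost extra storage depends on the layout: repeating a literal's text in every triple grows with string length times the number of triples, while storing the text once and giving every triple a fixed-size id grows only with the number of triples. A back-of-the-envelope Java sketch of both, assuming 8-byte ids and ignoring index, page and compression overhead; `StorageEstimate` and its figures are illustrative, not measurements of TDB:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical back-of-the-envelope model; ignores index structure, page and
// compression overhead. Compares repeating a value's text in every triple with
// storing the text once and repeating a fixed-size id per triple.
final class StorageEstimate {
    static final int ID_BYTES = 8; // assumed id size, for illustration only

    static long utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Bytes spent on the value if its text is repeated in every triple. */
    static long repeatedBytes(String value, long triples) {
        return triples * utf8Length(value);
    }

    /** Bytes spent on the value if its text is stored once and each triple holds an id. */
    static long dictionaryBytes(String value, long triples) {
        return utf8Length(value) + triples * ID_BYTES;
    }

    public static void main(String[] args) {
        long people = 100_000;
        String[] values = {"New York", "long long string ".repeat(50)};
        for (String v : values) {
            System.out.printf("%d-byte value: repeated %,d bytes, dictionary %,d bytes%n",
                    utf8Length(v), repeatedBytes(v, people), dictionaryBytes(v, people));
        }
    }
}
```

Under these assumptions "New York" costs about the same either way (8 bytes of text versus an 8-byte id), while an 850-byte value drops from about 85 MB to about 0.8 MB: the per-triple cost becomes the id size, whatever the string length.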

> I understand that you have a database of vCard stuff, but one must keep in
> mind that the Semantic Web is all about creating links; filling in strings
> is secondary.

So does that mean that creating a resource is the better approach from the
Semantic Web point of view, but worse for performance?

And is there any information on how TDB2 actually stores such string data?
Could it be that each string is actually stored only once?
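
For reference: TDB and TDB2 store each distinct RDF term (IRI, literal or blank node) only once, in a node table that maps it to an internal id (a NodeId); the triple and quad indexes hold only these ids. The following is a simplified, stdlib-only Java sketch of that dictionary-encoding idea, not TDB's implementation; `NodeTableSketch`, its methods and the `http://graphs/PersonN` / `http://people/PersonN` IRIs are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, in-memory illustration of dictionary encoding, the idea behind
// TDB's node table. Hypothetical names; not Jena code (TDB keeps these
// structures on disk rather than in HashMaps).
final class NodeTableSketch {
    // Terms are written N-Triples style: IRIs as <...>, literals as "...".
    private final Map<String, Long> termToId = new HashMap<>();
    private final List<String> idToTerm = new ArrayList<>();
    // Each stored quad is only four ids: graph, subject, predicate, object.
    final List<long[]> quads = new ArrayList<>();

    /** Returns the id for a term, storing the term text only the first time it is seen. */
    long getOrAllocate(String term) {
        return termToId.computeIfAbsent(term, t -> {
            idToTerm.add(t);
            return (long) (idToTerm.size() - 1);
        });
    }

    String lookup(long id) {
        return idToTerm.get((int) id);
    }

    int distinctTerms() {
        return idToTerm.size();
    }

    void add(String g, String s, String p, String o) {
        quads.add(new long[] {getOrAllocate(g), getOrAllocate(s), getOrAllocate(p), getOrAllocate(o)});
    }

    public static void main(String[] args) {
        NodeTableSketch store = new NodeTableSketch();
        for (int i = 0; i < 10_000; i++) {
            store.add("<http://graphs/Person" + i + ">", "<http://people/Person" + i + ">",
                    "<http://www.w3.org/2001/vcard-rdf/3.0#Region>", "\"New York\"");
        }
        // Every quad points at the same id for "New York"; its text is stored once.
        System.out.println(store.quads.size() + " quads, " + store.distinctTerms() + " distinct terms");
    }
}
```

Running it prints `10000 quads, 20002 distinct terms` (10 000 graph names, 10 000 subjects, one property and one "New York" literal), which is the sense in which a repeated string is stored only once.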


On Fri, 15 Feb 2019 at 14:24, Jean-Marc Vanel <jeanmarc.va...@gmail.com> wrote:

> First, this is bad practice:
>
> <http://people/JohnSmith> <http://www.w3.org/2001/vcard-rdf/3.0#Region> "New York" .
>
> You should do
> <http://people/JohnSmith> <http://www.w3.org/2001/vcard-rdf/3.0#Region> dbpedia:New_York .
>
> that is,
> http://dbpedia.org/resource/New_York
>
> possibly with another object property like
> http://xmlns.com/foaf/0.1/based_near
>
> I understand that you have a database of vCard stuff, but one must keep in
> mind that the Semantic Web is all about creating links; filling in strings
> is secondary.
>
> And then there is no trouble with strings at all :)
>
> Jean-Marc Vanel
> <http://163.172.179.125:9111/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me>
>
> On Fri, 15 Feb 2019 at 13:02, Ekaterina Danilova <katja.danilov...@gmail.com> wrote:
>
> > Hello
> > I would like to ask how TDB2 and Fuseki manage large amounts of string
> > data (especially repeated data) and what the best practices are. Do they
> > optimize it somehow, or is it up to us to make improvements?
> >
> > For example, we have a TDB2 store which we access via Fuseki, and an
> > example named graph like this:
> > [http://people/JohnSmith, http://www.w3.org/2001/vcard-rdf/3.0#Region, "New York"]
> > [http://people/JohnSmith, http://www.w3.org/2001/vcard-rdf/3.0#Other, "long long string"]
> > [http://people/JohnSmith, http://www.w3.org/2001/vcard-rdf/3.0#NAME, "JOHN SMITH"]
> >
> > So we have a JohnSmith person with two properties of interest, "Region"
> > and "Other". One of them is the short string "New York", the other is a
> > long string.
> > Assume we have 100 000 more people and many of them have the same "Region"
> > and "Other" values. What would be the best approach to storing such data?
> >
> > I created 10 000 more named graphs of people with different names but the
> > same other properties and tested the performance.
> > First I measured 10 000 reads of graphs like this, and the average time
> > was around 4.4 ms (regardless of how long the strings were).
> >
> > Another option I considered is making "New York" a resource stored in a
> > "cities" named graph, and doing the same thing with "long long string".
> > The idea is to store the actual string only once. I tested reading the
> > graphs again on 10 000 cases and didn't notice any change in performance:
> > the average load time was still 4.4 ms with resource URIs in place of
> > "New York" and "long long string".
> > However, to get the full data, we need to add the actual resources to our
> > original JohnSmith graph, which adds overhead since we have to fetch 2 more
> > named graphs. So, as expected, performance drops.
> >
> > So, according to my tests, the first case (the one described in the graph
> > example) performed best, but it feels like we are storing too much extra
> > information. So I still wanted to ask your opinion on this approach and
> > learn whether the TDB store performs some internal optimization of the
> > data.
> >
>
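
The 4.4 ms figures in the quoted message are averages over many repeated reads. On the JVM such averages tend to be more stable when a warm-up phase runs before timing starts, so that JIT compilation and cold caches do not dominate the first measurements. A minimal sketch of such a harness, assuming the read being measured can be wrapped in a `Runnable` (for example, one that opens a read transaction and fetches one named graph); `ReadBenchmark` and `averageMillis` are hypothetical names, not Jena or Fuseki API:

```java
// Hypothetical timing helper for repeated reads; not part of Jena or Fuseki.
final class ReadBenchmark {
    /** Runs {@code read} {@code warmup} times untimed, then returns the mean time of {@code runs} timed calls in ms. */
    static double averageMillis(Runnable read, int warmup, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            read.run(); // untimed: lets the JIT and caches settle before measuring
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            read.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / runs;
    }

    public static void main(String[] args) {
        StringBuilder sink = new StringBuilder();
        // Stand-in workload; replace with a real "read one named graph" call.
        double avg = averageMillis(() -> sink.setLength(0), 1_000, 10_000);
        System.out.printf("average: %.4f ms%n", avg);
    }
}
```

Using the same warm-up and run counts for the literal layout and the resource layout keeps the two averages comparable.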
