Hi Andy, thanks for your answers. So would it be feasible to add/delete
triples in an existing database?

Thanks,

Alexandra
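
(As a rough illustration of the kind of add/delete meant above: one way to do it without touching the Jena API directly is to POST a SPARQL Update to Fuseki's update endpoint. The sketch below builds such a request using only the Python standard library; the endpoint URL and the example triples are made-up placeholders.)

```python
# Sketch only: adding/deleting triples in an existing dataset via a Fuseki
# SPARQL Update endpoint, using just the Python standard library.  The
# endpoint URL and the triples below are hypothetical placeholders.
import urllib.request

def build_update_request(endpoint, update):
    """Build (but do not send) a POST request carrying a SPARQL Update."""
    return urllib.request.Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )

# Two operations in one request: drop the old triple, add the new one.
update = """
DELETE DATA { <http://example.org/s> <http://example.org/p> "old" } ;
INSERT DATA { <http://example.org/s> <http://example.org/p> "new" }
"""

req = build_update_request("http://localhost:3030/dataset/update", update)
# urllib.request.urlopen(req) would execute it against a running server.
```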

On Tue, Mar 29, 2016 at 9:58 AM, Andy Seaborne <a...@apache.org> wrote:

> On 21/03/16 13:35, Alexandra Kokkinaki wrote:
>
>> Hi Andy, thanks for your answers.
>>
>>
>> On Fri, Mar 18, 2016 at 11:43 AM, Andy Seaborne <a...@apache.org> wrote:
>>
>>> Hi,
>>>
>>> it will depend on usage patterns. 2* 500 million isn't unreasonable but
>>> validating with your expected usage is essential.
>>> The critical factors are the usage patterns and the hardware available.
>>> Number of queries, query complexity, number of updates, all matter. RAM
>>> is
>>> good (which is true for any database) as are SSDs if you do lots of
>>> update
>>> or need fast startup from cold.
>>>
>> What kinds of usage patterns are considered problematic for big triple
>> stores?
>> We are planning to use our Fuseki server to allow machine-to-machine
>> communication and also to let independent users express mostly spatial
>> queries. We plan to do indexing and have a query timeout too. Is that
>> enough to address performance issues?
>>
>
> They are a good idea.  It will protect the server.
>
> It is possible to write SPARQL queries which are fundamentally expensive.
>
>> The TDB will need to get updated daily, using the Jena API, since I suppose
>> deleting and re-inserting everything would take a long time. I read in (
>> https://lists.w3.org/Archives/Public/public-sparql-dev/2008JulSep/0029.html
>> ) that it takes 5370 seconds for 100M triples to be loaded into TDB, which
>> is good.
>> But here <https://www.w3.org/wiki/LargeTripleStores> it is said that it
>> took 36 hours to load 1.7B triples into TDB
>>
>
> ... in 2008 ... with a spinning disk.
>
> 12k triples/s would be a bit slow nowadays.
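
(For reference, the load rates implied by the two figures quoted above, plus a rough extrapolation to a 500M load at the slower rate, worked out as plain arithmetic:)

```python
# Back-of-the-envelope load rates from the figures quoted in the thread.
w3c_rate = 100_000_000 / 5370            # 100M triples in 5370 s (2008 post)
wiki_rate = 1_700_000_000 / (36 * 3600)  # 1.7B triples in 36 hours

print(round(w3c_rate))   # ~18622 triples/s
print(round(wiki_rate))  # ~13117 triples/s

# Rough time for a 500M-triple bulk load at the slower of the two rates.
est_hours = 500_000_000 / wiki_rate / 3600
print(round(est_hours, 1))  # ~10.6 hours
```

Actual throughput depends heavily on hardware and data shape, as noted below, so these are ballpark numbers only.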
>
> At large scale tdbloader2 can be faster than tdbloader. You have to try
> with your data on your hardware - it isn't a simple yes/no question,
> unfortunately.
>
> tdbloader2 only loads from empty.
>
> tdbloader does not do anything special when loading a partial database.
>
>> ..., which drives me towards
>> daily updates rather than a daily delete-and-reload.
>> How long would a 500 triple DB take to be loaded in an empty database?
>>
>
> 500M?
>
> Just run
>
>     tdbloader --loc=DB <the_data>
>
> and see what rate you get - I'd be interested in seeing the log.  Every
> data set, every hardware set can be different.  That's why it is hard to
> make any accurate predictions - just try it.
>
> The pattern of the data makes a difference - LUBM loads very fast as it
> has a high triples-to-nodes ratio, so fewer bytes are being loaded.  All
> triple stores report better figures on that data - a factor of 2x faster
> is common - but it's not typical data.
>
>         Andy
>
>
> Multiple requests, whether same service or different service, are
>>> competing for the same machine resources.  Fuseki runs requests
>>> independently and in parallel.  There are per-database transactions
>>> supporting multiple, truly parallel readers.
>>>
>>>
>>      Andy
>>>
>>
>>
>> Many thanks,
>>
>> Alexandra
>>
>>
>>>
>>> On 18/03/16 09:35, Alexandra Kokkinaki wrote:
>>>
>>> Hi,
>>>>
>>>> after researching TDB performance with big data, I would still like to
>>>> know:
>>>> We have one Fuseki server exposing 2 SPARQL endpoints (2 million triples
>>>> each) as data services. We are planning to add one more, but with big
>>>> data (500 million triples):
>>>>
>>>>    - For big data, is it better to use many installations of the Fuseki
>>>>      server, or
>>>>    - many data services under the same Fuseki server?
>>>>
>>>>
>>>> Could Fuseki cope with two or more services of more than 500 million
>>>> triples each?
>>>>
>>> How does Fuseki cope when it has to serve concurrent queries to the
>>>> different data services?
>>>>
>>> Many thanks,
>>>>
>>>> Alexandra
>>>>
