Hi Andy, thanks for your answers. So would it be feasible to add/delete triples in an existing database?
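For what it's worth, here is roughly what I had in mind for the daily add/delete step - a minimal sketch, assuming Jena with TDB1 on the classpath, a database directory "DB", and placeholder URIs (none of these names are from a real dataset):

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.tdb.TDBFactory;

public class DailyUpdate {
    public static void main(String[] args) {
        // Connect to an existing (possibly already-loaded) TDB database.
        Dataset ds = TDBFactory.createDataset("DB");
        ds.begin(ReadWrite.WRITE);   // one write transaction per batch
        try {
            Model m = ds.getDefaultModel();
            Resource s = m.createResource("http://example.org/obs/1");
            Property p = m.createProperty("http://example.org/measures");
            m.add(s, p, "salinity");                          // add a triple
            m.remove(s, p, m.createLiteral("salinity"));      // delete a triple
            ds.commit();             // make the batch durable
        } finally {
            ds.end();
        }
    }
}
```

Batching a day's worth of changes into one write transaction, rather than one transaction per triple, seems like the right granularity - but I'd welcome a correction.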
Thanks,
Alexandra

On Tue, Mar 29, 2016 at 9:58 AM, Andy Seaborne <a...@apache.org> wrote:

> On 21/03/16 13:35, Alexandra Kokkinaki wrote:
>
>> Hi Andy, thanks for your answers.
>>
>> On Fri, Mar 18, 2016 at 11:43 AM, Andy Seaborne <a...@apache.org> wrote:
>>
>>> Hi,
>>>
>>> It will depend on usage patterns. 2 * 500 million isn't unreasonable, but
>>> validating with your expected usage is essential.
>>> The critical factors are the usage patterns and the hardware available.
>>> Number of queries, query complexity, and number of updates all matter.
>>> RAM is good (which is true for any database), as are SSDs if you do
>>> lots of updates or need fast startup from cold.
>>
>> What kinds of usage patterns are considered not valid for big triple
>> stores?
>> We are planning to use our Fuseki server to allow machine-to-machine
>> communication and also to allow independent users to express mostly
>> spatial queries. We plan to do indexing and have a query timeout too.
>> Is that enough to address performance issues?
>
> They are a good idea. It will protect the server.
>
> It is possible to write SPARQL queries which are fundamentally expensive.
>
>> The TDB will need to get updated daily, using the Jena API, since I
>> suppose deleting and inserting everything back would take a long time.
>> I read in (
>> https://lists.w3.org/Archives/Public/public-sparql-dev/2008JulSep/0029.html
>> ) that it takes 5370 secs for 100M triples to be loaded into TDB, which
>> is good.
>> But here <https://www.w3.org/wiki/LargeTripleStores> it is said that it
>> took 36 hours to load 1.7B triples into TDB
>
> ... in 2008 ... with a spinning disk.
>
> 12k triples/s would be a bit slow nowadays.
>
> At large scale, tdbloader2 can be faster than tdbloader. You have to try
> with your data on your hardware - it isn't a simple yes/no question,
> unfortunately.
>
> tdbloader2 only loads from empty.
>
> tdbloader does not do anything special when loading a partial database.
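(For the daily-update route without the Java API, I assume the same could be done over HTTP with SPARQL Update against the Fuseki endpoint - a sketch, where the dataset name "ds", the host, and the triples are all placeholders for our actual setup:

```shell
# Push one day's delta to a hypothetical Fuseki update endpoint.
curl -X POST http://localhost:3030/ds/update \
     --data-urlencode 'update=
       DELETE DATA { <http://example.org/obs/1> <http://example.org/measures> "salinity" } ;
       INSERT DATA { <http://example.org/obs/1> <http://example.org/measures> "temperature" }'
```

That would let the loading job stay outside the server process entirely.)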
>> , which drives me towards the
>> daily updates rather than daily delete and insert.
>> How long would a 500 triple DB take to be loaded in an empty database?
>
> 500M?
>
> Just run
>
>     tdbloader --loc DB <data>
>
> and see what rate you get - I'd be interested in seeing the log. Every
> data set, every hardware set can be different. That's why it is hard to
> make any accurate predictions - just try it.
>
>     tdbloader --loc=DB <the_data>
>
> The pattern of the data makes a difference - LUBM loads very fast as it
> has a high triples-to-nodes ratio, so fewer bytes are being loaded. All
> triple stores report better figures on that data - a factor of x2 faster
> is common - but it's not typical data.
>
>     Andy
>
>>> Multiple requests, whether to the same service or a different service,
>>> are competing for the same machine resources. Fuseki runs requests
>>> independently and in parallel. There are per-database transactions
>>> supporting multiple, truly parallel readers.
>>>
>>>     Andy
>>
>> Many thanks,
>>
>> Alexandra
>>
>>> On 18/03/16 09:35, Alexandra Kokkinaki wrote:
>>>
>>>> Hi,
>>>>
>>>> After researching TDB performance with big data, I would still like
>>>> to know:
>>>> We have one Fuseki server exposing 2 SPARQL endpoints (2 million
>>>> triples each) as data services. We are planning to add one more, but
>>>> with big data (500 million triples).
>>>>
>>>> - For big data, is it better to use many installations of Fuseki
>>>>   server, or
>>>> - many data services under the same Fuseki server?
>>>>
>>>> Could Fuseki cope with two or more services with more than 500 million
>>>> triples each?
>>>>
>>>> How does Fuseki cope when it has to serve concurrent queries to the
>>>> different data services?
>>>>
>>>> Many thanks,
>>>>
>>>> Alexandra