On 2017-10-24 16:56, Lorenz Buehmann wrote:
> Hi,
> 
> minor comments inline:
> 
> 
> On 23.10.2017 22:44, George News wrote:
>> Hi Rob,
>>
>> Thanks for your really helpful comments.
>>
>> As you mention, the stack trace does not correspond to the example
>> query. I actually don't have the query that originated that stack
>> trace, as it happened in production and I was not logging every query.
>>
>> My idea with the previous email was to see if we could understand what
>> causes the memory issue by analysing the stack trace and then, with the
>> examples, work out how I can speed up the system.
>>
>> 1) Stack trace
>> I reached much the same conclusion as you. I have therefore tried to
>> implement HTTP streaming on my server, writing the result set directly
>> to an output stream. Although the elapsed time is quite similar to
>> generating a full string with the whole serialized result set, I guess
>> memory consumption should be lower (I don't know how to measure the
>> memory being used in real time).
> There are lots of Java profilers that could be used for that.
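
For reference, below is roughly the streaming approach I'm now testing,
as a minimal sketch assuming a JAX-RS/RESTEasy resource like ours (the
class and method names are made up, and dataset stands for our TDB
dataset). ResultSetFormatter writes each row to the response as it is
produced, so the whole result set never has to live on the heap as one
String:

import java.io.OutputStream;

import javax.ws.rs.core.Response;
import javax.ws.rs.core.StreamingOutput;

import org.apache.jena.query.Dataset;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.query.ResultSetFormatter;

public class StreamingQueryResource {

    // Streams SPARQL JSON results row by row instead of building one big String.
    public Response executeQuery(Dataset dataset, String queryString) {
        StreamingOutput body = (OutputStream out) -> {
            dataset.begin(ReadWrite.READ);
            try (QueryExecution qexec = QueryExecutionFactory.create(queryString, dataset)) {
                // Each binding is written to the response as soon as it is
                // produced; only one row at a time is materialised.
                ResultSetFormatter.outputAsJSON(out, qexec.execSelect());
            } finally {
                dataset.end();
            }
        };
        return Response.ok(body, "application/sparql-results+json").build();
    }
}

With this in place, a profiler run should show whether the big String
was really what pushed us over the limit.
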
>>
>> 2) Inference
>> The SPARQL example and the dataset were attached to help work out
>> whether inference could be the cause of such a big delay in getting the
>> responses. If you consider the format of the data and the query normal,
>> then we are on the right track.
>> We have made some tests with and without inference and, as expected,
>> the difference in performance is noticeable. However, the results are
>> not the expected ones: the subclass hierarchy behind rdf:type is not
>> taken into account, so some of the resources are not properly
>> identified.
>>
>> For instance, consider a resource description like:
>>
>> ...
>> {
>>     "@id" :
>> "http://virtual-testbed.org/1/R3081091219.temperature-ambient-sensor-0.quantity";,
>>     "@type" : "http://purl.org/iot/vocab/m3-lite#AirTemperature";
>>   },
>> ...
>>
>> We have modified it to:
>> ...
>> {
>>     "@id" :
>> "http://virtual-testbed.org/1/R3081091219.temperature-ambient-sensor-0.quantity";,
>>     "@type" : [ "http://purl.org/iot/vocab/m3-lite#AirTemperature";,
>>                 "http://purl.oclc.org/NET/ssnx/ssn#SensingDevice";
>>               ]
>>   },
>> ...
>>
>> This way we can ask for SensingDevice and for AirTemperature. But given
>> that in the ontology AirTemperature is a subclass of SensingDevice, and
>> taking into account that the ontology model is not used in the query
>> model, how can I infer the subClassOf relation? Do I have to manually
>> include "AirTemperature rdfs:subClassOf SensingDevice" in my resource
>> description? Isn't that the same as including the ontology model merged
>> with the data model (for instance by using a union) when launching the
>> SELECT query?
> Given that your model doesn't change over time, you could materialize
> some or all inferences and write them back to TDB. Depending on which
> kinds of inference you really need, this could even be done with a few
> SPARQL Update queries.
> Admittedly, this would increase the number of triples in your dataset
> and thus consume more disk space. On the other hand, inference no longer
> has to be done at query time, i.e. querying should be faster.
> 
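
If I understand the suggestion correctly, the following is a minimal
sketch of such a materialization (the TDB path is made up; it assumes
the ontology triples are loaded in the same graph as the data, and it
only touches the default graph, so our named graphs would need extra
GRAPH clauses):

import org.apache.jena.query.Dataset;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.tdb.TDBFactory;
import org.apache.jena.update.UpdateAction;

public class MaterializeSubclassTypes {
    public static void main(String[] args) {
        Dataset dataset = TDBFactory.createDataset("/data/tdb"); // made-up path
        dataset.begin(ReadWrite.WRITE);
        try {
            // For every resource, also assert every superclass of its types.
            // rdfs:subClassOf+ walks the whole hierarchy upwards, so a query
            // for ssn#SensingDevice would then also find the more specific ones.
            UpdateAction.parseExecute(
                "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n" +
                "INSERT { ?s a ?super }\n" +
                "WHERE  { ?s a ?type . ?type rdfs:subClassOf+ ?super }",
                dataset);
            dataset.commit();
        } finally {
            dataset.end();
        }
    }
}

For ad-hoc queries the same rdfs:subClassOf property path can also be
used directly in a SELECT, which avoids materializing anything at the
cost of walking the hierarchy at query time.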

I'm now having a deeper look at [1] in order to understand how to
extract the inferred triples. This way I will include them in the models
I store in TDB, and later instruct the users of the system on how to
query properly.
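
In case it is useful, the rough plan is sketched below, under the
assumption that an RDFS-level closure is enough (I still have to verify
that getDeductionsModel() captures everything the hybrid RDFS reasoner
can derive):

import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.Reasoner;
import org.apache.jena.reasoner.ReasonerRegistry;

public class InferredTriples {
    // Returns only the triples the reasoner derived, ready to be added
    // back to the stored model alongside the asserted ones.
    public static Model extract(Model ontology, Model data) {
        Reasoner reasoner = ReasonerRegistry.getRDFSReasoner().bindSchema(ontology);
        InfModel inf = ModelFactory.createInfModel(reasoner, data);
        inf.prepare(); // force the rule engine to run
        return inf.getDeductionsModel();
    }
}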

Hope that works ;)
Jorge

[1]: https://jena.apache.org/documentation/inference/

> 
> Cheers,
> 
> Lorenz
>>
>> Again, I really have to thank you guys. This list and the help you
>> provide are great. I hope I can pay it back sometime.
>>
>> Regards,
>> Jorge
>>
>>
>>
>> On 2017-10-23 17:33, Rob Vesse wrote:
>>> Note that attachments are generally stripped from Apache mailing
>>> lists; it is usually better to just cut and paste inline.
>>>
>>> Since you CC’d me directly I did get the attachments, the most
>>> interesting of which is the stack trace, included here for the benefit
>>> of the rest of the list with some analysis interspersed.
>>>
>>> default task-38
>>>   at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
>>>   at 
>>> org.apache.jena.ext.com.google.common.cache.LocalCache$Strength$1.referenceValue(Lorg/apache/jena/ext/com/google/common/cache/LocalCache$Segment;Lorg/apache/jena/ext/com/google/common/cache/LocalCache$ReferenceEntry;Ljava/lang/Object;I)Lorg/apache/jena/ext/com/google/common/cache/LocalCache$ValueReference;
>>>  (LocalCache.java:382)
>>>
>>> As I expected, it does appear to be in the cache where the stack trace
>>> originates. However, since this is the standard Google Guava cache,
>>> happily used in production installations across many different
>>> companies and projects far beyond Jena, I would suspect that something
>>> else is actually the root cause.
>>>
>>>   at 
>>> org.apache.jena.ext.com.google.common.cache.LocalCache$Segment.setValue(Lorg/apache/jena/ext/com/google/common/cache/LocalCache$ReferenceEntry;Ljava/lang/Object;Ljava/lang/Object;J)V
>>>  (LocalCache.java:2165)
>>>   at 
>>> org.apache.jena.ext.com.google.common.cache.LocalCache$Segment.put(Ljava/lang/Object;ILjava/lang/Object;Z)Ljava/lang/Object;
>>>  (LocalCache.java:2883)
>>>   at 
>>> org.apache.jena.ext.com.google.common.cache.LocalCache.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
>>>  (LocalCache.java:4149)
>>>   at 
>>> org.apache.jena.ext.com.google.common.cache.LocalCache$LocalManualCache.put(Ljava/lang/Object;Ljava/lang/Object;)V
>>>  (LocalCache.java:4754)
>>>   at 
>>> org.apache.jena.atlas.lib.cache.CacheGuava.put(Ljava/lang/Object;Ljava/lang/Object;)V
>>>  (CacheGuava.java:76)
>>>   at 
>>> org.apache.jena.tdb.store.nodetable.NodeTableCache.cacheUpdate(Lorg/apache/jena/graph/Node;Lorg/apache/jena/tdb/store/NodeId;)V
>>>  (NodeTableCache.java:207)
>>>   at 
>>> org.apache.jena.tdb.store.nodetable.NodeTableCache._retrieveNodeByNodeId(Lorg/apache/jena/tdb/store/NodeId;)Lorg/apache/jena/graph/Node;
>>>  (NodeTableCache.java:129)
>>>   at 
>>> org.apache.jena.tdb.store.nodetable.NodeTableCache.getNodeForNodeId(Lorg/apache/jena/tdb/store/NodeId;)Lorg/apache/jena/graph/Node;
>>>  (NodeTableCache.java:82)
>>>   at 
>>> org.apache.jena.tdb.store.nodetable.NodeTableWrapper.getNodeForNodeId(Lorg/apache/jena/tdb/store/NodeId;)Lorg/apache/jena/graph/Node;
>>>  (NodeTableWrapper.java:50)
>>>   at 
>>> org.apache.jena.tdb.store.nodetable.NodeTableInline.getNodeForNodeId(Lorg/apache/jena/tdb/store/NodeId;)Lorg/apache/jena/graph/Node;
>>>  (NodeTableInline.java:67)
>>>   at 
>>> org.apache.jena.tdb.store.nodetable.NodeTableWrapper.getNodeForNodeId(Lorg/apache/jena/tdb/store/NodeId;)Lorg/apache/jena/graph/Node;
>>>  (NodeTableWrapper.java:50)
>>>   at 
>>> org.apache.jena.tdb.solver.BindingTDB.get1(Lorg/apache/jena/sparql/core/Var;)Lorg/apache/jena/graph/Node;
>>>  (BindingTDB.java:122)
>>>   at 
>>> org.apache.jena.sparql.engine.binding.BindingBase.get(Lorg/apache/jena/sparql/core/Var;)Lorg/apache/jena/graph/Node;
>>>  (BindingBase.java:121)
>>>   at 
>>> org.apache.jena.sparql.expr.ExprLib.evalOrElse(Lorg/apache/jena/sparql/expr/Expr;Lorg/apache/jena/sparql/engine/binding/Binding;Lorg/apache/jena/sparql/function/FunctionEnv;Lorg/apache/jena/sparql/expr/NodeValue;)Lorg/apache/jena/sparql/expr/NodeValue;
>>>  (ExprLib.java:70)
>>>   at 
>>> org.apache.jena.sparql.expr.ExprLib.evalOrNull(Lorg/apache/jena/sparql/expr/Expr;Lorg/apache/jena/sparql/engine/binding/Binding;Lorg/apache/jena/sparql/function/FunctionEnv;)Lorg/apache/jena/sparql/expr/NodeValue;
>>>  (ExprLib.java:38)
>>>   at 
>>> org.apache.jena.sparql.expr.aggregate.AccumulatorExpr.accumulate(Lorg/apache/jena/sparql/engine/binding/Binding;Lorg/apache/jena/sparql/function/FunctionEnv;)V
>>>  (AccumulatorExpr.java:50)
>>>   at 
>>> org.apache.jena.sparql.engine.iterator.QueryIterGroup$1.initializeIterator()Ljava/util/Iterator;
>>>  (QueryIterGroup.java:111)
>>>
>>> This shows that you are using grouping; this wasn’t in the example
>>> query you sent, so this is not a stack trace associated with that
>>> specific query.
>>>
>>>   at org.apache.jena.atlas.iterator.IteratorDelayedInitialization.init()V 
>>> (IteratorDelayedInitialization.java:40)
>>>   at 
>>> org.apache.jena.atlas.iterator.IteratorDelayedInitialization.hasNext()Z 
>>> (IteratorDelayedInitialization.java:50)
>>>   at 
>>> org.apache.jena.sparql.engine.iterator.QueryIterPlainWrapper.hasNextBinding()Z
>>>  (QueryIterPlainWrapper.java:53)
>>>   at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext()Z 
>>> (QueryIteratorBase.java:114)
>>>   at 
>>> org.apache.jena.sparql.engine.iterator.QueryIterProcessBinding.hasNextBinding()Z
>>>  (QueryIterProcessBinding.java:66)
>>>   at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext()Z 
>>> (QueryIteratorBase.java:114)
>>>   at 
>>> org.apache.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding()Z 
>>> (QueryIterConvert.java:58)
>>>   at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext()Z 
>>> (QueryIteratorBase.java:114)
>>>   at 
>>> org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding()Z
>>>  (QueryIteratorWrapper.java:39)
>>>   at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext()Z 
>>> (QueryIteratorBase.java:114)
>>>   at 
>>> org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding()Z
>>>  (QueryIteratorWrapper.java:39)
>>>   at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext()Z 
>>> (QueryIteratorBase.java:114)
>>>   at org.apache.jena.sparql.engine.ResultSetStream.hasNext()Z 
>>> (ResultSetStream.java:74)
>>>   at org.apache.jena.sparql.engine.ResultSetCheckCondition.hasNext()Z 
>>> (ResultSetCheckCondition.java:55)
>>>   at 
>>> es.semantic.project.serialize.ResultSetSerializer.asJSON()Ljava/lang/String;
>>>  (ResultSetSerializer.java:161)
>>>
>>> This looks suspect to me: this is a method from your own code that
>>> produces a JSON string encoding a result set. Depending on the result
>>> set, the string could be very large and occupy a large portion of
>>> memory. Equally, if you are handling many requests in parallel, storing
>>> many otherwise reasonably sized strings could exhaust memory.
>>>
>>> The general practice is to stream results directly back to the client;
>>> the details of how you do that will be specific to the framework you
>>> are using. This avoids buffering the entire result set in memory, and
>>> it also explains some of your perceived performance issues, because
>>> your users are always forced to wait for the complete result set to be
>>> computed before getting any response.
>>>
>>>   at 
>>> es.semantic.project.serialize.Serializer.writeAs(Ljava/lang/String;)Ljava/lang/String;
>>>  (Serializer.java:98)
>>>   at 
>>> es.semantic.project.serialize.Serializer.writeAs(Ljavax/ws/rs/core/MediaType;)Ljava/lang/String;
>>>  (Serializer.java:69)
>>>   at 
>>> es.semantic.project.rest.QueryRestService.executeQuery(Les/semantic/project/storage/sql/SparqlQuery$Scope;Ljava/lang/String;Ljavax/ws/rs/core/Request;)Ljavax/ws/rs/core/Response;
>>>  (QueryRestService.java:340)
>>>   at 
>>> es.semantic.project.rest.QueryRestService$Proxy$_$$_WeldClientProxy.executeQuery(Les/semantic/project/storage/sql/SparqlQuery$Scope;Ljava/lang/String;Ljavax/ws/rs/core/Request;)Ljavax/ws/rs/core/Response;
>>>  (Unknown Source)
>>>   at 
>>> sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>>>  (Native Method)
>>>   at 
>>> sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>>>  (NativeMethodAccessorImpl.java:62)
>>>   at 
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>>>  (DelegatingMethodAccessorImpl.java:43)
>>>   at 
>>> java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>>>  (Method.java:498)
>>>   at 
>>> org.jboss.resteasy.core.MethodInjectorImpl.invoke(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;Ljava/lang/Object;)Ljava/lang/Object;
>>>  (MethodInjectorImpl.java:139)
>>>   at 
>>> org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;Ljava/lang/Object;)Lorg/jboss/resteasy/specimpl/BuiltResponse;
>>>  (ResourceMethodInvoker.java:295)
>>>   at 
>>> org.jboss.resteasy.core.ResourceMethodInvoker.invoke(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;Ljava/lang/Object;)Lorg/jboss/resteasy/specimpl/BuiltResponse;
>>>  (ResourceMethodInvoker.java:249)
>>>   at 
>>> org.jboss.resteasy.core.ResourceMethodInvoker.invoke(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;)Lorg/jboss/resteasy/specimpl/BuiltResponse;
>>>  (ResourceMethodInvoker.java:236)
>>>   at 
>>> org.jboss.resteasy.core.SynchronousDispatcher.invoke(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;Lorg/jboss/resteasy/core/ResourceInvoker;)V
>>>  (SynchronousDispatcher.java:395)
>>>   at 
>>> org.jboss.resteasy.core.SynchronousDispatcher.invoke(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;)V
>>>  (SynchronousDispatcher.java:202)
>>>   at 
>>> org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(Ljava/lang/String;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;Z)V
>>>  (ServletContainerDispatcher.java:221)
>>>   at 
>>> org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(Ljava/lang/String;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V
>>>  (HttpServletDispatcher.java:56)
>>>   at 
>>> org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V
>>>  (HttpServletDispatcher.java:51)
>>>   at 
>>> javax.servlet.http.HttpServlet.service(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V
>>>  (HttpServlet.java:790)
>>>   at 
>>> io.undertow.servlet.handlers.ServletHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>>  (ServletHandler.java:85)
>>>   at 
>>> io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V
>>>  (FilterHandler.java:129)
>>>   at 
>>> org.apache.logging.log4j.web.Log4jServletFilter.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Ljavax/servlet/FilterChain;)V
>>>  (Log4jServletFilter.java:71)
>>>   at 
>>> io.undertow.servlet.core.ManagedFilter.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Ljavax/servlet/FilterChain;)V
>>>  (ManagedFilter.java:60)
>>>   at 
>>> io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V
>>>  (FilterHandler.java:131)
>>>   at 
>>> io.undertow.servlet.handlers.FilterHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>>  (FilterHandler.java:84)
>>>   at 
>>> io.undertow.servlet.handlers.security.ServletSecurityRoleHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>>  (ServletSecurityRoleHandler.java:62)
>>>   at 
>>> io.undertow.servlet.handlers.ServletDispatchingHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>>  (ServletDispatchingHandler.java:36)
>>>   at 
>>> org.wildfly.extension.undertow.security.SecurityContextAssociationHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>>  (SecurityContextAssociationHandler.java:78)
>>>   at 
>>> io.undertow.server.handlers.PredicateHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>>  (PredicateHandler.java:43)
>>>   at 
>>> io.undertow.servlet.handlers.security.SSLInformationAssociationHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>>  (SSLInformationAssociationHandler.java:131)
>>>   at 
>>> io.undertow.servlet.handlers.security.ServletAuthenticationCallHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>>  (ServletAuthenticationCallHandler.java:57)
>>>   at 
>>> io.undertow.server.handlers.PredicateHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>>  (PredicateHandler.java:43)
>>>   at 
>>> io.undertow.security.handlers.AbstractConfidentialityHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>>  (AbstractConfidentialityHandler.java:46)
>>>   at 
>>> io.undertow.servlet.handlers.security.ServletConfidentialityConstraintHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>>  (ServletConfidentialityConstraintHandler.java:64)
>>>   at 
>>> io.undertow.security.handlers.AuthenticationMechanismsHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>>  (AuthenticationMechanismsHandler.java:60)
>>>   at 
>>> io.undertow.servlet.handlers.security.CachedAuthenticatedSessionHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>>  (CachedAuthenticatedSessionHandler.java:77)
>>>   at 
>>> io.undertow.security.handlers.NotificationReceiverHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>>  (NotificationReceiverHandler.java:50)
>>>   at 
>>> io.undertow.security.handlers.AbstractSecurityContextAssociationHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>>  (AbstractSecurityContextAssociationHandler.java:43)
>>>   at 
>>> io.undertow.server.handlers.PredicateHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>>  (PredicateHandler.java:43)
>>>   at 
>>> org.wildfly.extension.undertow.security.jacc.JACCContextIdHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>>  (JACCContextIdHandler.java:61)
>>>   at 
>>> io.undertow.server.handlers.PredicateHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>>  (PredicateHandler.java:43)
>>>   at 
>>> io.undertow.server.handlers.PredicateHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>>  (PredicateHandler.java:43)
>>>   at 
>>> io.undertow.servlet.handlers.ServletInitialHandler.handleFirstRequest(Lio/undertow/server/HttpServerExchange;Lio/undertow/servlet/handlers/ServletChain;Lio/undertow/servlet/handlers/ServletRequestContext;Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V
>>>  (ServletInitialHandler.java:284)
>>>   at 
>>> io.undertow.servlet.handlers.ServletInitialHandler.dispatchRequest(Lio/undertow/server/HttpServerExchange;Lio/undertow/servlet/handlers/ServletRequestContext;Lio/undertow/servlet/handlers/ServletChain;Ljavax/servlet/DispatcherType;)V
>>>  (ServletInitialHandler.java:263)
>>>   at 
>>> io.undertow.servlet.handlers.ServletInitialHandler.access$000(Lio/undertow/servlet/handlers/ServletInitialHandler;Lio/undertow/server/HttpServerExchange;Lio/undertow/servlet/handlers/ServletRequestContext;Lio/undertow/servlet/handlers/ServletChain;Ljavax/servlet/DispatcherType;)V
>>>  (ServletInitialHandler.java:81)
>>>   at 
>>> io.undertow.servlet.handlers.ServletInitialHandler$1.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>>  (ServletInitialHandler.java:174)
>>>   at 
>>> io.undertow.server.Connectors.executeRootHandler(Lio/undertow/server/HttpHandler;Lio/undertow/server/HttpServerExchange;)V
>>>  (Connectors.java:202)
>>>   at io.undertow.server.HttpServerExchange$1.run()V 
>>> (HttpServerExchange.java:793)
>>>   at 
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V
>>>  (ThreadPoolExecutor.java:1142)
>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run()V 
>>> (ThreadPoolExecutor.java:617)
>>>   at java.lang.Thread.run()V (Thread.java:745)
>>>
>>> Nothing looks obviously wrong with either your query or your sample
>>> data, though as I noted you didn’t provide the exact query that
>>> triggered this particular stack trace.
>>>
>>> Rob
>>>
>>> On 23/10/2017 15:59, "George News" <george.n...@gmx.net> wrote:
>>>
>>>     On 2017-10-11 15:47, Rob Vesse wrote:
>>>     > Comments inline:
>>>     > 
>>>     > On 11/10/2017 11:57, "George News" <george.n...@gmx.net> wrote:
>>>     > 
>>>     >     Hi all,
>>>     >     
>>>     >     The project I'm working on currently has a TDB with
>>>     >     approximately 100M triples, and the size is increasing quite
>>>     >     quickly. When I make a typical SPARQL query to get data from
>>>     >     the system, it takes ages, sometimes more than 10-20 minutes.
>>>     >     Performance-wise this is not really user friendly, so I need
>>>     >     to know how I can increase the speed, etc.
>>>     
>>>     We have made a tdbdump of the current TDB, and for the figures we
>>>     pointed out the size is about 70GB in RDF/XML format.
>>>     
>>>     >     I'm running the whole system on a machine with an Intel Xeon
>>>     >     E312xx and 32GB of RAM, and many times I'm getting
>>>     >     OutOfMemory exceptions; the Google cache that Jena uses seems
>>>     >     to be the one causing the problem.
>>>     > 
>>>     >  Specific stack traces would be useful to understand where the
>>>     > cache is being exploded. Certain kinds of query may use the cache
>>>     > more heavily than others, so some elaboration on the general
>>>     > construction of your queries would be interesting.
>>>     
>>>     Find the ExceptionStackTrace.txt file attached as an example. Most
>>>     of the time the error is quite similar.
>>>     
>>>     >     
>>>     >     Are the figures I'm pointing out normal (machine specs,
>>>     >     response time, etc.)? Is it too big or too small?
>>>     > 
>>>     >  The size of the data seems small relative to the size of the
>>>     > machine. You don’t specify whether you change the JVM heap size;
>>>     > most memory usage in TDB is off-heap via memory-mapped files, so
>>>     > setting too large a heap can negatively impact performance.
>>>     > 
>>>     >  The response times seem very poor, but that may be down to the
>>>     > nature of your queries and data structure; however, since you are
>>>     > unable to show those, we can only provide generalisations.
>>>     >
>>>     >     For the moment, we have decided to split the graph into
>>>     >     pieces, that is, generating a new named graph every now and
>>>     >     then so that the amount of information stored in the
>>>     >     "current" graph is smaller. Restricting the query to a set of
>>>     >     graphs then makes things work better.
>>>     >     
>>>     >     Although this solution works, when we merge the graphs for
>>>     >     historical queries we face the same problem as before. How,
>>>     >     then, can we increase the speed?
>>>     >     
>>>     >     I cannot disclose the dataset or part of it, but I will try
>>>     >     to explain it somehow.
>>>     >     
>>>     >     - IDs for entities are approximately 255 random ASCII
>>>     >     characters. Does the size of the IDs affect the speed of the
>>>     >     SPARQL queries? If so, can I apply a Lucene index to the IDs
>>>     >     in order to reduce the query time?
>>>     > 
>>>     >  It depends on the nature of the query. All terms are mapped
>>>     > into 64-bit internal identifiers; these are only mapped back to
>>>     > the original terms as and when the query engine and/or results
>>>     > serialisation requires it. A cache is used to speed up the
>>>     > mapping in both directions, so depending on the nature of the
>>>     > queries and your system load you may be thrashing this cache.
>>>     >     
>>>     >     - The depth of the graph, i.e. of the information
>>>     >     relationships, is around 7-8 levels at most, but most of the
>>>     >     time only 3-4 levels need to be linked.
>>>     > 
>>>     >   Difficult to say how this impacts performance, because it
>>>     > really depends on how you are querying that structure.
>>>     >     
>>>     >     - Most of the queries include several patterns like:
>>>     >     ?x myont:hasattribute ?b.
>>>     >     ?a rdf:type ?b.
>>>     >     
>>>     >     They therefore check the class and subclasses of entities. Is
>>>     >     there any way to speed up the inference, so that when I ask
>>>     >     for the parent class I also get the child classes defined in
>>>     >     my ontology?
>>>     > 
>>>     > So are you actively using inference? If you are, that will
>>>     > significantly degrade performance, because the inference closure
>>>     > is done entirely in memory, i.e. not in TDB, and with inference
>>>     > turned on you will get minimal performance benefit from using
>>>     > TDB.
>>>     > 
>>>     >  If you only need simple inference like class and property
>>>     > hierarchies, you may be better served by asserting those
>>>     > statically using SPARQL Updates and not using dynamic inference.
>>>     
>>>     Sorry for the delay in providing examples of the data and the
>>>     SPARQL queries we usually make.
>>>     
>>>     The data we are using follows the ontology that is publicly
>>>     available at [1].
>>>     
>>>     Using this ontology, a sample semantic document in JSON-LD format
>>>     can be found in the attached files (Observation.jsonld and
>>>     Resource.jsonld). These individuals are stored in a TDB using Jena,
>>>     in different graphs that can be merged in code using MultiUnion in
>>>     order to make queries. We then request data using SPARQL SELECT
>>>     queries that require quite a lot of inference (SPARQL.txt).
>>>     
>>>     As you suggested, we have made some tries at including more
>>>     properties (mainly rdf:type) in the individual descriptions in
>>>     order to avoid inference at request time. For instance, whenever I
>>>     register an m3-lite#AirThermometer I always also state that it is
>>>     an ssn#SensingDevice. This way the device can easily be discovered
>>>     both by its more descriptive name and by its generic one.
>>>     
>>>     However, with the same SPARQL sentences the results are not the
>>>     expected ones, and I have to create specific SPARQL queries to
>>>     properly discover the data. Is this the way you suggested to work?
>>>     Should I then inform the users of our system about the way we are
>>>     registering data?
>>>     
>>>     [1]: http://ontology.fiesta-iot.eu/ontologyDocs/fiesta-iot/doc
>>>     
>>>     
>>>     >     - I know the "." in a query acts more or less like a logical
>>>     >     AND operation. Does the order of the patterns have
>>>     >     implications for performance? Should I start with the most
>>>     >     restrictive ones? Should I start with the simplest ones, i.e.
>>>     >     checking number values, etc.?
>>>     > 
>>>     >  Yes and no. TDB will attempt to do the necessary scans in an
>>>     > optimal order based on its knowledge of the statistics of the
>>>     > data. However, this only applies within a single query pattern,
>>>     > i.e. { }, so depending on the structure of your query you may
>>>     > need to do some manual reordering. Also, if inference is
>>>     > involved, that may interact.
>>>     >     
>>>     >     - Some of the queries use spatial and time filtering. Is it
>>>     >     worth implementing support for spatial searches with SPARQL?
>>>     >     Is there any kind of index for time searches?
>>>     > 
>>>     >  There is a geospatial indexing extension, but there is no
>>>     > temporal indexing provided by Jena.
>>>     
>>>     As you can see from Resource.jsonld, we are using location. Do you
>>>     think indexing will help in locating the individuals?
>>>     >     
>>>     >     Any help is more than welcome.
>>>     > 
>>>     >  Without more detail it is difficult to provide more detailed help.
>>>     > 
>>>     > Rob
>>>     >     
>>>     >     Regards,
>>>     >     Jorge
>>>     >     
>>>     > 
>>>     > 
>>>     > 
>>>     > 
>>>     > 
>>>     
>>>
> 
> 
