On 2017-10-24 16:56, Lorenz Buehmann wrote:
> Hi,
>
> minor comments inline:
>
> On 23.10.2017 22:44, George News wrote:
>> Hi Rob,
>>
>> Thanks for your really helpful comments.
>>
>> As you mention, the stack trace is not the one of the query. I
>> actually don't have the query that originated that stack trace, as
>> this was in production and I was not logging every query.
>>
>> My idea with the previous email was to see if we can understand what
>> causes the memory issue by analysing the stack trace and then, with
>> the examples, understand how I can speed up the system.
>>
>> 1) Stack trace
>> I almost reached the same conclusion as you. In this sense I have
>> tried to implement HTTP streaming on my server and write the result
>> set directly to an output stream. Although time-wise the performance
>> is quite similar to generating a full string with the whole
>> serialized result set, I guess in terms of memory consumption it
>> should be better (I don't know how to measure the memory being used
>> in real time).
> There are lots of Java profilers that could be used.
>>
>> 2) Inference
>> The SPARQL example and the dataset were attached to try to understand
>> whether inference could be the cause of such a big delay in getting
>> the responses. If you consider the format of the data and the query
>> normal, then we are working well.
>> We have made some tests with and without inference and, as expected,
>> the difference in performance is noticeable. However, the results are
>> not the expected ones, as the rdf:type subclass hierarchy is not
>> considered and some of the resources are not properly identified.
>>
>> In this sense, consider for instance a resource description like:
>>
>> ...
>> {
>>   "@id" : "http://virtual-testbed.org/1/R3081091219.temperature-ambient-sensor-0.quantity",
>>   "@type" : "http://purl.org/iot/vocab/m3-lite#AirTemperature"
>> },
>> ...
>>
>> We have modified it to:
>> ...
>> {
>>   "@id" : "http://virtual-testbed.org/1/R3081091219.temperature-ambient-sensor-0.quantity",
>>   "@type" : [ "http://purl.org/iot/vocab/m3-lite#AirTemperature",
>>               "http://purl.oclc.org/NET/ssnx/ssn#SensingDevice" ]
>> },
>> ...
>>
>> This way we can ask for SensingDevice and for AirTemperature. But
>> considering that in the ontology AirTemperature is a subclass of
>> SensingDevice, and taking into account that the ontology model is not
>> used in the query model, how can I infer the subClassOf relation? Do
>> I have to manually include "AirTemperature rdfs:subClassOf
>> SensingDevice" in my resource description? Isn't that the same as
>> including the ontology model merged with the data model (for instance
>> by using a union) when launching the select query?
> Given that your model doesn't change over time, you could materialize
> some/all inferences and write them back to TDB. Depending on which
> kind of inferences you really need, this could even be done by some
> SPARQL Update queries.
> Indeed, this would increase the number of triples in your dataset and
> thus consume more disk space. On the other hand, inference does not
> have to be done at query time, i.e. querying should be faster.
>
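Such a materialization update might look like the following sketch (the prefixes are illustrative, and `rdfs:subClassOf+` walks the whole class hierarchy transitively):

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Assert, for every typed resource, all of its superclasses as well,
# so that subsequent queries need no inference-enabled model.
INSERT { ?s rdf:type ?super }
WHERE {
  ?s rdf:type ?sub .
  ?sub rdfs:subClassOf+ ?super .
}
```

This assumes the ontology (with its rdfs:subClassOf axioms) is loaded into the same model the update runs against, e.g. via a union with the data graphs.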
I'm now having a deeper look at [1] in order to understand how to
extract the inferred triples. This way I will include them in the
models I store in TDB and later on instruct the users of the system on
how to query properly. Hope that works ;)

Jorge

[1]: https://jena.apache.org/documentation/inference/

> Cheers,
> Lorenz
>>
>> Again I have to really thank you guys. This list and the help you
>> provide are great. I hope sometime I can pay it back.
>>
>> Regards,
>> Jorge
>>
>> On 2017-10-23 17:33, Rob Vesse wrote:
>>> Note that attachments are generally stripped from Apache mailing
>>> lists; it is usually better to just cut and paste inline.
>>>
>>> Since you CC’d me directly I did get the attachments, the most
>>> interesting of which is the stack trace, included here for the
>>> benefit of the rest of the list with some analysis interspersed.
>>>
>>> default task-38
>>> at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
>>> at org.apache.jena.ext.com.google.common.cache.LocalCache$Strength$1.referenceValue(Lorg/apache/jena/ext/com/google/common/cache/LocalCache$Segment;Lorg/apache/jena/ext/com/google/common/cache/LocalCache$ReferenceEntry;Ljava/lang/Object;I)Lorg/apache/jena/ext/com/google/common/cache/LocalCache$ValueReference; (LocalCache.java:382)
>>>
>>> As I expected, the stack trace does appear to originate in the
>>> cache. However, since this is the standard Google Guava cache that
>>> is happily used in many production installations across many
>>> different companies and projects far beyond Jena, I would suspect
>>> that something else is actually the root cause.
>>> at
>>> >>> org.apache.jena.ext.com.google.common.cache.LocalCache$Segment.setValue(Lorg/apache/jena/ext/com/google/common/cache/LocalCache$ReferenceEntry;Ljava/lang/Object;Ljava/lang/Object;J)V >>> (LocalCache.java:2165) >>> at >>> org.apache.jena.ext.com.google.common.cache.LocalCache$Segment.put(Ljava/lang/Object;ILjava/lang/Object;Z)Ljava/lang/Object; >>> (LocalCache.java:2883) >>> at >>> org.apache.jena.ext.com.google.common.cache.LocalCache.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; >>> (LocalCache.java:4149) >>> at >>> org.apache.jena.ext.com.google.common.cache.LocalCache$LocalManualCache.put(Ljava/lang/Object;Ljava/lang/Object;)V >>> (LocalCache.java:4754) >>> at >>> org.apache.jena.atlas.lib.cache.CacheGuava.put(Ljava/lang/Object;Ljava/lang/Object;)V >>> (CacheGuava.java:76) >>> at >>> org.apache.jena.tdb.store.nodetable.NodeTableCache.cacheUpdate(Lorg/apache/jena/graph/Node;Lorg/apache/jena/tdb/store/NodeId;)V >>> (NodeTableCache.java:207) >>> at >>> org.apache.jena.tdb.store.nodetable.NodeTableCache._retrieveNodeByNodeId(Lorg/apache/jena/tdb/store/NodeId;)Lorg/apache/jena/graph/Node; >>> (NodeTableCache.java:129) >>> at >>> org.apache.jena.tdb.store.nodetable.NodeTableCache.getNodeForNodeId(Lorg/apache/jena/tdb/store/NodeId;)Lorg/apache/jena/graph/Node; >>> (NodeTableCache.java:82) >>> at >>> org.apache.jena.tdb.store.nodetable.NodeTableWrapper.getNodeForNodeId(Lorg/apache/jena/tdb/store/NodeId;)Lorg/apache/jena/graph/Node; >>> (NodeTableWrapper.java:50) >>> at >>> org.apache.jena.tdb.store.nodetable.NodeTableInline.getNodeForNodeId(Lorg/apache/jena/tdb/store/NodeId;)Lorg/apache/jena/graph/Node; >>> (NodeTableInline.java:67) >>> at >>> org.apache.jena.tdb.store.nodetable.NodeTableWrapper.getNodeForNodeId(Lorg/apache/jena/tdb/store/NodeId;)Lorg/apache/jena/graph/Node; >>> (NodeTableWrapper.java:50) >>> at >>> org.apache.jena.tdb.solver.BindingTDB.get1(Lorg/apache/jena/sparql/core/Var;)Lorg/apache/jena/graph/Node; >>> (BindingTDB.java:122) 
>>> at >>> org.apache.jena.sparql.engine.binding.BindingBase.get(Lorg/apache/jena/sparql/core/Var;)Lorg/apache/jena/graph/Node; >>> (BindingBase.java:121) >>> at >>> org.apache.jena.sparql.expr.ExprLib.evalOrElse(Lorg/apache/jena/sparql/expr/Expr;Lorg/apache/jena/sparql/engine/binding/Binding;Lorg/apache/jena/sparql/function/FunctionEnv;Lorg/apache/jena/sparql/expr/NodeValue;)Lorg/apache/jena/sparql/expr/NodeValue; >>> (ExprLib.java:70) >>> at >>> org.apache.jena.sparql.expr.ExprLib.evalOrNull(Lorg/apache/jena/sparql/expr/Expr;Lorg/apache/jena/sparql/engine/binding/Binding;Lorg/apache/jena/sparql/function/FunctionEnv;)Lorg/apache/jena/sparql/expr/NodeValue; >>> (ExprLib.java:38) >>> at >>> org.apache.jena.sparql.expr.aggregate.AccumulatorExpr.accumulate(Lorg/apache/jena/sparql/engine/binding/Binding;Lorg/apache/jena/sparql/function/FunctionEnv;)V >>> (AccumulatorExpr.java:50) >>> at >>> org.apache.jena.sparql.engine.iterator.QueryIterGroup$1.initializeIterator()Ljava/util/Iterator; >>> (QueryIterGroup.java:111) >>> >>> This shows that you are using grouping, this wasn’t in the example query >>> you sent so this is not a stack trace associated with that specific query. 
>>> >>> at org.apache.jena.atlas.iterator.IteratorDelayedInitialization.init()V >>> (IteratorDelayedInitialization.java:40) >>> at >>> org.apache.jena.atlas.iterator.IteratorDelayedInitialization.hasNext()Z >>> (IteratorDelayedInitialization.java:50) >>> at >>> org.apache.jena.sparql.engine.iterator.QueryIterPlainWrapper.hasNextBinding()Z >>> (QueryIterPlainWrapper.java:53) >>> at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext()Z >>> (QueryIteratorBase.java:114) >>> at >>> org.apache.jena.sparql.engine.iterator.QueryIterProcessBinding.hasNextBinding()Z >>> (QueryIterProcessBinding.java:66) >>> at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext()Z >>> (QueryIteratorBase.java:114) >>> at >>> org.apache.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding()Z >>> (QueryIterConvert.java:58) >>> at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext()Z >>> (QueryIteratorBase.java:114) >>> at >>> org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding()Z >>> (QueryIteratorWrapper.java:39) >>> at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext()Z >>> (QueryIteratorBase.java:114) >>> at >>> org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding()Z >>> (QueryIteratorWrapper.java:39) >>> at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext()Z >>> (QueryIteratorBase.java:114) >>> at org.apache.jena.sparql.engine.ResultSetStream.hasNext()Z >>> (ResultSetStream.java:74) >>> at org.apache.jena.sparql.engine.ResultSetCheckCondition.hasNext()Z >>> (ResultSetCheckCondition.java:55) >>> at >>> es.semantic.project.serialize.ResultSetSerializer.asJSON()Ljava/lang/String; >>> (ResultSetSerializer.java:161) >>> >>> This looks suspect to me, this is a method from your own code that produces >>> a JSON string that encodes a result set. Depending on the results set the >>> string could be very large and occupy a large portion of memory. 
>>> Or equally, if you are handling many requests in parallel, storing
>>> many otherwise reasonably sized strings could exhaust memory.
>>>
>>> The general practice is to stream results directly back to the
>>> client; the details of how you do that will be specific to the
>>> framework you are using. This avoids buffering the entire results in
>>> memory, which will also explain some of your perceived performance
>>> issues, because your users are always forced to wait for the
>>> complete result set to be calculated before getting any response.
>>>
>>> at es.semantic.project.serialize.Serializer.writeAs(Ljava/lang/String;)Ljava/lang/String; (Serializer.java:98)
>>> at es.semantic.project.serialize.Serializer.writeAs(Ljavax/ws/rs/core/MediaType;)Ljava/lang/String; (Serializer.java:69)
>>> at es.semantic.project.rest.QueryRestService.executeQuery(Les/semantic/project/storage/sql/SparqlQuery$Scope;Ljava/lang/String;Ljavax/ws/rs/core/Request;)Ljavax/ws/rs/core/Response; (QueryRestService.java:340)
>>> at es.semantic.project.rest.QueryRestService$Proxy$_$$_WeldClientProxy.executeQuery(Les/semantic/project/storage/sql/SparqlQuery$Scope;Ljava/lang/String;Ljavax/ws/rs/core/Request;)Ljavax/ws/rs/core/Response; (Unknown Source)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (NativeMethodAccessorImpl.java:62)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Method.java:498)
>>> at
org.jboss.resteasy.core.MethodInjectorImpl.invoke(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;Ljava/lang/Object;)Ljava/lang/Object; >>> (MethodInjectorImpl.java:139) >>> at >>> org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;Ljava/lang/Object;)Lorg/jboss/resteasy/specimpl/BuiltResponse; >>> (ResourceMethodInvoker.java:295) >>> at >>> org.jboss.resteasy.core.ResourceMethodInvoker.invoke(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;Ljava/lang/Object;)Lorg/jboss/resteasy/specimpl/BuiltResponse; >>> (ResourceMethodInvoker.java:249) >>> at >>> org.jboss.resteasy.core.ResourceMethodInvoker.invoke(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;)Lorg/jboss/resteasy/specimpl/BuiltResponse; >>> (ResourceMethodInvoker.java:236) >>> at >>> org.jboss.resteasy.core.SynchronousDispatcher.invoke(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;Lorg/jboss/resteasy/core/ResourceInvoker;)V >>> (SynchronousDispatcher.java:395) >>> at >>> org.jboss.resteasy.core.SynchronousDispatcher.invoke(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;)V >>> (SynchronousDispatcher.java:202) >>> at >>> org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(Ljava/lang/String;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;Z)V >>> (ServletContainerDispatcher.java:221) >>> at >>> org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(Ljava/lang/String;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V >>> (HttpServletDispatcher.java:56) >>> at >>> org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V >>> (HttpServletDispatcher.java:51) >>> at >>> 
javax.servlet.http.HttpServlet.service(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V >>> (HttpServlet.java:790) >>> at >>> io.undertow.servlet.handlers.ServletHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V >>> (ServletHandler.java:85) >>> at >>> io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V >>> (FilterHandler.java:129) >>> at >>> org.apache.logging.log4j.web.Log4jServletFilter.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Ljavax/servlet/FilterChain;)V >>> (Log4jServletFilter.java:71) >>> at >>> io.undertow.servlet.core.ManagedFilter.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Ljavax/servlet/FilterChain;)V >>> (ManagedFilter.java:60) >>> at >>> io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V >>> (FilterHandler.java:131) >>> at >>> io.undertow.servlet.handlers.FilterHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V >>> (FilterHandler.java:84) >>> at >>> io.undertow.servlet.handlers.security.ServletSecurityRoleHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V >>> (ServletSecurityRoleHandler.java:62) >>> at >>> io.undertow.servlet.handlers.ServletDispatchingHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V >>> (ServletDispatchingHandler.java:36) >>> at >>> org.wildfly.extension.undertow.security.SecurityContextAssociationHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V >>> (SecurityContextAssociationHandler.java:78) >>> at >>> io.undertow.server.handlers.PredicateHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V >>> (PredicateHandler.java:43) >>> at >>> io.undertow.servlet.handlers.security.SSLInformationAssociationHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V >>> (SSLInformationAssociationHandler.java:131) >>> at >>> 
io.undertow.servlet.handlers.security.ServletAuthenticationCallHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V >>> (ServletAuthenticationCallHandler.java:57) >>> at >>> io.undertow.server.handlers.PredicateHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V >>> (PredicateHandler.java:43) >>> at >>> io.undertow.security.handlers.AbstractConfidentialityHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V >>> (AbstractConfidentialityHandler.java:46) >>> at >>> io.undertow.servlet.handlers.security.ServletConfidentialityConstraintHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V >>> (ServletConfidentialityConstraintHandler.java:64) >>> at >>> io.undertow.security.handlers.AuthenticationMechanismsHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V >>> (AuthenticationMechanismsHandler.java:60) >>> at >>> io.undertow.servlet.handlers.security.CachedAuthenticatedSessionHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V >>> (CachedAuthenticatedSessionHandler.java:77) >>> at >>> io.undertow.security.handlers.NotificationReceiverHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V >>> (NotificationReceiverHandler.java:50) >>> at >>> io.undertow.security.handlers.AbstractSecurityContextAssociationHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V >>> (AbstractSecurityContextAssociationHandler.java:43) >>> at >>> io.undertow.server.handlers.PredicateHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V >>> (PredicateHandler.java:43) >>> at >>> org.wildfly.extension.undertow.security.jacc.JACCContextIdHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V >>> (JACCContextIdHandler.java:61) >>> at >>> io.undertow.server.handlers.PredicateHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V >>> (PredicateHandler.java:43) >>> at >>> io.undertow.server.handlers.PredicateHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V >>> 
(PredicateHandler.java:43) >>> at >>> io.undertow.servlet.handlers.ServletInitialHandler.handleFirstRequest(Lio/undertow/server/HttpServerExchange;Lio/undertow/servlet/handlers/ServletChain;Lio/undertow/servlet/handlers/ServletRequestContext;Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V >>> (ServletInitialHandler.java:284) >>> at >>> io.undertow.servlet.handlers.ServletInitialHandler.dispatchRequest(Lio/undertow/server/HttpServerExchange;Lio/undertow/servlet/handlers/ServletRequestContext;Lio/undertow/servlet/handlers/ServletChain;Ljavax/servlet/DispatcherType;)V >>> (ServletInitialHandler.java:263) >>> at >>> io.undertow.servlet.handlers.ServletInitialHandler.access$000(Lio/undertow/servlet/handlers/ServletInitialHandler;Lio/undertow/server/HttpServerExchange;Lio/undertow/servlet/handlers/ServletRequestContext;Lio/undertow/servlet/handlers/ServletChain;Ljavax/servlet/DispatcherType;)V >>> (ServletInitialHandler.java:81) >>> at >>> io.undertow.servlet.handlers.ServletInitialHandler$1.handleRequest(Lio/undertow/server/HttpServerExchange;)V >>> (ServletInitialHandler.java:174) >>> at >>> io.undertow.server.Connectors.executeRootHandler(Lio/undertow/server/HttpHandler;Lio/undertow/server/HttpServerExchange;)V >>> (Connectors.java:202) >>> at io.undertow.server.HttpServerExchange$1.run()V >>> (HttpServerExchange.java:793) >>> at >>> java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V >>> (ThreadPoolExecutor.java:1142) >>> at java.util.concurrent.ThreadPoolExecutor$Worker.run()V >>> (ThreadPoolExecutor.java:617) >>> at java.lang.Thread.run()V (Thread.java:745) >>> >>> Nothing looks obviously wrong with either your query or your sample data >>> though as I noted you didn’t provide the exact query that triggered this >>> particular stack trace >>> >>> Rob >>> >>> On 23/10/2017 15:59, "George News" <george.n...@gmx.net> wrote: >>> >>> On 2017-10-11 15:47, Rob Vesse wrote: >>> > Comments inline: >>> > >>> > 
On 11/10/2017 11:57, "George News" <george.n...@gmx.net> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > The project I'm working on currently has a TDB with approximately
>>> > 100M triples and the size is increasing quite quickly. When I
>>> > make a typical SPARQL query for getting data from the system, it
>>> > takes ages, sometimes more than 10-20 minutes. I think
>>> > performance-wise this is not really user friendly. Therefore I
>>> > need to know how I can increase the speed, etc.
>>>
>>> We have made a tdbdump of the current TDB and the size for the
>>> figures we pointed out is about 70 GB in RDF/XML format.
>>>
>>> > I'm running the whole system on a machine with an Intel Xeon
>>> > E312xx with 32 GB RAM and many times I'm getting OutOfMemory
>>> > exceptions; the Google cache that Jena uses seems to be the one
>>> > causing the problem.
>>> >
>>> > Specific stack traces would be useful to understand where the
>>> > cache is being blown up. Certain kinds of query may use the cache
>>> > more heavily than others, so some elaboration on the general
>>> > construction of your queries would be interesting.
>>>
>>> Find the attached ExceptionStackTrace.txt file as an example. Most
>>> of the time the error is quite similar.
>>>
>>> > Are the figures I'm pointing at normal (machine specs, response
>>> > time, etc.)? Is it too big/too small?
>>> >
>>> > The size of the data seems small relative to the size of the
>>> > machine. You don't specify whether you change the JVM heap size;
>>> > most memory usage in TDB is off-heap via memory-mapped files, so
>>> > setting too large a heap can negatively impact performance.
>>> >
>>> > The response times seem very poor, but that may be the nature of
>>> > your queries and data structure; however, since you are unable to
>>> > show those, we can only provide generalisations.
>>> >
>>> > For the moment, we have decided to split the graph into pieces,
>>> > that is, generating a new named graph every now and then so the
>>> > amount of information stored in a "current" graph is smaller.
>>> > Then, restricting the query to a set of graphs, things work
>>> > better.
>>> >
>>> > Although this solution works, when we merge the graphs for
>>> > historical queries we are facing the same problem as before. So,
>>> > how can we increase the speed?
>>> >
>>> > I cannot disclose the dataset or part of it, but I will try to
>>> > somehow explain it.
>>> >
>>> > - Ids for entities are approximately 255 random ASCII characters.
>>> > Does the size of the ids affect the speed of the SPARQL queries?
>>> > If yes, can I apply a Lucene index to the IDs in order to reduce
>>> > the query time?
>>> >
>>> > It depends on the nature of the query. All terms are mapped into
>>> > 64-bit internal identifiers; these are only mapped back to the
>>> > original terms as and when the query engine and/or results
>>> > serialisation requires it. A cache is used to speed up the
>>> > mapping in both directions, so depending on the nature of the
>>> > queries and your system load you may be thrashing this cache.
>>> >
>>> > - The depth of the graph or the information relationship is
>>> > around 7-8 levels at most, but most of the time it is required to
>>> > link 3-4 levels.
>>> >
>>> > Difficult to say how this impacts performance because it really
>>> > depends on how you are querying that structure.
>>> >
>>> > - Most of the queries include several:
>>> >   ?x myont:hasattribute ?b.
>>> >   ?a rdf:type ?b.
>>> >
>>> > Therefore checking the class and subclasses of entities.
Is there any way to speed up the inference, given that when I ask for
>>> > the parent class I should also get the child classes defined in
>>> > my ontology?
>>> >
>>> > So are you actively using inference? If you are, then that will
>>> > significantly degrade performance, because the inference closure
>>> > is done entirely in memory, i.e. not in TDB. If inference is
>>> > turned on you will get minimal performance benefit from using
>>> > TDB.
>>> >
>>> > If you only need simple inference like class and property
>>> > hierarchies, you may be better served by asserting those
>>> > statically using SPARQL Updates and not using dynamic inference.
>>>
>>> Sorry for the delay in providing examples of the data and the
>>> SPARQL queries we usually make.
>>>
>>> The data we are using follows the ontology that is publicly
>>> available at [1].
>>>
>>> Using this ontology, a sample of a semantic document in JSON-LD
>>> format can be found in the files attached (Observation.jsonld and
>>> Resource.jsonld). These individuals are stored in a TDB using Jena,
>>> in different graphs that can be merged within the code using
>>> MultiUnion in order to make queries. Then we request data using
>>> SPARQL Select queries that require quite a lot of inference
>>> (SPARQL.txt).
>>>
>>> As you suggested, we have made some attempts at including more
>>> properties (mainly rdf:type) in the individual descriptions in
>>> order to avoid inference in the requests. For instance, whenever I
>>> register an m3-lite#AirThermometer I always include that it is also
>>> an ssn#SensingDevice. This way the device can be discovered easily
>>> by its more descriptive name and/or by its generic one.
>>>
>>> However, the results are not as expected using the same SPARQL
>>> queries, and I have to create specific SPARQL queries to properly
>>> discover the data. Is this the way you suggested to work? Should I
>>> then inform the users of our system about the way we are
>>> registering data?
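For example, with the extra rdf:type triples asserted at registration time, a plain query against the TDB model (no inference layer) for the generic class should match the device directly; this is only a sketch using the class names from the thread:

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ssn: <http://purl.oclc.org/NET/ssnx/ssn#>

# Finds the AirThermometer individuals as well, because
# ssn:SensingDevice was asserted explicitly at registration time.
SELECT ?device
WHERE { ?device rdf:type ssn:SensingDevice }
```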
>>>
>>> [1]: http://ontology.fiesta-iot.eu/ontologyDocs/fiesta-iot/doc
>>>
>>> > - I know the "." in a query acts more or less like a logical AND
>>> > operation. Does the order of statements have implications for
>>> > performance? Should I start with the most restrictive ones?
>>> > Should I start with the simplest ones, i.e. checking number
>>> > values, etc.?
>>> >
>>> > Yes and no. TDB will attempt to do the necessary scans in an
>>> > optimal order based on its knowledge of the statistics of the
>>> > data. However, this only applies within a single query pattern,
>>> > i.e. { }, so depending on the structure of your query you may
>>> > need to do some manual reordering. Also, if inference is
>>> > involved, that may interact.
>>> >
>>> > - Some of the queries use spatial and time filtering. Is it worth
>>> > implementing support for spatial searches with SPARQL? Is there
>>> > any kind of index for time searches?
>>> >
>>> > There is a geospatial indexing extension, but there is no
>>> > temporal indexing provided by Jena.
>>>
>>> As you can see from Resource.jsonld, we are using location. Do you
>>> think indexing will help in locating the individuals?
>>>
>>> > Any help is more than welcome.
>>> >
>>> > Without more detail it is difficult to provide more detailed
>>> > help.
>>> >
>>> > Rob
>>> >
>>> > Regards,
>>> > Jorge