Re: [Virtuoso-users] Out of curiosity, some questions about OpenSource Virtuoso settings
The trouble with classic http is that you frequently wind up making a separate TCP connection for each API call. In theory there was pipelining in http/1.1, but it did not really work and was rarely implemented in "web service" scenarios. http/2 has pipelining that works, reduces the # of round trips to do encryption, compresses headers, and has all sorts of goodness.

Another issue is that http-based protocols have historically been text-based protocols, and if there was one useful thing I learned in grad school it was that you can do an awful lot of FLOPS in the time that it takes to parse a string like "7.55523135E12".

Prior to 2000, it was common to see a wide range of text-based formats for structured data such as HL7, UN/EDIFACT, the communication packets used in amateur radio, etc. In the 2000-2010 period, XML ate most of that. From 2010 to the present, JSON has replaced XML in a lot of places. As people catch on to what JSON-LD really means (you can paint on the XML semantics that are missing in JSON with a minimum of bother), JSON will grow.

Performance-wise you can get much better results with binary formats, and people today are even realizing (more slowly than with JSON-LD) that you can implement binary formats in LD in a secure way in C, unlike text formats.

The interesting thing, though, is that "general" binary formats are nowhere near the consolidation we've seen with XML and JSON. Binary XML formats have been tried and have not caught on outside of niches. A small community speaks Binary FIX. There are Protocol Buffers, Cap'n Proto, Thrift, Avro, MessagePack, and who knows what else. Binary formats might never get their XML moment, because once you start caring about performance you care about performance for *your specific case*, and thus it is impossible to pick one protocol that will make everybody equally happy (or miserable).

Caching is another http family problem.
Once you start caching you run into (a) the risk that cached data will be invalid, and (b) the fact that there is no guarantee that perceived performance will be better when you use a cache.

It is very hard for Homo Corporatus to understand that users (e.g. customers, employees) experience latency, not throughput. (It's anathema to corporate ideology, for instance, that 9 women can't make a baby in 1 month -- I knew Sun was on the skids when they started talking about "Throughput Computing".)

A 5400 rpm hard drive typical in a laptop can spin around about twice in the time it takes my (awful) DSL connection to round trip to the Azure data center in Chicago, and any kind of I/O storm means it can take much longer than two spins to discover that something is not in the cache. It's certainly possible that http caching can improve perceived performance, but you can't take it for granted.

On Mon, Sep 12, 2016 at 10:49 AM, Davis, Daniel (NIH/NLM) [C] <daniel.da...@nih.gov> wrote:

> Tangentially on JDBC/ODBC vs HTTP. HTTP is slow because of the
> transport, but JDBC/ODBC has more mature models of paging over results,
> pre-fetching, etc. I feel that RDF is mature as a Data format, but as a
> protocol it is not that mature.
>
> -----Original Message-----
> From: Hugh Williams [mailto:hwilli...@openlinksw.com]
> Sent: Monday, September 12, 2016 5:30 AM
> To: Lorenz Buehmann
> Cc: virtuoso-users@lists.sourceforge.net
> Subject: Re: [Virtuoso-users] Out of curiosity, some questions about
> OpenSource Virtuoso settings
>
> Hi Lorenz,
>
> In Virtuoso 7+ Vectored query execution [1] enables single, typically
> complex analytical types queries, to be broken down and executed on
> multiple threads. The INI file param that controls this is
> "ThreadsPerQuery" [2] , which controls the maximum number of threads that
> can be claimed from the thread pool by a single query, and there are other
> associated params details at [2] .
>
> [1] http://docs.openlinksw.com/virtuoso/vexqrparl.html
> [2] http://docs.openlinksw.com/virtuoso/vexqrparlconfp/
>
> Best Regards
> Hugh Williams
> Professional Services
> OpenLink Software, Inc. // http://www.openlinksw.com/
> Weblog -- http://www.openlinksw.com/blogs/
> LinkedIn -- http://www.linkedin.com/company/openlink-software/
> Twitter -- http://twitter.com/OpenLink
> Google+ -- http://plus.google.com/100570109519069333827/
> Facebook -- http://www.facebook.com/OpenLinkSoftware
> Universal Data Access, Integration, and Management Technology Providers
>
> > On 12 Sep 2016, at 07:13, Lorenz Buehmann wrote:
> >
> > Hi,
> >
> > just as a follow-up question:
> >
> > I know it supports inter-query parallelization, but does it also support
> > intra-query parallelization, i.e. using multiple threads to compute the
> > result of a single query? If yes, which parameter is used to configure this?
> >
> > Cheers,
> >
> > Lorenz
> >
> > On 12.09.2016 05:57, Kingsley Idehen wrote:
> >> On 9/10/16 10:41 PM, giacom...@libero.it wrote:
> >>
> >>> So, I have a bunch of question
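
The text-versus-binary point in the message above can be made concrete with a small sketch. It encodes the same number once as the ASCII string quoted in the message and once as a fixed-width 8-byte IEEE 754 double, then decodes both. The function names are made up for illustration, and the choice of a little-endian double is an assumption, not something any of the formats named above prescribes. The sketch shows only the difference in decoding work (digit scanning versus a fixed-size copy); it makes no timing claim.

```python
import struct


def decode_text(payload: bytes) -> float:
    """Text wire format: the receiver must scan and convert every digit."""
    return float(payload.decode("ascii"))


def decode_binary(payload: bytes) -> float:
    """Binary wire format (assumed): one little-endian 8-byte IEEE 754 double."""
    return struct.unpack("<d", payload)[0]


# The same value in both encodings.
text_payload = b"7.55523135E12"
binary_payload = struct.pack("<d", 7.55523135e12)

print(len(text_payload), "bytes as text,", len(binary_payload), "bytes as binary")
print(decode_text(text_payload) == decode_binary(binary_payload))
```

The binary payload has a fixed size that the receiver knows in advance, whereas the text payload's length depends on the value and has to be delimited and parsed.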
Re: [Virtuoso-users] Out of curiosity, some questions about OpenSource Virtuoso settings
Tangentially on JDBC/ODBC vs HTTP. HTTP is slow because of the transport, but JDBC/ODBC has more mature models of paging over results, pre-fetching, etc. I feel that RDF is mature as a data format, but as a protocol it is not that mature.

-----Original Message-----
From: Hugh Williams [mailto:hwilli...@openlinksw.com]
Sent: Monday, September 12, 2016 5:30 AM
To: Lorenz Buehmann
Cc: virtuoso-users@lists.sourceforge.net
Subject: Re: [Virtuoso-users] Out of curiosity, some questions about OpenSource Virtuoso settings

Hi Lorenz,

In Virtuoso 7+ Vectored query execution [1] enables single, typically complex analytical types queries, to be broken down and executed on multiple threads. The INI file param that controls this is "ThreadsPerQuery" [2] , which controls the maximum number of threads that can be claimed from the thread pool by a single query, and there are other associated params details at [2] .

[1] http://docs.openlinksw.com/virtuoso/vexqrparl.html
[2] http://docs.openlinksw.com/virtuoso/vexqrparlconfp/

Best Regards
Hugh Williams
Professional Services
OpenLink Software, Inc. // http://www.openlinksw.com/

> On 12 Sep 2016, at 07:13, Lorenz Buehmann wrote:
>
> Hi,
>
> just as a follow-up question:
>
> I know it supports inter-query parallelization, but does it also support
> intra-query parallelization, i.e. using multiple threads to compute the
> result of a single query? If yes, which parameter is used to configure this?
>
> Cheers,
>
> Lorenz
>
> On 12.09.2016 05:57, Kingsley Idehen wrote:
>> On 9/10/16 10:41 PM, giacom...@libero.it wrote:
>>
>>> So, I have a bunch of questions about Virtuoso.
>>>
>>> * Are SPARQL queries performed concurrently (using the standard
>>> virtuoso.ini configuration)?
>>>
>> Yes, Virtuoso is highly multi-threaded. In addition, it has
>> vectorized query execution i.e., many query batches per thread, handled
>> concurrently.
>>
>>> * Are Virtuoso transactions autocommittable, or there aren't any
>>> kind of transactions when connecting to the ODBC driver?
>>>
>> Depends, by default you have Read Committed Isolation level.
>>
>>> * Does Virtuoso automatically performs triple indexing before query data?
>>>
>> No, it has indexes in place.
>>
>>> In some systems I have to configure it manually.
>>> * Does the usage of ODBC degradates the benchmarking of Virtuoso's
>>> SPARQL query or the time required to store the triples within a given named
>>> graph?
>>>
>> No, if anything that's faster than HTTP.
>>
>> Links:
>>
>> [1] http://docs.openlinksw.com/virtuoso/isolation/
>> [2] http://wikis.openlinksw.com/VirtuosoWikiWeb/ChangeVirtuosoSDefaultTransactionIsolationLevel
>> [3] http://docs.openlinksw.com/virtuoso/fn_log_enable.html
>> [4] http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFPerformanceTuning
>> [5] http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtBulkRDFLoader#Running multiple Loaders
>>
>> Kingsley
>>
>>> Thanks in advance for any support,
>>> JackB

--
___
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users
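
To illustrate the remark above about paging, here is a minimal sketch of how a client usually emulates paging when it talks SPARQL over HTTP: it re-issues the same query with standard `LIMIT` and `OFFSET` clauses, one request per page. The helper name `paged_queries` and the sample query are hypothetical; only the `LIMIT`/`OFFSET`/`ORDER BY` clauses themselves come from the SPARQL query language. Nothing here is sent to a server.

```python
def paged_queries(base_query, page_size, max_pages):
    """Yield base_query with LIMIT/OFFSET appended, one string per page."""
    for page in range(max_pages):
        yield f"{base_query} LIMIT {page_size} OFFSET {page * page_size}"


# An ORDER BY is included so that successive pages are taken from a stable ordering.
base = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } ORDER BY ?s ?p ?o"

for query in paged_queries(base, page_size=1000, max_pages=3):
    print(query)
```

Each page is a separate query and a separate round trip, whereas an ODBC/JDBC driver can keep one cursor open and pre-fetch rows from it, which is the maturity gap the message describes.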
Re: [Virtuoso-users] Timeout when listing graphs
On 9/12/16 3:50 AM, Pantelis Natsiavas wrote:
> SPARQL SELECT distinct ?graph WHERE { GRAPH ?graph { ?s ?p ?o } };

Please try:

SPARQL SELECT distinct ?graph WHERE { GRAPH ?graph { ?s a ?o } };

--
Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software (Home Page: http://www.openlinksw.com)

Medium Blog: https://medium.com/@kidehen
Blogspot Blog: http://kidehen.blogspot.com
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
Re: [Virtuoso-users] Out of curiosity, some questions about OpenSource Virtuoso settings
Hi Lorenz,

In Virtuoso 7+, vectored query execution [1] enables a single query, typically a complex analytical one, to be broken down and executed on multiple threads. The INI file param that controls this is “ThreadsPerQuery” [2], which sets the maximum number of threads that can be claimed from the thread pool by a single query. The other associated params are detailed at [2].

[1] http://docs.openlinksw.com/virtuoso/vexqrparl.html
[2] http://docs.openlinksw.com/virtuoso/vexqrparlconfp/

Best Regards
Hugh Williams
Professional Services
OpenLink Software, Inc. // http://www.openlinksw.com/

> On 12 Sep 2016, at 07:13, Lorenz Buehmann wrote:
>
> Hi,
>
> just as a follow-up question:
>
> I know it supports inter-query parallelization, but does it also support
> intra-query parallelization, i.e. using multiple threads to compute the
> result of a single query? If yes, which parameter is used to configure this?
>
> Cheers,
>
> Lorenz
>
> On 12.09.2016 05:57, Kingsley Idehen wrote:
>> On 9/10/16 10:41 PM, giacom...@libero.it wrote:
>>
>>> So, I have a bunch of questions about Virtuoso.
>> [...]
Re: [Virtuoso-users] Timeout when listing graphs
Hi Hugh.

I had changed the virtuoso.ini vector parameters in the past while working with large datasets. One of these changes could have caused the problem, but I cannot say with confidence which one. I have changed the parameters you suggested, but I still cannot get an answer to the graphs query in a reasonable time.

I am using Virtuoso Version: 07.20.3214 Build: Oct 14 2015, on Ubuntu 14.04, and the number of graphs is not large (maybe 15 or 20).

Kind regards,
Pantelis Natsiavas

2016-09-08 4:54 GMT+03:00 Hugh Williams:

> Hi Pantelis,
>
> What is the version of the binary being run ie 07.20.3217 or other build
> id and how many graphs are in the database ?
>
> I have seen such issues with older Virtuoso binaries, which has been
> resolved by adding the following to the “[Parameters]” of the INI file:
>
> VectorSize = 1000
> AdjustVectorSize = 0
>
> Best Regards
> Hugh Williams
> Professional Services
> OpenLink Software, Inc. // http://www.openlinksw.com/
>
> > On 7 Sep 2016, at 10:25, Pantelis Natsiavas wrote:
> >
> > Hi everybody.
> >
> > When I try to list the graphs in my virtuoso instance, I get a huge
> > delay, and finally a timeout.
> >
> > This happens for the last two weeks or so, without any obvious reason.
> > The delay occurs when I try it through the conductor web UI (Tab "Linked
> > Data" -> Tab "Graphs" -> Tab "Graphs"). I have tried retrieving the graphs
> > through the isql-v
> >
> > SPARQL SELECT distinct ?graph WHERE { GRAPH ?graph { ?s ?p ?o } };
> >
> > and it is also very slow.
> >
> > Please note that SPARQL queries are answered normally, without any
> > obvious delays.
> >
> > Do you have any kind of suggestions? Should I fear having my triple
> > store in some kind of corrupted state?
> >
> > Best regards,
> > Pantelis Natsiavas
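
The INI change suggested in the quoted reply can be sketched as follows. This is a minimal illustration under stated assumptions: it uses Python's standard `configparser` on an in-memory sample rather than a real virtuoso.ini, the function name `apply_vector_settings` and the starting values in `sample` are made up, and a real virtuoso.ini may contain constructs that `configparser` does not handle, so editing the file by hand remains the safer route. Only the section name `[Parameters]` and the two settings come from the thread.

```python
import configparser


def apply_vector_settings(ini_text):
    """Return a parser in which [Parameters] carries the two suggested settings."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case (VectorSize, not vectorsize)
    parser.read_string(ini_text)
    if not parser.has_section("Parameters"):
        parser.add_section("Parameters")
    parser.set("Parameters", "VectorSize", "1000")
    parser.set("Parameters", "AdjustVectorSize", "0")
    return parser


# Hypothetical starting values, for illustration only.
sample = """
[Parameters]
VectorSize = 10000
AdjustVectorSize = 1
"""

updated = apply_vector_settings(sample)
for key in ("VectorSize", "AdjustVectorSize"):
    print(key, "=", updated.get("Parameters", key))
```

The sketch only shows where the two keys live; it does not write anything back to disk.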
Re: [Virtuoso-users] Ubuntu upgrading
Thank you Hugh.

I have removed the old virtuoso.trx file and everything seems to be in place.

Kind regards,
Pantelis Natsiavas

2016-09-08 4:47 GMT+03:00 Hugh Williams:

> Hi Pantelis,
>
> The "It is impossible to have a database file
> /media/VirtuosoDBDrive/virtuoso.db with a length not multiple of 2MB”
> error on startup indicates an attempt to grow the database, which is done
> in 2MB segments, failed during the process, probably due to running out of
> disk space or the server shutting down unexpectedly.
>
> This is not related to the trx file, which would only be an issue if
> upgrading to a new engine build id ie 3217 , 3216 etc. and if you have run
> the +checkpoint-only option then you can remove the trx file if it is not
> zero bytes as it would just contain the database signature info.
>
> Best Regards
> Hugh Williams
> Professional Services
> OpenLink Software, Inc. // http://www.openlinksw.com/
>
> > On 7 Sep 2016, at 09:13, Pantelis Natsiavas wrote:
> >
> > Hi everybody.
> >
> > As I am trying to work on big datasets, I thought that upgrading on the
> > latest version of virtuoso would be a good idea as I would hopefully get
> > some performance advantages too.
> >
> > However, following the procedure described in
> > http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/UpgradingToVOS610#Upgrading from Release 7.x to a newer Release 7.x
> > I found out that my trx file has 185 bytes length and it would not go to 0
> > even after the restart with the "+checkpoint-only" argument.
> > My log contains the following snippet after startup
> >
> > 10:57:16 It is impossible to have a database file /media/VirtuosoDBDrive/virtuoso.db with a length not multiple of 2MB.
> > 10:57:16 The process must have last terminated while growing the file.
> > 10:57:16 Please contact OpenLink Customer Support
> > 10:57:16 Database version 3126
> >
> > 10:58:00 Roll forward started
> > 10:58:00 3 transactions, 185 bytes replayed (100 %)
> > 10:58:00 Roll forward complete
> >
> > My questions are:
> > 1. If I get it right, the virtuoso.trx size implies that there are
> > transactions left uncompleted in a "dirty" state. However, this should not
> > be the case since as the log shows normal replay of the transactions. Could
> > I just delete the virtuoso.trx? Is there something else I could do to
> > recover?
> > 2. What would be the easiest way to upgrade the virtuoso server? I am
> > not comfortable with the instruction "install the newer v7.x binary
> > components, either atop or after removing the older v7.x binary
> > components.". Is there a more specific guideline?
> >
> > Please note that I am running Virtuoso Version: 07.20.3214 Build: Oct 14
> > 2015, on Ubuntu 14.04.
> >
> > Kind regards,
> > Pantelis Natsiavas
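
The startup message quoted above is about the database file length not being a whole number of 2MB segments. A minimal sketch of that arithmetic follows. It assumes that "2MB" means 2 * 1024 * 1024 bytes, which the thread does not state, and the function names are hypothetical. It only detects the condition and does not repair anything.

```python
SEGMENT_BYTES = 2 * 1024 * 1024  # assumption: "2MB" in the log means 2 MiB


def trailing_partial_segment(size_bytes):
    """Bytes left over after the last complete segment (0 means none)."""
    return size_bytes % SEGMENT_BYTES


def is_whole_segments(size_bytes):
    """True when the file length is an exact multiple of the segment size."""
    return trailing_partial_segment(size_bytes) == 0


# In practice the size would come from os.path.getsize() on the database file.
print(is_whole_segments(3 * SEGMENT_BYTES))        # a cleanly grown file
print(is_whole_segments(3 * SEGMENT_BYTES + 4096)) # growth interrupted part-way
```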