On 2/9/11 8:29 AM, Abhi wrote:
> Thanks for that.
> Say I have to load a triple file for a single instance: I use the
> ld_dir function to inform Virtuoso of the file, and then use
> rdf_loader_run to load it into Virtuoso (this is shown on the
> DBpedia example page).
> Now for the cluster, say I have 4 triple files belonging to 4
> different graphs. Do you mean, load each instance with one of the
> triple files?

Yes, so they load in parallel.
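
For illustration only, here is a rough sketch of what "one file per instance" could look like when generating the load statements. The ld_dir and rdf_loader_run calls are the ones named in the question; the ports, directory, file names and graph IRIs are made-up placeholders, and the checkpoint at the end is an assumption about what you would want after a bulk load. Each generated script would be run through isql against its own instance.

```python
from typing import NamedTuple


class LoadJob(NamedTuple):
    port: int       # isql port of one cluster instance (placeholder value)
    directory: str  # directory holding the file (placeholder path)
    file_mask: str  # file name or wildcard handed to ld_dir
    graph: str      # target graph IRI (placeholder)


def load_script(job: LoadJob) -> str:
    """Return the SQL text that registers one file and loads it on one instance."""
    # Note: values are pasted in as-is; this sketch does not escape quotes.
    return (
        f"ld_dir('{job.directory}', '{job.file_mask}', '{job.graph}');\n"
        "rdf_loader_run();\n"
        "checkpoint;\n"
    )


# Four hypothetical instances, one file and one graph each.
jobs = [
    LoadJob(
        port=12201 + i,
        directory="/data/rdf",
        file_mask=f"graph{i + 1}.nt",
        graph=f"http://example.org/graph{i + 1}",
    )
    for i in range(4)
]

for job in jobs:
    print(f"-- run against the instance on port {job.port}")
    print(load_script(job))
```

The point of the sketch is only that every instance gets its own ld_dir registration and its own rdf_loader_run, so the four loads proceed at the same time.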

> Also, say I have a huge file of 10 GB; can I split the file into
> 2.5 GB pieces of well-formed triples and then load the split files
> into the 4 instances?

Yes, again so they load in parallel.

This is how we load the entire DBpedia dataset in 15 mins on the LOD
cloud cache instance of Virtuoso. We split the load across 8 instances
in the 8-node cluster :-)
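
As a minimal sketch of the splitting step, assuming the big file is in N-Triples (one complete triple per line; this would not be safe for Turtle or RDF/XML): the helper below deals whole lines out to 4 part files in turn, so every part stays well formed. The function name and the part file names are invented for the example.

```python
import os
import tempfile


def split_ntriples(src_path: str, out_dir: str, parts: int = 4) -> list[str]:
    """Split a line-oriented N-Triples file into `parts` files on line boundaries."""
    out_paths = [os.path.join(out_dir, f"part{i + 1}.nt") for i in range(parts)]
    outs = [open(p, "w", encoding="utf-8") for p in out_paths]
    try:
        with open(src_path, encoding="utf-8") as src:
            # Round-robin whole lines so no triple is ever cut in half.
            for n, line in enumerate(src):
                outs[n % parts].write(line)
    finally:
        for f in outs:
            f.close()
    return out_paths


if __name__ == "__main__":
    # Tiny self-contained demo on a generated 10-triple file.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "all.nt")
        with open(src, "w", encoding="utf-8") as f:
            for i in range(10):
                f.write(f'<http://example.org/s{i}> <http://example.org/p> "{i}" .\n')
        for path in split_ntriples(src, tmp):
            with open(path, encoding="utf-8") as f:
                print(os.path.basename(path), sum(1 for _ in f))
```

Dealing lines out in turn gives parts of nearly equal size without knowing the file length up front; cutting into contiguous blocks would work just as well, as long as every cut falls on a line boundary.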

> When you say "You talk to Virtuoso as you would the single server
> edition from any port." do you mean that I can talk to any cluster
> instance and have access to the data in all the cluster instances?

Yep! That's the essence of the matter re. our "shared nothing" cluster.

Kingsley

On Wed, Feb 9, 2011 at 5:55 PM, Kingsley Idehen
<kide...@openlinksw.com> wrote:
On 2/9/11 3:46 AM, Abhi wrote:
> Can a Virtuoso cluster be treated as a virtual single instance?
> To expand:
> Say I have a cluster of 4 Virtuoso instances with one of them
> configured as a master. Now, I have to load the cluster with, say,
> 3 billion triples belonging to, say, 5 different graphs.
> 1. Do I just load the data into the master server, and Virtuoso
> clustering takes care of spreading the data across the different
> servers as it sees fit? Also, is the data partitioned across the
> different servers or is it just replicated?

You can load across all 4 instances in parallel. That's the very
essence of the horizontal partitioning that underlies our cluster
engine. It's one virtual database in a sense, where access to any
node delivers access to the entire parallelized cluster.

> 2. When I have to query this data, say from 3 interconnected
> graphs, do I just run the query against the master, and the
> Virtuoso cluster will take care of fetching the partitioned data
> (assuming it is partitioned) from the different instances?
> Are my assumptions correct?

You talk to Virtuoso as you would the single server edition from
any port. The cluster engine deals with the rest of the work :-)

Kingsley

--
Cheers,
Abhi
_______________________________________________
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users
--
Regards,
Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen