On 2/9/11 8:29 AM, Abhi wrote:
Thanks for that.

Say I have to load a triple file into a single instance: I use the ld_dir function to register the file with Virtuoso, then run rdf_loader_run to load it (as shown on the DBpedia example page).

Now for the cluster, say I have 4 triple files belonging to 4 different graphs. Do you mean I should load each instance with one of the triple files?

Yes, so they load in parallel.

Also, say I have a huge file of 10 GB: can I split it into four 2.5 GB files of well-formed triples and then load one split file into each of the 4 instances?

Yes, again so they load in parallel.

This is how we load the entire DBpedia dataset in 15 mins on the LOD cloud cache instance of Virtuoso. We split the load across 8 instances in the 8-node cluster :-)
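
For the splitting step discussed above, here is a minimal Python sketch. It assumes the input is N-Triples, where each line holds one complete triple, so cutting on line boundaries keeps every part well formed; it would not be safe for Turtle or RDF/XML. The function name split_ntriples and the ".partN" file naming are made up for illustration and are not part of Virtuoso.

```python
from pathlib import Path


def split_ntriples(src, parts, out_dir):
    """Split a line-based N-Triples file into `parts` files, round-robin by line.

    Assumption: one triple per line (N-Triples), so no triple is ever cut in half.
    The file is streamed line by line, so a 10 GB input is never held in memory.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical naming scheme: data.nt -> data.part0.nt, data.part1.nt, ...
    paths = [out_dir / f"{src.stem}.part{i}{src.suffix}" for i in range(parts)]
    outs = [p.open("w", encoding="utf-8", newline="") for p in paths]
    try:
        with src.open("r", encoding="utf-8", newline="") as f:
            for n, line in enumerate(f):
                # Line n goes to file n mod parts, which keeps the parts
                # within one line of each other in length.
                outs[n % parts].write(line)
    finally:
        for o in outs:
            o.close()
    return paths


if __name__ == "__main__":
    import sys

    # Usage: python split_nt.py big.nt 4 ./split
    for p in split_ntriples(sys.argv[1], int(sys.argv[2]), sys.argv[3]):
        print(p)
```

Each resulting part can then be registered on a different cluster instance with ld_dir and loaded with rdf_loader_run, so the loads run in parallel as described above.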

When you say "You talk to Virtuoso as you would the single server edition from any port," do you mean that I can talk to any cluster instance and have access to the data in all the cluster instances?

Yep! That's the essence of the matter re. our "shared nothing" cluster.

Kingsley

On Wed, Feb 9, 2011 at 5:55 PM, Kingsley Idehen <kide...@openlinksw.com> wrote:

    On 2/9/11 3:46 AM, Abhi wrote:
    Can a Virtuoso cluster be treated as a virtual single instance?
    To expand:

    Say I have a cluster of 4 Virtuoso instances with one of them
    configured as a master. Now, I have to load the cluster with, say,
    3 billion triples belonging to, say, 5 different graphs.

    1. Do I just load the data into the master server, and Virtuoso
    clustering takes care of spreading the data across the different
    servers as it sees fit? Also, is the data partitioned across the
    different servers, or is it just replicated?

    You can load across all 4 instances in parallel. That's the very
    essence of the horizontal partitioning that underlies our cluster
    engine. It's one virtual database in a sense where access to any
    node delivers access to the entire parallelized cluster.


    2. When I have to query this data, say across 3 interconnected
    graphs, do I just run the query against the master, and the
    Virtuoso cluster takes care of fetching the partitioned data
    (assuming it is partitioned) from the different instances?

    Are my assumptions correct?

    You talk to Virtuoso as you would the single server edition from
    any port. The cluster engine deals with the rest of the work :-)


    Kingsley

    --
    Cheers,
    Abhi


    
------------------------------------------------------------------------------
    The ultimate all-in-one performance toolkit: Intel(R) Parallel Studio XE:
    Pinpoint memory and threading errors before they happen.
    Find and fix more than 250 security defects in the development cycle.
    Locate bottlenecks in serial and parallel code that limit performance.
    http://p.sf.net/sfu/intel-dev2devfeb


    _______________________________________________
    Virtuoso-users mailing list
    Virtuoso-users@lists.sourceforge.net
    https://lists.sourceforge.net/lists/listinfo/virtuoso-users


--
    Regards,

    Kingsley Idehen
    President & CEO
    OpenLink Software
    Web: http://www.openlinksw.com
    Weblog: http://www.openlinksw.com/blog/~kidehen
    Twitter/Identi.ca: kidehen






    












