Re: [Virtuoso-users] Clustering basic doubt

Kingsley Idehen Wed, 09 Feb 2011 15:01:48 +0000

On 2/9/11 8:29 AM, Abhi wrote:

Thanks for that.
Say I have to load a triple file into a single instance: I use the ld_dir function to tell Virtuoso about the file and then use rdf_loader_run to load it into Virtuoso (this is described on the DBpedia example page).
Now for the cluster, say I have 4 triple files belonging to 4 different graphs. Do you mean I should load each instance with one of the triple files?


Yes, so they load in parallel.
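As a rough sketch of what "one file per instance" could look like, the Python below assigns triple files to instances round-robin and writes one isql script per instance, using the ld_dir(directory, file mask, graph IRI) and rdf_loader_run() functions mentioned above. The build_load_scripts helper, instance names, paths and graph IRIs are all hypothetical. It assumes each script is run in an isql session connected to the named instance and that the data directory is listed in DirsAllowed in virtuoso.ini. How the load list is shared between nodes may depend on your cluster setup, so check the bulk loader documentation before relying on it.

```python
from pathlib import PurePosixPath


def build_load_scripts(files, graphs, instances):
    """Hypothetical helper: return {instance: isql script text}.

    files[i] is loaded into the graph IRI graphs[i]; files are spread over
    the instances round-robin so the instances can load in parallel.
    Assumes paths and IRIs contain no single quotes (no SQL escaping here).
    """
    if len(files) != len(graphs):
        raise ValueError("need exactly one graph IRI per file")
    scripts = {inst: [] for inst in instances}
    for i, (path, graph) in enumerate(zip(files, graphs)):
        inst = instances[i % len(instances)]
        p = PurePosixPath(path)
        # ld_dir(directory, file name mask, target graph) registers matching
        # files for loading; here the mask is simply the exact file name.
        scripts[inst].append(f"ld_dir('{p.parent}', '{p.name}', '{graph}');")
    for lines in scripts.values():
        if lines:
            # Load everything that was registered in this session.
            lines.append("rdf_loader_run();")
    return {inst: "\n".join(lines) for inst, lines in scripts.items()}


if __name__ == "__main__":
    files = [f"/data/graph{i}.nt" for i in range(1, 5)]
    graphs = [f"http://example.org/graph{i}" for i in range(1, 5)]
    nodes = ["node1", "node2", "node3", "node4"]
    for inst, script in build_load_scripts(files, graphs, nodes).items():
        print(f"-- run in isql on {inst}\n{script}\n")
```

With 4 files and 4 instances each instance gets exactly one ld_dir call; with more files than instances, the extra files wrap around to the first instances again.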

Also, say I have a huge 10 GB file. Can I split it into 2.5 GB files of well-formed triples and then load the split files into the 4 instances?


Yes, again so they load in parallel.

This is how we load the entire DBpedia dataset in 15 mins on the LOD cloud cache instance of Virtuoso. We split the load across 8 instances in the 8-node cluster :-)
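For the 10 GB case, here is a minimal Python sketch of splitting a file into n parts without ever cutting a triple in half. It only works for line-based formats such as N-Triples, where each triple sits on one line; Turtle or RDF/XML would need a real parser. One caveat: blank node labels such as _:b1 are scoped to the file they appear in, so if lines sharing a blank node label end up in different parts, loading those parts separately can produce distinct blank nodes. The split_ntriples helper and the part-N.nt naming are hypothetical.

```python
import os


def split_ntriples(src, n, out_prefix):
    """Hypothetical helper: split a line-based N-Triples file into n parts.

    Each part receives whole lines only, so no triple is ever cut in half,
    and the parts come out at roughly total_size / n bytes each.
    """
    total = os.path.getsize(src)
    target = max(total / n, 1)  # bytes per part (guard against zero)
    paths = [f"{out_prefix}-{i}.nt" for i in range(n)]
    outs = [open(p, "wb") for p in paths]
    try:
        written = 0
        with open(src, "rb") as f:
            for line in f:  # binary iteration splits on b"\n" only
                # Choose the part from the bytes written so far, so lines
                # keep their order and each part fills before the next one.
                idx = min(int(written // target), n - 1)
                outs[idx].write(line)
                written += len(line)
    finally:
        for out in outs:
            out.close()
    return paths
```

For example, split_ntriples("/data/big.nt", 4, "/data/part") writes /data/part-0.nt through /data/part-3.nt, one file per instance.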

When you say "You talk to Virtuoso as you would the single server edition from any port.", do you mean I can talk to any cluster instance and have access to the data in all the cluster instances?


Yep! That's the essence of the matter re. our "shared nothing" cluster.
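To illustrate "from any port", the sketch below sends the same SPARQL query over HTTP to whichever node you pick. It assumes each node exposes the usual /sparql endpoint; 8890 is Virtuoso's default HTTP port, but cluster nodes may be configured differently. Host names are placeholders and only the Python standard library is used.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def sparql_request(host, port, query):
    """Build a GET request for the /sparql endpoint on one node (assumed path)."""
    url = f"http://{host}:{port}/sparql?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def ask_any_node(host, port, query, timeout=30):
    """Send the query to the chosen node and return the parsed JSON results."""
    with urlopen(sparql_request(host, port, query), timeout=timeout) as resp:
        return json.load(resp)


# Usage (hypothetical hosts; in a shared-nothing cluster any node should
# answer over the whole dataset):
#   q = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"
#   ask_any_node("node1.example.org", 8890, q)
#   ask_any_node("node3.example.org", 8890, q)
```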

Kingsley

On Wed, Feb 9, 2011 at 5:55 PM, Kingsley Idehen <kide...@openlinksw.com> wrote:


    On 2/9/11 3:46 AM, Abhi wrote:

    Can a Virtuoso cluster be treated as a single virtual instance?
    To expand:

    Say I have a cluster of 4 Virtuoso instances with one of them
    configured as a master. Now, I have to load the cluster with, say,
    3 billion triples belonging to 5 different graphs.

    1. Do I just load the data into the master server and let Virtuoso
    clustering spread the data across the different servers as it
    sees fit? Also, is the data partitioned across the different
    servers or just replicated?


    You can load across all 4 instances in parallel. That's the very
    essence of the horizontal partitioning that underlies our cluster
    engine. It's one virtual database in a sense where access to any
    node delivers access to the entire parallelized cluster.
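To make the partitioned-versus-replicated distinction concrete, here is a toy Python sketch of horizontal (hash) partitioning. Hashing on the subject with SHA-1 is an illustrative choice only; this does not describe Virtuoso's actual partitioning keys or hash function.

```python
import hashlib


def node_for(key, n_nodes):
    """Map a partitioning key to a node using a stable hash.

    (Python's built-in hash() of a str is randomized per process, so it
    would not give the same placement on every machine.)
    """
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_nodes


def partition(triples, n_nodes):
    """Horizontal partitioning: each triple is stored on exactly one node,
    chosen here by hashing its subject (an illustrative choice)."""
    parts = [[] for _ in range(n_nodes)]
    for s, p, o in triples:
        parts[node_for(s, n_nodes)].append((s, p, o))
    return parts


def replicate(triples, n_nodes):
    """Replication, for contrast: every node holds a full copy."""
    return [list(triples) for _ in range(n_nodes)]
```

For the 3 billion triples on 4 instances in the question, partitioning stores each triple once (about 750 million per node if the hash spreads evenly), whereas full replication would store 12 billion copies in total.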


    2. When I query this data, say across 3 interconnected graphs,
    do I just run the query against the master and let the Virtuoso
    cluster take care of fetching the partitioned data (assuming it
    is partitioned) from the different instances?

    Are my assumptions correct?


    You talk to Virtuoso as you would the single server edition from
    any port. The cluster engine deals with the rest of the work :-)
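As a toy model of "the cluster engine deals with the rest", the sketch below has the node you connected to fan each triple pattern out to every partition (threads stand in for remote nodes), merge the answers, and join two patterns locally. The data, prefixes and helper names are made up for illustration; Virtuoso's real distributed query execution is far more elaborate and is not modeled here.

```python
from concurrent.futures import ThreadPoolExecutor


def scan(partition, s=None, p=None, o=None):
    """Local work on one node: triples matching the pattern (None = variable)."""
    return [t for t in partition
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def scatter_gather(partitions, s=None, p=None, o=None):
    """Fan the pattern out to every partition in parallel and merge results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        chunks = pool.map(lambda part: scan(part, s, p, o), partitions)
        return [t for chunk in chunks for t in chunk]


def names_of_friends(partitions, person):
    """Two-pattern join: person :knows ?f . ?f :name ?n"""
    friends = [o for _, _, o in scatter_gather(partitions, s=person, p=":knows")]
    names = []
    for f in friends:
        names += [o for _, _, o in scatter_gather(partitions, s=f, p=":name")]
    return sorted(names)
```

With :alice's two :knows triples and the matching :name triples spread over three partitions, names_of_friends(parts, ":alice") returns ["Bob", "Carol"] even though no single partition holds the whole answer.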


    Kingsley

--
Cheers,

    Abhi


    
------------------------------------------------------------------------------
    The ultimate all-in-one performance toolkit: Intel(R) Parallel Studio XE:
    Pinpoint memory and threading errors before they happen.
    Find and fix more than 250 security defects in the development cycle.
    Locate bottlenecks in serial and parallel code that limit performance.
    http://p.sf.net/sfu/intel-dev2devfeb


    _______________________________________________
    Virtuoso-users mailing list
    Virtuoso-users@lists.sourceforge.net  
<mailto:Virtuoso-users@lists.sourceforge.net>
    https://lists.sourceforge.net/lists/listinfo/virtuoso-users

--

    Regards,

    Kingsley Idehen     
    President & CEO
    OpenLink Software
    Web: http://www.openlinksw.com
    Weblog: http://www.openlinksw.com/blog/~kidehen
    Twitter/Identi.ca: kidehen






    




--
Cheers,
Abhi





--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
