So, this has nothing to do with the large vector size, but just to be sure the
SPARQL is correct: do you wish to delete the subjects (and all of their triples)
where the subject has the predicate, or only the triples with that predicate?
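
For example (with <graph> and <predicate> standing in for your actual IRIs),
the two readings look roughly like this:

# Delete only the triples that use the predicate:
SPARQL WITH <graph>
DELETE { ?s <predicate> ?o }
WHERE  { ?s <predicate> ?o };

# Delete every triple of every subject that has the predicate:
SPARQL WITH <graph>
DELETE { ?s ?p ?o }
WHERE  { ?s <predicate> ?x .
         ?s ?p ?o };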

As far as avoiding the maximum vector size goes, I think your best approach is
to limit the number of matches and repeat the query until there are no results,
perhaps with a count query in between. I have had to use similar workarounds to
avoid the maximum-number-of-results and maximum-string-size issues; for
instance, my first attempts to export large NTriples files after processing
failed because of them. You may be able to adapt the export code below, but I
think a repeated DELETE query limited to a fixed number of triples will work
best in your case; a sketch of that pattern follows the export procedure.

Anyway, the code:

CREATE PROCEDURE meshrdf_export(in graph_uri varchar, in file_name varchar) {
    DECLARE banner any;
    DECLARE env, ses any;
    DECLARE ses_len, max_ses_len any;

    -- Read-uncommitted isolation avoids taking read locks during the long scan.
    SET isolation = 'uncommitted';

    max_ses_len := 10000000;

    --
    -- Truncate the file and write a comment line indicating the graph and
    -- datetime of export.
    --
    --no_c_escapes-
    banner := sprintf('# <%s> exported at %s\n', graph_uri, datestring(now()));
    string_to_file (file_name, banner, -2);

    env := vector (0, 0, 0);    -- serialization state for http_nt_triple
    ses := string_output ();    -- in-memory buffer for the NTriples output

    FOR (SELECT * FROM (SPARQL
                        define input:storage ""
                        SELECT ?s ?p ?o WHERE {
                          GRAPH `iri(?:graph_uri)` {
                            ?s ?p ?o
                          }
                        } ORDER BY ?s ?p ?o) AS sub OPTION (loop)) DO {
        -- Serialize one triple in NTriples syntax into the buffer.
        http_nt_triple (env, "s", "p", "o", ses);
        ses_len := length (ses);

        -- Flush the buffer to the file (append) once it grows past max_ses_len.
        IF (ses_len > max_ses_len) {
            string_to_file (file_name, ses, -1);
            ses := string_output ();
        }
    }
    -- Write whatever remains in the buffer.
    IF (length (ses)) {
        string_to_file (file_name, ses, -1);
    }
}
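
And here is a sketch of the repeated DELETE I had in mind (untested; the
procedure name and the 100000 batch size are mine, adjust to taste). Each pass
deletes at most one batch of matches and then commits, so no single statement
has to materialize all ~65 million bindings at once:

CREATE PROCEDURE delete_pred_batched (in graph_uri varchar, in pred_uri varchar)
{
    DECLARE n_deleted integer;

    n_deleted := 1;
    WHILE (n_deleted > 0) {
        -- The subselect caps how many matches each pass handles, keeping
        -- the working set bounded regardless of the total match count.
        SPARQL DEFINE sql:log-enable 3
        DELETE { GRAPH `iri(?:graph_uri)` { ?s `iri(?:pred_uri)` ?o } }
        WHERE {
            GRAPH `iri(?:graph_uri)` {
                { SELECT ?s ?o WHERE { ?s `iri(?:pred_uri)` ?o } LIMIT 100000 }
            }
        };
        n_deleted := row_count ();
        COMMIT WORK;
    }
}

You would then call, e.g., delete_pred_batched ('http://your/graph',
'http://your/predicate'); from isql, substituting your own IRIs.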

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH


From: Pantelis Natsiavas [mailto:natsia...@gmail.com]
Sent: Wednesday, August 17, 2016 4:36 AM
To: virtuoso-users <virtuoso-users@lists.sourceforge.net>
Subject: [Virtuoso-users] Deleting large number of triples

Hi everybody.

I am trying to delete a large number of triples from a very big graph. The
graph contains 217,609,545 triples, and I want to delete all the triples having
a specific predicate (64,884,016 triples).

I am trying to do this through the isql-v command-line interface, using the
command:

SPARQL DEFINE sql:log-enable 3
WITH <graph>
DELETE { ?s <predicate> ?o }
WHERE  { ?s <predicate> ?o }

After some time (I don't know exactly how much), I got the error:

*** Error 42000: [Virtuoso Driver][Virtuoso Server]FRVEC: array in for vectored 
over max vector length 2000000 > 1000000
at line 1 of Top-Level:

I checked virtuoso.log and saw nothing related to this specific error.

I changed the following parameters in virtuoso.ini:

MaxQueryMem      = 8G       ; from 2G
VectorSize       = 1000     ; not changed
MaxVectorSize    = 2000000  ; from 1000000
AdjustVectorSize = 1        ; from 0

I am not very confident about these changes to the Virtuoso settings, but after
checking http://docs.openlinksw.com/virtuoso/dbadm.html they seemed like the
right thing to do.

I restarted the VM and retried the whole process. After one hour, Virtuoso's
memory consumption reached about 100% and I got the error:
*** Error 08S01: [Virtuoso Driver]CL065: Lost connection to server

Please note that, after previous similar errors, I already have the following
virtuoso.ini settings:
NumberOfBuffers          = 1360000
MaxDirtyBuffers          = 1000000
ThreadCleanupInterval    = 1
ResourcesCleanupInterval = 1

My questions:
1. Is there any way to improve my query to make it easier to process? This is
the first time I have written a DELETE query, and I am not confident about it.
2. Is there any way to "split" the query so that it does not need to handle all
of these triples at once?
3. Alternatively, is there a configuration change that might improve memory
handling for such large queries?

Kind regards,
Pantelis Natsiavas

