Daniel and Quentin, thank you for your suggestions.

I followed Daniel's suggestion and executed the delete at the SQL level.
Everything worked fine.

Thank you very much for your time. I really appreciate it.

Kind regards,
Pantelis Natsiavas

2016-08-18 4:51 GMT+03:00 Quentin <quentin@guidinghand.solutions>:

> Hi Pantelis,
>
> If you want fine control over how the results are processed, Daniel's
> answer is the way to go: PLSQL lets you do basically anything with your
> results.
>
> However, I don't think it's necessary in this instance: we can probably do
> what you want with a subquery in SPARQL. Since you only want to process a
> limited number of records per run (subject to a memory constraint that
> you'll have to find by trial and error), a subquery with a LIMIT is enough.
> Additionally, SELECT statements have (or used to have) a hard limit of
> 10,000 results, which might complicate the process.
>
> ------------------------------
>
> SPARQL DEFINE sql:log-enable 3
>
> WITH <TargetGraph>
> DELETE {
>   ?S ?P ?O .
> } WHERE {
>   { SELECT ?S ?P ?O
>     { ?S ?P ?O .
>       FILTER ( ?P = <TargetPredicate> )
>     } LIMIT 10 }
> }
>
> ------------------------------
>
> This deletes at most 10 matching triples at a time from the target graph,
> selecting only those with the target predicate.
>
> Increase the LIMIT as far as memory allows and rerun as many times as needed.
>
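As a rough guide to how far to scale the LIMIT: the number of runs is the number of matching triples divided by the LIMIT, rounded up. This small Python sketch uses the 64,884,016 triples from the original question; the `runs_needed` helper is purely illustrative. It suggests LIMIT 10 is only practical for testing, while how large a LIMIT actually fits in memory remains the trial-and-error part mentioned above.

```python
import math

# Triples carrying the target predicate, as reported in the original question.
TRIPLES_TO_DELETE = 64_884_016


def runs_needed(total: int, batch_limit: int) -> int:
    """Number of LIMIT-ed DELETE runs needed to remove `total` triples."""
    return math.ceil(total / batch_limit)


if __name__ == "__main__":
    for limit in (10, 10_000, 1_000_000):
        print(f"LIMIT {limit:>9,}: {runs_needed(TRIPLES_TO_DELETE, limit):,} runs")
```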
> To do it more elegantly, put it in a PLSQL procedure similar to Daniel's
> below: loop the delete together with a count query, and stop when the count
> of triples with the target predicate reaches zero.
>
> ------------------------------
>
> SELECT PCount FROM (
>   SPARQL
>   SELECT (COUNT(*) AS ?PCount) FROM <TargetGraph>
>   {
>     ?S ?P ?O .
>     FILTER ( ?P = <TargetPredicate> )
>   } ) AS PCount...
>
> ------------------------------
>
> Turn that into a SELECT ... INTO to store the result, and loop the delete
> until the result is zero.
>
> If you want to track progress, write the count at each iteration to the
> debug log (http://docs.openlinksw.com/virtuoso/fn_dbg_printf/) and
> restart Virtuoso in debug and foreground mode to display the debug
> statements.
>
> That will eventually clean up your table.
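The count-then-delete loop described above (count, delete one bounded batch, recount, repeat until zero, logging progress each time) can be modeled outside Virtuoso. This is a minimal sketch assuming Python's built-in `sqlite3`, with a toy `quads` table standing in for Virtuoso's quad store; the table layout, the `delete_in_batches` helper and the batch size are hypothetical. In Virtuoso itself the loop body would be the SPARQL DELETE with the LIMIT subquery and the progress line would go through `dbg_printf`.

```python
import sqlite3


def make_toy_store() -> sqlite3.Connection:
    """Hypothetical stand-in for a quad store: one (g, s, p, o) row per triple."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE quads (g TEXT, s TEXT, p TEXT, o TEXT)")
    rows = [
        ("TargetGraph", f"s{i}", "TargetPredicate" if i % 3 == 0 else "other", f"o{i}")
        for i in range(100)
    ]  # 34 rows carry TargetPredicate (i = 0, 3, ..., 99)
    conn.executemany("INSERT INTO quads VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def count_matching(conn: sqlite3.Connection, graph: str, predicate: str) -> int:
    # Plays the role of the COUNT(*) query stored with SELECT ... INTO.
    row = conn.execute(
        "SELECT COUNT(*) FROM quads WHERE g = ? AND p = ?", (graph, predicate)
    ).fetchone()
    return row[0]


def delete_in_batches(conn, graph, predicate, batch_size):
    """Delete matching rows at most `batch_size` at a time until none remain.

    Returns the number of DELETE batches executed."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batches = 0
    remaining = count_matching(conn, graph, predicate)
    while remaining > 0:
        # Same idea as the LIMIT subquery: choose a bounded set of matching
        # rows, then delete exactly those rows.
        conn.execute(
            "DELETE FROM quads WHERE rowid IN ("
            "SELECT rowid FROM quads WHERE g = ? AND p = ? LIMIT ?)",
            (graph, predicate, batch_size),
        )
        conn.commit()  # each batch is its own transaction
        batches += 1
        remaining = count_matching(conn, graph, predicate)
        print(f"batch {batches}: {remaining} matching rows left")  # progress log
    return batches


if __name__ == "__main__":
    store = make_toy_store()
    delete_in_batches(store, "TargetGraph", "TargetPredicate", batch_size=10)
```

Recounting after every batch costs an extra query per iteration, but it makes the stopping condition independent of how many rows each DELETE actually removed.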
>
>
>
> Regards,
>
> Quentin.
>
> Guiding Hand Solutions
>
>
>
> On 2016-08-17 22:54, Pantelis Natsiavas wrote:
>
> Thank you for your advice Daniel.
>
> Actually, I want to delete only the statements containing the specific
> predicate. I don't want to delete all the triples of the subjects that have
> this predicate. As I have already said, I don't feel comfortable with
> DELETE queries.
> Is my query wrong? Could you suggest the correct query?
>
> Kind regards,
> Pantelis Natsiavas
>
> 2016-08-17 17:19 GMT+03:00 Davis, Daniel (NIH/NLM) [C] <
> daniel.da...@nih.gov>:
>
>> So, this has nothing to do with the large vector size, but just to be
>> sure the SPARQL is correct: do you wish to delete the subjects (and all
>> their triples) that have the predicate, or just the triples with the
>> predicate itself?
>>
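To make the distinction in this question concrete: the query in the original message (at the bottom of this thread) only removes triples whose predicate matches, whereas deleting the subjects removes every triple of any subject that has the predicate. This toy model uses an in-memory `sqlite3` table as a stand-in for the graph; the `ex:` names and the table layout are hypothetical, and each comment shows the SPARQL form the SQL statement is meant to mirror.

```python
import sqlite3


def toy_graph() -> sqlite3.Connection:
    """Hypothetical three-triple graph; only ex:a has the target predicate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.executemany(
        "INSERT INTO triples VALUES (?, ?, ?)",
        [("ex:a", "ex:target", "1"), ("ex:a", "ex:label", "A"), ("ex:b", "ex:label", "B")],
    )
    return conn


def delete_predicate_only(conn: sqlite3.Connection) -> None:
    # Mirrors: DELETE { ?s <predicate> ?o } WHERE { ?s <predicate> ?o }
    conn.execute("DELETE FROM triples WHERE p = 'ex:target'")


def delete_whole_subjects(conn: sqlite3.Connection) -> None:
    # Mirrors: DELETE { ?s ?p ?o } WHERE { ?s <predicate> ?x . ?s ?p ?o }
    conn.execute(
        "DELETE FROM triples WHERE s IN (SELECT s FROM triples WHERE p = 'ex:target')"
    )


def remaining(conn: sqlite3.Connection):
    return sorted(conn.execute("SELECT s, p, o FROM triples").fetchall())


if __name__ == "__main__":
    for delete in (delete_predicate_only, delete_whole_subjects):
        conn = toy_graph()
        delete(conn)
        print(delete.__name__, remaining(conn))
```

With the first form `ex:a` keeps its label; with the second, `ex:a` disappears entirely.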
>>
>>
>> As for avoiding the maximum vector size, I think your best approach is
>> to limit the number of matches and repeat the query until there are no
>> results, maybe with a count query in between. I have had to use similar
>> work-arounds to avoid the limits on the maximum number of results and the
>> maximum string size. For instance, my first attempts to export large
>> N-Triples files after processing failed because of these limits. You may
>> be able to adapt the code below, but I think that a repeated DELETE query
>> limited to a fixed number of triples will be best in your case.
>>
>> Anyway, the code:
>>
>> CREATE PROCEDURE meshrdf_export(in graph_uri varchar, in file_name varchar) {
>>     DECLARE banner any;
>>     DECLARE env, ses any;
>>     DECLARE ses_len, max_ses_len any;
>>
>>     -- Dirty reads: the long scan below takes no read locks.
>>     SET isolation = 'uncommitted';
>>
>>     -- Flush the in-memory buffer to the file once it exceeds ~10 MB.
>>     max_ses_len := 10000000;
>>
>>     --
>>     -- Truncate file and write a comment line indicating the graph and datetime of export.
>>     --
>>     --no_c_escapes-
>>     banner := sprintf('# <%s> exported at %s\n', graph_uri, datestring(now()));
>>     string_to_file (file_name, banner, -2);   -- -2: truncate, then write
>>
>>     env := vector (0, 0, 0);   -- serialization state for http_nt_triple
>>     ses := string_output ();   -- in-memory output buffer
>>
>>     FOR (SELECT * FROM (SPARQL
>>                         define input:storage ""
>>                         SELECT ?s ?p ?o WHERE {
>>                           GRAPH `iri(?:graph_uri)` {
>>                             ?s ?p ?o
>>                           }
>>                         } ORDER BY ?s ?p ?o) AS sub OPTION (loop)) DO {
>>         -- Append one triple, in N-Triples syntax, to the buffer.
>>         http_nt_triple (env, "s", "p", "o", ses);
>>         ses_len := length (ses);
>>         IF (ses_len > max_ses_len) {
>>             string_to_file (file_name, ses, -1);   -- -1: append
>>             ses := string_output ();
>>         }
>>     }
>>     -- Write whatever is left in the buffer.
>>     IF (length (ses)) {
>>         string_to_file (file_name, ses, -1);
>>     }
>> }
>>
>>
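The core of this procedure is a buffered-flush pattern: each triple is serialized into an in-memory buffer (the `string_output` session), and the buffer is appended to the file and reset once it grows past `max_ses_len`, so memory stays bounded however large the graph is. Below is a Python sketch of the same pattern under stated assumptions: `export_ntriples` is a hypothetical helper, the N-Triples formatting is deliberately naive (IRIs only, no literals, no escaping) and is not a substitute for Virtuoso's `http_nt_triple`.

```python
import os
import tempfile
from datetime import datetime


def export_ntriples(triples, graph_uri, file_name, max_buf_len=10_000_000):
    """Write IRI-only triples as N-Triples, flushing a bounded in-memory buffer.

    Returns how many times the buffer was appended to the file."""
    # Truncate the file and write the banner (cf. string_to_file (..., -2)).
    with open(file_name, "w", encoding="utf-8") as f:
        f.write(f"# <{graph_uri}> exported at {datetime.now().isoformat()}\n")

    chunks = []    # plays the role of the string_output session
    buffered = 0   # characters currently held in memory
    flushes = 0

    def flush():
        nonlocal chunks, buffered, flushes
        with open(file_name, "a", encoding="utf-8") as f:  # append (cf. -1)
            f.write("".join(chunks))
        chunks, buffered = [], 0
        flushes += 1

    for s, p, o in sorted(triples):  # cf. ORDER BY ?s ?p ?o
        line = f"<{s}> <{p}> <{o}> .\n"  # naive: no literals, no escaping
        chunks.append(line)
        buffered += len(line)
        if buffered > max_buf_len:
            flush()
    if buffered:
        flush()  # write whatever is left
    return flushes


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "graph.nt")
    data = [(f"ex:s{i}", "ex:p", f"ex:o{i}") for i in range(1, 6)]
    print(export_ntriples(data, "http://example.org/g", path, max_buf_len=50))
```

The threshold trades memory for I/O: a larger buffer means fewer file appends, at the cost of holding more text in RAM between flushes.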
>>
>> Dan Davis, Systems/Applications Architect (Contractor),
>>
>> Office of Computer and Communications Systems,
>>
>> National Library of Medicine, NIH
>>
>>
>>
>>
>>
>> *From:* Pantelis Natsiavas [mailto:natsia...@gmail.com]
>> *Sent:* Wednesday, August 17, 2016 4:36 AM
>> *To:* virtuoso-users <virtuoso-users@lists.sourceforge.net>
>> *Subject:* [Virtuoso-users] Deleting large number of triples
>>
>>
>>
>> Hi everybody.
>>
>>
>>
>> I am trying to delete a large number of triples from a very big graph. The
>> graph contains *217,609,545* triples and I want to delete all the
>> triples with a specific predicate (*64,884,016* triples).
>>
>>
>>
>> I am trying to do it through the isql-v command line interface, using the
>> command:
>>
>>
>>
>> SPARQL DEFINE sql:log-enable 3
>> WITH <graph>
>> DELETE { ?s <predicate> ?o }
>> WHERE { ?s <predicate> ?o }
>>
>>
>>
>> After some time (I don't know exactly how long), I got the error:
>>
>>
>>
>> *** Error 42000: [Virtuoso Driver][Virtuoso Server]FRVEC: array in for
>> vectored over max vector length 2000000 > 1000000
>> at line 1 of Top-Level:
>>
>>
>>
>> I checked virtuoso.log and found nothing related to this error.
>>
>>
>>
>> I changed the following parameters in virtuoso.ini:
>>
>> MaxQueryMem      = 8G        ; from 2G
>> VectorSize       = 1000      ; not changed
>> MaxVectorSize    = 2000000   ; from 1000000
>> AdjustVectorSize = 1         ; from 0
>>
>>
>>
>> I am not very confident about these changes to the Virtuoso settings, but
>> after checking http://docs.openlinksw.com/virtuoso/dbadm.html they seemed
>> the right thing to do.
>>
>>
>>
>> I restarted the VM and retried the whole process. After one hour, the
>> memory consumed by Virtuoso reached around 100% and I got an error:
>>
>> *** Error 08S01: [Virtuoso Driver]CL065: Lost connection to server
>>
>>
>>
>> Please note that, because of previous similar errors, I already have the
>> following virtuoso.ini settings:
>>
>> NumberOfBuffers = 1360000
>> MaxDirtyBuffers = 1000000
>> ThreadCleanupInterval    = 1
>> ResourcesCleanupInterval = 1
>>
>>
>>
>> My questions:
>>
>> 1. Is there any way to improve my query so that it is easier to process?
>> This is the first time I have written a DELETE query and I am not
>> comfortable with it.
>>
>> 2. Is there any way to "split" the query so that it doesn't need to
>> handle all these triples at once?
>>
>> 3. Alternatively, is there any configuration change that might improve
>> memory handling so that such big queries can be handled?
>>
>>
>>
>> Kind regards,
>>
>> Pantelis Natsiavas
>>
>>
>>
>>
>>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Virtuoso-users mailing list
> Virtuoso-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/virtuoso-users