Daniel and Quentin, thank you for your suggestions. I followed the path that Daniel suggested and executed the delete at the SQL level. Everything worked fine.
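The thread does not show the exact statements Pantelis ran. As a hedged illustration of what a SQL-level delete can look like in Virtuoso, the sketch below works directly on the quad table. It assumes the default quad storage (DB.DBA.RDF_QUAD, whose G, S, P and O columns hold internal IRI IDs) and uses placeholder IRIs in place of the real graph and predicate. It is not necessarily what was actually executed.

```sql
-- Sketch only: assumes the default quad storage table DB.DBA.RDF_QUAD.
-- 'TargetGraph' and 'TargetPredicate' are placeholder IRIs, not real ones.

-- Row-by-row autocommit with transaction logging, the same mode that the
-- "DEFINE sql:log-enable 3" pragma in the thread requests; 1 = quiet.
log_enable (3, 1);

-- Remove only the quads with the target predicate in the target graph;
-- other triples of the same subjects are left untouched.
DELETE FROM DB.DBA.RDF_QUAD
 WHERE G = iri_to_id ('TargetGraph')
   AND P = iri_to_id ('TargetPredicate');
```

Because the predicate is part of the WHERE clause, only the statements with that predicate are removed, which is what Pantelis asked for.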
Thank you very much for your time. I really appreciate it.

Kind regards,
Pantelis Natsiavas

2016-08-18 4:51 GMT+03:00 Quentin <quentin@guidinghand.solutions>:

> Hi Pantelis,
>
> If you want fine control over the processing of results, then Daniel's answer
> is the way to go: PL/SQL allows you to do basically anything with your
> results.
>
> However, I don't think it's necessary in this instance. We can probably do
> what you want with a subquery. Since you only want to process a limited
> number of records at a time (subject to some memory constraint that you'll
> find by trial and error), we can do this with a subquery in SPARQL.
> Additionally, select statements have (or used to have) a hard limit of
> 10,000 results, which might complicate the process.
>
> ------------------------------
>
> SPARQL DEFINE sql:log-enable 3
> WITH <TargetGraph>
> DELETE {
>   ?S ?P ?O .
> } WHERE {
>   { SELECT ?S ?P ?O
>     { ?S ?P ?O .
>       FILTER ( ?P = <TargetPredicate> )
>     } LIMIT 10 }
> }
>
> ------------------------------
>
> This will delete 10 matching records at a time from the target graph,
> filtering on the target predicate.
>
> Scale up the LIMIT as required and rerun as many times as required.
>
> If you want to do it more elegantly, put it in a PL/SQL procedure similar to
> Daniel's below and loop it together with a select query, stopping when the
> select query finds zero matching triples.
>
> ------------------------------
>
> SELECT PCount FROM (
>   SPARQL
>   SELECT (COUNT(*) AS ?PCount) FROM <TargetGraph>
>   {
>     ?S ?P ?O .
>     FILTER ( ?P = <TargetPredicate> )
>   } ) AS PCount...
>
> ------------------------------
>
> Make that a SELECT INTO, store the result, and loop with the delete until
> the result is zero.
>
> If you want to track progress, write the count at each iteration to the
> debug log (http://docs.openlinksw.com/virtuoso/fn_dbg_printf/) and
> restart Virtuoso in debug and foreground mode to display the debug statements.
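Quentin's loop (store the count with SELECT ... INTO, delete one limited batch, repeat until the count reaches zero, report progress with dbg_printf) could be sketched as a Virtuoso/PL procedure along these lines. The procedure name delete_target_predicate_in_batches, the LIMIT of 100000 and the placeholder IRIs are assumptions for illustration only; the batch size still has to be found by trial and error.

```sql
-- Hypothetical procedure: repeatedly deletes a bounded batch of triples
-- with <TargetPredicate> from <TargetGraph> until none are left.
-- The IRIs are placeholders and LIMIT 100000 is an arbitrary starting value.
CREATE PROCEDURE delete_target_predicate_in_batches ()
{
  DECLARE remaining integer;
  remaining := 1;

  WHILE (remaining > 0)
    {
      -- Count the matching triples still left (Quentin's count query).
      SELECT "PCount" INTO remaining
        FROM (SPARQL
              SELECT (COUNT(*) AS ?PCount) FROM <TargetGraph>
              { ?S ?P ?O . FILTER ( ?P = <TargetPredicate> ) }
             ) AS sub;

      -- Shown on the server console when Virtuoso runs in the foreground
      -- with debugging enabled.
      dbg_printf ('triples left with target predicate: %d', remaining);

      IF (remaining > 0)
        {
          -- One bounded batch, using the subquery form from Quentin's reply.
          SPARQL DEFINE sql:log-enable 3
          WITH <TargetGraph>
          DELETE { ?S ?P ?O . }
          WHERE {
            { SELECT ?S ?P ?O
              { ?S ?P ?O . FILTER ( ?P = <TargetPredicate> ) }
              LIMIT 100000 }
          };
        }
    }
}
;

-- Usage from isql:
-- delete_target_predicate_in_batches ();
```

Re-counting on every pass keeps the loop simple but scans the matching triples each time, so larger batches mean fewer counts at the cost of more memory per DELETE.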
>
> That will eventually clean up your table.
>
> Regards,
>
> Quentin.
> Guiding Hand Solutions
>
> On 2016-08-17 22:54, Pantelis Natsiavas wrote:
>
> > Thank you for your advice, Daniel.
> >
> > Actually, I want to delete only the statements containing the specific
> > predicate. I don't want to delete all the triples containing the subject of
> > the predicate. As I have already said, I don't feel comfortable with
> > DELETE queries.
> > Is my query wrong? Could you suggest the correct query?
> >
> > Kind regards,
> > Pantelis Natsiavas
> >
> > 2016-08-17 17:19 GMT+03:00 Davis, Daniel (NIH/NLM) [C] <daniel.da...@nih.gov>:
> >
> > > So, this has nothing to do with the large vector size, but just to be
> > > sure the SPARQL is correct: do you wish to delete the subjects (and all
> > > their triples) where the subject has the predicate, or just the predicate
> > > itself?
> > >
> > > As far as avoiding the maximum vector size, I think your best approach is
> > > to limit the number of matches and repeat the query until there are no
> > > results, maybe with a count query in between. I have had to do similar
> > > workarounds for the maximum number of results and maximum string size
> > > limits. For instance, my first attempts to export large N-Triples
> > > files after processing failed due to these issues. You may be able to
> > > adapt the code below, but I think that a repeated delete query limited to
> > > a number of triples will be best in your case.
> > >
> > > Anyway, the code:
> > >
> > > CREATE PROCEDURE meshrdf_export(in graph_uri varchar, in file_name varchar) {
> > >   DECLARE banner any;
> > >   DECLARE env, ses any;
> > >   DECLARE ses_len, max_ses_len any;
> > >
> > >   SET isolation = 'uncommitted';
> > >
> > >   max_ses_len := 10000000;
> > >
> > >   --
> > >   -- Truncate file and write a comment line indicating the graph and datetime of export.
> > >   --
> > >   --no_c_escapes-
> > >   banner := sprintf('# <%s> exported at %s\n', graph_uri, datestring(now()));
> > >   string_to_file (file_name, banner, -2);
> > >
> > >   env := vector (0, 0, 0);
> > >   ses := string_output ();
> > >
> > >   FOR (SELECT * FROM (SPARQL
> > >          define input:storage ""
> > >          SELECT ?s ?p ?o WHERE {
> > >            GRAPH `iri(?:graph_uri)` {
> > >              ?s ?p ?o
> > >            }
> > >          } ORDER BY ?s ?p ?o) AS sub OPTION (loop)) DO {
> > >     http_nt_triple (env, "s", "p", "o", ses);
> > >     ses_len := length (ses);
> > >
> > >     IF (ses_len > max_ses_len) {
> > >       string_to_file (file_name, ses, -1);
> > >       ses := string_output ();
> > >     }
> > >   }
> > >   IF (length (ses)) {
> > >     string_to_file (file_name, ses, -1);
> > >   }
> > > }
> > >
> > > Dan Davis, Systems/Applications Architect (Contractor),
> > > Office of Computer and Communications Systems,
> > > National Library of Medicine, NIH
> > >
> > > *From:* Pantelis Natsiavas [mailto:natsia...@gmail.com]
> > > *Sent:* Wednesday, August 17, 2016 4:36 AM
> > > *To:* virtuoso-users <virtuoso-users@lists.sourceforge.net>
> > > *Subject:* [Virtuoso-users] Deleting large number of triples
> > >
> > > Hi everybody.
> > >
> > > I am trying to delete a large number of triples from a very big graph. The
> > > graph contains *217.609.545* triples and I want to delete all the
> > > triples having a specific predicate (*64.884.016* triples).
> > >
> > > I am trying to do it through the isql-v command line interface, using the
> > > command:
> > >
> > > SPARQL DEFINE sql:log-enable 3
> > > WITH <graph>
> > > DELETE { ?s <predicate> ?o }
> > > WHERE { ?s <predicate> ?o }
> > >
> > > After some time (I don't know exactly how much) I got the error
> > >
> > > *** Error 42000: [Virtuoso Driver][Virtuoso Server]FRVEC: array in for vectored over max vector length 2000000 > 1000000
> > > at line 1 of Top-Level:
> > >
> > > I checked virtuoso.log and I see nothing related to this specific
> > > error.
> > >
> > > I changed the parameters in virtuoso.ini:
> > >
> > > MaxQueryMem = 8G ; from 2G
> > > VectorSize = 1000 ; not changed
> > > MaxVectorSize = 2000000 ; from 1000000
> > > AdjustVectorSize = 1 ; from 0
> > >
> > > I am not very confident about these changes to the Virtuoso settings, but
> > > after checking http://docs.openlinksw.com/virtuoso/dbadm.html, these
> > > changes seemed the right thing to do.
> > >
> > > I restarted the VM and retried the whole process. After one hour, the
> > > memory consumed by Virtuoso reached around 100% and I got an error:
> > >
> > > *** Error 08S01: [Virtuoso Driver]CL065: Lost connection to server
> > >
> > > Please note that, because of previous similar errors, I already have the
> > > following virtuoso.ini settings:
> > >
> > > NumberOfBuffers = 1360000
> > > MaxDirtyBuffers = 1000000
> > > ThreadCleanupInterval = 1
> > > ResourcesCleanupInterval = 1
> > >
> > > My questions:
> > >
> > > 1. Is there any way to improve my query in order to facilitate its
> > >    processing? It is the first time I am doing a DELETE query and I am not
> > >    comfortable with it.
> > >
> > > 2. Is there any way to "split" the query so that it doesn't need to
> > >    handle all these triples at once?
> > >
> > > 3. Alternatively, is there any configuration change that might improve
> > >    memory handling in order to handle such big queries?
> > >
> > > Kind regards,
> > >
> > > Pantelis Natsiavas
> >
> > ------------------------------------------------------------------------------
> >
> > _______________________________________________
> > Virtuoso-users mailing list
> > Virtuoso-users@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/virtuoso-users