Re: [Virtuoso-users] Deleting large number of triples

2016-09-01 Thread Davis, Daniel (NIH/NLM) [C]
The query seems fine.   The reason it doesn’t work is that Virtuoso first loads 
the triples that match into a vector (which has a limited size), and then 
deletes those that match the delete clause.   In this case, because your 
problem seems simple, you can just drop to SQL to avoid that:

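-- 3 = row-by-row autocommit with the transaction log still on
LOG_ENABLE(3,0);
-- graph_uri and predicate_uri stand for the IRI strings of the graph and predicate
DELETE FROM RDF_QUAD
 WHERE g = rdf_make_iid_of_qname(graph_uri)
   AND p = rdf_make_iid_of_qname(predicate_uri);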

If you had wanted to delete all the triples for subjects that match, then 
SPARQL DELETE offers some powerful abstractions, but Virtuoso’s vector size 
would still get in your way.   It would be nice if the buffering were 
automatic, but I’m not sure how that would work, given that the DELETE template 
can describe an entirely different set of triples from the ones matched in the 
WHERE clause.   There are even BIND and other SPARQL capabilities that 
OpenLink’s engineers would have to deal with.
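
One way to check the effect of a direct delete like the one above is to count the matching quads before and after running it. This is only a sketch: the two IRIs are hypothetical placeholders, and it uses iri_to_id with a second argument of 0 so that an unknown IRI is not created as a side effect (the count is then simply 0).

```sql
-- Hypothetical graph and predicate IRIs; substitute the real ones.
-- iri_to_id (..., 0) looks up the IRI id without creating a new one.
SELECT COUNT (*)
  FROM DB.DBA.RDF_QUAD
 WHERE G = iri_to_id ('http://example.org/graph', 0)
   AND P = iri_to_id ('http://example.org/predicate', 0);
```

Run before the DELETE this should report the number of triples about to be removed, and run afterwards it should report 0.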


Re: [Virtuoso-users] Deleting large number of triples

2016-08-19 Thread Pantelis Natsiavas
Daniel and Quentin thank you for your suggestions.

I followed the path that Daniel suggested and executed the delete on the
SQL level. Everything worked fine.

Thank you very much for your time. I really appreciate it.

Kind regards,
Pantelis Natsiavas


Re: [Virtuoso-users] Deleting large number of triples

2016-08-17 Thread Quentin
 

Hi Pantelis, 

If you want fine control over processing of results then Daniel's answer
is the way to go; PLSQL allows you to do basically anything with your
results. 

However, I don't think it's necessary in this instance. We can probably do
what you want with a subquery in SPARQL, since you only want to process a
limited number of records at a time (subject to some memory constraint
that you'll find out by trial and error).
Additionally, select statements have (or used to have) a hard limit of
10,000 results that might complicate the process. 

-- 

SPARQL DEFINE sql:log-enable 3 

WITH GRAPH 
DELETE {
?S ?P ?O .
} WHERE {
{ SELECT ?S ?P ?O 
{ ?S ?P ?O .
FILTER ( ?P =  )
} LIMIT 10 }
} 

-- 

This will delete 10 matching records at a time from the target graph,
filtering on the target predicate. 

Upscale as required and rerun as many times as required. 

If you want to do it more elegantly, put it in a PLSQL function similar
to Daniel's below and just loop it together with a select query and stop
the function when the select query finds zero count of matching
predicates. 

-- 

SELECT PCount FROM (
SPARQL
SELECT COUNT(*) as ?PCount FROM 
{
?S ?P ?O .
FILTER ( ?P =  )
} ) AS PCount... 

-- 

Make that a SELECT INTO and store the result, loop with the delete until
the result is zero. 

If you want to track progress, write the count at each iteration to the
debug log (http://docs.openlinksw.com/virtuoso/fn_dbg_printf/) and
restart Virtuoso in debug&foreground mode to display the debug
statements. 

That will eventually clean up your table. 
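
The loop described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a tested implementation: the procedure name, its parameters and the batch size are made up for the example, the SPARQL statement is built as a string and run through exec() so that the IRIs can be passed as arguments, and the remaining count is taken with a direct SQL count on RDF_QUAD rather than the SPARQL count shown earlier. The extra check that stops the loop when a batch removes nothing is also an addition.

```sql
-- Sketch only: name, parameters and batch size are illustrative assumptions.
CREATE PROCEDURE delete_predicate_batched (
  IN graph_iri VARCHAR,
  IN pred_iri VARCHAR,
  IN batch_size INTEGER)
{
  DECLARE g_id, p_id ANY;
  DECLARE remaining, previous INTEGER;
  DECLARE stmt VARCHAR;

  -- Second argument 0: do not create an IRI id if the IRI is unknown.
  g_id := iri_to_id (graph_iri, 0);
  p_id := iri_to_id (pred_iri, 0);

  -- The LIMITed subquery delete, with the IRIs and batch size filled in.
  stmt := sprintf (
    'SPARQL DEFINE sql:log-enable 3 WITH <%s> DELETE { ?s <%s> ?o } WHERE { { SELECT ?s ?o WHERE { ?s <%s> ?o } LIMIT %d } }',
    graph_iri, pred_iri, pred_iri, batch_size);

  SELECT COUNT (*) INTO remaining
    FROM DB.DBA.RDF_QUAD WHERE G = g_id AND P = p_id;

  WHILE (remaining > 0)
    {
      -- Progress goes to the server console in debug/foreground mode.
      dbg_printf ('%d matching triples remaining', remaining);
      exec (stmt);
      COMMIT WORK;

      previous := remaining;
      SELECT COUNT (*) INTO remaining
        FROM DB.DBA.RDF_QUAD WHERE G = g_id AND P = p_id;

      -- Guard against looping forever if a batch removed nothing.
      IF (remaining >= previous)
        signal ('42000', 'Batch deleted nothing; stopping');
    }
}
```

From isql it would then be called with something like delete_predicate_batched ('http://example.org/graph', 'http://example.org/predicate', 100000); where both IRIs and the batch size are placeholders to adjust by trial and error.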

Regards, 

Quentin. 

Guiding Hand Solutions 


Re: [Virtuoso-users] Deleting large number of triples

2016-08-17 Thread Pantelis Natsiavas
Thank you for your advice Daniel.

Actually I want to delete only the statements containing the specific
predicate. I don't want to delete all the triples containing the subject of
the predicate. As I have already said, I don't feel comfortable with the
DELETE queries.
Is my query wrong? Could you suggest the correct query?

Kind regards,
Pantelis Natsiavas


Re: [Virtuoso-users] Deleting large number of triples

2016-08-17 Thread Davis, Daniel (NIH/NLM) [C]
So, this has nothing to do with the large vector size, but just to be sure the 
SPARQL is correct: do you wish to delete the subjects (and all their triples) 
where the subject has the predicate, or just the triples with that predicate?

As far as avoiding the maximum vector size, I think your best approach is to 
limit the number of matches and repeat the query until there are no results, 
maybe with a count query in between.   I have had to use similar workarounds 
to avoid the limits on the maximum number of results and the maximum string 
size.   For instance, my first attempts to export large NTriples files after 
processing failed due to these limits.   You may be able to adapt the code 
below, but I think that a repeated delete query limited to a fixed number of 
triples will be best in your case.

Anyway, the code:

CREATE PROCEDURE meshrdf_export(in graph_uri varchar, in file_name varchar) {
  DECLARE banner any;
  DECLARE env, ses any;
  DECLARE ses_len, max_ses_len any;

  SET isolation = 'uncommitted';

  max_ses_len := 1000;

  --
  -- Truncate file and write a comment line indicating the graph and datetime of export.
  --
--no_c_escapes-
  banner := sprintf('# <%s> exported at %s\n', graph_uri, datestring(now()));
  string_to_file (file_name, banner, -2);

  env := vector (0, 0, 0);
  ses := string_output ();

  FOR (SELECT * FROM (SPARQL
      define input:storage ""
      SELECT ?s ?p ?o WHERE {
        GRAPH `iri(?:graph_uri)` {
          ?s ?p ?o
        }
      } ORDER BY ?s ?p ?o) AS sub OPTION (loop)) DO {
    http_nt_triple (env, "s", "p", "o", ses);
    ses_len := length (ses);

    IF (ses_len > max_ses_len) {
      string_to_file (file_name, ses, -1);
      ses := string_output ();
    }
  }
  IF (length (ses)) {
    string_to_file (file_name, ses, -1);
  }
}
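
For reference, a call to the procedure above from isql might look like the sketch below. The graph IRI and the output path are hypothetical, and the path would need to be permitted by the server's DirsAllowed setting in virtuoso.ini for string_to_file to write there.

```sql
-- Hypothetical graph IRI and output file; substitute real values.
meshrdf_export ('http://example.org/graph', './export.nt');
```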

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH


From: Pantelis Natsiavas [mailto:natsia...@gmail.com]
Sent: Wednesday, August 17, 2016 4:36 AM
To: virtuoso-users 
Subject: [Virtuoso-users] Deleting large number of triples

Hi everybody.

I am trying to delete a large number of triples of a very big graph. The graph 
contains 217.609.545 triples and I want to delete all the triples having a 
specific predicate (64.884.016 triples).

I am trying to do it through the isql-v command line interface, using the 
command:

SPARQL DEFINE sql:log-enable 3
WITH 
DELETE { ?s  ?o }
WHERE{ ?s  ?o }

After some time (I don't know exactly how much) I got the error

*** Error 42000: [Virtuoso Driver][Virtuoso Server]FRVEC: array in for vectored 
over max vector length 200 > 100
at line 1 of Top-Level:

 I checked the virtuoso.log and I see nothing related to the specific error.

I changed the parameters in virtuoso.ini:
MaxQueryMem  = 8G  ; from 2G
VectorSize = 1000   ; not changed
MaxVectorSize = 200  ; from 100
AdjustVectorSize = 1   ; from 0

I am not very confident about these changes to the Virtuoso settings, but after 
checking http://docs.openlinksw.com/virtuoso/dbadm.html they seemed the right 
thing to do.

I restarted the VM and retried the whole process. After one hour, Virtuoso's 
memory consumption reached around 100% and I got an error:
*** Error 08S01: [Virtuoso Driver]CL065: Lost connection to server

Please note that from previous similar errors, I already have the following 
virtuoso.ini settings:
NumberOfBuffers = 136
MaxDirtyBuffers = 100
ThreadCleanupInterval= 1
ResourcesCleanupInterval = 1

My questions:
1. Is there any way to improve my query in order to facilitate its processing? 
It is the first time I am doing a DELETE query and I am not comfortable with it.
2. Is there any way to "split" the query so that it doesn't need to handle all 
these triples at once?
3. Alternatively, is there any configuration change that might improve memory 
handling in order to handle such big queries?

Kind regards,
Pantelis Natsiavas


--
___
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users