[Virtuoso-users] read only db?

2017-09-20 Thread Jörn Hees
Hey,

is there a way to run Virtuoso in read-only mode, i.e., so that the virtuoso.db file 
doesn't get modified but I can still run SPARQL queries against it?

I'm asking because this would make scientific use cases with Docker etc. 
considerably easier and would allow us to package whole ready-to-run containers, 
e.g. for the various DBpedia releases.

Cheers,
Jörn




Re: [Virtuoso-users] obtaining a copy/dump of an online SPARQL endpoint in a nice manner

2016-04-06 Thread Jörn Hees
On 31 Mar 2016, at 22:10, Kingsley Idehen wrote:
> 
> How are you arriving at data devoid of metadata about its origins?

Link rot... Endpoint is still working, dumps aren't...


> You would be better served, ultimately, instantiating a dedicated
> Virtuoso instance in the cloud for your specific needs. This instance
> could load datasets from wherever, using some of the existing endpoints
> (DBpedia and others) as a mechanism for exposing provenance data etc..

How would I do that? I'm not sure I fully understand... We have a Virtuoso 
instance in our local network for our working group already.
I usually load it manually with the dumps of the datasets I want to run my 
algorithms against, so I don't impede the public endpoints' operations / get 
blocked for doing too many requests.
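
For context, the bulk-load workflow I use to fill our local instance looks 
roughly like this (a sketch; paths and graph IRI are placeholders, and the 
directory has to be listed in DirsAllowed in virtuoso.ini):

```
-- register all dump files in a directory with the bulk loader
ld_dir ('/usr/local/data/datasets/dbpedia', '*.nt.gz', 'http://dbpedia.org');
-- load everything registered (several of these in parallel isql sessions speed things up)
rdf_loader_run ();
-- persist the loaded state
checkpoint;
```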


> There is no nice way of trying to dump all the data from an existing
> SPARQL endpoint.

If there is no nice way, is there any way at all?

I mean, I can't be the first one who is interested in all the triples a graph on a 
remote endpoint contains (even if that graph is big).
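
The least bad approach I can think of is paging through the graph with 
LIMIT/OFFSET; a sketch (the ORDER BY makes the paging deterministic but is 
expensive on the remote side, and many public endpoints cap OFFSET or result size):

```
SELECT ?s ?p ?o
FROM <http://dbpedia.org>
WHERE { ?s ?p ?o }
ORDER BY ?s ?p ?o
LIMIT 10000
OFFSET 0   # then 10000, 20000, ... until an empty result comes back
```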


Best,
Jörn




Re: [Virtuoso-users] full-text indexing after mass import (was: rdf_loader_run logs "missed delete of name id cache" messages)

2015-11-30 Thread Jörn Hees
Hi,

any ideas on this?

> On 23 Nov 2015, at 04:01, Jörn Hees <j_h...@cs.uni-kl.de> wrote:
> 
>>> 
>>> Reading the doc page, I have two remaining questions:
>>> 
>>> After a normal `rdf_loader_run()`, would a 
>>> `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` be sufficient to get a complete 
>>> full-text index? Or do I have to run `DB.DBA.RDF_OBJ_FT_RECOVER();` in 
>>> those cases, and will I otherwise never arrive at a complete free-text index 
>>> (not even after the background tasks have finished)?
>> 
>> [Hugh] The scheduler will run `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();`, so you 
>> can wait for it to run or run it manually yourself.
> 
> 
> Yes, that's what I understood, but on 
> http://docs.openlinksw.com/virtuoso/sparqlextensions.html#rdfsparqlrulefulltext
> there's this remark:
> 
>> One problem related to free-text indexing of DB.DBA.RDF_QUAD is that some 
>> applications (e.g. those that import billions of triples) may set off 
>> triggers. This will make free-text index data incomplete. Calling procedure 
>> DB.DBA.RDF_OBJ_FT_RECOVER () will insert all missing free-text index items 
>> by dropping and re-inserting every existing free-text index rule.
> 
> So I'm asking: is `rdf_loader_run();` one of those "applications" which 
> deactivate some triggers, leading to an incomplete full-text index and needing 
> `DB.DBA.RDF_OBJ_FT_RECOVER();` to be called?

Or is it enough to wait, or to call `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();`?
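
In case it helps: this is how I currently check whether the background indexing 
has caught up (a sketch; I'm assuming the pending batches are logged in the 
VTLOG_DB_DBA_RDF_OBJ table, which is my reading of the docs, not confirmed):

```
-- rows still waiting to be folded into the free-text index (table name is my assumption)
SELECT COUNT(*) FROM DB.DBA.VTLOG_DB_DBA_RDF_OBJ;
-- apply the pending batches now instead of waiting for the scheduler
DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ ();
```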


Cheers,

Jörn




[Virtuoso-users] traps: virtuoso-t[54165] general protection ip:931aa7 sp:7f9b34d2eda0 error:0 in virtuoso-t[400000+c00000]

2015-11-30 Thread Jörn Hees
Hi,

I'm running a machine learning algorithm against a local Virtuoso 7.2.1 SPARQL 
endpoint containing ~6 G triples...

After several weeks of using it (including reboots etc.), tonight it crashed 
after >6 h of heavy load, with no further information in any of the logs than 
this in dmesg:

> traps: virtuoso-t[54165] general protection ip:931aa7 sp:7f9b34d2eda0 error:0 
> in virtuoso-t[400000+c00000]


It seems like it tried to access some memory it wasn't meant to access and the 
kernel simply killed it for that:
https://en.wikipedia.org/wiki/General_protection_fault

Has anyone ever experienced something similar?
Is there any reasonable way I could get a more meaningful log if that happens 
again?

Best,
Jörn




[Virtuoso-users] full-text indexing after mass import (was: rdf_loader_run logs "missed delete of name id cache" messages)

2015-11-22 Thread Jörn Hees
Hi Hugh,

thanks again for your replies...

> On 22 Nov 2015, at 01:46, Hugh Williams wrote:
> 
>> What puzzles me is that after import and several checkpoints and restarts, 
>> just leaving the DB idle without any queries (see below), it seems to become 
>> busy.
>> I guess it does some kind of "re-organization", and I'd mostly like to find 
>> out how I can tell it "do it now, take all resources you want, don't care if 
>> anyone is waiting, admin override, full speed ;)".
>> That would allow me to then have that static state of the DB which I can 
>> back up and replay if things go wrong or someone wants an old version, 
>> leaving us with "ready to use" backups, and not ones that first start some 
>> lengthy "re-organization" after mass import.
>> 
>> The mentioned "re-organization state" now seems to be over after leaving the 
>> DB switched on and idle for the last couple of days.
> 
> [Hugh] Does your database have Full Text indexing enabled? That is a 
> scheduled background task that would take time to complete on a newly loaded 
> large database like yours, see:
> 
>   
> http://docs.openlinksw.com/virtuoso/sparqlextensions.html#rdfsparqlrulefulltext

I really think that this could be it, as by default there seems to be an "all" 
index.
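
If I read the docs right, that default can be inspected and toggled like this 
(a sketch, not verified on a production instance):

```
-- list the active free-text index rules; the default seems to be a catch-all 'All' rule
SELECT * FROM DB.DBA.RDF_OBJ_FT_RULES;
-- drop / re-add the catch-all rule (graph and predicate NULL = match everything)
-- DB.DBA.RDF_OBJ_FT_RULE_DEL (null, null, 'All');
-- DB.DBA.RDF_OBJ_FT_RULE_ADD (null, null, 'All');
```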

Reading the doc page, I have two remaining questions:

After a normal `rdf_loader_run()`, would a 
`DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` be sufficient to get a complete 
full-text index? Or do I have to run `DB.DBA.RDF_OBJ_FT_RECOVER();` in those 
cases, and will I otherwise never arrive at a complete free-text index (not even 
after the background tasks have finished)?
If I have to, a mention of this around 
http://docs.openlinksw.com/virtuoso/rdfperformancetuning.html#rdfperfloadinglod 
would be nice.

I ran `DB.DBA.RDF_OBJ_FT_RECOVER();` on a small instance with just the DBpedia 
core (~430 M triples) and it seems to only use 2-3 CPUs with very little IO. 
Importing that dataset only took 1:30 hours, but the full-text 
indexing is still running after 3 hours now... Is there any way to go full 
speed at the cost of locking the whole DB or something?

Cheers,
Jörn






Re: [Virtuoso-users] full-text indexing after mass import (was: rdf_loader_run logs "missed delete of name id cache" messages)

2015-11-22 Thread Jörn Hees
Hi,

> On 23 Nov 2015, at 03:36, Hugh Williams wrote:
> 
>>> 
>>> http://docs.openlinksw.com/virtuoso/sparqlextensions.html#rdfsparqlrulefulltext
>> 
>> I really think that this could be it, as by default there seems to be an 
>> "all" index.
> 
> [Hugh] If you installed the Virtuoso Faceted Browser then the FT index would 
> be enabled and run as a scheduled job.

It's a "vanilla" virtuoso-server as installed from the .deb packages of the 
7.2.1 sources, the only VAD package installed is the conductor.


>> 
>> Reading the doc page, i have two remaining questions:
>> 
>> After a normal `rdf_loader_run()`, would a 
>> `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` be sufficient to get a complete 
>> full-text index? Or do I have to run `DB.DBA.RDF_OBJ_FT_RECOVER();` in those 
>> cases, and will I otherwise never arrive at a complete free-text index (not 
>> even after the background tasks have finished)?
> 
> [Hugh] The scheduler will run `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();`, so you 
> can wait for it to run or run it manually yourself.


Yes, that's what I understood, but on 
http://docs.openlinksw.com/virtuoso/sparqlextensions.html#rdfsparqlrulefulltext 
there's this remark:

> One problem related to free-text indexing of DB.DBA.RDF_QUAD is that some 
> applications (e.g. those that import billions of triples) may set off 
> triggers. This will make free-text index data incomplete. Calling procedure 
> DB.DBA.RDF_OBJ_FT_RECOVER () will insert all missing free-text index items by 
> dropping and re-inserting every existing free-text index rule.

So I'm asking: is `rdf_loader_run();` one of those "applications" which 
deactivate some triggers, leading to an incomplete full-text index and needing 
`DB.DBA.RDF_OBJ_FT_RECOVER();` to be called?

Or are waiting, or calling `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();`, the only two 
interesting alternatives for me?


>> If I have to, a mention of this around 
>> http://docs.openlinksw.com/virtuoso/rdfperformancetuning.html#rdfperfloadinglod
>>  would be nice.
>> 
>> I ran `DB.DBA.RDF_OBJ_FT_RECOVER();` on a small instance with just the 
>> DBpedia core (~430 M triples) and it seems to only use 2-3 CPUs with very 
>> little IO. Importing that dataset only took 1:30 hours, but the 
>> full-text indexing is still running after 3 hours now... Is there any way to 
>> go full speed at the cost of locking the whole DB or something?
> 
> [Hugh] Will have to check with development as I am not aware of a param to 
> control CPU usage; it should run with full platform utilisation, I would have 
> thought.

Thanks,

Jörn




Re: [Virtuoso-users] rdf_loader_run logs "missed delete of name id cache" messages

2015-11-14 Thread Jörn Hees
Hi Hugh,


> If all datasets are indicated to have loaded with ll_state = 2 and no errors 
> are reported in ll_error, then the datasets have been loaded successfully.

ok, cool :)


> Did you run the checkpoint command when the bulk load completed, as that 
> should always be done? Also, after the restart, is the high CPU consumption 
> still occurring?

yes, i ran "checkpoint; commit work; checkpoint;" a couple of times already, 
but after a short while the DB is busy again... the same after a full restart 
of the machine.
I attached the log since restart yesterday at the end of this mail (including 
some comments)... at start time the instance contained the DBpedia EN + DE + 
pagelinks dumps.
I then started to load the Freebase, LinkedGeoData and Wikidata dumps with a 
single loader.
Afterwards the endpoint held ~ 5.96 G triples.


> What does the “status();” command run from isql report, as that will show the 
> status of the machine and what might be consuming resources.

Funnily enough, when I ran that, the CPU load instantly seems to have dropped 
to 0 with no more IO activity, but the command never returned... I've been 
waiting for >10 minutes now:
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL> status();
Connected to OpenLink Virtuoso
Driver: 07.20.3214 OpenLink Virtuoso ODBC Driver

I opened some other isql-vt sessions, but since then the DB doesn't respond at 
all.
It also didn't respond to "/etc/init.d/virtuoso-opensource-7 stop" or 
"sudo kill ..." within 5 minutes...
After that I tried several other kill signals like SIGHUP and SIGINT, and I 
think SIGUSR1 finally did the trick.
Starting took awfully long and seems to have replayed a lot of transactions, 
which I don't fully understand, as there were several checkpoints since the 
import... 
Should I rather use the backup from before?
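
For next time, my plan to avoid the long replay is to force a checkpoint and a 
clean shutdown from isql before touching the process (a sketch):

```
-- flush the current state to virtuoso.db so the transaction log is empty...
checkpoint;
-- ...then stop the server cleanly instead of killing the process
shutdown;
```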


After the restart I ran status(); twice: once immediately after replaying the 
transactions, when the DB was idle, and a second time after I had waited ~5 
minutes and the DB had become active again:

> SQL> status();
> REPORT
> VARCHAR
> ___
> 
> OpenLink Virtuoso  Server
> Version 07.20.3214-pthreads for Linux as of Nov 11 2015
> Started on: 2015-11-14 18:21 GMT+1
> 
> Database Status:
>   File size 0, 42128384 pages, 15652474 free.
>   1090 buffers, 3644262 used, 0 dirty 0 wired down, repl age 0 0 w. io 0 
> w/crsr.
>   Disk Usage: 3613440 reads avg 0 msec, 0% r 0% w last  171 s, 1113064 writes 
> flush  44.42 MB,
> 23803 read ahead, batch = 150.  Autocompact 60532 in 50797 out, 16% saved.
> Gate:  1112 2nd in reads, 0 gate write waits, 0 in while read 0 busy scrap.
> Log = /var/lib/virtuoso-opensource-7/db/virtuoso.trx, 185 bytes
> 25512792 pages have been changed since last backup (in checkpoint state)
> Current backup timestamp: 0x-0x00-0x00
> Last backup date: unknown
> Clients: 1 connects, max 1 concurrent
> RPC: 6 calls, 1 pending, 1 max until now, 0 queued, 0 burst reads (0%), 0 
> second 0M large, 10M max
> Checkpoint Remap 961058 pages, 0 mapped back. 17 s atomic time.
> DB master 42128384 total 15652474 free 961058 remap 0 mapped back
>temp  256 total 251 free
> 
> Lock Status: 0 deadlocks of which 0 2r1w, 0 waits,
>Currently 1 threads running 0 threads waiting 0 threads in vdb.
> Pending:
> 
> Client :1:  Account: dba, 316 bytes in, 2514 bytes out, 1 stmts.
> PID: 14099, OS: unix, Application: unknown, IP#: 127.0.0.1
> Transaction status: PENDING, 1 threads.
> Locks:
> 
> 
> Running Statements:
>  Time (msec) Text
>  965 status()
> 
> 
> Hash indexes
> 
> 
> 37 Rows. -- 966 msec.
> SQL> status();
> REPORT
> VARCHAR
> ___
> 
> OpenLink Virtuoso  Server
> Version 07.20.3214-pthreads for Linux as of Nov 11 2015
> Started on: 2015-11-14 18:21 GMT+1
> 
> Database Status:
>   File size 0, 42128384 pages, 15632400 free.
>   1090 buffers, 3662255 used, 98085 dirty 0 wired down, repl age 0 0 w. 
> io 1 w/crsr.
>   Disk Usage: 3630269 reads avg 0 msec, 0% r 0% w last  395 s, 1113064 writes 
> flush  44.42 MB,
> 23871 read ahead, batch = 150.  Autocompact 60532 in 50797 out, 16% saved.
> Gate:  1117 2nd in reads, 0 gate write waits, 0 in while read 0 busy scrap.
> Log = /var/lib/virtuoso-opensource-7/db/virtuoso.trx, 10804859 bytes
> 25512792 pages have been changed since last backup (in checkpoint state)
> Current backup timestamp: 0x-0x00-0x00
> Last backup date: unknown
> Clients: 1 connects, max 1 concurrent
> RPC: 8 calls, 1 pending, 1 max until now, 0 queued, 0 burst reads (0%), 0 
> second 2M large, 10M max
> Checkpoint Remap 961058 pages, 0 mapped back. 17 s atomic time.
> DB master 42128384 total 15632156 free 961058 remap 78945 mapped back
>temp  256 total 250 free
> 
> Lock Status: 0 deadlocks of which 0 2r1w, 0 waits,
>Currently 6 threads running 0 

Re: [Virtuoso-users] rdf_loader_run logs "missed delete of name id cache" messages

2015-11-13 Thread Jörn Hees
Hi Hugh,

> On 13 Nov 2015, at 04:11, Hugh Williams wrote:
> 
> If you check the load_list table, does the ll_error column log which 
> dataset is causing the error, so you can check whether the file is valid?

Thanks for the answer... the following returns 0 rows:
SELECT * FROM DB.DBA.LOAD_LIST WHERE ll_error IS NOT NULL;

Does that mean that all files were loaded successfully despite the log 
messages? (all states are 2)
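
For a quick overview I also run this (ll_state: 0 = to load, 1 = loading, 
2 = done, if I read the docs right):

```
SELECT ll_state, COUNT(*) FROM DB.DBA.LOAD_LIST GROUP BY ll_state;
```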



Also, when importing additional datasets, I got a message like this:
18:43:19 deleted rl kept around for page apply

Any clue what that means?



>> I also noticed that after all loaders finish, Virtuoso continues to have 
>> very high CPU load (2 cores maxed out on average) with very little IO 
>> activity.
>> Does anyone know what is happening in that time? Is it safe to shut down the 
>> DB?


And last but not least: any idea why the machine stays busy for hours after the 
loader is done?
How can I find out what it's doing?



Cheers,
Jörn




[Virtuoso-users] rdf_loader_run logs "missed delete of name id cache" messages

2015-11-12 Thread Jörn Hees
Hi,

I'm trying to import the latest DBpedia dumps with 3 parallel rdf_loader_run() 
calls on Virtuoso-Opensource 7.2.1.

In the virtuoso.log file I find messages like this:

15:47:39 Server online at  (pid 60790)
15:48:30 PL LOG: Loader started
15:48:57 PL LOG: Loader started
15:49:21 PL LOG: Loader started
15:52:34 missed delete of name id cache resource/III_Marine_Expeditionary_Force 
0 (0x5d6ae6 )

I retried importing from scratch a couple of times, also with only a single 
rdf_loader_run, but the "missed delete of name id cache" messages always occur.
I couldn't find anything online about how bad this is.
Does anyone know?


I also noticed that after all loaders finish, Virtuoso continues to have very 
high CPU load (2 cores maxed out on average) with very little IO activity.
Does anyone know what is happening in that time? Is it safe to shut down the DB?

Cheers,
Jörn




[Virtuoso-users] how to build additional VAD files (e.g. dbpedia)?

2015-11-10 Thread Jörn Hees
Hi,

what's the proper way to build and install the DBpedia VAD files?

Is it this?

> ./configure --with-layout=debian --enable-dbpedia-vad
> cd binsrc
> make

That will generate dbpedia_dav.vad and dbpedia_filesystem.vad...
Am I right to assume that one is for installing via DAV upload, and that the 
other is for when I have local server file system access and can be copied like 
this?

> sudo cp dbpedia/dbpedia*.vad /usr/share/virtuoso-opensource-7/vad/
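
Alternatively, I guess the package could also be installed from the server's 
file system via isql (a sketch; my understanding is that the second argument 
selects between a plain file path (0) and a DAV path (1)):

```
DB.DBA.VAD_INSTALL ('/usr/share/virtuoso-opensource-7/vad/dbpedia_dav.vad', 0);
```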


Cheers,
Jörn




[Virtuoso-users] quick exact label search through all languages

2015-10-08 Thread Jörn Hees
Hi,

I'm trying to do a simple entity lookup by label (exact match), but I don't know 
the language of the label.

I know I can do something like the following, but it's terribly slow (timeout, 
no result):

select * where {
  ?s rdfs:label ?l .
  FILTER(str(?l) = "贝拉克·奥巴马").
}

e.g.
http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org=prefix+dbpedia%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2F%3E%0D%0Aselect+*+where+{%0D%0A++%3Fs+rdfs%3Alabel+%3Fl+.%0D%0A++FILTER%28str%28%3Fl%29+%3D+%22%E8%B4%9D%E6%8B%89%E5%85%8B%C2%B7%E5%A5%A5%E5%B7%B4%E9%A9%AC%22%29.%0D%0A}=text%2Fhtml_redir_for_subjs=121_redir_for_hrefs==3=on



I already tried bif:contains like this:

select * where {
  ?s rdfs:label ?l .
  FILTER(bif:contains(?l, "贝拉克·奥巴马")).
}

e.g.
http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org=prefix+dbpedia%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2F%3E%0D%0Aselect+*+where+{%0D%0A++%3Fs+rdfs%3Alabel+%3Fl+.%0D%0A++FILTER%28bif%3Acontains%28%3Fl%2C+%22%E8%B4%9D%E6%8B%89%E5%85%8B%C2%B7%E5%A5%A5%E5%B7%B4%E9%A9%AC%22%29%29.%0D%0A}=text%2Fhtml_redir_for_subjs=121_redir_for_hrefs==3=on

but this returns the following error (and it's actually not exactly what I 
want):

> Virtuoso 37000 Error XM029: Free-text expression, line 0: Invalid character 
> in free-text search expression, it may not appear outside quoted string at è
> 
> 
> SPARQL query:
> define sql:big-data-const 0 
> #output-format:text/html
> define sql:signal-void-variables 1 define input:default-graph-uri 
> <http://dbpedia.org> prefix dbpedia: <http://dbpedia.org/resource/> 
> select * where {
>   ?s rdfs:label ?l .
>   FILTER(bif:contains(?l, "贝拉克·奥巴马")).
> }



So, is there a way for a performant exact literal search ignoring the language?

Or could you maybe just make the standard `str(?l) = "something"` way quick if 
`?l` is a literal?


Best,
Jörn




Re: [Virtuoso-users] quick exact label search through all languages

2015-10-08 Thread Jörn Hees
Hi Patrick,

> On 8 Oct 2015, at 13:33, Patrick van Kleef wrote:
> 
>> I know i can do something like the following, but it's terribly slow 
>> (timeout, no result):
>> 
>> select * where {
>> ?s rdfs:label ?l .
>> FILTER(str(?l) = "贝拉克·奥巴马").
>> }
>> 
>> [...]
> 
> This of course would result in a partial table scan as the str function would 
> need to be evaluated for each record.

Yes and no... I mean, the current implementation seems to, but it could 
certainly be optimised, as this sadly is the default pattern to search across 
languages :(


>> [...]
> 
> The problem is that when you use the contains function with Unicode 
> characters, it has difficulties finding the word separation, so you need to 
> quote the word yourself:
> 
> Try like this:
> 
> prefix dbpedia: 
> select * where {
>  ?s rdfs:label ?l .
>  FILTER(bif:contains(?l, "'贝拉克·奥巴马'")).
> }
> 
> which will give you super fast results:
> [...]

Cool, thanks, with this I can at least make it work somehow. Sadly it's still 
not exactly what I need...
For example, if I search for an exact label "Barack Obama", this method will 
give me way too many results:

select * where {
  ?s rdfs:label ?l .
  FILTER(bif:contains(?l, "'Barack Obama'")).
}

http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org=prefix+dbpedia%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2F%3E%0D%0Aselect+*+where+{%0D%0A++%3Fs+rdfs%3Alabel+%3Fl+.%0D%0A++FILTER%28bif%3Acontains%28%3Fl%2C+%22%27Barack+Obama%27%22%29%29.%0D%0A}=text%2Fhtml_redir_for_subjs=121_redir_for_hrefs==3=on

I'd actually only be interested in exact matches, so the equivalent of `?s 
rdfs:label ?l . FILTER(str(?l) = "Barack Obama")`, just performant ;)

The workaround i found for now is this:

select * where {
  ?s rdfs:label ?l .
  FILTER(bif:contains(?l, "'Barack Obama'") && str(?l) = "Barack Obama").
}

http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org=prefix+dbpedia%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2F%3E%0D%0Aselect+*+where+{%0D%0A++%3Fs+rdfs%3Alabel+%3Fl+.%0D%0A++FILTER%28bif%3Acontains%28%3Fl%2C+%22%27Barack+Obama%27%22%29+%26%26+str%28%3Fl%29+%3D+%22Barack+Obama%22%29.%0D%0A}=text%2Fhtml_redir_for_subjs=121_redir_for_hrefs==3=on

but is that the "best" way?

Best,
Jörn




Re: [Virtuoso-users] SPARQL query size limits (GET, POST, POST DIRECT)

2015-04-07 Thread Jörn Hees
Hi,

On 7 Apr 2015, at 01:48, Hugh Williams hwilli...@openlinksw.com wrote:

 10K is the maximum limit, see the following post:
 
   http://boards.openlinksw.com/phpBB3/viewtopic.php?f=12t=1377#p4422

in that thread you mention a 10 K limit (of bytes, I assume) for GET.

What are the limits for POST and POST DIRECT?


Cheers,
Jörn



 On 4 Apr 2015, at 18:45, Jörn Hees j_h...@cs.uni-kl.de wrote:
 
 Hi,
 
 for my PhD (learning graph patterns for human associations) I'm currently 
 experimenting with some sub-graphs that I need to match via SPARQL against a 
 Virtuoso endpoint. The sub-graph sizes range from 30 to 400K triples and are 
 automatically transformed into SPARQL queries (basically via N3).
 
 I'm interested in the practical limits for queries via:
 - GET
 - POST
 - POST DIRECT
 in terms of bytes, chars and triples.
 
 I noticed that there seems to be a 10000 line limit for the SQL query that 
 is the outcome of the SPARQL query. I'm not sure how SPARQL queries are 
 translated into SQL lines though... is this triple-wise (so max 10000 
 triples) or term-wise (so max 10000/3 triples) or something else?
 
 Experiments tell me that the GET method seems to have a ~10000 byte limit 
 (after urlencoding).
 
 Sadly, I couldn't really test POST DIRECT, as on our local 7.1.0 endpoint 
 even queries with ~150 triples (~18000 chars) seem to crash the endpoint :(
 If I remove long literals (which also happen to contain a lot of unicode 
 chars), the query doesn't return within 2 hours.
 If I switch to POST query, I only get a Virtuoso 42000 Error SR483: Stack 
 Overflow error, but the server keeps running at least...
 If for POST queries I keep the urlencoded query size < 10000 bytes, it seems 
 to work.
 
 Running the POST DIRECT query against http://dbpedia.org/sparql results in 
 an HTTP Error 409: Invalid path.
 POST gives me Virtuoso 37000 Error SP031: SPARQL: Internal error: The 
 length of generated SQL text has exceeded 10000 lines of code.
 If I run the < 10000 byte POST, it seems to work again...
 
 Can anyone confirm my findings / give me a recommendation for how big a 
 sane query can get?
 
 Cheers,
 Jörn



[Virtuoso-users] SPARQL query size limits (GET, POST, POST DIRECT)

2015-04-04 Thread Jörn Hees
Hi,

for my PhD (learning graph patterns for human associations) I'm currently 
experimenting with some sub-graphs that I need to match via SPARQL against a 
Virtuoso endpoint. The sub-graph sizes range from 30 to 400K triples and are 
automatically transformed into SPARQL queries (basically via N3).

I'm interested in the practical limits for queries via:
- GET
- POST
- POST DIRECT
in terms of bytes, chars and triples.

I noticed that there seems to be a 10000 line limit for the SQL query that is 
the outcome of the SPARQL query. I'm not sure how SPARQL queries are translated 
into SQL lines though... is this triple-wise (so max 10000 triples) or term-wise 
(so max 10000/3 triples) or something else?

Experiments tell me that the GET method seems to have a ~10000 byte limit 
(after urlencoding).

Sadly, I couldn't really test POST DIRECT, as on our local 7.1.0 endpoint even 
queries with ~150 triples (~18000 chars) seem to crash the endpoint :(
If I remove long literals (which also happen to contain a lot of unicode 
chars), the query doesn't return within 2 hours.
If I switch to POST query, I only get a Virtuoso 42000 Error SR483: Stack 
Overflow error, but the server keeps running at least...
If for POST queries I keep the urlencoded query size < 10000 bytes, it seems to 
work.

Running the POST DIRECT query against http://dbpedia.org/sparql results in an 
HTTP Error 409: Invalid path.
POST gives me Virtuoso 37000 Error SP031: SPARQL: Internal error: The length 
of generated SQL text has exceeded 10000 lines of code.
If I run the < 10000 byte POST, it seems to work again...

Can anyone confirm my findings / give me a recommendation for how big a sane 
query can get?

Cheers,
Jörn




[Virtuoso-users] http://linkeddata.uriburner.com/sparql broken?

2015-04-03 Thread Jörn Hees
Hi,

if I run the default query on http://linkeddata.uriburner.com/sparql, it tells 
me:

 Virtuoso 37000 Error SQ074: Line 4: at 'in' before '('
 
 SPARQL query:
 define sql:big-data-const 0 
 #output-format:text/html
 select distinct * where {[] a ?EntityType} limit 50

Maybe someone could have a look...

Cheers,
Jörn




[Virtuoso-users] public sparql endpoint with dbpedia pagelinks?

2015-03-13 Thread Jörn Hees
Hi,

does anyone know if there is a public endpoint that also has the DBpedia 
pagelinks (http://dbpedia.org/ontology/wikiPageWikiLink), e.g. from 
http://data.dws.informatik.uni-mannheim.de/dbpedia/2014/en/page_links_en.nt.bz2, 
loaded?

I think http://lod.openlinksw.com/sparql once had them, but maybe I'm wrong 
there...

Cheers,
Jörn




Re: [Virtuoso-users] Virtuoso query length - http 400 exception

2015-01-09 Thread Jörn Hees
Hi,

On 9 Jan 2015, at 14:10, Boris Villazon-Terrazas bvilla...@isoco.com wrote:

 I'm performing a query against Virtuoso 07.10.3208, via Jena-ARQ 2.11.2.
 The query is a bit long (it has 16 kB) and I'm getting an HTTP 400 
 exception.
 
 Is there a way to deal with big SPARQL queries in Virtuoso (in this 
 context, i.e., via Jena)?


Are you sending them as HTTP GET or POST?


Jörn




[Virtuoso-users] SPARQL specific BNode knowing its nodeID?

2015-01-05 Thread Jörn Hees
Hi,

is there a way to get persistent BNode IDs in SPARQL result sets across 
multiple SPARQL queries in Virtuoso?
Or, put another way:
can Virtuoso transform all BNodes into persistent generated URIs for me?

It seems as if Virtuoso already does something like that... for example my 
result set contains a triple like (and the nodeIDs seem fairly stable):
<http://purl.org/ontology/bibo/Journal> 
<http://www.w3.org/2000/01/rdf-schema#subClassOf> <nodeID://b39128>

Ignoring the fact that I should never assume that these IDs are stable, can I 
somehow query for the expansion of that one BNode, maybe similar to this with 
some magic:
select * where {
  <nodeID://b39128> ?p ?o
}


Why?
I want to apply a graph algorithm, which isn't made for RDF, to RDF.
Each step might expand individual nodes based on some criteria.
As the algorithm isn't made for RDF, BNodes cause a problem.
To re-identify them I'd need to expand the CBD of nodes and do some kind of 
subgraph matching later on.
This would be a lot of additional effort for what I want to try...
It would be much easier (for me) if there was some way that simply allowed 
me to treat any BNode like a URI.


Cheers,
Jörn




Re: [Virtuoso-users] DBpedia other vad files

2014-11-10 Thread Jörn Hees

On 10 Nov 2014, at 09:58, Dimitris Kontokostas jimk...@gmail.com wrote:

 It is an option in the configure script.
 
 Running ./configure --help you can see what extra VADs you can enable:
  --enable-dbpedia-vad  --enable-rdfmappers-vad

thanks, missed that :(

Cheers,
Jörn




[Virtuoso-users] DBpedia other vad files

2014-11-09 Thread Jörn Hees
Hi,

I'm currently trying to update my "Setting up a local DBpedia 3.9 endpoint with 
Virtuoso 7" guide [1].
It all seems to work quite well and similarly with DBpedia 2014, but for 
whatever reason I'm missing the DBpedia VAD package in the final step.

If I check /usr/share/virtuoso/vad/, I can only find the following 2 VAD 
packages (which I also see in the Conductor packages list):
conductor_dav.vad  isparql_dav.vad

Is there some compile option I'm missing, or how can I build the dbpedia and 
rdf_mappers VADs which are in the binsrc dir?

Cheers,
Jörn

[1]: 
https://joernhees.de/blog/2014/04/23/setting-up-a-local-dbpedia-3-9-mirror-with-virtuoso-7/




[Virtuoso-users] get all URIs in a owl:sameAs equivalence class

2014-09-11 Thread Jörn Hees
Hi,

I have a large list of URIs. For each I would like to get all URIs that are in 
one owl:sameAs equivalence class with it.
I want to restrict this to the data which is already on the endpoint, so no 
sponging / federated queries.

What I tried:

Try:
```
select distinct * where {
  ?i (owl:sameAs|^owl:sameAs)* ?j .
  VALUES (?i) {(dbpedia:Germany)}
}
```
http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.orgqtxt=select+distinct+*+where+{%0D%0A++%3Fi+%28owl%3AsameAs|^owl%3AsameAs%29*+%3Fj+.%0D%0A++VALUES+%28%3Fi%29+{%28dbpedia%3AGermany%29}%0D%0A}format=text%2Fhtmltimeout=3debug=on

Result:
```
Virtuoso 37000 Error TR...: transitive start not given

SPARQL query:
define sql:big-data-const 0 
#output-format:text/html
define sql:signal-void-variables 1 define input:default-graph-uri 
<http://dbpedia.org> select distinct * where {
  ?i (owl:sameAs|^owl:sameAs)* ?j .
  VALUES (?i) {(dbpedia:Germany)}
}
```



Try:
```
select distinct * where {
  ?i (owl:sameAs|^owl:sameAs)* ?j .
  FILTER( ?i=dbpedia:Germany )
}
```
http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.orgqtxt=select+distinct+*+where+{%0D%0A++%3Fi+%28owl%3AsameAs|^owl%3AsameAs%29*+%3Fj+.%0D%0A++FILTER%28+%3Fi%3Ddbpedia%3AGermany+%29%0D%0A}format=text%2Fhtmltimeout=3debug=on

Result:
```
Virtuoso 42000 Error TN...: Exceeded 1000000000 bytes in transitive temp 
memory.  use t_distinct, t_max or more T_MAX_memory options to limit the search 
or increase the pool

SPARQL query:
define sql:big-data-const 0 
#output-format:text/html
define sql:signal-void-variables 1 define input:default-graph-uri 
<http://dbpedia.org> select distinct * where {
  ?i (owl:sameAs|^owl:sameAs)* ?j .
  FILTER( ?i=dbpedia:Germany )
}
```
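
The error message hints at Virtuoso's transitivity options (t_distinct, t_max), 
so maybe spelling the transitive step out explicitly behaves better; a sketch I 
haven't verified:

```
select distinct ?j where {
  { select ?i ?j where { { ?i owl:sameAs ?j } union { ?j owl:sameAs ?i } } }
  option (transitive, t_distinct, t_in(?i), t_out(?j), t_max(10)) .
  filter (?i = dbpedia:Germany)
}
```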



Try:
```
DEFINE input:same-as "yes"
select distinct * where {
  ?i owl:sameAs ?j .
  VALUES (?i) {(dbpedia:Germany)}
}
```
http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.orgqtxt=DEFINE+input%3Asame-as+%22yes%22%0D%0Aselect+distinct+*+where+{%0D%0A++%3Fi+owl%3AsameAs+%3Fj+.%0D%0A++VALUES+%28%3Fi%29+{%28dbpedia%3AGermany%29}%0D%0A}format=text%2Fhtmltimeout=3debug=on

Result:
```
i   j
http://dbpedia.org/resource/Germany 
http://www4.wiwiss.fu-berlin.de/eurostat/resource/countries/Deutschland
http://dbpedia.org/resource/Germany 
http://www4.wiwiss.fu-berlin.de/eurostat/resource/regions/Deutschland
http://dbpedia.org/resource/Germany 
http://www4.wiwiss.fu-berlin.de/factbook/resource/Germany
http://dbpedia.org/resource/Germany http://rdf.freebase.com/ns/m.0345h
...

http://dbpedia.org/resource/Germany http://dbpedia.org/resource/Germany
http://dbpedia.org/resource/Germany http://dbpedia.org/resource/Montenegro 
// already reported on DBpedia mailing list
...
```

Problem: if I run this on our local endpoint (Virtuoso version 07.10.3207), I 
only get the following (even though if I manually query for `dbpedia:Germany 
owl:sameAs ?j` I get loads...):
```
i   j
http://dbpedia.org/resource/Germany http://rdf.freebase.com/ns/m.0345h
```

Was this maybe updated recently?



Any other ideas, or something obvious I'm missing?

Cheers,
Jörn




Re: [Virtuoso-users] importing Freebase RDF dump: slows down, memleak?

2014-09-08 Thread Jörn Hees

On 2 Sep 2014, at 00:01, Hugh Williams hwilli...@openlinksw.com wrote:

 Development indicate your suggestion is not without merit, but 
 implementation is not as simple as it may seem, as the indexes are not all 
 sequential; but something like that could possibly be implemented. It is 
 suggested you could try dropping the indexes on the RDF_QUAD table, load the 
 Freebase datasets and then recreate the indexes after loading, which would 
 require a smaller working set that would better fit into the 32GB RAM 
 available. The commands for dropping the necessary indexes are:
 
 drop index rdf_quad_pogs;
 drop index rdf_quad_sp;
 drop index rdf_quad_op;
 drop index rdf_quad_gs;
 
 and the respective indexes can then be recreated as detailed at:
 
 
 http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFPerformanceTuning?#RDF%20Index%20Scheme
 
 Note you need to recreate the column-wise indexes, this being v7. Let us know 
 how this works for you.
 
 Cool, will try.
 
 [Hugh] OK, let us know the outcome ...

After 5 days:

[Sun Sep  7 23:43:31 2014] virtuoso-t[11495]: segfault at  ip 
008c3a8e sp 7f4ec2f79d20 error 7 in virtuoso-t[400000+b47000]

yay...

Performance also breaks down at some point with dropped indexes and 2 
rdf_loader_run()s as suggested.
I'll append the output of `select * from DB.DBA.LOAD_LIST;`, but for now I give 
up...

Cheers,
Jörn


ll_file: all /usr/local/data/datasets/remote/freebase/2014-08-20/splitted/freebase-rdf-2014-08-17-00-00.*.nt.gz
ll_graph: http://rdf.freebase.com for all rows; ll_host: 0, ll_work_time: NULL, ll_error: NULL for all rows

file       ll_state  ll_started                    ll_done
.aa.nt.gz  2         2014.9.2 17:20.13 927132000   2014.9.2 17:34.28 11121000
.ab.nt.gz  2         2014.9.2 17:20.59 818941000   2014.9.2 17:35.42 994563000
.ac.nt.gz  2         2014.9.2 17:34.28 46244000    2014.9.2 17:52.1 4205
.ad.nt.gz  2         2014.9.2 17:35.43 10491000    2014.9.2 17:53.58 266217000
.ae.nt.gz  2         2014.9.2 17:52.1 4551         2014.9.2 18:11.45 21522000
.af.nt.gz  2         2014.9.2 17:53.58 27075       2014.9.2 18:14.13 26648
.ag.nt.gz  2         2014.9.2 18:11.45 25765000    2014.9.2 18:29.32 312824000
.ah.nt.gz  2         2014.9.2 18:14.13 27152       2014.9.2 18:34.51 216078000
.ai.nt.gz  2         2014.9.2 18:29.32 321036000   2014.9.2 18:54.37 54526000
.aj.nt.gz  2         2014.9.2 18:34.51 220487000   2014.9.2 18:54.44 130952000
.ak.nt.gz

Re: [Virtuoso-users] importing Freebase RDF dump: slows down, memleak?

2014-09-01 Thread Jörn Hees

On 1 Sep 2014, at 17:15, Hugh Williams hwilli...@openlinksw.com wrote:

 [Hugh] Did you let the load continue or was it stopped?

Yupp, I let it continue, but it was killed for out-of-memory 2.5 days after my 
last mail... :-/


 Development indicate your suggestion is not without merit, but implementation 
 is not as simple as it may seem, as the indexes are not all sequential; but 
 something like that could possibly be implemented. It is suggested you could 
 try dropping the indexes on the RDF_QUAD table, load the Freebase datasets and 
 then recreate the indexes after loading, which would require a smaller working 
 set that would better fit into the 32GB RAM available. The commands for 
 dropping the necessary indexes are:
 
   drop index rdf_quad_pogs;
   drop index rdf_quad_sp;
   drop index rdf_quad_op;
   drop index rdf_quad_gs;
 
 and the respective indexes can then be recreated as detailed at:
 
   
 http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFPerformanceTuning?#RDF%20Index%20Scheme
 
 Note you need to recreate the column-wise indexes, this being v7. Let us know 
 how this works for you.

Cool, will try.
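
For the record, my reading of that page is that the recreate statements are 
roughly these (partition clauses omitted here; I'd double-check against the 
page before relying on this):

```
create column index RDF_QUAD_POGS on DB.DBA.RDF_QUAD (P, O, G, S);
create column index RDF_QUAD_SP on DB.DBA.RDF_QUAD (S, P);
create column index RDF_QUAD_OP on DB.DBA.RDF_QUAD (O, P);
create column index RDF_QUAD_GS on DB.DBA.RDF_QUAD (G, S);
```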

 Note you can also use the ld_meter scripts we provided for monitoring the 
 Virtuoso Bulk loader activity as detailed at:
 
   
 http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtTipsAndTricksGuideLDMeterUtility
 
 Also, how many rdf_loader_run() processes do you have running when 
 performing the load? For v7 we recommend running Number of Cores * 0.4 for 
 best performance typically.

Thanks, I didn't know these. I'll probably not run multiple rdf_loader_run()s 
at the same time as deactivating the indexes, etc. (I assume running several is 
meant for cases where you have enough RAM and aren't invalidating even more of 
the cache hierarchy with several concurrent processes?)

Cheers,
Jörn




Re: [Virtuoso-users] importing Freebase RDF dump: slows down, memleak?

2014-08-25 Thread Jörn Hees
Hi again,

On 22 Aug 2014, at 17:44, Jörn Hees j_h...@cs.uni-kl.de wrote:

 On 22 Aug 2014, at 17:51, Hugh Williams hwilli...@openlinksw.com wrote:
 
 What I would not expect though is for the memory consumption to continue to 
 increase until the server is killed due to an oom error, which would imply a 
 possible memory leak, which is why I recommend building with the develop/7 
 build where there have been improvements in memory management.
 
 I currently just used the stable 7.1.0 release. I'll try with the dev build 
 again and report back...

So I've been running the import on a fresh dev build since my last email, and 
I'm now at a total memory consumption of 31218/32177 MB (buffers: 15 MB, cache: 
remaining ~700 MB).

The Virtuoso process has allocated 31.5 GB (VIRT), 30.1 GB (RES) and 3.812 MB 
(SHR) of memory.

I'm not sure if I really have to run the importer till it's killed for out of 
memory (as I said, it becomes pretty slow after a while and is currently only 
seeking around with 200 KB/s) or if this is enough already. As NumberOfBuffers 
is set to 2720000 as recommended, I guess that anything above 21 GB is 
suspicious... we're at 31 GB now.


I've also split up the input file into 100M-line chunks so that I can track the 
progress a bit better...
14 of these are completely loaded now, so 1.4 G triples; the 15th is currently 
running.
These are the start times as reported in DB.DBA.LOAD_LIST. I added a column for 
loaded triples (not necessarily unique):
2014.8.22 19:59 0
2014.8.22 20:09 100M
2014.8.22 20:22 200M
2014.8.22 20:39 300M
2014.8.22 20:53 400M
2014.8.22 21:11 500M
2014.8.22 21:31 600M
2014.8.22 22:03 700M
2014.8.22 22:39 800M
2014.8.22 23:32 900M
2014.8.23 00:17 1G
2014.8.23 02:47 1.1G
2014.8.23 08:51 1.2G
2014.8.23 18:02 1.3G
2014.8.24 16:16 1.4G

The import times for 100M triples seem to be roughly:
- 10 minutes initially
- 30 minutes after 600M loaded triples
- 45 minutes after 900M triples
- 2h:30 after 1G triples (I'm guessing that this is when the set Memory-Limit 
is hit)
- 6h after 1.1G triples
- 10h after 1.2G triples
- 22h after 1.3G triples
- 22h after 1.4G triples

The last 4 lines sadly don't give me the impression that this scales anywhere 
near linearly once Virtuoso runs out of fast random access memory and has to 
rely on block storage :-/ Is there maybe a setting which allows Virtuoso to 
fall back to a merge-sort-like approach, i.e., creating sorted temp DBs and 
then merging them bottom-up? Wouldn't this scale way beyond the available RAM 
size and avoid the seek-wait pattern I observe?!?


Anything else I can do to help debug this? Can I stop the import?

Cheers,
Jörn




Re: [Virtuoso-users] importing Freebase RDF dump: slows down, memleak?

2014-08-22 Thread Jörn Hees
Hi Hugh,

thanks for the reply.

I know 32 GB is probably not much considering the size of the dumps, but it's 
the size limit of our VMs :(
So I'd be willing to live with somewhat slower import and response times if I 
can still leave it on a VM.

On 22 Aug 2014, at 17:51, Hugh Williams hwilli...@openlinksw.com wrote:

 What I would not expect though is for the memory consumption to continue to 
 increase until the server is killed due to an oom error, which would imply a 
 possible memory leak, which is why I recommend building with the develop/7 
 build where there have been improvements in memory management.

I currently just used the stable 7.1.0 release. I'll try with the dev build 
again and report back...

Cheers,
Jörn




[Virtuoso-users] Conductor - RDF - Graphs tab doesn't show all graphs

2010-08-19 Thread Jörn Hees
Hi,

I noticed that in the Virtuoso Conductor under RDF/Graphs a few graphs are 
listed, but that this list is different from what I get with this SPARQL query:

SELECT ?g count(*)
WHERE { GRAPH ?g {?s ?p ?o.} }
GROUP BY ?g
ORDER BY DESC 2

(As a side-note: this SPARQL-query might take a long time.)

What struck me even more is that the graphs mentioned above are different from 
these:

SELECT DISTINCT ?g WHERE { GRAPH ?g {?s ?p ?o.} }

(As a side-note: this SPARQL-query returns in a split second.)

Any idea what's going on or what I'm doing wrong?

I currently think that the list I can see in the conductor is taken from this 
SQL-query:
select * from SPARQL_SELECT_KNOWN_GRAPHS_T;
which actually is the view of the SPARQL_SELECT_KNOWN_GRAPHS procedure.
Nevertheless, I couldn't find out why this is out of sync and how to get it 
back in sync.
Another SQL-query:
select distinct id_to_iri(g) from rdf_quad;
(also takes quite a while) shows the list of graph IRIs corresponding to the 
first SPARQL-query.


Another puzzling fact is that the renaming action in the Graphs tab actually 
does not seem to rename, but clone the graph (which can again only be seen by 
the above first SPARQL-query, while the Graphs Tab and the second SPARQL-query 
do not show the old name anymore).


I hit all these problems after I used the Schemas tab to load the old SKOS 
vocabulary: I imported this IRI in the Schemas tab:
http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf
(please tell me if this was a wrong idea... I actually don't know why there's 
a distinction of Schemas and Graphs and probably did something wrong here).

Afterwards I wanted to delete the existing http://www.w3.org/2004/02/skos/core 
graph in the Graphs tab and wanted to rename 
http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf to 
http://www.w3.org/2004/02/skos/core .
But I can't delete http://www.w3.org/2004/02/skos/core (it's in the list of 
the Graphs tab, but when clicking on delete I get):
 An error has occurred when processing /conductor/sparql_graph.vspx page.
 SQL State 42S12
 SQL Message   
 
 SR201: Primary key not found in delete.

The SPARQL query mentioned above still shows both graph IRIs:
http://www.w3.org/2004/02/skos/core
http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf

I then tried to rename the graph http://www.w3.org/2002/07/owl# to 
http://www.w3.org/2002/07/owlHash. The Graphs tab now correctly excluded the 
old owl# and instead shows owlHash, but when SPARQLing as above, the owl# is 
there plus the owlHash.


Cheers,
Jörn



Re: [Virtuoso-users] Conductor - RDF - Graphs tab doesn't show all graphs

2010-08-19 Thread Jörn Hees
On Thursday 19 August 2010, Hugh Williams wrote:
 Testing against my latest Virtuoso version 06.01.3127 build 18th August
 2010 all of the following queries and the graphs list in the Conductor
 give me the same value for number of graphs in my store:
 
 SPARQL SELECT ?g count(*) WHERE { GRAPH ?g {?s ?p ?o.} } GROUP BY ?g ORDER
 BY DESC 2;
 
 SPARQL SELECT DISTINCT ?g WHERE { GRAPH ?g {?s ?p ?o.} };
 
 select * from SPARQL_SELECT_KNOWN_GRAPHS_T;
 
 What is the version number and build date of the Virtuoso instance you are
 using ?

Strange, I'm using
Version: 06.01.3127
Build: Aug 6 2010
as well, so there shouldn't be a difference, should there?

I imported the DBpedia dumps into my db and also installed the DBpedia 
package, but apart from that didn't do much.

Jörn



Re: [Virtuoso-users] Conductor - RDF - Graphs tab doesn't show all graphs

2010-08-19 Thread Jörn Hees
OK, I did further investigations:

I had a backup of the virtuoso db at the point just after loading the DBpedia 
dumps (en & de) and installing the DBpedia and rdf_mappers packages.
I replayed this backup and executed all 3 queries:

On Thursday 19 August 2010, Jörn Hees wrote:
 On Thursday 19 August 2010, Hugh Williams wrote:
  SPARQL SELECT ?g count(*) WHERE { GRAPH ?g {?s ?p ?o.} } GROUP BY ?g
  ORDER BY DESC 2;
  
  SPARQL SELECT DISTINCT ?g WHERE { GRAPH ?g {?s ?p ?o.} };
  
  select * from SPARQL_SELECT_KNOWN_GRAPHS_T;

All resulting in the same 26 graphs.

I then went to the conductor / RDF / Schemas tab and imported 
http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf .
Then I went to the Graphs tab, deleted the http://www.w3.org/2004/02/skos/core 
graph (worked fine), and renamed 
http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf to 
http://www.w3.org/2004/02/skos/core .

After this, the first of the queries results in 27 graphs, the other two in 26.
(In the first one the 
http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf graph still exists.)

I went back to the Schemas tab and deleted the 
http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf Schema. Still the 
same behavior, so 27 vs. 26 graphs.

When I now try to delete http://www.w3.org/2004/02/skos/core from the 
Graphs tab, the first query returns 26 graphs, the second and third 25. The 
http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf graph is still there.

I then tried to *manually delete* the persisting graph:
  sparql clear graph <http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf>;
The result is even stranger:
If I do
  sparql select count(*) from <http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf>
  where {?s ?p ?o};
I get 0 rows.
BUT: my first sparql query still tells me that there are 1196 triples in that 
graph!
Also: if I execute this SQL query:
  select distinct id_to_iri(g) from rdf_quad;
or this one:
  select distinct id_to_iri(g), count(*) from rdf_quad group by g;
the *graph still exists*.

A control query:
  sparql select count(*) from <http://dbpedia.org> where {?s ?p ?o};
tells me that there are 258867871 rows in my DBpedia dump.


As a side question: why do all those queries take so long? Isn't there a 
primary key index on the rdf_quad table for g,s,p,o which they should be able 
to use, returning the count in a split second?


If you want, I can provide you with a detailed description of how I imported 
the DBpedia dumps (en & de) and could even give you the backup (11 GB gzipped).

Jörn



Re: [Virtuoso-users] Problems loading the DBpedia dump (very long URIs)

2010-08-08 Thread Jörn Hees
Hi Patrick,

thanks for your answers, helped me a lot.

On Saturday 07 August 2010, Patrick van Kleef wrote:
  1. Is there a possibility to for example get the line number where
  the error occurred?

 The loader process creates a table called load_list which records
 which datasets were not loaded completely and for what reason.
 
 Please try:
 
   select * from load_list where ll_error is not null;
 
 This should give you a good indication of what is going on, including
 the line number where the failure happened.

I tried this before, but it only gives me the following as error message for 
both files:
23000 SR133: Can not set NULL to not nullable column 'DB.DBA.RDF_QUAD.O'

There are no line numbers :(

  2. Can I somehow tell virtuoso not to quit TTLP on such lines, but
  to either
  ignore or truncate them?
 
 There are some flags to the TTLP code that would probably skip this
 error, but in certain cases that could insert partial data into the
 database, which is much harder to clean up, so we have not made it
 default.

Actually, the loader script calls the TTLP function with 255 as flags, so 
basically every flag which tolerates more than the standard is activated 
already.

To fix my problem I for now did the following preprocessing of the DBpedia 
3.5.1 dumps, which strips out all lines with URLs longer than 1024 chars:

for i in external_links_en.nt.gz page_links_en.nt.gz ; do
  echo -n "cleaning $i..."
  zcat $i | grep -v -E '^<.+> <.+> <.{1025,}> .$' | gzip > ${i%.nt.gz}_cleaned.nt.gz
  mv ${i%.nt.gz}_cleaned.nt.gz $i
  echo "done."
done

Cheers,
Jörn



[Virtuoso-users] Problems loading the DBpedia dump (very long URIs)

2010-08-07 Thread Jörn Hees
Hi all,

after having a lot of trouble with our group-internal DBpedia mirror, I decided 
to do a fresh install.

I downloaded all the files (we only need a subset of DBpedia: {en,de}) and 
followed this guide yesterday:
http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtBulkRDFLoaderExampleDbpedia

It uses the 
http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtBulkRDFLoaderScript 
to load the dumps (btw: the script could greatly benefit from a few comments).

Now today I found these lines in the logs:


23:33:20 PL LOG: Loader started

Sat Aug 07 2010
00:06:49 PL LOG:  File 
/usr/local/data/dbpedia/3.5.1/en/external_links_en.nt.gz error 23000 SR133: 
Can not set NULL to not nullable column 'DB.DBA.RDF_QUAD.O'
00:32:56 Checkpoint started
00:34:39 Checkpoint finished, log reused
02:05:30 PL LOG:  File /usr/local/data/dbpedia/3.5.1/en/page_links_en.nt.gz 
error 23000 SR133: Can not set NULL to not nullable column 'DB.DBA.RDF_QUAD.O'
02:47:44 PL LOG: No more files to load. Loader has finished,


So I went to investigate where this comes from and it seems that inside the 
ld_file procedure of the VirtBulkRDFLoaderScript the error is caught. It uses 
the TTLP procedure to load the data.

1. Is there a possibility to for example get the line number where the error 
occurred? I did a few checks on the external_links_en.nt.gz file with zcat and 
grep and I think that a very long URL is the problem (see appendix).

2. Can I somehow tell virtuoso not to quit TTLP on such lines, but to either 
ignore or truncate them?

3. How much data was not inserted? The error handler calls a rollback work, 
but it seems that in the rdf_loader_run procedure a commit work is done only 
after every 100 files loaded, which would mean that everything inserted 
before is lost? At the same time log_enable(2, 1) is set, which means 
autocommit for every row, no log, so why is there a commit at all?

4. How do I continue? Can I simply restart with just these two files after 
fixing?
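
Regarding 4., my current plan (untested) is to reset just those two entries in 
the loader's table and rerun it:

```
-- mark the two failed files as not loaded again, then rerun the loader
UPDATE DB.DBA.LOAD_LIST
   SET ll_state = 0, ll_error = NULL
 WHERE ll_file LIKE '%external_links_en.nt.gz'
    OR ll_file LIKE '%page_links_en.nt.gz';
rdf_loader_run ();
```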

Cheers,
Jörn




Appendix:
=== did a few checks: ==
SPARQL select count(*) where { ?s <http://dbpedia.org/property/reference> ?o . 
};
results in:
348955

At the same time the external_links_en.nt.gz file has:
5081932 lines (zcat external_links_en.nt.gz | wc -l)

The corresponding lines in the file look like this
(zcat external_links_en.nt.gz | cat -n | head -n $((348955+1)) | tail -n 2 ):
348955  <http://dbpedia.org/resource/Fourteen_Points> 
<http://dbpedia.org/property/reference> 
<http://wwi.lib.byu.edu/index.php/President_Wilson's_Fourteen_Points> .
348956  <http://dbpedia.org/resource/Fourteen_Points> 
<http://dbpedia.org/property/reference> 
<http://www.mtholyoke.edu/acad/intrel/doc31.htm> .

Notice the 's in line 348955; but actually, as we got 348955 triples, the 
problem should've occurred in the line after that one, yet in line 348956 I see 
nothing wrong.

Also notice that
  sparql select * where { <http://dbpedia.org/resource/Fourteen_Points> 
  <http://dbpedia.org/property/reference> ?o . };
results in:
http://www.loc.gov/exhibits/treasures/trm053.html
http://wwi.lib.byu.edu/index.php/Main_Page
http://wwi.lib.byu.edu/index.php/President_Wilson's_Fourteen_Points
http://www.mtholyoke.edu/acad/intrel/doc31.htm
http://www.ourdocuments.gov/doc.php?flash=truedoc=62
http://web.jjay.cuny.edu/jobrien/reference/ob34.html

So line 348956 is imported.

Using nested intervals I found this:
356036  <http://dbpedia.org/resource/Antisocial_personality_disorder> 
<http://dbpedia.org/property/reference> 
<http://www.faculty.missouristate.edu/M/MichaelCarlie/what_I_learned_about/GANGS/WHYFORM/pathological.htm>
 .
356037  <http://dbpedia.org/resource/Hugo_Simberg> 
<http://dbpedia.org/property/reference> 

Re: [Virtuoso-users] SPARQL Linked Data: howto configure sponger behind IRI deref for variables

2010-08-03 Thread Jörn Hees
On Tuesday 03 August 2010, Nathan wrote:
 Aldo Bucchi wrote:
  Hi Ivan, Jorn,
  
  I've run into similar requirements. A catch-all solution to this could
  be to surface the ability to specify a custom deref procedure
  
  define input:iri_deref DB.DBA.MY_CUSTOM_IRI_DEREF
  
  This should get all the context to decide whether to perform a fetch
  and also have access to the sponger infrastructure.
  
  This is a broad and raw idea, I know. The point is that the current
  visibility into what happens under the covers is too low. Of course,
  in an ideal world, sources should play along with the Linked Data
  rules ( including specifying caching TTL ) but in reality you might
 
 and ETag's!
 
  want to override some sources, implement specific caching strategies,
  etc.
  
  Spongers provide some flexibility in the how do I get stuff from a
  IRI part. But perhaps the controller part could also be extensible.
 
 FWIW, +1 unless you can simply designate sponger to use a proxy, which
 would solve all use cases from what I can tell.

Yupp, I definitely agree. AFAIK there currently is no possibility to make use 
of, let's say, a local DBpedia mirror in case you're using the IRI-dereferencing 
options:

If I use:
  DEFINE get:proxy "localhost:8890"
with this query:
  DEFINE input:grab-all "yes"
  DEFINE input:grab-depth 1
  SELECT DISTINCT ?p ?lp ?o ?lo
  WHERE {
    <http://dbpedia.org/resource/Barack_Obama> ?p ?o.
    OPTIONAL { ?p rdfs:label ?lp. }
    OPTIONAL { ?o rdfs:label ?lo. }
  }
I'm told that get:proxy only makes sense with a FROM clause. So is there any 
possibility to use a proxy (e.g., the current datastore's URL rewriting) for 
IRI dereferencing, or perhaps even Aldo's suggestion of a special 
DB.DBA.MY_CUSTOM_IRI_DEREF?

Cheers,
Jörn




[Virtuoso-users] SPARQL Linked Data: howto configure sponger behind IRI deref for variables

2010-07-29 Thread Jörn Hees
Hey all,

I'm currently working on a project where I want to ask a lot of cross-border 
SPARQL queries like the following one:

SELECT DISTINCT ?p ?lp ?o ?lo
WHERE {
 <http://dbpedia.org/resource/Barack_Obama> ?p ?o.
 OPTIONAL { ?p rdfs:label ?lp. }
 OPTIONAL { ?o rdfs:label ?lo. }
}

(So I'd prefer a human readable version of potential URIs for ?p and ?o.)

Now Virtuoso allows me to do so by using
DEFINE input:grab-all "yes"
DEFINE input:grab-depth 1

Nevertheless, as all drop-downs and tutorials say, the performance of this is 
very bad (it takes around 15 minutes when I execute the query).

So my question is: is there any way to configure the sponger behind this to
1. cache its lookups, i.e., not retrieve graphs already present in the local 
quad store (the DEFINE get:soft "soft" option seems to be what I want, but that 
only works for graphs mentioned in a FROM or FROM NAMED clause; see the example 
below),
2. do n fetches in parallel, and/or
3. tell me if some object turned out not to be a linked data resource but 
just a URL (e.g., in the triple: (dbpedia:Barack_Obama dbpprop:reference 
http://www.chicagolife.net/content/politics/Barack_Obama) )?
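
For 1., the variant that does work for named graphs looks like this (so what 
I'm after is the equivalent for IRIs bound to variables):

```
DEFINE get:soft "soft"
SELECT *
FROM <http://dbpedia.org/resource/Barack_Obama>
WHERE { ?s ?p ?o }
```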

Perhaps someone could point me in the right direction / has some helpful 
thoughts?

Cheers,
Jörn