Re: [Virtuoso-users] obtaining a copy/dump of an online SPARQL endpoint in a nice manner

2016-04-06 Thread Jörn Hees
On 31 Mar 2016, at 22:10, Kingsley Idehen wrote:
> 
> How are you arriving at data devoid of metadata about its origins?

Link rot... Endpoint is still working, dumps aren't...


> You would be better served, ultimately, instantiating a dedicated
> Virtuoso instance in the cloud for your specific needs. This instance
> could load datasets from wherever, using some of the existing endpoints
> (DBpedia and others) as a mechanism for exposing provenance data etc..

How would I do that? I'm not sure I fully understand... We have a Virtuoso
instance in our local network for our working group already.
I usually load it manually with the dumps of datasets I want to run my
algorithms against, so I don't impede the public endpoints' operations or get
blocked for making too many requests.


> There is no nice way of trying to dump all the data from an existing
> SPARQL endpoint.

If there is no nice way, is there any way at all?

I mean I can't be the first one who is interested in all triples a graph on a 
remote endpoint contains (even if that graph is big).


Best,
Jörn


--
___
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users


Re: [Virtuoso-users] obtaining a copy/dump of an online SPARQL endpoint in a nice manner

2016-04-01 Thread Kingsley Idehen
On 4/1/16 9:02 AM, Jörn Hees wrote:
> On 31 Mar 2016, at 22:10, Kingsley Idehen wrote:
>> How are you arriving at data devoid of metadata about its origins?
> Link rot... Endpoint is still working, dumps aren't...

Example please.

>
>
>> You would be better served, ultimately, instantiating a dedicated
>> Virtuoso instance in the cloud for your specific needs. This instance
>> could load datasets from wherever, using some of the existing endpoints
>> (DBpedia and others) as a mechanism for exposing provenance data etc..
> How would I do that? I'm not sure I fully understand... We have a Virtuoso
> instance in our local network for our working group already.

You could set up a pre-configured DBpedia instance, or load the same
datasets we have in our LOD Cloud cache, etc.
> I usually load it manually with the dumps of datasets I want to run my
> algorithms against, so I don't impede the public endpoints' operations or get
> blocked for making too many requests.
>
>
>> There is no nice way of trying to dump all the data from an existing
>> SPARQL endpoint.
> If there is no nice way, is there any way at all?

There is a slow way: using OFFSET and LIMIT with smaller page sizes than what
you had in the initial post.
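
The OFFSET/LIMIT approach mentioned above could be sketched roughly as follows. This is only an illustration, not a recommended or official method: the function names (`page_url`, `http_fetch`, `dump_graph`), the page size and the pause between requests are all assumptions, and the endpoint is assumed to accept standard SPARQL Protocol GET requests and return JSON results. Some endpoints cap the number of rows per response, so the sketch advances by the number of rows actually returned and stops only on an empty page.

```python
import json
import time
import urllib.parse
import urllib.request


def page_url(endpoint, graph, limit, offset):
    """Build a SPARQL Protocol GET URL for one page of a graph's triples."""
    query = f"SELECT ?s ?p ?o WHERE {{ ?s ?p ?o }} LIMIT {limit} OFFSET {offset}"
    params = {"query": query, "default-graph-uri": graph}
    return endpoint + "?" + urllib.parse.urlencode(params)


def http_fetch(url):
    """Fetch one page and return its list of result bindings."""
    req = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["results"]["bindings"]


def dump_graph(endpoint, graph, page_size=1000, pause=1.0, fetch=http_fetch):
    """Yield result rows page by page until a page comes back empty.

    `pause` (seconds) is a hypothetical politeness delay between requests;
    `fetch` can be swapped out, e.g. for testing without network access.
    """
    offset = 0
    while True:
        rows = fetch(page_url(endpoint, graph, page_size, offset))
        if not rows:
            break
        yield from rows
        # Advance by what was actually returned, in case the server
        # silently caps the page below the requested LIMIT.
        offset += len(rows)
        time.sleep(pause)
```

A call such as `dump_graph("http://example.org/sparql", "http://example.org/graph")` (both URLs are placeholders) would then yield one binding per triple. Without a fixed result order, pages are not guaranteed to be disjoint or complete, and response times still grow with the offset.
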
>
> I mean I can't be the first one who is interested in all triples a graph on a 
> remote endpoint contains (even if that graph is big).

You need to be more specific about the data you seek and which
endpoint and repository combination currently exposes it.



>
>
> Best,
> Jörn
>
>


-- 
Regards,

Kingsley Idehen   
Founder & CEO 
OpenLink Software 
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this






Re: [Virtuoso-users] obtaining a copy/dump of an online SPARQL endpoint in a nice manner

2016-03-31 Thread Kingsley Idehen
On 3/31/16 1:30 PM, Jörn Hees wrote:
> Hi,
>
> I developed some machine learning algorithms that I'd like to run against
> various datasets.
> Many of them provide Virtuoso-powered SPARQL endpoints online, but running my
> algorithms against them would certainly not be considered "fair use".
>
> Some datasets provide dumps, so I'm able to play nice, load the dumps into a
> local Virtuoso instance and torture that local instance with my algorithms.
>
> How can I do something similar when there is no dump available for
> download, only a SPARQL endpoint?
>
> I was thinking about issuing a
> `construct where { ?s ?p ?o } limit X offset Y` and stepping through the
> endpoint like that once, but the bigger the offset, the slower the response
> time:
>
> http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&qtxt=select+*+where+{%3Fs+%3Fp+%3Fo.}+limit+1+offset+40002&format=text%2Fhtml&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&timeout=3&debug=on
>
> Any suggestions on how to improve this and do it in a "nice" way?
> Ideally also without the danger of skipping a lot of data because the
> result order differs between requests?
>
> Best,
> Jörn
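
On the ordering concern raised in the question: the SPARQL 1.1 specification notes that LIMIT and OFFSET only select predictable subsets when the order is fixed with ORDER BY. A minimal sketch of generating such page queries is shown here; the helper names and the `total` parameter (a triple count obtained beforehand, for example from a COUNT query) are hypothetical, and sorting a whole graph may be too expensive for a public endpoint to accept.

```python
def ordered_page_query(graph, limit, offset):
    """Return a paged query with a fixed sort order so pages do not overlap."""
    return (
        f"SELECT ?s ?p ?o FROM <{graph}> "
        "WHERE { ?s ?p ?o } "
        "ORDER BY ?s ?p ?o "
        f"LIMIT {limit} OFFSET {offset}"
    )


def page_queries(graph, page_size, total):
    """Return one ordered query per page, covering `total` triples."""
    return [
        ordered_page_query(graph, page_size, offset)
        for offset in range(0, total, page_size)
    ]
```

Each generated query can be sent as a separate request. The fixed order only helps as long as the graph does not change between requests.
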

How are you arriving at data devoid of metadata about its origins?

You would be better served, ultimately, instantiating a dedicated
Virtuoso instance in the cloud for your specific needs. This instance
could load datasets from wherever, using some of the existing endpoints
(DBpedia and others) as a mechanism for exposing provenance data, etc.

There is no nice way of trying to dump all the data from an existing
SPARQL endpoint.

Regards,

Kingsley Idehen   



