On 11/12/15 5:16 PM, Haag, Jason wrote:
> Hi All,
>
> I have been trying to understand how Virtuoso's crawler content import
> and sponging features work. I'm currently evaluating Virtuoso using
> 07.20.3214 VOS.
>
> I set up three crawl jobs for three different HTML/RDFa files and
> received no errors. 
>
> When I attempt to use the sparql interface to query the data it
> doesn't show up:
>
> For example, http://w3id.org/xapi/adb/verbs/ is the target URL of a
> crawl job I set up in conductor under content imports. I am using the
> xhtml/HTML5 variants cartridge with the following options:
>
> fallback-mode=no
> rdfa=yes
> reify_html5md=0
> reify_rdfa=1
> reify_jsonld=0
> reify_all_grddl=0
> reify_html=0
> passthrough_mode=yes
> loose=yes
> reify_html_misc=no
> reify_turtle=no 
>
> If I go to http://54.152.125.100:8890/sparql and use the following
> sparql query it returns no results:
>
> #Query all Verb IRIs
> PREFIX xapi: <https://w3id.org/xapi/ontology#>
>
> SELECT DISTINCT ?Verb 
>
> WHERE {
>    ?Verb a xapi:Verb .
>
> } 
>
>
> However, the data does start to show up in this query if I
> subsequently add http://w3id.org/xapi/adb/verbs/ as the default data
> set name / graph IRI in the sparql interface and also select the
> sponging option to download all RDF resources. 
>
> Is this sponging option from the sparql interface actually
> adding/downloading the triples?

Yes, "sponging" is our colloquialism for "importing data" from some URL.

> Wouldn't this allow anyone who has access to the sparql interface
> to add triples?

Yes, if you don't apply access controls to your SPARQL endpoint [1].

> The faceted search interface seems to indicate so as I did this with
> the following graph IRI, http://adlnet.gov/expapi/verbs
>
> http://54.152.125.100:8890/describe/?url=http%3A%2F%2Fadlnet.gov%2Fexpapi%2Fverbs&sid=4
>
> I tried to set up this IRI as a crawl job and it never populated
> virtuoso's data store.

Did you enable the "store metadata" option and then select the relevant
cartridge?
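
One way to check whether a crawl job actually stored triples is to ask the
endpoint which named graphs hold the instances you expect. The following is
only a sketch: the helper name graphs_containing_type is made up for
illustration, and it assumes the crawler writes into a graph named after the
target URL, which may not match your setup. It just builds the query text;
paste the output into the /sparql form to run it.

```python
# Sketch: build a SPARQL query that lists every named graph containing
# at least one instance of a given class. Nothing here talks to a server.

XAPI_VERB = "https://w3id.org/xapi/ontology#Verb"  # class IRI used in this thread


def graphs_containing_type(type_iri: str) -> str:
    """Return a query listing the graphs that hold instances of type_iri.

    Hypothetical helper, not part of Virtuoso.
    """
    return "\n".join([
        "SELECT DISTINCT ?g",
        "WHERE {",
        f"  GRAPH ?g {{ ?s a <{type_iri}> . }}",
        "}",
    ])


if __name__ == "__main__":
    print(graphs_containing_type(XAPI_VERB))
```

If the crawl target's URL is missing from the result, the crawl most likely
stored the document without extracting triples from it.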

> But as soon as I add it as a graph IRI using the sparql interface and
> sponging it shows up. Is this the expected behavior / by design for
> this sparql sponging option?

Yes, you can also import data via SPARQL. You can even automatically
convert CSV data to 5-Star Linked Data via SPARQL integration with the
Sponger [2][3].
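
As a rough sketch of what the sponging option amounts to, the query below
carries the Sponger pragma define get:soft "replace" together with a FROM
clause naming the URL to import. The helper names sponge_query and
request_url are made up for illustration, the endpoint and graph IRI are the
ones from this thread, and the sketch assumes the endpoint's SPARQL account is
allowed to sponge. It only builds the request URL and sends nothing.

```python
# Sketch: build a sponging SPARQL request for a Virtuoso endpoint.
# No network access happens here; the script only prints a URL.
from urllib.parse import urlencode

ENDPOINT = "http://54.152.125.100:8890/sparql"  # endpoint quoted in this thread
GRAPH = "http://w3id.org/xapi/adb/verbs/"       # crawl target from this thread


def sponge_query(graph_iri: str) -> str:
    """Return the thread's verb query, asking Virtuoso to fetch graph_iri first.

    Hypothetical helper; the pragma line is what triggers the import.
    """
    return "\n".join([
        'define get:soft "replace"',
        "PREFIX xapi: <https://w3id.org/xapi/ontology#>",
        "SELECT DISTINCT ?Verb",
        f"FROM <{graph_iri}>",
        "WHERE { ?Verb a xapi:Verb . }",
    ])


def request_url(endpoint: str, query: str) -> str:
    """URL-encode the query into a SPARQL Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(request_url(ENDPOINT, sponge_query(GRAPH)))
```

Because any client able to send such a query can trigger an import, the
endpoint protection notes in [1] apply here as well.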

> I thought graphs and triples could only be added with special SPARQL
> permissions and using INSERT. 
>
> I still don't think the crawler feature is working for HTML/RDFa. It
> appears to be processing and storing the HTML file in the
> repository/locally in virtuoso, but it doesn't seem to actually add
> the graph or triples to the database. 
>
> Thanks in advance for your patience and help!
>
> J Haag

Links:

[1]
http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtSPARQLEndpointProtection
-- Protecting Your SPARQL Endpoint

[2]
http://virtuoso.openlinksw.com/tutorials/sparql/SPARQL_Tutorials_Part_7/SPARQL_Tutorials_Part_7.html#(1)
-- Sponger Pragmas and Web Crawling via SPARQL

[3]
http://kidehen.blogspot.com/2015/11/generating-linked-data-from-open-data.html
-- Generating 5-Star Linked Data from Open Data

Kingsley
>
> -------------------------------------------------------
>
>
>
> On Wed, Oct 28, 2015 at 5:17 AM, Tim Haynes <thay...@openlinksw.com
> <mailto:thay...@openlinksw.com>> wrote:
>
>
>     On 27 October 2015 at 20:49, Haag, Jason <jhaa...@gmail.com
>     <mailto:jhaa...@gmail.com>> wrote:
>
>         I think I know the answer to my last two questions. I had
>         additional html files below the /verbs/ directory. I believe
>         that is where the duplicates came from. I'm guessing sponger
>         also looks for any html files at the specified path, not just
>         the "index.html" file that was specified as a target URL. Can
>         anyone verify this? 
>
>
>     Hi,
>
>     It's unlikely - I don't know of anything in the Sponger that
>     implements directory browsing, but it may well be following e.g.
>     <link rel="alternate" href="...." /> to RSS/Atom feeds, etc.
>
>     As Kingsley says, Faceted Browser will show you what graphs the
>     triples appear in.
>
>     When a page is sponged, its URL becomes 1:1 the graph IRI in which
>     data from/about/in that resource is stored. Multiple graphs
>     implies multiple sponging events.
>
>     HTH,
>
>     ~Tim
>     -- 
>     Tim Haynes
>     Product Development Consultant
>     OpenLink Software
>     <http://www.openlinksw.com/>
>     <http://twitter.com/openlink>
>
>
>
>
> ------------------------------------------------------------------------------
>
>
> _______________________________________________
> Virtuoso-users mailing list
> Virtuoso-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/virtuoso-users


-- 
Regards,

Kingsley Idehen       
Founder & CEO 
OpenLink Software     
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this

