> Hi All,
> I have been trying to understand how Virtuoso's crawler content import
> and sponging features work. I'm currently evaluating Virtuoso using
> 07.20.3214 VOS.
> I set up three crawl jobs for three different HTML/RDFa files and
> received no errors. 
> When I attempt to use the SPARQL interface to query the data, it
> doesn't show up.
> For example, http://w3id.org/xapi/adb/verbs/ is the target URL of a
> crawl job I set up in Conductor under content imports. I am using the
> xhtml/HTML5 variants cartridge with the following options:
> fallback-mode=no
> rdfa=yes
> reify_html5md=0
> reify_rdfa=1
> reify_jsonld=0
> reify_all_grddl=0
> reify_html=0
> passthrough_mode=yes
> loose=yes
> reify_html_misc=no
> reify_turtle=no 
> If I go to and use the following
> sparql query it returns no results:
> #Query all Verb IRIs
> PREFIX xapi: <https://w3id.org/xapi/ontology#>
> SELECT ?Verb WHERE {
>    ?Verb a xapi:Verb .
> }
> However, the data does start to show up in this query if I
> subsequently add http://w3id.org/xapi/adb/verbs/ as the default data
> set name / graph IRI in the sparql interface and also select the
> sponging option to download all RDF resources. 
> Is this sponging option from the SPARQL interface actually
> adding/downloading the triples?

Yes, "sponging" is our colloquialism for "importing data" from some URL.

> Wouldn't this allow anyone who has access to the SPARQL interface to
> add triples?

Yes, if you don't apply access controls to your SPARQL endpoint [1].

> The faceted search interface seems to indicate so as I did this with
> the following graph IRI, http://adlnet.gov/expapi/verbs
> I tried to set up this IRI as a crawl job and it never populated
> virtuoso's data store.

Did you enable the "store metadata option" and then select relevant

> But as soon as I add it as a graph IRI using the sparql interface and
> sponging it shows up. Is this the expected behavior / by design for
> this sparql sponging option?

Yes, you can also import data via SPARQL. You can even automatically
convert CSV data to 5-Star Linked Data via SPARQL integration with the
Sponger [2][3].
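
To make the "import via SPARQL" point concrete, here is a minimal sketch of a query that carries a Sponger pragma. It assumes a local instance at http://localhost:8890/sparql (adjust to your setup) and reuses the xapi prefix and graph IRI from your message. The helper names sponge_query and request_url are purely illustrative, not part of Virtuoso. As I understand the get:soft pragma, "soft" fetches the document only when the graph is not already cached, and "replace" re-fetches and overwrites it.

```python
from urllib.parse import urlencode

# Assumption: a local Virtuoso instance on its default HTTP port.
ENDPOINT = "http://localhost:8890/sparql"


def sponge_query(graph_iri: str, mode: str = "soft") -> str:
    """Build a SELECT that asks the Sponger to fetch graph_iri first.

    mode is the value of the get:soft pragma ("soft" or "replace").
    """
    if mode not in ("soft", "replace"):
        raise ValueError("mode must be 'soft' or 'replace'")
    return (
        f'DEFINE get:soft "{mode}"\n'
        "PREFIX xapi: <https://w3id.org/xapi/ontology#>\n"
        "SELECT ?Verb\n"
        f"FROM <{graph_iri}>\n"
        "WHERE { ?Verb a xapi:Verb . }"
    )


def request_url(query: str) -> str:
    """Encode the query as a GET request against the endpoint."""
    return ENDPOINT + "?" + urlencode({"query": query})


if __name__ == "__main__":
    q = sponge_query("http://w3id.org/xapi/adb/verbs/")
    print(q)
    print(request_url(q))
```

Whether such a request is allowed to write into the store depends on the privileges of the account the endpoint runs as, which is where the access controls in [1] come in.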

> I thought graphs and triples could only be added with special SPARQL
> permissions and using INSERT. 
> I still don't think the crawler feature is working for HTML/RDFa. It
> appears to be processing and storing the HTML file in the
> repository/locally in virtuoso, but it doesn't seem to actually add
> the graph or triples to the database. 
> Thanks in advance for your patience and help!
> J Haag


[1] Protecting Your SPARQL Endpoint

[2] Sponger Pragmas and Web Crawling via SPARQL

[3] Generating 5-Star Linked Data from Open Data

>         I think I know the answer to my last two questions. I had
>         additional html files below the /verbs/ directory. I believe
>         that is where the duplicates came from. I'm guessing sponger
>         also looks for any html files at the specified path, not just
>         the "index.html" file that was specified as a target URL. Can
>         anyone verify this? 
>     Hi,
>     It's unlikely - I don't know of anything in the Sponger that
>     implements directory browsing, but it may well be following e.g.
>     <link rel="alternate" href="...." /> to RSS/Atom feeds, etc.
>     As Kingsley says, Faceted Browser will show you what graphs the
>     triples appear in.
>     When a page is sponged, its URL becomes 1:1 the graph IRI in which
>     data from/about/in that resource is stored. Multiple graphs
>     imply multiple sponging events.
>     HTH,
>     ~Tim
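
Since each sponged URL ends up as its own graph, one way to check for duplicates from the SPARQL side is to ask which named graphs contain the instances and how many each one holds. The following is only a sketch using standard SPARQL 1.1. The function name graphs_containing is illustrative, and the class IRI is the one from the query earlier in this thread.

```python
def graphs_containing(class_iri: str) -> str:
    """Build a query that counts instances of class_iri per named graph."""
    return (
        "SELECT ?g (COUNT(?s) AS ?instances)\n"
        "WHERE { GRAPH ?g { ?s a <" + class_iri + "> . } }\n"
        "GROUP BY ?g\n"
        "ORDER BY ?g"
    )


if __name__ == "__main__":
    # Class IRI taken from the query earlier in the thread.
    print(graphs_containing("https://w3id.org/xapi/ontology#Verb"))
```

If the same verbs show up under several values of ?g, that points to several sponging events (one per URL) rather than to directory browsing.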
