On 11/12/15 5:16 PM, Haag, Jason wrote:
> Hi All,
>
> I have been trying to understand how virtuoso's crawler content import
> and sponging features work. I'm currently evaluating virtuoso using
> 07.20.3214 VOS.
>
> I set up three crawl jobs for three different HTML/RDFa files and
> received no errors.
>
> When I attempt to use the sparql interface to query the data it
> doesn't show up:
>
> For example, http://w3id.org/xapi/adb/verbs/ is the target URL of a
> crawl job I set up in conductor under content imports. I am using the
> xhtml/HTM5 variants cartridge with the following options:
>
> fallback-mode=no
> rdfa=yes
> reify_html5md=0
> reify_rdfa=1
> reify_jsonld=0
> reify_all_grddl=0
> reify_html=0
> passthrough_mode=yes
> loose=yes
> reify_html_misc=no
> reify_turtle=no
>
> If I go to http://54.152.125.100:8890/sparql and use the following
> sparql query it returns no results:
>
> #Query all Verb IRIs
> PREFIX xapi: <https://w3id.org/xapi/ontology#>
>
> SELECT DISTINCT ?Verb
>
> WHERE {
> ?Verb a xapi:Verb .
>
> }
>
>
> However, the data does start to show up in this query if I
> subsequently add http://w3id.org/xapi/adb/verbs/ as the default data
> set name / graph IRI in the sparql interface and also select the
> sponging option to download all RDF resources.
>
> Is this sponging option from the sparql interface actually
> adding/download the triples?

Yes, "sponging" is our colloquialism for "importing data" from some URL.

> Wouldn't this allow anyone to add triples that has access to the
> sparql interface?

Yes, if you don't apply access controls to your SPARQL endpoint [1].

> The faceted search interface seems to indicate so as I did this with
> the following graph IRI, http://adlnet.gov/expapi/verbs
>
> http://54.152.125.100:8890/describe/?url=http%3A%2F%2Fadlnet.gov%2Fexpapi%2Fverbs&sid=4
>
> I tried to set up this IRI as a crawl job and it never populated
> virtuoso's data store.

Did you enable the "store metadata" option and then select the relevant cartridge?

> But as soon as I add it as a graph IRI using the sparql interface and
> sponging it shows up. Is this the expected behavior / by design for
> this sparql sponging option?

Yes, you can also import data via SPARQL. You can even automatically convert CSV data to 5-Star Linked Data via SPARQL integration with the Sponger [2][3].

> I thought graphs and triples could only be added with special SPARQL
> permissions and using INSERT.
>
> I still don't think the crawler feature is working for HTML/RDFa. It
> appears to be processing and storing the HTML file in the
> repository/locally in virtuoso, but it doesn't seem to actually add
> the graph or triples to the database.
>
> Thanks in advance for your patience and help!
>
> J Haag

Links:

[1] http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtSPARQLEndpointProtection -- Protecting Your SPARQL Endpoint
[2] http://virtuoso.openlinksw.com/tutorials/sparql/SPARQL_Tutorials_Part_7/SPARQL_Tutorials_Part_7.html#(1) -- Sponger Pragmas and Web Crawling via SPARQL
[3] http://kidehen.blogspot.com/2015/11/generating-linked-data-from-open-data.html -- Generating 5-Star Linked Data from Open Data

Kingsley

> -------------------------------------------------------
>
> On Wed, Oct 28, 2015 at 5:17 AM, Tim Haynes <thay...@openlinksw.com> wrote:
>
>     On 27 October 2015 at 20:49, Haag, Jason <jhaa...@gmail.com> wrote:
>
>         I think I know the answer to my last two questions. I had
>         additional html files below the /verbs/ directory. I believe
>         that is where the duplicates came from. I'm guessing sponger
>         also looks for any html files at the specified path, not just
>         the "index.html" file that was specified as a target URL. Can
>         anyone verify this?
>
>     Hi,
>
>     It's unlikely - I don't know of anything in the Sponger that
>     implements directory browsing, but it may well be following e.g.
>     <link rel="alternate" href="...." /> to RSS/Atom feeds, etc.
>
>     As Kingsley says, Faceted Browser will show you what graphs the
>     triples appear in.
>
>     When a page is sponged, its URL becomes 1:1 the graph IRI in which
>     data from/about/in that resource is stored. Multiple graphs
>     implies multiple sponging events.
>
>     HTH,
>
>     ~Tim
>     --
>     Tim Haynes
>     Product Development Consultant
>     OpenLink Software
>     <http://www.openlinksw.com/>
>     <http://twitter.com/openlink>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Virtuoso-users mailing list
> Virtuoso-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/virtuoso-users

--
Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
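
For anyone scripting the SPARQL import route discussed in this thread, the following is a minimal sketch in Python, not an official recipe. It prepends a Sponger pragma (`DEFINE get:soft`, the kind of pragma covered by link [2]) to Jason's query and names the crawl target in a FROM clause, then builds the GET URL for the endpoint. The endpoint address and graph IRI are the ones quoted in the thread; the function name `build_sponge_url`, the pragma value "soft", and the `format` value are assumptions chosen for illustration. Whether the server actually fetches the document depends on how the endpoint is configured and protected (see link [1]).

```python
from urllib.parse import urlencode

# Endpoint and graph IRI as quoted in the thread; substitute your own.
ENDPOINT = "http://54.152.125.100:8890/sparql"
GRAPH_IRI = "http://w3id.org/xapi/adb/verbs/"

# Assumption: the get:soft pragma with value "soft" asks Virtuoso to
# fetch the FROM graph if it is not already stored locally.
QUERY_TEMPLATE = """\
DEFINE get:soft "soft"
PREFIX xapi: <https://w3id.org/xapi/ontology#>
SELECT DISTINCT ?Verb
FROM <{graph}>
WHERE {{ ?Verb a xapi:Verb . }}
"""


def build_sponge_url(endpoint, graph_iri,
                     fmt="application/sparql-results+json"):
    """Return a GET URL that sends the pragma-prefixed query to the endpoint.

    Hypothetical helper: it only builds the URL and does not contact the server.
    """
    params = {
        "query": QUERY_TEMPLATE.format(graph=graph_iri),
        "format": fmt,
    }
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_sponge_url(ENDPOINT, GRAPH_IRI))
```

Opening the printed URL in a browser or with an HTTP client should behave like the form on the /sparql page with the graph IRI filled in, assuming the endpoint permits sponging for the requesting user.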