I like the cloud solution: create a new Virtuoso system, do the load, take plenty of time to test it, then replace the production instance with the new instance and retire the old production instance.
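The workflow above can be sketched roughly as follows. This is only a sketch under stated assumptions: `Instance` and the `provision`, `load`, `verify`, `switch` and `retire` callables are hypothetical placeholders for whatever your cloud provider and load scripts actually do. None of them is a Virtuoso or AWS API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Instance:
    """Hypothetical stand-in for a cloud server running Virtuoso."""
    name: str
    loaded: bool = False
    retired: bool = False


def rebuild_and_swap(
    production: Instance,
    provision: Callable[[], Instance],
    load: Callable[[Instance], None],
    verify: Callable[[Instance], bool],
    switch: Callable[[Instance], None],
    retire: Callable[[Instance], None],
) -> Instance:
    """Load into a fresh instance and swap it in only if it checks out.

    Returns the instance that is serving traffic afterwards.
    """
    candidate = provision()
    try:
        # Everything risky happens on the candidate, never on production.
        load(candidate)
        ok = verify(candidate)
    except Exception:
        # A failed load (say, a full disk) only costs us the candidate.
        ok = False

    if not ok:
        retire(candidate)
        return production

    switch(candidate)   # e.g. repoint DNS or a load balancer (assumption)
    retire(production)  # the old instance is retired only after the swap
    return candidate
```

The point of the design is that production is only touched in the last two steps, after the load and the tests have already succeeded on the candidate.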
The main advantage here is that there is no way a screw-up in the load procedure can trash the production system. Even if Virtuoso were entirely reliable, as the data sources grow the rate of exceptional events (say, you fill the disk) goes up. The temporary server approach eliminates a lot of headaches and it is good cloud economics: if you run a server at AMZN for 1 hour a day to update, the cost of your system only goes up by about 4% (1 hour out of 24).

I was having good luck with this approach until Virtuoso 7.2.0 came along. Since then I've had problems similar in severity to what the N.I.H. was reporting; it really looked like massive corruption of the data structures, and 7.2.1 did not help.

I don't know if these issues are fixed in the current TRUNK, but if they are it would be nice to get an official release.

On Fri, Sep 25, 2015 at 1:31 PM, Haag, Jason <jhaa...@gmail.com> wrote:
>
> Hi Users,
>
> I'm trying to determine the best option for my situation for importing RDF
> data into Virtuoso. Here's my situation:
>
> I currently have several RDF datasets available on my server. Each data
> set has an RDF dump available as RDF/XML, JSON-LD, and Turtle. These dumps
> are generated automatically without virtuoso from an HTML page marked up
> using RDFa.
>
> What is the best option for automating the import of this data on a
> regular basis into the virtuoso DB? The datasets may grow so it should not
> just import the data once, but import on a regular basis, perhaps daily or
> weekly.
>
> Based on what I've read in the documentation, this crawler option seems
> like the most appropriate option for my situation:
> http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtSetCrawlerJobsGuideDirectories
>
> Can anyone verify if this would be the best approach? Does anyone know if
> the crawler supports RDFa/HTML or should it point to a specific directory
> with only the RDF dump files?
>
> Thanks in advance!
>
> J Haag
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Virtuoso-users mailing list
> Virtuoso-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/virtuoso-users

--
Paul Houle
*Applying Schemas for Natural Language Processing, Distributed Systems, Classification and Text Mining and Data Lakes*
(607) 539 6254    paul.houle on Skype    ontolo...@gmail.com

:BaseKB -- Query Freebase Data With SPARQL
http://basekb.com/gold/

Legal Entity Identifier Lookup
https://legalentityidentifier.info/lei/lookup/

Join our Data Lakes group on LinkedIn
https://www.linkedin.com/grp/home?gid=8267275