I like the cloud approach of creating a new Virtuoso instance, doing the
load, having plenty of time to test it, then replacing the production
instance with the new one and retiring the old production instance.

The main advantage here is that there is no way a screw-up in the load
procedure can trash the production system -- even if Virtuoso were entirely
reliable, the rate of exceptional events (say you fill the disk) goes up as
the data sources grow.  The temporary-server approach eliminates a lot of
headaches and it is good cloud economics (if you run a server at AMZN for
1 hour a day to do the update, the cost of your system only goes up by
about 4%).
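
For what it's worth, the load step on the temporary instance is just the
standard Virtuoso bulk loader driven from isql.  A minimal sketch -- the
directory, file mask, and graph IRI below are placeholders for your own
setup, and the dump directory has to be listed in DirsAllowed in
virtuoso.ini:

    -- register the dump files with the bulk loader
    ld_dir ('/data/dumps', '*.ttl', 'http://example.com/graph');
    -- run the loader; on a big box you can run several of these in parallel
    rdf_loader_run ();
    -- make the load durable before you start testing
    checkpoint;

Put that in a file and run it with something like
"isql 1111 dba <password> load.sql", then point your tests at the new
instance before you swap it in.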

I was having good luck with this approach until Virtuoso 7.2.0 came along,
and since then I've had problems similar in severity to what the N.I.H. was
reporting; it really looked like massive corruption of the data structures,
and 7.2.1 did not help.

I don't know whether these issues are fixed in the current TRUNK, but if
they are it would be nice to get an official release.

On Fri, Sep 25, 2015 at 1:31 PM, Haag, Jason <jhaa...@gmail.com> wrote:

>
> Hi Users,
>
> I'm trying to determine the best option for my situation for importing RDF
> data into Virtuoso. Here's my situation:
>
> I currently have several RDF datasets available on my server. Each data
> set has an RDF dump available as RDF/XML, JSON-LD, and Turtle. These dumps
> are generated automatically, without Virtuoso, from an HTML page marked up
> using RDFa.
>
> What is the best option for automating the import of this data into the
> Virtuoso DB on a regular basis? The datasets may grow, so it should not
> just import the data once, but import on a regular basis, perhaps daily or
> weekly.
>
> Based on what I've read in the documentation, this crawler option seems
> like the most appropriate option for my situation:
> http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtSetCrawlerJobsGuideDirectories
>
> Can anyone verify if this would be the best approach? Does anyone know if
> the crawler supports RDFa/HTML or should it point to a specific directory
> with only the RDF dump files?
>
> Thanks in advance!
>
> J Haag
>
>
>


-- 
Paul Houle

*Applying Schemas for Natural Language Processing, Distributed Systems,
Classification and Text Mining and Data Lakes*

(607) 539 6254    paul.houle on Skype   ontolo...@gmail.com

:BaseKB -- Query Freebase Data With SPARQL
http://basekb.com/gold/

Legal Entity Identifier Lookup
https://legalentityidentifier.info/lei/lookup/

Join our Data Lakes group on LinkedIn
https://www.linkedin.com/grp/home?gid=8267275
------------------------------------------------------------------------------
_______________________________________________
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users
