I want to link two Linked Open Data sets (specifically DBpedia and data.deichman.no), which contain 800,000 and 200,000 records respectively. The only way I have of accessing them is through their open SPARQL endpoints. Both run Virtuoso, but I have no JDBC, iSQL or any other form of access except SPARQL.
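For concreteness, the kind of paged access I mean looks roughly like the sketch below. This is only an illustration, not my actual code: the helper names `paged_query` and `page_offsets`, the placeholder triple pattern and the page size of 1000 are all made up for the example.

```python
def paged_query(offset, page_size=1000):
    """Build one page of a SELECT query ordered by ?uri.

    The triple pattern is a placeholder; a real query would select
    the properties needed for record linkage.
    """
    return (
        "SELECT ?uri WHERE { ?uri a ?type } "
        f"ORDER BY ?uri LIMIT {page_size} OFFSET {offset}"
    )


def page_offsets(total, page_size=1000):
    """Return the OFFSET value of each page needed to cover `total` rows."""
    return range(0, total, page_size)


if __name__ == "__main__":
    # Show the first few queries that would be sent to the endpoint.
    for offset in list(page_offsets(5000))[:3]:
        print(paged_query(offset))
```

Each query string would be sent to the endpoint in turn, with the offset stepped up by the page size until a page comes back empty.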
Trying to do simple paging via

  order by ?uri limit 1000 offset 4000

and stepping up the offset doesn't work, because I get

  Virtuoso 22023 Error SR353: Sorted TOP clause specifies more then 10001000 rows to sort. Only 10000 are allowed. Either decrease the offset and/or row count or use a scrollable cursor

I've tried simply turning off paging and getting the entire data set in one go, but then the result set is chopped off at 10,000 rows.

I see that this can be solved in SQL with scrollable cursors, and also via JDBC:

  http://boards.openlinksw.com/phpBB3/viewtopic.php?f=12&t=1452

but that doesn't help me at all. :-(

This is a recurring problem for me, as I'm developing a record linkage tool[1] and use open data sets to try it out. Many of the open data sets are hosted in Virtuoso, and so this keeps hitting me.

Is there any general way to solve this or work around it?

[1] http://code.google.com/p/duke/

--Lars M.
http://www.garshol.priv.no/tmphoto/
http://www.garshol.priv.no/blog/
