I created some synthetic data that tickles the bug reliably on my machine with a standard virtuoso.ini (just adding the directory for the files to the allowed list). I'm attaching the generator program for the files and a loading script.
peter

On 12/18/18 9:46 AM, Peter F. Patel-Schneider wrote:
> I did a bit of digging and it sure looks as if there is a race condition in
> rdf_rl_lang_id in ttlpv.sql. This code appears to check to see if the
> language tag is already in DB.DBA.RDF_LANGUAGE and adds it if not. But
> another thread could do the same insert between the check and the insert, as
> far as I can tell.
>
> It looks to me as if the right solution is to do a soft insert and a
> subsequent query instead of a hard insert.
>
> However, I don't understand how locking works in SQL so there may be something
> that prevents another thread from interfering.
>
> peter
>
>
> On 12/18/18 8:55 AM, Peter F. Patel-Schneider wrote:
>> I'm loading the Turtle Wikidata RDF complete dump, split into pieces and
>> loaded with 10 active readers. About half the time the load fails with one
>> or more of these errors. The errors are always near the beginning of the
>> load---in the first group of 10 files to be loaded and near the beginning of
>> the files (generally in the first couple of hundred lines in a file of size
>> well over 1 GB). No errors occur for any files beyond the first ten.
>>
>> I could provide the files, but they total to about 340GB.
>>
>> It sure looks as if there is some sort of bug when loading RDF
>> language-tagged strings, where a race condition means that two threads are
>> trying to load the same language tag into DB.DBA.RDF_LANGUAGE. This would
>> explain why the problem occurs only at the beginning of the load, when the
>> language tags are being added to DB.DBA.RDF_LANGUAGE, and not later. It
>> would also explain why the errors are different between different runs.
>> (The only other explanation would be hardware errors, but this doesn't seem
>> to be viable.)
>>
>> It seems to me that a quick patch for this problem would be to change the
>> insert into a soft insert, but I don't know where to make this change in the
>> code.
>>
>> peter
>>
>>
>> On 12/11/18 7:11 PM, Hugh Williams wrote:
>>> Hi Peter,
>>>
>>> The triple value do indeed appear to be valid, but the problem could be
>>> somewhere else in the dataset file and not necessarily on the reported
>>> line or line before it.
>>>
>>> Is it a public dataset you are loading and if so can you provide a copy for
>>> local testing ?
>>>
>>> Best Regards
>>> Hugh Williams
>>> Professional Services
>>> OpenLink Software
>>> Home Page: http://www.openlinksw.com
>>> Community Support: https://community.openlinksw.com
>>> Weblogs (Blogs):
>>> Company Blog: https://medium.com/openlink-software-blog
>>> Virtuoso Blog: https://medium.com/virtuoso-blog
>>> Data Access Drivers Blog: https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers
>>> LinkedIn -- http://www.linkedin.com/company/openlink-software/
>>> Twitter -- http://twitter.com/OpenLink
>>> Google+ -- http://plus.google.com/100570109519069333827/
>>> Facebook -- http://www.facebook.com/OpenLinkSoftware
>>> Universal Data Access, Integration, and Management Technology Providers
>>>
>>>
>>>> On 11 Dec 2018, at 17:45, Peter F. Patel-Schneider <[email protected]
>>>> <mailto:[email protected]>> wrote:
>>>>
>>>> I'm loading a bunch of Turtle files and I'm getting the error
>>>>
>>>> 2300 TURTLE RDF loader, line 1012: SR197: Non unique primary key on
>>>> DB.DBA.RDF_LANGUAGE
>>>>
>>>> The line in question looks fine:
>>>>
>>>> "Wikimedia template"@ki,
>>>>
>>>> The line before it may indicate the issue
>>>>
>>>> "Wikimedia template"@kg,
>>>>
>>>> Nonetheless this should be valid RDF so there appears to be a bug in
>>>> Virtuoso here.
>>>>
>>>> Is there any workaround?
>>>>
>>>>
>>>> This is in Virtuoso 07.20.3230.
>>>>
>>>> peter
>>>>
>>>>
>>>> _______________________________________________
>>>> Virtuoso-users mailing list
>>>> [email protected]
>>>> <mailto:[email protected]>
>>>> https://lists.sourceforge.net/lists/listinfo/virtuoso-users
>>>

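To make the suspected check-then-insert race from the quoted messages concrete, here is a minimal sketch. It is not the Virtuoso code: it uses Python's sqlite3 module with a hypothetical rdf_language table, replays the problematic interleaving by hand instead of using real threads, and uses SQLite's INSERT OR IGNORE as a stand-in for the soft insert proposed above. Whether Virtuoso's locking already prevents this interleaving is exactly the open question in the thread.

```python
import sqlite3


def make_db():
    # Hypothetical stand-in for DB.DBA.RDF_LANGUAGE: the tag name is the primary key.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rdf_language (name TEXT PRIMARY KEY, id INTEGER)")
    return conn


def exists(conn, name):
    row = conn.execute("SELECT 1 FROM rdf_language WHERE name = ?", (name,)).fetchone()
    return row is not None


def hard_insert(conn, name, lang_id):
    # Fails with IntegrityError if the tag is already present.
    conn.execute("INSERT INTO rdf_language (name, id) VALUES (?, ?)", (name, lang_id))


def soft_insert_then_query(conn, name, lang_id):
    # Insert only if absent, then read back whichever id is stored.
    conn.execute("INSERT OR IGNORE INTO rdf_language (name, id) VALUES (?, ?)", (name, lang_id))
    return conn.execute("SELECT id FROM rdf_language WHERE name = ?", (name,)).fetchone()[0]


def racy_interleaving(conn, name):
    # Both loaders run their check before either one inserts.
    missing_for_a = not exists(conn, name)
    missing_for_b = not exists(conn, name)
    errors = []
    for loader_id, missing in ((1, missing_for_a), (2, missing_for_b)):
        if missing:
            try:
                hard_insert(conn, name, loader_id)
            except sqlite3.IntegrityError as exc:
                errors.append(str(exc))
    return errors


def safe_interleaving(conn, name):
    # Same two loaders, but each does a soft insert followed by a query.
    return [soft_insert_then_query(conn, name, loader_id) for loader_id in (1, 2)]


if __name__ == "__main__":
    print(racy_interleaving(make_db(), "ki"))
    print(safe_interleaving(make_db(), "ki"))
```

In the racy version the second loader hits a primary-key violation, which resembles the SR197 error in the report. In the soft-insert version both loaders end up with the id stored by whichever insert came first.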
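#!/usr/local/bin/python2.7
# Writes test00.ttl .. test19.ttl; every file uses the same set of language tags.
for x in range(0, 20):
    file = open('test{:0>2d}.ttl'.format(x), 'w')
    for k in range(0, 10):
        # One subject per iteration, followed by a list of rdfs:label objects.
        file.write('<http://www.wikidata.org/entity/Q{:0>2d}{:0>2d}> <http://www.w3.org/2000/01/rdf-schema#label>\n'.format(x, k))
        for y in range(ord('a'), ord('z') + 1):
            for z in range(ord('a'), ord('z') + 1):
                file.write(' "description {:0>2d}{:0>3d}{:0>3d}"@l{:s}{:s},\n'.format(x, y, z, chr(y), chr(z)))
        # A plain literal closes the object list and ends the statement.
        file.write(' "JUNK".\n')
    file.close()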
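
As a quick check of why this data should provoke the collision, the sketch below reuses the generator's literal format string to list the language tags each file introduces. The helper names literal_lines and tags_in_file are made up for this illustration, and the code only inspects the generated text; it does not touch Virtuoso.

```python
import re


def literal_lines(x):
    # Same format string as the generator's inner loops, for file number x.
    for y in range(ord('a'), ord('z') + 1):
        for z in range(ord('a'), ord('z') + 1):
            yield ' "description {:0>2d}{:0>3d}{:0>3d}"@l{:s}{:s},\n'.format(x, y, z, chr(y), chr(z))


def tags_in_file(x):
    # Collect the language tag that follows each literal's closing quote.
    return {re.search(r'"@(\w+),', line).group(1) for line in literal_lines(x)}


tag_sets = [tags_in_file(x) for x in range(20)]

if __name__ == "__main__":
    print(len(tag_sets[0]))
    print(all(tags == tag_sets[0] for tags in tag_sets))
```

Every file starts with the same tags in the same order, so concurrent loaders that begin at the same time all try to register the same new tag at nearly the same moment.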
test.sh
Description: application/shellscript
