I did a bit of digging, and it looks as if there is a race condition in
rdf_rl_lang_id in ttlpv.sql. This code appears to check whether the
language tag is already in DB.DBA.RDF_LANGUAGE and to add it if not. But
another thread could do the same insert between the check and the insert, as
far as I can tell.

It looks to me as if the right solution is to do a soft insert and a
subsequent query instead of a hard insert.
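
To make the difference concrete, here is a minimal sketch of the two patterns.
It is not the Virtuoso code: it uses Python's sqlite3 module with an in-memory
table standing in for DB.DBA.RDF_LANGUAGE, the table and column names are
made up for illustration, and SQLite's INSERT OR IGNORE plays the role of a
soft insert. The `interleave` hook is a hypothetical stand-in for a second
loader thread getting in at the critical moment.

```python
import sqlite3


def make_table():
    # In-memory stand-in for DB.DBA.RDF_LANGUAGE; names are hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rdf_language (id INTEGER PRIMARY KEY, tag TEXT NOT NULL UNIQUE)"
    )
    return conn


def other_writer(conn, tag):
    # Simulates another loader thread adding the same language tag first.
    conn.execute("INSERT OR IGNORE INTO rdf_language (tag) VALUES (?)", (tag,))


def lang_id_hard(conn, tag, interleave=None):
    # Pattern 1: check, then hard insert.
    row = conn.execute(
        "SELECT id FROM rdf_language WHERE tag = ?", (tag,)
    ).fetchone()
    if row is not None:
        return row[0]
    if interleave is not None:
        interleave(conn, tag)  # the window between the check and the insert
    cur = conn.execute("INSERT INTO rdf_language (tag) VALUES (?)", (tag,))
    return cur.lastrowid


def lang_id_soft(conn, tag, interleave=None):
    # Pattern 2: soft insert, then query. A duplicate is silently skipped,
    # and the id is always read back from the table.
    if interleave is not None:
        interleave(conn, tag)
    conn.execute("INSERT OR IGNORE INTO rdf_language (tag) VALUES (?)", (tag,))
    row = conn.execute(
        "SELECT id FROM rdf_language WHERE tag = ?", (tag,)
    ).fetchone()
    return row[0]
```

With the simulated interleaving, lang_id_hard raises sqlite3.IntegrityError
on the unique key, which is the same kind of failure as the SR197 error,
while lang_id_soft returns the id of the row the other writer inserted.
Whether the real procedure behaves this way depends on how Virtuoso locks
the table, which the sketch does not model.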

However, I don't understand how locking works in SQL, so there may be
something that prevents another thread from interfering.

peter


On 12/18/18 8:55 AM, Peter F. Patel-Schneider wrote:
> I'm loading the Turtle Wikidata RDF complete dump, split into pieces and
> loaded with 10 active readers.   About half the time the load fails with one
> or more of these errors.  The errors are always near the beginning of the
> load---in the first group of 10 files to be loaded and near the beginning of
> the files (generally in the first couple of hundred lines in a file of size
> well over 1 GB).  No errors occur for any files beyond the first ten.
> 
> I could provide the files, but they total about 340 GB.
> 
> It sure looks as if there is some sort of bug when loading RDF language-tagged
> strings, where a race condition means that two threads are trying to load the
> same language tag into DB.DBA.RDF_LANGUAGE.  This would explain why the
> problem occurs only at the beginning of the load, when the language tags are
> being added to DB.DBA.RDF_LANGUAGE, and not later.  It would also explain why
> the errors are different between different runs.  (The only other explanation
> would be hardware errors, but this doesn't seem to be viable.)
> 
> It seems to me that a quick patch for this problem would be to change the
> insert into a soft insert, but I don't know where to make this change in the 
> code.
> 
> peter
> 
> 
> 
> 
> On 12/11/18 7:11 PM, Hugh Williams wrote:
>> Hi Peter,
>>
>> The triple values do indeed appear to be valid, but the problem could be
>> somewhere else in the dataset file and not necessarily on the reported line
>> or the line before it.
>>
>> Is it a public dataset you are loading, and if so, can you provide a copy
>> for local testing?
>>
>> Best Regards
>> Hugh Williams
>> Professional Services
>> OpenLink Software
>>
>>
>>
>>
>>> On 11 Dec 2018, at 17:45, Peter F. Patel-Schneider <pfpschnei...@gmail.com
>>> <mailto:pfpschnei...@gmail.com>> wrote:
>>>
>>> I'm loading a bunch of Turtle files and I'm getting the error
>>>
>>> 2300 TURTLE RDF loader, line 1012: SR197: Non unique primary key on
>>> DB.DBA.RDF_LANGUAGE
>>>
>>> The line in question looks fine:
>>>
>>>   "Wikimedia template"@ki,
>>>
>>> The line before it may indicate the issue
>>>
>>>    "Wikimedia template"@kg,
>>>
>>> Nonetheless this should be valid RDF so there appears to be a bug in
>>> Virtuoso here.
>>>
>>> Is there any workaround?
>>>
>>>
>>> This is in Virtuoso 07.20.3230.
>>>
>>> peter
>>>
>>>
>>> _______________________________________________
>>> Virtuoso-users mailing list
>>> Virtuoso-users@lists.sourceforge.net
>>> <mailto:Virtuoso-users@lists.sourceforge.net>
>>> https://lists.sourceforge.net/lists/listinfo/virtuoso-users
>>


