Hi Peter,
I generated the datasets with your Python script and loaded them into a local
Virtuoso Open Source instance several times, but did not see any occurrences of
the error:
SQL> select * from load_list;
ll_file
ll_graph
ll_state ll_started ll_done ll_host
ll_work_time ll_error
VARCHAR NOT NULL
VARCHAR
INTEGER TIMESTAMP TIMESTAMP INTEGER INTEGER
VARCHAR
_______________________________________________________________________________
./wikidata/test00.ttl
http://test.nuance.com
2 2018.12.19 0:47.54 826749000 2018.12.19 0:47.54 983316000 0
NULL NULL
./wikidata/test01.ttl
http://test.nuance.com
2 2018.12.19 0:47.54 826749000 2018.12.19 0:47.55 105660000 0
NULL NULL
./wikidata/test02.ttl
http://test.nuance.com
2 2018.12.19 0:47.54 826749000 2018.12.19 0:47.55 233562000 0
NULL NULL
./wikidata/test03.ttl
http://test.nuance.com
2 2018.12.19 0:47.54 826749000 2018.12.19 0:47.55 371457000 0
NULL NULL
./wikidata/test04.ttl
http://test.nuance.com
2 2018.12.19 0:47.54 826749000 2018.12.19 0:47.55 483846000 0
NULL NULL
./wikidata/test05.ttl
http://test.nuance.com
2 2018.12.19 0:47.54 826749000 2018.12.19 0:47.55 621974000 0
NULL NULL
./wikidata/test06.ttl
http://test.nuance.com
2 2018.12.19 0:47.54 826749000 2018.12.19 0:47.55 742255000 0
NULL NULL
./wikidata/test07.ttl
http://test.nuance.com
2 2018.12.19 0:47.54 826749000 2018.12.19 0:47.55 860062000 0
NULL NULL
./wikidata/test08.ttl
http://test.nuance.com
2 2018.12.19 0:47.54 826749000 2018.12.19 0:47.55 993561000 0
NULL NULL
./wikidata/test09.ttl
http://test.nuance.com
2 2018.12.19 0:47.54 826749000 2018.12.19 0:47.56 140431000 0
NULL NULL
./wikidata/test10.ttl
http://test.nuance.com
2 2018.12.19 0:47.54 827790000 2018.12.19 0:47.54 985386000 0
NULL NULL
./wikidata/test11.ttl
http://test.nuance.com
2 2018.12.19 0:47.54 827790000 2018.12.19 0:47.55 109072000 0
NULL NULL
./wikidata/test12.ttl
http://test.nuance.com
2 2018.12.19 0:47.54 827790000 2018.12.19 0:47.55 230846000 0
NULL NULL
./wikidata/test13.ttl
http://test.nuance.com
2 2018.12.19 0:47.54 827790000 2018.12.19 0:47.55 375427000 0
NULL NULL
./wikidata/test14.ttl
http://test.nuance.com
2 2018.12.19 0:47.54 827790000 2018.12.19 0:47.55 486963000 0
NULL NULL
./wikidata/test15.ttl
http://test.nuance.com
2 2018.12.19 0:47.54 827790000 2018.12.19 0:47.55 624303000 0
NULL NULL
./wikidata/test16.ttl
http://test.nuance.com
2 2018.12.19 0:47.54 827790000 2018.12.19 0:47.55 745760000 0
NULL NULL
./wikidata/test17.ttl
http://test.nuance.com
2 2018.12.19 0:47.54 827790000 2018.12.19 0:47.55 862932000 0
NULL NULL
./wikidata/test18.ttl
http://test.nuance.com
2 2018.12.19 0:47.54 827790000 2018.12.19 0:47.55 995704000 0
NULL NULL
./wikidata/test19.ttl
http://test.nuance.com
2 2018.12.19 0:47.54 827790000 2018.12.19 0:47.56 144745000 0
NULL NULL
20 Rows. -- 3 msec.
SQL> sparql select count(*) from <http://test.nuance.com> where {?s ?p ?o};
callret-0
INTEGER
_______________________________________________________________________________
135402
1 Rows. -- 85 msec.
SQL> status('');
REPORT
VARCHAR
_______________________________________________________________________________
OpenLink Virtuoso Server
Version 07.20.3230-pthreads for Darwin as of Nov 10 2018
Started on: 2018-12-19 00:36 GMT+0
Best Regards
Hugh Williams
Professional Services
OpenLink Software
Home Page: http://www.openlinksw.com
Community Support: https://community.openlinksw.com
Weblogs (Blogs):
Company Blog: https://medium.com/openlink-software-blog
Virtuoso Blog: https://medium.com/virtuoso-blog
Data Access Drivers Blog: https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers
LinkedIn -- http://www.linkedin.com/company/openlink-software/
Twitter -- http://twitter.com/OpenLink
Google+ -- http://plus.google.com/100570109519069333827/
Facebook -- http://www.facebook.com/OpenLinkSoftware
Universal Data Access, Integration, and Management Technology Providers
> On 18 Dec 2018, at 17:24, Peter F. Patel-Schneider <[email protected]> wrote:
>
> I created some synthetic data that tickles the bug reliably on my machine with
> a standard virtuoso.ini (just adding the directory for the files to the
> allowed list). I'm attaching the generator program for the files and a
> loading script.
>
> peter
>
>
> On 12/18/18 9:46 AM, Peter F. Patel-Schneider wrote:
>> I did a bit of digging, and it sure looks as if there is a race condition in
>> rdf_rl_lang_id in ttlpv.sql. This code appears to check whether the language
>> tag is already in DB.DBA.RDF_LANGUAGE and to add it if not. As far as I can
>> tell, though, another thread could insert the same tag between the check and
>> the insert.
>>
>> It looks to me as if the right solution is to do a soft insert followed by a
>> query, instead of a hard insert.
>>
>> However, I don't understand how locking works in SQL, so there may be
>> something that prevents another thread from interfering.
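The check-then-insert window described above can be illustrated with a minimal Python sketch. This is not Virtuoso code: LanguageTable is a hypothetical in-memory stand-in for DB.DBA.RDF_LANGUAGE, and hard_insert and soft_insert only mimic a plain INSERT and a soft insert (one that silently skips a row whose primary key already exists). A threading.Barrier forces every thread past the existence check before any of them inserts, so the duplicate-key failure happens on every run instead of only occasionally.

```python
import threading


class DuplicateKeyError(Exception):
    """Stands in for Virtuoso's 'SR197: Non unique primary key' error."""


class LanguageTable:
    """Hypothetical in-memory stand-in for DB.DBA.RDF_LANGUAGE (tag -> id).

    Each single operation is atomic (guarded by a lock), as one SQL statement
    would be; the race is between *separate* operations.
    """

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def lookup(self, tag):
        with self._lock:
            return self._rows.get(tag)

    def hard_insert(self, tag):
        # Like a plain INSERT: fails if the primary key already exists.
        with self._lock:
            if tag in self._rows:
                raise DuplicateKeyError(tag)
            self._rows[tag] = len(self._rows) + 1

    def soft_insert(self, tag):
        # Like a soft insert: does nothing if the primary key already exists.
        with self._lock:
            if tag not in self._rows:
                self._rows[tag] = len(self._rows) + 1


def lang_id_check_then_insert(table, tag, barrier):
    # Racy pattern: the existence check and the insert are separate steps.
    if table.lookup(tag) is None:
        barrier.wait()  # force every thread past the check before anyone inserts
        table.hard_insert(tag)
    return table.lookup(tag)


def lang_id_soft_insert(table, tag, barrier):
    # Proposed pattern: soft insert, then query for the id.
    barrier.wait()  # same worst-case timing as above
    table.soft_insert(tag)
    return table.lookup(tag)


def run(lang_id, tag="ki", n_threads=2):
    """Run lang_id concurrently in n_threads threads against a fresh table."""
    table = LanguageTable()
    barrier = threading.Barrier(n_threads)
    results, errors = [], []

    def worker():
        try:
            results.append(lang_id(table, tag, barrier))
        except DuplicateKeyError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, errors


if __name__ == "__main__":
    print(run(lang_id_check_then_insert))  # one thread hits DuplicateKeyError
    print(run(lang_id_soft_insert))        # -> ([1, 1], [])
```

In the soft-insert version the second writer simply finds the row already present, and the follow-up query returns the same id to both threads. Whether Virtuoso's own locking already closes this window is exactly the open question above; the sketch only shows why the pattern is unsafe when nothing does.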
>>
>> peter
>>
>>
>> On 12/18/18 8:55 AM, Peter F. Patel-Schneider wrote:
>>> I'm loading the complete Wikidata RDF dump in Turtle, split into pieces and
>>> loaded with 10 active readers. About half the time the load fails with one
>>> or more of these errors. The errors always occur near the beginning of the
>>> load: in the first group of 10 files to be loaded, and near the beginning of
>>> those files (generally within the first couple of hundred lines of files well
>>> over 1 GB in size). No errors occur in any file beyond the first ten.
>>>
>>> I could provide the files, but they total about 340 GB.
>>>
>>> It sure looks as if there is a bug when loading RDF language-tagged
>>> strings: a race condition in which two threads try to add the same language
>>> tag to DB.DBA.RDF_LANGUAGE. This would explain why the problem occurs only
>>> at the beginning of the load, when the language tags are being added to
>>> DB.DBA.RDF_LANGUAGE, and not later. It would also explain why the errors
>>> differ between runs. (The only other explanation would be hardware errors,
>>> but that doesn't seem plausible.)
>>>
>>> It seems to me that a quick patch for this problem would be to change the
>>> insert into a soft insert, but I don't know where to make this change in
>>> the code.
>>>
>>> peter
>>>
>>>
>>>
>>>
>>> On 12/11/18 7:11 PM, Hugh Williams wrote:
>>>> Hi Peter,
>>>>
>>>> The triple values do indeed appear to be valid, but the problem could be
>>>> somewhere else in the dataset file, not necessarily on the reported line or
>>>> the line before it.
>>>>
>>>> Is the dataset you are loading public? If so, can you provide a copy for
>>>> local testing?
>>>>
>>>> Best Regards
>>>> Hugh Williams
>>>>
>>>>
>>>>
>>>>
>>>>> On 11 Dec 2018, at 17:45, Peter F. Patel-Schneider <[email protected]> wrote:
>>>>>
>>>>> I'm loading a bunch of Turtle files and I'm getting the error
>>>>>
>>>>> 2300 TURTLE RDF loader, line 1012: SR197: Non unique primary key on DB.DBA.RDF_LANGUAGE
>>>>>
>>>>> The line in question looks fine:
>>>>>
>>>>> "Wikimedia template"@ki,
>>>>>
>>>>> The line before it may indicate the issue:
>>>>>
>>>>> "Wikimedia template"@kg,
>>>>>
>>>>> Nonetheless, this should be valid RDF, so there appears to be a bug in
>>>>> Virtuoso here.
>>>>>
>>>>> Is there any workaround?
>>>>>
>>>>>
>>>>> This is in Virtuoso 07.20.3230.
>>>>>
>>>>> peter
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Virtuoso-users mailing list
>>>>> [email protected]
>>>>> https://lists.sourceforge.net/lists/listinfo/virtuoso-users
>>>>
> <generate.py><test.sh>