On Wed, 22 Feb 2023 at 04:45, Thad Guidry <thadgui...@gmail.com> wrote:

> Hi Guillaume,
>
> Which file system is used with Blazegraph?  Is it NFS or Ext4, etc.?
> Specifically, the file system used where Journal files are written and
> read from? [1]
> Because looking at the code, it seems there could be cases where
> unreported errors can happen around file locking.
>

We are using Ext4. I don't know the Blazegraph internals well enough to
tell whether that could be an issue. Given your question, I assume that
the locking issues are more likely to arise when running on NFS.


> [1]
> https://github.com/blazegraph/database/blob/master/bigdata-core/bigdata/src/java/com/bigdata/journal/FileMetadata.java
>
> Thad
> https://www.linkedin.com/in/thadguidry/
> https://calendly.com/thadguidry/
>
>
> On Wed, Feb 22, 2023 at 5:06 AM Guillaume Lederrey <
> gleder...@wikimedia.org> wrote:
>
>> Hello all!
>>
>> TL;DR: We expect to successfully complete the recent data reload on
>> Wikidata Query Service soon, but we've encountered multiple failures
>> related to the size of the graph, and anticipate that this issue may worsen
>> in the future. Although we succeeded this time, we cannot guarantee that
>> future reload attempts will be successful given the current trend of the
>> data reload process. Thank you for your understanding and patience.
>>
>> Longer version:
>>
>> WDQS is updated from a stream of recent changes on Wikidata, with a
>> maximum delay of ~2 minutes. This process was improved as part of the WDQS
>> Streaming Updater project to ensure data coherence[1]. However, the update
>> process is still imperfect and can lead to data inconsistencies in some
>> cases[2][3]. To address this, we reload the data from dumps a few times per
>> year to reinitialize the system from a known good state.
>>
>> The recent reload of data from dumps started in mid-December and initially
>> ran into issues with the download and with instabilities in Blazegraph,
>> the database used by WDQS[4]. Loading the data into Blazegraph
>> takes a couple of weeks due to the size of the graph, and we had multiple
>> attempts where the reload failed after >90% of the data had been loaded.
>> Our understanding of the issue is that a "race condition" in Blazegraph[5],
>> where subtle timing changes lead to corruption of the journal in some rare
>> cases, is to blame.[6]
>>
>> We want to reassure you that the last reload job was successful on one of
>> our servers. The data still needs to be copied over to all of the WDQS
>> servers, which will take a couple of weeks, but should not bring any
>> additional issues. However, reloading the full data from dumps is becoming
>> more complex as the data size grows, and we wanted to let you know why the
>> process took longer than expected. We understand that data inconsistencies
>> can be problematic, and we appreciate your patience and understanding while
>> we work to ensure the quality and consistency of the data on WDQS.
>>
>> Thank you for your continued support and understanding!
>>
>>
>>     Guillaume
>>
>>
>> [1] https://phabricator.wikimedia.org/T244590
>> [2] https://phabricator.wikimedia.org/T323239
>> [3] https://phabricator.wikimedia.org/T322869
>> [4] https://phabricator.wikimedia.org/T323096
>> [5] https://en.wikipedia.org/wiki/Race_condition#In_software
>> [6] https://phabricator.wikimedia.org/T263110
>>
>> --
>> *Guillaume Lederrey* (he/him)
>> Engineering Manager
>> Wikimedia Foundation <https://wikimediafoundation.org/>
>> _______________________________________________
>> Wikidata mailing list -- wikidata@lists.wikimedia.org
>> Public archives at
>> https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/7QTJBRU2T3J22SNV4TGBRML4QNBGCEOU/
>> To unsubscribe send an email to wikidata-le...@lists.wikimedia.org
>>
>


-- 
*Guillaume Lederrey* (he/him)
Engineering Manager
Wikimedia Foundation <https://wikimediafoundation.org/>
