Re: [Wikidata] Delta Dumps Production?

2021-02-26 Thread Mike Pham
Hi all,

Thanks for bringing this topic up. Unfortunately, we’re not ready to expose
anything yet, but given the interest this seems like a good feature to
provide in the future.

If you’re able to provide a bit more information about your use cases and
problems/pain points with the current way things work, it’d be helpful for
us in planning a good solution.

Thanks!




—

Mike Pham (he/him)
Sr Product Manager, Search Platform
Wikimedia Foundation 

On 26 February, 2021 at 10:37:16, Samuel Klein (meta...@gmail.com) wrote:

+1 to this.  Incremental dumps (even if just weekly) would be extremely
useful.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Delta Dumps Production?

2021-02-26 Thread Samuel Klein
+1 to this.  Incremental dumps (even if just weekly) would be extremely
useful.


-- 
Samuel Klein  @metasj   w:user:sj  +1 617 529 4266


Re: [Wikidata] Delta Dumps Production?

2021-02-26 Thread Chris Hokamp
Hi, first time chiming in here. This topic is very relevant for our
Wikidata use case as well; having daily or weekly diffs would be very useful.

Cheers,
CH



Re: [Wikidata] Delta Dumps Production?

2021-02-26 Thread Kingsley Idehen via Wikidata
On 2/26/21 3:46 AM, Guillaume Lederrey wrote:
> Hello!
>
> We are working on a new update process for WDQS, based on a stream of
> changes [1]. While not exactly the solution you are looking for, this
> might be a building block for differential dumps. For example by
> aggregating the stream of changes over a period of time.
>
> Note that at this point, the stream of changes that we construct is
> published to an internal Kafka that isn't exposed to the internet. If
> there is enough interest, we might be able to expose it in some form.
>
> Have fun!
>
>    Guillaume
>
>
>
> [1] https://phabricator.wikimedia.org/T244590
> 


Hi Guillaume,

I am very interested in getting access to this stream right now, since we
are trying to maintain an up-to-date mirror of Wikidata.

We can discuss offline if you like.


Kingsley



-- 
Regards,

Kingsley Idehen   
Founder & CEO 
OpenLink Software   
Home Page: http://www.openlinksw.com
Community Support: https://community.openlinksw.com
Weblogs (Blogs):
Company Blog: https://medium.com/openlink-software-blog
Virtuoso Blog: https://medium.com/virtuoso-blog
Data Access Drivers Blog: 
https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers

Personal Weblogs (Blogs):
Medium Blog: https://medium.com/@kidehen
Legacy Blogs: http://www.openlinksw.com/blog/~kidehen/
  http://kidehen.blogspot.com

Profile Pages:
Pinterest: https://www.pinterest.com/kidehen/
Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
Twitter: https://twitter.com/kidehen
Google+: https://plus.google.com/+KingsleyIdehen/about
LinkedIn: http://www.linkedin.com/in/kidehen

Web Identities (WebID):
Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
: 
http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this





Re: [Wikidata] Delta Dumps Production?

2021-02-26 Thread Guillaume Lederrey
Hello!

We are working on a new update process for WDQS, based on a stream of
changes [1]. While not exactly the solution you are looking for, this might
be a building block for differential dumps, for example by aggregating the
stream of changes over a period of time.
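
To illustrate the aggregation idea, here is a minimal sketch in Python. It
is not the WDQS updater code: the Change record, its fields, and the
aggregate function are hypothetical names made up for this example, and the
real stream format may well differ. It assumes each change carries a
timestamp plus the sets of triples it added and removed, and it folds all
changes in a time window into one net delta.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Change:
    """One hypothetical change event (assumed shape, not the real format)."""
    timestamp: datetime
    added: frozenset = frozenset()
    removed: frozenset = frozenset()


def aggregate(changes, start, end):
    """Fold all changes with start <= timestamp < end into one net delta."""
    net_added, net_removed = set(), set()
    for change in sorted(changes, key=lambda c: c.timestamp):
        if not (start <= change.timestamp < end):
            continue
        for triple in change.removed:
            # Removing a triple added earlier in the window cancels out.
            if triple in net_added:
                net_added.discard(triple)
            else:
                net_removed.add(triple)
        for triple in change.added:
            # Re-adding a triple removed earlier in the window cancels out.
            if triple in net_removed:
                net_removed.discard(triple)
            else:
                net_added.add(triple)
    return net_added, net_removed


# Placeholder triples; the third change falls outside the one-week window.
stream = [
    Change(datetime(2021, 2, 22), added=frozenset({("s1", "p", "o1")})),
    Change(datetime(2021, 2, 23),
           removed=frozenset({("s1", "p", "o1")}),
           added=frozenset({("s1", "p", "o2")})),
    Change(datetime(2021, 3, 2), added=frozenset({("s2", "p", "o1")})),
]
weekly_added, weekly_removed = aggregate(
    stream, datetime(2021, 2, 22), datetime(2021, 3, 1)
)
```

Applying the removals and then the additions of such a delta to a copy of
the data as of the window start should give the state at the window end,
provided the stream itself is complete and ordered.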

Note that at this point, the stream of changes that we construct is
published to an internal Kafka that isn't exposed to the internet. If there
is enough interest, we might be able to expose it in some form.

Have fun!

   Guillaume



[1] https://phabricator.wikimedia.org/T244590


-- 
*Guillaume Lederrey* (he/him)
Engineering Manager
Wikimedia Foundation 


Re: [Wikidata] Delta Dumps Production?

2021-02-25 Thread Federico Leva (Nemo)

Kingsley Idehen via Wikidata, 25/02/21 19:26:
> Is there a mechanism in place for producing and publishing delta-centric
> dumps for Wikidata?


There's
https://phabricator.wikimedia.org/T72246

Magnus Manske used to maintain some biweekly dumps as part of his WDQ
service, IIRC.


Federico
