Re: [Wikidata-l] WikiData change propagation for third parties

2013-04-26 Thread Dimitris Kontokostas
Dear Jeremy, all,

In addition to what Sebastian said, in DBpedia Live we use the OAI-PMH
protocol to get update feeds for English, German & Dutch Wikipedia.
This OAI-PMH implementation [1] is very convenient for what we need (and I
guess for most people) because it uses the latest modification date for
update publishing.
So when we ask for updates after time X it returns a list of articles with
modification date after X, no matter how many times they were edited in
between.
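
To make the mechanics concrete, a minimal polling sketch in Python (standard
library only; the endpoint URL below is a placeholder for wherever
Extension:OAIRepository is exposed, and resumption-token paging is omitted)
looks roughly like this:

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Placeholder endpoint; the real URL depends on the wiki you mirror.
ENDPOINT = "https://example.org/w/index.php?title=Special:OAIRepository"

def changed_since(timestamp):
    """List identifiers of records modified after `timestamp` (ISO 8601,
    e.g. "2013-04-26T00:00:00Z"). Each article appears once, with its
    latest modification date, no matter how often it was edited."""
    query = urllib.parse.urlencode({
        "verb": "ListIdentifiers",
        "metadataPrefix": "oai_dc",
        "from": timestamp,
    })
    with urllib.request.urlopen(ENDPOINT + "&" + query) as response:
        tree = ET.parse(response)
    oai = "{http://www.openarchives.org/OAI/2.0/}"
    return [header.findtext(oai + "identifier")
            for header in tree.iter(oai + "header")]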

This is very easy for you to support (no extra hardware needed, just an
extra table / index) and best suited for most use cases.
What most people need in the end is to know which pages changed since time
X. Fine-grained details are for special types of clients.

Best,
Dimitris

[1] http://www.mediawiki.org/wiki/Extension:OAIRepository


On Fri, Apr 26, 2013 at 9:40 AM, Sebastian Hellmann <
hellm...@informatik.uni-leipzig.de> wrote:

> Dear Jeremy,
> please read email from Daniel Kinzler on this list from 26.03.2013 18:26 :
>
>  * A dispatcher needs about 3 seconds to dispatch 1000 changes to a client
>> wiki.
>> * Considering we have ~300 client wikis, this means one dispatcher can
>> handle
>> about 4000 changes per hour.
>> * We currently have two dispatchers running in parallel (on a single box,
>> hume),
>> that makes a capacity of 8000 changes/hour.
>> * We are seeing roughly 17000 changes per hour on wikidata.org - more
>> than twice
>> our dispatch capacity.
>> * I want to try running 6 dispatcher processes; that would give us the
>> capacity
>> to handle 24000 changes per hour (assuming linear scaling).
>>
>
> 1. Somebody needs to run the Hub and it needs to scale. It looks like the
> protocol was intended to save some traffic, not to dispatch a massive
> number of messages per day to a large number of clients. Again, I am not
> familiar with how efficient PubSubHubbub is. What kind of hardware is needed to
> run this effectively? Do you have experience with this?
>
> 2. Somebody will still need to run and maintain the Hub and feed all
> clients. I was offering to host one of the hubs for DBpedia users, but I am
> not sure, whether we have that capacity.
>
> So we should use IRC RC + http request to the changed page as fallback?
>
> Sebastian
>
> On 26.04.2013 08:06, Jeremy Baron wrote:
>
>Hi,
>>
>> On Fri, Apr 26, 2013 at 5:29 AM, Sebastian Hellmann
>> >
>> wrote:
>>
>>> Well, PubSubHubbub is a nice idea. However it clearly depends on two
>>> factors:
>>> 1. whether Wikidata sets up such an infrastructure (I need to check
>>> whether we have capacities, I am not sure atm)
>>>
>> Capacity for what? The infrastructure should not be a problem.
>> (famous last words, can look more closely tomorrow. but I'm really not
>> worried about it) And you don't need any infrastructure at all for
>> development; just use one of google's public instances.
>>
>>  2. whether performance is good enough to handle high-volume publishers
>>>
>> Again, how do you mean?
>>
>>  Basically, polling to recent changes [1] and then do a http request to
>>> the individual pages should be fine for a start. So I guess this is what we
>>> will implement, if there aren't any better suggestions.
>>> The whole issue is problematic and the DBpedia project would be happy,
>>> if this were discussed and decided right now, so we can plan development.
>>>
>>> What is the best practice to get updates from Wikipedia at the moment?
>>>
>> I believe just about everyone uses the IRC feed from
>> irc.wikimedia.org.
>> https://meta.wikimedia.org/wiki/IRC/Channels#Raw_feeds
>>
>> I imagine wikidata will or maybe already does propagate changes to a
>> channel on that server but I can imagine IRC would not be a good
>> method for many Instant data repo users. Some will not be able to
>> sustain a single TCP connection for extended periods, some will not be
>> able to use IRC ports at all, and some may go offline periodically.
>> e.g. a server on a laptop. AIUI, PubSubHubbub has none of those
>> problems and is better than the current IRC solution in just about
>> every way.
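
For anyone unfamiliar with the protocol Jeremy mentions, the subscriber side of
PubSubHubbub boils down to one form-encoded POST to a hub (a rough sketch; only
the hub.* parameter names come from the spec, while the topic feed and callback
URLs below are hypothetical):

import urllib.parse
import urllib.request

# The hub URL is Google's public hub; the topic and callback are hypothetical.
HUB = "https://pubsubhubbub.appspot.com/"
TOPIC = "https://www.wikidata.org/recent-changes.atom"
CALLBACK = "https://mirror.example.org/push-endpoint"

def subscribe():
    """Ask the hub to push new entries of TOPIC to CALLBACK. The hub then
    verifies the subscription by sending a GET challenge to the callback,
    which must echo hub.challenge back."""
    body = urllib.parse.urlencode({
        "hub.mode": "subscribe",
        "hub.topic": TOPIC,
        "hub.callback": CALLBACK,
        "hub.verify": "async",
    }).encode("utf-8")
    request = urllib.request.Request(HUB, data=body)
    with urllib.request.urlopen(request) as response:
        return response.status  # 202 Accepted = hub will verify asynchronously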
>>
>> We could potentially even replace the current cross-DB job queue
>> insert craziness with PubSubHubbub for use on the cluster internally.
>>
>> -Jeremy
>>
>> ___
>> Wikidata-l mailing list
>> Wikidata-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>>
>>
>
> --
> Dipl. Inf. Sebastian Hellmann
> Department of Computer Science, University of Leipzig
> Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
> http://dbpedia.org/Wiktionary , http://dbpedia.org
> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
> Research Group: http://aksw.org
>
> ___
> Wikidata-l mailing list
> Wikidata-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-l

Re: [Wikidata-l] [Wikidata-I] [GSoC 2013] Mobilize Wikidata Proposal Draft

2013-04-26 Thread Daniel Kinzler
On 26.04.2013 06:06, Pragun Bhutani wrote:
> Hello,
> 
> I've been having discussions about my GSoC 2013 project with the Wikidata 
> group
> on the IRC(#mediawiki-wikidata) for a few days and have completed the first
> draft of my proposal. I'd really appreciate some feedback on it. I welcome any
> queries that you may have and would love to get tips on how to improve it.
> 
> http://www.mediawiki.org/wiki/User:Pragunbhutani/GSoC_2013_Proposal

Sounds very good to me already!

I'd like to mention though that "make Wikidata usable on mobile devices" and
"make Wikidata editable without JavaScript" are two pretty separate concerns.
While I would love to see the latter happening too, it is not a requirement for
getting the former to work.

So, my advice is: make sure you don't end up trying to do two projects at once.
If there is time for the non-JS editing stuff, great, but if there isn't, a
complete non-JS *view* without full editing capabilities would be sufficient for
supporting the mobile version.

Another note: Daniel Werner and Henning Snater are the people most involved with
designing CSS and JS for the Wikibase UI.

Good luck and have fun,
-- daniel

-- 
Daniel Kinzler, Softwarearchitekt
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] [Mediawiki-api-announce] BREAKING CHANGE: Wikidata langlinks handling in action=parse, list=allpages, list=langbacklinks, prop=langlinks

2013-04-26 Thread Daniel Kinzler
On 25.04.2013 21:43, Yuri Astrakhan wrote:
> CCing wikidata.
> 
> I don't think this is a good approach. We shouldn't be breaking API just 
> because
> there is a new under-the-hood feature (wikibase). 

This is not a breaking change to the MediaWiki API at all. The hook did not
exist before. Things not using the hook keep working exactly as before.

Only once Wikidata starts using the hook does the behavior of *Wikipedia's* API
change (from including external links by default to not including them).

One could actually see this as fixing a bug: currently, "external" language
links are mis-reported as being "local" language links. This is being fixed.

> From the API client's
> perspective, it should work as before, plus there should be an extra flag
> notifying if the sitelink is stored in wikidata or locally. Sitelinks might be
> the first, but not the last change - e.g. categories, etc.

The "external" links could be included per default by ApiQueryLangLinks; I did
not do this for performance reasons (considering the hook makes paging a lot
more difficult, and may result in a lot more database queries).

Anomie said he'd think about making this less costly.
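
For reference, this is roughly what a client-side langlinks request looks like
(a sketch against the standard prop=langlinks module; whether Wikidata-provided
links show up in the result is exactly the behaviour discussed above):

import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def langlinks(title):
    """Return the language links the API reports for one page. With the new
    hook unused (or used with the default behaviour described above), only
    locally defined links are returned."""
    query = urllib.parse.urlencode({
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllimit": "500",
        "format": "json",
    })
    with urllib.request.urlopen(API + "?" + query) as response:
        data = json.loads(response.read().decode("utf-8"))
    page = next(iter(data["query"]["pages"].values()))
    return page.get("langlinks", [])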

> As for the implementation, it seems the hook approach might not satisfy all 
> the
> usage scenarios:
> 
> * Given a set of pages (pageset), give all the sitelinks (possibly filtered 
> with
> a set of wanted languages). Rendering page for the UI would use this approach
> with just one page.

You want the hook to work on a more complex structure, changing the link sets
for multiple pages?

Possible, but I don't think it's helpful. For any non-trivial set of pages, we'd
be in danger of running out of memory, and some kind of chunking would be
needed, complicating things even more. Also, implementing a handler for a hook
that handles such a complex structure is quite painful and error prone.
Assembling the result from multiple calls to a simple hook seems to make more
sense to me, which is what I implemented in  Idfcdc53af.

> * langbacklinks - get a list of pages linking to a site.

Yes, that would only consider locally defined links. As I understand, this query
is mainly used to find and fix broken links. So it makes sense to only include
the ones that are actually defined (and fixable) locally.

> * filtering based on having/not having specific langlink for other modules. 
> E.g.
> list all pages that have/don't have a link to a site X.

Same as above.

> * alllanglinks (not yet implemented, but might be to match corresponding
> allcategories, ...) - list all existing langlinks in the site.

Same as above. I believe the sensible semantics is "list all langlinks *defined*
on the site". At least by default.

For alllanglinks, I can imagine how to do this efficiently for the wikibase
case, but not for a generic hook that can manipulate sitelinks.

> We could debate the need of some of these scenarios, but I feel that we
> shouldn't be breaking existing API.

Again: it doesn't. The API reports what is defined and stored locally, as
before. Wikidata starting to use the new hook may break expectations about the
data returned by Wikipedia's API, but that's a separate issue, I think.

-- daniel

-- 
Daniel Kinzler, Softwarearchitekt
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] WikiData change propagation for third parties

2013-04-26 Thread Yuri Astrakhan
Recently I spoke with Wikia, and being able to subscribe to the recent
changes feed is a very important feature for them. Apparently polling the API's
recent changes puts much higher stress on the system than subscribing.

Now, we don't need to implement publishing of all the data from the start -
just the fact that certain items have changed; they can later be requested by
the usual means. But it would be good to implement this system for all of the
API, not just Wikidata.
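
For comparison, the polling approach in question is roughly the following (a
sketch against the standard list=recentchanges module; continuation handling is
left out):

import json
import urllib.parse
import urllib.request

API = "https://www.wikidata.org/w/api.php"

def recent_changes(since):
    """Poll recent changes newer than `since` (e.g. "2013-04-26T00:00:00Z").
    Every poll costs a request even when nothing changed, which is why a
    push/subscription model is cheaper for the server. A real client would
    also follow the continuation parameters to page past 500 results."""
    query = urllib.parse.urlencode({
        "action": "query",
        "list": "recentchanges",
        "rcstart": since,
        "rcdir": "newer",
        "rcprop": "title|ids|timestamp",
        "rclimit": "500",
        "format": "json",
    })
    with urllib.request.urlopen(API + "?" + query) as response:
        data = json.loads(response.read().decode("utf-8"))
    return data["query"]["recentchanges"]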


On Fri, Apr 26, 2013 at 3:13 AM, Dimitris Kontokostas wrote:

> Dear Jeremy, all,
>
> In addition to what Sebastian said, in DBpedia Live we use the OAI-PMH
> protocol to get update feeds for English, German & Dutch Wikipedia.
> This OAI-PMH implementation [1] is very convenient for what we need (and I
> guess for most people) because it uses the latest modification date for
> update publishing.
> So when we ask for updates after time X it returns a list of articles with
> modification date after X, no matter how many times they were edited in
> between.
>
> This is very easy for you to support (no need for extra hardware, just an
> extra table / index) and suited best for most use cases.
> What most people need in the end is to know which pages changed since time
> X. Fine grained details are for special type of clients.
>
> Best,
> Dimitris
>
> [1] http://www.mediawiki.org/wiki/Extension:OAIRepository
>
>
> On Fri, Apr 26, 2013 at 9:40 AM, Sebastian Hellmann <
> hellm...@informatik.uni-leipzig.de> wrote:
>
>> Dear Jeremy,
>> please read email from Daniel Kinzler on this list from 26.03.2013 18:26
>> :
>>
>>  * A dispatcher needs about 3 seconds to dispatch 1000 changes to a
>>> client wiki.
>>> * Considering we have ~300 client wikis, this means one dispatcher can
>>> handle
>>> about 4000 changes per hour.
>>> * We currently have two dispatchers running in parallel (on a single
>>> box, hume),
>>> that makes a capacity of 8000 changes/hour.
>>> * We are seeing roughly 17000 changes per hour on wikidata.org - more
>>> than twice
>>> our dispatch capacity.
>>> * I want to try running 6 dispatcher processes; that would give us the
>>> capacity
>>> to handle 24000 changes per hour (assuming linear scaling).
>>>
>>
>> 1.  Somebody needs to run the Hub and it needs to scale. Looks like the
>> protocol was intended to save some traffic, not to dispatch a massive
>> amount of messages / per day to a large number of clients. Again, I am not
>> familiar, how efficient PubSubHubbub is. What kind of hardware is needed to
>> run this, effectively? Do you have experience with this?
>>
>> 2. Somebody will still need to run and maintain the Hub and feed all
>> clients. I was offering to host one of the hubs for DBpedia users, but I am
>> not sure, whether we have that capacity.
>>
>> So we should use IRC RC + http request to the changed page as fallback?
>>
>> Sebastian
>>
>> On 26.04.2013 08:06, Jeremy Baron wrote:
>>
>>Hi,
>>>
>>> On Fri, Apr 26, 2013 at 5:29 AM, Sebastian Hellmann
>>> >
>>> wrote:
>>>
 Well, PubSubHubbub is a nice idea. However it clearly depends on two
 factors:
 1. whether Wikidata sets up such an infrastructure (I need to check
 whether we have capacities, I am not sure atm)

>>> Capacity for what? The infrastructure should not be a problem.
>>> (famous last words, can look more closely tomorrow. but I'm really not
>>> worried about it) And you don't need any infrastructure at all for
>>> development; just use one of google's public instances.
>>>
>>>  2. whether performance is good enough to handle high-volume publishers

>>> Again, how do you mean?
>>>
>>>  Basically, polling to recent changes [1] and then do a http request to
 the individual pages should be fine for a start. So I guess this is what we
 will implement, if there aren't any better suggestions.
 The whole issue is problematic and the DBpedia project would be happy,
 if this were discussed and decided right now, so we can plan development.

 What is the best practice to get updates from Wikipedia at the moment?

>>> I believe just about everyone uses the IRC feed from
>>> irc.wikimedia.org.
>>> https://meta.wikimedia.org/wiki/IRC/Channels#Raw_feeds
>>>
>>> I imagine wikidata will or maybe already does propagate changes to a
>>> channel on that server but I can imagine IRC would not be a good
>>> method for many Instant data repo users. Some will not be able to
>>> sustain a single TCP connection for extended periods, some will not be
>>> able to use IRC ports at all, and some may go offline periodically.
>>> e.g. a server on a laptop. AIUI, PubSubHubbub has none of those
>>> problems and is better than the current IRC solution in just about
>>> every way.
>>>
>>> We could potentially even replace the current cross-DB job queue
>>> insert craziness with PubSubHubbub for use on the cluster internally.
>>>
>>> -Jeremy
>>>

Re: [Wikidata-l] WikiData change propagation for third parties

2013-04-26 Thread Denny Vrandečić
The third party propagation is not very high on our priority list. Not
because it is not important, but because there are things that are even
more important - like getting it to work for Wikipedia :) And this seems to
be stabilizing.

What we have, for now:

* We have the broadcast of all edits through IRC.

* One could poll recent changes, but with 200-450 edits per minute, this
might get problematic.

* We do have the OAIRepository extension installed on Wikidata. Did anyone
try that?

Besides that, we are currently moving all our dispatches to Redis, which
has built-in support for PubSubHubbub, so we will probably have some
support for that at some point. I cannot make promises with regard to the
timeline of that, though. It is still being implemented, and needs to be
fully tested and deployed, and after that it might still have some rough
edges. So, it *could* be there in two to three months, but I cannot promise
that.
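
To give an idea of what consuming such a feed could look like on the client
side, here is a minimal sketch built on Redis's publish/subscribe mechanism
(using the redis Python package; the host and channel name are purely
hypothetical, since nothing has been announced yet):

import redis  # pip install redis

# Hypothetical connection details and channel name.
client = redis.StrictRedis(host="dispatch.example.org", port=6379)

def follow_changes(channel="wikidata.changes"):
    """Block on the channel and yield one payload per published change
    notification; the payload format would be whatever the dispatcher
    decides to publish, e.g. a JSON blob naming the changed entity."""
    pubsub = client.pubsub()
    pubsub.subscribe(channel)
    for message in pubsub.listen():
        if message["type"] == "message":
            yield message["data"]

# Example usage:
# for change in follow_changes():
#     print(change)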

The other three options are not sufficient?

Cheers,
Denny




2013/4/26 Yuri Astrakhan 

> Recently I spoke with Wikia, and being able to subscribe to the recent
> changes feed is a very important feature to them. Apparently polling API's
> recent changes creates a much higher stress on the system than subscribing.
>
> Now, we don't need (from the start)  to implement publishing of all the
> data - just the fact that certain items have changed, and they can later be
> requested by usual means, but it would be good to implement this system for
> all of the API, not just wikidata.
>
>
> On Fri, Apr 26, 2013 at 3:13 AM, Dimitris Kontokostas 
> wrote:
>
>> Dear Jeremy, all,
>>
>> In addition to what Sebastian said, in DBpedia Live we use the OAI-PMH
>> protocol to get update feeds for English, German & Dutch Wikipedia.
>> This OAI-PMH implementation [1] is very convenient for what we need (and
>> I guess for most people) because it uses the latest modification date for
>> update publishing.
>> So when we ask for updates after time X it returns a list of articles
>> with modification date after X, no matter how many times they were edited
>> in between.
>>
>> This is very easy for you to support (no need for extra hardware, just an
>> extra table / index) and suited best for most use cases.
>> What most people need in the end is to know which pages changed since
>> time X. Fine grained details are for special type of clients.
>>
>> Best,
>> Dimitris
>>
>> [1] http://www.mediawiki.org/wiki/Extension:OAIRepository
>>
>>
>> On Fri, Apr 26, 2013 at 9:40 AM, Sebastian Hellmann <
>> hellm...@informatik.uni-leipzig.de> wrote:
>>
>>> Dear Jeremy,
>>> please read email from Daniel Kinzler on this list from 26.03.2013 18:26
>>> :
>>>
>>>  * A dispatcher needs about 3 seconds to dispatch 1000 changes to a
 client wiki.
 * Considering we have ~300 client wikis, this means one dispatcher can
 handle
 about 4000 changes per hour.
 * We currently have two dispatchers running in parallel (on a single
 box, hume),
 that makes a capacity of 8000 changes/hour.
 * We are seeing roughly 17000 changes per hour on wikidata.org - more
 than twice
 our dispatch capacity.
 * I want to try running 6 dispatcher processes; that would give us the
 capacity
 to handle 24000 changes per hour (assuming linear scaling).

>>>
>>> 1.  Somebody needs to run the Hub and it needs to scale. Looks like the
>>> protocol was intended to save some traffic, not to dispatch a massive
>>> amount of messages / per day to a large number of clients. Again, I am not
>>> familiar, how efficient PubSubHubbub is. What kind of hardware is needed to
>>> run this, effectively? Do you have experience with this?
>>>
>>> 2. Somebody will still need to run and maintain the Hub and feed all
>>> clients. I was offering to host one of the hubs for DBpedia users, but I am
>>> not sure, whether we have that capacity.
>>>
>>> So we should use IRC RC + http request to the changed page as fallback?
>>>
>>> Sebastian
>>>
>>> On 26.04.2013 08:06, Jeremy Baron wrote:
>>>
>>>Hi,

 On Fri, Apr 26, 2013 at 5:29 AM, Sebastian Hellmann
 >
 wrote:

> Well, PubSubHubbub is a nice idea. However it clearly depends on two
> factors:
> 1. whether Wikidata sets up such an infrastructure (I need to check
> whether we have capacities, I am not sure atm)
>
 Capacity for what? The infrastructure should not be a problem.
 (famous last words, can look more closely tomorrow. but I'm really not
 worried about it) And you don't need any infrastructure at all for
 development; just use one of google's public instances.

  2. whether performance is good enough to handle high-volume publishers
>
 Again, how do you mean?

  Basically, polling to recent changes [1] and then do a http request to
> the individual pages should be fine for a start. So I guess this is what 
> we
> will implement, if there aren't any better suggestions.
>>>

Re: [Wikidata-l] WikiData change propagation for third parties

2013-04-26 Thread Dimitris Kontokostas
Hi Denny

On Fri, Apr 26, 2013 at 5:56 PM, Denny Vrandečić <
denny.vrande...@wikimedia.de> wrote:

> The third party propagation is not very high on our priority list. Not
> because it is not important, but because there are things that are even
> more important - like getting it to work for Wikipedia :) And this seems to
> be stabilizing.
>
> What we have, for now:
>
> * We have the broadcast of all edits through IRC.
>
> * One could poll recent changes, but with 200-450 edits per minute, this
> might get problematic.
>
> * We do have the OAIRepository extension installed on Wikidata. Did anyone
> try that?
>

Great! I didn't know that. I see it installed (
http://www.wikidata.org/wiki/Special:OAIRepository), but it is password
protected. Can we (DBpedia) request access?

Cheers,
Dimitris


>
> Besides that, we are currently moving our dispatches all to Redis, which
> has built-in-support for PubSubHubbub, so we will probably have some
> support for that at some point. I cannot make promises with regards to
> timeline of that, though. It is still in implementation, and needs to be
> fully tested and deployed, and after that it might have some rough edges
> still. So, it *could* be there in two to three months, but I cannot promise
> that.
>
> The other three options are not sufficient?
>
> Cheers,
> Denny
>
>
>
>
> 2013/4/26 Yuri Astrakhan 
>
>> Recently I spoke with Wikia, and being able to subscribe to the recent
>> changes feed is a very important feature to them. Apparently polling API's
>> recent changes creates a much higher stress on the system than subscribing.
>>
>> Now, we don't need (from the start)  to implement publishing of all the
>> data - just the fact that certain items have changed, and they can later be
>> requested by usual means, but it would be good to implement this system for
>> all of the API, not just wikidata.
>>
>>
>> On Fri, Apr 26, 2013 at 3:13 AM, Dimitris Kontokostas 
>> wrote:
>>
>>> Dear Jeremy, all,
>>>
>>> In addition to what Sebastian said, in DBpedia Live we use the OAI-PMH
>>> protocol to get update feeds for English, German & Dutch Wikipedia.
>>> This OAI-PMH implementation [1] is very convenient for what we need (and
>>> I guess for most people) because it uses the latest modification date for
>>> update publishing.
>>> So when we ask for updates after time X it returns a list of articles
>>> with modification date after X, no matter how many times they were edited
>>> in between.
>>>
>>> This is very easy for you to support (no need for extra hardware, just
>>> an extra table / index) and suited best for most use cases.
>>> What most people need in the end is to know which pages changed since
>>> time X. Fine grained details are for special type of clients.
>>>
>>> Best,
>>> Dimitris
>>>
>>> [1] http://www.mediawiki.org/wiki/Extension:OAIRepository
>>>
>>>
>>> On Fri, Apr 26, 2013 at 9:40 AM, Sebastian Hellmann <
>>> hellm...@informatik.uni-leipzig.de> wrote:
>>>
 Dear Jeremy,
 please read email from Daniel Kinzler on this list from 26.03.2013 18:26
 :

  * A dispatcher needs about 3 seconds to dispatch 1000 changes to a
> client wiki.
> * Considering we have ~300 client wikis, this means one dispatcher can
> handle
> about 4000 changes per hour.
> * We currently have two dispatchers running in parallel (on a single
> box, hume),
> that makes a capacity of 8000 changes/hour.
> * We are seeing roughly 17000 changes per hour on wikidata.org - more
> than twice
> our dispatch capacity.
> * I want to try running 6 dispatcher processes; that would give us the
> capacity
> to handle 24000 changes per hour (assuming linear scaling).
>

 1.  Somebody needs to run the Hub and it needs to scale. Looks like the
 protocol was intended to save some traffic, not to dispatch a massive
 amount of messages / per day to a large number of clients. Again, I am not
 familiar, how efficient PubSubHubbub is. What kind of hardware is needed to
 run this, effectively? Do you have experience with this?

 2. Somebody will still need to run and maintain the Hub and feed all
 clients. I was offering to host one of the hubs for DBpedia users, but I am
 not sure, whether we have that capacity.

 So we should use IRC RC + http request to the changed page as fallback?

 Sebastian

 On 26.04.2013 08:06, Jeremy Baron wrote:

Hi,
>
> On Fri, Apr 26, 2013 at 5:29 AM, Sebastian Hellmann
> >
> wrote:
>
>> Well, PubSubHubbub is a nice idea. However it clearly depends on two
>> factors:
>> 1. whether Wikidata sets up such an infrastructure (I need to check
>> whether we have capacities, I am not sure atm)
>>
> Capacity for what? The infrastructure should not be a problem.
> (famous last words, can look more closely tomorrow. but I'm really not
> worried about it) And you don't need any infrastructure 

Re: [Wikidata-l] WikiData change propagation for third parties

2013-04-26 Thread Daniel Kinzler
On 26.04.2013 16:56, Denny Vrandečić wrote:
> The third party propagation is not very high on our priority list. Not because
> it is not important, but because there are things that are even more 
> important -
> like getting it to work for Wikipedia :) And this seems to be stabilizing.
> 
> What we have, for now:
> 
> * We have the broadcast of all edits through IRC.

This interface is quite unreliable: the output can't be parsed unambiguously
and may get truncated. I did implement notifications via XMPP several years
ago, but it never went beyond a proof of concept. Have a look at the XMLRC
extension if you are interested.

> * One could poll recent changes, but with 200-450 edits per minute, this might
> get problematic.

Well, polling isn't really the problem; fetching all the content is. And you'd
need to do that no matter how you get the information about what has changed.

> * We do have the OAIRepository extension installed on Wikidata. Did anyone 
> try that?

In principle that is a decent update interface, but I'd recommend not using OAI
before we have implemented feature 47714 ("Support RDF and API serializations
of entity data via OAI-PMH"). Right now, what you'd get from there would be our
*internal* JSON representation, which is different from what the API returns,
and may change at any time without notice.
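
Until then, the stable serialization to rely on is the one the regular API
returns, e.g. via wbgetentities (a minimal sketch; Q42 is just an example ID):

import json
import urllib.parse
import urllib.request

API = "https://www.wikidata.org/w/api.php"

def get_entity(entity_id="Q42"):
    """Fetch one entity in the public API serialization -- the format clients
    should rely on, unlike the internal JSON stored in the page text."""
    query = urllib.parse.urlencode({
        "action": "wbgetentities",
        "ids": entity_id,
        "format": "json",
    })
    with urllib.request.urlopen(API + "?" + query) as response:
        data = json.loads(response.read().decode("utf-8"))
    return data["entities"][entity_id]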

-- daniel

-- 
Daniel Kinzler, Softwarearchitekt
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] WikiData change propagation for third parties

2013-04-26 Thread Dimitris Kontokostas
Hi Daniel,

On Fri, Apr 26, 2013 at 6:15 PM, Daniel Kinzler  wrote:

> On 26.04.2013 16:56, Denny Vrandečić wrote:
> > The third party propagation is not very high on our priority list. Not
> because
> > it is not important, but because there are things that are even more
> important -
> > like getting it to work for Wikipedia :) And this seems to be
> stabilizing.
> >
> > What we have, for now:
> >
> > * We have the broadcast of all edits through IRC.
>
> This interface is quite unreliable, the output can't be parsed in an
> unambiguous
> way, and may get truncated. I did implement notifications via XMPP several
> years
> ago, but it never went beyond a proof of concept. Have a look at the XMLRC
> extension if you are interested.
>
> > * One could poll recent changes, but with 200-450 edits per minute, this
> might
> > get problematic.
>
> Well, polling isn't really the problem, fetching all the content is. And
> you'd
> need to do that no matter how you get the information of what has changed.
>
> > * We do have the OAIRepository extension installed on Wikidata. Did
> anyone try that?
>
> In principle that is a decent update interface, but I'd recommend not to
> use OAI
>  before we have implemented feature 47714 ("Support RDF and API
> serializations
> of entity data via OAI-MPH"). Right now, what you'd get from there would
> be our
> *internal* JSON representation, which is different from what the API
> returns,
> and may change at any time without notice.
>

What we do right now in DBpedia Live is that we have a local clone of
Wikipedia that stays in sync using the OAIRepository extension. This is
done so that we can use our local copy as we please.

The local copy also publishes updates with OAI-PMH, which we use to get the
list of modified page ids. Once we have the page ids, we use the normal
MediaWiki API to fetch the actual page content.
So, feature 47714 should not be a problem in our case, since we don't need
the data serialized directly from OAI-PMH.
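
That second step is essentially a revisions query keyed by page id (a rough
sketch; the actual DBpedia Live code is more involved):

import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def fetch_wikitext(page_ids):
    """Fetch the latest wikitext for a batch of page ids (at most 50 ids per
    request for anonymous clients)."""
    query = urllib.parse.urlencode({
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "pageids": "|".join(str(pid) for pid in page_ids),
        "format": "json",
    })
    with urllib.request.urlopen(API + "?" + query) as response:
        data = json.loads(response.read().decode("utf-8"))
    pages = data["query"]["pages"]
    return {pid: page["revisions"][0]["*"]
            for pid, page in pages.items() if "revisions" in page}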

Cheers,
Dimitris


>
> -- daniel
>
> --
> Daniel Kinzler, Softwarearchitekt
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
>
> ___
> Wikidata-l mailing list
> Wikidata-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>



-- 
Kontokostas Dimitris
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] WikiData change propagation for third parties

2013-04-26 Thread Daniel Kinzler
On 26.04.2013 17:31, Dimitris Kontokostas wrote:
> What we do right now in DBpedia Live is that we have a local clone of 
> Wikipedia
> that get's in sync using the OAIRepository extension. This is done to abuse 
> our
> local copy as we please.

It would be awesome if this Just Worked (tm) for Wikidata too, but I highly
doubt it. You can use the OAI interface to get (unstable) data from Wikidata,
but I don't think a magic import from OAI will work. Generally, importing Wikidata
entities into another wiki is problematic because of entity IDs and uniqueness
constraints. If the target wiki is perfectly in sync, it might work...

Are you going to try this? Would be great if you could give us feedback!

-- daniel
-- 
Daniel Kinzler, Softwarearchitekt
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] WikiData change propagation for third parties

2013-04-26 Thread Daniel Kinzler
On 26.04.2013 17:09, Dimitris Kontokostas wrote:
> * We do have the OAIRepository extension installed on Wikidata. Did anyone
> try that?
> 
> 
> Great! Didn't know that. I see it installed
> (http://www.wikidata.org/wiki/Special:OAIRepository) but it is password
> protected, can we (DBpedia) request access?

Sure, but you already have access; DBpedia Live uses it. The password for
Wikidata is the same as for Wikipedia (I don't remember it...).

You guys are the only reason the interface still exists :) DBpedia is the only
(regular) external user (LuceneSearch is the only internal user).

Note that there's nobody really maintaining this interface, so finding an
alternative would be great. Or deciding that we (or more precisely, the Wikimedia
Foundation - there's not much the Wikidata team can do there) really want to
support OAI in the future, and then overhauling the implementation.

-- daniel

-- 
Daniel Kinzler, Softwarearchitekt
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] WikiData change propagation for third parties

2013-04-26 Thread Sebastian Hellmann

Hi Daniel,

On 26.04.2013 18:01, Daniel Kinzler wrote:
You guys are the only reason the interface still exists :) DBpedia is 
the only (regular) external user (LuceneSearch is the only internal 
user). Note that there's nobody really maintaining this interface, so 
finding an alternative would be great. Or deciding we (or more 
precisely, the Wikimedia Foundation - there's not much the Wikidata 
team can do there) really want to support OAI in the future, and then 
overhaul the implementation. -- daniel 


Actually, we asked quite often what we should switch to and what would
be the best way for us to create a live mirror. We just never received
an answer...
We were reluctant to pound the Wikipedia API with 150k requests per day
(that is the number of edits on some days), because we were afraid of
getting IP-blocked. Also, there was no official clearance that we could do so.


If you are telling us now that IRC is no good, what other way is there
to create a live, in-sync mirror?

-- Sebastian


--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events: NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org, 
Deadline: *July 8th*)
Projects: http://nlp2rdf.org , http://linguistics.okfn.org , 
http://dbpedia.org/Wiktionary , http://dbpedia.org

Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] weekly summary #55

2013-04-26 Thread Lydia Pintscher
Heya folks :)

Lots of good stuff happened around Wikidata this week. Your summary
is here: http://meta.wikimedia.org/wiki/Wikidata/Status_updates/2013_04_26


Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Community Communications for Technical Projects

Wikimedia Deutschland e.V.
Obentrautstr. 72
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg
under number 23855 Nz. Recognized as a charitable organization by the
Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] [Wikidata-I] [GSoC 2013] Mobilize Wikidata Proposal Draft

2013-04-26 Thread Pragun Bhutani
I agree with you on that. I did make that mistake while writing down my
proposal at first, but Sumanah corrected me early on and advised me to
limit the scope to 6 weeks.

That's why I've decided to limit my project to "Make Wikidata usable on
mobile devices."
If that's sorted well in time, I'd love to add some basic editing
functionality too, but I've kept it in the "if time permits" category.

On Fri, Apr 26, 2013 at 2:44 PM, Daniel Kinzler  wrote:

> On 26.04.2013 06:06, Pragun Bhutani wrote:
> > Hello,
> >
> > I've been having discussions about my GSoC 2013 project with the
> Wikidata group
> > on the IRC(#mediawiki-wikidata) for a few days and have completed the
> first
> > draft of my proposal. I'd really appreciate some feedback on it. I
> welcome any
> > queries that you may have and would love to get tips on how to improve
> it.
> >
> > http://www.mediawiki.org/wiki/User:Pragunbhutani/GSoC_2013_Proposal
>
> Sounds very good to me already!
>
> I'd like to mention though that "make Wikidata usable on mobile devices"
> and
> "make Wikidata editable without JavaScript" are two pretty separate
> concerns.
> While I would love to see the latter happening too, it is not a
> requirement for
> getting the former to work.
>
> So, my advice is: make sure you don't end up trying to do two projects at
> once.
> If there is time for the non-JS editing stuff, great, but if there isn't, a
> complete non-JS *view* without full editing capabilities would be
> sufficient for
> supporting the mobile version.
>
> Another note: Daniel Werner and Henning Snater are the people most
> involved with
> designing CSS and JS for the Wikibase UI.
>
> Good luck and have fun,
> -- daniel
>
> --
> Daniel Kinzler, Softwarearchitekt
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
>
> ___
> Wikidata-l mailing list
> Wikidata-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>



-- 
Pragun Bhutani
http://pragunbhutani.in
Skype : pragun.bhutani
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l