Re: [Wikidata-tech] Wikidata entity dumps on Labs

2016-05-18 Thread Marius Hoch
As far as I remember, the sync only happens once a day. Depending on when
dump creation finishes, this means the dump can show up on Labs with a
severe delay. If the dumps not being up to date is an issue, I'd rather
suggest just doing two dumps a week; dump creation is cheap.
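
To illustrate how a once-a-day sync turns the dump's finish time into a
delay, here is a minimal sketch. The sync hour used in the example
(18:00 UTC) is purely an assumption for illustration; the thread does not
say when the sync actually runs, and the function names are hypothetical.

```python
from datetime import datetime, time, timedelta


def next_daily_sync(finished: datetime, sync_at: time) -> datetime:
    """Return the first daily sync run at or after `finished`.

    Both arguments are treated as naive UTC values.
    """
    candidate = datetime.combine(finished.date(), sync_at)
    if candidate < finished:
        # Today's sync already ran before the dump finished: wait for tomorrow's.
        candidate += timedelta(days=1)
    return candidate


def sync_delay(finished: datetime, sync_at: time) -> timedelta:
    """How long a finished dump waits until the next daily sync picks it up."""
    return next_daily_sync(finished, sync_at) - finished


if __name__ == "__main__":
    # Dump finished at 20:00 UTC; the 18:00 sync time is an assumed value.
    print(sync_delay(datetime(2016, 5, 9, 20, 0), time(18, 0)))  # 22:00:00
```

Under this assumption, a dump that finishes shortly after the daily sync
waits almost a full day, which would match the delays reported in this
thread.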

Cheers,

Marius


On 18.05.2016 10:56, Lydia Pintscher wrote:
On Wed, May 11, 2016 at 9:17 AM Markus Krötzsch wrote:


On 11.05.2016 08:28, Marius Hoch wrote:
> This time it took quite long to produce the dump in the first place
> (until after 8pm UTC for the gzip version, the bzip2 one didn't even
> finish until Tuesday).
>
> I presume that is due to one of the shards picking a slow database slave
> which significantly slows that shard down. We should get new database
> slaves soon, thus I presume that this problem is going to disappear soon.
>
> Cheers,
>
> Marius

That's alright, I was not actually worried about slow dump generation.
What I noticed was that the dumps are available online many hours before
they appear on Labs. I would like to use the central dump on labs
instead of downloading my own copy each time, but right now this delays
dump processing further. I was wondering who is providing the central
entity dumps on labs.


Adam, Marius: Do either of you know why there is this delay and if 
there is anything we can do about it?


Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de 

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit
by the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.



___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech




Re: [Wikidata-tech] Wikidata entity dumps on Labs

2016-05-18 Thread Lydia Pintscher
On Wed, May 11, 2016 at 9:17 AM Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> On 11.05.2016 08:28, Marius Hoch wrote:
> > This time it took quite long to produce the dump in the first place
> > (until after 8pm UTC for the gzip version, the bzip2 one didn't even
> > finish until Tuesday).
> >
> > I presume that is due to one of the shards picking a slow database slave
> > which significantly slows that shard down. We should get new database
> > slaves soon, thus I presume that this problem is going to disappear soon.
> >
> > Cheers,
> >
> > Marius
>
> That's alright, I was not actually worried about slow dump generation.
> What I noticed was that the dumps are available online many hours before
> they appear on Labs. I would like to use the central dump on labs
> instead of downloading my own copy each time, but right now this delays
> dump processing further. I was wondering who is providing the central
> entity dumps on labs.
>

Adam, Marius: Do either of you know why there is this delay and if there is
anything we can do about it?

Cheers
Lydia
-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit
by the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.


Re: [Wikidata-tech] Wikidata entity dumps on Labs

2016-05-11 Thread Markus Krötzsch

On 11.05.2016 08:28, Marius Hoch wrote:

This time it took quite long to produce the dump in the first place
(until after 8pm UTC for the gzip version, the bzip2 one didn't even
finish until Tuesday).

I presume that is due to one of the shards picking a slow database slave
which significantly slows that shard down. We should get new database
slaves soon, thus I presume that this problem is going to disappear soon.

Cheers,

Marius


That's alright, I was not actually worried about slow dump generation. 
What I noticed was that the dumps are available online many hours before 
they appear on Labs. I would like to use the central dump on labs 
instead of downloading my own copy each time, but right now this delays 
dump processing further. I was wondering who is providing the central 
entity dumps on labs.
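
A tool that prefers the central copy could check for it first and only
download when it is missing. The following is a minimal sketch of that
check. The base path is the one mentioned in this thread; the `.json.gz`
suffix filter and the function name are assumptions, and a non-empty file
may still be in the middle of being copied, so this is not a completeness
check.

```python
from pathlib import Path
from typing import Optional

# Directory named in this thread; everything else below is illustrative.
LABS_ENTITIES_DIR = Path("/public/dumps/public/wikidatawiki/entities")


def find_local_dump(date: str,
                    base_dir: Path = LABS_ENTITIES_DIR,
                    suffix: str = ".json.gz") -> Optional[Path]:
    """Return a non-empty dump file for `date` (YYYYMMDD), or None.

    None covers both a missing directory and the empty-directory case
    described in the thread.
    """
    dump_dir = base_dir / date
    if not dump_dir.is_dir():
        return None
    candidates = sorted(
        p for p in dump_dir.iterdir()
        if p.is_file() and p.name.endswith(suffix) and p.stat().st_size > 0
    )
    return candidates[0] if candidates else None


if __name__ == "__main__":
    dump = find_local_dump("20160509")
    if dump is None:
        print("No central copy yet; fall back to downloading the dump.")
    else:
        print(f"Using central copy: {dump}")
```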


Cheers,

Markus



On 10.05.2016 12:05, Markus Krötzsch wrote:

Pushing this up a bit again. The 9 May dump is not available on labs
yet. There is just the empty directory

/public/dumps/public/wikidatawiki/entities/20160509/

I really wonder why it might be taking so long.

Markus


On 02.05.2016 21:36, Markus Kroetzsch wrote:

Hi,

I noticed that there is considerable delay between the weekly Wikidata
JSON dump appearing online and the file appearing on the Labs servers
[1]. For example, the 20160502 dump is online right now, but there is
only an empty directory for this date on Labs.

In retrospect, file modification dates on Labs give the appearance that
the files have been around earlier than they seem to be, but they have
not been available at this time last week either. As it is now, it is
faster to download the dump instead of waiting for the file to show up
in the central location, but it's probably not intended that each tool
gets its own copy. For a weekly dump, half a day of delay is
significant.

Any ideas (including whom to ask)?

Cheers,

Markus

[1] Under /public/dumps/public/wikidatawiki/entities/






Re: [Wikidata-tech] Wikidata entity dumps on Labs

2016-05-11 Thread Marius Hoch
This time it took quite long to produce the dump in the first place 
(until after 8pm UTC for the gzip version, the bzip2 one didn't even 
finish until Tuesday).


I presume that is due to one of the shards picking a slow database slave 
which significantly slows that shard down. We should get new database 
slaves soon, thus I presume that this problem is going to disappear soon.


Cheers,

Marius


On 10.05.2016 12:05, Markus Krötzsch wrote:
Pushing this up a bit again. The 9 May dump is not available on labs 
yet. There is just the empty directory


/public/dumps/public/wikidatawiki/entities/20160509/

I really wonder why it might be taking so long.

Markus


On 02.05.2016 21:36, Markus Kroetzsch wrote:

Hi,

I noticed that there is considerable delay between the weekly Wikidata
JSON dump appearing online and the file appearing on the Labs servers
[1]. For example, the 20160502 dump is online right now, but there is
only an empty directory for this date on Labs.

In retrospect, file modification dates on Labs give the appearance that
the files have been around earlier than they seem to be, but they have
not been available at this time last week either. As it is now, it is
faster to download the dump instead of waiting for the file to show up
in the central location, but it's probably not intended that each tool
gets its own copy. For a weekly dump, half a day of delay is 
significant.


Any ideas (including whom to ask)?

Cheers,

Markus

[1] Under /public/dumps/public/wikidatawiki/entities/






Re: [Wikidata-tech] Wikidata entity dumps on Labs

2016-05-10 Thread Markus Krötzsch
Pushing this up a bit again. The 9 May dump is not available on labs 
yet. There is just the empty directory


/public/dumps/public/wikidatawiki/entities/20160509/

I really wonder why it might be taking so long.

Markus


On 02.05.2016 21:36, Markus Kroetzsch wrote:

Hi,

I noticed that there is considerable delay between the weekly Wikidata
JSON dump appearing online and the file appearing on the Labs servers
[1]. For example, the 20160502 dump is online right now, but there is
only an empty directory for this date on Labs.

In retrospect, file modification dates on Labs give the appearance that
the files have been around earlier than they seem to be, but they have
not been available at this time last week either. As it is now, it is
faster to download the dump instead of waiting for the file to show up
in the central location, but it's probably not intended that each tool
gets its own copy. For a weekly dump, half a day of delay is significant.

Any ideas (including whom to ask)?

Cheers,

Markus

[1] Under /public/dumps/public/wikidatawiki/entities/



