[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-04-02 Thread gerritbot
gerritbot added a comment.

Change 201238 merged by ArielGlenn:
Adopt dumpwikidatajson.sh to the new naming pattern

https://gerrit.wikimedia.org/r/201238


TASK DETAIL
  https://phabricator.wikimedia.org/T72385

REPLY HANDLER ACTIONS
  Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign 
username.

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: ArielGlenn, gerritbot
Cc: Liuxinyu970226, gerritbot, Manybubbles, JanZerebecki, Smalyshev, aude, 
daniel, Wikidata-bugs, Nemo_bis, mkroetzsch, Svick, ArielGlenn, 
Lydia_Pintscher, hoo, jeremyb



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-04-01 Thread Smalyshev
Smalyshev added a comment.

We could generate multiple dump files from the same database; it doesn't have 
to be from JSON. I'm also not sure why JSON and RDF should always have the same 
snapshot - it's a random point (or, given that a dump takes many hours during 
which data changes, a random collection of points) in time, no better than any 
other one.




[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-04-01 Thread daniel
daniel added a comment.

@Smalyshev: You are right that it doesn't have to be based on JSON, but since 
that is our primary data representation, it seems sensible to use it as a basis.

I agree that it doesn't matter much to have the RDF dumps consistent with the 
JSON dumps. But if we make multiple RDF dumps, it's important that they are 
consistent with each other. The easiest way to achieve this is to base them on 
the same JSON dump.

Whether that should block this task is debatable of course. Perhaps it 
shouldn't. The idea was that putting a timestamp in the directory name only 
makes sense if we have consistent dumps. But we can live with inconsistencies 
for a while - it's not like the regular XML dumps were consistent either.




[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-04-01 Thread gerritbot
gerritbot added a comment.

Change 201208 merged by ArielGlenn:
Add new wikidata folders, define dataset folders in puppet

https://gerrit.wikimedia.org/r/201208




[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-04-01 Thread gerritbot
gerritbot added a subscriber: gerritbot.
gerritbot added a comment.

Change 201208 had a related patch set uploaded (by Hoo man):
Add new wikidata folders, define dataset folders in puppet

https://gerrit.wikimedia.org/r/201208




[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-04-01 Thread gerritbot
gerritbot added a comment.

Change 201238 had a related patch set uploaded (by Hoo man):
Adopt dumpwikidatajson.sh to the new naming pattern

https://gerrit.wikimedia.org/r/201238




[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-03-26 Thread mkroetzsch
mkroetzsch added a comment.

@Smalyshev Yes, this is what I was saying. @hoo was proposing to create a 
special directory for truthy based on offline discussion in the office.




[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-03-25 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.

About 2: We didn't add timestamped subdirectories because they would likely be 
confusing. Dumps of different formats or flavors would not be done on the same 
date. And dump creation usually takes more than a day. So finding the right 
subfolder that has the format and flavor you are looking for seems bad.




[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-03-25 Thread mkroetzsch
mkroetzsch added a comment.

@hoo Thanks for the heads up! I do have comments.

(1) I would remove the full and truthy distinction from the path and rather 
make this part of the dump type (for example statements and 
truthy-statements). The reason is that we have many full dumps (terms, 
sitelinks, statements, properties), which can be readily exported in RDF and 
JSON, but we have only one truthy dump and it really is mainly for RDF (at 
least we did not discuss a JSON format for single-triple statements). 
Therefore, it does not seem worthwhile to make a top-level distinction in the 
directory structure for this. For consumers, it is easier if a dump file is 
addressed with four components (projectname, dumptype, date, file format). The 
truthy/full distinction would be another parameter that does not seem to add 
any functionality.

(2) My comment right at the beginning of this bug report was to have 
timestamped subdirectories, just like we have for the main dumps. Maybe you 
have reasons for not having these, but could you explain them here?
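
A minimal sketch of the four-component addressing from point (1), assuming a purely hypothetical directory layout and file name; the thread does not settle on either. The class name `DumpAddress`, its field names, and the layout produced by `path()` are illustrative assumptions, not an existing API.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class DumpAddress:
    """The four components named in point (1); all names here are hypothetical."""
    project: str      # e.g. "wikidatawiki"
    dump_type: str    # e.g. "statements" or "truthy-statements"
    date: str         # e.g. "20150324" (YYYYMMDD)
    file_format: str  # e.g. "json" or "ttl"

    def path(self) -> PurePosixPath:
        # Assumed layout: project/dumptype/date/project-date-dumptype.format.
        # The thread does not fix a layout; this is only one possibility.
        name = f"{self.project}-{self.date}-{self.dump_type}.{self.file_format}"
        return PurePosixPath(self.project, self.dump_type, self.date, name)
```

With this shape, truthy versus full is just another value of `dump_type`, so consumers need no fifth parameter.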




[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-03-25 Thread JanZerebecki
JanZerebecki added a comment.

All of these dumps will be generated by exporting from the DB. AFAIK currently 
the dumps can contain edits that were made after the dump is started. We should 
at some point change this, but we should not block adding RDF for that. The 
result is that currently each dump format might represent slightly different 
data.




[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-03-25 Thread mkroetzsch
mkroetzsch added a comment.

@Lydia_Pintscher I understand this problem, but if you put different dumps for 
different times all in one directory, won't this become quite big over time and 
hard to use? Maybe one should group dumps by how often they are created (and 
have date-directories only below that). For some cases, there does not seem to 
be any problem. For example, creating all RDF dumps from the JSON dump takes 
about 3-6h in total (on labs). So this is easily doable on the same day as the 
JSON dump generation. I am sure that we could also generate alternative JSON 
dumps in comparable time (maybe add an hour to the RDF if you do it in one 
batch). The slow part seems to be the DB export that leads to the first JSON 
dump -- once you have this the other formats should be relatively quick to do.




[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-03-25 Thread JanZerebecki
JanZerebecki added a comment.

> Why would one want to do this?


To be able to use the same code as is used for the linked data endpoint of 
Wikibase. Example: 
https://www.wikidata.org/wiki/Special:EntityData/Q42.rdf?flavor=full (this 
format is not final and not yet to be relied on).

> would guarantee consistent state of all files


It would guarantee that all dump files are inconsistent in the same way. It 
would not achieve the consistency of the JSON dump. Not sure if anyone has a 
use for the former but not the latter. Anyway, making the JSON dumps consistent 
allows both, independently of how the other dumps are generated.




[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-03-25 Thread mkroetzsch
mkroetzsch added a comment.

> All of these dumps will be generated by exporting from the DB.


Why would one want to do this? The JSON dump contains all information we need 
for building the other dumps, and it seems that the generation from the JSON 
dump is much faster, avoids any load on the DB, and would guarantee consistent 
state of all files (same revision status). Moreover, we already have code for 
doing it now (which will be updated to agree with any changes in RDF export 
structures we want).




[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-03-25 Thread mkroetzsch
mkroetzsch added a comment.

@Smalyshev

Re what does consistent mean: to be based on the same input data. All dumps 
are based on Wikidata content. If they are based on the same content, they are 
consistent, otherwise they are not.

Re discussing RDF dump partitioning in 
https://phabricator.wikimedia.org/T93488: Agreed. We are not discussing which 
RDF dumps to have here, only whether they are likely to be well organised by 
distinguishing full and truthy as a primary categorisation that sits above 
format (RDF vs. JSON and other matters).




[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-03-25 Thread Smalyshev
Smalyshev added a comment.

Consistency of dumps in different formats is a questionable thing. What would 
it mean to have JSON and RDF consistent? Of course they'd contain the same 
entities, that's a given, and the data would be kind of alike. But even values 
may differ - e.g. RDF has no standard for representing coordinates, so we have 
to choose something. That something will not be the same as JSON. Also, if we 
want to represent dates in a standard way - e.g. xsd:dateTime - we'd have to 
modify them, slightly or substantially. Same goes for many other things which 
look slightly different - ranks, units, truthy statements, etc. Ultimately, 
we're basing them on the same data set, so excepting bugs we'd have consistency 
on that level, but beyond that I'm not sure what it is.




[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-03-25 Thread mkroetzsch
mkroetzsch added a comment.

@JanZerebecki:

Re using the same code: That's not essential here. All we want is that the 
dumps are the same. It's also not necessary to develop the code twice, since it 
is already there twice anyway. It's just the question if we want to use a slow 
method that keeps people waiting for the dumps for days (as they already do now 
with many other dumps), or a fast one that you can run anywhere (even without 
DB access; on a laptop if you like). The fact that we must have the code in PHP 
too makes it possible to go back to the slow system if it should ever be 
needed, so there is no lock-in. Dump file generation is also not 
operation-critical for Wikidata (the internal SPARQL query will likely be based 
on a live feed, not on dumps). What's not to like?

Re consistency: I meant that the dumps would contain the same information, not 
that they reflect a consistent state of the site. If it is important for you to 
have a defined state, then the dump-based file generation is also your friend: 
one can do the same with the full history dump, where one could exactly specify 
the revision to dump. Probably still as fast as the DB method, but guaranteed 
to provide a globally consistent snapshot (yes, I know, modulo deletions). Not 
sure if this type of consistency is relevant though. Having a guarantee that 
the dump files in various formats are based on the same data, however, would be 
quite useful (e.g., in SPARQL, where you often mix data from truthy and full 
dumps in one query).

Recall that we are discussing this here since Lydia said that the slowness of 
the DB-based exports would be a reason for why we cannot have an (otherwise 
convenient) date-based directory structure. I agree with Lydia that this would 
be a blocker, but in this case it's really one that we can easily remove. The 
code I am talking about is at https://github.com/Wikidata/Wikidata-Toolkit, 
well tested, extensively documented, and partially WMF-funded. Why not make 
this into a community engagement success story? :-)




[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-03-25 Thread Smalyshev
Smalyshev added a comment.

I don't think splitting full and truthy would be too useful, as most query 
engines, except for the absolutely most basic ones, will want both anyway. And 
for JSON we don't even have that distinction I think?




[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-03-24 Thread hoo
hoo added a comment.

Ok, we talked about this in the office and came up with the following:

`https://dumps.wikimedia.org/wikidatawiki/entities/` is the (user visible) base 
path (the actual files would be in `/other/…`), //could// also have a fancy 
html overview page with additional explanations. In there we have the 
subdirectories `full` and `truthy` (and possibly more later on). Those contain 
all dumps of those flavors, no matter the format.

In those we have files like 
`(all|items|properties)-20150324(-BETA).(json|ttl|…)`.
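
As an illustration, a consumer could recognise files that follow the naming pattern above with a regular expression like the one in this sketch. It is an assumption-laden example: only `json` and `ttl` are named in the comment and the `…` leaves other extensions open, so the sketch accepts any lowercase alphanumeric extension; `DumpName` and `parse_dump_name` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Mirrors (all|items|properties)-YYYYMMDD(-BETA).(extension).
# Assumption: the extension is any run of lowercase letters or digits,
# because the original list of formats is left open-ended.
DUMP_NAME = re.compile(
    r"^(?P<scope>all|items|properties)"
    r"-(?P<date>\d{8})"
    r"(?P<beta>-BETA)?"
    r"\.(?P<ext>[a-z0-9]+)$"
)


class DumpName(NamedTuple):
    scope: str  # "all", "items" or "properties"
    date: str   # YYYYMMDD
    beta: bool  # True if the name carries the -BETA marker
    ext: str    # file format extension


def parse_dump_name(filename: str) -> Optional[DumpName]:
    """Return the parsed parts of a dump file name, or None if it does not match."""
    m = DUMP_NAME.match(filename)
    if m is None:
        return None
    return DumpName(m["scope"], m["date"], m["beta"] is not None, m["ext"])
```

For example, `parse_dump_name("items-20150324-BETA.ttl")` yields scope `items`, date `20150324`, the beta flag set, and extension `ttl`, while a name with any other prefix returns `None`.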




[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-03-23 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.

Why should Wikibase be in the name?




[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-03-23 Thread hoo
hoo added a comment.

In https://phabricator.wikimedia.org/T72385#1142484, @Lydia_Pintscher wrote:

> Why should Wikibase be in the name?


Because just having json dumps could mean anything IMO... also I think having 
wikibase in there is more future proof after we hit commons. But that's just my 
opinion and it's not a particularly strong one.




[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-02-16 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.

We will publish more dumps than the current JSON dumps, yes. For example, 
Daniel wants expanded JSON dumps that include the full URI for external 
identifiers.




[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-02-16 Thread mkroetzsch
mkroetzsch added a comment.

I think json should be in the path somewhere. It does not have to be at the 
top-level, but it would not be good if dump files of one type end up in their 
own directory. The only way for tools to detect and download dumps 
automatically is to look at the HTML directory listings, and this listing 
should not change its appearance (again). Note that different types of dumps 
will be created at different intervals, so a combined directory that contains 
several types of dumps would look quite messy in the end.

We could have wikibase-dumps/wikidatawiki/json if you prefer this over 
something like other/wikibase-json/wikidatawiki. However, the latter seems to 
be more consistent with /other/incr/wikidatawiki. I don't care much about the 
details, but it would be good to have something systematic in the end: either 
other/projectname/dumptype or other/dumptype/projectname seems most logical. 
Also, I think that dumptype could already mention wikibase if desired, so 
that there is no need for an extra directory wikibase-dumps on the path. The 
thing to avoid is to introduce a new directory structure for every new kind of 
dump (and wikibase-dumps smells a lot like this, even if there is a faint 
possibility that there will be more dumps of this kind in the future -- do you 
actually have any plans to move our RDF dumps from 
http://tools.wmflabs.org/wikidata-exports/rdf/ to the dumps site? Could be 
done, but not sure if it is needed.).
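
To make the point about HTML directory listings concrete, here is a rough sketch of how a tool might pick the newest dump from such a listing. It uses only the Python standard library; the sample listing, the file names in it, and the helper names `LinkCollector` and `latest_dump` are invented for illustration and are not taken from the real dumps site.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a directory listing."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def latest_dump(listing_html: str, pattern: str) -> Optional[str]:
    """Return the link whose first regex group (a YYYYMMDD date) is greatest."""
    collector = LinkCollector()
    collector.feed(listing_html)
    name_re = re.compile(pattern)
    dated = []
    for href in collector.hrefs:
        m = name_re.fullmatch(href)
        if m:
            dated.append((m.group(1), href))
    # YYYYMMDD strings sort chronologically, so max() picks the newest dump.
    return max(dated)[1] if dated else None


# Made-up listing for illustration only.
SAMPLE_LISTING = """
<a href="../">../</a>
<a href="20150316.json">20150316.json</a>
<a href="20150323.json">20150323.json</a>
"""
```

A scraper like this depends on the listing keeping one dump type per directory and a stable naming scheme, which is why a change in directory appearance breaks automated consumers.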




[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-02-16 Thread hoo
hoo added a comment.

I'm not too fond of having json in the path, as we'll provide non-JSON 
Wikibase-specific dumps at some point (rdf, maybe more) and those should IMO be 
in the same place. If we can't integrate this with the usual dump process now, 
can we have something like /other/wikibase-dumps/wikidatawiki, which makes it 
clear that those are the dumps we'll provide for Wikimedia's Wikibase repo 
installations (that would later on exist for commons as well... and maybe also 
for testwikidata)?

Also, we should probably make the old folder a redirect if we decide to change 
this; just fixing all links won't work.




[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-02-16 Thread ArielGlenn
ArielGlenn added a comment.

I can certainly make the old dir a symlink. wikibase-dumps/wikidatawiki is 
fine too. Markus?




[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns

2015-02-16 Thread ArielGlenn
ArielGlenn added a comment.

I'm fine with json/wikidatadumps. Wikidata folks, please sign off or suggest 
something you like better. This will entail: a fix to the cron job, a move of 
the existing dumps, and correcting any links that already exist (where are those?)

