Re: [Wikidata] Significant change of Wikidata dump size

2019-06-25 Thread Vladimir Ryabtsev
Follow-up: according to my processing script, this dump contains only
30280591 entries, while the main page still advertises 57M+ data items.
Could this be a bug in the dump process?
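
For reference, a count like the one above can be reproduced with a short
script. This is only a minimal sketch, not the script actually used: it
assumes the usual layout of the Wikidata JSON dump (one JSON array, one
entity per line, each line ending in a comma), and the function names and the
bz2 file handling are illustrative.

```python
import bz2
import json
from typing import Iterable


def count_entities(lines: Iterable[str]) -> int:
    """Count entity records in a Wikidata JSON dump read line by line."""
    count = 0
    for line in lines:
        # Each entity sits on its own line, followed by a trailing comma.
        record = line.strip().rstrip(",")
        # Skip blank lines and the brackets that open and close the array.
        if record in ("", "[", "]"):
            continue
        entity = json.loads(record)
        if "id" in entity:
            count += 1
    return count


def count_entities_in_file(path: str) -> int:
    """Stream a bz2-compressed dump (hypothetical local path) and count it."""
    with bz2.open(path, "rt", encoding="utf-8") as dump:
        return count_entities(dump)
```

Streaming line by line keeps memory use flat, which matters for a dump of
tens of gigabytes.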

Regards,
Vladimir

On Mon, 24 Jun 2019 at 19:37, Vladimir Ryabtsev wrote:

> Hello,
>
> I apologize if I missed something, but why is the current JSON dump ~25GB
> when a week ago it was ~58GB? (see
> https://dumps.wikimedia.org/wikidatawiki/entities/20190617/)
>
> Regards,
> Vladimir
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Significant change of Wikidata dump size

2019-06-24 Thread Vladimir Ryabtsev
Hello,

I apologize if I missed something, but why is the current JSON dump ~25GB
when a week ago it was ~58GB? (see
https://dumps.wikimedia.org/wikidatawiki/entities/20190617/)

Regards,
Vladimir


Re: [Wikidata] Language codes for Chinese

2019-06-19 Thread Vladimir Ryabtsev
Thanks everybody, that was very helpful! Still I have some more questions.

Indeed, I need to know all possible languages for
labels/descriptions/aliases that can be returned by the API. I need this to
group Chinese languages, probably just to separate simplified and
traditional writing.
I found that the web UI supports only these variations of Chinese: zh,
zh-classical, lzh, zh-hans, zh-hant, zh-yue, yue, nan.
It does NOT support zh-cn, zh-hk, zh-mo, zh-sg, zh-tw. At least when I use
the Babel extension like {{#babel:en-5|ru-N|zh-cn-0|zh-tw-0}} it does NOT
create lines for these languages. Still, there ARE labels in zh-cn, zh-tw
etc.

1. So the first question is: how are labels/aliases for these codes
populated? Is it through the API only?
2. Imre Samu (or anyone who knows), when you say "zh-yue = yue;
zh-min-nan = nan; zh-classical= lzh", do you mean the API automatically
treats them as equivalent? Say I write a label in yue: will the API return
both yue and zh-yue when I query it (and conversely, will editing zh-yue
affect yue)?
3. What about plain zh? Is it treated as a separate language for
labels/aliases/descriptions, or does it unite all sub-languages?
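
To make the grouping goal concrete, here is a rough sketch of separating
labels by script. The mapping of codes to "simplified" and "traditional" is
an assumption based on the conventional script of each region, not a
statement of how Wikidata itself treats these codes; codes such as zh, yue
or lzh are deliberately left in a separate "unspecified" group. All names
are illustrative.

```python
from typing import Dict

# Assumed mapping: script-tagged codes plus region codes by customary script.
SIMPLIFIED = {"zh-hans", "zh-cn", "zh-sg"}
TRADITIONAL = {"zh-hant", "zh-tw", "zh-hk", "zh-mo"}


def script_group(code: str) -> str:
    """Return 'simplified', 'traditional' or 'unspecified' for a code."""
    code = code.lower()
    if code in SIMPLIFIED:
        return "simplified"
    if code in TRADITIONAL:
        return "traditional"
    return "unspecified"


def group_labels(labels: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Split a {language code: label} mapping into per-script groups."""
    groups: Dict[str, Dict[str, str]] = {}
    for code, text in labels.items():
        groups.setdefault(script_group(code), {})[code] = text
    return groups
```

Whether plain zh belongs in one of the two groups depends on the answers to
the questions above, so the sketch does not guess.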

Vlad

On Tue, 18 Jun 2019 at 23:30, Lucas Werkmeister wrote:

> On 19.06.19 07:17, Federico Leva (Nemo) wrote:
> > Vladimir Ryabtsev, 19/06/19 03:04:
> >> How can I get the COMPLETE list of language codes (preferably with
> >> descriptions) for Chinese that are supported by Wikidata?
> >
> > Languages supported by a MediaWiki instance are expected to be all
> > listed at siprop=languages in the API:
> > <https://www.mediawiki.org/wiki/API:Siteinfo>
> > <https://www.wikidata.org/w/api.php?action=query&meta=siteinfo&siprop=languages>
> >
>
> There’s also a separate API for all the languages Wikibase supports. For
> labels/descriptions/aliases, that’s currently the same list
> (https://www.wikidata.org/w/api.php?action=query&meta=wbcontentlanguages&wbclcontext=term&formatversion=2),
> but for monolingual text there are some additional languages
> (https://www.wikidata.org/w/api.php?action=query&meta=wbcontentlanguages&wbclcontext=monolingualtext&formatversion=2).
>
> Cheers,
> Lucas
>
>
>
>
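
The wbcontentlanguages query mentioned in the quoted reply could be scripted
roughly as follows. This is a sketch under assumptions: the parameter names
(meta=wbcontentlanguages, wbclcontext) and the response shape (an object
keyed by language code under query.wbcontentlanguages) are my reading of the
Wikibase API and should be checked against its documentation; the function
names and the User-Agent string are made up for illustration.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://www.wikidata.org/w/api.php"


def content_languages_url(context: str = "term") -> str:
    """Build the query URL; context is 'term' or 'monolingualtext'."""
    params = {
        "action": "query",
        "meta": "wbcontentlanguages",
        "wbclcontext": context,  # assumed parameter name
        "format": "json",
        "formatversion": "2",
    }
    return API + "?" + urlencode(params)


def fetch(url: str) -> dict:
    """Download and decode a JSON API response (performs a network call)."""
    request = Request(url, headers={"User-Agent": "language-list-example/0.1"})
    with urlopen(request) as reply:
        return json.load(reply)


def chinese_codes(response: dict) -> List[str]:
    """Pick 'zh' and every 'zh-*' code out of a decoded response."""
    languages = response["query"]["wbcontentlanguages"]  # assumed shape
    return sorted(c for c in languages if c == "zh" or c.startswith("zh-"))
```

Calling chinese_codes(fetch(content_languages_url())) would then list the zh
variants accepted for labels, descriptions and aliases; codes without the zh
prefix (yue, lzh, nan and so on) would need to be added by hand.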


[Wikidata] Language codes for Chinese

2019-06-19 Thread Vladimir Ryabtsev
Hello,

I am looking for the list of supported language codes for variations of
Chinese. So far in API responses I found these:

zh
zh-cn
zh-hans
zh-hant
zh-hk
zh-tw
zh-mo
zh-sg

"Configure" link in "more languages" section leads to this page:
https://www.wikidata.org/wiki/Help:Navigating_Wikidata/User_Options#Babel_extension
Which in turn refers to
https://meta.wikimedia.org/wiki/Table_of_Wikimedia_projects#Projects_per_language_codes
But apparently there are no such values as 'zh-cn', 'zh-hans' etc. there.

How can I get the COMPLETE list of language codes (preferably with
descriptions) for Chinese that are supported by Wikidata?

Best regards,
Vlad


Re: [Wikidata] Language codes for Chinese

2019-06-18 Thread Vladimir Ryabtsev
Hello,

I am looking for the list of supported language codes for variations of
Chinese. So far in API responses I found these:

zh
zh-cn
zh-hans
zh-hant
zh-hk
zh-tw
zh-mo
zh-sg

"Configure" link in "more languages" section leads to this page:
https://www.wikidata.org/wiki/Help:Navigating_Wikidata/User_Options#Babel_extension
Which in turn refers to
https://meta.wikimedia.org/wiki/Table_of_Wikimedia_projects#Projects_per_language_codes
But apparently there are no such values as 'zh-cn', 'zh-hans' etc. there.

How can I get the COMPLETE list of language codes (preferably with
descriptions) for Chinese that are supported by Wikidata?

Best regards,
Vlad

P.S. I am re-sending this message because the previous attempt got a reply
saying something about moderator approval, and the message has not gone any
further since then.


Re: [Wikidata-tech] Order of claims on entity page

2017-11-30 Thread Vladimir Ryabtsev
Hey Lydia,

We have an NLP system that processes news articles, and part of the system
is a named entity linking module which works with both our own entities and
those imported from Wikidata. Our internal users work with entity data in
our system to edit, modify and merge entities. At times entity data is
displayed on the user's screen, and it would be convenient for them to see
the same data layout as on the Wikidata page, for cross-checking or editing
procedures that cannot be automated. To display it like this, we need to
know how the Wikidata site sorts and groups properties. This would improve
the productivity of users, especially when working on large entities with
plenty of properties.

Best regards,
Vlad


2017-11-30 17:36 GMT+03:00 Lydia Pintscher <lydia.pintsc...@wikimedia.de>:

> Hey Vladimir,
>
> On Wed, Nov 29, 2017 at 10:54 PM, Vladimir Ryabtsev
> <greatvo...@gmail.com> wrote:
> > This is not about JSON; I never mentioned it. JSON is a serialization
> > format and it does not have to be "nice". This is about the ability to
> > do something.
> >
> > As for the use case, I have already described it: presenting data in
> > the same layout (familiar to the user) as on the Wikidata page. If you
> > consider this use case unimportant or uncommon, well, OK then. At least
> > I tried.
>
> I'd like to understand a bit more about what you are doing/trying to
> do. Would you mind sharing a bit more about what you are working on
> either here or with me off-list? Then I can better tell you if there
> are good ways to get what you want, if we can add some or not.
>
>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.
>
> ___
> Wikidata-tech mailing list
> Wikidata-tech@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
>


Re: [Wikidata-tech] Order of claims on entity page

2017-11-29 Thread Vladimir Ryabtsev
This is not about JSON; I never mentioned it. JSON is a serialization
format and it does not have to be "nice". This is about the ability to do
something.

As for the use case, I have already described it: presenting data in the
same layout (familiar to the user) as on the Wikidata page. If you consider
this use case unimportant or uncommon, well, OK then. At least I tried.

Best regards,
Vlad

2017-11-29 21:28 GMT+03:00 Thiemo Kreuz :

> > […] it turns out it lacks PLENTY of properties we usually work with.
> > From 1400+ properties we normally use, there are only 480 on this page
>
> This is intended. The list was originally created with the most common
> properties, and can and should be expanded any time when the need to do so
> arises. Unlisted properties will be moved to the end, in their original
> order (as stored in the database). If you find specific properties that
> should move up to one of the groups currently specified in
> https://www.wikidata.org/wiki/MediaWiki:Wikibase-SortedProperties , go
> ahead and suggest changes on the talk page.
>
> As for the suggestions in your other mail: Even if I can understand that a
> more "nice" JSON would be – well – more "nice", I don't see what the
> specific benefit of that would be. As long as no specific use case arises I
> don't see a reason to invest resources in changing the current behavior.
>
> Best
> Thiemo
>
>
>
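
The ordering rule described in the quoted reply (listed properties first, in
list order; unlisted properties at the end, in their original order) can be
sketched on the client side as below. This is an illustration only, not the
Wikibase implementation; it assumes the list of property IDs has already
been extracted from the MediaWiki:Wikibase-SortedProperties page, and the
function name is made up.

```python
from typing import List, Sequence


def sort_like_ui(property_ids: Sequence[str],
                 sorted_properties: Sequence[str]) -> List[str]:
    """Order property IDs by their position in the sorted-properties list."""
    rank = {pid: index for index, pid in enumerate(sorted_properties)}
    # Every unlisted property gets the same rank, after all listed ones.
    unlisted = len(sorted_properties)
    # sorted() is stable, so unlisted properties keep their original order.
    return sorted(property_ids, key=lambda pid: rank.get(pid, unlisted))
```

For example, sort_like_ui(["P999", "P31", "P1000", "P21"], ["P31", "P279",
"P21"]) puts P31 and P21 first and leaves P999 and P1000 in their incoming
order at the end.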


Re: [Wikidata-tech] Order of claims on entity page

2017-11-29 Thread Vladimir Ryabtsev
As a follow-up, I analyzed the properties mentioned in
https://www.wikidata.org/wiki/MediaWiki:Wikibase-SortedProperties and it
turns out it lacks PLENTY of properties we usually work with. Of the 1400+
properties we normally use, only 480 are on this page :(

Vlad

2017-11-29 16:13 GMT+03:00 Vladimir Ryabtsev <greatvo...@gmail.com>:

> Thiemo, thanks. I understand that I can use the order from that page or
> invent my own. My point is that it would be nice to have a way to present
> data the same way the Wikidata site does. Since I see the same claims
> layout every time I refresh a page, I assume this order is fixed and
> stored somewhere in the system (not on a separate page).
>
> Markus, yes, I want to keep properties grouped, values within a group
> ordered, and so on: everything as it appears on the web page. Anyone who
> wants to can sort the data as they wish, by any existing property, but
> the ability to sort "as in the UI" is also desired.
>
> My suggestions:
> • Implement an API action (e.g. wbgetproperties) returning all Wikidata
> properties in the current UI order.
> • Add an integer value per claim and per value (and per qualified value?)
> indicating their current order in the UI. No fights over the order: those
> who want it will use it, and others can sort by other features.
>
> Is this the right place to post such suggestions? If not, please tell me
> the proper place.
>
> Thanks,
> Vlad
>
>
>
> 2017-11-29 15:35 GMT+03:00 Markus Krötzsch <mar...@semantic-mediawiki.org>
> :
>
>> Dear Vlad,
>>
>> Ordering claims on a page as you suggest would not work well, since
>> several other orders must take precedence over the order you suggest. First
>> of all, statements are grouped by property and you don't want to change
>> this. Hence, you cannot use the order across statements of different
>> properties, since this would force you in some cases to ungroup (which
>> would have other disadvantages).
>>
>> Second, it makes sense to order statements of one property by other
>> aspects, e.g., by time, to make it possible for humans to find something.
>> Hence, again, we are not free to use the order to encode further
>> information.
>>
>> So what remains is to order qualifiers inside statements, but there it
>> is rarely relevant (usually there are only a few qualifiers and all of them
>> can be seen at once, without getting tired).
>>
>> In summary, order does not lend itself as a way to encode much additional
>> information, since there are usability concerns that make you want to
>> change order in different contexts (or maybe for different users), since
>> order cannot be preserved when remixing data, and since it is overall too
>> implicit for people to build up a shared understanding of what it is
>> supposed to mean (you don't want fights about whether some item has to be
>> in fourth or fifth position of some list based on some vague understanding
>> of "quality" or "trustworthiness" -- it would be very hard to find
>> objective arguments for or against a particular order).
>>
>> Cheers,
>>
>> Markus
>>
>>
>> On 29.11.2017 12:45, Владимир Рябцев wrote:
>>
>>> OK Lydia, what is the purpose of giving an order to qualifiers then?
>>>
>>> Along with helping to give a user a better representation of the data,
>>> the order can be useful in automated processing of properties. To my
>>> mind, it starts with the most important entity data. Moreover, in case
>>> of contradiction, I would assume that the first properties are “ranked”
>>> higher. After all, we are humans and pay more attention to the top of
>>> the page. Our minds may get a bit tired by the end of the page. In an
>>> ideal world you are right that order does not matter, but in reality it
>>> may help algorithms.
>>>
>>> Vlad
>>>
>>>> On 29 Nov 2017, at 14:19, Lydia Pintscher <lydia.pintsc...@wikimedia.de>
>>>> wrote:
>>>>
>>>> On Wed, Nov 29, 2017 at 11:14 AM, Владимир Рябцев <greatvo...@gmail.com>
>>>> wrote:
>>>>> Thanks for the link with sorted properties. Is this page updated
>>>>> automatically or maintained manually by someone? In the latter case
>>>>> this looks weird to me, because the order may become out of date at
>>>>> some point.

Re: [Wikidata-tech] Order of claims on entity page

2017-11-29 Thread Vladimir Ryabtsev
Thiemo, thanks. I understand that I can use the order from that page or
invent my own. My point is that it would be nice to have a way to present
data the same way the Wikidata site does. Since I see the same claims
layout every time I refresh a page, I assume this order is fixed and stored
somewhere in the system (not on a separate page).

Markus, yes, I want to keep properties grouped, values within a group
ordered, and so on: everything as it appears on the web page. Anyone who
wants to can sort the data as they wish, by any existing property, but the
ability to sort "as in the UI" is also desired.

My suggestions:
• Implement an API action (e.g. wbgetproperties) returning all Wikidata
properties in the current UI order.
• Add an integer value per claim and per value (and per qualified value?)
indicating their current order in the UI. No fights over the order: those
who want it will use it, and others can sort by other features.
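
To illustrate the second suggestion, here is a sketch of what explicit order
indices could look like if computed on the client. Nothing here exists in
the API: the field names property_order and value_order and the function
itself are hypothetical, and the property order is assumed to be supplied by
the caller. The claims argument follows the shape of the "claims" object in
an entity's JSON (property ID mapped to a list of statements).

```python
from typing import Dict, List


def with_order_indices(claims: Dict[str, List[dict]],
                       property_order: List[str]) -> List[dict]:
    """Flatten claims into rows carrying explicit integer positions."""
    listed = [pid for pid in property_order if pid in claims]
    seen = set(listed)
    # Properties missing from property_order go last, in incoming order.
    ordered = listed + [pid for pid in claims if pid not in seen]
    rows = []
    for p_index, pid in enumerate(ordered):
        for v_index, statement in enumerate(claims[pid]):
            rows.append({
                "property": pid,
                "property_order": p_index,  # hypothetical field
                "value_order": v_index,     # hypothetical field
                "statement": statement,
            })
    return rows
```

A consumer that does not care about UI order can ignore both integers and
sort the rows by anything else.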

Is this the right place to post such suggestions? If not, please tell me
the proper place.

Thanks,
Vlad



2017-11-29 15:35 GMT+03:00 Markus Krötzsch :

> Dear Vlad,
>
> Ordering claims on a page as you suggest would not work well, since
> several other orders must take precedence over the order you suggest. First
> of all, statements are grouped by property and you don't want to change
> this. Hence, you cannot use the order across statements of different
> properties, since this would force you in some cases to ungroup (which
> would have other disadvantages).
>
> Second, it makes sense to order statements of one property by other
> aspects, e.g., by time, to make it possible for humans to find something.
> Hence, again, we are not free to use the order to encode further
> information.
>
> So what remains is to order qualifiers inside statements, but there it is
> rarely relevant (usually there are only a few qualifiers and all of them
> can be seen at once, without getting tired).
>
> In summary, order does not lend itself as a way to encode much additional
> information, since there are usability concerns that make you want to
> change order in different contexts (or maybe for different users), since
> order cannot be preserved when remixing data, and since it is overall too
> implicit for people to build up a shared understanding of what it is
> supposed to mean (you don't want fights about whether some item has to be
> in fourth or fifth position of some list based on some vague understanding
> of "quality" or "trustworthiness" -- it would be very hard to find
> objective arguments for or against a particular order).
>
> Cheers,
>
> Markus
>
>
> On 29.11.2017 12:45, Владимир Рябцев wrote:
>
>> OK Lydia, what is the purpose of giving an order to qualifiers then?
>>
>> Along with helping to give a user a better representation of the data,
>> the order can be useful in automated processing of properties. To my
>> mind, it starts with the most important entity data. Moreover, in case
>> of contradiction, I would assume that the first properties are “ranked”
>> higher. After all, we are humans and pay more attention to the top of
>> the page. Our minds may get a bit tired by the end of the page. In an
>> ideal world you are right that order does not matter, but in reality it
>> may help algorithms.
>>
>> Vlad
>>
>>> On 29 Nov 2017, at 14:19, Lydia Pintscher wrote:
>>>
>>> On Wed, Nov 29, 2017 at 11:14 AM, Владимир Рябцев wrote:
>>>> Thanks for the link with sorted properties. Is this page updated
>>>> automatically or maintained manually by someone? In the latter case
>>>> this looks weird to me, because the order may become out of date at
>>>> some point.
>>>
>>> Yes it is maintained by hand by the editors.
>>>
>>>> It is curious that when properties are used as qualifiers we have a
>>>> separate field specifying the order (called ‘qualifiers-order’). Why
>>>> not add the same at the top level of the entity definition?
>>>
>>> It is just a heading to make the page more manageable - it doesn't
>>> have a meaning beyond that.
>>>
>>>
>>> Cheers
>>> Lydia
>>>
>>> --
>>> Lydia Pintscher - http://about.me/lydia.pintscher
>>> Product Manager for Wikidata
>>>
>>> Wikimedia Deutschland e.V.
>>> Tempelhofer Ufer 23-24
>>> 10963 Berlin
>>> www.wikimedia.de
>>>
>>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>>>
>>> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
>>> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
>>> Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.
>>>

[Wikidata-tech] Order of claims on entity page

2017-11-28 Thread Vladimir Ryabtsev
I find the order of claims on the web site useful, but when requesting
entity data through the API (action=wbgetentities) it is lost.
How are the claims ordered on an entity web page, and how can I restore
that order in the API response?
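
For context, this is roughly how the claims come back from the API. A
minimal sketch: the wbgetentities parameters used (ids, props=claims) are
the standard ones, Q42 is just an example ID, and the helper names are
illustrative. The claims arrive as a JSON object keyed by property ID, so
the only order available to the client is the key order of that object.

```python
from typing import List
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def entity_claims_url(entity_id: str) -> str:
    """Build a wbgetentities URL that requests only the claims."""
    params = {
        "action": "wbgetentities",
        "ids": entity_id,
        "props": "claims",
        "format": "json",
    }
    return API + "?" + urlencode(params)


def property_ids_as_returned(response: dict, entity_id: str) -> List[str]:
    """List property IDs in the order they appear in a decoded response."""
    claims = response["entities"][entity_id]["claims"]
    return list(claims)
```

Comparing property_ids_as_returned(...) for an item with what the item page
shows makes the difference in ordering easy to see.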

--
Vlad

