Re: [Wikidata] Label gaps on Wikidata

2017-02-27 Thread Daniel Kinzler
On 27.02.2017 at 18:18, James Heald wrote:
> From what Daniel is saying, it seems this may not be possible, because the
> template expansion would then depend on the user's preferred language(s),
> which would not be compatible with the template caching.
> 
> Is that right?   Or is there a way round this?

We are currently aiming for a compromise: we render the page with the user's
interface language as the target language, and apply fallback accordingly. We do
not take into account secondary user languages, as defined e.g. by the Babel or
Translate extensions.

This means a user with the UI language set to French will see French if
available, but will not see Spanish, even if they somehow declared that they
also speak Spanish.

This way, we split the parser cache once per UI language - a factor of 300, but
not the combinatorial explosion we would get if we split on every possible
permutation of languages (does anyone want to compute 300 factorial?).
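
To put rough numbers on the cache split described above, here is a minimal
Python sketch. The figure of 300 languages is the one quoted in this thread;
the function name is made up for illustration, and the counts only show
orders of magnitude, not how the parser cache is actually keyed.

```python
import math

UI_LANGUAGES = 300  # approximate figure quoted in this thread

# Splitting on the UI language alone: one cache variant per language.
variants_single = UI_LANGUAGES


def ordered_preference_lists(n: int, k: int) -> int:
    """Number of ordered preference lists of length k drawn from n languages."""
    return math.perm(n, k)  # n! / (n - k)!


# Splitting on a ranked list of two or three preferred languages.
variants_top2 = ordered_preference_lists(UI_LANGUAGES, 2)
variants_top3 = ordered_preference_lists(UI_LANGUAGES, 3)

# Splitting on a full ranking of all languages: 300 factorial.
digits_full = len(str(math.factorial(UI_LANGUAGES)))

print(variants_single, variants_top2, variants_top3, digits_full)
```

Even a ranked list of only two languages already multiplies the number of
variants by roughly 300 again, and a full ranking gives a number with
several hundred digits.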


-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Label gaps on Wikidata

2017-02-27 Thread James Heald
Something I have been wondering is whether it is possible to get a
template on e.g. Commons for a templated WDQS query to take account of the
user's language (and also, ideally, preferred fallback languages, as
perhaps indicated by their {{#babel}} settings).


I had hoped it might be possible to include these preferences as a 
parameter string in the "label service" part of the query text.


From what Daniel is saying, it seems this may not be possible, because
the template expansion would then depend on the user's preferred
language(s), which would not be compatible with the template caching.


Is that right?   Or is there a way round this?

 -- James.




Re: [Wikidata] Label gaps on Wikidata

2017-02-27 Thread Thad Guidry
Good fallback languages for English would be any of the Germanic or
Romance languages.
As a native American, I also would agree with this article's listing of
languages that are more easily understood by my brain:

1. Afrikaans
2. Danish
3. French
4. Italian
5. Norwegian
etc.

9 easy languages for English Speakers.
https://matadornetwork.com/abroad/9-easy-languages-for-english-speakers-to-learn/

-Thad


Re: [Wikidata] Label gaps on Wikidata

2017-02-27 Thread Daniel Kinzler
On 27.02.2017 at 17:01, James Hare wrote:
> One option is to allow users to define their own ranked preferences for
> language beyond just first place. (I personally would enjoy having French as
> a fallback to English.)

That would badly fragment the parser cache. I don't think it's viable.

> This has the downside of only really working for people with
> accounts, which I suspect might be a minority of overall traffic.

Currently, we only support English for anon visitors (yes, this is very sad; the
reason is, again, caching - Varnish, this time).

-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.



Re: [Wikidata] Label gaps on Wikidata

2017-02-27 Thread James Hare
On February 27, 2017 at 7:54:43 AM, Daniel Kinzler
(daniel.kinz...@wikimedia.de) wrote:

> The fallback mechanism works OK, but is not great for English speaking users
> who see a lot of items that have no English label. For English, we just don't
> know what to fall back to. Just anything? Or try European languages first?
> What should the rule be? If we can decide on a good rule, it should actually
> be pretty simple to add such fallback for English.




One option is to allow users to define their own ranked preferences for
language beyond just first place. (I personally would enjoy having French
as a fallback to English.) This has the downside of only really working for
people with accounts, which I suspect might be a minority of overall
traffic.


Cheers,
James Hare


Re: [Wikidata] Label gaps on Wikidata

2017-02-27 Thread Daniel Kinzler
On 19.02.2017 at 17:00, Romaine Wiki wrote:
> Hi all,
> 
> If you look in the recent changes, most items have labels in English and those
> are shown in the recent changes and elsewhere (so we know what the item is
> about without opening first).

Wikidata actually tries to show you the labels in your preferred interface
language. And if your user language is not available, it uses a fallback
mechanism to show the next-best language, which may even include automated
transcriptions. When all else fails, it will show the English label. If that
doesn't exist, it shows the ID.
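
The lookup order described above can be sketched roughly as follows. This
is only an illustration in Python, not the actual Wikibase code: the
function name and the entries in the FALLBACKS table are assumptions made
up for the example, and automated transcription is left out entirely.

```python
from typing import Dict, List

# Hypothetical fallback chains, for illustration only.
FALLBACKS: Dict[str, List[str]] = {
    "de-at": ["de"],
    "pt-br": ["pt"],
}


def pick_label(labels: Dict[str, str], ui_lang: str, item_id: str) -> str:
    """Return the best available label for an item.

    Order: the UI language, then its fallback chain, then English,
    and finally the bare item ID if no label matches.
    """
    chain = [ui_lang] + FALLBACKS.get(ui_lang, []) + ["en"]
    for lang in chain:
        if lang in labels:
            return labels[lang]
    return item_id


# A user with "de-at" as UI language sees the German label.
print(pick_label({"de": "Berlin", "en": "Berlin (city)"}, "de-at", "Q64"))
# An item with only a Russian label falls through to the ID for a French UI.
print(pick_label({"ru": "Берлин"}, "fr", "Q64"))
```

Because the chain depends only on the UI language, the result can be cached
once per UI language, which is the compromise discussed elsewhere in this
thread.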

> But not all items have labels, and these items without
> English label are often items with only a label in Chinese, Arabic, Cyrillic
> script, Hebrew, etc. This forms a significant gap.

The fallback mechanism works OK, but is not great for English speaking users who
see a lot of items that have no English label. For English, we just don't know
what to fall back to. Just anything? Or try European languages first? What
should the rule be? If we can decide on a good rule, it should actually be
pretty simple to add such fallback for English.

> Is there a way to easily make a transcription from one language to another?

We have such rules for some languages/variants, e.g. between the Cyrillic and
the Roman representations of Kazakh or Uzbek. But transliteration rules can be
complex, and covering every pair of the 300 languages we support would
mean we'd need about 45000 rule sets...
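
The "about 45000" figure matches the number of unordered language pairs.
A quick check in Python, assuming 300 languages and one rule set per pair
(if each direction needed its own rule set, the count would double):

```python
import math

LANGUAGES = 300  # figure quoted in this thread

# One rule set per unordered pair of languages: n * (n - 1) / 2
unordered_pairs = math.comb(LANGUAGES, 2)

# One rule set per direction (A -> B and B -> A counted separately)
ordered_pairs = math.perm(LANGUAGES, 2)

print(unordered_pairs)  # 44850
print(ordered_pairs)    # 89700
```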

> Or alternatively if there is a database that has such transcriptions?

Not yet. One of the goals of Wikidata is to be that database.

-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.



Re: [Wikidata] Label gaps on Wikidata - (SPARQL help needed. SERVICE wikibase:label)

2017-02-26 Thread Rick Labs

Thanks Stas & especially Kingsley for the example:

# All subclasses of a class example
# here: all subclasses (P279) of organization (Q43229)
SELECT ?item ?label ?itemDescription ?itemAltLabel
WHERE
{
 ?item wdt:P279 wd:Q43229;
   rdfs:label ?label .
 # SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de,fr,ja,cn,ru,es,sv,pl,nl,sl,ca,it" }
 FILTER (LANG(?label) = "en")
}
ORDER BY ASC(LCASE(?itemLabel))

When I pull the FILTER line out of above I have almost what I need - 
"the universe" of all sub classes of organization (regardless of 
language).  I want all subclasses in the output, not just those 
available currently with an English label.


In the table output, is it possible to get: a column for language code, 
and get the description to show up  (if available for that row)? That 
would be very helpful prior to my manual operations.


Can I easily export the results table to CSV or Excel?  I can filter and 
sort easily from there provided I have the hooks.


Thanks very much!

Rick


--
Richard J. Labs, CFA, CPA
CL Capital Management, LLC
Phone: 315-637-0915
E-mail (preferred for efficiency): r...@clbcm.com
Mailing address: 8 Laureldale Dr., Pittsford, NY 14534-3508



Re: [Wikidata] Label gaps on Wikidata - (SPARQL help needed. SERVICE wikibase:label)

2017-02-25 Thread Navino Evans
On 24 February 2017 at 22:00, Rick Labs  wrote:

> Nav,
>
> YES!!! that's it! Your SPARQL works perfectly, exactly what I wanted.
>
> Thanks very much. Just had to learn how to get the CSV into Excel as
> UTF-8, not hard. Can finally see what objects people want immediately below
> "Organizations", worldwide. (yes, what's evolved is pretty darn "chaotic")
> Very much appreciated.
>
> Rick


Excellent!! Very happy to help. Best of luck cleaning up the chaos :)


-- 
nav...@histropedia.com
@NavinoEvans
www.histropedia.com


Re: [Wikidata] Label gaps on Wikidata - (SPARQL help needed. SERVICE wikibase:label)

2017-02-24 Thread Rick Labs

Nav,

YES!!! that's it! Your SPARQL works perfectly, exactly what I wanted.

Thanks very much. Just had to learn how to get the CSV into Excel as
UTF-8, not hard. Can finally see what objects people want immediately
below "Organizations", worldwide. (yes, what's evolved is pretty darn
"chaotic")
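
For anyone hitting the same CSV-to-Excel step: one common approach is to
re-save the downloaded file with a UTF-8 byte order mark, which Excel
generally uses to detect the encoding. A minimal Python sketch follows; the
function name and file names are placeholders, and this is not necessarily
the method used above.

```python
def add_utf8_bom(src_path: str, dst_path: str) -> None:
    """Copy a UTF-8 text file, making sure the copy starts with a BOM."""
    # Reading with "utf-8-sig" drops a BOM if one is already present;
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding="utf-8-sig", newline="") as src:
        text = src.read()
    # Writing with "utf-8-sig" prepends exactly one BOM.
    with open(dst_path, "w", encoding="utf-8-sig", newline="") as dst:
        dst.write(text)


# Example call (placeholder file names):
# add_utf8_bom("query.csv", "query_for_excel.csv")
```

The alternative is Excel's own text import dialog, where the file encoding
can be chosen by hand.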


Very much appreciated.

Rick



Re: [Wikidata] Label gaps on Wikidata - (SPARQL help needed. SERVICE wikibase:label)

2017-02-24 Thread Navino Evans
Hi Rick,

Is this what you're after? http://tinyurl.com/z7ru9yr

Once you run the query there is a download drop-down menu, just above the
query results on the right hand side of the screen - it has a range of
options including CSV.

Hope that helps!

Nav






-- 
nav...@histropedia.com
@NavinoEvans
www.histropedia.com


Re: [Wikidata] Label gaps on Wikidata - (SPARQL help needed. SERVICE wikibase:label)

2017-02-23 Thread Rick Labs

Thanks Stas & especially Kingsley for the example:

# All subclasses of a class example
# here: all subclasses (P279) of organization (Q43229)
SELECT ?item ?label ?itemDescription ?itemAltLabel
WHERE
{
 ?item wdt:P279 wd:Q43229;
   rdfs:label ?label .
 # SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de,fr,ja,cn,ru,es,sv,pl,nl,sl,ca,it" }
 FILTER (LANG(?label) = "en")
}
ORDER BY ASC(LCASE(?itemLabel))

When I pull the FILTER line out of above I have almost what I need - 
"the universe" of all sub classes of organization (regardless of 
language).  I want all subclasses in the output, not just those 
available currently with an English label.


In the table output, is it possible to get: a column for language code, 
and get the description to show up  (if available for that row)? That 
would be very helpful prior to my manual operations.


Can I easily export the results table to CSV or Excel?  I can filter and 
sort easily from there provided I have the hooks.


Thanks very much!

Rick



Re: [Wikidata] Label gaps on Wikidata - (SPARQL help needed. SERVICE wikibase:label)

2017-02-23 Thread Thad Guidry
Ah!
That wasn't made clear on the wiki. Thanks Stas!



Re: [Wikidata] Label gaps on Wikidata - (SPARQL help needed. SERVICE wikibase:label)

2017-02-23 Thread Kingsley Idehen
On 2/23/17 12:59 PM, Stas Malyshev wrote:
> Hi!
>
> On 2/23/17 7:20 AM, Thad Guidry wrote:
>> In Freebase we had a parameter %lang=all
>>
>> Does the SPARQL label service have something similar ?
> Not as such, but you don't need it if you want all the labels, just do:
>
> ?item rdfs:label ?label
>
> and you'd get all labels. No need to invoke service for that, the
> service is for when you have specific set of languages you're interested
> in.

Yep.

Example at: http://tinyurl.com/h2sbvhd

-- 
Regards,

Kingsley Idehen   
Founder & CEO 
OpenLink Software   (Home Page: http://www.openlinksw.com)

Weblogs (Blogs):
Legacy Blog: http://www.openlinksw.com/blog/~kidehen/
Blogspot Blog: http://kidehen.blogspot.com
Medium Blog: https://medium.com/@kidehen

Profile Pages:
Pinterest: https://www.pinterest.com/kidehen/
Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
Twitter: https://twitter.com/kidehen
Google+: https://plus.google.com/+KingsleyIdehen/about
LinkedIn: http://www.linkedin.com/in/kidehen

Web Identities (WebID):
Personal: http://kingsley.idehen.net/dataspace/person/kidehen#this
: 
http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this





Re: [Wikidata] Label gaps on Wikidata - (SPARQL help needed. SERVICE wikibase:label)

2017-02-23 Thread Stas Malyshev
Hi!

On 2/23/17 7:20 AM, Thad Guidry wrote:
> In Freebase we had a parameter %lang=all
> 
> Does the SPARQL label service have something similar ?

Not as such, but you don't need it if you want all the labels, just do:

?item rdfs:label ?label

and you'd get all labels. No need to invoke the service for that; the
service is for when you have a specific set of languages you're interested
in.

-- 
Stas Malyshev
smalys...@wikimedia.org



Re: [Wikidata] Label gaps on Wikidata - (SPARQL help needed. SERVICE wikibase:label)

2017-02-23 Thread Lucas Werkmeister
You can specify multiple languages for the label service:

# All subclasses of a class example
# here: all subclasses (P279) of organization (Q43229)
SELECT ?item ?itemLabel ?itemDescription ?itemAltLabel
WHERE
{
  ?item wdt:P279 wd:Q43229.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de,fr,ja,cn,ru,es,sv,pl,nl,sl,ca,it" }
}
ORDER BY ASC(LCASE(?itemLabel))

Link:
https://query.wikidata.org/#%23%20All%20subclasses%20of%20a%20class%20example%0A%23%20here%20all%20subclasses%20of%20P279%20Organization%20%28Q43229%29%0ASELECT%20%3Fitem%20%3FitemLabel%20%3FitemDescription%20%3FitemAltLabel%0AWHERE%0A%7B%0A%20%3Fitem%20wdt%3AP279%20wd%3AQ43229.%0A%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%2Cde%2Cfr%2Cja%2Ccn%2Cru%2Ces%2Csv%2Cpl%2Cnl%2Csl%2Cca%2Cit%22%20%7D%0A%7D%0AORDER%20BY%20ASC%28LCASE%28%3FitemLabel%29%29

I’ve also changed the query to sort the results case-insensitively.

(Note: the query seems to occasionally take a very long time for me, 180
seconds – I’m not sure if the many label languages cause the slowdown or
if it’s just my internet connection.)

Cheers,
Lucas



Re: [Wikidata] Label gaps on Wikidata - (SPARQL help needed. SERVICE wikibase:label)

2017-02-22 Thread Rick Labs

I'm running into some major label gaps, as are others.

My area of interest is the Company data project. I'm new to SPARQL and 
here is my working query:


# All subclasses of a class example
# here: all subclasses (P279) of organization (Q43229)
SELECT ?item ?itemLabel ?itemDescription ?itemAltLabel
WHERE
{
  ?item wdt:P279 wd:Q43229 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
ORDER BY ASC(?itemLabel)

Background

https://www.wikidata.org/wiki/Wikidata:WikiProject_Companies seems most
interested in https://www.wikidata.org/wiki/Q4830453 business
enterprise. So I wrote the above SPARQL to see how "business enterprise"
fits under its immediate parent (via P279, "subclass of"): Organization
(Q43229). I want to learn about all "brother/sister" level objects under
"Organization."


If you run the above you will see how many "Organization" children 
objects have no English label. This greatly impedes understanding what 
is considered a "business enterprise" and what is not. (Yes - this part 
of the ontology seems to need some serious tuning up too!)  When we go 
to build out a reasonable starter ontology under the "company data 
project" we want the structure sound prior to filling it in with a 
considerable volume of data.


For example, a key goal is that the company data needs to "add up" to
economic data. Any entity that has a proprietor, partners, or any
payroll counts in economic data. Government offices, schools, nonprofits,
etc. all produce goods or services - all contribute to economic
output (GDP).  So, much of the "company data project" is directly
relevant to entities that are more general than just "business entities".


Is there a way I can run a SPARQL query that outputs the EN label if
available (as above), and any other label in any other language
(including a column for language code) if not? Ideally I'd like to have
only one additional language reported if EN is not available, and I'd
like to have it report according to my preference (German if available,
French if not, then Japanese, Chinese on down the line). It would also be
beneficial to have a column for the longer description, if available.
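
The "one label per item, by preference" selection asked for here can also
be done after the fact on exported rows. Below is a minimal Python sketch
under stated assumptions: the rows are (item, label, language code) tuples,
for example exported from a query that selects ?item, ?label and
LANG(?label); the function name, the preference list and the sample rows
are all made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, str]  # (item ID, label, language code)


def one_label_per_item(rows: Iterable[Row],
                       preference: List[str]) -> Dict[str, Tuple[str, str]]:
    """Keep, for each item, the label in the most preferred language.

    Languages missing from the preference list rank last; among those,
    the first row seen for an item wins.
    """
    rank = {lang: i for i, lang in enumerate(preference)}
    best: Dict[str, Tuple[int, str, str]] = {}
    for item, label, lang in rows:
        r = rank.get(lang, len(preference))
        if item not in best or r < best[item][0]:
            best[item] = (r, label, lang)
    return {item: (label, lang) for item, (_, label, lang) in best.items()}


# Illustrative rows only; the item IDs are placeholders, not real items.
sample_rows = [
    ("Q_A", "Verein", "de"),
    ("Q_A", "association", "en"),
    ("Q_B", "société", "fr"),
    ("Q_B", "Gesellschaft", "de"),
    ("Q_C", "организация", "ru"),
]
result = one_label_per_item(sample_rows, ["en", "de", "fr", "ja", "zh"])
for item, (label, lang) in sorted(result.items()):
    print(item, lang, label)
```

The language code stays in the output as its own column, so rows without
an English label are easy to filter for later translation.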


For my analysis purposes now I'm happy to work with simple language
translations done by machine. Even if they are slightly off they are
probably good enough for my purposes of reviewing and trying to
understand the standing ontology. I don't plan on inserting the
translations back into Wikidata myself, but might try to rally up humans
with those specific language skills to double check the machine
translations and, once verified, insert the translated labels back into
Wikidata.


I'm not at all familiar with other tools that might be available 
relevant to "the missing label challenge". Right now SPARQL, SERVICE 
wikibase:label, and Google Translate seem like the way to go. But, all 
ideas are most welcome.


Thanks!

Rick


On 2/19/2017 11:00 AM, Romaine Wiki wrote:

> Hi all,
>
> If you look in the recent changes, most items have labels in English
> and those are shown in the recent changes and elsewhere (so we know
> what the item is about without opening first). But not all items have
> labels, and these items without English label are often items with
> only a label in Chinese, Arabic, Cyrillic script, Hebrew, etc. This
> forms a significant gap.
>
> Is there a way to easily make a transcription from one language to
> another?
> Or alternatively if there is a database that has such transcriptions?
>
> Also the other way round might be helpful for users of Wikidata that
> use/read it in Chinese, Arabic, Cyrillic script, Hebrew, etc.
>
> Thanks!
>
> Romaine




Re: [Wikidata] Label gaps on Wikidata

2017-02-22 Thread Gerard Meijssen
Hoi,
A wonderful read. The fun thing to realise is that this is all about static
data. It shows that, given good data, a lot can be inferred that is of
value. The one thing to realise is that Wikidata is not static, and this has
two components: as more statements are added, more items will have the
"minimum" amount of statements to provide an accurate biographic summary,
and there is a potential for vandalism. This last part can be remedied by
comparing Wikidata with existing Wikipedia articles; that will make
vandalism easier to spot.

So all in all, it is a wonderful read. What it does not cover is the
potential for using it for "other" languages. This is where this will make
even more of a difference. All it takes to add valid statements are labels.
For many items it can be said that the effect of one added label may impact
hundreds of thousands of items.
Thanks,
   GerardM

On 22 February 2017 at 11:14, Magnus Manske 
wrote:

> Relevant: https://arxiv.org/pdf/1702.06235.pdf
>
> On Wed, Feb 22, 2017 at 6:57 AM Gerard Meijssen 
> wrote:
>
>> Hoi,
>> You know, typically you are right. In the last few days I added members
>> of the chamber of deputies of Haiti. I used names from the English
>> Wikipedia but I am not sure that the names are correct. In one instance I
>> found that the first name was at the back for others I am not sure that we
>> have it right.
>>
>> The problem with rules are the exceptions and for automated approaches
>> you have to seriously consider these.
>> Thanks,
>>GerardM
>>
>> On 22 February 2017 at 07:46, Konstantinos Stampoulis 
>> wrote:
>>
>> Indeed in many cases a translation is needed, but for some languages and
>> specific types of entities what is needed is just a transcription if not
>> just a copy from the original language. For example names of humans or
>> settlements. I guess for some languages with the same script, one can just
>> copy the label, f.e. for a british person from en to fr.
>>
>>
>>
>> Konstantinos Stampoulis
>> ger...@geraki.gr
>> http://www.geraki.gr
>>
>>
>>
>> 2017-02-19 18:16 GMT+02:00 Smolenski Nikola :
>>
>> Quoting Romaine Wiki :
>> > If you look in the recent changes, most items have labels in English and
>> > those are shown in the recent changes and elsewhere (so we know what the
>> > item is about without opening first). But not all items have labels, and
>> > these items without English label are often items with only a label in
>> > Chinese, Arabic, Cyrillic script, Hebrew, etc. This forms a significant
>> gap.
>> >
>> > Is there a way to easily make a transcription from one language to
>> another?
>> > Or alternatively if there is a database that has such transcriptions?
>>
>> There is in many cases, however there are some problems associated with
>> it. You
>> may not know what is the original language to transcribe from, you might
>> need a
>> translation rather than transcription, if there are multiple labels you
>> have no
>> way to choose between them.
>>
>>
>>


Re: [Wikidata] Label gaps on Wikidata

2017-02-22 Thread Magnus Manske
Relevant: https://arxiv.org/pdf/1702.06235.pdf

On Wed, Feb 22, 2017 at 6:57 AM Gerard Meijssen 
wrote:

> Hoi,
> You know, typically you are right. In the last few days I added members of
> the chamber of deputies of Haiti. I used names from the English Wikipedia
> but I am not sure that the names are correct. In one instance I found that
> the first name was at the back for others I am not sure that we have it
> right.
>
> The problem with rules are the exceptions and for automated approaches you
> have to seriously consider these.
> Thanks,
>GerardM
>
> On 22 February 2017 at 07:46, Konstantinos Stampoulis 
> wrote:
>
> Indeed in many cases a translation is needed, but for some languages and
> specific types of entities what is needed is just a transcription if not
> just a copy from the original language. For example names of humans or
> settlements. I guess for some languages with the same script, one can just
> copy the label, f.e. for a british person from en to fr.
>
>
>
> Konstantinos Stampoulis
> ger...@geraki.gr
> http://www.geraki.gr
>
>
>
> 2017-02-19 18:16 GMT+02:00 Smolenski Nikola :
>
> Quoting Romaine Wiki :
> > If you look in the recent changes, most items have labels in English and
> > those are shown in the recent changes and elsewhere (so we know what the
> > item is about without opening first). But not all items have labels, and
> > these items without English label are often items with only a label in
> > Chinese, Arabic, Cyrillic script, Hebrew, etc. This forms a significant
> gap.
> >
> > Is there a way to easily make a transcription from one language to
> another?
> > Or alternatively if there is a database that has such transcriptions?
>
> There is in many cases, however there are some problems associated with
> it. You
> may not know what is the original language to transcribe from, you might
> need a
> translation rather than transcription, if there are multiple labels you
> have no
> way to choose between them.
>
>
>


Re: [Wikidata] Label gaps on Wikidata

2017-02-21 Thread Gerard Meijssen
Hoi,
You know, typically you are right. In the last few days I added members of
the chamber of deputies of Haiti. I used names from the English Wikipedia
but I am not sure that the names are correct. In one instance I found that
the first name was at the back; for others I am not sure that we have it
right.

The problem with rules is the exceptions, and for automated approaches you
have to seriously consider these.
Thanks,
   GerardM

On 22 February 2017 at 07:46, Konstantinos Stampoulis 
wrote:

> Indeed in many cases a translation is needed, but for some languages and
> specific types of entities what is needed is just a transcription if not
> just a copy from the original language. For example names of humans or
> settlements. I guess for some languages with the same script, one can just
> copy the label, f.e. for a british person from en to fr.
>
>
>
> Konstantinos Stampoulis
> ger...@geraki.gr
> http://www.geraki.gr
>
>
>
> 2017-02-19 18:16 GMT+02:00 Smolenski Nikola :
>
>> Quoting Romaine Wiki :
>> > If you look in the recent changes, most items have labels in English and
>> > those are shown in the recent changes and elsewhere (so we know what the
>> > item is about without opening first). But not all items have labels, and
>> > these items without English label are often items with only a label in
>> > Chinese, Arabic, Cyrillic script, Hebrew, etc. This forms a significant
>> gap.
>> >
>> > Is there a way to easily make a transcription from one language to
>> another?
>> > Or alternatively if there is a database that has such transcriptions?
>>
>> There is in many cases, however there are some problems associated with
>> it. You
>> may not know what is the original language to transcribe from, you might
>> need a
>> translation rather than transcription, if there are multiple labels you
>> have no
>> way to choose between them.
>>
>>
>>


Re: [Wikidata] Label gaps on Wikidata

2017-02-21 Thread Konstantinos Stampoulis
Indeed, in many cases a translation is needed, but for some languages and
specific types of entities what is needed is just a transcription, if not
just a copy from the original language, for example names of humans or
settlements. I guess for some languages with the same script, one can just
copy the label, e.g. for a British person from en to fr.
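
A minimal sketch of that copy rule, assuming labels have already been
loaded into a plain dictionary. The function name suggest_copied_label,
the set of Latin-script language codes, and the type names "human" and
"settlement" are hypothetical choices made only for illustration. Since
rules like this have exceptions, the result is meant as a suggestion for
a human editor, not something to write back automatically.

```python
from typing import Dict, Optional, Set

# Hypothetical groupings, chosen only for illustration.
LATIN_SCRIPT_LANGS: Set[str] = {"en", "fr", "de", "es", "it", "nl"}
COPYABLE_TYPES: Set[str] = {"human", "settlement"}


def suggest_copied_label(labels: Dict[str, str], item_type: str,
                         source: str, target: str) -> Optional[str]:
    """Suggest copying the source-language label into the target language.

    A suggestion is made only when the item is of a type whose name is
    usually kept as-is, both languages are in the same (Latin) script
    group, and the target label is still missing. Otherwise returns None.
    """
    if item_type not in COPYABLE_TYPES:
        return None
    if source not in LATIN_SCRIPT_LANGS or target not in LATIN_SCRIPT_LANGS:
        return None
    if labels.get(target):
        return None  # a target label already exists; nothing to suggest
    return labels.get(source)
```

For example, an item of type "human" with only an en label would get
that label proposed for fr, while an item whose target language uses a
different script would get no suggestion at all.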



Konstantinos Stampoulis
ger...@geraki.gr
http://www.geraki.gr


Contribute to Wikipedia. https://el.wikipedia.org
---
The above views are personal and represent only me. This message is
considered confidential only if I have explicitly requested it; otherwise
you may use it in any public discussion. I have nothing to hide. :-)


2017-02-19 18:16 GMT+02:00 Smolenski Nikola :

> Quoting Romaine Wiki :
> > If you look in the recent changes, most items have labels in English and
> > those are shown in the recent changes and elsewhere (so we know what the
> > item is about without opening first). But not all items have labels, and
> > these items without English label are often items with only a label in
> > Chinese, Arabic, Cyrillic script, Hebrew, etc. This forms a significant
> gap.
> >
> > Is there a way to easily make a transcription from one language to
> another?
> > Or alternatively if there is a database that has such transcriptions?
>
> There is in many cases, however there are some problems associated with
> it. You
> may not know what is the original language to transcribe from, you might
> need a
> translation rather than transcription, if there are multiple labels you
> have no
> way to choose between them.
>
>
>


Re: [Wikidata] Label gaps on Wikidata

2017-02-21 Thread Gerard Meijssen
Hoi,
Labels are a priority and do need attention, but descriptions do not, even
though they are a mess. Descriptions are often added by bot and are
based on an initial set of statements. They are typically not revisited and,
as statements are added, it becomes increasingly obvious how poorly they
represent the item involved.

It is an old argument but here we go again. The automated descriptions as
developed by Magnus are superior. Like the bot-generated descriptions, they
are based on statements, but they are generated as and when they are needed
and they do allow for other languages. For me the most crucial part is that
when I need disambiguation, I add statements to good effect. Yes, you may
want descriptions in a dump, but when an algorithm exists, it is possible to
run it at dump time as well. My point is that technical issues do not trump
usefulness. As it is, a lot of time is wasted on something that is obviously
below par, something that does not even work well for English.
Thanks,
  GerardM

On 22 February 2017 at 01:04, Nick Wilson (Quiddity) 
wrote:

> On Mon, Feb 20, 2017 at 9:59 PM, Smolenski Nikola 
> wrote:
> > Quoting "Nick Wilson (Quiddity)" :
> >> 2) Translation
> >> I also agree that a machine-translation /suggestion/ or /hint/ would be
> a
> >> nice option. The main concern is users who don't understand the
> limitations
> >> of machine-translation and whom must resist the urge to just copy
> >
> > It should be possible, perhaps even preferred, to show translation of
> the most
> > common descriptions, done on translatewiki. Thus all the descriptions
> like
> > "Wikipedia disambiguation page", "Wikimedia category" etc could be
> visible in
> > all languages.
> >
>
> I think this (good) example is for a slightly different feature, which
> means that there are 2 distinct feature-requests:
>
> -
>
> 1) For unique item descriptions (the main focus of this mailing list
> thread), we want to find a way to "suggest" descriptions to editors,
> based on machine-translations of existing descriptions in other
> languages.
>
> 1a) This could be a new task in phabricator? (per discussion in this
> thread)
>
> 1b) (Probably a very-long-term goal?) This could also perhaps be
> https://phabricator.wikimedia.org/T64695 "Draft a computer-assisted
> translation system for Wikidata labels/descriptions"
> which discusses the scaling problems, and suggests that we might
> EVENTUALLY want semi-automated description updates, at least in some
> items, similar to how Reasonator works.
> I suspect it would be best to keep those 2 ideas separate, hence I
> suggest filing a new task for (1a).
>
> --
>
> 2) A way for generic description translations, to be automatically
> added to some items.
>
> 2a) For very common & wikimedia-focused descriptions, this seems to be
> /periodically/ handled by bots.
> E.g. for Disambiguation items, it looks like User:MilanBot currently
> handles this task, for example:
> * https://www.wikidata.org/w/index.php?title=Q260478=history
> * https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/MilanBot
> E.g. for Category items, it looks like ValterVBot currently handles
> this task, for example:
> * https://www.wikidata.org/w/index.php?title=Q6939670=198113824=197219107
> * https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/ValterVBot
>
> This task, https://phabricator.wikimedia.org/T139912 seems to track
> the idea of properly automating it all, and it links to an onwiki
> discussion that has many more details. I don't understand the
> technical discussions, or current state of development, enough to even
> attempt to summarize.
>
>
> 2b) For other common descriptions, these translations all seem to be
> manually added?
> E.g. for items with the description "scientific journal article" or
> "scientific article".
> * https://www.wikidata.org/wiki/Q28510879 and
> https://www.wikidata.org/wiki/Q28579322 and
> https://www.wikidata.org/wiki/Q28298612 and I think thousands more?
> However, these are probably not a best practice that we want to
> encourage, per https://www.wikidata.org/wiki/Help:Description and per
> some of the descriptions in other languages being more precise (e.g.
> "vedecký článok (publikovaný 2009-01)" ).
> Therefore, this (2b) cluster probably belongs more with the (1a/1b)
> set of feature-requests, and should not be mass-replicated across
> Wikidata.
>
>
> I hope that's mostly accurate...
> Quiddity
>


Re: [Wikidata] Label gaps on Wikidata

2017-02-21 Thread Nick Wilson (Quiddity)
On Mon, Feb 20, 2017 at 9:59 PM, Smolenski Nikola  wrote:
> Quoting "Nick Wilson (Quiddity)" :
>> 2) Translation
>> I also agree that a machine-translation /suggestion/ or /hint/ would be a
>> nice option. The main concern is users who don't understand the limitations
>> of machine-translation and whom must resist the urge to just copy
>
> It should be possible, perhaps even preferred, to show translation of the most
> common descriptions, done on translatewiki. Thus all the descriptions like
> "Wikipedia disambiguation page", "Wikimedia category" etc could be visible in
> all languages.
>

I think this (good) example is for a slightly different feature, which
means that there are 2 distinct feature-requests:

-

1) For unique item descriptions (the main focus of this mailing list
thread), we want to find a way to "suggest" descriptions to editors,
based on machine-translations of existing descriptions in other
languages.

1a) This could be a new task in phabricator? (per discussion in this thread)

1b) (Probably a very-long-term goal?) This could also perhaps be
https://phabricator.wikimedia.org/T64695 "Draft a computer-assisted
translation system for Wikidata labels/descriptions"
which discusses the scaling problems, and suggests that we might
EVENTUALLY want semi-automated description updates, at least in some
items, similar to how Reasonator works.
I suspect it would be best to keep those 2 ideas separate, hence I
suggest filing a new task for (1a).

--

2) A way for generic description translations to be automatically
added to some items.

2a) For very common & wikimedia-focused descriptions, this seems to be
/periodically/ handled by bots.
E.g. for Disambiguation items, it looks like User:MilanBot currently
handles this task, for example:
* https://www.wikidata.org/w/index.php?title=Q260478=history
* https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/MilanBot
E.g. for Category items, it looks like ValterVBot currently handles
this task, for example:
* https://www.wikidata.org/w/index.php?title=Q6939670=198113824=197219107
* https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/ValterVBot

This task, https://phabricator.wikimedia.org/T139912 seems to track
the idea of properly automating it all, and it links to an onwiki
discussion that has many more details. I don't understand the
technical discussions, or current state of development, enough to even
attempt to summarize.


2b) For other common descriptions, these translations all seem to be
manually added?
E.g. for items with the description "scientific journal article" or
"scientific article".
* https://www.wikidata.org/wiki/Q28510879 and
https://www.wikidata.org/wiki/Q28579322 and
https://www.wikidata.org/wiki/Q28298612 and I think thousands more?
However, these are probably not a best practice that we want to
encourage, per https://www.wikidata.org/wiki/Help:Description and per
some of the descriptions in other languages being more precise (e.g.
"vedecký článok (publikovaný 2009-01)" ).
Therefore, this (2b) cluster probably belongs more with the (1a/1b)
set of feature-requests, and should not be mass-replicated across
Wikidata.


I hope that's mostly accurate...
Quiddity



Re: [Wikidata] Label gaps on Wikidata

2017-02-20 Thread Smolenski Nikola
Quoting "Nick Wilson (Quiddity)" :
> 2) Translation
> I also agree that a machine-translation /suggestion/ or /hint/ would be a
> nice option. The main concern is users who don't understand the limitations
> of machine-translation and whom must resist the urge to just copy

It should be possible, perhaps even preferred, to show translation of the most
common descriptions, done on translatewiki. Thus all the descriptions like
"Wikipedia disambiguation page", "Wikimedia category" etc could be visible in
all languages.





Re: [Wikidata] Label gaps on Wikidata

2017-02-20 Thread Nick Wilson (Quiddity)
1) Gap:
I do agree it would be good to promote these backlogs, as two of the
easiest ones for newcomers to work on. (Although there are guidelines and
best-practices, and any backlog promotion should clearly point to those
documentation pages, so that newcomers can have a ready-reference).

2) Translation
I also agree that a machine-translation /suggestion/ or /hint/ would be a
nice option. The main concern is users who don't understand the limitations
of machine-translation and who must resist the urge to just copy
(This goes for language fluency, and also for technical-vocabulary
fluency, e.g. I could not give a confident description of most chemistry or
physics articles, even with numerous machine-translation-based suggestions
or the article itself!)

I can't see anything specifically about this in Phabricator, so it's
probably worth filing a feature request, unless someone else points out a
task I missed, or raises an overwhelming concern. [Note: a semi-related
task to link in the SeeAlso of the new one: T71345]

3) Tools:
Is it currently possible to get a list of items without a label/description
in language X?
I tried a few weeks ago, and the onwiki Special pages were broken. I filed
https://phabricator.wikimedia.org/T157884 "Nothing loads on
Special:EntitiesWithoutDescription
or Special:EntitiesWithoutLabel results" to cover this problem.
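
The check itself is simple to sketch offline. This is a minimal sketch,
assuming entity records in the Wikidata JSON layout (where "labels" and
"descriptions" each map a language code to an object with a "value"
field) have already been obtained, for example from a dump. The function
name missing_terms and any record ids used with it are illustrative
assumptions.

```python
from typing import Iterable, List


def missing_terms(entities: Iterable[dict], lang: str,
                  term: str = "labels") -> List[str]:
    """Return the ids of entities that lack a label/description in `lang`.

    `term` must be "labels" or "descriptions". Each entity is expected to
    look like {"id": "Q...", "labels": {"en": {"language": "en",
    "value": "..."}}, "descriptions": {...}}.
    """
    if term not in ("labels", "descriptions"):
        raise ValueError("term must be 'labels' or 'descriptions'")
    return [
        entity["id"]
        for entity in entities
        # A missing language key or an empty value both count as a gap.
        if not entity.get(term, {}).get(lang, {}).get("value")
    ]
```

Calling missing_terms(records, "en") lists the items with no English
label; passing "descriptions" as the third argument does the same for
descriptions.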

Ah, I now see https://tools.wmflabs.org/wikidata-terminator/? which works
for missing descriptions.
However the "with missing labels" set of links seems to be broken for most
languages. Sjoerd filed
https://bitbucket.org/magnusmanske/wikidata-todo/issues/45/terminator-top-1000-linked-items-with
and I've added some example links.

The other links that are listed are all outdated (
https://www.wikidata.org/wiki/Wikidata:WikiProject_Labels_and_descriptions#List_of_items_without_labels_and.2For_descriptions
and below).

I wonder if we should add a link to
https://tools.wmflabs.org/wikidata-game/distributed/#game=23 ("Kaspar's
Persondata game: Descriptions") in that list? AFAIK it only contains
English suggestions though.

Are there any other tools which help with listing or processing these
particular backlogs?


Quiddity
(Volunteer hat. This is just the address I use to subscribe to this list)


Re: [Wikidata] Label gaps on Wikidata

2017-02-19 Thread Smolenski Nikola
Quoting Romaine Wiki :
> If you look in the recent changes, most items have labels in English and
> those are shown in the recent changes and elsewhere (so we know what the
> item is about without opening first). But not all items have labels, and
> these items without English label are often items with only a label in
> Chinese, Arabic, Cyrillic script, Hebrew, etc. This forms a significant gap.
> 
> Is there a way to easily make a transcription from one language to another?
> Or alternatively if there is a database that has such transcriptions?

There is in many cases; however, there are some problems associated with it. You
may not know which original language to transcribe from, you might need a
translation rather than a transcription, and if there are multiple labels you
have no way to choose between them.


