[Wikidata-l] Classes and Properties browser update

2014-09-08 Thread Markus Krötzsch

Hi,

I just updated the data for the Wikidata classes and properties browser 
[1] -- was about time -- and added some improvements on the way:



(1) Classes and properties are now always ordered by usage (most used 
first), which was not possible to do before. Examples:


** properties related to humans (or anything else with sex or gender) 
ordered by usage:


http://tools.wmflabs.org/wikidata-exports/miga/#_cat=Properties/Related%20properties=sex%20or%20gender

(for properties, usage includes the use in qualifiers and references)

** Most used months:

http://tools.wmflabs.org/wikidata-exports/miga/#_cat=Classes/All%20superclasses=month

Seems that May is most popular so far. Thanks to whoever added the
pretty pictures :-) You can replace the word "month" with other things,
such as "band", "building", or "mythical character", to see what kinds of
these things we have. In fact, the individual pages for the classes
also show the same list at the bottom, but without any pictures.
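
As a small illustration of swapping in another label, here is a sketch
that builds such a URL. The URL pattern is copied from the "month" link
above; the function name and constants are made up for this example, and
it assumes the browser expects the English label percent-encoded, as in
the links above.

```python
from urllib.parse import quote

# Base address of the browser, as given in this mail.
BASE_URL = "http://tools.wmflabs.org/wikidata-exports/miga/"

# Fragment prefix copied from the "month" example link.
SUPERCLASS_PREFIX = "#_cat=Classes/All%20superclasses="


def superclass_url(label):
    """Return the browser URL listing classes below the given class label.

    Hypothetical helper: it only percent-encodes the label and appends it
    to the pattern seen in the example link.
    """
    return BASE_URL + SUPERCLASS_PREFIX + quote(label)


if __name__ == "__main__":
    for label in ("month", "band", "building", "mythical character"):
        print(superclass_url(label))
```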



(2) Classes with the same English label are no longer confused. This 
fixes ambiguities and wrong links for many things.



The data is from 1st September.

Cheers,

Markus

[1] http://tools.wmflabs.org/wikidata-exports/miga/
Reload the page (CTRL+R) to get the new data.

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Classes and Properties browser update

2014-09-08 Thread Jeroen De Dauw
Hey,

\o/

Where are the source code and issue tracker for this? Probably good if
those were linked from the tool.

If you load this in Firefox, it spends several seconds loading, after which
one gets the "use another browser" error. Would be nice if this were shown
before the rest was loaded. Of course it'd be much nicer if the biggest
free browser could also be supported.

 http://tools.wmflabs.org/wikidata-exports/miga/#_item=1204

That first shows population. When you then click on the link, you see the
data type is quantity, not string.

Cheers

--
Jeroen De Dauw - http://www.bn2vs.com
Software craftsmanship advocate
Evil software architect at Wikimedia Germany
~=[,,_,,]:3


Re: [Wikidata-l] Classes and Properties browser update

2014-09-08 Thread Markus Krötzsch

On 08.09.2014 14:27, Jeroen De Dauw wrote:

Hey,

\o/

Where are the source code and issue tracker for this? Probably good if
those were linked from the tool.


True, but it's not quite in our master branch yet: the code is part of 
the extended WDTK examples module, see


https://github.com/Wikidata/Wikidata-Toolkit/tree/cleaner-examples/wdtk-examples

This currently depends on the yet-to-be-completed branch of WDTK that 
has the support for the new JSON dumps and format:


https://github.com/Wikidata/Wikidata-Toolkit/pull/91

Right now, this still needs more testing before it can be merged.
Because of the change in the XML dump format, the master branch of WDTK
is not currently able to process any of the recent dumps, hence the
example would not work there.


Anyway, you could use the Wikidata Toolkit issue tracker already.



If you load this in Firefox, it spends several seconds loading, after
which one gets the "use another browser" error. Would be nice if this
were shown before the rest was loaded. Of course it'd be much nicer if
the biggest free browser could also be supported.


Yes, this is because of the tool we use (Miga), which is not part of our 
code. If you were asking for this above, you could have a look at 
http://migadv.com/.




  http://tools.wmflabs.org/wikidata-exports/miga/#_item=1204

That first shows population. When then clicking on the link, you see
the data type is quantity, not string.


Yes, I think this is a bug in how we use IRIs and labels for datatypes. 
The main id now is the label, but the properties all use the IRI to 
refer to the datatype. Seems that this does not work properly in Miga. I 
will try to make a version with the labels used everywhere.


Markus




Re: [Wikidata-l] Classes and Properties browser update

2014-09-08 Thread Markus Krötzsch

On 08.09.2014 14:53, Markus Krötzsch wrote:
...




  http://tools.wmflabs.org/wikidata-exports/miga/#_item=1204

That first shows population. When then clicking on the link, you see
the data type is quantity, not string.


Yes, I think this is a bug in how we use IRIs and labels for datatypes.
The main id now is the label, but the properties all use the IRI to
refer to the datatype. Seems that this does not work properly in Miga. I
will try to make a version with the labels used everywhere.


Ok, fixed in the code and on the Web (requires reload). The only bug we
still have is that the recent Monolingual Text datatype is not known
yet. It appears as "Unknown" in properties. Will be fixed when we have
it in the parser.


Markus



Re: [Wikidata-l] Classes and Properties browser update

2014-09-08 Thread Benjamin Good
How are related properties calculated?

Is the definition of a Class something that has a subclass relationship?
 Or?

Very cool...

-Ben





Re: [Wikidata-l] Classes and Properties browser update

2014-09-08 Thread Markus Krötzsch

On 08.09.2014 19:02, Benjamin Good wrote:

How are related properties calculated?


Let me start with the second question:



Is the definition of a Class something that has a subclass
relationship?  Or?


Basically yes: a class is something that participates in a "subclass of"
relation, or that is used as a value for "instance of". Moreover, for
the display, I filter out all the classes that only occur as a subclass
in "subclass of" (no own instances or subclasses) to reduce data size a bit.
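
To make the rule concrete, here is a minimal sketch of it in Python. It
is not the WDTK code: the triple format, the function name, and the toy
data are assumptions for illustration only.

```python
def find_classes(statements):
    """Split items into (all_classes, displayed_classes).

    `statements` is an iterable of (subject, property, value) triples,
    a simplified stand-in for Wikidata statements (assumed format).
    """
    subclasses = set()       # items appearing as subject of "subclass of"
    has_subclasses = set()   # items appearing as value of "subclass of"
    has_instances = set()    # items appearing as value of "instance of"

    for subject, prop, value in statements:
        if prop == "subclass of":
            subclasses.add(subject)
            has_subclasses.add(value)
        elif prop == "instance of":
            has_instances.add(value)

    # A class is anything taking part in "subclass of" or used as a
    # value of "instance of".
    all_classes = subclasses | has_subclasses | has_instances

    # For display, drop classes that occur only as a subclass, i.e.
    # that have neither instances nor subclasses of their own.
    displayed = has_subclasses | has_instances
    return all_classes, displayed


if __name__ == "__main__":
    toy = [
        ("dog", "subclass of", "mammal"),
        ("cat", "subclass of", "mammal"),
        ("Fido", "instance of", "dog"),
        ("Berlin", "instance of", "city"),
    ]
    all_classes, displayed = find_classes(toy)
    print(sorted(all_classes))
    print(sorted(displayed))
```

In the toy data, "cat" counts as a class but is filtered from the
display, because nothing is an instance or a subclass of it.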


I calculate related properties for properties and classes. For classes,
I look at the items that are "instance of" the class (direct instances,
"subclass of" is ignored). For properties, I look at the items that have a
statement with this property.


For each of the items I look at, I count how often other properties 
(potentially related properties) occur in their statements. From this 
I can compute which ratio of the items (in a class or with a property) 
have some other property. If this ratio is notably higher than the ratio 
of overall items using the property, then I consider it as related.


The idea is to find properties that are "typical": they should be
notably more likely to occur with an item in this class than they would
be in general. This also helps to filter out properties that occur
everywhere (boring ones, like "image" or "freebase identifier"): they
are most frequent but not what you want to know most about when you look
at a specific class. I have some custom scoring function in the code
that I tweaked to adjust this until it seemed right, but there is no
deeper principle behind this.
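
The ratio comparison described above could be sketched roughly as
follows. This is not the actual scoring function from the code: the data
layout, the function name, and the fixed `min_lift` threshold are
assumptions chosen only to illustrate the idea of comparing a local
ratio against the overall ratio.

```python
from collections import Counter


def related_properties(item_props, selected, min_lift=2.0, exclude=()):
    """Return properties that are notably more common among `selected`.

    item_props: dict mapping item id -> set of property labels it uses.
    selected:   ids of the items to look at (direct instances of a
                class, or items that have a statement with a property).
    min_lift:   hypothetical threshold; a property counts as related if
                its ratio among the selected items is at least this many
                times its ratio among all items.
    exclude:    properties to skip, e.g. the property being analysed.
    """
    selected = list(selected)
    if not selected or not item_props:
        return []

    # How many items overall use each property.
    overall = Counter()
    for props in item_props.values():
        overall.update(props)

    # How many of the selected items use each property.
    local = Counter()
    for item_id in selected:
        local.update(item_props[item_id])

    results = []
    for prop, count in local.items():
        if prop in exclude:
            continue
        local_ratio = count / len(selected)
        overall_ratio = overall[prop] / len(item_props)
        lift = local_ratio / overall_ratio
        if lift >= min_lift:
            results.append((prop, local_ratio, lift))

    # Most used first, then alphabetically for a stable order.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results


if __name__ == "__main__":
    toy = {
        "Q1": {"image", "sex or gender", "date of birth"},
        "Q2": {"image", "sex or gender", "date of birth"},
        "Q3": {"image", "population"},
        "Q4": {"image", "population"},
        "Q5": {"image"},
        "Q6": {"population"},
    }
    humans = ["Q1", "Q2"]
    for prop, ratio, lift in related_properties(toy, humans):
        print(prop, ratio, lift)
```

In this toy data, "image" is used by every selected item but also by
almost every item overall, so it falls below the threshold, which is the
filtering effect described above.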




Very cool...


Thanks :-)

Markus


