Re: [Wikitech-l] The never-dying topic: category intersection (been there done that)

2008-12-02 Thread Daniel Schwen
> SELECT * FROM ( SELECT page_title,count(cl_to) AS cnt FROM
> page,categorylinks WHERE page_id=cl_from AND cl_to in ( "Frau" ,
> "Geboren_1901" , "Gestorben_1986" ) GROUP BY cl_from ) AS tbl1 WHERE
> tbl1.cnt = 3 ;
Mh, yeah, that is pretty much the same idea that my
http://toolserver.org/~dschwen/intersection/
uses

Except that I'm using several queries into a temporary table instead of 
assembling one query with subqueries (plus it also supports 
link-intersection).

But then again, my tool supports deep indexing, which _needs_ multiple queries. 
So I opted for flexibility here.
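
For anyone who wants to try the single-query variant quoted above, here is a minimal sketch. It is not the code behind the toolserver tool. It assumes Python's sqlite3 module with a toy in-memory schema holding only the columns the quoted query touches. The helper names (intersect_categories, demo_connection) and the sample pages are made up for illustration, and it uses HAVING COUNT(DISTINCT ...) where the quoted query wraps the count in a subquery.

```python
import sqlite3


def intersect_categories(conn, categories):
    """Return titles of pages that belong to every category in `categories`."""
    cats = sorted(set(categories))
    if not cats:
        return []
    placeholders = ",".join("?" for _ in cats)
    sql = (
        "SELECT page_title FROM page "
        "JOIN categorylinks ON page_id = cl_from "
        f"WHERE cl_to IN ({placeholders}) "
        "GROUP BY page_id, page_title "
        # A page qualifies only if it matched all requested categories.
        "HAVING COUNT(DISTINCT cl_to) = ? "
        "ORDER BY page_title"
    )
    return [row[0] for row in conn.execute(sql, (*cats, len(cats)))]


def demo_connection():
    """Build a tiny in-memory database with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO page VALUES (?, ?)",
        [(1, "Page_A"), (2, "Page_B"), (3, "Page_C")],
    )
    conn.executemany(
        "INSERT INTO categorylinks VALUES (?, ?)",
        [
            (1, "Frau"), (1, "Geboren_1901"), (1, "Gestorben_1986"),
            (2, "Frau"), (2, "Geboren_1901"),
            (3, "Mann"), (3, "Geboren_1901"), (3, "Gestorben_1986"),
        ],
    )
    return conn
```

With the toy data, intersect_categories(demo_connection(), ["Frau", "Geboren_1901", "Gestorben_1986"]) returns only the page filed under all three categories. This only works as-is when categories are atomic.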

> Task: On German Wikipedia (yay atomic categories!), find women who
Yeah this is all nice and fine, but we've discussed this issue ad nauseam:

* Atomic categories = _trivial_ intersection
* Non-atomic categories = total bullshit that makes me vomit (sorry guys!)

I find it a little frustrating that this wheel gets reinvented so often. My 
tool was used a couple of times after I posted it, and now has maybe one user 
per day (from a quick glance at the logs). What is going on here? I'm starting 
to think that nobody actually gives a damn about category intersection, except 
for a couple of vocal people on the mailing list. And out of these only a 
fraction actually _works_ on the problem.

So we have shown multiple times now that cat intersection is technically 
feasible. What we need now is massive lobbying for atomic categorisation. 
THAT is the hurdle right now IMO. Not some SQL queries.

-- 
[[en:User:Dschwen]]
[[de:Benutzer:Dschwen]]
[[commons:User:Dschwen]]

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that)

2008-12-02 Thread Aryeh Gregor
On Tue, Dec 2, 2008 at 11:01 AM, Daniel Schwen <[EMAIL PROTECTED]> wrote:
> So we have shown multiple times now that cat intersection is technically
> feasible. What we need now is massive lobbying for atomic categorisation.
> THAT is the hurdle right now IMO. Not some SQL queries.

I'd say that what we need is someone to add proper support for this to
the core software and get it enabled on Wikimedia sites, actually.  A
toolserver tool is just not the same as having the feature integrated
into the software, in terms of usage levels.  It might be that the
implementations written so far are not efficient enough for enabling
on Wikimedia, but nobody with commit access has even tried.


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that)

2008-12-02 Thread Daniel Schwen
> I'd say that what we need is someone to add proper support for this to
> the core software and get it enabled on Wikimedia sites, actually.  A

Then I suggest that Magnus immediately stops working on it, or else his curse 
of never getting anything into the core might strike ;-)
-- 
[[en:User:Dschwen]]
[[de:Benutzer:Dschwen]]
[[commons:User:Dschwen]]


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that)

2008-12-02 Thread Aryeh Gregor
On Tue, Dec 2, 2008 at 11:20 AM, Daniel Schwen <[EMAIL PROTECTED]> wrote:
> Then I suggest that Magnus immediately stops working on it, or else his curse
> of never getting anything into the core might strike ;-)

Doesn't he have commit access?


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that)

2008-12-02 Thread Daniel Schwen
> > curse of never getting anything into the core might strike ;-)
> Doesn't he have commit access?
True, but his gems never seem to get enabled...
-- 
[[en:User:Dschwen]]
[[de:Benutzer:Dschwen]]
[[commons:User:Dschwen]]


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that)

2008-12-02 Thread Magnus Manske
On Tue, Dec 2, 2008 at 4:33 PM, Aryeh Gregor
<[EMAIL PROTECTED]> wrote:
> On Tue, Dec 2, 2008 at 11:20 AM, Daniel Schwen <[EMAIL PROTECTED]> wrote:
>> Then I suggest that Magnus immediately stops working on it, or else his curse
>> of never getting anything into the core might strike ;-)
>
> Doesn't he have commit access?

Yes, but Brion has revert access :-)


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that)

2008-12-02 Thread Nikola Smolenski
On Tuesday 02 December 2008 17:01:30 Daniel Schwen wrote:
> I find it a little frustrating that this wheel gets reinvented so often. My
> tool was used a couple of times after I posted it, and now has maybe one
> user per day (from a quick glance at the logs). What is going on here? I'm
> starting to think that nobody actually gives a damn about category
> intersection, except for a couple of vocal people on the mailing list. And
> out of these only a fraction actually _works_ on the problem.

Perhaps just category intersection isn't enough. I was thinking about a tool 
that would allow intersection of various article data, including the 
categories.

For example, suppose that I am maintaining [[1991 in art]]. I would want to 
find all articles that link to [[1991]] and are in a subcategory of 
[[:Category:Visual arts]]. And don't even get me started about what could be 
done if template parameters were recorded somewhere...
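
A rough sketch of that kind of query, for illustration only. It assumes three hypothetical in-memory mappings (LINKS, PAGE_CATS, SUBCATS) with invented sample data rather than the real link and category tables, and it walks the subcategory graph breadth-first with a visited set so that loops in the category tree cannot hang it.

```python
from collections import deque

# Hypothetical sample data; note the deliberate loop Painting -> Visual arts.
SUBCATS = {"Visual arts": ["Painting", "Sculpture"], "Painting": ["Visual arts"]}
PAGE_CATS = {
    "Article_A": ["Painting"],
    "Article_B": ["Sculpture"],
    "Article_C": ["Music"],
}
LINKS = {"Article_A": ["1991"], "Article_B": ["1990"], "Article_C": ["1991"]}


def categories_below(root, subcats):
    """Collect `root` and every category reachable below it."""
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in subcats.get(current, ()):
            if child not in seen:  # skip categories already visited (cycle guard)
                seen.add(child)
                queue.append(child)
    return seen


def linked_and_categorised(target, root, links, page_cats, subcats):
    """Pages that link to `target` and sit in `root` or any category below it."""
    wanted = categories_below(root, subcats)
    return sorted(
        page
        for page, cats in page_cats.items()
        if target in links.get(page, ()) and wanted & set(cats)
    )
```

Calling linked_and_categorised("1991", "Visual arts", LINKS, PAGE_CATS, SUBCATS) picks out the one sample article that both links to 1991 and is filed somewhere under Visual arts.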


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that)

2008-12-02 Thread Lars Aronsson
Daniel Schwen wrote:

> I find it a little frustrating that this wheel gets reinvented 
> so often. My tool was used a couple of times after I posted it, 
> and now has maybe one user per day (from a quick glance at the 

Users of the Swedish Wikipedia are increasingly starting to use 
Duesentrieb's CatScan tool.  It is really useful, but could use 
some further improvement, especially in the handling of large 
categories.

> So we have shown multiple times now that cat intersection is 
> technically feasible. What we need now is massive lobbying for 
> atomic categorisation. THAT is the hurdle right now IMO. Not 
> some SQL queries.

After a lengthy discussion (over many years) about category:tennis 
players and category:female tennis players in the Swedish 
Wikipedia, I created in late August 2008 the category:men and 
category:women, so that all profession categories could be freed 
from the burden of also documenting the gender.  The Swedish 
Wikipedia still has a category:Danish tennis players (combining 
profession and nationality), just like the English Wikipedia, but 
gender is now documented separately, as in the German Wikipedia.

All three languages have a category:1942 births.  I think no 
language of Wikipedia has a combined category for tennis players 
born in 1942.  So the question of atomic categories is not an 
absolute.  It is more or less implemented everywhere.  For finding 
tennis players born in 1942, even the English Wikipedia needs to 
do cross sectioning of categories.

Radically changing the categorization system is not realistic.  
It was a huge effort already to introduce men/women in the Swedish 
Wikipedia, even though this was just adding categories (not 
removing any), and even though Swedish is not among the ten largest 
Wikipedias. Within 3 months (September-November), some 75,000 
articles were categorized, of which 15,000 were women and 60,000 
men. The ratio 1:4 (1 woman for every 4 men) is far more equal than 
the 1:6 ratio of the German Wikipedia.

What I discovered then was that of these 75,000 biographies, only 
60,000 were categorized according to year of birth.  So we now 
have to add birth-year categories to 15,000 articles before we can 
compile reliable statistics on how the gender imbalance shifts over 
time. Early estimates show that there is a 1:10 gender ratio in the 
18th century and a 1:3 ratio for those born in the 1970s.

So the larger imbalance (1:6) of the German Wikipedia might be 
explained by its having a larger number of 18th-century biographies.



-- 
  Lars Aronsson ([EMAIL PROTECTED])
  Aronsson Datateknik - http://aronsson.se


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that)

2008-12-02 Thread Daniel Schwen
> born in 1942.  So the question of atomic categories is not an
> absolute.  It is more or less implemented everywhere.  For finding

I'm going to stop you right here!
One word: 'commons'

'Nuff said.
-- 
[[en:User:Dschwen]]
[[de:Benutzer:Dschwen]]
[[commons:User:Dschwen]]


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that)

2008-12-03 Thread David Gerard
2008/12/3 Aerik <[EMAIL PROTECTED]>:

> I'm with you - we've shown feasibility in large datasets with a lucene based
> approach, and I think we need to roll it out and test it with real users on
> real data.  We need a new lucene index and a user interface (needs to be
> defined) suitable for average users to find useful.  I'm thinking of a "browse
> related categories" type of function.


Write something the Commons cabal(tm) will love and you'll be most
rewarded with joy and happy users and stuff.


- d.


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of two)

2008-12-02 Thread Daniel Schwen
> Perhaps just category intersection isn't enough. I was thinking about a
> tool that would allow intersection of various article data, including the
> categories.
>
> For example, suppose that I am maintaining [[1991 in art]]. I would want to
> find all articles that link to [[1991]] and are in a subcategory of
> [[:Category:Visual arts]]. And don't even get me started about what could
> be done if template parameters were recorded somewhere...

Is anybody actually reading what I write? Skimming apparently does not cut it!

Go to
http://toolserver.org/~dschwen/intersection/

It does precisely that. Category-intersection plus Link-intersection.
Also see:

http://en.wikipedia.org/wiki/Wikipedia:Link_intersection
http://en.wikipedia.org/wiki/Wikipedia:Category_intersection

-- 
[[en:User:Dschwen]]
[[de:Benutzer:Dschwen]]
[[commons:User:Dschwen]]


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)

2008-12-03 Thread Roan Kattouw
We had a pretty lengthy discussion about this before the summer, and the 
consensus seemed to be that a fulltext-based approach looked most 
viable. I actually wrote an extension that does that, and promised to 
release it soon; that was quite a few months ago, and I never got around 
to it. I'll release it properly when I have time, which will hopefully 
be before Christmas :D

The code needs some tweaking and refactoring, though. It's pretty 
tightly integrated with the article text search (both functions in one 
form) and has all kinds of weird features, because the guy who paid me 
to write it wanted them. It also doesn't support three-letter word 
searching (which core does these days, using a prefix hack), which is 
pretty bad since categories with short titles (or stopword titles) won't 
be found either.

Roan Kattouw (Catrope)


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)

2008-12-03 Thread Daniel Schwen
> We had a pretty lengthy discussion about this before the summer, and the
> consensus seemed to be that a fulltext-based approach looked most
> viable.

So how does this take care of deep indexing non-atomic categories? 
=>How will this extension be even remotely useful for let's say commons?

This discussion is far from over. The basic problems are _not_ solved. 

I'm sure this thread will die out soon. 
Half of the participants will again be soothed by the promise of some easy 
solution just barely beyond the horizon, while the half that realizes that 
said solution _cannot possibly work_ without a radical reform of the category 
system will again be too annoyed (I'm getting there already) to continue 
discussing.

Déjà vu...
-- 
[[en:User:Dschwen]]
[[de:Benutzer:Dschwen]]
[[commons:User:Dschwen]]


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)

2008-12-03 Thread David Gerard
2008/12/3 Daniel Schwen <[EMAIL PROTECTED]>:

> I'm sure this thread will die out soon.
> Half of the participants will again be soothed by the promise of some easy
> solution just barely beyond the horizon, while the half that realizes that
> said solution _cannot possibly work_ without a radical reform of the category
> system will again be too annoyed (I'm getting there already) to continue
> discussing.


If the machinery is in place to replace the present ridiculous
sub-sub-sub-categories with something that *does their job just as
well*, they'll die in quite reasonable order.

If the machinery can't completely replace them without editor pain,
it'll fail. If it can, it won't and Commons will be ENORMOUSLY happy
'cos we can then go wild treating cats like tags!


- d.


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)

2008-12-03 Thread Roan Kattouw
Daniel Schwen wrote:
>> We had a pretty lengthy discussion about this before the summer, and the
>> consensus seemed to be that a fulltext-based approach looked most
>> viable.
>> 
>
> So how does this take care of deep indexing non-atomic categories? 
>   
Err.. what? Please explain what you mean by that.
> =>How will this extension be even remotely useful for let's say commons?
>   
Without addressing Commons in particular, having an efficient way to get 
pages in the intersection of multiple categories would allow wikis to 
delete a category such as [[Category:Deceased Presidents of the United 
States]] and replace it by, say, [[Intersection:Deceased Presidents of 
the United States]], which would list all articles in 
[[Category:Deceased people]] and [[Category:Presidents of the United 
States]]. My extension alone doesn't make that possible, but it makes 
implementing such a feature considerably easier.
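
To make the [[Intersection:...]] idea concrete, here is a minimal sketch. The Intersection: pseudo-namespace does not exist. The INTERSECTIONS and MEMBERS tables and the list_intersection helper are hypothetical names with placeholder data, and this is not how the extension itself is implemented.

```python
# Hypothetical definition of an intersection page: title -> component categories.
INTERSECTIONS = {
    "Deceased Presidents of the United States": (
        "Deceased people",
        "Presidents of the United States",
    ),
}

# Placeholder category membership; a real wiki would query its category tables.
MEMBERS = {
    "Deceased people": {"Person_A", "Person_B"},
    "Presidents of the United States": {"Person_B", "Person_C"},
}


def list_intersection(title, intersections=INTERSECTIONS, members=MEMBERS):
    """List the pages that are in every component category of `title`."""
    parts = intersections[title]
    sets = [members.get(cat, set()) for cat in parts]
    if not sets:
        return []
    return sorted(set.intersection(*sets))
```

The point of the design is that nobody maintains the combined category by hand; its member list is derived from the two atomic categories whenever it is viewed.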
> This discussion is far from over. The basic problems are _not_ solved. 
>   
Would you care to elaborate on what those unsolved problems are?
> I'm sure this thread will die out soon. 
> Half of the participants will again be soothed by the promise of some easy 
> solution just barely beyond the horizon, while the half that realizes that 
> said solution _cannot possibly work_ without a radical reform of the category 
> system will again be too annoyed (I'm getting there already) to continue 
> discussing.
It would be nice if you didn't judge people as naive right away.

Roan Kattouw (Catrope)


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)

2008-12-03 Thread Aryeh Gregor
On Wed, Dec 3, 2008 at 10:59 AM, Daniel Schwen <[EMAIL PROTECTED]> wrote:
> So how does this take care of deep indexing non-atomic categories?
> =>How will this extension be even remotely useful for let's say commons?

That's a social problem, and so of secondary importance.  Once a
technical mechanism exists for solving the problem given a particular
type of categories, recategorization will happen, sooner or later.  If
you think people will flat-out refuse to move to a new, better system,
I think you're mistaken: look at the completeness of the move from
lists to categories, for instance, when categories were first
introduced.  (Lists are still used, but in most cases only where they
do things that categories currently cannot.)  The same goes for all
the other useful technical innovations that get introduced.  All it
would take is running some bots for a while to switch to the better
system, not a big cost for a large wiki like Commons with plenty of
bot operators.

On a technical level, dealing with non-atomic categories is a much
bigger pain than dealing with atomic ones.  On a social level, on the
other hand, they're equally doable, as dewiki shows.  There will be
transition costs for wikis that have a large body of non-atomic
categories, but those will be one-time only.


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)

2008-12-03 Thread Gregory Maxwell
On Wed, Dec 3, 2008 at 11:05 AM, Roan Kattouw <[EMAIL PROTECTED]> wrote:
> Without addressing Commons in particular, having an efficient way to get
> pages in the intersection of multiple categories would allow wikis to
> delete a category such as [[Category:Deceased Presidents of the United
> States]] and replace it by, say, [[Intersection:Deceased Presidents of
> the United States]], which would list all articles in
> [[Category:Deceased people]] and [[Category:Presidents of the United
> States]]. My extension alone doesn't make that possible, but it makes
> implementing such a feature considerably easier.
[snip]

We've had tools like this on toolserver before, with decent
performance and the ability to be embedded into commons via cross site
JS hacks,  and been told in no uncertain terms that the community
policy is "do not over categorize; things should be placed in the
fewest and most specific categories possible".  On commons there are
quite a few contributors who spend all of their time converting the
set of categories on an image to the one or two most specific
categories.

Please pardon Dschwen's frustration: it seems like people are
constantly waving their arms and saying that there will be some
wonderful technical solution right around the corner for the problems
created by the current categorization approach (never mind that some
of them, such as the extreme semantic drift, are unsolvable with a
technical solution).

For Commons, and to a lesser degree other projects, the limiting factor
in the usability of an intersection tool is less the lack of one and
more the userbase's insistence on using categories in a manner
that is generally incompatible with one.

For the purposes of MediaWiki these factors are not important, I
suppose, but it does explain the sceptical response.


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)

2008-12-03 Thread Daniel Schwen
> the other useful technical innovations that get introduced.  All it
> would take is running some bots for a while to switch to the better
> system, not a big cost for a large wiki like Commons with plenty of
> bot operators.

I'd like for you to be right. But switching from the present category system 
to atomic categories is not as straightforward as having a few bots run over 
all existing cats.

It will require an enormous amount of work. And so far I have not met 
willingness to change anything. Greg has shown a long time ago that fast 
category intersection is doable, but the echo has been pretty much zip, nada.

Just note that simply replacing a category with all of its supercategories is 
a dead end. You wouldn't believe the twists and turns in the category tree. 
Amusing examples have been posted on this list already.

So, yeah, sorry for my tone. I've pretty much kept my cool for the last N 
incarnations of this debate, but after repeating all the arguments for atomic 
cats and intersections and seeing zero improvement I'm getting a little 
frustrated. Call it "empirical evidence" rather than "assuming people to be 
naive" ;-)

-- 
[[en:User:Dschwen]]
[[de:Benutzer:Dschwen]]
[[commons:User:Dschwen]]


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)

2008-12-03 Thread Aryeh Gregor
On Wed, Dec 3, 2008 at 11:43 AM, Daniel Schwen <[EMAIL PROTECTED]> wrote:
> I'd like for you to be right. But switching from the present category system
> to atomic categories is not as straightforward as having a few bots run over
> all existing cats.

Of course, humans would have to manually specify which new categories
each old one corresponds to, but that's a perfectly doable job for a
small group of volunteers working over the course of months.  The bots
would do the much more tedious work of actually replacing them, so
each category could take substantially less than a minute of human
review.  The category intersection feature would then get
incrementally more useful as the work progressed.

> It will require an enormous amount of work. And so far I have not met
> willingness to change anything. Greg has shown a long time ago that fast
> category intersection is doable, but the echo has been pretty much zip, nada.

There's a world of difference between showing that something is
feasible in theory, and making it a core part of the software that's
visible on every category page on every Wikimedia wiki without asking
for community consensus in advance.  As soon as people actually start
using the feature, and they will if there's a box on every category
page, they'll realize that it would be way more useful if they changed
how things are categorized.  As long as category intersections remain
vaporware, there's no incentive to change.  A technical fait accompli
will bring about change.

Even if Commons hypothetically didn't go along with the scheme, it
would be valuable to have it in the software anyway.  Plenty of wikis
could still use it, like dewiki.  We need an interface and we need a
backend and we need someone to hook them together and commit them to
Subversion.  People have spent too much time inventing and reinventing
and re-reinventing new and different but basically interchangeable
backends, and too little time on the other parts of the problem.  If
the feature were committed to the software with a completely brainless
backend unusable on Wikimedia wikis, I predict it would be live on all
sites in less than six months.


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)

2008-12-03 Thread Daniel Schwen
> how things are categorized.  As long as category intersections remain
> vaporware, there's no incentive to change.  A technical fait accompli
> will bring about change.

Uhm, yeah.. except that intersection of atomic categories is not vaporware. 
We had proofs of concept for that and the interest was marginal.

In any case. If someone really just shoved it into MW core and enabled 
it on all the WMF sites I'd be happy. I concur that it would make the job of 
convincing users of a less retarded categorization scheme a bit easier.

As far as Aerik's soapboxing from a few emails back goes: Let's not kid 
ourselves, tag-based categorization is standard on commercial sites such as 
stock photography libraries. We are not exactly inventing this...

I'll shut up now, and I really hope that this is the last time we're having 
this discussion... (but boy, you will get an earful if it isn't ;-) )
-- 
[[en:User:Dschwen]]
[[de:Benutzer:Dschwen]]
[[commons:User:Dschwen]]


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)

2008-12-03 Thread David Gerard
2008/12/4 Daniel Schwen <[EMAIL PROTECTED]>:

>> how things are categorized.  As long as category intersections remain
>> vaporware, there's no incentive to change.  A technical fait accompli
>> will bring about change.

> Uhm, yeah.. except that intersection of atomic categories is not vaporware.
> We had proofs of concept for that and the interest was marginal.


It's vaporware until it's usable as a tagging system in practice.


> In any case. If someone really just shoved it into MW core and enabled
> it on all the WMF sites I'd be happy. I concur that it would make the job of
> convincing users of a less retarded categorization scheme a bit easier.
> As far as Aerik's soapboxing from a few emails back goes: Let's not kid
> ourselves, tag-based categorization is standard on commercial sites such as
> stock photography libraries. We are not exactly inventing this...


This being precisely what Commons has been begging for for a while!


> I'll shut up now, and I really hope that this is the last time we're having
> this discussion... (but boy, you will get an earful if it isn't ;-) )


The last time will be when there's a feature end-users can use without
going off to the toolserver.


- d.


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)

2008-12-03 Thread Gregory Maxwell
On Wed, Dec 3, 2008 at 8:12 PM, David Gerard <[EMAIL PROTECTED]> wrote:
> The last time will be when there's a feature end-users can use without
> going off to the toolserver.

With a JS hack I had my tool integrated to the site. The AJAX calls
went to the toolserver, but as far as the users could see it was
running on the site. No one cared: It didn't produce useful results
because of how categories are used, and when I suggested changing
people just waved their arms at me "just make it walk the tree".


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)

2008-12-03 Thread David Gerard
2008/12/4 Gregory Maxwell <[EMAIL PROTECTED]>:
> On Wed, Dec 3, 2008 at 8:12 PM, David Gerard <[EMAIL PROTECTED]> wrote:

>> The last time will be when there's a feature end-users can use without
>> going off to the toolserver.

> With a JS hack I had my tool integrated to the site. The AJAX calls
> went to the toolserver, but as far as the users could see it was
> running on the site. No one cared: It didn't produce useful results
> because of how categories are used, and when I suggested changing
> people just waved their arms at me "just make it walk the tree".


Hmm, I musta missed this. I woulda thought the commons-l habitues
would have swooped upon it with great glee.


- d.


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)

2008-12-03 Thread Ilmari Karonen
Gregory Maxwell wrote:
> 
> With a JS hack I had my tool integrated to the site. The AJAX calls
> went to the toolserver, but as far as the users could see it was
> running on the site. No one cared: It didn't produce useful results
> because of how categories are used, and when I suggested changing
> people just waved their arms at me "just make it walk the tree".

That _is_ curious.  When did this happen?  It seems I also blinked and 
missed it.

-- 
Ilmari Karonen


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)

2008-12-03 Thread Alex
Gregory Maxwell wrote:
> On Wed, Dec 3, 2008 at 8:12 PM, David Gerard <[EMAIL PROTECTED]> wrote:
>> The last time will be when there's a feature end-users can use without
>> going off to the toolserver.
> 
> With a JS hack I had my tool integrated to the site. The AJAX calls
> went to the toolserver, but as far as the users could see it was
> running on the site. No one cared: It didn't produce useful results
> because of how categories are used, and when I suggested changing
> people just waved their arms at me "just make it walk the tree".
> 

It's sort of a cycle we're stuck in. There's not much interest in
developing a good category intersection tool for core because the
category system on the larger Wikimedia wikis won't really work well
with it. If we develop it there's the risk of the response being the
same as to yours, basically: "Why should we change all the categories?
Just change the tool."

And there's no incentive to change the category system until we actually
have a category intersection tool in core. If people actually do it
there's the risk that an intersection tool is still a long way off and
we're stuck with less-useful categories (though I personally find the
current system, at least on enwiki, to be mostly useless).

-- 
Alex (wikipedia:en:User:Mr.Z-man)


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)

2008-12-04 Thread Tim Landscheidt
"Aryeh Gregor" <[EMAIL PROTECTED]> wrote:

>> I'd like for you to be right. But switching from the present category system
>> to atomic categories is not as straightforward as having a few bots run over
>> all existing cats.

> Of course, humans would have to manually specify which new categories
> each old one corresponds to, but that's a perfectly doable job for a
> small group of volunteers working over the course of months.  The bots
> would do the much more tedious work of actually replacing them, so
> each category could take substantially less than a minute of human
> review.  The category intersection feature would then get
> incrementally more useful as the work progressed.
> [...]

Add to that the maintenance costs because you would want to
ensure that if someone who is not aware of the concept of
atomic categories adds a [[Category:Manhattan]] to something
he adds [[Category:New York]], [[Category:East Coast of the
United States]], [[Category:United States]] and the other
gigazillion umbrella categories as well so searches for a
building in a country bordering a water body will still show
results.
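
A minimal sketch of what that maintenance step amounts to: given a category, add every umbrella category above it. The PARENTS table is an assumed, simplified hierarchy built from the category names above (real category graphs have multiple parents and loops, hence the visited set), and with_umbrella_categories is a hypothetical helper name.

```python
# Assumed, simplified parent relation; not taken from any real wiki.
PARENTS = {
    "Manhattan": ["New York"],
    "New York": ["East Coast of the United States"],
    "East Coast of the United States": ["United States"],
}


def with_umbrella_categories(category, parents):
    """Return `category` followed by every category above it, in discovery order."""
    result = [category]
    seen = {category}
    stack = [category]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:  # visited set keeps loops from running forever
                seen.add(parent)
                result.append(parent)
                stack.append(parent)
    return result
```

A bot or save hook could run something like this whenever a single specific category is added, so the editor does not have to know about the umbrella categories at all. Whether the real category graph is clean enough for that to give sensible results is a separate question.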

Tim


Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)

2008-12-04 Thread David Gerard
2008/12/4 Tim Landscheidt <[EMAIL PROTECTED]>:

> Add to that the maintenance costs because you would want to
> ensure that if someone who is not aware of the concept of
> atomic categories adds a [[Category:Manhattan]] to something
> he adds [[Category:New York]], [[Category:East Coast of the
> United States]], [[Category:United States]] and the other
> gigazillion umbrella categories as well so searches for a
> building in a country bordering a water body will still show
> results.


Which is why we have zillions of obsessive nerdy humans writing the
encyclopedia. Tags are fine, there's nothing wrong intrinsically with
hundreds of tags where appropriate and useful. I suppose presentation
in Monobook will be interesting ...


- d.



Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)

2008-12-04 Thread Aryeh Gregor
On Wed, Dec 3, 2008 at 7:12 PM, Daniel Schwen <[EMAIL PROTECTED]> wrote:
> Uhm, yeah.. except that intersection of atomic categories is not vaporware.
> We had proofs of concept for that and the interest was marginal.

Vaporware with proofs of concept is still vaporware.  The definition
of vaporware is more or less something that doesn't go *beyond* proofs
of concept.  Category intersection has never been added to the
software and there's no timetable for adding it to the software, so
doing any recategorization right *now* to aid category intersection
would be pointless.  JS thingies may have been enabled on some wikis
for some time periods, but that's very different from a feature being
prominently added to *all* wikis.

On Wed, Dec 3, 2008 at 8:16 PM, Gregory Maxwell <[EMAIL PROTECTED]> wrote:
> With a JS hack I had my tool integrated to the site. The AJAX calls
> went to the toolserver, but as far as the users could see it was
> running on the site. No one cared: It didn't produce useful results
> because of how categories are used, and when I suggested changing
> people just waved their arms at me "just make it walk the tree".

What was the interface like (how noticeable/obtrusive), how long was
it up, and why did it get removed?  You're certainly going to need a
critical mass of people who know about it and use it before there will
be any effect.  And enabling it on all wikis at once would likely
help, too: if Germans get used to using it on dewiki and find it
useful, they'll be more likely to push for it to be made useful on
Commons.

On Thu, Dec 4, 2008 at 7:45 AM, Tim Landscheidt <[EMAIL PROTECTED]> wrote:
> Add to that the maintenance costs because you would want to
> ensure that if someone who is not aware of the concept of
> atomic categories adds a [[Category:Manhattan]] to something
> he adds [[Category:New York]], [[Category:East Coast of the
> United States]], [[Category:United States]] and the other
> gigazillion umbrella categories as well so searches for a
> building in a country bordering a water body will still show
> results.

A reasonable point.  In the medium term it could be handled by (you
guessed it) bots.  In the longer term, allowing people to define more
concrete semantic relationships between categories (e.g., "X is
partitioned into X1, X2, ..., Xn") could make this automatic within
the software itself.
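
To make the "X is partitioned into X1, X2, ..., Xn" idea concrete, here is a minimal Python sketch. It assumes a hypothetical `PARTITIONS` mapping; nothing like it exists in MediaWiki today, and the category names are only illustrative. Given such a declaration, software could derive the umbrella categories on its own instead of relying on editors or bots to add them.

```python
# Hypothetical sketch: PARTITIONS and both functions are illustrative
# assumptions, not part of MediaWiki.
# Each key is a category; its value lists the sub-categories that partition it.
PARTITIONS = {
    "United States": ["New York", "California"],
    "New York": ["Manhattan", "Brooklyn"],
}


def build_parent_map(partitions):
    """Invert the partition declarations into a child -> parent lookup."""
    parent = {}
    for whole, parts in partitions.items():
        for part in parts:
            parent[part] = whole
    return parent


def implied_categories(category, partitions):
    """Return the category followed by every category that contains it."""
    parent = build_parent_map(partitions)
    chain = [category]
    seen = {category}
    while chain[-1] in parent:
        nxt = parent[chain[-1]]
        if nxt in seen:  # guard against accidental cycles in the declarations
            break
        seen.add(nxt)
        chain.append(nxt)
    return chain


print(implied_categories("Manhattan", PARTITIONS))
```

Tagging a page with only `Manhattan` would then be enough for an intersection query on `United States` to find it, provided the partition declarations themselves are kept correct.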


In the end, all of these objections are really irrelevant to the
technical issues here.  The fact of the matter is that category
intersection is widely supported in other major software products (in
the form of tag intersection), it's something that a lot of people
want, and so it would be good if it were in the core software.  How
fully various specific communities would want to use it is up to them
-- that some communities might never choose to use a particular
feature doesn't mean that it shouldn't be developed (cf. FlaggedRevs,
etc.).



Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)

2008-12-04 Thread David Gerard
2008/12/4 Aryeh Gregor <[EMAIL PROTECTED]>:
> On Wed, Dec 3, 2008 at 8:16 PM, Gregory Maxwell <[EMAIL PROTECTED]> wrote:

>> With a JS hack I had my tool integrated to the site. The AJAX calls
>> went to the toolserver, but as far as the users could see it was
>> running on the site. No one cared: It didn't produce useful results
>> because of how categories are used, and when I suggested changing
>> people just waved their arms at me "just make it walk the tree".

> What was the interface like (how noticeable/obtrusive), how long was
> it up, and why did it get removed?  You're certainly going to need a
> critical mass of people who know about it and use it before there will
> be any effect.


Evidently at least two of us who were drooling for this feature failed
to become aware of it ...


>  And enabling it on all wikis at once would likely
> help, too: if Germans get used to using it on dewiki and find it
> useful, they'll be more likely to push for it to be made useful on
> Commons.



oh. How to hack the Wikimedia social structure.

(mind you, I'll believe it's a conclusive solution when flagged revs hit en:wp.)


> In the end, all of these objections are really irrelevant to the
> technical issues here.  The fact of the matter is that category
> intersection is widely supported in other major software products (in
> the form of tag intersection), it's something that a lot of people
> want, and so it would be good if it were in the core software.  How
> fully various specific communities would want to use it is up to them
> -- that some communities might never choose to use a particular
> feature doesn't mean that it shouldn't be developed (cf. FlaggedRevs,
> etc.).


Indeed.

Greg, can your thingummy please be switched on again and publicised as
such on commons-l, if that's not impossible?


- d.



Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)

2008-12-06 Thread Platonides
David Gerard wrote:
> 2008/12/4 Tim Landscheidt:
> 
>> Add to that the maintenance costs because you would want to
>> ensure that if someone who is not aware of the concept of
>> atomic categories adds a [[Category:Manhattan]] to something
>> he adds [[Category:New York]], [[Category:East Coast of the
>> United States]], [[Category:United States]] and the other
>> gigazillion umbrella categories as well so searches for a
>> building in a country bordering a water body will still show
>> results.
> 
> 
> Which is why we have zillions of obsessive nerdy humans writing the
> encyclopedia. Tags are fine, there's nothing wrong intrinsically with
> hundreds of tags where appropriate and useful. I suppose presentation
> in Monobook will be interesting ...
> 
> - d.

If we're going to end up with hundreds of categories on each page, why
not make the software automatically add all parent categories?
It would fill the categorylinks table*, but so would adding them
manually.
It would also require forcing the categories to be a graph and maybe
limiting the number of parent categories, so as to reduce a bit how
expensive category position changes can be. But if we leave that to
'manual actions', the same actions would be performed by bots, leading
to the same cost and a partially less coherent structure.

*Add an expandedcategorylinks table?
Probably also add a 'don't inherit' flag on the category table which can be
applied to high-level categories such as 'All licenses' or 'Commons root'.
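
A rough Python sketch of what filling such an expanded table could look like. The `PARENTS` data, the `NO_INHERIT` set, and the function name are all hypothetical stand-ins for the proposed schema; this only illustrates the idea and is not how MediaWiki works.

```python
from collections import deque

# Hypothetical data: category -> its direct parent categories.
PARENTS = {
    "Manhattan": ["New York"],
    "New York": ["United States", "Commons root"],
    "United States": ["Commons root"],
}

# Categories carrying the proposed 'don't inherit' flag (assumed names).
NO_INHERIT = {"Commons root"}


def expanded_categories(direct, parents, no_inherit):
    """Return the direct categories plus all inherited ancestors.

    Flagged categories are neither added nor traversed, and the
    'expanded' set doubles as a visited set, so cycles terminate.
    """
    expanded = set(direct)
    queue = deque(direct)
    while queue:
        cat = queue.popleft()
        for parent in parents.get(cat, []):
            if parent in no_inherit or parent in expanded:
                continue
            expanded.add(parent)
            queue.append(parent)
    return expanded


print(sorted(expanded_categories(["Manhattan"], PARENTS, NO_INHERIT)))
```

Every page edit would have to recompute this set, and moving a category within the graph would invalidate it for all pages below, which is the cost mentioned above.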




Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)

2008-12-06 Thread Gregory Maxwell
On Sat, Dec 6, 2008 at 7:25 AM, Platonides <[EMAIL PROTECTED]> wrote:
> If we're going to end up with hundreds of categories on each page, why
> not make the software automatically add all parent categories?
> It would fill the categorylinks table*, but so would adding them
> manually.
> It would also require forcing the categories to be a graph and maybe
> limiting the number of parent categories, so as to reduce a bit how
> expensive category position changes can be. But if we leave that to
> 'manual actions', the same actions would be performed by bots, leading
> to the same cost and a partially less coherent structure.
[snip]

Because adding the parents produces nonsensical results:
"categorization" is a flawed concept except at the most fuzzy and
coarse levels. Reality doesn't fit into neat nested boxes (not even
the N-dimensional ones created by multiple parentage).  The two
primary problems are semantic drift (the further away you get from a
relationship the more not-quite-matching error accumulates), and
multiple link types (we use categories to describe different types of
membership, and while within a type the membership relation is
transitive, among types it usually is not).  So with parentages you
get chains like [periodic table]->[hydrogen]->[hydrogen
compounds]->[water]->[places with water]->[beaches]->[beaches in
america]->[beaches of lalaville]->[lalavill
beach]->[Image:Ironmeteor_at_lalavill_beach.jpg]

Is an iron meteor a "beach in america" or a "hydrogen compound"? No.

Offering all the parents with an easy checkbox interface that allows
you to quickly adopt all that apply would be great, but forcing their
inclusion would produce rubbish.
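
The chain above can be sketched in Python to show where blind inheritance goes wrong. The link-type labels (`is_a`, `related_to`, `taken_at`) are assumptions added for illustration; the real category system records no link type at all, which is exactly the problem.

```python
# Hypothetical typed category graph: child -> [(parent, link_type), ...].
# The link types are invented labels; MediaWiki stores none.
EDGES = {
    "Image:Ironmeteor_at_lalavill_beach.jpg": [("lalavill beach", "taken_at")],
    "lalavill beach": [("beaches of lalaville", "is_a")],
    "beaches of lalaville": [("beaches in america", "is_a")],
    "beaches in america": [("beaches", "is_a")],
    "beaches": [("places with water", "is_a")],
    "places with water": [("water", "related_to")],
    "water": [("hydrogen compounds", "is_a")],
    "hydrogen compounds": [("hydrogen", "related_to")],
    "hydrogen": [("periodic table", "related_to")],
}


def ancestors(start, edges, allowed_types=None):
    """Collect ancestors of 'start', optionally following only some link types.

    With allowed_types=None every link is followed, which mimics
    forcing all parent categories onto a page.
    """
    found = []
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for parent, link_type in edges.get(node, []):
            if allowed_types is not None and link_type not in allowed_types:
                continue
            if parent not in seen:
                seen.add(parent)
                found.append(parent)
                stack.append(parent)
    return found


image = "Image:Ironmeteor_at_lalavill_beach.jpg"
print(ancestors(image, EDGES))             # includes "hydrogen compounds"
print(ancestors(image, EDGES, {"is_a"}))   # empty: the photo is not itself a beach
```

Following only one link type keeps the walk meaningful, but it requires someone to record the type of every link, which the current category system does not do.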



Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)

2008-12-08 Thread Ilmari Karonen
Gregory Maxwell wrote:
>
> Because adding the parents produces nonsensical results:
> "categorization" is a flawed concept except at the most fuzzy and
> coarse levels. Reality doesn't fit into neat nested boxes (not even
> the N-dimensional ones created by multiple parentage).  The two
> primary problems are semantic drift (the further away you get from a
> relationship the more not-quite-matching error accumulates), and
> multiple link types (we use categories to describe different types of
> membership, and while within a type the membership relation is
> transitive, among types it usually is not).  So with parentages you
> get chains like [periodic table]->[hydrogen]->[hydrogen
> compounds]->[water]->[places with water]->[beaches]->[beaches in
> america]->[beaches of lalaville]->[lalavill
> beach]->[Image:Ironmeteor_at_lalavill_beach.jpg]
> 
> Is an iron meteor a "beach in america" or a "hydrogen compound"? No.

True, but there are _some_ relationships that should always hold.  All 
dogs are animals.  All integers are numbers.  All places in New York are 
in the United States.  Arguably, any page which is in [[Category:Dogs]] 
but not in [[Category:Animals]] is a failure of atomic categorization.

Of course, there are also many relationships that _don't_ hold so 
strictly.  Most dogs are pets, but not all.  Most places in the United 
States are in North America, but not all.  So, yes, some of the 
consistency checking will have to be done at least partly manually.
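
As a minimal sketch of the part that could be automated, the Python below checks only the relationships declared as strict. The `STRICT_IMPLICATIONS` table, the page data, and the function name are hypothetical; a real bot would read the category memberships from the wiki.

```python
# Hypothetical declarations: membership in the key always implies
# membership in each listed category. Only direct implications are checked.
STRICT_IMPLICATIONS = {
    "Dogs": ["Animals"],
    "Integers": ["Numbers"],
}

# Invented example pages and their current categories.
PAGES = {
    "Laika": {"Dogs", "Animals"},
    "Hachiko": {"Dogs"},
    "Forty-two": {"Integers", "Numbers"},
}


def missing_implied(pages, implications):
    """Map each inconsistent page to the sorted categories it lacks."""
    report = {}
    for page, cats in pages.items():
        missing = set()
        for cat in cats:
            for implied in implications.get(cat, []):
                if implied not in cats:
                    missing.add(implied)
        if missing:
            report[page] = sorted(missing)
    return report


print(missing_implied(PAGES, STRICT_IMPLICATIONS))
```

Looser relationships such as "most dogs are pets" would stay out of the table and be offered as suggestions for manual review instead.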

But really, I wouldn't worry about this too much.  Sure, having a way to 
enforce some category relationships would be useful, as would 
automatically recommending others.  But even if we don't implement it 
immediately in the software, someone will write a bot (or several) to 
help with it.  It won't be perfect, but I wouldn't expect it to be much 
more broken than the current interlanguage link system, which we 
consider useful enough to keep deployed despite its numerous failings.

(While thinking about this, I thought back to an earlier discussion on 
this list (or possibly wikien-l, can't remember now) about the fact that 
there are essentially two types of categories: thematic and taxonomic. 
For the former, the "tag" model of atomic categorization is quite 
natural, but the latter would fit much more naturally into a strictly 
hierarchical model.  It might not be an entirely unreasonable idea to 
formally split the two, perhaps even into separate namespaces, and apply 
different technical approaches to handling them.)

-- 
Ilmari Karonen



Re: [Wikitech-l] The never-dying topic: category intersection (been there done that .. to the power of three)

2008-12-08 Thread Platonides
Ilmari Karonen wrote:
> (While thinking about this, I thought back to an earlier discussion on 
> this list (or possibly wikien-l, can't remember now) about the fact that 
> there are essentially two types of categories: thematic and taxonomic. 
> For the former, the "tag" model of atomic categorization is quite 
> natural, but the latter would fit much more naturally into a strictly 
> hierarchical model.  It might not be an entirely unreasonable idea to 
> formally split the two, perhaps even into separate namespaces, and apply 
> different technical approaches to handling them.)

Doing that would probably help move the system forward.

