Re: [Wikidata-l] Question about wikipedia categories.

Michael Hale Mon, 06 May 2013 13:06:46 -0700

I added a section for Wikidata query potential. I'd estimate we've imported 
about 1/4 to 1/3 of the data we'd need to start getting comparable results for 
a significant number of categories. I think we should iterate on the query 
prototype considering scenarios where you link to an existing query and want to 
modify it.
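
To make that scenario concrete, here is a rough sketch of what "link to an
existing query, then modify it" could look like, using the actors example
quoted below. This is not the actual query prototype: the Query class, the
property names and the "Person A" through "Person D" items are all
hypothetical, and the data is a toy stand-in for Wikidata statements.

```python
from dataclasses import dataclass

# Hypothetical toy data: item name -> set of (property, value) statements.
ITEMS = {
    "Person A": {("occupation", "actor"), ("citizenship", "American"),
                 ("era", "20th century")},
    "Person B": {("occupation", "actor"), ("occupation", "politician"),
                 ("citizenship", "American"), ("era", "20th century")},
    "Person C": {("occupation", "actor"), ("occupation", "politician"),
                 ("citizenship", "American"), ("era", "21st century")},
    "Person D": {("occupation", "politician"), ("citizenship", "Austrian"),
                 ("era", "20th century")},
}


@dataclass(frozen=True)
class Query:
    """A query is a set of requirements; an item matches if it has them all."""
    requirements: frozenset

    def add(self, prop, value):
        # Specialize: one more requirement, so fewer items match.
        return Query(self.requirements | {(prop, value)})

    def drop(self, prop, value):
        # Generalize: one requirement fewer, so more items match.
        return Query(self.requirements - {(prop, value)})

    def run(self, items):
        # Return the names of all items whose statements cover the requirements.
        return sorted(name for name, statements in items.items()
                      if self.requirements <= statements)


# The query an article would link to: "20th-century American actors".
linked = Query(frozenset({("occupation", "actor"),
                          ("citizenship", "American"),
                          ("era", "20th century")}))

# One modification: drop the century, add a politician requirement.
modified = linked.drop("era", "20th century").add("occupation", "politician")

print(linked.run(ITEMS))
print(modified.run(ITEMS))
```

Because every modification returns a new Query, the linked query stays intact
and each variant could get its own link.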


> Date: Mon, 6 May 2013 15:21:23 -0400
> From: voldr...@gmail.com
> To: wikidata-l@lists.wikimedia.org
> Subject: Re: [Wikidata-l] Question about wikipedia categories.
> 
> Michael, that's really closely in line with what I was thinking.  Why
> don't you take a crack at improving
> http://meta.wikimedia.org/wiki/Talk:Beyond_categories?
> 
> I am not sure if this is just a crazy pipe dream or not, but I can't
> help but be a little bit excited at the possibility that it might
> actually get done, and I think it would be a huge improvement.
> 
> 
> On Mon, May 6, 2013 at 2:32 PM, Michael Hale <hale.michael...@live.com> wrote:
> > I agree they are extremely useful for many scenarios already. Earlier today
> > I sorted the human proteins category by popularity, and by reading the
> > articles for the most popular ones that I didn't know, I felt like I was
> > browsing the table of contents of a live molecular biology book that was
> > more comprehensive than any existing book in print. I do think we are on
> > track for undeniable improvements though. Arnold Schwarzenegger is in about
> > 40 categories right now. His Wikidata item has about 20 statements.
> > Eventually, at least all of the information you can glean from those
> > categories will be contained in the statements on Wikidata. Then we could
> > update the pages so that the links at the bottom aren't to relevant
> > categories, but are to relevant queries. At first, it would look sort of the
> > same. You can click on the 20th-century American actors category now, and
> > you could click on the 20th-century American actors query in the future. But
> > when you get to the query page you can easily specialize or generalize the
> > query with another click in many more directions than are currently
> > supported in the category system. Right now, I can specialize the pages I
> > see by going to the subcategory for American silent film actors. I can
> > generalize the pages I see by going to a supercategory that drops the
> > American requirement, the actor requirement, or the 20th century
> > requirement. But if your first click away from the article doesn't take you
> > to a category, but instead takes you to a query page, you now have many more
> > options. For example, you could delete the 20th-century requirement and add
> > a politician requirement to the actor requirement. Then you are looking at
> > Americans that are actors and politicians, which you can't do in the
> > category system.
> >
> >> From: p...@ontology2.com
> >> To: wikidata-l@lists.wikimedia.org
> >> Date: Mon, 6 May 2013 18:08:04 +0000
> >> Subject: Re: [Wikidata-l] Question about wikipedia categories.
> >>
> >> From my viewpoint, biases are an issue of statistical sampling.
> >>
> >> Wikipedia is an encyclopedia by humans for humans, so of course it has an
> >> anthropocentric background, in which the mass of all the concepts swirling
> >> around the Earth like an atmosphere curves the graph, keeping the Sun in
> >> orbit around our world.
> >>
> >> I find Wikipedia categories useful today, warts and all. They've got
> >> two things going for them:
> >>
> >> (1) Class and out-of-class dichotomies are the atom of ontology.
> >> Well-designed categories have an operational definition that allows class
> >> members to be determined with practically perfect precision.
> >> (2) They are densely populated.
> >>
> >> Look at the categories on this guy's web page
> >>
> >> http://en.wikipedia.org/wiki/Arnold_Schwarzenegger
> >>
> >> each one of those categories states a useful and correct fact, even if the
> >> organization of those facts is entirely haphazard.
> >>
> >> For instance, it would be better if he were coded as an "American", an
> >> "Austrian", a "Californian", a "Los Angeleno", a "Bodybuilder", an
> >> "Actor" and a zillion other things, and we then inferred that he was an
> >> "American Bodybuilder", an "Austrian Actor" and such. But it's not that
> >> easy, because he was an "Austrian soldier" but not an "American soldier",
> >> and I'd feel uncomfortable calling him an "Austrian Politician". A lot of
> >> nuance is encoded in that sticky mess.
> >>
> >> It's very easy to analyze those categories and produce desired concepts
> >> like "Car" and "Bodybuilder" from junky categories like "Front-wheel
> >> drive vehicle", "General Motors Concept Cars", "Bodybuilder Actor" and
> >> "Actor Bodybuilder". In fact, that's exactly what the semantic web is for.
> >>
> >> There is so much rich and precise information in the categories that you
> >> get great results despite sampling error caused by low recall in the
> >> categories.
> >>
> >> I'd love to see better structure, but not at the cost of fact density or
> >> precision.
> >>
> >> If we can take advantage of the knowledge in the graph to exert gentle
> >> pressure that improves categorization in Wikipedia, that would be great.
> >> It's definitely time for the social industry to move beyond "tags".
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> Wikidata-l mailing list
> >> Wikidata-l@lists.wikimedia.org
> >> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
