I added a section on Wikidata query potential. I'd estimate we've imported roughly a quarter to a third of the data we'd need before queries start returning results comparable to the existing categories for a significant number of them. I think we should iterate on the query prototype with one scenario in mind: you follow a link to an existing query and then want to modify it.
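To make that scenario concrete, here is a minimal sketch in Python of what "link to a query, then modify it" could look like. It is only an illustration, not the prototype: the `ITEMS` table, the requirement labels, the placeholder entries "Person A" and "Person B", and the helper names `run`, `generalize` and `specialize` are all made up for the example. A query is just a set of requirements, and one click either drops a requirement or adds one.

```python
# Hypothetical toy data standing in for Wikidata statements.
# "Person A" and "Person B" are placeholders, not real items.
ITEMS = {
    "Arnold Schwarzenegger": {"American", "actor", "politician", "20th-century"},
    "Person A": {"American", "actor", "20th-century"},
    "Person B": {"American", "politician"},
}


def run(query):
    """Return the sorted names of items that satisfy every requirement."""
    return sorted(name for name, facts in ITEMS.items() if query <= facts)


def generalize(query, requirement):
    """Drop one requirement, widening the result set."""
    return query - {requirement}


def specialize(query, requirement):
    """Add one requirement, narrowing the result set."""
    return query | {requirement}


# The query an article would link to instead of a category.
linked = frozenset({"American", "actor", "20th-century"})

# Two clicks: delete the 20th-century requirement, add a politician requirement.
modified = specialize(generalize(linked, "20th-century"), "politician")

print(run(linked))    # items matching the linked query
print(run(modified))  # Americans that are both actors and politicians
```

The point of modelling the query as a plain set is that every requirement can be removed or added independently, which is the flexibility the category tree does not offer.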
> Date: Mon, 6 May 2013 15:21:23 -0400
> From: voldr...@gmail.com
> To: wikidata-l@lists.wikimedia.org
> Subject: Re: [Wikidata-l] Question about wikipedia categories.
>
> Michael, that's really closely in line with what I was thinking. Why
> don't you take a crack at improving
> http://meta.wikimedia.org/wiki/Talk:Beyond_categories?
>
> I am not sure if this is just a crazy pipe dream or not, but I can't
> help but be a little bit excited at the possibility that it might
> actually get done, and I think it would be a huge improvement.
>
>
> On Mon, May 6, 2013 at 2:32 PM, Michael Hale <hale.michael...@live.com> wrote:
> > I agree they are extremely useful for many scenarios already. Earlier today
> > I sorted the human proteins category by popularity, and by reading the
> > articles for the most popular ones that I didn't know I felt like I was
> > browsing the table of contents of a live molecular biology book that was
> > more comprehensive than any existing book in print. I do think we are on
> > track for undeniable improvements though. Arnold Schwarzenegger is in about
> > 40 categories right now. His Wikidata item has about 20 statements.
> > Eventually, at least all of the information you can glean from those
> > categories will be contained in the statements on Wikidata. Then we could
> > update the pages so that the links at the bottom aren't to relevant
> > categories, but are to relevant queries. At first, it would look sort of the
> > same. You can click on the 20th-century American actors category now, and
> > you could click on the 20th-century American actors query in the future. But
> > when you get to the query page you can easily specialize or generalize the
> > query with another click in many more directions than are currently
> > supported in the category system. Right now, I can specialize the pages I
> > see by going to the subcategory for American silent film actors. I can
> > generalize the pages I see by going to a supercategory that drops the
> > American requirement, the actor requirement, or the 20th century
> > requirement. But if your first click away from the article doesn't take you
> > to a category, but instead takes you to a query page you now have many more
> > options. For example, you could delete the 20th-century requirement and add
> > a politician requirement to the actor requirement. Then you are looking at
> > Americans that are actors and politicians, which you can't do in the
> > category system.
> >
> >> From: p...@ontology2.com
> >> To: wikidata-l@lists.wikimedia.org
> >> Date: Mon, 6 May 2013 18:08:04 +0000
> >> Subject: Re: [Wikidata-l] Question about wikipedia categories.
> >>
> >> From my viewpoint, biases are an issue of statistical sampling.
> >>
> >> Wikipedia is an encyclopedia by humans for humans so of course it has an
> >> anthropocentric background, in which the mass of all the concepts swirling
> >> around the Earth like an atmosphere curves the graph, keeping the Sun in
> >> orbit around our world.
> >>
> >> I find Wikipedia categories useful today, warts and all. They've got
> >> two things going for them:
> >>
> >> (1) Class and out-of-class dichotomies are the atom of ontology.
> >> Well-designed categories have an operational definition that allows class
> >> members to be determined with practically perfect precision
> >> (2) They are densely populated.
> >>
> >> Look at the categories on this guy's web page
> >>
> >> http://en.wikipedia.org/wiki/Arnold_Schwarzenegger
> >>
> >> each one of those categories states a useful and correct fact, even if the
> >> organization of those facts is entirely haphazard.
> >>
> >> For instance, it would be better if he was coded as an "American" and an
> >> "Austrian", "Californian", "Los Angelino" and he is also a "Bodybuilder"
> >> and an "Actor" and a zillion other things and then infer that he was an
> >> "American Bodybuilder", "Austrian Actor" and such. But it's not that easy
> >> because he was an "Austrian soldier" but not an "American soldier" and I'd
> >> feel uncomfortable calling him an "Austrian Politician". A lot of nuance is
> >> encoded in that sticky mess.
> >>
> >> It's very easy to analyze those categories and produce desired concepts like
> >> "Car" and "Bodybuilder" from junky categories like "Front-wheel drive
> >> vehicle," "General Motors Concept Cars", "Bodybuilder Actor" and "Actor
> >> Bodybuilder", in fact, that's exactly what the semantic web is for.
> >>
> >> There is so much rich and precise information in the categories that you get
> >> great results despite sampling error caused by low recall in the
> >> categories.
> >>
> >> I'd love to see better structure, but not at the cost of fact density or
> >> precision.
> >>
> >> If we can take advantage of the knowledge in the graph to exert gentle
> >> pressure that improves categorization in Wikipedia that would be great.
> >> It's definitely time for the social industry to move beyond "tags"
> >>
> >> _______________________________________________
> >> Wikidata-l mailing list
> >> Wikidata-l@lists.wikimedia.org
> >> https://lists.wikimedia.org/mailman/listinfo/wikidata-l