On 10/24/2010 8:42 PM, Aryeh Gregor wrote:
> My first thought was to write a GPU program to crack MediaWiki
> password hashes as quickly as possible, then use what we've studied in
> class about GPU architecture to design a hash function that would be
> as slow as possible to crack on a GPU relative to its PHP execution
> speed, as Tim suggested a while back. However, maybe there's
> something more interesting I could do.
Boring. I want Wikipedia converted into facts in a representation system that supports modal, temporal, and "microtheory" reasoning. You know, in the "real" world, :James_T_Kirk is a :Fictional_Character, but in the Star Trek universe, he's a :Person. Of course, you'd have to pick some chunk of that big task that's doable.

One thing I'd like is something that extracts the "meaning" of hyperlinks. For instance, if we look at

http://en.wikipedia.org/wiki/Bruce_Lee

we see a link to :Wong_Jack_Man, and in DBpedia right now this is represented as a unidirectional hyperlink without semantics. A smarter system could say :Bruce_Lee :Had_A_Fight_With :Wong_Jack_Man.

Although Wikipedia is relatively difficult text to work with using typical BOW and NLP methods, it has enough semantic structure that hybrid semantic-BOW/NLP methods ought to be able to work miracles. I think the way hyperlinks are used in text could be used to learn templates for detecting named entity references.

It also ought to be possible to build linguistic models for classification. For instance, if you're having trouble telling your Jaguars apart,

http://en.wikipedia.org/wiki/Jaguar_(disambiguation)

and related documents might help you make a filter that can tell the difference between "jaguar the cat" and "jaguar the car".

To make these three ideas concrete, I've appended some toy Python sketches below.
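First, the microtheory idea. A minimal sketch that abuses RDF named graphs as a crude stand-in for contexts; the example.org namespace and context URIs are made up, and real modal/temporal reasoning would obviously need far more machinery than this:

    from rdflib import ConjunctiveGraph, Namespace, URIRef
    from rdflib.namespace import RDF

    EX = Namespace("http://example.org/")  # made-up namespace

    store = ConjunctiveGraph()

    # One named graph per "microtheory": the real world vs. Star Trek.
    real_world = store.get_context(URIRef("http://example.org/ctx/RealWorld"))
    star_trek = store.get_context(URIRef("http://example.org/ctx/StarTrek"))

    real_world.add((EX.James_T_Kirk, RDF.type, EX.Fictional_Character))
    star_trek.add((EX.James_T_Kirk, RDF.type, EX.Person))

    # Same subject, different answer depending on which theory you ask.
    print(list(real_world.objects(EX.James_T_Kirk, RDF.type)))
    print(list(star_trek.objects(EX.James_T_Kirk, RDF.type)))

The point is just that "what is Kirk?" is only answerable relative to a context, which is exactly what a flat triple store doesn't give you.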
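Second, extracting link semantics. A toy sketch of the hyperlink-to-triple idea: the wikitext snippet is paraphrased, and the cue-to-predicate patterns are hand-written stand-ins for templates a real system would have to learn from many sentences:

    import re

    # Paraphrased miniature of the Bruce Lee article source.
    WIKITEXT = ("In 1964, Lee fought a controversial private match "
                "with [[Wong Jack Man]] in Oakland.")

    LINK_RE = re.compile(r"\[\[([^|\]]+)(?:\|[^\]]*)?\]\]")

    # Hand-written cue -> predicate templates; a real system would
    # learn these from many sentences instead of hard-coding them.
    PATTERNS = [
        (re.compile(r"\b(match|fight|fought)\b", re.I), ":Had_A_Fight_With"),
    ]

    def candidate_triples(subject, wikitext):
        """Yield (s, p, o) guesses: one per link whose surrounding
        sentence matches a cue pattern."""
        for sentence in re.split(r"(?<=[.!?])\s+", wikitext):
            for m in LINK_RE.finditer(sentence):
                obj = ":" + m.group(1).strip().replace(" ", "_")
                for cue, predicate in PATTERNS:
                    if cue.search(sentence):
                        yield (subject, predicate, obj)

    for triple in candidate_triples(":Bruce_Lee", WIKITEXT):
        print(triple)  # (':Bruce_Lee', ':Had_A_Fight_With', ':Wong_Jack_Man')

The interesting research problem is the part I hard-coded: inducing those cue patterns from the link structure itself.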
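Third, the Jaguar filter. A toy bag-of-words Naive Bayes classifier over two made-up training snippets; a real filter would train on the full text of the articles the disambiguation page points to:

    import math
    import re
    from collections import Counter

    def bag(text):
        return Counter(re.findall(r"[a-z]+", text.lower()))

    # Made-up training snippets; a real filter would train on the full
    # text of the articles the disambiguation page points to.
    TRAIN = {
        "cat": ["The jaguar is a large felid native to the Americas, "
                "a solitary predator that stalks prey in the rainforest."],
        "car": ["Jaguar is a British manufacturer of luxury cars and "
                "sports saloons, founded as the Swallow Sidecar Company."],
    }

    # Per-sense word counts, pooled over each sense's training texts.
    counts = {sense: sum((bag(t) for t in texts), Counter())
              for sense, texts in TRAIN.items()}
    vocab = set().union(*counts.values())

    def classify(text):
        """Naive Bayes with add-one smoothing over the tiny vocabulary."""
        scores = {}
        for sense, c in counts.items():
            total = sum(c.values()) + len(vocab)
            scores[sense] = sum(n * math.log((c[w] + 1.0) / total)
                                for w, n in bag(text).items())
        return max(scores, key=scores.get)

    print(classify("spotted jaguar prowling the rainforest"))           # cat
    print(classify("jaguar sports saloon with a supercharged engine"))  # car

Nothing deep here; the point is that the disambiguation page hands you labeled training data for free, which is the "semantic structure" I'm talking about exploiting.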