On Sat, Aug 22, 2009 at 12:24 PM, Brian <brian.min...@colorado.edu> wrote:
> On Sat, Aug 22, 2009 at 12:05 PM, Gwern Branwen <gwe...@gmail.com> wrote:
>
>> I tried this out the other day; it's a very cool idea, but by and
>> large, it seems that this hacker doesn't have enough CPU power to
>> extract the really good wikilinks, the ones that aren't already linked
>> inside the article. (E.g. if I try it on [[Encyclopedia of the Brethren
>> of Purity]], I have to go all the way down to find a suggestion that
>> isn't already linked by the article.)
>>
>> Perhaps in a decade we'll have enough computing power on the servers
>> that this could be a plugin; we'd then have auto-generated "See also"
>> sections, which would be really cool.
>>
>> --
>> gwern
>
> A fancy technique called Latent Dirichlet Allocation (LDA) can be used to
> find links that aren't already linked in the document itself. I did this
> for a class project. Here is an excerpt from the paper, which also shows
> the latent connections it found for the Simple English Wikipedia article
> on hippies.
>
> http://upload.wikimedia.org/wikipedia/meta/2/25/LDA-Wiki-Search.png
>
> I note that Google has released parallel LDA, so it's not feasible to run
> it on all of Wikipedia using an ordinary Beowulf cluster.
> http://code.google.com/p/plda/

* now feasible

_______________________________________________
WikiEN-l mailing list
WikiEN-l@lists.wikimedia.org
To unsubscribe from this mailing list, visit:
https://lists.wikimedia.org/mailman/listinfo/wikien-l