I think this is a really cool idea. I don't know of any similar tools, but it does sound like a good fit for Elasticsearch.
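A rough sketch of what that could look like: Elasticsearch's `more_like_this` query takes free text through its `like` parameter and returns the most similar indexed documents, which maps fairly directly onto the "5 most similar articles" lookup described below. This assumes the TextExtracts were indexed into some index with a field named `extract`; both the index and the field name are hypothetical, and the relaxed `min_term_freq` is a tuning guess rather than a recommendation.

```python
import json


def more_like_this_body(page_text, field="extract", size=5):
    """Build a search body asking for the `size` documents most like page_text.

    `field` is a hypothetical field holding each article's TextExtract.
    """
    return {
        "size": size,
        "query": {
            "more_like_this": {
                "fields": [field],
                # Free text from the page being read; Elasticsearch extracts
                # the most distinctive terms from it to build the query.
                "like": page_text,
                # Relaxed from the default of 2 so terms that appear only
                # once in a shorter page can still be selected (a guess).
                "min_term_freq": 1,
                "max_query_terms": 25,
            }
        },
    }


if __name__ == "__main__":
    body = more_like_this_body("Portugal decriminalized the possession of all drugs.")
    # This would be sent as the JSON body of: POST /<index>/_search
    print(json.dumps(body, indent=2))
```

The page text would still need its HTML stripped before being passed in as `like`.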
On Fri, Sep 29, 2017 at 9:34 AM Guilherme Gonçalves <guilherme.p.g...@gmail.com> wrote:
> Hi everyone,
>
> I've been hacking on a new tool and I thought I'd share what (little) I
> have so far to get some comments and learn of related approaches from the
> community.
>
> The basic idea would be to have a browser extension that tells the user if
> the current page they're viewing looks like a good reference for a
> Wikipedia article, for some whitelisted domains like news websites. This
> would hopefully prompt casual/opportunistic edits, especially for articles
> that may be overlooked normally.
>
> As a proof of concept for a backend, I built a simple bag-of-words model
> of the TextExtracts of enwiki's
> Category:All_articles_needing_additional_references. I then set up a tool
> [1] to receive HTML input and retrieve the 5 most similar articles to that
> input. You can try it out in your browser [2], or on the command line [3].
> The results could definitely be better, but having tried it on a few
> different articles over the past few days, I think there's some potential
> there.
>
> I'd be interested in hearing your thoughts on this. Specifically:
>
> * If such a backend/API were available, would you be interested in using
> it for other tools? If so, what functionality would you expect from it?
> * I'm thinking of just throwing away the above proof of concept and using
> ElasticSearch, though I don't know a lot about it. Is anyone aware of a
> similar dataset that already exists there, by any chance? Or any reasons
> not to go that way?
> * Any other comments on the overall idea or implementation?
>
> Thanks!
>
> 1- https://github.com/eggpi/similarity
> 2- https://tools.wmflabs.org/similarity/
> 3- Example:
> curl https://www.nytimes.com/2017/09/22/opinion/sunday/portugal-drug-decriminalization.html | curl -X POST http://tools.wmflabs.org/similarity/search --form "text=<-"
> --
> Guilherme P. Gonçalves
> _______________________________________________
> Cloud mailing list
> Cloud@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/cloud
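For comparison with the Elasticsearch route, a minimal bag-of-words lookup of the kind described in the quoted message might look like this. It is a sketch under stated assumptions (regex tokenization, raw term counts, cosine similarity, and a plain dict mapping article titles to their extract text); it is not the code in eggpi/similarity, and `most_similar` is a hypothetical name.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of ASCII letters/digits; a real system would
    # also strip HTML, drop stopwords, handle non-ASCII text, etc.
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(text):
    # Raw term counts (no TF-IDF weighting in this sketch).
    return Counter(tokenize(text))


def cosine(a, b):
    # Dot product over the terms the two bags share.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_text, extracts, k=5):
    """Return up to k titles whose extracts are most similar to query_text."""
    query = bag_of_words(query_text)
    scored = [(cosine(query, bag_of_words(extract)), title)
              for title, extract in extracts.items()]
    # Highest score first; ties broken by title so results are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:k]]
```

With a toy, made-up corpus such as `{"Article A": "drug policy in portugal after decriminalization", "Article B": "trams run on rails in city streets", "Article C": "public health policy and drug use"}`, calling `most_similar("portugal drug decriminalization", extracts)` ranks Article A first, since it shares the most terms with the query. Precomputing the article bags once, rather than on every request, would be the obvious next step.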