I think this is a really cool idea. I don't know of any similar tools, but
it does sound like a good fit for Elasticsearch.
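
For anyone following along, the bag-of-words retrieval described in the
quoted message could look roughly like the sketch below. This is only an
illustration under my own assumptions, not the code from the similarity
repository: the function names, the tokenizer, the use of cosine similarity
and the sample article texts are all hypothetical stand-ins.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_text, articles, k=5):
    """Return up to k (title, score) pairs, best match first."""
    query = bag_of_words(query_text)
    scored = [
        (title, cosine_similarity(query, bag_of_words(body)))
        for title, body in articles.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical stand-ins for article extracts (not real TextExtracts data).
articles = {
    "Drug policy of Portugal": "Portugal decriminalized the possession of drugs in 2001",
    "Elasticsearch": "Elasticsearch is a search engine based on the Lucene library",
    "Bag-of-words model": "The bag-of-words model represents text as word counts",
}

results = most_similar("Portugal and the decriminalization of drugs", articles, k=2)
for title, score in results:
    print(f"{score:.3f}  {title}")
```

Raw term counts let common words such as "the" and "of" dominate the score,
which is one reason a search engine with TF-IDF or BM25 style weighting may
give better results than a plain bag-of-words comparison.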

On Fri, Sep 29, 2017 at 9:34 AM Guilherme Gonçalves <
guilherme.p.g...@gmail.com> wrote:

> Hi everyone,
>
> I've been hacking on a new tool and I thought I'd share what (little) I
> have so far to get some comments and learn of related approaches from the
> community.
>
> The basic idea would be to have a browser extension that tells the user if
> the current page they're viewing looks like a good reference for a
> Wikipedia article, for some whitelisted domains like news websites. This
> would hopefully prompt casual/opportunistic edits, especially for articles
> that may be overlooked normally.
>
> As a proof of concept for a backend, I built a simple bag-of-words model
> of the TextExtracts of enwiki's
> Category:All_articles_needing_additional_references. I then set up a tool
> [1] to receive HTML input and retrieve the 5 most similar articles to that
> input. You can try it out in your browser [2], or on the command line [3].
> The results could definitely be better, but having tried it on a few
> different articles over the past few days, I think there's some potential
> there.
>
> I'd be interested in hearing your thoughts on this. Specifically:
>
> * If such a backend/API were available, would you be interested in using
> it for other tools? If so, what functionality would you expect from it?
> * I'm thinking of just throwing away the above proof of concept and using
> Elasticsearch, though I don't know a lot about it. Is anyone aware of a
> similar dataset that already exists there, by any chance? Or any reasons
> not to go that way?
> * Any other comments on the overall idea or implementation?
>
> Thanks!
>
> 1- https://github.com/eggpi/similarity
> 2- https://tools.wmflabs.org/similarity/
> 3- Example:
>      curl https://www.nytimes.com/2017/09/22/opinion/sunday/portugal-drug-decriminalization.html \
>        | curl -X POST http://tools.wmflabs.org/similarity/search --form "text=<-"
> --
> Guilherme P. Gonçalves
> _______________________________________________
> Cloud mailing list
> Cloud@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/cloud
>