Re: [Wiki-research-l] Content similarity between two Wikipedia articles

2019-05-07 Thread Kerry Raymond
1:35 AM To: Research into Wikimedia content and communities Subject: Re: [Wiki-research-l] Content similarity between two Wikipedia articles Hey Haifeng, On top of all the excellent answers provided, I'd also add that the answer to your question depends on what you want to use the simil

Re: [Wiki-research-l] Content similarity between two Wikipedia articles

2019-05-07 Thread Isaac Johnson
Hey Haifeng, On top of all the excellent answers provided, I'd also add that the answer to your question depends on what you want to use the similarity scores for. For some insight into what it might mean to choose one approach over another, see this recent publication: https://dl.acm.org/cita

Re: [Wiki-research-l] Content similarity between two Wikipedia articles

2019-05-07 Thread fn
Dear Haifeng, Would you not be able to use ordinary information retrieval techniques such as bag-of-words/phrases and tf-idf? Explicit semantic analysis (ESA) uses this approach (though its primary focus is word semantic similarity). There are a few papers for ESA: https://tools.wmflabs.org/
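For illustration only, here is a minimal sketch of the bag-of-words/tf-idf idea mentioned above, using nothing but the Python standard library. It is not ESA and not any specific tool's implementation. The function names (`tokenize`, `tfidf_vectors`, `cosine`), the regex tokenizer, and the smoothed idf formula are all assumptions chosen for the example, and the three short strings stand in for real article text.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per document.

    Assumption: raw term counts for tf and a smoothed idf,
    idf(t) = ln((1 + N) / (1 + df(t))) + 1, so no weight is ever zero.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0
           for t, df in doc_freq.items()}
    return [{t: count * idf[t] for t, count in Counter(tokens).items()}
            for tokens in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Placeholder texts standing in for article bodies.
articles = [
    "The cat sat on the mat.",
    "A cat lay on the mat.",
    "Stock markets fell sharply today.",
]
vectors = tfidf_vectors(articles)
print(cosine(vectors[0], vectors[1]))  # related texts: above zero
print(cosine(vectors[0], vectors[2]))  # no shared terms: zero
```

On real articles the idf weights would normally be computed over a much larger collection than the two pages being compared, so that common words are down-weighted sensibly.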

Re: [Wiki-research-l] Content similarity between two Wikipedia articles

2019-05-04 Thread Morten Wang
Hi Haifeng, Yes, you might want to look into some of the work done by Hecht et al. on content similarity between languages, as well as work by Sen et al. on semantic relatedness algorithms (which are implemented in the WikiBrain framework, by the way, see reference below

Re: [Wiki-research-l] Content similarity between two Wikipedia articles

2019-05-04 Thread RhinosF1 Wikipedia
The comparison tool on https://tools.wmflabs.org/copyvios/ can look for repeated phrases. You might be able to tweak that a bit. On Sat, 4 May 2019 at 12:48, Haifeng Zhang wrote: > Dear folks, > > Is there a way to compute content similarity between two Wikipedia > articles? > > For example, I
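As a rough illustration of what "looking for repeated phrases" can mean, the sketch below compares two texts by their shared word n-grams. This is not how the copyvios tool works internally (that is not described in this thread). The function names, the choice of three-word phrases, and the Jaccard score are all assumptions made for the example.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of n-word phrases (as tuples) occurring in the text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_phrases(a, b, n=3):
    """Phrases of n words that appear in both texts."""
    return word_ngrams(a, n) & word_ngrams(b, n)


def jaccard(a, b, n=3):
    """Share of distinct n-word phrases that the two texts have in common."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    if not grams_a and not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Placeholder sentences standing in for article text.
text_a = "the quick brown fox jumps over the lazy dog"
text_b = "a quick brown fox leaps over the lazy dog"
for phrase in sorted(shared_phrases(text_a, text_b)):
    print(" ".join(phrase))
print(jaccard(text_a, text_b))
```

Phrase overlap of this kind mostly detects copied or lightly edited passages. Two articles on the same topic written independently can score close to zero.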

[Wiki-research-l] Content similarity between two Wikipedia articles

2019-05-04 Thread Haifeng Zhang
Dear folks, Is there a way to compute content similarity between two Wikipedia articles? For example, I can think of representing each article as a vector of likelihoods over possible topics. But I wonder whether there is other work that people have already explored in the past. Thanks, Haifeng
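To make the topic-vector idea concrete: assuming each article has already been mapped to a probability distribution over the same fixed set of topics (for example by a topic model, which is an assumption here and not something the thread specifies), two articles can be compared with a divergence between distributions. The sketch below uses the Jensen-Shannon divergence with base-2 logarithms. The function names and the example vectors are made up for illustration.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two topic distributions.

    Both inputs are assumed to be lists of non-negative probabilities over
    the same topics, each summing to 1. The result lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("topic vectors must have the same length")
    mix = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)


def topic_similarity(p, q):
    """Similarity in [0, 1]: 1 for identical distributions, 0 for disjoint."""
    return 1.0 - js_divergence(p, q)


# Hypothetical topic distributions for three articles over four topics.
article_a = [0.70, 0.20, 0.05, 0.05]
article_b = [0.60, 0.30, 0.05, 0.05]
article_c = [0.05, 0.05, 0.20, 0.70]
print(topic_similarity(article_a, article_b))
print(topic_similarity(article_a, article_c))
```

Cosine similarity or Hellinger distance over the same vectors would be equally reasonable choices. Which one fits best depends on what the scores are used for.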