Actually, I need a specialized algorithm: I want to use it to detect duplicate blog posts.
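
To make the requirement concrete, here is a rough sketch of the kind of check I have in mind. It uses word shingling with a containment score, which is one common approach to near-duplicate text detection. It is not a Solr/Lucene class, and the function names, the shingle size k, and the tokenization are all assumptions made up for illustration.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word shingles (contiguous word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Text shorter than one window: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(a, b, k=5):
    """Fraction of a's shingles that also occur in b (0.0 if a has no words)."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa) if sa else 0.0


def jaccard(a, b, k=5):
    """Shared shingles divided by all distinct shingles of a and b."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0
```

Containment is asymmetric, so it can flag a short post quoted inside a longer one, while Jaccard is better suited to two posts of similar length. Comparing every pair of posts this way would not scale to a large index, so it would still need something like an index-side candidate search in front of it.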

2013/7/23 Tommaso Teofili <tommaso.teof...@gmail.com>

> Hi,
>
> You may leverage and/or improve the MLT component [1].
>
> HTH,
> Tommaso
>
> [1] : http://wiki.apache.org/solr/MoreLikeThis
>
>
> 2013/7/23 Furkan KAMACI <furkankam...@gmail.com>
>
> > Hi;
> >
> > Sometimes a huge part of a document may exist in another document, as
> > in student plagiarism or the quotation of one blog post in another.
> > Do Solr/Lucene or their libraries (UIMA, OpenNLP, etc.) have any class to
> > detect it?