Block-quoting and plagiarism are two different questions.
Block-quoting is simple: break the text apart into sentences or even
paragraphs and index each one as a separate document. Facet on the
post-analysis text. Then pull the facet counts, and block quotes will
be clear.
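A rough sketch of that idea outside Solr, in plain Python. The splitter and the `normalize` function are only stand-ins for whatever analysis chain the index would use, and the function names and sample posts are made up for illustration. It counts how many posts each normalized sentence appears in, which is what the facet counts would show:

```python
import re
from collections import defaultdict

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def normalize(sentence):
    """Stand-in for index-time analysis: lowercase, keep word tokens only."""
    return " ".join(re.findall(r"\w+", sentence.lower()))

def shared_sentences(posts, min_posts=2):
    """Map each normalized sentence to the set of post ids containing it,
    keeping only sentences found in at least `min_posts` posts."""
    seen = defaultdict(set)
    for post_id, text in posts.items():
        for sentence in split_sentences(text):
            key = normalize(sentence)
            if key:
                seen[key].add(post_id)
    return {s: ids for s, ids in seen.items() if len(ids) >= min_posts}

# Hypothetical sample data.
posts = {
    "post-a": "Solr is a search server. It is built on Lucene. I like it.",
    "post-b": "Someone wrote this. It is built on Lucene! That is true.",
}
for sentence, ids in shared_sentences(posts).items():
    print(len(ids), sorted(ids), sentence)
```

Any sentence with a count above 1 is a candidate block quote. Short, common sentences will also show up, so a minimum sentence length is probably worth adding.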
Mahout has a
Subject: Re: Document Similarity Algorithm at Solr/Lucene
BTW, how does Solr's MoreLikeThis component work? Which algorithm does it use
under the hood?
2013/7/24 Roman Chyla roman.ch...@gmail.com
This paper contains an excellent algorithm for plagiarism detection, but
beware the published version had a mistake in the algorithm - look for
corrections - I
Sent: Tuesday, July 23, 2013 6:16 AM
To: solr-user@lucene.apache.org
Subject: Re: Document Similarity Algorithm at Solr/Lucene
Actually I need a specialized algorithm. I want to use that algorithm to
detect duplicate blog posts.
2013/7/23 Tommaso Teofili tommaso.teof...@gmail.com
Hi,
I
This paper contains an excellent algorithm for plagiarism detection, but
beware the published version had a mistake in the algorithm - look for
corrections - I can't find them now, but I know they have been published
(perhaps by one of the co-authors). You could do it with solr, to create an
index
Hi;
Sometimes a huge part of a document may exist in another document, as
in student plagiarism or the quotation of one blog post in another.
Do Solr/Lucene or their libraries (UIMA, OpenNLP, etc.) have any class to
detect it?
Hi,
I think you may leverage and / or improve the MLT component [1].
HTH,
Tommaso
[1] : http://wiki.apache.org/solr/MoreLikeThis
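For reference, a minimal sketch of what an MLT request could look like, built in Python. The `mlt`, `mlt.fl`, `mlt.count`, `mlt.mintf` and `mlt.mindf` parameters are the ones documented on the wiki page above. The host, the core name `blog`, the field name `body` and the document id are assumptions made up for this example:

```python
from urllib.parse import urlencode

def mlt_url(base_url, doc_query, fields, count=5, min_tf=1, min_df=1):
    """Build a select URL that asks Solr for documents similar to the
    ones matched by `doc_query`, comparing on the given fields."""
    params = {
        "q": doc_query,              # selects the source document(s)
        "mlt": "true",               # switch the MoreLikeThis component on
        "mlt.fl": ",".join(fields),  # fields used for similarity
        "mlt.count": count,          # similar documents returned per result
        "mlt.mintf": min_tf,         # minimum term frequency in the source doc
        "mlt.mindf": min_df,         # minimum document frequency in the index
    }
    return base_url + "?" + urlencode(params)

# Hypothetical host, core and field names.
url = mlt_url("http://localhost:8983/solr/blog/select", "id:post-a", ["body"])
print(url)
```

Raising `mlt.mintf` and `mlt.mindf` drops rare, noisy terms from the comparison, which is one of the knobs to try when tuning MLT for near-duplicate posts.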
2013/7/23 Furkan KAMACI furkankam...@gmail.com
Hi;
Sometimes a huge part of a document may exist in another document, as
in student plagiarism or quotation of a
Actually I need a specialized algorithm. I want to use that algorithm to
detect duplicate blog posts.
2013/7/23 Tommaso Teofili tommaso.teof...@gmail.com
Hi,
I think you may leverage and / or improve the MLT component [1].
HTH,
Tommaso
[1] : http://wiki.apache.org/solr/MoreLikeThis
2013/7/23
that the top results will be more relevant.
-- Jack Krupansky
-----Original Message-----
From: Furkan KAMACI
Sent: Tuesday, July 23, 2013 6:16 AM
To: solr-user@lucene.apache.org
Subject: Re: Document Similarity Algorithm at Solr/Lucene
Actually I need a specialized algorithm. I want to use
On 7/23/2013 3:33 AM, Furkan KAMACI wrote:
Sometimes a huge part of a document may exist in another document, as
in student plagiarism or the quotation of one blog post in another.
Do Solr/Lucene or their libraries (UIMA, OpenNLP, etc.) have any class to
detect it?
Solr is designed
If you need a specialized algorithm for detecting blog post plagiarism /
quotations (which are different tasks IMHO), I think you have 2 options:
1. implement a dedicated one based on your features / metrics / domain
2. try to fine tune an existing algorithm that is flexible enough
If I were to do
Thanks for your comments.
2013/7/23 Tommaso Teofili tommaso.teof...@gmail.com
If you need a specialized algorithm for detecting blog post plagiarism /
quotations (which are different tasks IMHO), I think you have 2 options:
1. implement a dedicated one based on your features / metrics / domain
Here is a paper that I found useful:
http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf
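The linked paper describes document fingerprinting by winnowing: hash every k-character window of the text, then keep only the minimum hash from each sliding window of those hashes. Below is a simplified Python sketch of that idea, not the paper's exact algorithm. The hash function (CRC32), the parameter values and the function names are my own choices for illustration:

```python
import re
import zlib

def kgram_hashes(text, k=5):
    """Hash every k-character window of the text after stripping
    case and non-word characters."""
    cleaned = re.sub(r"\W+", "", text.lower())
    return [zlib.crc32(cleaned[i:i + k].encode("utf-8"))
            for i in range(len(cleaned) - k + 1)]

def fingerprint(text, k=5, window=4):
    """Pick the minimum hash of each window of consecutive k-gram hashes
    (rightmost on ties) and return a set of (hash, position) pairs."""
    hashes = kgram_hashes(text, k)
    if not hashes:
        return set()
    window = min(window, len(hashes))
    selected = set()
    for start in range(len(hashes) - window + 1):
        win = hashes[start:start + window]
        lowest = min(win)
        offset = window - 1 - win[::-1].index(lowest)  # rightmost minimum
        selected.add((lowest, start + offset))
    return selected

def overlap(a, b, k=5, window=4):
    """Fraction of a's fingerprint hashes that also occur in b's."""
    fa = {h for h, _ in fingerprint(a, k, window)}
    fb = {h for h, _ in fingerprint(b, k, window)}
    return len(fa & fb) / len(fa) if fa else 0.0

# Hypothetical sample data: one post quoting another sentence verbatim.
quote = "It is built on Lucene and uses an inverted index."
post = "Someone wrote: " + quote + " I agree."
print(overlap(quote, post))
```

Because the quoted passage appears intact inside the longer post, every fingerprint hash of the quote is also selected in the post. In practice one would index the selected hashes (for example as terms in a Solr field) and look for documents sharing many of them.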
On Tue, Jul 23, 2013 at 10:42 AM, Furkan KAMACI furkankam...@gmail.com wrote:
Thanks for your comments.
2013/7/23 Tommaso Teofili tommaso.teof...@gmail.com
if you need a specialized