On 10/18/06, Isabel Drost <[EMAIL PROTECTED]> wrote:
Find Me wrote:
> How to eliminate near duplicates from the index? Someone suggested that I
> could look at the TermVectors and do a comparison to remove the
> duplicates.
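For illustration, the term-vector comparison suggested above could be sketched roughly as follows. This is a minimal, library-independent sketch and not Lucene's TermVector API: the function names, the whitespace tokenization, and the 0.9 similarity threshold are all assumptions made for the example.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a term-frequency vector (term -> count).

    Assumption: lowercasing and whitespace splitting stand in for a real analyzer.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def remove_near_duplicates(docs, threshold=0.9):
    """Keep a document only if it is not too similar to any document already kept.

    docs is a list of (doc_id, text) pairs; threshold is a hypothetical cutoff.
    Returns the ids of the kept documents in their original order.
    """
    kept = []  # (doc_id, vector) pairs for documents kept so far
    for doc_id, text in docs:
        vec = term_vector(text)
        if all(cosine_similarity(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((doc_id, vec))
    return [doc_id for doc_id, _ in kept]


docs = [
    ("a", "the quick brown fox jumps over the lazy dog"),
    ("b", "the quick brown fox jumps over the lazy dog today"),
    ("c", "completely different text about search indexes"),
]
print(remove_near_duplicates(docs))
```

Note that this compares every document against every kept document, so the cost grows quadratically with the size of the index; it is only meant to show the idea of the comparison.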
As an alternative you could also have a look at the paper "Detecting
P
Thanks, it works perfectly, although I did end up merging the segments rather
than using your MapFileReader.
On 8/15/06, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
John Casey wrote:
> Hi All, is there any way to extract the outlinks of particular
> webpage/URL?
> I have
Hi All, is there any way to extract the outlinks of particular webpage/URL?
I have had a look at the LinkDBReader, but this will only give me a listing of
pages that link to the page in question. Any ideas? I have been having a
look in the segments directory and have been trying to read/parse the fil