> As a user of Lucene I missed some features. Part of the OSS culture is
> for me to tell others about this and maybe to try to find solutions.
> Mark's code seems to be one, so I proposed to consider adding it into
> some spot with better exposure for testing. And I don't seem to be the
> only pe
ichen [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 21, 2003 2:54 PM
To: Lucene Users List
Subject: Re: Similar Document Search
Hi Peter,
I took a look at Mark's thesis and briefly at some of his code. It appears
to me that what he's done with the so-called forward indexing is to (a)
Brian Mila wrote:
amounts). I failed to find a way to get Lucene to give me this
information without hacking this or that. Considering the attention IR
Excuse me if this is off-topic, but isn't hacking the code what open source
software is all about?
Not always, but quite often :-)
I mean
> amounts). I failed to find a way to get Lucene to give me this
> information without hacking this or that. Considering the attention IR
Excuse me if this is off-topic, but isn't hacking the code what open source
software is all about? I mean, it's always better to try to do it with
existing meth
Apologies for asking the obvious, but could someone explain why
Documents.Document is a sealed class?
Seems like many of us would love to implement UniqueDocument to support
this oft-requested uniqueness field. Would still have the task of
implementing an IndexWriterEx.AddDocument(UniqueDocument)
hat end?
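Because the Document class cannot be subclassed (sealed in .NET, final in Java), one way to get the UniqueDocument Terry asks about is composition instead of inheritance. Wrap the fields together with the name of a key field, and have the writer delete any stored document carrying the same key before adding the new one. The sketch below is a toy in-memory model under that assumption, not Lucene code. `UniqueDocument` and `UniqueIndexWriter` are hypothetical names, and a plain field map stands in for the real Document.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical wrapper: composition instead of inheritance, since the real
// Document class is sealed/final. A plain map stands in for its fields.
final class UniqueDocument {
    private final String keyField;
    private final Map<String, String> fields = new LinkedHashMap<>();

    UniqueDocument(String keyField, String keyValue) {
        this.keyField = keyField;
        fields.put(keyField, keyValue);
    }

    // Adds (or overwrites) an ordinary field; returns this for chaining.
    UniqueDocument add(String name, String value) {
        fields.put(name, value);
        return this;
    }

    String keyField() { return keyField; }
    String key() { return fields.get(keyField); }
    Map<String, String> fields() { return Collections.unmodifiableMap(fields); }
}

// Hypothetical toy writer: delete-then-add keeps at most one document per key.
final class UniqueIndexWriter {
    private final List<Map<String, String>> docs = new ArrayList<>();

    void addDocument(UniqueDocument doc) {
        // Step 1: drop any stored document whose key field holds the same value.
        docs.removeIf(d -> doc.key().equals(d.get(doc.keyField())));
        // Step 2: store a copy of the new version.
        docs.add(new LinkedHashMap<>(doc.fields()));
    }

    int size() { return docs.size(); }
    Map<String, String> get(int i) { return docs.get(i); }
}

class UniqueDocumentDemo {
    public static void main(String[] args) {
        UniqueIndexWriter writer = new UniqueIndexWriter();
        writer.addDocument(new UniqueDocument("id", "42").add("title", "First draft"));
        writer.addDocument(new UniqueDocument("id", "42").add("title", "Revised"));
        writer.addDocument(new UniqueDocument("id", "7").add("title", "Other"));
        // The second "42" replaced the first, so two documents remain.
        System.out.println(writer.size() + " " + writer.get(0).get("title")); // prints "2 Revised"
    }
}
```

With a real index the same idea becomes "delete by the key term, then add", done in a single place so callers cannot forget the delete.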
Regards,
Terry
----- Original Message -----
From: "Peter Becker" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, August 21, 2003 1:37 AM
Subject: Re: Similar Document Search
Hi all,
it seems there are quite a few people l
yet, therefore no code at this time...
Best regards,
Gregor
-----Original Message-----
From: Peter Becker [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 19, 2003 3:06 AM
To: Lucene Users List
Subject: Re: Similar Document Search
Hi Terry,
we have been thinking about the same problem and in the
d
Hi Peter,
I guess you are right.
I've implemented this for an index with ten million really small
documents, all of which are stored in the index. The documents are never
more than a thousand words, so re-indexing is quick enough. However, it is
probably not advisable to do this with bigger documen
Hi Magnus,
thanks for the offer, but unfortunately I can't/don't want to make the
assumption that I can easily access the documents to re-index them. And
I don't think this approach would be feasible unless you can keep the
documents in memory somehow.
Storing the other/non-inverted/normal/wha
Ok, here it is. It's part of a JSP that prints out all keywords in a
document.
/magnus
<%@ page import="org.apache.lucene.index.IndexReader,
org.apache.lucene.document.Document,
com.technohuman.search.language.SwedishAnalyzer,
java.io.StringReader,
hello magnus,
can I ask for your sample script?
--buics
Hi Peter
If the original document is available, you could extract keywords from
the document at query time. That is, when someone asks for documents similar
to document a, you re-analyze document a and, in combination with statistics
from the Lucene index, extract keywords from document a that
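A minimal sketch of the query-time approach, assuming the index-wide statistics are available as a document-frequency table: re-tokenize document a, weight each term by its frequency in a times an idf factor, and keep the top k as keywords. In Lucene those statistics would come from `IndexReader.docFreq(Term)` and `IndexReader.numDocs()`; here they are a plain map so the example runs on its own. The crude tokenizer, the `log(N/(df+1)) + 1` idf formula and the class name `KeywordExtractor` are assumptions for illustration, not the poster's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class KeywordExtractor {
    // Crude stand-in for a real Analyzer: lower-case, split on non-letters.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Scores each term of the document by tf * idf and returns the k best.
    // docFreq and numDocs play the role of the index-wide statistics.
    static List<String> topKeywords(String text, Map<String, Integer> docFreq,
                                    int numDocs, int k) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokenize(text)) tf.merge(t, 1, Integer::sum);

        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            // Rare terms get a large idf; a term in almost every document gets about 1.
            double idf = Math.log((double) numDocs / (df + 1)) + 1.0;
            score.put(e.getKey(), e.getValue() * idf);
        }

        List<String> terms = new ArrayList<>(score.keySet());
        // Highest score first; ties broken alphabetically so the output is stable.
        terms.sort((a, b) -> {
            int c = Double.compare(score.get(b), score.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(k, terms.size())));
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreq = new HashMap<>();
        docFreq.put("the", 99);
        docFreq.put("lucene", 5);
        docFreq.put("index", 20);
        String docA = "The Lucene index: the Lucene index stores the terms.";
        System.out.println(topKeywords(docA, docFreq, 100, 3)); // [lucene, stores, terms]
    }
}
```

Note that "the" occurs most often in document a but ranks last, because almost every document in the (assumed) index contains it. The selected keywords could then be OR-ed into an ordinary query to find similar documents.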
of reality, maybe Doug could comment?)
Regards,
Terry
----- Original Message -----
From: "Peter Becker" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Monday, August 18, 2003 9:05 PM
Subject: Re: Similar Document Search
> Hi Terry,
>
>
Hi Terry,
we have been thinking about the same problem and in the end we decided
that most likely the only good solution to this is to keep a
non-inverted index, i.e. a map from the documents to the terms. Then you
can look up the most frequent terms for a document and query other documents
matching p
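A minimal in-memory sketch of the non-inverted ("forward") index described here, kept next to the usual inverted one. The forward map yields the most frequent terms of a document, and the inverted map finds the other documents containing them, ranked by how many of those terms they share (an OR query scored by overlap). The class and method names are hypothetical, and picking terms by raw frequency is an assumption; a real setup would probably also weight them by idf.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ForwardIndex {
    // Non-inverted view: document id -> (term -> frequency in that document).
    private final Map<String, Map<String, Integer>> forward = new HashMap<>();
    // Ordinary inverted view: term -> ids of the documents containing it.
    private final Map<String, Set<String>> inverted = new HashMap<>();

    void add(String docId, List<String> terms) {
        Map<String, Integer> tf = forward.computeIfAbsent(docId, id -> new HashMap<>());
        for (String t : terms) {
            tf.merge(t, 1, Integer::sum);
            inverted.computeIfAbsent(t, term -> new TreeSet<>()).add(docId);
        }
    }

    // The n most frequent terms of one document, read from the forward map.
    List<String> topTerms(String docId, int n) {
        Map<String, Integer> tf = forward.getOrDefault(docId, Collections.emptyMap());
        List<String> terms = new ArrayList<>(tf.keySet());
        terms.sort((a, b) -> {
            int c = Integer.compare(tf.get(b), tf.get(a));
            return c != 0 ? c : a.compareTo(b); // alphabetical tie-break
        });
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }

    // OR the top terms together and rank the other documents by how many
    // of those terms they contain.
    List<String> similarTo(String docId, int n) {
        Map<String, Integer> shared = new HashMap<>();
        for (String t : topTerms(docId, n)) {
            for (String other : inverted.getOrDefault(t, Collections.emptySet())) {
                if (!other.equals(docId)) shared.merge(other, 1, Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(shared.keySet());
        ranked.sort((a, b) -> {
            int c = Integer.compare(shared.get(b), shared.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        return ranked;
    }

    public static void main(String[] args) {
        ForwardIndex idx = new ForwardIndex();
        idx.add("a", Arrays.asList("lucene index search lucene index".split(" ")));
        idx.add("b", Arrays.asList("lucene index java".split(" ")));
        idx.add("c", Arrays.asList("pasta recipes".split(" ")));
        idx.add("d", Arrays.asList("search engine lucene".split(" ")));
        System.out.println(idx.topTerms("a", 2));  // [index, lucene]
        System.out.println(idx.similarTo("a", 2)); // [b, d]
    }
}
```

The cost of this design is storing every document's terms twice, which is why it only pays off when the original documents cannot simply be re-analyzed at query time.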
Using the QueryFilter would help with refining a search based on hits
from a previous search, but it wouldn't help with the "like" part you
asked about.
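Lucene's QueryFilter works by turning a query into a bit set of matching document numbers that a later search is then restricted to. The toy below mimics only that idea with `java.util.BitSet`, without Lucene: the earlier result is kept as bits and AND-ed with the hits of the new search. `FilteredSearch` and its whitespace tokenizing are hypothetical, and, as noted above, this does nothing for finding "similar" documents.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Locale;

final class FilteredSearch {
    private final List<String> docs; // document number = position in the list

    FilteredSearch(List<String> docs) { this.docs = docs; }

    // Bit i is set when document i contains the word (case-insensitive).
    BitSet matching(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        BitSet bits = new BitSet(docs.size());
        for (int i = 0; i < docs.size(); i++) {
            for (String tok : docs.get(i).toLowerCase(Locale.ROOT).split("\\s+")) {
                if (tok.equals(w)) { bits.set(i); break; }
            }
        }
        return bits;
    }

    // Runs a new one-word search but keeps only the documents that are
    // also set in `previous`, i.e. the hits of the earlier search.
    BitSet search(String word, BitSet previous) {
        BitSet hits = matching(word);
        hits.and(previous);
        return hits;
    }

    public static void main(String[] args) {
        FilteredSearch fs = new FilteredSearch(Arrays.asList(
            "lucene query filter", "lucene index", "java index", "lucene filter"));
        BitSet previous = fs.matching("lucene");           // {0, 1, 3}
        System.out.println(fs.search("index", previous));  // {1}
        System.out.println(fs.search("filter", previous)); // {0, 3}
    }
}
```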
I'm interested in what you turn up with this though.
Erik
On Monday, August 18, 2003, at 01:11 PM, Terry Steichen wrote:
Is it possib