Miles Barr wrote:
On Mon, 2005-03-14 at 20:48 +0100, Dawid Weiss wrote:
I think what they do at Google is a fancy heuristic -- as David Spencer
mentioned, suburls of a given page, identical snippets, or titles... My
idea was more towards providing a 'realistic overview' of subjects in
pages. So
All,
I got JDBCDirectory from information on the lucene-user mailing list.
http://mail-archives.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=1644063
I cannot get basic searches to work. I tried to merge the JDBC
directory with a filesystem index and search the filesystem index.
That produced
Nice write-up in today's Search Day on Lucene in Action!
If you don't get it, you can see it here (currently the top article):
http://searchenginewatch.com/
Chuck
Hello,
PhraseQuery will help you do that, as will BooleanQuery (make both
clauses required), or Boolean operators (using + in front of each term
or AND between them) if you are parsing the query string with
QueryParser:
http://www.lucenebook.com/search?query=phrase+query
http://www.lucenebook.
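The difference between "all terms required" (+term or AND) and a phrase match is easy to blur. Below is a toy model of the two matching rules in plain Java. It is not Lucene's PhraseQuery or BooleanQuery API: the class and method names are hypothetical, and the regex tokenizer is only a crude stand-in for an Analyzer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Toy model only, NOT Lucene's PhraseQuery/BooleanQuery API.
// It mimics the two matching rules on a lowercased, regex-tokenized string.
class QuerySemantics {

    // Lowercase and split on non-word characters (a crude stand-in for an Analyzer).
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+"));
    }

    // Like "+foo +bar" or "foo AND bar": every term must occur, in any position.
    static boolean allRequired(String text, String... terms) {
        List<String> tokens = tokenize(text);
        for (String t : terms) {
            if (!tokens.contains(t.toLowerCase(Locale.ROOT))) return false;
        }
        return true;
    }

    // Like an exact phrase "foo bar" with slop 0: terms must be adjacent and in order.
    static boolean phrase(String text, String... terms) {
        List<String> tokens = tokenize(text);
        for (int start = 0; start + terms.length <= tokens.size(); start++) {
            boolean match = true;
            for (int i = 0; i < terms.length; i++) {
                if (!tokens.get(start + i).equals(terms[i].toLowerCase(Locale.ROOT))) {
                    match = false;
                    break;
                }
            }
            if (match) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "Lucene in Action covers query parsing";
        System.out.println(allRequired(doc, "action", "lucene"));   // true: both present
        System.out.println(phrase(doc, "action", "lucene"));        // false: wrong order
        System.out.println(phrase(doc, "lucene", "in", "action"));  // true: adjacent, in order
    }
}
```

Required clauses accept the terms anywhere in the field; a phrase also constrains their order and adjacency, so it is the stricter of the two.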
Hi,
We are running a set of small surveys to better understand the problems
developers face when trying to understand code. Results from this
survey will be used to refine the (open-source) tools that we are
building.
If you have looked at the code of any of the (Java) projects below, we
woul
I want to index articles:
My document is:
- Title
- Authors
Each article has one or more authors, so I index the Author field as
"Appendable Fields" (page 68, Lucene in Action).
Document doc = new Document();
doc.add(Field.Text("Title", title));
doc.add(Field.Text("Author", author1));
doc.add(Field.Text("Au
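The idea behind appendable fields is that adding the same field name more than once keeps every value, so a search on Author can match any of the authors. The sketch below models that in plain Java. It is a hypothetical stand-in for a Document, not the Lucene API, and the author names are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a Lucene Document (not the Lucene API): calling add()
// with an existing field name appends another value instead of replacing the old one.
class MultiValuedDoc {
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    List<String> getValues(String name) {
        return fields.getOrDefault(name, List.of());
    }

    // True if any value of the field contains the word (case-insensitive).
    boolean fieldContains(String name, String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String value : getValues(name)) {
            for (String token : value.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.equals(w)) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        MultiValuedDoc doc = new MultiValuedDoc();
        doc.add("Title", "Some Article");
        doc.add("Author", "Alice Smith");   // made-up sample values
        doc.add("Author", "Bob Jones");
        System.out.println(doc.getValues("Author").size());       // 2
        System.out.println(doc.fieldContains("Author", "jones")); // true
    }
}
```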
I've been effectively off-line for a few days, so I'm not sure if
anyone has replied on this thread yet.
Using boosts will definitely use fewer resources than sorting. If you
do use sorting for dates, be sure you're doing it numerically rather
than lexicographically.
Erik
On Mar 10, 20
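The snippet does not say why numeric sorting is preferred. One pitfall of lexicographic ordering, shown here in plain Java rather than Lucene's Sort/SortField API, is that unpadded numeric keys sort in the wrong order. Zero-padded keys such as yyyyMMdd sort the same either way. The class name and sample keys are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration (not Lucene's Sort/SortField API) of how lexicographic
// and numeric ordering disagree on unpadded numeric keys.
class DateSortDemo {

    static List<String> sortLexicographically(List<String> keys) {
        List<String> copy = new ArrayList<>(keys);
        Comparator<String> lexical = Comparator.naturalOrder(); // char-by-char compare
        copy.sort(lexical);
        return copy;
    }

    static List<String> sortNumerically(List<String> keys) {
        List<String> copy = new ArrayList<>(keys);
        Comparator<String> numeric = Comparator.comparingLong(Long::parseLong);
        copy.sort(numeric);
        return copy;
    }

    public static void main(String[] args) {
        // Hypothetical keys, e.g. days since some epoch, stored without zero padding.
        List<String> days = Arrays.asList("9", "10", "100");
        System.out.println(sortLexicographically(days)); // [10, 100, 9]
        System.out.println(sortNumerically(days));       // [9, 10, 100]
    }
}
```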
Hi Guys,
It is somewhat difficult to suggest something useful without more
details. If you are pretty sure of the quality of the query, then here is my
suggestion:
Index the documents with an extra field called "last_word" that will
contain the last word in the document. So from your exa
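The suggestion amounts to computing one extra value per document at indexing time. Below is a minimal sketch of that computation in plain Java. The helper name is hypothetical, "word" here means a run of ASCII word characters, and the value is lowercased on the assumption that the query side is lowercased too.

```java
import java.util.Locale;

// Hypothetical helper: computes the value for an extra "last_word" field at
// indexing time, so a query can later target that field directly.
class LastWordField {

    // Returns the last run of ASCII word characters ([A-Za-z0-9_]), lowercased,
    // or "" if the text contains none.
    static String lastWord(String text) {
        String[] tokens = text.trim().split("\\W+");
        for (int i = tokens.length - 1; i >= 0; i--) {
            if (!tokens[i].isEmpty()) return tokens[i].toLowerCase(Locale.ROOT);
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(lastWord("The quick brown fox jumps over the lazy dog.")); // dog
    }
}
```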
Chris Lamprecht wrote:
It's a nice idea, and it makes sense. I think it can break down if
boosting is used and the search is performed on multiple fields, especially
unstored ones. In that case the distance between very similar documents
might increase.
I think that also the duplications sho
Hi,
I am currently evaluating a system that uses Lucene, so please excuse any
lack of understanding.
Could somebody tell me if it is possible to query across separate indexes
with different criteria, but then to join/merge the results. An analogy
is querying two separate tables then joining ba
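If the goal is the table analogy literally, one option is to run each query separately and join the two hit lists in application code on a field both indexes store. Below is a minimal hash-join sketch in plain Java, assuming each hit has been reduced to a map of stored field values. The shared "id" field and all names are hypothetical, and this is not a Lucene API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Application-side join of two result lists (not a Lucene API). Each hit is modeled
// as a map of stored field values; "key" names a field both indexes are assumed to store.
class ResultJoin {

    // Inner hash join: keep only pairs whose key field matches, merging their fields.
    static List<Map<String, String>> join(List<Map<String, String>> left,
                                          List<Map<String, String>> right,
                                          String key) {
        Map<String, List<Map<String, String>>> byKey = new HashMap<>();
        for (Map<String, String> hit : right) {
            byKey.computeIfAbsent(hit.get(key), k -> new ArrayList<>()).add(hit);
        }
        List<Map<String, String>> joined = new ArrayList<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : byKey.getOrDefault(l.get(key), List.of())) {
                Map<String, String> merged = new HashMap<>(l);
                merged.putAll(r); // on a field-name clash, the right-hand value wins
                joined.add(merged);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Map<String, String>> articles = List.of(
            Map.of("id", "1", "title", "Intro"),
            Map.of("id", "2", "title", "Advanced"));
        List<Map<String, String>> reviews = List.of(
            Map.of("id", "2", "rating", "5"));
        System.out.println(join(articles, reviews, "id").size()); // 1
    }
}
```

Building the hash table on one side keeps the join linear in the total number of hits, rather than comparing every pair.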
Miles,
I'm assuming that you want to detect documents that are "almost"
exactly the same (since if they were identical, you could just do a
straight string comparison or MD5 comparison, etc.).
If you're storing term vectors in your index, you could compare the
term vectors for the search results, and if
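One common way to compare term vectors is cosine similarity over term frequencies. The snippet is cut off before saying which measure is meant, so treat this as an assumption. The sketch builds the vectors from raw text with a crude tokenizer so it stays self-contained; in Lucene the counts would come from stored term vectors. The 0.9 threshold is hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Compares two documents by term-frequency vectors (term -> count).
// Vectors are built from raw text here; in Lucene they would come from stored term vectors.
class TermVectorSimilarity {

    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|); 1.0 means proportional term counts.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) dot += (double) e.getValue() * other;
        }
        double normA = norm(a), normB = norm(b);
        return (normA == 0 || normB == 0) ? 0.0 : dot / (normA * normB);
    }

    private static double norm(Map<String, Integer> v) {
        double sum = 0;
        for (int x : v.values()) sum += (double) x * x;
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<String, Integer> d1 = termFrequencies("Lucene is a search library");
        Map<String, Integer> d2 = termFrequencies("Lucene is a Java search library");
        // 5 shared terms: 5 / (sqrt(5) * sqrt(6)) is about 0.913.
        // Hypothetical threshold: treat very high similarity as a near-duplicate.
        System.out.println(cosine(d1, d2) > 0.9); // true
    }
}
```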
On Mon, 2005-03-14 at 20:48 +0100, Dawid Weiss wrote:
> I think what they do at Google is a fancy heuristic -- as David Spencer
> mentioned, suburls of a given page, identical snippets, or titles... My
> idea was more towards providing a 'realistic overview' of subjects in
> pages. So you could
On Mon, 2005-03-14 at 10:24 -0800, David Spencer wrote:
> Yes, in theory the "similarity" package in the sandbox can help.
> The code generates a query for a source document to find documents that
> are similar to it - the MoreLikeThis class uses the heuristic that 2
> docs are similar if they sh
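The snippet is cut off before the heuristic is stated. As background, MoreLikeThis ranks the source document's terms by tf * idf and builds its query from the highest-scoring ones. The sketch below is a simplified illustration of that selection step, not the sandbox source. It uses the classic Lucene idf form 1 + ln(numDocs / (docFreq + 1)). The class name, the sample counts, and the choice to pass docFreq/numDocs in by hand are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified sketch (not the MoreLikeThis source): rank a source document's terms by
// tf * idf and keep the top n, which would then be OR'ed together into a query.
// docFreqs and numDocs are supplied by the caller; in Lucene they'd come from the index.
class InterestingTerms {

    // Classic idf form: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    static List<String> topTerms(Map<String, Integer> termFreqs,
                                 Map<String, Integer> docFreqs,
                                 int numDocs, int n) {
        List<String> terms = new ArrayList<>(termFreqs.keySet());
        // Sort descending by score.
        terms.sort((x, y) -> Double.compare(score(y, termFreqs, docFreqs, numDocs),
                                            score(x, termFreqs, docFreqs, numDocs)));
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }

    private static double score(String term, Map<String, Integer> tf,
                                Map<String, Integer> df, int numDocs) {
        // Terms missing from df are treated as appearing in no indexed document.
        return tf.get(term) * idf(df.getOrDefault(term, 0), numDocs);
    }

    public static void main(String[] args) {
        Map<String, Integer> tf = Map.of("the", 3, "lucene", 2, "index", 1);
        Map<String, Integer> df = Map.of("the", 999, "lucene", 10, "index", 50);
        // Scores: lucene ~11.0, index ~4.0, the = 3.0 (a near-universal term gets idf ~1).
        System.out.println(topTerms(tf, df, 1000, 2)); // [lucene, index]
    }
}
```

Weighting by idf is what pushes common words like "the" down even when they occur often in the source document.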