Hi Lorenzo,
Search the list's archives -- I posted glue code that lets Lucene
results be clustered with the Carrot2 clusterers (there are a few
implementations there).
http://java2.5341.com/msg/82310.html
The official Web site of the project is at:
http://carrot2.sourceforge.net/
You'll
On 08/06/2005, at 1:33 AM, Paul Elschot wrote:
On Tuesday 07 June 2005 11:42, Matt Quail wrote:
I've been playing around with a custom Query, and I've just realized
that my Scorer is likely to return the same document more than once.
Before I delve a bit further, can anyone tell me if this is
My approach uses the same technique, but I'm mostly using hierarchical agglomerative (HAC) clustering.
I did manage to add clustering support to a lucene based application (a
customized solution), but I'd like to try to create a 'general purpose'
library. I know it ain't easy!
I've found many scaling issues, but I saw that w
Paul Elschot wrote:
For a large number of indexes, it may be necessary to do this over
multiple indexes by first getting the doc numbers for all indexes,
then sorting these per index, then retrieving them
from all indexes, and repeating the whole thing using terms determined
from the retrieved d
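The batching Paul describes -- first collect the doc numbers from all indexes, then sort them per index before retrieving -- can be sketched generically. This is a minimal illustration with hypothetical types, not Lucene's actual API:

```java
import java.util.*;

// Sketch of per-index batching: given hits as (indexId, docId) pairs,
// group the doc numbers by index and sort each group ascending, so
// each index is read in sequential order instead of seeking randomly.
public class PerIndexBatcher {
    public static Map<Integer, List<Integer>> groupAndSort(int[][] hits) {
        Map<Integer, List<Integer>> byIndex = new TreeMap<>();
        for (int[] hit : hits) {
            byIndex.computeIfAbsent(hit[0], k -> new ArrayList<>()).add(hit[1]);
        }
        for (List<Integer> docs : byIndex.values()) {
            Collections.sort(docs); // sequential read order within one index
        }
        return byIndex;
    }
}
```

Retrieval then walks each index's sorted list in one pass, which is the point of the serialization step.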
I am currently writing something about text retrieval using EM clustering. The
approach represents documents as high-dimensional vectors, but it
is not related to Lucene (yet?).
How would you add clustering to Lucene? I think it may be a very
interesting technique to improve search results. If it w
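To make the vector representation concrete: a minimal sketch of documents as term-frequency vectors with cosine similarity, the basic geometry behind vector-space clustering. This is illustrative pure Java, not tied to any particular clusterer or to Lucene:

```java
import java.util.*;

// Documents as sparse term-frequency vectors; cosine similarity
// measures the angle between two such vectors (1.0 = same direction,
// 0.0 = no terms in common).
public class VectorSpace {
    public static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) tf.merge(term, 1, Integer::sum);
        }
        return tf;
    }

    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            na += e.getValue() * e.getValue();
        }
        for (int v : b.values()) nb += v * v;
        return dot == 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}
```

A clusterer (EM, k-means, HAC) then groups documents whose vectors lie close together.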
Chris Hostetter wrote:
: was computing the score. This was a big performance gain. About 2x and
: since its the slowest part of our app it was a nice one. :)
:
: We were using a TermQuery though.
I believe that one search on one BooleanQuery containing 20
TermQueries should be faster than 20
Some people just replied, but I forgot the most important thing...
I'm thinking of this project as part of Google's Summer of Code program,
so I'm looking for other students.
I've sent an email to Erik and he told me that we can propose this as part
of Google's SoC if we find some other peopl
On Tuesday 07 June 2005 20:06, Kevin Burton wrote:
> This is a strange anomaly I wanted to point out:
>
> http://www.flickr.com/photos/burtonator/18030919/
>
> This is a jprofiler screenshot. I can give you a jprofiler "snapshot"
> if you want but it requires the clientside app.
>
> I'm not su
Thank you. I've re-read the FAQ and I think I've got a better understanding
of where I was confused. Presently I am using this arrangement to get my
analyzer:
public static class DefaultAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
This depends on the analyzer you are using. Use Luke and check that the numbers are
actually in the index; if not, use an analyzer that does index numbers.
omar
-----Original Message-----
From: Daniel Naber [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 07, 2005 4:27 PM
To: java-user@lucene.apach
On Tuesday 07 June 2005 22:19, Peter T. Brown wrote:
> I am indexing a Java Long number using a Lucene Keyword field, but no
> matter what I do, I cannot find any documents I know have been indexed
> with this field. My logs show that the number "4" is being indexed as
> "4" but doing any searches
Hello. I am using lucene 1.4.3
I am indexing a Java Long number using a Lucene Keyword field, but no matter
what I do, I cannot find any documents I know have been indexed with this
field. My logs show that the number "4" is being indexed as "4" but doing
any searches in that field for "4" return
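Apart from checking the analyzer, a common trick from this era for numbers in keyword fields: store them left-padded with zeros so that lexicographic term order matches numeric order (otherwise "10" sorts before "4", and range queries misbehave). A minimal sketch; the width of 12 is an arbitrary assumption, pick one large enough for your values:

```java
// Left-pad numbers to a fixed width so that string comparison in the
// index agrees with numeric comparison. Width 12 is arbitrary here.
public class NumberPadding {
    public static String pad(long value) {
        return String.format("%012d", value);
    }
}
```

The same padded form must be used both at index time and in the query, or the terms will not match.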
Hello,
I have a situation where I need to have multiple applications, potentially
located on different servers, and which have no knowledge of each other,
indexing into and searching from the same Lucene index. I anticipate
problems with locks.
Let's say I have two applications and, at any
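One generic way to coordinate writers across processes is an exclusive advisory file lock via java.nio. This is a sketch of the pattern only -- it is not Lucene's own lock-file mechanism, and the class name is hypothetical:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

// Cross-process write coordination: only one JVM at a time can hold
// the exclusive lock on the lock file, so only one writer runs.
public class IndexWriteLock implements AutoCloseable {
    private final RandomAccessFile file;
    private final FileLock lock;

    public IndexWriteLock(File lockFile) throws IOException {
        file = new RandomAccessFile(lockFile, "rw");
        lock = file.getChannel().lock(); // blocks until the exclusive lock is acquired
    }

    @Override
    public void close() throws IOException {
        lock.release();
        file.close();
    }
}
```

Note that advisory locks only work if every process cooperates by acquiring the same lock before writing.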
This is a strange anomaly I wanted to point out:
http://www.flickr.com/photos/burtonator/18030919/
This is a jprofiler screenshot. I can give you a jprofiler "snapshot"
if you want but it requires the clientside app.
I'm not sure why this should be hot... in a linked list this should be
fas
Hi list,
I've been trying to use Lucene to index documents that change
occasionally, with fields that change frequently. When I add the
contents of the file, they are removed when I try to delete and re-add
the document. I am using something like the following.
public void index(String stuff, Fi
I'm writing this message trying to find people interested in creating a
'general purpose' Lucene search-results clustering extension.
I wrote a simple implementation of clustering, and I would like to
contribute to Lucene development by releasing an open source clustering
implementation. I
I am using Lucene in an environment where searches are being carried out
whilst documents are being added and deleted.
Currently I have some index management code which caches the IndexReader
and IndexWriter instances ensuring only one is ever open at a time. When a
document is added then an In
You can try loading the FieldCache:
if you get the StringIndex from the FieldCache, the last element in
the lookup array is the largest value (lexically) in the field.
-John
On 6/7/05, sergiu gordea <[EMAIL PROTECTED]> wrote:
>
>
> Kevin Burton wrote:
>
> > I have an index with a date field.
On Tuesday 07 June 2005 11:42, Matt Quail wrote:
> I've been playing around with a custom Query, and I've just realized
> that my Scorer is likely to return the same document more than once.
> Before I delve a bit further, can anyone tell me if this is a
> Bad Thing?
Normally, yes. A qu
On Wed, 2005-05-18 at 17:30 +0200, Ivan Frade wrote:
> Hello,
>
> I'm trying to use JDBCDirectory in my project. Now (the project) is
> working fine with FSDirectory, but if I simply replace FSDirectory with
> JDBCDirectory the things don't go well: I can create the index, but when
> try to conne
Wouldn't it defeat the purpose of clustering if you have a single
server to manage a single index? What would happen if this server
failed?
Cheers,
Ben
On 6/8/05, Ben <[EMAIL PROTECTED]> wrote:
> How about using JavaGroups to notify other nodes in the cluster about
> the changes?
>
> Essentially
How about using JavaGroups to notify other nodes in the cluster about
the changes?
Essentially, each node has the same index stored in a different
location. When one node updates/deletes a record, other nodes will get
a notification about the changes and update their index accordingly?
By using th
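The notify-on-change idea above can be sketched as a plain listener pattern. In a real deployment JavaGroups (JGroups) would carry the notification over the network; this in-process sketch with hypothetical names just shows the shape of it:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the notify-on-update idea: when one node changes a record,
// every registered peer is told to refresh that document in its own
// local copy of the index. (JGroups would deliver this across machines.)
public class IndexChangePublisher {
    public interface Listener { void onChanged(String docId); }

    private final List<Listener> peers = new ArrayList<>();

    public void subscribe(Listener peer) { peers.add(peer); }

    public void publishChange(String docId) {
        for (Listener peer : peers) peer.onChanged(docId);
    }
}
```

Each node would implement Listener by deleting and re-adding the named document in its local index.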
I realize I've already asked you this question, but do you need 100%
real time? You could run updates in batches every 2 minutes. Concerning
parallel search: unless you really need it, it's overkill in this case;
a communal index will serve you well and will be much easier to
maintain. You
António,
This error is not coming from Lucene, but rather from the ELATED
library (as you can tell from package name). Lucene does not use
Log4j at all. Please address this issue to either the Fedora or
ELATED groups.
Erik
On Jun 6, 2005, at 8:21 PM, [EMAIL PROTECTED] wrote:
Hi!
Hello!
Ehem, I have to apologize. It was my stupidity that caused this problem. I
simply mixed up field names... I did the deletion of items in a superclass,
which of course didn't know about the change in the uri field name. Duh!
Everything works now, just like it should.
Sorry again! Thanks
On Jun 6, 2005, at 7:07 AM, Max Pfingsthorn wrote:
Thanks for all the replies. I do know that the readers should be
reopened, but that is not the problem.
Could you work up a test case that shows this issue? From all I can
see, you're doing the right thing. Something is amiss somewhere th
> When you say your cluster is on a single machine, do you mean that you have
> multiple webservers on the same machine all of which search a single Lucene
> index?
Yes, this is my case.
> Do you use Lucene as your persistent store or do you have a DB back there?
I use Lucene to search for dat
I've been playing around with a custom Query, and I've just realized
that my Scorer is likely to return the same document more than once.
Before I delve a bit further, can anyone tell me if this is a
Bad Thing?
=Matt
When you say your cluster is on a single machine, do you mean that you
have multiple webservers on the same machine all of which search a
single Lucene index? Because if that's the case, your solution is
simple, as long as you persist to a single DB and then designate one of
your servers (or ev
Hi,
I'm looking for a URLDirectory implementation NOT based on RAMDirectory, because
the size of my indexes is up to 500 MB.
Thanks.
Jacques LABATTE.
My cluster is on a single machine and I am using FS index.
I have already integrated Lucene into my web application for use in a
non-clustered environment. I don't know what I need to do to make it
work in a clustered environment.
Thanks,
Ben
On 6/7/05, Nader Henein <[EMAIL PROTECTED]> wrote:
>
IMHO, issues that you need to consider:
* Atomicity of updates and deletes if you are using multiple indexes
on multiple machines (the case if your cluster is over a wide network)
* Scheduled index-to-core-data comparison and sanitization
(intensive)
This all depends on what th
On Tuesday 07 June 2005 09:22, Paul Elschot wrote:
...
>
> With the indexes on multiple discs, some parallelism can be introduced.
> A thread per disk could be used.
> In case there are multiple requests pending, they can be serialized just
> before the sorting of the terms, and just before the
On Tuesday 07 June 2005 07:17, Kevin Burton wrote:
> Matt Quail wrote:
>
> >> We have a system where I'll be given 10 or 20 unique keys.
> >
> >
> > I assume you mean you have one unique-key field, and you are given
> > 10-20 values to find for this one field?
> >
> >>
> >> Internally I'm creati
Kevin Burton wrote:
I have an index with a date field. I want to quickly find the minimum
and maximum values in the index.
Is there a quick way to do this? I looked at using TermInfos and
finding the first one, but how do I find the last?
I also tried the new sort API and the performance
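The reason the first term works as the minimum is that Lucene stores terms in lexicographic order; with a fixed-width date encoding that matches chronological order, the minimum date is the first term and the maximum is the last. A sketch of that ordering idea using a sorted set as a stand-in for walking the term dictionary (not Lucene's TermEnum API):

```java
import java.util.Collection;
import java.util.TreeSet;

// With a fixed-width encoding (e.g. yyyyMMdd), lexicographic order
// equals chronological order, so min = first term and max = last term.
// A TreeSet stands in here for Lucene's sorted term dictionary.
public class DateTermOrder {
    public static String[] minMax(Collection<String> encodedDates) {
        TreeSet<String> sorted = new TreeSet<>(encodedDates);
        return new String[] { sorted.first(), sorted.last() };
    }
}
```

In Lucene itself, the first matching term comes cheaply from a TermEnum positioned at the field; the last requires walking to the end of that field's terms.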
I think that the solution is to sort the results and to get the first
result.
See: org.apache.lucene.search.Sort
Best,
Sergiu
Kevin Burton wrote:
Andrew Boyd wrote:
How about using range query?
private Term begin, end;
begin = new Term("dateField",
DateTools.dateToString(Date.
Tansley, Robert wrote:
Hi all,
The DSpace (www.dspace.org) currently uses Lucene to index metadata
(Dublin Core standard) and extracted full-text content of documents
stored in it. Now that the system is being used globally, it needs to
support multi-language indexing.
I've looked through the mail
On Monday 06 June 2005 22:59, Andy Liu wrote:
> Is there a way to calculate term frequency scores that are relative to
> the number of terms in the field of the document? We want to override
> tf() in this way to curb keyword spamming in web pages. In
> Similarity, only the document's term freque
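The override Andy describes would score a term by the fraction of the field it occupies rather than by raw frequency, which caps the benefit of repeating a keyword. Since Similarity.tf(freq) only sees the raw frequency, the field length has to come from elsewhere (e.g. stored at index time); this sketch shows only the scoring arithmetic, with hypothetical names:

```java
// Length-relative term frequency: instead of the default tf = sqrt(freq),
// score by the fraction of the field the term occupies, so stuffing a
// page with one keyword stops paying off past a point.
public class RelativeTf {
    public static float relativeTf(int freq, int fieldLength) {
        if (fieldLength <= 0) return 0f;
        return (float) freq / fieldLength; // 50 repeats in a 100-term field -> 0.5
    }
}
```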