Re: Lucene search clusters

2005-06-07 Thread Dawid Weiss
Hi Lorenzo, Search in the list's archives -- I posted some glue code that lets Lucene results be clustered with Carrot2 clusterers (there are a few implementations there). http://java2.5341.com/msg/82310.html The official Web site of the project is at: http://carrot2.sourceforge.net/ You'll

Doing a Join across indexes [was Documents returned by Scorer]

2005-06-07 Thread Matt Quail
On 08/06/2005, at 1:33 AM, Paul Elschot wrote: On Tuesday 07 June 2005 11:42, Matt Quail wrote: I've been playing around with a custom Query, and I've just realized that my Scorer is likely to return the same document more than once. Before I delve a bit further, can anyone tell me if this is

Re: Lucene search clusters

2005-06-07 Thread Lorenzo
My approach uses the same technique, but I'm using mostly HAG clustering. I did manage to add clustering support to a lucene based application (a customized solution), but I'd like to try to create a 'general purpose' library. I know it ain't easy! I've found many scaling issues, but I saw that w

Re: Fastest way to fetch N documents with unique keys within large numbers of indexes..

2005-06-07 Thread Kevin Burton
Paul Elschot wrote: For a large number of indexes, it may be necessary to do this over multiple indexes by first getting the doc numbers for all indexes, then sorting these per index, then retrieving them from all indexes, and repeating the whole thing using terms determined from the retrieved d
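The per-index batching step quoted above (collect doc numbers for all indexes, then sort them per index before retrieval) can be sketched in plain Java. This is only an illustration under stated assumptions: the class name PerIndexBatcher, the method groupAndSort, and the (indexId, docNumber) pair layout are hypothetical and not part of Lucene, and the actual document retrieval is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not a Lucene class: groups hits by index and
// sorts the doc numbers so each index can be read in ascending order.
class PerIndexBatcher {

    /** Each hit is an int[2] of {indexId, docNumber} (an assumed layout). */
    static Map<Integer, List<Integer>> groupAndSort(List<int[]> hits) {
        Map<Integer, List<Integer>> byIndex = new TreeMap<>();
        for (int[] hit : hits) {
            byIndex.computeIfAbsent(hit[0], k -> new ArrayList<>()).add(hit[1]);
        }
        // Sort doc numbers within each index before retrieving documents.
        for (List<Integer> docs : byIndex.values()) {
            Collections.sort(docs);
        }
        return byIndex;
    }

    public static void main(String[] args) {
        List<int[]> hits = new ArrayList<>();
        hits.add(new int[] {1, 42});
        hits.add(new int[] {0, 7});
        hits.add(new int[] {1, 3});
        System.out.println(groupAndSort(hits)); // prints {0=[7], 1=[3, 42]}
    }
}
```

Reading documents in ascending doc-number order per index is the usual motivation for the sort, since it tends to keep disk access sequential.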

Re: Lucene search clusters

2005-06-07 Thread Daniel Stephan
I am currently writing something about text retrieval using EM clustering. The approach represents documents as high-dimensional vectors, but still it is not related to Lucene (yet?). How would you add clustering to Lucene? I think it may be a very interesting technique to improve search results. If it w

Re: Fastest way to fetch N documents with unique keys within large numbers of indexes..

2005-06-07 Thread Kevin Burton
Chris Hostetter wrote: : was computing the score. This was a big performance gain. About 2x and : since it's the slowest part of our app it was a nice one. :) : : We were using a TermQuery though. I believe that one search on one BooleanQuery containing 20 TermQueries should be faster than 20

Re: Lucene search clusters

2005-06-07 Thread Lorenzo
Some people just replied, but I forgot the most important thing... I'm thinking of this project as part of Google's Summer of Code program, so I'm looking for other students. I've sent an email to Erik and he told me that we can propose this as part of Google's SoC if we find some other peopl

Re: use of LinkedList in ConjunctionScorer hurting performance?

2005-06-07 Thread Paul Elschot
On Tuesday 07 June 2005 20:06, Kevin Burton wrote: > This is a strange anomaly I wanted to point out: > > http://www.flickr.com/photos/burtonator/18030919/ > > This is a jprofiler screenshot. I can give you a jprofiler "snapshot" > if you want but it requires the clientside app. > > I'm not su

Re: Cannot search on plain numbers

2005-06-07 Thread Peter T. Brown
Thank You. I've re-read the FAQ and I think I've got a better understanding of how I am confused. Presently I am using this arrangement to get my analyzer: public static class DefaultAnalyzer extends Analyzer { public TokenStream tokenStream(String fieldName, Reader reader) {

RE: Cannot search on plain numbers

2005-06-07 Thread Omar Didi
This depends on the analyzer you are using. Use Luke and check that numbers are actually in the index. If not, then use an analyzer that does index numbers. omar -Original Message- From: Daniel Naber [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 07, 2005 4:27 PM To: java-user@lucene.apach

Re: Cannot search on plain numbers

2005-06-07 Thread Daniel Naber
On Tuesday 07 June 2005 22:19, Peter T. Brown wrote: > I am indexing a Java Long number using a Lucene Keyword field, but no > matter what I do, I cannot find any documents I know have been indexed > with this field. My logs show that the number "4" is being indexed as > "4" but doing any searches

Cannot search on plain numbers

2005-06-07 Thread Peter T. Brown
Hello. I am using Lucene 1.4.3. I am indexing a Java Long number using a Lucene Keyword field, but no matter what I do, I cannot find any documents I know have been indexed with this field. My logs show that the number "4" is being indexed as "4" but doing any searches in that field for "4" return

Indexing from multiple applications to a central index.

2005-06-07 Thread Doug Hughes
Hello, I have a situation where I need to have multiple applications, potentially located on different servers, and which have no knowledge of each other, indexing into and searching from the same Lucene index. I anticipate problems with locks. Let's say I have two applications and, at any

use of LinkedList in ConjunctionScorer hurting performance?

2005-06-07 Thread Kevin Burton
This is a strange anomaly I wanted to point out: http://www.flickr.com/photos/burtonator/18030919/ This is a jprofiler screenshot. I can give you a jprofiler "snapshot" if you want but it requires the clientside app. I'm not sure why this should be hot... in a linked list this should be fas

Adding document with FileReader and deletions.

2005-06-07 Thread Chris D
Hi list, I've been trying to use Lucene to index documents that change occasionally with fields that change frequently. When I add the contents of the file they are removed when I try to delete and re-add the document. I am using something like the following. public void index(String stuff, Fi

Lucene search clusters

2005-06-07 Thread Lorenzo
I'm writing this message trying to find some people interested in creating a 'general purpose' Lucene search results clustering extension. I wrote a simple implementation of clustering, and I would like to contribute to Lucene development by releasing an open source clustering implementation. I

Flushing IndexWriters and IndexReaders

2005-06-07 Thread Paul . Illingworth
I am using Lucene in an environment where searches are being carried out whilst documents are being added and deleted. Currently I have some index management code which caches the IndexReader and IndexWriter instances ensuring only one is ever open at a time. When a document is added then an In

Re: Finding minimum and maximum value of a field?

2005-06-07 Thread John Wang
You can try to load the fieldcache: if you get the StringIndex from the fieldcache, the last element in the lookup array is the largest value (lexically) in the field. -John On 6/7/05, sergiu gordea <[EMAIL PROTECTED]> wrote: > > > Kevin Burton wrote: > > > I have an index with a date field.
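The idea in John's reply (the lookup array is sorted lexically, so its last entry is the largest value) can be illustrated without Lucene. In this minimal sketch the String array merely stands in for the lookup array of a StringIndex. The class LookupBounds and its methods are hypothetical, and skipping null entries is an assumption made for safety, not a statement about how the field cache fills the array.

```java
// Hypothetical stand-in, not a Lucene class: finds the smallest and
// largest values in an array that is already sorted lexically.
class LookupBounds {

    /** Returns the first non-null entry, or null if there is none. */
    static String min(String[] sortedLookup) {
        for (String value : sortedLookup) {
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Returns the last non-null entry, or null if there is none. */
    static String max(String[] sortedLookup) {
        for (int i = sortedLookup.length - 1; i >= 0; i--) {
            if (sortedLookup[i] != null) {
                return sortedLookup[i];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Dates stored as yyyyMMdd strings sort lexically in date order.
        String[] lookup = {null, "20050101", "20050315", "20050607"};
        System.out.println(min(lookup) + " .. " + max(lookup));
    }
}
```

This only works when the stored string form sorts in the same order as the underlying values, which holds for fixed-width date strings such as the yyyyMMdd values assumed here.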

Re: Documents returned by Scorer

2005-06-07 Thread Paul Elschot
On Tuesday 07 June 2005 11:42, Matt Quail wrote: > I've been playing around with a custom Query, and I've just realized > that my Scorer is likely to return the same document more than once. > Before I delve a bit further, can anyone tell me if this is a > Bad Thing? Normally, yes. A qu

Re: Using jdbcdirectory

2005-06-07 Thread Anthony Vito
On Wed, 2005-05-18 at 17:30 +0200, Ivan Frade wrote: > Hello, > > I'm trying to use JDBCDirectory in my project. Now (the project) is > working fine with FSDirectory, but if I simply replace FSDirectory with > JDBCDirectory things don't go well: I can create the index, but when > try to conne

Re: Lucene in clustered environment (Tomcat)

2005-06-07 Thread Ben
Wouldn't it defeat the purpose of clustering if you have a single server to manage a single index? What would happen if this server failed? Cheers, Ben On 6/8/05, Ben <[EMAIL PROTECTED]> wrote: > How about using JavaGroups to notify other nodes in the cluster about > the changes? > > Essentially

Re: Lucene in clustered environment (Tomcat)

2005-06-07 Thread Ben
How about using JavaGroups to notify other nodes in the cluster about the changes? Essentially, each node has the same index stored in a different location. When one node updates/deletes a record, other nodes will get a notification about the changes and update their index accordingly? By using th

Re: Lucene in clustered environment (Tomcat)

2005-06-07 Thread Nader Henein
I realize I've already asked you this question, but do you need 100% real time? Because you could batch them every 2 minutes. And concerning Parallel search, unless you really need it, it's overkill in this case; a communal index will serve you well and will be much easier to maintain. You

Re: log4j:WARN No appenders could be found for logger

2005-06-07 Thread Erik Hatcher
António, This error is not coming from Lucene, but rather from the ELATED library (as you can tell from package name). Lucene does not use Log4j at all. Please address this issue to either the Fedora or ELATED groups. Erik On Jun 6, 2005, at 8:21 PM, [EMAIL PROTECTED] wrote: Hi!

RE: deleting on a keyword field

2005-06-07 Thread Max Pfingsthorn
Hello! Ehem, I have to apologize. It was my stupidity that caused this problem. I simply mixed up field names... I did the deletion of items in a superclass, which of course didn't know about the change in the uri field name. Duh! Everything works now, just like it should. Sorry again! Thanks

Re: deleting on a keyword field

2005-06-07 Thread Erik Hatcher
On Jun 6, 2005, at 7:07 AM, Max Pfingsthorn wrote: Thanks for all the replies. I do know that the readers should be reopened, but that is not the problem. Could you work up a test case that shows this issue? From all I can see, you're doing the right thing. Something is amiss somewhere th

Re: Lucene in clustered environment (Tomcat)

2005-06-07 Thread Ben
> When you say your cluster is on a single machine, do you mean that you have > multiple webservers on the same machine all of which search a single Lucene > index? Yes, this is my case. > Do you use Lucene as your persistent store or do you have a DB back there? I use Lucene to search for dat

Documents returned by Scorer

2005-06-07 Thread Matt Quail
I've been playing around with a custom Query, and I've just realized that my Scorer is likely to return the same document more than once. Before I delve a bit further, can anyone tell me if this is a Bad Thing? =Matt
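One way to guard against repeated documents, sketched in plain Java rather than with Lucene classes: if the doc numbers come out in ascending order, a repeat is always adjacent to its first occurrence and can be dropped in a single pass. The class DocIdDedup and the method dropRepeats are hypothetical names, and the ascending-order input is an assumption of this sketch.

```java
import java.util.Arrays;

// Hypothetical helper, not a Lucene class: removes adjacent repeats
// from a sequence of doc numbers that is already in ascending order.
class DocIdDedup {

    static int[] dropRepeats(int[] ascendingDocIds) {
        int[] out = new int[ascendingDocIds.length];
        int n = 0;
        for (int doc : ascendingDocIds) {
            // Keep a doc number only if it differs from the last one kept.
            if (n == 0 || out[n - 1] != doc) {
                out[n++] = doc;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        int[] docs = {2, 2, 5, 9, 9, 9, 11};
        System.out.println(Arrays.toString(dropRepeats(docs))); // prints [2, 5, 9, 11]
    }
}
```

If the doc numbers are not in ascending order, this single-pass check is not enough and the sequence would need to be sorted or tracked in a set first.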

Re: Lucene in clustered environment (Tomcat)

2005-06-07 Thread Nader Henein
When you say your cluster is on a single machine, do you mean that you have multiple webservers on the same machine all of which search a single Lucene index? Because if that's the case, your solution is simple, as long as you persist to a single DB and then designate one of your servers (or ev

URLDirectory

2005-06-07 Thread LABATTE Jacques
Hi, I'm looking for a URLDirectory implementation NOT based on RAMDirectory because the size of my indexes is up to 500 MB. Thanks. Jacques LABATTE.

Re: Lucene in clustered environment (Tomcat)

2005-06-07 Thread Ben
My cluster is on a single machine and I am using FS index. I have already integrated Lucene into my web application for use in a non-clustered environment. I don't know what I need to do to make it work in a clustered environment. Thanks, Ben On 6/7/05, Nader Henein <[EMAIL PROTECTED]> wrote: >

Re: Lucene in clustered environment (Tomcat)

2005-06-07 Thread Nader Henein
IMHO, Issues that you need to consider * Atomicity of updates and deletes if you are using multiple indexes on multiple machines (the case if your cluster is over a wide network) * Scheduled indexes to core data comparison and sanitization (intensive) This all depends on what th

Re: Fastest way to fetch N documents with unique keys within large numbers of indexes..

2005-06-07 Thread Paul Elschot
On Tuesday 07 June 2005 09:22, Paul Elschot wrote: ... > > With the indexes on multiple discs, some parallelism can be introduced. > A thread per disk could be used. > In case there are multiple requests pending, they can be serialized just > before the sorting of the terms, and just before the

Re: Fastest way to fetch N documents with unique keys within large numbers of indexes..

2005-06-07 Thread Paul Elschot
On Tuesday 07 June 2005 07:17, Kevin Burton wrote: > Matt Quail wrote: > > >> We have a system where I'll be given 10 or 20 unique keys. > > > > > > I assume you mean you have one unique-key field, and you are given > > 10-20 values to find for this one field? > > > >> > >> Internally I'm creati

Re: Finding minimum and maximum value of a field?

2005-06-07 Thread sergiu gordea
Kevin Burton wrote: I have an index with a date field. I want to quickly find the minimum and maximum values in the index. Is there a quick way to do this? I looked at using TermInfos and finding the first one but how do I find the last? I also tried the new sort API and the performance

Re: Finding minimum and maximum value of a field?

2005-06-07 Thread sergiu gordea
I think that the solution is to sort the results and to get the first result. See: org.apache.lucene.search.Sort Best, Sergiu Kevin Burton wrote: Andrew Boyd wrote: How about using range query? private Term begin, end; begin = new Term("dateField", DateTools.dateToString(Date.

Re: Indexing multiple languages

2005-06-07 Thread sergiu gordea
Tansley, Robert wrote: Hi all, The DSpace (www.dspace.org) currently uses Lucene to index metadata (Dublin Core standard) and extracted full-text content of documents stored in it. Now that the system is being used globally, it needs to support multi-language indexing. I've looked through the mail

Re: Relative term frequency?

2005-06-07 Thread Paul Elschot
On Monday 06 June 2005 22:59, Andy Liu wrote: > Is there a way to calculate term frequency scores that are relative to > the number of terms in the field of the document? We want to override > tf() in this way to curb keyword spamming in web pages. In > Similarity, only the document's term freque