Re: lucene 2.3 in production

2008-02-04 Thread GokulAnand
Can someone give me the link to get the Lucene 2.3 JARs? Thanks in advance markrmiller wrote: > > You still have to be careful if you want to alternate a search and > write. If you are loading a lot of docs this way, you would want to hold > the Writer to batch the docs, but while you are holding

Lucene Search with relational operators

2008-02-04 Thread GokulAnand
Hi all, When I do a search (Lucene) with a combination of relational operators, it does not return the required result, i.e. when the query is like: i) A and B or C, it fetches the records for only the (A and B) combination, rejecting C (OR operator) ii) A or B and C, it fetches the records fo

Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

2008-02-04 Thread Daniel Noll
On Monday 04 February 2008 21:51:39 Michael McCandless wrote: > Even pre-2.3, you should have seen gains by adding threads, if indeed > your hardware has good concurrency. > > And definitely with the changes in 2.3, you should see gains by > adding threads. With regards to this, I have been wonder

Re: [ANN] Luke 0.8 released

2008-02-04 Thread N. Hira
Thank you for this. Luke has been *extremely* helpful. -h -- Hira, N.R. Solutions Architect Cognocys, Inc. On 04-Feb-2008, at 10:17 PM, Andrzej Bialecki wrote: Hi all, I just released Luke 0.8, the Lucene Index Toolbox. As

[ANN] Luke 0.8 released

2008-02-04 Thread Andrzej Bialecki
Hi all, I just released Luke 0.8, the Lucene Index Toolbox. As usual, you can get it here: http://www.getopt.org/luke/ This release upgrades to the official Lucene 2.3.0 release JARs. NOTE: this version of Luke requires Java 1.5 or higher. The following changes have been made in this

Re: Performance guarantees and index format

2008-02-04 Thread Andrzej Bialecki
Chris Hostetter wrote: : What this issue doesn't discuss is what to do with partial results obtained : when a timeout occurred. As the original poster points out, document lists are : traversed in the order they were added and not the order of their importance, : which introduces a bias to partia

Re: lucene 2.3 in production

2008-02-04 Thread John Wang
Thanks Mark for the datapoint! -John On Feb 4, 2008 4:50 PM, Mark Miller <[EMAIL PROTECTED]> wrote: > Replied to the wrong thread - sorry about that. > > But to make up for it, I have been using 2.3 even before it was > released. I have so many tests for my app that I am pretty confident > runni

Re: lucene 2.3 in production

2008-02-04 Thread Mark Miller
Replied to the wrong thread - sorry about that. But to make up for it, I have been using 2.3 even before it was released. I have so many tests for my app that I am pretty confident running things off the trunk if all tests are a go, and there were so many goodies for 2.3 that I jumped on pret

Re: DefaultIndexAccessor

2008-02-04 Thread Mark Miller
I replied to the wrong thread -- sorry about that: You still have to be careful if you want to alternate a search and write. If you are loading a lot of docs this way, you would want to hold the Writer to batch the docs, but while you are holding it, you will not have a fresh view of the index

Re: DefaultIndexAccessor

2008-02-04 Thread Mark Miller
For anyone following this thread who would like to check this out, I put up the new code with the warming capability: https://issues.apache.org/jira/browse/LUCENE-1026 IndexAccessor-02.04.2008.zip

Re: Boosting using an external data source

2008-02-04 Thread Michael Stoppelman
If anyone was wondering how I dealt with this, I ended up extending the TermQuery class and overriding the tf() and idf() functions like in FuzzyLikeThisQuery. See FuzzyTermQuery for how they use the SimilarityDelegator object: contrib/queries/src/java/org/apache/lucene/search/FuzzyLikeThisQuery.j

Re: Performance guarantees and index format

2008-02-04 Thread Chris Hostetter
: What this issue doesn't discuss is what to do with partial results obtained : when a timeout occurred. As the original poster points out, document lists are : traversed in the order they were added and not the order of their importance, : which introduces a bias to partial results in that they r

Re: lucene 2.3 in production

2008-02-04 Thread Mark Miller
You still have to be careful if you want to alternate a search and write. If you are loading a lot of docs this way, you would want to hold the Writer to batch the docs, but while you are holding it, you will not have a fresh view of the index - so you could add the same doc twice if it came tw

lucene 2.3 in production

2008-02-04 Thread John Wang
Hi: Is there anyone running a full production deployment on lucene 2.3 (with high traffic, large index and frequent updates)? We are thinking of doing this but wanted to get some feedback. Thanks -John

ANN: Textmining.org extractor library v1.0 released

2008-02-04 Thread Ryan Ackley
FYI, I just updated the textmining.org homepage with the following info. The tm-extractors library has a new release! v1.0. You can download it here: http://text-mining.googlecode.com/files/tm-extractors-1.0.jar The tm-extractors library is a pure java library for extracting text from Word docum

Re: DefaultIndexAccessor

2008-02-04 Thread Cam Bazz
Hello Mark, Thank you for your lengthy and valuable clarification. I have the case - before adding to the index, i must check if a document exist with the same key (actually, double key) - or before deleting a document - I must ensure it exists in the index. Currently I am doing it with my custom

Re: Luke error browsing to a lucene index

2008-02-04 Thread Erick Erickson
What versions of Lucene and Luke are you using? Using the most recent ones is usually a good place to start. Best Erick On Feb 4, 2008 12:18 PM, Mitchell, Erica <[EMAIL PROTECTED]> wrote: > Hi, > > I'm trying to test out Luke and I get an error saying unknown format > error:-4 > The index I'm

Luke error browsing to a lucene index

2008-02-04 Thread Mitchell, Erica
Hi, I'm trying to test out Luke and I get an error saying unknown format error:-4 The index I'm trying to point to is the one built by the demo in the documentation for getting started with lucene. Can anyone please tell me what this error might mean. Thanks Erica I

Re: DefaultIndexAccessor

2008-02-04 Thread Mark Miller
The purpose of IndexAccessor is to coordinate Readers/Writers for a Lucene index. Readers and Writers in Lucene are multi-threaded in that multiple threads may use them at the same time, but they must/should be shared and there are special rules (You cannot delete with a Reader while a Writer i

Re: outof memory error

2008-02-04 Thread Erick Erickson
u index smaller documents? You cannot expect to index a 1G doc with 512M of memory in the JVM. The first thing I'd try is upping your JVM memory to the max your machine will accept. Make sure you flush your IndexWriter before attempting to index this document. But I would not be surprised i

outof memory error

2008-02-04 Thread SK R
Hi, I got an out-of-memory exception while indexing huge documents (~1GB) in one thread and optimizing 2 to 3 other indexes in different threads. The max JVM heap size is 512MB. I'm using Lucene 2.3.0. Please suggest a way to avoid this exception. Regards RSK

RE: Lucene

2008-02-04 Thread Allahbaksh Mohammedali Asadullah
First I want to search for documents that have the value c1, then search for documents that have c1 as one of the field values. I know we can use TermQuery, but is that the way we should do it? Can't we save something like fieldname1: c1-23 and, while parsing, get c1 and 23 as two fields? I should also be able to quer

Re: Lucene

2008-02-04 Thread Erick Erickson
I don't understand what you're trying to do with "match extent". Perhaps a bit more explanation of the problem you're trying to solve would get you more helpful answers ... Best Erick On Feb 4, 2008 9:34 AM, Allahbaksh Mohammedali Asadullah < [EMAIL PROTECTED]> wrote: > Hi, > > I have following

Re: Random selection of files

2008-02-04 Thread Erick Erickson
Well, assuming that by "same weight" you are referring to the document scores (relevance), you certainly have to do the search first. But you can use TopDocs to get a list of the document IDs arranged by decreasing score i.e. sorted by relevance. But "same weight" is tricky. It's virtually certain

Lucene

2008-02-04 Thread Allahbaksh Mohammedali Asadullah
Hi, I have the following requirement:

            Value  Match Extent
Fieldname1  c1     23
Fieldname2  c2     26
Fieldname3  c8     85

Can I use Lucene for this? If yes, what is the easiest and best way to use it?

Re: DefaultIndexAccessor

2008-02-04 Thread Cam Bazz
Hello Mark, I have been reading the code - and honestly I have not understood how it works. I was hoping that this was a solution to the case when you are adding documents - in a multithreaded way, it allows other non-writer threads to be able to see documents added without refreshing the indexsea

Re: DefaultIndexAccessor

2008-02-04 Thread Mark Miller
IndexAccessor-1.26.2008.zip is the latest one. I will be dating a zip from now on. I hope to post new code with the warming either tonight or tomorrow night. I would be ecstatic to have some help vetting that. Also, I am thinking of making a change so that when you release the Writer the thre

DefaultIndexAccessor

2008-02-04 Thread Cam Bazz
Hello, Regarding https://issues.apache.org/jira/browse/LUCENE-1026 , this seems very interesting. I have read the discussion on the page, but I could not figure out which set of files is the latest. Is it the IndexAccessor-1.26.2008.zip file? I will read through the code, make my own tests, and s

Re: Concurrent Indexing + Searching

2008-02-04 Thread Mark Miller
You are right that if auto-commit=true and a user reopens an IndexReader, the docs will absolutely be visible as they are flushed. I think the part you are missing is that you need to be cooperating with the IndexAccessor: a user should not be reopening an IndexReader. The whole point of IndexA

ParallelReader question

2008-02-04 Thread Cam Bazz
Hello, When using a ParallelReader with, let's say, two indexes, and we request a document by id, is the result the combined fields of that document from the two indexes? The documentation was not clear on that one, except for the document(int n, FieldSelector fs) method. Best, -C.B.

Re: appending field to an existing index

2008-02-04 Thread Cam Bazz
Hello, I have read the ParallelReader doc. It says it must have the same number of documents as the other index. When we are using a writer - searcher combination, how can we integrate this ParallelReader into the game? Simply, I have some documents, and I would just like to mark them, in an efficient wa

Re: Retrieving documents that match atleast n query terms

2008-02-04 Thread Ian Lea
BooleanQuery.setMinimumNumberShouldMatch(int min) sounds exactly what you need. -- Ian. [EMAIL PROTECTED] On Jan 30, 2008 6:43 PM, Dipsy Kapoor <[EMAIL PROTECTED]> wrote: > Hi, > > I am using a BooleanQuery of the form: >T1 OR T2 OR T3 OR .. Tn > to search on a field in Lucene. > > Is
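As an illustration of what "match at least n of the OR'ed terms" means, here is a minimal plain-Java sketch. It is not Lucene code and does not call BooleanQuery; the class `MinShouldMatchSketch` and the method `matchesAtLeast` are hypothetical names, and a document is modelled simply as a set of its terms.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: models the "at least min optional terms" rule,
// not how Lucene evaluates a BooleanQuery internally.
final class MinShouldMatchSketch {
    private MinShouldMatchSketch() {}

    // Returns true when at least `min` distinct query terms occur in the document.
    static boolean matchesAtLeast(Set<String> docTerms, List<String> queryTerms, int min) {
        int matched = 0;
        // Deduplicate so a repeated query term is counted once.
        for (String term : new HashSet<String>(queryTerms)) {
            if (docTerms.contains(term)) {
                matched++;
            }
        }
        return matched >= min;
    }
}
```

With query terms T1, T2, T3 and min set to 2, a document containing only T1 would be rejected, while a document containing T1 and T3 would be accepted.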

Re: Random selection of files

2008-02-04 Thread Ian Lea
Hi I think you would need to do the search to get list of 1000 doc ids, but wouldn't need to retrieve all 1000. Just pick your random 10 from the list and retrieve them. -- Ian. [EMAIL PROTECTED] On Feb 4, 2008 11:37 AM, Juerg Meier <[EMAIL PROTECTED]> wrote: > Hi, > > We have the requiremen
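The suggestion above (collect the doc ids from the search, then pick 10 of them at random and retrieve only those) can be sketched in plain Java. This is a sketch under assumptions: the doc ids are assumed to be already collected into a list by the search, and `RandomHitPicker` and `pickRandom` are hypothetical names, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Lucene: chooses distinct doc ids at random
// from a list of hits that a search has already produced.
final class RandomHitPicker {
    private RandomHitPicker() {}

    // Returns up to `count` distinct ids chosen uniformly at random from `docIds`.
    static List<Integer> pickRandom(List<Integer> docIds, int count, Random random) {
        // Shuffle a copy so the caller's list keeps its original order.
        List<Integer> shuffled = new ArrayList<Integer>(docIds);
        Collections.shuffle(shuffled, random);
        int n = Math.min(count, shuffled.size());
        return new ArrayList<Integer>(shuffled.subList(0, n));
    }
}
```

Only the ids returned by `pickRandom` would then need to be loaded as full documents, which avoids retrieving all 1000 hits.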

Random selection of files

2008-02-04 Thread Juerg Meier
Hi, We have the requirement for an "i'm feeling lucky" button, at least sort of. Whereas google just delivers the first record in a result set, we should deliver 10 arbitrary hits chosen out of, let's say, 1000. All of these documents have the same importance i.e. have the same weight. So, is

RE: Retrieving documents that match atleast n query terms

2008-02-04 Thread Itamar Syn-Hershko
I'm not 100% sure, but I think you could use Lucene's scoring for this. So if you ran your query and received N results, loop through them and check the scoring explanation (which I'm not quite sure how to acquire). This should tell you how many terms out of the query were found. This approach shou

Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

2008-02-04 Thread Michael McCandless
Even pre-2.3, you should have seen gains by adding threads, if indeed your hardware has good concurrency. And definitely with the changes in 2.3, you should see gains by adding threads. Note that as you add threads, the "sweet spot" for RAM buffer size increases. Ie, make the RAM buffe

Re: appending field to an existing index

2008-02-04 Thread Andrzej Bialecki
Erick Erickson wrote: As always, "it depends". You can try to reconstruct the doc from an index, see Luke. But depending upon how you indexed things, it may be more or less lossy. I remember this was discussed recently, you might have some luck if you search the archive. But it may be very, very