Analyzing and Querying

2004-08-06 Thread Tino Schöllhorn
Hi, I have a problem which I'd like to understand - and perhaps it is also possible to solve it ;-). I built an index using Lucene with the GermanAnalyzer. Now I have the following phenomenon: - when searching for "bahn" the result contains hardly any "bergbahn" I am aware that the Lucene Query

Distributed indexing

2004-08-06 Thread Chandan Tamrakar
Dear all, I have been using a Lucene index for a while; currently I have "indexes" on a single machine. But the volume of files is increasing and I want to separate the indexes onto different machines according to categories. Does Lucene support distributed indexing? I am confused what will be

Re: Analyzing and Querying

2004-08-06 Thread Daniel Naber
On Friday 06 August 2004 08:37, Tino Schöllhorn wrote: > I am aware that the Lucene Query-Api supports wildcards, but as far as I > know I cannot add a * in front of a query-term. That should be possible, but it will be slow if you have many terms. Another idea is to additionally index the word

Re: Distributed indexing

2004-08-06 Thread Otis Gospodnetic
Hello, --- Chandan Tamrakar <[EMAIL PROTECTED]> wrote: > Dear all, > I have been using a Lucene index for a while; currently I have > "indexes" on > a single machine. But the volume of files is increasing and I want > to > separate > indexes on different machines according to categories. Do

Re: Analyzing and Querying

2004-08-06 Thread Magnus Johansson
You could create a custom analyzer that splits compound words into their parts. That is, applying the analyzer to the word "bergbahn" would yield the terms "berg" and "bahn". Splitting compound words can be done quite effectively simply by using a large wordlist. I have done this for Swedish. /magnus T
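
A minimal sketch of the wordlist idea described above, written as plain Java with no Lucene dependency. The class name `CompoundSplitter`, the `minPartLength` parameter, and the two-word sample wordlist are assumptions made for illustration; a real analyzer would wrap logic like this in a token filter and load a much larger wordlist.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: splits a token into two known words if possible.
class CompoundSplitter {
    private final Set<String> words;   // the wordlist (assumed lowercase)
    private final int minPartLength;   // avoids splitting off very short fragments

    CompoundSplitter(Set<String> words, int minPartLength) {
        this.words = words;
        this.minPartLength = minPartLength;
    }

    /** Returns [head, tail] if both halves are in the wordlist, else the token itself. */
    List<String> split(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        // Try every split position that leaves both parts long enough.
        for (int i = minPartLength; i <= lower.length() - minPartLength; i++) {
            String head = lower.substring(0, i);
            String tail = lower.substring(i);
            if (words.contains(head) && words.contains(tail)) {
                return Arrays.asList(head, tail);
            }
        }
        return Collections.singletonList(lower);
    }

    public static void main(String[] args) {
        Set<String> words = new HashSet<>(Arrays.asList("berg", "bahn"));
        CompoundSplitter splitter = new CompoundSplitter(words, 3);
        System.out.println(splitter.split("bergbahn")); // [berg, bahn]
        System.out.println(splitter.split("haus"));     // [haus]
    }
}
```

Indexing both the parts and the original token would let a search for "bahn" match "bergbahn" without a leading wildcard; whether to keep the original token is a design choice this sketch leaves open.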

Lucene internal document number?

2004-08-06 Thread B. Grimm [Eastbeam GmbH]
Hi there, I have a short question regarding Lucene internal document numbers: can you give me an idea where they are written into the index and how they are generated? I looked through the source but I don't get it. I also read the FAQ and I know that numbers are incremental for each

Re: Analyzing and Querying

2004-08-06 Thread Daniel Naber
On Friday 06 August 2004 13:28, Magnus Johansson wrote: > Splitting compound words can be done quite effectively simply by using > a large wordlist. I have done this for Swedish. It is, however, difficult to get right for German. On the one hand there are compounds in German with more than two p

AW: Lucene internal document number?

2004-08-06 Thread Karsten Konrad
Hi, >> I have a short question regarding Lucene internal document numbers: can you give me an idea where they are written into the index and how they are generated? >> I am not 100% sure about the technical design, only from my experience with Lucene: The numbers depend on when the docu

Re: Analyzing and Querying

2004-08-06 Thread Magnus Johansson
Swedish is similar. Compound words can be formed by multiple words. Sometimes words are joined with an 's' and sometimes not, and there are a few special cases. Didn't mean to suggest that it is trivial to implement, but a large wordlist with inflections is a good start. /magnus Daniel Naber w
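
The two complications raised in this exchange, compounds with more than two parts and an optional linking 's' between parts, can be sketched by making the wordlist lookup recursive. This is an illustration under stated assumptions, not the approach either poster actually used: the class name `LinkingCompoundSplitter`, the sample wordlist, and the German examples are made up here, and the other special cases mentioned above are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: recursive wordlist splitting with an optional linking 's'.
class LinkingCompoundSplitter {

    /**
     * Returns the parts of the word, or an empty list if the word cannot be
     * decomposed into wordlist entries. A word that is itself in the wordlist
     * is returned whole. Worst-case cost is exponential; a real
     * implementation would memoize results.
     */
    static List<String> split(String word, Set<String> wordlist) {
        if (wordlist.contains(word)) {
            return Collections.singletonList(word);
        }
        for (int i = 1; i < word.length(); i++) {
            String head = word.substring(0, i);
            if (!wordlist.contains(head)) {
                continue;
            }
            // First try the remainder as-is.
            List<String> rest = split(word.substring(i), wordlist);
            // If that fails, treat a following 's' as a linking element and skip it.
            if (rest.isEmpty() && word.charAt(i) == 's' && i + 1 < word.length()) {
                rest = split(word.substring(i + 1), wordlist);
            }
            if (!rest.isEmpty()) {
                List<String> parts = new ArrayList<>();
                parts.add(head);
                parts.addAll(rest);
                return parts;
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        Set<String> wordlist = new HashSet<>(
                Arrays.asList("berg", "bahn", "station", "arbeit", "zimmer"));
        System.out.println(split("bergbahnstation", wordlist)); // [berg, bahn, station]
        System.out.println(split("arbeitszimmer", wordlist));   // [arbeit, zimmer]
    }
}
```

With a large wordlist, a greedy first match like this can over-split or choose the wrong boundary, which is part of why the posters describe the problem as hard to get right.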

Viewing the contents of the index on Tomcat

2004-08-06 Thread Ian McDonnell
How is this done? I want to verify that the indexer has added documents submitted from my JSPs. ian

Re: Viewing the contents of the index on Tomcat

2004-08-06 Thread Julien Nioche
see http://jakarta.apache.org/lucene/docs/contributions.html LUKE is a stand alone application for viewing and querying an index LIMO is a web application for monitoring the content of an index - Original Message - From: "Ian McDonnell" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Fr

C

2004-08-06 Thread Natarajan.T
FYI, how can I create an index with a name of my own (like IND001.DSX)? -Original Message- From: Aviran [mailto:[EMAIL PROTECTED] Sent: Thursday, July 29, 2004 9:09 PM To: 'Lucene Users List' Subject: RE: When does IndexReader pick up changes? AFAIK you don't have to close the writer -Origin

Weighted queries

2004-08-06 Thread Eric Jain
Is it possible to expand a query such as foo bar into (title:foo^4 OR abstract:foo^2 OR content:foo) AND (title:bar^4 OR abstract:bar^2 OR content:bar) ? I can assign weights to individual fields when indexing, and could use the MultiFieldQueryParser - but it seems this parser can't be confi
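
The expansion asked about here can be sketched as plain string building, independent of any Lucene parser class. The class name `WeightedQueryExpander` and the boost map are assumptions for illustration; the resulting string would still have to be handed to a query parser, and this sketch does not escape characters that have special meaning in query syntax.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: rewrites "foo bar" into per-term, per-field clauses.
class WeightedQueryExpander {

    /**
     * For each whitespace-separated term, builds an OR group over all fields
     * (with ^boost where the boost is not 1) and joins the groups with AND.
     */
    static String expand(String userQuery, Map<String, Integer> fieldBoosts) {
        StringBuilder out = new StringBuilder();
        for (String term : userQuery.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // blank input yields an empty query string
            }
            if (out.length() > 0) {
                out.append(" AND ");
            }
            out.append('(');
            boolean first = true;
            for (Map.Entry<String, Integer> e : fieldBoosts.entrySet()) {
                if (!first) {
                    out.append(" OR ");
                }
                first = false;
                out.append(e.getKey()).append(':').append(term);
                if (e.getValue() != 1) {
                    out.append('^').append(e.getValue());
                }
            }
            out.append(')');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, Integer> boosts = new LinkedHashMap<>();
        boosts.put("title", 4);
        boosts.put("abstract", 2);
        boosts.put("content", 1);
        System.out.println(expand("foo bar", boosts));
    }
}
```

For the input `foo bar` this prints `(title:foo^4 OR abstract:foo^2 OR content:foo) AND (title:bar^4 OR abstract:bar^2 OR content:bar)`, the form given in the question.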

Re: Weighted queries

2004-08-06 Thread Daniel Naber
On Friday 06 August 2004 16:54, Eric Jain wrote: >(title:foo^4 OR abstract:foo^2 OR content:foo) AND >(title:bar^4 OR abstract:bar^2 OR content:bar) That's not the way MultiFieldQueryParser will rewrite your query. To get this kind of query you have to parse it with QueryParser and then

Re: Weighted queries

2004-08-06 Thread Grant Ingersoll
Btw, MultiFieldQueryParser extends QueryParser, which has the setOperator method that allows you to set the default operator. >>> [EMAIL PROTECTED] 8/6/2004 10:54:55 AM >>> Is it possible to expand a query such as foo bar into (title:foo^4 OR abstract:foo^2 OR content:foo) AND (title:b

Re: Distributed indexing

2004-08-06 Thread Byron Miller
You can check out the nutch project to see how the distributed search is implemented and a tool that can merge segments as well. -byron On Fri, 6 Aug 2004 01:48:16 -0700 (PDT), Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > Hello, > > --- Chandan Tamrakar <[EMAIL PROTECTED]> wrote: > > > Dear al

Re: Weighted queries

2004-08-06 Thread Eric Jain
Btw, MultiFieldQueryParser extends QueryParser, which has the setOperator method that allows you to set the default operator. Yes, but it seems that the 'parse(query, fields, analyzer)' method is static, i.e. the setOperator method won't have any effect here...

Re: Viewing the contents of the index on Tomcat

2004-08-06 Thread Ian McDonnell
And how do you run Luke once it's been added to the classpath on Tomcat? I can't seem to find any docs on the Luke site. Ian --- "Julien Nioche" <[EMAIL PROTECTED]> wrote: see http://jakarta.apache.org/lucene/docs/contributions.html LUKE is a stand alone application for viewing and querying an in

Re: Viewing the contents of the index on Tomcat

2004-08-06 Thread Julien Nioche
LUKE is a stand alone application - it is not meant to work on tomcat - Original Message - From: "Ian McDonnell" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Friday, August 06, 2004 5:31 PM Subject: Re: Viewing the contents of the index on Tomcat > And how do yo

Re: Weighted queries

2004-08-06 Thread Zilverline info
Hi Eric, I have implemented this in Zilverline. What I do is the following: subclass QueryParser and override getFieldQuery: protected Query getFieldQuery(String field, Analyzer analyzer, String queryText) throws ParseException { // for field that contain 'contents' add boostfactors f

Re: Weighted queries

2004-08-06 Thread Eric Jain
(title:foo^4 OR abstract:foo^2 OR content:foo) AND (title:bar^4 OR abstract:bar^2 OR content:bar) That's not the way MultiFieldQueryParser will rewrite your query. You are right - what happens is this: (title:foo OR title:bar) OR (abstract:foo OR abstract:bar) OR (content:foo OR content:

Re: Weighted queries

2004-08-06 Thread Eric Jain
Zilverline info wrote: I have implemented this in Zilverline. What I do is the following: subclass QueryParser and override getFieldQuery: Thanks; as you can see I ended up with a similar but slightly simpler solution, as I do not need to specify weights at query time.

Re: Performance when computing a filter using hundreds of diff terms.

2004-08-06 Thread Paul Elschot
Kevin, On Thursday 05 August 2004 23:32, Kevin A. Burton wrote: > I'm trying to compute a filter to match documents in our index by a set > of terms. > > For example some documents have a given field 'category' so I need to > compute a filter with multiple categories. > > The problem is that our