Re: How many documents in the biggest Lucene index to date?

2007-01-29 Thread Daniel Noll
karl wettin wrote: Then it hit me that perhaps the integer limitation should be in the store (Directory) and not the IndexReader? If not now, perhaps in the future when everybody is running on 64-bit JVMs. I don't think it will be a very expensive thing to implement. But did anyone need that yet

Re: "did you mean..." feature

2007-01-29 Thread karl wettin
On 30 Jan 2007 at 01:39, Felix Litman wrote: We are having some difficulties getting good "did you means" for multiword queries. I recommend you take a look at the spell checker in www.alias-i.com/lingpipe/. Make sure to read the license agreement as it might not be free of charge for

"did you mean..." feature

2007-01-29 Thread Felix Litman
We are implementing the "did you mean..." feature on top of Lucene, leveraging ideas from the "Did you mean Lucene?" article. (Many thanks to Tom White for such a useful and clear article...!) We are having some difficulties getting good "did you means" for multiword queries. Did anyon

Re: How many documents in the biggest Lucene index to date?

2007-01-29 Thread karl wettin
On 29 Jan 2007 at 23:21, Daniel Noll wrote: karl wettin wrote: The maximum number of documents in an index is Integer.MAX_VALUE (2 147 483 647), but it is possible to combine multiple indices. It's true that you can combine multiple indexes, but don't make assumptions that this lets you brea

Re: Score

2007-01-29 Thread Chris Hostetter
Note also that the scores in an Explanation are the "raw" scores ... if you use Hits, the scores are "partially normalized", meaning that if the highest-scoring document has a score greater than 1, all scores are divided by the highest score. : Date: Mon, 29 Jan 2007 21:52:58 +0100 : From: Soer
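The "partial normalization" described above can be sketched in plain Java. This is only an illustration of the rule as stated in the message, not Lucene's own code; the class and method names are hypothetical, and the input is assumed to be sorted with the highest raw score first.

```java
import java.util.Arrays;

// Hypothetical illustration; not part of Lucene.
class HitsNormalizationSketch {

    // Assumes rawScores is sorted in descending order (best hit first).
    // If the top score exceeds 1, every score is divided by it;
    // otherwise the raw scores are returned unchanged.
    static float[] normalize(float[] rawScores) {
        float[] out = Arrays.copyOf(rawScores, rawScores.length);
        if (out.length == 0) {
            return out;
        }
        float top = out[0];
        if (top > 1.0f) {
            for (int i = 0; i < out.length; i++) {
                out[i] = out[i] / top;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Top score is 2.0, so all scores are scaled down by 2.0.
        System.out.println(Arrays.toString(normalize(new float[] {2.0f, 1.0f, 0.5f})));
        // Top score is below 1, so nothing changes.
        System.out.println(Arrays.toString(normalize(new float[] {0.8f, 0.4f})));
    }
}
```

This is why a score read from Hits can differ from the value shown in an Explanation for the same document.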

Re: Merge Hits

2007-01-29 Thread Daniel Noll
DECAFFMEYER MATHIEU wrote: I noticed it is not possible to define Occur.SHOULD in Lucene 1.4.3. It's still possible, it's just that the API is different. ("should" = not prohibited, not required) Daniel -- Daniel Noll Nuix Pty Ltd Suite 79, 89 Jones St, Ultimo NSW 2007, Australia Ph:

Re: How many documents in the biggest Lucene index to date?

2007-01-29 Thread Daniel Noll
karl wettin wrote: The maximum number of documents in an index is Integer.MAX_VALUE (2 147 483 647), but it is possible to combine multiple indices. It's true that you can combine multiple indexes, but don't make assumptions that this lets you break the limitation. MultiReader still extends
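A small sketch of the arithmetic behind this warning, assuming (as the thread suggests) that a combined reader still addresses documents with a single int. The class and method names are hypothetical and nothing here calls Lucene; it only shows that two individually valid indexes can add up to more documents than an int can number.

```java
// Hypothetical illustration; not part of Lucene.
class CombinedDocCountSketch {

    // Sums per-index document counts in a long so the total cannot overflow.
    static long combinedDocCount(int[] docCounts) {
        long total = 0L;
        for (int count : docCounts) {
            total += count;
        }
        return total;
    }

    // True only if the combined count stays within Integer.MAX_VALUE.
    static boolean fitsInIntDocIds(int[] docCounts) {
        return combinedDocCount(docCounts) <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        int[] twoLargeIndexes = {1_500_000_000, 1_500_000_000};
        System.out.println(combinedDocCount(twoLargeIndexes));
        System.out.println(fitsInIntDocIds(twoLargeIndexes));
    }
}
```

Each index above is below the limit on its own, but the combined total is not, so combining them does not lift the limit.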

Re: Score

2007-01-29 Thread Soeren Pekrul
DECAFFMEYER MATHIEU wrote: Both are the same document but in different indexes, the only difference is that the second index has more documents than the first one, the first one contains only that page. I would like to have the same score as in the second index, Simply speaking, the score dep

Re: Problem with lucene.

2007-01-29 Thread Erick Erickson
Sure, your problem is probably that the query goes through an analyzer and its associated tokenizer. Probably something like StandardAnalyzer which "massages" the input and strips out most non-alphabetic characters, except some. It tries to be smart about URLs, e-mail addresses, etc. If you'r
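To illustrate the effect described above, here is a toy tokenizer in plain Java. It is a deliberately simplified stand-in and an assumption for illustration only: it keeps runs of letters and digits and drops everything else, whereas StandardAnalyzer's real rules are more elaborate. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not StandardAnalyzer.
class ToyTokenizerSketch {

    // Splits text into lowercase tokens made of letters and digits only.
    // Any other character acts as a separator and is discarded.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A query consisting only of "<" produces no tokens at all.
        System.out.println(tokenize("<"));
        System.out.println(tokenize("Hello, World!"));
    }
}
```

With no tokens left after analysis, there is nothing to build a term query from, which matches the empty parsed query reported in the original question.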

Problem with lucene.

2007-01-29 Thread poeta simbolista
Hi there, this is my very first post at this forum... please be considerate :) Well, I have a problem when sending a query such as: +description:< Once the query is parsed, it returns the empty String, which means the String "<" that I want to search for in the field description is ignored.

Re: Index creation

2007-01-29 Thread Otis Gospodnetic
Increase the mergeFactor (how much depends on the limit of open file descriptors on your machine). Increase maxBufferedDocs (how much depends on how much RAM you've got and how big your JVM heap is). I covered this in a Lucene article on onjava.com in 2003, I think. Otis - Original M

RE: Score

2007-01-29 Thread DECAFFMEYER MATHIEU
I see this for the first index:
0.3764683 = sum of:
  0.17716156 = weight(title:logistics in 0), product of:
    0.57735026 = queryWeight(title:logistics), product of:
      0.30685282 = idf(docFreq=1)
      1.8815218 = queryNorm
    0.30685282 = fieldWeight(title:logistics in 0), product of:
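The numbers in this explanation can be reproduced by hand. The sketch below assumes the classic default idf formula, 1 + ln(numDocs / (docFreq + 1)), and that the first index holds a single document (numDocs = 1), as stated elsewhere in the thread. The queryNorm and fieldWeight values are copied from the output above rather than derived, and the class and method names are hypothetical.

```java
// Hypothetical illustration; the formula is assumed, not read from this thread.
class ExplainArithmeticSketch {

    // Assumed default idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // queryWeight is the product of idf and queryNorm in the output above.
    static double queryWeight(double idf, double queryNorm) {
        return idf * queryNorm;
    }

    // weight is the product of queryWeight and fieldWeight in the output above.
    static double weight(double queryWeight, double fieldWeight) {
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        double idf = idf(1, 1);                  // docFreq=1 in a one-document index
        double qw = queryWeight(idf, 1.8815218); // queryNorm taken from the output
        double w = weight(qw, 0.30685282);       // fieldWeight taken from the output
        System.out.println("idf         = " + idf);
        System.out.println("queryWeight = " + qw);
        System.out.println("weight      = " + w);
    }
}
```

Because idf depends on numDocs and docFreq, the same document scores differently in an index that contains other documents, which is the effect asked about in this thread.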

Re: How many documents in the biggest Lucene index to date?

2007-01-29 Thread Grant Ingersoll
It is a 64-bit JVM and a pretty good size machine, but I don't think I am anywhere near pushing the limits on it either, so don't read too much into my numbers other than as a raw statement of how many documents I've indexed. When using the Hits API, documents aren't loaded until you ask fo

Re: Slightly off-topic: using openoffice for conversions

2007-01-29 Thread Shane
Not sure if this is what you are after, but there is a project called File2XLIFF4j which converts a number of file formats to XLIFF (an XML structure) using OpenOffice.org. And if I am not mistaken, Lucene has code available for indexing XML. The project is located at http://file2xliff4j.sourcef

Re: lucene for syslogs

2007-01-29 Thread Saravana
Hi, Thanks for your response. I am adding my answers inline. How much work is it to parse the log files? What kind of hardware are you using? Are you accessing things over a network? Is there network latency involved? *** I believe answers for the above questions will not aff

field type to store & search

2007-01-29 Thread arnaudbuffet
Hi, I recently upgraded Lucene in a Java application to the current release, 2.0. As everyone knows, the way to write indexed data changed with the notions of Field.Store and Field.Index in the Lucene document. Everything I read seems confusing. Can anyone quickly help me with the best option to use to i

Slightly off-topic: using openoffice for conversions

2007-01-29 Thread John Haxby
Hello All, In LIA, Erik and Otis mention using the openoffice.org API for converting from various formats to something that can be used for indexing. Does anyone have any examples of doing this that they'd be willing to share? jch -

Re: lucene for syslogs

2007-01-29 Thread Erick Erickson
That depends (tm Erik Hatcher). The problem with such an open-ended question is that there are so many unique variables that it's impossible to answer in any meaningful way. For instance: How much work is it to parse the log files? What kind of hardware are you using? Are you accessing things

RE: Merge Hits

2007-01-29 Thread DECAFFMEYER MATHIEU
I noticed it is not possible to define Occur.SHOULD in Lucene 1.4.3. -----Original Message----- From: Nicolas Lalevée Sent: Monday, January 29, 2007 2:03 PM To: java-user@lucene.apache.org Subject: Re: Merge Hits

Re: Merge Hits

2007-01-29 Thread Nicolas Lalevée
On Monday 29 January 2007 at 13:33, DECAFFMEYER MATHIEU wrote: > Thank you for your response, > Actually I want to merge the Hits to get a better score, > For example when the user enters Hello > I want to merge: > title:Hello > headlines:Hello > summary:Hello > content:Hello > > Then I will get a better

RE: Merge Hits

2007-01-29 Thread DECAFFMEYER MATHIEU
Thank you for your response. Actually I want to merge the Hits to get a better score. For example, when the user enters Hello I want to merge: title:Hello headlines:Hello summary:Hello content:Hello Then I will get a better score if the title is Hello. What do you think of this? Thank you. -Or

Re: Merge Hits

2007-01-29 Thread Nicolas Lalevée
On Monday 29 January 2007 at 12:08, DECAFFMEYER MATHIEU wrote: > Hi, I have a table of objects Hit, > I want to merge the different Hits objects of the table to have one Hits > object. > > Is this possible ? The easiest way is to merge the queries that produce those different hits. You should l

Merge Hits

2007-01-29 Thread DECAFFMEYER MATHIEU
Hi, I have a table of objects Hit, I want to merge the different Hits objects of the table to have one Hits object. Is this possible? Thank you for any help!

Re: Score

2007-01-29 Thread Erik Hatcher
Look at the details provided by IndexSearcher.explain(). That'll tell you why. Erik On Jan 29, 2007, at 4:43 AM, DECAFFMEYER MATHIEU wrote: Hi, I have one index with one document with title "Logistics" I have a second index with the same document with title "Logistics" and othe

lucene for syslogs

2007-01-29 Thread Saravana
Hi, Has anybody used Lucene to index syslogs? What is the maximum indexing rate that we can get when storing 200-byte documents with 14 fields? thanks, MSK -- Every day brings us a sea of opportunities

Score

2007-01-29 Thread DECAFFMEYER MATHIEU
Hi, I have one index with one document with title "Logistics". I have a second index with the same document with title "Logistics" and other documents (some contain the word "Logistics" as well). If I execute a search title:Logistics in the first index, I have 0.31 for the document with title "Lo

Index creation

2007-01-29 Thread WATHELET Thomas
How could I optimize my index creation? // setUseCompoundFile(?); // setMaxBufferedDocs(?); // setMergeFactor(?); How could I reduce disk access, given that I work with more than 100 documents? Thanks