Splitting queries, or using two different parsers

2007-01-30 Thread Aleksander M. Stensby
Hey everyone! I have a question/problem I hope some of you guys can help me with. I have put myself in a bit of trouble... The thing is, I have several fields indexed, one being "source" and one being "content" (which is the default field), among other fields that are

RE: Score

2007-01-30 Thread DECAFFMEYER MATHIEU
Yes, it helps me understand, thank you. I build a BooleanQuery from the user's input and include in the query: title:keywordofuser headlines:keywordofuser content:keywordofuser. I tried to boost the title field; then, if the keyword appears in the title, the score grows as I want, but if the keyword occurs

Re: Splitting queries, or using two different parsers

2007-01-30 Thread Aleksander M. Stensby
OK, I apologize for being so ignorant and not looking through the API thoroughly :) I found PerFieldAnalyzerWrapper, which I think will do the trick :) So, erhm... just ignore this message :) - Aleksander On Tue, 30 Jan 2007 09:39:07 +0100, Aleksander M. Stensby <[EMAIL PROTECTED]> wrote:

RE: Index creation

2007-01-30 Thread WATHELET Thomas
OK, it's faster (maybe 4 times less time) with RAMDirectory, with MaxBufferedDocs set to 1, MergeFactor set to 1000 and the JVM heap set to 1024. Thanks -----Original Message----- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: 29 January 2007 17:48 To: java-user@lucene.apache.org Subject: Re: I

Re: Problem with lucene.

2007-01-30 Thread poeta simbolista
Thanks for your reply. I am working with an index that is created separately, using a StandardTokenizer. I have read that queries should use the same tokenizer the index was created with. Anyway, I have tried other tokenizers when querying, such as the WhitespaceTokenizer, but

Why this query is not correct?

2007-01-30 Thread poeta simbolista
Hi guys, I have been through the docs and I can't see why the parser rejects this: description:*sql fails with: Lexical error at line 1, column 16. Encountered: "*" (42), after : "" However, the following: description: sql* parses fine. Any idea why you can't use wildca

Re: Splitting queries, or using two different parsers

2007-01-30 Thread Erick Erickson
Been there, done that. If you only knew the number of times someone on this list has come to my rescue by saying "Did you look at *"? But I thought I'd add that you can use the PerFieldAnalyzerWrapper at index time too, which may help you keep things consistent between indexing time and sear
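As a rough illustration of what a per-field wrapper does, here is a toy model written without Lucene: it looks up an analyzer by field name and falls back to a default. The `PerFieldToy` class and its two tokenizing functions are hypothetical stand-ins, not Lucene classes (Lucene's own equivalent is `PerFieldAnalyzerWrapper`, constructed with a default analyzer, with `addAnalyzer(fieldName, analyzer)` for the exceptions). Passing the same instance to both the indexing side and the query-parsing side is what keeps the two consistent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy model of a per-field analyzer wrapper; NOT Lucene's API.
class PerFieldToy {
    // Hypothetical "analyzer": lowercases and splits on non-word characters.
    static final Function<String, List<String>> LOWERCASE_WORDS = text -> {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    };

    // Hypothetical "analyzer": keeps the whole value as one untouched token.
    static final Function<String, List<String>> KEYWORD = text -> Collections.singletonList(text);

    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldToy(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Use the analyzer registered for this field, or fall back to the default.
    List<String> tokenize(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    public static void main(String[] args) {
        PerFieldToy analyzer = new PerFieldToy(LOWERCASE_WORDS);
        analyzer.addAnalyzer("source", KEYWORD);
        // Use this same instance when indexing and when parsing queries.
        System.out.println(analyzer.tokenize("content", "Hello Lucene World")); // [hello, lucene, world]
        System.out.println(analyzer.tokenize("source", "Reuters-UK"));         // [Reuters-UK]
    }
}
```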

Re: Problem with lucene.

2007-01-30 Thread Erick Erickson
Yes, surely the analyzer used at indexing time governs what's in the index and thus what can be searched. I'd surmise that your index doesn't contain any tokens with < or > if StandardAnalyzer was used, so no matter what analyzer you use at query time, you won't be able to find tokens that aren't
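A toy illustration of why the index-time analyzer is decisive: if indexing splits on anything that is not a letter or digit (a rough, hypothetical stand-in for how markup characters are dropped, not StandardAnalyzer's actual grammar), then `<` and `>` never reach the index, and a query for `<b>` has nothing to match no matter which analyzer is used at query time.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class IndexTimeAnalysisToy {
    // Hypothetical index-time analysis: lowercase, then split on every run of
    // characters that is not a letter or digit, so "<" and ">" are discarded.
    static Set<String> indexTerms(String text) {
        Set<String> terms = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> index = indexTerms("Use <b> for bold text");
        // Whatever the query-time analyzer produces, "<b>" was never indexed.
        System.out.println(index.contains("<b>")); // false
        System.out.println(index.contains("b"));   // true
    }
}
```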

Re: Why this query is not correct?

2007-01-30 Thread Erick Erickson
Because one of the restrictions of wildcards is that they cannot appear as the first character in a search term if you use QueryParser. QueryParser rejects it (as you've seen). This is a deliberate design decision since this query will be humongous and most likely throw a TooManyClauses exception
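The cost difference can be sketched with a sorted term dictionary, modelled here as a `TreeSet` (an assumption for illustration; Lucene's term index is not literally a `TreeSet`, but its terms are kept in sorted order). A trailing wildcard such as `sql*` lets the search seek straight to the first term starting with `sql` and stop at the first term that does not; a leading wildcard such as `*sql` gives no starting point, so every term in the field has to be examined. Each matching term then becomes a clause of the rewritten query, which is where `TooManyClauses` comes from once the clause limit (1024 by default) is exceeded. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class WildcardCostToy {
    // Number of terms examined by the most recent lookup.
    static int visited;

    // "sql*": seek to the prefix, then walk forward until terms stop matching.
    static List<String> prefixMatches(NavigableSet<String> terms, String prefix) {
        visited = 0;
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(prefix, true)) {
            visited++;
            if (!t.startsWith(prefix)) break;
            out.add(t);
        }
        return out;
    }

    // "*sql": there is no seek point, so every term must be checked.
    static List<String> suffixMatches(NavigableSet<String> terms, String suffix) {
        visited = 0;
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            visited++;
            if (t.endsWith(suffix)) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(
                Arrays.asList("apple", "mysql", "nosql", "sql", "sqlite", "zebra"));
        System.out.println(prefixMatches(terms, "sql") + " visited " + visited); // [sql, sqlite] visited 3
        System.out.println(suffixMatches(terms, "sql") + " visited " + visited); // [mysql, nosql, sql] visited 6
    }
}
```

With six terms the gap is trivial, but on a real field the leading-wildcard scan grows with the whole vocabulary.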

Re: Why this query is not correct?

2007-01-30 Thread Steven Rowe
Check out QueryParser.setAllowLeadingWildcard(): (though AFAICT this feature is not in any released version of Lucene yet - you'll have to use a nightly build). poeta simbolist

RAMDirectory

2007-01-30 Thread WATHELET Thomas
I'm using RAMDirectory. If the number of documents to index is less than the maxBufferedDocs property, nothing is written into my index. For example: RAMDirectory ramDir = new RAMDirectory(); this.indexWriter.addIndexes(new Directory[] { ramDir }); ramWriter.close(); indexWriter.
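In the snippet above, `ramWriter.close()` is called only after `addIndexes(...)`. A likely explanation of the symptom (an assumption, not something confirmed in this excerpt) is that an IndexWriter keeps up to `maxBufferedDocs` documents buffered and writes them to its Directory only when that limit is reached or the writer is closed, so `addIndexes` sees an empty `ramDir`. The toy model below (hypothetical classes, not Lucene's API) reproduces that ordering problem; with real Lucene the corresponding change would be to close `ramWriter` before calling `indexWriter.addIndexes(new Directory[] { ramDir })`.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a buffering writer; NOT Lucene's IndexWriter.
class BufferedWriterToy {
    final List<String> directory = new ArrayList<>();      // what other readers can see
    private final List<String> buffer = new ArrayList<>(); // pending, in-memory only
    private final int maxBufferedDocs;

    BufferedWriterToy(int maxBufferedDocs) { this.maxBufferedDocs = maxBufferedDocs; }

    void addDocument(String doc) {
        buffer.add(doc);
        // Pending documents are written out only once the buffer is full.
        if (buffer.size() >= maxBufferedDocs) flush();
    }

    void flush() {
        directory.addAll(buffer);
        buffer.clear();
    }

    // Closing always flushes whatever is still pending.
    void close() { flush(); }

    // Merging copies only what has already reached the other writer's directory.
    void addIndexes(BufferedWriterToy other) { directory.addAll(other.directory); }

    static int mergedCount(boolean closeRamWriterFirst) {
        BufferedWriterToy disk = new BufferedWriterToy(10);
        BufferedWriterToy ram = new BufferedWriterToy(10);
        for (int i = 0; i < 3; i++) ram.addDocument("doc" + i); // fewer than maxBufferedDocs
        if (closeRamWriterFirst) ram.close();
        disk.addIndexes(ram);
        if (!closeRamWriterFirst) ram.close(); // too late: the merge already happened
        disk.close();
        return disk.directory.size();
    }

    public static void main(String[] args) {
        System.out.println(mergedCount(false)); // 0: the order used in the snippet
        System.out.println(mergedCount(true));  // 3: close first, then merge
    }
}
```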

RAMDirectory 2

2007-01-30 Thread WATHELET Thomas
P.S. At one point I tried doing an in-memory index using the RAMDirectory and then merging it with an on-disk index and it didn't work. The RAMDirectory never flushed to disk... leaving me with an empty index. It only works when the number of documents is greater than the maxBufferedDocs property. What

Re: RAMDirectory 2

2007-01-30 Thread Otis Gospodnetic
Sounds vaguely familiar... and I think this was fixed a while back. Running HEAD or at least 2.0.0? Otis - Original Message From: WATHELET Thomas <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Tuesday, January 30, 2007 11:22:52 PM Subject: RAMDirectory 2 P.S. At one point

Problem with a search engine

2007-01-30 Thread To, Xavier
Hi, I recently started an internship and I've been asked to fix their search engine so that numbers can be searched. At first, I thought it was the Analyzer that wasn't working right, but we're using StandardAnalyzer and the numbers are indexed (I checked with Lukeall). Then I thought they are not tokenize

Re: Problem with a search engine

2007-01-30 Thread Otis Gospodnetic
Hard to tell without seeing any code. Perhaps numbers are being removed from the query string during search. Make sure the same or at least "compatible" Analyzer is used during both indexing and querying. Grab the code from Lucene in Action. Hm, lucenebook.com may be down at the moment, but

RE: RAMDirectory 2

2007-01-30 Thread WATHELET Thomas
I'm now using lucene-core-2.0.0.jar (26-05-2006) and I still have the problem... -----Original Message----- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: 30 January 2007 17:18 To: java-user@lucene.apache.org Subject: Re: RAMDirectory 2 Sounds vaguely familiar... and I think this was fix

RE: Score

2007-01-30 Thread Chris Hostetter
: I make a BooleanQuery with the input of the user and include in the query : title:keywordofuser headlines:keywordofuser content:keywordofuser : I tried to Boost field title, then if keyword appear in the title, score grows like I want to, but if keyword occurs in content or headlines score dec

Re: RAMDirectory

2007-01-30 Thread Chris Hostetter
: I'm using RAMDirectory : If the number of documents to index is less than the maxBufferedDocs : properties nothing is write into my index. : ex: RAMDirectory ramDir =new RAMDirectory (); : this.indexWriter.addIndexes(new Directory[] { ramDir }); : ramWriter.close(); :

Re: Score

2007-01-30 Thread Nott
Hi, thanks for the response. To explain more clearly, say I search on the Author field. Suppose my data is as follows (Author / title): Jess Hopkins / ABC; Jess howard / CCC; James Hopkins / ZZZ; Jess Hopkins / RRR. I want all documents that were created

Boost/Scoring question

2007-01-30 Thread Antony Bowesman
Hi, In trying to understand scoring and boosting a bit better, I tried setting a boost of 0.0F for a field. As it's used as a multiplier, I wanted to see how it affects score. I added a single document with two fields, one with the default boost and another with a boost of 0.0F. hits.score

Re: Anyone have an XMLAnalyzer?

2007-01-30 Thread Rida Benjelloun
Hi, you can use Lius to index XML documents. http://sourceforge.net/projects/lius/ http://www.doculibre.com/lius/doc-1.0_en.html On 1/25/07, Arturo Pérez <[EMAIL PROTECTED]> wrote: In article <[EMAIL PROTECTED]>, "Simon Willnauer" <[EMAIL PROTECTED]> wrote: > http://www.google.com/search?hl=de&

Re: Score

2007-01-30 Thread Chris Hostetter
1) Did you look at the Explain output to see what it's doing? 2) Did you look at the query.toString() of your Query object? I suspect your query is being parsed as "Jess in the Author field, and Hopkins in the defaultSearch field" - so the order you cited makes perfect sense (assuming what you l
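Chris's suspicion can be illustrated with a deliberately simplified model of field assignment (hypothetical code, not QueryParser itself): a `field:` prefix binds only to the term attached to it, and every unqualified term goes to the default field, assumed here to be called `content`. With the real QueryParser, writing `Author:"Jess Hopkins"` (a phrase) or `Author:(Jess Hopkins)` keeps both words in the Author field.

```java
import java.util.ArrayList;
import java.util.List;

class FieldAssignmentToy {
    // Simplified: split on whitespace; a "field:" prefix applies only to its
    // own token, and every other token is sent to the default field.
    static List<String> assignFields(String query, String defaultField) {
        List<String> clauses = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.indexOf(':') > 0) {
                clauses.add(token); // already field-qualified
            } else {
                clauses.add(defaultField + ":" + token);
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // "content" is a hypothetical default field name.
        System.out.println(assignFields("Author:Jess Hopkins", "content"));
        // prints [Author:Jess, content:Hopkins]
    }
}
```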

Re: Boost/Scoring question

2007-01-30 Thread Chris Hostetter
: I added a single document with two fields, one with the default boost and : another with a boost of 0.0F. hits.score(0) = 0.10848885, but Explanation shows: : : 0.0 = match required 1) you can never compare the score from a Hits object with the score from an Explanation. Explanation has the

Re: Announcement: Lucene powering Monster job search index (Beta)

2007-01-30 Thread no spam
This is very similar to what I do. I use a hit collector to gather the results, then filter out anything outside a bounding box, then calculate the Euclidean distance. The last time I tried to check your search, it was down. We were talking the other day at work about how job search was lacking among the big boards.
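The two-stage filter described here (a cheap bounding-box test first, the exact Euclidean distance only for points that survive it) might look like the following sketch. Plain x/y coordinates in one unit, and the `Hit` class itself, are assumptions for illustration; real latitude/longitude data would need a projection or a great-circle formula instead.

```java
import java.util.ArrayList;
import java.util.List;

class BoundingBoxFilter {
    // Hypothetical hit: a document id with planar coordinates.
    static final class Hit {
        final int docId;
        final double x, y;
        Hit(int docId, double x, double y) { this.docId = docId; this.x = x; this.y = y; }
    }

    // Keep hits within `radius` of (cx, cy).
    static List<Integer> withinRadius(List<Hit> hits, double cx, double cy, double radius) {
        List<Integer> out = new ArrayList<>();
        for (Hit h : hits) {
            // Stage 1: cheap rejection of anything outside the square box.
            if (Math.abs(h.x - cx) > radius || Math.abs(h.y - cy) > radius) continue;
            // Stage 2: exact Euclidean test for survivors, comparing squared
            // values so no square root is needed.
            double dx = h.x - cx, dy = h.y - cy;
            if (dx * dx + dy * dy <= radius * radius) out.add(h.docId);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(1, 1, 1));  // inside the circle
        hits.add(new Hit(2, 4, 4));  // inside the box, outside the circle
        hits.add(new Hit(3, 10, 0)); // outside the box
        System.out.println(withinRadius(hits, 0, 0, 5)); // [1]
    }
}
```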

Re: Announcement: Lucene powering Monster job search index (Beta)

2007-01-30 Thread Peter Keegan
Mark, I'm sorry to hear that you weren't able to get to the job search site today. I heard of a problem, but I can assure you that it had nothing to do with Lucene and our back end tiers. Can you tell me what you think is lacking for job search among the big boards? There is clearly a lot of room

using a document as a query?

2007-01-30 Thread Bill Janssen
I was thinking of trying something, and wondered if someone else already had it working... I'd like to take a document, and use it as a query to find other documents in my index that 'match' it. I'm talking about short documents, like newspaper articles or email messages. Seems to me that there

Re: Boost/Scoring question

2007-01-30 Thread Antony Bowesman
Chris Hostetter wrote: 1) you can never compare the score from a Hits object with the score from an Explanation. Explanation has the raw score, Hits has the pseudo-normalized score. Thanks for the comments. What I was trying to get at was whether a match on a field with boost of 0.0 can eve
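The raw versus pseudo-normalized distinction can be sketched as follows. As I understand the Lucene 2.0-era `Hits` class (treat this as an assumption and check the source of the version you run), scores are left alone when the best score is at most 1.0, and otherwise every score is divided by the top score, while `Explanation` reports the raw value. The class name and the score arrays are made up for illustration.

```java
import java.util.Arrays;

class HitsNormalizationToy {
    // Divide every score by the maximum, but only if that maximum exceeds 1.0.
    static float[] normalize(float[] rawScores) {
        float max = 0f;
        for (float s : rawScores) max = Math.max(max, s);
        float norm = max > 1.0f ? 1.0f / max : 1.0f;
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) out[i] = rawScores[i] * norm;
        return out;
    }

    public static void main(String[] args) {
        // Top score <= 1.0: the values are passed through unchanged.
        System.out.println(Arrays.toString(normalize(new float[] {0.5f, 0.25f}))); // [0.5, 0.25]
        // Top score > 1.0: everything is scaled so the best hit becomes 1.0.
        System.out.println(Arrays.toString(normalize(new float[] {4.0f, 2.0f})));  // [1.0, 0.5]
    }
}
```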

Re: Extending scoring to eliminate sorting on timestamp

2007-01-30 Thread Chiradeep Vittal
Chris, Thanks for all your invaluable comments. The killer was the fact that the timestamp for each document was unique. For a search with millions of results, this resulted in allocation of millions of strings during the sorting step (FieldCacheImpl.getStrings). With some loss of precision, I
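The message is cut off before it says what the precision trade-off was, so the following is only one hypothetical way to do it: store the timestamp as a coarse integer (here, minutes since the Unix epoch) so that the field can be sorted as an integer (Lucene's `SortField.INT`, backed by `FieldCache.getInts`) instead of as one unique `String` per document. The choice of minutes and the `toMinutes` helper are assumptions for illustration.

```java
import java.util.concurrent.TimeUnit;

class CoarseTimestamp {
    // Hypothetical helper: truncate a millisecond timestamp to whole minutes
    // since the epoch. Sub-minute precision is lost, but the value fits in an
    // int (Math.toIntExact throws if it ever would not).
    static int toMinutes(long epochMillis) {
        return Math.toIntExact(TimeUnit.MILLISECONDS.toMinutes(epochMillis));
    }

    public static void main(String[] args) {
        long a = 1_170_000_000_000L; // arbitrary timestamp on a minute boundary
        long b = a + 30_000L;        // 30 seconds later
        System.out.println(toMinutes(a));                 // 19500000
        System.out.println(toMinutes(a) == toMinutes(b)); // true: same minute
    }
}
```

Documents that fall in the same minute then tie on this key, which is the loss of precision being traded for memory.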

Re: "did you mean..." feature

2007-01-30 Thread karl wettin
On 30 Jan 2007, at 01:39, Felix Litman wrote: We are implementing the "did you mean..." on top of Lucene, leveraging ideas of the "Did you mean Lucene?" article. (Many thanks to Tom White for such a useful and clear article...!) We are having some difficulties getting good "did you mea
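One building block that "did you mean" implementations commonly use is ranking candidate words by edit distance to the user's input. The sketch below is plain Levenshtein distance over a hard-coded, hypothetical candidate list; it is not necessarily the approach from the article mentioned above, and in practice the candidates would come from the index (for example, words sharing n-grams with the input) rather than from an array.

```java
import java.util.Arrays;
import java.util.Comparator;

class DidYouMeanToy {
    // Dynamic-programming Levenshtein distance (insert/delete/substitute cost 1),
    // keeping only two rows of the table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Return the candidate with the smallest edit distance to the input.
    static String suggest(String input, String[] candidates) {
        return Arrays.stream(candidates)
                .min(Comparator.comparingInt(c -> levenshtein(input, c)))
                .orElse(null);
    }

    public static void main(String[] args) {
        String[] words = {"lucene", "license", "lucid"};
        System.out.println(suggest("lucnee", words)); // lucene
    }
}
```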

Re: How many documents in the biggest Lucene index to date?

2007-01-30 Thread karl wettin
On 30 Jan 2007, at 04:18, Daniel Noll wrote: karl wettin wrote: Then it hit me that perhaps the integer limitation should be in the store (Directory) and not the IndexReader? If not now, perhaps in the future when everybody is running on 64-bit JVMs. I don't think it will be a very expensive th

Re: How many documents in the biggest Lucene index to date?

2007-01-30 Thread Daniel Noll
karl wettin wrote: I think the big undertaking would be to refactor all of Lucene to use longs as document numbers. But not in the store. There it would still be integers, and the MultiReader can keep track of Integer.MAX_VALUE stores. Integer.MAX_VALUE*Integer.MAX_VALUE = Long.MAX_VALUE. So
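For reference, the product in the quoted line can be checked directly: it fits in a `long`, but it is roughly half of `Long.MAX_VALUE` rather than equal to it, since (2^31 - 1)^2 = 2^62 - 2^32 + 1 while `Long.MAX_VALUE` is 2^63 - 1. Either way, a 64-bit document number has room for `Integer.MAX_VALUE` stores of `Integer.MAX_VALUE` documents each. The `globalDoc` bit-packing below is a hypothetical encoding for illustration, not how MultiReader actually maps document numbers.

```java
class DocNumberArithmetic {
    // Product of the two int limits, computed in long arithmetic.
    // Math.multiplyExact would throw if it overflowed a long.
    static long maxProduct() {
        return Math.multiplyExact((long) Integer.MAX_VALUE, (long) Integer.MAX_VALUE);
    }

    // Hypothetical encoding: store number in the high bits, the local int
    // document number in the low 31 bits. Both arguments must be >= 0.
    static long globalDoc(int storeIndex, int localDoc) {
        return ((long) storeIndex << 31) | localDoc;
    }

    static int storeOf(long globalDoc) { return (int) (globalDoc >>> 31); }

    static int localOf(long globalDoc) { return (int) (globalDoc & 0x7FFFFFFFL); }

    public static void main(String[] args) {
        System.out.println(maxProduct());   // 4611686014132420609
        System.out.println(Long.MAX_VALUE); // 9223372036854775807
        long g = globalDoc(3, 42);
        System.out.println(storeOf(g) + " " + localOf(g)); // 3 42
    }
}
```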

Re: using a document as a query?

2007-01-30 Thread Otis Gospodnetic
Yes, I believe Dave did something like that on searchmorph.org and somebody else did this on some site with RFCs. What's that called? Query by example? I think so, try define:Query By Example on Google. Take a look at the MoreLikeThis class in contrib/ too. :) Otis - Original Message F
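The core idea behind query by example, and as far as I know behind MoreLikeThis, is to pick the example document's most distinctive terms (frequent in the document, rare in the index) and OR them together as a query. The sketch below scores terms by tf × idf using the same idf shape as Lucene's DefaultSimilarity, idf = ln(numDocs / (docFreq + 1)) + 1. The document-frequency table, the class and all method names are hypothetical inputs, not MoreLikeThis's actual API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class QueryByExampleToy {
    // Same shape as DefaultSimilarity's idf: ln(numDocs / (docFreq + 1)) + 1.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Count terms in the example text, score each by tf * idf, and return the
    // top maxTerms joined by spaces (an OR query under QueryParser's default operator).
    static String buildQuery(String text, Map<String, Integer> docFreqs, int numDocs, int maxTerms) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) tf.merge(t, 1, Integer::sum);
        }
        // Terms missing from docFreqs are treated as docFreq 0 (the rarest).
        Comparator<String> byScore = Comparator.comparingDouble(
                (String t) -> tf.get(t) * idf(docFreqs.getOrDefault(t, 0), numDocs));
        List<String> terms = new ArrayList<>(tf.keySet());
        // Highest score first; ties broken alphabetically for a stable result.
        terms.sort(byScore.reversed().thenComparing(Comparator.naturalOrder()));
        return String.join(" ", terms.subList(0, Math.min(maxTerms, terms.size())));
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreqs = new HashMap<>();
        docFreqs.put("the", 1000);
        docFreqs.put("lucene", 10);
        docFreqs.put("index", 50);
        docFreqs.put("query", 40);
        String example = "Lucene index: Lucene the the the query";
        System.out.println(buildQuery(example, docFreqs, 1000, 2)); // lucene query
    }
}
```

The very common word "the" appears three times but still ranks last, because its idf is close to 1.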