Re: question about indexing/searching using standardanalyzer for KEYWORD field that contains alphanumeric data

2009-08-03 Thread Ian Lea
Hi Storing documentkey as TEXT will be causing it to be passed through StandardAnalyzer which will be downcasing it, and the index will be holding "lfahbhmf" rather than "LFAHBHMF". When you changed it to KEYWORD it will have been stored as is so the updateDocument(term, doc) call will h

Re: Boosting Search Results

2009-08-03 Thread Ian Lea
You could write your own Similarity, extending DefaultSimilarity and overriding whichever methods will help you achieve your aims. Or how about running 2 searches, the first with both words required (+word1 +word2) and then a second search where they aren't both required (word1 word2). Then merge

Re: Boosting Search Results

2009-08-03 Thread bourne71
Hey, thanks for the suggestion. I thought of performing 2 searches as well. Unfortunately I don't know how to perform a search on the first results returned. Could you guide me a little? I tried to look around for the information but found none. Thanks. Ian Lea wrote: > > You could write your own Simila

Re: Boosting Search Results

2009-08-03 Thread Ian Lea
Sorry, I'm not clear what you don't know how to do. To spell out the double search suggestion a bit more: QueryParser qp = new QueryParser(...) Query q1 = qp.parse("+word1 +word2"); TopDocs td1 = searcher.search(q1, ...) Query q2 = qp.parse("word1 word2"); TopDocs td2 = searcher.search(q2); S
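The merge step of the double-search suggestion could look roughly like the sketch below. It is a toy model under stated assumptions: plain integer doc ids stand in for the hits of the two searches, and the class and method names (MergedHits, merge) are hypothetical, not part of Lucene. Hits from the strict query (+word1 +word2) come first, followed by hits from the loose query (word1 word2) that were not already returned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy sketch: integer doc ids stand in for the hits of the two searches.
// MergedHits and merge are hypothetical names, not Lucene API.
class MergedHits {
    /** Returns the strict hits first, then the loose hits not already present. */
    static List<Integer> merge(List<Integer> strictHits, List<Integer> looseHits) {
        // LinkedHashSet keeps insertion order and silently drops duplicates.
        Set<Integer> merged = new LinkedHashSet<>(strictHits);
        merged.addAll(looseHits);
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        List<Integer> strict = Arrays.asList(7, 3);       // both words required
        List<Integer> loose = Arrays.asList(3, 9, 7, 1);  // either word
        System.out.println(merge(strict, loose)); // prints [7, 3, 9, 1]
    }
}
```

Note that nothing is searched "again" here: both queries run against the same index, and only the two result lists are combined.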

Re: question about

2009-08-03 Thread Erick Erickson
When you construct a Term manually, no analyzers are applied; it's constructed with whatever you put in there, just as you specify it. So, indeed, it "looks" like a KeywordAnalyzer is being used, but in reality no analysis is being done. So what's happening is that when you index with StandardAnaly
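The mismatch described above can be illustrated with a toy model. This is only a sketch under assumptions: a HashMap stands in for the index, toLowerCase stands in for the lowercasing that StandardAnalyzer applies, and every name (RawTermLookup, analyze, addAnalyzed, lookupRaw) is hypothetical rather than Lucene API. The key "LFAHBHMF" is the one mentioned earlier in this thread.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy model only: a HashMap stands in for the index, and toLowerCase stands in
// for the lowercasing step of StandardAnalyzer. None of this is Lucene API.
class RawTermLookup {
    private final Map<String, String> index = new HashMap<>();

    /** Hypothetical stand-in for analysis at index time. */
    static String analyze(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    /** Stores the document under the analyzed (lowercased) key. */
    void addAnalyzed(String key, String doc) {
        index.put(analyze(key), doc);
    }

    /** Looks up the term exactly as given, the way a hand-built Term would. */
    String lookupRaw(String term) {
        return index.get(term);
    }

    public static void main(String[] args) {
        RawTermLookup idx = new RawTermLookup();
        idx.addAnalyzed("LFAHBHMF", "doc-1");
        System.out.println(idx.lookupRaw("LFAHBHMF"));          // prints null
        System.out.println(idx.lookupRaw(analyze("LFAHBHMF"))); // prints doc-1
    }
}
```

In this model the raw lookup misses because the stored key was lowercased while the lookup term was not; either applying the same normalization to the term or storing the key unanalyzed makes the two sides agree.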

Re: arabic analyzer

2009-08-03 Thread walid
Hello Robert, you are so right, plurals based on prefixes and suffixes are working. Plurals based on inserted "و" do not (باب and ابوب). The few words I had tested were all of the "insert" type and not the prefix/suffix. Thank you :) -walid On Sun, 2009-08-02 at 15:08 -0400, Robert Muir wrote

Re: ThreadedIndexWriter vs. IndexWriter

2009-08-03 Thread Jibo John
Mike, Verified that I have the latest source code. Here are the alg files and the checkindexer output. - indexwriter alg analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer doc.

Re: arabic analyzer

2009-08-03 Thread Robert Muir
Walid, thanks for your feedback. fyi I created an issue with some minor improvements (such as lam-lam prefix) to the arabic analyzer: http://issues.apache.org/jira/browse/LUCENE-1758 I also tried to improve the stopwords list, but your Arabic is surely much better than mine. If you are interested

RE: question about indexing/searching using standardanalyzer for KEYWORD field that contains alphanumeric data

2009-08-03 Thread Leonard Gestrin
Hi Ian, Thank you for the reply. I have recently upgraded the application to Lucene 2.4.1. I did not realize that during the update operation the standard analyzer was not invoked on the term the same way as it's done for searching, even though the indexer is opened using it. I am a newbie on Lucene (I inherited project

RE: question about

2009-08-03 Thread Leonard Gestrin
Thank you -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Monday, August 03, 2009 6:21 AM To: java-user@lucene.apache.org Subject: Re: question about When you construct a Term manually, no analyzers are applied; it's constructed with whatever you put in ther

Re: Boosting Search Results

2009-08-03 Thread bourne71
Sorry... I mean the double searching part. That is the part I don't understand how to do, since after retrieving the 1st results, I am not sure how to search them again. Ian Lea wrote: > > Sorry, I'm not clear what you don't know how to do. > > > To spell out the double search suggestion a bit m

Re: How to improve search time?

2009-08-03 Thread Otis Gospodnetic
With such a large index be prepared to put it on a server with lots of RAM (even if you follow all the tips from the Wiki). When reporting performance numbers, you really ought to tell us about your hardware, types of queries, etc. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.htm

Re: How to improve search time?

2009-08-03 Thread prashant ullegaddi
I'm running it on a quad-core machine, 2.4GHz each, 4GB RAM. Prashant. On Tue, Aug 4, 2009 at 8:38 AM, Otis Gospodnetic wrote: > With such a large index be prepared to put it on a server with lots of RAM > (even if you follow all the tips from the Wiki). > When reporting performance numbers, you really ou

Re: How to improve search time?

2009-08-03 Thread Anshum
Hi Prashant, 8 seconds as the minimum time is a little too much, though considering you're using just 4GB of RAM it's still OK. I would advise you to break your index into smaller indexes, perhaps selectively query the indexes (if that's possible for your application) and use a ParallelMultiSearcher.

Re: Searching doubt

2009-08-03 Thread Anshum
Hi Harig, What you are trying to do is search for 2 tokens as one. You'd have to index the URL as you want for the token to be searchable. Else you might try a wildcard query. -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opinions to m

Re: Searching doubt

2009-08-03 Thread Shai Erera
I can think of another approach - during indexing, capture the word "aboutus" and index it as "about us" and "aboutus" in the same position. That way both queries will work. You'd need to write your own TokenFilter, maybe a SynonymTokenFilter (since this reminds me of "synonyms" usage) that accept
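The "same position" idea can be sketched with a toy model. This is not Lucene's TokenFilter API; the Token class, the expand method, and the choice to emit "about us" as a single token are all assumptions made for illustration. Each token carries a position, and the split form is emitted at the same position as the compound word so that either form matches at that slot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy sketch, not Lucene's TokenFilter API: each token carries a position so
// that two terms can share one slot, as synonyms do.
class SplitWordExpander {
    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    /** Emits every word; the compound also gets its split form at the same position. */
    static List<Token> expand(List<String> words, String compound, String splitForm) {
        List<Token> out = new ArrayList<>();
        int pos = 0;
        for (String word : words) {
            out.add(new Token(word, pos));
            if (word.equals(compound)) {
                out.add(new Token(splitForm, pos)); // same slot, no position advance
            }
            pos++;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("contact", "aboutus", "home");
        System.out.println(expand(words, "aboutus", "about us"));
        // prints [contact@0, aboutus@1, about us@1, home@2]
    }
}
```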

Re: Searching doubt

2009-08-03 Thread m.harig
Thanks. This is my code snippet: IndexSearcher searcher = new IndexSearcher(indexDir); Analyzer analyzer = new StopAnalyzer(); WildcardQuery query = new WildcardQuery(new Term(DEFAULT_FIELD)); searcher.search(

Re: Searching doubt

2009-08-03 Thread Shai Erera
I don't see that you use the Analyzer anywhere (i.e. it's created but not used?). Also, the wildcard query you create may be very inefficient, as it will expand all the terms under the DEFAULT_FIELD. If the DEFAULT_FIELD is the field where all your "default searchable" terms are indexed, there coul