Re: Are there any Lucene optimizations applicable to SSD?

2008-08-20 Thread Cedric Ho
> [Cedric: Yes] > >> However I can't figure out why some of these queries are slower. Some >> are complicated queries, yet others are just simple single term >> queries and don't seem to score lots of hits. There's no >> correlation between the number of terms or number of hits with the >> respo

Unclear Javadoc lucene.search.Filter

2008-08-20 Thread Christopher M Collins
Please accept my sincere apologies: I was reading the Javadoc of an old version. Christopher __ Christopher Collins \ http://www.cs.utoronto.ca/~ccollins Department of Computer Science \ University of Toronto Collaborative User Experien

RE: EmailAddressAnalyzer & TokenStreams

2008-08-20 Thread Steven A Rowe
Hi Dino, The Lucene KeywordTokenizer is about as simple as tokenizers get - it just outputs its entire input as a single token: Check out the source code for ot

Unclear Javadoc lucene.search.Filter

2008-08-20 Thread Christopher M Collins
I'm just starting to use Query Filters, and the javadoc for "Filter" is unclear. Specifically, getDocIdSet says: Returns: a DocIdSet that provides the documents which should be permitted or prohibited in search results. From what I understand of DocIdSet, it's just a list of docIDs

EmailAddressAnalyzer & TokenStreams

2008-08-20 Thread Dino Korah
Hi guys, If I am to tokenize an email address like "John Smith" < [EMAIL PROTECTED]> into [ [EMAIL PROTECTED] [John] [Smith] [J.Smith] [london.gb.world.net] [gb.world.net] [world.net] [world] [net] Is i
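One piece of the token list above, the progressively shorter domain forms ([london.gb.world.net] [gb.world.net] [world.net] ... [net]), can be sketched in plain Java without touching the Lucene API. This is only an illustration under assumptions: the class and method names below are hypothetical, the domain string is taken from the example tokens in the message, and wiring the result into a real Lucene TokenStream is left out. Single labels such as [world] would need an extra step that is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene.
public class DomainSuffixSketch {

    // Returns the domain followed by each shorter suffix obtained by
    // dropping one leading label at a time.
    public static List<String> domainSuffixes(String domain) {
        List<String> out = new ArrayList<>();
        String rest = domain;
        while (!rest.isEmpty()) {
            out.add(rest);
            int dot = rest.indexOf('.');
            if (dot < 0) {
                break; // last label reached
            }
            rest = rest.substring(dot + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        // Domain taken from the example tokens in the message above.
        System.out.println(domainSuffixes("london.gb.world.net"));
    }
}
```

In an analyzer, each returned string would presumably be emitted as its own token at the same position as the full address, but that choice depends on how the field is meant to be queried.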

RE: Case Sensitivity

2008-08-20 Thread Dino Korah
Hi Steve, Thanks a lot for that. I have a question on TokenStreams and email addresses, but I will post them on a separate thread. Many thanks, Dino -Original Message- From: Steven A Rowe [mailto:[EMAIL PROTECTED] Sent: 19 August 2008 17:43 To: java-user@lucene.apache.org Subject: RE

Re: Are there any Lucene optimizations applicable to SSD?

2008-08-20 Thread Toke Eskildsen
On Wed, 2008-08-20 at 21:58 +0800, Cedric Ho wrote: Toke: > > Is it the same queries that are slow each time? [Cedric: Yes] > However I can't figure out why some of these queries are slower. Some > are complicated queries, yet others are just simple single term > queries and don't seem to scor

Re: Fields with the same name?? - Was Re: Payloads and tokenizers

2008-08-20 Thread Doron Cohen
On Tue, Aug 19, 2008 at 2:15 AM, Antony Bowesman <[EMAIL PROTECTED]> wrote: > > Thanks for your time and I appreciate your valuable insight Doron. > Antony > I'm glad I could help! Doron

Slowing down (rate-limiting/throttling) IndexWriter.optimize

2008-08-20 Thread Halsey, Stephen
Hi, We are using lucene to index a large number of documents (millions) and we currently optimize half the index in the background every 2 days, to stop it becoming too fragmented. This takes about an hour and we are finding during this time searches are slowed down dramatically on that machine.

Re: Are there any Lucene optimizations applicable to SSD?

2008-08-20 Thread Cedric Ho
Hi Toke, >> Search response time. We used the search log from our production >> system and tested it with SSD. The results show that 75% of queries >> return within 1 second, 90% return in 2.5 seconds, and the remaining 10% >> range from 2.5 seconds to less than 100 seconds. > > Are the sub-second r

Re: How I can find wildcard symbol with WildcardQuery?

2008-08-20 Thread Erick Erickson
Thanks for correcting me on this, I had no idea. Just goes to show what happens when an amateur gets in the mix. Best Erick On Tue, Aug 19, 2008 at 8:09 PM, Daniel Noll <[EMAIL PROTECTED]> wrote: > Сергій Карпенко wrote: > >> Yes, you are correct - NO_NORMS has nothing to do with tokeniza

Re: Are there any Lucene optimizations applicable to SSD?

2008-08-20 Thread Cedric Ho
Hi eks, On Wed, Aug 20, 2008 at 3:04 PM, eks dev <[EMAIL PROTECTED]> wrote: > The simplest sorting would be to sort your collection before indexing, > because Lucene will preserve the order of added documents. I think Nutch sorts the > index afterward somehow, but I do not know how this works. The way

Re: java.lang.NullPointerExcpetion while indexing on linux

2008-08-20 Thread Michael McCandless
Aditi Goyal wrote: Thanks Mike. I found the problem. The problem was that I was not converting the value of the fields to utf-8 and hence while adding it to doc it was getting stored as None. So, when I did doc.get('fieldA') , instead of giving the blank or any other string, it was giving

Re: java.lang.NullPointerExcpetion while indexing on linux

2008-08-20 Thread Aditi Goyal
Thanks Mike. I found the problem. The problem was that I was not converting the value of the fields to utf-8 and hence while adding it to doc it was getting stored as None. So, when I did doc.get('fieldA') , instead of giving the blank or any other string, it was giving out None. To overcome this,

Re: Are there any Lucene optimizations applicable to SSD?

2008-08-20 Thread Toke Eskildsen
On Wed, 2008-08-20 at 00:25 +0800, Cedric Ho wrote: > Search response time. We used the search log from our production > system and tested it with SSD. The results show that 75% of queries > return within 1 second, 90% return in 2.5 seconds, and the remaining 10% > range from 2.5 seconds to less than

Re: Are there any Lucene optimizations applicable to SSD?

2008-08-20 Thread eks dev
The simplest sorting would be to sort your collection before indexing, because Lucene will preserve the order of added documents. I think Nutch sorts the index afterward somehow, but I do not know how this works. By omitTf() I mean the new feature in the trunk version, see https://issues.apache.org/ji