RE: High CPU usage during index and search

2007-08-15 Thread Chew Yee Chuang
Greetings, I have tested with MySQL. The grouping is OK when there are not many records in the table, but when I performed grouping on a table with 3 million records, it took a very long time to finish. Thus, I'm looking at Lucene and hope it can help. Thank you

How to search over all fields in a clean way?

2007-08-15 Thread Ridwan Habbal
Hello all, when we search over indexed docs we use code such as: Analyzer analyzer = new StandardAnalyzer(); String defaultSearchField = "all"; QueryParser parser = new QueryParser(defaultSearchField, analyzer); IndexSearcher indexSearcher = new IndexSearcher(this.indexDirectory); Hits hits =

Re: How to search over all fields in a clean way?

2007-08-15 Thread Erik Hatcher
Copying all fields to a single searchable field is quite reasonable, and won't double your index size if you set the new field to be unstored. Erik On Aug 15, 2007, at 5:38 AM, Ridwan Habbal wrote: Hello all, when we search over an index docs we use code such: Analyzer analyzer
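A minimal sketch of the catch-all field idea, written in plain Java with no Lucene dependency so it runs on its own. The helper name buildAllField is hypothetical; the field name "all" is taken from the question. In Lucene 2.x the combined text would presumably be added to the document as an unstored, tokenized field, as noted in the comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class CatchAllField {
    // Hypothetical helper: joins every non-empty field value into one
    // string meant for a single searchable "all" field.
    static String buildAllField(Map<String, String> fields) {
        StringJoiner joiner = new StringJoiner(" ");
        for (String value : fields.values()) {
            if (value != null && !value.isEmpty()) {
                joiner.add(value);
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Lucene in Action");
        doc.put("author", "Erik Hatcher");
        String all = buildAllField(doc);
        // With Lucene 2.x on the classpath (assumption, not part of this sketch):
        //   new Field("all", all, Field.Store.NO, Field.Index.TOKENIZED)
        // Store.NO keeps the copy out of the stored data, so only the
        // inverted index grows.
        System.out.println(all);
    }
}
```

Queries parsed with "all" as the default field then match text from any of the original fields, while field-specific queries still work against the individual fields.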

Re: query question

2007-08-15 Thread karl wettin
On 15 Aug 2007 at 07:18, Mohammad Norouzi wrote: I am using WhitespaceAnalyzer and the query is icdCode:H* but there is no result, although I know that there are many documents with this field value, such as H20, H20.5, etc. This field is tokenized and indexed. What is wrong with this?
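One possible cause, offered as an assumption since the reply is cut off here: WhitespaceAnalyzer keeps the original case of indexed terms, while QueryParser in Lucene 2.x lowercases prefix and wildcard terms by default, so H* would be searched as h*. The plain-Java sketch below only simulates that mismatch; prefixMatches is a hypothetical helper, not a Lucene call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class PrefixCaseSketch {
    // Hypothetical helper: returns the indexed terms that start with the
    // prefix, optionally lowercasing the prefix first (as a query parser might).
    static List<String> prefixMatches(List<String> indexedTerms, String prefix,
                                      boolean lowercasePrefix) {
        String p = lowercasePrefix ? prefix.toLowerCase(Locale.ROOT) : prefix;
        List<String> matches = new ArrayList<>();
        for (String term : indexedTerms) {
            if (term.startsWith(p)) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Terms as a case-preserving analyzer would index them.
        List<String> terms = Arrays.asList("H20", "H20.5", "J45");
        System.out.println("lowercased prefix: " + prefixMatches(terms, "H", true));
        System.out.println("original prefix:   " + prefixMatches(terms, "H", false));
    }
}
```

If this is indeed the cause, calling setLowercaseExpandedTerms(false) on the QueryParser, or lowercasing the field at index time, would be the usual ways to line the two sides up.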

Re: Reply: Indexing correctly?

2007-08-15 Thread John Paul Sondag
It worked! My indexing time went from over 6 hours to 592 seconds! Thank you guys so much! --JP On 8/14/07, karl wettin [EMAIL PROTECTED] wrote: On 14 Aug 2007 at 21:34, John Paul Sondag wrote: What exactly is a RAMDirectory, I didn't see it mentioned on that page. Is there example

Re: formalizing a query

2007-08-15 Thread Sagar Naik
Hey, I think you can try: MultiFieldQueryParser.parse(String[] queries, String[] fields, BooleanClause.Occur[] flags, Analyzer analyzer) The flags array will get you ORs and ANDs in the places you need - Sagar Naik Abu Abdulla alhanbali wrote: Thanks for the help, please provide the code to

Question about highlighting returning nothing

2007-08-15 Thread Donna L Gresh
I'm working on refining my stopwords by looking at the highest scoring document returned for each search, and using the highlighter to show which terms were significant in choosing that document. This has been extremely helpful in improving my searches. I've noticed though that sometimes the

Seeking Advice

2007-08-15 Thread Michael Bell
We are writing a mail archiving program. Each piece of the message (eg each attachment) is stored separately. I'll try to keep this short and sweet :) Currently we index the main header fields, like subject sender recipients (space delimited) etc. This stuff is really only needed once per

LUCENE-423: thread pool implementation of parallel queries

2007-08-15 Thread Renaud Waldura
Could someone who understands Lucene internals help me port https://issues.apache.org/jira/browse/LUCENE-423 to Lucene 2.0? I have beefy hardware (32 cores) and want to try this out, but it won't compile. There are 2 issues: 1- maxScore On line 412 TopFieldDocs constructor now needs a maxScore.

Re: Seeking Advice

2007-08-15 Thread Michael J. Prichard
Hey Michael, Are you writing this software for yourself or for reselling? We built an email archiving service and we use Lucene as our search engine. We approach this a little differently. BUT, I don't think it is wasteful to index the header information with the attachment. Just don't

Re: Question about highlighting returning nothing

2007-08-15 Thread Donna L Gresh
Well, in my case the highlighting was returning nothing because of (my favorite acronym) PBCAK-- I don't store the text in the index, so I have to retrieve it separately (from a database) for the highlighting, and my database was not in sync with the index, so in a few cases the document in

Re: Question about highlighting returning nothing

2007-08-15 Thread Lukas Vlcek
Donna, I have been investigating highlighters in Lucene a bit recently. What my humble experience has taught me so far is that highlighting is a completely different task from the indexing/searching tandem. This simple fact is not obvious to a lot of people. In your particular case it would be helpful if

RE: High CPU usage during index and search

2007-08-15 Thread Steinert, Fabian
Hi Chew, with Lucene you could try the following: Make one query for each single value in each category (each Term): 1Q - Gender:M 2Q - Department:Accounting 3Q - Department:RD 4Q - ... with a custom HitCollector like the following example taken from org.apache.lucene.search.HitCollector
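The approach outlined above (one query per category value, with a collector that only counts hits) can be sketched in plain Java without Lucene. Everything here is a stand-in: Collector mimics the role of Lucene 2.x's HitCollector, whose collect(int doc, float score) is called once per matching document, and search is a toy loop over an in-memory list, not a real index search.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TermCountSketch {
    // Stand-in for a hit collector: invoked once per matching document.
    interface Collector {
        void collect(int doc);
    }

    // Collector that ignores the documents themselves and only counts them.
    static final class CountingCollector implements Collector {
        int count;

        @Override
        public void collect(int doc) {
            count++;
        }
    }

    // Toy "search": reports every doc whose field equals the given value.
    static void search(List<Map<String, String>> docs, String field,
                       String value, Collector collector) {
        for (int i = 0; i < docs.size(); i++) {
            if (value.equals(docs.get(i).get(field))) {
                collector.collect(i);
            }
        }
    }

    // One "query" per term, returning the group size for that term.
    static int count(List<Map<String, String>> docs, String field, String value) {
        CountingCollector collector = new CountingCollector();
        search(docs, field, value, collector);
        return collector.count;
    }

    // Hypothetical helper to build a sample record.
    static Map<String, String> doc(String gender, String department) {
        Map<String, String> d = new HashMap<>();
        d.put("Gender", gender);
        d.put("Department", department);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc("M", "Accounting"));
        docs.add(doc("F", "RD"));
        docs.add(doc("M", "RD"));
        System.out.println("Gender:M -> " + count(docs, "Gender", "M"));
        System.out.println("Department:Accounting -> " + count(docs, "Department", "Accounting"));
        System.out.println("Department:RD -> " + count(docs, "Department", "RD"));
    }
}
```

Because the collector never loads or scores documents, each group count costs little more than walking the matches for a single term, which is the point of doing the grouping this way rather than fetching every hit.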

out of order

2007-08-15 Thread testn
Using Lucene 2.2.0, I still sporadically got doc out of order error. I indexed all of my stuff in one thread. Do you have any idea why it happens? Thanks!

Re: out of order

2007-08-15 Thread Michael McCandless
testn [EMAIL PROTECTED] wrote: Using Lucene 2.2.0, I still sporadically got doc out of order error. I indexed all of my stuff in one thread. Do you have any idea why it happens? Hm, that is not good. I thought we had finally fixed this with LUCENE-140. Though un-corrected disk errors

Re: Reply: Indexing correctly?

2007-08-15 Thread Erick Erickson
OK, what worked? Using a RAMDir? Erick On 8/15/07, John Paul Sondag [EMAIL PROTECTED] wrote: It worked! My indexing time went from over 6 hours to 592 seconds! Thank you guys so much! --JP On 8/14/07, karl wettin [EMAIL PROTECTED] wrote: On 14 Aug 2007 at 21:34, John Paul

Re: Seeking Advice

2007-08-15 Thread Erick Erickson
Rather than use efficiency arguments to drive the behavior of the app, I'd recommend that you define the expected behavior and make that behavior happen as necessary. What would you estimate is the ratio of meta-data to attachments? And what is the ratio of documents that have multiple

Re: Seeking Advice

2007-08-15 Thread Michael J. Prichard
I actually know from experience. Around 20% +/- 5% of emails will have attachments. If that helps. Again, I say index as much info as you can. Store what you think is necessary. Erick Erickson wrote: Rather than use efficiency arguments to drive the behavior of the app, I'd recommend that

Re: out of order

2007-08-15 Thread testn
I use RAMDirectory and the error often shows a low number. Last time it happened with message 7=7. Next time it happens, I will try to capture the stacktrace. Michael McCandless-2 wrote: testn [EMAIL PROTECTED] wrote: Using Lucene 2.2.0, I still sporadically got doc out of order