Greetings,
I have tested with MySQL. The grouping is fine when there are not many records
in the table, but when I performed grouping on a table with 3 million records,
it took a very long time to finish. So I'm looking at Lucene and hope it can
help.
Thank you
Hello all,
when we search over an index's docs we use code such as:
Analyzer analyzer = new StandardAnalyzer();
String defaultSearchField = "all";
QueryParser parser = new QueryParser(defaultSearchField, analyzer);
IndexSearcher indexSearcher = new IndexSearcher(this.indexDirectory);
Hits hits =
Copying all fields to a single searchable field is quite reasonable,
and won't double your index size if you set the new field to be
unstored.
Erik
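A minimal sketch of the catch-all idea above, in plain Java with no Lucene dependency. The class and method names are hypothetical. In Lucene 2.x the joined text would typically be added as something like new Field("all", text, Field.Store.NO, Field.Index.TOKENIZED), so it is searchable but not stored.

```java
import java.util.Map;

class CatchAllFieldSketch {
    /**
     * Joins every field value into one space-separated string.
     * The result is meant to be indexed under a single field name
     * (for example "all") without being stored.
     */
    static String buildCatchAll(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (String value : fields.values()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(value);
        }
        return sb.toString();
    }
}
```

Because the combined field is only indexed, its terms are added to the index but the original text is not kept a second time.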
On Aug 15, 2007, at 5:38 AM, Ridwan Habbal wrote:
Hello all,
when we search over an index's docs we use code such as:
Analyzer analyzer
On 15 Aug 2007, at 07:18, Mohammad Norouzi wrote:
I am using WhitespaceAnalyzer and the query is icdCode:H*, but there is
no result. However, I know that there are many documents with this field
value, such as H20, H20.5, etc. This field is tokenized and indexed. What is
wrong with this?
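One possible cause, not confirmed in this thread: in Lucene 2.x the QueryParser lowercases prefix and wildcard terms by default (see setLowercaseExpandedTerms), while WhitespaceAnalyzer leaves the indexed tokens in their original case. The plain-Java sketch below only models that mismatch; the class and method names are hypothetical and no Lucene classes are used.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class PrefixCaseSketch {
    /** Splits on whitespace only and keeps the original case. */
    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    /** Counts indexed terms that start with the prefix, optionally lowercasing the prefix first. */
    static int countPrefixMatches(List<String> indexedTerms, String prefix, boolean lowercasePrefix) {
        String p = lowercasePrefix ? prefix.toLowerCase(Locale.ROOT) : prefix;
        int matches = 0;
        for (String term : indexedTerms) {
            if (term.startsWith(p)) {
                matches++;
            }
        }
        return matches;
    }
}
```

If this is what is happening, calling setLowercaseExpandedTerms(false) on the parser, or lowercasing the field at index time, would be the things to try.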
It worked! My indexing time went from over 6 hours to 592 seconds! Thank
you guys so much!
--JP
On 8/14/07, karl wettin [EMAIL PROTECTED] wrote:
On 14 Aug 2007, at 21:34, John Paul Sondag wrote:
What exactly is a RAMDirectory? I didn't see it mentioned on that
page. Is
there example
Hey,
I think you can try:
MultiFieldQueryParser.parse(String[] queries, String[] fields,
BooleanClause.Occur[] flags,
Analyzer analyzer)
The flags array will give you ORs and ANDs in the places you need.
- Sagar Naik
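A rough model of how the per-field flags shape the combined query, written in plain Java so it runs without Lucene. The Occur enum and combine method here are hypothetical stand-ins, not Lucene's own classes; the output only imitates the query syntax, where + marks a required clause and - a prohibited one.

```java
class OccurFlagsSketch {
    /** Stand-in for the three clause types: required, optional, prohibited. */
    enum Occur { MUST, SHOULD, MUST_NOT }

    /** Renders one clause per field, prefixed according to its flag. */
    static String combine(String[] queries, String[] fields, Occur[] flags) {
        if (queries.length != fields.length || fields.length != flags.length) {
            throw new IllegalArgumentException("queries, fields and flags must have the same length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            switch (flags[i]) {
                case MUST:
                    sb.append('+');
                    break;
                case MUST_NOT:
                    sb.append('-');
                    break;
                default:
                    break; // SHOULD clauses carry no prefix
            }
            sb.append('(').append(fields[i]).append(':').append(queries[i]).append(')');
        }
        return sb.toString();
    }
}
```

With all flags set to MUST the clauses behave like AND; with all set to SHOULD they behave like OR.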
Abu Abdulla alhanbali wrote:
Thanks for the help,
please provide the code to
I'm working on refining my stopwords by looking at the highest scoring
document returned for each search, and using the highlighter to show which
terms were significant in choosing that document. This has been extremely
helpful in improving my searches. I've noticed though that sometimes the
We are writing a mail archiving program. Each piece of the message (e.g. each
attachment) is stored separately.
I'll try to keep this short and sweet :)
Currently we index the main header fields, like
subject
sender
recipients (space delimited)
etc.
This stuff is really only needed once per
Could someone who understands Lucene internals help me port
https://issues.apache.org/jira/browse/LUCENE-423 to Lucene 2.0? I have beefy
hardware (32 cores) and want to try this out, but it won't compile.
There are 2 issues:
1- maxScore
On line 412 TopFieldDocs constructor now needs a maxScore.
Hey Michael,
Are you writing this software for yourself or for reselling? We built
an email archiving service and we use Lucene as our search engine. We
approach this a little differently.
But I don't think it is wasteful to index the header information with
the attachment. Just don't
Well, in my case the highlighting was returning nothing because of (my
favorite acronym) PBCAK--
I don't store the text in the index, so I have to retrieve it separately
(from a database) for the highlighting, and my database was not in sync
with the index, so in a few cases the document in
Donna,
I have been investigating highlighters in Lucene recently a bit. What I have
learned so far is that highlighting is a completely different task from the
indexing/searching tandem. This simple fact is not obvious to a
lot of people. In your particular case it would be helpful if
Hi Chew,
with Lucene you could try the following:
Make one query for each single value in each category (each Term):
1Q - Gender:M
2Q - Department:Accounting
3Q - Department:RD
4Q - ...
with a custom HitCollector like the following example taken from
org.apache.lucene.search.HitCollector
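The snippet is cut off before the example it refers to. As an illustration only, and not the example from that class's documentation, the plain-Java sketch below models the approach: each per-term query records its matching document ids in a bit set, and group counts come from the cardinality of a set or of an intersection. All names are hypothetical and no Lucene classes are used.

```java
import java.util.BitSet;

class GroupCountSketch {
    /** Stand-in for a collector callback: marks each matching document id in a bit set. */
    static BitSet collect(int maxDoc, int[] matchingDocIds) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc : matchingDocIds) {
            bits.set(doc);
        }
        return bits;
    }

    /** Counts documents that matched both term queries, leaving the inputs unchanged. */
    static int countBoth(BitSet a, BitSet b) {
        BitSet both = (BitSet) a.clone();
        both.and(b);
        return both.cardinality();
    }
}
```

For example, the count for Gender:M within Department:Accounting would be countBoth applied to the two bit sets collected for those terms.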
Using Lucene 2.2.0, I still sporadically got a doc out of order error. I
indexed all of my stuff in one thread. Do you have any idea why it happens?
Thanks!
testn [EMAIL PROTECTED] wrote:
Using Lucene 2.2.0, I still sporadically got a doc out of order error. I
indexed all of my stuff in one thread. Do you have any idea why it
happens?
Hm, that is not good. I thought we had finally fixed this with
LUCENE-140. Though un-corrected disk errors
OK, what worked? Using a RAMDir?
Erick
On 8/15/07, John Paul Sondag [EMAIL PROTECTED] wrote:
It worked! My indexing time went from over 6 hours to 592 seconds! Thank
you guys so much!
--JP
On 8/14/07, karl wettin [EMAIL PROTECTED] wrote:
On 14 Aug 2007, at 21:34, John Paul
Rather than use efficiency arguments to drive the behavior of the
app, I'd recommend that you define the expected behavior and
make that behavior happen as necessary.
What would you estimate is the ratio of meta-data to attachments?
And what is the ratio of documents that have multiple
I actually know from experience. Around 20% +/- 5% of emails will have
attachments. If that helps. Again, I say index as much info as you
can. Store what you think is necessary.
Erick Erickson wrote:
Rather than use efficiency arguments to drive the behavior of the
app, I'd recommend that
I use RAMDirectory and the error often shows a low number. Last time it
happened with message 7=7. Next time it happens, I will try to capture
the stacktrace.
Michael McCandless-2 wrote:
testn [EMAIL PROTECTED] wrote:
Using Lucene 2.2.0, I still sporadically got a doc out of order