Thanks Daniel,
I understand how it can be done. The only thing that bothers me is that
expanding the * might result in many phrases, and that in turn might imply a
performance hit. I'll see what the impact is,
Regards Ard
On Wednesday 08 August 2007 10:28, Ard Schrijvers wrote:
Does
I was wondering if there is a search-based method to find the top-k
frequent phrases in a set of documents. (I do not have a particular phrase
in mind, so PhraseQuery can probably be ruled out.)
I have implemented something that works using termvectors and termpositions
but the performance is not
On 9 Aug 2007, at 09:34, Akanksha Baid wrote:
I was wondering if there is a search-based method to find the top-k
frequent phrases in a set of documents. (I do not have a particular
phrase
in mind, so PhraseQuery can probably be ruled out.)
I have implemented something that works using
The CollocationFinder code attached to this may be more suited
http://issues.apache.org/jira/browse/LUCENE-474
Again, not exactly sure of your use case.
Cheers
Mark
- Original Message
From: karl wettin [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Thursday, 9 August, 2007
Is there a good way to handle the following scenario:
I have certain terms with embedded periods that I want to leave
intact (not split at the periods). For
example in my application a particular skill might be SAP.FIN (SAP
financial), and it should not be split into
SAP and FIN. Is
On 9 Aug 2007, at 16:36, Donna L Gresh wrote:
Is there a good way to handle the following scenario:
I have certain terms with embedded periods that I want to leave
intact (not split at the periods). For example in my application a
particular skill might be SAP.FIN (SAP financial),
Some possibilities...
- Write your own tokenizer and/or filter. If you alter your BNF,
  you'll have to maintain it in later releases.
- Use some simple transformations for the input *before* tokenizing.
- There's been some discussion that StandardAnalyzer (and, I assume,
  the Standard* beasts)
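
The "simple transformations before tokenizing" option can be sketched roughly as follows. This is only an illustration in plain Java, not code from the thread: the class name DottedTermProtector, the list of known terms, and the underscore placeholder are all assumptions, and the placeholder would need to be a character that the chosen analyzer keeps inside a token.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: rewrites known dotted terms before the text reaches an analyzer. */
final class DottedTermProtector {

    /**
     * Replaces the periods inside each known term with underscores,
     * leaving every other period in the text untouched.
     */
    static String protect(String text, List<String> knownTerms) {
        String result = text;
        for (String term : knownTerms) {
            // Literal (non-regex) replacement of the whole term.
            result = result.replace(term, term.replace('.', '_'));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("SAP.FIN");
        System.out.println(protect("Skills: SAP.FIN and Java. Done.", terms));
    }
}
```

The same transformation would have to be applied to query text as well, so that indexed terms and query terms still match.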
thanks.
In this case it actually looks like I was trying to solve a problem
that doesn't exist (not an unusual occurrence in my experience)
since the StandardAnalyzer does not appear to split the terms
if the period has no white space following. I was a bit misled by
the additional complication
Donna L Gresh wrote:
But your point about the StandardAnalyzer being slow is
well-taken, and I'll keep that in mind.
A new StandardAnalyzer that is 6x faster was recently committed on the
trunk. Should be in next release.
- Mark
Ard,
It's the source code of an alternative query language on top of Lucene
that allows boolean queries and span (distance) queries.
The minimal documentation is in the Java API documentation
on the lucene java site under contrib: Surround Parser, and in
the surround.txt file here:
Hello,
I have been using Lucene recently (newbie) and reading about scoring.
However, I have a result that still does not make sense to me.
I have an index created from a single field (entree) in one table in
my DB.
I index it with the StandardAnalyzer.
I also query it with the
Thanks Paul,
I didn't look at the surround.txt, though I must admit it is a pretty cryptic
explanation :-)
Regards Ard
Ard,
It's the source code of an alternative query language on top
of Lucene
that allows boolean queries and span (distance) queries.
The minimal documentation is in
Try using the explain() method to give more insight.
On Aug 9, 2007, at 4:08 PM, Jami Kapla wrote:
Hello,
I have been using Lucene recently (newbie) and reading about scoring.
However, I have a result that still does not make sense to me.
I have an index created from a single field
Hi, Spencer
Why not translate the XML to an object when indexing, and translate the object
back to XML when searching?
kai
-----Original Message-----
From: Spencer Tickner [mailto:[EMAIL PROTECTED]
Sent: Saturday, August 4, 2007 6:01
To: java-user@lucene.apache.org
Subject: Nested Fields
Hi, and thanks in advance for any
Hi,
I need your help in formalizing this query:
(field1:query1 AND field2:query2) OR
(field1:query3 AND field2:query4) OR
(field1:query5 AND field2:query6) OR
(field1:query7 AND field2:query8) ... etc
Please give the code, since I'm new to Lucene:
how can we use MultiFieldQueryParser or any
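
One way to approach the query shape above is to generate the query string from a list of value pairs. The following is a minimal sketch in plain Java; the class name PairQueryBuilder and its build method are hypothetical, and it assumes each value is a single term that needs no escaping.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds "(f1:a AND f2:b) OR (f1:c AND f2:d)" style query strings. */
final class PairQueryBuilder {

    /**
     * Turns each {value1, value2} pair into "(field1:value1 AND field2:value2)"
     * and joins the groups with OR.
     */
    static String build(String field1, String field2, String[][] pairs) {
        List<String> groups = new ArrayList<>();
        for (String[] pair : pairs) {
            groups.add("(" + field1 + ":" + pair[0] + " AND " + field2 + ":" + pair[1] + ")");
        }
        return String.join(" OR ", groups);
    }

    public static void main(String[] args) {
        String[][] pairs = { {"query1", "query2"}, {"query3", "query4"} };
        System.out.println(build("field1", "field2", pairs));
    }
}
```

The resulting string uses the field:term, AND/OR and parenthesis syntax that Lucene's QueryParser accepts, so it could presumably be handed to the parser together with the analyzer used at indexing time.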
Hi,
Could you please tell me how to update the boost factor of an already indexed
document using setBoost?
Thanks and regards,
Rohit
--
VANDE - MATRAM
Spencer, it seems inefficient to me too, but that's pretty much what I did
for tables embedded within a document.
I used a SAX parser to parse the document and kept track of the table
elements I saw. When I received an endElement, I added the text I had
buffered up in the characters() method to
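
The buffering pattern described here (collect text in characters(), act on it in endElement) might look roughly like the sketch below. It uses only the JDK's SAX API; the element name "cell", the class name TableTextCollector, and the choice to store the text in a list are assumptions made for illustration, since the message does not say where the buffered text ended up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: collects the text of every <cell> element. */
final class TableTextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> cells = new ArrayList<>();
    private boolean inCell = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("cell".equals(qName)) {
            inCell = true;
            buffer.setLength(0); // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append rather than assign.
        if (inCell) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("cell".equals(qName)) {
            cells.add(buffer.toString().trim());
            inCell = false;
        }
    }

    /** Parses the XML string and returns the collected cell texts. */
    static List<String> collect(String xml) {
        try {
            TableTextCollector handler = new TableTextCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.cells;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collect("<doc><table><cell>a</cell><cell> b </cell></table></doc>"));
    }
}
```

In an indexing setup, each collected string could then be added to the document being built instead of to a list.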