RE: Fastest way to perform 'like' searches

2007-08-09 Thread Ard Schrijvers
Thanks Daniel, I understand how it can be done. The only thing that bothers me is that expanding the * might result in many phrases, which in turn might imply a performance hit. I'll see what the impact is. Regards, Ard On Wednesday 08 August 2007 10:28, Ard Schrijvers wrote: Does
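
Ard's concern is that expanding the * yields one phrase per matching term, so the cost grows with the number of matches. The sketch below makes that concrete in plain Java, not against any Lucene API. It expands a trailing-* prefix against a sorted term set. The class and method names, and the in-memory TreeSet standing in for a field's term dictionary, are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical illustration: expand "prefix*" against a sorted term set, then
// build one phrase per matching term. Not Lucene's own implementation.
class PrefixExpansion {

    /** Returns every term in the sorted dictionary that starts with prefix, in sorted order. */
    static List<String> expand(NavigableSet<String> dictionary, String prefix) {
        List<String> matches = new ArrayList<>();
        // In sorted order all terms sharing a prefix are contiguous, starting at the first term >= prefix.
        for (String term : dictionary.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // past the contiguous block of matches
            }
            matches.add(term);
        }
        return matches;
    }

    /** Builds one two-word phrase per expansion, e.g. "apache luc*" -> "apache lucene", ... */
    static List<String> expandPhrase(NavigableSet<String> dictionary, String fixedWord, String prefix) {
        List<String> phrases = new ArrayList<>();
        for (String term : expand(dictionary, prefix)) {
            phrases.add(fixedWord + " " + term);
        }
        return phrases;
    }
}
```

A short prefix against a large vocabulary can match many terms, and every match adds another phrase to evaluate. That is why measuring the impact on a real index, as Ard plans, matters.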

frequent phrases

2007-08-09 Thread Akanksha Baid
I was wondering if there is a search-based method to find the top-k frequent phrases in a set of documents. (I do not have a particular phrase in mind, so PhraseQuery can probably be ruled out.) I have implemented something that works using term vectors and term positions but the performance is not
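
Below is a brute-force, in-memory sketch of the kind of counting Akanksha describes. It assumes each document's token sequence is already available, for example rebuilt from term positions. This is plain Java, not a Lucene API, and FrequentPhrases, countNGrams and topK are hypothetical names. It counts every run of n consecutive tokens, then keeps the k largest counts with a size-k min-heap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Brute-force phrase counting over token sequences already pulled out of the index.
// Names are hypothetical; this is not a Lucene API.
class FrequentPhrases {

    /** Counts every run of n consecutive tokens across all documents. */
    static Map<String, Integer> countNGrams(List<List<String>> docs, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> tokens : docs) {
            for (int i = 0; i + n <= tokens.size(); i++) {
                String phrase = String.join(" ", tokens.subList(i, i + n));
                counts.merge(phrase, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns the k most frequent phrases, highest count first; ties break alphabetically. */
    static List<String> topK(Map<String, Integer> counts, int k) {
        // Min-heap ordered "weakest first": lower count, or alphabetically later on a tie.
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>((a, b) -> {
            int byCount = Integer.compare(a.getValue(), b.getValue());
            return byCount != 0 ? byCount : b.getKey().compareTo(a.getKey());
        });
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            heap.offer(entry);
            if (heap.size() > k) {
                heap.poll(); // evict the weakest so only k candidates remain
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result); // the heap drains weakest-first
        return result;
    }
}
```

Memory grows with the number of distinct n-grams, so on a large corpus this is only a baseline to compare faster approaches against.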

Re: frequent phrases

2007-08-09 Thread karl wettin
On 9 Aug 2007, at 09:34, Akanksha Baid wrote: I was wondering if there is a search-based method to find the top-k frequent phrases in a set of documents. (I do not have a particular phrase in mind, so PhraseQuery can probably be ruled out.) I have implemented something that works using

Re: frequent phrases

2007-08-09 Thread mark harwood
The CollocationFinder code attached to this may be better suited: http://issues.apache.org/jira/browse/LUCENE-474 Again, not exactly sure of your use case. Cheers Mark - Original Message From: karl wettin [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Thursday, 9 August, 2007

special handling of certain terms with embedded periods

2007-08-09 Thread Donna L Gresh
Is there a good way to handle the following scenario: I have certain terms with embedded periods that I want to leave intact (not split at the periods). For example, in my application a particular skill might be SAP.FIN (SAP financial), and it should not be split into SAP and FIN. Is

Re: special handling of certain terms with embedded periods

2007-08-09 Thread karl wettin
On 9 Aug 2007, at 16:36, Donna L Gresh wrote: Is there a good way to handle the following scenario: I have certain terms with embedded periods that I want to leave intact (not split at the periods). For example, in my application a particular skill might be SAP.FIN (SAP financial),

Re: special handling of certain terms with embedded periods

2007-08-09 Thread Erick Erickson
Some possibilities: write your own tokenizer and/or filter (if you alter the BNF, you'll have to maintain it across later releases); or use some simple transformations on the input *before* tokenizing. There's been some discussion that StandardAnalyzer (and, I assume, the Standard* beasts)
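
Erick's second suggestion can be sketched in plain Java, under stated assumptions. The idea is to rewrite a known list of dotted terms before the text reaches the analyzer, and to apply the same rewrite to query strings so both sides agree. The DottedTermRewriter class and the SAPFIN replacement form are hypothetical. Check with your own analyzer that the replacement really comes out as a single token.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-tokenization step: rewrite known dotted terms into a form
// the analyzer should keep as one token. Apply it to documents AND to queries.
class DottedTermRewriter {

    /**
     * Replaces each protected term with its single-token form. Terms are matched
     * literally and case-sensitively, and only when not glued to a letter, digit,
     * or underscore on either side. Entries are applied in the map's iteration order.
     */
    static String rewrite(String text, Map<String, String> protectedTerms) {
        String result = text;
        for (Map.Entry<String, String> entry : protectedTerms.entrySet()) {
            // Pattern.quote makes '.' match a literal period, not any character.
            Pattern term = Pattern.compile("(?<!\\w)" + Pattern.quote(entry.getKey()) + "(?!\\w)");
            // quoteReplacement stops '$' or '\' in the replacement from being interpreted.
            result = term.matcher(result).replaceAll(Matcher.quoteReplacement(entry.getValue()));
        }
        return result;
    }
}
```

For example, with SAP.FIN mapped to SAPFIN, rewrite("Skills: SAP.FIN, Java.", terms) returns "Skills: SAPFIN, Java.". Longer words such as SAP.FINANCE are left untouched.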

Re: special handling of certain terms with embedded periods

2007-08-09 Thread Donna L Gresh
Thanks. In this case it actually looks like I was trying to solve a problem that doesn't exist (not an unusual occurrence in my experience), since the StandardAnalyzer does not appear to split the terms if the period has no white space following. I was a bit misled by the additional complication

Re: special handling of certain terms with embedded periods

2007-08-09 Thread Mark Miller
Donna L Gresh wrote: But your point about the StandardAnalyzer being slow is well-taken, and I'll keep that in mind. A new StandardAnalyzer that is 6x faster was recently committed on the trunk. Should be in next release. - Mark

Re: What is the contrib/surround/src/java purpose

2007-08-09 Thread Paul Elschot
Ard, It's the source code of an alternative query language on top of Lucene that allows boolean queries and span (distance) queries. The minimal documentation is in the Java API documentation on the lucene java site under contrib: Surround Parser, and in the surround.txt file here:

Help Regarding Fuzzy Scoring

2007-08-09 Thread Jami Kapla
Hello, I have been using Lucene recently (newbie) and reading about scoring. However, I have a result that still does not make sense to me. I have an index created from a single field (entree) in one table in my DB. I index it with the StandardAnalyzer. I also query it with the

RE: What is the contrib/surround/src/java purpose

2007-08-09 Thread Ard Schrijvers
Thanks Paul, I didn't look at the surround.txt, though I must admit it is a pretty cryptic explanation :-) Regards Ard Ard, It's the source code of an alternative query language on top of Lucene that allows boolean queries and span (distance) queries. The minimal documentation is in

Re: Help Regarding Fuzzy Scoring

2007-08-09 Thread Grant Ingersoll
Try using the explain() method to give more insight. On Aug 9, 2007, at 4:08 PM, Jami Kapla wrote: Hello, I have been using Lucene recently (newbie) and reading about scoring. However, I have a result that still does not make sense to me. I have an index created from a single field

Re: Nested Fields

2007-08-09 Thread Kai Hu
Hi, Spencer. Why not translate the XML into an object when indexing, and translate the object back into XML when searching? kai -----Original Message----- From: Spencer Tickner [mailto:[EMAIL PROTECTED] Sent: Saturday, 4 August 2007 6:01 To: java-user@lucene.apache.org Subject: Nested Fields Hi, and thanks in advance for any

formalizing a query

2007-08-09 Thread Abu Abdulla alhanbali
Hi, I need your help in formulating this query: (field1:query1 AND field2:query2) OR (field1:query3 AND field2:query4) OR (field1:query5 AND field2:query6) OR (field1:query7 AND field2:query8) ... etc. Please give the code, since I'm new to Lucene. How can we use MultiFieldQueryParser or any
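
Each clause names its field explicitly, so the query above can be written in ordinary QueryParser syntax without MultiFieldQueryParser. The sketch below generates that string. It assumes the values arrive as two parallel lists and are already escaped (QueryParser.escape can do that). PairedFieldQuery.build is a hypothetical helper, and the resulting string would then be handed to a QueryParser's parse method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds "(f1:a AND f2:b) OR (f1:c AND f2:d) ..." from parallel lists.
class PairedFieldQuery {

    /** Values are assumed to be single terms that are already escaped for QueryParser. */
    static String build(String fieldA, String fieldB, List<String> valuesA, List<String> valuesB) {
        if (valuesA.size() != valuesB.size()) {
            throw new IllegalArgumentException("value lists must have the same length");
        }
        List<String> clauses = new ArrayList<>();
        for (int i = 0; i < valuesA.size(); i++) {
            // Each pair must match together, so it becomes one parenthesized AND clause.
            clauses.add("(" + fieldA + ":" + valuesA.get(i) + " AND " + fieldB + ":" + valuesB.get(i) + ")");
        }
        // Any one matching pair is enough, so the pairs are joined with OR.
        return String.join(" OR ", clauses);
    }
}
```

An equivalent programmatic route avoids string building altogether. Make each pair a BooleanQuery with two MUST clauses, then add those pairs to an outer BooleanQuery as SHOULD clauses.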

Update boost factor for indexed document using setBoost()

2007-08-09 Thread rohit saini
Hi, could you please tell me how to update the boost factor of an already indexed document using setBoost? Thanks & regards, Rohit -- VANDE - MATRAM

Re: Nested Fields

2007-08-09 Thread Jeff French
Spencer, it seems inefficient to me too, but that's pretty much what I did for tables embedded within a document. I used a SAX parser to parse the document and kept track of the table elements I saw. When I received an endElement, I added the text I had buffered up in the characters() method to
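
Below is a minimal SAX sketch of the approach Jeff describes, using only the JDK's javax.xml.parsers and org.xml.sax classes. It buffers text in characters() while inside a table and emits it on the matching endElement. Several details are assumptions for illustration: the element name "table", the whitespace normalization, and returning plain strings. In a real indexer each string would then be added to the document as its own field, a step left out here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Collects the text of each outer <table> element; nested tables are folded
// into their outer table. The element name "table" is an assumption.
class TableTextCollector extends DefaultHandler {
    private final List<String> tables = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();
    private int tableDepth = 0; // > 0 while inside at least one <table>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("table".equals(qName)) {
            if (tableDepth == 0) {
                buffer.setLength(0); // fresh buffer for a new outer table
            }
            tableDepth++;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver one text node in several chunks, so always append.
        if (tableDepth > 0) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (tableDepth == 0) {
            return;
        }
        if ("table".equals(qName) && --tableDepth == 0) {
            // Outer table closed: collapse whitespace and keep its text.
            tables.add(buffer.toString().replaceAll("\\s+", " ").trim());
            return;
        }
        buffer.append(' '); // keep text of adjacent cells from running together
    }

    /** Parses the XML string and returns one text string per outer table. */
    static List<String> extract(String xml) {
        TableTextCollector handler = new TableTextCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.tables;
    }
}
```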