[jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-23 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12870384#action_12870384 ] Shai Erera commented on LUCENE-2458: bq. There will be tons of different opinions to

Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-23 Thread Robert Muir
I can't tell if you are being obnoxious or seriously believe what you say. You understand that cjkanalyzer is broke with this? You understand that ngrams themselves capture information about position and it even works nicely with scoring, and helps. This hack doesn't help english. If you think

Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-23 Thread Mark Miller
Obnoxiousness has certainly been in the air regarding this issue, I'll give you that. On Sunday, May 23, 2010, Robert Muir rcm...@gmail.com wrote: I can't tell if you are being obnoxious or seriously believe what you say.  You understand that cjkanalyzer is broke with this? You understand that

[jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-23 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12870410#action_12870410 ] Uwe Schindler commented on LUCENE-2458: --- Hi Robert, I also agree with Mark (as you

Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-23 Thread Shai Erera
Robert - is the effect on scoring also on English and other European languages? Or is it mostly for ngram-based languages, and especially CJK? I want to stress that not all ngram-based languages are affected by this behavior, especially those for which we do ngram just because of a lack of good

RE: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-23 Thread Uwe Schindler
Robert - is the effect on scoring also on English and other European languages? Or is it mostly for ngram-based languages, and especially CJK? I want to stress that not all ngram-based

Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-23 Thread Robert Muir

Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-23 Thread Robert Muir
On Sun, May 23, 2010 at 12:34 PM, Shai Erera ser...@gmail.com wrote: I want to stress that not all ngram-based languages are affected by this behavior, especially those for which we do ngram just because of a lack of good tokenizer. They are also affected! Do you understand how the

Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-23 Thread Shai Erera
Robert - is the effect on scoring also on English and other European languages

Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-23 Thread Robert Muir
Robert - is the effect on scoring also on English and other European languages? Or is it mostly for ngram-based languages

RE: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-23 Thread Uwe Schindler
These comments lead me to believe you don't understand the issue. Do you understand that *ALL* CJK queries are made into phrase queries, regardless of tokenizer
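To make the complaint concrete, here is a minimal, hypothetical model (plain Java, not Lucene's code) of the rule being criticized: the query string is split on whitespace first, each chunk is analyzed on its own, and any chunk that yields more than one token becomes a phrase. ASCII letters stand in for CJK characters, and `bigrams` imitates an overlapping bigram analyzer; `TermCountModel`, `bigrams` and `parse` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the "phrase query by term count" rule; not Lucene's implementation.
class TermCountModel {

    // Imitates an overlapping bigram analyzer: "ABCD" -> [AB, BC, CD]; a single char stays a unigram.
    static List<String> bigrams(String chunk) {
        List<String> tokens = new ArrayList<>();
        if (chunk.length() == 1) {
            tokens.add(chunk);
            return tokens;
        }
        for (int i = 0; i + 2 <= chunk.length(); i++) {
            tokens.add(chunk.substring(i, i + 2));
        }
        return tokens;
    }

    // Split on whitespace first, then analyze each chunk separately.
    // No quotes are involved: a chunk becomes a phrase purely because it produced more than one token.
    static List<String> parse(String field, String query) {
        List<String> clauses = new ArrayList<>();
        for (String chunk : query.trim().split("\\s+")) {
            List<String> tokens = bigrams(chunk);
            if (tokens.isEmpty()) {
                continue; // nothing to search for
            }
            if (tokens.size() == 1) {
                clauses.add(field + ":" + tokens.get(0));
            } else {
                clauses.add(field + ":\"" + String.join(" ", tokens) + "\"");
            }
        }
        return clauses;
    }
}
```

In this model `TermCountModel.parse("f", "ABCD")` returns `[f:"AB BC CD"]` even though the user typed no quotes, while `parse("f", "A BC")` returns the two plain term clauses `f:A` and `f:BC`.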

Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-23 Thread Earwin Burrfoot
The QP should work like that: (1) It parses the query, creating fragments (2) It does some out-of-the-box handling of those fragments People should be able to override that handling of fragments. But people should not touch (1). In fact QP should work like that: (1) Tokenizer parses the
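Earwin's proposal separates a fixed parsing stage from an overridable handling stage. A minimal sketch of that split follows, assuming hypothetical `Fragment`, `FragmentHandler` and `TwoStageParser` types (none of them Lucene API) and supporting only bare words and double-quoted phrases.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a two-stage query parser; none of these types are Lucene API.
final class Fragment {
    final String text;
    final boolean quoted; // remembered by stage 1 so stage 2 can use it

    Fragment(String text, boolean quoted) {
        this.text = text;
        this.quoted = quoted;
    }
}

// Stage 2 hook: callers decide what each fragment turns into.
interface FragmentHandler {
    String handle(Fragment f);
}

final class TwoStageParser {

    // Stage 1 (fixed): split the syntax into fragments, recording whether each one was quoted.
    static List<Fragment> fragments(String query) {
        List<Fragment> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (char c : query.toCharArray()) {
            if (c == '"') {
                flush(out, cur, inQuotes); // close the word or phrase in progress
                inQuotes = !inQuotes;
            } else if (Character.isWhitespace(c) && !inQuotes) {
                flush(out, cur, false);    // whitespace ends a bare word
            } else {
                cur.append(c);             // whitespace inside quotes is kept
            }
        }
        flush(out, cur, inQuotes);         // an unterminated quote is treated as quoted
        return out;
    }

    private static void flush(List<Fragment> out, StringBuilder cur, boolean quoted) {
        if (cur.length() > 0) {
            out.add(new Fragment(cur.toString(), quoted));
        }
        cur.setLength(0);
    }

    // Stage 2 (overridable): delegate every fragment to the supplied handler.
    static List<String> parse(String query, FragmentHandler handler) {
        List<String> clauses = new ArrayList<>();
        for (Fragment f : fragments(query)) {
            clauses.add(handler.handle(f));
        }
        return clauses;
    }
}
```

Because stage 1 keeps the quoted flag, a handler such as `f -> f.quoted ? "PHRASE(" + f.text + ")" : "TERM(" + f.text + ")"` can make phrases only out of text the user actually quoted.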

Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-23 Thread Robert Muir
On Sun, May 23, 2010 at 1:00 PM, Uwe Schindler u...@thetaphi.de wrote: I just want to make the feature accessible and documented without Version. I think it is just a bug (a shoddy implementation that does not use the syntax, whether it was quoted or not, since this has been thrown away). In

Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-23 Thread Shai Erera
So ... after a long IRC chat on this, I think this has just been worded incorrectly (the issue). As I understand, there are two issues here: 1) QP loses phrase info for fields -- the queries f:"abcd" and f:abcd are parsed the same, or handled the same. There is no way for the one extending QP to
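Point 1) can be illustrated with a small, hypothetical model (plain Java, not the actual patch): if the parser keeps track of whether the text was quoted, the quote, rather than the number of tokens the analyzer produced, decides between a phrase and a boolean combination using the default operator (the alternative Robert proposes elsewhere in the thread). `QuoteAwareModel`, `toQuery`, `Operator` and the bigram stand-in analyzer are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the "was it quoted?" bit, not the token count, picks the query type.
class QuoteAwareModel {

    enum Operator { OR, AND }

    // Stand-in for a CJK-style overlapping bigram analyzer: "ABCD" -> [AB, BC, CD].
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        if (text.length() == 1) {
            tokens.add(text);
            return tokens;
        }
        for (int i = 0; i + 2 <= text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    static String toQuery(String field, String text, boolean quoted, Operator defaultOp) {
        List<String> tokens = bigrams(text);
        if (tokens.isEmpty()) {
            return "";
        }
        if (tokens.size() == 1) {
            return field + ":" + tokens.get(0);
        }
        if (quoted) {
            // Explicit quotes: the user asked for a phrase.
            return field + ":\"" + String.join(" ", tokens) + "\"";
        }
        // No quotes: combine the tokens with the default operator instead of forcing a phrase.
        List<String> clauses = new ArrayList<>();
        for (String t : tokens) {
            clauses.add(field + ":" + t);
        }
        return "(" + String.join(" " + defaultOp + " ", clauses) + ")";
    }
}
```

With this rule `f:"abcd"` and `f:abcd` no longer collapse into the same query: the quoted form stays a phrase, while the unquoted form becomes `(f:AB OR f:BC OR f:CD)` under an OR default.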

Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-23 Thread Robert Muir
+1, this is what the patch does. I agree I did a crappy job explaining the issue. On Sun, May 23, 2010 at 2:25 PM, Shai Erera ser...@gmail.com wrote: So ... after a long IRC chat on this, I think this has just been worded incorrectly (the issue). As I understand, there are two issues here: 1)

[jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-22 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12870317#action_12870317 ] Mark Miller commented on LUCENE-2458: - I still don't think this falls under bug

[jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-22 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12870353#action_12870353 ] Shai Erera commented on LUCENE-2458: FWIW, I agree w/ Mark. I don't think it's a bug,

[jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-19 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869280#action_12869280 ] Michael McCandless commented on LUCENE-2458: OK mulling some more on this

[jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-13 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12867112#action_12867112 ] Robert Muir commented on LUCENE-2458: - {quote} This is why I like the token attr based

[jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-13 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12867117#action_12867117 ] Uwe Schindler commented on LUCENE-2458: --- Sorry for intervening, I am in the same

[jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-13 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12867147#action_12867147 ] Yonik Seeley commented on LUCENE-2458: -- bq. This is why I like the token attr based

[jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-13 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12867151#action_12867151 ] Robert Muir commented on LUCENE-2458: - {quote} An attribute that says these tokens go

[jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866528#action_12866528 ] Michael McCandless commented on LUCENE-2458: This is sneaky behavior on

RE: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-12 Thread Itamar Syn-Hershko
The QueryParser also fails to correctly parse Hebrew acronyms; although this is not an integral part of the current discussion, I thought this would be the best place to bring it up. Hebrew acronyms are composed of letters with a single double-quote character within, for example: MNKL (Hebrew for CEO).
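One way a parser could accommodate such acronyms is to stop treating a double quote as a phrase delimiter when it has a letter on each side. The sketch below is a hypothetical heuristic in plain Java, not something QueryParser does; `AcronymAwareQuotes` and its methods are illustrative names, and the acronym מנכ"ל ("CEO") is written with Unicode escapes in the test so the source stays ASCII.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical heuristic, not QueryParser behavior: a double quote with a letter
// on both sides (as in a Hebrew acronym) is treated as part of the word.
class AcronymAwareQuotes {

    // True when the quote at index i sits between two letters, i.e. inside a word.
    static boolean isInWordQuote(String s, int i) {
        return s.charAt(i) == '"'
                && i > 0
                && i + 1 < s.length()
                && Character.isLetter(s.charAt(i - 1))
                && Character.isLetter(s.charAt(i + 1));
    }

    // Indexes of the quote characters that should still act as phrase delimiters.
    static List<Integer> phraseDelimiters(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '"' && !isInWordQuote(s, i)) {
                out.add(i);
            }
        }
        return out;
    }
}
```

The heuristic is deliberately naive: a closing quote immediately followed by a letter (`"foo"bar`) would be misread as part of a word, so a real parser would need a stricter rule or an escape mechanism.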

Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-12 Thread Robert Muir
On Wed, May 12, 2010 at 6:05 AM, Itamar Syn-Hershko ita...@code972.com wrote: The QueryParser also fails to correctly parse Hebrew acronyms; although not being an integral part of the current discussion, I thought this would be the best place to bring that up. Just as I don't think Analysis

Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-12 Thread Mark Miller
On 5/12/10 9:25 AM, Robert Muir wrote: (and, contrary to what you would believe from the documentation, the choice of whether or not to make a PhraseQuery is not based on syntax one bit!) That's a major exaggeration - quoting text plays a large role in whether or not you will get a phrase

Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-12 Thread Robert Muir
On Wed, May 12, 2010 at 11:16 AM, Mark Miller markrmil...@gmail.com wrote: That's a major exaggeration - quoting text plays a large role in whether or not you will get a phrase query. No, it has nothing to do with it in the implementation. It only escapes the whitespace, but is discarded. This

[jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-12 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866595#action_12866595 ] Marvin Humphrey commented on LUCENE-2458: - I have mixed feelings about this for

[jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-12 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866603#action_12866603 ] Marvin Humphrey commented on LUCENE-2458: - Because they show it's 10x better to

[jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-12 Thread Ivan Provalov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1285#action_1285 ] Ivan Provalov commented on LUCENE-2458: --- Robert has asked me to post our test

[jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-12 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866693#action_12866693 ] Marvin Humphrey commented on LUCENE-2458: - I'm honestly having a tough time

[jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-12 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866696#action_12866696 ] Hoss Man commented on LUCENE-2458: -- bq. Instead the queryparser should only form

[jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-12 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866695#action_12866695 ] Robert Muir commented on LUCENE-2458: - {quote} Change the initial split on whitespace

[jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-12 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866698#action_12866698 ] Robert Muir commented on LUCENE-2458: - bq. but all other things being equal let's keep

RE: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-12 Thread Itamar Syn-Hershko
On Wed, May 12, 2010 at 6:30 PM, Itamar Syn-Hershko ita...@code972.com wrote: Never did I request the QP to do Analysis. I simply mentioned this bug - what this definitely

Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-12 Thread Robert Muir
On Wed, May 12, 2010 at 6:30 PM, Itamar Syn-Hershko ita...@code972.com wrote: Never did I request the QP to do Analysis. I simply mentioned this bug - what

[jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-12 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866954#action_12866954 ] DM Smith commented on LUCENE-2458: -- As I see it there are two issues: 1) Backward

[jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-11 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866341#action_12866341 ] Hoss Man commented on LUCENE-2458: -- Robert: do you have a specific suggestion for what

[jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-11 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866353#action_12866353 ] Robert Muir commented on LUCENE-2458: - bq. ...what should the resulting Query object

[jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-11 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866363#action_12866363 ] Hoss Man commented on LUCENE-2458: -- bq. a Boolean Query formed with the default operator.

[jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-11 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866368#action_12866368 ] Robert Muir commented on LUCENE-2458: - bq. That seems like equally bad default

[jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

2010-05-11 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866374#action_12866374 ] Robert Muir commented on LUCENE-2458: - By the way, Hoss Man, you said it best yourself: