[jira] Commented: (SOLR-1553) extended dismax query parser

Hoss Man (JIRA) Mon, 07 Dec 2009 10:09:42 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787021#action_12787021
 ]


Hoss Man commented on SOLR-1553:
--------------------------------

Thoughts while reading the code...

* the code is kind of hard to read ... there's a serious dearth of comments
* reads very kludgy, clearly a hacked up version of DisMax ... probably want to 
refactor some helper functions (that can then be documented) 
* the clause.field and getFieldName functionality is dangerous for people 
migrating from dismax->edismax (users guessing field names can query on fields 
the solr admin doesn't want them to query on) ... we need an option to turn 
that off.
** one really nice thing about the field query support though: it looks like it 
would really be easy to add support for arbitrary field name aliasing with 
something like f.someFieldAlias.qf=realFieldA^3+realFieldB^4 
** perhaps getFieldName should only work for fields explicitly enumerated in a 
param?
* why is "TO" listed as an operator when building up the phrase boost fields? 
(line 296) ... if range queries are supported, then shouldn't the upper/lower 
bounds also be stripped out of the clauses list?
** accepting range queries also seems like something that people should be able 
to disable
* apparently "pf" was changed to iteratively build boosting phrase queries for 
every 'pair' of words, and "pf3" is a new param to build boosting phrase 
queries for every 'triple' of words in the input. while this certainly seems 
useful, it's not back-compatible ... why not restore 'pf' to its original 
purpose, and add "pf2" for the pairs?
* what is the motivation for ExtendedSolrQueryParser.makeDismax? ... i see that 
the boost queries built from the pf and pf3 fields are put in BooleanQueries 
instead of DisjunctionMaxQueries ... but why? (if the user searches for a 
phrase that's common in many fields of one document, that document is going to 
get a huge score boost regardless of the "tie" value, which kind of defeats the 
point of what the dismax parser is trying to do)
* we should remove the extremely legacy "/* legacy logic */" for dealing with 
"bq" ... almost no one should care about that, we really don't need to carry it 
forward in a new parser.
* there are a lot of empty catch blocks that seem like they should at least log 
a warning or debug message.
* ExtendedAnalyzer feels like a really big hack ... i'm not certain, but i 
don't think it works correctly if a CharFilter is declared.
* we need to document all these new params ("pf3", "lowercaseOperators", 
"boost", 

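To make the pf / pf2 / pf3 suggestion above concrete, here is a minimal sketch 
of what each param would build phrases from. It is not code from the patch: the 
class and method names are hypothetical, "pf2" is only the name proposed above, 
and real phrase boosts would be built per field rather than as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not taken from SOLR-1553.patch.
public class PhraseShingles {

    /** Returns every run of n consecutive words, each joined into a quoted phrase. */
    public static List<String> shingles(List<String> words, int n) {
        List<String> phrases = new ArrayList<>();
        if (n < 1) {
            return phrases; // no sensible phrase of zero or negative length
        }
        for (int i = 0; i + n <= words.size(); i++) {
            phrases.add("\"" + String.join(" ", words.subList(i, i + n)) + "\"");
        }
        return phrases;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("quick", "brown", "fox", "jumps");
        // original "pf" behaviour: one phrase over the whole input
        System.out.println("pf : " + shingles(words, words.size()));
        // proposed "pf2": one phrase per adjacent pair of words
        System.out.println("pf2: " + shingles(words, 2));
        // "pf3": one phrase per adjacent triple of words
        System.out.println("pf3: " + shingles(words, 3));
    }
}
```

With this split, existing configs that set only "pf" keep boosting on the whole 
input as a phrase, and the pair/triple behaviour becomes opt-in.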

Thoughts while testing it out on some really hairy edge cases that break the 
old dismax parser...

* this is really cool
* this is really freaking cool.
* still has a problem with search strings like "foo &&" and "foo ||" ... i 
suspect it would be an easy fix to recognize these just like AND/OR are 
recognized and escaped.
* once we fix some of the issues mentioned above, we should absolutely register 
this using the name "dismax" by default, and register the old one as 
"oldDismax" with a note in CHANGES.txt telling people to use defType=oldDismax 
if they really need it.
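
A rough sketch of the "foo &&" / "foo ||" idea above, assuming that treating a 
dangling operator as literal text is the desired behaviour. The class and 
method names are hypothetical and this is not how the patch handles AND/OR; it 
only shows the shape of the fix: an operator token at the very start or end of 
the input has no operand on one side, so it gets backslash-escaped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not taken from SOLR-1553.patch.
public class DanglingOperators {

    private static final List<String> OPERATORS = Arrays.asList("&&", "||");

    /** Backslash-escapes every character of a token, e.g. && becomes \&\&. */
    static String escape(String token) {
        StringBuilder sb = new StringBuilder();
        for (char c : token.toCharArray()) {
            sb.append('\\').append(c);
        }
        return sb.toString();
    }

    /**
     * Escapes a && or || token that is the first or last token of the query.
     * Operators between two other tokens are left alone.
     * Note: runs of whitespace are collapsed to single spaces.
     */
    public static String escapeDangling(String query) {
        String trimmed = query.trim();
        if (trimmed.isEmpty()) {
            return trimmed;
        }
        String[] tokens = trimmed.split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            boolean atEdge = (i == 0 || i == tokens.length - 1);
            if (atEdge && OPERATORS.contains(tokens[i])) {
                out.add(escape(tokens[i]));
            } else {
                out.add(tokens[i]);
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(escapeDangling("foo &&"));
        System.out.println(escapeDangling("foo || bar"));
    }
}
```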




> extended dismax query parser
> ----------------------------
>
>                 Key: SOLR-1553
>                 URL: https://issues.apache.org/jira/browse/SOLR-1553
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Yonik Seeley
>             Fix For: 1.5
>
>         Attachments: SOLR-1553.patch
>
>
> An improved user-facing query parser based on dismax

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
