[
https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787021#action_12787021
]
Hoss Man commented on SOLR-1553:
--------------------------------
Thoughts while reading the code...
* the code is kind of hard to read ... there's a serious dirth of comments
* reads very kludgy, clearly a hacked up version of DisMax ... probably want to
refactor some helper functions (that can then be documented)
* the clause.field and getFieldName functionality is dangerous for people
migrating from edismax->dismax (users guessing field names can query on fields
the solr admin doesn't want them to query on) ... we need an option to turn
that off.
** one really nice thing about the field query support though: it looks like it
would really be easy to add support for arbitrary field name aliasing with
something like f.someFieldAlias.qf=realFieldA^3+realFieldB^4
** perhaps getFieldName should only work for fields explicitly enumerated in a
param?
* why is "TO" listed as an operator when building up the phrase boost fields?
(line 296) ... if range queries are supported, then shouldn't the upper/lower
bounds also be striped out of the clauses list?
** accepting range queries also seems like something that people should be able
to disable
* apparently "pf" was changed to iteratively build boosting phrase queries for
every 'pair' of words, and "pf3" is a new param to build boosting phrase
queries for every 'triple' of words in the input. while this certainly seems
useful, it's not back-compatable .. why not restore 'pf' to it's original
purpose, and add "pf2" for hte pairs?
* what is the motivation for ExtendedSolrQueryParser.makeDismax? ... i see that
the boost queries built from the pf and pf3 fields are put in BooleanQueries
instead of DisjunctionMaxQueries ... but why? (if the user searches for a
phrase that's common in many fields of one document, that document is going to
get a huge score boost regardless of the "tie" value, which kind of defeats the
point of what the dismax parser is trying to do)
* we should remove the extremely legacy "/* legacy logic */" for dealing with
"bq" ... almost no one should care about that, we really don't need to carry it
forward in a new parser.
* there are a lot of empty catch blocks that seem like they should at least log
a warning or debug message.
* ExtendedAnalyzer feels like a really big hack ... i'm not certain, but i
don't think it works correctly if a CharFilter is declared.
* we need to document all these new params ("pf3", "lowercaseOperators",
"boost",
Thoughts while testing it out on some really hairy edge cases that break the
old dismax parser...
* this is really cool
* this is really freaking cool.
* still has a problem with search strings like "foo &&" and "foo ||" ... i
suspect it would be an easy fix to recognize these just like AND/OR are
recognized and escaped.
* once we fix some of hte issues mentioned above, we should absolutely register
this using the name "dismax" by default, and register the old one as
"oldDismax" with a note in CHANGES.txt telling people to use defType=oldDismax
if they really need it.
> extended dismax query parser
> ----------------------------
>
> Key: SOLR-1553
> URL: https://issues.apache.org/jira/browse/SOLR-1553
> Project: Solr
> Issue Type: New Feature
> Reporter: Yonik Seeley
> Fix For: 1.5
>
> Attachments: SOLR-1553.patch
>
>
> An improved user-facing query parser based on dismax
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.