Re: Questions about filters and scoring

Yonik Seeley Mon, 18 Feb 2008 13:43:14 -0800

On Feb 18, 2008 3:56 PM, Reece <[EMAIL PROTECTED]> wrote:
> Hello Everyone,
>
> First off, sorry about the thread hijack earlier, it was not intentional.
>
> Back to the point though, I'm having some issues getting
> SOLR to work with our dataset.  I'm using it to index ticket data for
> our technical support department.  Below are a few of the problems
> I've been having, and the wiki hasn't had much to say about them.
>
> 1) As an example, searching for "binarydata_groupdocument_fk" returns
> nothing, while searching for "BinaryData_GroupDocument_FK" returns
> results.  I have the lowercasefilterfactory applied to both the index
> and query analyzers.  Does this not actually set everything to lower
> case?  From the wiki at
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters, it says
> "Creates tokens by lowercasing all letters and dropping non-letters"
> but that does not seem to be happening here.  Am I forgetting to
> configure something?


Did you re-index?

> 2) Some of our data is one sentence.  Some is over 5 MB of text.  When
> searching for a term, it's returning the one sentence data first
> because the fieldNorm is so different (0.4 for one, 0.002 for others).
>  Is there a way to disable using the fieldnorm in the score
> calculation?

It's probably Lucene's default length normalization over-emphasizing
short fields.  You could use a similarity better suited to your data,
or turn off length normalization by setting omitNorms="true" for that
field in the schema and then re-indexing (make sure to delete the old
index entirely first).
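
To give a rough idea of where numbers like 0.4 and 0.002 come from,
here is a minimal sketch.  It assumes the default similarity computes
the length norm as 1/sqrt(number of terms in the field) and ignores
any index-time boosts; the function names are made up for
illustration and are not Lucene or Solr APIs.  The stored norm is
also squeezed into a single byte, so the values shown in debug output
are only approximate.

```python
import math


def length_norm(num_terms: int) -> float:
    """Length norm under the assumed default formula 1/sqrt(num_terms)."""
    return 1.0 / math.sqrt(num_terms)


def approx_terms_for_norm(norm: float) -> int:
    """Invert the assumed formula to estimate how many terms a field has."""
    return round(1.0 / (norm * norm))


# A one-sentence field versus a multi-megabyte field.
short_norm = length_norm(6)
long_norm = length_norm(250_000)

print(f"short field norm: {short_norm:.3f}")
print(f"long field norm:  {long_norm:.3f}")
print(f"ratio: {short_norm / long_norm:.0f}x in favor of the short field")

# Estimates for the fieldNorm values reported in the question.
print(approx_terms_for_norm(0.4), approx_terms_for_norm(0.002))
```

Under that assumption, a fieldNorm near 0.4 corresponds to a handful
of terms and 0.002 to a few hundred thousand, so the short documents
get a multiplier a couple of hundred times larger.  With
omitNorms="true" the norm is effectively treated as 1.0 for every
document, which removes that factor from the score.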

>  An alternative I tried was posting parts of the data in
> as different values of the field (so having multiple tags of that
> field-name in the add xml post), but that appeared to have zero effect
> on the results - even the querydebugger showed the exact same
> calculation for the search.  Does anyone know how to disable the
> fieldnorm, or have the score created from adding the scores from each
> value of a multivalued field?
>
> 3) I discovered that searching for '"certificate not found"' (using
> the double quotes for a phrase here) did not return any results, even
> though the phrase did exist (and was lower case originally too, so
> different than my first issue).  I discovered it was because of the
> stopword "not", but the same stopfilterfactory was applied to both the
> index and query analyzers.  Am I doing something wrong there?  As a
> workaround I'm having php manually removing stopwords from the
> querystring, which is a real pain.  I'm thinking my filters aren't being
> applied correctly since this is similar to issue #1 but with a different
> filter.

Hmmm, it looks like a recent change in Lucene is probably causing this.
Could you open a new Solr JIRA issue to report the bug?

-Yonik
