On Feb 18, 2008 3:56 PM, Reece <[EMAIL PROTECTED]> wrote:
> Hello Everyone,
>
> First off, sorry about the thread hijack earlier, it was not intentional.
>
> Back to the point though, I'm having some issues getting
> SOLR to work with our dataset. I'm using it to index ticket data for
> our technical support department. Below are a few of the problems
> I've been having, and the wiki hasn't had much to say about them.
>
> 1) As an example, searching for "binarydata_groupdocument_fk" returns
> nothing, while searching for "BinaryData_GroupDocument_FK" returns
> results. I have the lowercasefilterfactory applied to both the index
> and query analyzers. Does this not actually set everything to lower
> case? From the wiki at
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters, it says
> "Creates tokens by lowercasing all letters and dropping non-letters"
> but that does not seem to be happening here. Am I forgetting to
> configure something?

Did you re-index?

> 2) Some of our data is one sentence. Some is over 5 MB of text. When
> searching for a term, it's returning the one sentence data first
> because the fieldNorm is so different (0.4 for one, 0.002 for others).
> Is there a way to disable using the fieldnorm in the score
> calculation?

It's probably Lucene's default length normalization over-emphasizing
short fields. You could use a better similarity for your data, or
turn off length normalization by setting omitNorms="true" for that field
in the schema and then re-indexing (make sure to delete the old index
entirely first).

> An alternative I tried was posting parts of the data in
> as different values of the field (so having multiple tags of that
> field-name in the add xml post), but that appeared to have zero effect
> on the results - even the querydebugger showed the exact same
> calculation for the search. Does anyone know how to disable the
> fieldnorm, or have the score created from adding the scores from each
> value of a multivalued field?
>
> 3) I discovered that searching for '"certificate not found"' (using
> the double quotes for a phrase here) did not return any results, even
> though the phrase did exist (and was lower case originally too, so
> different than my first issue). I discovered it was because of the
> stopword "not", but the same stopfilterfactory was applied to both the
> index and query analyzers. Am I doing something wrong there? As a
> workaround I'm having php manually removing stopwords from the
> querystring, which is a real pain. I'm thinking my filters aren't being
> applied correctly since this is similar to issue #1 but with a different
> filter.

Hmmm, looks like a recent change in lucene probably causes this bug.
Could you open a new Solr JIRA issue to report this bug?

-Yonik
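
P.S. A rough illustration of why the fieldNorm values in (2) are so far apart. This is only a sketch in Python, not Solr or Lucene code: it assumes the classic default length normalization of 1/sqrt(number of terms in the field), and the term counts below are made-up numbers chosen to land near the 0.4 and 0.002 reported above. The function name length_norm is hypothetical. The norm Lucene actually stores is also rounded into a single byte, so real values will differ slightly.

```python
import math


def length_norm(num_terms: int) -> float:
    """Classic default length normalization: 1 / sqrt(number of terms)."""
    return 1.0 / math.sqrt(num_terms)


# Hypothetical term counts, not taken from the real ticket data.
short_ticket_terms = 6        # a one-sentence ticket
long_ticket_terms = 250_000   # a multi-megabyte ticket

short_norm = length_norm(short_ticket_terms)   # roughly 0.41
long_norm = length_norm(long_ticket_terms)     # 0.002

# All else being equal, the short field's score is multiplied by a
# factor about this many times larger than the long field's.
advantage = short_norm / long_norm

print(f"short field norm: {short_norm:.3f}")
print(f"long field norm:  {long_norm:.3f}")
print(f"short field advantage: about {advantage:.0f}x")
```

With omitNorms="true" this factor is effectively 1.0 for every document, so document length no longer influences the score for that field.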