A small addition to my earlier post. I wonder if it's because of the 'mm'
param, which requires that for search phrases of up to 3 words, all the words
must match. If I alter this now, I'd get irrelevant results for a
lot of popular 1, 2, and 3 word search terms. How can I solve this?
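
To make the 'mm' behaviour concrete, here is a rough Python sketch of how I understand a spec like the one in my handler to be evaluated. The function name `min_should_match` is made up for illustration, and this is my reading of the documented rules, not Solr's own code, so treat it as an assumption to check against the real thing.

```python
def min_should_match(clause_count, spec):
    """Return how many optional clauses must match (illustrative only, not Solr code)."""
    spec = spec.strip()
    result = clause_count
    if "<" in spec:
        # Conditional spec: "N<V" means V applies only when there are more than N clauses.
        for part in spec.split():
            bound, value = part.split("<", 1)
            if clause_count <= int(bound):
                return result
            result = min_should_match(clause_count, value)
        return result
    if spec.endswith("%"):
        # Percentage of the clause count, truncated toward zero.
        calc = int(clause_count * int(spec[:-1]) / 100)
    else:
        calc = int(spec)
    # A negative value means "all but that many" clauses.
    result = clause_count + calc if calc < 0 else calc
    return max(0, min(clause_count, result))


MM = "3<-1 4<-1 5<-1 6<90%"  # the spec from my request handler
for n in range(1, 9):
    print(n, "clauses ->", min_should_match(n, MM), "must match")
```

Under this reading, queries of 1 to 3 clauses need every clause to match. So if query-time synonym expansion adds clauses that all count as required, a short query such as dui california could plausibly end up with no matches, which would fit what I'm seeing.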

anuvenk wrote:
> 
> I tried adding some city to state mappings in the synonyms file. I'm using
> the dismax handler for phrase matching. As I add more and more
> city to state mappings, I end up with zero results for state based
> searches.
> Eg: ca,california,los angeles
>      ca,california,san diego
>      ca,california,san francisco
>      ca,california,burbank    and so on....
> Now a city based search returns a few other california results, but a state
> based search like dui california returns zero results.
> I checked the parsedquery_toString and I see no 'OR', although the default
> operator is 'OR' in the schema. It looks like it's trying to find matches for
> all those cities, as they are mapped to 'california', and hence returns zero
> results. How do I force dismax to use 'OR' and not 'AND' even though the
> schema has 'OR'?
> Or is this how dismax works? Can someone explain how to overcome this
> problem?
> Here is my custom request handler that extends dismax:
> <requestHandler name="qfacet" class="solr.DisMaxRequestHandler" >
>     <lst name="defaults">
>      <str name="echoParams">explicit</str>
>      <float name="tie">0.01</float>
>      <str name="qf">name^2.0 text^0.8</str>
>      <!-- up to 3 clauses: all must match; 4: 3 must match; 5: 4 must match;
>           6: 5 must match; above 6: 90% must match -->
>      <str name="mm">3&lt;-1 4&lt;-1 5&lt;-1 6&lt;90%</str>
>      <str name="pf">
>          text^0.8 name^2.0
>      </str>
>      <int name="qs">4</int>
>      <int name="ps">4</int>
>      <str name="fl">
>              *,score
>      </str>  
> 
>     </lst>
>     <lst name="invariants">
>       <!--<str name="facet.field">resourceType</str>
>       <str name="facet.field">category</str>
>       <str name="facet.field">stateName</str>-->
>       <str name="facet.sort">false</str>
>       <int name="facet.mincount">1</int>
>     </lst>
>   </requestHandler>
> 
> Thanks.
> 
> 
> 
> Otis Gospodnetic wrote:
>> 
>> 
>> Hello,
>> 
>> 300K is a pretty small index.  I wouldn't worry about the number of
>> synonyms unless you are turning a single term into dozens of ORed terms.
>> 
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> 
>> 
>> 
>> ----- Original Message ----
>>> From: anuvenk <anuvenkat...@hotmail.com>
>>> To: solr-user@lucene.apache.org
>>> Sent: Tuesday, June 2, 2009 11:28:43 PM
>>> Subject: Re: Is there Downside to a huge synonyms file?
>>> 
>>> 
>>> I'm using query time synonyms. I have more fields in my index, though;
>>> this is just a sample of data from my index. Yes, we don't have millions
>>> of documents. It could be around 300,000 and might increase in future. The
>>> reason I'm using query time synonyms is the nature of my data: I
>>> can't re-index the data every time I add or remove a synonym. But for this
>>> particular requirement, is it best to have index time synonyms because of
>>> the multi-word nature of the synonyms? Again, if I add more cities to the
>>> synonym file, I can't be re-indexing all the data over and over again.
>>> 
>>> 
>>> 
>>> anuvenk wrote:
>>> > 
>>> > In my index I have legal FAQs, forms, legal videos etc. with a state
>>> > field for each resource.
>>> > Now if I search for real estate san diego, I want to be able to return
>>> > other 'california' results, i.e. results from san francisco.
>>> > I have the following fields in the index:
>>> > 
>>> > title                            state       description...
>>> > real estate san diego example 1  california  some description
>>> > real estate carlsbad example 2   california  some desc
>>> > 
>>> > So when I search for real estate san francisco, since there is no
>>> > match, I want to be able to return the other real estate results in
>>> > california instead of returning none, because sometimes they might be
>>> > searching for a real estate form and the city probably doesn't matter.
>>> > 
>>> > I have two things in mind. One is adding a synonym mapping
>>> > san diego, california
>>> > carlsbad, california
>>> > san francisco, california
>>> > 
>>> > (which probably isn't the best way)
>>> > hoping that a search for san francisco real estate would map san
>>> > francisco to california and hence return the other two california results
>>> > 
>>> > OR
>>> > 
>>> > adding the mapping of city to state in the index itself like..
>>> > 
>>> > title                       state       city                                description...
>>> > real estate san diego eg 1  california  carlsbad, san francisco, san diego  some description
>>> > real estate carlsbad eg 2   california  carlsbad, san francisco, san diego  some description
>>> > 
>>> > Which of the above two is better? Does a huge synonym file affect
>>> > performance? Or is there an even better way? I'm sure there is, but I
>>> > can't put my finger on it yet, and I'm not familiar with Java either.
>>> > 
>>> > 
>>> 
>>> -- 
>>> View this message in context: 
>>> http://www.nabble.com/Is-there-Downside-to-a-huge-synonyms-file--tp23842527p23844761.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 
>> 
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Is-there-Downside-to-a-huge-synonyms-file--tp23842527p23861649.html
Sent from the Solr - User mailing list archive at Nabble.com.
