Re: had query regarding the indexing and analysers

Rohan Thakur Wed, 20 Mar 2013 05:40:11 -0700

hi jack

I was using text_en_splitting initially, but it changes my query as well.
For example, if I search for the term "ace", it is taken as "ac", thus giving
"split ac" a higher score.
See the debug statement:


"debug":{
    "rawquerystring":"ace",
    "querystring":"ace",
    "parsedquery":"(+DisjunctionMaxQuery((title:ac^30.0)))/no_coord",
    "parsedquery_toString":"+(title:ac^30.0)",
    "explain":{
      "":"\n1.8650155 = (MATCH) weight(title:ac^30.0 in 469)
[DefaultSimilarity], result of:\n  1.8650155 = fieldWeight in 469,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
0.4375 = fieldNorm(doc=469)\n",
      "":"\n1.8650155 = (MATCH) weight(title:ac^30.0 in 470)
[DefaultSimilarity], result of:\n  1.8650155 = fieldWeight in 470,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
0.4375 = fieldNorm(doc=470)\n",
      "":"\n1.8650155 = (MATCH) weight(title:ac^30.0 in 471)
[DefaultSimilarity], result of:\n  1.8650155 = fieldWeight in 471,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
0.4375 = fieldNorm(doc=471)\n",
      "":"\n1.8650155 = (MATCH) weight(title:ac^30.0 in 472)
[DefaultSimilarity], result of:\n  1.8650155 = fieldWeight in 472,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
0.4375 = fieldNorm(doc=472)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 331)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 331,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=331)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 332)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 332,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=332)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 335)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 335,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=335)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 336)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 336,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=336)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 337)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 337,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=337)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 393)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 393,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=393)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 425)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 425,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=425)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 426)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 426,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=426)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 429)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 429,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=429)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 430)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 430,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=430)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 431)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 431,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=431)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 433)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 433,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=433)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 434)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 434,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=434)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 502)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 502,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=502)\n",
      "":"\n1.332154 = (MATCH) weight(title:ac^30.0 in 411)
[DefaultSimilarity], result of:\n  1.332154 = fieldWeight in 411,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
0.3125 = fieldNorm(doc=411)\n",
      "":"\n1.332154 = (MATCH) weight(title:ac^30.0 in 424)
[DefaultSimilarity], result of:\n  1.332154 = fieldWeight in 424,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
0.3125 = fieldNorm(doc=424)\n"},
    "QParser":"ExtendedDismaxQParser",

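As a cross-check on the explain output above, the scores can be reproduced by hand. This is only a minimal sketch, assuming the classic Lucene DefaultSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are hypothetical and not part of Solr.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed DefaultSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)
    tf = math.sqrt(freq)
    return tf * idf(doc_freq, max_docs) * field_norm

# Values copied from the explain output: docFreq=39, maxDocs=1045, freq=1.0
for norm in (0.4375, 0.375, 0.3125):
    print(norm, field_weight(1.0, 39, 1045, norm))
```

Every hit has the same tf and idf for the stemmed term "ac", so the ordering among these documents comes only from fieldNorm, which as far as I know mainly reflects the length of the title field.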


On Tue, Mar 19, 2013 at 7:37 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> Yeah, one ambiguity in typography is whether a hyphen is internal to a
> compound term (e.g., "CD-ROM") or a phrase separator as in your case. Some
> people are careful to put spaces around the hyphen for a phrase delimiter,
> but plenty of people still just drop it in directly adjacent to two words.
>
> In your case, text_en_splitting_tight is SPECIFICALLY trying to keep
> "Laptop-DUAL" together as a single term, so that "wi fi" is kept distinct
> from "Wi-Fi".
>
> Try text_en_splitting, which specifically is NOT trying to keep them
> together.
>
> The key clue here is that the former does not have generateWordParts="1".
> That is the option that is needed so that "Laptop-DUAL" will be indexed as
> "laptop dual".
>
> -- Jack Krupansky
>
> -----Original Message----- From: Rohan Thakur
> Sent: Tuesday, March 19, 2013 3:35 AM
> To: solr-user@lucene.apache.org
> Subject: Re: had query regarding the indexing and analysers
>
>
> My default field is title only. I have used debug as well; it shows that Solr
> divides the query into "dual" and "core" and then searches for both
> separately. While calculating the scores, it ranks higher the documents in
> which both terms appear. In my case, for the document containing this title:
>
> Wipro  7710U Laptop-DUAL CORE 1.4 Ghz-120GB HDD
>
> Solr has found only the term "core", not "dual". I guess "dual" is attached
> to the term "laptop", because even searching for "dual" alone does not
> return this document. That is why this document shows up low in the search
> results. So I am not able to search for partial terms; for that I have to
> use *dual in the query, and then this document is found, but putting * in
> the query terms affects the scoring of other searches. I think I have to
> remove the "-" characters from the strings before indexing them. Please
> point out if I am wrong anywhere.
>
> thanks
> regards
> Rohan
>
>
> On Sat, Mar 16, 2013 at 7:02 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> See admin/analysis, it's invaluable. Probably the terms are being searched
>> against your default text field, which I'd guess is not "title".
>>
>> Also, try adding &debug=all to your query and look in the debug info at the
>> parsed form of the query to see what's actually being searched.
>>
>> Best
>> Erick
>>
>>
>> On Fri, Mar 15, 2013 at 2:52 AM, Rohan Thakur <rohan.i...@gmail.com>
>> wrote:
>>
>> > hi all
>> >
>> > I wanted to ask: I have this string in the field title:
>> >
>> > Wipro  7710U Laptop-DUAL CORE 1.4 Ghz-120GB HDD
>> >
>> > I have indexed it using text_en_splitting_tight,
>> >
>> >
>> > and now I am searching for terms like q=dual core
>> >
>> > but in the relevance ranking this title comes lower in the order, as
>> > Solr is not matching "dual" in this string; it only matches the term
>> > "core" from the query, thus multiplying the score for this field by 1/2
>> > and decreasing the score.
>> >
>> > How can I correct this? Can anyone help?
>> >
>> > thanks
>> > regards
>> > Rohan
>> >
>>
>>
>

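To illustrate the generateWordParts point from Jack's reply, here is a toy sketch of the two behaviours. It is not Solr's WordDelimiterFilter: it only splits on whitespace and hyphens and lowercases, and it assumes that the "tight" variant ends up with one joined token for "Laptop-DUAL". The function name analyze and its flag are hypothetical.

```python
def analyze(text, generate_word_parts):
    """Toy stand-in for a word-delimiter plus lowercase analysis chain."""
    tokens = []
    for chunk in text.split():
        parts = [p for p in chunk.split("-") if p]
        if generate_word_parts:
            # Splitting behaviour: "Laptop-DUAL" -> "laptop", "dual"
            tokens.extend(p.lower() for p in parts)
        else:
            # Tight behaviour (assumed): "Laptop-DUAL" -> "laptopdual"
            tokens.append("".join(parts).lower())
    return tokens

title = "Wipro  7710U Laptop-DUAL CORE 1.4 Ghz-120GB HDD"
tight = analyze(title, generate_word_parts=False)
split = analyze(title, generate_word_parts=True)

# A query term can only match a token that exists in the index.
print("dual" in tight, "dual" in split)
```

With the joined token, a search for "dual" has nothing to match in this title, which is consistent with only "core" contributing to the score. The real token output for a field type can be checked on the admin/analysis page mentioned earlier in the thread.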