Re: Solr Basic Configuration - Highlight - Begginer

Evert R. Thu, 17 Dec 2015 02:55:16 -0800

Hello Erick,

Sorry for my mistakes. Here is everything I have so far:


1. The query returns the document correctly, but the highlighting section comes back empty, as shown below:
{

  "responseHeader":{
    "status":0,
    "QTime":15,
    "params":{
      "q":"text:nietava",
      "debug":"query",
      "hl":"true",
      "hl.simple.post":"</em>",
      "indent":"true",
      "fq":"id:pdf1",
      "hl.fl":"text",
      "wt":"json",
      "hl.simple.pre":"<em>"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"pdf1",
        "last_modified":"2011-07-28T20:39:26Z",
        "title":["Microsoft Word - André Luiz - Sexo e Destino _Chico e Waldo_.doc"],
        "content_type":["application/pdf"],
        "author":"Wander",
        "author_s":"Wander",
        "content":["André Luiz - Sexo e Destino _Chico e Waldo_.doc ***the whole content*** nietava"],

        "_version_":1520765393269948416}]
  },
  *"highlighting":{
    "pdf1":{***I THINK THE SNIPPETS OF TEXT SHOULD BE IN HERE, RIGHT?***}},*
  "debug":{
    "rawquerystring":"text:nietava",
    "querystring":"text:nietava",
    "parsedquery":"text:nietava",
    "parsedquery_toString":"text:nietava",
    "QParser":"LuceneQParser",
    "filter_queries":["id:pdf1"],

    "parsed_filter_queries":["id:pdf1"]}}

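As a rough illustration of how a response like the one above can be checked, the sketch below lists the matched documents that came back without snippets and the field names that appear in the debug output. The helper names `empty_highlights` and `searched_fields` are hypothetical, the sample dictionary is a trimmed copy of the response above, and the regular expression only handles simple `field:term` queries.

```python
import re


def empty_highlights(response):
    """Return ids of matched documents whose highlighting entry has no snippets."""
    highlighting = response.get("highlighting", {})
    doc_ids = [doc["id"] for doc in response["response"]["docs"]]
    return [doc_id for doc_id in doc_ids if not highlighting.get(doc_id)]


def searched_fields(response):
    """Extract field names from the debug 'parsedquery' string.

    Assumption: the parsed query is a plain list of field:term clauses.
    """
    parsed = response.get("debug", {}).get("parsedquery", "")
    return sorted(set(re.findall(r"(\w+):", parsed)))


# Trimmed copy of the response shown above.
sample = {
    "response": {"numFound": 1, "start": 0, "docs": [{"id": "pdf1"}]},
    "highlighting": {"pdf1": {}},
    "debug": {"parsedquery": "text:nietava"},
}

print(empty_highlights(sample))  # ['pdf1']
print(searched_fields(sample))   # ['text']
```

Here the document matched on the `text` field, yet its highlighting entry is empty, which is exactly the symptom described.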

2. Here are my settings:

In schema.xml:

<field name="text" type="text_general" indexed="true" stored="true"
multiValued="true"/>

<fieldType name="text_general" class="solr.TextField"
positionIncrementGap="100">

      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory"
synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
</fieldType>
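Since the highlighted field has to be stored, and the FastVectorHighlighter additionally needs term vectors with positions and offsets, a quick sanity check of a field definition could look like the sketch below. The helper `highlight_problems` is hypothetical, the `<schema>` wrapper is added only so the snippet parses, and the check looks only at attributes on the `<field>` element itself (anything inherited from the fieldType is ignored).

```python
import xml.etree.ElementTree as ET

# The field definition from above, wrapped in a root element for parsing.
SCHEMA_SNIPPET = """
<schema>
  <field name="text" type="text_general" indexed="true" stored="true"
         multiValued="true"/>
</schema>
"""


def highlight_problems(schema_xml, field_name, fast_vector=False):
    """Return a list of reasons why highlighting on field_name may yield nothing."""
    root = ET.fromstring(schema_xml)
    field = next((f for f in root.iter("field") if f.get("name") == field_name), None)
    if field is None:
        return ["field '%s' is not defined" % field_name]
    problems = []
    # Only an explicit stored="false" is flagged here.
    if field.get("stored", "true") != "true":
        problems.append("stored is not true")
    if fast_vector:
        # Assumption: these must be set on the field itself.
        for attr in ("termVectors", "termPositions", "termOffsets"):
            if field.get(attr) != "true":
                problems.append(attr + " is not true")
    return problems


print(highlight_problems(SCHEMA_SNIPPET, "text"))
print(highlight_problems(SCHEMA_SNIPPET, "text", fast_vector=True))
```

The first call reports no problems for the standard highlighter; the second lists the three term-vector attributes that this field definition lacks.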

In solrconfig.xml:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <bool name="preferLocalShards">false</bool>
  </lst>

I have tried:

schema.xml:   <field name="text" type="text_general" indexed="true"
stored="true" multiValued="true"/>

schema.xml:   <field name="text" type="text_general" indexed="true"
stored="true" multiValued="true"  termVectors="true"
termOffsets="true" termPositions="true"/>

schema.xml:
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" />
<filter class="solr.WordDelimiterFilterFactory" catenateAll="1"
preserveOriginal="1" generateNumberParts="0" generateWordParts="0" />
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.ApostropheFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" catenateAll="1"
preserveOriginal="1" generateWordParts="0" />
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ApostropheFilterFactory"/>
</analyzer>

solrconfig.xml:

                        <str name="df">text</str>
                        <str name="hl">on</str>
                        <str name="hl.fl">text</str>
                        <str name="hl.useFastVectorHighlighter">true</str>
                        <str name="hl.snippets">100</str>
                        <str name="hl.tag.pre">&lt;b&gt;</str>
                        <str name="hl.tag.post">&lt;/b&gt;</str>
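For comparison, the same highlighting options can be passed at request time instead of as handler defaults. The sketch below builds such a URL; `highlight_url` is a hypothetical helper, and it assumes (as I understand the Solr 5.x highlighting parameters) that `hl.simple.pre`/`hl.simple.post` apply to the standard highlighter while `hl.tag.pre`/`hl.tag.post` apply when `hl.useFastVectorHighlighter` is on.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def highlight_url(base, query, field, fast_vector=False, fq=None):
    """Build a /select URL that highlights `field` for `query`."""
    params = {"q": query, "wt": "json", "indent": "true",
              "hl": "true", "hl.fl": field}
    if fq:
        params["fq"] = fq
    if fast_vector:
        # FastVectorHighlighter: the field needs term vectors, positions, offsets.
        params["hl.useFastVectorHighlighter"] = "true"
        params["hl.tag.pre"] = "<b>"
        params["hl.tag.post"] = "</b>"
    else:
        # Standard highlighter: the field only needs to be stored.
        params["hl.simple.pre"] = "<em>"
        params["hl.simple.post"] = "</em>"
    # urlencode percent-escapes the angle brackets and the colon.
    return base + "?" + urlencode(params)


url = highlight_url("http://localhost:8983/solr/techproducts/select",
                    "text:nietava", "text", fq="id:pdf1")
print(url)
# Round-trip the query string to confirm what Solr would receive.
print(parse_qs(urlsplit(url).query)["hl.simple.pre"])
```

Mixing the two families (for example `hl.tag.pre` without the FastVectorHighlighter) should not cause an error, but the tags would simply fall back to the defaults of whichever highlighter is active.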

The debug output is included in the response above.


I am still using the standard techproducts example.


I hope this is complete enough.


Thanks again!



*Evert*

2015-12-17 2:01 GMT-02:00 Erick Erickson <erickerick...@gmail.com>:

> bq: but when highlight, using the text field...nothing comes up...
>
>
> http://localhost:8983/solr/techproducts/select?q=text:nietava&fq=id:pdf1&wt=json&indent=true&hl=true&hl.fl=text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
>
> It's unclear what this means. No results showed up (i.e. numFound==0)
> or no highlighting showed up? Assuming that
> 1> the "text" field has stored=true and
> 2> you find documents when searching on the "text" field
> the above should show something in the highlights section.
>
> Please take the time to provide complete details. Guessing what you're
> doing is wasting time, mine and yours. Once more:
> 1> what is the schema definition for the "text" field. Include the
> fieldType definition
> 2> What is the result of adding &debug=query to the request when you
> don't get highlights
>
> You might review: http://wiki.apache.org/solr/UsingMailingLists
> because it's becoming quite frustrating that you give us little bits
> of information that leave us guessing what you're _really_ doing.
> Highlighting is working for lots of people in lots of sites, it's not
> likely that this functionality is completely broken so the answer will
> be in the docs.
>
> Best,
> Erick
>
> On Wed, Dec 16, 2015 at 5:54 PM, Evert R. <evert.ra...@gmail.com> wrote:
> > Hi Erick and Teague,
> >
> >
> > I found that when using the field 'text' it shows the pdf file result
> > id:pdf1 in this case, like:
> >
> > http://localhost:8983/solr/techproducts/select?fq=id:pdf1&q=nietava
> >
> > but when highlight, using the text field...nothing comes up...
> >
> >
> http://localhost:8983/solr/techproducts/select?q=text:nietava&fq=id:pdf1&wt=json&indent=true&hl=true&hl.fl=text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
> >
> > or even with the option
> >
> > f.text.hl.snippets=2 under the hl.fl field.
> >
> >
> > I tried as well with the standard configuration, did it all over,
> > reindexed a couple of times... and it still did not work.
> >
> > Also,
> >
> > Using the Analysis, it brings below information:
> >
> > Stage  text     raw_bytes               start  end  positionLength  type        position
> > ST     nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1
> > SF     nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1
> > LCF    nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1
> >
> >
> > The type is alphanumeric, I think... so it's a 'string', right? Would that
> > be a problem? Should there be some other indication?
> >
> >
> > Thanks again!
> >
> >
> > *Evert*
> >
> > 2015-12-16 21:09 GMT-02:00 Erick Erickson <erickerick...@gmail.com>:
> >
> >> I think you're still missing the critical bit. Highlighting is
> >> completely separate from searching. In other words, you can search on
> >> one field and highlight another. What field is searched is governed by
> >> the "qf" parameter when using edismax and by the "df" parameter
> >> configured in your request handler in solrconfig.xml. These defaults
> >> are overridden when you do a "fielded search" like
> >>
> >> q=content:nietava
> >>
> >> So this: q=content:nietava&hl=true&hl.fl=content
> >> is searching the "content" field. The word you're looking for isn't in
> >> the content field so naturally no docs are returned. And no
> >> highlighting either.
> >>
> >> This: q=nietava&hl=true&hl.fl=content
> >>
> >> is searching somewhere else, thus getting the hit. We already know
> >> that "nietava" is not in the content field because the first search
> >> failed. You need to find out what field is being matched (probably
> >> something like "text") and then try highlighting on _that_ field. Try
> >> adding "debug=query" to the URL and look at the "parsedquery" section
> >> of the return and you'll see what field(s) is/are actually being
> >> searched against.
> >>
> >> NOTE: The field you highlight on _must_ have stored="true" in
> >> schema.xml.
> >>
> >> As to why "nietava" isn't being found in the content field, probably
> >> you have some kind of analysis chain configured for that field that
> >> isn't searching as you expect. See the admin/analysis page for some
> >> insight into why that would be. The most frequent reason is that the
> >> field is a "string" type which is not broken up into words. Another
> >> possibility is that your analysis chain is leaving in the quotes or
> >> something similar. As James says, looking at admin/analysis is a good
> >> way to figure this out.
> >>
> >> I still strongly recommend you go from the stock techproducts example
> >> and get familiar with how Solr (and highlighting) work before jumping
> >> in and changing things. There are a number of ways things can be
> >> mis-configured and trying to change several things at once is a fine
> >> way to go mad. The admin UI>>schema browser is another way you can see
> >> what kind of terms are _actually_ in your index in a particular field.
> >>
> >> Best,
> >> Erick
> >>
> >>
> >>
> >>
> >> On Wed, Dec 16, 2015 at 12:26 PM, Teague James <teag...@insystechinc.com> wrote:
> >> > Sorry to hear that didn't work! Let me ask a couple of questions...
> >> >
> >> > Have you tried the analyzer inside of the Admin Interface? It has helped
> >> > me sort out a number of highlighting issues in the past. To access it, go
> >> > to your Admin interface, select your core, then select Analysis from the
> >> > list of options on the left. In the analyzer, enter the term you are
> >> > indexing in the top left (in other words the term in the document you are
> >> > indexing that you expect to get a hit on) and right input fields. Select
> >> > the field that it is destined for (in your case that would be 'content'),
> >> > then hit analyze. Helps if you have a big screen!
> >> >
> >> > This will show you the impact of the various filter factories that you
> >> > have engaged and their effect on whether or not a 'hit' is being generated.
> >> > Hits are identified by a very faint highlight. (PSST... Developers... It
> >> > would be really cool if the highlight color were more visible or
> >> > customizable... Thanks y'all) If it looks like you're getting hits, but not
> >> > getting highlighting, then open up a new tab with the Admin's query
> >> > interface. Same place on the left as the analyzer. Replace the "*:*" with
> >> > your search term (assuming you already indexed your document) and if
> >> > necessary you can put something in the FQ like "id:123456" to target a
> >> > specific record.
> >> >
> >> > Did you get a hit? If no, then it's not highlighting that's the issue.
> >> > If yes, then try dumping this in your address bar (using your URL/IP,
> >> > search term, and core name of course. The fq= is an example):
> >> > http://[URL/IP]/solr/[CORE-NAME]/select?fq=id:123456&q="[SEARCH-TERM]"
> >> >
> >> > That will dump Solr's output to your browser where you can see exactly
> >> > what is getting hit.
> >> >
> >> > Hope that helps! Let me know how it goes. Good luck.
> >> >
> >> > -Teague
> >> >
> >> > -----Original Message-----
> >> > From: Evert R. [mailto:evert.ra...@gmail.com]
> >> > Sent: Wednesday, December 16, 2015 1:46 PM
> >> > To: solr-user <solr-user@lucene.apache.org>
> >> > Subject: Re: Solr Basic Configuration - Highlight - Begginer
> >> >
> >> > Hi Teague!
> >> >
> >> > I configured solrconfig.xml and schema.xml exactly the way you did,
> >> > only substituting 'content' (used by the techproducts sample) for
> >> > 'documentText', then reindexed with:
> >> >
> >> >  curl 'http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true' \
> >> >    -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"
> >> >
> >> > with the same result.... no highlight in the response, as below:
> >> >
> >> > "highlighting": { "pdf1": {} }
> >> >
> >> > =(
> >> >
> >> > Really... do not know what to do...
> >> >
> >> > Thanks for your time. If you have any more suggestions about where I could
> >> > be missing something... please let me know.
> >> >
> >> >
> >> > Best regards,
> >> >
> >> > *Evert*
> >> >
> >> > 2015-12-16 15:30 GMT-02:00 Teague James <teag...@insystechinc.com>:
> >> >
> >> >> Hi Evert,
> >> >>
> >> >> I recently needed help with phrase highlighting and was pointed to the
> >> >> FastVectorHighlighter which worked out great. I just made a change to
> >> >> the configuration to add generateWordParts="0" and
> >> >> generateNumberParts="0" so that searches for things like "1a" would
> >> >> get highlighted correctly. You may or may not need that feature. You
> >> >> can always remove them or change the value to "1" to switch them on
> >> >> explicitly. Anyway, hope this helps!
> >> >>
> >> >> solrconfig.xml (partial snip)
> >> >> <requestHandler name="/select" class="solr.SearchHandler">
> >> >>                 <lst name="defaults">
> >> >>                         <str name="wt">xml</str>
> >> >>                         <str name="echoParams">explicit</str>
> >> >>                         <int name="rows">10</int>
> >> >>                         <str name="df">documentText</str>
> >> >>                         <str name="hl">on</str>
> >> >>                         <str name="hl.fl">text</str>
> >> >>                         <str name="hl.useFastVectorHighlighter">true</str>
> >> >>                         <str name="hl.snippets">100</str>
> >> >>                         <str name="hl.tag.pre">&lt;b&gt;</str>
> >> >>                         <str name="hl.tag.post">&lt;/b&gt;</str>
> >> >>                 </lst>
> >> >> </requestHandler>
> >> >>
> >> >> schema.xml (partial snip)
> >> >>    <field name="id" type="string" indexed="true" stored="true"
> >> >> required="true" multiValued="false" />
> >> >>    <field name="documentText" type="text_general" indexed="true"
> >> >> multiValued="true" termVectors="true" termOffsets="true"
> >> >> termPositions="true" />
> >> >>
> >> >> <fieldType name="text_general" class="solr.TextField"
> >> >> positionIncrementGap="100">
> >> >>         <analyzer type="index">
> >> >>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> >>                 <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> >> words="stopwords.txt" />
> >> >>                 <filter class="solr.WordDelimiterFilterFactory"
> >> >> catenateAll="1" preserveOriginal="1" generateNumberParts="0"
> >> >> generateWordParts="0" />
> >> >>                 <filter class="solr.SynonymFilterFactory"
> >> >> synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
> >> >>                 <filter class="solr.LowerCaseFilterFactory"/>
> >> >>                 <filter class="solr.PorterStemFilterFactory"/>
> >> >>                 <filter class="solr.ApostropheFilterFactory"/>
> >> >>         </analyzer>
> >> >>         <analyzer type="query">
> >> >>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> >>                 <filter class="solr.WordDelimiterFilterFactory"
> >> >> catenateAll="1" preserveOriginal="1" generateWordParts="0" />
> >> >>                 <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> >> words="stopwords.txt" />
> >> >>                 <filter class="solr.LowerCaseFilterFactory"/>
> >> >>                 <filter class="solr.ApostropheFilterFactory"/>
> >> >>         </analyzer>
> >> >> </fieldType>
> >> >>
> >> >> -Teague
> >> >>
> >> >> From: Evert R. [mailto:evert.ra...@gmail.com]
> >> >> Sent: Tuesday, December 15, 2015 6:25 AM
> >> >> To: solr-user@lucene.apache.org
> >> >> Subject: Solr Basic Configuration - Highlight - Begginer
> >> >>
> >> >> Hi there!
> >> >>
> >> >> It's my first installation, and I am not sure if this is the right channel...
> >> >>
> >> >> Here are my steps:
> >> >>
> >> >> 1. Set up a basic install of solr 5.4.0
> >> >>
> >> >> 2. Create a new core through command line (bin/solr create -c test)
> >> >>
> >> >> 3. Post 2 files: 1 .docx and 2 .pdf (bin/post -c test /docs/test/)
> >> >>
> >> >> 4. Query through the browser. It returns the correct results, but it
> >> >> does not show the part of the text that matches my query (the highlight).
> >> >>
> >> >>   I have already enabled the 'hl' option, but it still does not work...
> >> >>
> >> >> Example: I am looking for the word 'peace' in my pdf file (a book). I
> >> >> have 4 matches for this word. It shows me the book name (pdf file) but
> >> >> does not show which parts of the text contain the word 'peace'.
> >> >>
> >> >>
> >> >> I am probably missing some configuration in schema.xml, which is
> >> >> missing from my folder.... /solr/server/solr/test/conf/
> >> >>
> >> >> Or even the solrconfig.xml...
> >> >>
> >> >> I have read a bunch of things about highlighting, checked these files,
> >> >> and copied the standard schema.xml to my core/conf folder, but still it
> >> >> does not return the highlight.
> >> >>
> >> >>
> >> >> Attached is a copy of my solrconfig.xml file.
> >> >>
> >> >>
> >> >> I am very sorry for this probably dumb and too basic question...
> >> >> This is the first time I have seen Solr live.
> >> >>
> >> >>
> >> >> Any help will be appreciated.
> >> >>
> >> >>
> >> >>
> >> >> Best regards,
> >> >>
> >> >>
> >> >> Evert Ramos
> >> >>
> >> >> mailto:evert.ra...@gmail.com
> >> >>
> >> >>
> >> >>
> >> >
> >>
>
