Hi Erick and Teague,

I found that when searching on the 'text' field, the PDF document is returned
(id:pdf1 in this case), like:

http://localhost:8983/solr/techproducts/select?fq=id:pdf1&q=nietava

but when highlighting on the text field, nothing comes up:

http://localhost:8983/solr/techproducts/select?q=text:nietava&fq=id:pdf1&wt=json&indent=true&hl=true&hl.fl=text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E

or even with the option

f.text.hl.snippets=2 under the hl.fl field.


I also tried with the standard configuration, redid everything, and reindexed
a couple of times... and it still did not work.
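
For reference, here is a minimal Python sketch of the two requests above, in case it helps to reproduce this. The helper names (search_url, highlight_url, has_snippets) are made up for illustration and are not part of Solr. The debug=query parameter is the one Erick suggested in the message quoted below. The sketch only builds the URLs and inspects an already-parsed JSON response, so nothing here contacts Solr by itself.

```python
from urllib.parse import urlencode

# Assumption: the stock techproducts example on the default port.
SOLR_SELECT = "http://localhost:8983/solr/techproducts/select"


def search_url(term, doc_id):
    """Plain search: no field prefix, so the handler's default field is used."""
    params = {"q": term, "fq": f"id:{doc_id}", "wt": "json", "debug": "query"}
    return SOLR_SELECT + "?" + urlencode(params)


def highlight_url(term, doc_id, field="text", snippets=2):
    """Fielded search on `field`, asking for highlights on the same field."""
    params = {
        "q": f"{field}:{term}",
        "fq": f"id:{doc_id}",
        "wt": "json",
        "indent": "true",
        "hl": "true",
        "hl.fl": field,
        "hl.simple.pre": "<em>",
        "hl.simple.post": "</em>",
        f"f.{field}.hl.snippets": str(snippets),
    }
    return SOLR_SELECT + "?" + urlencode(params)


def has_snippets(response, doc_id, field="text"):
    """True if the parsed JSON response holds at least one snippet for `field`."""
    return bool(response.get("highlighting", {}).get(doc_id, {}).get(field))
```

With the response shown further down in this thread ("highlighting": {"pdf1": {}}), has_snippets(response, "pdf1") returns False, which is exactly the symptom described here.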

Also,

Using the Analysis, it brings below information:

ST
text     raw_bytes               start  end  positionLength  type        position
nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1

SF
text     raw_bytes               start  end  positionLength  type        position
nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1

LCF
text     raw_bytes               start  end  positionLength  type        position
nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1

Alphanumeric, I think... so it's 'string', right? Would that be a problem?
Should there be some other indication?
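
As far as I can tell, the "type" column in the Analysis output is the token type reported by the tokenizer, not the field type from schema.xml. One way to look at the field definition itself would be the Schema API. The sketch below is only an illustration under that assumption: the helper names (field_url, can_highlight, fetch_field) are made up, and it assumes the core exposes /schema/fields/<name> and returns the definition under a "field" key.

```python
import json
from urllib.request import urlopen


def field_url(core, field, base="http://localhost:8983/solr"):
    """URL of the Schema API entry for one field (showDefaults makes flags explicit)."""
    return f"{base}/{core}/schema/fields/{field}?wt=json&showDefaults=true"


def can_highlight(field_def):
    """Highlighting needs stored=true; a missing flag is treated as not confirmed."""
    return field_def.get("stored") is True


def fetch_field(core, field):
    """Fetch the field definition from a running Solr (assumes a "field" key in the reply)."""
    with urlopen(field_url(core, field)) as resp:
        return json.load(resp)["field"]
```

For example, can_highlight(fetch_field("techproducts", "text")) would report whether the 'text' field is stored, which is the requirement Erick mentions in his NOTE quoted below.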


Thanks again!


*Evert*

2015-12-16 21:09 GMT-02:00 Erick Erickson <erickerick...@gmail.com>:

> I think you're still missing the critical bit. Highlighting is
> completely separate from searching. In other words, you can search on
> one field and highlight another. What field is searched is governed by
> the "qf" parameter when using edismax and by the "df" parameter
> configured in your request handler in solrconfig.xml. These defaults
> are overridden when you do a "fielded search" like
>
> q=content:nietava
>
> So this: q=content:nietava&hl=true&hl.fl=content
> is searching the "content" field. The word you're looking for isn't in
> the content field so naturally no docs are returned. And no
> highlighting either.
>
> This: q=nietava&hl=true&hl.fl=content
>
> is searching somewhere else, thus getting the hit. We already know
> that "nietava" is not in the content field because the first search
> failed. You need to find out what field is being matched (probably
> something like "text") and then try highlighting on _that_ field. Try
> adding "debug=query" to the URL and look at the "parsed_query" section
> of the return and you'll see what field(s) is/are actually being
> searched against.
>
> NOTE: The field you highlight on _must_ have stored="true" in schema.xml.
>
> As to why "nietava" isn't being found in the content field, probably
> you have some kind of analysis chain configured for that field that
> isn't searching as you expect. See the admin/analysis page for some
> insight into why that would be. The most frequent reason is that the
> field is a "string" type which is not broken up into words. Another
> possibility is that your analysis chain is leaving in the quotes or
> something similar. As James says, looking at admin/analysis is a good
> way to figure this out.
>
> I still strongly recommend you go from the stock techproducts example
> and get familiar with how Solr (and highlighting) work before jumping
> in and changing things. There are a number of ways things can be
> mis-configured and trying to change several things at once is a fine
> way to go mad. The admin UI>>schema browser is another way you can see
> what kind of terms are _actually_ in your index in a particular field.
>
> Best,
> Erick
>
>
>
>
> On Wed, Dec 16, 2015 at 12:26 PM, Teague James <teag...@insystechinc.com>
> wrote:
> > Sorry to hear that didn't work! Let me ask a couple of questions...
> >
> > Have you tried the analyzer inside of the Admin Interface? It has helped
> > me sort out a number of highlighting issues in the past. To access it, go
> > to your Admin interface, select your core, then select Analysis from the
> > list of options on the left. In the analyzer, enter the term you are
> > indexing in the top left (in other words the term in the document you are
> > indexing that you expect to get a hit on) and right input fields. Select
> > the field that it is destined for (in your case that would be 'content'),
> > then hit analyze. Helps if you have a big screen!
> >
> > This will show you the impact of the various filter factories that you
> > have engaged and their effect on whether or not a 'hit' is being generated.
> > Hits are identified by a very faint highlight. (PSST... Developers... It
> > would be really cool if the highlight color were more visible or
> > customizable... Thanks y'all) If it looks like you're getting hits, but not
> > getting highlighting, then open up a new tab with the Admin's query
> > interface. Same place on the left as the analyzer. Replace the "*:*" with
> > your search term (assuming you already indexed your document) and if
> > necessary you can put something in the FQ like "id:123456" to target a
> > specific record.
> >
> > Did you get a hit? If no, then it's not highlighting that's the issue.
> > If yes, then try dumping this in your address bar (using your URL/IP,
> > search term, and core name of course. The fq= is an example) :
> > http://[URL/IP]/solr/[CORE-NAME]/select?fq=id:123456&q="[SEARCH-TERM]";
> >
> > That will dump Solr's output to your browser where you can see exactly
> > what is getting hit.
> >
> > Hope that helps! Let me know how it goes. Good luck.
> >
> > -Teague
> >
> > -----Original Message-----
> > From: Evert R. [mailto:evert.ra...@gmail.com]
> > Sent: Wednesday, December 16, 2015 1:46 PM
> > To: solr-user <solr-user@lucene.apache.org>
> > Subject: Re: Solr Basic Configuration - Highlight - Begginer
> >
> > Hi Teague!
> >
> > I configured solrconfig.xml and schema.xml exactly the way you did,
> > only substituting the word 'documentText' with 'content', which is used by
> > the techproducts sample, and reindexed with:
> >
> >  curl 'http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true' -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"
> >
> > with the same result: no highlight in the response, as below:
> >
> > "highlighting": { "pdf1": {} }
> >
> > =(
> >
> > Really... do not know what to do...
> >
> > Thanks for your time. If you have any more suggestions about where I could
> > be missing something... please let me know.
> >
> >
> > Best regards,
> >
> > *Evert*
> >
> > 2015-12-16 15:30 GMT-02:00 Teague James <teag...@insystechinc.com>:
> >
> >> Hi Evert,
> >>
> >> I recently needed help with phrase highlighting and was pointed to the
> >> FastVectorHighlighter which worked out great. I just made a change to
> >> the configuration to add generateWordParts="0" and
> >> generateNumberParts="0" so that searches for things like "1a" would
> >> get highlighted correctly. You may or may not need that feature. You
> >> can always remove them or change the value to "1" to switch them on
> >> explicitly. Anyway, hope this helps!
> >>
> >> solrconfig.xml (partial snip)
> >> <requestHandler name="/select" class="solr.SearchHandler">
> >>                 <lst name="defaults">
> >>                         <str name="wt">xml</str>
> >>                         <str name="echoParams">explicit</str>
> >>                         <int name="rows">10</int>
> >>                         <str name="df">documentText</str>
> >>                         <str name="hl">on</str>
> >>                         <str name="hl.fl">text</str>
> >>                         <str name="hl.useFastVectorHighlighter">true</str>
> >>                         <str name="hl.snippets">100</str>
> >>                         <str name="hl.tag.pre">&lt;b&gt;</str>
> >>                         <str name="hl.tag.post">&lt;/b&gt;</str>
> >>                 </lst>
> >> </requestHandler>
> >>
> >> schema.xml (partial snip)
> >>    <field name="id" type="string" indexed="true" stored="true"
> >> required="true" multiValued="false" />
> >>    <field name="documentText" type="text_general" indexed="true"
> >> multiValued="true" termVectors="true" termOffsets="true"
> >> termPositions="true" />
> >>
> >> <fieldType name="text_general" class="solr.TextField"
> >> positionIncrementGap="100">
> >>         <analyzer type="index">
> >>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>                 <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> words="stopwords.txt" />
> >>                 <filter class="solr.WordDelimiterFilterFactory"
> >> catenateAll="1" preserveOriginal="1" generateNumberParts="0"
> >> generateWordParts="0" />
> >>                 <filter class="solr.SynonymFilterFactory"
> >> synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
> >>                 <filter class="solr.LowerCaseFilterFactory"/>
> >>                 <filter class="solr.PorterStemFilterFactory"/>
> >>                 <filter class="solr.ApostropheFilterFactory"/>
> >>         </analyzer>
> >>         <analyzer type="query">
> >>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>                 <filter class="solr.WordDelimiterFilterFactory"
> >> catenateAll="1" preserveOriginal="1" generateWordParts="0" />
> >>                 <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> words="stopwords.txt" />
> >>                 <filter class="solr.LowerCaseFilterFactory"/>
> >>                 <filter class="solr.ApostropheFilterFactory"/>
> >>         </analyzer>
> >> </fieldType>
> >>
> >> -Teague
> >>
> >> From: Evert R. [mailto:evert.ra...@gmail.com]
> >> Sent: Tuesday, December 15, 2015 6:25 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Solr Basic Configuration - Highlight - Begginer
> >>
> >> Hi there!
> >>
> >> It's my first installation, and I'm not sure if this is the right channel...
> >>
> >> Here are my steps:
> >>
> >> 1. Set up a basic install of solr 5.4.0
> >>
> >> 2. Create a new core through command line (bin/solr create -c test)
> >>
> >> 3. Post 2 files: 1 .docx and 2 .pdf (bin/post -c test /docs/test/)
> >>
> >> 4. Query over the browser and it brings the correct search, but it
> >> does not show the part of the text I am querying, the highlight.
> >>
> >>   I have already flagged the 'hl' option, but it still does not work...
> >>
> >> Example: I am looking for the word 'peace' in my pdf file (book). I
> >> have 4 matches for this word; it shows me the book name (pdf file) but
> >> does not show which part of the text contains the word peace.
> >>
> >>
> >> I am probably missing some configuration in schema.xml, which is
> >> missing from my folder.... /solr/server/solr/test/conf/
> >>
> >> Or even the solrconfig.xml...
> >>
> >> I have read a bunch of things about highlighting, checked these files,
> >> and copied the standard schema.xml to my core/conf folder, but it still
> >> does not bring the highlight.
> >>
> >>
> >> Attached a copy of my solrconfig.xml file.
> >>
> >>
> >> I am very sorry for this probably dumb and too basic question...
> >> This is the first time I have seen Solr live.
> >>
> >>
> >> Any help will be appreciated.
> >>
> >>
> >>
> >> Best regards,
> >>
> >>
> >> Evert Ramos
> >>
> >> mailto:evert.ra...@gmail.com
> >>
> >>
> >>
> >
>
