Hello Erick,

Sorry for my mistakes. Here is everything I have so far:

1. It returns the result correctly, but the highlighting field is empty, as shown below:

{
  "responseHeader":{
    "status":0,
    "QTime":15,
    "params":{
      "q":"text:nietava",
      "debug":"query",
      "hl":"true",
      "hl.simple.post":"</em>",
      "indent":"true",
      "fq":"id:pdf1",
      "hl.fl":"text",
      "wt":"json",
      "hl.simple.pre":"<em>"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"pdf1",
        "last_modified":"2011-07-28T20:39:26Z",
        "title":["Microsoft Word - André Luiz - Sexo e Destino _Chico e Waldo_.doc"],
        "content_type":["application/pdf"],
        "author":"Wander",
        "author_s":"Wander",
        "content":["André Luiz - Sexo e Destino _Chico e Waldo_.doc ***the whole content*** nietava"],
        "_version_":1520765393269948416}]
  },
  "highlighting":{
    "pdf1":{ ***I THINK THE SNIPPETS OF TEXT SHOULD BE IN HERE, RIGHT?*** }},
  "debug":{
    "rawquerystring":"text:nietava",
    "querystring":"text:nietava",
    "parsedquery":"text:nietava",
    "parsedquery_toString":"text:nietava",
    "QParser":"LuceneQParser",
    "filter_queries":["id:pdf1"],
    "parsed_filter_queries":["id:pdf1"]}}

2. Here are my settings:

In schema.xml:

<field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

In solrconfig.xml:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <bool name="preferLocalShards">false</bool>
  </lst>

I have tried:

schema.xml:
<field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>

schema.xml:
<field name="text" type="text_general" indexed="true" stored="true" multiValued="true" termVectors="true" termOffsets="true" termPositions="true"/>

schema.xml:
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
  <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" preserveOriginal="1" generateNumberParts="0" generateWordParts="0" />
  <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.PorterStemFilterFactory"/>
  <filter class="solr.ApostropheFilterFactory"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" preserveOriginal="1" generateWordParts="0" />
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ApostropheFilterFactory"/>
</analyzer>

solrconfig.xml:
<str name="df">text</str>
<str name="hl">on</str>
<str name="hl.fl">text</str>
<str name="hl.useFastVectorHighlighter">true</str>
<str name="hl.snippets">100</str>
<str name="hl.tag.pre"><b></str>
<str name="hl.tag.post"></b></str>

The debug output is in the response I received (shown above).

I am still using the standard techproducts example.

I hope this is complete enough.

Thanks again!

Evert

2015-12-17 2:01 GMT-02:00 Erick Erickson <erickerick...@gmail.com>:

> bq: but when highlight, using the text field...nothing comes up...
>
> http://localhost:8983/solr/techproducts/select?q=text:nietava&fq=id:pdf1&wt=json&indent=true&hl=true&hl.fl=text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
>
> It's unclear what this means. No results showed up (i.e. numFound==0)
> or no highlighting showed up? Assuming that
> 1> the "text" field has stored=true and
> 2> you find documents when searching on the "text" field
> the above should show something in the highlights section.
>
> Please take the time to provide complete details. Guessing what you're
> doing is wasting time, mine and yours. Once more:
> 1> what is the schema definition for the "text" field. Include the
> fieldType definition
> 2> What is the result of adding &debug=query to the query when you
> don't get highlights
>
> You might review: http://wiki.apache.org/solr/UsingMailingLists
> because it's becoming quite frustrating that you give us little bits
> of information that leave us guessing what you're _really_ doing.
> Highlighting is working for lots of people in lots of sites, it's not
> likely that this functionality is completely broken so the answer will
> be in the docs.
>
> Best,
> Erick
>
> On Wed, Dec 16, 2015 at 5:54 PM, Evert R.
> <evert.ra...@gmail.com> wrote:
> > Hi Erick and Teague,
> >
> > I found that when using the field 'text' it shows the pdf file result
> > id:pdf1 in this case, like:
> >
> > http://localhost:8983/solr/techproducts/select?fq=id:pdf1&q=nietava
> >
> > but when highlight, using the text field...nothing comes up...
> >
> > http://localhost:8983/solr/techproducts/select?q=text:nietava&fq=id:pdf1&wt=json&indent=true&hl=true&hl.fl=text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
> >
> > or even with the option
> >
> > f.text.hl.snippets=2 under the hl.fl field.
> >
> > I tried as well with the standard configuration, did it all over,
> > reindexed a couple of times... and it still did not work.
> >
> > Also,
> >
> > Using the Analysis page, I get the information below:
> >
> > ST
> > text     raw_bytes               start  end  positionLength  type        position
> > nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1
> > SF
> > text     raw_bytes               start  end  positionLength  type        position
> > nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1
> > LCF
> > text     raw_bytes               start  end  positionLength  type        position
> > nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1
> >
> > Alphanumeric, I think... so it's 'string', right? Would that be a
> > problem? Should there be some other indication?
> >
> > Thanks again!
> >
> > Evert
> >
> > 2015-12-16 21:09 GMT-02:00 Erick Erickson <erickerick...@gmail.com>:
> >
> >> I think you're still missing the critical bit. Highlighting is
> >> completely separate from searching. In other words, you can search on
> >> one field and highlight another. What field is searched is governed by
> >> the "qf" parameter when using edismax and by the "df" parameter
> >> configured in your request handler in solrconfig.xml. These defaults
> >> are overridden when you do a "fielded search" like
> >>
> >> q=content:nietava
> >>
> >> So this: q=content:nietava&hl=true&hl.fl=content
> >> is searching the "content" field. The word you're looking for isn't in
> >> the content field so naturally no docs are returned. And no
> >> highlighting either.
> >>
> >> This: q=nietava&hl=true&hl.fl=content
> >>
> >> is searching somewhere else, thus getting the hit. We already know
> >> that "nietava" is not in the content field because the first search
> >> failed. You need to find out what field is being matched (probably
> >> something like "text") and then try highlighting on _that_ field. Try
> >> adding "debug=query" to the URL and look at the "parsedquery" section
> >> of the return and you'll see what field(s) is/are actually being
> >> searched against.
> >>
> >> NOTE: The field you highlight on _must_ have stored="true" in schema.xml.
> >>
> >> As to why "nietava" isn't being found in the content field, probably
> >> you have some kind of analysis chain configured for that field that
> >> isn't searching as you expect. See the admin/analysis page for some
> >> insight into why that would be. The most frequent reason is that the
> >> field is a "string" type which is not broken up into words. Another
> >> possibility is that your analysis chain is leaving in the quotes or
> >> something similar. As James says, looking at admin/analysis is a good
> >> way to figure this out.
> >>
> >> I still strongly recommend you go from the stock techproducts example
> >> and get familiar with how Solr (and highlighting) work before jumping
> >> in and changing things. There are a number of ways things can be
> >> mis-configured and trying to change several things at once is a fine
> >> way to go mad. The admin UI>>schema browser is another way you can see
> >> what kind of terms are _actually_ in your index in a particular field.
> >>
> >> Best,
> >> Erick
> >>
> >> On Wed, Dec 16, 2015 at 12:26 PM, Teague James <teag...@insystechinc.com>
> >> wrote:
> >> > Sorry to hear that didn't work! Let me ask a couple of questions...
> >> >
> >> > Have you tried the analyzer inside of the Admin Interface? It has
> >> > helped me sort out a number of highlighting issues in the past. To
> >> > access it, go to your Admin interface, select your core, then select
> >> > Analysis from the list of options on the left. In the analyzer, enter
> >> > the term you are indexing in the top left (in other words the term in
> >> > the document you are indexing that you expect to get a hit on) and
> >> > right input fields. Select the field that it is destined for (in your
> >> > case that would be 'content'), then hit analyze. Helps if you have a
> >> > big screen!
> >> >
> >> > This will show you the impact of the various filter factories that you
> >> > have engaged and their effect on whether or not a 'hit' is being
> >> > generated. Hits are identified by a very faint highlight. (PSST...
> >> > Developers... It would be really cool if the highlight color were more
> >> > visible or customizable... Thanks y'all) If it looks like you're
> >> > getting hits, but not getting highlighting, then open up a new tab
> >> > with the Admin's query interface. Same place on the left as the
> >> > analyzer. Replace the "*:*" with your search term (assuming you
> >> > already indexed your document) and if necessary you can put something
> >> > in the FQ like "id:123456" to target a specific record.
> >> >
> >> > Did you get a hit? If no, then it's not highlighting that's the issue.
> >> > If yes, then try dumping this in your address bar (using your URL/IP,
> >> > search term, and core name of course. The fq= is an example):
> >> >
> >> > http://[URL/IP]/solr/[CORE-NAME]/select?fq=id:123456&q="[SEARCH-TERM]"
> >> >
> >> > That will dump Solr's output to your browser where you can see exactly
> >> > what is getting hit.
> >> >
> >> > Hope that helps! Let me know how it goes. Good luck.
> >> >
> >> > -Teague
> >> >
> >> > -----Original Message-----
> >> > From: Evert R. [mailto:evert.ra...@gmail.com]
> >> > Sent: Wednesday, December 16, 2015 1:46 PM
> >> > To: solr-user <solr-user@lucene.apache.org>
> >> > Subject: Re: Solr Basic Configuration - Highlight - Begginer
> >> >
> >> > Hi Teague!
> >> >
> >> > I configured solrconfig.xml and schema.xml exactly the way you did,
> >> > only substituting 'content' (used by the techproducts sample) for
> >> > 'documentText', and reindexed with:
> >> >
> >> > curl 'http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true' -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"
> >> >
> >> > with the same result.... no highlight in the response, as below:
> >> >
> >> > "highlighting": { "pdf1": {} }
> >> >
> >> > =(
> >> >
> >> > Really... I do not know what to do...
> >> >
> >> > Thanks for your time. If you have any more suggestions about where I
> >> > could be missing something, please let me know.
> >> >
> >> > Best regards,
> >> >
> >> > Evert
> >> >
> >> > 2015-12-16 15:30 GMT-02:00 Teague James <teag...@insystechinc.com>:
> >> >
> >> >> Hi Evert,
> >> >>
> >> >> I recently needed help with phrase highlighting and was pointed to the
> >> >> FastVectorHighlighter which worked out great. I just made a change to
> >> >> the configuration to add generateWordParts="0" and
> >> >> generateNumberParts="0" so that searches for things like "1a" would
> >> >> get highlighted correctly. You may or may not need that feature. You
> >> >> can always remove them or change the value to "1" to switch them on
> >> >> explicitly. Anyway, hope this helps!
> >> >>
> >> >> solrconfig.xml (partial snip)
> >> >> <requestHandler name="/select" class="solr.SearchHandler">
> >> >>   <lst name="defaults">
> >> >>     <str name="wt">xml</str>
> >> >>     <str name="echoParams">explicit</str>
> >> >>     <int name="rows">10</int>
> >> >>     <str name="df">documentText</str>
> >> >>     <str name="hl">on</str>
> >> >>     <str name="hl.fl">text</str>
> >> >>     <str name="hl.useFastVectorHighlighter">true</str>
> >> >>     <str name="hl.snippets">100</str>
> >> >>     <str name="hl.tag.pre"><b></str>
> >> >>     <str name="hl.tag.post"></b></str>
> >> >>   </lst>
> >> >> </requestHandler>
> >> >>
> >> >> schema.xml (partial snip)
> >> >> <field name="id" type="string" indexed="true" stored="true"
> >> >>        required="true" multiValued="false" />
> >> >> <field name="documentText" type="text_general" indexed="true"
> >> >>        multiValued="true" termVectors="true" termOffsets="true"
> >> >>        termPositions="true" />
> >> >>
> >> >> <fieldType name="text_general" class="solr.TextField"
> >> >>            positionIncrementGap="100">
> >> >>   <analyzer type="index">
> >> >>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> >>     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> >>             words="stopwords.txt" />
> >> >>     <filter class="solr.WordDelimiterFilterFactory"
> >> >>             catenateAll="1" preserveOriginal="1" generateNumberParts="0"
> >> >>             generateWordParts="0" />
> >> >>     <filter class="solr.SynonymFilterFactory"
> >> >>             synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
> >> >>     <filter class="solr.LowerCaseFilterFactory"/>
> >> >>     <filter class="solr.PorterStemFilterFactory"/>
> >> >>     <filter class="solr.ApostropheFilterFactory"/>
> >> >>   </analyzer>
> >> >>   <analyzer type="query">
> >> >>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> >>     <filter class="solr.WordDelimiterFilterFactory"
> >> >>             catenateAll="1" preserveOriginal="1" generateWordParts="0" />
> >> >>     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> >>             words="stopwords.txt" />
> >> >>     <filter class="solr.LowerCaseFilterFactory"/>
> >> >>     <filter class="solr.ApostropheFilterFactory"/>
> >> >>   </analyzer>
> >> >> </fieldType>
> >> >>
> >> >> -Teague
> >> >>
> >> >> From: Evert R. [mailto:evert.ra...@gmail.com]
> >> >> Sent: Tuesday, December 15, 2015 6:25 AM
> >> >> To: solr-user@lucene.apache.org
> >> >> Subject: Solr Basic Configuration - Highlight - Begginer
> >> >>
> >> >> Hi there!
> >> >>
> >> >> It's my first installation, and I am not sure if this is the right
> >> >> channel...
> >> >>
> >> >> Here are my steps:
> >> >>
> >> >> 1. Set up a basic install of solr 5.4.0
> >> >>
> >> >> 2. Create a new core through command line (bin/solr create -c test)
> >> >>
> >> >> 3. Post 2 files: 1 .docx and 2 .pdf (bin/post -c test /docs/test/)
> >> >>
> >> >> 4. Query from the browser: it returns the correct results, but it
> >> >> does not show the part of the text I am querying (the highlight).
> >> >>
> >> >> I have already flagged the 'hl' option, but it still does not work...
> >> >>
> >> >> Example: I am looking for the word 'peace' in my pdf file (book). I
> >> >> have 4 matches for this word; it shows me the book name (pdf file) but
> >> >> does not show which part of the text contains the word peace.
> >> >>
> >> >> I am probably missing some configuration in schema.xml, which is
> >> >> missing from my folder.... /solr/server/solr/test/conf/
> >> >>
> >> >> Or even the solrconfig.xml...
> >> >>
> >> >> I have read a bunch of things about highlighting, checked these files,
> >> >> and copied the standard schema.xml to my core/conf folder, but it
> >> >> still does not return the highlight.
> >> >>
> >> >> Attached is a copy of my solrconfig.xml file.
> >> >>
> >> >> I am very sorry for this probably dumb and too basic question...
> >> >> This is the first time I have seen Solr live.
> >> >>
> >> >> Any help will be appreciated.
> >> >>
> >> >> Best regards,
> >> >>
> >> >> Evert Ramos
> >> >>
> >> >> mailto:evert.ra...@gmail.com
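
For anyone following the thread, the checks discussed here (search on a field, highlight on the same field, add debug=query, confirm the highlighted field is stored) could be sketched in Python roughly as below. This is only an illustration under assumptions: the helper names `build_highlight_url` and `empty_highlights` are made up for this sketch, the sample response is a trimmed copy of the one posted above, and a highlighted field missing from the returned document is treated merely as a hint that it may not be stored, not as proof.

```python
import json
from urllib.parse import urlencode


def build_highlight_url(base, core, query, hl_field, fq=None):
    """Build a /select URL with highlighting and debug=query switched on.

    Hypothetical helper; the parameter names are the ones used in the thread.
    """
    params = [
        ("q", query),
        ("wt", "json"),
        ("indent", "true"),
        ("hl", "true"),
        ("hl.fl", hl_field),
        ("hl.simple.pre", "<em>"),
        ("hl.simple.post", "</em>"),
        ("debug", "query"),
    ]
    if fq:
        params.append(("fq", fq))
    return f"{base}/solr/{core}/select?{urlencode(params)}"


def empty_highlights(response, hl_field):
    """List (doc id, field present in returned doc?) for hits with no snippets.

    A False in the second slot suggests the highlighted field was not
    returned as a stored field for that document.
    """
    highlighting = response.get("highlighting", {})
    report = []
    for doc in response.get("response", {}).get("docs", []):
        doc_id = doc.get("id")
        if not highlighting.get(doc_id):
            report.append((doc_id, hl_field in doc))
    return report


# Trimmed copy of the response posted at the top of the thread.
sample = json.loads("""
{
  "response": {"numFound": 1, "start": 0, "docs": [
    {"id": "pdf1", "author": "Wander", "content": ["... nietava"]}
  ]},
  "highlighting": {"pdf1": {}}
}
""")

url = build_highlight_url("http://localhost:8983", "techproducts",
                          "text:nietava", "text", fq="id:pdf1")
print(url)
print(empty_highlights(sample, "text"))  # -> [('pdf1', False)]
```

To try it against a running instance, the URL could be fetched with `urllib.request.urlopen` and the parsed JSON passed to `empty_highlights`; the sample above only replays the response already shown in this thread.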