Hi Erick and Teague,

I found that a search that hits the field 'text' does return the PDF file
(id:pdf1 in this case), like:

http://localhost:8983/solr/techproducts/select?fq=id:pdf1&q=nietava

but when I highlight using the text field, nothing comes up:

http://localhost:8983/solr/techproducts/select?q=text:nietava&fq=id:pdf1&wt=json&indent=true&hl=true&hl.fl=text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E

or even with the option f.text.hl.snippets=2 under the hl.fl field.

I also tried the standard configuration, did it all over again, and
reindexed a couple of times, and it still did not work.

Also, the Analysis page gives the information below:

     text     raw_bytes               start  end  positionLength  type        position
ST   nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1
SF   nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1
LCF  nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1

Alphanumeric, I think... so it's 'string', right? Would that be a problem?
Should there be some other indication?

Thanks again!

*Evert*

2015-12-16 21:09 GMT-02:00 Erick Erickson <erickerick...@gmail.com>:

> I think you're still missing the critical bit. Highlighting is
> completely separate from searching. In other words, you can search on
> one field and highlight another. What field is searched is governed by
> the "qf" parameter when using edismax and by the "df" parameter
> configured in your request handler in solrconfig.xml. These defaults
> are overridden when you do a "fielded search" like
>
> q=content:nietava
>
> So this: q=content:nietava&hl=true&hl.fl=content
> is searching the "content" field. The word you're looking for isn't in
> the content field so naturally no docs are returned. And no
> highlighting either.
>
> This: q=nietava&hl=true&hl.fl=content
>
> is searching somewhere else, thus getting the hit. We already know
> that "nietava" is not in the content field because the first search
> failed.
> You need to find out what field is being matched (probably
> something like "text") and then try highlighting on _that_ field. Try
> adding "debug=query" to the URL and look at the "parsedquery" section
> of the return and you'll see what field(s) is/are actually being
> searched against.
>
> NOTE: The field you highlight on _must_ have stored="true" in schema.xml.
>
> As to why "nietava" isn't being found in the content field, probably
> you have some kind of analysis chain configured for that field that
> isn't searching as you expect. See the admin/analysis page for some
> insight into why that would be. The most frequent reason is that the
> field is a "string" type which is not broken up into words. Another
> possibility is that your analysis chain is leaving in the quotes or
> something similar. As James says, looking at admin/analysis is a good
> way to figure this out.
>
> I still strongly recommend you go from the stock techproducts example
> and get familiar with how Solr (and highlighting) work before jumping
> in and changing things. There are a number of ways things can be
> mis-configured and trying to change several things at once is a fine
> way to go mad. The admin UI>>schema browser is another way you can see
> what kind of terms are _actually_ in your index in a particular field.
>
> Best,
> Erick
>
> On Wed, Dec 16, 2015 at 12:26 PM, Teague James <teag...@insystechinc.com> wrote:
> > Sorry to hear that didn't work! Let me ask a couple of questions...
> >
> > Have you tried the analyzer inside of the Admin Interface? It has helped
> > me sort out a number of highlighting issues in the past. To access it, go
> > to your Admin interface, select your core, then select Analysis from the
> > list of options on the left. In the analyzer, enter the term you are
> > indexing (in other words, the term in the document that you expect to
> > get a hit on) in the top left and right input fields.
> > Select the field that it is destined for (in your case that would be
> > 'content'), then hit analyze. Helps if you have a big screen!
> >
> > This will show you the impact of the various filter factories that you
> > have engaged and their effect on whether or not a 'hit' is being
> > generated. Hits are identified by a very faint highlight. (PSST...
> > Developers... It would be really cool if the highlight color were more
> > visible or customizable... Thanks y'all) If it looks like you're getting
> > hits, but not getting highlighting, then open up a new tab with the
> > Admin's query interface. Same place on the left as the analyzer. Replace
> > the "*:*" with your search term (assuming you already indexed your
> > document) and if necessary you can put something in the FQ like
> > "id:123456" to target a specific record.
> >
> > Did you get a hit? If no, then it's not highlighting that's the issue.
> > If yes, then try dumping this in your address bar (using your URL/IP,
> > search term, and core name of course. The fq= is an example):
> >
> > http://[URL/IP]/solr/[CORE-NAME]/select?fq=id:123456&q="[SEARCH-TERM]"
> >
> > That will dump Solr's output to your browser where you can see exactly
> > what is getting hit.
> >
> > Hope that helps! Let me know how it goes. Good luck.
> >
> > -Teague
> >
> > -----Original Message-----
> > From: Evert R. [mailto:evert.ra...@gmail.com]
> > Sent: Wednesday, December 16, 2015 1:46 PM
> > To: solr-user <solr-user@lucene.apache.org>
> > Subject: Re: Solr Basic Configuration - Highlight - Begginer
> >
> > Hi Teague!
> >
> > I configured solrconfig.xml and schema.xml exactly the way you did,
> > only substituting 'content' (used by the techproducts sample) for
> > 'documentText', then reindexed with:
> >
> > curl 'http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true' \
> >   -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"
> >
> > with the same result....
> > no highlight in the response, as below:
> >
> > "highlighting": { "pdf1": {} }
> >
> > =(
> >
> > Really... I do not know what to do...
> >
> > Thanks for your time. If you have any more suggestions about where I
> > could be missing something... please let me know.
> >
> > Best regards,
> >
> > *Evert*
> >
> > 2015-12-16 15:30 GMT-02:00 Teague James <teag...@insystechinc.com>:
> >
> >> Hi Evert,
> >>
> >> I recently needed help with phrase highlighting and was pointed to the
> >> FastVectorHighlighter which worked out great. I just made a change to
> >> the configuration to add generateWordParts="0" and
> >> generateNumberParts="0" so that searches for things like "1a" would
> >> get highlighted correctly. You may or may not need that feature. You
> >> can always remove them or change the value to "1" to switch them on
> >> explicitly. Anyway, hope this helps!
> >>
> >> solrconfig.xml (partial snip)
> >>   <requestHandler name="/select" class="solr.SearchHandler">
> >>     <lst name="defaults">
> >>       <str name="wt">xml</str>
> >>       <str name="echoParams">explicit</str>
> >>       <int name="rows">10</int>
> >>       <str name="df">documentText</str>
> >>       <str name="hl">on</str>
> >>       <str name="hl.fl">text</str>
> >>       <str name="hl.useFastVectorHighlighter">true</str>
> >>       <str name="hl.snippets">100</str>
> >>       <str name="hl.tag.pre">&lt;b&gt;</str>
> >>       <str name="hl.tag.post">&lt;/b&gt;</str>
> >>     </lst>
> >>   </requestHandler>
> >>
> >> schema.xml (partial snip)
> >>   <field name="id" type="string" indexed="true" stored="true"
> >>          required="true" multiValued="false" />
> >>   <field name="documentText" type="text_general" indexed="true"
> >>          multiValued="true" termVectors="true" termOffsets="true"
> >>          termPositions="true" />
> >>
> >>   <fieldType name="text_general" class="solr.TextField"
> >>              positionIncrementGap="100">
> >>     <analyzer type="index">
> >>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>       <filter class="solr.StopFilterFactory" ignoreCase="true"
> >>               words="stopwords.txt" />
> >>       <filter class="solr.WordDelimiterFilterFactory"
> >>               catenateAll="1" preserveOriginal="1" generateNumberParts="0"
> >>               generateWordParts="0" />
> >>       <filter class="solr.SynonymFilterFactory"
> >>               synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
> >>       <filter class="solr.LowerCaseFilterFactory"/>
> >>       <filter class="solr.PorterStemFilterFactory"/>
> >>       <filter class="solr.ApostropheFilterFactory"/>
> >>     </analyzer>
> >>     <analyzer type="query">
> >>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>       <filter class="solr.WordDelimiterFilterFactory"
> >>               catenateAll="1" preserveOriginal="1" generateWordParts="0" />
> >>       <filter class="solr.StopFilterFactory" ignoreCase="true"
> >>               words="stopwords.txt" />
> >>       <filter class="solr.LowerCaseFilterFactory"/>
> >>       <filter class="solr.ApostropheFilterFactory"/>
> >>     </analyzer>
> >>   </fieldType>
> >>
> >> -Teague
> >>
> >> From: Evert R. [mailto:evert.ra...@gmail.com]
> >> Sent: Tuesday, December 15, 2015 6:25 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Solr Basic Configuration - Highlight - Begginer
> >>
> >> Hi there!
> >>
> >> It's my first installation, not sure if this is the right channel...
> >>
> >> Here are my steps:
> >>
> >> 1. Set up a basic install of solr 5.4.0
> >>
> >> 2. Create a new core through command line (bin/solr create -c test)
> >>
> >> 3. Post 2 files: 1 .docx and 2 .pdf (bin/post -c test /docs/test/)
> >>
> >> 4. Query over the browser and it brings the correct search, but it
> >> does not show the part of the text I am querying, the highlight.
> >>
> >> I have already flagged the 'hl' option. But still it does not work...
> >>
> >> Example: I am looking for the word 'peace' in my pdf file (book). I
> >> have 4 matches for this word, and it shows me the book name (pdf file)
> >> but does not bring the part of the text that has the word peace in it.
> >>
> >> I am probably missing some configuration in schema.xml, which is
> >> missing from my folder....
> >> /solr/server/solr/test/conf/
> >>
> >> Or even the solrconfig.xml...
> >>
> >> I have read a bunch of things about highlighting, checked these files,
> >> and copied the standard schema.xml to my core/conf folder, but still it
> >> does not bring the highlight.
> >>
> >> Attached is a copy of my solrconfig.xml file.
> >>
> >> I am very sorry for this probably dumb and too basic question...
> >> This is the first time I have seen Solr live.
> >>
> >> Any help will be appreciated.
> >>
> >> Best regards,
> >>
> >> Evert Ramos
> >>
> >> mailto:evert.ra...@gmail.com