Re: Solr Basic Configuration - Highlight - Begginer

Evert R. Wed, 16 Dec 2015 10:14:53 -0800

Hi Erick,

I think you are right!


When I use the 'features:accents' form, in my case 'content:nietava', it
shows no matching documents... but if I drop the field name and send only
'q=searchword' (q=nietava), it returns the PDF file content, as below (XML
output):

#partial snip:
<arr name="content">
<str>
Microsoft Word - André Luiz - Sexo e Destino _Chico e Waldo_.doc Francisco
Cândido Xavier e Waldo Vieira Sexo e Destino 12o livro da Coleção “A Vida
no Mundo Espiritual” Ditado pelo Espírito André Luiz FEDERAÇÃO ESPÍRITA
BRASILEIRA DEPARTAMENTO EDITORIAL Rua Souza Valente, 17 20941-040 - Rio -
RJ - Brasil http://www.febnet.org.br/ Francisco Cândido Xavier - Sexo e
Destino - pelo Espírito André Luiz 2 Coleção “A Vida no Mundo Espiritual”
01 - Nosso Lar 02 - Os Mensageiros 03 - Missionários da Luz 04 - Obreiros
da Vida Eterna 05 - No Mundo Maior 06 - Libertação 07 - Entre a Terra e o
Céu 08 - Nos Domínios da Mediunidade 09 - Ação e Reação 10 - Evolução em
Dois Mundos 11 - Mecanismos da Mediunidade 12 - Sexo e Destino 13 - E a
Vida Continua... Francisco Cândid

So, using:

1. q=content:nietava&hl=true&hl.fl=content  -> results:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">3</int>
<lst name="params">
<str name="q">content:nietava</str>
<str name="hl">true</str>
<str name="hl.fl">content</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
<lst name="highlighting"/>
</response>
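
For anyone reproducing this, the request above can be built as follows. This is only a minimal sketch in Python, assuming the techproducts core at localhost:8983 used elsewhere in this thread; the helper name `select_url` is made up for illustration and is not part of Solr.

```python
from urllib.parse import urlencode

# Assumed base URL: the techproducts core mentioned elsewhere in this thread.
BASE = "http://localhost:8983/solr/techproducts/select"


def select_url(query, highlight_field, base=BASE):
    """Build a select URL with highlighting switched on for one field."""
    params = {
        "q": query,                # e.g. "content:nietava" or just "nietava"
        "hl": "true",              # enable the highlighting component
        "hl.fl": highlight_field,  # field to generate snippets for
    }
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(select_url("content:nietava", "content"))
    print(select_url("nietava", "content"))
```

Going by the schema grep quoted further down, `content` is declared with indexed="false", which may be why a query restricted to that field reports numFound="0".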

2. q=nietava&hl=true&hl.fl=content  -> results:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">93</int>
<lst name="params">
<str name="q">nietava</str>
<str name="hl">true</str>
<str name="hl.fl">content</str>
</lst>
</lst>
<result name="response" numFound="1" start="0">
<doc>
<str name="id">pdf1</str>
<date name="last_modified">2011-07-28T20:39:26Z</date>
<arr name="title">
<str>
Microsoft Word - André Luiz - Sexo e Destino _Chico e Waldo_.doc
</str>
</arr>
<arr name="content_type">
<str>application/pdf</str>
</arr>
<str name="author">Wander</str>
<str name="author_s">Wander</str>
<arr name="content">
<str>
Microsoft Word - André Luiz - Sexo e Destino _Chico e Waldo_.doc Francisco
Cândido Xavier e Waldo Vieira Sexo e Destino 12o livro da Coleção “A Vida
no Mundo Espiritual” Ditado pelo Espírito André Luiz FEDERAÇÃO ESPÍRITA
BRASILEIRA DEPARTAMENTO EDITORIAL Rua Souza Valente, 17 20941-040 - Rio -
RJ - Brasil http://www.febnet.org.br/ Francisco Cândido Xavier - Sexo e
Destino - pelo Espírito André Luiz 2 Coleção “A Vida no Mundo Espiritual”
01 - Nosso Lar 02 - Os Mensageiros 03 - Missionários da Luz 04 - Obreiros
da Vida Eterna 05 - No Mundo Maior 06 - Libertação 07 - Entre a Terra e o
Céu 08 - Nos Domínios da Mediunidade 09 - Ação e Reação 10 - Evolução em
Dois Mundos 11 - Mecanismos da Mediunidade 12 - Sexo e Destino 13 - E a
Vida Continua... Francisco Cândido Xavier - ...........(long text...
including the word 'nietava'
                  </str>
</arr>
<long name="_version_">1520731379641352192</long>
</doc>
</result>
<lst name="highlighting">
<lst name="pdf1"/>
</lst>
</response>
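
To check the highlighting section separately from the document list, something like the following could be used. It is a sketch only, in Python, run against a trimmed copy of the XML response above; the function name `highlight_snippets` is hypothetical, and the layout it expects (one `lst` per document id, one `arr` of `str` snippets per field) is assumed from Solr's XML response format.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response above: one document, empty highlighting entry.
SAMPLE = """<response>
<result name="response" numFound="1" start="0">
<doc><str name="id">pdf1</str></doc>
</result>
<lst name="highlighting">
<lst name="pdf1"/>
</lst>
</response>"""


def highlight_snippets(xml_text):
    """Return {doc_id: {field: [snippet, ...]}} from the highlighting section."""
    root = ET.fromstring(xml_text)
    section = root.find("lst[@name='highlighting']")
    result = {}
    if section is None:
        return result  # highlighting was not requested or not returned
    for doc in section.findall("lst"):
        fields = {}
        for field in doc.findall("arr"):
            fields[field.get("name")] = [s.text or "" for s in field.findall("str")]
        result[doc.get("name")] = fields
    return result


if __name__ == "__main__":
    # An empty dict for a document id means no snippets came back for it.
    print(highlight_snippets(SAMPLE))
```

An empty entry for `pdf1` here matches what the response shows: the document is found, but no snippets are returned for it.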

.... =(

Thanks!


*Evert*

2015-12-16 15:17 GMT-02:00 Erick Erickson <erickerick...@gmail.com>:

> Ok, you're getting confused by all the options, an easy thing to do.
> You're trying to do too many things at once without making sure
> the basics work....
>
> 1> Forget all about the f.content.hl.... stuff. That's there in case
> you want to specify different parameters for different fields in the same
> highlight request. That's an advanced option for later....
>
> 2> start with the basic techproducts example. Then this should show
> you highlights:
> q=features:accents&hl=true&hl.fl=features
>
> That's about as basic as you get. It's searching for "accents" in the
> features field and returning highlights on the features field.
>
> Once that's working, _then_ refine.
>
> Best,
> Erick
>
> On Wed, Dec 16, 2015 at 8:21 AM, Evert R. <evert.ra...@gmail.com> wrote:
> > Hi Andrea,
> >
> > ok, let's do it:
> >
> > 1. It does have the 'nietava' term, so it returns the only book (PDF file)
> > that has this word, with all its content as in my previous message to
> > Erick, so the content field is there.
> >
> > 2. using content:nietava it does not show any result.... as below:
> >
> > { "responseHeader": { "status": 400, "QTime": 12, "params": { "q":
> > "contents:nietava", "indent": "true", "fl": "id", "wt": "json", "_":
> > "1450282631352" } }, "error": { "msg": "undefined field contents",
> > "code": 400 } }
> >
> > 3. Here is what I found when grepping 'content' from the techproducts
> > conf folder:
> >
> > schema.xml:   <field name="content_type" type="string" indexed="true"
> > stored="true" multiValued="true"/>
> > schema.xml:   <field name="content" type="text_general" indexed="false"
> > stored="true" multiValued="true"/>
> > schema.xml:   <copyField source="content" dest="text"/>
> > schema.xml:   <copyField source="content_type" dest="text"/>
> > solrconfig.xml:       <str name="facet.field">content_type</str>
> > solrconfig.xml:       <str name="hl.fl">content features title name</str>
> > solrconfig.xml:       <str name="f.content.hl.snippets">3</str>
> > solrconfig.xml:       <str name="f.content.hl.fragsize">200</str>
> > solrconfig.xml:       <str name="f.content.hl.alternateField">content</str>
> > solrconfig.xml:       <str name="f.content.hl.maxAlternateFieldLength">750</str>
> > solrconfig.xml:       <str name="stream.contentType">application/json</str>
> > solrconfig.xml:       <str name="stream.contentType">application/csv</str>
> > solrconfig.xml:       <str name="content-type">text/plain; charset=UTF-8</str>
> >
> > and the grep on 'content_type':
> >
> > schema.xml:   <field name="content_type" type="string" indexed="true"
> > stored="true" multiValued="true"/>
> > schema.xml:   <copyField source="content_type" dest="text"/>
> > solrconfig.xml:       <str name="facet.field">content_type</str>
> >
> > =)
> >
> > Thanks for checking out.
> >
> >
> >
> > *Evert *
> >
> > 2015-12-16 12:59 GMT-02:00 Andrea Gazzarini <a.gazzar...@gmail.com>:
> >
> >> hl=f.content.hl.content (I guess) is definitely wrong. Some questions:
> >>
> >>    - First, sorry, the obvious question: are you sure the documents
> >>    contain the "nietava" term?
> >>    - Could you try to use q=content:nietaval?
> >>    - Could you paste the definition (field & fieldtype) of the content
> >>    field?
> >>
> >> > Should I have this configuration in the XML file?
> >>
> >> You could, but it's up to you and it strongly depends on your context.
> >> The simple thing is that if you have those parameters within the
> >> configuration you can avoid passing them (as part of the requests), but
> >> probably in this phase, where you are testing, it's better to have them
> >> there (in the request).
> >>
> >> Andrea
> >>
> >> 2015-12-16 15:28 GMT+01:00 Evert R. <evert.ra...@gmail.com>:
> >>
> >> > Hi Andrea,
> >> >
> >> > Thanks for the reply!
> >> >
> >> > I tried with the hl.fl parameter as well, using as below:
> >> >
> >> >
> >> > http://localhost:8983/solr/techproducts/select?q=nietava&fl=id%2C+content&wt=json&indent=true&hl=true&hl.fl=f.content.hl.content%3D4&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
> >> >
> >> > with the parameter under the hl field in the solr ui:
> >> >
> >> > 1. f.content.hl.snnipets=2
> >> > 2. f.content.hl.content=4
> >> > 3. content
> >> >
> >> > with no success...
> >> >
> >> > Should I have this configuration in the XML file?
> >> >
> >> > Regards,
> >> >
> >> > *Evert *
> >> >
> >> > 2015-12-16 11:23 GMT-02:00 Andrea Gazzarini <a.gazzar...@gmail.com>:
> >> >
> >> > > Hi Evert,
> >> > > what is the configuration of the default request handler? Did you
> >> > > set the hl.fl parameter?
> >> > >
> >> > > Please check here [1] the parameters that the highlighting component
> >> > > expects. Required parameters should be in the query string or
> >> > > declared within the request handler which answers your query.
> >> > >
> >> > > Andrea
> >> > >
> >> > > [1] https://wiki.apache.org/solr/HighlightingParameters
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > 2015-12-16 12:51 GMT+01:00 Evert R. <evert.ra...@gmail.com>:
> >> > >
> >> > > > Hi everyone!
> >> > > >
> >> > > > I think I should not have posted my server name... never had that
> >> > > > many access attempts...
> >> > > >
> >> > > >
> >> > > >
> >> > > > 2015-12-16 9:03 GMT-02:00 Evert R. <evert.ra...@gmail.com>:
> >> > > >
> >> > > > > Hello Erick,
> >> > > > >
> >> > > > > Thanks again for your time.
> >> > > > >
> >> > > > > Here is as far as I have gone:
> >> > > > >
> >> > > > > 1. I started a fresh install and did the following:
> >> > > > >
> >> > > > > [evert@nix]$ bin/solr start -e techproducts
> >> > > > > [evert@nix]$ curl 'http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true' \
> >> > > > >   -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"
> >> > > > >
> >> > > > > 2. I am using only the Solr Admin UI to check the query response;
> >> > > > > here is an example:
> >> > > > >
> >> > > > > Query:
> >> > > > > http://localhost:8983/solr/techproducts/select?q=nietava&fl=id%2C+author%2C+content&wt=json&indent=true&hl=true&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
> >> > > > >
> >> > > > > Result: {
> >> > > > >   "responseHeader": {
> >> > > > >     "status": 0,
> >> > > > >     "QTime": 14,
> >> > > > >     "params": {
> >> > > > >       "q": "nietava",
> >> > > > >       "hl": "true",
> >> > > > >       "hl.simple.post": "</em>",
> >> > > > >       "indent": "true",
> >> > > > >       "fl": "id, author, content",
> >> > > > >       "wt": "json",
> >> > > > >       "hl.simple.pre": "<em>",
> >> > > > >       "_": "1450262674102"
> >> > > > >     }
> >> > > > >   },
> >> > > > >   "response": {
> >> > > > >     "numFound": 1,
> >> > > > >     "start": 0,
> >> > > > >     "docs": [
> >> > > > >       {
> >> > > > >         "id": "pdf1",
> >> > > > >         "author": "Wander",
> >> > > > >         "content": [
> >> > > > >           "André Luiz - Sexo e Destino _Chico e Waldo_.doc \n \n \n
> >> > > > > Francisco Cândido Xavier \ne \n \n Waldo Vieira \n \n \n \n \n Sexo e
> >> > > > > Destino \n \n \n \n 12o livro da Coleção \n“A Vida no Mundo Espiritual” \n
> >> > > > > \n  \n \n \n \n Ditado pelo Espírito \nAndré Luiz \n \n  \n \n \n \n \n \n
> >> > > > > \n FEDERAÇÃO ESPÍRITA BRASILEIRA \nDEPARTAMENTO EDITORIAL \n \n Rua Souza
> >> > > > > Valente, 17 \n20941-040 - Rio - RJ - Brasil \n \n  \nhttp://www.febnet.org.br/
> >> > > > >  \n  \n \n   \n Francisco Cândido Xavier - Sexo e
> >> > > > > Destino - pelo Espírito André Luiz \n \n  \n2 \n \n  \n \n \n \n Coleção
> >> > > > > \n“A Vida no Mundo Espiritual” \n"
> >> > > > >         ]
> >> > > > >       }
> >> > > > >     ]
> >> > > > >   },
> >> > > > >   "highlighting": {
> >> > > > >     "pdf1": {}
> >> > > > >   }
> >> > > > > }
> >> > > > >
> >> > > > > **For content it returns the whole PDF content (book); notice that
> >> > > > > the highlighting section is empty.
> >> > > > >
> >> > > > > I tried creating a new core with bin/solr create -c test, using the
> >> > > > > standard schema.xml and solrconfig.xml found in
> >> > > > > /solr/server/solr/configsets/basic_configs/conf
> >> > > > >
> >> > > > > But even so... it is not working as expected (I think).
> >> > > > >
> >> > > > >
> >> > > > > Would you know how to set this techproducts example to bring the
> >> > > > > snippets of text?
> >> > > > >
> >> > > > > The server only allows specific IP addresses for this port; if you
> >> > > > > would like, I could open it for you to check.
> >> > > > >
> >> > > > >
> >> > > > > Thanks again and best regards!
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > *Evert
> >> > > > >
> >> > > > >
> >> > > > > 2015-12-15 18:14 GMT-02:00 Erick Erickson <erickerick...@gmail.com>:
> >> > > > >
> >> > > > >> No, that's not what I meant. The highlight component adds a special
> >> > > > >> section to the return packet that will contain "snippets" of text
> >> > > > >> with highlights. You control how big those snippets are via various
> >> > > > >> parameters in the highlight component and they'll have the tags you
> >> > > > >> specify for highlighting.
> >> > > > >>
> >> > > > >> Your app needs to pull the information from the highlight portion of
> >> > > > >> the response packet rather than the document list. Just execute your
> >> > > > >> queries via cURL or a browser to see the structure of a response to
> >> > > > >> see what I mean.
> >> > > > >>
> >> > > > >> And note that you do _not_ need to return the fields you're
> >> > > > >> highlighting in the "fl" list so you do _not_ need to return the
> >> > > > >> entire document contents.
> >> > > > >>
> >> > > > >> What are you using to display the results anyway?
> >> > > > >>
> >> > > > >> Best,
> >> > > > >> Erick
> >> > > > >>
> >> > > > >> On Tue, Dec 15, 2015 at 10:02 AM, Evert R. <evert.ra...@gmail.com> wrote:
> >> > > > >> > Hi Erick,
> >> > > > >> >
> >> > > > >> > Thank you very much for the reply!!
> >> > > > >> >
> >> > > > >> > I do get back the full text, author, and a whole lot of stuff which
> >> > > > >> > doesn't really matter for my project.
> >> > > > >> >
> >> > > > >> > So, what you are saying is that Solr gets me back the full content
> >> > > > >> > and my application will handle the rest? Which means that, for all my
> >> > > > >> > books (pdf files), searching for a specific word will bring me the
> >> > > > >> > whole content of each book that matches the query, and my application
> >> > > > >> > (php) in this case... will take care of showing only part of the text
> >> > > > >> > (as in a highlight, as I was understanding it) and highlighting the
> >> > > > >> > keyword I was looking for?
> >> > > > >> >
> >> > > > >> > If so, Erick, you gave me a big help clearing this up... I thought I
> >> > > > >> > would do that with Solr in an easy way. =)
> >> > > > >> >
> >> > > > >> > Thanks for the attachments tip!
> >> > > > >> >
> >> > > > >> > Best regards,
> >> > > > >> >
> >> > > > >> > Evert
> >> > > > >> >
> >> > > > >> > 2015-12-15 14:56 GMT-02:00 Erick Erickson <erickerick...@gmail.com>:
> >> > > > >> >
> >> > > > >> >> How are you trying to display the results? Highlighting is a bit of
> >> > > > >> >> an odd beast. Assuming it's correctly configured, the response packet
> >> > > > >> >> will have a separate highlight section, it's the application's
> >> > > > >> >> responsibility to present that pleasingly.
> >> > > > >> >>
> >> > > > >> >> What _do_ you get back in the response?
> >> > > > >> >>
> >> > > > >> >> BTW, the mail server pretty aggressively strips attachments, yours
> >> > > > >> >> didn't come through.
> >> > > > >> >>
> >> > > > >> >> Best,
> >> > > > >> >> Erick
> >> > > > >> >>
> >> > > > >> >> On Tue, Dec 15, 2015 at 3:25 AM, Evert R. <evert.ra...@gmail.com> wrote:
> >> > > > >> >> > Hi there!
> >> > > > >> >> >
> >> > > > >> >> > It's my first installation; not sure if this is the right
> >> > > > >> >> > channel...
> >> > > > >> >> >
> >> > > > >> >> > Here are my steps:
> >> > > > >> >> >
> >> > > > >> >> > 1. Set up a basic install of solr 5.4.0
> >> > > > >> >> >
> >> > > > >> >> > 2. Create a new core through command line (bin/solr create -c test)
> >> > > > >> >> >
> >> > > > >> >> > 3. Post 2 files: 1 .docx and 2 .pdf (bin/post -c test /docs/test/)
> >> > > > >> >> >
> >> > > > >> >> > 4. Query over the browser and it brings the correct search, but it
> >> > > > >> >> > does not show the part of the text I am querying, the highlight.
> >> > > > >> >> >
> >> > > > >> >> >   I have already flagged the 'hl' option. But still it does not
> >> > > > >> >> > work...
> >> > > > >> >> >
> >> > > > >> >> > Example: I am looking for the word 'peace' in my pdf file (book). I
> >> > > > >> >> > have 4 matches for this word; it shows me the book name (pdf file)
> >> > > > >> >> > but does not show which part of the text has the word 'peace' in it.
> >> > > > >> >> >
> >> > > > >> >> >
> >> > > > >> >> > I am probably missing some configuration in schema.xml, which is
> >> > > > >> >> > missing from my folder.... /solr/server/solr/test/conf/
> >> > > > >> >> >
> >> > > > >> >> > Or even the solrconfig.xml...
> >> > > > >> >> >
> >> > > > >> >> > I have read a bunch of things about highlighting, checked these
> >> > > > >> >> > files, and copied the standard schema.xml to my core/conf folder, but
> >> > > > >> >> > still it does not bring the highlight.
> >> > > > >> >> >
> >> > > > >> >> >
> >> > > > >> >> > Attached a copy of my solrconfig.xml file.
> >> > > > >> >> >
> >> > > > >> >> >
> >> > > > >> >> > I am very sorry for this probably dumb and too basic question...
> >> > > > >> >> > It's the first time I see Solr live.
> >> > > > >> >> >
> >> > > > >> >> >
> >> > > > >> >> > Any help will be appreciated.
> >> > > > >> >> >
> >> > > > >> >> >
> >> > > > >> >> >
> >> > > > >> >> > Best regards,
> >> > > > >> >> >
> >> > > > >> >> >
> >> > > > >> >> > Evert Ramos
> >> > > > >> >> >
> >> > > > >> >> > evert.ra...@gmail.com
> >> > > > >> >> >
> >> > > > >> >>
> >> > > > >>
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
>
