Two Peters (or rather a stupid English bloke who can't work out how to type
fancy accents :-) )

Sorry Péter (took me 10 minutes to work out I could cut and paste), my reply
was to the clustering post by Peter Sturge. Clustering sounds great, but
being able to define a thesaurus scheme exactly would be good too.
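
For what it's worth, here is a minimal Python sketch of what defining a
thesaurus scheme exactly could look like. The one-entry-per-line file layout,
the function names (parse_thesaurus, group_facet_labels, boosted_query) and the
boost values are all assumptions made for illustration; none of this is an
existing Solr feature. The entry reuses the ski example from further down the
thread.

```python
from collections import defaultdict

# Hypothetical thesaurus file content, one entry per line. The layout is an
# assumption, loosely based on "term, bt=>..., nt=>..., rt=>..., pt=>...".
THESAURUS_TEXT = """\
ski: pt=>skiing; bt=>mountain sports, sports; nt=>downhill skiing, telemark, cross country; rt=>snowboarding, winter holidays
"""


def parse_thesaurus(text):
    """Return {term: {relation: [related terms]}} for the assumed format."""
    thesaurus = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        term, _, rest = line.partition(":")
        relations = {}
        for part in rest.split(";"):
            rel, sep, targets = part.partition("=>")
            if not sep:
                continue  # ignore fragments without a "=>" separator
            relations[rel.strip().lower()] = [
                t.strip().lower() for t in targets.split(",") if t.strip()
            ]
        thesaurus[term.strip().lower()] = relations
    return thesaurus


def group_facet_labels(term, facet_labels, thesaurus):
    """Bucket facet labels by their relationship (bt, nt, rt, pt) to term."""
    relations = thesaurus.get(term.lower(), {})
    lookup = {t: rel for rel, targets in relations.items() for t in targets}
    groups = defaultdict(list)
    for label in facet_labels:
        # Labels the thesaurus does not know about end up under "other".
        groups[lookup.get(label.lower(), "other")].append(label)
    return dict(groups)


def boosted_query(term, thesaurus, boosts):
    """Build a Lucene-style query string, boosting terms per relationship."""
    clauses = ['"%s"' % term]
    for rel, targets in thesaurus.get(term.lower(), {}).items():
        boost = boosts.get(rel)
        if boost is None:
            continue  # relationship types the caller did not pick are left out
        clauses.extend('"%s"^%s' % (t, boost) for t in targets)
    return " OR ".join(clauses)


if __name__ == "__main__":
    th = parse_thesaurus(THESAURUS_TEXT)
    print(group_facet_labels("ski", ["Telemark", "Sports", "Hotels"], th))
    print(boosted_query("ski", th, {"pt": 2.0, "bt": 0.5}))
```

group_facet_labels gives the UI its broader/narrower/related buckets, and
boosted_query only expands the relationship types the caller passes in, so
nothing is guessed heuristically. The query string relies on the standard
Lucene term^boost syntax.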



2010/12/10 Péter Király <kirun...@gmail.com>

> Hi Lee,
>
> according to my vision the user could decide which relationship types
> he would like to attach to his search, and the application would call
> his attention to other possibilities. So there would be no heuristic
> method applied, because e.g. broader terms would cause lots of
> misleading results.
>
> Péter
>
> 2010/12/10 lee carroll <lee.a.carr...@googlemail.com>:
> > Hi Peter,
> >
> > That's way too clever for me :-)
> > Discovering thesaurus relationships would be fantastic, but it's not clear
> > what heuristics you would need to use to discover broader, narrower,
> > related documents etc. Although I might be doing the clustering down, I'm
> > sceptical about the accuracy.
> >
> > cheers Lee c
> >
> > On 10 December 2010 09:38, Peter Sturge <peter.stu...@gmail.com> wrote:
> >
> >> Hi Lee,
> >>
> >> Perhaps Solr's clustering component might be helpful for your use case?
> >> http://wiki.apache.org/solr/ClusteringComponent
> >>
> >>
> >>
> >>
> >> On Fri, Dec 10, 2010 at 9:17 AM, lee carroll
> >> <lee.a.carr...@googlemail.com> wrote:
> >> > Hi Chris,
> >> >
> >> > It's all a bit early in the morning for this mind :-)
> >> >
> >> > The question asked, in good faith, was: does Solr support or extend to
> >> > implementing a thesaurus? It looks like it does not, which is fine. It
> >> > does support synonyms and synonym rings, which is again fine. The ski
> >> > example was an illustration in response to a follow-up question for more
> >> > explanation on what a thesaurus is.
> >> >
> >> > An attempt at an answer to "why a thesaurus?" is below.
> >> >
> >> > Use case 1: improve facets
> >> >
> >> > Motivation
> >> > Unstructured lists of labels in facets offer a very poor user
> >> > experience. Similar to tag clouds, users find them arbitrary, without
> >> > focus and often overwhelming. Labels in facets which are grouped in
> >> > meaningful ways relevant to the user increase engagement, perceived
> >> > relevance and user satisfaction.
> >> >
> >> > Solution
> >> > A thesaurus of term relationships could be used to group facet labels
> >> >
> >> > Implementation
> >> > (er, completely out of my depth at this point)
> >> > Thesaurus relationships defined in a simple text file:
> >> > term, bt=>term,term nt=> term, term rt=>term, term, pt=>term
> >> > If a search specifies a facet to be returned, the field terms are
> >> > sorted into groups by reading the thesaurus: broader terms, narrower
> >> > terms, related terms etc.
> >> > These groups are returned as part of the response for the UI to display
> >> > facet labels as broader, narrower, related terms etc.
> >> >
> >> > Use case 2: Increase synonym search precision
> >> >
> >> > Motivation
> >> > Synonym rings do not allow differences between synonyms to be
> >> > identified. Synonyms are rarely exactly equivalent. This leads to a
> >> > decrease in search precision.
> >> >
> >> > Solution
> >> > Boost queries based on search term thesaurus relationships
> >> >
> >> > Implementation
> >> > (again, completely out of my depth here)
> >> > Allow terms in the index to be identified as bt, nt, .. terms of the
> >> > search term. Allow the query parser to boost terms differentially based
> >> > on these thesaurus relationships.
> >> >
> >> >
> >> >
> >> > As for the X and Y stuff I'm not sure; like I say, it's quite early in
> >> > the morning for me. I'm sure there may well be a different way of
> >> > achieving the above (but note it is more than a hierarchy). However, the
> >> > librarians have been doing this for 50 years now.
> >> >
> >> > Again though, just to repeat, this is hardly a killer for us. We've
> >> > looked at Solr for a project; created a prototype; generated tons of
> >> > questions, had them answered in the main by the docs, some on this list,
> >> > and been amazed at the fantastic results Solr has given us. In fact, with
> >> > a combination of keepwords and synonyms we have got a pretty nice simple
> >> > set of facet labels anyway (my motivation for the original question), so
> >> > our corpus at the moment does not really need a thesaurus! :-)
> >> >
> >> > Thanks Lee
> >> >
> >> >
> >> > On 9 December 2010 23:38, Chris Hostetter <hossman_luc...@fucit.org> wrote:
> >> >
> >> >>
> >> >>
> >> >> : a term can have a Preferred Term (PT), many Broader Terms (BT), many
> >> >> : Narrower Terms (NT), Related Terms (RT) etc
> >> >>         ...
> >> >> : User supplied Term is say : Ski
> >> >> :
> >> >> : Preferred term: Skiing
> >> >> : Broader terms could be : Ski and Snow Boarding, Mountain Sports,
> >> >> : Sports
> >> >> : Narrower terms: down hill skiing, telemark, cross country
> >> >> : Related terms: boarding, snow boarding, winter holidays
> >> >>
> >> >> I'm still lost.
> >> >>
> >> >> You've described a black box with some sample input ("Ski") and some
> >> >> corresponding sample output (PT=..., BT=..., NT=..., RT=....) -- but
> >> >> you haven't explained what you want to do with that black box.  Assuming
> >> >> such a black box existed in solr what are you expecting/hoping to do
> >> >> with it? how would such a black box modify solr's user experience?  what
> >> >> is your goal?
> >> >>
> >> >> Smells like an XY Problem...
> >> >> http://people.apache.org/~hossman/#xyproblem
> >> >>
> >> >> Your question appears to be an "XY Problem" ... that is: you are
> >> >> dealing with "X", you are assuming "Y" will help you, and you are
> >> >> asking about "Y" without giving more details about the "X" so that we
> >> >> can understand the full issue.  Perhaps the best solution doesn't
> >> >> involve "Y" at all?
> >> >> See Also: http://www.perlmonks.org/index.pl?node_id=542341
> >> >>
> >> >>
> >> >> -Hoss
> >> >>
> >> >
> >>
> >
>
