Hi Ramdev,

Both of the clustering algorithms that ship with Solr (Lingo and STC) are
designed to allow one document to appear in more than one cluster, which
actually does make sense in many scenarios. There's no easy way to force
them to produce hard clusterings because this would require a complete
change in the way the algorithms work. If you need each document to belong
to exactly one cluster, you'd have to post-process the clusters to remove
the redundant document assignments. Alternatively, in the case of the Lingo
algorithm, you can try lowering the
"LingoClusteringAlgorithm.clusterMergingThreshold" attribute to some value in
the range of 0.2--0.5. If you do that, clusters containing overlapping documents
will get merged. For more information about this attribute, see:
http://download.carrot2.org/stable/manual/#section.attribute.LingoClusteringAlgorithm.clusterMergingThreshold
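
For example, a minimal sketch assuming the "default" Lingo engine from your
config (0.4 is only an illustrative value from the range above, not a
recommendation):

```xml
<lst name="engine">
  <str name="name">default</str>
  <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
  <!-- Assumed value for illustration; try values between 0.2 and 0.5 -->
  <str name="LingoClusteringAlgorithm.clusterMergingThreshold">0.4</str>
</lst>
```

As the comment in your config says, the same attribute key should also work as
a request parameter for a single query, which makes it easier to experiment
before changing solrconfig.xml.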

Cheers,

Staszek

On Wed, Mar 30, 2011 at 18:21, Markus Jelsma <markus.jel...@openindex.io> wrote:

> Yes, you can set engine-specific parameters. Check the comments in your
> snippet.
>
> > Hi:
> >   I recently included the Clustering component into Solr and updated the
> > requestHandler accordingly (in solrconfig.xml). Snippet of the config for
> > the clustering component:
> >
> >   <searchComponent
> >     name="clusteringComponent"
> >     enable="${solr.clustering.enabled:false}"
> >     class="org.apache.solr.handler.clustering.ClusteringComponent" >
> >     <!-- Declare an engine -->
> >     <lst name="engine">
> >       <!-- The name, only one can be named "default" -->
> >       <str name="name">default</str>
> >       <!--
> >            Class name of Carrot2 clustering algorithm. Currently available
> >            algorithms are:
> >
> >            * org.carrot2.clustering.lingo.LingoClusteringAlgorithm
> >            * org.carrot2.clustering.stc.STCClusteringAlgorithm
> >
> >            See http://project.carrot2.org/algorithms.html for the
> >            algorithm's characteristics. -->
> >       <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
> >       <!--
> >            Overriding values for Carrot2 default algorithm attributes. For
> >            a description of all available attributes, see:
> >            http://download.carrot2.org/stable/manual/#chapter.components. Use
> >            attribute key as name attribute of str elements below. These can be
> >            further overridden for individual requests by specifying attribute key as
> >            request parameter name and attribute value as parameter value.
> >         -->
> >       <str name="LingoClusteringAlgorithm.desiredClusterCountBase">20</str>
> >     </lst>
> >     <lst name="engine">
> >       <str name="name">stc</str>
> >       <str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
> >     </lst>
> >   </searchComponent>
> >
> > Snippet of the config for the requestHandler:
> >   <requestHandler name="standard" class="solr.SearchHandler" default="true">
> >      <!-- default values for query parameters -->
> >      <lst name="defaults">
> >        <str name="echoParams">explicit</str>
> >        <!--
> >        <int name="rows">10</int>
> >        <str name="fl">*</str>
> >        <str name="version">2.1</str>
> >         -->
> >        <bool name="clustering">true</bool>
> >        <str name="clustering.engine">default</str>
> >        <bool name="clustering.results">true</bool>
> >        <!-- The title field -->
> >        <str name="carrot.title">headline</str>
> >        <str name="carrot.url">pi</str>
> >        <!-- The field to cluster on -->
> >        <str name="carrot.snippet">headline</str>
> >        <!-- produce summaries -->
> >        <bool name="carrot.produceSummary">true</bool>
> >        <!-- the maximum number of labels per cluster -->
> >        <!--<int name="carrot.numDescriptions">5</int>-->
> >        <!-- produce sub clusters -->
> >        <bool name="carrot.outputSubClusters">false</bool>
> >      </lst>
> >     <arr name="last-components">
> >       <str>clusteringComponent</str>
> >     </arr>
> >   </requestHandler>
> >
> >
> > When I perform a search, I see that the cluster section within the Solr
> > results shows me results that are not quite consistent. There are two
> > documents that are each reported in two different clusters.
> >
> > Are there parameters that can be set that will prevent this from
> > happening?
> >
> >
> > Thanks much
> >
> > Ramdev
>
