Although I am already discussing this with Dawid (the creator of Carrot2) on the Carrot2 mailing list, I wanted to post my problem to a wider audience.
I am using Solr 4.7 (on both Windows and Linux), and in Solr I am using a lingo-attributes.xml file that I saved from the Carrot2 Workbench. Note that for testing I have just one Solr index, and all queries are fired against it. The clusters I get are good in the Workbench but very poor in Solr. In the Jetty logs I can see:

    Loaded Solr resource: clustering/carrot2/lingo-attributes.xml

which indicates that my attribute file is being loaded. I am really confused about what accounts for the difference between the two outputs (Workbench vs. Solr). To reiterate, the data source is the same in both cases (a single Solr index and the same queries, with 100 results). This happens on both Linux and Windows.

Given below is my search component and request handler configuration:

    <searchComponent name="clustering"
                     enable="${solr.clustering.enabled:true}"
                     class="solr.clustering.ClusteringComponent" >
      <lst name="engine">
        <str name="name">lingo</str>

        <!-- Class name of a clustering algorithm compatible with the Carrot2 framework.

             Currently available open source algorithms are:
             * org.carrot2.clustering.lingo.LingoClusteringAlgorithm
             * org.carrot2.clustering.stc.STCClusteringAlgorithm
             * org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm

             See http://project.carrot2.org/algorithms.html for more information.

             A commercial algorithm Lingo3G (needs to be installed separately) is defined as:
             * com.carrotsearch.lingo3g.Lingo3GClusteringAlgorithm
          -->
        <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
        <str name="LingoClusteringAlgorithm.desiredClusterCountBase">30</str>

        <!-- Override location of the clustering algorithm's resources
             (attribute definitions and lexical resources).

             A directory from which to load algorithm-specific stop words,
             stop labels and attribute definition XMLs.

             For an overview of Carrot2 lexical resources, see:
             http://download.carrot2.org/head/manual/#chapter.lexical-resources

             For an overview of Lingo3G lexical resources, see:
             http://download.carrotsearch.com/lingo3g/manual/#chapter.lexical-resources
          -->
        <str name="carrot.resourcesDir">clustering/carrot2</str>
      </lst>
    </searchComponent>

    <!-- A request handler for demonstrating the clustering component

         This is purely as an example.

         In reality you will likely want to add the component to your
         already specified request handlers.
      -->
    <requestHandler name="/clustering"
                    enable="${solr.clustering.enabled:true}"
                    class="solr.SearchHandler">
      <lst name="defaults">
        <bool name="clustering">true</bool>
        <bool name="clustering.results">true</bool>
        <!-- Field name with the logical "title" of each document (optional) -->
        <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
        <str name="carrot.resourcesDir">clustering/carrot2</str>
        <str name="carrot.title">film_id</str>
        <!-- Field name with the logical "content" of each document (optional) -->
        <str name="carrot.snippet">description</str>
        <!-- Apply highlighter to the title/ content and use this for clustering. -->
        <bool name="carrot.produceSummary">true</bool>
        <!-- the maximum number of labels per cluster -->
        <!--<int name="carrot.numDescriptions">5</int>-->
        <!-- produce sub clusters -->
        <bool name="carrot.outputSubClusters">false</bool>
        <str name="rows">100</str>
      </lst>
      <arr name="last-components">
        <str>clustering</str>
      </arr>
    </requestHandler>
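
For anyone who wants to reproduce this, here is a minimal sketch of how a request to the /clustering handler configured above can be built. The base URL (host, port and core name) and the query string are placeholders rather than my actual values, and `build_clustering_url` is just an illustrative helper name. The `clustering` and `clustering.results` parameters are already handler defaults in the config; they are repeated only so the request is explicit.

```python
from urllib.parse import urlencode


def build_clustering_url(base_url, query, rows=100):
    """Build a GET URL for the /clustering request handler.

    base_url is the (assumed) Solr core URL, e.g. http://localhost:8983/solr/collection1
    query is the user query passed as the q parameter.
    rows matches the 100 results used for the comparison with the Workbench.
    """
    params = {
        "q": query,
        "rows": rows,
        "wt": "json",
        # Already set as defaults in the handler; repeated here for visibility.
        "clustering": "true",
        "clustering.results": "true",
    }
    return base_url.rstrip("/") + "/clustering?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder host, core name and query.
    print(build_clustering_url("http://localhost:8983/solr/collection1", "some query"))
```

Firing the same query with the same `rows` value from the Workbench (pointed at the same index) and through this URL is the comparison I am describing.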