The main objective here is to assign a title to the documents as they are being indexed.
We found that the cluster labels give a good indication of the key points of the documents, but I'm not sure whether we can get good cluster labels from a single document. Besides cluster labels, are there other methods we can use to assign a title?

Regards,
Edwin

On 10 June 2015 at 17:16, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:

> Hi Edwin,
> let's do this step by step.
>
> Clustering is a problem solved by unsupervised machine learning algorithms.
> The goal of clustering is to group a corpus of documents by similarity,
> trying to have meaningful groups for a human being.
> Solr currently provides different approaches for *Query Time Clustering*
> (also known as Online Clustering).
> There's an out-of-the-box integration that allows you to use clustering at
> query time on the query results.
> Different algorithms can be selected, mainly provided by Carrot2.
>
> These algorithms also provide a guess for the cluster name.
>
> Given this introduction, let me look at your problem.
>
> 1) The first part can be solved with a custom UpdateProcessor that will
> process the document and add the automatic new title.
> Now the problem is, how do we want to extract this new title?
> Honestly, I cannot understand how clustering can fit here …
>
> 2) Index time clustering is not yet provided in Solr (I remember there was
> only an interface ready, but no implementation).
> You should cluster the content before indexing it in Solr using a machine
> learning library.
> Index time clustering is delicate. What will happen at the next re-index?
> Should we cluster everything again?
> This topic must be investigated more.
>
> Anyway, let me know, as the original problem may not require
> clustering.
>
> Cheers
>
>
> 2015-06-10 4:13 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
>
> > Hi,
> >
> > I'm currently using Solr 5.1, and I'm thinking of ways to allow the system
> > to automatically give the rich-text documents that are being indexed a
> > title automatically, instead of user entering it in manually, as we might
> > have to index a whole folder of documents together, so it is not wise for
> > the user to enter the title one by one.
> >
> > I would like to check, if it's possible to run the clustering, get the
> > results, and use the top score label to be the title of the document?
> > Apparently, we need to run the clustering prior to the indexing, so I'm not
> > sure if that is possible.
> >
> >
> > Regards,
> > Edwin
> >
>
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>
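
On the question of assigning a title without clustering, one option is a simple per-document heuristic run before the document is sent to Solr. The sketch below is only an illustration under stated assumptions, not something proposed in this thread: it uses the first short non-empty line of the extracted text as the title, and otherwise falls back to the most frequent non-stopword terms. The function name `guess_title`, the `STOPWORDS` set, and the length and term-count defaults are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "for", "on", "with", "that", "this", "it", "as", "be", "by"}


def guess_title(text, max_len=80, top_terms=3):
    """Return a candidate title for one document's extracted plain text."""
    # Heuristic 1: the first non-empty line, if it is short enough to be a heading.
    for line in text.splitlines():
        line = line.strip()
        if line:
            if len(line) <= max_len:
                return line
            break  # first line is too long to be a heading; fall through

    # Heuristic 2: join the most frequent non-stopword terms (a stand-in for
    # a cluster label, which needs more than one document to be meaningful).
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOPWORDS and len(w) > 2]
    common = [w for w, _ in Counter(words).most_common(top_terms)]
    return " ".join(common).title()
```

The same logic could in principle live in a custom UpdateProcessor, as suggested earlier in the thread, so that the title field is filled in at index time; whether a heuristic this simple gives acceptable titles depends on the documents.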