It depends a lot on what the documents are. Some document formats have
metadata that stores a title. Perhaps you can just extract that.
If not, once you've extracted the content, perhaps you could just have a
special field that is the first n words (followed by an ellipsis).
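As a rough illustration of the "metadata title, else first n words" idea, here is a minimal sketch that could run client-side before the document is sent to Solr. The class and method names (`TitleFallback`, `chooseTitle`, `firstWords`) are hypothetical, and it assumes the metadata title and body text have already been extracted (e.g. by whatever parser you use for the rich-text files).

```java
import java.util.Arrays;

/** Hypothetical helper: derive a title for a document before indexing it. */
public class TitleFallback {

    /**
     * Prefer a non-blank metadata title; otherwise fall back to the first
     * maxWords words of the extracted body text.
     */
    public static String chooseTitle(String metadataTitle, String bodyText, int maxWords) {
        if (metadataTitle != null && !metadataTitle.isBlank()) {
            return metadataTitle.trim();
        }
        return firstWords(bodyText, maxWords);
    }

    /** First maxWords words, followed by "..." only when the text was truncated. */
    public static String firstWords(String text, int maxWords) {
        if (text == null || text.isBlank()) {
            return "";
        }
        // Split on any run of whitespace (spaces, tabs, newlines).
        String[] words = text.trim().split("\\s+");
        if (words.length <= maxWords) {
            return String.join(" ", words);
        }
        return String.join(" ", Arrays.copyOfRange(words, 0, maxWords)) + "...";
    }

    public static void main(String[] args) {
        // No metadata title, so the first 4 words of the body are used.
        System.out.println(chooseTitle(null, "Quarterly report\n on sales  in the north region", 4));
    }
}
```

The resulting string would then go into whatever title field your schema defines; the word limit is a tuning choice, not a Solr setting.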
If you use a
The main objective here is actually to assign a title to the documents as
they are being indexed.
We actually found that the cluster labels provide good information about
the key points of the documents, but I'm not sure if we can get good
cluster labels from a single document.
Besides
Hi Edwin,
let's do this step by step.
Clustering is a problem solved by unsupervised machine learning algorithms.
The goal of clustering is to group a corpus of documents by similarity,
trying to produce groups that are meaningful to a human being.
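To make "group by similarity" concrete, here is a minimal sketch of one common document similarity measure: cosine similarity between term-frequency vectors. It is only an illustration of the idea; it is not the algorithm Solr's clustering component uses, the tokenizer is deliberately naive, and the class name `CosineSimilarity` is hypothetical. It also hints at why a single document gives a clustering algorithm nothing to group.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative only: cosine similarity between two texts' term-frequency vectors. */
public class CosineSimilarity {

    // Naive tokenizer: lowercase, split on runs of non-letter/non-digit characters.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0;
        for (int count : v.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Dot product divided by the product of lengths; 0 when either text has no terms.
    static double cosine(String a, String b) {
        Map<String, Integer> va = termFrequencies(a);
        Map<String, Integer> vb = termFrequencies(b);
        double dot = 0;
        for (Map.Entry<String, Integer> e : va.entrySet()) {
            Integer other = vb.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(va);
        double normB = norm(vb);
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    public static void main(String[] args) {
        String d1 = "Solr indexes rich-text documents";
        String d2 = "Indexing rich-text documents with Solr";
        String d3 = "Recipes for baking bread";
        System.out.printf("d1~d2 = %.2f%n", cosine(d1, d2));
        System.out.printf("d1~d3 = %.2f%n", cosine(d1, d3));
    }
}
```

A clustering algorithm would put d1 and d2 in the same group and d3 elsewhere; real implementations add stemming, stop-word removal and term weighting such as TF-IDF.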
Solr currently provides different approaches for *Query
I agree with Upayavira:
title extraction is an activity independent of Solr.
Furthermore, I would say it's easy to extract the title before the Solr
indexing stage.
By the time the content arrives at the Solr update processors, it is already
a String.
If you want to do some clever title extraction,
Hi,
I'm currently using Solr 5.1, and I'm thinking of ways to allow the system
to automatically give the rich-text documents that are being indexed a
title, instead of the user entering it manually, as we might
have to index a whole folder of documents together, so it is not wise for