Hi marotosg, John's suggestion will definitely work (I recommend using a copyField for that analysis).
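To make the overlap question below concrete, here is a minimal Python sketch of what "getting back all the classes, scored in some way" could look like. It is not Solr code and not how the Solr classifier works internally; the bags, the class names, and the `score_classes` helper are all made up for illustration, and the score is just a count of matching terms.

```python
from collections import Counter

# Hypothetical bags of words, one per class; "engine" is shared on purpose.
BAGS = {
    "cars": {"engine", "wheel", "brake"},
    "search": {"engine", "index", "query"},
}

def score_classes(text, bags=BAGS):
    """Return (class, score) pairs for every class with at least one matching term."""
    # Naive whitespace tokenization; a real analysis chain would do much more.
    tokens = Counter(text.lower().split())
    scores = {}
    for label, bag in bags.items():
        # Score = number of token occurrences that belong to this class's bag.
        hits = sum(count for term, count in tokens.items() if term in bag)
        if hits:
            scores[label] = hits
    # Highest score first; ties are broken alphabetically by class name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

print(score_classes("search engine query"))
```

With these made-up bags, the shared term "engine" makes both classes match, and the call prints `[('search', 2), ('cars', 1)]`. Whether you want all classes back or only the top one is exactly the design decision to settle first.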
What happens in your use case if a word is shared by more than one bag of words (if that is possible at all)? Do you expect to get back all the matching classes, scored in some way? In that case you may need a different approach, and Solr Document Classification should help.

At the moment the only available integration is the index-time one, which means you have no control over human validation: Solr assigns the class (or classes) and you only decide the output field. The documentation was not very up to date, so I have just updated it [1].

If you would prefer a different approach (including human validation), there is a Jira issue for a request handler approach, which could be called by your indexing application to ask for human feedback before the document is sent to Solr. A contribution is welcome! [2]

Cheers

[1] https://wiki.apache.org/solr/SolrClassification
[2] https://issues.apache.org/jira/browse/SOLR-7738

-----
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io