Hi marotosg,
John's suggestion will definitely work (I recommend a copyField for
that analysis).

What happens in your use case if a word appears in more than one bag of
words (if that is possible at all)?
Do you expect to get back all the classes, scored in some way?
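
To make the question concrete, here is a minimal sketch in plain Python (not
Solr code) of what "all the classes, scored in some way" could look like. The
bags, the class names and the score_classes helper are all hypothetical, and
the score is just a count of matching tokens, which is only one of many
possible choices:

```python
from collections import Counter

# Hypothetical bags of words, one per class (illustrative only).
# Note that "score" belongs to both bags.
BAGS = {
    "sports": {"match", "goal", "team", "score"},
    "music": {"score", "band", "album", "concert"},
}


def score_classes(text, bags=BAGS):
    """Return (class, hits) pairs for every class with at least one hit.

    The score is the number of tokens in the text that belong to the
    class's bag; results are sorted by descending score, then by name.
    """
    tokens = Counter(text.lower().split())
    scores = {}
    for label, words in bags.items():
        hits = sum(count for tok, count in tokens.items() if tok in words)
        if hits:
            scores[label] = hits
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(score_classes("The team kept the score level until the final goal"))
# prints [('sports', 3), ('music', 1)]
```

Here the shared word "score" gives the music class one hit, so both classes
come back, with sports ranked first.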

In that case you may need a different approach, and the Solr Document
Classification should help.
At the moment the only available integration is the index-time one, which
means you have no control over human validation: Solr assigns the class
(or classes) and you just decide the output field.
The documentation was not very up to date; I have just updated it [1].

If you would prefer a different approach (including human validation), there
is a Jira issue for a request handler approach, which could be called by your
indexing application to ask for human feedback before the document is sent
to Solr. A contribution is welcome! [2]

Cheers

[1]  https://wiki.apache.org/solr/SolrClassification
[2]  https://issues.apache.org/jira/browse/SOLR-7738



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io