On Fri, 2016-04-29 at 08:55 +0000, G, Rajesh wrote:
> I am trying to implement word
> cloud<https://www.google.co.uk/imgres?imgurl=https%3A%2F%2Fwww.whitehouse.gov%2Fsites%2Fdefault%2Ffiles%2Fother%2Fsotu_wordle.png&imgrefurl=https%3A%2F%2Fwww.whitehouse.gov%2Fblog%2F2011%2F01%2F26%2Fstate-union-word-cloud-jobs-america-people-new&docid=eZ_HvQpd9FRBKM&tbnid=qyIc-elv6z-0iM%3A&w=895&h=406&bih=643&biw=1366&ved=0ahUKEwie_8XjurPMAhXLaRQKHWiFDFAQMwgyKAAwAA&iact=mrc&uact=8>
> using Solr. The problem I have is Solr facet query ignores repeated words
> in a document eg.

Use a combination of faceting and stats:

1) Resolve candidate words with faceting, just as you have already done.
2) Create a stats-request with the same q as you used for faceting, with a
   termfreq-function for each term in your facet result.

Working example from the techproducts-demo that comes with Solr:

http://localhost:8983/solr/techproducts/select
?q=name%3Addr%0A
&fl=name&wt=json&indent=true
&stats=true
&stats.field={!sum=true%20func}termfreq(%27name%27,%20%27ddr%27)
&stats.field={!sum=true%20func}termfreq(%27name%27,%20%271GB%27)

where 'name' is the field ('comments' in your setup) and 'ddr' and '1GB'
are two terms ('absorbed', 'am', 'believe' etc. in your setup).

The result will be something like

"response": {
    "numFound": 3,
...
"stats": {
    "stats_fields": {
        "termfreq('name', 'ddr')": {
            "sum": 6
        },
        "termfreq('name', '1GB')": {
            "sum": 3
        }
    }
}

- Toke Eskildsen, State and University Library, Denmark
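
For reference, step 2 could be scripted roughly as below. This is only a sketch under some assumptions: the helper names (termfreq_key, build_stats_params, build_url, extract_counts) are made up for illustration, the terms are assumed to come from your own facet result, and the lookup assumes the keys under stats_fields match the function string exactly as it was sent, as they do in the example above. Field names or terms containing a single quote are not handled.

```python
from urllib.parse import urlencode


def termfreq_key(field, term):
    # Function string in the same form as the example request,
    # e.g. termfreq('name', 'ddr'). Assumes no single quotes in field/term.
    return f"termfreq('{field}', '{term}')"


def build_stats_params(query, field, terms):
    # Same q as the facet request, plus one stats.field per candidate term.
    params = [("q", query), ("wt", "json"), ("stats", "true")]
    for term in terms:
        params.append(("stats.field", "{!sum=true func}" + termfreq_key(field, term)))
    return params


def build_url(select_url, params):
    # urlencode takes care of escaping braces, quotes and spaces.
    return select_url + "?" + urlencode(params)


def extract_counts(response, field, terms):
    # Map each term to the summed term frequency from the parsed JSON response.
    # Assumption: Solr echoes the function string back as the stats key.
    stats = response.get("stats", {}).get("stats_fields", {})
    counts = {}
    for term in terms:
        entry = stats.get(termfreq_key(field, term))
        if entry is not None:
            counts[term] = entry["sum"]
    return counts


if __name__ == "__main__":
    terms = ["ddr", "1GB"]  # in practice: the terms from your facet result
    url = build_url(
        "http://localhost:8983/solr/techproducts/select",
        build_stats_params("name:ddr", "name", terms),
    )
    print(url)
```

Fetch the printed URL with any HTTP client, parse the JSON body and pass it to extract_counts; the resulting term-to-sum mapping can be used as the weights for the word cloud.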