That's the way faceting is designed to work. It counts the _documents_
that satisfy your query and contain the term, so if a word appears
multiple times in a doc, it's only counted once.
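
To make the difference concrete, here's a rough sketch in plain Python
(not Solr code). The function names, the regex tokenizer, and the two
sample docs are all made up for illustration, and a real Solr analyzer
will tokenize differently:

```python
import re
from collections import Counter

def tokenize(text):
    # Crude stand-in for an analyzer: lowercase, keep letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def facet_style_counts(docs):
    # Number of documents containing each term (what a field facet reports).
    counts = Counter()
    for doc in docs:
        counts.update(set(tokenize(doc)))  # set(): each term once per doc
    return counts

def occurrence_counts(docs):
    # Total number of times each term occurs across all documents.
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts

# Hypothetical sample data, not taken from any real index.
docs = ["my work, my peers, my talent", "the work I get"]

print(facet_style_counts(docs)["my"])  # documents containing "my"
print(occurrence_counts(docs)["my"])   # total occurrences of "my"
```

With these two docs, "my" gets a facet-style count of 1 (it's in one doc)
but an occurrence count of 3, which is the gap you're seeing.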

For the general use case, it'd be unsettling for a user to see a facet
count of 500, then click on it and discover that the number of matching
docs was really 345 or something.

Ahmet's hints might help, but I'd ask whether counting a word
multiple times per doc really satisfies the use case.

Best,
Erick

On Fri, Apr 29, 2016 at 7:10 AM, Ahmet Arslan <iori...@yahoo.com.invalid> wrote:
> Hi,
>
> Depending on your requirements, StatsComponent, TermsComponent, and
> LukeRequestHandler can also be used.
>
>
> https://cwiki.apache.org/confluence/display/solr/The+Terms+Component
> https://wiki.apache.org/solr/LukeRequestHandler
> https://cwiki.apache.org/confluence/display/solr/The+Stats+Component
> Ahmet
>
>
>
> On Friday, April 29, 2016 11:56 AM, "G, Rajesh" <r...@cebglobal.com> wrote:
> Hi,
>
> I am trying to implement a word 
> cloud<https://www.google.co.uk/imgres?imgurl=https%3A%2F%2Fwww.whitehouse.gov%2Fsites%2Fdefault%2Ffiles%2Fother%2Fsotu_wordle.png&imgrefurl=https%3A%2F%2Fwww.whitehouse.gov%2Fblog%2F2011%2F01%2F26%2Fstate-union-word-cloud-jobs-america-people-new&docid=eZ_HvQpd9FRBKM&tbnid=qyIc-elv6z-0iM%3A&w=895&h=406&bih=643&biw=1366&ved=0ahUKEwie_8XjurPMAhXLaRQKHWiFDFAQMwgyKAAwAA&iact=mrc&uact=8>
>   using Solr. The problem I have is that a Solr facet query ignores repeated
> words in a document, e.g.:
>
> I have indexed the text :
> It seems that the harder I work, the more work I get for the same 
> compensation and reward. The more work I take on gets absorbed into my 
> "normal" workload and I'm not recognized for working harder than my peers, 
> which makes me not want to work to my potential. I am very underwhelmed by 
> the evaluation process and bonus structure. I don't believe the current 
> structure rewards strong performers. I am confident that the company could 
> not hire someone with my talent to replace me if I left, but I don't think 
> the company realizes that.
>
> The indexed content contains the word "my" with a count of 3, but when I run
> the query
> http://localhost:8182/solr/dev/select?facet=true&facet.field=comments&rows=0&indent=on&q=questionid:3956&wt=json
> the count for the word "my" is 1 and not 3. Can you please help?
>
> Also, please suggest if there is a better way to implement a word cloud in
> Solr other than using facets.
>
>     "facet_fields":{
>       "comments":[
>         "absorbed",1,
>         "am",1,
>         "believe",1,
>         "bonus",1,
>         "company",1,
>         "compensation",1,
>         "confident",1,
>         "could",1,
>         "current",1,
>         "don't",1,
>         "evaluation",1,
>         "get",1,
>         "gets",1,
>         "harder",1,
>         "hire",1,
>         "i",1,
>         "i'm",1,
>         "left",1,
>         "makes",1,
>         "me",1,
>         "more",1,
>         "my",1,
>         "normal",1,
>         "peers",1,
>         "performers",1,
>         "potential",1,
>         "process",1,
>         "realizes",1,
>         "recognized",1,
>         "replace",1,
>         "reward",1,
>         "rewards",1,
>         "same",1,
>         "seems",1,
>         "someone",1,
>         "strong",1,
>         "structure",1,
>         "take",1,
>         "talent",1,
>         "than",1,
>         "think",1,
>         "underwhelmed",1,
>         "very",1,
>         "want",1,
>         "which",1,
>         "work",1,
>         "working",1,
>         "workload",1]
>     }
