Hi Yonik,

Thanks a lot for your response.
I have discussed this with Mikhail Khludnev already and tried this suggestion. Here's what I've got:

sentiment: positive
author: Bob
text: Great post about Solr
2.blog-posts.comments-id: 10735-23004 //this is a new field; the field name is different on each level for each type, values are unique
date: 2015-04-10T11:30:00Z
path: 2.blog-posts.comments
id: 10735-23004

Query:

curl http://localhost:8985/solr/solr_nesting_unique/query -d 'q=path:2.blog-posts.comments&rows=0&
json.facet={
  filter_by_child_type :{
    type:query,
    q:"path:*comments*keywords",
    domain: { blockChildren : "path:2.blog-posts.comments" },
    facet:{
      top_entity_text : {
        type: terms,
        field: text,
        limit: 10,
        sort: "counts_by_comments desc",
        facet: {
          counts_by_comments: "unique (2.blog-posts.comments-id )" // changed
        }}}}}'

Response:

"response":{"numFound":3,"start":0,"docs":[]
},
"facets":{
  "count":3,
  "filter_by_child_type":{
    "count":9,
    "top_entity_text":{
      "buckets":[{
          "val":"Elasticsearch",
          "count":2,
          "counts_by_comments":0},
        {
          "val":"Solr",
          "count":5,
          "counts_by_comments":0},
        {
          "val":"Solr 5.5",
          "count":1,
          "counts_by_comments":0},
        {
          "val":"feature",
          "count":1,
          "counts_by_comments":0}]}}}}

So unless I messed something up, or the field name does not look "canonical" (it was fast to generate, and it is accepted in a normal query: http://localhost:8985/solr/solr_nesting_unique/query?q=2.blog-posts.body-id :* ), I think that it's just a JSON Facet API limitation...

Best,
--Alisa

>Friday, 22 April 2016, 9:55 -04:00 from Yonik Seeley <ysee...@gmail.com>:
>
>Hi Alisa,
>This was a bit too hard for me to grok on a first pass... then I saw
>your related blog post which includes the actual sample data and makes
>it more clear.
>
>More comments inline:
>
>On Wed, Apr 20, 2016 at 2:29 PM, Alisa Z. < prol...@mail.ru > wrote:
>> Hi all,
>>
>> I have been stretching some of SOLR's capabilities for nested document
>> handling and I've come up with the following issue...
>>
>> Let's say I have the following structure:
>>
>> {
>>   "blog-posts":{ //level 1
>>     "leaf-fields":[
>>       "date",
>>       "author"],
>>     "title":{ //level 2
>>       "leaf-fields":[ "text"],
>>       "keywords":{ //level 3
>>         "leaf-fields":[
>>           "text",
>>           "type"]
>>       }
>>     },
>>     "body":{ //level 2
>>       "leaf-fields":[ "text"],
>>       "keywords":{ //level 3
>>         "leaf-fields":[
>>           "text",
>>           "type"]
>>       }
>>     },
>>     "comments":{ //level 2
>>       "leaf-fields":[
>>         "date",
>>         "author",
>>         "text",
>>         "sentiment"
>>       ],
>>       "keywords":{ //level 3
>>         "leaf-fields":[
>>           "text",
>>           "type"]
>>       },
>>       "replies":{ //level 3
>>         "leaf-fields":[
>>           "date",
>>           "author",
>>           "text",
>>           "sentiment"],
>>         "keywords":{ //level 4
>>           "leaf-fields":[
>>             "text",
>>             "type"]
>>         }}}}}
>>
>>
>> And I want to know the distribution of all readers' keywords (levels 3 and
>> 4) by comments (level 2).
>> In JSON Facet API I tried this:
>>
>> curl http://localhost:8983/solr/my_index/query -d
>> 'q=path:2.blog-posts.comments&rows=0&
>> json.facet={
>>   filter_by_child_type :{
>>     type:query,
>>     q:"path:*comments*keywords",
>>     domain: { blockChildren : "path:2.blog-posts.comments" },
>>     facet:{
>>       top_keywords : {
>>         type: terms,
>>         field: text,
>>         sort: "counts_by_comments desc",
>>         facet: {
>>           counts_by_comments: "unique(_root_)" // I suspect it should be
>> a different field, not _root_, but what would it be for an intermediate document?
>>         }}}}}'
>>
>> Which gives me the wrong results: it aggregates by posts, not by comments
>> (it's a toy data set, so I know that the correct answer for "Solr" is 3 when
>> faceted by comments)
>
>
>Yeah, this type of thing isn't currently directly supported, but
>SOLR-8998 should address that.
>You can currently hack around it (for simple counts) using unique(),
>as you've discovered, but you need a unique ID at the right level to
>get the right count.
>
>_root_ is unique for blog posts, hence that's why you get numbers of
>posts (as opposed to numbers of level-2 comments).
>You could add a "level2_comment_id" field to the level 2 comments and
>their children, and then use unique() on that.
>
>-Yonik
>
>
>> {
>>   "response":{"numFound":3,"start":0,"docs":[]
>>   },
>>   "facets":{
>>     "count":3,
>>     "filter_by_child_type":{
>>       "count":9,
>>       "top_keywords":{
>>         "buckets":[{
>>             "val":"Elasticsearch",
>>             "count":2,
>>             "counts_by_comments":2},
>>           {
>>             "val":"Solr",
>>             "count":5,
>>             "counts_by_comments":2}, //here the count by
>> "comments" should be 3
>>           {
>>             "val":"Solr 5.5",
>>             "count":1,
>>             "counts_by_comments":1},
>>           {
>>             "val":"feature",
>>             "count":1,
>>             "counts_by_comments":1}]}}}}
>>
>>
>> Am I writing the query wrong?
>>
>>
>> By the way, Block Join Faceting works fine for this:
>> bjqfacet?q={!parent%20which=path:2.blog-posts.comments}path:*.comments*keywords&rows=0&facet=true&child.facet.field=text&wt=json&indent=true
>>
>> {
>>   "response":{"numFound":3,"start":0,"docs":[]
>>   },
>>   "facet_counts":{
>>     "facet_queries":{},
>>     "facet_fields":{
>>       "text":[
>>         "Elasticsearch",2,
>>         "Solr",3, //correct result
>>         "Solr 5.5",1,
>>         "feature",1]},
>>     "facet_dates":{},
>>     "facet_ranges":{},
>>     "facet_intervals":{},
>>     "facet_heatmaps":{}}}
>>
>> But we've already discussed that it returns too much stuff: no way to put
>> limits or order by counts :( That's why I want to see whether it's possible
>> to do this directly with the JSON Facet API.
>>
>> Thank you in advance!
>>
>> --
>> Alisa Zhila
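
For reference, Yonik's suggestion in the quoted reply (add a "level2_comment_id" field to each level-2 comment and to everything below it, then facet with unique() on that field) could be prepared at indexing time roughly as follows. This is only a sketch under assumptions: the function name tag_with_comment_id is hypothetical, the documents are assumed to be nested under Solr's _childDocuments_ key in the JSON update format, and every id and path value other than 10735-23004 and 2.blog-posts.comments is made up for illustration.

```python
def tag_with_comment_id(doc, comment_path="2.blog-posts.comments",
                        field="level2_comment_id", current=None):
    """Copy the id of the enclosing level-2 comment onto that comment
    and onto all of its descendants (keywords, replies, reply keywords)."""
    if doc.get("path") == comment_path:
        # Entering a level-2 comment: its own id becomes the shared value.
        current = doc["id"]
    if current is not None:
        doc[field] = current
    for child in doc.get("_childDocuments_", []):
        tag_with_comment_id(child, comment_path, field, current)
    return doc


# Hypothetical toy post: one comment (with a keyword and a reply) and a body.
post = {
    "id": "10735", "path": "1.blog-posts",
    "_childDocuments_": [
        {"id": "10735-23004", "path": "2.blog-posts.comments",
         "text": "Great post about Solr",
         "_childDocuments_": [
             {"id": "10735-23004-1",
              "path": "3.blog-posts.comments.keywords", "text": "Solr"},
             {"id": "10735-23004-2",
              "path": "3.blog-posts.comments.replies",
              "_childDocuments_": [
                  {"id": "10735-23004-2-1",
                   "path": "4.blog-posts.comments.replies.keywords",
                   "text": "Solr"},
              ]},
         ]},
        {"id": "10735-1", "path": "2.blog-posts.body", "text": "Post body"},
    ],
}

tagged = tag_with_comment_id(post)
comment = tagged["_childDocuments_"][0]
reply_keyword = comment["_childDocuments_"][1]["_childDocuments_"][0]
```

With documents tagged this way, the sub-facet in the query would read counts_by_comments: "unique(level2_comment_id)", so both keyword documents in the toy post above count toward the same single comment. Documents outside any comment (the post itself, the body) are left untagged.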