Hi Alisa,

This was a bit too hard for me to grok on a first pass... then I saw your related blog post, which includes the actual sample data and makes it clearer.
More comments inline:

On Wed, Apr 20, 2016 at 2:29 PM, Alisa Z. <prol...@mail.ru> wrote:
> Hi all,
>
> I have been stretching some of Solr's capabilities for nested document handling
> and I've come up with the following issue...
>
> Let's say I have the following structure:
>
> {
>   "blog-posts":{              //level 1
>     "leaf-fields":[
>       "date",
>       "author"],
>     "title":{                 //level 2
>       "leaf-fields":[ "text"],
>       "keywords":{            //level 3
>         "leaf-fields":[
>           "text",
>           "type"]
>       }
>     },
>     "body":{                  //level 2
>       "leaf-fields":[ "text"],
>       "keywords":{            //level 3
>         "leaf-fields":[
>           "text",
>           "type"]
>       }
>     },
>     "comments":{              //level 2
>       "leaf-fields":[
>         "date",
>         "author",
>         "text",
>         "sentiment"
>       ],
>       "keywords":{            //level 3
>         "leaf-fields":[
>           "text",
>           "type"]
>       },
>       "replies":{             //level 3
>         "leaf-fields":[
>           "date",
>           "author",
>           "text",
>           "sentiment"],
>         "keywords":{          //level 4
>           "leaf-fields":[
>             "text",
>             "type"]
>         }}}}}
>
>
> And I want to know the distribution of all readers' keywords (levels 3 and 4)
> by comments (level 2).
> In the JSON Facet API I tried this:
>
> curl http://localhost:8983/solr/my_index/query -d
> 'q=path:2.blog-posts.comments&rows=0&
> json.facet={
>   filter_by_child_type :{
>     type:query,
>     q:"path:*comments*keywords",
>     domain: { blockChildren : "path:2.blog-posts.comments" },
>     facet:{
>       top_keywords : {
>         type: terms,
>         field: text,
>         sort: "counts_by_comments desc",
>         facet: {
>           counts_by_comments: "unique(_root_)"  // I suspect it should be a different field, not _root_,
>                                                  // but which field would it be for an intermediate document?
>         }}}}}'
>
> This gives me the wrong results: it aggregates by posts, not by comments
> (it's a toy data set, so I know that the correct answer for "Solr" is 3 when
> faceted by comments).

Yeah, this type of thing isn't currently directly supported, but SOLR-8998 should address that. You can currently hack around it (for simple counts) using unique(), as you've discovered, but you need a unique ID at the right level to get the right count.
_root_ is unique per blog post, which is why you get the number of posts (as opposed to the number of level-2 comments). You could add a "level2_comment_id" field to the level-2 comments and their children, and then use unique() on that.

-Yonik

> {
>   "response":{"numFound":3,"start":0,"docs":[]
>   },
>   "facets":{
>     "count":3,
>     "filter_by_child_type":{
>       "count":9,
>       "top_keywords":{
>         "buckets":[{
>             "val":"Elasticsearch",
>             "count":2,
>             "counts_by_comments":2},
>           {
>             "val":"Solr",
>             "count":5,
>             "counts_by_comments":2},   // here the count by "comments" should be 3
>           {
>             "val":"Solr 5.5",
>             "count":1,
>             "counts_by_comments":1},
>           {
>             "val":"feature",
>             "count":1,
>             "counts_by_comments":1}]}}}}
>
>
> Am I writing the query wrong?
>
>
> By the way, Block Join Faceting works fine for this:
> bjqfacet?q={!parent%20which=path:2.blog-posts.comments}path:*.comments*keywords&rows=0&facet=true&child.facet.field=text&wt=json&indent=true
>
> {
>   "response":{"numFound":3,"start":0,"docs":[]
>   },
>   "facet_counts":{
>     "facet_queries":{},
>     "facet_fields":{
>       "text":[
>         "Elasticsearch",2,
>         "Solr",3,              // correct result
>         "Solr 5.5",1,
>         "feature",1]},
>     "facet_dates":{},
>     "facet_ranges":{},
>     "facet_intervals":{},
>     "facet_heatmaps":{}}}
>
> But we've already discussed that it returns too much stuff: there's no way to set
> limits or order by counts :( That's why I want to see whether it's possible
> to do this directly with the JSON Facet API.
>
> Thank you in advance!
>
> --
> Alisa Zhila
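The sketch below shows why the level of the ID matters and what the suggested "level2_comment_id" field would do. It is a toy Python model, not Solr code. It assumes the following:

* A small hypothetical data set, not the sample data from the blog post. It is shaped only to mirror the "Solr" numbers in this thread: 5 keyword docs spread over 3 comments in 2 posts.
* Hypothetical path values modeled on the `path:2.blog-posts.comments` convention used in the query.
* The `_childDocuments_` key that Solr's JSON update format uses for nested documents.

The code copies each comment's id onto the comment and all of its descendants. It then counts, per keyword, distinct `_root_` values against distinct `level2_comment_id` values. This is roughly what `unique()` computes inside each terms bucket.

```python
"""Toy model of the unique() workaround: stamp a level2_comment_id on each
comment and all of its descendants, then compare unique(_root_) with
unique(level2_comment_id) per keyword. Data and path values are hypothetical."""
from collections import defaultdict

COMMENT_PATH = "2.blog-posts.comments"  # level-2 path, as in the query above


def keyword(doc_id, path, text):
    """Build a (hypothetical) keyword leaf document."""
    return {"id": doc_id, "path": path, "text": text}


# Hypothetical blocks: posts -> comments -> (keywords, replies -> keywords).
POSTS = [
    {"id": "p1", "path": "1.blog-posts", "_childDocuments_": [
        {"id": "c1", "path": COMMENT_PATH, "_childDocuments_": [
            keyword("k1", "3.blog-posts.comments.keywords", "Solr"),
            {"id": "r1", "path": "3.blog-posts.comments.replies", "_childDocuments_": [
                keyword("k2", "4.blog-posts.comments.replies.keywords", "Solr"),
            ]},
        ]},
        {"id": "c2", "path": COMMENT_PATH, "_childDocuments_": [
            keyword("k3", "3.blog-posts.comments.keywords", "Solr"),
        ]},
    ]},
    {"id": "p2", "path": "1.blog-posts", "_childDocuments_": [
        {"id": "c3", "path": COMMENT_PATH, "_childDocuments_": [
            keyword("k4", "3.blog-posts.comments.keywords", "Solr"),
            keyword("k5", "3.blog-posts.comments.keywords", "Elasticsearch"),
            {"id": "r2", "path": "3.blog-posts.comments.replies", "_childDocuments_": [
                keyword("k6", "4.blog-posts.comments.replies.keywords", "Solr"),
            ]},
        ]},
    ]},
]


def stamp_level2_ids(doc, comment_id=None):
    """Copy the enclosing comment's id onto the comment and every descendant."""
    if doc.get("path") == COMMENT_PATH:
        comment_id = doc["id"]
    if comment_id is not None:
        doc["level2_comment_id"] = comment_id
    for child in doc.get("_childDocuments_", []):
        stamp_level2_ids(child, comment_id)


def flatten(doc, root_id):
    """Yield every document in a block with _root_ set to the top-level id."""
    flat = {k: v for k, v in doc.items() if k != "_childDocuments_"}
    flat["_root_"] = root_id
    yield flat
    for child in doc.get("_childDocuments_", []):
        yield from flatten(child, root_id)


def keyword_stats(docs):
    """Per keyword text: (doc count, unique(_root_), unique(level2_comment_id))."""
    roots, comments, counts = defaultdict(set), defaultdict(set), defaultdict(int)
    for d in docs:
        # Same idea as the query's "path:*comments*keywords" filter.
        if "comments" in d["path"] and d["path"].endswith("keywords"):
            counts[d["text"]] += 1
            roots[d["text"]].add(d["_root_"])
            comments[d["text"]].add(d["level2_comment_id"])
    return {t: (counts[t], len(roots[t]), len(comments[t])) for t in counts}


if __name__ == "__main__":
    for post in POSTS:
        stamp_level2_ids(post)
    all_docs = [d for post in POSTS for d in flatten(post, post["id"])]
    for text, (count, by_root, by_comment) in sorted(keyword_stats(all_docs).items()):
        print(f"{text}: count={count} unique(_root_)={by_root} "
              f"unique(level2_comment_id)={by_comment}")
```

In this toy data "Solr" comes out as count=5, unique(_root_)=2 and unique(level2_comment_id)=3. This matches the pattern described in the thread. In a real index the field would have to be populated at index time, for example in whatever code builds the nested documents. The facet above would then use `counts_by_comments: "unique(level2_comment_id)"` instead of `unique(_root_)`.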