Hi Alisa,
This was a bit too hard for me to grok on a first pass... then I saw
your related blog post, which includes the actual sample data and makes
it clearer.

 More comments inline:

On Wed, Apr 20, 2016 at 2:29 PM, Alisa Z. <prol...@mail.ru> wrote:
>  Hi all,
>
> I have been stretching some SOLR's capabilities for nested documents handling 
> and I've come up with the following issue...
>
> Let's say I have the following structure:
>
> {
> "blog-posts":{                      //level 1
>     "leaf-fields":[
>         "date",
>         "author"],
>     "title":{                       //level 2
>         "leaf-fields":[ "text"],
>         "keywords":{                //level 3
>             "leaf-fields":[
>                 "text",
>                 "type"]
>             }
>         },
>     "body":{                        //level 2
>         "leaf-fields":[ "text"],
>         "keywords":{                //level 3
>             "leaf-fields":[
>                 "text",
>                 "type"]
>             }
>         },
>     "comments":{                    //level 2
>         "leaf-fields":[
>             "date",
>             "author",
>             "text",
>             "sentiment"
>             ],
>         "keywords":{                //level 3
>             "leaf-fields":[
>                 "text",
>                 "type"]
>             },
>         "replies":{                 //level 3
>             "leaf-fields":[
>                 "date",
>                 "author",
>                 "text",
>                 "sentiment"],
>             "keywords":{            //level 4
>                 "leaf-fields":[
>                     "text",
>                     "type"]
>                 }}}}}
>
>
> And I want to know the distribution of all readers' keywords (levels 3 and 4) 
> by comments (level 2).
> In JSON Facet API I tried this:
>
> curl http://localhost:8983/solr/my_index/query -d 
> 'q=path:2.blog-posts.comments&rows=0&
> json.facet={
>   filter_by_child_type :{
>     type:query,
>     q:"path:*comments*keywords",
>     domain: { blockChildren : "path:2.blog-posts.comments" },
>     facet:{
>       top_keywords : {
>         type: terms,
>         field: text,
>         sort: "counts_by_comments desc",
>         facet: {
>            counts_by_comments: "unique(_root_)"    // I suspect it should be a different field, not _root_, but what would it be for an intermediate document?
>          }}}}}'
>
> This gives me the wrong results: it aggregates by posts, not by comments 
> (it's a toy data set, so I know that the correct answer for "Solr" is 3 when 
> faceted by comments)


Yeah, this type of thing isn't currently directly supported, but
SOLR-8998 should address that.
You can currently hack around it (for simple counts) using unique(),
as you've discovered, but you need a unique ID at the right level to
get the right count.

_root_ is unique per blog post, which is why you get the number of
posts (as opposed to the number of level-2 comments).
You could add a "level2_comment_id" field to the level-2 comments and
their children, and then use unique() on that.
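
A rough sketch of that workaround, in case it helps. It is untested against
a real index and rests on a few assumptions: "level2_comment_id" is a field
you would have to add yourself at index time, the helper stamp_comment_id()
and the document ids are made up for illustration, and nested documents are
assumed to be sent under the "_childDocuments_" key. The request is only
built and printed, not sent.

```python
import json
from urllib.parse import urlencode

# Assumption: a hypothetical field that you add at index time to each
# level-2 comment and to every document nested beneath it.
COMMENT_ID_FIELD = "level2_comment_id"


def stamp_comment_id(doc, comment_id):
    """Copy one level-2 comment's id onto the comment and all nested docs."""
    doc[COMMENT_ID_FIELD] = comment_id
    for child in doc.get("_childDocuments_", []):
        stamp_comment_id(child, comment_id)


# Hypothetical comment block: one keyword, plus one reply with its own keyword.
comment = {
    "id": "c1",
    "path": "2.blog-posts.comments",
    "_childDocuments_": [
        {"id": "c1-k1"},
        {"id": "c1-r1", "_childDocuments_": [{"id": "c1-r1-k1"}]},
    ],
}
stamp_comment_id(comment, comment["id"])

# Same facet as in the original request, except that unique() now counts
# the comment-level id instead of _root_.
facet = {
    "filter_by_child_type": {
        "type": "query",
        "q": "path:*comments*keywords",
        "domain": {"blockChildren": "path:2.blog-posts.comments"},
        "facet": {
            "top_keywords": {
                "type": "terms",
                "field": "text",
                "sort": "counts_by_comments desc",
                "facet": {
                    "counts_by_comments": f"unique({COMMENT_ID_FIELD})"
                },
            }
        },
    }
}

# URL-encoded body that could be POSTed to /solr/my_index/query.
params = urlencode({
    "q": "path:2.blog-posts.comments",
    "rows": 0,
    "json.facet": json.dumps(facet),
})
print(params)
```

Because every keyword under a given comment (including keywords of its
replies) would carry the same id, unique() should then count comments
rather than posts.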

-Yonik


> {
> "response":{"numFound":3,"start":0,"docs":[]
>   },
>   "facets":{
>     "count":3,
>     "filter_by_child_type":{
>       "count":9,
>       "top_keywords":{
>         "buckets":[{
>             "val":"Elasticsearch",
>             "count":2,
>             "counts_by_comments":2},
>           {
>             "val":"Solr",
>             "count":5,
>             "counts_by_comments":2},               //here the count by "comments" should be 3
>           {
>             "val":"Solr 5.5",
>             "count":1,
>             "counts_by_comments":1},
>           {
>             "val":"feature",
>             "count":1,
>             "counts_by_comments":1}]}}}}
>
>
> Am I writing the query wrong?
>
>
> By the way, Block Join Faceting works fine for this:
> bjqfacet?q={!parent%20which=path:2.blog-posts.comments}path:*.comments*keywords&rows=0&facet=true&child.facet.field=text&wt=json&indent=true
>
> {
>   "response":{"numFound":3,"start":0,"docs":[]
>   },
>   "facet_counts":{
>     "facet_queries":{},
>     "facet_fields":{
>       "text":[
>         "Elasticsearch",2,
>         "Solr",3,                                  //correct result
>         "Solr 5.5",1,
>         "feature",1]},
>     "facet_dates":{},
>     "facet_ranges":{},
>     "facet_intervals":{},
>     "facet_heatmaps":{}}}
>
> But we've already discussed that it returns too much stuff: no way to put 
> limits or order by counts :(  That's why I want to see whether it's possible 
> to make JSON Facet API straight.
>
> Thank you in advance!
>
> --
> Alisa Zhila
