Re[2]: Block Join faceting on intermediate levels with JSON Facet API (might be related to block join rollups & SOLR-8998)

Alisa Z. Fri, 22 Apr 2016 09:27:01 -0700

 Hi Yonik, 

Thanks a lot for your response.


I have discussed this with Mikhail Khludnev already and tried this suggestion. 
Here's what I've got:  



sentiment: positive
author: Bob
text: Great post about Solr
2.blog-posts.comments-id: 10735-23004     // new field: the field name differs on each level for each type; values are unique
date: 2015-04-10T11:30:00Z
path: 2.blog-posts.comments
id: 10735-23004
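
A minimal sketch of how such an id could be stamped onto a comment and onto everything nested under it before indexing. It assumes the documents are sent in Solr's nested JSON form with `_childDocuments_`, and that the id is meant to be present on the keyword and reply documents too, since unique() is evaluated over the documents that land in each facet bucket. The helper name `stamp_block_id`, the child ids, and the child path values are hypothetical and are not taken from the real index.

```python
from typing import Any, Dict


def stamp_block_id(doc: Dict[str, Any], field: str, value: str) -> None:
    """Write `field: value` onto `doc` and onto every document nested under it."""
    doc[field] = value
    for child in doc.get("_childDocuments_", []):
        stamp_block_id(child, field, value)


# Hypothetical comment block: only the comment's own id and path come from
# the sample document above; the children are invented for illustration.
comment = {
    "id": "10735-23004",
    "path": "2.blog-posts.comments",
    "_childDocuments_": [
        {"id": "child-1", "path": "3.blog-posts.comments.keywords", "text": "Solr"},
        {
            "id": "child-2",
            "path": "3.blog-posts.comments.replies",
            "_childDocuments_": [
                {
                    "id": "child-3",
                    "path": "4.blog-posts.comments.replies.keywords",
                    "text": "Solr",
                },
            ],
        },
    ],
}

# Every document in the block now carries the id of its level-2 comment.
stamp_block_id(comment, "2.blog-posts.comments-id", comment["id"])
```

With this shape, a keyword at level 3 or level 4 can be traced back to its comment through a single field, which is what the unique() aggregation needs.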
Query:
curl http://localhost:8985/solr/solr_nesting_unique/query -d 
'q=path:2.blog-posts.comments&rows=0&
json.facet={
  filter_by_child_type :{
    type:query,
    q:"path:*comments*keywords",
    domain: { blockChildren : "path:2.blog-posts.comments" },
    facet:{
      top_entity_text : {
        type: terms,
        field: text,
        limit: 10,
        sort: "counts_by_comments desc",
        facet: {
           counts_by_comments: "unique (2.blog-posts.comments-id )"    // changed
         }}}}}'


Response:

"response":{"numFound":3,"start":0,"docs":[]
  },
  "facets":{
    "count":3,
    "filter_by_child_type":{
      "count":9,
      "top_entity_text":{
        "buckets":[{
            "val":"Elasticsearch",
            "count":2,
            "counts_by_comments":0},
          {
            "val":"Solr",
            "count":5,
            "counts_by_comments":0},
          {
            "val":"Solr 5.5",
            "count":1,
            "counts_by_comments":0},
          {
            "val":"feature",
            "count":1,
            "counts_by_comments":0}]}}}}

So unless I messed something up, or the field name does not look "canonical"
enough (though it was fast to generate, and it is accepted in a normal query:
http://localhost:8985/solr/solr_nesting_unique/query?q=2.blog-posts.body-id :* ),
I think that it's just a JSON Facet API limitation.
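
For reference, the number that counts_by_comments is meant to report can be sketched outside Solr: for each keyword text, count the distinct comment ids among the keyword documents. This is only a plain-Python illustration over hypothetical toy documents, not the real data set and not how Solr computes it. The function name `comments_per_keyword` is made up. In this sketch a keyword document that lacks the id field contributes nothing to the count.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def comments_per_keyword(keyword_docs: Iterable[dict], id_field: str) -> Dict[str, int]:
    """Count distinct values of `id_field` per keyword text."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for doc in keyword_docs:
        ids = seen[doc["text"]]          # creates the bucket even if no id is found
        if id_field in doc:
            ids.add(doc[id_field])
    return {text: len(ids) for text, ids in seen.items()}


FIELD = "2.blog-posts.comments-id"

# Hypothetical keyword documents; comment ids "c1" and "c2" are invented.
toy_docs = [
    {"text": "Solr", FIELD: "c1"},
    {"text": "Solr", FIELD: "c1"},   # same comment mentioned twice: counted once
    {"text": "Solr", FIELD: "c2"},
    {"text": "Elasticsearch", FIELD: "c2"},
    {"text": "feature"},             # no id field on this document
]

result = comments_per_keyword(toy_docs, FIELD)
print(result)
```

On the toy documents this gives 2 for "Solr", 1 for "Elasticsearch", and 0 for "feature".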

Best,
--Alisa 


>Friday, 22 April 2016, 9:55 -04:00 from Yonik Seeley <ysee...@gmail.com>:
>
>Hi Alisa,
>This was a bit too hard for me to grok on a first pass... then I saw
>your related blog post which includes the actual sample data and makes
>it more clear.
>
> More comments inline:
>
>On Wed, Apr 20, 2016 at 2:29 PM, Alisa Z. < prol...@mail.ru > wrote:
>>  Hi all,
>>
>> I have been stretching some of Solr's capabilities for handling nested 
>> documents and I've come up with the following issue...
>>
>> Let's say I have the following structure:
>>
>> {
>> "blog-posts":{                      //level 1
>>     "leaf-fields":[
>>         "date",
>>         "author"],
>>     "title":{                       //level 2
>>         "leaf-fields":[ "text"],
>>         "keywords":{                //level 3
>>             "leaf-fields":[
>>                 "text",
>>                 "type"]
>>             }
>>         },
>>     "body":{                        //level 2
>>         "leaf-fields":[ "text"],
>>         "keywords":{                //level 3
>>             "leaf-fields":[
>>                 "text",
>>                 "type"]
>>             }
>>         },
>>     "comments":{                    //level 2
>>         "leaf-fields":[
>>             "date",
>>             "author",
>>             "text",
>>             "sentiment"
>>             ],
>>         "keywords":{                //level 3
>>             "leaf-fields":[
>>                 "text",
>>                 "type"]
>>             },
>>         "replies":{                 //level 3
>>             "leaf-fields":[
>>                 "date",
>>                 "author",
>>                 "text",
>>                 "sentiment"],
>>             "keywords":{            //level 4
>>                 "leaf-fields":[
>>                     "text",
>>                     "type"]
>>                 }}}}}
>>
>>
>> And I want to know the distribution of all readers' keywords (levels 3 and 
>> 4) by comments (level 2).
>> In JSON Facet API I tried this:
>>
>> curl http://localhost:8983/solr/my_index/query -d 
>> 'q=path:2.blog-posts.comments&rows=0&
>> json.facet={
>>   filter_by_child_type :{
>>     type:query,
>>     q:"path:*comments*keywords",
>>     domain: { blockChildren : "path:2.blog-posts.comments" },
>>     facet:{
>>       top_keywords : {
>>         type: terms,
>>         field: text,
>>         sort: "counts_by_comments desc",
>>         facet: {
>>            counts_by_comments: "unique(_root_)"    // I suspect it should be 
>> a different field, not _root_, but what would it be for an intermediate document?
>>          }}}}}'
>>
>> Which gives me the wrong results: it aggregates by posts, not by comments 
>> (it's a toy data set, so I know that the correct answer for "Solr" is 3 when 
>> faceted by comments).
>
>
>Yeah, this type of thing isn't currently directly supported, but
>SOLR-8998 should address that.
>You can currently hack around it (for simple counts) using unique(),
>as you've discovered, but you need a unique ID at the right level to
>get the right count.
>
>_root_ is unique for blog posts, which is why you get numbers of
>posts (as opposed to numbers of level-2 comments).
>You could add a "level2_comment_id" field to the level 2 comments and
>their children, and then use unique() on that.
>
>-Yonik
>
>
>> {
>> "response":{"numFound":3,"start":0,"docs":[]
>>   },
>>   "facets":{
>>     "count":3,
>>     "filter_by_child_type":{
>>       "count":9,
>>       "top_keywords":{
>>         "buckets":[{
>>             "val":"Elasticsearch",
>>             "count":2,
>>             "counts_by_comments":2},
>>           {
>>             "val":"Solr",
>>             "count":5,
>>             "counts_by_comments":2},               //here the count by "comments" should be 3
>>           {
>>             "val":"Solr 5.5",
>>             "count":1,
>>             "counts_by_comments":1},
>>           {
>>             "val":"feature",
>>             "count":1,
>>             "counts_by_comments":1}]}}}}
>>
>>
>> Am I writing the query wrong?
>>
>>
>> By the way, Block Join Faceting works fine for this:
>> bjqfacet?q={!parent%20which=path:2.blog-posts.comments}path:*.comments*keywords&rows=0&facet=true&child.facet.field=text&wt=json&indent=true
>>
>> {
>>   "response":{"numFound":3,"start":0,"docs":[]
>>   },
>>   "facet_counts":{
>>     "facet_queries":{},
>>     "facet_fields":{
>>       "text":[
>>         "Elasticsearch",2,
>>         "Solr",3,                                  //correct result
>>         "Solr 5.5",1,
>>         "feature",1]},
>>     "facet_dates":{},
>>     "facet_ranges":{},
>>     "facet_intervals":{},
>>     "facet_heatmaps":{}}}
>>
>> But we've already discussed that it returns too much stuff: no way to put 
>> limits or order by counts :(  That's why I want to see whether it's possible 
>> to do this with the JSON Facet API directly.
>>
>> Thank you in advance!
>>
>> --
>> Alisa Zhila
