On Fri, Apr 22, 2016 at 12:26 PM, Alisa Z. <prol...@mail.ru> wrote:
> Hi Yonik,
>
> Thanks a lot for your response.
>
> I have discussed this with Mikhail Khludnev already and tried this
> suggestion. Here's what I've got:
>
>
>
> sentiment: positive
> author: Bob
> text: Great post about Solr
> 2.blog-posts.comments-id: 10735-23004 //this is a
> new field, field name is different on each level for each type, values are
> unique
> date: 2015-04-10T11:30:00Z
> path: 2.blog-posts.comments
> id: 10735-23004
> Query:
> curl http://localhost:8985/solr/solr_nesting_unique/query -d
> 'q=path:2.blog-posts.comments&rows=0&
> json.facet={
>   filter_by_child_type :{
>     type:query,
>     q:"path:*comments*keywords",
>     domain: { blockChildren : "path:2.blog-posts.comments" },
>     facet:{
>       top_entity_text : {
>         type: terms,
>         field: text,
>         limit: 10,
>         sort: "counts_by_comments desc",
>         facet: {
>           counts_by_comments: "unique (2.blog-posts.comments-id )"
>           // changed
>         }}}}}'
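To see how the query above can return a `counts_by_comments` of 0 for every bucket, here is a rough in-memory model of its stages. It is a simplification, not Solr's implementation: documents are plain dicts in one hypothetical block (made-up toy data), the `_parent_` links stand in for Lucene's block ordering, the level 3/4 `path` values are assumed to follow the `<level>.<dotted path>` pattern seen in this thread, and Lucene's `*` wildcard is approximated with `fnmatchcase`. The point it illustrates is that `unique(field)` only sees values stored on the faceted keyword docs themselves.

```python
from collections import Counter, defaultdict
from fnmatch import fnmatchcase

ID_FIELD = "2.blog-posts.comments-id"  # field name from the message above


def toy_block(propagate):
    """One hypothetical post with two comments; `propagate` copies the
    comment id onto every descendant of that comment."""
    docs = [{"id": "p1", "path": "1.blog-posts"}]
    for cid in ("c1", "c2"):
        docs.append({"id": cid, "_parent_": "p1",
                     "path": "2.blog-posts.comments", ID_FIELD: cid})
    # (id, parent, path, text, owning level-2 comment)
    rows = [
        ("k1", "c1", "3.blog-posts.comments.keywords", "Solr", "c1"),
        ("r1", "c1", "3.blog-posts.comments.replies", "Agreed", "c1"),
        ("k2", "r1", "4.blog-posts.comments.replies.keywords", "Solr", "c1"),
        ("k3", "c2", "3.blog-posts.comments.keywords", "Solr", "c2"),
        ("k4", "c2", "3.blog-posts.comments.keywords", "feature", "c2"),
    ]
    for doc_id, parent, path, text, comment in rows:
        doc = {"id": doc_id, "_parent_": parent, "path": path, "text": text}
        if propagate:
            doc[ID_FIELD] = comment
        docs.append(doc)
    return docs


def descendants(parent_ids, docs):
    """Rough stand-in for blockChildren: all docs below the given parents."""
    found, frontier = [], set(parent_ids)
    while frontier:
        level = [d for d in docs if d.get("_parent_") in frontier]
        found.extend(level)
        frontier = {d["id"] for d in level}
    return found


def run_query(docs):
    # Step 1: q=path:2.blog-posts.comments -> level-2 comment docs.
    comments = [d for d in docs if d["path"] == "2.blog-posts.comments"]
    # Step 2: blockChildren -> their level-3 and level-4 descendants.
    kids = descendants({c["id"] for c in comments}, docs)
    # Step 3: q:"path:*comments*keywords" -> only the keyword docs.
    keywords = [d for d in kids if fnmatchcase(d["path"], "*comments*keywords")]
    # Step 4: terms facet on "text", plus unique(ID_FIELD) per bucket.
    # Only values stored on the keyword docs themselves are counted.
    counts, ids = Counter(), defaultdict(set)
    for d in keywords:
        counts[d["text"]] += 1
        if ID_FIELD in d:
            ids[d["text"]].add(d[ID_FIELD])
    return {term: (counts[term], len(ids[term])) for term in counts}


print(run_query(toy_block(propagate=False)))  # {'Solr': (3, 0), 'feature': (1, 0)}
print(run_query(toy_block(propagate=True)))   # {'Solr': (3, 2), 'feature': (1, 1)}
```

With the id stored only on the level-2 comment docs, every bucket gets 0; once it is copied down to the keyword docs, "Solr" is counted once per distinct comment.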
Something is wrong if you are getting 0 counts.
Let's try taking it piece-by-piece:

Step 1: q=path:2.blog-posts.comments
This finds level 2 documents.

Step 2: domain: { blockChildren : "path:2.blog-posts.comments" }
This first maps to all of the children (level 3 and level 4).

Step 3: q:"path:*comments*keywords"
This selects a subset of level 3 and level 4 documents with keywords.
(Note: in the future this should be doable as an additional filter in
the domain spec, w/o an additional sub-facet level.)

Step 4: Facet on the text field of those level 3 and level 4 keyword docs.
For each bucket, also find the unique number of values in the
"2.blog-posts.comments-id" field on those documents.

Without seeing what you indexed, my guess is that the issue is that
the "2.blog-posts.comments-id" field does not actually exist on those
level 3 and level 4 docs being faceted. The JSON Facet API doesn't
propagate field values up/down the nested stack yet. That's what
https://issues.apache.org/jira/browse/SOLR-8998 is mostly about.

-Yonik

>
> Response:
>
> "response":{"numFound":3,"start":0,"docs":[]
>   },
>   "facets":{
>     "count":3,
>     "filter_by_child_type":{
>       "count":9,
>       "top_entity_text":{
>         "buckets":[{
>             "val":"Elasticsearch",
>             "count":2,
>             "counts_by_comments":0},
>           {
>             "val":"Solr",
>             "count":5,
>             "counts_by_comments":0},
>           {
>             "val":"Solr 5.5",
>             "count":1,
>             "counts_by_comments":0},
>           {
>             "val":"feature",
>             "count":1,
>             "counts_by_comments":0}]}}}}
>
> So unless I messed something up... or the field name does not look
> "canonical" (but it was fast to generate and it is accepted in a normal query
> http://localhost:8985/solr/solr_nesting_unique/query?q=2.blog-posts.body-id:* )
>
> So I think that it's just a JSON facet API limitation...
>
> Best,
> --Alisa
>
>
>>Friday, 22 April 2016, 9:55 -04:00 from Yonik Seeley <ysee...@gmail.com>:
>>
>>Hi Alisa,
>>This was a bit too hard for me to grok on a first pass...
>>then I saw your related blog post which includes the actual sample data and makes
>>it more clear.
>>
>> More comments inline:
>>
>>On Wed, Apr 20, 2016 at 2:29 PM, Alisa Z. <prol...@mail.ru> wrote:
>>> Hi all,
>>>
>>> I have been stretching some of Solr's capabilities for nested documents
>>> handling and I've come up with the following issue...
>>>
>>> Let's say I have the following structure:
>>>
>>> {
>>>   "blog-posts":{          //level 1
>>>     "leaf-fields":[
>>>       "date",
>>>       "author"],
>>>     "title":{             //level 2
>>>       "leaf-fields":[ "text"],
>>>       "keywords":{        //level 3
>>>         "leaf-fields":[
>>>           "text",
>>>           "type"]
>>>       }
>>>     },
>>>     "body":{              //level 2
>>>       "leaf-fields":[ "text"],
>>>       "keywords":{        //level 3
>>>         "leaf-fields":[
>>>           "text",
>>>           "type"]
>>>       }
>>>     },
>>>     "comments":{          //level 2
>>>       "leaf-fields":[
>>>         "date",
>>>         "author",
>>>         "text",
>>>         "sentiment"
>>>       ],
>>>       "keywords":{        //level 3
>>>         "leaf-fields":[
>>>           "text",
>>>           "type"]
>>>       },
>>>       "replies":{         //level 3
>>>         "leaf-fields":[
>>>           "date",
>>>           "author",
>>>           "text",
>>>           "sentiment"],
>>>         "keywords":{      //level 4
>>>           "leaf-fields":[
>>>             "text",
>>>             "type"]
>>>         }}}}}
>>>
>>>
>>> And I want to know the distribution of all readers' keywords (levels 3 and
>>> 4) by comments (level 2).
>>> In JSON Facet API I tried this:
>>>
>>> curl http://localhost:8983/solr/my_index/query -d
>>> 'q=path:2.blog-posts.comments&rows=0&
>>> json.facet={
>>>   filter_by_child_type :{
>>>     type:query,
>>>     q:"path:*comments*keywords",
>>>     domain: { blockChildren : "path:2.blog-posts.comments" },
>>>     facet:{
>>>       top_keywords : {
>>>         type: terms,
>>>         field: text,
>>>         sort: "counts_by_comments desc",
>>>         facet: {
>>>           counts_by_comments: "unique(_root_)"  // I suspect it should
>>> be a different field, not _root_, but what would it be for an intermediate
>>> document?
>>>         }}}}}'
>>>
>>> Which gives me the wrong results: it aggregates by posts, not by comments
>>> (it's a toy data set, so I know that the correct answer for "Solr" is 3
>>> when faceted by comments)
>>
>>
>>Yeah, this type of thing isn't currently directly supported, but
>>SOLR-8998 should address that.
>>You can currently hack around it (for simple counts) using unique(),
>>as you've discovered, but you need a unique ID at the right level to
>>get the right count.
>>
>>_root_ is unique for blog posts; that's why you get numbers of
>>posts (as opposed to numbers of level-2 comments).
>>You could add a "level2_comment_id" field to the level 2 comments and
>>their children, and then use unique() on that.
>>
>>-Yonik
>>
>>
>>> {
>>>   "response":{"numFound":3,"start":0,"docs":[]
>>>   },
>>>   "facets":{
>>>     "count":3,
>>>     "filter_by_child_type":{
>>>       "count":9,
>>>       "top_keywords":{
>>>         "buckets":[{
>>>             "val":"Elasticsearch",
>>>             "count":2,
>>>             "counts_by_comments":2},
>>>           {
>>>             "val":"Solr",
>>>             "count":5,
>>>             "counts_by_comments":2},  //here the count by "comments" should be 3
>>>           {
>>>             "val":"Solr 5.5",
>>>             "count":1,
>>>             "counts_by_comments":1},
>>>           {
>>>             "val":"feature",
>>>             "count":1,
>>>             "counts_by_comments":1}]}}}}
>>>
>>>
>>> Am I writing the query wrong?
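Yonik's suggestion (a "level2_comment_id" on each level 2 comment and its children) has to be applied at index time, since the JSON Facet API does not propagate values through the nested stack. A minimal Python sketch, assuming a hypothetical input shape where every nested section is a list of dicts under one of the keys in `CHILD_KEYS`, path values of the `<level>.<dotted path>` form seen in this thread, and a string field `level2_comment_id` in the schema. It emits the `_childDocuments_` nesting used by Solr's JSON update format.

```python
# Hypothetical input shape: every nested section is a list of dicts under
# one of these keys (the real data may differ).
CHILD_KEYS = ("title", "body", "comments", "keywords", "replies")


def to_solr_doc(node, path="blog-posts", level=1, comment_id=None):
    """Convert a nested node into a Solr doc with _childDocuments_,
    adding a path field and a level2_comment_id on each comment subtree."""
    doc = {k: v for k, v in node.items() if k not in CHILD_KEYS}
    doc["path"] = f"{level}.{path}"
    if level == 2 and path == "blog-posts.comments":
        comment_id = node["id"]  # a level-2 comment starts a new group
    if comment_id is not None:
        # copied onto the comment and every descendant below it
        doc["level2_comment_id"] = comment_id
    children = [
        to_solr_doc(child, f"{path}.{key}", level + 1, comment_id)
        for key in CHILD_KEYS
        for child in node.get(key, [])
    ]
    if children:
        doc["_childDocuments_"] = children
    return doc


def iter_docs(doc):
    """Walk a converted doc and its children depth-first."""
    yield doc
    for child in doc.get("_childDocuments_", []):
        yield from iter_docs(child)


# Hypothetical toy post, only to show the shape of the output.
post = {
    "id": "p1", "author": "Bob",
    "title": [{"id": "t1", "text": "Nested facets",
               "keywords": [{"id": "k0", "text": "Solr"}]}],
    "comments": [
        {"id": "c1", "text": "Great post about Solr",
         "keywords": [{"id": "k1", "text": "Solr"}],
         "replies": [{"id": "r1", "text": "Agreed",
                      "keywords": [{"id": "k2", "text": "Solr"}]}]},
        {"id": "c2", "text": "Thanks",
         "keywords": [{"id": "k3", "text": "Solr"}]},
    ],
}
docs = {d["id"]: d for d in iter_docs(to_solr_doc(post))}
print(docs["k2"]["path"], docs["k2"]["level2_comment_id"])
# 4.blog-posts.comments.replies.keywords c1
```

With the field on the keyword docs, the sub-facet in the queries above could use `unique(level2_comment_id)` instead of `unique(_root_)`; title and body keywords carry no comment id, so they would not contribute to that count.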
>>>
>>>
>>> By the way, Block Join Faceting works fine for this:
>>> bjqfacet?q={!parent%20which=path:2.blog-posts.comments}path:*.comments*keywords&rows=0&facet=true&child.facet.field=text&wt=json&indent=true
>>>
>>> {
>>>   "response":{"numFound":3,"start":0,"docs":[]
>>>   },
>>>   "facet_counts":{
>>>     "facet_queries":{},
>>>     "facet_fields":{
>>>       "text":[
>>>         "Elasticsearch",2,
>>>         "Solr",3,  //correct result
>>>         "Solr 5.5",1,
>>>         "feature",1]},
>>>     "facet_dates":{},
>>>     "facet_ranges":{},
>>>     "facet_intervals":{},
>>>     "facet_heatmaps":{}}}
>>>
>>> But we've already discussed that it returns too much stuff: there's no way to
>>> set limits or order by counts :( That's why I want to see whether it's possible
>>> to make the JSON Facet API work for this.
>>>
>>> Thank you in advance!
>>>
>>> --
>>> Alisa Zhila
>
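As a stopgap for the missing limit and sort in Block Join Faceting, the facet list can be trimmed on the client. A minimal Python sketch, assuming the flat `[term, count, term, count, ...]` layout of `facet_fields` shown in the response above; Solr still computes and returns every term, so this only addresses ordering and truncation, not the size of the response.

```python
def top_facets(flat, limit):
    """Pair up Solr's flat [term, count, term, count, ...] facet list,
    sort by count descending (ties broken by term), keep `limit` entries."""
    pairs = list(zip(flat[0::2], flat[1::2]))
    pairs.sort(key=lambda p: (-p[1], p[0]))
    return pairs[:limit]


# facet_fields.text from the block join facet response above
text_counts = ["Elasticsearch", 2, "Solr", 3, "Solr 5.5", 1, "feature", 1]
print(top_facets(text_counts, limit=2))  # [('Solr', 3), ('Elasticsearch', 2)]
```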