[ https://issues.apache.org/jira/browse/LUCENE-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448303#comment-17448303 ]
Marc D'Mello edited comment on LUCENE-10250 at 11/23/21, 11:23 PM:
-------------------------------------------------------------------

I used random words as labels here because, from my understanding of [this discussion|https://github.com/mikemccand/luceneutil/pull/144#discussion_r727974361], it seems that we cannot generate new wiki line file docs, so I only had access to the info already in the {{enwiki-20120502-lines-1k.txt}} file as a source. I agree with your point, though: the hierarchical categories already in Wikipedia would be a good way to test this change.

was (Author: mdmarshmallow):
I used random words as labels here because from my understanding of [this discussion|https://github.com/mikemccand/luceneutil/pull/144#discussion_r727974361], it seems that we cannot generate new wiki line file docs, so I only had access to the info already in the {{enwiki-20120502-lines-1k.txt}} file. Though I agree with your point, the hierarchical categories already in Wikipedia would be a good way to test this change.

> Add hierarchical labels to SSDV facets
> --------------------------------------
>
>                 Key: LUCENE-10250
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10250
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Marc D'Mello
>            Priority: Major
>              Labels: discussion
>
> Hi all,
> I recently [added a new benchmarking
> task|https://github.com/mikemccand/luceneutil/issues/141] to {{luceneutil}}
> that counts facets on a random word chosen from each document, which gives
> us a much higher-cardinality faceting benchmark than the ones we already
> had. After it was merged, [~mikemccand] pointed out some
> [interesting
> results|https://home.apache.org/~mikemccand/lucenebench/BrowseRandomLabelTaxoFacets.html]
> in the nightly benchmarks, where the {{BrowseRandomLabelSSDVFacets}} task was
> much faster than the {{BrowseRandomLabelTaxoFacets}} task.
> I was thinking that using SSDV facets instead of taxonomy facets for our use
> case at Amazon Product Search could potentially increase QPS and decrease
> index size, but the issue is that we use hierarchical labels and, as I
> understand it, SSDV faceting only supports a 2-level hierarchy as of today.
> This leads to my question: why is there a limitation like this on SSDV
> facets? Are hierarchical labels just a feature that hasn't been implemented
> in SSDV facets yet, or is there some more complex reason that we can't add
> hierarchical labels to SSDV facets?
> Thanks!
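
To make the hierarchy question above concrete, here is a minimal, self-contained sketch of one conceivable way to represent multi-level labels as flat terms: every prefix of a path becomes its own term, and the children of a node are counted by prefix matching. This is only an illustration under stated assumptions. It does not use the Lucene API and does not describe how SSDV or taxonomy facets are actually implemented. The class name, method names, and the delimiter are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HierarchicalLabelSketch {
    // Hypothetical separator between path components; assumed never to
    // occur inside a label.
    static final String DELIM = "\u001F";

    // Expand a path such as {"Science", "Physics", "Optics"} into one flat
    // term per prefix: "Science", "Science|Physics", "Science|Physics|Optics"
    // (with DELIM in place of '|').
    static List<String> prefixTerms(String... path) {
        List<String> terms = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < path.length; i++) {
            if (i > 0) {
                prefix.append(DELIM);
            }
            prefix.append(path[i]);
            terms.add(prefix.toString());
        }
        return terms;
    }

    // Count, for each immediate child of parentPath, how many documents
    // carry a term for that child. Each document has exactly one path here.
    static Map<String, Integer> countChildren(List<String[]> docPaths, String... parentPath) {
        String parentPrefix =
            parentPath.length == 0 ? "" : String.join(DELIM, parentPath) + DELIM;
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] path : docPaths) {
            for (String term : prefixTerms(path)) {
                if (!term.startsWith(parentPrefix)) {
                    continue; // term is not under the requested parent
                }
                String rest = term.substring(parentPrefix.length());
                if (rest.isEmpty() || rest.contains(DELIM)) {
                    continue; // skip the parent itself and deeper descendants
                }
                counts.merge(rest, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> docs = List.of(
            new String[] {"Science", "Physics", "Optics"},
            new String[] {"Science", "Physics", "Mechanics"},
            new String[] {"Science", "Biology"});
        System.out.println(countChildren(docs, "Science"));
        System.out.println(countChildren(docs, "Science", "Physics"));
    }
}
```

The trade-off in this toy model is that each document contributes one term per level of its path, so deeper hierarchies mean more terms per document. Whether that cost is acceptable in a real index is exactly the kind of question the issue raises.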