[ https://issues.apache.org/jira/browse/CASSANDRA-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sylvain Lebresne resolved CASSANDRA-6588. ----------------------------------------- Resolution: Duplicate Fix Version/s: (was: 2.1 beta2) No, it wouldn't require any new syntax. So anyway, since that's somewhat different from what have been discussed here, created CASSANDRA-7085 to tackle that. New internal filters do means we'll have to wait for 3.0 however at this point. > Add a 'NO EMPTY RESULTS' filter to SELECT > ----------------------------------------- > > Key: CASSANDRA-6588 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6588 > Project: Cassandra > Issue Type: Improvement > Reporter: Sylvain Lebresne > Priority: Minor > > It is the semantic of CQL that a (CQL) row exists as long as it has one > non-null column (including the PK columns, which, given that no PK columns > can be null, means that it's enough to have the PK set for a row to exist). > This does means that the result to > {noformat} > CREATE TABLE test (k int PRIMARY KEY, v1 int, v2 int); > INSERT INTO test(k, v1) VALUES (0, 4); > SELECT v2 FROM test; > {noformat} > must be (and is) > {noformat} > v2 > ------ > null > {noformat} > That fact does mean however that when we only select a few columns of a row, > we still need to find out rows that exist but have no values for the selected > columns. Long story short, given how the storage engine works, this means we > need to query full (CQL) rows even when only some of the columns are selected > because that's the only way to distinguish between "the row exists but have > no value for the selected columns" and "the row doesn't exist". I'll note in > particular that, due to CASSANDRA-5762, we can't unfortunately rely on the > row marker to optimize that out. > Now, when you selects only a subsets of the columns of a row, there is many > cases where you don't care about rows that exists but have no value for the > columns you requested and are happy to filter those out. So, for those cases, > we could provided a new SELECT filter. Outside the potential convenience (not > having to filter empty results client side), one interesting part is that > when this filter is provided, we could optimize a bit by only querying the > columns selected, since we wouldn't need to return rows that exists but have > no values for the selected columns. > For the exact syntax, there is probably a bunch of options. For instance: > * {{SELECT NON EMPTY(v2, v3) FROM test}}: the vague rational for putting it > in the SELECT part is that such filter is kind of in the spirit to DISTINCT. > Possibly a bit ugly outside of that. > * {{SELECT v2, v3 FROM test NO EMPTY RESULTS}} or {{SELECT v2, v3 FROM test > NO EMPTY ROWS}} or {{SELECT v2, v3 FROM test NO EMPTY}}: the last one is > shorter but maybe a bit less explicit. As for {{RESULTS}} versus {{ROWS}}, > the only small object to {{NO EMPTY ROWS}} could be that it might suggest it > is filtering non existing rows (I mean, the fact we never ever return non > existing rows should hint that it's not what it does but well...) while we're > just filtering empty "resultSet rows". > Of course, if there is a pre-existing SQL syntax for that, it's even better, > though a very quick search didn't turn anything. Other suggestions welcome > too. -- This message was sent by Atlassian JIRA (v6.2#6252)