[ https://issues.apache.org/jira/browse/CASSANDRA-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919726#action_12919726 ]
Stu Hood commented on CASSANDRA-1601:
-------------------------------------

Trippy realization: validators, as implemented in trunk, are already a very specific type of UDF. The input is a single untyped column, and the output is a single typed column.

The content of the index must be typed, so UDFs can consume arbitrary input, and will always output typed data.

> Refactor index definitions
> --------------------------
>
>                 Key: CASSANDRA-1601
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1601
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: API
>            Reporter: Stu Hood
>            Priority: Critical
>             Fix For: 0.7.0
>
>
> h3. Overview
> There are a few considerations for defining secondary indexes and row
> validation that I don't think have been brought up yet. While the interface
> is still malleable pre-0.7.0, we should attempt to make changes that allow
> for forwards compatibility of index/validator schemas. This is an umbrella
> ticket for suggesting/debating the changes: other tickets should be opened
> for quick improvements that can be made before 0.7.0.
> ----
> h3. Index output types
> The output (queryable) data from an indexing operation is what actually goes
> in the index. For a particular row, the output can be either _single-valued_,
> _multi-valued_ or _compound_:
> * Single-valued
> ** Implemented in trunk (special case of multi-valued)
> * Multi-valued
> ** Multiple index values _of the same type_ can match a single row
> ** Row probably contains a list/set (perhaps in a supercolumn)
> * Compound
> ** Multiple base properties concatenated as one index entry
> ** Different validators/comparators for each component
> ** (Given the simplicity of performing boolean operations on CASSANDRA-1472
> indexes, compound local indexes are unlikely to ever be worthwhile, but
> compound distributed indexes will be: see comments on CASSANDRA-1599)
> h3. Index input types
> The other end of indexing is selection of values from a row to be indexed.
> Selection can correspond directly to our current {{db.filter.*}}
> implementations, and may be best implemented by specifying the
> validator/index using the same Thrift objects you would use for a similar
> query:
> * Name selection
> ** Implemented in trunk, but should probably just be a special case of list
> selection below
> ** Corresponds to db.filter.NamesQueryFilter of size 1
> * List selection
> ** Should specify a list of columns of which all values must be of the same
> type, as defined by the Validator
> ** Corresponds to db.filter.NamesQueryFilter
> * Range (prefix?) selection
> ** Subsets of a row may be interesting for indexing
> ** Range corresponds to db.filter.SliceQueryFilter
> *** (A Prefix might actually be more useful for indexing, but is better
> implemented by indexing an arbitrarily nested row)
> ** Open question: might the ability to index only the 'top N values' from a
> row be useful? If so, then this selector should allow N to be specified like
> it would be for a slice
> h3. Supercolumns/arbitrary-nesting
> Another consideration is that we should be able to support indexing and
> validation of supercolumns (and hence, arbitrarily nested rows). Since the
> selection of columns to index is essentially the same as the selection of
> columns to return for a query, this can probably mirror (and suggest
> improvements to) our query API.
> h3. UDFs
> This is obviously still an open area, but user defined indexing functions are
> essentially a transform between the _input_ and _output_ (as defined above),
> which would normally have equal structures. Leaving room for UDFs in our
> index definitions makes sense, and will likely lead to a much more general
> and elegant design.
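
To make the input/transform/output split concrete, here is a rough Python sketch, not Cassandra code. Every name in it (`IndexDefinition`, `select_names`, `select_slice`, `long_validator`) is hypothetical and invented for illustration. It assumes a row is a plain mapping from column name to raw bytes, that a selector loosely mirrors a names or slice query, and that a validator is just a transform from one untyped value to one typed value that raises on bad input.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# Assumption: a row is a mapping of column name -> raw (untyped) value.
Row = Dict[bytes, bytes]
Selector = Callable[[Row], List[bytes]]


def select_names(names: Iterable[bytes]) -> Selector:
    """Name/list selection: values of the listed columns present in the row."""
    wanted = list(names)
    return lambda row: [row[n] for n in wanted if n in row]


def select_slice(start: bytes, finish: bytes, count: int) -> Selector:
    """Range selection: up to `count` values whose names fall in [start, finish]."""
    def selector(row: Row) -> List[bytes]:
        names = sorted(n for n in row if start <= n <= finish)
        return [row[n] for n in names[:count]]
    return selector


def long_validator(raw: bytes) -> int:
    """A validator viewed as a UDF: one untyped value in, one typed value out."""
    if len(raw) != 8:
        raise ValueError("expected an 8-byte long")
    return int.from_bytes(raw, "big", signed=True)


@dataclass
class IndexDefinition:
    """Hypothetical index definition: input selection plus a typed transform."""
    selector: Selector
    transform: Callable[[bytes], object]

    def index_values(self, row: Row) -> List[object]:
        # Single-valued output is the multi-valued case with one selected column.
        return [self.transform(raw) for raw in self.selector(row)]
```

In this sketch a single-valued index is `IndexDefinition(select_names([b"age"]), long_validator)`, and a multi-valued one swaps in `select_slice(...)` with the same transform. The `count` argument stands in for the 'top N values' open question. Compound output and nested rows are deliberately left out.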