[ https://issues.apache.org/jira/browse/CASSANDRA-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919726#action_12919726 ]

Stu Hood commented on CASSANDRA-1601:
-------------------------------------

Trippy realization: validators, as implemented in trunk, are already a very 
specific type of UDF: the input is a single untyped column, and the output is a 
single typed column. Since the content of an index must be typed, UDFs can 
consume arbitrary input but must always output typed data.
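
A minimal Python sketch of this framing follows. It assumes a validator can be modeled as a function from raw column bytes to a typed value. The names `UDF`, `long_validator`, `utf8_validator` and `apply_udf` are hypothetical and only loosely inspired by Cassandra's `LongType` and `UTF8Type`. This is not Cassandra's code.

```python
import struct
from typing import Any, Callable

# Hypothetical alias: a UDF takes one untyped column value (raw bytes)
# and returns a typed value. A validator is the degenerate case that
# only checks and decodes.
UDF = Callable[[bytes], Any]


def long_validator(raw: bytes) -> int:
    """Accept exactly 8 bytes and decode them as a big-endian signed long."""
    if len(raw) != 8:
        raise ValueError(f"expected 8 bytes, got {len(raw)}")
    return struct.unpack(">q", raw)[0]


def utf8_validator(raw: bytes) -> str:
    """Accept valid UTF-8 only. Invalid input raises UnicodeDecodeError."""
    return raw.decode("utf-8")


def apply_udf(udf: UDF, raw: bytes) -> Any:
    # Whatever the UDF returns is what would be stored in the index,
    # so index contents are typed even though the input was not.
    return udf(raw)
```

For example, `apply_udf(long_validator, struct.pack(">q", 42))` returns `42`. Substituting a UDF that does more than decode would not change the contract: untyped bytes in, typed value out.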

> Refactor index definitions
> --------------------------
>
>                 Key: CASSANDRA-1601
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1601
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: API
>            Reporter: Stu Hood
>            Priority: Critical
>             Fix For: 0.7.0
>
>
> h3. Overview
> There are a few considerations for defining secondary indexes and row 
> validation that I don't think have been brought up yet. While the interface 
> is still malleable before 0.7.0, we should attempt to make changes that allow 
> for forward compatibility of index/validator schemas. This is an umbrella 
> ticket for suggesting/debating the changes: other tickets should be opened 
> for quick improvements that can be made before 0.7.0.
> ----
> h3. Index output types
> The output (queryable) data from an indexing operation is what actually goes 
> in the index. For a particular row, the output can be either _single-valued_, 
> _multi-valued_ or _compound_:
> * Single-valued
> ** Implemented in trunk (special case of multi-valued)
> * Multi-valued
> ** Multiple index values _of the same type_ can match a single row
> ** Row probably contains a list/set (perhaps in a supercolumn)
> * Compound
> ** Multiple base properties concatenated as one index entry 
> ** Different validators/comparators for each component
> ** (Given the simplicity of performing boolean operations on CASSANDRA-1472 
> indexes, compound local indexes are unlikely ever to be worthwhile, but compound 
> distributed indexes will be: see comments on CASSANDRA-1599)
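
The three output shapes can be sketched as functions that turn one row into a list of index entries, where single-valued output is simply a list of length one. This sketch assumes a flat row modeled as a Python dict of raw bytes, and every name in it is hypothetical. Compound entries are modeled as tuples, which compare component by component, roughly like a concatenated key with its own comparator per component.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[bytes, bytes]          # hypothetical flat row: column name -> raw value
Decode = Callable[[bytes], Any]   # validator/decoder for one component


def multi_valued(row: Row, columns: List[bytes], decode: Decode) -> List[Any]:
    """Several entries of the same type may point at a single row."""
    return [decode(row[c]) for c in columns if c in row]


def single_valued(row: Row, column: bytes, decode: Decode) -> List[Any]:
    """The special case of multi_valued with exactly one source column."""
    return multi_valued(row, [column], decode)


def compound(row: Row, parts: List[Tuple[bytes, Decode]]) -> List[Tuple[Any, ...]]:
    """One entry built from several columns, each with its own decoder.

    Rows missing any component produce no entry at all.
    """
    if not all(name in row for name, _ in parts):
        return []
    return [tuple(decode(row[name]) for name, decode in parts)]
```

Writing `single_valued` in terms of `multi_valued` mirrors the note that single-valued output is a special case of multi-valued output.
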
> h3. Index input types
> The other end of indexing is selection of values from a row to be indexed. 
> Selection can correspond directly to our current {{db.filter.*}} 
> implementations, and may be best implemented by specifying the 
> validator/index using the same Thrift objects you would use for a similar 
> query:
> * Name selection
> ** Implemented in trunk, but should probably just be a special case of list 
> selection below
> ** Corresponds to db.filter.NamesQueryFilter of size 1
> * List selection
> ** Should specify a list of columns whose values must all be of the same 
> type, as defined by the Validator
> ** Corresponds to db.filter.NamesQueryFilter
> * Range (prefix?) selection
> ** Subsets of a row may be interesting for indexing
> ** Range corresponds to db.filter.SliceQueryFilter
> *** (A prefix might actually be more useful for indexing, but is better 
> implemented by indexing an arbitrarily nested row)
> ** Open question: might the ability to index only the 'top N values' from a 
> row be useful? If so, this selector should allow N to be specified as it 
> would be for a slice.
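
Here is a minimal sketch of the two selector families. It assumes a flat row held as a dict and plain bytewise ordering of column names, whereas real ordering would come from the column family's comparator. `names_selection` loosely mirrors what `NamesQueryFilter` selects, and `range_selection` loosely mirrors `SliceQueryFilter`. Its parameters are modeled after Thrift's `SliceRange` fields (start, finish, reversed, count), with empty bytes meaning unbounded. Both functions are hypothetical illustrations, not Cassandra's API.

```python
from typing import Dict, List, Tuple

Row = Dict[bytes, bytes]             # hypothetical flat row: name -> raw value
Column = Tuple[bytes, bytes]


def names_selection(row: Row, names: List[bytes]) -> List[Column]:
    """Select only the listed columns that exist. One name == name selection."""
    return [(n, row[n]) for n in names if n in row]


def range_selection(row: Row, start: bytes = b"", finish: bytes = b"",
                    reversed_: bool = False, count: int = 100) -> List[Column]:
    """Select columns between start and finish (inclusive) in name order.

    Empty start/finish means unbounded. When reversed_ is True, names are
    walked from highest to lowest, so start is the upper bound. count caps
    the result, which is one way "top N" selection could be expressed.
    """
    if count <= 0:
        return []
    out: List[Column] = []
    for n in sorted(row, reverse=reversed_):
        before_start = n > start if reversed_ else n < start
        past_finish = n < finish if reversed_ else n > finish
        if start and before_start:
            continue
        if finish and past_finish:
            break
        out.append((n, row[n]))
        if len(out) == count:
            break
    return out
```

With `row = {b"a": b"1", b"b": b"2", b"c": b"3", b"d": b"4"}`, `range_selection(row, reversed_=True, count=2)` selects the last two columns by name, `d` then `c`.
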
> h3. Supercolumns/arbitrary-nesting
> Another consideration is that we should be able to support indexing and 
> validation of supercolumns (and hence, arbitrarily nested rows). Since the 
> selection of columns to index is essentially the same as the selection of 
> columns to return for a query, this can probably mirror (and suggest 
> improvements to) our query API.
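
Selection at each level of a nested row looks like the flat selection described above. One way to sketch arbitrary nesting, then, is to apply one selector per level, much as a query names a supercolumn and then a predicate inside it. This assumes rows are nested Python dicts with bytes leaves. `select_path`, `by_names` and `by_range` are hypothetical helpers.

```python
from typing import Callable, Dict, List, Tuple, Union

Nested = Dict[bytes, Union[bytes, "Nested"]]   # hypothetical nested row
Selector = Callable[[Nested], List[bytes]]      # picks names at one level
Path = Tuple[bytes, ...]


def by_names(*names: bytes) -> Selector:
    """Select the listed names that exist at this level."""
    return lambda level: [n for n in names if n in level]


def by_range(start: bytes, finish: bytes) -> Selector:
    """Select names in [start, finish] (inclusive), in bytewise order."""
    return lambda level: [n for n in sorted(level) if start <= n <= finish]


def select_path(row: Nested, path: List[Selector]) -> List[Tuple[Path, bytes]]:
    """Apply one selector per nesting level and return (name path, leaf value).

    Only values that are leaves at exactly the last level are returned;
    anything deeper or shallower is skipped.
    """
    results: List[Tuple[Path, bytes]] = []
    if not path:
        return results

    def walk(level: Nested, depth: int, prefix: Path) -> None:
        for name in path[depth](level):
            value = level[name]
            if depth + 1 == len(path):
                if isinstance(value, bytes):
                    results.append((prefix + (name,), value))
            elif isinstance(value, dict):
                walk(value, depth + 1, prefix + (name,))

    walk(row, 0, ())
    return results
```

A single-level path reduces to the flat case, which is why the same selector vocabulary could plausibly serve both queries and index definitions.
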
> h3. UDFs
> This is obviously still an open area, but user-defined indexing functions are 
> essentially a transform between the _input_ and _output_ (as defined above), 
> which would normally have equal structures. Leaving room for UDFs in our 
> index definitions makes sense, and will likely lead to a much more general 
> and elegant design.
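
Under this framing, an index definition could be sketched as an input selector plus a UDF, with a validator being the degenerate UDF that only decodes. The `IndexDefinition` class and its `column` and `validator` helpers are hypothetical, and the sketch assumes a flat dict row.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[bytes, bytes]                       # hypothetical flat row
Select = Callable[[Row], List[bytes]]          # input: raw values chosen from a row
Transform = Callable[[List[bytes]], List[Any]] # UDF: raw input -> typed entries


@dataclass(frozen=True)
class IndexDefinition:
    """Hypothetical shape: an input selection followed by a UDF transform."""
    select: Select
    transform: Transform

    def entries(self, row: Row) -> List[Any]:
        # The typed values returned here are what would go into the index.
        return self.transform(self.select(row))


def column(name: bytes) -> Select:
    """Name selection: the raw value of one column, if present."""
    return lambda row: [row[name]] if name in row else []


def validator(decode: Callable[[bytes], Any]) -> Transform:
    """A validator as the degenerate UDF: decode each value, no reshaping."""
    return lambda values: [decode(v) for v in values]
```

For example, with `row = {b"age": b"42"}`, `IndexDefinition(column(b"age"), validator(lambda b: int(b.decode()))).entries(row)` returns `[42]`. A richer UDF, such as one that splits a text column into lowercase words, would produce multi-valued output from the same single-column input. Word splitting here is just an arbitrary example transform.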

-- 
This message is automatically generated by JIRA.