How do you construct an index and use it, especially in Ruby

Bob Hutchison Sun, 25 Apr 2010 11:15:15 -0700

Hi,

I'm new to Cassandra and trying to work out how to do something that I've 
implemented any number of times (e.g. TokyoCabinet, Perst, even the filesystem 
using grep :-). I've managed to get some of this working in Cassandra but not 
all.


So here's the core of the situation.

I have this opaque chunk of data that I want to store in Cassandra and then 
find again.

I can generate a key when the data is created very easily, and I've stored it 
in a straightforward manner: in a column with a key whose value is the data. 
And I can retrieve it when I know the key. No difficulties here at all, works 
fine.

Now I want to index this data taking what I imagine to be a pretty typical 
approach.

Let's say there are two many-to-one indexes: 'colour' and 'size'. Each colour 
value will have more than one chunk of data, same for size.

What I thought I'd do is make a super column and index the chunk of data kind 
of like: { 'colour' => { 'blue' => 1 }, 'size' => { 'large' => 1}} with the key 
equal to the key of the chunk of data. And Cassandra stores it without error 
like that. So using the Ruby gem, it'd be something along the lines of:

  cassandra.insert(:Indexes, key_of_the_chunk_of_data,
                   { 'colour' => { 'blue' => 1 }, 'size' => { 'large' => 1 } })

Q1: is this a reasonable approach? It *seems* to be what I've read is supposed 
to be done. The 1 is meaningless. Anyway, it executes without error in Ruby.
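
A minimal sketch of the layout above, using plain Ruby hashes rather than 
Cassandra itself. The chunk keys and the helper names (keys_by_scan, invert) 
are made up for illustration. It shows that with the row key equal to the 
chunk key, finding every 'blue' chunk means visiting every row. The inverted 
layout at the end is only one possible alternative, an assumption rather than 
anything confirmed for Cassandra or the Ruby gem.

```ruby
# Plain-Ruby model of the layout described above; no Cassandra involved.
# Row key = key of the chunk, super columns = index names. All keys are made up.
INDEXES = {
  'chunk-1' => { 'colour' => { 'blue' => 1 }, 'size' => { 'large' => 1 } },
  'chunk-2' => { 'colour' => { 'red'  => 1 }, 'size' => { 'large' => 1 } },
  'chunk-3' => { 'colour' => { 'blue' => 1 }, 'size' => { 'small' => 1 } }
}

# With this layout, finding every chunk for a value means visiting every row.
def keys_by_scan(rows, index_name, value)
  rows.select { |_key, indexes| indexes.fetch(index_name, {}).key?(value) }.keys.sort
end

# Hypothetical inverted layout: row key = index value, column names = chunk keys.
def invert(rows, index_name)
  inverted = Hash.new { |hash, value| hash[value] = {} }
  rows.each do |key, indexes|
    indexes.fetch(index_name, {}).each_key { |value| inverted[value][key] = 1 }
  end
  inverted
end

COLOUR_INDEX = invert(INDEXES, 'colour')

puts keys_by_scan(INDEXES, 'colour', 'blue').inspect
puts COLOUR_INDEX['blue'].keys.sort.inspect
```

In the inverted form, looking up 'blue' is a single read by key, and the 
chunk keys come back as column names.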

Q2: what is the syntax of the (Ruby) query to find the keys of all 'blue' 
chunks of data? I'm assuming get_range is the correct method, but what are the 
parameters? The docs say: get_range(column_family, options={}) but that seems 
to be missing a bit of detail, in particular the super column name.

Q2a: So I know there's a :start and :finish key supported in the options hash, 
inclusive and exclusive respectively. How do you define a range for equals with 
a UTF8 key? Surely not 'blue'.succ? Or by some kind of suffix?
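
To make the range question concrete, here is a small sketch with plain Ruby 
strings, not the Cassandra API. It assumes names are ordered byte by byte and 
that the finish bound is exclusive, as described above. Whether a given column 
family orders names the same way depends on its configured comparator, so 
treat that as an assumption. The in_range? helper and the sample names are 
hypothetical.

```ruby
# Half-open range [start, finish) over byte-ordered strings; no Cassandra involved.
def in_range?(name, start, finish)
  name >= start && name < finish
end

NAMES = ['blue', 'blueberry', 'bluf', 'black']

# 'blue'.succ is "bluf", so this range also admits every name prefixed by 'blue'.
PREFIX_MATCHES = NAMES.select { |n| in_range?(n, 'blue', 'blue'.succ) }

# "blue\0" is the smallest string that sorts after 'blue', so only 'blue' fits.
EXACT_MATCHES = NAMES.select { |n| in_range?(n, 'blue', "blue\0") }

puts PREFIX_MATCHES.inspect
puts EXACT_MATCHES.inspect
```

So 'blue'.succ as the finish behaves like a prefix match, while a NUL suffix 
narrows the range to the single value.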

Q2b: How do you specify the super column name 'colour'? Looking at the (Ruby) 
source of the get_range method, I'm unconvinced that this is implemented (a 
constant '' seems to be used where the super column name would make sense).

Anyway I ended up hacking at the Ruby gem's source to use the column name where 
the '' was in the original, and didn't really get anywhere useful (I can find 
nothing, or everything, nothing in between).

Q3: If I am correct about what is supposed to be done, does the Ruby gem 
support it?

Q4: Does anyone know of some Ruby code that does an indexed lookup that they 
could point me at? (There's lots of code that indexes, but nothing that 
searches by the index.)

I'll try to take a look at some of the other Cassandra client implementations 
and see if I can get this model to work. Maybe just a Ruby problem?? With any 
luck, it'll be me messing up.

If it'd help I can post the source of what I have, but it'll need some cleanup. 
Let me know.

Thanks for taking the time to read this far :-)

Bob

----
Bob Hutchison
Recursive Design Inc.
http://www.recursive.ca/
weblog: http://xampl.com/so

