Hi,  

I'm currently trying to implement an offline message retrieval solution wherein 
I retrieve messages after a particular timestamp for specific users. My 
question is will what route should I go for…multple primary keys on an IN 
clause or using 2i


The current model of the messages table looks something like this

CREATE TABLE msg_archive(
thread_id varchar,
ts timestamp,
msg blob,
PRIMARY KEY (thread_id, ts))



where thread_id is an alphabetized order of sender and recipient such as 
"brian|john"

Now, in order to retrieve the messages, I will have to retrieve them based on 
the number of contacts you have and as such the query will look something like 
this

SELECT * FROM msg_archive WHERE thread_id IN ('brian|john', 'brian|james'….) 
AND ts < 1234567890;

Ofcourse the list of friends a user can have can potentially reach around 500 
or even worse 1000 so the IN clause can potentially have these large amount of 
primary keys.




The question is will this work well or do I have to modify the schema such that 
we should incorporate secondary indexes And look something like this instead?

CREATE TABLE msg_archive(
thread_id varchar,
recipient varchar,
ts timestamp,
msg blob,
PRIMARY KEY (thread_id, ts))



CREATE INDEX ON msg_archive (recipient);

For the select statement, ofcourse it will be as simple as

SELECT * FROM msg_archive WHERE recipient = 'brian' AND ts < 1234567890;



Which is actually better in terms of performance? Or are there other 
suggestions to this kind of model?

Thanks!
Byron



--  
Byron Wang
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


Reply via email to