Seeking advice on Schema and Caching

2011-11-15 Thread Aditya Narayan
Hi I need to add 'search users' functionality to my application. (The trigger for fetching searched items(like google instant search) is made when 3 letters have been typed in). For this, I make a CF with String type keys. Each such key is made of first 3 letters of a user's name. Thus all
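
A minimal sketch of the prefix-keyed lookup described above, using an in-memory dict in place of the column family. The names build_index and search, the lowercasing of keys, and the (name, user_id) column layout are assumptions made for illustration; they are not taken from the thread.

```python
from collections import defaultdict

PREFIX_LEN = 3  # the search fires once 3 letters have been typed


def prefix_key(name):
    """Row key: the first 3 letters of a name (lowercasing is an assumption)."""
    return name[:PREFIX_LEN].lower()


def build_index(users):
    """users: iterable of (user_id, name) pairs.

    Returns {row_key: set of (name, user_id)}; the dict stands in for the CF,
    and each (name, user_id) pair stands in for a valueless column.
    """
    index = defaultdict(set)
    for user_id, name in users:
        if len(name) >= PREFIX_LEN:
            index[prefix_key(name)].add((name, user_id))
    return index


def search(index, typed):
    """Return the sorted (name, user_id) pairs whose name starts with `typed`."""
    if len(typed) < PREFIX_LEN:
        return []  # too short to trigger a lookup
    row = index.get(prefix_key(typed), set())
    typed_lower = typed.lower()
    return sorted(col for col in row if col[0].lower().startswith(typed_lower))
```

Every keystroke after the third reads the same row, so that row is the natural unit to cache; the extra letters are filtered on the client side.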

Re: Seeking advice on Schema and Caching

2011-11-15 Thread Aditya Narayan
Any insights on this ? On Tue, Nov 15, 2011 at 9:40 PM, Quintero quinteros8...@gmail.com wrote: Aditya Narayan ady...@gmail.com wrote: Hi I need to add 'search users' functionality to my application. (The trigger for fetching searched items(like google instant search) is made when 3

Re: Seeking advice on Schema and Caching

2011-11-15 Thread Aditya Narayan
using apache solr - you could then include just the row keys pointing back to Cassandra where the actual data is. Solr seems quite capable of performing google like searches and is fast. Cheers Ben On 16/11/2011, at 1:50 AM, Aditya Narayan ady...@gmail.com wrote: Hi I need to add

Re: Seeking advice on Schema and Caching

2011-11-15 Thread Aditya Narayan
Regarding the first option that you suggested through composite columns, can I store both the username and id in the column name and keep the column valueless? Will I be able to retrieve both the username and id from the composite col name? Thanks a lot On Wed, Nov 16, 2011 at 10:56 AM, Aditya

Store profile pics of users in Cassandra or file system ?

2011-11-11 Thread Aditya Narayan
Would it be recommended to store the profile pics of users on an application in Cassandra? Or would the file system be a better way to go? I came across an interesting paper which advocates storing in DB for blobs sized up to 1 MB. I was planning to store the image bytes in the same row that

Re: Store profile pics of users in Cassandra or file system ?

2011-11-11 Thread Aditya Narayan
just forgot to add the paper link if this is useful at all : To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem http://research.microsoft.com/apps/pubs/default.aspx?id=64525 On Sat, Nov 12, 2011 at 12:34 AM, Aditya Narayan ady...@gmail.com wrote: Would it be recommended

Re: Concatenating ids with extension to keep multiple rows related to an entity in a single CF

2011-11-04 Thread Aditya Narayan
, Tyler Hobbs ty...@datastax.com wrote: On Thu, Nov 3, 2011 at 3:48 PM, Aditya Narayan ady...@gmail.com wrote: I am concatenating two Integer ids through bitwise operations(as described below) to create a single primary key of type long. I wanted to know if this is a good practice. This would help

Concatenating ids with extension to keep multiple rows related to an entity in a single CF

2011-11-03 Thread Aditya Narayan
I am concatenating two Integer ids through bitwise operations(as described below) to create a single primary key of type long. I wanted to know if this is a good practice. This would help me in keeping multiple rows of an entity in a single column family by appending different extensions to the
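
A minimal sketch of the bitwise concatenation described above. It assumes both ids are non-negative 31-bit values (the non-negative range of a Java Integer) so that the packed key also fits in a signed 64-bit long; the names pack_ids and unpack_ids are hypothetical.

```python
MAX_ID = 0x7FFFFFFF   # largest non-negative 32-bit signed integer (assumed id range)
LOW_MASK = 0xFFFFFFFF  # selects the low 32 bits of the packed key


def pack_ids(entity_id, extension):
    """Place entity_id in the high 32 bits and extension in the low 32 bits."""
    for value in (entity_id, extension):
        if not 0 <= value <= MAX_ID:
            raise ValueError("ids must be in the range 0..2**31 - 1")
    return (entity_id << 32) | extension


def unpack_ids(key):
    """Recover (entity_id, extension) from a packed key."""
    return key >> 32, key & LOW_MASK
```

With the entity id in the high bits, all rows of one entity share a numeric prefix, for example pack_ids(7, 1) and pack_ids(7, 2) for an overview row and a detail row.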

Re: Cassandra Cluster Admin - phpMyAdmin for Cassandra

2011-11-01 Thread Aditya Narayan
Yes that would be pretty nice feature to see! On Mon, Oct 31, 2011 at 10:45 PM, Ertio Lew ertio...@gmail.com wrote: Thanks so much SebWajam for this great piece of work! Is there a way to set a data type for displaying the column names/ values of a CF ? It seems that your project always

Re: Programmatically allow only one out of two types of rows in a CF to enter the CACHE

2011-10-29 Thread Aditya Narayan
..so that I can retrieve them through a single query. For reading cols from two CFs you need two queries, right ? On Sat, Oct 29, 2011 at 9:53 PM, Mohit Anchlia mohitanch...@gmail.com wrote: Why not use 2 CFs? On Fri, Oct 28, 2011 at 9:42 PM, Aditya Narayan ady...@gmail.com wrote: I need

Re: Programmatically allow only one out of two types of rows in a CF to enter the CACHE

2011-10-29 Thread Aditya Narayan
? Thanks you guys! Anthony On 28/10/2011, at 21:42 PM, Aditya Narayan wrote: I need to keep the data of some entities in a single CF but split in two rows for each entity. One row contains an overview information for the entity another row contains detailed information about entity. I

Re: Programmatically allow only one out of two types of rows in a CF to enter the CACHE

2011-10-29 Thread Aditya Narayan
: On Sat, Oct 29, 2011 at 11:23 AM, Aditya Narayan ady...@gmail.com wrote: @Mohit: I have stated the example scenarios in my first post under this heading. Also I have stated above why I want to split that data in two rows like Ikeda below stated, I'm too trying out to prevent

Re: Storing counters in the standard column families along with non-counter columns ?

2011-07-14 Thread Aditya Narayan
Thanks Aaron and Chris, I appreciate your help. With dedicated CF for counters, in addition to the issue pointed out by Chris, the major drawback I see is that I can't read *in a single query* the counters with the regular columns row which is widely required by my application. My use case is like

Re: Storing counters in the standard column families along with non-counter columns ?

2011-07-11 Thread Aditya Narayan
, Aditya Narayan wrote: Is there any target version in near future for which this has been promised ? The ticket is problematic in that it would -- unless someone has a clever new idea -- require breaking thrift compatibility to add it to the api. This is unfortunate since it would be so

Re: Storing counters in the standard column families along with non-counter columns ?

2011-07-10 Thread Aditya Narayan
... https://issues.apache.org/jira/browse/CASSANDRA-2614 -sd On Sun, Jul 10, 2011 at 5:04 PM, Aditya Narayan ady...@gmail.com wrote: Is it now possible to store counters in the standard column families along with non counter type columns ? How to achieve this ?

Re: Storing counters in the standard column families along with non-counter columns ?

2011-07-10 Thread Aditya Narayan
, where as normal CF simply just add or replace. On Sun, Jul 10, 2011 at 10:39 PM, Aditya Narayan ady...@gmail.com wrote: Thanks for info. Is there any target version in near future for which this has been promised ? On Sun, Jul 10, 2011 at 9:12 PM, Sasha Dolgy sdo...@gmail.com wrote

Design for 'Most viewed Discussions' in a forum

2011-05-18 Thread Aditya Narayan
For a discussions forum, I need to show a page of most viewed discussions. For implementing this, I maintain a count of views of a discussion; when this view count passes a certain threshold limit, the discussion Id is added to a row of most viewed discussions.
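
A minimal sketch of the threshold scheme described above, with plain dicts standing in for the counter storage and for the row of most viewed discussions. The class name MostViewed and its methods are hypothetical, and "passes the threshold" is read here as strictly greater than.

```python
class MostViewed:
    """Tracks view counts and promotes discussions that pass a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.view_counts = {}   # discussion_id -> total views
        self.most_viewed = {}   # stands in for the 'most viewed' row

    def record_view(self, discussion_id):
        """Count one view; once past the threshold, keep the row entry current."""
        count = self.view_counts.get(discussion_id, 0) + 1
        self.view_counts[discussion_id] = count
        if count > self.threshold:
            self.most_viewed[discussion_id] = count
        return count

    def top(self, n):
        """Return up to n promoted discussion ids, most viewed first."""
        ranked = sorted(self.most_viewed, key=lambda d: (-self.most_viewed[d], d))
        return ranked[:n]
```

Only discussions past the threshold are written to the shared row, which keeps that row small compared with indexing every discussion by its view count.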

Re: Design for 'Most viewed Discussions' in a forum

2011-05-18 Thread Aditya Narayan
help minimize several versions of the same column in the row parts in different SST tables. On Wed, May 18, 2011 at 11:04 PM, Aditya Narayan ady...@gmail.com wrote: * For a discussions forum, I need to show a page of most viewed discussions. For implementing this, I maintain a count

Re: Design for 'Most viewed Discussions' in a forum

2011-05-18 Thread Aditya Narayan
by viewcount and you have what you are asking for ! This is a simplified version of what you should do but personally I really like the combination of Cassandra and Redis. Victor 2011/5/18 Aditya Narayan ady...@gmail.com I would arrange for memtable flush period in such a manner that the time

Re: Splitting the data of a single blog into 2 CFs (to implement effective caching) according to views.

2011-03-08 Thread Aditya Narayan
the data from CF1 in CF2 as well (use a batch_mutation through whatever client you have). So when serving the second page you only need to read one row from CF2. Aaron On 8/03/2011, at 8:13 PM, Norman Maurer wrote: Yeah this make sense as far as I can tell. Bye, Norman 2011/3/8 Aditya

Does the memtable replace the old version of column with the new overwriting version or is it just a simple append ?

2011-03-08 Thread Aditya Narayan
, since Cassandra will have to read so many versions of the same column. If this is just replacement with old column then I guess read will be much better since it needs to see just single existing version of column. Thanks Aditya Narayan

Re: Does the memtable replace the old version of column with the new overwriting version or is it just a simple append ?

2011-03-08 Thread Aditya Narayan
of that happens during read (read repair). This is why reads are slower than writes because conflict resolution happens during read. Hope this answers the question! Thanks, -Naren On Tue, Mar 8, 2011 at 10:44 PM, Aditya Narayan ady...@gmail.com wrote: Do the overwrites of newly written

Splitting the data of a single blog into 2 CFs (to implement effective caching) according to views.

2011-03-07 Thread Aditya Narayan
My application displays a list of several blogs' overview data (like blogTitle/ nameOfBlogger/ shortDescrption for each blog) on the 1st page (in a manner very similar to Digg's newsfeed) and when the user selects a particular blog to see, the application takes him to that specific blog's full

What would be a good strategy for Storing the large text contents like blog posts in Cassandra.

2011-03-06 Thread Aditya Narayan
What would be a good strategy to store large text content/(blog posts of around 1500-3000 characters) in cassandra? I need to store these blog posts along with their metadata like bloggerId, blogTags. I am looking forward to store this data in a single row giving each attribute a single column.

Re: What would be a good strategy for Storing the large text contents like blog posts in Cassandra.

2011-03-06 Thread Aditya Narayan
, Aditya Narayan ady...@gmail.com wrote: What would be a good strategy to store large text content/(blog posts of around 1500-3000 characters) in cassandra? I need to store these blog posts along with their metadata like bloggerId, blogTags. I am looking forward to store this data in a single row

Splitting a single row into multiple

2011-02-23 Thread Aditya Narayan
Does it make any difference if I split a row, that needs to be accessed together, into two or three rows and then read those multiple rows ?? (Assume the keys of all the three rows are known to me programmatically since I split columns by certain categories). Would the performance be any better if

Re: Splitting a single row into multiple

2011-02-23 Thread Aditya Narayan
a single row read gets what you need. Aaron On 24/02/2011, at 5:59 AM, Aditya Narayan ady...@gmail.com wrote: Does it make any difference if I split a row, that needs to be accessed together, into two or three rows and then read those multiple rows ?? (Assume the keys of all the three rows

Re: Confused about get_slice SliceRange behavior with bloom filter

2011-02-14 Thread Aditya Narayan
if the columns you ask for are really randomly distributed, then yes, the bigger the row is, the bigger the chance is to have to hit many blocks and the bigger the chance is for these blocks to be far apart on disk. -- Sylvain On Sun, Feb 13, 2011 at 10:19 PM, Aditya Narayan ady...@gmail.com

Re: Confused about get_slice SliceRange behavior with bloom filter

2011-02-14 Thread Aditya Narayan
Thanks for the clarifications.. On Mon, Feb 14, 2011 at 6:13 PM, Sylvain Lebresne sylv...@datastax.com wrote: On Mon, Feb 14, 2011 at 11:27 AM, Aditya Narayan ady...@gmail.com wrote: Thanks Sylvain, I guess I might have misunderstood the meaning of column_index_size_in_kb, My previous

Re: Confused about get_slice SliceRange behavior with bloom filter

2011-02-13 Thread Aditya Narayan
Jonathan, If I ask for around 150-200 columns (totally random not sequential) from a very wide row that contains more than a million or even more columns then, is the read performance of the SliceQuery operation affected by or depends on the length of the row ?? (For my use case, I would use the

Re: Merging the rows of two column families(with similar attributes) into one ??

2011-02-12 Thread Aditya Narayan
What if the caching requirements, sorting needs of two kind of data are very much similar, is it preferable to go with a single CF in those cases ? Regards Aditya On Sat, Feb 5, 2011 at 10:43 AM, Tyler Hobbs ty...@datastax.com wrote: I read somewhere that more no of column families is not a

Re: Merging the rows of two column families(with similar attributes) into one ??

2011-02-12 Thread Aditya Narayan
Any comments/view points on this? On Sat, Feb 12, 2011 at 5:05 PM, Aditya Narayan ady...@gmail.com wrote: What if the caching requirements, sorting needs of two kind of data are very much similar, is it preferable to go with a single CF in those cases ? Regards Aditya On Sat, Feb 5

Re: Calculating the size of rows in KBs

2011-02-10 Thread Aditya Narayan
or fully read from disk during subsequent reads or compactions. On disk format described here may help http://wiki.apache.org/cassandra/ArchitectureSSTable Hope that helps Aaron On 10/02/2011, at 11:56 PM, Aditya Narayan ady...@gmail.com wrote: How can I get or calculate the size

Re: Does variation in no of columns in rows over the column family has any performance impact ?

2011-02-07 Thread Aditya Narayan
Thanks for the detailed explanation Peter! Definitely cleared my doubts ! On Mon, Feb 7, 2011 at 1:52 PM, Peter Schuller peter.schul...@infidyne.com wrote: Does huge variation in no. of columns in rows, over the column family has *any* impact on the performance ? Can I have like just 100

Column Sorting of integer names

2011-02-04 Thread Aditya Narayan
Is there any way to sort the columns named as integers in the descending order ? Regards -Aditya

Re: Using Cassandra to store files

2011-02-04 Thread Aditya Narayan
I am also looking at possible solutions to store PDFs and Word documents. But why won't you store them in the filesystem instead of a database, unless your files are too small, in which case it would be recommended to use a database. -Aditya On Fri, Feb 4, 2011 at 5:30 PM, Daniel Doubleday

Re: Using Cassandra to store files

2011-02-04 Thread Aditya Narayan
yes, definitely a database for mapping of course! On Fri, Feb 4, 2011 at 11:17 PM, buddhasystem potek...@bnl.gov wrote: Even when storage is in NFS, Cassandra can still be quite useful as a file catalog. Your physical storage can change, move etc. Therefore, it's a good idea to provide mapping

Re: Sorting in time order without using TimeUUID type column names

2011-02-04 Thread Aditya Narayan
if this is in a row just for the user. Hope that helps. Aaron On 4 Feb 2011, at 01:32, Aditya Narayan wrote: If I use : TimestampOfDueTimeInFuture: UserId : ReminderCountOfThisUser as key pattern for the rows of reminders, then I am storing the key, just as it is, as the column name and thus
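
A minimal sketch of the TimestampOfDueTimeInFuture:UserId:ReminderCountOfThisUser key pattern quoted above. It assumes the timestamp is in milliseconds and is zero-padded to a fixed width of 13 digits so that plain string ordering matches time ordering; the function name reminder_key and the padding width are hypothetical choices, not details from the thread.

```python
TS_WIDTH = 13  # assumed fixed width for a millisecond timestamp


def reminder_key(due_ts_millis, user_id, reminder_count):
    """Build 'timestamp:userId:reminderCount' with a zero-padded timestamp."""
    if due_ts_millis < 0 or len(str(due_ts_millis)) > TS_WIDTH:
        raise ValueError("timestamp does not fit the fixed-width field")
    return f"{due_ts_millis:0{TS_WIDTH}d}:{user_id}:{reminder_count}"


def due_time(key):
    """Read the due timestamp back out of a key."""
    return int(key.split(":", 1)[0])
```

Because the timestamp comes first and has a fixed width, sorting the keys as strings yields due-time order, and the user id plus per-user count keep keys unique when two reminders share a due time.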

Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-03 Thread Aditya Narayan
Thanks Tyler! On Thu, Feb 3, 2011 at 12:06 PM, Tyler Hobbs ty...@datastax.com wrote: On Wed, Feb 2, 2011 at 3:27 PM, Aditya Narayan ady...@gmail.com wrote: Can I have some more feedback about my schema perhaps somewhat more critical/harsh ? It sounds reasonable to me. Since you're

Sorting in time order without using TimeUUID type column names

2011-02-03 Thread Aditya Narayan
the opposite order) (Reminders need to be sorted in the timeline in the order of their due time.) Basically I am trying to avoid 16 bytes long timeUUID first because they are too long and the above defined key pattern is guaranteeing me a unique key/Id for the reminder row always. Thanks Aditya

Re: Sorting in time order without using TimeUUID type column names

2011-02-03 Thread Aditya Narayan
with timeuuids ? Are there any downsides which I am perhaps not aware of ? On Thu, Feb 3, 2011 at 5:43 PM, Sylvain Lebresne sylv...@datastax.com wrote: On Thu, Feb 3, 2011 at 11:27 AM, Aditya Narayan ady...@gmail.com wrote: Hey all, I want to store some columns that are reminders to the users

Re: Sorting in time order without using TimeUUID type column names

2011-02-03 Thread Aditya Narayan
with timeuuids ? Are there any downsides which I am perhaps not aware of ? On Thu, Feb 3, 2011 at 5:43 PM, Sylvain Lebresne sylv...@datastax.com wrote: On Thu, Feb 3, 2011 at 11:27 AM, Aditya Narayan ady...@gmail.com wrote: Hey all, I want to store some columns that are reminders

Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread Aditya Narayan
or just a standard column family containing all the subcolumns data serialized in single column(s) ? Thanks Aditya Narayan

Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread Aditya Narayan
of tags associated with particular reminder. All tags set at once during first write. The no of tags(subcolumns) will be around 8 maximum. Any comments, suggestions and feedback on the schema design are requested.. Thanks Aditya Narayan On Wed, Feb 2, 2011 at 7:49 PM, Aditya Narayan ady

Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread Aditya Narayan
details would be picked up.. Is supercolumn a preferable choice for this ? Can there be a better schema than this ? -Aditya Narayan On Wed, Feb 2, 2011 at 8:54 PM, William R Speirs bill.spe...@gmail.com wrote: To reiterate, so I know we're both on the same page, your schema would

Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread Aditya Narayan
, then it's probably too much for a single row. I'm not familiar with the TTL functionality of Cassandra... sorry cannot help/comment there, still learning :-) Yea, my $0.02 is that this is an effective way to leverage super columns. Bill- On 02/02/2011 10:43 AM, Aditya Narayan wrote: I

Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread Aditya Narayan
in standard type column family. Thanks -Aditya Narayan On Wed, Feb 2, 2011 at 10:11 PM, William R Speirs bill.spe...@gmail.com wrote: I did not understand before... sorry. Again, depending upon how many reminders you have for a single user, this could be a long/wide row. Again, it really comes down

Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread Aditya Narayan
Can I have some more feedback about my schema perhaps somewhat more critical/harsh ? Thanks again, Aditya Narayan On Wed, Feb 2, 2011 at 10:27 PM, Aditya Narayan ady...@gmail.com wrote: @Bill Thank you Bill! @Cassandra users Can others also leave their suggestions and comments about my