Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Jay Patel
, there is no data in the table. Table is empty. Query is fired on the empty table. From the tracing output, I don't understand why it's doing multiple scans on one node. With non-vnodes, there is only one scan per node, and the same query works fine. If you look at the output1.txt attached earlier, the coordinator is firing

Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Tyler Hobbs
On Fri, Sep 19, 2014 at 12:41 PM, Jay Patel pateljay3...@gmail.com wrote: Btw, there is no data in the table. Table is empty. Query is fired on the empty table. This is actually the worst case for secondary index lookups. From the tracing output, I don't understand why it's doing multiple

Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread DuyHai Doan
. Table is empty. Query is fired on the empty table. This is actually the worst case for secondary index lookups. From the tracing output, I don't understand why it's doing multiple scans on one node. With non-vnodes, there is only one scan per node, and the same query works fine. If you look

Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Tyler Hobbs
On Fri, Sep 19, 2014 at 4:19 PM, DuyHai Doan doanduy...@gmail.com wrote: But does it imply that with vnodes there is actually extra work to do for scanning indices? Yes. If so, is this extra load mostly I/O bound or CPU bound? It doesn't necessarily change what the query

Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Robert Coli
On Fri, Sep 19, 2014 at 2:19 PM, DuyHai Doan doanduy...@gmail.com wrote: But does it imply that with vnodes there is actually extra work to do for scanning indices? Vnodes are just nodes, so they have all the problems-associated-with-many-nodes one would get with 256x as many nodes.

Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Jay Patel
, at once. Also, internally that node should be able to do just one scan through all of the ranges it holds, shouldn't it? (e.g. [min(-9223372036854775808), max(-9193352069377957523), and (max(-9136021049555745100), max(-8959555493872108621)], etc. ] It seems like it needs to query data in token order

Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Jay Patel
, internally that node should be able to do just one scan through all of the ranges it holds, shouldn't it? (e.g. [min(-9223372036854775808), max(-9193352069377957523), and (max(-9136021049555745100), max(-8959555493872108621)], etc. ] It seems like it needs to query data in token order. So, min

Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Tyler Hobbs
to do just one scan through all of the ranges it holds, shouldn't it? (e.g. [min(-9223372036854775808), max(-9193352069377957523), and (max(-9136021049555745100), max(-8959555493872108621)], etc. ] It seems like it needs to query data in token order. So, min(-9223372036854775808), max

Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Jay Patel
(-9193352069377957523), and (max(-9136021049555745100), max(-8959555493872108621)], etc. ] It seems like it needs to query data in token order. So, min(-9223372036854775808), max(-*9193352069377957523*) on 192.168.51.22. But the next range ((max(-*9193352069377957523*), max(-*9136021049555745100*)]) is on 192.168.51.25
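
The token-order behaviour Jay describes can be illustrated with a toy model. This is a hypothetical simulation, not Cassandra's range-scan code: it assumes the coordinator walks the ring's token ranges in order and sends one request each time ownership changes from one node to another. Round-robin token assignment is a simplification (real vnode tokens are random), and the node names are made up, but either way each node ends up owning many non-adjacent ranges.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical), not Cassandra's implementation: a coordinator walks
// the ring in token order and sends one request per contiguous run of ranges
// owned by the same node. Wrap-around at the end of the ring is ignored.
final class RingScanSketch {

    /** ringOwners.get(i) is the node that owns the i-th token range, in token order. */
    static Map<String, Integer> requestsPerNode(List<String> ringOwners) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String previous = null;
        for (String owner : ringOwners) {
            if (!owner.equals(previous)) {
                // Ownership changed, so a new range request has to be sent.
                counts.merge(owner, 1, Integer::sum);
            }
            previous = owner;
        }
        return counts;
    }

    /** Assigns tokens round-robin, so each node owns tokensPerNode non-adjacent ranges. */
    static List<String> interleavedRing(List<String> nodes, int tokensPerNode) {
        List<String> ring = new ArrayList<>();
        for (int t = 0; t < tokensPerNode; t++) {
            ring.addAll(nodes);
        }
        return ring;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("nodeA", "nodeB", "nodeC"); // hypothetical node names
        System.out.println(requestsPerNode(interleavedRing(nodes, 1)));   // {nodeA=1, nodeB=1, nodeC=1}
        System.out.println(requestsPerNode(interleavedRing(nodes, 256))); // {nodeA=256, nodeB=256, nodeC=256}
    }
}
```

With one token per node each node receives a single request; with 256 interleaved tokens per node (the "256x" Robert mentions) each node receives 256, even when the table is empty.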

Re: Help with select IN query in cassandra

2014-09-01 Thread Laing, Michael
This should work for your query requirements - two tables with the same info, because disk is cheap and writes are fast, so optimize for reads: CREATE TABLE sensor_asset ( asset_id text, event_time timestamp, tuuid timeuuid, sensor_reading map<text, text>, sensor_serial_number text, sensor_type

Re: Help with select IN query in cassandra

2014-09-01 Thread Jack Krupansky
Krupansky From: Laing, Michael Sent: Monday, September 1, 2014 9:33 AM To: user@cassandra.apache.org Subject: Re: Help with select IN query in cassandra This should work for your query requirements - two tables with the same info, because disk is cheap and writes are fast, so optimize for reads: CREATE TABLE

Re: Help with select IN query in cassandra

2014-09-01 Thread Laing, Michael
” rather than the exercise in futility of doing a massive number of deletes and updates in place? -- Jack Krupansky *From:* Laing, Michael michael.la...@nytimes.com *Sent:* Monday, September 1, 2014 9:33 AM *To:* user@cassandra.apache.org *Subject:* Re: Help with select IN query in cassandra

Re: Help with select IN query in cassandra

2014-09-01 Thread Jack Krupansky
1, 2014 11:34 AM To: user@cassandra.apache.org Subject: Re: Help with select IN query in cassandra Did the OP propose that? On Mon, Sep 1, 2014 at 10:53 AM, Jack Krupansky j...@basetechnology.com wrote: One comment on deletions – aren’t deletions kind of an anti-pattern for modern data

Re: Help with select IN query in cassandra

2014-09-01 Thread Subodh Nijsure
Sent: Monday, September 1, 2014 11:34 AM To: user@cassandra.apache.org Subject: Re: Help with select IN query in cassandra Did the OP propose that? On Mon, Sep 1, 2014 at 10:53 AM, Jack Krupansky j...@basetechnology.com wrote: One comment on deletions – aren’t deletions kind of an anti

Re: Help with select IN query in cassandra

2014-09-01 Thread Subodh Nijsure
Thanks, Michael, I will certainly go with this approach for now. -Subodh On Mon, Sep 1, 2014 at 6:33 AM, Laing, Michael michael.la...@nytimes.com wrote: This should work for your query requirements - two tables with the same info, because disk is cheap and writes are fast, so optimize for reads: CREATE

Java sample code for non-blocking async query

2014-09-01 Thread Gary Zhao
Hello, I'm looking for non-blocking async query sample code. The one I found at the following link is an async query, but it still blocks. Could anyone share such code? http://www.datastax.com/documentation/developer/java-driver/1.0/java-driver/asynchronous_t.html Thanks, Gary

Re: Java sample code for non-blocking async query

2014-09-01 Thread Stephen Portanova
for non-blocking async query sample code. The one I found at the following link is an async query, but it still blocks. Could anyone share such code? http://www.datastax.com/documentation/developer/java-driver/1.0/java-driver/asynchronous_t.html Thanks, Gary -- Stephen Portanova (480) 495-2634
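
The difference Gary asks about is whether the caller waits on the future (blocking) or registers a callback and returns (non-blocking). Below is a self-contained sketch using the JDK's `CompletableFuture`; `executeAsyncHypothetical` is a made-up stand-in for a driver call, not part of any driver API. In the DataStax Java driver, `Session.executeAsync` returns a `ResultSetFuture`, which is a Guava `ListenableFuture`, so the corresponding move there is to register a callback (for example with Guava's `Futures.addCallback`) instead of calling `get()` or `getUninterruptibly()`.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

final class NonBlockingQuerySketch {

    // Hypothetical stand-in for a driver's async execute: runs the "query" on a
    // worker thread and completes the future with fake rows. Not a real driver API.
    static CompletableFuture<List<String>> executeAsyncHypothetical(String cql, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> List.of("row for: " + cql), pool);
    }

    // Non-blocking style: register callbacks and return immediately.
    // The caller never calls get()/join(); onRows or onError runs when the result is ready.
    static CompletableFuture<Void> queryWithCallback(String cql, ExecutorService pool,
                                                     Consumer<List<String>> onRows,
                                                     Consumer<Throwable> onError) {
        return executeAsyncHypothetical(cql, pool)
                .thenAccept(onRows)
                .exceptionally(error -> {
                    onError.accept(error);
                    return null;
                });
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CompletableFuture<Void> done = queryWithCallback(
                "SELECT * FROM users", // hypothetical table
                pool,
                rows -> System.out.println("got " + rows.size() + " row(s)"),
                error -> System.err.println("query failed: " + error));
        System.out.println("main thread is free to do other work");
        // Only this demo waits, so the JVM does not exit before the callback runs.
        done.join();
        pool.shutdown();
    }
}
```

The two printed lines may appear in either order, which is the point: the main thread does not wait for the query.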

Help with select IN query in cassandra

2014-08-31 Thread Subodh Nijsure
ON sensor_info_table (event_time); CREATE INDEX timestamp_index ON sensor_info_table (timestamp); Now I am able to insert data into this table; however, I am unable to run the following query, where I want to select items with specific timeuuid values. It gives me the following error. SELECT * from

Re: Help with select IN query in cassandra

2014-08-31 Thread Laing, Michael
into this table; however, I am unable to run the following query, where I want to select items with specific timeuuid values. It gives me the following error. SELECT * from mydb.sensor_info_table where timestamp IN ( bfdfa614-3166-11e4-a61d-b888e30f5d17 , bf4521ac-3166-11e4-87a3-b888e30f5d17) ; Bad Request

Re: Help with select IN query in cassandra

2014-08-31 Thread Subodh Nijsure
ON sensor_info_table (event_time); CREATE INDEX timestamp_index ON sensor_info_table (timestamp); Now I am able to insert data into this table; however, I am unable to run the following query, where I want to select items with specific timeuuid values. It gives me the following error. SELECT * from

Re: Help with select IN query in cassandra

2014-08-31 Thread Laing, Michael
Hmm. Because the clustering key is (event_time, timestamp), event_time must be specified as well; hopefully that info is available to the UX. Unfortunately, you will then hit another problem with your query: you are selecting a collection field, and this will not work with IN on timestamp. So you

Re: Help with select IN query in cassandra

2014-08-31 Thread Laing, Michael
: Hmm. Because the clustering key is (event_time, timestamp), event_time must be specified as well; hopefully that info is available to the UX. Unfortunately, you will then hit another problem with your query: you are selecting a collection field, and this will not work with IN on timestamp. So

Re: Help with select IN query in cassandra

2014-08-31 Thread Subodh Nijsure
timeuuid, sensor_reading map<text, text>, sensor_serial_number text, sensor_type int, PRIMARY KEY ((asset_id, timestamp), event_time) ); It does what I want to do, and I removed the index on timestamp because it is now part of the primary key, so a query like this works. SELECT * from

can not query data from cassandra

2014-08-20 Thread 鄢来琼
Hi all, I set up Cassandra on a Linux host. I have inserted some data into the “mykeyspace.cffex_l23” table. The following error is raised when querying data from “mykeyspace.cffex_l23”. Could you give me any suggestions to fix it? According to the “top” command, I found that most of the memory is used

Re: can not query data from cassandra

2014-08-20 Thread 鄢来琼
From: 鄢来琼 [mailto:laiqiong@gtafe.com] Sent: August 20, 2014 14:13 To: user@cassandra.apache.org Subject: can not query data from cassandra Hi all, I set up Cassandra on a Linux host. I have inserted some data into the “mykeyspace.cffex_l23” table. The following error is raised during

Re: Re: can not query data from cassandra

2014-08-20 Thread Mark Reddy
*From:* 鄢来琼 [mailto:laiqiong@gtafe.com] *Sent:* August 20, 2014 14:13 *To:* user@cassandra.apache.org *Subject:* can not query data from cassandra Hi all, I set up Cassandra on a Linux host. I have inserted some data into the “mykeyspace.cffex_l23” table. The following error

Re: range query times out (on 1 node, just 1 row in table)

2014-08-20 Thread Subodh Nijsure
on this, but all I have come up with is this (old, non-authoritative) blog post, which states that Cassandra’s native index is like a hashed index, which means you can only do equality queries and not range queries. Somewhere in Google I'm pretty sure you can find me on this list explaining the basic case

Strange select result when using date greater than query

2014-08-17 Thread Subodh Nijsure
| 61.97 |73.97 Now if I execute this query: SELECT asset_id,event_time,sensor_type, temperature,humidity from temp_humidity_data where asset_id='2' and event_time > '2014-08-17 03:33:20' ALLOW FILTERING; it gives me back the same results (!), but I expected it to give me 0 results. asset_id | event_time

Re: Strange select result when using date greater than query

2014-08-17 Thread Jack Krupansky
Are you more than 7 time zones behind GMT? If so, that would make the 03:33 in your query less than 03:33-0700. Your query is using the default time zone, which will be the time zone configured for the coordinator node executing the query. IOW, where are you? -- Jack Krupansky -Original

Re: Strange select result when using date greater than query

2014-08-17 Thread Subodh Nijsure
-0500 | 1 | 67.228 | 91.228 2 | 2014-08-17 05:33:19-0500 | 1 | 61.97 |73.97 So for the query, I thought I should be giving time strings in the local time zone too, no? -Subodh On Sun, Aug 17, 2014 at 5:17 AM, Jack Krupansky j...@basetechnology.com wrote: Are you

Re: Strange select result when using date greater than query

2014-08-17 Thread Jack Krupansky
I should have asked where your coordinator node is located. Check its time zone, relative to GMT. cqlsh is simply formatting the time stamp for your local display. That is separate from the actual query execution on the server coordinator node. cqlsh is merely a client, not the server
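
Jack's point can be checked without a cluster: a timestamp literal with no offset names a different instant depending on which time zone interprets it. The sketch below uses `java.time` to read Subodh's literal `'2014-08-17 03:33:20'` once as UTC and once as -0500 (the offset shown in his cqlsh output); the two readings are five hours apart. Which zone the coordinator actually uses depends on its configuration, so the two zones here are only examples. Writing the offset into the literal (e.g. `'2014-08-17 03:33:20-0500'`) removes the ambiguity.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class TimestampLiteralSketch {

    private static final DateTimeFormatter NO_OFFSET =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    private static final DateTimeFormatter WITH_OFFSET =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ssZ"); // Z parses offsets like -0500

    /** Interprets a literal that has no offset using the given (assumed) offset; returns epoch millis. */
    static long epochMillis(String literal, ZoneOffset assumedOffset) {
        return LocalDateTime.parse(literal, NO_OFFSET).toInstant(assumedOffset).toEpochMilli();
    }

    /** Parses a literal that carries its own offset, such as "2014-08-17 03:33:20-0500". */
    static long epochMillisExplicit(String literal) {
        return OffsetDateTime.parse(literal, WITH_OFFSET).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        String literal = "2014-08-17 03:33:20";
        long asUtc = epochMillis(literal, ZoneOffset.UTC);
        long asMinusFive = epochMillis(literal, ZoneOffset.ofHours(-5));
        // Same text, two different instants: prints 5 (hours apart).
        System.out.println((asMinusFive - asUtc) / 3_600_000);
        // With an explicit offset, the interpreting time zone no longer matters: prints true.
        System.out.println(epochMillisExplicit("2014-08-17 03:33:20-0500") == asMinusFive);
    }
}
```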

Re: Strange select result when using date greater than query

2014-08-17 Thread Subodh Nijsure
I am running cqlsh on the same machine as my Cassandra server. I am observing really strange behavior: if I run this query, all 3 rows show up. SELECT asset_id,event_time,sensor_type, temperature,humidity from temp_humidity_data ALLOW FILTERING; asset_id | event_time | sensor_type

Re: range query times out (on 1 node, just 1 row in table)

2014-08-13 Thread Ian Rose
Confusingly, it appears to be the presence of an index on int_val that is causing this timeout. If I drop that index (leaving only the index on foo_name) the query works just fine. On Tue, Aug 12, 2014 at 10:25 PM, Ian Rose ianr...@fullstory.com wrote: Hi - I am currently running a single

Re: range query times out (on 1 node, just 1 row in table)

2014-08-13 Thread DuyHai Doan
Hello Ian, Secondary indexes perform poorly with inequalities (<, ≤, >, ≥). Indeed, inequalities force the server to scan the whole cluster to find the requested range, which is clearly not optimal. That's the reason why you need to add ALLOW FILTERING for the query to be accepted. ALLOW FILTERING

Re: range query times out (on 1 node, just 1 row in table)

2014-08-13 Thread Jack Krupansky
Agreed, but... in this case the table has ONE row, so what exactly could be causing this timeout? I mean, it can’t be the row count, right? -- Jack Krupansky From: DuyHai Doan Sent: Wednesday, August 13, 2014 9:01 AM To: user@cassandra.apache.org Subject: Re: range query times out (on 1 node

Re: range query times out (on 1 node, just 1 row in table)

2014-08-13 Thread DuyHai Doan
the condition int_val > 0 -- read from the 2nd index int_val where partition key > 0, so basically it is a range scan. Once it gets all the results from the 2nd indices, C* can query the primary table to return data. I've read somewhere that when having multiple conditions in the WHERE clause, C* should

Re: range query times out (on 1 node, just 1 row in table)

2014-08-13 Thread Ian Rose
Frankly, no matter how inefficient / expensive the query is, surely it should still work when there is only 1 row and 1 node (which is localhost)! I'm starting to wonder if range queries on secondary indexes aren't supported at all (although if that is the case, I would certainly prefer an error

Re: range query times out (on 1 node, just 1 row in table)

2014-08-13 Thread Sylvain Lebresne
) values('dave', 27, 100); This query works fine: select * from foo where foo_name='dave'; But when I run this query, I get an RPC timeout: select * from foo where foo_name='dave' and int_val > 0 allow filtering; With tracing enabled, here is the trace output: http://pastebin.com/raw.php?i

Re: range query times out (on 1 node, just 1 row in table)

2014-08-13 Thread Ian Rose
(int_val); CREATE INDEX ON foo (foo_name); I have inserted just a single row into this table: insert into foo(foo_name, foo_shard, int_val) values('dave', 27, 100); This query works fine: select * from foo where foo_name='dave'; But when I run this query, I get an RPC timeout: select * from

Re: range query times out (on 1 node, just 1 row in table)

2014-08-13 Thread Robert Coli
a definitive answer on this, but all I have come up with is this (old, non-authoritative) blog post, which states that Cassandra’s native index is like a hashed index, which means you can only do equality queries and not range queries. Somewhere in Google I'm pretty sure you can find me on this list
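
To illustrate the "hashed index" description quoted above, here is a toy in-memory model (hypothetical, not how Cassandra stores its index tables): an index that finds entries by hash lookup answers an equality predicate by going straight to one entry, but has no ordering to exploit for a range predicate such as `int_val > 0`, so it must visit every entry. The class and method names are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy hash-based secondary index (hypothetical): indexed value -> primary keys. */
final class HashIndexSketch {
    private final Map<Integer, List<String>> entries = new HashMap<>();
    /** Number of index entries touched by the most recent lookup, for illustration. */
    int entriesVisited;

    void add(int indexedValue, String primaryKey) {
        entries.computeIfAbsent(indexedValue, v -> new ArrayList<>()).add(primaryKey);
    }

    /** Equality: a single hash lookup, however many distinct values are indexed. */
    List<String> findEqual(int value) {
        entriesVisited = 1;
        return entries.getOrDefault(value, List.of());
    }

    /** Range (value > lowerBound): hash placement says nothing about value order, so every entry is checked. */
    List<String> findGreaterThan(int lowerBound) {
        entriesVisited = 0;
        List<String> result = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : entries.entrySet()) {
            entriesVisited++;
            if (e.getKey() > lowerBound) {
                result.addAll(e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        HashIndexSketch index = new HashIndexSketch();
        for (int i = 0; i < 1000; i++) {
            index.add(i, "key" + i);
        }
        index.findEqual(100);
        System.out.println("equality visited " + index.entriesVisited + " entry");
        index.findGreaterThan(998);
        System.out.println("range visited " + index.entriesVisited + " entries");
    }
}
```

The range lookup returns a single match but still touches all 1,000 entries, which is the shape of cost DuyHai describes for inequality predicates on an index.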

range query times out (on 1 node, just 1 row in table)

2014-08-12 Thread Ian Rose
, foo_shard)) ) WITH read_repair_chance=0.1; CREATE INDEX ON foo (int_val); CREATE INDEX ON foo (foo_name); I have inserted just a single row into this table: insert into foo(foo_name, foo_shard, int_val) values('dave', 27, 100); This query works fine: select * from foo where foo_name='dave'; But when I

Re: horizontal query scaling issues follow on

2014-07-23 Thread Diane Griffith
I posted the query wrong: I gave the query for 1 key rather than the large batch of IDs I was testing with. What it was using for the large batch was IN, so Select * from foo where key IN … and col_name='LATEST'. So after breaking it down and reading as much as I can with regard to our - schema

Should PREPARE QUERY return metadata for the query result?

2014-07-23 Thread Ben Hood
Hi all, I'm looking at the specification of statement preparation (section 4.2.5.4 of the CQL protocol) and I'm wondering whether the metadata result of the PREPARE query only returns column information for the query arguments, and not for the columns of the actual query result. The background

Re: Should PREPARE QUERY return metadata for the query result?

2014-07-23 Thread Ben Hood
of the body of a Prepared result is: <id><metadata><metadata> where: - <id> is [short bytes] representing the prepared query ID. - <metadata> is defined exactly as for a Rows RESULT (See section 4.2.5.2) - this represents the type information for the query arguments - <metadata> is defined exactly

Re: Should PREPARE QUERY return metadata for the query result?

2014-07-23 Thread Ben Hood
of the body of a Prepared result is: <id><metadata><result_metadata> where: - <id> is [short bytes] representing the prepared query ID. - <metadata> is defined exactly as for a Rows RESULT (See section 4.2.5.2; you can however assume that the Has_more_pages flag is always off
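
For reference, `[short bytes]` in the native protocol spec is a 2-byte unsigned big-endian length n followed by n bytes, so the prepared-statement `<id>` can be read off the front of a Prepared result body as sketched below. This assumes the buffer is already positioned at the start of the Prepared body (after the 4-byte result kind); it only decodes the id and does not parse the `<metadata>` sections that follow. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

final class ShortBytesSketch {

    // Reads a [short bytes] value from a protocol body: a 2-byte unsigned
    // big-endian length n, followed by n bytes. ByteBuffer is big-endian by
    // default; get(byte[]) throws BufferUnderflowException if the body is short.
    static byte[] readShortBytes(ByteBuffer body) {
        int length = body.getShort() & 0xFFFF; // [short] is unsigned
        byte[] value = new byte[length];
        body.get(value);
        return value;
    }

    public static void main(String[] args) {
        // Hypothetical Prepared body: length 4, a 4-byte id, then one byte standing in for <metadata>.
        ByteBuffer body = ByteBuffer.wrap(new byte[] {0x00, 0x04, 0x0A, 0x0B, 0x0C, 0x0D, 0x7F});
        byte[] id = readShortBytes(body);
        System.out.println("id length = " + id.length + ", bytes left = " + body.remaining());
    }
}
```

After the id is consumed, the buffer's position sits at the first `<metadata>` section, which is where a full decoder would continue.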

Re: horizontal query scaling issues follow on

2014-07-23 Thread Benedict Elliott Smith
if you find that adding nodes causes performance to degrade I would suspect that you are querying data in one CQL statement that is spread over multiple partitions This is exactly what is happening. The better way to query multiple partitions is to simply despatch multiple queries
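
Benedict's suggestion, one query per partition issued concurrently instead of one large `IN` spanning many partitions, can be sketched with `CompletableFuture`. `fetchPartition` is a hypothetical async single-partition read supplied by the caller (for example, a wrapper around a prepared `SELECT ... WHERE key = ?`); it is not a driver API. Because each request is independent, a token-aware client could send each one to a node that owns that partition, instead of one coordinator gathering every partition.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

final class FanOutSketch {

    // Sends one async single-partition request per key (instead of one big IN query)
    // and combines the answers once all of them have completed.
    // fetchPartition is a hypothetical async read supplied by the caller.
    static CompletableFuture<Map<String, String>> fetchAll(
            List<String> keys, Function<String, CompletableFuture<String>> fetchPartition) {
        List<CompletableFuture<String>> futures = keys.stream()
                .map(fetchPartition)
                .collect(Collectors.toList());
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> {
                    Map<String, String> results = new LinkedHashMap<>();
                    for (int i = 0; i < keys.size(); i++) {
                        // join() returns immediately here: allOf has already completed.
                        results.put(keys.get(i), futures.get(i).join());
                    }
                    return results;
                });
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // Hypothetical reader: pretends each partition's LATEST column holds "LATEST for <key>".
        Function<String, CompletableFuture<String>> fakeRead =
                key -> CompletableFuture.supplyAsync(() -> "LATEST for " + key, pool);
        Map<String, String> rows = fetchAll(List.of("k1", "k2", "k3"), fakeRead).join();
        System.out.println(rows);
        pool.shutdown();
    }
}
```

If any single read fails, the combined future fails too. In practice one would also cap the number of requests in flight rather than launching all of them at once.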

Re: horizontal query scaling issues follow on

2014-07-21 Thread Jonathan Lacefield
Hello, Here is the documentation for cfhistograms, which is in microseconds. http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsCFhisto.html Your question about setting timeouts is subjective, but you have set your timeout limits to 4 mins, which seems excessive. The

Re: horizontal query scaling issues follow on

2014-07-21 Thread Diane Griffith
So I appreciate all the help so far. Upfront, it is possible the schema and data query pattern could be contributing to the problem. The schema was born out of certain design requirements. If it proves to be part of what makes the scalability crumble, then I hope it will help shape the design

Re: horizontal query scaling issues follow on

2014-07-20 Thread Diane Griffith
I am running tests again across different numbers of client threads and nodes, but this time I tweaked some of the timeouts configured for the nodes in the cluster. I was able to get better performance on the nodes at 10 client threads by upping 4 timeout values in cassandra.yaml to

Re: horizontal query scaling issues follow on

2014-07-18 Thread Diane Griffith
. -- Jack Krupansky *From:* Diane Griffith dfgriff...@gmail.com *Sent:* Thursday, July 17, 2014 6:21 PM *To:* user user@cassandra.apache.org *Subject:* Re: horizontal query scaling issues follow on So do partitions equate to tokens/vnodes? If so we had configured all cluster nodes/vms

Re: horizontal query scaling issues follow on

2014-07-18 Thread Benedict Elliott Smith
keys and a large number of clustering columns, or does each row have a unique partition key and no clustering columns. -- Jack Krupansky *From:* Diane Griffith dfgriff...@gmail.com *Sent:* Thursday, July 17, 2014 6:21 PM *To:* user user@cassandra.apache.org *Subject:* Re: horizontal query

Re: horizontal query scaling issues follow on

2014-07-18 Thread Diane Griffith
and a large number of clustering columns, or does each row have a unique partition key and no clustering columns. -- Jack Krupansky *From:* Diane Griffith dfgriff...@gmail.com *Sent:* Thursday, July 17, 2014 6:21 PM *To:* user user@cassandra.apache.org *Subject:* Re: horizontal query

Re: horizontal query scaling issues follow on

2014-07-18 Thread Tyler Hobbs
On Fri, Jul 18, 2014 at 8:01 AM, Diane Griffith dfgriff...@gmail.com wrote: Partition Size (bytes) 1109 bytes: 1800 Cell Count per Partition 8 cells: 1800 meaning I can't glean anything about how it partitioned or if it broke a key across partitions from this right? Does it mean

Re: horizontal query scaling issues follow on

2014-07-18 Thread Diane Griffith
:13.072-0400”, “{key:1109dccb-169b-40ef-b7f8-d072f04d8139,keyType: Type1,state:state1,timestamp:1303305553072,eventId:40902,executionId:31082}”) CQL Read: SELECT col_value from foo where key=”Type1:1109dccb-169b-40ef-b7f8-d072f04d8139“ and col_name=”LATEST“ Read result from above query

horizontal query scaling issues follow on

2014-07-17 Thread Diane Griffith
Procedure: - Inserted 54 million cells in 18 million rows (so 3 cells per row), using randomly generated row keys. That was to be our data control for the test. - Spawn a client on a different VM to query 100k rows and do that for 100 reps. Each row key queried is drawn randomly

Re: horizontal query scaling issues follow on

2014-07-17 Thread Jack Krupansky
in a single partition would certainly not be a test of “horizontal scaling” (adding nodes to handle more data – more token values or partitions.) -- Jack Krupansky From: Diane Griffith Sent: Thursday, July 17, 2014 1:33 PM To: user Subject: horizontal query scaling issues follow

Re: horizontal query scaling issues follow on

2014-07-17 Thread Diane Griffith
didn't think I was hitting an I/O wall on the client VM (separate VM) where we command-line scripted our query call to the Cassandra cluster. I can break the client call load across VMs, which I tried early on. Happy to verify that again though. So given that I was assuming the partitions were

Re: horizontal query scaling issues follow on

2014-07-17 Thread Robert Coli
On Thu, Jul 17, 2014 at 3:21 PM, Diane Griffith dfgriff...@gmail.com wrote: So do partitions equate to tokens/vnodes? A partition is what used to be called a row. Each individual token in the token ring can contain a partition, which you request using the token as the key. A token range is

Re: horizontal query scaling issues follow on

2014-07-17 Thread Diane Griffith
So I stripped out the number of clients experiment path information. It is unclear if I can only show horizontal scaling by also spawning many client requests all working at once. So that is why I stripped that information out to distill what our original attempt was at how to show horizontal

Re: horizontal query scaling issues follow on

2014-07-17 Thread Robert Coli
On Thu, Jul 17, 2014 at 5:16 PM, Diane Griffith dfgriff...@gmail.com wrote: I did tests comparing 1, 2, 10, 20, 50, 100 clients spawned all querying. Performance on 2 nodes starts to degrade from 10 clients on. I saw similar behavior on 4 nodes but haven't done the official runs on that yet.

Re: horizontal query scaling issues follow on

2014-07-17 Thread Jack Krupansky
and whether you are using a small number of partition keys and a large number of clustering columns, or does each row have a unique partition key and no clustering columns. -- Jack Krupansky From: Diane Griffith Sent: Thursday, July 17, 2014 6:21 PM To: user Subject: Re: horizontal query scaling

Re: horizontal query scaling issues follow on

2014-07-17 Thread Jonathan Haddad
The problem with starting without vnodes is moving to them is a bit hairy. In particular, nodetool shuffle has been reported to take an extremely long time (days, weeks). I would start with vnodes if you have any intent on using them. On Thu, Jul 17, 2014 at 6:03 PM, Robert Coli

How to get different columns for different rows in one query from Cassandra?

2014-07-09 Thread srinivas rao
Hi, Is there any way to get the value of column column1 for key rowkey1, column column2 for key rowkey2, columns column2 and column3 for key rowkey3, etc., from Cassandra in one single query? Thanks, Srini

Re: RPC timeout paging secondary index query results

2014-07-02 Thread Phil Luckhurst
to avoid them. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/RPC-timeout-paging-secondary-index-query-results-tp7595078p7595486.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.

Re: RPC timeout paging secondary index query results

2014-07-01 Thread Ken Hancock
phil.luckhu...@powerassure.com wrote: But would you expect performance to drop off so quickly? At 250,000 records we can still page through the query with LIMIT 5 but when adding an additional 50,000 records we can't page past the first 10,000 records even if we drop to LIMIT 10. What

Read 75k live rows in a query that should only return 500 (in queue-like table).

2014-06-30 Thread Kevin Burton
My schema is: bucket: int, sequence: long, value: text… primary key( bucket, sequence ) … value is just a big chunk of HTML; sequence is essentially a timestamp. I have 100 buckets… and that's the partition key, so I can spread these buckets across 100 servers' token ranges. The query is specified

CQL IN query with 2i index

2014-06-14 Thread tommaso barbugli
Hi there, I was wondering if there is a good reason for select queries on secondary indexes to not support any WHERE operator other than the equality operator, or if it's just a missing feature in CQL. Thanks, Tommaso

Re: RPC timeout paging secondary index query results

2014-06-13 Thread Phil Luckhurst
But would you expect performance to drop off so quickly? At 250,000 records we can still page through the query with LIMIT 5 but when adding an additional 50,000 records we can't page past the first 10,000 records even if we drop to LIMIT 10. What about the case where we add 100,000 records

Re: CQL query regarding indexes

2014-06-13 Thread Akash Pandey
in your query (which, being a date, you can get from the timestamp you are searching, e.g. 140154480) and the range of timestamps you want. You won't need any secondary indices in this solution. If you need to make some queries on partition id also, keep the original table, but you'll need the above
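
Akash's suggestion puts a day bucket, derived from the timestamp, into the partition key, so a time-range query becomes one query per day bucket, with the exact bounds applied to a clustering column. The helper below computes which day buckets a range touches. It assumes timestamps in epoch seconds, UTC days, and a `yyyyMMdd` integer bucket; all of these are hypothetical choices, not part of Roshan's schema.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

final class DayBucketSketch {
    // Hypothetical bucket format: the UTC date as an integer such as 20140601.
    private static final DateTimeFormatter BUCKET = DateTimeFormatter.BASIC_ISO_DATE;

    /** Day bucket (UTC) for a timestamp in epoch seconds. */
    static int bucketFor(long epochSeconds) {
        LocalDate day = Instant.ofEpochSecond(epochSeconds).atZone(ZoneOffset.UTC).toLocalDate();
        return Integer.parseInt(day.format(BUCKET));
    }

    /** Every day bucket covering [fromSeconds, toSeconds], inclusive: one query per bucket. */
    static List<Integer> bucketsFor(long fromSeconds, long toSeconds) {
        List<Integer> buckets = new ArrayList<>();
        LocalDate day = Instant.ofEpochSecond(fromSeconds).atZone(ZoneOffset.UTC).toLocalDate();
        LocalDate last = Instant.ofEpochSecond(toSeconds).atZone(ZoneOffset.UTC).toLocalDate();
        while (!day.isAfter(last)) {
            buckets.add(Integer.parseInt(day.format(BUCKET)));
            day = day.plusDays(1);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Hypothetical range: 2014-05-31T00:00Z to 2014-06-02T12:00Z (epoch seconds).
        System.out.println(bucketsFor(1401494400L, 1401710400L)); // [20140531, 20140601, 20140602]
    }
}
```

Each bucket in the result becomes one partition-key value to query, with the timestamp bounds applied within that partition.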

Re: Cannot query secondary index

2014-06-13 Thread Jonathan Lacefield
the effort of the manual delete. However, you would still have to insert into this separate table per the index item. The cost of the every once in a while delete may be infrequent enough for you to do what you were actually trying to do in the first place, use a secondary index and query the table

Re: Cannot query secondary index

2014-06-13 Thread Mohit Anchlia
of the every once in a while delete may be infrequent enough for you to do what you were actually trying to do in the first place, use a secondary index and query the table leveraging the ALLOW FILTERING clause. My recommendation would be to: 1) leverage TTLs 2) see what type of load

Re: Large number of row keys in query kills cluster

2014-06-12 Thread Peter Sanford
On Wed, Jun 11, 2014 at 9:17 PM, Jack Krupansky j...@basetechnology.com wrote: Hmmm... that multiple-gets section is not present in the 2.0 doc: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html Was that intentional – is that

Re: Large number of row keys in query kills cluster

2014-06-12 Thread Jeremy Jongsma
at 10:12 AM, Jeremy Jongsma jer...@barchart.com wrote: The big problem seems to have been requesting a large number of row keys combined with a large number of named columns in a query. 20K rows with 20K columns destroyed my cluster. Splitting it into slices of 100 sequential queries fixed

Re: Large number of row keys in query kills cluster

2014-06-12 Thread Laing, Michael
requesting a large number of row keys combined with a large number of named columns in a query. 20K rows with 20K columns destroyed my cluster. Splitting it into slices of 100 sequential queries fixed the performance issue. When updating 20K rows at a time, I saw a different issue

Re: RPC timeout paging secondary index query results

2014-06-12 Thread Phil Luckhurst
The problem appears to be directly related to the number of entries in the index. I started with an empty table and added 50,000 entries at a time with the same indexed value. I was able to page through the results of a query that used the secondary index with 250,000 records in the table using

Re: RPC timeout paging secondary index query results

2014-06-12 Thread Robert Coli
On Thu, Jun 12, 2014 at 9:18 AM, Phil Luckhurst phil.luckhu...@powerassure.com wrote: The problem appears to be directly related to the number of entries in the index. I started with an empty table and added 50,000 entries at a time with the same indexed value. All requests in Cassandra are

CQL query regarding indexes

2014-06-12 Thread Roshan
' : 'LZ4Compressor', 'chunk_length_kb' : 64 }; CREATE INDEX idx_messagepayload_senttime ON services.messagepayload (senttime); When I run the query below, I get an exception. SELECT * FROM b_bank_services.messagepayload WHERE senttime>=140154480 AND senttime<=140171760 ALLOW FILTERING

Re: CQL query regarding indexes

2014-06-12 Thread Bulat Shakirzyanov
As far as I can tell, the problem is that you're not using a partition key in your query. AFAIK, you always have to use the partition key in the WHERE clause, and the ALLOW FILTERING option lets Cassandra filter data from the rows it found using the partition key. One way to solve it is to make

Re: CQL query regarding indexes

2014-06-12 Thread Jabbar Azam
(senttime); When I run the query below, I get an exception. SELECT * FROM b_bank_services.messagepayload WHERE senttime>=140154480 AND senttime<=140171760 ALLOW FILTERING; com.datastax.driver.core.exceptions.InvalidQueryException: No indexed columns present in by-columns

RPC timeout paging secondary index query results

2014-06-11 Thread Phil Luckhurst
Is paging through the results of a secondary index query broken in Cassandra 2.0.7, or are we doing something wrong? We have a table with a few hundred thousand records and an indexed low-cardinality column. The relevant bits of the table definition are shown below: CREATE TABLE measurement

Re: Large number of row keys in query kills cluster

2014-06-11 Thread Jeremy Jongsma
I'm using Astyanax with a query like this:

    clusterContext
        .getClient()
        .getKeyspace(instruments)
        .prepareQuery(INSTRUMENTS_CF)
        .setConsistencyLevel(ConsistencyLevel.CL_LOCAL_QUORUM)
        .getKeySlice(new String[] {
            ROW1,
            ROW2,
            // 20,000 keys here...
            ROW2
        })
        .execute

Re: Large number of row keys in query kills cluster

2014-06-11 Thread Jeremy Jongsma
The big problem seems to have been requesting a large number of row keys combined with a large number of named columns in a query. 20K rows with 20K columns destroyed my cluster. Splitting it into slices of 100 sequential queries fixed the performance issue. When updating 20K rows at a time, I
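
Jeremy's fix, splitting one 20,000-key request into slices of 100 issued one after another, can be expressed with a small partitioning helper. `fetchSlice` is a hypothetical synchronous multi-get supplied by the caller (for example, a wrapper around an Astyanax key-slice query); the point is only that no single request asks for more than `sliceSize` keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class SlicedFetchSketch {

    /** Splits keys into consecutive slices of at most sliceSize keys each. */
    static <K> List<List<K>> slices(List<K> keys, int sliceSize) {
        if (sliceSize <= 0) {
            throw new IllegalArgumentException("sliceSize must be positive");
        }
        List<List<K>> result = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += sliceSize) {
            int end = Math.min(start + sliceSize, keys.size());
            result.add(new ArrayList<>(keys.subList(start, end)));
        }
        return result;
    }

    /** Issues one request per slice, sequentially, and concatenates the returned rows. */
    static <K, R> List<R> fetchInSlices(List<K> keys, int sliceSize, Function<List<K>, List<R>> fetchSlice) {
        List<R> rows = new ArrayList<>();
        for (List<K> slice : slices(keys, sliceSize)) {
            rows.addAll(fetchSlice.apply(slice));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 20_000; i++) {
            keys.add("ROW" + i); // hypothetical row keys
        }
        // 20,000 keys in slices of 100: prints 200 (requests instead of one huge one).
        System.out.println(slices(keys, 100).size());
    }
}
```

Sequential slices keep per-request memory on the coordinator bounded; the same helper could feed a bounded number of concurrent requests if latency matters more.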

Re: Large number of row keys in query kills cluster

2014-06-11 Thread Robert Coli
to me - it shouldn't be possible to completely lock up a cluster with a valid query that isn't doing a table scan, should it? There's lots of valid SQL queries which will lock up your server, for some values of lock up? =Rob

Re: RPC timeout paging secondary index query results

2014-06-11 Thread Robert Coli
On Wed, Jun 11, 2014 at 2:24 AM, Phil Luckhurst phil.luckhu...@powerassure.com wrote: Is paging through the results of a secondary index query broken in Cassandra 2.0.7 or are we doing something wrong? General feedback on questions of this type : http://mail-archives.apache.org/mod_mbox

Re: RPC timeout paging secondary index query results

2014-06-11 Thread DuyHai Doan
I like the "- Provides the illusion that you are using a RDBMS." part ;-) On Wed, Jun 11, 2014 at 8:52 PM, Robert Coli rc...@eventbrite.com wrote: On Wed, Jun 11, 2014 at 2:24 AM, Phil Luckhurst phil.luckhu...@powerassure.com wrote: Is paging through the results of a secondary index query

Re: RPC timeout paging secondary index query results

2014-06-11 Thread Phil Luckhurst
must be doing something wrong for it to appear to be this broken. Phil

Re: RPC timeout paging secondary index query results

2014-06-11 Thread Robert Coli
On Wed, Jun 11, 2014 at 12:43 PM, Phil Luckhurst phil.luckhu...@powerassure.com wrote: It just seems that what we are trying to do here is such basic functionality of an index that I thought we must be doing something wrong for it to appear to be this broken. To be clear, I did not read or

Re: Large number of row keys in query kills cluster

2014-06-11 Thread Peter Sanford
On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma jer...@barchart.com wrote: The big problem seems to have been requesting a large number of row keys combined with a large number of named columns in a query. 20K rows with 20K columns destroyed my cluster. Splitting it into slices of 100

Re: Large number of row keys in query kills cluster

2014-06-11 Thread Jack Krupansky
batches” as an anti-pattern: http://www.slideshare.net/mattdennis -- Jack Krupansky From: Peter Sanford Sent: Wednesday, June 11, 2014 7:34 PM To: user@cassandra.apache.org Subject: Re: Large number of row keys in query kills cluster On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma jer

Re: Cannot query secondary index

2014-06-10 Thread Redmumba
easily. I've seen the following: 1. Use date-based tables, then drop old tables, à la audit_table_20140610, audit_table_20140609, etc. But then I run into the issue of having to query every table--I would have to execute queries against every day to get the data, and then merge
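The date-bucketed pattern described here turns a multi-day query into one statement per daily table, with the results merged on the client. Below is a Java sketch of that fan-out. It assumes the `audit_table_YYYYMMDD` naming from the message. The `WHERE` clause and the `merge` helper are hypothetical, and binding and executing the statements is left to whatever driver is in use.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of querying date-bucketed tables (audit_table_YYYYMMDD): one statement per day,
// results merged client-side. The WHERE clause and merge helper are hypothetical.
public class DailyTables {

    /** Names of the daily tables covering [from, toInclusive], e.g. audit_table_20140609. */
    static List<String> tableNames(String prefix, LocalDate from, LocalDate toInclusive) {
        List<String> names = new ArrayList<>();
        for (LocalDate d = from; !d.isAfter(toInclusive); d = d.plusDays(1)) {
            // BASIC_ISO_DATE renders a LocalDate as yyyyMMdd
            names.add(prefix + d.format(DateTimeFormatter.BASIC_ISO_DATE));
        }
        return names;
    }

    /** The same CQL statement text once per table; execution is up to the driver. */
    static List<String> fanOutQueries(List<String> tables, String whereClause) {
        List<String> queries = new ArrayList<>();
        for (String table : tables) {
            queries.add("SELECT * FROM " + table + " WHERE " + whereClause);
        }
        return queries;
    }

    /** Client-side merge: concatenate per-table results and sort them into one ordering. */
    static <T> List<T> merge(List<List<T>> perTable, Comparator<? super T> order) {
        List<T> all = new ArrayList<>();
        for (List<T> rows : perTable) {
            all.addAll(rows);
        }
        all.sort(order);
        return all;
    }

    public static void main(String[] args) {
        List<String> tables = tableNames("audit_table_",
                LocalDate.of(2014, 6, 9), LocalDate.of(2014, 6, 10));
        for (String q : fanOutQueries(tables, "id = ?")) {
            System.out.println(q);
        }
    }
}
```

Covering 60 days of audit data, as asked later in the thread, means 60 generated statements. The driver does not merge them for you; the `merge` step stays in application code.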

Re: Cannot query secondary index

2014-06-10 Thread Paulo Ricardo Motta Gomes
, audit_table_20140609, etc. But then I run into the issue of having to query every table--I would have to execute queries against every day to get the data, and then merge the data myself. Unless there's something in the binary driver I'm missing, it doesn't sound like this would

Large number of row keys in query kills cluster

2014-06-10 Thread Jeremy Jongsma
I ran an application today that attempted to fetch 20,000+ unique row keys in one query against a set of completely empty column families. On a 4-node cluster (EC2 m1.large instances) with the recommended memory settings (2 GB heap), every single node immediately ran out of memory and became

Re: Large number of row keys in query kills cluster

2014-06-10 Thread DuyHai Doan
Hello Jeremy Basically what you are doing is to ask Cassandra to do a distributed full scan on all the partitions across the cluster, it's normal that the nodes are somehow stressed. How did you make the query? Are you using Thrift or CQL3 API? Please note that there is another way to get

Re: Large number of row keys in query kills cluster

2014-06-10 Thread Jeremy Jongsma
on all the partitions across the cluster, it's normal that the nodes are somehow stressed. How did you make the query? Are you using Thrift or CQL3 API? Please note that there is another way to get all partition keys : SELECT DISTINCT partition_key FROM..., more details here

Re: Large number of row keys in query kills cluster

2014-06-10 Thread Laing, Michael
Perhaps if you described both the schema and the query in more detail, we could help... e.g. did the query have an IN clause with 2 keys? Or is the key compound? More detail will help. On Tue, Jun 10, 2014 at 7:15 PM, Jeremy Jongsma jer...@barchart.com wrote: I didn't explain clearly - I'm

Cannot query secondary index

2014-06-09 Thread Redmumba
I have a table with a timestamp column on it; however, when I try to query based on it, it fails saying that I must use ALLOW FILTERING--which to me means it's not using the secondary index. Table definition is (snipping out irrelevant parts)... CREATE TABLE audit ( id bigint, date

Re: Cannot query secondary index

2014-06-09 Thread Jonathan Lacefield
Hello, You are receiving this item because you are not passing in the Partition Key as part of your query. Cassandra is telling you it doesn't know which node to find the data and you haven't explicitly told it to search across all your nodes for the data. The ALLOW FILTERING clause bypasses

Re: Cannot query secondary index

2014-06-09 Thread Michal Michalski
Secondary indexes internally are just CFs that map the indexed value to a row key which that value belongs to, so you can only query these indexes using =, not >, >= etc. However, your query does not require index *IF* you provide a row key - you can use < or > like you did for the date column
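Michal's description can be illustrated with a toy in-memory model. This is not Cassandra's code. It assumes only what the message says: the index maps an indexed value to the row keys that hold it. It also assumes that index rows are placed by hash rather than in value order. Under those assumptions, equality is a single lookup, while a range condition has to visit every index entry.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model, NOT Cassandra's implementation: a "secondary index" as a table keyed by
// the indexed value whose entries are base-table row keys. HashMap stands in for
// hash-partitioned index rows, which have no value order to walk for a range.
public class ToySecondaryIndex {
    private final Map<Long, Set<String>> index = new HashMap<>();

    /** Records that base row rowKey holds indexedValue. */
    void put(String rowKey, long indexedValue) {
        index.computeIfAbsent(indexedValue, v -> new TreeSet<>()).add(rowKey);
    }

    /** Equality: one lookup of a single index row. */
    Set<String> equalTo(long value) {
        return index.getOrDefault(value, Collections.emptySet());
    }

    /** Range: with unordered index rows, every entry must be visited and filtered. */
    Set<String> greaterThan(long value) {
        Set<String> out = new TreeSet<>();
        for (Map.Entry<Long, Set<String>> e : index.entrySet()) {
            if (e.getKey() > value) {
                out.addAll(e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ToySecondaryIndex idx = new ToySecondaryIndex();
        idx.put("row1", 20140609L);
        idx.put("row2", 20140610L);
        idx.put("row3", 20140610L);
        System.out.println(idx.equalTo(20140610L));     // single index-row lookup
        System.out.println(idx.greaterThan(20140609L)); // walks the whole index
    }
}
```

The second method is why a range predicate on an indexed column ends up needing ALLOW FILTERING, or a row key that narrows the search first.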

Re: Cannot query secondary index

2014-06-09 Thread Redmumba
michal.michal...@boxever.com wrote: Secondary indexes internally are just CFs that map the indexed value to a row key which that value belongs to, so you can only query these indexes using =, not >, >= etc. However, your query does not require index *IF* you provide a row key - you can use < or > like you

Re: Cannot query secondary index

2014-06-09 Thread Redmumba
of auditing data, for example, I'd need to query all 60 tables--can I do that smoothly? Or do I have to have 60 different select statements? Is there a way for me to run the same query against all the tables? On Mon, Jun 9, 2014 at 3:42 PM, Redmumba redmu...@gmail.com wrote: Ah, so the secondary
