I'm not sure if I understood your answer. > When you have GB or TB of data any query that adds "WITH FILTERING" > will not work at scale. 1. You mean any query that requires "with filtering" is slow?
> Secondary indexes need at least one equality. If you want to do this > at scale you might need a different design. 2. And what design would be recommendable then? 3. How should the query look like such that it would scale? 2013/2/3 Edward Capriolo <edlinuxg...@gmail.com>: > Secondary indexes need at least one equality. If you want to do this > at scale you might need a different design. > > Using WITH FILTERING and LIMIT 10 is simply grabbing the first few > random rows that match your criteria. > > When you have GB or TB of data any query that adds "WITH FILTERING" > will not work at scale. > > This is why it was added to the language CQL lets you do some queries > that "seem fast" when your developing with 10 rows, without this > clause you would not know if a query is fast because it hits a > cassandra index, or it is just fast because the results were found in > the first 10 rows. > > Edward > > On Sun, Feb 3, 2013 at 10:56 AM, Paul van Hoven > <paul.van.ho...@googlemail.com> wrote: >> Okay, here is the schema (actually it is in german, but I translated >> the column names such that it is easier to read for an international >> audience): >> >> cqlsh:demodb> describe table offerten_log_archiv; >> >> CREATE TABLE offerten_log_archiv ( >> offerte_id int PRIMARY KEY, >> aktionen int, >> angezeigt bigint, >> datum timestamp, >> gutschrift bigint, >> kampagne_id int, >> klicks int, >> klicks_ungueltig int, >> kosten bigint, >> statistik_id bigint, >> stunden int, >> werbeflaeche_id int, >> werbemittel_id int >> ) WITH >> bloom_filter_fp_chance=0.010000 AND >> caching='KEYS_ONLY' AND >> comment='' AND >> dclocal_read_repair_chance=0.000000 AND >> gc_grace_seconds=864000 AND >> read_repair_chance=0.100000 AND >> replicate_on_write='true' AND >> compaction={'class': 'SizeTieredCompactionStrategy'}; >> >> CREATE INDEX datum_key ON offerten_log_archiv (datum); >> >> CREATE INDEX stunden_key ON offerten_log_archiv (stunden); >> >> cqlsh:demodb> >> >> This is the query I'm trying to perform: >> cqlsh:demodb> select * from ola where date > '2013-01-01' and hour = 0 >> limit 10 allow filtering; >> Request did not complete within rpc_timeout. >> >> ola = offerten_log_archiv (table name) >> hour = stunde (column name) >> date = datum (column name) >> >> I hope this information makes my problem more clear. >> >> >> >> 2013/2/3 Edward Capriolo <edlinuxg...@gmail.com>: >>> Without seeing your schema it is hard to say, but in some cases "ALLOW >>> FILTERING" might be considered "EXPECT THIS COULD BE SLOW". It could >>> mean the query is not hitting and index and is going to page through >>> large amounts of data. >>> >>> On Sun, Feb 3, 2013 at 9:42 AM, Paul van Hoven >>> <paul.van.ho...@googlemail.com> wrote: >>>> After figuring out how to use the ">" operator on an secondary index I >>>> noticed that in a column family of about 5.5 million datasets I get a >>>> rpc_timeout when trying to read data from this table. In the concrete >>>> situation I want to request data younger than January 1 2013. The >>>> number of rows that should be affected are about 1 million. When doing >>>> the request I get a timeout error: >>>> >>>> cqlsh:demodb> select * from ola where date > '2013-01-01' and hour = 0 >>>> limit 10 allow filtering; >>>> Request did not complete within rpc_timeout. >>>> >>>> Actually I find this very confusing since I would except an >>>> exceptional performance gain in comparison to a similar sql query. >>>> Therefore, I think the query I'm performing is not appropriate for >>>> cassandra, although I would do a query like that in this manner on a >>>> sql database. So my question now is: How should I perfrom this query >>>> on cassandra?