Thanks for the answer. Can anybody else answer my other two questions, because my problem is not solved yet?
2013/2/3 Edward Capriolo <edlinuxg...@gmail.com>: > This was the issue that prompted the "WITH FILTERING ALLOWED": > > https://issues.apache.org/jira/browse/CASSANDRA-4915 > > Cassandra's storage system can only optimize certain queries. > > On Sun, Feb 3, 2013 at 2:07 PM, Paul van Hoven > <paul.van.ho...@googlemail.com> wrote: >> I'm not sure if I understood your answer. >> >>> When you have GB or TB of data any query that adds "WITH FILTERING" >>> will not work at scale. >> 1. You mean any query that requires "with filtering" is slow? >> >>> Secondary indexes need at least one equality. If you want to do this >>> at scale you might need a different design. >> 2. And what design would be recommendable then? >> >> 3. How should the query look like such that it would scale? >> >> >> >> 2013/2/3 Edward Capriolo <edlinuxg...@gmail.com>: >>> Secondary indexes need at least one equality. If you want to do this >>> at scale you might need a different design. >>> >>> Using WITH FILTERING and LIMIT 10 is simply grabbing the first few >>> random rows that match your criteria. >>> >>> When you have GB or TB of data any query that adds "WITH FILTERING" >>> will not work at scale. >>> >>> This is why it was added to the language CQL lets you do some queries >>> that "seem fast" when your developing with 10 rows, without this >>> clause you would not know if a query is fast because it hits a >>> cassandra index, or it is just fast because the results were found in >>> the first 10 rows. >>> >>> Edward >>> >>> On Sun, Feb 3, 2013 at 10:56 AM, Paul van Hoven >>> <paul.van.ho...@googlemail.com> wrote: >>>> Okay, here is the schema (actually it is in german, but I translated >>>> the column names such that it is easier to read for an international >>>> audience): >>>> >>>> cqlsh:demodb> describe table offerten_log_archiv; >>>> >>>> CREATE TABLE offerten_log_archiv ( >>>> offerte_id int PRIMARY KEY, >>>> aktionen int, >>>> angezeigt bigint, >>>> datum timestamp, >>>> gutschrift bigint, >>>> kampagne_id int, >>>> klicks int, >>>> klicks_ungueltig int, >>>> kosten bigint, >>>> statistik_id bigint, >>>> stunden int, >>>> werbeflaeche_id int, >>>> werbemittel_id int >>>> ) WITH >>>> bloom_filter_fp_chance=0.010000 AND >>>> caching='KEYS_ONLY' AND >>>> comment='' AND >>>> dclocal_read_repair_chance=0.000000 AND >>>> gc_grace_seconds=864000 AND >>>> read_repair_chance=0.100000 AND >>>> replicate_on_write='true' AND >>>> compaction={'class': 'SizeTieredCompactionStrategy'}; >>>> >>>> CREATE INDEX datum_key ON offerten_log_archiv (datum); >>>> >>>> CREATE INDEX stunden_key ON offerten_log_archiv (stunden); >>>> >>>> cqlsh:demodb> >>>> >>>> This is the query I'm trying to perform: >>>> cqlsh:demodb> select * from ola where date > '2013-01-01' and hour = 0 >>>> limit 10 allow filtering; >>>> Request did not complete within rpc_timeout. >>>> >>>> ola = offerten_log_archiv (table name) >>>> hour = stunde (column name) >>>> date = datum (column name) >>>> >>>> I hope this information makes my problem more clear. >>>> >>>> >>>> >>>> 2013/2/3 Edward Capriolo <edlinuxg...@gmail.com>: >>>>> Without seeing your schema it is hard to say, but in some cases "ALLOW >>>>> FILTERING" might be considered "EXPECT THIS COULD BE SLOW". It could >>>>> mean the query is not hitting and index and is going to page through >>>>> large amounts of data. >>>>> >>>>> On Sun, Feb 3, 2013 at 9:42 AM, Paul van Hoven >>>>> <paul.van.ho...@googlemail.com> wrote: >>>>>> After figuring out how to use the ">" operator on an secondary index I >>>>>> noticed that in a column family of about 5.5 million datasets I get a >>>>>> rpc_timeout when trying to read data from this table. In the concrete >>>>>> situation I want to request data younger than January 1 2013. The >>>>>> number of rows that should be affected are about 1 million. When doing >>>>>> the request I get a timeout error: >>>>>> >>>>>> cqlsh:demodb> select * from ola where date > '2013-01-01' and hour = 0 >>>>>> limit 10 allow filtering; >>>>>> Request did not complete within rpc_timeout. >>>>>> >>>>>> Actually I find this very confusing since I would except an >>>>>> exceptional performance gain in comparison to a similar sql query. >>>>>> Therefore, I think the query I'm performing is not appropriate for >>>>>> cassandra, although I would do a query like that in this manner on a >>>>>> sql database. So my question now is: How should I perfrom this query >>>>>> on cassandra?