I'm not sure if I understood your answer.

> When you have GB or TB of data any query that adds "ALLOW FILTERING"
> will not work at scale.
1. Do you mean that any query that requires "ALLOW FILTERING" is slow?

> Secondary indexes need at least one equality. If you want to do this
> at scale you might need a different design.
2. What design would you recommend instead?

3. What should the query look like so that it scales?



2013/2/3 Edward Capriolo <edlinuxg...@gmail.com>:
> Secondary indexes need at least one equality. If you want to do this
> at scale you might need a different design.
>
> Using ALLOW FILTERING and LIMIT 10 is simply grabbing the first few
> random rows that match your criteria.
>
> When you have GB or TB of data, any query that adds "ALLOW FILTERING"
> will not work at scale.
>
> This is why the clause was added to the language. CQL lets you run
> some queries that "seem fast" when you're developing with 10 rows.
> Without this clause you would not know whether a query is fast because
> it hits a Cassandra index, or just because the results were found in
> the first 10 rows.
>
> Edward
>
> On Sun, Feb 3, 2013 at 10:56 AM, Paul van Hoven
> <paul.van.ho...@googlemail.com> wrote:
>> Okay, here is the schema (actually it is in German, but I translated
>> the column names such that it is easier to read for an international
>> audience):
>>
>> cqlsh:demodb> describe table offerten_log_archiv;
>>
>> CREATE TABLE offerten_log_archiv (
>>   offerte_id int PRIMARY KEY,
>>   aktionen int,
>>   angezeigt bigint,
>>   datum timestamp,
>>   gutschrift bigint,
>>   kampagne_id int,
>>   klicks int,
>>   klicks_ungueltig int,
>>   kosten bigint,
>>   statistik_id bigint,
>>   stunden int,
>>   werbeflaeche_id int,
>>   werbemittel_id int
>> ) WITH
>>   bloom_filter_fp_chance=0.010000 AND
>>   caching='KEYS_ONLY' AND
>>   comment='' AND
>>   dclocal_read_repair_chance=0.000000 AND
>>   gc_grace_seconds=864000 AND
>>   read_repair_chance=0.100000 AND
>>   replicate_on_write='true' AND
>>   compaction={'class': 'SizeTieredCompactionStrategy'};
>>
>> CREATE INDEX datum_key ON offerten_log_archiv (datum);
>>
>> CREATE INDEX stunden_key ON offerten_log_archiv (stunden);
>>
>> cqlsh:demodb>
>>
>> This is the query I'm trying to perform:
>> cqlsh:demodb> select * from ola where date > '2013-01-01' and hour = 0
>> limit 10 allow filtering;
>> Request did not complete within rpc_timeout.
>>
>> ola = offerten_log_archiv (table name)
>> hour = stunden (column name)
>> date = datum (column name)
>>
>> I hope this information makes my problem more clear.
>>
>>
>>
>> 2013/2/3 Edward Capriolo <edlinuxg...@gmail.com>:
>>> Without seeing your schema it is hard to say, but in some cases "ALLOW
>>> FILTERING" might be considered "EXPECT THIS COULD BE SLOW". It could
>>> mean the query is not hitting an index and is going to page through
>>> large amounts of data.
>>>
>>> On Sun, Feb 3, 2013 at 9:42 AM, Paul van Hoven
>>> <paul.van.ho...@googlemail.com> wrote:
>>>> After figuring out how to use the ">" operator on a secondary index I
>>>> noticed that in a column family of about 5.5 million rows I get an
>>>> rpc_timeout when trying to read data from this table. In the concrete
>>>> situation I want to request data newer than January 1, 2013. About
>>>> 1 million rows should match. When doing the request I get a timeout
>>>> error:
>>>>
>>>> cqlsh:demodb> select * from ola where date > '2013-01-01' and hour = 0
>>>> limit 10 allow filtering;
>>>> Request did not complete within rpc_timeout.
>>>>
>>>> Actually I find this very confusing, since I would expect an
>>>> exceptional performance gain in comparison to a similar SQL query.
>>>> Therefore, I think the query I'm performing is not appropriate for
>>>> Cassandra, although I would write a query like this on a SQL
>>>> database. So my question now is: How should I perform this query
>>>> on Cassandra?
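
The "different design" mentioned in the thread is not spelled out anywhere
in it. One common shape, given here only as a sketch, is a second table
whose partition key matches the equality part of the query, so that a read
touches one partition instead of filtering across the whole column family.
The table name offerten_log_by_stunde and the day-bucket column tag are
hypothetical and do not appear in the original schema; only a few of the
original columns are repeated. The composite partition key syntax assumes
CQL 3 as shipped with Cassandra 1.2.

```cql
-- Hypothetical query table: one partition per (day, hour) bucket.
-- "tag" is an assumed day bucket such as '2013-01-15', filled in by
-- the application when it writes the row.
CREATE TABLE offerten_log_by_stunde (
  tag text,
  stunden int,
  datum timestamp,
  offerte_id int,
  klicks int,
  kosten bigint,
  PRIMARY KEY ((tag, stunden), datum, offerte_id)
);

-- Equality on the full partition key: no ALLOW FILTERING is needed,
-- and rows come back ordered by datum within the bucket.
SELECT * FROM offerten_log_by_stunde
 WHERE tag = '2013-01-15' AND stunden = 0
 LIMIT 10;
```

With this layout, "everything after 2013-01-01 with hour 0" becomes one
such query per day bucket, issued by the client. The bucket size is a
trade-off: larger buckets mean fewer queries but bigger partitions.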
