Re: Read performance in map data type

Shrikar archak Thu, 03 Apr 2014 00:31:30 -0700

How about the client-side socket limits? What are the Cassandra client's
maximum connections per host and the read consistency level?
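
Separately, on the fetch size of 500 you mention: a rough sketch of the
paging arithmetic, assuming the driver fetches one page of at most fetchSize
rows per round trip. The class and method names (PageCount, pagesNeeded) are
made up for illustration and are not part of the driver. The partition sizes
are the ~200, 10k and 75k figures from this thread, treated here as counts of
CQL rows for simplicity.

```java
public class PageCount {
    // Hypothetical helper: number of pages (round trips) needed to read
    // rowCount CQL rows when each page holds at most fetchSize rows.
    static long pagesNeeded(long rowCount, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        // Integer ceiling division: ceil(rowCount / fetchSize).
        return (rowCount + fetchSize - 1) / fetchSize;
    }

    public static void main(String[] args) {
        int fetchSize = 500; // fetch size quoted in this thread
        // Partition sizes mentioned in this thread (assumed to be row counts).
        long[] partitionSizes = {200, 10_000, 75_000};
        for (long rows : partitionSizes) {
            System.out.println(rows + " rows -> "
                    + pagesNeeded(rows, fetchSize) + " page(s)");
        }
    }
}
```

Under that assumption a ~200 row read fits in a single page, while a 10k row
partition needs 20 pages and a 75k one needs 150, so the wide partitions pay
for many more round trips than the narrow ones.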


~Shrikar


On Thu, Apr 3, 2014 at 12:20 AM, Apoorva Gaurav
<apoorva.gau...@myntra.com> wrote:

> At the client side we are seeing a latency of ~350ms. We are using
> DataStax driver 2.0.0 with the fetch size set to 500. These latencies
> occur while reading rows having ~200 columns.
>
>
> On Thu, Apr 3, 2014 at 12:45 PM, Shrikar archak <shrika...@gmail.com> wrote:
>
>> Hi Apoorva,
>> As per the cfhistograms output, there are some rows which have more than
>> 75k columns, and around 150k reads hit 2 SSTables.
>>
>> Are you sure that you are seeing more than 500ms latency? The
>> cfhistograms output shows that the worst read latency was around 51ms,
>> which looks reasonable with many reads hitting 2 SSTables.
>>
>> Thanks,
>> Shrikar
>>
>>
>> On Wed, Apr 2, 2014 at 11:30 PM, Apoorva Gaurav <
>> apoorva.gau...@myntra.com> wrote:
>>
>>> Hello Shrikar,
>>>
>>> We are still facing the read latency issue; here is the histogram:
>>> http://pastebin.com/yEvMuHYh
>>>
>>>
>>> On Sat, Mar 29, 2014 at 8:11 AM, Apoorva Gaurav <
>>> apoorva.gau...@myntra.com> wrote:
>>>
>>>> Hello Shrikar,
>>>>
>>>> Yes, the primary key is (studentID, subjectID). I had dropped the test
>>>> table; I am recreating and populating it, after which I will share the
>>>> cfhistograms output. In that case, is there any practical limit on the
>>>> number of rows I should fetch? For example, should I do
>>>>        select * from marks_table where studentID = ? limit 500;
>>>> instead of doing
>>>>        select * from marks_table where studentID = ?;
>>>>
>>>>
>>>> On Sat, Mar 29, 2014 at 5:20 AM, Shrikar archak <shrika...@gmail.com> wrote:
>>>>
>>>>> Hi Apoorva,
>>>>>
>>>>> I assume this is the table with studentId and subjectId as the primary
>>>>> key, with no columns other than marks in it.
>>>>>
>>>>> create table marks_table(studentId int, subjectId int, marks int,
>>>>> PRIMARY KEY(studentId,subjectId));
>>>>>
>>>>> Also could you give the cfhistogram stats?
>>>>>
>>>>> nodetool cfhistograms <your keyspace> marks_table;
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Shrikar
>>>>>
>>>>>
>>>>> On Fri, Mar 28, 2014 at 3:53 PM, Apoorva Gaurav <
>>>>> apoorva.gau...@myntra.com> wrote:
>>>>>
>>>>>> Hello All,
>>>>>>
>>>>>> We've a schema which can be modeled as (studentID, subjectID, marks)
>>>>>> where the combination of studentID and subjectID is unique. The number
>>>>>> of studentIDs can go up to 100 million, and for each studentID we can
>>>>>> have up to 10k subjectIDs.
>>>>>>
>>>>>> We are using Apache Cassandra 2.0.4 and DataStax Java driver
>>>>>> 1.0.4. We are using a four-node cluster, each node having 24 cores and
>>>>>> 32GB memory. I'm sure that the machines are not underperformant, as on
>>>>>> the same test bed we've consistently received <5ms response times for
>>>>>> ~1b documents when queried via primary key.
>>>>>>
>>>>>> I've tried three approaches, all of which result in significant
>>>>>> deterioration (>500 ms response time) in read query performance once
>>>>>> the number of subjectIDs goes past ~100 for a studentID. The
>>>>>> approaches are:
>>>>>>
>>>>>> 1. model as (studentID int PRIMARY KEY, subjectID_marks_map map<int,
>>>>>> int>) and query by subjectID
>>>>>>
>>>>>> 2. model as (studentID int, subjectID int, marks int, PRIMARY
>>>>>> KEY(studentID, subjectID)) and query as select * from marks_table where
>>>>>> studentID = ?
>>>>>>
>>>>>> 3. model as (studentID int, subjectID int, marks int, PRIMARY
>>>>>> KEY(studentID, subjectID)) and query as select * from marks_table where
>>>>>> studentID = ? and subjectID in (?, ?, ?....?), the number of subjectIDs
>>>>>> in the query being ~1K.
>>>>>>
>>>>>> What can be the bottlenecks? Is it better if we model as (studentID
>>>>>> int, subjct_marks_json text) and query by studentID?
>>>>>>
>>>>>> --
>>>>>> Thanks & Regards,
>>>>>> Apoorva
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks & Regards,
>>>> Apoorva
>>>>
>>>
>>>
>>>
>>> --
>>> Thanks & Regards,
>>> Apoorva
>>>
>>
>>
>
>
> --
> Thanks & Regards,
> Apoorva
>
