Hello All,

We have a schema that can be modelled as *(studentID int, subjectID int, marks int, PRIMARY KEY(studentID, subjectID))*. There can be ~1M studentIDs, and for each studentID there can be ~10K subjectIDs. Queries can be by studentID alone or by studentID and subjectID. We have a three-node Apache Cassandra 2.0.4 cluster (24 cores per node) and interact with it using the DataStax Java driver 2.0.0 and its automatic paging feature. I have tried various fetch sizes ranging from 100 to 10K and observed that read latency increases with fetch size (which seems expected). At around 10K there are a lot of errors. I would like to understand:
- Is there a rule of thumb for choosing the optimum fetch size (*com.datastax.driver.core.Statement.setFetchSize()*)?
- Does Cassandra keep the entire result set cached and return only the rows corresponding to the fetch size, or does it treat each subsequent page (*com.datastax.driver.core.ResultSet.fetchMoreResults()*) as a new query?
- Does the optimum fetch size depend on the number of columns in the CQL table? For example, should the fetch size for a table like *(studentID int, subjectID int, marks1 int, marks2 int, marks3 int, ..., marksN int, PRIMARY KEY(studentID, subjectID))* be smaller than the fetch size for *(studentID int, subjectID int, marks int, PRIMARY KEY(studentID, subjectID))*?

--
Thanks & Regards,
Apoorva
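P.S. For context, the paging pattern in question looks roughly like the sketch below (driver 2.0 API; the contact point, keyspace, and table name *student_marks* are assumptions for illustration, not from a real deployment). It sets a per-page fetch size and uses *getAvailableWithoutFetching()* / *fetchMoreResults()* to prefetch the next page before the current one drains:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class FetchSizeExample {
    public static void main(String[] args) {
        // Assumed contact point and keyspace name.
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .build();
        Session session = cluster.connect("my_keyspace");

        // Page through one partition. The driver fetches fetchSize rows per
        // request and transparently asks for the next page as the iterator
        // crosses a page boundary; fetch size is a page size, not a LIMIT.
        Statement stmt = new SimpleStatement(
                "SELECT subjectID, marks FROM student_marks WHERE studentID = 42");
        stmt.setFetchSize(1000);

        ResultSet rs = session.execute(stmt);
        for (Row row : rs) {
            // Optional: start fetching the next page while ~100 rows of the
            // current page remain, so iteration does not block on the next
            // network round trip.
            if (rs.getAvailableWithoutFetching() == 100 && !rs.isFullyFetched()) {
                rs.fetchMoreResults();
            }
            int subjectId = row.getInt("subjectID");
            int marks = row.getInt("marks");
            // ... process the row ...
        }
        cluster.close();
    }
}
```

This is only a sketch of how I am driving the queries, in case it helps pinpoint whether the latency growth comes from the page size itself or from how the pages are consumed.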