Hint: using the Java driver, you can set the fetchSize to tell the driver
how many CQL rows to fetch for each page.
Depending on the size (in bytes) of each CQL row, it is worth tuning the
fetchSize value to avoid loading too much data into memory for each page.
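To make the trade-off concrete, here is a small sketch (not from the thread; the keyspace/table names and row sizes are made up). The commented lines show the DataStax Java driver 2.x calls for setting fetchSize; the runnable part just illustrates the arithmetic: a larger fetchSize means fewer round trips but more rows held in memory per page.

```java
// Driver usage (requires a live cluster, shown for reference only):
//
//   Statement stmt = new SimpleStatement("SELECT * FROM ks.table");
//   stmt.setFetchSize(5000);               // rows per page, not per query
//   ResultSet rs = session.execute(stmt);  // pages are fetched lazily
//
// The helpers below illustrate the trade-off fetchSize controls.
public class FetchSizeMath {
    // Number of pages (round trips) needed to stream totalRows rows.
    static long pages(long totalRows, int fetchSize) {
        return (totalRows + fetchSize - 1) / fetchSize; // ceiling division
    }

    // Approximate heap held by one page, given an average row size in bytes.
    static long bytesPerPage(int fetchSize, long avgRowBytes) {
        return (long) fetchSize * avgRowBytes;
    }

    public static void main(String[] args) {
        // 10M rows at ~1 KB each with fetchSize 5000:
        // 2000 round trips, roughly 5 MB resident per page.
        System.out.println(pages(10_000_000L, 5000));
        System.out.println(bytesPerPage(5000, 1024));
    }
}
```

With a 1 KB average row, dropping fetchSize from 5000 to 500 cuts per-page memory tenfold at the cost of ten times as many round trips; the right value depends on your row size and heap.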
On Wed, Jan 28, 2015 at
Hi -
Over the last few weeks, I have seen several emails on this mailing list from
people trying to extract all data from C*, so that they can import that data
into other analytical tools that provide much richer analytics functionality
than C*. Extracting all data from C* is a full-table scan.
This is hard to answer. Performance depends heavily on context (data model,
row sizes, cluster size, and so on). You could tune various parameters.
At 2015-01-28 14:43:38, Shenghua(Daniel) Wan wansheng...@gmail.com wrote:
Cool. What about performance? e.g. how many records in how much time?
On Tue, Jan 27, 2015 at 10:16 PM, Xu Zhongxing
For the Java driver, there is no special API actually, just:

ResultSet rs = session.execute("SELECT * FROM ...");
for (Row r : rs) {
    // process each row; the driver fetches further pages transparently
}
For Spark, the code skeleton is:

val rdd = sc.cassandraTable(ks, table)

then call the various standard Spark APIs to process the table in parallel.
I have not
Recently I surveyed this topic and you may want to take a look at
https://github.com/fullcontact/hadoop-sstable
and
https://github.com/Netflix/aegisthus
On Tue, Jan 27, 2015 at 5:33 PM, Xu Zhongxing xu_zhong_x...@163.com wrote:
Both the Java driver "select * from table" and Spark sc.cassandraTable()