Re: full-table scan - extracting all data from C*

2015-01-28 Thread DuyHai Doan
Hint: using the Java driver, you can set fetchSize to tell the driver how many CQL rows to fetch per page. Depending on the size (in bytes) of each CQL row, it is worth tuning this value to avoid loading too much data into memory for each page. On Wed, Jan 28, 2015 at
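
The hint above amounts to a small piece of arithmetic: divide a per-page memory budget by the average row size. A minimal sketch of that calculation follows. `FetchSizeEstimator`, its parameters, and the example numbers are hypothetical and not part of any Cassandra driver; with the DataStax Java driver 2.x the result would be passed to `Statement.setFetchSize(int)` before executing the statement.

```java
// Hypothetical helper for illustration only; not part of any Cassandra driver.
final class FetchSizeEstimator {
    private FetchSizeEstimator() {}

    /**
     * Estimates how many rows fit in one page for a given memory budget.
     * The result is clamped to the range [minRows, maxRows].
     */
    static int estimate(long pageBudgetBytes, long avgRowBytes, int minRows, int maxRows) {
        if (pageBudgetBytes <= 0 || avgRowBytes <= 0) {
            throw new IllegalArgumentException("budget and row size must be positive");
        }
        if (minRows < 1 || maxRows < minRows) {
            throw new IllegalArgumentException("need 1 <= minRows <= maxRows");
        }
        long rows = pageBudgetBytes / avgRowBytes;            // rows that fit in the budget
        return (int) Math.max(minRows, Math.min(maxRows, rows));
    }

    public static void main(String[] args) {
        // Assumed numbers: a 4 MiB budget per page and rows of about 2 KiB each.
        int fetchSize = estimate(4L * 1024 * 1024, 2048, 100, 5000);
        System.out.println("fetchSize = " + fetchSize);
        // With the DataStax Java driver 2.x, this value would then be applied via
        // statement.setFetchSize(fetchSize) before session.execute(statement).
    }
}
```

The average row size is an input you would have to measure or estimate for your own table; the clamp only keeps a bad estimate from producing a degenerate page size.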

full-table scan - extracting all data from C*

2015-01-27 Thread Mohammed Guller
Hi - Over the last few weeks, I have seen several emails on this mailing list from people trying to extract all data from C*, so that they can import that data into other analytical tools that provide much richer analytics functionality than C*. Extracting all data from C* is a full-table

Re: full-table scan - extracting all data from C*

2015-01-27 Thread Xu Zhongxing
This is hard to answer. Performance depends on context, and you can tune various parameters. At 2015-01-28 14:43:38, Shenghua(Daniel) Wan wansheng...@gmail.com wrote: Cool. What about performance? E.g., how many records in how long? On Tue, Jan 27, 2015 at 10:16 PM, Xu Zhongxing

Re: Re: full-table scan - extracting all data from C*

2015-01-27 Thread Xu Zhongxing
For the Java driver, there is actually no special API, just: ResultSet rs = session.execute("select * from ..."); for (Row r : rs) { ... } For Spark, the code skeleton is: val rdd = sc.cassandraTable(ks, table), then call the standard Spark APIs to process the table in parallel. I have not
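
The plain for-loop over a result set works for a full scan because rows are pulled page by page as the iteration advances, rather than all at once. The sketch below imitates that behaviour with no Cassandra dependency. `PagedScan`, `inMemory`, and `fetchCalls` are hypothetical names used for illustration, not driver classes; a real driver uses paging state to know when the data ends, whereas this sketch simply stops at the first empty page.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Hypothetical stand-in for a paged result set; not a Cassandra driver class.
final class PagedScan<T> implements Iterable<T> {
    private final IntFunction<List<T>> fetchPage; // page index -> rows; empty list means no more data
    int fetchCalls = 0;                           // number of page requests issued so far

    PagedScan(IntFunction<List<T>> fetchPage) {
        this.fetchPage = fetchPage;
    }

    /** Builds a scan over an in-memory list, served in pages of fetchSize rows. */
    static <T> PagedScan<T> inMemory(List<T> rows, int fetchSize) {
        return new PagedScan<>(page -> {
            int from = Math.min(page * fetchSize, rows.size());
            int to = Math.min(from + fetchSize, rows.size());
            return rows.subList(from, to);
        });
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private int nextPage = 0;
            private Iterator<T> current = null;
            private boolean exhausted = false;

            @Override
            public boolean hasNext() {
                // Fetch the next page only once the current one is drained.
                while (!exhausted && (current == null || !current.hasNext())) {
                    List<T> page = fetchPage.apply(nextPage++);
                    fetchCalls++;
                    if (page.isEmpty()) {
                        exhausted = true;
                    } else {
                        current = page.iterator();
                    }
                }
                return !exhausted;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(i);
        }
        PagedScan<Integer> scan = PagedScan.inMemory(rows, 4);
        long sum = 0;
        for (int value : scan) { // same shape as: for (Row r : rs) { ... }
            sum += value;
        }
        System.out.println("sum = " + sum + ", page requests = " + scan.fetchCalls);
    }
}
```

Only one page of rows is held at a time, which is why the page size matters for memory use during a full scan.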

Re: Re: full-table scan - extracting all data from C*

2015-01-27 Thread Shenghua(Daniel) Wan
Cool. What about performance? E.g., how many records in how long? On Tue, Jan 27, 2015 at 10:16 PM, Xu Zhongxing xu_zhong_x...@163.com wrote: For the Java driver, there is actually no special API, just: ResultSet rs = session.execute("select * from ..."); for (Row r : rs) { ... } For Spark,

Re: full-table scan - extracting all data from C*

2015-01-27 Thread Shenghua(Daniel) Wan
Recently I surveyed this topic and you may want to take a look at https://github.com/fullcontact/hadoop-sstable and https://github.com/Netflix/aegisthus On Tue, Jan 27, 2015 at 5:33 PM, Xu Zhongxing xu_zhong_x...@163.com wrote: Both Java driver select * from table and Spark sc.cassandraTable()