I recently surveyed this topic; you may want to take a look at
https://github.com/fullcontact/hadoop-sstable
and
https://github.com/Netflix/aegisthus


On Tue, Jan 27, 2015 at 5:33 PM, Xu Zhongxing <xu_zhong_x...@163.com> wrote:

> Both the Java driver ("select * from table") and Spark's sc.cassandraTable()
> work well. I use both of them frequently.
>
> At 2015-01-28 04:06:20, "Mohammed Guller" <moham...@glassbeam.com> wrote:
>
> Hi –
>
> Over the last few weeks, I have seen several emails on this mailing list
> from people trying to extract all data from C*, so that they can import
> that data into other analytical tools that provide much richer analytics
> functionality than C*. Extracting all data from C* is a full-table scan,
> which is not the ideal use case for C*. However, people don’t have much
> choice if they want to do ad-hoc analytics on the data in C*.
> Unfortunately, I don’t think C* comes with any built-in tools that make
> this task easy for a large dataset. Please correct me if I am wrong. Cqlsh
> has a COPY TO command, but it doesn’t really work if you have a large
> amount of data in C*.
>
> I am aware of a couple of approaches for extracting all data from a table in
> C*:
>
> 1) Iterate through all the C* partitions (physical rows) using the
> Java Driver and CQL.
>
> 2) Extract the data directly from the SSTable files.
>
> Either approach can be used with Hadoop or Spark to speed up the
> extraction process.
>
> I wanted to do a quick survey and find out how many people on this mailing
> list have successfully used approach #1 or #2 for extracting large datasets
> (terabytes) from C*. Also, if you have used some other techniques, it would
> be great if you could share your approach with the group.
>
> Mohammed
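For anyone trying approach #1 above, here is a minimal sketch (not code from this thread) of splitting a full-table scan into independent token-range queries. This is roughly the idea the Hadoop and Spark integrations use to parallelize a scan. It assumes the default Murmur3Partitioner, whose tokens span the signed 64-bit range. The class TokenRangeSplitter and the keyspace, table and partition-key names (my_ks, my_table, id) are hypothetical placeholders. For a composite partition key, token() must list all partition-key columns.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

class TokenRangeSplitter {

    // Hypothetical holder for one token interval: (start, end], except that the
    // very first range also includes MIN_TOKEN (see toCql).
    static final class TokenRange {
        final long start;
        final long end;

        TokenRange(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    // Assumes Murmur3Partitioner, whose tokens are signed 64-bit values.
    static final long MIN_TOKEN = Long.MIN_VALUE;
    static final long MAX_TOKEN = Long.MAX_VALUE;

    /** Splits the whole token ring into n contiguous, non-overlapping ranges. */
    static List<TokenRange> split(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        BigInteger min = BigInteger.valueOf(MIN_TOKEN);
        // MAX_TOKEN - MIN_TOKEN = 2^64 - 1 overflows a long, so use BigInteger.
        BigInteger span = BigInteger.valueOf(MAX_TOKEN).subtract(min);
        List<TokenRange> ranges = new ArrayList<>(n);
        long start = MIN_TOKEN;
        for (int i = 1; i <= n; i++) {
            long end = (i == n)
                    ? MAX_TOKEN
                    : min.add(span.multiply(BigInteger.valueOf(i))
                                  .divide(BigInteger.valueOf(n)))
                         .longValueExact();
            ranges.add(new TokenRange(start, end));
            start = end; // the next range begins just after this one's end
        }
        return ranges;
    }

    /** Builds one CQL query per range; keyspace, table and key names are placeholders. */
    static String toCql(String keyspace, String table, String partitionKey, TokenRange r) {
        // Use >= on the first range so a token equal to MIN_TOKEN is never skipped.
        String lowerOp = (r.start == MIN_TOKEN) ? ">=" : ">";
        return String.format(
                "SELECT * FROM %s.%s WHERE token(%s) %s %d AND token(%s) <= %d",
                keyspace, table, partitionKey, lowerOp, r.start, partitionKey, r.end);
    }

    public static void main(String[] args) {
        // Each statement could be handed to a separate worker that runs it via the Java driver.
        for (TokenRange r : split(4)) {
            System.out.println(toCql("my_ks", "my_table", "id", r));
        }
    }
}
```

Each generated statement covers a disjoint slice of the ring, so the slices can be run concurrently (separate threads, Hadoop mappers or Spark tasks) and their results concatenated. Splitting evenly by token value assumes partitions are spread roughly uniformly across the ring, which Murmur3 hashing generally gives when there are many partitions.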


-- 

Regards,
Shenghua (Daniel) Wan
