If you use Kafka for the first bulk load, you will both exercise your new
Teradata->Kafka->Hive pipeline and retain the ability to blow away the
data in Hive and reflow it from Kafka without an expensive full re-export
from Teradata.  As for whether Kafka can handle hundreds of GB of data:
Yes, absolutely.
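
As a rough illustration of the bulk-load step, here is a minimal sketch of a
producer that publishes rows already exported from Teradata into a delimited
file and writes them to a topic.  The file name "export.tsv", the topic name
"dw.big_table", and the broker address are hypothetical placeholders; keying
each record by the primary key is just one way to keep per-key ordering for
the later reflow into Hive.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

public class TeradataBulkLoad {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // assumed broker address
        props.put("acks", "all");                          // wait for full replication
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             BufferedReader reader = Files.newBufferedReader(Paths.get("export.tsv"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // First column is assumed to be the table's primary key.
                String key = line.split("\t", 2)[0];
                producer.send(new ProducerRecord<>("dw.big_table", key, line));
            }
            // Block until all buffered records have been acknowledged.
            producer.flush();
        }
    }
}

Note that the "blow away Hive and reflow" trick only works if the topic's
retention (retention.ms / retention.bytes) is set large enough to keep the
full bulk load around for as long as you might want to replay it.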

-Mark


On Thu, Oct 23, 2014 at 3:08 AM, Po Cheung <poche...@yahoo.com.invalid>
wrote:

> Hello,
>
> We are planning to set up a data pipeline and send periodic, incremental
> updates from Teradata to Hadoop via Kafka.  For a large DW table with
> hundreds of GB of data, is it okay (in terms of performance) to use Kafka
> for the initial bulk data load?  Or will Sqoop with the Teradata connector
> be more appropriate?
>
>
> Thanks,
> Po
