Both variants will work well (if your Kafka cluster can handle the full
volume of the transmitted data for the duration of the retention (TTL)
configured on each topic).
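As an illustration of what "handling the full volume" means in practice,
here is a rough sketch of creating a topic sized for such a load with the
Java AdminClient that ships with newer Kafka clients. The broker address,
topic name, partition count and retention values are made-up placeholders,
not something from this thread:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateBulkLoadTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");  // placeholder broker

            try (AdminClient admin = AdminClient.create(props)) {
                // Size retention so the full bulk load stays on disk until
                // the downstream consumers have drained it; 7 days is only
                // an example value.
                Map<String, String> configs = new HashMap<>();
                configs.put("retention.ms",
                        String.valueOf(7L * 24 * 60 * 60 * 1000));
                configs.put("retention.bytes", "-1");  // no size-based eviction

                NewTopic topic = new NewTopic("dw-bulk-load", 12, (short) 3)
                        .configs(configs);
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }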

I would run the whole thing through Kafka, since you will be "stress
testing" your production flow. Consider what would happen if you lost your
destination tables at some later time - how would you then repopulate them?
It would be nice to know that your normal flow handles that situation.
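Concretely, a repopulation could then be nothing more than pushing the full
table through the same producer path the incremental flow already uses.
A rough sketch (the broker, topic name and hard-coded rows are placeholders;
a real reload would stream rows from the source table):

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class BulkRepopulate {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");  // placeholder broker
            props.put("acks", "all");                        // don't drop rows mid-reload
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            // The same topic the incremental updates use, so the reload
            // exercises the same downstream consumers as normal operation.
            String topic = "dw-table-updates";

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Stand-in rows; a real reload would stream these from the
                // source table (e.g. a JDBC cursor over the Teradata table).
                String[] rows = { "1|first row", "2|second row", "3|third row" };
                for (String row : rows) {
                    String key = row.split("\\|", 2)[0];  // key by primary key column
                    producer.send(new ProducerRecord<>(topic, key, row));
                }
                producer.flush();
            }
        }
    }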

2014-10-23 12:08 GMT+02:00 Po Cheung <poche...@yahoo.com.invalid>:

> Hello,
>
> We are planning to set up a data pipeline and send periodic, incremental
> updates from Teradata to Hadoop via Kafka.  For a large DW table with
> hundreds of GB of data, is it okay (in terms of performance) to use Kafka
> for the initial bulk data load?  Or would Sqoop with the Teradata
> connector be more appropriate?
>
>
> Thanks,
> Po
