If you use Kafka for the first bulk load, you'll exercise your new Teradata->Kafka->Hive pipeline end to end, and you'll also be able to blow away the data in Hive and reflow it from Kafka without an expensive full re-export from Teradata. As for whether Kafka can handle hundreds of GB of data: yes, absolutely.
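For illustration, here is a rough sketch of what the bulk-load producer side could look like with the Java producer API, streaming rows straight out of Teradata over JDBC into a topic. The table name, topic, JDBC URL, and credentials below are made up, and a real pipeline would want a proper serialization format (Avro or similar) plus checkpointing so a failed load can resume:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class TeradataToKafkaBulkLoad {

        public static void main(String[] args) throws Exception {
            // Producer config: broker list and serializers; bump batch.size
            // and linger.ms to favor throughput during the bulk load.
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092,broker2:9092");
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("acks", "all");           // don't drop rows during the load
            props.put("batch.size", "262144");  // larger batches for throughput
            props.put("linger.ms", "50");

            // Hypothetical JDBC URL, table, and topic -- substitute your own.
            String jdbcUrl = "jdbc:teradata://tdhost/DATABASE=dw";
            String topic = "dw.big_table.full";

            try (Producer<String, String> producer = new KafkaProducer<>(props);
                 Connection conn = DriverManager.getConnection(jdbcUrl, "user", "pass");
                 Statement stmt = conn.createStatement()) {

                // Stream rows out of Teradata; the fetch size keeps memory bounded
                // even for a table with hundreds of GB.
                stmt.setFetchSize(10000);
                try (ResultSet rs = stmt.executeQuery("SELECT id, payload FROM big_table")) {
                    while (rs.next()) {
                        String key = rs.getString("id");        // keyed by PK so reflows
                        String value = rs.getString("payload"); // stay ordered per key
                        producer.send(new ProducerRecord<>(topic, key, value));
                    }
                }
                producer.flush();  // make sure everything reached the brokers
            }
        }
    }

The Hive side is then just a consumer (or Camus/your ingest tool of choice) writing the topic to HDFS, and re-running that consumer is how you get the cheap "reflow from Kafka" without touching Teradata again.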
-Mark

On Thu, Oct 23, 2014 at 3:08 AM, Po Cheung <poche...@yahoo.com.invalid> wrote:
> Hello,
>
> We are planning to set up a data pipeline and send periodic, incremental
> updates from Teradata to Hadoop via Kafka. For a large DW table with
> hundreds of GB of data, is it okay (in terms of performance) to use Kafka
> for the initial bulk data load? Or will Sqoop with Teradata connector be
> more appropriate?
>
> Thanks,
> Po