Hi Chris,
Well, that seemed like a good idea at first; I would like to read from 
Cassandra and post to KAFKA.
But the KAFKA Cassandra Source connector requires the table to have a 
time-series ordering, and not all of my tables do.
So thanks for the tip, but it did not work ☹
-Tobias

From: Chris Stromberger <chris.stromber...@gmail.com>
Date: Thursday, 27 April 2017 at 15:50
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: How can I efficiently export the content of my table to KAFKA

Maybe 
https://www.confluent.io/blog/kafka-connect-cassandra-sink-the-perfect-match/



On Wed, Apr 26, 2017 at 2:49 PM, Tobias Eriksson 
<tobias.eriks...@qvantel.com> wrote:
Hi
I would like to make a dump of the database, in JSON format, to KAFKA.
The database contains lots of data: millions, and in some cases billions, of 
“rows”.
I will provide the customer with an export of the data, which they can read 
off a KAFKA topic.

My thinking was to make it scalable by distributing the token ranges of all 
available partition keys to a number of (N) processes (JSON-Producers).
First I will have a process that reads through the available tokens and 
publishes them on a KAFKA “Coordinator” topic.
Then I can create 1, 10, 20 or N processes that act as Producers to the real 
KAFKA topic and pick available tokens/partition keys off the “Coordinator” 
topic, one by one, until all the “rows” have been processed.
So a JSON-Producer will take e.g. a range of 1000 “rows”, convert them into my 
own JSON format and post them to KAFKA, then take another 1000 “rows”, and 
another, and so on until it is done.
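To make that a bit more concrete, here is a rough sketch of the “Coordinator” 
side, assuming the Murmur3Partitioner and the plain KAFKA Java producer; the 
topic name “export-coordinator” and the number of splits are just placeholders:

// Sketch: split the full Murmur3 token range into sub-ranges ("jobs") and
// publish each one as a message on the "Coordinator" topic.
import java.math.BigInteger;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CoordinatorJobPublisher {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // Murmur3Partitioner tokens span [-2^63, 2^63 - 1]
        BigInteger min = BigInteger.valueOf(Long.MIN_VALUE);
        BigInteger max = BigInteger.valueOf(Long.MAX_VALUE);
        int splits = 1000; // placeholder: tune to data volume / number of Producers
        BigInteger step = max.subtract(min).divide(BigInteger.valueOf(splits));

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            BigInteger start = min;
            for (int i = 0; i < splits; i++) {
                BigInteger end = (i == splits - 1) ? max : start.add(step);
                // one "job" = an inclusive token sub-range encoded as "start:end"
                String job = start + ":" + end;
                // no key, so the jobs spread over the topic's partitions
                producer.send(new ProducerRecord<>("export-coordinator", job));
                start = end.add(BigInteger.ONE);
            }
        }
    }
}

The “Coordinator” topic would need at least as many partitions as I plan to 
run Producer processes, so that the jobs get shared out across them.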

I base my idea on how I believe the Apache Spark Connector accomplishes data 
locality, i.e. being aware of where tokens reside, and figured that since that 
is possible it should also be possible to create a job list in a KAFKA topic, 
have each Producer pick jobs from there, read the data from Cassandra based on 
the partition key (token), and then post the JSON on the export KAFKA topic.
https://dzone.com/articles/data-locality-w-cassandra-how
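
And here is a rough sketch of one JSON-Producer worker, assuming the DataStax 
Java driver 3.x; the keyspace/table (“my_keyspace”/“my_table”, partition key 
“id”), the topic names and the trivial JSON conversion are just placeholders 
for my real schema and format:

// Sketch: one JSON-Producer worker. It claims token-range "jobs" from the
// "Coordinator" topic, reads the matching rows from Cassandra, converts them
// to JSON and posts them on the export topic.
import java.util.Collections;
import java.util.Properties;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class JsonProducerWorker {
    public static void main(String[] args) throws Exception {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "json-producers"); // one group -> jobs shared across workers
        consumerProps.put("auto.offset.reset", "earliest");
        consumerProps.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace");
             KafkaConsumer<String, String> jobs = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> out = new KafkaProducer<>(producerProps)) {

            jobs.subscribe(Collections.singletonList("export-coordinator"));
            while (true) { // runs until killed; a real worker would stop when the jobs run out
                ConsumerRecords<String, String> records = jobs.poll(1000);
                for (ConsumerRecord<String, String> job : records) {
                    String[] range = job.value().split(":");
                    long start = Long.parseLong(range[0]);
                    long end = Long.parseLong(range[1]);
                    // the driver pages through the rows of this token range
                    ResultSet rs = session.execute(
                            "SELECT * FROM my_table WHERE token(id) >= ? AND token(id) <= ?",
                            start, end);
                    for (Row row : rs) {
                        // placeholder for the real row-to-JSON conversion
                        String json = "{\"id\": \"" + row.getObject("id") + "\"}";
                        out.send(new ProducerRecord<>("export-topic", json));
                    }
                }
            }
        }
    }
}

This does not give the Spark-style data locality (each worker reads its range 
over the network), but it spreads the work over N processes in the same way.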


Would you consider this a good idea?
Or is there in fact a better approach, and if so, what would that be?

-Tobias

