We're running an 8-node cluster with different CFs for different applications.
One of the applications uses 1.5TB out of 1.8TB in total, but only because we
started out without a deletion mechanism and implemented one later on. So there
is probably a large amount of old data in there that we don't even use anymore.

Now we want to delete that data. To know which rows we may delete, we have to
look them up in a SQL database. If a key is not in there anymore, we may delete
that row in Cassandra, too.

This basically means we have to iterate over all the rows in that CF. This
practically begs for Hadoop, but that does not seem to be an option at the
moment. I tried.

So we figured we could scan the SSTable files (maybe only the index files),
check the keys against MySQL, and later run the deletes on the cluster. This
way, we could iterate on each node in parallel.
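
To make the "check the keys" step concrete, here is a rough sketch of what I have in mind, assuming the row keys have already been dumped from the SSTables into some iterable of strings. The table name `live_keys`, the column `row_key`, and the function names are made up for illustration, and `sqlite3` only stands in for MySQL so the snippet runs on its own. With a real MySQL driver the placeholder would typically be `%s` instead of `?`.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List


def batched(keys: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` keys."""
    it = iter(keys)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def find_orphaned_keys(conn: sqlite3.Connection,
                       keys: Iterable[str],
                       batch_size: int = 500) -> Iterator[str]:
    """Yield every key that is NOT present in the SQL table.

    Assumption: a hypothetical table `live_keys(row_key)` holds the keys
    that are still in use. Keys are checked in batches so that a large
    key dump does not cause one SQL round trip per key.
    """
    for batch in batched(keys, batch_size):
        placeholders = ",".join("?" * len(batch))
        query = (
            "SELECT row_key FROM live_keys "
            f"WHERE row_key IN ({placeholders})"
        )
        present = {row[0] for row in conn.execute(query, batch)}
        # Preserve the input order within the batch.
        for key in batch:
            if key not in present:
                yield key
```

The output would be written to a file per node, and the actual deletes would be issued against the cluster in a second pass, so the scan itself never writes to Cassandra.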

Does that sound reasonable? Any pros/cons, or maybe a "killer" argument for
using Hadoop for this?

Cheers
Marcel
