Parth et al.,
the folks at Netflix seem to have built a solution for your problem. See the 
Netflix Tech Blog post "Aegisthus - A Bulk Data Pipeline out of Cassandra".

By Charles Smith and Jeff Magnusson (techblog.netflix.com)


You may want to contact Jeff Magnusson and check whether the solution is open 
sourced. Please report back to this forum if you get an answer to the problem.

Hope this helps.
Jan
C* Architect 

     On Monday, January 26, 2015 11:25 AM, Robert Coli <rc...@eventbrite.com> 
wrote:
   

 On Sun, Jan 25, 2015 at 10:40 PM, Parth Setya <setya.pa...@gmail.com> wrote:

1. Is there a way to configure the size of sstables created after compaction?


No, won't fix: https://issues.apache.org/jira/browse/CASSANDRA-4897
You could use the "sstablesplit" utility on your One Big SSTable to split it 
into files of your preferred size. 
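
A minimal sketch of how the sstablesplit step might be scripted. The helper name build_split_command and the example path are hypothetical, not something from this thread. The --size option (maximum output size in MB) and --no-snapshot are the options the tool documents, but check "sstablesplit --help" for your Cassandra version, and note that the tool is meant to be run while the node is stopped.

```python
import shlex


def build_split_command(sstable_paths, max_size_mb=50, snapshot=True):
    """Build the argv list for an sstablesplit run (hypothetical helper).

    sstable_paths: paths to the *-Data.db files to split.
    max_size_mb:   target maximum size of each output SSTable, in MB.
    snapshot:      if False, pass --no-snapshot to skip the pre-split snapshot.
    """
    if not sstable_paths:
        raise ValueError("at least one SSTable path is required")
    if max_size_mb <= 0:
        raise ValueError("max_size_mb must be positive")

    cmd = ["sstablesplit", "--size", str(max_size_mb)]
    if not snapshot:
        cmd.append("--no-snapshot")
    cmd.extend(sstable_paths)
    return cmd


if __name__ == "__main__":
    # Hypothetical path; substitute the Data.db file of your big SSTable.
    argv = build_split_command(
        ["/var/lib/cassandra/data/ks/cf/ks-cf-jb-1-Data.db"], max_size_mb=200
    )
    # Print the command for review instead of running it.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; pass the list to subprocess.run on the node itself once you have confirmed the options.
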
2. Is there a better approach to generate the report?


The major compaction isn't too bad, but something that understands SSTables as 
an input format would be preferable to sstable2json. 
3. What are the flaws with this approach?

sstable2json is slow and transforms your data to JSON.
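
To illustrate the cost, here is a rough sketch of a report step that consumes sstable2json output. It assumes the dump is a JSON array of rows, each with a "key" and a list of cells under "columns" (or "cells"); the exact layout differs between Cassandra versions, so treat the field names and the function name count_cells_per_key as assumptions. Note that json.loads has to hold the whole dump in memory, on top of the time already spent producing it.

```python
import json
from collections import Counter


def count_cells_per_key(json_text):
    """Count cells per row key in an sstable2json-style dump.

    Assumed (not verified for every version) layout:
    [{"key": "...", "columns": [[name, value, timestamp], ...]}, ...]
    """
    rows = json.loads(json_text)  # loads the entire dump into memory
    counts = Counter()
    for row in rows:
        # Fall back to "cells" in case the dump uses that field name.
        cells = row.get("columns") or row.get("cells") or []
        counts[row["key"]] += len(cells)
    return dict(counts)


if __name__ == "__main__":
    # Tiny hand-written sample in the assumed layout.
    sample = '[{"key": "k1", "columns": [["c1", "v1", 1], ["c2", "v2", 2]]}]'
    print(count_cells_per_key(sample))
```
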
=Rob

   
