Thanks very much. Precisely answers my questions. :-)
2010/4/26 Schubert Zhang <[email protected]>
> Please refer to the code:
>
> org.apache.cassandra.db.ColumnFamilyStore
>
> public String getFlushPath()
> {
>     // 2* adds room for keys, column indexes
>     long guessedSize = 2 * DatabaseDescriptor.getMemtableThroughput() * 1024*1024;
>     String location = DatabaseDescriptor.getDataFileLocationForTable(table_, guessedSize);
>     if (location == null)
>         throw new RuntimeException("Insufficient disk space to flush");
>     return new File(location, getTempSSTableFileName()).getAbsolutePath();
> }
>
> and we can go through org.apache.cassandra.config.DatabaseDescriptor:
>
> public static String getDataFileLocationForTable(String table, long expectedCompactedFileSize)
> {
>     long maxFreeDisk = 0;
>     int maxDiskIndex = 0;
>     String dataFileDirectory = null;
>     String[] dataDirectoryForTable = getAllDataFileLocationsForTable(table);
>
>     for ( int i = 0 ; i < dataDirectoryForTable.length ; i++ )
>     {
>         File f = new File(dataDirectoryForTable[i]);
>         if( maxFreeDisk < f.getUsableSpace())
>         {
>             maxFreeDisk = f.getUsableSpace();
>             maxDiskIndex = i;
>         }
>     }
>     // Load factor of 0.9 we do not want to use the entire disk that is
>     // too risky.
>     maxFreeDisk = (long)(0.9 * maxFreeDisk);
>     if( expectedCompactedFileSize < maxFreeDisk )
>     {
>         dataFileDirectory = dataDirectoryForTable[maxDiskIndex];
>         currentIndex = (maxDiskIndex + 1 )%dataDirectoryForTable.length ;
>     }
>     else
>     {
>         currentIndex = maxDiskIndex;
>     }
>     return dataFileDirectory;
> }
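
To see what this selection does in practice, here is a minimal standalone sketch of the same loop that takes the free-space figures as plain numbers instead of querying the disks. The class and the `pickDirectory` helper are hypothetical and not part of Cassandra; the 0.9 load factor and the strict `<` comparison are copied from the code above, and the 128 MB flush size is only an assumed value for illustration. It suggests that when every directory reports the same usable space, the first directory always wins.

```java
class FlushDirectoryPicker {
    // Hypothetical helper mirroring the loop in getDataFileLocationForTable:
    // returns the index of the directory with the most usable space, or -1
    // if even that directory cannot hold expectedSize within the 0.9 load factor.
    static int pickDirectory(long[] usableSpace, long expectedSize) {
        long maxFree = 0;
        int maxIndex = 0;
        for (int i = 0; i < usableSpace.length; i++) {
            // Strict '<' means that ties keep the earlier index.
            if (maxFree < usableSpace[i]) {
                maxFree = usableSpace[i];
                maxIndex = i;
            }
        }
        maxFree = (long) (0.9 * maxFree);
        return expectedSize < maxFree ? maxIndex : -1;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        long flush = 128L * 1024 * 1024; // assumed flush size, for illustration only

        // Different free space: the emptiest directory is chosen.
        System.out.println(pickDirectory(new long[] {30 * gb, 90 * gb, 90 * gb}, flush)); // prints 1
        // Identical free space: index 0 is chosen every time.
        System.out.println(pickDirectory(new long[] {50 * gb, 50 * gb, 50 * gb}, flush)); // prints 0
        // Not enough room anywhere: no directory is returned.
        System.out.println(pickDirectory(new long[] {100, 100, 100}, flush)); // prints -1
    }
}
```

Note that the choice depends only on free space at flush time, so nothing in this loop tries to even out how much data each directory already holds.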
>
> So DataFileDirectories is meant for multiple disks or disk partitions.
> I think your storage01, storage02 and storage03 are on the same disk or disk
> partition.
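
One way to check this is to ask the JVM which file store backs each directory. The sketch below uses `java.nio.file.Files.getFileStore`, so it needs a newer JVM than this thread assumes (Java 8 or later as written). The class and method names are hypothetical, and it relies on the JDK's built-in `FileStore` implementations comparing equal for the same partition, which is an assumption about the platform rather than something the API guarantees. Directories that end up in the same group share one partition, report the same usable space, and the strict `<` in the loop above then always keeps the first of them.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DataDirectoryStores {
    // Hypothetical helper: groups directories by the file store
    // (disk or partition) that backs them, in the order given.
    static Map<FileStore, List<Path>> groupByStore(List<Path> dirs) throws IOException {
        Map<FileStore, List<Path>> groups = new LinkedHashMap<>();
        for (Path dir : dirs) {
            FileStore store = Files.getFileStore(dir);
            groups.computeIfAbsent(store, k -> new ArrayList<>()).add(dir);
        }
        return groups;
    }

    public static void main(String[] args) throws IOException {
        // Pass the data directories as command-line arguments.
        List<Path> dirs = new ArrayList<>();
        for (String arg : args) {
            dirs.add(Paths.get(arg));
        }
        for (Map.Entry<FileStore, List<Path>> e : groupByStore(dirs).entrySet()) {
            System.out.println(e.getKey() + " usable=" + e.getKey().getUsableSpace()
                    + " bytes, directories=" + e.getValue());
        }
    }
}
```

Running it with the three configured data directories as arguments should print one line per distinct disk or partition; a single line would mean all three directories share one.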
>
>
> 2010/4/26 Roland Hänel <[email protected]>
>
>> I have a configuration like this:
>>
>> <DataFileDirectories>
>> <DataFileDirectory>/storage01/cassandra/data</DataFileDirectory>
>> <DataFileDirectory>/storage02/cassandra/data</DataFileDirectory>
>> <DataFileDirectory>/storage03/cassandra/data</DataFileDirectory>
>> </DataFileDirectories>
>>
>> After loading a big chunk of data into Cassandra, I end up with some 70GB
>> in the first directory, and only about 10GB in the second and third one. All
>> rows are quite small, so it's not just some big rows that contain the
>> majority of data.
>>
>> Does Cassandra have the ability to 'see' the maximum available space in
>> these directories? I'm asking myself this question since my limit is 100GB,
>> and the first directory is approaching this limit...
>>
>> And, wouldn't it be better if Cassandra tried to 'load-balance' the files
>> inside the directories because this will result in better (read) performance
>> if the directories are on different disks (which is the case for me)?
>>
>> Any help is appreciated.
>>
>> Roland
>>
>>
>