[Cassandra Wiki] Update of "CassandraHardware_JP" by yukim

Apache Wiki Mon, 29 Mar 2010 23:14:41 -0700


The "CassandraHardware_JP" page has been changed by yukim.
The comment on this change is: Japanese translation.
http://wiki.apache.org/cassandra/CassandraHardware_JP?action=diff&rev1=13&rev2=14

--------------------------------------------------

  ## page was copied from CassandraHardware
- === Memory ===
- The most recently written data resides in memory tables (aka 
[[MemtableThresholds|memtables]]), but older data that has been flushed to disk 
can be kept in the OS's file-system cache. In other words, ''the more memory, 
the better'', with 1GB being the minimum we typically recommend in a 
virtualized environment.  Obviously there is no benefit to having more RAM than 
your hot data set, but with dedicated hardware there is no reason to use less 
than 4GB, and at the high end, you see clusters with 16 or 32 GB or more per 
node.
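+ === Memory ===
+ The most recently written data resides in in-memory tables ([[MemtableThresholds|Memtable]]), while older data that has been flushed to disk can be kept in the OS file-system cache. In other words, ''the more memory, the better''; we recommend at least 1GB in a virtualized environment. Of course there is no benefit to having more RAM than your hot data set needs, but on dedicated hardware there is no reason to use less than 4GB. At the high end, there are clusters with 16GB, 32GB, or more per node.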
  
- RAM can also be useful for the key cache (introduced in 0.5) and row cache 
(in 0.6).
+ RAM is also useful for the key cache (introduced in 0.5) and the row cache (introduced in 0.6).
  
  === CPU ===
- Many workloads will actually be CPU-bound in Cassandra before being 
memory-bound.  Cassandra is highly concurrent and will make good use of however 
many cores you can give it.  For high-end clusters, quad- or 8-core boxes are 
good.  If you're running on virtualized machines, consider using a provider 
such as Rackspace Cloud Servers that allows CPU bursting.
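+ In Cassandra, many workloads actually become CPU-bound before they become memory-bound. Cassandra is highly concurrent and makes efficient use of however many cores you give it. For high-end clusters, quad- or 8-core machines are a good choice. If you run on virtual machines, consider a provider that supports CPU bursting, such as Rackspace Cloud Servers.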
  
- === Disk ===
- The short answer here is that ideally you will have at least 2 disks, one to 
keep your `CommitLogDirectory` on, the other to use in `DataFileDirectories`. 
The exact answer though depends a lot on your usage so it's important to 
understand what is going on here.
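+ === Disk ===
+ In short, ideally you will have at least 2 disks: one to hold the commit log (`CommitLogDirectory`) and another for the data file directories (`DataFileDirectories`). The exact answer, however, depends heavily on your usage, so it is important to understand what is going on here.
+ (Translator's note: `CommitLogDirectory` and `DataFileDirectories` are the names of the corresponding settings in storage-conf.xml.)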
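
The two-disk advice can be checked mechanically. The following is a minimal sketch, not part of the wiki page: it assumes a storage-conf.xml shaped like the hypothetical `SAMPLE_CONF` fragment (the paths and the `<Storage>` wrapper are illustrative), and the helper names `read_directories` and `dirs_sharing_device` are made up for this example. Note that `st_dev` identifies a file system, so two partitions on the same physical disk would still look "separate".

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical storage-conf.xml fragment; element layout and paths are
# assumptions made for illustration only.
SAMPLE_CONF = """
<Storage>
  <CommitLogDirectory>/var/lib/cassandra/commitlog</CommitLogDirectory>
  <DataFileDirectories>
    <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
  </DataFileDirectories>
</Storage>
"""


def read_directories(conf_xml):
    """Return (commit log directory, list of data file directories)."""
    root = ET.fromstring(conf_xml)
    commitlog = root.findtext("CommitLogDirectory").strip()
    data_dirs = [e.text.strip() for e in root.iter("DataFileDirectory")]
    return commitlog, data_dirs


def dirs_sharing_device(commitlog_dir, data_dirs):
    """Return the data directories on the same file system as the commit log.

    An empty list means the commit log has a device (file system) to itself.
    """
    commit_dev = os.stat(commitlog_dir).st_dev
    return [d for d in data_dirs if os.stat(d).st_dev == commit_dev]
```

On a real node one would pass the directories returned by `read_directories` to `dirs_sharing_device`; a non-empty result suggests commit log appends and SSTable I/O compete for the same device.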
  
- Cassandra persists data to disk for two very different purposes. The first is 
to the commitlog when a new write is made so that it can be replayed after a 
crash or system shutdown. The second is to the data directory when thresholds 
are exceeded and memtables are flushed to disk as SSTables.
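+ Cassandra persists data to disk for two very different purposes. The first is to the commit log: each new write is recorded so that it can be replayed after a crash or system shutdown. The second is to the data directory: when thresholds are exceeded, the contents of a Memtable are flushed to disk as an SSTable.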
  
- Commit logs receive every write made to a Cassandra node and have the 
potential to block client operations, but they are only ever read on node 
start-up. SSTable (data file) writes on the other hand occur asynchronously, 
but are read to satisfy client look-ups. SSTables are also periodically merged 
and rewritten in a process called ''compaction''. Another important difference 
between commitlog and sstables is that commit logs are purged after the 
corresponding data has been flushed to disk as an SSTable, so 
`CommitLogDirectory` only holds uncommitted data while the directories in 
`DataFileDirectories` store all of the data written to a node.
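+ The commit log records every write made to a Cassandra node and can therefore block client operations, but it is only ever read at node start-up. SSTables (data files), on the other hand, are written asynchronously but are read to satisfy client look-ups. SSTables are also periodically merged and rewritten in a process called ''compaction''. An important difference between the commit log and SSTables is that commit log entries are purged once the corresponding data has been flushed to disk as an SSTable. `CommitLogDirectory` therefore holds only uncommitted data (data not yet flushed to an SSTable), whereas the directories in `DataFileDirectories` hold all of the data written to the node.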
  
- So to summarize, if you use a different device for your `CommitLogDirectory` 
it needn't be large, but it should be fast enough to receive all of your writes 
(as appends, i.e., sequential i/o). Then, use one or more devices for 
`DataFileDirectories` and make sure they are both large enough to house all of 
your data, and fast enough to both satisfy reads that are not cached in memory 
and to keep up with flushing and compaction.
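+ To summarize: if you use a separate device for `CommitLogDirectory`, it does not need to be large, but it must be fast enough to receive all of your writes (as appends, i.e. sequential I/O). Use one or more devices for `DataFileDirectories`; they must be large enough to hold all of your data and fast enough both to satisfy reads that are not cached in memory and to keep up with flushing and compaction.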
  
- As covered in [[MemtableSSTable]], compactions can require up to 100% of your 
in-use space temporarily in the worst case, free on a single volume (that is, 
in a data file directory).  So if you are going to be approaching 50% or more 
of your disks' capacity, you should raid0 your data directory volumes.  B. Todd 
Burruss adds on the mailing list, "With the file sizes we're talking about with 
cassandra and other database products, the [raid] stripe size doesn't seem to 
matter.  Mine is set to 128k, which produced the same results as 16k and 256k."
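+ As covered in [[MemtableSSTable]], in the worst case a compaction can temporarily require free space on a single volume (that is, in a data file directory) equal to up to 100% of the space in use there. So if you are approaching 50% or more of your disks' capacity, you should RAID0 your data directory volumes. B. Todd Burruss adds on the mailing list, "With the file sizes we're talking about with cassandra and other database products, the [raid] stripe size doesn't seem to matter.  Mine is set to 128k, which produced the same results as 16k and 256k."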
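
The 50% rule follows directly from the worst case: a compaction may need as much free space as is already in use, so a volume is safe only while free >= used. A small sketch of that arithmetic is below; the function names are hypothetical, and it treats everything on the volume as Cassandra data, which is a simplifying assumption.

```python
import shutil


def worst_case_shortfall(used_bytes, free_bytes):
    """Bytes missing for a worst-case compaction (0 if there is enough room).

    Worst case assumed here: the compaction temporarily needs free space
    equal to 100% of the space currently in use on the volume.
    """
    return max(0, used_bytes - free_bytes)


def check_volume(path):
    """Apply the worst-case rule to the volume that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return worst_case_shortfall(usage.used, usage.free)
```

For example, a volume with 60 units used and 40 free is 20 units short of the worst case, while 40 used and 60 free leaves enough headroom.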
  
- On ext2/ext3 the maximum file size is 2TB, even on a 64 bit kernel.  On ext4 
that goes up to 16TB.  Since Cassandra can use almost half your disk space on a 
single file, if you are raiding large disks together you may want to use XFS 
instead, particularly if you are using a 32-bit kernel.  XFS file size limits 
are 16TB max on a 32 bit kernel, and basically unlimited on 64 bit.
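+ On ext2/ext3 the maximum file size is 2TB, even on a 64-bit kernel. On ext4 it rises to 16TB. Because Cassandra can use almost half your disk space for a single file, you may want to use XFS instead if you are combining large disks with RAID, particularly on a 32-bit kernel. XFS file size limits are 16TB on a 32-bit kernel and essentially unlimited on 64-bit.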
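
To make the file-size reasoning concrete, here is a rough sketch that compares the worst-case single file (taken as half the volume, per the text) with the limits quoted above. The table keys and the function name `largest_file_fits` are invented for this example, and the limits are simply the figures stated on this page, not values queried from the file system.

```python
TB = 1024 ** 4

# Maximum file sizes as quoted in the text; None means "basically unlimited".
MAX_FILE_SIZE = {
    "ext2": 2 * TB,
    "ext3": 2 * TB,
    "ext4": 16 * TB,
    "xfs-32bit": 16 * TB,
    "xfs-64bit": None,
}


def largest_file_fits(fs, volume_bytes):
    """True if a file of half the volume size stays within the fs limit."""
    limit = MAX_FILE_SIZE[fs]
    worst_case_file = volume_bytes // 2
    return limit is None or worst_case_file <= limit
```

Under these assumptions a 6TB RAID volume on ext3 could produce a 3TB file, which exceeds the 2TB limit, whereas the same volume on ext4 or XFS would be fine.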
  
- === Cloud ===
- Several heavy users of Cassandra deploy in the cloud, e.g. CloudKick on 
Rackspace Cloud Servers and SimpleGeo on Amazon EC2.  The general consensus in 
the community seems to be that Rackspace's VMs offer better performance for 
Cassandra because of CPU bursting, raided local disks, and separate 
public/private interfaces.  
+ === Cloud ===
+ Several heavy users of Cassandra deploy in the cloud; for example, CloudKick uses Rackspace Cloud Servers and SimpleGeo uses Amazon EC2. The general consensus in the community seems to be that Rackspace's VMs offer better performance for Cassandra, thanks to CPU bursting, RAIDed local disks, and separate public/private interfaces.
