> HBase structures its storage using an approach known as "Log-Structured
> Merge Trees" (LSM trees), which you can learn more about here:
> You can also read about it in Lars George's great book, here:
> It applies all of these "frequent updates" in memory only, which is very
> fast; at the same time, it writes a simple append-only log of all edits
> (known as the Write Ahead Log, or WAL) to disk in order to provide
> durability in the event of machine failure. It periodically writes the
> in-memory data to disk in big immutable ordered chunks, called "store
> files", which is very efficient. Future reads of the data then "merge" the
> on-disk store file data with the current state in memory, to get the full
> picture of the state of any row. Over time, the many small store files get
> "compacted" into bigger files, so that individual reads don't have too many
> files to read from. Each "get" or "scan" operation can read just small
> blocks of the store files: when you ask for one record, it doesn't have to
> read gigabytes of data from disk; it can read a single small block. As a
> result, small random reads and writes on a very large data set can be done
> efficiently.
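
To make the write and read paths above concrete, here is a minimal, self-contained Python sketch of a log-structured merge store. It is not HBase code. The class `ToyLSMStore`, its `flush_threshold` parameter, and the rule of flushing after a fixed number of distinct keys are hypothetical simplifications: HBase flushes based on memstore size, among other triggers, and its store files (HFiles) carry block indexes rather than being plain lists. The sketch shows four things. Edits go to an append-only log and an in-memory map. Flushes produce immutable sorted runs. Reads check memory first, then the store files newest-first. Compaction merges the runs so that the newest value for each key wins.

```python
import bisect
import heapq
from typing import Dict, List, Optional, Tuple

Run = List[Tuple[str, str]]  # one immutable "store file": (row key, value) pairs sorted by key


class ToyLSMStore:
    """Toy log-structured merge store. An illustration, not HBase's implementation."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.wal: List[Tuple[str, str]] = []  # append-only edit log, in arrival order
        self.memstore: Dict[str, str] = {}     # mutable in-memory state
        self.store_files: List[Run] = []       # immutable sorted runs, oldest first
        self.flush_threshold = flush_threshold  # hypothetical: flush after this many distinct keys

    def put(self, key: str, value: str) -> None:
        self.wal.append((key, value))  # log the edit first, for durability
        self.memstore[key] = value     # then apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Write the whole memstore out as one new key-sorted, immutable store file.
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore.clear()
        self.wal.clear()  # simplification: those edits are now persisted in a store file

    def get(self, key: str) -> Optional[str]:
        # "Merge" on read: memory holds the newest state, then store files newest-first.
        if key in self.memstore:
            return self.memstore[key]
        for run in reversed(self.store_files):
            i = bisect.bisect_left(run, (key,))  # binary search instead of a full scan
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self) -> None:
        # Merge all store files into one sorted file; for duplicate keys keep the newest value.
        tagged = [[(k, -age, v) for k, v in run] for age, run in enumerate(self.store_files)]
        merged: Run = []
        for k, _neg_age, v in heapq.merge(*tagged):
            if merged and merged[-1][0] == k:
                continue  # a newer file's entry for this key was already kept
            merged.append((k, v))
        self.store_files = [merged] if merged else []
```

Binary search over a sorted run stands in for the block index that lets a real store file serve one row without reading the whole file. Deletes (which HBase records as delete markers) and multiple versions per cell are left out to keep the sketch short.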
> Furthermore, it's fine to update the data store frequently. For any given
> record, you can make as many updates as you want to the in-memory
> structures, and these aren't written to the store files until the memory
> store is flushed. (Each edit is also appended to the WAL, but that's
> efficient too, because the WAL is ordered by update time, not by record
> key.) It all happens in memory, which is very fast (and, again, it's safe
> because of the WAL). There are even some recent JIRAs that make this
> process more efficient, for example HBASE-4241.
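
The point about frequent updates can be sketched the same way. `ToyRegion` and `recover` are hypothetical names, and this is plain Python, not HBase's API. A thousand updates to one row cost a thousand sequential appends to the log plus an in-place overwrite in memory. Nothing is sorted or written to a store file until `flush`, which emits only the latest value per row. `recover` shows why this is safe: replaying the log in sequence order rebuilds the in-memory state lost in a crash. Keeping only the latest value per row is a simplification. HBase's memstore stores timestamped cells, and how many versions survive is governed by the column family's settings.

```python
from typing import Dict, List, Tuple

LogEntry = Tuple[int, str, str]  # (sequence id, row, value)


class ToyRegion:
    """Hypothetical illustration of the memstore + WAL write path, not HBase code."""

    def __init__(self) -> None:
        self.wal: List[LogEntry] = []       # append-only, ordered by update time, not by row
        self.memstore: Dict[str, str] = {}  # latest value per row, held in memory
        self.next_seq = 0

    def put(self, row: str, value: str) -> None:
        self.wal.append((self.next_seq, row, value))  # cheap sequential append
        self.next_seq += 1
        self.memstore[row] = value  # overwrite in memory; no random disk write

    def flush(self) -> List[Tuple[str, str]]:
        # Emit one sorted snapshot (the would-be store file) and start fresh.
        snapshot = sorted(self.memstore.items())
        self.memstore.clear()
        self.wal.clear()  # simplification: flushed edits no longer need the log
        return snapshot


def recover(wal: List[LogEntry]) -> Dict[str, str]:
    # After a crash the memstore is gone; replaying the log in sequence order rebuilds it.
    memstore: Dict[str, str] = {}
    for _seq, row, value in sorted(wal):
        memstore[row] = value
    return memstore
```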
> One way to think about it is that HBase is *precisely* a layer that adds
> these efficient random read/write capabilities on top of the Hadoop
> distributed file system (HDFS), and takes care of doing that in a way that
> parallelizes nicely across a large cluster of machines, deals with machine
> failures, etc.
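
Spreading the work over many machines comes from HBase splitting each table into regions, contiguous row-key ranges that are each served by one RegionServer. The sketch below is hypothetical. `locate_region` finds a row's region by binary search over sorted start keys. `assign` and `reassign_after_failure` use a made-up round-robin placement policy. The real HBase Master uses its own balancer and also replays the failed server's WAL before its regions come back online; none of that is modeled here.

```python
import bisect
from typing import Dict, List


def locate_region(start_keys: List[str], row: str) -> int:
    """Index i of the region whose range [start_keys[i], start_keys[i+1]) contains `row`.

    Assumes start_keys is sorted and begins with "" (the first region's empty start key).
    """
    return bisect.bisect_right(start_keys, row) - 1


def assign(num_regions: int, servers: List[str]) -> Dict[int, str]:
    # Hypothetical placement policy: spread regions round-robin over the servers.
    return {r: servers[r % len(servers)] for r in range(num_regions)}


def reassign_after_failure(assignment: Dict[int, str], servers: List[str],
                           failed: str) -> Dict[int, str]:
    # Move every region hosted on the failed server to a surviving server, round-robin.
    survivors = [s for s in servers if s != failed]
    if not survivors:
        raise RuntimeError("no surviving servers to take over the regions")
    orphaned = sorted(r for r, server in assignment.items() if server == failed)
    updated = dict(assignment)
    for i, region in enumerate(orphaned):
        updated[region] = survivors[i % len(survivors)]
    return updated
```

With start keys `["", "g", "p"]`, rows before "g" land in region 0, rows from "g" up to (but not including) "p" in region 1, and everything from "p" on in region 2. A start key belongs to the region it opens.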
> Dear Stack,
> As I understand it, large-scale distributed systems favor a
> write-once-read-many pattern. Frequent updates must put a heavy load on
> the system because of consistency concerns, which must lower performance.
> So HBase must not be suitable for frequent updates, right?
> My other question: is it appropriate to update data in HBase
> frequently?
> This is 'normal', yes.
