> From: Andreas Neumann <[email protected]>
> If we only load data in bulk (that is, via doBulkLoad(), not using
> TableOutputFormat), do we still risk data loss? My understanding is
> that append is needed for the WAL, and the WAL is needed only
> for puts. But bulk loads bypass the WAL.

Correct.

If you are doing read-only serving of HFiles built by MR and loaded by 
doBulkLoad, then you would not need append support.

If you are adding new data to tables via the HBase API, then sooner or later
this will change table structure, which is recorded via Puts to META. META is
self-hosted: it is itself an HBase table, so those Puts go through the WAL.
Circumstances where those edits can be lost without working append support in
HDFS may be rare, but not rare enough in my estimation. Losing edits to META
is bad. It can lead to missing regions and hung clients. Human intervention
will be necessary, and the time administrative recovery takes is usually an
availability problem.

> For instance, when a region is split, the master must write
> the new meta data to the meta regions. Would that require a WAL
> or rely on append in some other way?

See above.

> Are there other situations where the WAL is needed (or append
> is needed) to avoid data loss?

Deletes? Increments? Without append support you would not lose data per se
with these operations, but under the same low-probability failure conditions
that can corrupt META, the client may be led to believe they were successfully
applied when they were not.
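
To make the failure mode concrete, here is a toy model in Python. It is only a sketch and not HBase code: the class ToyRegionServer, the function surviving_rows, and the row keys are all made up for illustration. It also simplifies heavily by treating "no working append" as "no acknowledged WAL entry is guaranteed to be on disk". It shows that bulk-loaded data survives a crash either way, while an acknowledged Put (for example an edit to META) survives only if the WAL entry was durable.

```python
class ToyRegionServer:
    """Toy model of a server with store files, a memstore and a WAL.

    Hypothetical illustration only; none of these names are HBase APIs.
    """

    def __init__(self, durable_append):
        self.durable_append = durable_append
        self.hfiles = {}        # bulk-loaded store files, already on disk
        self.memstore = {}      # in-memory edits, lost when the server crashes
        self.wal_durable = []   # WAL entries that really reached disk
        self.wal_buffered = []  # WAL entries written but never made durable

    def put(self, key, value):
        """Log the edit, apply it in memory, and acknowledge the client."""
        entry = (key, value)
        if self.durable_append:
            self.wal_durable.append(entry)
        else:
            self.wal_buffered.append(entry)
        self.memstore[key] = value
        return True  # the client sees success in both cases

    def bulk_load(self, rows):
        """Adopt prebuilt files directly; the WAL is not involved."""
        self.hfiles.update(rows)

    def crash_and_recover(self):
        """Drop all volatile state, then replay only the durable WAL entries."""
        self.memstore = {}
        self.wal_buffered = []
        for key, value in self.wal_durable:
            self.memstore[key] = value

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        return self.hfiles.get(key)


def surviving_rows(durable_append):
    """Return (put acknowledged?, bulk-loaded row, Put row) after a crash."""
    server = ToyRegionServer(durable_append)
    server.bulk_load({"row1": "bulk"})
    acked = server.put("meta:region2", "split daughter")
    server.crash_and_recover()
    return acked, server.get("row1"), server.get("meta:region2")


print(surviving_rows(True))   # durable WAL: the acknowledged Put is replayed
print(surviving_rows(False))  # no durable WAL: the acknowledged Put is gone
```

In the second case the Put was acknowledged but is missing after recovery, which is the situation described above for META edits, deletes and increments.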

  - Andy
