Hello,

For the bulk loading process, the HBase documentation says that in
a second stage "the appropriate Region Server adopts the HFile, moving it
into its storage directory and making the data available to clients."
But in my experience the files also remain in the original location
from which they are "adopted". So I guess the data is actually copied
into the HBase directory, right? This would mean that, compared to
online importing, bulk loading essentially needs twice the
disk space on HDFS, right?
Another problem is data locality immediately after bulk loading
through MapReduce. I understand that locality is restored over time
through compactions and splits. However, you don't have this problem when
importing online, right?

Thanks in advance,
Sever

Reply via email to