On Thu, Apr 23, 2020 at 11:05 PM wrote:
>
> Hi
>
> We have a 3-year-old Hadoop cluster - up for refresh - so it is time
> to evaluate options. The "only" use case is running an HBase installation,
> which is important for us, and migrating off HBase would be a hassle.
>
> In our Ceph usage, the local filesystem approach is a bit tricky. We just
> tried a POC: mount CephFS on every Hadoop node and configure Hadoop to use
> LocalFS with replica = 1. Each piece of data then ends up written only once
> into CephFS, and CephFS takes care of the data durability.
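For reference, a minimal sketch of what that LocalFS-on-CephFS setup could look like in core-site.xml. The property name is the standard Hadoop one; the mount point /mnt/cephfs is just an assumption about where CephFS is mounted on each node:

```xml
<!-- core-site.xml (sketch): point Hadoop's default filesystem at a
     directory on the CephFS mount instead of HDFS. With file:// there
     is no HDFS replication at all; durability comes from CephFS. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>file:///mnt/cephfs/hadoop</value>
  </property>
</configuration>
```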
Can you tell a bit more about this?
RBD is never a workable solution unless you want to pay the cost of
double replication in both HDFS and Ceph.
I think the right approach is to consider other implementations of the
Hadoop FileSystem interface, such as s3a and localfs.
s3a is straightforward: Ceph RGW provides an S3 interface that s3a can
talk to.
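A minimal core-site.xml sketch for pointing s3a at a Ceph RGW endpoint. The property names are the standard s3a ones; the endpoint URL and credentials are placeholders:

```xml
<!-- core-site.xml (sketch): s3a against Ceph RGW. Path-style access is
     typically needed for RGW since bucket-as-hostname DNS is often not
     set up. -->
<configuration>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>http://rgw.example.com:7480</value>
  </property>
  <property>
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>RGW_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>RGW_SECRET_KEY</value>
  </property>
</configuration>
```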
I think the idea behind a pool size of 1 is that Hadoop already writes
copies to 2 other pools(?).
However, that leaves the possibility that PGs of these 3 pools share an
OSD, and if that OSD fails, you lose data in these pools. I have no idea
what the chances are that the same OSD ends up holding all three copies.
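To get a rough feel for those chances, here is a toy Monte Carlo under my own simplifying assumption (uniform, independent placement per pool, which real CRUSH placement is not): a block is written once into each of 3 size-1 pools, and it is lost to a single OSD failure only if all three copies land on the same OSD.

```python
import random

def colocation_probability(num_osds, trials=100_000, seed=42):
    """Estimate the chance that one block's 3 copies (one per size-1
    pool) all land on the same OSD, assuming uniform random placement."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        osds = [rng.randrange(num_osds) for _ in range(3)]
        if len(set(osds)) == 1:  # all three copies on one OSD
            hits += 1
    return hits / trials

# Analytically this is 1 / num_osds**2, so per block the chance is tiny,
# e.g. ~0.01 for 10 OSDs -- but across millions of blocks, some
# co-located blocks become likely.
print(colocation_probability(10))
```

Note this is only an illustration; CRUSH rules can (and should) be written to force the three pools onto disjoint failure domains, which drives this probability to zero.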
You do not want to mix Ceph with Hadoop, because you'll lose data
locality, which is the main point of Hadoop systems.
Every read/write request will go over the network, which is not optimal.
On Fri, Apr 24, 2020 at 9:04 AM wrote: