In a recent thread on the list, I received several important answers to my
questions about the hadoop plugin. Maybe this thread will help you.
https://www.spinics.net/lists/ceph-users/msg40790.html
One of the most important answers is about data locality. The last message
led me to this article.
>
> > Does s3 or swifta (for hadoop or spark) have integrated data-layout APIs
> for
> > local processing data as have cephfs hadoop plugin?
> >
> With s3 and swift you won't have data locality as it was designed for
> public cloud.
> We recommend disable locality based scheduling in Hadoop when
-29 4:19 GMT-02:00 Orit Wasserman <owass...@redhat.com>:
> On Tue, Nov 28, 2017 at 7:26 PM, Aristeu Gil Alves Jr
> <aristeu...@gmail.com> wrote:
> > Greg and Donny,
> >
> > Thanks for the answers. It helped a lot!
> >
> > I just watched the swifta
Greg and Donny,
Thanks for the answers. It helped a lot!
I just watched the swifta presentation and it looks quite good!
Due to the lack of updates/development, and the fact that we can also choose
spark, I think maybe swift/swifta with ceph is a good strategy too.
I need to study it more, tho.
Hi.
It's my first post on the list. First of all, I have to say I'm new to
hadoop.
We are a small lab here and we have been running cephfs for almost two
years, loading it with large files (4GB to 4TB in size). Our cluster has
approximately 400TB with ~75% usage, and we are planning