I simply described the process my team went through, but in hindsight, now
that I understand the Hadoop ecosystem better, I think yes: given that MapR
uses HDFS, we could simply use distcp to move data out. To be honest,
neither HW nor Cloudera made any claims to us about MapR. I think I found
some articles on the web comparing the three distros that said MapR is
mostly proprietary.

So if I were doing the evaluation again, I would probably include MapR.
Another factor is that most teams I know that use Hadoop are using CDH, HW
or the Apache distro, so there is a bit of inertia in evaluating something
that you aren't sure is widely used by your peers.

On Mon, Sep 16, 2013 at 8:37 PM, M. C. Srivas <mcsri...@gmail.com> wrote:

>
> So here's an example of marketing FUD at work.
>
> On Mon, Sep 16, 2013 at 3:10 PM, Xuri Nagarin <secs...@gmail.com> wrote:
>
>> So I will try to answer the OP's question as best I can, sticking to facts
>> without deviating too much into opinion. Disclaimer: I am not an employee
>> of either vendor or of any partner of theirs.
>>
>> Context is important: my team's use case was general data exploration of
>> semi-structured log data, and we had no existing data-warehouse-type use
>> cases. Also, ours is a small cluster (fewer than 30 nodes). In terms of
>> ops/maintenance, we have only one person. I point this out because lots
>> of Hadoop shops have a dedicated team for each function: OS
>> administration, Hadoop administration, and Hadoop development. And they
>> are very mature in terms of their compute use cases. To my mind, these
>> aspects can significantly impact your vendor choice.
>>
>> MapR: My team simply did not consider them because of all the proprietary
>> code in there. We are trying to migrate off a monolithic proprietary
>> product, and one of the criteria we set was: if we decided to move away
>> from the chosen Hadoop vendor, could we easily unlock our data?
>>
>
> Unlock your data? How about distcp? Or just "cp"?
>
> The fact is there are 10x more standard ways to access your data in a
> MapR cluster than in a Cloudera or Hortonworks cluster.
>


Yes, Cloudera has the proprietary CM, but I really don't think HW has any
proprietary code. Can you point to any? In fact, even for Cloudera, other
than CM, what's proprietary? I ask not rhetorically but to clear up the
facts. I did not find any, but if you know of any proprietary code, please
let everyone know.


>
> MapR is entirely open source, with proprietary add-ons, just like Cloudera
> or Hortonworks.
>
> The difference is MapR has innovated both above and below the Hadoop
> stack, while Cloudera and Horton have only done so above the stack. MapR's
> innovations have set the bar so high that its competition likes to spread
> FUD.
>
>
As a user/customer, the best suggestion I can make, since you are a MapR
employee, is not to focus on other distros but to point out facts about
your product. The biggest turn-off in the purchase cycle for me was the way
Cloudera and HW attack each other, a lot of the time with FUD. If they
simply presented facts, I think I am intelligent enough to tell the
differences. Trying to convince a potential customer by attacking the
competition sort of assumes that the customer isn't smart enough to figure
things out.




> [disclaimer: I work for MapR ]
>
>
>
>> Hortonworks: Distro uses HDFS 1.x with MRv2. All open source. Cluster
>> management is via Ambari. Compared to Cloudera's CM, Ambari has very
>> rudimentary features. But you have to keep in mind that Ambari is only a
>> year old, whereas CM has been under development for several years.
>> This was a major selection factor for us because Ambari did not have the
>> automation/feature set that CM has for a single administrator/developer
>> to easily maintain the cluster. Also, during the trial period,
>> Hortonworks' packaging format/structure apparently kept changing, which
>> made things a bit difficult to centrally deploy/administer.
>>
>> Cloudera: Distro uses HDFS 2.x with MRv1. All open source except cluster
>> management, which is via their proprietary Cloudera Manager tool. It is
>> free to use, minus certain features like auditing and cluster
>> replication. Maybe a few more features are restricted to the
>> Enterprise/Licensed-only version. It offers many more features than
>> Ambari. In terms of cluster administration, I found CM much easier to work
>> with than Ambari. Pretty much all aspects, from deploying new nodes to
>> configuration and troubleshooting, are much more refined than in Ambari.
>>
>> During the selection process, what I found was that both vendors are very
>> aggressive in their pitch. So much so that each pushes some FUD regarding
>> the competition.
>>
>
> Obviously some of it worked, given some of the statements earlier.
>
>
>
>>
>> HW uses HDFS 1.x + MRv2 while CDH uses HDFS 2.x + MRv1. HW claimed that
>> Cloudera's distro is so heavily patched, off course from the core Apache
>> trunk, that it can cause severe data corruption issues. Yes, Cloudera has
>> some 1500+ patches over Apache's Hadoop distro, but (1) they aren't private
>> patches; you can pull the list and verify that yourself, just as I did.
>> (2) In our testing and talking to other Cloudera customers, I couldn't find
>> any issues with data corruption. It is true, though, that HDFS 2.x is still
>> in beta, but so is the MRv2 that HW uses. I think both are stable and work
>> well, depending on what you need, but each vendor uses that point to
>> create FUD.
>>
>> HW also claimed that Impala, a new SQL engine that Cloudera is including in
>> their distro, is proprietary. Not true: the software is open source. But
>> if you want support for Impala, Cloudera will charge you separately per
>> node for Impala, over and above what they charge per node for Hadoop support.
>>
>> In my experience, both products have plenty of issues when it comes to
>> compute engines (Hive, Pig, etc.) and their cluster management software.
>> HDFS seems to be solid in both distros. So I wouldn't call either of them
>> trouble-free, and neither is at the maturity level of other popular
>> enterprise products like, say, Oracle. That said, you have to keep in mind
>> that both vendors/products are used successfully by several customers, so
>> again, it is more a question of what fits your needs.
>>
>> In the end, we chose to go with Cloudera mostly because of a more positive
>> experience with CM in terms of administration/operations, and with their
>> pre-sales team, compared to HW. Again, that said, another team that we
>> work closely with chose HW for their cluster. I use both vendors/clusters
>> at work, and neither has any significant issues.
>>
>>
>>
>>
>> On Sat, Sep 14, 2013 at 12:37 PM, Chris Mattmann <mattm...@apache.org> wrote:
>>
>>> Here's the deal: folks can post questions to the list that aren't
>>> abusive, and simply asking what the difference is between different vendor
>>> implementations (downstream) of Apache Hadoop is not an inflammatory
>>> or abusive question.
>>>
>>> Stick to the facts. Discuss it here. Why should the Apache Hadoop
>>> PMC push off potentially useful questions that may have upstream
>>> implications for the Apache Hadoop core and let all the innovation
>>> occur downstream?
>>>
>>> Have the conversations here if you'd like. I wouldn't turn anyone
>>> away.
>>>
>>> My 2c.
>>>
>>> Cheers,
>>> Chris
>>>
>>> ----Original Message-----
>>>
>>> From: Shahab Yunus <shahab.yu...@gmail.com>
>>> Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
>>> Date: Friday, September 13, 2013 10:48 AM
>>> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
>>> Subject: Re: Cloudera Vs Hortonworks Vs MapR
>>>
>>> >In my opinion, it is the wrong idea because:
>>> >
>>> >
>>> >1- Many of the participants here are employees of the very companies
>>> >under discussion. This puts these employees in a very difficult
>>> >position. It is very hard to come up with a correct response, and
>>> >comments can be misconstrued easily.
>>> >2- Also, when we talk about vendor distributions of the software, it is
>>> >no longer purely about open source. Now companies, with their related
>>> >corporate legal baggage, also get into the mix.
>>> >3- The discussion would cover not only positive things about each vendor
>>> >but also negatives, and the latter type of discussion can get
>>> >unpleasant very easily.
>>> >
>>> >4- Somebody mentioned that this is a very lightly moderated platform and
>>> >thus this discussion should be allowed. I think this is one of the
>>> >reasons that it should not be, because people can say things casually,
>>> >without much thought, or without taking care of the context or the
>>> >possible interpretations, and get in trouble.
>>> >5- The risk here is not only that serious repercussions can occur (which
>>> >very well can happen), but the greater risk is that it can cause
>>> >misunderstanding between individuals, industries and companies.
>>> >6- People here often reply quickly just to resolve or help with the
>>> >'technical' issue. Now they will have to take care how they frame the
>>> >response. Re: 4
>>> >
>>> >
>>> >I know some will feel that I have created a highly exaggerated scenario
>>> >above, but what I am trying to say is that it is a slippery slope. If we
>>> >allow this, then this can go anywhere.
>>> >
>>> >
>>> >By the way, I do not work for any of these vendors.
>>> >
>>> >
>>> >More importantly, I am not saying that this discussion should not be had;
>>> >I am just saying that this is the wrong forum.
>>> >
>>> >
>>> >Just my 2 cents (or,...this was rather a dollar.)
>>> >
>>> >
>>> >Regards,
>>> >Shahab
>>> >
>>> >
>>> >
>>> >
>>> >On Fri, Sep 13, 2013 at 1:50 AM, Chris Mattmann <mattm...@apache.org> wrote:
>>> >
>>> >Errr, what's wrong with discussing these types of issues on list?
>>> >
>>> >Nothing public here, and as long as it's kept to facts, this should
>>> >not be a problem and Apache is a fine place to have such discussions.
>>> >
>>> >My 2c.
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >-----Original Message-----
>>> >From: Xuri Nagarin <secs...@gmail.com>
>>> >Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
>>> >Date: Thursday, September 12, 2013 4:39 PM
>>> >To: "user@hadoop.apache.org" <user@hadoop.apache.org>
>>> >Subject: Re: Cloudera Vs Hortonworks Vs MapR
>>> >
>>> >>I understand it can be a contentious issue, especially given that a lot of
>>> >>contributors to this list work for one or the other vendor or have some
>>> >>stake in any kind of evaluation. But I see no reason why users should
>>> >>not be able to compare notes and share experiences. Over time, genuine
>>> >>pain points, issues or claims will bubble up, and that should only help
>>> >>the community. Sure, there will be a few flame wars, but this already
>>> >>isn't a very tightly moderated list.
>>> >>
>>> >>On Thu, Sep 12, 2013 at 11:14 AM, Aaron Eng <a...@maprtech.com> wrote:
>>> >>
>>> >>Raj,
>>> >>
>>> >>
>>> >>As others noted, this is not a great place for this discussion. I'd
>>> >>suggest contacting the vendors you are interested in, as I'm sure we'd all
>>> >>be happy to provide you more details.
>>> >>
>>> >>
>>> >>I don't know about the others, but for MapR, just send an email to
>>> >>sa...@mapr.com and I'm sure someone will get back to you with more
>>> >>information.
>>> >>
>>> >>
>>> >>Best Regards,
>>> >>Aaron Eng
>>> >>
>>> >>
>>> >>
>>> >>On Thu, Sep 12, 2013 at 10:19 AM, Hadoop Raj <hadoop...@yahoo.com> wrote:
>>> >>
>>> >>
>>> >>Hi,
>>> >>
>>> >>We are trying to evaluate different implementations of Hadoop for our big
>>> >>data enterprise project.
>>> >>
>>> >>Can the forum members advise on the advantages and disadvantages
>>> >>of each implementation, i.e. Cloudera vs. Hortonworks vs. MapR?
>>> >>
>>> >>Thanks in advance.
>>> >>
>>> >>Regards,
>>> >>Raj
>>> >>
