Re: hive on spark - version question

Stephen Sprague Fri, 17 Mar 2017 16:00:06 -0700

yeah but... is the glass half-full or half-empty?  sure this might suck but
keep your head high, bro! Lots of it (hive) does work. :)



On Fri, Mar 17, 2017 at 2:25 PM, hernan saab <hernan_javier_s...@yahoo.com>
wrote:

> Stephan,
>
> Thanks for the response.
>
> The one thing that I don't appreciate from those who promote and DOCUMENT
> Hive on Spark is that there is seemingly no evidence at all that
> Hive on Spark WORKS.
> As a matter of fact, after a lot of pain, I noticed it is not supported by
> just about anybody.
>
> If someone dares to document Hive on Spark (see link
> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started)
> why can't they have the decency to mention which specific combo of
> Hadoop/Spark/Hive versions they used that works? Have a
> git repo included in the doc with all the right versions and libraries. Why
> not? We can start from there and progressively use newer libraries in case
> the doc becomes stale. I am not really asking for much, I just want to know
> what the documenter used to claim that Hive on Spark works, that's it.
>
> Clearly, for most cases, this setup is broken and it misleads people into
> wasting time on a broken setup.
>
> I love this tech. But I do notice that there are some mean-spirited or very
> negligent actions by the Apache development community. Documenting
> Hive on Spark while knowing it won't work for most cases means Apache
> developers don't give a crap about the time wasted by people like us.
>
>
>
>
> On Friday, March 17, 2017 1:14 PM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
>
>
>
> On Fri, Mar 17, 2017 at 2:56 PM, hernan saab <hernan_javier_s...@yahoo.com>
> wrote:
>
> I have been in a similar world of pain. Basically, I tried to use an
> external Hive to have user access controls with a spark engine.
> At the end, I realized that it was a better idea to use apache tez instead
> of a spark engine for my particular case.
>
> But the journey is what I want to share with you.
> The big data Apache tools and libraries such as Hive, Tez, Spark, Hadoop,
> Parquet, etc. are not as interchangeable as we would like to think. There
> are very limited combinations for very specific versions. This is why tools
> like Ambari can be useful. Ambari sets a path of combos of versions known
> to work and the dirty work is done under the UI.
>
> More often than not, when you try a version that few people have tried, you
> will get error messages that will derail you and cause you to waste a lot
> of time.
>
> In addition, this group, as well as many other apache big data user
> groups,  provides extremely poor support for users. The answers you usually
> get are not even hints to a solution. Their answers usually translate to
> "there is nothing I am willing to do about your problem. If I did, I should
> get paid" in many cryptic ways.
>
> If you ask your question to the Spark group they will take you to the Hive
> group and vice versa (I can almost guarantee it based on previous
> experiences).
>
> But in hindsight, people who work on these kinds of things typically make
> more money than the average developer. If you make more $$s, it makes sense
> that learning this stuff is supposed to be harder.
>
> Conclusion: don't try it. Or try using Tez/Hive instead of Spark/Hive if
> you are querying large files.
>
>
>
> On Friday, March 17, 2017 11:33 AM, Stephen Sprague <sprag...@gmail.com>
> wrote:
>
>
> :(  gettin' no love on this one.   any SME's know if Spark 2.1.0 will work
> with Hive 2.1.0 ?  That JavaSparkListener class looks like a deal breaker
> to me, alas.
>
> thanks in advance.
>
> Cheers,
> Stephen.
>
> On Mon, Mar 13, 2017 at 10:32 PM, Stephen Sprague <sprag...@gmail.com>
> wrote:
>
> hi guys,
> wondering where we stand with Hive On Spark these days?
>
> i'm trying to run Spark 2.1.0 with Hive 2.1.0 (purely coincidental
> versions) and running up against this class not found:
>
> java.lang.NoClassDefFoundError: org/apache/spark/JavaSparkListener
>
>
> searching the Cyber i find this:
>     1. http://stackoverflow.com/questions/41953688/setting-spark-as-default-execution-engine-for-hive
>
>     which pretty much describes my situation too and it references this:
>
>
>     2. https://issues.apache.org/jira/browse/SPARK-17563
>
>     which indicates a "won't fix" - but does reference this:
>
>
>     3. https://issues.apache.org/jira/browse/HIVE-14029
>
>     which looks to be fixed in hive 2.2 - which is not released yet.
>
>
> so if i want to use spark 2.1.0 with hive am i out of luck - until hive
> 2.2?
>
> thanks,
> Stephen.
>
>
>
>
>
> Stephan,
>
> I understand some of your frustration.  Remember that many in open source
> are volunteering their time. This is why if you pay a vendor for support of
> some software you might pay 50K a year or $200.00 an hour. If I was your
> vendor/consultant I would have started the clock 10 minutes ago just to
> answer this email :). The only "pay" I ever got from Hive is that I can use
> it as a resume bullet point, and I wrote a book which pays me royalties.
>
> As it relates specifically to your problem, when you see the trends you
> are seeing it probably means you are in a minority of the user base. Either
> you're doing something no one else is doing, you are too cutting edge, or no
> one has an easy solution. Hive is making the move away from classic
> MapReduce; two other execution engines have been made: Tez and HiveOnSpark.
> Because we are open source we allow people to "scratch an itch"; that is the
> Apache way. From time to time it means something that was added stops being
> viable because of lack of support.
>
> I agree with your final assessment, which is that Tez is the most viable engine
> for Hive. This is by no means a put-down of the HiveOnSpark work and it
> does not mean it will never be the most viable. By the same token, if the
> versions fall out of sync and all that exists is complaints, the viability
> speaks for itself.
>
> Remember that keeping two fast-moving things together is no easy chore. I
> used to run Hive + Cassandra. Seems easy; crap, two versions of common CLI,
> shade one version, everything works; crap, new Hive release has different
> versions of Thrift, shade + patch; crap, now one of the other dependencies
> is incompatible, fork + shade + patch. At some point you have to say to
> yourself: if I cannot reach critical mass with this solution, such that I am
> the only one doing/patching it, then I give up and find some other way to do
> it.
>
>
>
